Open access peer-reviewed chapter

# Model Testing Based on Regression Spline

Written By

Na Li

Submitted: 03 May 2017 Reviewed: 05 February 2018 Published: 06 June 2018

DOI: 10.5772/intechopen.74858

From the Edited Volume

## Topics in Splines and Applications

Edited by Young Kinh-Nhue Truong and Muhammad Sarfraz


## Abstract

Tests based on regression splines are developed in this chapter for testing the nonparametric functions in nonparametric, partial linear, and varying-coefficient models, respectively. These models are more flexible than the linear regression model. However, one important problem is whether it is really necessary to use such complex models containing nonparametric functions. For this purpose, p-values for testing the linearity and constancy of the nonparametric functions are established based on regression splines and the fiducial method. In applications of spline-based methods, the determination of the knots is difficult but plays an important role in inferring the regression curve. In order to infer the nonparametric regression at different smoothing levels (scales) and locations, multi-scale smoothing methods based on regression splines are developed to test the structures of the regression curve and to compare multiple regression curves. These methods sidestep the determination of knots and, at the same time, give more reliable results when using spline-based methods.

### Keywords

• fiducial method
• multi-scale smoothing method
• nonparametric regression model
• partial linear regression model
• regression spline
• varying-coefficient regression model

## 1. Introduction

It is well known that models containing nonparametric functions, such as the partial linear model and the varying-coefficient model, play an important role in applications due to their flexible structure. However, in practice, investigators often want to know whether it is really necessary to fit the data with such complex models rather than with a simpler one. This amounts to testing the linearity of the nonparametric functions in a regression model. In this chapter, we first consider the following three frequently used regression models.

Nonparametric regression model:

$$y = f(x) + \varepsilon. \tag{1}$$

Partial linear regression model:

$$y = Zb + f(x) + \varepsilon. \tag{2}$$

Varying-coefficient model:

$$y = z_1 f_1(x_1) + \cdots + z_p f_p(x_p) + \varepsilon. \tag{3}$$

In models (1)–(3), $y$ is the response variable, $Z=(z_1,\ldots,z_p)$ is a $p$-dimensional regressor, $x$ and $x_1,\ldots,x_p$ are covariates taking values in a finite interval, $\varepsilon$ is the error, $b$ is a parameter vector, and $f(x)$ and $f_j(x_j)$, $j=1,2,\ldots,p$, are unknown smooth functions. Usually we suppose that $(Z,x)$ and $\varepsilon$ are independent and that $\varepsilon/\sigma \sim F$, where $F$ is a known cumulative distribution function (cdf) with mean 0 and variance 1 and $\sigma$ is unknown. Without loss of generality, we can suppose that $x$ and $x_1,\ldots,x_p$ take values in [0, 1]. We aim to test the linearity of $f(x)$ in models (1) and (2) and the constancy of $f_j(x_j)$ in model (3) for some $j \in \{1,2,\ldots,p\}$.

Hypothesis testing in the nonparametric regression model has been considered in many papers. Härdle and Mammen [1] studied the visible difference between a parametric and a nonparametric curve estimate. Based on smoothing techniques, many tests have been constructed for testing linearity in regression models; see Hart [2], Cox et al. [3], and Cox and Koh [4] for a review. Recently, Fan et al. [5] studied a generalized likelihood ratio statistic, which behaves well in the large sample case. Tests based on penalized criteria were developed by Eubank and Hart [6] and Baraud [7].

The linearity of partial linear regression model (2) was studied by Bianco and Boente [8], Liang et al. [9], and Fan and Huang [10]. There are also many other papers concerning such testing problems (see [11, 12, 13, 14, 15, 16], among others). The constancy of the functional coefficient fjxj in varying-coefficient model (3) was studied in Fan and Zhang [17], Cai et al. [18], Fan and Huang [19], You and Zhou [20], and Tang and Cheng [21]. Local polynomials and smoothing spline methods to estimate the coefficients in model (3) can be seen in Hoover et al. [22], Wu et al. [23], and so on.

The critical values of most of the previous tests were obtained by the Wilks theorem or the bootstrap method, so such tests behave well only for relatively large sample sizes. This chapter gives some testing procedures based on regression spline and the fiducial method [24] in Section 2; they have good performance even when the sample size is small.

In using regression splines, the key problem is the determination of the knots used in the spline interpolation. For smoothing methods such as kernel-based methods and smoothing splines, the smoothness is controlled by smoothing parameters. For the well-known kernel estimate, a bandwidth that is extremely large or small leads to over-smoothing or under-smoothing, respectively. In order to avoid the selection of an optimal smoothing parameter, a multi-scale smoothing method was introduced by Chaudhuri and Marron [25, 26] based on kernel estimation for exploring structures in data. This multi-scale method is known as the significant zero crossings of derivatives (SiZer) methodology. The basic idea of SiZer is to infer a nonparametric model using a wide range of smoothing parameter (bandwidth) values rather than only one value that is "optimal" in some sense.

There have been many versions of SiZer for various applications, such as the local likelihood version of SiZer in Li and Marron [27], the robust version of SiZer in Hannig and Lee [28], and the quantile version of SiZer in Park et al. [29]. In addition, Marron and de Uña-Álvarez [30] applied SiZer to estimate length-biased, censored density and hazard functions; Kim and Marron [31] utilized SiZer for jump detection; and Park and Kang [32] applied SiZer to compare regression curves. The smoothing spline version of SiZer was proposed by Marron and Zhang [33]; it uses the tuning parameter (penalty parameter) that controls the size of the penalty as the smoothing parameter.

Compared with the bandwidth for kernel-based methods and the tuning parameter for smoothing splines, it is more difficult to determine the number of knots and their positions. For this reason, a multi-scale smoothing method based on regression spline is proposed in Section 3 to test the structures of the nonparametric regression model. The proposed multi-scale method does not involve the determination of the "best" number of knots and can be extended easily to more general cases.

## 2. Tests for nonparametric function based on regression spline

In this section, the linearity of function fx in model (1) is tested based on regression spline and fiducial method. Then, the proposed test procedure for model (1) is extended to test the linearity of model (2) and the constancy of function coefficient in model (3), respectively.

### 2.1. Test the linearity of nonparametric regression model

Without loss of generality, we suppose that $x$ in model (1) takes values in [0, 1] and the set of knots is $T = \{0 = t_1 < t_2 < \cdots < t_m = 1\}$. In order to estimate model (1), the nonparametric function $f(x)$ is fitted by $k$th order splines with knots $T$. This means that

$$f(x) \approx \sum_{j=1}^{m+k-1} \beta_j g_j(x), \tag{4}$$

where the $\beta_j$ are coefficients and $g_j(x)$, $j=1,2,\ldots,m+k-1$, are the basis functions for order-$k$ splines over the knots $t_1, t_2, \ldots, t_m$.

With $n$ independent observations $Y = (y_1, y_2, \ldots, y_n)^\top \in \mathbb{R}^n$, the $n \times (m+k-1)$ basis matrix $G$ is defined by $G = (g_j(x_i))$, where $x_i$ is the $i$th design point, $i=1,2,\ldots,n$, $j=1,2,\ldots,m+k-1$. Hence, model (1) can be approximated as $Y \approx G\beta + \varepsilon$. The least squares estimator of the coefficients is

$$\hat\beta = (G^\top G)^{-1} G^\top Y, \tag{5}$$

and the estimator of fxi can be expressed as

$$\hat Y = \left( \hat f(x_1), \hat f(x_2), \ldots, \hat f(x_n) \right)^\top = G (G^\top G)^{-1} G^\top Y. \tag{6}$$
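As an illustration, the estimators (5) and (6) can be computed directly once the basis matrix $G$ is formed. The following sketch (in Python with NumPy, using simulated data and equally spaced knots as illustrative assumptions, not the chapter's own code) builds the linear hat basis via `np.interp` and evaluates $\hat\beta$ and $\hat Y$:

```python
import numpy as np

def hat_basis(x, knots):
    """Linear spline basis: column j is the piecewise-linear interpolant of
    the j-th unit vector over the knots, so G @ beta evaluates the linear
    interpolant of the pairs (t_j, beta_j)."""
    t = np.asarray(knots, dtype=float)
    m = t.size
    return np.column_stack([np.interp(x, t, np.eye(m)[j]) for j in range(m)])

rng = np.random.default_rng(0)
n, m = 100, 8
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)   # toy data for model (1)

knots = np.linspace(0.0, 1.0, m)
G = hat_basis(x, knots)                                # n x m basis matrix
beta_hat = np.linalg.solve(G.T @ G, G.T @ y)           # eq. (5)
y_hat = G @ beta_hat                                   # eq. (6), fitted values
```

Because the hat functions sum to one, each row of $G$ sums to one, and $\hat\beta_j$ estimates $f(t_j)$.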

For testing the linearity of model (1), a linear spline is used to approximate $f(x)$. This means that each basis function $g_j(x)$ is piecewise linear:

$$g_1(x) = \frac{t_2 - x}{t_2 - t_1}\, 1_{[t_1,t_2]}(x),$$
$$g_{k-1}(x) = \frac{x - t_{k-2}}{t_{k-1} - t_{k-2}}\, 1_{[t_{k-2},t_{k-1}]}(x) + \frac{t_k - x}{t_k - t_{k-1}}\, 1_{[t_{k-1},t_k]}(x), \quad 3 \le k \le m, \tag{7}$$
$$g_m(x) = \frac{x - t_{m-1}}{t_m - t_{m-1}}\, 1_{[t_{m-1},t_m]}(x),$$

where $1_{[a,b]}(x)$ denotes the indicator function of the interval $[a, b]$.

In this case, the approximating function in (4) is a linear interpolant with $k = 1$, and the true values are $\beta_j = f(t_j)$, $j = 1, 2, \ldots, m$. The linearity of the function $f(x)$ can be written as

$$H_0: \frac{\beta_2 - \beta_1}{t_2 - t_1} = \frac{\beta_3 - \beta_2}{t_3 - t_2} = \cdots = \frac{\beta_m - \beta_{m-1}}{t_m - t_{m-1}}.$$

Null hypothesis $H_0$ can be expressed in matrix form as $L\beta = 0$, where

$$L = \begin{pmatrix} -h_2 & h_1 + h_2 & -h_1 & 0 & \cdots & 0 \\ & \ddots & \ddots & \ddots & & \\ 0 & \cdots & 0 & -h_{m-1} & h_{m-2} + h_{m-1} & -h_{m-2} \end{pmatrix}$$

and $h_j = t_{j+1} - t_j$, $j = 1, 2, \ldots, m-1$. Null hypothesis $H_0$ is therefore equivalent to the following one:

$$H_0: L\beta = 0. \tag{8}$$
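To make the constraint concrete, here is a small sketch (a hypothetical helper, not from the chapter) that builds $L$ from a knot vector and checks that a linear function satisfies $L\beta = 0$:

```python
import numpy as np

def slope_constraint_matrix(knots):
    """(m-2) x m matrix L: L @ beta = 0 exactly when the piecewise linear
    interpolant of (t_j, beta_j) has the same slope on every interval."""
    t = np.asarray(knots, dtype=float)
    m = t.size
    h = np.diff(t)                       # h_j = t_{j+1} - t_j
    L = np.zeros((m - 2, m))
    for k in range(m - 2):
        # (-h_{k+1}) beta_k + (h_k + h_{k+1}) beta_{k+1} + (-h_k) beta_{k+2} = 0
        L[k, k:k + 3] = [-h[k + 1], h[k] + h[k + 1], -h[k]]
    return L

t = np.array([0.0, 0.1, 0.35, 0.6, 0.8, 1.0])   # unequally spaced knots
L = slope_constraint_matrix(t)
beta_linear = 2.0 + 3.0 * t                     # f(x) = 2 + 3x sampled at the knots
print(np.allclose(L @ beta_linear, 0.0))        # a linear f satisfies H0
```

Each row encodes the equality of the slopes on two adjacent intervals, multiplied through by $h_k h_{k+1}$ to clear denominators.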

The p-value for testing hypothesis $H_0$ will be derived by the fiducial method in what follows. Assume that the matrix $G$ has full rank and let $\varepsilon \sim N(0, \sigma^2)$. In the model $Y = G\beta + \varepsilon$, the sufficient statistic of $(\beta, \sigma^2)$ is $(\hat\beta, S^2)$, where $\hat\beta$ is defined in (5) and

$$S^2 = Y^\top (I - P_G) Y, \qquad P_G = G (G^\top G)^{-1} G^\top.$$

By Dawid and Stone [34], the sufficient statistic can be represented as a functional model:

$$\hat\beta = \beta + \sigma (G^\top G)^{-1/2} E_1, \qquad S = \sigma E_2, \qquad E = (E_1, E_2) \sim Q, \tag{9}$$

where $Q$ is the probability measure of $E = (E_1, E_2)$, with $E_1 \sim N(0, I_m)$ and, independently, $E_2^2 \sim \chi^2_{n-m}$. From this linear regression model, the fiducial model of $\beta$ can be obtained:

$$\beta = \hat\beta - \frac{S}{E_2} (G^\top G)^{-1/2} E_1, \qquad E = (E_1, E_2) \sim Q. \tag{10}$$

Given $(\hat\beta, S^2)$, the distribution of the right-hand side of the fiducial model is the fiducial distribution of $\beta$. That is, the fiducial distribution of $\beta$ is the conditional distribution of $R_E(\hat\beta, S^2)$ given $(\hat\beta, S^2)$, where

$$R_E(\hat\beta, S^2) = \hat\beta - \frac{S}{E_2} (G^\top G)^{-1/2} E_1. \tag{11}$$

For testing hypothesis H0, the p-value is defined as

$$p(\hat\beta, S^2) = Q\left\{ \left\| L R_E(\hat\beta, S^2) - L\, E_Q R_E(\hat\beta, S^2) \right\|_\Sigma^2 \ge \left\| L\, E_Q R_E(\hat\beta, S^2) \right\|_\Sigma^2 \right\}, \tag{12}$$

where $Q(\cdot)$ and $E_Q$ denote, respectively, the probability of an event and the expectation of a random variable under $Q$, $\Sigma$ is the conditional covariance matrix of $L R_E(\hat\beta, S^2)$ given $(\hat\beta, S^2)$, and $\|v\|_\Sigma^2 = v^\top \Sigma^{-1} v$ for a vector $v$.

According to the definition of a generalized pivotal quantity in [35], $R_E(\hat\beta, S^2)$ is a generalized pivotal quantity and also a fiducial pivotal quantity for $\beta$. Naturally, $L R_E(\hat\beta, S^2)$ is a fiducial pivotal quantity for $L\beta$. With the definition of $Q$ in Eq. (10), we have that

$$p(\hat\beta, S^2) = 1 - F_{m-2,\,n-m}\left( \frac{(n-m)\, \hat\beta^\top L^\top \left( L (G^\top G)^{-1} L^\top \right)^{-1} L \hat\beta}{(m-2)\, S^2} \right), \tag{13}$$

where $F_{m-2,\,n-m}$ is the cdf of the F-distribution with degrees of freedom $m-2$ and $n-m$.

Under model (1) and the hypothesis that $f(x)$ is a linear function, null hypothesis $H_0$ given in (8) is true. If the error is normally distributed, the p-value given in Eq. (12) is uniformly distributed on the interval (0, 1). On the other hand, under some mild conditions, the test procedure based on $p(\hat\beta, S^2)$ is consistent, which means that $p(\hat\beta, S^2)$ tends to zero with probability 1 if $H_0$ is false. The proofs of the large sample and finite sample properties of $p(\hat\beta, S^2)$ are the same as those given in Li et al. [36].
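Since the test reduces to the closed form (13), it can be sketched in a few lines. The helper below is a hypothetical implementation (i.i.d. normal errors and simulated data assumed) that fits the linear spline and returns the p-value:

```python
import numpy as np
from scipy.stats import f as f_dist

def linearity_pvalue(x, y, knots):
    """p-value (13) for H0: f is linear, via a linear-spline LS fit."""
    x, y, t = np.asarray(x, float), np.asarray(y, float), np.asarray(knots, float)
    n, m = x.size, t.size
    # hat basis: column j interpolates the j-th unit vector over the knots
    G = np.column_stack([np.interp(x, t, np.eye(m)[j]) for j in range(m)])
    h = np.diff(t)
    L = np.zeros((m - 2, m))             # equal-slope constraints: H0 is L beta = 0
    for k in range(m - 2):
        L[k, k:k + 3] = [-h[k + 1], h[k] + h[k + 1], -h[k]]
    GtG_inv = np.linalg.inv(G.T @ G)
    beta_hat = GtG_inv @ (G.T @ y)
    S2 = y @ (y - G @ beta_hat)          # S^2 = Y'(I - P_G)Y
    Lb = L @ beta_hat
    stat = (n - m) * (Lb @ np.linalg.solve(L @ GtG_inv @ L.T, Lb)) / ((m - 2) * S2)
    return 1.0 - f_dist.cdf(stat, m - 2, n - m)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 120)
knots = np.linspace(0.0, 1.0, 7)
p_lin = linearity_pvalue(x, 1 + 2 * x + rng.normal(0, 0.3, 120), knots)              # H0 true
p_cur = linearity_pvalue(x, np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 120), knots)  # H0 false
```

Here `p_lin` behaves like a Uniform(0, 1) draw, while `p_cur` is essentially zero, illustrating the consistency property described above.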

In applications, we need to check some hypotheses as follows:

$$H_{01}: f(x) = C \iff \beta_1 = \beta_2 = \cdots = \beta_m,$$
$$H_{02}: f(x) = Cx \iff \frac{\beta_2 - \beta_1}{t_2 - t_1} = \frac{\beta_3 - \beta_2}{t_3 - t_2} = \cdots = \frac{\beta_m - \beta_{m-1}}{t_m - t_{m-1}} \text{ and } \beta_1 = 0.$$

The p-values for testing $H_{01}$ and $H_{02}$ can be obtained by replacing $L$ in (12) with $L_{01}$ and $L_{02}$, respectively, where $L_{02} = \binom{e_1^\top}{L}$ with $e_1 = (1, 0, \ldots, 0)^\top$ and $L_{01}$ is the $(m-1) \times m$ first-difference matrix

$$L_{01} = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ & \ddots & \ddots & & \\ 0 & \cdots & 0 & 1 & -1 \end{pmatrix}, \tag{14}$$

whose rows encode $\beta_1 = \beta_2 = \cdots = \beta_m$.

### 2.2. Test the linearity of partial linear model

To test the linearity of model (2), a p-value can be established analogously. With $n$ independent observations $Y = (y_1, y_2, \ldots, y_n)^\top \in \mathbb{R}^n$, model (2) can be represented as

$$y_i = Z_i b + f(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

where $Z_i = (z_{i1}, \ldots, z_{ip})$, $b = (b_1, \ldots, b_p)^\top$, and $x_i$, $i = 1, 2, \ldots, n$, are fixed design points. With the approximation of $f(x)$ given in (4), model (2) can be approximated by $Y \approx X\theta + \varepsilon$, where $X = (Z, G)$, $Z = (z_{ij})$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, p$, is the matrix of linear covariates, $G$ is the same as above, and $\theta = (b^\top, \beta^\top)^\top$. The p-value for testing the linearity of model (2) can then be defined by replacing $G$ in (12) with $X$, $\beta$ with $\theta$, and $L$ with $L_{03} = \left( 0_{(m-2) \times p},\; L \right)$.

The large sample and finite sample properties of the testing procedure for model (2) are the same as the test procedure for model (1).
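The reduction to an augmented linear model can be sketched as follows (a hypothetical helper mirroring the construction above, with simulated data and normal errors assumed):

```python
import numpy as np
from scipy.stats import f as f_dist

def partial_linear_pvalue(Z, x, y, knots):
    """p-value for H0: f is linear in y = Zb + f(x) + eps, replacing
    G by X = (Z, G) and L by L03 = (0, L) as described above."""
    Z, x, y, t = (np.asarray(a, float) for a in (Z, x, y, knots))
    n, p, m = y.size, Z.shape[1], t.size
    G = np.column_stack([np.interp(x, t, np.eye(m)[j]) for j in range(m)])
    X = np.hstack([Z, G])                       # augmented design matrix
    h = np.diff(t)
    L = np.zeros((m - 2, m))
    for k in range(m - 2):
        L[k, k:k + 3] = [-h[k + 1], h[k] + h[k + 1], -h[k]]
    L03 = np.hstack([np.zeros((m - 2, p)), L])  # constraints act on beta only
    XtX_inv = np.linalg.inv(X.T @ X)
    theta_hat = XtX_inv @ (X.T @ y)             # theta = (b', beta')'
    S2 = y @ (y - X @ theta_hat)
    Lb = L03 @ theta_hat
    df1, df2 = m - 2, n - (p + m)
    stat = df2 * (Lb @ np.linalg.solve(L03 @ XtX_inv @ L03.T, Lb)) / (df1 * S2)
    return 1.0 - f_dist.cdf(stat, df1, df2)

rng = np.random.default_rng(2)
n = 150
Z = rng.normal(size=(n, 2))
x = np.sort(rng.uniform(0.0, 1.0, n))
y = Z @ np.array([1.0, -1.0]) + np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
p_val = partial_linear_pvalue(Z, x, y, np.linspace(0.0, 1.0, 7))   # H0 false here
```

With the nonlinear $f$ used in this simulated example, the returned p-value is essentially zero.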

### 2.3. Test the constancy of functional coefficient in varying-coefficient model

For model (3), investigators often want to know whether the coefficients are really varying; this amounts to testing the constancy of the coefficient functions, that is, testing the hypotheses:

$$H_{31}: f_j(x) = C_j \text{ for } j = 1, 2, \ldots, p \text{ and some constants } C_j, \tag{15}$$
$$H_{32}: f_{j_0}(x) = C_{j_0} \text{ for some } j = j_0 \text{ and some constant } C_{j_0}. \tag{16}$$

With the set of knots $T = \{0 = t_1 < t_2 < \cdots < t_m = 1\}$, the coefficient $f_j(x)$ can also be approximated by

$$f_j(x) \approx \sum_{s=1}^{m} \beta_{js} g_s(x), \quad j = 1, 2, \ldots, p,$$

where the true value of $\beta_{js}$ is $f_j(t_s)$ and the basis functions $g_s$, $s = 1, 2, \ldots, m$, were defined in (7). The varying-coefficient model (3) is approximately represented as

$$Y = X\beta + \varepsilon, \tag{17}$$

where $X = (F_1, \ldots, F_p)$ is an $n \times mp$ matrix with $F_j = (z_{ij} g_s(x_i))$, $s = 1, 2, \ldots, m$, $i = 1, 2, \ldots, n$, and $\beta = (\beta_1^\top, \ldots, \beta_p^\top)^\top$ is an $mp$-dimensional parameter vector with $\beta_j = (f_j(t_1), \ldots, f_j(t_m))^\top$.

It is worth noting that under null hypothesis H31 defined in (15), regression model (3) is equivalent to model (17). However, this equivalence does not hold under null hypothesis H32 defined in (16). Null hypotheses H31 and H32 can be expressed in matrix as the following two, respectively:

$$H_{31}: L_1 \beta = 0, \tag{18}$$
$$H_{32}: L_2 \beta = 0, \tag{19}$$

where $L_1$ is the $p(m-1) \times mp$ block-diagonal matrix

$$L_1 = \begin{pmatrix} L_{01} & & 0 \\ & \ddots & \\ 0 & & L_{01} \end{pmatrix}, \qquad L_{01} = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ & \ddots & \ddots & & \\ 0 & \cdots & 0 & 1 & -1 \end{pmatrix}_{(m-1) \times m},$$
$$L_2 = \left( 0_{(m-1) \times m(j_0-1)},\; L_{01},\; 0_{(m-1) \times m(p-j_0)} \right).$$

In the same way as the p-value in (13), p-values to test hypotheses $H_{31}$ and $H_{32}$ can be defined as below when the error $\varepsilon$ is normally distributed:

$$p_{31}(\hat\beta, S^2) = 1 - F_{p(m-1),\,n-mp}\left( \frac{(n-mp)\, \hat\beta^\top L_1^\top \left( L_1 (X^\top X)^{-1} L_1^\top \right)^{-1} L_1 \hat\beta}{p(m-1)\, S^2} \right), \tag{20}$$
$$p_{32}(\hat\beta, S^2) = 1 - F_{m-1,\,n-mp}\left( \frac{(n-mp)\, \hat\beta^\top L_2^\top \left( L_2 (X^\top X)^{-1} L_2^\top \right)^{-1} L_2 \hat\beta}{(m-1)\, S^2} \right). \tag{21}$$

According to the above discussion, $p_{31}(\hat\beta, S^2)$ is uniformly distributed over (0, 1) under hypothesis $H_{31}$. However, under null hypothesis $H_{32}$, the varying-coefficient model (3) is not linear. Hence, there is a difference between the distribution function of $p_{32}(\hat\beta, S^2)$ under $H_{32}$ and the uniform distribution. This difference has an exact expression, which can be found in Li et al. [37] (Theorem 3). On the other hand, $p_{31}(\hat\beta, S^2)$ and $p_{32}(\hat\beta, S^2)$ both tend to zero in probability under some mild conditions if the null hypotheses are false and the sample size tends to infinity. The corresponding proofs were also provided in Li et al. [37].

## 3. Multi-scale method based on regression spline

For regression spline, the number of knots controls the smoothness of the estimator. The determination of the knots is important and has a large influence on the inference results. The GCV method is usually used to choose an optimal number of knots. However, even after the number of knots is given, determining the optimal positions of the knots is difficult. Shi and Li [38] chose knots by adding one new knot at a time to reduce the value of GCV, until it could not be reduced by any additional knot. Hence, once a knot was selected, it could not be removed from the knot set. Mao and Zhao [39] first determined the locations of the knots conditioned on the number of knots $m$ and then chose $m$ by the GCV criterion. In fact, the locations of the knots can be considered as parameters to be estimated from the data. This is the free-knot spline; see DiMatteo et al. [40] and Sonderegger and Hannig [41]. However, the estimation of the optimal locations is computationally intractable, and replicate knots might appear in the estimated knot vectors [42].

On the other hand, many statisticians consider statistical inference based on a single smoothing level unreliable, even if it is the optimal one. Therefore, multi-scale methods have been developed to estimate and test nonparametric regression curves. Chaudhuri and Marron [25, 26] proposed a multi-scale method to explore the significant structures (local minima and maxima, or global trends) in data, which is known as SiZer. Significant zero crossings of derivatives (SiZer) is a powerful visualization technique for exploratory data analysis. It applies a large range of smoothing parameter values to do statistical inference simultaneously and uses a 2D colored map (the SiZer map) to summarize all of the results inferred at different smoothing levels (scales) and locations.

In this section, a regression spline version of SiZer is proposed for exploring the structures of a curve and for comparing multiple regression curves. The proposed SiZer employs the number of knots as the smoothing parameter (scale). For the sake of simplicity, a linear spline is employed first to construct SiZer, denoted SiZerLS. In addition, another version of SiZer, SiZerSS, proposed in Marron and Zhang [33], is introduced. In SiZerSS, a smoothing spline is used to infer the monotonicity of $f(x)$, and the tuning parameter (penalty parameter) that controls the size of the penalty is chosen as the smoothing parameter. Finally, SiZer-RS, a version of SiZer based on higher-order spline interpolation, is constructed to compare multiple regression curves at different scales and locations simultaneously.

In order to understand SiZerLS clearly, we first present an example in which SiZerLS is simulated. This example is modified from Hannig and Lee [28] and uses the same regression function:

$$f(x) = 5 + \frac{4.2}{\left( 1 + |x - 0.3|/0.03 \right)^4} + \frac{5.1}{\left( 1 + |x - 0.7|/0.01 \right)^4}.$$

The observations generated from model (1) with 200 equally spaced design points in (0, 1) and $\varepsilon \sim N(0, 0.5^2)$ are plotted in Figure 1. The estimator $\hat f_m(x)$ denotes the linear spline smoother obtained from (6) using $m$ equally spaced knots in (0, 1). The curves of $\hat f_m(x)$ for different values of $m$ are also plotted in Figure 1. The simulated SiZerLS map and SiZerSS map are shown in Figure 2.
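The set-up just described can be reproduced in a few lines (a sketch assuming the reconstructed bump function and $\sigma = 0.5$; the figures themselves are not generated here):

```python
import numpy as np

def f(x):
    """Two-bump regression function used in the example above."""
    return (5 + 4.2 / (1 + np.abs(x - 0.3) / 0.03) ** 4
              + 5.1 / (1 + np.abs(x - 0.7) / 0.01) ** 4)

rng = np.random.default_rng(3)
x = (np.arange(200) + 0.5) / 200                  # 200 equally spaced points in (0, 1)
y = f(x) + rng.normal(0.0, 0.5, x.size)

def linear_spline_fit(x, y, m):
    """f_hat_m from eq. (6) with m equally spaced knots."""
    t = np.linspace(0.0, 1.0, m)
    G = np.column_stack([np.interp(x, t, np.eye(m)[j]) for j in range(m)])
    return G @ np.linalg.solve(G.T @ G, G.T @ y)

# one over-smoothed, one intermediate, and one under-smoothed estimate
fits = {m: linear_spline_fit(x, y, m) for m in (5, 20, 80)}
```

At coarse scales (small $m$) the two sharp bumps are smoothed away; at fine scales they reappear. A SiZer map summarizes this behavior across all scales at once instead of committing to one $m$.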

In Figure 2, BYP SiZerLS is the SiZerLS map based on the multiple testing procedure BYP, where BYP denotes the procedure proposed in Benjamini and Yekutieli [43], and SiZerSS is the smoothing spline version of SiZer. The two SiZers are simulated over the same range of scales at nominal level 0.05. There are four colors in the SiZer maps: red indicates that the estimated regression curve is significantly decreasing; blue indicates that it is significantly increasing; purple indicates that the curve is neither significantly increasing nor decreasing; and gray shows that there are not sufficient data for conducting reasonable statistical inference. Figure 1 preliminarily shows that SiZer maps can locate peaks well. The theoretical foundations of SiZerLS and SiZerSS are discussed in more detail below.

### 3.1. Construction of SiZerLS map for exploring features of regression curve

The proposed SiZerLS map will be constructed on the basis of p-values with multiple testing adjustment. The p-value for testing the monotonicity of the smoothed curve is defined first, based on linear spline approximation and the fiducial method, in the same way as the p-values in Section 2. Then, multiple testing adjustment is discussed in detail to control the row-wise false discovery rate (FDR) of SiZerLS.

From the viewpoint of SiZer, all of the useful information is contained in the smoothed curve, which is defined below. Suppose we have observations $(x_i, y_i)$, $i = 1, \ldots, n$, from regression model (1). By linear spline estimation, the estimator $\hat f_m(x)$ can be obtained:

$$\hat f_m(x) = g(x) (G^\top G)^{-1} G^\top Y, \tag{22}$$

where $g(x) = (g_1(x), g_2(x), \ldots, g_m(x))$; $g_j(x)$, $j = 1, \ldots, m$, are the basis functions defined in (7) on the basis of $m$ knots; and $G$ is the matrix defined in Section 2. The smoothed curve at smoothing level $m$ is denoted by

$$f_m(x) = E \hat f_m(x) = g(x) (G^\top G)^{-1} G^\top f,$$

where $f = (f(x_1), f(x_2), \ldots, f(x_n))^\top$. SiZer focuses on $f_m(x)$, whose monotonicity is determined entirely by $(G^\top G)^{-1} G^\top f$. Hence, it is enough to test the following $m-1$ pairs of null hypotheses:

$$H_{Ik}: f_m(t_k) = e_k^\top (G^\top G)^{-1} G^\top f \le e_{k+1}^\top (G^\top G)^{-1} G^\top f = f_m(t_{k+1}) \quad \text{and}$$
$$H_{Dk}: f_m(t_k) = e_k^\top (G^\top G)^{-1} G^\top f \ge e_{k+1}^\top (G^\top G)^{-1} G^\top f = f_m(t_{k+1}), \quad k = 1, 2, \ldots, m-1, \tag{23}$$

where $e_k$ is the $m$-dimensional column vector having 1 in the $k$th entry and zeros elsewhere. Let $b$ denote $(G^\top G)^{-1} G^\top f$. Then $H_{Ik}$ and $H_{Dk}$ can be written as

$$H_{Ik}: L_k b \le 0, \quad k = 1, 2, \ldots, m-1; \qquad H_{Dk}: L_k b \ge 0, \quad k = 1, 2, \ldots, m-1, \tag{24}$$

where $L_k = (e_k - e_{k+1})^\top$. The p-values to test the hypotheses in (24) under the linear model $Y = Gb + \varepsilon$ can be defined using a pivotal quantity for $b$, namely $R_E(\hat\beta, S^2)$ defined in (11). The p-value for testing $H_{Ik}$ is the fiducial probability that the null hypothesis holds:

$$P_{Ik}(\hat\beta, S) = P\left( L_k R_E(\hat\beta, S^2) \le 0 \right) = P\left( L_k \hat\beta - \frac{S}{E_2} L_k (G^\top G)^{-1/2} E_1 \le 0 \right)$$
$$= P\left( \frac{\sqrt{n-m}\, L_k (G^\top G)^{-1/2} E_1}{\left( L_k (G^\top G)^{-1} L_k^\top \right)^{1/2} E_2} \ge \frac{\sqrt{n-m}\, L_k \hat\beta}{S \left( L_k (G^\top G)^{-1} L_k^\top \right)^{1/2}} \right), \tag{25}$$

where the subscript $Ik$ of $P_{Ik}$ indicates the interval $(t_k, t_{k+1})$ on which monotonicity is tested and $m$ is the number of knots used in the linear interpolation. In addition, the p-value $P_{Dk}(\hat\beta, S)$ for testing $H_{Dk}$ satisfies $P_{Ik}(\hat\beta, S) + P_{Dk}(\hat\beta, S) = 1$.

It is worth noting that the p-value $P_{Ik}(\hat\beta, S)$ is uniformly distributed on (0, 1) if all of the hypotheses $H_{Ik}, H_{Dk}$, $k = 1, 2, \ldots, m-1$, are true (the regression function is a constant). In applications, the p-value $P_{Ik}(\hat\beta, S)$ for testing $H_{Ik}$ can be approximated as below when $n \to \infty$. This approximation is reasonable (see Theorem 1 in [44]):

$$P_{Ik,m}(\hat\beta, S) \approx 1 - \Phi\left( \frac{\sqrt{n-m}\, L_k \hat\beta}{S \left( L_k (G^\top G)^{-1} L_k^\top \right)^{1/2}} \right). \tag{26}$$
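The approximation (26) can be evaluated for all $m-1$ intervals at once. The sketch below is a hypothetical helper (equally spaced knots, normal errors, and simulated data assumed):

```python
import numpy as np
from scipy.stats import norm

def monotonicity_pvalues(x, y, m):
    """Approximate p-values (26) for H_Ik ('f_m increases on [t_k, t_{k+1}]'),
    k = 1, ..., m-1, from the linear spline fit with m equally spaced knots."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    t = np.linspace(0.0, 1.0, m)
    G = np.column_stack([np.interp(x, t, np.eye(m)[j]) for j in range(m)])
    GtG_inv = np.linalg.inv(G.T @ G)
    beta_hat = GtG_inv @ (G.T @ y)
    S = np.sqrt(y @ (y - G @ beta_hat))
    pvals = np.empty(m - 1)
    for k in range(m - 1):
        Lk = np.zeros(m)
        Lk[k], Lk[k + 1] = 1.0, -1.0          # L_k = (e_k - e_{k+1})'
        z = np.sqrt(n - m) * (Lk @ beta_hat) / (S * np.sqrt(Lk @ GtG_inv @ Lk))
        pvals[k] = 1.0 - norm.cdf(z)          # small value => 'increasing' rejected
    return pvals

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 100)
p_up = monotonicity_pvalues(x, 3 * x + rng.normal(0, 0.1, 100), m=5)
p_down = monotonicity_pvalues(x, -3 * x + rng.normal(0, 0.1, 100), m=5)
```

For the increasing sample every $P_{Ik}$ is close to one ($H_{Ik}$ is never rejected), while for the decreasing sample they are all close to zero.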

The proposed SiZerLS map will be constructed on the basis of the above p-values with multiple testing adjustment. In fact, SiZer is a visual method for exploratory data analysis, and it focuses on exploring features that really exist in the data rather than testing whether some assumed features are statistically significant in a strict way. FDR is the expected proportion of false positives among all discoveries, and FDR control can be either permissive or conservative according to the number of hypotheses. Considering that different numbers of hypotheses need to be tested for SiZerLS at different smoothing parameters, a multiple testing adjustment that controls FDR is well suited to preserving the exploratory character of SiZer. Hence, the well-known multiple testing procedure proposed in Benjamini and Yekutieli [43] (denoted BYP) is applied to control the row-wise FDR of SiZerLS. BYP was proved to control the FDR at level $\alpha$ for arbitrarily dependent test statistics.

#### 3.1.1. Benjamini-Yekutieli procedure to control FDR (BYP)

Suppose that we have obtained p-values $P_{Ik,m}(\hat\beta, S)$ for testing the hypotheses $H_{Ik}$ in (23), $k = 1, 2, \ldots, m-1$:

1. Order the p-values $P_{Ik,m}$ to obtain $P_{I(1),m} \le P_{I(2),m} \le \cdots \le P_{I(m-1),m}$.

2. For a given nominal level $\alpha$, find the largest $i$ for which $P_{I(i),m} \le \dfrac{i\alpha}{(m-1)\sum_{j=1}^{m-1}(1/j)}$, and reject the hypotheses $H_{I(k),m}$ for $k = 1, 2, \ldots, i$.
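The two steps above amount to the standard step-up rule; a sketch (hypothetical helper applicable to any p-value vector):

```python
import numpy as np

def by_reject(pvals, alpha=0.05):
    """Benjamini-Yekutieli step-up procedure. Returns a boolean mask of
    the rejected hypotheses; controls the FDR at level alpha under any
    dependence thanks to the harmonic-sum correction."""
    p = np.asarray(pvals, dtype=float)
    M = p.size
    order = np.argsort(p)
    c = np.sum(1.0 / np.arange(1, M + 1))            # sum_{j=1}^{M} 1/j
    thresholds = np.arange(1, M + 1) * alpha / (M * c)
    below = p[order] <= thresholds
    reject = np.zeros(M, dtype=bool)
    if below.any():
        i_max = np.max(np.nonzero(below)[0])         # largest i with p_(i) <= i*alpha/(M*c)
        reject[order[:i_max + 1]] = True
    return reject

mask = by_reject([0.001, 0.02, 0.6, 0.8], alpha=0.05)   # only the smallest p-value survives
```

In a SiZerLS row, `pvals` would be the $m_0 - 1$ interval p-values and `mask` would determine the colors of the pixels.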

The detailed steps to construct SiZerLS with BYP adjustment are given below:

Step 1. Construct the 2D grid map. Without loss of generality, we assume that the design points $x_i$, $i = 1, 2, \ldots, n$, are chosen from [0, 1]. Then the 2D map is the rectangular area $[0, 1] \times [\log_{10}(1/m_{\max}), \log_{10}(1/m_{\min})]$; see the BYP SiZerLS map displayed in Figure 2. The value of $m$ is determined by the rule $m = \mathrm{round}(10^{-l})$, where $\mathrm{round}(\cdot)$ is the nearest integer function and $l$ takes equally spaced values in the interval $[\log_{10}(1/m_{\max}), \log_{10}(1/m_{\min})]$. For a given $m$, the abscissa $x$ takes values at the corresponding knots $T_m = (t_1, t_2, \ldots, t_m)$. On the basis of the different values of $m$ and $T_m$, the 2D map is divided into many pixels.

Step 2. Calculate p-values for each pixel. Each pixel in the 2D map constructed in Step 1 is determined by two adjacent knots and a value of $m$. For pixel $(t_k, t_{k+1}) \times \{m = m_0\}$, we calculate the p-values $P_{Ik,m_0}$ and $P_{Dk,m_0}$ for testing hypotheses $H_{Ik,m_0}$ and $H_{Dk,m_0}$, respectively, with $m_0$ knots.

Step 3. Multiple testing adjustment. For a given value $m = m_0$, carry out the multiple testing procedure BYP using the p-values $P_{Ik,m_0}$ ($P_{Dk,m_0}$), $k = 1, 2, \ldots, m_0 - 1$, obtained from Step 2 to test the following family of hypotheses simultaneously:

$$\left\{ H_{I1,m_0}, H_{I2,m_0}, \ldots, H_{I(m_0-1),m_0}, H_{D1,m_0}, H_{D2,m_0}, \ldots, H_{D(m_0-1),m_0} \right\}.$$

Step 4. Color the pixels. According to the multiple testing results at smoothing level $m_0$, if $H_{Ik,m_0}$ is rejected and $H_{Dk,m_0}$ is accepted, pixel $(t_k, t_{k+1}) \times \{m = m_0\}$ is colored red to indicate significant decrease. On the contrary, if $H_{Ik,m_0}$ is accepted and $H_{Dk,m_0}$ is rejected, the pixel is colored blue to indicate significant increase; purple is used when there is no significant trend in either direction.

In the SiZer map, gray indicates that there are not sufficient data to test the monotonicity of the regression function at point $x$ with $m$ knots. Such sufficiency is quantified by the effective sample size (ESS). Noting that the number of nonzero elements in the $k$th column of $G$ has a direct effect on the inference in interval $(t_k, t_{k+1})$, and that it is determined by how many observations fall in that interval, we define $\mathrm{ESS}(t_k, m)$ by

$$\left( \mathrm{ESS}(t_1, m), \mathrm{ESS}(t_2, m), \ldots, \mathrm{ESS}(t_m, m) \right)^\top = G^\top G\, \mathbf{1},$$

where $\mathbf{1}$ is the vector of ones; since the linear spline basis functions sum to one, the $k$th entry equals $\sum_{i=1}^n g_k(x_i)$.

In the SiZerLS map, pixel $(t_k, t_{k+1}) \times \{m = m_0\}$ is colored gray if

$$\min\left\{ \mathrm{ESS}(t_k, m_0), \mathrm{ESS}(t_{k+1}, m_0) \right\} < 5.$$

In order to avoid selecting knots, $m$ equally spaced knots or equally spaced $x$-quantiles are used in the interpolation. The smoothing level of the regression spline estimate is controlled by $m$ together with the positions of the knots. The level of smoothness should be reduced to detect local fine features; however, the total number of knots should be limited to avoid excessive under-smoothing over a wide range. In applications of SiZerLS, the range of scales is recommended to include the coarsest smoothing level, $m = 2$, and the finest smoothing level $m_{\max}$, chosen such that $\mathrm{avg}_{x \in T_{m_{\max}}} \mathrm{ESS}(x, m_{\max}) < 5$.

### 3.2. Construction of SiZerSS map for exploring features of regression curve

SiZerSS, given in Marron and Zhang [33], employs smoothing splines to construct a SiZer map for nonparametric model (1). Given $(x_i, y_i)$, $i = 1, \ldots, n$, and a smoothing parameter $\lambda$, the smoothing spline estimator is the function $\hat f_\lambda$ that minimizes the regularization criterion over functions $f$:

$$\sum_{i=1}^n \omega_i \left( y_i - f(x_i) \right)^2 + \lambda \int \left( f''(x) \right)^2 dx. \tag{27}$$

By simple calculation, we can get the estimator vector:

$$\hat f_\lambda = \left( \hat f_\lambda(x_1), \hat f_\lambda(x_2), \ldots, \hat f_\lambda(x_n) \right)^\top = (W + \lambda K)^{-1} W Y = A_\lambda Y, \tag{28}$$

where the weight matrix $W = \mathrm{diag}(\omega_1, \omega_2, \ldots, \omega_n)$, $K$ is the roughness penalty matrix of the natural cubic spline, and the hat matrix is $A_\lambda = (W + \lambda K)^{-1} W$.

In order to construct SiZerSS, the derivative of $f$ at any point $x$ needs to be estimated along with its variance. Let $s_i = x_{i+1} - x_i$, and let $Q = (q_{ij})$, $i = 1, 2, \ldots, n$, $j = 2, \ldots, n-1$, be the $n \times (n-2)$ matrix with $q_{j-1,j} = s_{j-1}^{-1}$, $q_{jj} = -s_{j-1}^{-1} - s_j^{-1}$, $q_{j+1,j} = s_j^{-1}$, and $q_{ij} = 0$ for $|i - j| \ge 2$. Let $(\gamma_1, \gamma_2, \ldots, \gamma_n) = (f''(x_1), f''(x_2), \ldots, f''(x_n))$. By the definition of a natural cubic spline, $f''(x_1) = f''(x_n) = 0$; let $\gamma = (\gamma_2, \ldots, \gamma_{n-1})^\top$. According to Theorem 2.1 of Green and Silverman [45], the vectors $f$ and $\gamma$ specify a natural cubic spline $f$ if and only if $Q^\top f = R\gamma$,

where $R$ is the $(n-2) \times (n-2)$ symmetric matrix with elements $r_{ij}$, $i = 2, \ldots, n-1$, $j = 2, \ldots, n-1$, given by $r_{ii} = \frac{1}{3}(s_{i-1} + s_i)$, $r_{i,i+1} = r_{i+1,i} = \frac{1}{6} s_i$, and $r_{ij} = 0$ for $|i - j| \ge 2$. The estimator $\hat\gamma$ can be obtained from the equation $(R + \lambda Q^\top Q)\gamma = Q^\top Y$. Then the estimators $\hat f_\lambda(x)$ and $\hat f'_\lambda(x)$ can be written as linear combinations of $\hat f_\lambda$ and $\hat\gamma$. Let $h_i(x) = x - x_i$, $i = 1, 2, \ldots, n$. When $x < x_1$,

$$\hat f_\lambda(x) = \hat f_\lambda(x_1) + h_1(x) \left[ \frac{\hat f_\lambda(x_2) - \hat f_\lambda(x_1)}{s_1} - \frac{s_1}{6} \hat\gamma_2 \right], \qquad \hat f'_\lambda(x) = \frac{\hat f_\lambda(x_2) - \hat f_\lambda(x_1)}{s_1} - \frac{s_1}{6} \hat\gamma_2.$$

When $x_i \le x \le x_{i+1}$, let $\delta_i(x) = \left( 1 + \frac{h_i(x)}{s_i} \right) \hat\gamma_{i+1} + \left( 1 - \frac{h_{i+1}(x)}{s_i} \right) \hat\gamma_i$ for $i = 1, 2, \ldots, n-1$. Then

$$\hat f_\lambda(x) = \frac{h_i(x) \hat f_\lambda(x_{i+1}) - h_{i+1}(x) \hat f_\lambda(x_i)}{s_i} + \frac{h_i(x) h_{i+1}(x)\, \delta_i(x)}{6},$$
$$\hat f'_\lambda(x) = \frac{\hat f_\lambda(x_{i+1}) - \hat f_\lambda(x_i)}{s_i} + \frac{h_i(x) h_{i+1}(x) \left( \hat\gamma_{i+1} - \hat\gamma_i \right)}{6 s_i} + \frac{h_i(x) + h_{i+1}(x)}{6} \delta_i(x).$$

When $x > x_n$,

$$\hat f_\lambda(x) = \hat f_\lambda(x_n) + h_n(x) \left[ \frac{\hat f_\lambda(x_n) - \hat f_\lambda(x_{n-1})}{s_{n-1}} + \frac{s_{n-1}}{6} \hat\gamma_{n-1} \right],$$
$$\hat f'_\lambda(x) = \frac{\hat f_\lambda(x_n) - \hat f_\lambda(x_{n-1})}{s_{n-1}} + \frac{s_{n-1}}{6} \hat\gamma_{n-1}.$$
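The system $(R + \lambda Q^\top Q)\gamma = Q^\top Y$ can be assembled directly from the band structure above. A minimal sketch (equal weights $w_i = 1$ assumed, dense linear algebra instead of the banded solvers used in practice; `f_hat = y - lam * Q @ gamma` is the standard Reinsch recovery step under these assumptions):

```python
import numpy as np

def smoothing_spline_fit(x, y, lam):
    """Reinsch scheme: solve (R + lam Q'Q) gamma = Q'y for the interior
    second derivatives, then recover f_hat = y - lam * Q @ gamma."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    s = np.diff(x)                           # s_i = x_{i+1} - x_i
    Q = np.zeros((n, n - 2))                 # columns correspond to j = 2, ..., n-1
    R = np.zeros((n - 2, n - 2))
    for j in range(1, n - 1):                # 0-based interior index
        c = j - 1
        Q[j - 1, c] = 1.0 / s[j - 1]
        Q[j, c] = -1.0 / s[j - 1] - 1.0 / s[j]
        Q[j + 1, c] = 1.0 / s[j]
        R[c, c] = (s[j - 1] + s[j]) / 3.0
        if j < n - 2:
            R[c, c + 1] = R[c + 1, c] = s[j] / 6.0
    gamma = np.linalg.solve(R + lam * Q.T @ Q, Q.T @ y)
    f_hat = y - lam * (Q @ gamma)            # fitted values at the design points
    return f_hat, gamma

rng = np.random.default_rng(5)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 60)
f_rough, _ = smoothing_spline_fit(x, y, 1e-8)   # small lambda: near interpolation
f_smooth, _ = smoothing_spline_fit(x, y, 1e-2)  # large lambda: heavy smoothing
```

The returned `gamma` contains the interior second derivatives $\hat\gamma_2, \ldots, \hat\gamma_{n-1}$ needed by the derivative formulas above.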

The variance of $\hat f'_\lambda(x)$ can be calculated easily once an estimator of $\sigma^2$, the variance of the error in model (1), is available. $\sigma^2$ can be estimated from the sum of squared residuals $\sum_i (y_i - \hat f_\lambda(x_i))^2$. If $\sigma^2$ is a function of $x$, then $\sigma^2(x)$ can be estimated locally from the squared residuals near $x$. The confidence interval for $f'_\lambda(x)$ is of the form

$$\hat f'_\lambda(x) \pm q \cdot \widehat{\mathrm{SD}}\left( \hat f'_\lambda(x) \right), \tag{29}$$

where q is based on the nominal level. For details, see Section 3 of Chaudhuri and Marron [25].

SiZerSS can be constructed in the same way as SiZerLS. For different values of $x$, if the interval (29) contains zero, pixel $(x, \lambda)$ is colored purple; if the confidence interval lies to the right of zero, blue is used to indicate increase; otherwise, red is used to indicate decrease. Gray indicates that there are not sufficient data to do reliable inference; the sufficiency criterion can be found in Chaudhuri and Marron [25].

The simulated SiZerLS and SiZerSS maps are displayed in Figure 2, where the red and blue regions locate the bumps of regression curve accurately. This simulation illustrates the good behavior of SiZerLS and SiZerSS in exploring features in data.

### 3.3. Construction of SiZer-RS map for comparing multiple regression curves

The comparison of two or more populations is a common problem of great practical interest in statistics. In this subsection, a comparison of multiple regression curves in a general regression setting is developed based on regression spline. Suppose we have $n = \sum_{i=1}^k n_i$ independent observations from the following $k$ regression models:

$$y_{ij} = f_i(x_{ij}) + \sigma_i(x_{ij})\, \varepsilon_{ij}, \quad i = 1, 2, \ldots, k, \; j = 1, 2, \ldots, n_i, \tag{30}$$

where the $x_{ij}$ are covariates, the errors $\varepsilon_{ij} \sim N(0, 1)$ are independent and identically distributed, $f_i(\cdot)$ is the regression function, and $\sigma_i^2(\cdot)$ is the conditional variance function of the $i$th population. We are concerned with whether the $k$ populations in model (30) are equal and, if not, where the differences lie. To this end, a multi-scale method, SiZer-RS, based on regression spline is proposed to compare the $f_i(\cdot)$ across multiple scales and locations.

As described in Park and Kang [32], the choice of smoothing parameter is also important for comparing regression curves. They developed SiZer for the comparison of regression curves based on local linear smoother. SiZer map for comparing regression curves is a 2D color map, which consists of a large number of pixels. Each pixel is indexed by a scale (smoothing parameter) and a location; the color of a pixel indicates the result for testing the equality of two or more multiple regression curves at the corresponding location and scale. SiZer provides us with more information about the locations of the differences among the regression curves if they do exist. Park et al. [46] developed an ANOVA-type test statistic and conducted it in scale space for testing the equality of more than two regression curves.

The works mentioned above are kernel-based methods. Besides kernels, regression spline is an important smoothing device that is widely used in applications. For a given smoothing parameter $m$ (the number of knots used in the regression spline), the p-value for testing the equality of the $k$ regression curves at a point $x$ is established. Then, SiZer-RS is constructed in the same way as SiZerLS for comparing multiple regression curves based on higher-order spline interpolation.

For a given smoothing parameter $m$ (the number of knots used in the regression spline), the smoothed curve is defined as $f_{i,m}(x) = E(\hat f_{i,m}(x))$, where $\hat f_{i,m}(x)$ is the regression spline estimator. SiZer-RS for comparing multiple regression curves is based on the results for testing the null hypothesis

$$H_{m,x}: f_{1,m}(x) = f_{2,m}(x) = \cdots = f_{k,m}(x) \tag{31}$$

at point $x$ with smoothing parameter $m$. Without loss of generality, we still suppose that the explanatory variable $x$ takes values in [0, 1]. On the basis of a knot set $T_m = \{0 = t_1 < t_2 < \cdots < t_m = 1\}$, we have the approximation

$$f_i(x) \approx \sum_{s=1}^{m+q-1} \beta_{i,s}\, g_{m,s}(x) = N_m(x) \beta_i^m, \tag{32}$$

where $\beta_i^m = (\beta_{i,1}, \beta_{i,2}, \ldots, \beta_{i,m+q-1})^\top$ and $q$ is the order of the spline. The estimator of $f_i(x)$ at smoothing level $m$ is obtained as $\hat f_{i,m}(x) = N_m(x) \hat\beta_i^m$, in which $N_m(x) = (g_{m,s}(x))_{s=1,2,\ldots,m+q-1}$. If $q = 3$, $N_m(x)$ is defined as below:

$$N_{ml}(x) = (t_l - t_{l-4}) \left[ t_{l-4}, t_{l-3}, t_{l-2}, t_{l-1}, t_l \right] (t - x)_+^3, \quad l = 2, 3, \ldots, m+3,$$

where $t_l = t_{\min\{\max\{l, 1\}, m\}}$ for $l = -2, -1, \ldots, m+3$ and

$$(t - x)_+^3 = \begin{cases} (t - x)^3, & t > x, \\ 0, & t \le x. \end{cases}$$

For a function $g(\cdot)$, $\left[ t_{l-4}, t_{l-3}, t_{l-2}, t_{l-1}, t_l \right] g(\cdot)$ denotes the fourth-order divided difference of $g(\cdot)$, defined recursively by

$$[t_1, t_2] g = \begin{cases} g'(t), & t_1 = t_2 = t, \\ \dfrac{g(t_2) - g(t_1)}{t_2 - t_1}, & \text{otherwise}, \end{cases} \qquad [t_1, t_2, \ldots, t_k] g = \begin{cases} \dfrac{g^{(k-1)}(t)}{(k-1)!}, & t_1 = \cdots = t_k = t, \\ \dfrac{[t_2, \ldots, t_k] g - [t_1, \ldots, t_{k-1}] g}{t_k - t_1}, & \text{otherwise}. \end{cases}$$
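For distinct knots, the recursion above is easy to implement, and applying it to $(t - x)_+^3$ reproduces the cubic B-spline value (a sketch; the repeated-knot branches that need derivatives are omitted):

```python
def ddiff(ts, g):
    """Divided difference [t_1, ..., t_k]g for distinct knots,
    computed by the recursion given above."""
    if len(ts) == 1:
        return g(ts[0])
    return (ddiff(ts[1:], g) - ddiff(ts[:-1], g)) / (ts[-1] - ts[0])

def cubic_bspline(knots5, x):
    """N_ml(x) = (t_l - t_{l-4}) [t_{l-4}, ..., t_l] (t - x)_+^3."""
    plus_cubed = lambda t: max(t - x, 0.0) ** 3   # truncated power in t, with x fixed
    return (knots5[4] - knots5[0]) * ddiff(knots5, plus_cubed)

# fourth-order divided differences annihilate cubic polynomials ...
zero = ddiff([0.0, 1.0, 2.0, 3.0, 4.0], lambda t: t ** 3)
# ... and the B-spline over uniform knots 0..4 peaks at 2/3 at its center
peak = cubic_bspline([0.0, 1.0, 2.0, 3.0, 4.0], 2.0)
```

The repeated knots produced by $t_l = t_{\min\{\max\{l,1\},m\}}$ at the boundary would require the derivative branches of the definition, which a full implementation must include.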

Then model (30) can be approximately written as the following linear regression model:

$$Y_i = G_i^m \beta_i^m + \Sigma_i^{1/2} E_i, \tag{33}$$

where

$$Y_i = (y_{i1}, y_{i2}, \ldots, y_{in_i})^\top, \quad G_i^m = \left( N_{ml}(x_{ij}) \right)_{n_i \times (m+2)}, \quad \Sigma_i = \mathrm{diag}\left( \sigma_i^2(x_{ij}) \right), \quad E_i = (\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{in_i})^\top.$$

At first, we suppose Σi is known and then replace it by its available estimator.

From regression model (33), we can get the estimator $\hat\beta_i^m = \left( G_i^{m\top} \Sigma_i^{-1} G_i^m \right)^{-1} G_i^{m\top} \Sigma_i^{-1} Y_i$. Let $b_i^m$ denote the expectation of $\hat\beta_i^m$:

$$b_i^m = E \hat\beta_i^m = \left( G_i^{m\top} \Sigma_i^{-1} G_i^m \right)^{-1} G_i^{m\top} \Sigma_i^{-1} f_i,$$

where $f_i = (f_i(x_{i1}), \ldots, f_i(x_{in_i}))^\top$. Therefore, the smoothed curve is

$$f_{i,m}(x) = E \hat f_{i,m}(x) = E\left[ N_m(x) \left( G_i^{m\top} \Sigma_i^{-1} G_i^m \right)^{-1} G_i^{m\top} \Sigma_i^{-1} Y_i \right] = N_m(x)\, b_i^m. \tag{34}$$

Denote $b^m = (b_1^{m\top}, b_2^{m\top}, \ldots, b_k^{m\top})^\top$ and, correspondingly, denote its estimator by $\hat\beta^m = (\hat\beta_1^{m\top}, \hat\beta_2^{m\top}, \ldots, \hat\beta_k^{m\top})^\top$. Hypothesis $H_{m,x}$ can be presented as

$$H_{m,x}: L_m(x)\, b^m = 0_{k-1}, \tag{35}$$

where

$$L_m(x) = \begin{pmatrix} N_m(x) & -N_m(x) & 0 & \cdots & 0 \\ N_m(x) & 0 & -N_m(x) & \cdots & 0 \\ \vdots & & & \ddots & \\ N_m(x) & 0 & \cdots & 0 & -N_m(x) \end{pmatrix}$$

is a $(k-1) \times k(m+q-1)$ matrix.

The p-value for testing hypothesis Hm,x in (35) can be defined as

$$p_{m,x}(\hat\beta^m, \hat\Sigma_m) = P\left\{ T_m(x)^\top L_m(x)^\top \left( L_m(x) \hat\Sigma_m L_m(x)^\top \right)^{-1} L_m(x)\, T_m(x) \ge \hat\beta^{m\top} L_m(x)^\top \left( L_m(x) \hat\Sigma_m L_m(x)^\top \right)^{-1} L_m(x)\, \hat\beta^m \right\}, \tag{36}$$

where $T_m(x) \triangleq \left( \left( G_i^{m\top} \hat\Sigma_{i,m}^{-1} G_i^m \right)^{-1} G_i^{m\top} \hat\Sigma_{i,m}^{-1/2} E_i \right)_{i=1,2,\ldots,k}$; $\hat\Sigma_{i,m} = \mathrm{diag}\left( \hat\sigma_i^2(x_{ij}) \right)_{j=1,2,\ldots,n_i}$ is an estimator of the variance matrix of the $i$th regression model; and

$$\hat\Sigma_m = \mathrm{diag}\left( \left( G_i^{m\top} \hat\Sigma_{i,m}^{-1} G_i^m \right)^{-1} \right)_{i=1,2,\ldots,k}$$

is an estimator of the covariance matrix of $T_m(x)$ given $\hat\beta_i^m, \hat\sigma_{i,m}^2$, $i = 1, 2, \ldots, k$. The estimator of $\sigma_i(x_{ij})$ can be found in Li and Xu [36], where a smoothing parameter $m_p$ is used as a pilot smoothing parameter, which is different from the $m$ used in $\hat f_{i,m}(x)$. SiZer-RS maps can be constructed for different values of $m_p$, which represent different trade-offs between the structure of the regression curve and the errors.

The two SiZer maps given in Figure 4 are constructed using the data plotted in Figure 3 to compare three regression curves with $f_1(x) = f_2(x) = 0$ and $f_3(x) = 0.5 \sin(2\pi x)$. Since the variance of the errors is constant, it can be estimated by the sum of squared residuals; in this case, a pilot smoothing parameter is avoided [47, 48]. The two blue regions in Figure 4 clearly show the differences across the interval (0, 1). The gray color indicates that there are not sufficient data to obtain credible testing results at $x$ and nearby. The sufficiency is quantified by $\mathrm{ESS}(x, m)$ for SiZer-RS, and pixel $(x, m)$ is colored gray if $\mathrm{ESS}(x, m) < 5$, where

$$\mathrm{ESS}(x, m) \triangleq \min_{i=1,2,\ldots,k} N_m(x)\, G_i^{m\top} G_i^m\, \mathbf{1},$$

with $\mathbf{1}$ the vector of ones.

Figure 4 shows that SiZer-RS map can explore the differences between regression curves accurately.

It is worth noting that, for the SiZer-RS map, the coarsest smoothing level should be $m=q+1$ to ensure the effectiveness of the $q$th-degree regression spline, and the finest smoothing level is recommended to be the one such that $\operatorname{avg}_{x\in\{x_1,x_2,\dots,x_g\}}\mathrm{ESS}(x,m)<5$, where $x_1,x_2,\dots,x_g$ are the points at which hypothesis $H_{m,x}$ is tested and pixels are produced by combining different values of $m$. In applications, a wide range of values of $m_p$ can be used to generate a family of SiZer-RS maps. In particular, $m_p$ and $m$ can both be used as smoothing parameters simultaneously to construct a 3D SiZer-RS map [47, 48].

## 4. Conclusion

This chapter introduces regression spline methods for testing the parametric form of the nonparametric regression function in nonparametric, partial linear, and varying-coefficient models, respectively. The corresponding p-values are established based on the fiducial method and spline interpolation. The test procedures based on the proposed p-values are accurate in some cases and are consistent under mild conditions, which means that the p-value tends to zero under a false null hypothesis as the sample size and the number of knots used in spline interpolation tend to infinity. Hence, the proposed test procedures perform well, especially with small sample sizes.

The regression spline is a frequently used smoothing method that can be combined easily with other statistical methods. When using a spline-based method, the smoothing level is controlled by the number of knots and their positions. In order to sidestep the determination of knots and obtain more reliable results, multi-scale smoothing methods based on regression spline are proposed to infer the structure of the regression function. The multi-scale method is a visual method for carrying out inference at different locations and smoothing levels. In addition, a smoothing spline version of the multi-scale method is also introduced. The proposed multi-scale method can also be used to compare multiple regression curves, and real data examples illustrate its practicability.

The MATLAB code of SiZerLL and other versions of SiZer based on kernel smoother is available from the homepage of Professor Marron JS; the MATLAB code of SiZerLS can be downloaded from the following website:

## References

1. Härdle W, Mammen E. Comparing nonparametric regression fits. Annals of Statistics. 1993;21:1926-1947
2. Hart JD. Nonparametric Smoothing and Lack-of-Fit Tests. New York: Springer; 1997
3. Cox D, Koh E, Wahba G, Yandell BS. Testing the (parametric) null model hypothesis in (semiparametric) partial and generalized spline models. Annals of Statistics. 1988;16:113-119
4. Cox D, Koh E. A smoothing spline based test of model adequacy in polynomial regression. Annals of the Institute of Statistical Mathematics. 1989;41:383-400
5. Fan J, Zhang C, Zhang J. Generalized likelihood ratio statistics and Wilks phenomenon. Annals of Statistics. 2001;29:153-193
6. Eubank RL, Hart JD. Testing goodness-of-fit in regression via order selection criteria. Annals of Statistics. 1992;20:1412-1425
7. Baraud Y. Model selection for regression on a fixed design. Probability Theory and Related Fields. 2000;117:467-493
8. Bianco A, Boente G. Robust estimators in semi-parametric partly linear regression models. Journal of Statistical Planning and Inference. 2004;122:229-252
9. Liang H, Wang S, Robins JM, Carroll RJ. Estimation in partially linear models with missing covariates. Journal of the American Statistical Association. 2004;99:357-367
10. Fan J, Huang L. Goodness-of-fit tests for parametric regression models. Journal of the American Statistical Association. 2001;96:640-652
11. Baraud Y, Huet S, Laurent B. Adaptive tests of linear hypotheses by model selection. Annals of Statistics. 2003;31:225-251
12. Claeskens G. Restricted likelihood ratio lack-of-fit tests using mixed spline models. Journal of the Royal Statistical Society: Series B. 2004;66:909-926
13. Härdle W, Sperlich S, Spokoiny V. Structural tests in additive regression. Journal of the American Statistical Association. 2001;96:1333-1347
14. Meyer MC. A test for linear versus convex regression function using shape-restricted regression. Biometrika. 2003;90:223-232
15. Stute W. Nonparametric model checks for regression. Annals of Statistics. 1997;25:613-641
16. Liang H. Checking linearity of non-parametric component in partially linear models with an application in systemic inflammatory response syndrome study. Statistical Methods in Medical Research. 2006;15:273-284
17. Fan J, Zhang W. Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scandinavian Journal of Statistics. 2000;27:715-731
18. Cai Z, Fan J, Li R. Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association. 2000;95:888-902
19. Fan J, Huang T. Profile likelihood inferences on semi-parametric varying-coefficient partially linear models. Bernoulli. 2005;11:1031-1057
20. You J, Zhou Y. Empirical likelihood for semi-parametric varying-coefficient partially linear regression models. Statistics and Probability Letters. 2006;76:412-422
21. Tang Q, Cheng L. M-estimation and B-spline approximation for varying coefficient models with longitudinal data. Journal of Nonparametric Statistics. 2008;20:611-625
22. Hoover DR, Rice JA, Wu C, Yang LP. Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika. 1998;85:809-822
23. Wu C, Yu K, Chiang CT. A two-step smoothing method for varying coefficient models with repeated measurements. Annals of the Institute of Statistical Mathematics. 2000;52:519-543
24. Xu X, Li G. Fiducial inference in the pivotal family of distributions. Science in China, Series A: Mathematics. 2006;49:410-432
25. Chaudhuri P, Marron JS. SiZer for exploration of structures in curves. Journal of the American Statistical Association. 1999;94:807-823
26. Chaudhuri P, Marron JS. Scale space view of curve estimation. Annals of Statistics. 2000;28:408-428
27. Li R, Marron JS. Local likelihood SiZer map. Sankhyā: The Indian Journal of Statistics. 2005;67:476-498
28. Hannig J, Lee TCM. Robust SiZer for exploration of regression structures and outlier detection. Journal of Computational and Graphical Statistics. 2006;15:101-117
29. Park C, Lee TCM, Hannig J. Multiscale exploratory analysis of regression quantiles using quantile SiZer. Journal of Computational and Graphical Statistics. 2010;19:497-513
30. Marron JS, de Uña-Álvarez J. SiZer for length biased, censored density and hazard estimation. Journal of Statistical Planning and Inference. 2004;121:149-161
31. Kim CS, Marron JS. SiZer for jump detection. Journal of Nonparametric Statistics. 2006;18:13-20
32. Park C, Kang KH. SiZer analysis for the comparison of regression curves. Computational Statistics and Data Analysis. 2008;52:3954-3970
33. Marron JS, Zhang JT. SiZer for smoothing spline. Computational Statistics. 2005;20:481-502
34. Dawid AP, Stone M. The functional model basis of fiducial inference. Annals of Statistics. 1982;10:1054-1067
35. Hannig J, Iyer H, Patterson P. Fiducial generalized confidence intervals. Journal of the American Statistical Association. 2006;101:254-269
36. Li N, Xu X, Jin P. Testing the linearity in partially linear models. Journal of Nonparametric Statistics. 2011;23:99-114
37. Li N, Xu X, Liu X. Testing the constancy in varying-coefficient regression models. Metrika. 2011;74:409-438
38. Shi P, Li G. Global convergence rates of B-spline M-estimators in nonparametric regression. Statistica Sinica. 1995;5:303-318
39. Mao W, Zhao LH. Free-knot polynomial splines with confidence intervals. Journal of the Royal Statistical Society: Series B. 2003;65:901-909
40. DiMatteo I, Genovese CR, Kass RE. Bayesian curve-fitting with free-knot splines. Biometrika. 2001;88:1055-1071
41. Sonderegger DL, Hannig J. Fiducial Theory for Free-Knot Splines. Vol. 68. New York: Springer International Publishing; 2014. pp. 155-189
42. Lindstrom MJ. Penalized estimation of free-knot splines. Journal of Computational and Graphical Statistics. 1999;8:333-352
43. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Annals of Statistics. 2001;29:1165-1188
44. Li N, Xu X. Spline multiscale smoothing to control FDR for exploring features of regression curves. Journal of Computational and Graphical Statistics. 2016;25:325-343
45. Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models. London: Chapman and Hall; 1994
46. Park C, Hannig J, Kang KH. Nonparametric comparison of multiple regression curves in scale-space. Journal of Computational and Graphical Statistics. 2014;23:657-677
47. Hannig J, Marron JS. Advanced distribution theory for SiZer. Journal of the American Statistical Association. 2006;101:484-499
48. Weerahandi S. Generalized confidence intervals. Journal of the American Statistical Association. 1993;88:899-905
