Open access peer-reviewed chapter

Computationally Efficient Kalman Filter Approaches for Fitting Smoothing Splines

Written By

Joel Parker, Yifan Zhang, Bonnie J. Lafleur and Xiaoxiao Sun

Reviewed: 22 July 2022 Published: 27 August 2022

DOI: 10.5772/intechopen.106713

From the Edited Volume

Kalman Filter - Engineering Applications

Edited by Yuri V. Kim


Abstract

Smoothing spline models have been shown to be effective in various fields (e.g., engineering and the biomedical sciences) for understanding complex signals from noisy data. As nonparametric models, smoothing spline ANOVA (analysis of variance) models do not fix the structure of the regression function, leading to more flexible model estimates (e.g., linear or nonlinear estimates). The functional ANOVA decomposition of the regression function estimate offers interpretable results that describe the relationship between the outcome variable and the main and interaction effects of different covariates/predictors. However, smoothing spline ANOVA (SS-ANOVA) models suffer from high computational costs, with a computational complexity of $O(N^3)$ for $N$ observations. Various numerical approaches can address this problem. In this chapter, we focus on introducing a state space representation of SS-ANOVA models. Estimation algorithms based on the Kalman filter are implemented within the SS-ANOVA framework using the state space representation, significantly reducing the computational costs.

Keywords

  • Kalman filter
  • smoothing spline
  • functional ANOVA
  • state space representation
  • Markov structure

1. Introduction

Smoothing spline ANOVA (SS-ANOVA) models have been widely used in various applications [1, 2, 3]. The representer theorem enables an exact solution of the regression function in SS-ANOVA models by minimizing a regularized functional in a finite-dimensional space, even though the problem resides in an infinite-dimensional space. While SS-ANOVA models have strong theoretical properties, the estimation algorithms used to fit these models are computationally intensive, with a computational complexity of $O(N^3)$ for datasets with $N$ observations. Numerous approaches have been developed to reduce the heavy computational costs of SS-ANOVA [4, 5, 6, 7]. For example, Kim and Gu (2004) proposed to select $q \ll N$ basis functions from the $N$ available ones, reducing the computational complexity to $O(Nq^2)$. Sun et al. (2021) synergistically combined asymptotic results with smoothing parameter estimates based on randomly selected samples of size $n \ll N$ to reduce the computational complexity of selecting smoothing parameters to $O(n^3)$.

In this book chapter, we focus on estimation approaches based on the Kalman filter. The Kalman filter was originally created to solve linear filtering and prediction problems and was used to generate simulations for the Apollo 11 project [8]. More recently, it has been implemented in a variety of engineering and biomedical fields [9, 10]. The Kalman filter is naturally suited to fitting state space models: it processes the observations recursively, one at a time, refining the estimates of the unknown state variables after each iteration. Given a new observation, the Kalman filter updates the state of the dynamic system based on the state after the previous observation and the information gained from the new observation. This memoryless property and its simple recursive formulas make Kalman filter approaches computationally efficient, making them a useful tool for big data analytics. SS-ANOVA models can be reformulated in a state space representation, allowing computationally efficient Kalman filter-based model fitting and reducing the computational cost of estimating univariate smoothing spline models to $O(N)$ [11]. An extension to the bivariate setting also significantly reduces the computational costs of SS-ANOVA models [12, 13].

Section 2 of this chapter provides the theoretical background of SS-ANOVA models. Section 3 provides a brief background on state space models. The state space representation of SS-ANOVA models can be found in Section 4, followed by a simulation study under the univariate setting in Section 5. Section 6 concludes this chapter.


2. Smoothing spline ANOVA models

We assume the data $(y_i, x_i)$, $i = 1, 2, \ldots, N$, are independent and identically distributed, where $y_i \in \mathcal{Y} \subseteq \mathbb{R}$ is the outcome/response variable and $x_i \in \mathcal{X} \subseteq \mathbb{R}^d$ represents the covariates/predictors. A nonparametric model can then be written as

$y_i = f(x_i) + e_i, \qquad (1)$

where $f$ is a function of the covariates and $e_i \sim N(0, \sigma^2)$ represents the random errors. For this nonparametric model, the structure of $f$ is not fixed, and $f$ can be estimated by minimizing a penalized least squares score,

$\frac{1}{N}\sum_{i=1}^{N}\left(y_i - f(x_i)\right)^2 + \lambda J(f), \qquad (2)$

where the first term measures the goodness of fit of $f$, and the smoothing parameter $\lambda$ controls the trade-off between the goodness of fit and the roughness of $f$ measured by $J(f)$ [2, 3, 14]. SS-ANOVA models can also handle responses from exponential families and/or correlated responses; readers interested in these topics can find more examples of model estimation and implementation elsewhere (e.g., [2, 14]). The nonparametric estimation allows $f$ to vary in a high-dimensional (possibly infinite-dimensional) space, leading to more flexible results that can balance the bias-variance trade-off [2, 15]. The functional analysis of variance (ANOVA) is applied to the regression function $f$ to improve the interpretability of model estimates by decomposing the function into main and interaction effects of the covariates. These main and interaction effects can be estimated in the corresponding subspaces of the reproducing kernel Hilbert space (RKHS), which is introduced in the next section.
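
As a concrete starting point, below is a minimal sketch of fitting model (1) by minimizing (2) with the ssanova function from the gss package, which is also used in Section 5; the test function $\sin(2\pi x)$ and the noise level are illustrative assumptions, not taken from the chapter.

```r
# Minimal smoothing spline fit via gss::ssanova (a sketch; the test
# function and sd = 0.3 are illustrative assumptions)
library(gss)

set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

fit <- ssanova(y ~ x)                          # cubic spline by default
est <- predict(fit, data.frame(x = x), se.fit = TRUE)

plot(x, y, col = "grey")
lines(x, est$fit, lwd = 2)                     # estimated f(x)
```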

2.1 ANOVA

2.1.1 Classical ANOVA

Classical ANOVA helps to understand the decomposition of the regression function in (1). We use a one-way classical ANOVA model as an example. The outcome $y_{ij}$ can be modeled by $y_{ij} = \mu_i + e_{ij}$, where $\mu_i$ is the mean of treatment level $i$, with $i = 1, 2, \ldots, K_1$ and $j = 1, 2, \ldots, K_2$. The terms in this model can be rewritten as

$y_{ij} = \mu + \delta_i + e_{ij}, \qquad (3)$

where $\mu$ is the overall mean effect and $\delta_i$ is the treatment effect. Side conditions (e.g., $\sum_{i=1}^{K_1}\delta_i = 0$) are added to ensure the uniqueness of this decomposition. Now consider the univariate nonparametric function in (1). The regression function can be written as

$f(x) = (Af)(x) + ((I - A)f)(x) = f_0 + f_1, \qquad (4)$

where $A$ is an averaging operator that averages out the effect of $x$ and $I$ is the identity operator. Side conditions are again needed to ensure the uniqueness of this decomposition of the regression function.

2.1.2 Functional ANOVA

The multivariate function $f(x_1, x_2, \ldots, x_d)$ on a $d$-dimensional product domain $\mathcal{X} = \prod_{j=1}^{d}\mathcal{X}_j \subseteq \mathbb{R}^d$ can be decomposed similarly to the classical ANOVA in the RKHS. The RKHS on $\prod_{j=1}^{d}\mathcal{X}_j$ is constructed by taking the tensor product of the RKHSs on the marginal domains $\mathcal{X}_j$. We need the following theorem to construct the tensor-product space.

Theorem 1.1 If $R_1(x_1, x_1')$ is nonnegative definite on $\mathcal{X}_1$ and $R_2(x_2, x_2')$ is nonnegative definite on $\mathcal{X}_2$, then $R((x_1, x_2), (x_1', x_2')) = R_1(x_1, x_1')\,R_2(x_2, x_2')$ is nonnegative definite on $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2$.

Theorem 1.1 implies that the RKHS $\mathcal{H}$ on $\prod_{j=1}^{d}\mathcal{X}_j$ has the reproducing kernel $R = \prod_{j=1}^{d} R_j$, where $R_j$ is the reproducing kernel of $\mathcal{H}_j$ on $\mathcal{X}_j$. Additionally, each marginal Hilbert space $\mathcal{H}_j$ can be decomposed as $\mathcal{H}_j = \mathcal{H}_{j0} \oplus \mathcal{H}_{j1}$, where $\mathcal{H}_{j0}$ is the null space and $\mathcal{H}_{j1}$ is the orthogonal complement of $\mathcal{H}_{j0}$. Then $\mathcal{H} = \bigotimes_{j=1}^{d}\mathcal{H}_j$ can be decomposed as

$\mathcal{H} = \bigotimes_{j=1}^{d}\left(\mathcal{H}_{j0} \oplus \mathcal{H}_{j1}\right) = \bigoplus_{S}\left[\left(\bigotimes_{j \in S}\mathcal{H}_{j1}\right) \otimes \left(\bigotimes_{j \notin S}\mathcal{H}_{j0}\right)\right] = \bigoplus_{S}\mathcal{H}_S, \qquad (5)$

where $S$ runs over all subsets of $\{1, \ldots, d\}$. The term $\mathcal{H}_S$ has the reproducing kernel $R_S = \prod_{j \in S} R_{j1}$.

In general, the inner product in $\mathcal{H}$ can be specified as

$J(f, g) = \sum_{j=1}^{B}\theta_j^{-1}\,(f_j, g_j)_j, \qquad (6)$

where the $\theta_j > 0$ are tuning parameters, $B$ is the number of smoothing parameters, and $(f_j, g_j)_j$ denotes the inner product of the components of $f$ and $g$ in the $j$th subspace. The roughness penalty in (2) can be written in the form of (6). The reproducing kernel associated with (6) can then be written as

$R = \sum_{j=1}^{B}\theta_j R_j, \qquad (7)$

where $R_j$ is the reproducing kernel for the corresponding tensor-product RKHS in (5).

2.2 An example of an RKHS on $[0,1]^2$

We use one example of an RKHS on $[0,1]^2$ to demonstrate the decomposition of an RKHS. More examples on discrete and/or continuous domains can be found in Gu (2013) [2]. We consider the following tensor sum decomposition on $[0,1]$,

$\mathcal{H}_1 = \left\{f: \int_0^1 \left(f^{(2)}(x)\right)^2 dx < \infty\right\} = \{f: f \propto 1\} \oplus \{f: f \propto k_1(x)\} \oplus \left\{f: \int_0^1 f\,dx = \int_0^1 f^{(1)}\,dx = 0,\ f^{(2)} \in \mathcal{L}_2[0,1]\right\} = \mathcal{H}_{100} \oplus \mathcal{H}_{101} \oplus \mathcal{H}_{11}, \qquad (8)$

where $\mathcal{H}_{101} \oplus \mathcal{H}_{11}$ forms the contrast in the one-way ANOVA decomposition, and $k_r$ is a scaled Bernoulli polynomial, $k_r(x) = B_r(x)/r!$ [2, 16]. The three subspaces have the reproducing kernels $R_{100}(x, x') = 1$, $R_{101}(x, x') = k_1(x)k_1(x')$, and $R_{11}(x, x') = k_2(x)k_2(x') - k_4(x - x')$.
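
As a concrete illustration, the scaled Bernoulli polynomials and the three kernels above can be coded directly; this sketch uses the standard closed forms $B_1(x) = x - 1/2$, $B_2(x) = x^2 - x + 1/6$, and $B_4(x) = x^4 - 2x^3 + x^2 - 1/30$.

```r
# Scaled Bernoulli polynomials k_r(x) = B_r(x)/r! and the reproducing
# kernels of the decomposition in (8) (a sketch for x, x' in [0,1])
k1 <- function(x) x - 0.5
k2 <- function(x) (x^2 - x + 1/6) / 2
k4 <- function(x) (x^4 - 2 * x^3 + x^2 - 1/30) / 24

R100 <- function(x, y) 1 + 0 * x                       # kernel of H_100
R101 <- function(x, y) k1(x) * k1(y)                   # kernel of H_101
R11  <- function(x, y) k2(x) * k2(y) - k4(abs(x - y))  # kernel of H_11
# abs() is valid here because B_4(1 - u) = B_4(u) on [0,1]
```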

Now consider the RKHS $\mathcal{H}$ on $[0,1] \times [0,1]$, which can be constructed as the tensor product of $\mathcal{H}_1$ on $[0,1]$ and $\mathcal{H}_2$ on $[0,1]$. Based on the tensor sum decomposition in (8), we have

$\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2 = \left(\mathcal{H}_{100} \oplus \mathcal{H}_{101} \oplus \mathcal{H}_{11}\right) \otimes \left(\mathcal{H}_{200} \oplus \mathcal{H}_{201} \oplus \mathcal{H}_{21}\right) = \left(\mathcal{H}_{100} \otimes \mathcal{H}_{200}\right) \oplus \left(\mathcal{H}_{100} \otimes \mathcal{H}_{201}\right) \oplus \left(\mathcal{H}_{101} \otimes \mathcal{H}_{200}\right) \oplus \left(\mathcal{H}_{101} \otimes \mathcal{H}_{201}\right) \oplus \left(\mathcal{H}_{100} \otimes \mathcal{H}_{21}\right) \oplus \left(\mathcal{H}_{11} \otimes \mathcal{H}_{200}\right) \oplus \left(\mathcal{H}_{101} \otimes \mathcal{H}_{21}\right) \oplus \left(\mathcal{H}_{11} \otimes \mathcal{H}_{201}\right) \oplus \left(\mathcal{H}_{11} \otimes \mathcal{H}_{21}\right), \qquad (9)$

where the first four terms in (9) lie in the null space $\mathcal{H}^0$ and the remaining five terms lie in its orthogonal complement $\mathcal{H}^1$. The reproducing kernel of each subspace for the cubic spline ($m = 2$) can be found in Table 1.

Subspace | Reproducing kernel
$\mathcal{H}_{100} \otimes \mathcal{H}_{200}$ | $1$
$\mathcal{H}_{101} \otimes \mathcal{H}_{200}$ | $k_1(x_1)k_1(x_1')$
$\mathcal{H}_{101} \otimes \mathcal{H}_{201}$ | $k_1(x_1)k_1(x_1')\,k_1(x_2)k_1(x_2')$
$\mathcal{H}_{11} \otimes \mathcal{H}_{200}$ | $k_2(x_1)k_2(x_1') - k_4(x_1 - x_1')$
$\mathcal{H}_{11} \otimes \mathcal{H}_{201}$ | $\left[k_2(x_1)k_2(x_1') - k_4(x_1 - x_1')\right]k_1(x_2)k_1(x_2')$
$\mathcal{H}_{11} \otimes \mathcal{H}_{21}$ | $\left[k_2(x_1)k_2(x_1') - k_4(x_1 - x_1')\right]\left[k_2(x_2)k_2(x_2') - k_4(x_2 - x_2')\right]$

Table 1.

Subspaces and their corresponding reproducing kernels for the RKHS $\mathcal{H}$ on $[0,1] \times [0,1]$.

2.3 Estimation

The estimation algorithm of SS-ANOVA models relies on the following representer theorem.

Theorem 1.2 (Representer theorem) There exist coefficient vectors $\xi = (\xi_1, \ldots, \xi_M)' \in \mathbb{R}^M$ and $c = (c_1, \ldots, c_N)' \in \mathbb{R}^N$ such that the minimizer of (2) in $\mathcal{H} = \mathcal{H}^0 \oplus \mathcal{H}^1$ has the following representation:

$f(x) = \sum_{m=1}^{M}\xi_m \phi_m(x) + \sum_{i=1}^{N} c_i R(x_i, x), \qquad (10)$

where $\{\phi_m\}_{m=1}^{M}$ are the basis functions of the null space $\mathcal{H}^0$ and $R$ is the reproducing kernel of $\mathcal{H}^1$.

Under model (1), the function $f$ can be estimated by minimizing (2). Using the representer theorem, the function $f$ can be written as

$\mathbf{f} = S\xi + Qc, \qquad (11)$

where $\mathbf{f} = (f(x_1), \ldots, f(x_N))'$, $S$ is an $N \times M$ matrix (e.g., $M = 2$ for the cubic spline) whose $(i,j)$th entry is $\phi_j(x_i)$, and $Q$ is an $N \times N$ matrix whose $(i,j)$th entry is $R(x_i, x_j)$ of the form (7). Plugging (11) into (2), the penalized least squares score can be written as

$\frac{1}{N}\left(y - S\xi - Qc\right)'\left(y - S\xi - Qc\right) + \lambda c' Q c. \qquad (12)$

We differentiate (12) with respect to $\xi$ and $c$ to obtain the linear system

$(Q + N\lambda I)c + S\xi = y, \qquad S'c = 0. \qquad (13)$

For given smoothing parameters, solving for $c$ and $\xi$ provides the estimate of $f$. The selection of smoothing parameters for SS-ANOVA models is introduced below.
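
For illustration, the linear system (13) can be solved directly for the cubic spline on $[0,1]$, using the kernel R11 and the null-space basis $\{1, k_1(x)\}$ coded above; this is only a sketch of (13), not the more stable algorithm implemented in gss.

```r
# Solve (Q + N*lambda*I)c + S*xi = y, S'c = 0 for fixed lambda (a sketch)
fit_spline <- function(x, y, lambda) {
  N <- length(x)
  S <- cbind(1, k1(x))                   # N x M basis of the null space
  Q <- outer(x, x, R11)                  # N x N kernel matrix
  A <- rbind(cbind(Q + N * lambda * diag(N), S),
             cbind(t(S), matrix(0, 2, 2)))
  sol <- solve(A, c(y, 0, 0))
  list(c = sol[1:N], xi = sol[N + 1:2],
       fitted = as.vector(S %*% sol[N + 1:2] + Q %*% sol[1:N]))
}
```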

2.4 Selection of smoothing parameters

The smoothing parameters $\lambda/\theta_j$ balance the trade-off between the goodness of fit and the roughness of $f$ in (2). The optimal smoothing parameters are data specific, and their selection should be performed prior to the nonparametric regression analysis. Several smoothing parameter selection methods have been developed for SS-ANOVA models [17]. Generalized cross-validation (GCV) is one of the most popular methods for selecting the optimal smoothing parameters [18, 19].

To avoid overparameterization, let $\boldsymbol{\lambda} = (\lambda/\theta_1, \ldots, \lambda/\theta_B)$. The GCV score is defined as

$V(\boldsymbol{\lambda}) = \frac{N^{-1}\, y'\left(I - A(\boldsymbol{\lambda})\right)^2 y}{\left[N^{-1}\,\mathrm{tr}\left(I - \alpha A(\boldsymbol{\lambda})\right)\right]^2}, \qquad (14)$

where $A(\boldsymbol{\lambda})$ is a symmetric matrix similar to the hat matrix in linear regression, and $\mathrm{tr}(\cdot)$ denotes the trace. The parameter $\alpha \geq 1$ is a fudge factor [4]. When $\alpha = 1$, (14) is the original GCV score; larger values of $\alpha$ yield smoother estimates. By default, we set $\alpha = 1.4$. The optimal smoothing parameters $\boldsymbol{\lambda}$ can then be chosen by minimizing the GCV score (14) using Newton-Raphson methods.
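
A direct (if inefficient) way to evaluate the GCV score (14) is to form the smoothing matrix $A(\boldsymbol{\lambda})$ column by column with the solver sketched above, reusing the x and y simulated in Section 2; the grid search below stands in for the Newton-Raphson iteration used in practice.

```r
# GCV score (14) for the univariate sketch above; forming A column by
# column is O(N^3) per lambda and meant for illustration only
gcv_score <- function(x, y, lambda, alpha = 1.4) {
  N <- length(y)
  A <- sapply(1:N, function(j) fit_spline(x, diag(N)[, j], lambda)$fitted)
  num <- sum(((diag(N) - A) %*% y)^2) / N
  den <- (sum(diag(diag(N) - alpha * A)) / N)^2
  num / den
}

lams <- 10^seq(-8, -2, by = 0.5)
scores <- sapply(lams, function(l) gcv_score(x, y, l))
lams[which.min(scores)]                  # lambda minimizing the GCV score
```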

2.5 Computational complexity

In this section, we discuss the computational complexity of calculating $c$ and $\xi$ from (12). One requires $N^3/3 + O(N^2)$ operations to obtain the estimates of $c$ and $\xi$ for fixed smoothing parameters. In practice, the smoothing parameters also need to be optimized, which requires $4BN^3/3 + O(N^2)$ operations, where $B$ is the number of smoothing parameters. Therefore, minimizing (12) has an overall computational complexity of $O(N^3)$. The following sections discuss how the Kalman filter can be used to fit SS-ANOVA models, reducing the computational complexity to $O(N)$.


3. State space models

State space methodology was traditionally used to study dynamic problems (e.g., space tracking settings) because the procedure allows for "real-time" updating as data are collected [20]. In this chapter, we use the linear Gaussian state space model as an example to introduce the concepts of state space models and the Kalman filter approach. More applications of state space models can be found in Douc, Moulines, and Stoffer [21] and in Durbin and Koopman [22]. A state space model consists of two equations: the state equation and the measurement equation. The state equation describes the dynamics of the state variables:

$z_{t+1} = G_t z_t + \Psi_t \eta_t, \qquad \eta_t \overset{\text{iid}}{\sim} N(0, \Delta_t), \qquad (15)$

where $z_t$ is the $h \times 1$ state vector, $\eta_t$ is the $g \times 1$ disturbance vector with zero mean and covariance matrix $\Delta_t$, and $G_t$ and $\Psi_t$ are fixed design matrices of dimensions $h \times h$ and $h \times g$, respectively. The measurement equation shows the relationship between the observed variables and the unobserved state variables:

$o_t = \Phi_t z_t + \varepsilon_t, \qquad \varepsilon_t \overset{\text{iid}}{\sim} N(0, V_t), \qquad (16)$

where $o_t$ is an $n \times 1$ vector of $n$ observations, $\varepsilon_t$ is an $n \times 1$ disturbance vector with zero mean and covariance matrix $V_t$, and $\Phi_t$ is a fixed design matrix of dimension $n \times h$. The initial state vector $z_0$ is assumed to be normally distributed with mean $\mu_0$ and covariance matrix $P$. The two vectors $\eta_t$ and $\varepsilon_t$ are assumed to be mutually uncorrelated, that is,

$\begin{pmatrix}\eta_t \\ \varepsilon_t\end{pmatrix} \overset{\text{iid}}{\sim} N\left(\begin{pmatrix}0 \\ 0\end{pmatrix}, \begin{pmatrix}\Delta_t & 0 \\ 0 & V_t\end{pmatrix}\right). \qquad (17)$

These two vectors are also uncorrelated with the initial state vector $z_0$.
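
To make the notation concrete, the following sketch simulates a small time-invariant instance of Eqs. (15) and (16); the particular choices of $G_t$, $\Psi_t$, $\Delta_t$, $\Phi_t$, and $V_t$ are illustrative assumptions.

```r
# Simulate a linear Gaussian state space model, Eqs. (15)-(16)
# (a sketch with time-invariant, illustrative system matrices)
set.seed(1)
TT <- 200; h <- 2; n <- 1
G     <- matrix(c(1, 0, 1, 1), h, h)     # local level plus slope
Psi   <- diag(h)
Delta <- diag(c(0.01, 0.001))            # state disturbance covariance
Phi   <- matrix(c(1, 0), n, h)           # observe the first state only
V     <- diag(0.5, n)                    # measurement noise covariance

z <- matrix(0, TT, h); o <- matrix(0, TT, n)
z[1, ] <- c(0, 0.1)                      # initial state (mu_0)
for (t in 2:TT)
  z[t, ] <- G %*% z[t - 1, ] + Psi %*% rnorm(h, sd = sqrt(diag(Delta)))
for (t in 1:TT)
  o[t, ] <- Phi %*% z[t, ] + rnorm(n, sd = sqrt(diag(V)))
```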

The Kalman filter utilizes a set of recursive equations to estimate $z_t$ and its error variance matrix $P_t$, given the observations $O_t = \{o_1, \ldots, o_t\}$ up to time $t$ [13]. Define

$\hat{z}_t = E(z_t \mid O_t), \qquad P_t = E\left[(z_t - \hat{z}_t)(z_t - \hat{z}_t)' \mid O_t\right], \qquad (18)$

where $\hat{z}_t$ is the Kalman filter estimate of $z_t$, with $\hat{z}_0 = E(z_0) = \mu_0$ and $P_0 = P$. From the state and measurement Eqs. (15) and (16), the one-step-ahead prediction $\hat{z}_{t|t-1}$ and its covariance matrix, given $\hat{z}_{t-1}$ and $P_{t-1}$, become

$\hat{z}_{t|t-1} = E(z_t \mid O_{t-1}) = G_t \hat{z}_{t-1}, \qquad P_{t|t-1} = E\left[(z_t - \hat{z}_{t|t-1})(z_t - \hat{z}_{t|t-1})' \mid O_{t-1}\right] = G_t P_{t-1} G_t' + \Psi_t \Delta_t \Psi_t', \qquad (19)$

and

$\hat{o}_{t|t-1} = \Phi_t \hat{z}_{t|t-1}. \qquad (20)$

Then the prediction error vector $v_t$ is

$v_t = o_t - \hat{o}_{t|t-1} = o_t - \Phi_t \hat{z}_{t|t-1} = \Phi_t (z_t - \hat{z}_{t|t-1}) + \varepsilon_t, \qquad (21)$

with the covariance matrix

$E(v_t v_t') = \Lambda_t = \Phi_t P_{t|t-1} \Phi_t' + V_t. \qquad (22)$

Using the joint distribution of $z_t$ and $v_t$, the Kalman filter estimator $\hat{z}_t = E(z_t \mid O_t)$ at time $t$ and its covariance matrix can be updated via

$\hat{z}_t = \hat{z}_{t|t-1} + P_{t|t-1}\Phi_t'\Lambda_t^{-1}\left(o_t - \Phi_t \hat{z}_{t|t-1}\right) = G_t \hat{z}_{t-1} + P_{t|t-1}\Phi_t'\Lambda_t^{-1} v_t, \qquad P_t = P_{t|t-1} - P_{t|t-1}\Phi_t'\Lambda_t^{-1}\Phi_t P_{t|t-1}. \qquad (23)$

We further define the Kalman gain as

$K_t = P_{t|t-1}\Phi_t'\Lambda_t^{-1}. \qquad (24)$

Then the filtered estimate of $z_t$ and its covariance matrix $P_t$ are

$\hat{z}_t = \hat{z}_{t|t-1} + K_t v_t, \qquad P_t = (I - K_t \Phi_t)\, P_{t|t-1}. \qquad (25)$
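
The recursions (19)-(25) translate directly into code. The sketch below filters the series simulated after Eq. (17); it is written for clarity rather than speed, and the initial covariance is an illustrative choice.

```r
# Kalman filter implementing Eqs. (19)-(25) (a sketch for the
# time-invariant case)
kalman_filter <- function(obs, G, Psi, Delta, Phi, V, mu0, P0) {
  TT <- nrow(obs); h <- length(mu0)
  zf <- matrix(0, TT, h)                 # filtered states z_t
  z <- mu0; P <- P0
  for (t in 1:TT) {
    z_pred <- G %*% z                                    # Eq. (19)
    P_pred <- G %*% P %*% t(G) + Psi %*% Delta %*% t(Psi)
    v   <- obs[t, ] - Phi %*% z_pred                     # Eq. (21)
    Lam <- Phi %*% P_pred %*% t(Phi) + V                 # Eq. (22)
    K   <- P_pred %*% t(Phi) %*% solve(Lam)              # Eq. (24)
    z   <- z_pred + K %*% v                              # Eq. (25)
    P   <- (diag(h) - K %*% Phi) %*% P_pred
    zf[t, ] <- z
  }
  zf
}

zf <- kalman_filter(o, G, Psi, Delta, Phi, V,
                    mu0 = c(0, 0.1), P0 = diag(10, 2))
```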

4. State space representation of SS-ANOVA models

Owing to the Markov structure of SS-ANOVA models after reparameterization, SS-ANOVA models can be represented as state space models, allowing for efficient estimation by algorithms based on the Kalman filter. Wecker and Ansley (1983) derived the state space representation of univariate smoothing spline models [11]. Such an approach reduces the computational complexity of smoothing splines from $O(N^3)$ to $O(N)$. Based on a fast algorithm for the multivariate Kalman filter, Qin and Guo (2006) extended the univariate case to multivariate SS-ANOVA models [12]. The two-dimensional procedure was implemented with a computational complexity of $O(n_1 n_2^3)$ for data of size $N = n_1 n_2$, and the extension to higher dimensions was also discussed. In this chapter, we focus on the univariate setting to demonstrate the derivation of the state space representation of smoothing splines.

4.1 Univariate setting

We can use the state space formulation to represent model (1) with $d = 1$ and apply the Kalman filter algorithm to estimate the parameters of smoothing spline models efficiently. Based on the pioneering work of Wahba (1978) [23], the univariate function $f(x)$ can be written in the form

$f(x) = \sum_{\nu=0}^{m-1}\alpha_\nu \frac{(x - x_l)^\nu}{\nu!} + \lambda^{1/2}\sigma \int_{x_l}^{x} \frac{(x - h)^{m-1}}{(m-1)!}\, dW(h), \qquad (26)$

where the covariate $x \in [x_l, x_u]$ and $W(h)$ is a Wiener process with unit dispersion parameter. When all the $\alpha$'s have diffuse prior distributions, the conditional expectation of $f(x)$ given all the data is the function minimizing (2) with the smoothing parameter $1/\lambda$. The model (26) can be rewritten as

$y_i = \mathbf{1}' U(x_i) + e_i, \qquad i = 1, \ldots, N, \qquad (27)$

where $\mathbf{1} = (1, 0, \ldots, 0)'$ and $U(x) = (U_m(x), \ldots, U_1(x))'$ is an $m$-dimensional stochastic process. In particular, we define

$U_j(x) = \sum_{\nu=0}^{j-1} U_{j-\nu}(x_l)\frac{(x - x_l)^\nu}{\nu!} + \lambda^{1/2}\sigma \int_{x_l}^{x} \frac{(x - h)^{j-1}}{(j-1)!}\, dW(h), \qquad j = m, \ldots, 1. \qquad (28)$

The vector $U$ contains $U_m(x) = f(x)$ and its first $m-1$ derivatives. Letting $\alpha_\nu = U_{m-\nu}(x_l)$, model (27) can be easily verified.

The state space formulation relies on the Markov structure of $U(x)$, which is demonstrated below. For any $x_b$ and $x_a$ within the interval $[x_l, x_u]$, we define the $m \times m$ matrix

$\Gamma_m(x_b, x_a) = \begin{pmatrix} 1 & (x_b - x_a) & \cdots & \frac{(x_b - x_a)^{m-1}}{(m-1)!} \\ 0 & 1 & \cdots & \frac{(x_b - x_a)^{m-2}}{(m-2)!} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}. \qquad (29)$

We can easily verify

$\Gamma_m(x_c, x_a) = \Gamma_m(x_c, x_b)\,\Gamma_m(x_b, x_a). \qquad (30)$
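
The matrix $\Gamma_m$ in (29) is easy to construct for arbitrary $m$, and the semigroup property (30) can be checked numerically; the following is an illustrative sketch.

```r
# Construct Gamma_m of Eq. (29) and verify Eq. (30) numerically (a sketch)
gamma_m <- function(m, delta) {          # delta = x_b - x_a
  G <- diag(m)
  for (i in 1:m) for (j in i:m)
    G[i, j] <- delta^(j - i) / factorial(j - i)
  G
}

all.equal(gamma_m(3, 0.5), gamma_m(3, 0.2) %*% gamma_m(3, 0.3))  # TRUE
```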

To show the Markov structure of $U(x)$, we also define the $m \times 1$ random vector $\omega(x_b, x_a) = (\omega_1(x_b, x_a), \ldots, \omega_m(x_b, x_a))'$, where

$\omega_\nu(x_b, x_a) = \lambda^{1/2}\sigma \int_{x_a}^{x_b} \frac{(x_b - h)^{\nu-1}}{(\nu-1)!}\, dW(h), \qquad \nu = 1, \ldots, m. \qquad (31)$

For any $\nu = 1, \ldots, m$, we have

$\omega_\nu(x_c, x_a) = \omega_\nu(x_c, x_b) + \sum_{j=0}^{\nu-1}\frac{(x_c - x_b)^j}{j!}\,\omega_{\nu-j}(x_b, x_a). \qquad (32)$

Thus we have

$\omega(x_c, x_a) = \Gamma_m(x_c, x_b)\,\omega(x_b, x_a) + \omega(x_c, x_b). \qquad (33)$

We now apply (30) and (33) to obtain the Markov structure of $U$:

$U(x_b) = \Gamma_m(x_b, x_l)\,U(x_l) + \omega(x_b, x_l) = \Gamma_m(x_b, x_a)\Gamma_m(x_a, x_l)\,U(x_l) + \Gamma_m(x_b, x_a)\,\omega(x_a, x_l) + \omega(x_b, x_a) = \Gamma_m(x_b, x_a)\,U(x_a) + \omega(x_b, x_a). \qquad (34)$

The Markov structure is the key to the state space representation of SS-ANOVA models. For $x_l = x_1 \leq x_2 \leq \cdots \leq x_N = x_u$, we have

$y_i = \sum_{\nu=0}^{m-1}\alpha_\nu \frac{(x_i - x_l)^\nu}{\nu!} + \mathbf{1}'\Omega(x_i) + e_i, \qquad (35)$

as the measurement equation, where $\Omega(x_i) = (\Omega_m(x_i), \ldots, \Omega_1(x_i))'$ and $\Omega_\nu(x_i) = \omega_\nu(x_i, x_l)$ for $\nu = 1, \ldots, m$ and $i = 1, \ldots, N$. From (33), we have the state equation

$\Omega(x_i) = \Gamma_m(x_i, x_{i-1})\,\Omega(x_{i-1}) + \omega(x_i, x_{i-1}), \qquad (36)$

for $i = 1, \ldots, N$. Given the parameters $\lambda$ and $\sigma$, the Kalman filtering and smoothing algorithms can be implemented to perform estimation for this state space representation.


5. Simulation studies

We used simulated data to compare the estimates of smoothing splines with those based on the state space representation under the univariate setting. The following model was used to simulate $N = 1{,}000$ observations:

$y_i = 7\sin(\pi x_i) + e_i, \qquad (37)$

where $x_i = t_i/100$, $t_i = 1, \ldots, 1{,}000$, and $e_i \sim N(0, 1)$. To apply the univariate SS-ANOVA model to the simulated data, we used the ssanova function in the gss package (version 2.2-3) [24]. The GCV algorithm was used to select the smoothing parameter of the SS-ANOVA model. To implement the Kalman filtering algorithm for fitting the smoothing spline, we used Eq. (35) as the measurement equation and Eq. (36) as the state equation. The parameters $\lambda$ and $\sigma$ were set to 0.01 and 1, respectively. We fitted the cubic spline (i.e., $m = 2$) in the simulation studies. In the state Eq. (36), we have

$\Gamma_2(x_i, x_{i-1}) = \begin{pmatrix} 1 & x_i - x_{i-1} \\ 0 & 1 \end{pmatrix}, \qquad (38)$

for $i = 2, \ldots, 1{,}000$. The $(\nu, \nu')$th element of the variance matrix of $\omega(x_i, x_{i-1})$ is

$\lambda\sigma^2\, \frac{(x_i - x_{i-1})^{\nu + \nu' - 1}}{(\nu + \nu' - 1)\,(\nu - 1)!\,(\nu' - 1)!}, \qquad (39)$

where $\nu, \nu' = 1, \ldots, m$. Given the above information, we utilized the fkf function in the FKF package (version 0.2.3) to implement the Kalman filtering and smoothing algorithms for the SS-ANOVA model. The details of the iterative procedure, which is similar to the procedure described in Section 3, are available in the help document of fkf. Figure 1 shows the similarity between the SS-ANOVA model fit obtained with the generic algorithm from the gss package and the model fit based on the state space representation in (35) and (36).
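
The following sketch reproduces this comparison. Instead of handling the polynomial terms in (35) separately, it tracks the full state $U(x_i)$ of (34), approximating the diffuse prior on the $\alpha$'s with a large initial variance (1e7); that variance and the plotting details are our assumptions, while $\lambda = 0.01$, $\sigma = 1$, and the variance matrix (39) follow the text.

```r
# Fit the Section 5 example via the state space form and FKF::fkf
# (a sketch; diffuse prior approximated by P0 = diag(1e7, 2))
library(FKF)
library(gss)

set.seed(1)
N <- 1000
x <- (1:N) / 100
y <- 7 * sin(pi * x) + rnorm(N)
d <- x[2] - x[1]                         # equal spacing
lambda <- 0.01; sigma <- 1

Tt <- matrix(c(1, 0, d, 1), 2, 2)        # Gamma_2 of Eq. (38)

# State noise covariance from Eq. (39); indices are reversed because the
# state stacks (Omega_m, ..., Omega_1)'
W <- matrix(0, 2, 2)
for (j in 1:2) for (k in 1:2) {
  v <- 3 - j; w <- 3 - k
  W[j, k] <- lambda * sigma^2 * d^(v + w - 1) /
    ((v + w - 1) * factorial(v - 1) * factorial(w - 1))
}

ans <- fkf(a0 = c(0, 0), P0 = diag(1e7, 2),
           dt = matrix(0, 2, 1), ct = matrix(0, 1, 1),
           Tt = Tt, Zt = matrix(c(1, 0), 1, 2),
           HHt = W, GGt = matrix(sigma^2, 1, 1), yt = rbind(y))
fhat <- ans$att[1, ]                     # filtered estimate of f(x_i)

fit <- ssanova(y ~ x)                    # reference SS-ANOVA fit (GCV)
plot(x, y, col = "grey")
lines(x, fhat, lwd = 2)
lines(x, predict(fit, data.frame(x = x)), lty = 2)
```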

Figure 1.

Comparison between the SS-ANOVA model fit and the model fit with the Kalman filter given $\lambda = 0.01$ on simulated data.


6. Conclusions

In this chapter, we have introduced the theoretical foundation (e.g., the representer theorem) and estimation algorithms of SS-ANOVA models. Through tensor product operations, SS-ANOVA models can handle multivariate data and study the main and interaction effects in the corresponding subspaces via functional ANOVA. The estimation algorithms of SS-ANOVA models require $O(N^3)$ operations, which can be computationally prohibitive for analyzing very large datasets. Utilizing the Markov structure of SS-ANOVA models, the Kalman filter can be used to fit SS-ANOVA models after they are reparameterized into a state space formulation [11]. This state space representation reduces the computational complexity from $O(N^3)$ to $O(N)$ in the univariate case, making SS-ANOVA models applicable to big data problems. Additional research has extended this representation to the multidimensional setting [12]; for two-dimensional data with dimensions $n_1$ and $n_2$, SS-ANOVA models can be fitted with a computational complexity of $O(n_1 n_2^3)$, where $N = n_1 n_2$. Furthermore, we provided a simulated example comparing the estimates from the state space representation with those from the SS-ANOVA model in the univariate case.


Acknowledgments

Sun’s research was supported by the 18th Mile TRIF Funding from the University of Arizona and the NIH grant U19AG065169. LaFleur’s research was supported by the NIH grant U19AG065169.


Conflict of interest

The authors declare no conflict of interest.

References

  1. Brinkman ND. Ethanol fuel-single-cylinder engine study of efficiency and exhaust emissions. SAE Transactions. 1981;90:1410-1424
  2. Gu C. Smoothing Spline ANOVA Models. New York, NY: Springer; 2013
  3. Wahba G. Spline Models for Observational Data. Philadelphia, PA: Society for Industrial and Applied Mathematics; 1990
  4. Kim YJ, Gu C. Smoothing spline Gaussian regression: More scalable computation via efficient approximation. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2004;66(2):337-356
  5. Ma P, Huang JZ, Zhang N. Efficient computation of smoothing splines via adaptive basis sampling. Biometrika. 2015;102(3):631-645
  6. Helwig NE, Ma P. Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface. 2016;9:433-444
  7. Sun X, Zhong W, Ma P. An asymptotic and empirical smoothing parameters selection method for smoothing spline ANOVA models in large samples. Biometrika. 2021;108(1):149-166
  8. Kalman RE. A new approach to linear filtering and prediction problems. Journal of Basic Engineering. 1960;82(1):35-45
  9. Moreno VM, Pigazo A. Kalman Filter: Recent Advances and Applications. London, UK: IntechOpen; 2009
  10. Shumway RH. Longitudinal data with serial correlation: A state-space approach. Journal of Statistical Planning and Inference. 1995;47(3):395-397
  11. Wecker WE, Ansley CF. The signal extraction approach to nonlinear regression and spline smoothing. Journal of the American Statistical Association. 1983;78(381):81-89
  12. Qin L, Guo W. State space representation for smoothing spline ANOVA models. Journal of Computational and Graphical Statistics. 2006;15(4):830-847
  13. Koopman SJ, Durbin J. Fast filtering and smoothing for multivariate state space models. Journal of Time Series Analysis. 2000;21(3):281-296
  14. Wang Y. Smoothing Splines: Methods and Applications. Boca Raton, FL: CRC Press; 2011
  15. Yang Y. Model selection for nonparametric regression. Statistica Sinica. 1999;9:475-499
  16. Abramowitz M, Stegun IA. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Washington, DC: U.S. Govt. Print. Off.; 1964
  17. Wahba G. A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. The Annals of Statistics. 1985;13:1378-1402
  18. Xiang D, Wahba G. A generalized approximate cross validation for smoothing splines with non-Gaussian data. Statistica Sinica. 1996;6:675-692
  19. Craven P, Wahba G. Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik. 1979;31:377-403
  20. De Jong P. The likelihood for a state space model. Biometrika. 1988;75(1):165-169
  21. Douc R, Moulines E, Stoffer D. Nonlinear Time Series: Theory, Methods and Applications with R Examples. Boca Raton, FL: CRC Press; 2014
  22. Durbin J, Koopman SJ. A simple and efficient simulation smoother for state space time series analysis. Biometrika. 2002;89(3):603-615
  23. Wahba G. Improper priors, spline smoothing and the problem of guarding against model errors in regression. Journal of the Royal Statistical Society: Series B (Methodological). 1978;40(3):364-372
  24. Gu C. Smoothing Spline ANOVA Models: R package gss. Journal of Statistical Software. 2014;58(5):1-25
