
## Abstract

Newly developed data acquisition technologies make complex and massive datasets easily accessible. Although smoothing spline ANOVA models have proven useful in a variety of fields, such datasets pose challenges to their application. In this chapter, we present a selective review of smoothing spline ANOVA models and highlight some challenges and opportunities posed by massive datasets. We review two approaches that substantially reduce the computational cost of fitting the models. One real case study is used to illustrate the performance of the reviewed methods.

### Keywords

- smoothing spline
- smoothing spline ANOVA models
- reproducing kernel Hilbert space
- penalized likelihood
- basis sampling

## 1. Introduction

Among nonparametric models, smoothing splines have been widely used in many real applications. There is a rich body of literature on smoothing splines, including additive smoothing splines [1, 2, 3, 4, 5, 6], interaction smoothing splines [7, 8, 9, 10], and smoothing spline ANOVA (SSANOVA) models [11, 12, 13, 14].

In this chapter, we focus on studying the SSANOVA models. Suppose that the data \((x_i, y_i)\), \(i = 1, \dots, n\), are generated from the model

$$y_i = \eta(x_i) + \epsilon_i, \quad i = 1, \dots, n, \tag{1}$$

where \(y_i\) is the response, \(x_i \in \mathcal{X}\) is the predictor, \(\eta\) is a nonparametric function to be estimated, and the \(\epsilon_i\)'s are independent random errors.

In the model (1), \(\eta\) is estimated by minimizing the penalized likelihood functional

$$\frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\big(y_i, \eta(x_i)\big) + \lambda J(\eta), \tag{2}$$

where \(\mathcal{L}\) is a loss functional derived from the negative log likelihood, \(J(\eta)\) is a roughness penalty, and the smoothing parameter \(\lambda > 0\) balances the goodness of fit and the smoothness of \(\eta\).

The rest of the chapter is organized as follows. Section 2 provides a detailed introduction to SSANOVA models and their estimation. The adaptive basis selection algorithm and the rounding algorithm are reviewed in Section 3. In the Appendix, we demonstrate the numerical implementations using the R software.

## 2. Smoothing spline ANOVA models

In this section, we first review smoothing spline models and the reproducing kernel Hilbert space. Second, we present how to decompose a nonparametric function on tensor product domains, which lays the theoretical foundation for SSANOVA models. In the end, we show the estimation of SSANOVA models and illustrate the model with a real data example.

### 2.1. Introduction of smoothing spline models

In the model (1), the form of the penalized likelihood functional (2) depends on the distribution of the response, as the following two examples illustrate.

**Example 1.** *Cubic smoothing splines*

*Suppose that \(y_i\) follows a normal distribution, that is, \(y_i \sim N(\eta(x_i), \sigma^2)\). Then, the penalized likelihood functional (2) can be reduced to the penalized least squares*

$$\frac{1}{n} \sum_{i=1}^{n} \big(y_i - \eta(x_i)\big)^2 + \lambda \int_0^1 \big(\eta''(x)\big)^2 \, dx, \tag{3}$$

*where \(\eta''\) denotes the second derivative of \(\eta\). The minimizer of* (3) *is called a cubic smoothing spline* [16, 17, 18]*. In* (3)*, the first term quantifies the fidelity to the data, and the second term controls the roughness of the function.*

**Example 2.** *Exponential family smoothing splines*

*Suppose that \(y_i\) follows an exponential family distribution*

$$f(y_i \mid x_i) = \exp\left\{ \frac{y_i \eta(x_i) - b(\eta(x_i))}{a(\phi)} + c(y_i, \phi) \right\},$$

*where \(a > 0\), \(b\), and \(c\) are known functions and \(\phi\) is either known or a nuisance parameter. Then, \(\eta\) can be estimated by minimizing the following penalized likelihood functional* [19, 20]:

$$-\frac{1}{n} \sum_{i=1}^{n} \big[ y_i \eta(x_i) - b(\eta(x_i)) \big] + \lambda J(\eta). \tag{4}$$

*Note that the cubic smoothing spline in* Example 1 *is a special case of exponential family smoothing splines when \(y_i\) follows the Gaussian distribution.*

The smoothing parameter \(\lambda\) in (2) controls the trade-off between the goodness of fit and the smoothness of the estimate; its selection is discussed in Section 2.6.2.

### 2.2. Reproducing kernel Hilbert space

We assume that readers are familiar with Hilbert spaces: complete vector spaces equipped with an inner product, which allows lengths and angles to be measured [23]. In a general Hilbert space, the continuity of a functional, which is required when minimizing (2) over the function space, is not guaranteed. We therefore work with a special Hilbert space in which the needed functionals are continuous.

For every continuous linear functional on a Hilbert space, the Riesz representation theorem guarantees the existence of a unique representer.

**Theorem 2.1.** *Riesz representation*

*Let \(\mathcal{H}\) be a Hilbert space. For any continuous linear functional \(L\) of \(\mathcal{H}\), there uniquely exists an element \(g_L \in \mathcal{H}\) such that*

$$L f = \langle g_L, f \rangle, \quad \forall f \in \mathcal{H},$$

*where \(g_L\) is called the representer of \(L\). The uniqueness is in the sense that \(g_1\) and \(g_2\) are considered as the same representer for any \(g_1\) and \(g_2\) satisfying \(\|g_1 - g_2\| = 0\), where \(\|\cdot\| = \langle \cdot, \cdot \rangle^{1/2}\) defines the norm in \(\mathcal{H}\).*

For a better construction of the estimator minimizing (2), one needs the continuity of the evaluation functional \([x] f = f(x)\), which leads to the following definition.

**Definition.** *Reproducing kernel Hilbert space*

*Consider a Hilbert space \(\mathcal{H}\) consisting of functions on a domain \(\mathcal{X}\). For every element \(x \in \mathcal{X}\), define an evaluation functional \([x]\) such that \([x] f = f(x)\). If all the evaluation functionals \([x]\) are continuous, \(\forall x \in \mathcal{X}\), then \(\mathcal{H}\) is called a reproducing kernel Hilbert space.*

By Theorem 2.1, for every evaluation functional \([x]\), there uniquely exists a representer \(R_x \in \mathcal{H}\) such that

$$f(x) = [x] f = \langle R_x, f \rangle, \quad \forall f \in \mathcal{H}. \tag{5}$$

The bivariate function \(R(x, y) = R_x(y) = \langle R_x, R_y \rangle\) is called the reproducing kernel of \(\mathcal{H}\). It is symmetric and nonnegative definite, and (5) is known as the reproducing property.
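As a concrete illustration (ours, not part of the chapter, whose software examples use R), the following Python sketch evaluates the standard cubic-spline reproducing kernel on \([0, 1]\), built from scaled Bernoulli polynomials, and checks numerically that the resulting Gram matrix is symmetric and nonnegative definite. The helper names `k1`, `k2`, `k4`, and `cubic_kernel` are ours:

```python
import numpy as np

# Scaled Bernoulli polynomials used in the standard cubic-spline
# kernel construction on [0, 1].
def k1(x): return x - 0.5
def k2(x): return (k1(x) ** 2 - 1.0 / 12.0) / 2.0
def k4(x): return (k1(x) ** 4 - k1(x) ** 2 / 2.0 + 7.0 / 240.0) / 24.0

def cubic_kernel(x, y):
    """Reproducing kernel of the 'smooth' subspace for cubic splines."""
    return k2(x) * k2(y) - k4(np.abs(x - y))

x = np.linspace(0.0, 1.0, 50)
K = cubic_kernel(x[:, None], x[None, :])

# Symmetry and nonnegative definiteness: the two defining properties
# of a reproducing kernel, verified on this Gram matrix.
assert np.allclose(K, K.T)
assert np.linalg.eigvalsh(K).min() > -1e-10
```

The eigenvalue check mirrors the defining property of a reproducing kernel: every Gram matrix it generates must be nonnegative definite.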

We now introduce the concept of tensor sum decomposition. Suppose that a Hilbert space \(\mathcal{H}\) can be decomposed into a tensor sum of two subspaces, \(\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1\). The following theorem relates such a decomposition to the reproducing kernels.

**Theorem 2.2.** *Suppose that \(R_0\) and \(R_1\) are the reproducing kernels of the Hilbert spaces \(\mathcal{H}_0\) and \(\mathcal{H}_1\), respectively. If \(\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1\), then \(\mathcal{H}\) has a reproducing kernel \(R = R_0 + R_1\).*

*Conversely, if the reproducing kernel \(R\) of \(\mathcal{H}\) can be decomposed into \(R = R_0 + R_1\), where both \(R_0\) and \(R_1\) are positive definite, and they are orthogonal to each other, that is, \(\langle R_0(x, \cdot), R_1(y, \cdot) \rangle = 0\) for \(\forall x, y \in \mathcal{X}\), then the spaces \(\mathcal{H}_0\) and \(\mathcal{H}_1\) corresponding to the kernels \(R_0\) and \(R_1\) form a tensor sum decomposition \(\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1\).*

### 2.3. Representer theorem

In (2), the smoothness penalty \(J(\eta)\) induces a decomposition of the function space \(\mathcal{H}\) into the tensor sum \(\mathcal{H} = \mathcal{N}_J \oplus \mathcal{H}_J\), where \(\mathcal{N}_J = \{\eta : J(\eta) = 0\}\) is the null space of \(J\), with finite dimension \(m\), and \(\mathcal{H}_J\) is its orthogonal complement. The following representer theorem characterizes the minimizer of (2).

**Theorem 2.3.** *There exist coefficient vectors \(\mathbf{d} = (d_1, \dots, d_m)^\top\) and \(\mathbf{c} = (c_1, \dots, c_n)^\top\) such that the minimizer of (2) takes the form*

$$\eta(x) = \sum_{\nu=1}^{m} d_\nu \phi_\nu(x) + \sum_{i=1}^{n} c_i R_J(x_i, x),$$

*where \(\{\phi_\nu\}_{\nu=1}^{m}\) is the basis of the null space \(\mathcal{N}_J\) and \(R_J\) is the reproducing kernel of \(\mathcal{H}_J\).*

This theorem indicates that although the minimization problem is in an infinite-dimensional space, the minimizer of (2) lies in a data-adaptive finite-dimensional space.

### 2.4. Function decomposition

The decomposition of a multivariate function is similar to the classical ANOVA. In this section, we present the functional ANOVA which lays the foundation for SSANOVA models.

#### 2.4.1. One-way ANOVA decomposition

We consider a classical one-way ANOVA model \(y_{ij} = \mu_i + \epsilon_{ij}\), where \(\mu_i\) is the treatment mean of level \(i\). The treatment means can be decomposed as \(\mu_i = \mu + \alpha_i\), where \(\mu\) is the overall mean and \(\alpha_i\) is the treatment effect satisfying the side condition \(\sum_i \alpha_i = 0\).

Similar to the classical ANOVA decomposition, a univariate function \(f(x)\) can be decomposed as

$$f = Af + (I - A)f = f_\emptyset + f_x,$$

where \(A\) is an averaging operator that averages out \(x\), \(I\) is the identity operator, \(f_\emptyset = Af\) is the constant term, and \(f_x = (I - A)f\) is the main effect of \(x\), satisfying the side condition \(A f_x = 0\).

#### 2.4.2. Multiway ANOVA decomposition

On a \(d\)-dimensional product domain \(\mathcal{X} = \prod_{j=1}^{d} \mathcal{X}_j\), a multivariate function \(f(x_{\langle 1 \rangle}, \dots, x_{\langle d \rangle})\) can be decomposed as

$$f = \left[\prod_{j=1}^{d} \big(A_j + (I - A_j)\big)\right] f = \sum_{S} f_S,$$

where \(A_j\) is the averaging operator on \(\mathcal{X}_j\), the summation runs over all subsets \(S \subseteq \{1, \dots, d\}\), and \(f_S\) denotes the constant term (\(S = \emptyset\)), the main effects (\(|S| = 1\)), and the interactions (\(|S| \geq 2\)).
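As a numerical illustration (ours, not from the chapter), the multiway decomposition can be mimicked for a function tabulated on a grid, with the averaging operators replaced by row and column means. This Python sketch verifies that the components reconstruct the function and satisfy the ANOVA side conditions:

```python
import numpy as np

# Discrete analogue of the functional ANOVA decomposition: decompose a
# bivariate function tabulated on a grid using averaging operators.
n1, n2 = 30, 40
x1 = np.linspace(0, 1, n1)[:, None]
x2 = np.linspace(0, 1, n2)[None, :]
f = np.exp(x1) * np.sin(2 * np.pi * x2) + x1 * x2  # any test function

mu = f.mean()                              # constant term: A1 A2 f
f1 = f.mean(axis=1, keepdims=True) - mu    # main effect of x1: (I - A1) A2 f
f2 = f.mean(axis=0, keepdims=True) - mu    # main effect of x2: A1 (I - A2) f
f12 = f - mu - f1 - f2                     # interaction: (I - A1)(I - A2) f

# The components reconstruct f and satisfy the ANOVA side conditions:
# main effects average to zero; the interaction averages to zero
# over each coordinate.
assert np.allclose(mu + f1 + f2 + f12, f)
assert abs(f1.mean()) < 1e-12 and abs(f2.mean()) < 1e-12
assert np.allclose(f12.mean(axis=0), 0) and np.allclose(f12.mean(axis=1), 0)
```

The side conditions play the same role as \(\sum_i \alpha_i = 0\) in the classical ANOVA: they make the decomposition unique.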

### 2.5. Some examples of model construction

**Smoothing splines on** \([0, 1]\): Consider the space \(\mathcal{H} = \{f : f, f' \text{ absolutely continuous}, \int_0^1 (f'')^2 dx < \infty\}\) with the penalty \(J(f) = \int_0^1 (f'')^2 dx\).

Here, we use an inner product

$$\langle f, g \rangle = \left(\int_0^1 f \, dx\right)\left(\int_0^1 g \, dx\right) + \left(\int_0^1 f' \, dx\right)\left(\int_0^1 g' \, dx\right) + \int_0^1 f'' g'' \, dx. \tag{9}$$

One can easily check that (9) is a well-defined inner product in \(\mathcal{H}\), and that \(\mathcal{H}\) can be decomposed into the tensor sum

$$\mathcal{H} = \mathcal{H}_{00} \oplus \mathcal{H}_{01} \oplus \mathcal{H}_{1},$$

where \(\mathcal{H}_{00} = \{f : f \propto 1\}\) and \(\mathcal{H}_{01} = \{f : f \propto k_1\}\) together form the null space of \(J\), \(\mathcal{H}_1\) collects the smooth components, and \(k_1(x) = x - 0.5\).

For the spaces \(\mathcal{H}_{00}\), \(\mathcal{H}_{01}\), and \(\mathcal{H}_1\), the corresponding reproducing kernels are \(R_{00}(x, y) = 1\), \(R_{01}(x, y) = k_1(x) k_1(y)\), and \(R_1(x, y) = k_2(x) k_2(y) - k_4(|x - y|)\), where \(k_2\) and \(k_4\) are scaled Bernoulli polynomials (see details in [11], Section 2.3).

**SSANOVA models on product domains**: A natural way to construct a reproducing kernel Hilbert space on a product domain \(\mathcal{X}_1 \times \mathcal{X}_2\) is to take the tensor product of reproducing kernel Hilbert spaces on the marginal domains, which is justified by the following theorem.

**Theorem 2.4.** *If \(R_1\) is nonnegative definite on \(\mathcal{X}_1\) and \(R_2\) is nonnegative definite on \(\mathcal{X}_2\), then \(R = R_1 R_2\) is nonnegative definite on \(\mathcal{X}_1 \times \mathcal{X}_2\).*

Theorem 2.4 implies that a reproducing kernel on the product domain can be constructed from the marginal kernels, namely \(R\big((x_{\langle 1 \rangle}, x_{\langle 2 \rangle}), (y_{\langle 1 \rangle}, y_{\langle 2 \rangle})\big) = R_1(x_{\langle 1 \rangle}, y_{\langle 1 \rangle}) R_2(x_{\langle 2 \rangle}, y_{\langle 2 \rangle})\).

One can decompose each marginal space as \(\mathcal{H}_{\langle j \rangle} = \mathcal{H}_{\langle j \rangle 0} \oplus \mathcal{H}_{\langle j \rangle 1}\), where \(\mathcal{H}_{\langle j \rangle 0}\) is the finite-dimensional null space and \(\mathcal{H}_{\langle j \rangle 1}\) is its orthogonal complement. The tensor product space then decomposes as

$$\mathcal{H} = \bigotimes_{j} \left( \mathcal{H}_{\langle j \rangle 0} \oplus \mathcal{H}_{\langle j \rangle 1} \right) = \bigoplus_{\beta} \mathcal{H}^{\beta},$$

where each subspace \(\mathcal{H}^{\beta}\) corresponds to a term (constant, main effect, or interaction) in the functional ANOVA decomposition.

In the following, we will give some examples of tensor product smoothing splines on product domains.

#### 2.5.1. Smoothing splines on \(\{1, \dots, K\} \times [0, 1]\)

We construct the reproducing kernel Hilbert space on \(\{1, \dots, K\} \times [0, 1]\) by taking the tensor product of a discrete space on \(\{1, \dots, K\}\) and the cubic spline space on \([0, 1]\).

In this case, the space decomposes into the tensor sum of subspaces corresponding to the constant term, the main effects of the two variables, and their interactions.

The reproducing kernels of the tensor product cubic spline on \(\{1, \dots, K\} \times [0, 1]\) are listed in Table 1.

| Subspace | Reproducing kernel |
|---|---|
| Constant | \(1/K\) |
| Main effect of \(x_{\langle 1 \rangle}\) | \(\delta_{x_{\langle 1 \rangle}, y_{\langle 1 \rangle}} - 1/K\) |
| Parametric main effect of \(x_{\langle 2 \rangle}\) | \(k_1(x_{\langle 2 \rangle}) k_1(y_{\langle 2 \rangle})\) |
| Smooth main effect of \(x_{\langle 2 \rangle}\) | \(k_2(x_{\langle 2 \rangle}) k_2(y_{\langle 2 \rangle}) - k_4(|x_{\langle 2 \rangle} - y_{\langle 2 \rangle}|)\) |
| Interactions | products of the marginal kernels above |

**Table 1.** *Reproducing kernels of (12) on \(\{1, \dots, K\} \times [0, 1]\).*

On other product domains, for example, \([0, 1]^2\), tensor product reproducing kernel Hilbert spaces and their kernels can be constructed in the same manner.

#### 2.5.1.1. General form

In general, a tensor product reproducing kernel Hilbert space can be specified as

$$\mathcal{H} = \bigoplus_{\beta} \mathcal{H}^{\beta}, \tag{12}$$

where each subspace \(\mathcal{H}^{\beta}\) has reproducing kernel \(R^{\beta}\), so that the reproducing kernel of \(\mathcal{H}\) is \(R = \sum_{\beta} R^{\beta}\).

### 2.6. Estimation

In this section, we show the procedure for estimating the minimizer of the penalized likelihood functional (2).

#### 2.6.1. Penalized least squares

We consider the same model shown in (1); then the penalized likelihood functional (2) reduces to the penalized least squares

$$\frac{1}{n} \sum_{i=1}^{n} \big(y_i - \eta(x_i)\big)^2 + \lambda J(\eta). \tag{15}$$

Let \(\mathbf{y} = (y_1, \dots, y_n)^\top\). By Theorem 2.3, the minimizer of (15) has the form \(\eta(x) = \sum_{\nu=1}^{m} d_\nu \phi_\nu(x) + \sum_{i=1}^{n} c_i R_J(x_i, x)\), so the vector of evaluations \(\big(\eta(x_1), \dots, \eta(x_n)\big)^\top\) equals \(\mathbf{S d} + \mathbf{R c}\), where \(\mathbf{S}\) is the \(n \times m\) matrix with the \((i, \nu)\)th entry \(\phi_\nu(x_i)\), \(\mathbf{R}\) is the \(n \times n\) matrix with the \((i, j)\)th entry \(R_J(x_i, x_j)\), \(\mathbf{d} = (d_1, \dots, d_m)^\top\), and \(\mathbf{c} = (c_1, \dots, c_n)^\top\).

By the reproducing property (5), the roughness penalty term can be expressed as \(J(\eta) = \mathbf{c}^\top \mathbf{R} \mathbf{c}\). Therefore, the penalized least squares criterion (15) becomes

$$\frac{1}{n} \|\mathbf{y} - \mathbf{S d} - \mathbf{R c}\|^2 + \lambda \mathbf{c}^\top \mathbf{R c}. \tag{16}$$

The penalized least squares (16) is a quadratic form in both \(\mathbf{d}\) and \(\mathbf{c}\). Differentiating (16) with respect to \(\mathbf{d}\) and \(\mathbf{c}\) and setting the derivatives to zero yields the linear system

$$\begin{pmatrix} \mathbf{S}^\top \mathbf{S} & \mathbf{S}^\top \mathbf{R} \\ \mathbf{R}^\top \mathbf{S} & \mathbf{R}^\top \mathbf{R} + n \lambda \mathbf{R} \end{pmatrix} \begin{pmatrix} \mathbf{d} \\ \mathbf{c} \end{pmatrix} = \begin{pmatrix} \mathbf{S}^\top \mathbf{y} \\ \mathbf{R}^\top \mathbf{y} \end{pmatrix}. \tag{17}$$

Note that (17) only works for the penalized least squares (15), and hence a normal assumption is needed in this case.
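To make the estimation procedure concrete, here is a minimal numerical sketch in Python (ours; the chapter's own examples use R's gss package). It assumes the standard cubic-spline kernel on \([0, 1]\) with null-space basis \(\{1, x - 0.5\}\), forms \(\mathbf{S}\) and \(\mathbf{R}\), and solves the linear system in \((\mathbf{d}, \mathbf{c})\) with a least squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard cubic-spline reproducing kernel on [0, 1]
# (scaled Bernoulli polynomial construction).
def k1(x): return x - 0.5
def k2(x): return (k1(x) ** 2 - 1 / 12) / 2
def k4(x): return (k1(x) ** 4 - k1(x) ** 2 / 2 + 7 / 240) / 24
def RJ(x, y): return k2(x) * k2(y) - k4(np.abs(x - y))

n, lam = 100, 1e-6
x = np.sort(rng.uniform(size=n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

S = np.column_stack([np.ones(n), k1(x)])   # null-space basis {1, x - 1/2}
R = RJ(x[:, None], x[None, :])             # n x n kernel matrix

# Normal equations of the penalized least squares in (d, c).
A = np.block([[S.T @ S, S.T @ R],
              [R.T @ S, R.T @ R + n * lam * R]])
b = np.concatenate([S.T @ y, R.T @ y])
sol = np.linalg.lstsq(A, b, rcond=None)[0]
d, c = sol[:2], sol[2:]

yhat = S @ d + R @ c                       # fitted values S d + R c
print("training RMSE:", np.sqrt(np.mean((yhat - y) ** 2)))
```

A least squares solver is used because the block matrix can be numerically singular; production software instead exploits the structure of the system.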

#### 2.6.2. Selection of smoothing parameters

In SSANOVA models, properly selecting the smoothing parameters is important for estimating \(\eta\): too large a \(\lambda\) oversmooths the data, while too small a \(\lambda\) produces a rough, overfitted estimate.

For multivariate predictors, the penalty term in (15) has the form

$$\lambda J(\eta) = \lambda \sum_{\beta} \theta_\beta^{-1} J_\beta(\eta),$$

where \(J_\beta\) is the roughness penalty on the subspace \(\mathcal{H}^{\beta}\) and the \(\theta_\beta\)'s are additional smoothing parameters; \(\lambda\) and the \(\theta_\beta\)'s together control the trade-off between the goodness of fit and the smoothness of each component. A common criterion for selecting them is generalized cross-validation (GCV).

A GCV score is defined as

$$V(\lambda) = \frac{n^{-1} \mathbf{y}^\top \big(\mathbf{I} - \mathbf{A}(\lambda)\big)^2 \mathbf{y}}{\left[ n^{-1} \operatorname{tr}\big(\mathbf{I} - \mathbf{A}(\lambda)\big) \right]^2},$$

where \(\mathbf{A}(\lambda)\) is the smoothing matrix satisfying \(\hat{\mathbf{y}} = \mathbf{A}(\lambda) \mathbf{y}\). The smoothing parameters are chosen to minimize the GCV score.
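As an illustration of GCV in practice (ours, not the chapter's code), the following Python sketch evaluates the score over a grid of \(\lambda\) values for a simplified kernel smoother with hat matrix \(\mathbf{A}(\lambda) = \mathbf{R}(\mathbf{R} + n\lambda \mathbf{I})^{-1}\), ignoring the null-space terms for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Standard cubic-spline kernel on [0, 1] (same construction as before).
def k1(x): return x - 0.5
def k2(x): return (k1(x) ** 2 - 1 / 12) / 2
def k4(x): return (k1(x) ** 4 - k1(x) ** 2 / 2 + 7 / 240) / 24
def RJ(x, y): return k2(x) * k2(y) - k4(np.abs(x - y))

n = 200
x = np.sort(rng.uniform(size=n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
R = RJ(x[:, None], x[None, :])
I = np.eye(n)

def gcv(lam):
    # Hat matrix of the simplified kernel smoother: A = R (R + n*lam*I)^{-1}.
    A = R @ np.linalg.solve(R + n * lam * I, I)
    r = (I - A) @ y
    return (r @ r / n) / (np.trace(I - A) / n) ** 2

lams = 10.0 ** np.arange(-9, 0)      # grid of candidate lambdas
scores = [gcv(l) for l in lams]
best = lams[int(np.argmin(scores))]  # GCV-selected smoothing parameter
print("GCV-selected lambda:", best)
```

In real SSANOVA software, the multiple parameters \((\lambda, \theta_\beta)\) are optimized jointly rather than by a one-dimensional grid search.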

### 2.7. Case study: Twitter data

Tweets in the contiguous United States were collected over five weekdays in January 2014. The dataset contains information of time, GPS location, and tweet counts (see Figure 2). To illustrate the application of SSANOVA models, we study the time and spatial patterns in this data.

The bivariate function \(\eta(t, s)\), where \(t\) denotes the time of day and \(s\) the GPS location, is fitted with an SSANOVA model and decomposes as

$$\eta = \eta_\emptyset + \eta_t + \eta_s + \eta_{t, s},$$

where \(\eta_\emptyset\) is the constant term, \(\eta_t\) and \(\eta_s\) are the main effects of time and location, and \(\eta_{t, s}\) is the time-location interaction.

The main effects of time and location are shown in Figure 3. In panel (a), the number of tweets exhibits a clear periodic effect, attaining its maximum around 8:00 p.m. and its minimum around 5:00 a.m. The main effect of time thus captures the daily variation in Twitter usage in the United States. In addition, panel (b) of Figure 3 shows how tweet counts vary across locations: there tend to be more tweets in the east than in the west, and more in coastal zones than inland. We use the scaled dot product of the location vectors in constructing the spatial kernel.

## 3. Efficient approximation algorithm in massive datasets

In this section, we consider SSANOVA models in the big data setting. The computational cost of solving (17) is of the order \(O(n^3)\), which is prohibitive for massive datasets. We review two approaches that reduce this cost: adaptive basis selection [14] and rounding [15].

### 3.1. Adaptive basis selection

A natural way to select the basis functions is through uniform sampling. Suppose that we randomly select a subset \(\{x_j^*\}_{j=1}^{q}\) of size \(q\) from the design points \(\{x_i\}_{i=1}^{n}\) and use \(\{R_J(x_j^*, \cdot)\}_{j=1}^{q}\) as the basis functions in the representation of Theorem 2.3.

The computational cost will then be reduced significantly to \(O(n q^2)\).

Although the uniform basis selection reduces the computational cost while retaining good asymptotic properties, it ignores the information in the responses and may therefore perform poorly when important features of the underlying function lie in regions where the responses are unevenly distributed. Adaptive basis selection [14] addresses this by sampling the basis functions according to the distribution of the responses. The algorithm proceeds as follows.

**Step 1** Divide the range of the responses \(\{y_i\}_{i=1}^{n}\) into \(K\) disjoint intervals \(S_1, \dots, S_K\).

**Step 2** For each interval \(S_k\), randomly select a subset of the design points whose responses fall in \(S_k\), that is, a subset of \(\{x_i : y_i \in S_k\}\); denote it by \(X_k^*\).

**Step 3** Combine the subsets: \(X^* = \bigcup_{k=1}^{K} X_k^*\), and write \(X^* = \{x_1^*, \dots, x_q^*\}\).

**Step 4** Define \(\{R_J(x_j^*, \cdot)\}_{j=1}^{q}\) as the set of basis functions.

By adaptive basis selection, the minimizer of (2) keeps the same form as that in Theorem 2.3:

$$\eta_A(x) = \sum_{\nu=1}^{m} d_\nu \phi_\nu(x) + \sum_{j=1}^{q} c_j R_J(x_j^*, x),$$

where \(\{x_j^*\}_{j=1}^{q}\) are the selected basis points.

Let \(\mathbf{R}_*\) denote the \(n \times q\) matrix with the \((i, j)\)th entry \(R_J(x_i, x_j^*)\), and \(\mathbf{R}_{**}\) the \(q \times q\) matrix with the \((i, j)\)th entry \(R_J(x_i^*, x_j^*)\). The penalized least squares criterion then becomes

$$\frac{1}{n} \|\mathbf{y} - \mathbf{S d} - \mathbf{R}_* \mathbf{c}\|^2 + \lambda \mathbf{c}^\top \mathbf{R}_{**} \mathbf{c}, \tag{18}$$

where \(\mathbf{c} = (c_1, \dots, c_q)^\top\) is now a coefficient vector of dimension \(q\).

The computational complexity of solving (18) is of the order \(O(n q^2)\), a substantial reduction from \(O(n^3)\) when \(q \ll n\). Moreover, [14] showed that the resulting estimator achieves the same asymptotic convergence rate as the full-basis estimator.
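The four steps above can be sketched in Python as follows (ours, with the standard cubic-spline kernel; the stratum count and per-stratum sample sizes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Standard cubic-spline kernel on [0, 1].
def k1(x): return x - 0.5
def k2(x): return (k1(x) ** 2 - 1 / 12) / 2
def k4(x): return (k1(x) ** 4 - k1(x) ** 2 / 2 + 7 / 240) / 24
def RJ(x, y): return k2(x) * k2(y) - k4(np.abs(x - y))

n, K, per_stratum, lam = 2000, 5, 8, 1e-6
x = np.sort(rng.uniform(size=n))
y = np.sin(2 * np.pi * x ** 2) + rng.normal(scale=0.2, size=n)

# Steps 1-3: slice the response range, sample design points per slice.
edges = np.linspace(y.min(), y.max(), K + 1)
strata = np.digitize(y, edges[1:-1])       # stratum index 0..K-1 per point
idx = []
for k in range(K):
    in_k = np.flatnonzero(strata == k)
    if in_k.size:
        take = min(per_stratum, in_k.size)
        idx.extend(rng.choice(in_k, size=take, replace=False))
xs = x[np.array(idx)]                      # Step 4: basis points x*

# Fit with the reduced basis {1, k1(.), RJ(x*_j, .)}: an O(n q^2) problem.
S = np.column_stack([np.ones(n), k1(x)])
Rs = RJ(x[:, None], xs[None, :])           # n x q
Rss = RJ(xs[:, None], xs[None, :])         # q x q
A = np.block([[S.T @ S, S.T @ Rs],
              [Rs.T @ S, Rs.T @ Rs + n * lam * Rss]])
b = np.concatenate([S.T @ y, Rs.T @ y])
sol = np.linalg.lstsq(A, b, rcond=None)[0]
yhat = S @ sol[:2] + Rs @ sol[2:]
print("q =", xs.size, " training RMSE:", np.sqrt(np.mean((yhat - y) ** 2)))
```

Only the \(q \times q\) and \(n \times q\) kernel matrices are ever formed, which is where the \(O(n q^2)\) cost comes from.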

### 3.2. Rounding algorithm

Instead of sampling a smaller set of basis functions to save computational resources, as in the adaptive basis selection method presented above, [15] proposed a rounding algorithm to fit SSANOVA models in the context of big data.

**Rounding algorithm**: The details of rounding algorithm can be shown in the following procedure:

**Step 1** Assume that all predictors are continuous.

**Step 2** Convert all predictors to the interval \([0, 1]\).

**Step 3** Round the raw data using the transformation

$$z_i = \operatorname{rd}(x_i / r) \cdot r, \quad i = 1, \dots, n,$$

where \(\operatorname{rd}(\cdot)\) rounds to the nearest integer and the rounding parameter \(r \in (0, 1]\) controls the rounding precision.

**Step 4** After replacing each \(x_i\) with its rounded value \(z_i\), fit the SSANOVA model (2) using the rounded data.

In Step 3, if \(r = 0.01\), then each \(z_i\) is simply \(x_i\) rounded to the second decimal place.

Evidently, the value of the rounding parameter influences the precision of the approximation: the smaller the rounding parameter, the better the model estimation, but the higher the computational cost.

**Computational benefits**: We now briefly explain why the rounding algorithm reduces the computational load. For example, if the rounding parameter \(r = 0.01\), the rounded data \(\{z_i\}_{i=1}^{n}\) contain at most 101 unique values, so the kernel matrix has at most \(101 \times 101\) unique entries no matter how large \(n\) is. Exploiting these replicates greatly reduces both the memory footprint and the cost of solving the estimation equations.
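A short Python sketch (ours) illustrates Step 3 and the source of the savings: after rounding with parameter \(r\), the predictor takes at most \(1/r + 1\) distinct values, and the rounding error is at most \(r/2\), regardless of the sample size:

```python
import numpy as np

rng = np.random.default_rng(3)

# Step 3 of the rounding algorithm: z = rd(x / r) * r on [0, 1].
def round_predictor(x, r):
    return np.round(x / r) * r

n = 1_000_000
x = rng.uniform(size=n)

for r in (0.01, 0.05):
    z = round_predictor(x, r)
    # At most 1/r + 1 unique values remain, whatever the sample size,
    # and each value moves by at most r / 2.
    print(f"r = {r}: unique values = {np.unique(z).size}, "
          f"max rounding error = {np.abs(z - x).max():.4f}")
```

With \(r = 0.01\), a kernel matrix over the rounded data therefore has at most \(101 \times 101\) distinct entries, which is what makes the subsequent fit cheap.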

**Case study**: To illustrate the benefit of the rounding algorithm, we apply the algorithm to the electroencephalography (EEG) dataset. Note that EEG is a monitoring method to record the electrical activity of the brain. It can be used to diagnose sleep disorders, epilepsy, encephalopathies, and brain death.

The dataset [33] contains 44 controls and 76 alcoholics. Each subject was repeatedly measured 10 times using a visual stimulus, with signals recorded at a frequency of 256 Hz. This produces a massive number of observations in total.

After applying the model to the unrounded data and to rounded data with two choices of the rounding parameter, we summarize the results in Table 2.

| | GCV | AIC | BIC | CPU time (seconds) |
|---|---|---|---|---|
| Unrounded data | 85.9574 | 2,240,019 | 2,240,562 | 15.65 |
| Rounded data with | 86.6667 | 2,242,544 | 2,242,833 | 1.22 |
| Rounded data with | 86.7654 | 2,242,893 | 2,243,089 | 1.13 |

Based on Table 2, there is no significant difference among the GCV scores or the AIC/BIC values. In addition, the rounding algorithm reduces the CPU time from 15.65 seconds to about 1.2 seconds, a reduction of more than 90%.

## 4. Conclusion

Smoothing spline ANOVA (SSANOVA) models are widely used in applications [11, 20, 36, 37]. In this chapter, we introduced the general framework of SSANOVA models in Section 2. In Section 3, we discussed the models in the big data setting. When the volume of data grows, fitting the models becomes computationally intensive [11]. The adaptive basis selection algorithm [14] and the rounding algorithm [15] we presented can significantly reduce the computational cost.

## Acknowledgments

This work is partially supported by the NIH grants R01 GM122080 and R01 GM113242; NSF grants DMS-1222718, DMS-1438957, and DMS-1228288; and NSFC grant 71331005.

## Conflict of interest

The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interests; and expert testimony or patent-licensing arrangements) or nonfinancial interest (such as personal or professional relationships, affiliations, knowledge, or beliefs) in the subject matter or materials discussed in this manuscript.

## Appendix

In this appendix, we use two examples to illustrate how to implement smoothing spline ANOVA (SSANOVA) models in R. The gss package in R, which can be downloaded from CRAN, is used throughout.

We now load the gss package with `library(gss)`.

**Example I**: Apply the smoothing spline to a simulated dataset.

Suppose that the predictor \(x\) and the response \(y\) are generated from a simulated nonparametric regression model.

Then, fit the cubic smoothing spline model:

```r
fit <- ssanova(y ~ x)
```

To evaluate the predicted values, one uses the `predict` function. The predicted values, together with their standard errors, can then be used to plot the fitted curve.

**Example II:** Apply the SSANOVA model to a real dataset.

In this example, we illustrate how to implement the SSANOVA model using the gss package. The data come from an experiment in which a single-cylinder engine was run with ethanol to see how the nox concentration in the exhaust depends on the compression ratio and the equivalence ratio of the engine.

```r
fit2 <- ssanova(log10(nox) ~ comp * equi, data = nox)
```

The predicted values are shown in Figure 5.