## 1. Introduction

Multiple time series are of considerable interest in an array of domains, such as finance, economics, and engineering. The data are collected in time order and consist of several related variables of interest, for instance, stock price indexes or the status data of important instruments such as space shuttles. It is of much practical significance to model this kind of data well. Moreover, many commonly encountered multiple time series are correlated, which makes it reasonable to regard them as a single vector and to fit them using multivariate models. Multivariate models perform well in exploring the interdependencies among multiple time series and capturing their dynamic structure.

Plenty of contributions have been made in the field of parametric models for multivariate time series. For instance, Sims proposed vector autoregressive (VAR) models in 1980 [1], Engle and Kroner considered multivariate generalized autoregressive conditional heteroscedastic (GARCH) models in 1995 [2], and Tsay developed multivariate threshold models in 1998 [3]. Compared to parametric models, nonparametric models require fewer assumptions about the model structure and are more flexible. Given that nonlinearity is widespread in time series, it is appealing to model multiple time series nonparametrically. However, few achievements have been made in this direction, partly due to the complexity of nonparametric smoothing as well as the curse of dimensionality. Motivated by these challenges, Jiang proposed the multivariate functional-coefficient model in 2014 [4], which provides a useful tool for modeling vector time series data.

In this chapter, we first review some vector time series models, next extend them to include an error-correction term by incorporating cointegration among integrated variables, then develop a single index model for choosing the smoothing variable and a variable selection method for the multivariate functional-coefficient models, and finally study multivariate time-varying coefficient models and related hypothesis testing problems.

The remainder of this chapter is organized as follows. In Section 2 we review vector autoregressive (VAR) models. In Section 3, we consider multivariate functional-coefficient regression models and their extensions, where a model selection rule is also proposed. In Section 4 we introduce multivariate time-varying coefficient models and propose a generalized likelihood ratio test. In Section 5 we conclude and discuss some interesting open research topics.

## 2. Review of VAR models

The vector autoregressive model is a generalization of the univariate autoregressive model for forecasting a vector of time series. This model was pioneered by Sims in Ref. [1] and has acquired a prominent role in analyzing macroeconomic time series. Prior to 1980, large-scale dynamic simultaneous equations models (DSEMs), often containing dozens or even hundreds of equations, were widely used in empirical macroeconomics. As the economic environment grew more complicated, these traditional simultaneous models grew ever larger and harder to manage. Sims believed that the dichotomy of variables into “endogenous” and “exogenous” and the exclusion restrictions used to identify the simultaneous equations models make little sense. Thus, he advocated the vector autoregressive (VAR) model to capture the interrelationships among a set of macroeconomic variables. In the structure of VAR models, each variable is a linear function of past lags of itself and past lags of the other variables. Sims demonstrated that VARs provide a flexible and tractable framework for analyzing economic time series. While hardly relying on economic theory, VAR models have proven efficient in capturing the dynamics of multivariate systems as well as forecasting [1]. Specifically, a vector autoregressive model of order p [VAR(p)] has the following general form:

$$y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + e_t, \tag{1}$$

where *y*_{t} = (*y*_{1t}, … , *y*_{Kt})^{′} is a set of *K* time series variables, *c* is a *K* × 1 vector of constants, *A*_{i} are *K* × *K* coefficient matrices, and *e*_{t} are error terms. Usually, *e*_{t} are assumed to be zero-mean independent white noise with time-invariant and positive-definite covariance matrix Σ. For example, a VAR(1) model with two time series components can be written as:

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix},$$

or the equation set

$$y_{1t} = c_1 + a_{11} y_{1,t-1} + a_{12} y_{2,t-1} + e_{1t}, \qquad y_{2t} = c_2 + a_{21} y_{1,t-1} + a_{22} y_{2,t-1} + e_{2t}.$$

Using the lag operator L, Eq. (1) can be written in the following form:

$$A(L)\, y_t = c + e_t, \quad \text{with } A(L) = I - A_1 L - A_2 L^2 - \cdots - A_p L^p.$$

Let *A*(*z*) = *I* − *A*_{1}*z* − *A*_{2}*z*^{2} − … − *A*_{p}*z*^{p}, where *z* is a complex number. Then the VAR process is stable if

$$\det A(z) \neq 0 \quad \text{for all } |z| \leq 1.$$

In other words, the determinant of the matrix polynomial has no roots in or on the complex unit circle. If the stability condition is satisfied and the process can be extended to the infinite past, then the VAR process is stationary.
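In practice, the stability condition is usually checked numerically through the companion form: a VAR(p) is stable if and only if all eigenvalues of its *Kp* × *Kp* companion matrix lie strictly inside the unit circle, which is equivalent to det *A*(*z*) ≠ 0 for |*z*| ≤ 1. A minimal sketch (the helper name `is_stable` is ours):

```python
import numpy as np

def is_stable(A_list):
    """Check VAR(p) stability via the companion matrix.

    A_list: the p coefficient matrices A_1, ..., A_p, each K x K.
    The VAR is stable iff every eigenvalue of the Kp x Kp companion
    matrix has modulus < 1, equivalent to det A(z) != 0 for |z| <= 1.
    """
    p = len(A_list)
    K = A_list[0].shape[0]
    companion = np.zeros((K * p, K * p))
    companion[:K, :] = np.hstack(A_list)       # top block row [A_1 ... A_p]
    companion[K:, :-K] = np.eye(K * (p - 1))   # shifted identities below
    return bool(np.max(np.abs(np.linalg.eigvals(companion))) < 1.0)

A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])                    # eigenvalues ~0.57, 0.23: stable
B1 = np.array([[1.1, 0.0],
               [0.0, 0.9]])                    # eigenvalue 1.1: unstable
stable, unstable = is_stable([A1]), is_stable([B1])
```

For p > 1 the same check applies to the stacked coefficient matrices, e.g. `is_stable([A1, 0.3 * np.eye(2)])`.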

For model (1), since the right-hand side consists of only predetermined variables and the error terms are assumed to be independent white noise with time-invariant covariance, each equation can be estimated by ordinary least squares (OLS). Zellner proved that the OLS estimator coincides with the generalized LS (GLS) estimator [5].
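To make the equation-by-equation OLS concrete, here is a small simulation sketch for a bivariate VAR(1) (variable and function names are ours; because each equation shares the same predetermined regressors, the K regressions stack into a single least squares problem):

```python
import numpy as np

def fit_var1(y):
    """Estimate y_t = c + A @ y_{t-1} + e_t by OLS, equation by equation.

    Regressing each of the K components on the common predetermined
    regressors [1, y_{t-1}] stacks into one least squares problem.
    Returns c with shape (K,) and A with shape (K, K).
    """
    T, K = y.shape
    X = np.hstack([np.ones((T - 1, 1)), y[:-1]])   # regressors for t = 1..T-1
    B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)  # (K+1, K): rows [c; A^T]
    return B[0], B[1:].T

# Simulate a stable bivariate VAR(1) and recover its coefficients.
rng = np.random.default_rng(0)
c_true = np.array([0.1, -0.2])
A_true = np.array([[0.5, 0.1],
                   [0.2, 0.3]])
y = np.zeros((5000, 2))
for t in range(1, 5000):
    y[t] = c_true + A_true @ y[t - 1] + 0.5 * rng.standard_normal(2)
c_hat, A_hat = fit_var1(y)
```

With 5000 observations the OLS estimates typically agree with the true values to about two decimal places.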

The celebrated model (1) is easy to fit, and its autoregressive structure allows one to study feedback effects and Granger causality. However, model (1) employs only the lagged values of y_{t} for forecasting and ignores the effects of other potentially important variables. In addition, its coefficients remain constant over time, which may conflict with real situations where the dynamic structure of the relationship among different time series evolves with time.

## 3. Multivariate functional-coefficient regression models and extensions

We briefly reviewed VAR models in the previous section. This parametric method has been significantly developed and widely applied to econometric dynamics as well as other domains. An alternative to modeling vector time series is the nonparametric method, which requires much fewer assumptions on the model structure and may shed light on the later parametric fitting. To illustrate the basic idea of this approach, let us begin with the multivariate threshold autoregressive model [3].

### 3.1. Multivariate threshold autoregressive model

The multivariate threshold autoregressive model is a generalization of the univariate threshold autoregressive model [6]. The idea is to partition the range of a one-dimensional threshold variable into *s* regimes and impose an AR model with exogenous variables in each regime. Consider a *k*-dimensional time series *y*_{t} = (*y*_{1t}, … , *y*_{kt})^{′} and a *v*-dimensional exogenous variable *x*_{t} = (*x*_{1t}, … , *x*_{vt})^{′}, for *t* = 1, … , *n*. Let − ∞ = *r*_{0} < *r*_{1} < ⋯ < *r*_{s} = ∞. The multivariate threshold model with threshold variable *z*_{t} and delay *d* has the following form:

$$y_t = c_j + \sum_{i=1}^{p} \varphi_j^{(i)} y_{t-i} + \sum_{i=1}^{q} \beta_j^{(i)} x_{t-i} + \Sigma_j^{1/2} \varepsilon_t \quad \text{if } r_{j-1} < z_{t-d} \le r_j, \; j = 1, \ldots, s, \tag{4}$$

where *p* and *q* are nonnegative integers and the innovations *ε*_{t} are i.i.d. with mean zero and covariance matrix *I*_{k}. The threshold variable *z*_{t} is assumed to be stationary and to have a continuous distribution.

Model (4) is piecewise linear in the threshold space of *z*_{t − d}, but it is nonlinear when *s* > 1 [3]. This model has proven useful in practice. Nevertheless, one assumption embedded in it weakens its practicability: the coefficients of model (4) are assumed to be constant within each regime of *z*_{t − d}. This assumption is questionable, since economic conditions tend to change slowly over time and the coefficient functions may vary smoothly. Motivated by this, Jiang proposed the multivariate functional-coefficient model, in which the coefficients are functions of the threshold variable *z*_{t − d} instead of constants [4].
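For illustration, when the threshold *r* and delay *d* are treated as known, model (4) with one lag and no exogenous variables reduces to separate OLS fits within each regime. A sketch under those simplifying assumptions (helper names are ours; in [3] the threshold and delay are themselves selected by minimizing the overall sum of squared errors over a grid):

```python
import numpy as np

def fit_threshold_var1(y, z, d=1, r=0.0):
    """Fit a two-regime threshold model with one lag and known (r, d):
    y_t = c_j + Phi_j @ y_{t-1} + e_t  when z_{t-d} falls in regime j.

    Returns [(c_0, Phi_0), (c_1, Phi_1)], one OLS fit per regime.
    """
    T, K = y.shape
    fits = []
    for regime in (0, 1):
        # time points t whose threshold variable z_{t-d} lies in this regime
        idx = np.array([t for t in range(max(1, d), T)
                        if (z[t - d] <= r) == (regime == 0)])
        X = np.hstack([np.ones((len(idx), 1)), y[idx - 1]])
        B, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
        fits.append((B[0], B[1:].T))
    return fits

# Two-regime data: different AR matrices below/above the threshold r = 0.
rng = np.random.default_rng(1)
A_low = np.array([[0.6, 0.0], [0.0, 0.6]])
A_high = np.array([[-0.4, 0.0], [0.0, -0.4]])
z = rng.standard_normal(4000)
y = np.zeros((4000, 2))
for t in range(1, 4000):
    A = A_low if z[t - 1] <= 0.0 else A_high
    y[t] = A @ y[t - 1] + 0.3 * rng.standard_normal(2)
(c0, P0), (c1, P1) = fit_threshold_var1(y, z)
```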

### 3.2. Multivariate functional-coefficient models

The multivariate functional-coefficient model has the following form:

$$y_t = c(z_{t-d}) + \sum_{i=1}^{p} \varphi_i(z_{t-d})\, y_{t-i} + \sum_{i=1}^{q} \beta_i(z_{t-d})\, x_{t-i} + \sigma_t^{*} \varepsilon_t, \tag{5}$$

where *c*(⋅) is a *k* × 1 functional vector, *φ*_{i}(⋅) are *k* × *k* functional matrices, and *β*_{i}(⋅) are *k* × *v* functional matrices. In the innovation term, *σ*_{t}^{∗} is a positive-definite matrix measurable with respect to the σ-field ℱ_{t − 1} generated by the historical information {(*w*_{j}, *z*_{j − d}) : *j* ≤ *t*}, where *w*_{j} = (*x*_{j − 1}, … , *x*_{j − q}, *y*_{j − 1}, … , *y*_{j − p}). For model (5), we are interested in estimating the regression part. Once it is estimated, one may consider making simultaneous inference about parameters and using the residuals to study the structure of the volatility matrix. This model is a generalization of vector autoregressive models [1], threshold models [3], and functional-coefficient models [7–10]. Even in one-dimensional settings with k = 1, model (5) includes important predictive regression models in econometrics, such as linear predictive models with nonstationary predictors [11–13] and functional-coefficient models for nonstationary time series data [14]. Model (5) can also be used to investigate Granger causality [15–17] and the feedback effect in engineering and finance [18, 19].

For model (5), a weighted local least squares estimation method was provided in [4]. Let *X*_{t} = *vec*(1, *y*_{t − 1}, … , *y*_{t − p}, *x*_{t − 1}, … , *x*_{t − q}) and *Φ*(*z*) = (*c*(*z*), *φ*_{1}(*z*), ⋯ , *φ*_{p}(*z*), *β*_{1}(*z*), … , *β*_{q}(*z*)). Then model (5) becomes

$$y_t = \Phi(z_{t-d})\, X_t + \sigma_t^{*} \varepsilon_t, \tag{6}$$

where *Φ*(⋅) is a *k* × *m* matrix-valued function and *X*_{t} is an *m* × 1 vector with *m* = 1 + *pk* + *qv*. For any *z*_{t − d} in a neighborhood of *z*, the Taylor expansion gives

$$\Phi(z_{t-d}) \approx \Phi(z) + \Phi'(z)(z_{t-d} - z) \equiv P + Q\,(z_{t-d} - z).$$

Let *S* and *V* be 2 × 2 matrices whose (*i*, *j*)*th* elements are *μ*_{i + j − 2} = ∫ *u*^{i + j − 2}*K*(*u*)*du* and *ν*_{i + j − 2} = ∫ *u*^{i + j − 2}*K*^{2}(*u*)*du*, respectively, and let *s* = (*μ*_{2}, *μ*_{3})^{′}. Given any invertible working variance matrix *σ*_{t}^{2} for *σ*_{t}^{∗2}, the estimator is obtained by minimizing the weighted local least squares criterion

$$\sum_{t=s'+1}^{n} \Big\| \sigma_t^{-1} \big[\, y_t - \big( P + Q (z_{t-d} - z) \big) X_t \,\big] \Big\|^2 K_{h_n}(z_{t-d} - z),$$

where ∥ ⋅ ∥ denotes the Euclidean norm, *s*^{′} = max(*p*, *d*, *q*), and *K*_{hn}(*x*) = *h*_{n}^{−1}*K*(*x*/*h*_{n}) for a kernel function *K*(⋅) with bandwidth *h*_{n} controlling the amount of smoothing. Let *K*_{hn}^{(i)}(*z*_{t − d} − *z*) = *h*_{n}^{−i}(*z*_{t − d} − *z*)^{i}*K*_{hn}(*z*_{t − d} − *z*) for *i* = 0 , 1 , 2. Then the weighted estimators of *P* and *Q* admit explicit expressions in terms of *K*_{hn}^{(i)} (see [4]).

Under certain conditions, the weighted estimators are asymptotically normal (see [4]).
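The local least squares recipe above can be sketched in the simplest scalar case *k* = 1 with one lag, identity weights (σ_t ≡ 1), and an Epanechnikov kernel; this is an illustrative toy version with names of our choosing, not the full weighted multivariate estimator of [4]:

```python
import numpy as np

def local_linear_coef(y, z, z0, h, d=1):
    """Local linear estimate of phi(z0) in y_t = phi(z_{t-d}) * y_{t-1} + eps_t.

    Near z0, approximate phi(z_{t-d}) by a + b*(z_{t-d} - z0) and solve
    the kernel-weighted least squares problem; returns a = phi_hat(z0).
    """
    n, s = len(y), max(1, d)
    u = (z[s - d:n - d] - z0) / h                         # scaled distances, t = s..n-1
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov weights
    x1 = y[s - 1:n - 1]                                   # lagged regressor y_{t-1}
    X = np.column_stack([x1, x1 * (u * h)])               # columns for a and b
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y[s:] * sw, rcond=None)
    return coef[0]

# Simulate y_t = (0.5 + 0.3*cos(z_{t-1})) y_{t-1} + eps_t, so phi(0) = 0.8.
rng = np.random.default_rng(2)
n = 20000
z = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = (0.5 + 0.3 * np.cos(z[t - 1])) * y[t - 1] + 0.5 * rng.standard_normal()
phi_hat = local_linear_coef(y, z, z0=0.0, h=0.3)
```

With this sample size, `phi_hat` typically lands within a few hundredths of the true value 0.8.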

Recall that, in model (5), *σ*_{t}^{∗} is a positive-definite matrix measurable with respect to the sigma-algebra generated by the historical information. If *σ*_{t}^{∗} has a parametric structure, for example, generalized autoregressive conditional heteroscedastic (GARCH) errors [4], then exploiting it helps to improve the efficiency of the weighted estimation. Example 3 in [4] exemplifies this point. Our intuition is that, if a parametric structure of *σ*_{t}^{∗} is correctly specified, then the weighted estimation mimics the oracle estimation in which *σ*_{t}^{∗} is known. This intuition can be verified theoretically, since *σ*_{t}^{∗} can then be estimated at the parametric root-n rate, which is faster than the nonparametric rate for the functional coefficients.

### 3.3. Extension of multivariate functional-coefficient models

Because many economic factors are nonstationary, classical regression analysis, which requires stationarity, suffers from a serious limitation. Cointegration analysis has become a formidable toolkit for analyzing nonstationary economic time series. The concept of cointegration goes back to Granger [20] and initiated a boom of research. Engle and Granger proposed the well-known Engle-Granger test to examine whether there is a cointegrating relationship among a set of first-order integrated variables [21].

Motivated by these works, Jiang proposed an error-correction version of model (5) by incorporating the cointegrating relationship of first-order integrated variables [4]. This allows us to cope with the nonstationarity of vector time series and to improve forecasting accuracy.

Let *s*_{t} denote a *k* × 1 vector of first-order integrated variables and let *y*_{t} = *s*_{t} − *s*_{t − 1}. Assume that there is a cointegrating relationship for *s*_{t}; that is, there exists a unique *k* × *r* (0 < *r* < *k*) deterministic matrix *θ* of rank *r* and a stationary process *u*_{t} such that *θ*^{T}*s*_{t} = *u*_{t}. Then an error-correction form of model (5) is

$$y_t = c(z_{t-d}) + \gamma(z_{t-d})\, u_{t-1} + \sum_{i=1}^{p} \varphi_i(z_{t-d})\, y_{t-i} + \sum_{i=1}^{q} \beta_i(z_{t-d})\, x_{t-i} + \sigma_t^{*} \varepsilon_t, \tag{7}$$

where *γ*(*z*_{t − d}) is a *k* × *r* coefficient matrix. This model reduces to the error-correction model of the Granger representation theorem if the coefficient functions are constant and there are no exogenous variables [4].

Due to the widespread presence of cointegrating variables in finance and economics, model (7) should improve the practicability of model (5). However, model (7) requires specification of the smoothing variable z_{t}. This requirement can be relaxed by using the idea of single-index models. Recall that model (5) can be represented in the succinct form (6); similar operations can be applied to model (7). Now set *z*_{t} = *γ*^{T}*X*_{t} and let the data decide the value of *γ*. Then model (7) can be extended as

$$y_t = \Phi(\gamma^{T} X_{t-d})\, X_t + \gamma_0(\gamma^{T} X_{t-d})\, u_{t-1} + \sigma_t^{*} \varepsilon_t, \tag{8}$$

where *γ* is a directional vector whose first nonzero entry is positive and *γ*_{0}(⋅) plays the role of *γ*(⋅) in model (7). Model (8) is more flexible than model (7), but the key is to estimate *γ*. We introduce the profile least squares method to estimate model (8). The estimation procedure consists of several steps:

**Step 1**. Given an initial value of *γ*, obtain the weighted estimator of *Φ*(⋅) as in Section 3.2, with smoothing variable *z*_{t} = *γ*^{T}*X*_{t}.

**Step 2**. Find the value of *γ* that minimizes the profile sum of squared residuals.

**Step 3**. Update the value of *γ* by the minimizer from Step 2 and iterate Steps 1-2 until convergence; *Φ*(⋅) is then estimated with the converged direction.

It can be shown that the resulting estimator of *Φ*(⋅) behaves asymptotically as if the true *γ* were known, since *γ* can be estimated at a faster (parametric) rate than the nonparametric rate for *Φ*(⋅).
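A toy version of this profile least squares iteration for a single-index smoothing variable, with a Nadaraya-Watson smoother standing in for the weighted local estimator of Step 1 and a coarse grid search standing in for the optimization in Steps 2-3 (all names and simplifications are ours):

```python
import numpy as np

def nw_fit(index, y, h):
    """Nadaraya-Watson fit of y on a scalar index with a Gaussian kernel."""
    d2 = (index[:, None] - index[None, :]) ** 2
    W = np.exp(-0.5 * d2 / h**2)
    return (W @ y) / W.sum(axis=1)

def profile_rss(gamma, X, y, h=0.2):
    """Profile objective: residual sum of squares given the direction gamma."""
    gamma = gamma / np.linalg.norm(gamma)
    return np.sum((y - nw_fit(X @ gamma, y, h)) ** 2)

# Data from y = sin(gamma^T x) + eps with true direction (1, 1)/sqrt(2).
rng = np.random.default_rng(3)
X = rng.standard_normal((400, 2))
g_true = np.array([1.0, 1.0]) / np.sqrt(2)
y = np.sin(X @ g_true) + 0.1 * rng.standard_normal(400)

# Grid search over unit directions with a positive first entry,
# matching the sign convention on gamma in model (8).
angles = np.linspace(-np.pi / 2 + 0.01, np.pi / 2 - 0.01, 181)
cands = np.column_stack([np.cos(angles), np.sin(angles)])
best = cands[np.argmin([profile_rss(g, X, y) for g in cands])]
```

In higher dimensions the grid search would be replaced by a numerical optimizer over the unit sphere, but the profiling structure is the same.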

### 3.4. Variable selection of multivariate functional-coefficient models

In this section, we consider variable selection for model (6). Increasing the lags p and q will necessarily reduce the sum of squared errors. However, doing so increases the burden of coefficient estimation and may also lead to overfitting. Hence, for the multivariate functional-coefficient model, order selection is of great importance.

Two widely used model selection criteria are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). However, these stepwise methods impose a heavy computational burden and, moreover, make it difficult to establish asymptotic theory for the estimators of the selected models. These problems become more severe for high-dimensional data. Various regularization methods have been proposed to deal with them. Among these, a popular approach called the LASSO, proposed by Tibshirani, performs variable selection and parameter estimation simultaneously; see Ref. [22]. For univariate varying-coefficient regression models with i.i.d. data, Wang and Xia [23] developed a shrinkage estimation method by combining the idea of the group LASSO [24] with kernel smoothing. In the following we develop a shrinkage estimation method for the multivariate functional-coefficient model (6):

$$y_t = \Phi(z_{t-d})\, X_t + \sigma_t^{*} \varepsilon_t,$$

where the functional-coefficient matrix *Φ*(*z*) = (*c*(*z*), *φ*_{1}(*z*), … , *φ*_{p}(*z*), *β*_{1}(*z*), … , *β*_{q}(*z*)). Since each column of *Φ*(⋅) corresponds to the effect of one component of X_{t}, for variable selection of X_{t} we should penalize each column of *Φ*(⋅) as a whole. This leads to minimizing the kernel-weighted criterion with a group penalty,

$$\sum_{i=s'+1}^{n} \sum_{t=s'+1}^{n} \big\| y_t - \Phi(z_{i-d})\, X_t \big\|^2 K_{h_n}(z_{t-d} - z_{i-d}) + \sum_{j=1}^{m} \lambda_j \, \| \Phi_j \|,$$

where *Φ*_{j} = (*Φ*_{j}(*z*_{s′ + 1 − d}), … , *Φ*_{j}(*z*_{n − d})) with *Φ*_{j}(⋅) being the *j*th column of *Φ*(⋅), the *λ*_{j}'s are tuning parameters, and for any matrix *A* we use ∥*A*∥ to denote its Hilbert-Schmidt norm. It is interesting to establish the model selection consistency and the oracle property of this shrinkage estimation.
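The effect of the column-wise (group) penalty can be illustrated in the simpler constant-coefficient case Φ(*z*) ≡ *B*^{T}, where the minimization reduces to a multivariate group LASSO [22, 24]. A proximal gradient sketch with blockwise soft-thresholding (a stand-in for the kernel-weighted problem above; helper names are ours):

```python
import numpy as np

def group_lasso(X, Y, lam, n_iter=500):
    """Minimize 0.5*||Y - X @ B||_F^2 + lam * sum_j ||B[j]||  over B,
    where B[j] is the j-th row of B (the whole effect of regressor j),
    by proximal gradient descent with blockwise soft-thresholding.
    """
    m, k = X.shape[1], Y.shape[1]
    B = np.zeros((m, k))
    step = 1.0 / np.linalg.norm(X, 2) ** 2              # 1/L for the smooth part
    for _ in range(n_iter):
        B = B - step * (X.T @ (X @ B - Y))              # gradient step
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        # blockwise soft-threshold: shrink rows, zero out small ones entirely
        B = B * np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    return B

# Three regressors; only the first two affect the bivariate response.
rng = np.random.default_rng(4)
X = rng.standard_normal((300, 3))
B_true = np.array([[1.0, -0.5],
                   [0.5, 1.0],
                   [0.0, 0.0]])
Y = X @ B_true + 0.1 * rng.standard_normal((300, 2))
B_hat = group_lasso(X, Y, lam=20.0)   # the inactive third row is zeroed out
```

The penalty zeroes out the whole row for the irrelevant regressor, which is exactly the "penalize each column of Φ(⋅) as a whole" idea in the functional setting.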

## 4. Multivariate time-varying coefficient models

Parallel to functional-coefficient model (5), it is natural to consider its alternative with time-varying coefficients [25]:

$$y_t = c(t/T) + \sum_{i=1}^{p} \phi_i(t/T)\, y_{t-i} + \sum_{i=1}^{q} \beta_i(t/T)\, x_{t-i} + \sigma_t^{*} \varepsilon_t, \tag{11}$$

where *y*_{t} is a *k* × 1 vector, *x*_{t} is a *v* × 1 vector, *c*(⋅) is a *k* × 1 smooth vector, *ϕ*_{i}(⋅) are *k* × *k* smooth matrices, and *β*_{i}(⋅) are *k* × *v* smooth matrices. The innovation satisfies the same conditions as in model (5). It is known that economic conditions change slowly and smoothly as time evolves. Model (11) reflects this smooth change by allowing the coefficients to be smooth functions of (rescaled) time.

Let *Φ*(*t*/*T*) = (*c*(*t*/*T*), *ϕ*_{1}(*t*/*T*), … , *ϕ*_{p}(*t*/*T*), *β*_{1}(*t*/*T*), … , *β*_{q}(*t*/*T*)). Using arguments similar to those for model (6), we can rewrite model (11) as

$$y_t = \Phi(t/T)\, X_t + \sigma_t^{*} \varepsilon_t, \tag{12}$$

where *Φ*(⋅) is a *k* × *m* matrix and *X*_{t} is the same as in model (6). By the Taylor expansion, for any t in a neighborhood of *t*_{0} ∈ (0, *T*), we have

$$\Phi(t/T) \approx \Phi(t_0/T) + \Phi'(t_0/T)\,(t - t_0)/T \equiv P + Q\,(t - t_0)/T.$$

Running the local linear smoother for model (12), we minimize

$$\sum_{t=s+1}^{T} \Big\| y_t - \big[ P + Q\,(t - t_0)/T \big] X_t \Big\|^2 K_h(t - t_0)$$

over P and Q, where *s* = max(*p*, *q*) and *K*_{h}(*x*) = *h*^{−1}*K*(*x*/(*hT*)) for a kernel *K*(⋅) with bandwidth *h*. It is then straightforward to obtain an explicit form of the minimizer, expressed through

$$K_h^{(i)}(t - t_0) = (Th)^{-i} (t - t_0)^{i}\, K_h(t - t_0), \quad i = 0, 1, 2.$$

Define *M* = *E*[(*X*_{t}*X*_{t}^{T}) ⊗ *I*_{k}] and *N* = *E*[(*X*_{t}*X*_{t}^{T}) ⊗ (*σ*_{t}^{∗})^{2}]. Let *μ*_{i} = ∫ *u*^{i}*K*(*u*)*du* and *ν*_{i} = ∫ *u*^{i}*K*^{2}(*u*)*du*, and let *U* and *V* denote the 2 × 2 matrices whose (*i*, *j*)th elements are *μ*_{i + j − 2} and *ν*_{i + j − 2}, respectively. Using arguments similar to those in [4], we can show that this estimator is asymptotically normal with mean zero and variance *Σ*, where *Σ* = (*U*^{−1}*VU*^{−1}) ⊗ (*M*^{−1}*NM*^{−1}).
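In the simplest scalar case (*k* = *m* = 1, model y_t = φ(t/T) y_{t−1} + ε_t), the local linear smoother in rescaled time can be sketched as follows (identity weights; an illustrative toy version with names of our choosing):

```python
import numpy as np

def tv_coef(y, t0_frac, h):
    """Local linear estimate of phi(t0/T) in y_t = phi(t/T) * y_{t-1} + eps_t.

    Near t0, approximate phi(t/T) by a + b*(t/T - t0/T) and solve the
    kernel-weighted least squares problem; returns a = phi_hat(t0/T).
    """
    T = len(y)
    u = (np.arange(1, T) / T - t0_frac) / h               # scaled time distances
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov weights
    X = np.column_stack([y[:-1], y[:-1] * (u * h)])       # columns for a and b
    sw = np.sqrt(w)
    a_b, *_ = np.linalg.lstsq(X * sw[:, None], y[1:] * sw, rcond=None)
    return a_b[0]

# phi(u) = 0.9 - 0.8*u drifts smoothly over time; phi(0.5) = 0.5.
rng = np.random.default_rng(5)
T = 5000
y = np.zeros(T)
for t in range(1, T):
    y[t] = (0.9 - 0.8 * t / T) * y[t - 1] + 0.5 * rng.standard_normal()
phi_hat = tv_coef(y, t0_frac=0.5, h=0.1)
```

Because the true coefficient path here is linear in time, the local linear fit is essentially unbiased at interior points.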

### 4.1. Generalized likelihood ratio tests

The multivariate time-varying coefficient regression model is flexible and powerful for estimating the dynamic changes of coefficients. After fitting a given dataset, some important questions arise: for example, are the coefficient functions actually constant, or of some particular parametric forms? These questions lead to statistical hypothesis testing. To answer them, we develop generalized likelihood ratio statistics to test the corresponding hypotheses about the coefficient functions [26].

For the multivariate time-varying coefficient model (12), assume *Σ*_{0}^{−1/2}*ε*_{t} has mean zero and covariance matrix *I*_{k}, with *Σ*_{0} being a symmetric positive-definite constant matrix.

Consider the following hypothesis testing problem:

$$H_0: \Phi(t/T) = \Theta_0(t/T) \quad \text{versus} \quad H_1: \Phi(t/T) \neq \Theta_0(t/T),$$

where *Θ*_{0}(*t*/*T*) is some known constant matrix *Φ*_{0} or a set of functional matrices. Let *Φ̂*(⋅) denote the estimator of *Φ* under the alternative, and let *λ*_{T}(*H*_{0}) denote the generalized likelihood ratio statistic constructed, in the spirit of [26], from the residual sums of squares under the null and under the alternative, with *Σ* being a known constant covariance matrix from a working model. It is meaningful to study the asymptotic distributions of the test statistic under the null and under the alternatives.

In the following example, we consider the case where *Θ*_{0}(⋅) is a known constant. For any *u* = *t*/*T* ∈ (0, 1), rewrite the matrix *Φ*(*u*) as a vector Δ(*u*) ≡ *vec*(*Φ*_{1}(*u*), … , *Φ*_{m}(*u*)), and denote Δ_{0}(*u*) ≡ *vec*(*Φ*_{01}^{∗}(*u*), … , *Φ*_{0m}^{∗}(*u*)). Then the power of the test is evaluated against the alternatives

$$H_1: \Delta(u) = \Delta_0(u) + G(u),$$

where *G*(*u*) = (*g*_{1}(*u*), … , *g*_{m}(*u*))^{T} is a vector of functions.

**Example 1**. To investigate the performance of the proposed generalized likelihood ratio test, we generated 600 replications for each of the sample sizes *T* = 200, *T* = 400, and *T* = 800 from the multivariate time-varying coefficient model, where *k* = 2, *v* = *p* = *q* = 1, and Δ = *vec*(0.5, 0.0074, 0.08, 0.65, 0.25, 0.75)^{T}. We set the initial values *x*_{1} = 0 and *y*_{1} = (0.15, 0.2). Accordingly, *X*_{t} = *vec*(*y*_{1 , t − 1}, *y*_{2 , t − 1}, *x*_{t − 1}) for *t* = 2 , … , *T*. Three distributions of the error term are considered: bivariate normal, bivariate log-normal, and bivariate *t*(5), each with the same variance matrix. The alternatives are indexed by a departure parameter *θ* = 0 , 0.2 , 0.4 , 0.6 , 0.8 , 1, with *θ* = 0 corresponding to the null. The power function is estimated by the relative rejection frequency of H_{0} in the above replicates.

The significance level is set to 5%, and the critical values in the simulations are calculated by the conditional bootstrap method of Ref. [26] for each given *θ* value. The details of this method are as follows:

**Step 1**. Compute the estimators of the coefficient functions under the null and under the alternative.

**Step 2**. Compute the test statistic *λ*_{T}(*H*_{0}) and the residuals {*e*_{t}} from the alternative model.

**Step 3**. For each given *X*_{t}, draw a bootstrap residual *e*_{t}^{∗} from the centered empirical distribution of {*e*_{t}} and compute the bootstrap responses from the fitted null model.

**Step 4**. Compute the test statistic *λ*_{T}^{∗}(*H*_{0}) using the bootstrap sample constructed in Step 3.

**Step 5**. Repeat Steps 3 and 4 to obtain a sample of the test statistic *λ*_{T}^{∗}(*H*_{0}). The critical value at significance level *α* is calculated as the 100(1 − *α*)th percentile of this sample.
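The five bootstrap steps can be sketched for a scalar toy model y_t = φ(t/T) y_{t−1} + ε_t, testing H_0: φ constant. We use a simple RSS-based log-ratio statistic as a stand-in for the full generalized likelihood ratio of [26], and a local constant fit under the alternative (all helper names and simplifications are ours):

```python
import numpy as np

rng = np.random.default_rng(6)

def fit_null(y):
    """H0 fit: a constant AR(1) coefficient by OLS."""
    return float(np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1]))

def fit_alt(y, h=0.1):
    """H1 fit: local constant (kernel in rescaled time) estimate of phi(t/T)."""
    T = len(y)
    grid = np.arange(1, T) / T
    phi = np.empty(T - 1)
    for i, u0 in enumerate(grid):
        w = np.exp(-0.5 * ((grid - u0) / h) ** 2)
        phi[i] = np.dot(w * y[:-1], y[1:]) / np.dot(w * y[:-1], y[:-1])
    return phi

def glr_stat(y):
    """RSS log-ratio: large when the time-varying fit beats the constant fit."""
    rss0 = np.sum((y[1:] - fit_null(y) * y[:-1]) ** 2)
    rss1 = np.sum((y[1:] - fit_alt(y) * y[:-1]) ** 2)
    return 0.5 * len(y) * np.log(rss0 / rss1)

def bootstrap_pvalue(y, n_boot=50):
    """Steps 2-5: resample centered H1 residuals, regenerate under the H0 fit."""
    stat = glr_stat(y)
    phi0 = fit_null(y)
    resid = y[1:] - fit_alt(y) * y[:-1]
    resid = resid - resid.mean()
    count = 0
    for _ in range(n_boot):
        e = rng.choice(resid, size=len(y) - 1, replace=True)
        yb = np.empty_like(y)
        yb[0] = y[0]
        for t in range(1, len(y)):
            yb[t] = phi0 * yb[t - 1] + e[t - 1]
        count += int(glr_stat(yb) >= stat)
    return count / n_boot

# Data with a pronounced drift in the AR coefficient, so H0 is false.
T = 200
y = np.zeros(T)
for t in range(1, T):
    y[t] = (0.9 - 1.2 * t / T) * y[t - 1] + 0.5 * rng.standard_normal()
p_value = bootstrap_pvalue(y)
```

With such a pronounced drift the observed statistic typically exceeds every bootstrap replicate, so the test rejects at the 5% level.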

**Figure 1** displays the power curves in the different scenarios. The patterns of the power curves look like half of an inverted normal density: all the curves rise monotonically from a height equal to the significance level of 5% until they eventually reach a maximum of around 90%. It is evident from **Figure 1** that the test is powerful for all three error distributions. Moreover, the test becomes more powerful as the sample size increases. These results indicate that the proposed test holds its size and is powerful in distinguishing the null from the alternative.

## 5. Conclusions

In this chapter, we have reviewed some parametric and nonparametric methods for modeling nonlinear vector time series data, including the VAR model, the multivariate threshold autoregressive model, and the multivariate functional-coefficient regression model. These models are of great significance in econometric and statistical theory and applications. Based on the weighted local least squares estimation, we have proposed a variable selection method for the functional-coefficient model. This model selection procedure is applicable to the proposed multivariate single-index models and multivariate time-varying coefficient models. We have also extended the generalized likelihood ratio test to the time-varying coefficient model and demonstrated its performance through simulation. The proposed methodology is very useful for modeling nonlinear dynamic structures inherent in financial data. However, many problems remain unsolved for our procedure, such as the limiting theory of the proposed methodology. Future work includes, but is not limited to, extending our models to nonstationary settings and exploring their performance in different applications.