Open access peer-reviewed chapter

Likelihood Ratio Tests in Multivariate Linear Model

By Yasunori Fujikoshi

Submitted: October 9th 2015. Reviewed: January 22nd 2016. Published: July 6th 2016.

DOI: 10.5772/62277


Abstract

The aim of this chapter is to review likelihood ratio test procedures in multivariate linear models, focusing on projection matrices. It is noted that the projection matrices onto the spaces spanned by the mean vectors under the hypothesis and under the alternatives play an important role. Some basic properties of projection matrices are given. The models treated include the multivariate regression model, the discriminant analysis model, and the growth curve model. The hypotheses treated involve a generalized linear hypothesis and a no-additional-information hypothesis, in addition to a usual linear hypothesis. The test statistics are expressed in terms of both projection matrices and sums of squares and products matrices.

Keywords

  • algebraic approach
  • additional information hypothesis
  • generalized linear hypothesis
  • growth curve model
  • multivariate linear model
  • lambda distribution
  • likelihood ratio criterion (LRC)
  • projection matrix

1. Introduction

In this chapter, we review statistical inference, especially likelihood ratio criteria (LRC), in the multivariate linear model, focusing on matrix theory. Consider a multivariate linear model with p response variables y1, …, yp and k explanatory or dummy variables x1, …, xk. Suppose that y = (y1, …, yp)′ and x = (x1, …, xk)′ are measured on n subjects, and let the observations of the ith subject be denoted by yi and xi. Then, we have the observation matrices given by

\[ Y = \begin{pmatrix} y_1' \\ y_2' \\ \vdots \\ y_n' \end{pmatrix}, \qquad X = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix}. \tag{1.1} \]

It is assumed that y1, …, yn are independent and have the same covariance matrix Σ. We express the mean of Y as follows:

\[ E(Y) = \eta = (\eta_1, \ldots, \eta_p). \tag{1.2} \]

A multivariate linear model is defined by requiring that

\[ \eta_i \in \Omega, \quad \text{for all } i = 1, \ldots, p, \tag{1.3} \]

where Ω is a given subspace of the n-dimensional Euclidean space Rn. A typical Ω is given by

\[ \Omega = \mathcal{R}[X] = \{ \eta = X\theta \; ; \; \theta = (\theta_1, \ldots, \theta_k)', \; -\infty < \theta_i < \infty, \; i = 1, \ldots, k \}. \tag{1.4} \]

Here, ℛ[X] is the space spanned by the column vectors of X. A general theory of statistical inference on the regression parameter Θ can be found in texts on multivariate analysis, e.g., see [18]. In this chapter, we take an algebraic approach to the multivariate linear model.

In Section 2, we consider a multivariate regression model in which the xi's are explanatory variables and Ω = ℛ[X]. The maximum likelihood estimators (MLEs) and the likelihood ratio criterion (LRC) for Θ2 = O are derived by using projection matrices. Here, Θ = (Θ1′, Θ2′)′. The distribution of the LRC is obtained by the multivariate Cochran theorem. It is pointed out that projection matrices play an important role. In Section 3, we give a summary of projection matrices. Section 4 treats a general linear hypothesis. In Section 5, we consider testing an additional information hypothesis for y2 in the presence of y1, where y1 = (y1, …, yq)′ and y2 = (yq+1, …, yp)′. In Section 6, we consider testing problems in discriminant analysis. Section 7 deals with a generalized multivariate linear model, which is also called the growth curve model. Some concluding remarks are given in Section 8.

2. Multivariate regression model

In this section, we consider a multivariate regression model on p response variables and k explanatory variables denoted by y = (y1, …, yp)′ and x = (x1, …, xk)′, respectively. Suppose that we have the observation matrices given by (1.1). A multivariate regression model is given by

\[ Y = X\Theta + E, \tag{2.1} \]

where Θ is a k × p unknown parameter matrix. It is assumed that the rows of the error matrix E are independently distributed as a p-variate normal distribution with mean zero and unknown covariance matrix Σ, i.e., N_p(0, Σ).

Let L(Θ, Σ) be the density function or the likelihood function. Then, we have

\[ -2 \log L(\Theta, \Sigma) = n \log |\Sigma| + \operatorname{tr} \Sigma^{-1} (Y - X\Theta)'(Y - X\Theta) + np \log 2\pi. \]

The maximum likelihood estimators (MLEs) Θ̂ and Σ̂ of Θ and Σ are defined by the maximizers of L(Θ, Σ) or, equivalently, the minimizers of −2 log L(Θ, Σ).

Theorem 2.1 Suppose that Y follows the multivariate regression model in (2.1). Then, the MLEs of Θ and Σ are given as

\[ \hat{\Theta} = (X'X)^{-1}X'Y, \qquad \hat{\Sigma} = \frac{1}{n}(Y - X\hat{\Theta})'(Y - X\hat{\Theta}) = \frac{1}{n} Y'(I_n - P_X)Y, \]

where P_X = X(X′X)^{-1}X′. Further, it holds that

\[ -2 \log L(\hat{\Theta}, \hat{\Sigma}) = n \log |\hat{\Sigma}| + np(\log 2\pi + 1). \]

Theorem 2.1 can be shown by a linear algebraic method, which is discussed in the next section. Note that P_X is the projection matrix onto the range space Ω = ℛ[X]. It is symmetric and idempotent, i.e.,

\[ P_X' = P_X, \qquad P_X^2 = P_X. \]
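As a small numerical illustration (not part of the original chapter), the following Python/NumPy sketch simulates data from model (2.1), computes P_X and the MLEs of Theorem 2.1, and checks the symmetry and idempotency of P_X; all variable names and dimensions are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, k = 50, 3, 4
    X = rng.standard_normal((n, k))               # observation matrix of explanatory variables
    Theta = rng.standard_normal((k, p))           # true coefficient matrix
    Y = X @ Theta + rng.standard_normal((n, p))   # responses from model (2.1) with Sigma = I_p

    # Projection matrix onto R[X] and the MLEs of Theorem 2.1
    P_X = X @ np.linalg.inv(X.T @ X) @ X.T
    Theta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
    Sigma_hat = Y.T @ (np.eye(n) - P_X) @ Y / n

    # P_X is symmetric and idempotent
    assert np.allclose(P_X, P_X.T)
    assert np.allclose(P_X @ P_X, P_X)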

Next, we consider testing the hypothesis

\[ H : E(Y) = X_1\Theta_1 \ \Leftrightarrow \ \Theta_2 = O, \tag{2.2} \]

against K : Θ2 ≠ O, where X = (X1 X2), X1 : n × j, and Θ = (Θ1′, Θ2′)′ with Θ1 : j × p. The hypothesis means that the last k − j variables x2 = (xj+1, …, xk)′ have no additional information in the presence of the first j variables x1 = (x1, …, xj)′. In general, the likelihood ratio criterion (LRC) is defined by

\[ \lambda = \frac{\max_H L(\Theta, \Sigma)}{\max_K L(\Theta, \Sigma)}. \tag{2.3} \]

Then we can express

\[ -2 \log \lambda = \min_H \{-2 \log L(\Theta, \Sigma)\} - \min_K \{-2 \log L(\Theta, \Sigma)\} = \min_H \left[ n \log |\Sigma| + \operatorname{tr} \Sigma^{-1}(Y - X\Theta)'(Y - X\Theta) \right] - \min_K \left[ n \log |\Sigma| + \operatorname{tr} \Sigma^{-1}(Y - X\Theta)'(Y - X\Theta) \right]. \]

Using Theorem 2.1, this can be expressed as

\[ \lambda^{2/n} \equiv \Lambda = \frac{|n\hat{\Sigma}_\Omega|}{|n\hat{\Sigma}_\omega|}. \]

Here, Σ̂_Ω and Σ̂_ω are the maximum likelihood estimators of Σ under the model (2.1) (i.e., under K) and under H, respectively, which are given by

\[ n\hat{\Sigma}_\Omega = (Y - X\hat{\Theta}_\Omega)'(Y - X\hat{\Theta}_\Omega) = Y'(I_n - P_\Omega)Y, \qquad \hat{\Theta}_\Omega = (X'X)^{-1}X'Y, \tag{2.4} \]

and

\[ n\hat{\Sigma}_\omega = (Y - X_1\hat{\Theta}_{1\omega})'(Y - X_1\hat{\Theta}_{1\omega}) = Y'(I_n - P_\omega)Y, \qquad \hat{\Theta}_{1\omega} = (X_1'X_1)^{-1}X_1'Y. \tag{2.5} \]

Summarizing these results, we have the following theorem.

Theorem 2.2 Let λ = Λ^{n/2} be the LRC for testing H in (2.2). Then, Λ is expressed as

\[ \Lambda = \frac{|S_e|}{|S_e + S_h|}, \tag{2.6} \]

where

\[ S_e = n\hat{\Sigma}_\Omega, \qquad S_h = n\hat{\Sigma}_\omega - n\hat{\Sigma}_\Omega, \tag{2.7} \]

and nΣ̂_Ω and nΣ̂_ω are given by (2.4) and (2.5), respectively.

The matrices S_e and S_h in the testing problem are called the sums of squares and products (SSP) matrices due to the error and the hypothesis, respectively. We consider the distribution of Λ. If a p × p random matrix W is expressed as

\[ W = \sum_{j=1}^n z_j z_j', \]

where z_j ∼ N_p(μ_j, Σ) and z1, …, zn are independent, W is said to have a noncentral Wishart distribution with n degrees of freedom, covariance matrix Σ, and noncentrality matrix Δ = μ_1μ_1′ + ⋯ + μ_nμ_n′. We write W ∼ W_p(n, Σ; Δ). In the special case Δ = O, W is said to have a Wishart distribution, denoted by W ∼ W_p(n, Σ).

Theorem 2.3 (multivariate Cochran theorem) Let Y = (y_1, …, y_n)′, where y_i ∼ N_p(μ_i, Σ), i = 1, …, n, and y_1, …, y_n are independent. Let A, A1, and A2 be n × n symmetric matrices. Then:

1. Y′AY ∼ W_p(k, Σ; Ω) ⇔ A² = A and tr A = k, where Ω = E(Y)′A E(Y).

2. Y′A1Y and Y′A2Y are independent ⇔ A1A2 = O.

For a proof of the multivariate Cochran theorem, see, e.g., [3, 68]. Let B and W be independent random matrices following the Wishart distributions W_p(q, Σ) and W_p(n, Σ), respectively, with n ≥ p. Then, the distribution of

\[ \Lambda = \frac{|W|}{|B + W|} \]

is said to be the p-dimensional Lambda distribution with (q, n) degrees of freedom and is denoted by Λ_p(q, n). For distributional results on Λ_p(q, n), see [1, 3].

By using multivariate Cochran’s theorem, we have the following distributional results:

Theorem 2.4 Let S_e and S_h be the random matrices in (2.7), and let Λ be the Λ-statistic defined by (2.6). Then:

1. S_e and S_h are independently distributed as a Wishart distribution W_p(n − k, Σ) and a noncentral Wishart distribution W_p(k − j, Σ; Δ), respectively, where

\[ \Delta = (X\Theta)'(P_X - P_{X_1})(X\Theta). \tag{2.8} \]

2. Under H, the statistic Λ is distributed as a lambda distribution Λ_p(k − j, n − k).

Proof. Note that P_Ω = P_X = X(X′X)^{-1}X′, P_ω = P_{X_1} = X_1(X_1′X_1)^{-1}X_1′, and P_ΩP_ω = P_ωP_Ω. By the multivariate Cochran theorem, the first result (1) follows by checking that

\[ (I_n - P_\Omega)^2 = I_n - P_\Omega, \qquad (P_\Omega - P_\omega)^2 = P_\Omega - P_\omega, \qquad (I_n - P_\Omega)(P_\Omega - P_\omega) = O. \]

The second result (2) follows by showing that Δ_0 = O, where Δ_0 is the Δ under H. This is seen from

\[ \Delta_0 = (X_1\Theta_1)'(P_\Omega - P_\omega)(X_1\Theta_1) = O, \]

    since PΩX1 = PωX1 = X1.

The matrices S_e and S_h in (2.7) are defined in terms of the n × n matrices P_Ω and P_ω. It is important to give expressions useful for their numerical computation. We have the following expressions:

\[ S_e = Y'Y - Y'X(X'X)^{-1}X'Y, \qquad S_h = Y'X(X'X)^{-1}X'Y - Y'X_1(X_1'X_1)^{-1}X_1'Y. \]

    Suppose that x1 is 1 for all subjects, i.e., x1 is an intercept term. Then, we can express these in terms of the SSP matrix of (y', x')′ defined by

\[ S = \sum_{i=1}^n \begin{pmatrix} y_i - \bar{y} \\ x_i - \bar{x} \end{pmatrix} \begin{pmatrix} y_i - \bar{y} \\ x_i - \bar{x} \end{pmatrix}' = \begin{pmatrix} S_{yy} & S_{yx} \\ S_{xy} & S_{xx} \end{pmatrix}, \tag{2.9} \]

where ȳ and x̄ are the sample mean vectors. Along the partition x = (x_1′, x_2′)′, we partition S as

\[ S = \begin{pmatrix} S_{yy} & S_{y1} & S_{y2} \\ S_{1y} & S_{11} & S_{12} \\ S_{2y} & S_{21} & S_{22} \end{pmatrix}. \tag{2.10} \]

Then,

\[ S_e = S_{yy \cdot x}, \qquad S_h = S_{y2 \cdot 1} S_{22 \cdot 1}^{-1} S_{2y \cdot 1}. \tag{2.11} \]

Here, we use the notation S_{yy·x} = S_{yy} − S_{yx}S_{xx}^{-1}S_{xy}, S_{y2·1} = S_{y2} − S_{y1}S_{11}^{-1}S_{12}, etc. These expressions are derived in the next section by using projection matrices.
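To check these formulas numerically, the following sketch (again Python/NumPy with illustrative names; x_1 is taken to be the intercept, the next two regressors form the retained group, and the last two are tested) computes Λ both from the projection matrices and from the partitioned SSP matrix of (2.9)–(2.11).

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 60, 3
    j, k = 3, 5                                   # j retained columns (incl. intercept), k - j tested
    X1 = np.hstack([np.ones((n, 1)), rng.standard_normal((n, j - 1))])
    X2 = rng.standard_normal((n, k - j))
    X = np.hstack([X1, X2])
    Y = X1 @ rng.standard_normal((j, p)) + rng.standard_normal((n, p))   # data with Theta_2 = O

    def proj(A):
        return A @ np.linalg.inv(A.T @ A) @ A.T

    P_Omega, P_omega = proj(X), proj(X1)
    Se = Y.T @ (np.eye(n) - P_Omega) @ Y
    Sh = Y.T @ (P_Omega - P_omega) @ Y
    Lam = np.linalg.det(Se) / np.linalg.det(Se + Sh)      # ~ Lambda_p(k - j, n - k) under H

    # Same statistics via the partitioned SSP matrix (2.9)-(2.11)
    Z = np.hstack([Y, X[:, 1:]])
    Zc = Z - Z.mean(axis=0)
    S = Zc.T @ Zc
    Syy, Syx, Sxx = S[:p, :p], S[:p, p:], S[p:, p:]
    Se_ssp = Syy - Syx @ np.linalg.inv(Sxx) @ Syx.T       # S_{yy.x}
    q1 = j - 1                                            # retained regressors, intercept excluded
    S11, S12, S22 = Sxx[:q1, :q1], Sxx[:q1, q1:], Sxx[q1:, q1:]
    Sy1, Sy2 = Syx[:, :q1], Syx[:, q1:]
    Sy2_1 = Sy2 - Sy1 @ np.linalg.inv(S11) @ S12          # S_{y2.1}
    S22_1 = S22 - S12.T @ np.linalg.inv(S11) @ S12        # S_{22.1}
    Sh_ssp = Sy2_1 @ np.linalg.inv(S22_1) @ Sy2_1.T

    assert np.allclose(Se, Se_ssp) and np.allclose(Sh, Sh_ssp)
    print("Lambda =", Lam)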

    3. Idempotent matrices and max-mini problems

In the previous section, we have seen that idempotent matrices play an important role in statistical inference in the multivariate regression model. In fact, letting E(Y) = η = (η_1, …, η_p), consider a model satisfying

\[ \eta_i \in \Omega = \mathcal{R}[X], \quad \text{for all } i = 1, \ldots, p. \tag{3.1} \]

Then the MLE of Θ is Θ̂ = (X′X)^{-1}X′Y, and hence the MLE of η is given by

\[ \hat{\eta}_\Omega = X\hat{\Theta} = P_\Omega Y. \]

Here, P_Ω = X(X′X)^{-1}X′. Further, the residual sums of squares and products (RSSP) matrix is expressed as

\[ S_\Omega = (Y - \hat{\eta}_\Omega)'(Y - \hat{\eta}_\Omega) = Y'(I_n - P_\Omega)Y. \]

Under the hypothesis (2.2), the spaces to which the η_i's belong are the same and are given by ω = ℛ[X_1]. Similarly, we have

\[ \hat{\eta}_\omega = X\hat{\Theta}_\omega = P_\omega Y, \qquad S_\omega = (Y - \hat{\eta}_\omega)'(Y - \hat{\eta}_\omega) = Y'(I_n - P_\omega)Y, \]

where Θ̂_ω = (Θ̂_{1ω}′ O′)′ and Θ̂_{1ω} = (X_1′X_1)^{-1}X_1′Y. The LR criterion is based on the following decomposition of SSP matrices:

\[ S_\omega = Y'(I_n - P_\omega)Y = Y'(I_n - P_\Omega)Y + Y'(P_\Omega - P_\omega)Y = S_e + S_h. \]

The degrees of freedom in the Λ distribution Λ_p(f_h, f_e) are given by

\[ f_e = n - \dim \Omega, \qquad f_h = k - j = \dim \Omega - \dim \omega. \]

In general, an n × n matrix P is called idempotent if P² = P. A symmetric and idempotent matrix is called a projection matrix. Let Rn be the n-dimensional Euclidean space, and let Ω be a subspace of Rn. Then, any n × 1 vector y can be uniquely decomposed into the direct sum

\[ y = u + v, \qquad u \in \Omega, \ v \in \Omega^\perp, \tag{3.2} \]

where Ω^⊥ is the orthocomplement of Ω. Using the decomposition (3.2), consider the mapping

\[ P_\Omega : y \mapsto u, \quad \text{i.e.,} \quad P_\Omega y = u. \]

    The mapping is linear, and hence it is expressed as a matrix. In this case, u is called the orthogonal projection of y into Ω, and PΩ is also called the orthogonal projection matrix to Ω. Then, we have the following basic properties:

    (P1) PΩ is uniquely defined;

(P2) In − PΩ is the projection matrix to Ω^⊥;

    (P3) PΩ is a symmetric idempotent matrix;

    (P4) ℛ[PΩ] = Ω, and dim[Ω] = trPΩ;

Let ω be a subspace of Ω. Then, we have the following properties:

    (P5) PΩPω = PωPΩ = Pω.

(P6) P_Ω − P_ω = P_{ω^⊥ ∩ Ω}, where ω^⊥ is the orthocomplement space of ω.

(P7) Let B be a q × n matrix, and let N[B] = {y ; By = 0}. If ω = N[B] ∩ Ω, then ω^⊥ ∩ Ω = ℛ[P_Ω B′].

    For more details, see, e.g. [3, 7, 9, 10].
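These properties are easy to verify numerically. The sketch below (Python/NumPy, with illustrative matrices) constructs P_Ω and P_ω for Ω = ℛ[X] and ω = ℛ[X_1] and checks (P3)–(P6); the last check also previews the explicit form of P_{ω^⊥ ∩ Ω} used later in this section.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 30
    X1 = rng.standard_normal((n, 2))
    X2 = rng.standard_normal((n, 3))
    X = np.hstack([X1, X2])

    def proj(A):
        """Orthogonal projection matrix onto R[A] (A of full column rank)."""
        return A @ np.linalg.inv(A.T @ A) @ A.T

    P_Omega, P_omega = proj(X), proj(X1)
    I = np.eye(n)

    # (P3) symmetric and idempotent; (P4) dim[Omega] = tr P_Omega
    assert np.allclose(P_Omega, P_Omega.T) and np.allclose(P_Omega @ P_Omega, P_Omega)
    assert np.isclose(np.trace(P_Omega), X.shape[1])

    # (P5) P_Omega P_omega = P_omega P_Omega = P_omega for omega contained in Omega
    assert np.allclose(P_Omega @ P_omega, P_omega)
    assert np.allclose(P_omega @ P_Omega, P_omega)

    # (P6) P_Omega - P_omega is the projection onto the orthocomplement of omega within Omega,
    #      which here equals R[(I - P_omega) X2]
    assert np.allclose(P_Omega - P_omega, proj((I - P_omega) @ X2))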

The MLEs and LRC in the multivariate regression model are derived by using the following theorem.

    Theorem 3.1

1. Consider the function f(Σ) = log|Σ| + tr Σ^{-1}S of a p × p positive definite matrix Σ, where S is a given p × p positive definite matrix. Then, f(Σ) attains its minimum uniquely at Σ = S, and the minimum value is given by

\[ \min_{\Sigma > O} f(\Sigma) = f(S) = \log|S| + p. \]

2. Let Y be an n × p known matrix and X an n × k known matrix of rank k. Consider the function of a p × p positive definite matrix Σ and a k × p matrix Θ = (θ_{ij}) given by

\[ g(\Theta, \Sigma) = m \log|\Sigma| + \operatorname{tr} \Sigma^{-1}(Y - X\Theta)'(Y - X\Theta), \]

where m > 0 and − ∞ < θ_{ij} < ∞ for i = 1, …, k; j = 1, …, p. Then, g(Θ, Σ) attains its minimum at

\[ \Theta = \hat{\Theta} = (X'X)^{-1}X'Y, \qquad \Sigma = \hat{\Sigma} = \frac{1}{m} Y'(I_n - P_X)Y, \]

and the minimum value is given by m log|Σ̂| + mp.

Proof. Let ℓ_1, …, ℓ_p be the characteristic roots of Σ^{-1}S. Note that the characteristic roots of Σ^{-1}S and Σ^{-1/2}SΣ^{-1/2} are the same. The latter matrix is positive definite, and hence we may assume ℓ_1 ≥ ⋯ ≥ ℓ_p > 0. Then

\[ f(\Sigma) - f(S) = \log|\Sigma S^{-1}| + \operatorname{tr} \Sigma^{-1}S - p = -\log|\Sigma^{-1}S| + \operatorname{tr} \Sigma^{-1}S - p = \sum_{i=1}^p (\ell_i - \log \ell_i - 1) \ge 0. \]

The last inequality follows from x − 1 ≥ log x (x > 0). The equality holds if and only if ℓ_1 = ⋯ = ℓ_p = 1, i.e., Σ = S.

Next, we prove 2. We have

\[ \operatorname{tr} \Sigma^{-1}(Y - X\Theta)'(Y - X\Theta) = \operatorname{tr} \Sigma^{-1}(Y - X\hat{\Theta})'(Y - X\hat{\Theta}) + \operatorname{tr} \Sigma^{-1}\{X(\hat{\Theta} - \Theta)\}'\{X(\hat{\Theta} - \Theta)\} \ge \operatorname{tr} \Sigma^{-1} Y'(I_n - P_X)Y. \]

The first equality follows from Y − XΘ = (Y − XΘ̂) + X(Θ̂ − Θ) and (Y − XΘ̂)′X(Θ̂ − Θ) = O. In the last step, the equality holds when Θ = Θ̂. The required result is obtained by noting that Θ̂ does not depend on Σ and combining this result with the first result 1.

    Theorem 3.2 Let X be an n × k matrix of rank k, and let Ω = ℛ[X] which is defined also by the set {y : y = X θ }, where θ is a k × 1 unknown parameter vector. Let C be a c × k matrix of rank c, and define ω by the set {y : y = Xθ , C θ = 0}. Then,

1. P_Ω = X(X′X)^{-1}X′.

2. P_Ω − P_ω = X(X′X)^{-1}C′{C(X′X)^{-1}C′}^{-1}C(X′X)^{-1}X′.

Proof. 1. Let ŷ = X(X′X)^{-1}X′y and consider the decomposition y = ŷ + (y − ŷ). Then ŷ ∈ Ω and X′(y − ŷ) = 0, so that y − ŷ ∈ Ω^⊥. Therefore, P_Ω y = ŷ, and hence P_Ω = X(X′X)^{-1}X′.

2. Since Cθ = C(X′X)^{-1}X′ ⋅ Xθ, we can write ω = N[B] ∩ Ω, where B = C(X′X)^{-1}X′. Using (P7),

\[ \omega^\perp \cap \Omega = \mathcal{R}[P_\Omega B'] = \mathcal{R}[X(X'X)^{-1}C']. \]

    The final result is obtained by using 1 and (P7).

    Consider a special case C = (O Ik − q). Then ω = ℛ[X1], where X = (X1X2), X1 : n × q. We have the following results:

\[ \omega^\perp \cap \Omega = \mathcal{R}[(I_n - P_{X_1})X_2], \qquad P_{\omega^\perp \cap \Omega} = (I_n - P_{X_1})X_2 \{X_2'(I_n - P_{X_1})X_2\}^{-1} X_2'(I_n - P_{X_1}). \]

The expressions (2.11) for S_e and S_h in terms of S can be obtained from projection matrices based on

\[ \Omega = \mathcal{R}[X] = \mathcal{R}[1_n] + \mathcal{R}[(I_n - P_{1_n})X], \qquad \omega^\perp \cap \Omega = \mathcal{R}[(I_n - P_{1_n} - P_{(I_n - P_{1_n})X_1})X_2]. \]

    4. General linear hypothesis

In this section, we consider testing the general linear hypothesis

\[ H_g : C\Theta D = O, \tag{4.1} \]

against alternatives K_g : CΘD ≠ O under the multivariate linear model given by (2.1), where C is a given c × k matrix of rank c and D is a given p × d matrix of rank d. When C = (O I_{k−j}) and D = I_p, the hypothesis H_g becomes H : Θ_2 = O.

For the derivation of the LR test of (4.1), we can use the following conventional approach. If U = YD, then the rows of U are independent and normally distributed with the identical covariance matrix D′ΣD, and

\[ E(U) = X\Xi, \tag{4.2} \]

where Ξ = ΘD. The hypothesis (4.1) is expressed as

\[ H_g : C\Xi = O. \tag{4.3} \]

    Applying a general theory for testing Hg in (2.1), we have the LRC λ:

\[ \lambda^{2/n} = \Lambda = \frac{|S_e|}{|S_e + S_h|}, \tag{4.4} \]

where

\[ S_e = U'(I_n - P_X)U = D'Y'(I_n - P_X)YD, \]

and

\[ S_h = \{C(X'X)^{-1}X'U\}'\{C(X'X)^{-1}C'\}^{-1}C(X'X)^{-1}X'U = \{C(X'X)^{-1}X'YD\}'\{C(X'X)^{-1}C'\}^{-1}C(X'X)^{-1}X'YD. \]

Theorem 4.1 The statistic Λ in (4.4) is an LR statistic for testing (4.1) under (2.1). Further, under H_g, Λ ∼ Λ_d(c, n − k).

Proof. Let G = (G_1 G_2) be a p × p matrix such that G_1 = D, G_1′G_2 = O, and |G| ≠ 0. Consider the transformation from Y to (U V) = Y(G_1 G_2).

Then the rows of (U V) are independently normal with the same covariance matrix

\[ \Psi = G'\Sigma G = \begin{pmatrix} \Psi_{11} & \Psi_{12} \\ \Psi_{21} & \Psi_{22} \end{pmatrix}, \qquad \Psi_{12} : d \times (p - d), \]

and

\[ E[(U \ V)] = X\Theta(G_1 \ G_2) = X(\Xi \ \Delta), \qquad \Xi = \Theta G_1, \quad \Delta = \Theta G_2. \]

The conditional distribution of V given U is normal: the rows of V given U are independently normal with the same covariance matrix Ψ_{22·1} = Ψ_{22} − Ψ_{21}Ψ_{11}^{-1}Ψ_{12}, and

\[ E(V|U) = X\Delta + (U - X\Xi)\Gamma = X\Delta^* + U\Gamma, \]

where Δ* = Δ − ΞΓ and Γ = Ψ_{11}^{-1}Ψ_{12}. We see that the maximum likelihood of V given U does not depend on the hypothesis. Therefore, an LR statistic is obtained from the marginal distribution of U, which implies the required result.
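As an illustration of (4.4) (a Python/NumPy sketch with simulated data; C, D, and all dimensions are arbitrary choices, not from the chapter), the following code computes S_e, S_h, and Λ for a general linear hypothesis CΘD = O.

    import numpy as np

    rng = np.random.default_rng(3)
    n, p, k = 80, 4, 5
    c, d = 2, 3
    X = rng.standard_normal((n, k))
    C = rng.standard_normal((c, k))                 # c x k hypothesis matrix of rank c
    D = rng.standard_normal((p, d))                 # p x d hypothesis matrix of rank d
    Theta = rng.standard_normal((k, p))
    Y = X @ Theta + rng.standard_normal((n, p))

    XtX_inv = np.linalg.inv(X.T @ X)
    P_X = X @ XtX_inv @ X.T
    Theta_hat = XtX_inv @ X.T @ Y

    Se = D.T @ Y.T @ (np.eye(n) - P_X) @ Y @ D
    CTD = C @ Theta_hat @ D
    Sh = CTD.T @ np.linalg.inv(C @ XtX_inv @ C.T) @ CTD
    Lam = np.linalg.det(Se) / np.linalg.det(Se + Sh)    # ~ Lambda_d(c, n - k) under Hg
    print("Lambda =", Lam)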

    5. Additional information tests for response variables

    We consider a multivariate regression model with an intercept term x0 and k explanatory variables x1, …, xk as follows.

\[ Y = 1_n\theta' + X\Theta + E, \tag{5.1} \]

where Y and X are the observation matrices on y = (y_1, …, y_p)′ and x = (x_1, …, x_k)′. We assume that the error matrix E has the same property as in (2.1), and rank(1_n X) = k + 1. Our interest is to test a hypothesis H_{2·1} of no additional information of y_2 = (y_{q+1}, …, y_p)′ in the presence of y_1 = (y_1, …, y_q)′.

Along the partition of y into (y_1′, y_2′)′, let Y, θ, Θ, and Σ be partitioned as

\[ Y = (Y_1 \ Y_2), \qquad \Theta = (\Theta_1 \ \Theta_2), \qquad \theta = \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. \]

The conditional distribution of Y_2 given Y_1 is normal with mean

\[ E(Y_2|Y_1) = 1_n\theta_2' + X\Theta_2 + (Y_1 - 1_n\theta_1' - X\Theta_1)\Sigma_{11}^{-1}\Sigma_{12} = 1_n\tilde{\theta}_2' + X\tilde{\Theta}_2 + Y_1\Sigma_{11}^{-1}\Sigma_{12}, \tag{5.2} \]

and the conditional covariance matrix is expressed as

\[ \operatorname{Var}[\operatorname{vec}(Y_2)|Y_1] = \Sigma_{22 \cdot 1} \otimes I_n, \tag{5.3} \]

where Σ_{22·1} = Σ_{22} − Σ_{21}Σ_{11}^{-1}Σ_{12}, and

\[ \tilde{\theta}_2' = \theta_2' - \theta_1'\Sigma_{11}^{-1}\Sigma_{12}, \qquad \tilde{\Theta}_2 = \Theta_2 - \Theta_1\Sigma_{11}^{-1}\Sigma_{12}. \]

Here, for an n × p matrix Y = (y_1, …, y_p), vec(Y) denotes the np-vector (y_1′, …, y_p′)′. Now we define the hypothesis H_{2·1} as

\[ H_{2 \cdot 1} : \Theta_2 = \Theta_1\Sigma_{11}^{-1}\Sigma_{12} \ \Leftrightarrow \ \tilde{\Theta}_2 = O. \tag{5.4} \]

    The hypothesis H2 ⋅ 1 means that y2 after removing the effects of y1 does not depend on x. In other words, the relationship between y2 and x can be described by the relationship between y1 and x. In this sense, y2 is redundant in the relationship between y and x.

The LR criterion for testing the hypothesis H_{2·1} against the alternatives K_{2·1} : Θ̃_2 ≠ O can be obtained through the following steps.

(D1) The density function of Y = (Y_1 Y_2) can be expressed as the product of the marginal density function of Y_1 and the conditional density function of Y_2 given Y_1. Note that the density functions of Y_1 under H_{2·1} and K_{2·1} are the same.

(D2) The spaces spanned by each column of E(Y_2|Y_1) are the same; let the spaces under K_{2·1} and H_{2·1} be denoted by Ω and ω, respectively. Then

\[ \Omega = \mathcal{R}[(1_n \ Y_1 \ X)], \qquad \omega = \mathcal{R}[(1_n \ Y_1)], \]

and dim(Ω) = q + k + 1, dim(ω) = q + 1.

(D3) The likelihood ratio criterion λ is expressed as

\[ \lambda^{2/n} = \Lambda = \frac{|S_\Omega|}{|S_\omega|} = \frac{|S_\Omega|}{|S_\Omega + (S_\omega - S_\Omega)|}, \]

where S_Ω = Y_2′(I_n − P_Ω)Y_2 and S_ω = Y_2′(I_n − P_ω)Y_2.

(D4) Note that E(Y_2|Y_1)′(P_Ω − P_ω)E(Y_2|Y_1) = O under H_{2·1}. The conditional distribution of Λ given Y_1 under H_{2·1} is Λ_{p−q}(k, n − q − k − 1), and hence the unconditional distribution of Λ under H_{2·1} is also Λ_{p−q}(k, n − q − k − 1).

Note that the Λ statistic is defined through Y_2′(I_n − P_Ω)Y_2 and Y_2′(P_Ω − P_ω)Y_2, which involve n × n matrices. We try to write these statistics in terms of the SSP matrix of (y′, x′)′ defined by

\[ S = \sum_{i=1}^n \begin{pmatrix} y_i - \bar{y} \\ x_i - \bar{x} \end{pmatrix} \begin{pmatrix} y_i - \bar{y} \\ x_i - \bar{x} \end{pmatrix}' = \begin{pmatrix} S_{yy} & S_{yx} \\ S_{xy} & S_{xx} \end{pmatrix}, \]

where ȳ and x̄ are the sample mean vectors. Along the partition y = (y_1′, y_2′)′, we partition S as

\[ S = \begin{pmatrix} S_{11} & S_{12} & S_{1x} \\ S_{21} & S_{22} & S_{2x} \\ S_{x1} & S_{x2} & S_{xx} \end{pmatrix}. \]

We can show that

\[ S_\omega = S_{22 \cdot 1} = S_{22} - S_{21}S_{11}^{-1}S_{12}, \qquad S_\Omega = S_{22 \cdot 1x} = S_{22 \cdot x} - S_{21 \cdot x}S_{11 \cdot x}^{-1}S_{12 \cdot x}. \]

    The first result is obtained by using

\[ \omega = \mathcal{R}[1_n] + \mathcal{R}[(I_n - P_{1_n})Y_1]. \]

    The second result is obtained by using

\[ \Omega = \mathcal{R}[1_n] + \mathcal{R}[(\tilde{Y}_1 \ \tilde{X})] = \mathcal{R}[1_n] + \mathcal{R}[(I_n - P_{1_n})X] + \mathcal{R}[(I_n - P_{\tilde{X}})(I_n - P_{1_n})Y_1], \]

where Ỹ_1 = (I_n − P_{1_n})Y_1 and X̃ = (I_n − P_{1_n})X.

    Summarizing the above results, we have the following theorem.

Theorem 5.1 In the multivariate regression model (5.1), consider testing the hypothesis H_{2·1} in (5.4) against K_{2·1}. Then the LR criterion λ is given by

\[ \lambda^{2/n} = \Lambda = \frac{|S_{22 \cdot 1x}|}{|S_{22 \cdot 1}|}, \]

whose null distribution is Λ_{p−q}(k, n − q − k − 1).
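A direct way to compute this statistic is from the partitioned SSP matrix, as in the following Python/NumPy sketch (data simulated so that H_{2·1} holds; all names and dimensions are illustrative).

    import numpy as np

    rng = np.random.default_rng(4)
    n, q, p, k = 100, 2, 4, 3            # y1: first q responses, y2: remaining p - q, x: k regressors
    X = rng.standard_normal((n, k))
    Y1 = X @ rng.standard_normal((k, q)) + rng.standard_normal((n, q))
    # y2 depends on x only through y1, so H_{2.1} holds
    Y2 = Y1 @ rng.standard_normal((q, p - q)) + rng.standard_normal((n, p - q))

    Z = np.hstack([Y1, Y2, X])
    Zc = Z - Z.mean(axis=0)
    S = Zc.T @ Zc

    def schur(S, a, b):
        """S_{aa.b} = S_aa - S_ab S_bb^{-1} S_ba for index lists a and b."""
        Saa = S[np.ix_(a, a)]
        Sab = S[np.ix_(a, b)]
        Sbb = S[np.ix_(b, b)]
        return Saa - Sab @ np.linalg.inv(Sbb) @ Sab.T

    i1 = list(range(q))                  # indices of y1
    i2 = list(range(q, p))               # indices of y2
    ix = list(range(p, p + k))           # indices of x
    S22_1 = schur(S, i2, i1)             # S_{22.1}
    S22_1x = schur(S, i2, i1 + ix)       # S_{22.1x}
    Lam = np.linalg.det(S22_1x) / np.linalg.det(S22_1)   # ~ Lambda_{p-q}(k, n-q-k-1) under H_{2.1}
    print("Lambda =", Lam)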

Note that S_{22·1} can be decomposed as

\[ S_{22 \cdot 1} = S_{22 \cdot 1x} + S_{2x \cdot 1}S_{xx \cdot 1}^{-1}S_{x2 \cdot 1}. \]

This decomposition is obtained by expressing S_{22·1x} in terms of S_{22·1}, S_{2x·1}, S_{xx·1}, and S_{x2·1} by using the inverse formula

\[ \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix}^{-1} = \begin{pmatrix} H_{11}^{-1} & O \\ O & O \end{pmatrix} + \begin{pmatrix} -H_{11}^{-1}H_{12} \\ I \end{pmatrix} H_{22 \cdot 1}^{-1} \begin{pmatrix} -H_{21}H_{11}^{-1} & I \end{pmatrix}. \tag{5.5} \]

    The decomposition is expressed as

\[ S_{22 \cdot 1} - S_{22 \cdot 1x} = S_{2x \cdot 1}S_{xx \cdot 1}^{-1}S_{x2 \cdot 1}. \tag{5.6} \]

The result may also be obtained by the following algebraic method. We have

\[ S_{22 \cdot 1} - S_{22 \cdot 1x} = Y_2'(P_\Omega - P_\omega)Y_2 = Y_2' P_{\omega^\perp \cap \Omega} Y_2, \]

    and

\[ \Omega = \mathcal{R}[1_n] + \mathcal{R}[(\tilde{Y}_1 \ \tilde{X})], \qquad \omega = \mathcal{R}[1_n] + \mathcal{R}[\tilde{Y}_1]. \]

    Therefore,

\[ \omega^\perp \cap \Omega = \mathcal{R}[(I_n - P_{1_n} - P_{\tilde{Y}_1})(\tilde{Y}_1 \ \tilde{X})] = \mathcal{R}[(I_n - P_{1_n} - P_{\tilde{Y}_1})\tilde{X}], \]

which gives an expression for P_{ω^⊥ ∩ Ω} by using Theorem 3.2 (1). This leads to (5.6).

    6. Tests in discriminant analysis

We consider q p-variate normal populations with common covariance matrix Σ, the ith population having mean vector θ_i. Suppose that a sample of size n_i is available from the ith population, and let y_{ij} be the jth observation from the ith population. The observation matrix for all the observations is expressed as

\[ Y = (y_{11}, \ldots, y_{1n_1}, y_{21}, \ldots, y_{q1}, \ldots, y_{qn_q})'. \tag{6.1} \]

    It is assumed that yij are independent, and

\[ y_{ij} \sim N_p(\theta_i, \Sigma), \quad j = 1, \ldots, n_i; \ i = 1, \ldots, q. \tag{6.2} \]

    The model is expressed as

\[ Y = A\Theta + E, \tag{6.3} \]

where

\[ A = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 0 & 1_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1_{n_q} \end{pmatrix}, \qquad \Theta = \begin{pmatrix} \theta_1' \\ \theta_2' \\ \vdots \\ \theta_q' \end{pmatrix}. \]

Here, the error matrix E has the same property as in (2.1).

First, we consider testing

\[ H : \theta_1 = \cdots = \theta_q = \theta, \tag{6.4} \]

against alternatives K : θ_i ≠ θ_j for some i ≠ j. The hypothesis can be expressed as

\[ H : C\Theta = O, \qquad C = (I_{q-1}, -1_{q-1}). \tag{6.5} \]

The tests including the LRC are based on three basic statistics, the within-group SSP matrix W, the between-group SSP matrix B, and the total SSP matrix T, given by

\[ W = \sum_{i=1}^q (n_i - 1)S_i, \qquad B = \sum_{i=1}^q n_i(\bar{y}_i - \bar{y})(\bar{y}_i - \bar{y})', \qquad T = B + W = \sum_{i=1}^q \sum_{j=1}^{n_i} (y_{ij} - \bar{y})(y_{ij} - \bar{y})', \tag{6.6} \]

where ȳ_i and S_i are the mean vector and sample covariance matrix of the ith population, ȳ is the total mean vector defined by ȳ = (1/n)∑_{i=1}^q n_i ȳ_i, and n = ∑_{i=1}^q n_i. In general, W and B are independently distributed as a Wishart distribution W_p(n − q, Σ) and a noncentral Wishart distribution W_p(q − 1, Σ; Δ), respectively, where

\[ \Delta = \sum_{i=1}^q n_i(\theta_i - \bar{\theta})(\theta_i - \bar{\theta})', \]

where θ̄ = (1/n)∑_{i=1}^q n_i θ_i. Then, the following theorem is well known.

Theorem 6.1 Let λ = Λ^{n/2} be the LRC for testing H in (6.4). Then, Λ is expressed as

\[ \Lambda = \frac{|W|}{|W + B|} = \frac{|W|}{|T|}, \tag{6.7} \]

where W, B, and T are given in (6.6). Further, under H, the statistic Λ is distributed as a lambda distribution Λ_p(q − 1, n − q).

    Now we shall show Theorem 6.1 by an algebraic method. It is easy to see that

\[ \Omega = \mathcal{R}[A], \qquad \omega = N[C(A'A)^{-1}A'] \cap \Omega = \mathcal{R}[1_n]. \]

The last equality can also be checked from the fact that, under H,

\[ E(Y) = A 1_q \theta' = 1_n \theta'. \]

    We have

\[ T = Y'(I_n - P_{1_n})Y = Y'(I_n - P_A)Y + Y'(P_A - P_{1_n})Y = W + B. \]

    Further, it is easily checked that

1. (I_n − P_A)² = I_n − P_A, (P_A − P_{1_n})² = P_A − P_{1_n}.

2. (I_n − P_A)(P_A − P_{1_n}) = O.

3. f_e = n − dim[ℛ[A]] = tr(I_n − P_A) = n − q, and f_h = dim[ℛ[A]] − dim[ℛ[1_n]] = tr(P_A − P_{1_n}) = q − 1.
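For concreteness, the following Python/NumPy sketch (simulated data with a common mean, so H in (6.4) holds; group sizes are arbitrary) computes W, B, T, and the statistic Λ = |W|/|T| of Theorem 6.1.

    import numpy as np

    rng = np.random.default_rng(5)
    p, q = 3, 4                                    # p variables, q groups
    ns = [15, 20, 18, 17]
    groups = [rng.standard_normal((ni, p)) for ni in ns]   # common mean 0: H of (6.4) holds

    ybars = [g.mean(axis=0) for g in groups]
    grand = np.vstack(groups).mean(axis=0)

    W = sum((g - m).T @ (g - m) for g, m in zip(groups, ybars))                 # within-group SSP
    B = sum(ni * np.outer(m - grand, m - grand) for ni, m in zip(ns, ybars))    # between-group SSP
    T = W + B                                                                   # total SSP

    Lam = np.linalg.det(W) / np.linalg.det(T)      # ~ Lambda_p(q - 1, n - q) under H
    print("Lambda =", Lam)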

Related to the test of H, we are interested in whether a subset of the variables y_1, …, y_p is sufficient for discriminant analysis, or whether the set of remaining variables has no additional information, i.e., is redundant. Without loss of generality, we consider the sufficiency of the subvector y_1 = (y_1, …, y_k)′ of y, or the redundancy of the remaining vector y_2 = (y_{k+1}, …, y_p)′. Consider testing

\[ H_{2 \cdot 1} : \theta_{1;2 \cdot 1} = \cdots = \theta_{q;2 \cdot 1} = \theta_{2 \cdot 1}, \tag{6.8} \]

    where

\[ \theta_i = \begin{pmatrix} \theta_{i;1} \\ \theta_{i;2} \end{pmatrix}, \qquad \theta_{i;1} : k \times 1, \quad i = 1, \ldots, q, \]

    and

\[ \theta_{i;2 \cdot 1} = \theta_{i;2} - \Sigma_{21}\Sigma_{11}^{-1}\theta_{i;1}, \quad i = 1, \ldots, q. \]

The testing problem was considered by [11]. The hypothesis can be formulated in terms of Mahalanobis distance and discriminant functions. For the details, see [12, 13]. To obtain a likelihood ratio for H_{2·1}, we partition the observation matrix as

\[ Y = (Y_1 \ Y_2), \qquad Y_1 : n \times k. \]

Then the conditional distribution of Y_2 given Y_1 is normal such that the rows of Y_2 are independently distributed with covariance matrix Σ_{22·1} = Σ_{22} − Σ_{21}Σ_{11}^{-1}Σ_{12}, and the conditional mean is given by

\[ E(Y_2|Y_1) = A\Theta_{2 \cdot 1} + Y_1\Sigma_{11}^{-1}\Sigma_{12}, \tag{6.9} \]

where Θ_{2·1} = (θ_{1;2·1}, …, θ_{q;2·1})′. The LRC for H_{2·1} can be obtained by use of the conditional distribution, following the steps (D1)–(D4) in Section 5. In fact, the spaces spanned by each column of E(Y_2|Y_1) are the same; let the spaces under K_{2·1} and H_{2·1} be denoted by Ω and ω, respectively. Then

\[ \Omega = \mathcal{R}[(A \ Y_1)], \qquad \omega = \mathcal{R}[(1_n \ Y_1)], \]

dim(Ω) = q + k, and dim(ω) = k + 1. The likelihood ratio criterion λ can be expressed as

\[ \lambda^{2/n} = \Lambda = \frac{|S_\Omega|}{|S_\omega|} = \frac{|S_\Omega|}{|S_\Omega + (S_\omega - S_\Omega)|}, \]

where S_Ω = Y_2′(I_n − P_Ω)Y_2 and S_ω = Y_2′(I_n − P_ω)Y_2. We express the LRC in terms of W, B, and T. Let us partition W, B, and T as

\[ W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \qquad T = \begin{pmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{pmatrix}, \tag{6.10} \]

where W_{12} : k × (p − k), B_{12} : k × (p − k), and T_{12} : k × (p − k). Noting that P_Ω = P_A + P_{(I_n − P_A)Y_1}, we have

\[ S_\Omega = Y_2'\{I_n - P_A - (I_n - P_A)Y_1(Y_1'(I_n - P_A)Y_1)^{-1}Y_1'(I_n - P_A)\}Y_2 = W_{22} - W_{21}W_{11}^{-1}W_{12} = W_{22 \cdot 1}. \]

Similarly, noting that P_ω = P_{1_n} + P_{(I_n − P_{1_n})Y_1}, we have

\[ S_\omega = Y_2'\{I_n - P_{1_n} - (I_n - P_{1_n})Y_1(Y_1'(I_n - P_{1_n})Y_1)^{-1}Y_1'(I_n - P_{1_n})\}Y_2 = T_{22} - T_{21}T_{11}^{-1}T_{12} = T_{22 \cdot 1}. \]

Theorem 6.2 Suppose that the observation matrix Y in (6.1) consists of samples from N_p(θ_i, Σ), i = 1, …, q. Then the likelihood ratio criterion λ for the hypothesis H_{2·1} in (6.8) is given by

\[ \lambda = \left( \frac{|W_{22 \cdot 1}|}{|T_{22 \cdot 1}|} \right)^{n/2}, \]

where W and T are given by (6.6). Further, under H_{2·1},

\[ \frac{|W_{22 \cdot 1}|}{|T_{22 \cdot 1}|} \sim \Lambda_{p-k}(q - 1, n - q - k). \]

Proof. We consider the conditional distributions of W_{22·1} and T_{22·1} given Y_1 by using Theorem 2.3, and see also that they do not depend on Y_1. We have seen that

\[ W_{22 \cdot 1} = Y_2'Q_1Y_2, \qquad Q_1 = I_n - P_A - P_{(I_n - P_A)Y_1}. \]

It is easy to see that Q_1² = Q_1, rank(Q_1) = tr Q_1 = n − q − k, Q_1A = O, Q_1Y_1 = O, and

\[ E(Y_2|Y_1)'Q_1E(Y_2|Y_1) = O. \]

This implies that W_{22·1}|Y_1 ∼ W_{p−k}(n − q − k, Σ_{22·1}), and hence W_{22·1} ∼ W_{p−k}(n − q − k, Σ_{22·1}). For T_{22·1}, we have

\[ T_{22 \cdot 1} = Y_2'Q_2Y_2, \qquad Q_2 = I_n - P_{1_n} - P_{(I_n - P_{1_n})Y_1}, \]

    and hence

\[ T_{22 \cdot 1} - W_{22 \cdot 1} = Y_2'(Q_2 - Q_1)Y_2. \]

Similarly, Q_2 is idempotent. Using P_{1_n}P_A = P_AP_{1_n} = P_{1_n}, we have Q_1Q_2 = Q_2Q_1 = Q_1, and hence

\[ (Q_2 - Q_1)^2 = Q_2 - Q_1, \qquad Q_1(Q_2 - Q_1) = O. \]

    Further, under H2 ⋅ 1,

\[ E(Y_2|Y_1)'(Q_2 - Q_1)E(Y_2|Y_1) = O. \]

Hence, conditionally on Y_1, T_{22·1} − W_{22·1} and W_{22·1} are independently distributed as W_{p−k}(q − 1, Σ_{22·1}) and W_{p−k}(n − q − k, Σ_{22·1}), and these conditional distributions do not depend on Y_1, which gives the stated Λ distribution.
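A numerical sketch of Theorem 6.2 (Python/NumPy; the data are simulated so that the group means differ only in the first k variables and Σ = I, hence H_{2·1} holds; all names are illustrative):

    import numpy as np

    rng = np.random.default_rng(6)
    p, kdim, q = 4, 2, 3                 # y1: first kdim variables, y2: the remaining p - kdim
    ns = [25, 30, 28]
    # group means differ only in y1 and Sigma = I, so H_{2.1} in (6.8) holds
    means = [np.concatenate([rng.standard_normal(kdim), np.zeros(p - kdim)]) for _ in range(q)]
    groups = [m + rng.standard_normal((ni, p)) for m, ni in zip(means, ns)]

    ybars = [g.mean(axis=0) for g in groups]
    grand = np.vstack(groups).mean(axis=0)
    W = sum((g - m).T @ (g - m) for g, m in zip(groups, ybars))
    B = sum(ni * np.outer(m - grand, m - grand) for ni, m in zip(ns, ybars))
    T = W + B

    def block_resid(M, k):
        """M_{22.1} = M22 - M21 M11^{-1} M12 for the leading k x k block M11."""
        return M[k:, k:] - M[k:, :k] @ np.linalg.inv(M[:k, :k]) @ M[:k, k:]

    Lam = np.linalg.det(block_resid(W, kdim)) / np.linalg.det(block_resid(T, kdim))
    # ~ Lambda_{p-kdim}(q - 1, n - q - kdim) under H_{2.1}
    print("Lambda =", Lam)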

    7. General multivariate linear model

In this section, we consider a general multivariate linear model as follows. Let Y be an n × p observation matrix whose rows are independently distributed as a p-variate normal distribution with a common covariance matrix Σ. Suppose that the mean of Y is given as

\[ E(Y) = A\Theta X', \tag{7.1} \]

where A is a given n × k matrix of rank k, X is a p × q matrix of rank q, and Θ is a k × q unknown parameter matrix. For a motivation of (7.1), consider the case where a single variable y is measured at p time points t_1, …, t_p (or under p different conditions) on n subjects chosen at random from a group. Suppose that we denote the variable y at time point t_j by y_j. Let the observations y_{i1}, …, y_{ip} of the ith subject be denoted by

\[ y_i = (y_{i1}, \ldots, y_{ip})', \quad i = 1, \ldots, n. \]

    If we consider a polynomial regression of degree q − 1 of y on the time variable t, then

\[ E(y_i) = X\theta, \]

    where

\[ X = \begin{pmatrix} 1 & t_1 & \cdots & t_1^{q-1} \\ \vdots & \vdots & & \vdots \\ 1 & t_p & \cdots & t_p^{q-1} \end{pmatrix}, \qquad \theta = \begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_q \end{pmatrix}. \]

If there are k different groups and each group has a polynomial regression of degree q − 1 of y, we have a model given by (7.1). From this motivation, the model (7.1) is also called a growth curve model. For details, see [14].

Now, let us consider deriving the LRC for a general linear hypothesis

\[ H_g : C\Theta D = O, \tag{7.2} \]

against alternatives K_g : CΘD ≠ O. Here, C is a given c × k matrix of rank c, and D is a given q × d matrix of rank d. This problem was discussed by [1517]. Here, we obtain the LRC by reducing the problem to that of obtaining the LRC for a general linear hypothesis in a multivariate linear model. In order to relate the model (7.1) to a multivariate linear model, consider the transformation from Y to (U V):

\[ (U \ V) = YG, \qquad G = (G_1 \ G_2), \tag{7.3} \]

where G_1 = X(X′X)^{-1}, G_2 = X̃, and X̃ is a p × (p − q) matrix satisfying X̃′X = O and X̃′X̃ = I_{p−q}. Then, the rows of (U V) are independently distributed as p-variate normal distributions with means

\[ E[(U \ V)] = (A\Theta \ O), \]

and the common covariance matrix

\[ \Psi = G'\Sigma G = \begin{pmatrix} G_1'\Sigma G_1 & G_1'\Sigma G_2 \\ G_2'\Sigma G_1 & G_2'\Sigma G_2 \end{pmatrix} = \begin{pmatrix} \Psi_{11} & \Psi_{12} \\ \Psi_{21} & \Psi_{22} \end{pmatrix}. \]

    This transformation can be regarded as one from y = (y1, …, yp)′ to a q-variate main variable u = (u1, …, uq)′ and a (p − q)-variate auxiliary variable v = (v1, …, vp − q)′. The model (7.1) is equivalent to the following joint model of two components:

1. The conditional distribution of U given V is

\[ U|V \sim N_{n \times q}(A^*\Xi, \Psi_{11 \cdot 2}). \tag{7.4} \]

2. The marginal distribution of V is

\[ V \sim N_{n \times (p-q)}(O, \Psi_{22}), \tag{7.5} \]

where

\[ A^* = (A \ V), \qquad \Xi = \begin{pmatrix} \Theta \\ \Gamma \end{pmatrix}, \qquad \Gamma = \Psi_{22}^{-1}\Psi_{21}, \qquad \Psi_{11 \cdot 2} = \Psi_{11} - \Psi_{12}\Psi_{22}^{-1}\Psi_{21}. \]

Before we obtain the LRC, we first consider the MLEs in (7.1). Applying the general theory of the multivariate linear model to (7.4) and (7.5), the MLEs of Ξ, Ψ_{11·2}, and Ψ_{22} are given by

\[ \hat{\Xi} = (A^{*\prime}A^*)^{-1}A^{*\prime}U, \qquad n\hat{\Psi}_{11 \cdot 2} = U'(I_n - P_{A^*})U, \qquad n\hat{\Psi}_{22} = V'V. \tag{7.6} \]

Let

\[ S = Y'(I_n - P_A)Y, \qquad W = G'SG = (U \ V)'(I_n - P_A)(U \ V), \]

and partition W as

\[ W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}, \qquad W_{12} : q \times (p - q). \]

    Theorem 7.1 For an n × p observation matrix Y, assume a general multivariate linear model given by (7.1). Then:

1. The MLE Θ̂ of Θ is given by

\[ \hat{\Theta} = (A'A)^{-1}A'YS^{-1}X(X'S^{-1}X)^{-1}. \]

2. The MLE Ψ̂_{11·2} of Ψ_{11·2} is given by

\[ n\hat{\Psi}_{11 \cdot 2} = W_{11 \cdot 2} = (X'S^{-1}X)^{-1}. \]

Proof. The MLE of Ξ is Ξ̂ = (A*′A*)^{-1}A*′U. The inverse formula (see (5.5)) gives

\[ Q = (A^{*\prime}A^*)^{-1} = \begin{pmatrix} (A'A)^{-1} & O \\ O & O \end{pmatrix} + \begin{pmatrix} -(A'A)^{-1}A'V \\ I_{p-q} \end{pmatrix} \{V'(I_n - P_A)V\}^{-1} \begin{pmatrix} -(A'A)^{-1}A'V \\ I_{p-q} \end{pmatrix}' = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}. \]

Therefore, we have

\[ \hat{\Theta} = (Q_{11}A' + Q_{12}V')U = (A'A)^{-1}A'YG_1 - (A'A)^{-1}A'YG_2(G_2'SG_2)^{-1}G_2'SG_1. \]

Using

\[ G_2(G_2'SG_2)^{-1}G_2' = S^{-1} - S^{-1}G_1(G_1'S^{-1}G_1)^{-1}G_1'S^{-1}, \]

we obtain 1. For the derivation of 2, let B = (I_n − P_A)V. Then, using P_{A*} = P_A + P_B, the first expression in 2 is obtained. Similarly, the second expression in 2 is obtained.
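The MLE formula of Theorem 7.1 is easy to evaluate numerically. The following Python/NumPy sketch (a two-group growth curve with a quadratic time trend; all dimensions and names are illustrative) computes Θ̂.

    import numpy as np

    rng = np.random.default_rng(7)
    n, p, k, q = 40, 5, 2, 3
    tp = np.linspace(0.0, 1.0, p)
    X = np.vander(tp, q, increasing=True)            # p x q within-individual (time) design
    A = np.kron(np.eye(k), np.ones((n // k, 1)))     # n x k between-individual (group) design
    Theta = rng.standard_normal((k, q))              # true group-wise polynomial coefficients
    Y = A @ Theta @ X.T + rng.standard_normal((n, p))

    P_A = A @ np.linalg.inv(A.T @ A) @ A.T
    S = Y.T @ (np.eye(n) - P_A) @ Y
    S_inv = np.linalg.inv(S)
    # MLE of Theta from Theorem 7.1 (1)
    Theta_hat = np.linalg.inv(A.T @ A) @ A.T @ Y @ S_inv @ X @ np.linalg.inv(X.T @ S_inv @ X)
    print(Theta_hat)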

Theorem 7.2 Let λ = Λ^{n/2} be the LRC for testing the hypothesis (7.2) in the generalized multivariate linear model (7.1). Then,

\[ \Lambda = \frac{|S_e|}{|S_e + S_h|}, \]

where

\[ S_e = D'(X'S^{-1}X)^{-1}D, \qquad S_h = (C\hat{\Theta}D)'(CRC')^{-1}(C\hat{\Theta}D), \]

and

\[ R = (A'A)^{-1} + (A'A)^{-1}A'YS^{-1}\{S - X(X'S^{-1}X)^{-1}X'\}S^{-1}Y'A(A'A)^{-1}. \]

Here Θ̂ is given in Theorem 7.1 (1). Further, the null distribution of Λ is Λ_d(c, n − k − (p − q)).

    Proof. The test of Hg in (7.2) against alternatives Kg is equivalent to testing

\[ H_g : C^*\Xi D = O \tag{7.7} \]

under the conditional model (7.4), where C* = (C O). Since the distribution of V does not depend on H_g, the LR test under the conditional model is the LR test under the unconditional model. Using the general result for a general linear hypothesis given in Theorem 4.1, we obtain

\[ \Lambda = \frac{|\tilde{S}_e|}{|\tilde{S}_e + \tilde{S}_h|}, \]

where

\[ \tilde{S}_e = D'U'\{I_n - A^*(A^{*\prime}A^*)^{-1}A^{*\prime}\}UD, \qquad \tilde{S}_h = (C^*\hat{\Xi}D)'\{C^*(A^{*\prime}A^*)^{-1}C^{*\prime}\}^{-1}(C^*\hat{\Xi}D). \]

By reductions similar to those for the MLEs, it is seen that S̃_e = S_e and S̃_h = S_h. This completes the proof.
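Continuing the previous sketch, the LRC of Theorem 7.2 can be computed as follows (Python/NumPy; the data are simulated with equal growth curves in the two groups, so H_g holds for the C and D chosen below; everything here is an illustrative assumption).

    import numpy as np

    rng = np.random.default_rng(8)
    n, p, k, q = 40, 5, 2, 3
    tp = np.linspace(0.0, 1.0, p)
    X = np.vander(tp, q, increasing=True)                 # p x q within-individual design
    A = np.kron(np.eye(k), np.ones((n // k, 1)))          # n x k group indicator matrix
    Theta = np.vstack([np.array([1.0, 0.5, -0.2])] * k)   # equal curves in both groups
    Y = A @ Theta @ X.T + rng.standard_normal((n, p))

    C = np.array([[1.0, -1.0]])                           # compare the two groups (c = 1)
    D = np.eye(q)                                         # all q polynomial coefficients (d = q)

    AtA_inv = np.linalg.inv(A.T @ A)
    P_A = A @ AtA_inv @ A.T
    S = Y.T @ (np.eye(n) - P_A) @ Y
    S_inv = np.linalg.inv(S)
    XtSX_inv = np.linalg.inv(X.T @ S_inv @ X)

    Theta_hat = AtA_inv @ A.T @ Y @ S_inv @ X @ XtSX_inv
    R = AtA_inv + AtA_inv @ A.T @ Y @ S_inv @ (S - X @ XtSX_inv @ X.T) @ S_inv @ Y.T @ A @ AtA_inv

    Se = D.T @ XtSX_inv @ D
    CTD = C @ Theta_hat @ D
    Sh = CTD.T @ np.linalg.inv(C @ R @ C.T) @ CTD
    Lam = np.linalg.det(Se) / np.linalg.det(Se + Sh)      # ~ Lambda_d(c, n - k - (p - q)) under Hg
    print("Lambda =", Lam)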

    8. Concluding remarks

In this chapter, we have discussed LRCs in the multivariate linear model, focusing on the role of projection matrices. The testing problems considered involve hypotheses on selection of variables, or no additional information of a set of variables, in addition to a typical linear hypothesis. It may be noted that various LRCs and their distributions are obtained by algebraic methods.

We have not discussed LRCs for the hypothesis of selection of variables in canonical correlation analysis, or for dimensionality in the multivariate linear model. Some results for these problems can be found in [3, 18].

In multivariate analysis, there are some other test criteria such as the Lawley-Hotelling trace criterion and the Bartlett-Nanda-Pillai trace criterion. For the testing problems treated in this chapter, it is possible to propose such criteria as in [12].

    The LRCs for tests of no additional information of a set of variables will be useful in selection of variables. For example, it is possible to propose model selection criteria such as AIC (see [19]).

    Acknowledgments

The author wishes to thank Dr. Tetsuro Sakurai for his valuable comments on the first draft. The author's research is partially supported by the Ministry of Education, Science, Sports, and Culture, a Grant-in-Aid for Scientific Research (C), no. 25330038, 2013–2015.

    © 2016 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
