Comparison of Methodologies for Analysis of Longitudinal Data Using MATLAB



Introduction
In several areas of scientific knowledge there is a need to study the behavior of one or more variables using data generated by repeated measurements of the same observation unit over time or across a spatial region. For this reason, many experiments are designed so that various treatments are applied to the same plot at different times, or a single treatment is applied to an experimental unit and a characteristic, or a set of characteristics, is measured on more than one occasion [Khattree & Naik, 2000]. Castro and Riboldi [Castro & Riboldi, 2005] define data collected under these kinds of experimental setups as repeated measures. More specifically, they assert that "repeated measures is understood as the data generated by repeatedly observing a number of investigation units under different conditions of evaluation, assuming that the units of investigation are a random sample of a population of interest". When analyzing repeated measures data, the lack of independence between observations must be taken into account. A high degree of correlation is expected between data collected on the same observation unit over time, and there is usually more variability in the measurements between subjects than within a given subject. A very common type of repeated measures is longitudinal data, i.e., repeated measures where the observations within units of investigation were not, or could not have been, randomly assigned to the different conditions of evaluation, usually time or position in space. There are basically two paths for the analysis of longitudinal data: univariate analysis, which requires a rigid covariance structure as a precondition, and multivariate analysis, which, despite being more flexible, is less efficient at detecting significant differences than the univariate methodology.
In Advances in Longitudinal Data Analysis [Fitzmaurice et al., 2009], Fitzmaurice comments that, despite the advances made in statistical methodology over the last 30 years, there has been a lag between recent developments and their widespread application to substantive problems. He adds that one reason the advances have been slow to move into the mainstream is their limited implementation in widely available standard computer software. In this context, this work proposes a single, simple computational implementation to solve a great number of practical problems in the analysis of longitudinal data, through the decomposition of the error sum of squares of polynomial regression models. In light of the above, MATLAB looks like an ideal computational tool for implementing and disseminating this kind of statistical analysis and linear models. First, its matrix structure suits linear models well, which facilitates the construction of models for univariate and multivariate analysis. Second, because it is a widely used tool, the models can be implemented, modified and reused in different situations by the many users who have access to the MATLAB community on the internet. This avoids the need to acquire expensive software with a black-box structure.

Review
For the analysis of experiments with longitudinal data, two methods are traditionally used. The first is univariate analysis, or the Univariate Profile Model, in which longitudinal data are treated as observations made on subdivisions of the plots; it usually requires that the variance of the response be constant across the occasions of evaluation and that the covariances between responses on different occasions be equal. The second is multivariate analysis, or the Multivariate Profile Model, which allows these variances and covariances to be distinct. Despite its apparent versatility with respect to the variance and covariance matrix, the multivariate model is less attractive because its results are hard to interpret and its estimates are not consistent. The univariate profile model gives consistent estimates and should be used whenever its assumptions are met. Otherwise, the multivariate profile model is a viable alternative [Castro & Riboldi, 2005; Johnson & Wichern, 1998]. Using univariate analysis in split-plot designs, with time regarded as a sub-plot, may cause problems because this design presupposes that the covariance matrix meets the condition of sphericity, which does not always happen. The literature reports that repeated measures on the same experimental unit over time are in general correlated, and that these correlations are greater for closer times [Malheiros, 1999]. Xavier [Xavier, 2000] asserts that a sufficient condition for the validity of the F test of the analysis of variance of the sub-plots, for the time factor and the interaction time*treatments, is that the covariance matrix has the so-called compound symmetry form. Compound symmetry occurs when the variance and covariance matrix of the t occasions may be expressed as $\Sigma = \sigma^2 I_t + \sigma_1^2 J_t$, that is, with $\sigma^2 + \sigma_1^2$ on the diagonal and $\sigma_1^2$ elsewhere, where $I_t$ is the $t \times t$ identity matrix, $J_t$ is the $t \times t$ matrix of ones, $\sigma^2$ is the variance of the sub-plot (within-subjects) and $\sigma_1^2$ is the variance of the plot (among-subjects).
The compound symmetry condition implies that the random variables are equally correlated and have equal variances on the different occasions. A more general condition on the Σ matrix is described by Huynh and Feldt [Huynh & Feldt, 1970]. This condition, called the Huynh-Feldt (H-F) or sphericity (circularity) condition, specifies that the elements of the Σ matrix be expressed, for some λ > 0, as $\sigma_{ij} = (\sigma_{ii} + \sigma_{jj})/2 - \lambda$ for $i \neq j$, where λ is the difference between the mean of the variances and the mean of the covariances.
The H-F condition is necessary and sufficient for the F test in the usual analysis of variance in split-plot in time to be valid. This condition is equivalent to specifying that the variances of the difference between pairs of errors are equal, and if the variances are all equal then the condition is equivalent to compound symmetry [Xavier, 2000].
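The relation between compound symmetry and the H-F condition can be checked numerically. The following is a minimal sketch, assuming illustrative values for the two variance components (the numbers and variable names are hypothetical, not taken from the study): it builds a compound-symmetric matrix and verifies that all pairwise differences have the same variance, which equals 2λ.

```matlab
% Illustrative (hypothetical) values, not taken from the study.
t      = 4;      % number of measurement occasions
sigma2 = 2.0;    % within-subject (sub-plot) variance
sigma1 = 1.5;    % among-subject (plot) variance

% Compound symmetry: sigma2 + sigma1 on the diagonal, sigma1 elsewhere.
Sigma = sigma2 * eye(t) + sigma1 * ones(t);

% Var(e_i - e_j) = s_ii + s_jj - 2*s_ij for every pair (i, j).
d = diag(Sigma);
V = d * ones(1, t) + ones(t, 1) * d' - 2 * Sigma;
offDiag = V(~eye(t));                 % variances of all pairwise differences

% H-F holds when these variances are all equal.
isHF = (max(offDiag) - min(offDiag)) < 1e-12;

% lambda: mean of the variances minus mean of the covariances.
lambda = mean(d) - mean(Sigma(~eye(t)));

fprintf('H-F satisfied: %d, lambda = %.3f\n', isHF, lambda);
```

Replacing `Sigma` with a matrix whose covariances decay with the time lag makes `isHF` false, which is the situation in which the corrections discussed next become relevant.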
To check the condition of circularity, Mauchly [Mauchly, 1940] presents the test of sphericity. The test checks the H-F condition on the covariance matrix of (t-1) orthonormal contrasts of the repeated measures: under sphericity, these contrasts are uncorrelated and have equal variances. Vonesh and Chinchilli [Vonesh & Chinchilli, 1997] state that the sphericity test is not very powerful for small samples and is not robust to violations of the normality assumption. According to Box, Greenhouse & Geisser, and Huynh & Feldt [Box, 1954; Greenhouse & Geisser, 1959; Huynh & Feldt, 1976], even when the matrix Σ does not satisfy the condition of sphericity, the central F distribution may be used as an approximation, provided that the degrees of freedom associated with the sources of variation involving the time factor are corrected. The correction multiplies the original degrees of freedom of these sources of variation by a factor ε. When Σ is uniform, ε = 1.
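As an illustration of the correction factor, the sketch below computes ε in the form usually attributed to Box, $\varepsilon = [\mathrm{tr}(C \Sigma C')]^2 / [(t-1)\,\mathrm{tr}((C \Sigma C')^2)]$, where C is a matrix of (t-1) orthonormal contrasts. The two covariance matrices are hypothetical examples, and the formula is the standard textbook one rather than a listing from this work.

```matlab
% Hypothetical covariance matrices for t = 4 occasions.
t = 4;
SigmaCS = 2.0 * eye(t) + 1.5 * ones(t);   % compound symmetry (uniform)
[I, J]  = meshgrid(1:t);
SigmaAR = 0.8 .^ abs(I - J);              % correlations decaying with the lag

% (t-1) orthonormal contrasts: rows orthogonal to the vector of ones.
C = null(ones(1, t))';

% Box's epsilon for a given covariance matrix S.
boxEps = @(S) trace(C * S * C')^2 / ((t - 1) * trace((C * S * C')^2));

epsCS = boxEps(SigmaCS);                  % equals 1 under sphericity
epsAR = boxEps(SigmaAR);                  % falls below 1 otherwise

% Corrected degrees of freedom for the time factor.
dfTimeCorrected = epsAR * (t - 1);

fprintf('eps (uniform) = %.4f, eps (decaying) = %.4f\n', epsCS, epsAR);
```

In practice Σ is unknown and would be replaced by its sample estimate, which is the basis of the Greenhouse-Geisser estimate of ε.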
According to Freitas [Freitas, 2007], the correction of the number of degrees of freedom should be made only in statistics that involve comparisons within subjects (time factor and interaction time*treatments). The statistics involving comparisons between subjects do not need corrections in the degrees of freedom because they always follow an exact central F distribution.
When the Σ matrix does not satisfy the required pattern, not even approximately, multivariate techniques are used, since this type of solution is applicable to any Σ matrix. The only requirement of the multivariate procedure is that the Σ matrix be common to all treatments. Because the response vectors in studies involving longitudinal data are essentially multivariate, multivariate analysis, also known as multivariate profile analysis, is a natural alternative [Wald, 2000]. Multivariate profile analysis is well discussed in the literature [Lima, 1996; Morrison, 1990; Singer, 1986].

Multivariate profile analysis is one of the statistical techniques used to analyze observations from experiments that produce longitudinal data. This technique depends on both the number of experimental units and the sample size [Castro, 1997]. Unlike the univariate profile analysis model, the multivariate profile analysis model does not require that the variance of the repeated measures, or the correlation between pairs of repeated measures, remain constant over time. Nevertheless, both models require that the variances and correlations be homogeneous at each moment in time [Vieira, 2006]. The routine techniques for analysis of variance impose the condition of independence of observations. However, this condition generally does not hold for longitudinal data, where the observations on the same individual are usually correlated. In such cases, the adequate way to treat the observations is the multivariate form [Vonesh & Chinchilli, 1997].
Cole & Grizzle [Cole & Grizzle, 1966] use the multivariate analysis of variance according to the formulation of Smith et al. [Smith et al., 1962] and comment on its versatility in the construction of specific hypothesis tests that may be obtained as particular cases of the general linear multivariate hypothesis test procedure. They assert that such hypotheses may be tested by three alternative criteria, all of which depend on the characteristic roots of functions of the matrix due to the hypothesis and of the matrix due to the error: the criterion of the maximum characteristic root, the criterion of the product of the roots (likelihood ratio criterion) and the criterion of the sum of the roots. The authors illustrate the application of the multivariate analysis of variance and demonstrate that the information requested from these experiments may be formulated in terms of the following null hypotheses: i. there are no main effects of "measured conditions" (occasions); ii. there are no effects of treatments; iii. there is no interaction between treatments and occasions. The multivariate analysis of variance is a powerful instrument for analyzing longitudinal data, but if the hypothesis of uniformity of the variance and covariance matrix is not rejected, the univariate analysis should be employed. Nonetheless, if the variance and covariance matrix of the repeated measures has a serial correlation structure, one should use an analysis method that takes the structure of this matrix into account in order to gain testing power. In this situation the multivariate analysis of variance becomes the most convenient procedure, if not the only appropriate one, among those available [Cole & Grizzle, 1966; Smith et al., 1962].
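The three criteria can be illustrated with a small numerical sketch. It assumes two hypothetical treatment groups measured on three occasions (the data are invented for illustration), builds the hypothesis matrix H and the error matrix E for the treatment effect, and computes the statistics from the characteristic roots of $HE^{-1}$. The names Roy, Wilks and Hotelling-Lawley are the conventional labels for these criteria.

```matlab
% Hypothetical responses: rows are experimental units, columns are occasions.
Y1 = [1 2 3; 2 3 5; 3 3 4; 2 4 6; 1 3 5];     % treatment 1
Y2 = [2 4 7; 3 5 8; 2 5 9; 4 6 9; 3 5 10];    % treatment 2
n1 = size(Y1, 1);  n2 = size(Y2, 1);

m1 = mean(Y1, 1);  m2 = mean(Y2, 1);
m  = (n1 * m1 + n2 * m2) / (n1 + n2);          % overall mean vector

% Error matrix: pooled within-treatment sums of squares and products.
R1 = Y1 - ones(n1, 1) * m1;
R2 = Y2 - ones(n2, 1) * m2;
E  = R1' * R1 + R2' * R2;

% Hypothesis matrix: between-treatment sums of squares and products.
H = n1 * (m1 - m)' * (m1 - m) + n2 * (m2 - m)' * (m2 - m);

% Characteristic roots of H * inv(E), via the generalized eigenproblem.
lam = sort(real(eig(H, E)), 'descend');
lam(lam < 0) = 0;                              % clip round-off noise

roy       = lam(1);                            % maximum characteristic root
wilks     = prod(1 ./ (1 + lam));              % product criterion (likelihood ratio)
hotelling = sum(lam);                          % sum of the roots

fprintf('Roy = %.4f, Wilks = %.4f, Hotelling-Lawley = %.4f\n', ...
        roy, wilks, hotelling);
```

With only two treatments H has rank one, so a single root is non-zero and the three criteria carry the same information; they differ when there are more treatments.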
Lima [Lima, 1996] asserts that the main advantage of multivariate profile analysis is that it allows the adoption of a very general model to represent the covariance structure, admitting that the variances of the responses at each time and the covariances of the responses between distinct times may differ. In studying methods for the investigation of longitudinal data, Greenhouse & Geisser [Greenhouse & Geisser, 1959] observed that the ratios between the mean squares obtained in the analysis of variance for the mixed univariate model only have an exact F distribution if the observations in time are normally distributed with equal variances and are mutually independent or equally correlated. Because these assumptions are strict, the authors prefer to consider the observations in time as a sample vector from a multivariate normal distribution with an arbitrary variance and covariance matrix. The multivariate perspective presented by Morrison [Morrison, 1990] therefore allows the adoption of a general model to represent the covariance structure of the observations. In this case, the covariance matrix is said to be unstructured: all variances and covariances may be different and, as pointed out by Andreoni [Andreoni, 1989], it is only applicable when (i) there is no theoretical or empirical basis to establish any pattern for this matrix, and (ii) there is no need to extrapolate the model beyond the occasions of the observations considered. The number of parameters of the unstructured matrix that must be estimated grows with the number of evaluation conditions: with t occasions there are t(t+1)/2 distinct variances and covariances. When this number is large, when the number of experimental units is small in relation to the number of evaluation occasions, or when there are many incomplete observations, the efficiency of the estimators may be affected.
In some cases it may be impossible to estimate the parameters of this covariance matrix [Wald, 2000]. Meredith & Stehman [Meredith & Stehman, 1991] state that the disadvantage of multivariate analysis is the lack of power to estimate the parameters of the covariance matrix when t (the number of measurement occasions or times) is large and n is small. Stuker [Stuker, 1986] comments on the restriction of the multivariate analysis of covariance whereby the number of experimental units minus the number of treatments should be greater than the number of observations taken on each experimental unit; otherwise the matrix due to error required for these tests is singular. Timm [Timm, 1980] claims that the restrictions on the application of multivariate profile analysis stem from the need for complete individual response profiles and from the low power of the hypothesis tests caused by overparameterization. On the other hand, apart from these restrictions, in the majority of longitudinal data studies multivariate analysis of variance is the most convenient procedure, if not the only appropriate one, among the available techniques.
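The singularity mentioned by Stuker can be seen in a small sketch. It assumes hypothetical data with n = 4 experimental units, g = 2 treatments and t = 4 occasions, so that n - g is smaller than t; the error matrix then has rank n - g and cannot be inverted.

```matlab
% Hypothetical data: 4 units (rows) observed on 4 occasions (columns).
Y   = [1 2 4 7;
       2 3 3 9;
       3 1 5 6;
       5 2 8 7];
grp = [1; 1; 2; 2];            % treatment of each unit

[n, t] = size(Y);
g = max(grp);

% Error matrix: pooled within-treatment sums of squares and products.
E = zeros(t);
for k = 1:g
    Yk = Y(grp == k, :);
    Rk = Yk - ones(size(Yk, 1), 1) * mean(Yk, 1);
    E  = E + Rk' * Rk;
end

errorDf    = n - g;            % degrees of freedom of the error
isSingular = rank(E) < t;      % true when E cannot be inverted

fprintf('rank(E) = %d, t = %d, singular = %d\n', rank(E), t, isSingular);
```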

Data
To conduct the study, a data matrix was created in which each variable occupies one column. The following MATLAB commands load the file, obtain its dimensions and ask for the column index of each variable.

    M = load('file.txt', '-ascii');                            % numeric data matrix read from a text file
    [n, c] = size(M);                                          % n rows (observations), c columns (variables)
    a  = input('column of the independent variable X =');
    b  = input('column of the dependent variable Y =');
    aa = input('initial column of the control variable curve =');
    nc = input('number of curves to be compared =');
    npc = input('number of points per curve =');

Data analysis
Once the database is correctly structured, the first step is to fit the polynomial model that best explains the variation of Y as a function of the X periods. To this end, the parameters of the fitted polynomial are estimated by the matrix expression $\hat{\beta} = (X'X)^{-1}X'Y$, where the columns of the design matrix are the powers of the independent variable.
The significance of the fitted model is assessed with a statistic that has the Snedecor F distribution with (g-1) and (n-g) degrees of freedom.
The coefficient of determination is used to measure how much of the variability of Y is explained by the polynomial model.
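The fitting step can be sketched as follows. This is a minimal example under stated assumptions, not the original program: the data, the polynomial degree `p` and all variable names are hypothetical, and `g` is taken to be the number of parameters of the polynomial, which matches the (g-1) and (n-g) degrees of freedom quoted above.

```matlab
% Hypothetical data: 7 periods and one response per period.
x = (1:7)';
y = [2.1 3.9 6.2 8.1 9.8 12.2 13.9]';
p = 2;                                 % polynomial degree (assumed)

% Design matrix with columns 1, x, x.^2, ..., x.^p.
X = bsxfun(@power, x, 0:p);

% Least-squares estimate of the polynomial parameters.
beta = (X' * X) \ (X' * y);

% Sums of squares and coefficient of determination.
yhat = X * beta;
SQe  = sum((y - yhat).^2);             % error sum of squares
SQt  = sum((y - mean(y)).^2);          % total (corrected) sum of squares
R2   = 1 - SQe / SQt;

% F statistic for the regression, with (g-1) and (n-g) degrees of freedom.
n = numel(y);
g = p + 1;                             % number of parameters (assumed meaning of g)
F = ((SQt - SQe) / (g - 1)) / (SQe / (n - g));

fprintf('R2 = %.4f, F = %.2f\n', R2, F);
```

Solving the normal equations with the backslash operator avoids forming the inverse explicitly; for high polynomial degrees a QR-based fit such as `X \ y` is numerically safer.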
After fitting the polynomial model to the whole data set, the next step is to fit the same model to each of the k treatments separately. The test for comparing the curves is based on the decomposition of SQε into one part explained by the variation between the curves and another explained by the variation within the curves.
Writing $SQ\varepsilon_i$ for the error sum of squares of the model fitted to treatment i, $SQ\varepsilon - \sum_{i=1}^{k} SQ\varepsilon_i$ is the variation explained by the treatments, and $\sum_{i=1}^{k} SQ\varepsilon_i$ is the variation within each treatment.
The following commands calculate the regression parameters for the individual curves.
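A minimal sketch of this step is given below; it is not the original listing. It assumes two hypothetical curves with seven points each, a second-degree polynomial, and the usual extra-sum-of-squares F test with g(nc-1) and nc(npc-g) degrees of freedom, which the text does not state explicitly. The p-value is obtained from the regularized incomplete beta function, so no toolbox is required.

```matlab
% Hypothetical data: nc = 2 curves observed at the same npc = 7 periods.
x  = (1:7)';
y1 = [2.0 3.1 4.9 7.2  9.8 13.1 17.0]';
y2 = [2.2 3.9 6.1 9.0 12.8 16.9 22.1]';
Y  = [y1 y2];                          % one column per curve
[npc, nc] = size(Y);

p = 2;  g = p + 1;                     % degree and number of parameters (assumed)
X = bsxfun(@power, x, 0:p);            % design matrix for one curve

% Single model fitted to all curves together.
Xall = repmat(X, nc, 1);
yall = Y(:);
bAll = (Xall' * Xall) \ (Xall' * yall);
SQeTotal = sum((yall - Xall * bAll).^2);

% Same model fitted to each curve separately (one column of B per curve).
B = (X' * X) \ (X' * Y);
SQeWithin = sum(sum((Y - X * B).^2));

% Decomposition: variation between curves = total error - within error.
SQeBetween = SQeTotal - SQeWithin;

% F test for the equality of the curves (assumed degrees of freedom).
df1 = g * (nc - 1);
df2 = nc * (npc - g);
F   = (SQeBetween / df1) / (SQeWithin / df2);
pval = betainc(df2 / (df2 + df1 * F), df2 / 2, df1 / 2);   % P(F_{df1,df2} > F)

fprintf('F = %.2f, p = %.4g\n', F, pval);
```

A small p-value indicates that a single polynomial does not describe all treatments, that is, that the curves differ.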

Results
The following parameters must be supplied when running the program:
independent variable column X = 1
dependent variable column Y = 2
initial column of the control variable curve = 3
number of curves to be compared = 2
number of points per curve = 7
The following graph is generated in order to choose the degree of the polynomial to be fitted.

Conclusion
Given its matrix structure, MATLAB proved to be an efficient tool for linear models. The programs and the methodology presented were efficient for comparing polynomial growth curves. The modular sequence in which the programs were developed allows the user to implement new routines, as well as new methodological proposals, for the solution of the proposed problem. The solutions presented for the comparison of polynomial growth curves may be used, in part or as a whole, to solve other linear model problems.