Open access peer-reviewed chapter

# Penalized Spline Joint Models for Longitudinal and Time-To-Event Data

By Huong Thi Thu Pham and Hoa Pham

Submitted: December 1st 2017Reviewed: February 26th 2018Published: June 6th 2018

DOI: 10.5772/intechopen.75975

## Abstract

The joint models for longitudinal data and time-to-event data have recently received numerous attention in clinical and epidemiologic studies. Our interest is in modeling the relationship between event time outcomes and internal time-dependent covariates. In practice, the longitudinal responses often show non-linear and fluctuated curves. Therefore, the main aim of this chapter is to use penalized splines with a truncated polynomial basis to parameterize the non-linear longitudinal process. Then, the linear mixed effects model is applied to subject-specific curves and to control the smoothing. The association between the dropout process and longitudinal outcomes is modeled through a proportional hazard model. Two types of baseline risk functions are considered, namely a Gompertz distribution and a piecewise constant model. The resulting models are referred to as penalized spline joint models; an extension of the standard linear joint models.

### Keywords

• survival data
• longitudinal data
• joint models
• time-dependent covariates
• random effects

## 1. Introduction

The joint models for longitudinal data and time-to-event data are aimed to measure the association between the longitudinal marker level and the hazard rate for an event. The longitudinal data are collected repeatedly for several subjects. In this data, there are two types of covariates, namely, time-independent covariates and time-dependent covariates. Furthermore, there are also two different categories of time-dependent covariates, namely, external and internal covariates. In clinical studies, internal time-dependent longitudinal outcomes are often applied to monitor disease progression and failure time.

In modern survival analysis, Cox  has been considered as a very popular joint model to be used for time-independent covariates. These models measured the effect of time-independent covariates on the hazard rate for an event. Subsequently, the extended Cox model was developed for external time-dependent covariates. However, these latter models cannot handle longitudinal biomarkers. Therefore, Rizopoulos  introduced joint models for internal time-dependent covariates and the risk for an event based on linear mixed-effects models and relative risk models.

The basic assumption for the standard joint models proposed by Rizopoulos  is that the hazard rate at a given time of the dropout process is associated with the expected value of the longitudinal responses at the same time. The whole history of response has an influence on the survival function. Thus, it is crucial to obtain good estimates for the subject-specific trajectories in order to have an accurate estimation of the survival function. In addition, an important feature that we need to account for is that many observations in the sample often show non-linear and fluctuated longitudinal trajectories in time. Each observation has its own trajectory. Therefore, flexibility is needed for subject-specific longitudinal submodels in the joint models to improve the predictions.

There are several previous works to flexibly model the subject-specific longitudinal profiles in the joint models. Brown et al.  applied B-splines with multidimensional random effects. In particular, Brown et al.  assumed that both subject and population trajectories have the same number of basis functions. By doing this, the number of parameters in the longitudinal submodel is reasonably large. If we have to deal with the roughness of the fit for this model, the computational problems will increase especially when the dimension of the random effects vector is large. Ding and Wang  proposed the use of B-splines with a single multiplicative random effect to link the population mean function with the subject-specific profile. This simple model can gain an easy estimation for parameters, however may not be appropriate for many practical applications . Rizopoulos  considered more flexible models using natural cubic splines with the expansion of the random effects vector. The roughness of the fit is still not mentioned in these models.

In this chapter, we present new approaches to model non-linear shapes of subjects-specific evolutions for joint models by extending the standard joint models of Rizopoulos . In particular, we implement penalized splines using a truncated polynomial basis for the longitudinal submodel. Following this, the linear mixed-effects approach is applied to model the individual trajectories and impose smoothness over adjacent coefficients respectively. The ECM algorithm is used for parameter estimation. In addition, corresponding standard errors are calculated using the observed information matrix. However, as the matrices of random effects covariates in our models are different from the matrices of random effects covariates in the standard joint models, the JM package of Rizopoulos  cannot be used for our models. Therefore, a set of R codes are written for the penalized spline joint models to implement the proposed procedures on the simulated data and a case study respectively.

The chapter is organized as follows. Section 2 describes the penalized splines with truncated polynomial basis for the joint models. In this section, the two models are specified as penalized spline joint model with hazard rate at base line having Gompertz distribution (referred to as Model 1) and penalized spline joint model with a piecewise-constant baseline risk function (referred to as Model 2). The joint likelihood, score functions and the ECM algorithm for estimation are presented in Section 3. We then validate the proposed algorithm using extensive simulation studies and then apply it for AIDS data in Section 4. Finally, Section 5 gives concluding remarks.

## 2. The penalized spline joint models

In this section, we introduce the joint models using penalized spline with truncated polynomial basis. The proposed parametrization is based on the standard joint models of Rizopoulos  and the regression model of a longitudinal response using penalized spline.

Notations in this section are taken from Rizopoulos . Let Tibe the true survival time and Cibe the censoring time for the ithsubject i=1n. Tidenotes the observed failure time for the ithsubject i=1n, which is defined as Ti=minTiCi. If an ithsubject is not censored, this means that we have observed its survival time, we will have TiCi. If an ithsubject is censored, this means that we lose its follow up, or the subject has died from other causes, we will have Ti>Ci. Furthermore, we define the event indicator as δi=ITiCi. The observed data for survival outcome are Tiδi,i=1,,n.

For a longitudinal response, suppose that we have nsubjects in the sample and the actual observed longitudinal data for each subject-iat time point tis yit. We measure the ithsubject at nitime points. Thus, the longitudinal data consists of the measurements yij=yitijj=1nitaken at time points tij.We denote the true and unobserved value of the longitudinal outcome at time tas mit. We assume the relation between yitand mitas

yit=mit+εit,E1

where εitN0σε2.

When survival function Stis assumed to have a specific parametric form associating with a longitudinal submodel, estimations for parameters of interest are usually based on the likelihood function . In the maximum likelihood method, there are different treatments for different types of covariates in the longitudinal submodel. Here, we present the two different categories of time-dependent covariates and the estimation techniques for these covariates will be introduced in the following sections. We let the time-dependent covariate for the ithsubject at time tbe denoted by yit. We let Yit=yis0s<tdenote the covariate history of the ithsubject up to time t. According to Kalbfleisch and Prentice , the exogenous covariates are the covariates satisfying the condition:

PrsTi<s+dsTisYis=PrsTi<s+dsTisYit,E2

for all s,tsuch that 0<stand ds0. An equivalent definition is

PrYitYisTis=PrYitYisTi=s,st.E3

On the other hand, endogenous time-varying covariates are the ones that do not satisfy the condition in (2.2). In particular,

PrYitYisTisPrYitYisTi=s,st.

In the penalized spline regression models [8, 9], the observed longitudinal covariate is modeled using the truncated power functions with a general power basis of degree p. Moreover, the longitudinal response is also parameterized as a linear mixed-effects model to specify the individual curves and impose the amount of smoothing. As a result, the coefficients of the knots will be constrained to handle smoothing. In particular, the longitudinal submodel for the ithsubject at time point tijis

yij=ftij+gitij+εtij,εitijN0σε2,ftij=β0+β1tij++βptijp+k=1KupktijKk+p,gitij=vi0+vi1tij+vi2tij2++viptijp+k=1KwipktijKk+p.E4

Here, the set 1tijtijptijK1+ptijKK+pis known as the truncated power basis of degree p. Moreover, K1,,KKare fitted Kknots, for which Kis chosen following Ruppert et al. , Chapter 5), Appendix D. The function f.is the smooth function which reflects the overall trend of the population. The function gi.is the smooth function which reflects the individual curves. To constrain the coefficient of knots, the vector up1upKTin the function f.is treated as random effects. Therefore, βT=β0βpis a p+1×1row vector of fixed effects and biT=up1upKvi0vipwip1wipKis a p+2K+1×1vector of random effects for the ithsubject. The assumptions for the random effects for the ithsubject are vi0vipTN0, upkN0σu2, wipkN0σw2and they are independent of one another. We can now rewrite (2.4) as

yitij=ftij+gitij+εitij=β0+β1tij+β2tij2++βptijp+k=1K(upk+wipk)tijKk+p+vi0+vi1tij+vi2tij2++viptijp+εitij.E5

We let uipk=upk+wipkand note that uipkN0σu2+σw2. In order to allow greater flexibility, we assume that uip1uipKTN0D, where D=DiagD11DKK. By doing this, the dimension of the vector of random effects, biT=vi0vipuip1uipK, decreases to p+K+1×1. Consequently, the dimension of the multi-integrals in the log-likelihood function in (3.2) will also decrease. This presentation is crucial for reducing the computational problems while coding. The model in (2.5) now becomes:

yitij=ftij+gitij+εitij=β0+β1tij+β2tij2++βptijp+k=1KuipktijKk+p+vi0+vi1tij+vi2tij2++viptijp+εitij.E6

The model in (2.6) can be rewritten in matrix notation as:

y=Xβ+Zb+ε,E7

where

X=X1Xn,Z=X1000X2000XnZ1000Z2000Zn,
Xi=1ti1ti12ti1p1tinitini2tinip,Zi=ti1K1+pti1KK+ptiniK1+ptiniKK+p,
bT=v10v1pvn0vnpu1p1u1pKunp1unpK,βT=β0βp.

Here, yis the i=1nni×1matrix of observed longitudinal data; Xis the i=1nni×p+1matrix of fixed effect covariates; Zis the i=1nni×p+K+1nmatrix of random effect covariates and εis the i=1nni×1matrix of error.

Postulating a proportion hazard model, the penalized spline joint models for longitudinal and time-to-event data is defined by

hitMitwi=limdt0PrtTi<t+dtTitMitwi/dt=h0texpγTwi+αmit,E8

where h0tis the hazard at baseline and wiis a vector of baseline covariates (such as treatment indicator, gender of a patient, etc.). Furthermore, Mit=mis0s<tdenotes the history of the true unobserved longitudinal process up to time point t.

Using (2.7), the longitudinal submodel for the ithsubject is given by

mit=mit+εit,εitN0σε2yit=XiTtβ+XiTtvi+ZiTtui+εitviN0,uiN0D,E9

where the covariance matrix of random effects biT=vi0vipuip1uipKis given as

G=Covbi=00D.

To complete the specification of the model in (2.8), we now need to define the form for the baseline risk function h0.. Motivated by the fact that in real life, h0.is usually unknown. Therefore, two options are adopted to determine the form of the function h0.in this chapter. First, a standard option is to use a known parametric distribution for the risk function. For this option, the Gompertz distribution is chosen. Second, the piecewise constant model is chosen when h0.is considered completely unspecified.

Therefore, the proposed penalized spline joint models considered in this chapter are as follows:

Model 1: Penalized spline joint model with hazard rate at base line having Gompertz distribution

hitMitwi=λ1expλ2texpγTwi+αmitmit=XiTtβ+XiTtvi+ZiTtui.E10

Model 2: Penalized spline joint model with a piecewise-constant baseline risk function

hi(t|Mit,wi)=q=1QξqIνq1<tνqexpγTwi+αmitmit=XiTtβ+XiTtvi+ZiTtui,E11

where 0=ν0<ν1<<νQdenotes a split of the time scale, with νQbeing larger than the largest observed time and ξqdenotes the value of the baseline hazard in the interval νq1νq. In both models, Xi, Zi, β, viand uiare given in (2.7).

## 3. Parameter estimation

After defining the two penalized spline joint models, we now present the joint likelihood and score functions of the parameters in the models. The ECM algorithm is also presented in this section.

### 3.1. Likelihood and score functions

Following Rizopoulos , we assume that the vector of time-independent random effects biunderlies both the longitudinal and survival processes. This means that

pTiδiyibiθ=pTiδibiθpyibiθpyibiθ=jpyitijbiθ,E12

where θ=θtTθyTθbTTdenotes the full parameter vector with θt=γTαθh0TTdenoting the parameter vector for the survival outcomes. Furthermore, θy=βTσε2Tis the parameter vector for longitudinal outcomes and θb=vechGis the vector-half of the variance matrix of random effects. In addition, we assume that the hazard rate at time tconditional on the covariate path depends on the current value of longitudinal outcomes and the censoring mechanism is independent of the true event times and future longitudinal measurements. Under these assumptions, the log-likelihood formulation of the penalized spline joint models can be written as

lθ=lθTiδiyi=ilogbipTiδibiθtβpyibiθypbiθbdbi,E13

where the conditional density for survival part has the form of

pTiδibiθtβ=hTiMiTiwiθtβδiSTiMiTiwiθtβ=h0texpγTwi+αmitδiexp0Tih0sexpγTwi+αmisds.E14

Here, Stis the survival function at time t.

Moreover, the density for the longitudinal part with the random effects is given by

pyibiθypbiθb=jpyitijbiθypbiθb=12πσε2ni2expyitijXiTtijβXiTtijviZiTtijui22σε2×2πqb2detG1/2expbiTG1bi/2,E15

where qbdenotes the dimensionality of the random effects vector.

We consider the log likelihood of the (Ti,δi,yi,bi) over the unknown θt, βand bi

1loglθTiδiyibi=logpTiδibiθtβ+logpyibiβ+logpbiG.

The function for maximizing the log likelihood involves the density function of survival time and least squares with a penalty term, which is

logpTiδibiθtβyiXiβXiviZiuiTyiXiβXiviZiuiσε2biTG1bi.E16

According to Ruppert et al. , the term σε2biTG1biis called a roughness penalty and the variance components matrix defined as F=σε2G1. Using a Lagrange multiplier argument, the variance components matrix is the condition to constrain the coefficients of the knots ui. These will restrict the influence of the variables tKk+pand will lead to smoother spline functions.

Using (3.2), the score vector for the penalized spline joint models can be expressed as:

Sθ=iθTlogpTiδibiθtβpyibiθypbiθbdbi=iθTlogpTiδibiθtβp(yibiθy)p(biθb).pbiTiδiyiθdbi.E17

The requirement for numerical integration with respect to the random effects is one of the main difficulties in the joint models . The maximum likelihood estimation (MLE) is typically obtained using standard maximization algorithms such as expectation maximization algorithm or Newton-Raphson algorithm.

### 3.2. The ECM algorithm

The EM algorithm has been widely used in the joint models, such as for the standard joint model of Rizopoulos  and for the generalized linear mixed joint model . The ECM algorithm is a natural extension of EM algorithm for which the maximization process on the M-step is conditional on some functions of the parameters under estimation. It also can reduce computer time. The ECM algorithm will be used to obtain the maximum likelihood estimates of the penalized spline joint models following McLachlan and Krishnan  in this chapter.

In these models, the random effects biare considered as missing data. Hence, it is difficult to estimate directly the parameter vector θthat maximizes the observed data log likelihood lθin (3.2). Alternatively, we can estimate the parameter vector θthat maximizes the expected value of the complete data log-likelihood which is ElogpTiδiyibiθTiδiyiθit, where θitis the parameter vector given at the ithiteration.

The following are the steps of this algorithm.

Step 1: Initialization

We first initialize the parameters. We assume that there are mparameters in the models and the starting value of the parameter vector is θ0=θ10θm0. Based on these initial values, we calculate the log-likelihood using (3.2).

Step 2: The E-step for the penalized joint models

We fill in the missing data and replace the log-likelihood function of the observed data with the expected function of the complete data log-likelihood as follows:

Qθθit=ilogpTiδiyibiθ.pbiTiδiyiθitdbi=ilogpTiδibiθ+logp(yibiθ)+logp(biθ).pbiTiδiyiθitdbi.E18

Step 3: The conditional M-step for the penalized joint models.

This step will be implemented in four stages as follows:

3.1 Given the current value of the parameter vector at the ithiteration θit=θ1itθ2itθmit, we calculate the log likelihood at lθit=ilogbipTiδiyibiθitdbi.

3.2 Propose the new value for the first parameter θ1propwhich maximizes Qθθit. Then, we calculate the log likelihood at lθpropwhere θprop=θ1propθ2itθmit.

3.3 Set θ1it=θpropif lθproplθit, otherwise set θ1it=θit.

3.4 Similarly, based on the value of the parameter vector θ1it, we update the new value for the second parameter and continue updating for the last parameter, θmitand set θit+1=θmit.

Step 4: Iterate among steps 2–3 until the algorithm numerically converges.

To update the new values for parameters in the conditional M-step, we have the closed-form estimates for the measurement of error variance σ2and the covariance matrix of the random effects respectively by maximizing the expected function Qθθit. Unfortunately, we cannot obtain closed-form expressions for the remaining of the parameters. We thus employ the one-step Newton-Raphson approach to get the updates for βit+1, γit+1, αit+1and θh0it+1respectively as detailed in Appendix B.

Following Louis , standard errors for the parameter estimates can be calculated by using the estimated observed information matrix

v̂arθ̂=Iθ̂1,

where

Iθ̂=i=1nSiθθθ=θ̂.

## 4. Empirical results

This section presents two simulation studies for Model 1, whereas Model 2 will be applied for a case study only. In Section 4.1, we simulate data from Model 1 with three internal knots in the longitudinal submodel and Gompertz distribution for the baseline risk function. In Section 4.2, we simulate data from Model 1 having Gompertz distribution for the baseline risk function and non-linear logarithm subject-specific trajectories. The ECM algorithm, written in R code, is applied to estimate the true values of parameters in both cases.

### 4.1. Simulation study 1

#### 4.1.1. Data description

Recall the penalized spline joint Model 1 of (2.10) with three internal knots in longitudinal submodel and Gompertz distribution for the baseline risk function in the form of

hit=h0texpγxi+αmit=λ1expλ2texpγxi+αmit,E19

where h0tis the hazard function at baseline having Gompertz distribution, xiis baseline covariate and mitdenotes the true and unobserved value of the longitudinal at time t. The form of mitis given by

mit=β0+β1t+ui1tK1++ui2tK2++ui3tK3++vi0,E20

where bi=u11u12u13vi0Tis the vector of random effects and is assumed to have a normal distribution with mean zero and the diagonal covariance matrix D=DiagD11D22D33D44. K1,K2,K3denote the three internal knots put into the model. The observed longitudinal value at time point tfor the ithsubject is of the form

yit=mit+εit,E21

where the error variable εitis assumed to come from a normal distribution with mean zero and variance σ2.

From this model, the vector of all parameters θof the models in (4.1) and (4.2) is θ=θtTθyTθbTT, where θt=γαλ1λ2Tdenotes the parameter vector for the survival outcomes. Furthermore, θy=β0β1σε2Tis the parameter vector for longitudinal outcomes and θb=Dis the variance matrix of random effects.

To simulate the observed survival time Tiof the joint model in (4.1), we applied the methods adapted by Bender et al. , Austin  and Crowther and Lambert  to generate the true survival time. We further assumed that the censoring mechanism is exponentially distributed with parameter λ. The observed survival time was the minimum of the censoring time and the true survival time. We generated the survival time Tifor n=500subjects with the parameters: β0=5,β1=2,λ1=0.1,λ2=0.5,γ=0.5,α=0.05,δ=2and D=Diag2224. Then we generated the longitudinal responses mitusing (4.2). The simulated model is therefore

hit=0.1exp0.5texp0.5xi+0.05mitmit=5+2t+ui1t1++ui2t2++ui3t3++vi0.E22

The sample of simulated data is presented in Appendix A. The curve of Kaplan-Meier estimate for the survival function of simulated data (left panel) and the longitudinal trajectories for the whole simulated sample (right panel) are presented in Figure 1. The dashed lines in the left panel correspond to 95% pointwise confidence intervals. It is clear from the plot of Kaplan-Meier estimator that the survival probability starts from 1 and decreases gradually until at the 5thmonth of the study. After this, it is nearly zero after 6 months or so. The right panel is the longitudinal trajectories for the first 100 subjects reflecting the form as in (4.2). Figure 1.Kaplan-Meier estimate of the survival function of the simulated data of (4.4) (left panel). Longitudinal trajectories of the first 100 subjects from the simulated sample of (4.4) (right panel).

#### 4.1.2. Parameter estimation

The ECM algorithm, as described in Section 3.2, is now implemented to estimate all parameters in (4.4). The initial values of the parameters were set at β0=1,β1=1,λ1=0.05,λ2=0.1,γ=0.1,α=0.01,σ=1,D11=3,D22=3,D33=3,D44=3, respectively. However, these initial values can also be set randomly. The traces of each of these parameters are presented in Figures 2 and 3, respectively. The traces of estimates show the way how the algorithm updates new values of the parameters. In addition, they also demonstrate the convergence of the algorithm after 10–20 iterations. In particular, the parameters β0,β1,λ2,α,σ,D22and D33converge linearly to the true values while the parameters λ1,γ,D11,and D44oscillate before converging to the true values.

We now run the simulation for 30 independent samples with different sample sizes (n=200,300and 500). Then, we calculate the means, standard deviations (SD) and mean square error (MSE) of parameters as presented in Table 1. The point estimates of each parameter are reasonably close to the true values when the sample sizes are 300 and 500. This is also supported by the values of SD and MSE which decrease gradually when the sample size increases. In addition to this, we also compare the parameter estimates with different censoring rates (20% and 40%) for a sample size of 500 in 5, Appendix E. The result shows that when the sample size is large the censoring rate has little influence on the estimates.

ParameterTrue valuen=200n=300n=500
EstimateSDMSEEstimateSDMSEEstimateSDMSE
β054.210.720.764.680.500.325.100.300.27
β121.690.750.571.860.750.282.100.570.18
λ10.10.120.130.000.120.120.000.110.100.00
λ20.50.500.150.020.570.140.010.480.140.02
γ0.50.500.170.030.490.120.040.510.090.01
α0.050.030.040.000.040.050.000.040.040.00
σ22.060.130.012.020.060.002.020.060.00
D1122.870.920.622.590.730.532.270.800.22
D2222.030.420.162.210.460.232.100.430.05
D3322.100.370.170.340.500.342.220.590.10
D4445.241.820.764.320.740.604.240.630.18

### Table 1.

Summary statistics for parameter estimation of the simulated data of the model in (4.4) for different sample sizes.

### 4.2. Simulation study 2

#### 4.2.1. Data description

We now perform a simulation study on proportional hazard model having Gompertz distribution at baseline and non-linear subject-specific trajectory. In particular, the model is in the form of

hit=h0texpγxi+αmit=λ1expλ2texpγxi+αmit,E23

where h0tis the hazard function at baseline having Gompertz distribution, xiis baseline covariate and mitdenotes the true and unobserved value of the longitudinal at time t. The observed longitudinal value at time point tfor the ithsubject has the non-linear form

yit=mit+εit=5log1+t+bi1t+bi0+εit,E24

where εitN0σ2. In the model of (4.6), the mean longitudinal response of the population is assumed to have a non-linear logarithm curve. Different subjects are assumed to have different intercepts and slopes. In particular, we assume that bi=bi0bi1Thaving a bivariate normal distribution with mean μ=32and covariance matrix D=Diag11. The true values of the other parameters we put in the model were λ1=0.01,λ2=0.1,γ=0.5,α=0.2,σ=2, respectively. In addition, the censoring mechanism is assumed exponentially distributed with a parameter of λ=0.25.

Based on the model in (4.5) and the simulation study 1, we simulated survival times Tifor 500 subjects with 35% censoring rate. In particular, the ending time for the study was 5 months and all subjects alive by the end of the study (i.e. time 5) were assumed to be censored. This design was also reflected of many clinical studies in real life. In this sample, there were 329 uncensored subjects and 1387 observations for 500 subjects. For each subject, 1–5 longitudinal measurements were recorded. On average, there were three longitudinal measurements per subject. In Figure 4, the Kaplan-Meier estimate for survival curve is presented for the simulated data of (4.5) with 95% pointwise confidence intervals in the left panel. Moreover, the subject-specific longitudinal profiles for six randomly selected subjects is drawn in the right panel. It can be seen that some of the subjects in this dataset showed non-linear evolutions in their longitudinal values. Each subject has its own trajectory. Figure 4.Kaplan-Meier estimate of the survival function of the simulated data of (4.5) (left panel). Longitudinal trajectories for the six randomly selected subjects of (4.6) (right panel).

#### 4.2.2. Parameter estimation

We will be using Model 1 in (4.1) and in (4.2) to handle the non-linear longitudinal trajectory in the simulated data in (4.5). In this model, we put three internal knots at 25, 50 and 75%, respectively, of the follow up time. Then, the ECM algorithm, as explained in Section 3, is implemented once again to estimate all parameters in the model.

The results for parameter estimation are presented in Table 2. The means, standard deviations and 95% confidence intervals of parameter estimates are calculated for 30 independent samples. The point estimates for λ1,λ2,γ,α,σ2are reasonably close to the true values. Similarly, the 95% CIs include the true values of λ1,λ2,γ,α,σ2.

ParameterTrue valueEstimateSD95% CI
β03.3990.673[3.158;3.640]
β14.3300.142[4.280;4.380]
λ10.010.0130.021[0.007;0.021]
λ20.10.0830.184[0.017;0.148]
γ0.50.6400.386[0.486;0.778]
α0.20.1860.142[0.136;0.237]
σ21.9930.061[1.971;2.015]
D110.9770.190[0.909;1.044]
D221.3650.183[1.300;1.430]
D331.9760.154[1.921;2.031]
D441.3930.196[1.322;1.463]

### Table 2.

Summary statistics for parameter estimation of the simulated data of the model in (4.5) and (4.6).

Based on the estimated values of parameters, we generate back the estimated survival time by approximating values of random effects from linear mixed-effects function. The detail of the generation is explained in Appendix C. Then, we use the Kaplan-Meier estimate to compare between the survival function of the simulated dataset (the black solid line) and the estimated survival function (the dashed line) which are presented in the left panel of Figure 5. Figure 5.Kaplan-Meier estimate of the survival function from simulated failure times (the solid line) with 95% confidence intervals (dot lines), from Model 1 (4.5) (the dashed line) (left panel). Observed longitudinal trajectories (the solid line) and predicted longitudinal trajectories (the dashed line) for the twelve randomly selected subjects (right panel).

Moreover, we also draw the smooth and predicted longitudinal profiles for 12 patients chosen randomly in the right panel of Figure 5. The dot points are the true observed longitudinal values from simulated data. The solid lines are the smooth longitudinal profiles of the true observed longitudinal values using the loess smoother and the dashed lines are the predicted profiles of 12 randomly selected individuals. It is clear that the Kaplan-Meier estimates from simulated data overlaps the Kaplan-Meier estimates based on the predicted value in the left panel of Figure 4. The penalized spline regression model in (2.10) was a good fit for subject-specific curves in the right panel of Figure 5.

In summary, simulation studies have shown the stability of the algorithm and the goodness of fit of the penalized spline models. From the simulation study 1, it is shown that the updating process through the ECM algorithm converges quickly to the true values of the parameters. In addition, the simulation study 2 shows that the model can well predict the survival function and individual trajectories respectively.

### 4.3. The AIDS data set

In the AIDS dataset, there were 467 patients with advanced human immunodeficiency virus infection during antiretroviral treatment who had failed or were intolerant to zidovudine therapy. Patients in the study were randomly assigned to receive either didanosine drug (ddI) or zalcitabine drug (ddC). CD4 cells are a type of white blood cells made in the spleen, lymph nodes and thymus gland and are part of the infection-fighting system. CD4 cell counts were recorded at the time of study entry as well as at 2, 6, 12 and 18 months thereafter. The detail regarding the design of this study can be found in Abrams et al. . By the end of the study, there were 188 patients died, resulting in about 59.7% censoring. There were 1405 longitudinal responses recorded.

Previously, Rizopoulos  used his standard joint model for the AIDS data which consider the variability between subjects mostly depend on the intercept. However, the model could not predict observed longitudinal data accurately. When the time unit is changed from month to year in the data, the variability between subjects depends not only on the intercept but also on the obstimevariable. In addition, the longitudinal trajectories plot also shows many non-linear curves as depicted in the right panel of Figure 6. Figure 6.Kaplan-Meier estimate of the survival function of the AIDS data (left panel). Longitudinal trajectories for CD4 cell count of the first 100 patients for two groups (right panel).

Given the non-linearity, it is appropriate to apply our models, Model 1 and Model 2, for the AIDS data. In particular, we use the two joint models in (2.10) and (2.11) with the four internal knots are placed at 20, 40, 60, 80%, respectively of the observed failure times for hazard rate at baseline. Then, the ECM algorithm is implemented to estimate all parameters in the two models. A summary of statistics for parameter estimation using Model 1 and Model 2 is presented in Table 3.

Model 1Model 2
ParameterEstimateStd. errorz-valuep-valueParameterEstimateStd. errorz-valuep-value
β07.870.06127.07<0.001β07.810.07114.34<0.001
β1−1.690.11−14.77<0.001β1−1.620.12−12.99<0.001
γ0.220.112.060.039γ0.310.103.030.002
α−0.200.01−15.84<0.001α−0.240.01−18.15<0.001
λ11.680.07λ11.040.11
λ20.330.00λ21.790.23
σ2.360.36λ31.380.38
D112.180.14λ41.670.42
D221.040.07λ52.480.66
D330.850.06σ2.620.45
D4411.870.78D111.020.07
D220.970.06
D330.990.07
D4411.400.75

### Table 3.

Summary statistics for parameter estimation of the AIDS data of Model 1 and Model 2 respectively.

Following Rizopoulos , in Model 1 and Model 2, the univariate Wald tests are applied for the fixed effects β=β0β1Tin the longitudinal submodel, the regression coefficient γand the association parameter αrespectively. The results from Table 3 show that the point estimates of β0,β1,γ,αare all statistically significant for both models at a significance level of 5%.

We conduct the Kaplan-Meier estimate of the survival function from the observed survival time (the light solid line) and the dot lines correspond to 95% pointwise confidence intervals in Figure 6 (left panel). The predicted survival function from Model 1 is the dashed line and the predicted survival function from Model 2 is the bold solid line. The plots show that Model 2 works very well in this case as shown in Figure 7. Moreover, Model 2 is also preferred in practice because h0.usually is considered as unspecified in order to avoid the impact of misspecifying the distribution of survival times. Figure 7.Kaplan-Meier estimates of the survival function from observed failure times, from model 1 and from model 2 (left panel). Observed longitudinal trajectories (the solid line) and predicted longitudinal trajectories (the dashed line) for the 12 randomly selected patients (right panel).

Based on the model of longitudinal regression in (4.2), we also draw the smooth and predicted longitudinal profiles for nine patients from the AIDS dataset as depicted in Figure 7 (right panel). The dot points are the true observed longitudinal values. The solid lines are the smooth longitudinal profiles using the loess smoother and the dashed lines are the predicted profiles of nine randomly selected individuals. Most of the predicted profiles are quite close to the observed ones.

## 5. Discussion

In this chapter, two joint models using a penalized spline with a truncated polynomial basis have been proposed to model a non-linear longitudinal outcome and a time-to-event data. The use of a truncated polynomial basis gives us an intuitive and obvious way to model non-linear longitudinal outcome. By adding some penalties for the coefficients of the knots and using linear mixed-effects models, the smoothing is controlled and the individual curves are specified.

We have conducted a sensitivity analysis on the assumption of normality for either random effects or errors. The t-distribution with the degree of freedom 5 is applied for each of them. The results show that the estimates of parameters are sensitive when both of terms are not normally distributed.

The main findings we may derive from this chapter are, at least, threefold: (1) the ECM algorithm provides a reasonable quick convergence algorithm for the proposed models; (2) the fitted joint models are able to measure the association between the internal time-dependent covariates and the risk for an event and (3) the two models provide a good prediction for both the longitudinal and survival functions, as presented in empirical results.

The limitations of this study are, at least, threefold: (1) the number of internal knots is limited to three due to computational costs; (2) the polynomial power functions can form an ill-conditioned basis for the models and (3) the estimation results are sensitive when both random effects and error are not normally distributed.

Based on the limitations, our future work will focus on using new methods for approximating the integrals to reduce the computational problems or relaxing the normality assumption. Furthermore, we will apply a different basis for joint models, that is the penalized B-spline. In terms of parameter estimation, we are considering a different approach to estimate the parameters in the models using a Bayesian approach, via Markov chain Monte Carlo (MCMC) algorithms.

One sample of simulated data of the penalized spline joint model in (4.4) is presented in Table 4 for the first three patients. The subjects are measured bimonthly and the entry time is 0 for all subjects. Obstime variable includes the time points at which these measurements are recorded. Time variable includes the observed survival times when subject meets an event. xis a time-constant binary random variable with parameter p=0.5. Column y contains the longitudinal responses. Death variable is the event status indicator. This variable receives value 1 when the true survival time is less than or equal to the censoring time and 0 otherwise. We define the four random effects variables which are Z1=obstimeK1+, Z2=obstimeK2+, Z3=obstimeK3+and Z4=1. For the longitudinal process, there are 1902 of observations for 500 subjects. For each subject, 1-7 longitudinal measurements are recorded. On average, there are four longitudinal measurements per subject. For the event process, there are 297 subjects who meet for an event which is equivalent to 59.4% of the whole sample.

IdObstimeTimexyDeathZ1Z2Z3Z4
10.04.9701.4110.00.00.01
10.54.9706.4510.00.00.01
11.04.9704.1010.00.00.01
11.54.9701.5010.50.00.01
12.04.9704.0711.00.00.01
12.54.9706.1611.50.50.01
13.04.9703.6012.01.00.01
13.54.9708.3212.51.50.51
14.04.9706.3213.02.01.01
20.02.7906.8110.00.00.01
20.52.7907.7710.00.00.01
21.02.7909.7510.00.00.01
21.52.79011.0410.50.00.01
22.02.7907.2011.00.00.01
30.01.900−1.8400.00.00.01
30.51.9001.1200.00.00.01
31.01.9000.7800.00.00.01

### Table 4.

A snapshot of simulated data for penalized spline joint model in (4.4).

ParameterTrue valueCensored (20%)Censored (40%)
EstimateSDMSEEstimateSDMSE
β054.850.300.255.100.300.27
β121.860.450.202.100.570.18
λ10.10.130.120.000.110.100.00
λ20.50.520.070.000.490.140.02
γ0.50.480.100.000.510.090.00
α0.050.050.020.000.040.040.00
σ22.020.050.002.020.060.00
D1122.210.670.172.270.800.22
D2222.160.270.092.100.430.05
D3322.260.270.012.220.600.10
D4444.200.530.204.240.630.18

### Table 5.

Summary statistics for parameter estimation of the simulated data of the model in (22) for different censoring rates.

The integrals with respect to the random effects in (3.7) do not have closed-form solutions. Therefore, in this chapter, we implement the Gaussian-Hermite quadrature rule as in Rizopoulos  to approximate the integrals. In our simulation study and R coding, we use the Gaussian-Hermite quadrature rule with 10 quadrature points.

The updating formulas of the parameters in Step 3 have different forms for each parameter following Rizopoulos . We have the closed-form estimates for the measurement error variance σε2in the longitudinal model and the covariance matrix of the random effects as follows:

Ĝit+1=1nibiTbipbiTiδiyiθitdbi=1Nivb˜iit+b˜iitb˜iitT,E25

where b˜i=EbiTiδiyiθ=bipbiTiδiyiθdbiand v˜bi=varbiTiδiyiθ=bib˜ipbiTiδiyiθdbi. The updating formula for σε2is

σ̂εit+12=1niWTWpbiTiδiyiθitdbi,E26

where W=yiXiβXiuiZivi.

Unfortunately, we cannot obtain closed-form expressions for the fixed effects βand the parameters of the survival submodel γ,α,andθh0. We thus employ the one-step Newton-Raphson approach to obtain the updated βit+1, γit+1,αit+1and θh0it+1. In particular, we have

Sθ=Qθθitθθ̂it+1=θ̂itSθ̂itθ1Sθ̂it,E27

where Sθis the score vector corresponding to parameter θand the score vector has the form of

Sθ=Qθθitθ=iθTlogpTiδibiθitp(yibiθit)p(biθit).pbiTiδiyiθitdbi.

There are four cases for simulating survival time Tiof the model (4.1) as follows.

When the survival time t<K1, we calculate the cumulative hazard function Hit=0thisds. Based on the relation between the survival function Sit, cumulative hazard function Hitand cumulative distribution Fit, we have

Sit=expHit=1Fit.E28

Following (5.4), we set

u=1FiTi,E29

where uis a random variable with uUni01. The survival time tis the solution of the equation

U=expHit=exp0thisds.

The condition t<K1is equal to

logU<0K1hsds.

When K1t<K2, we calculate the cumulative hazard function Hit=0K1hisds+K1thisds. The survival time tis the solution of the equation

U=exp0K1hisds+K1thisds,

where Uis a value of uUni01. The condition K1t<K2is equal to

logU<0K1hisds+K1K2hisds.

When K2t<K3, we calculate the cumulative hazard function Hit=0K1hisds+K1K2hisds+K2thisds. The survival time tis the solution of the equation

U=exp0K1hisds+K1K2hisds+K2thisds,

where Uis a value of uUni01. The condition K2t<K3is equal to

logU<0K1hisds+K1K2hisds+K2K3hisds.

When K3t, the cumulative hazard function has the form Hit=0K1hisds+K1K2hisds+K2K3hisds+K3thisds. Survival time tis the solution of the equation

U=exp0K1hisds+K1K2hisds+K2K3hisds+K3thisds.

In particular, Ruppert et al.  introduced a default choices for knot location and number of knots. The idea is to choose sufficient knots to resolve the essential structure in the underlying regression function. But for more complicated penalized spline models, there are computational advantages to keeping the number of knots relatively low. A reasonable default is to choose the knots to ensure that there are a fixed number of unique observations, say 4–5, between each knot. For large data sets, this can lead to an excessive numbers of knots; therefore, a maximum number of allowable knots (say, 20–40 total) are recommended.

According to Ruppert et al. , the choice for knot position is

Kk=k+1K+2thsample quantile of the unique xifor k=1,,K.

The simple choice of K is

K=min14×number of uniquexi35.

See Table 5.

## More

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## How to cite and reference

### Cite this chapter Copy to clipboard

Huong Thi Thu Pham and Hoa Pham (June 6th 2018). Penalized Spline Joint Models for Longitudinal and Time-To-Event Data, Topics in Splines and Applications, Young Kinh-Nhue Truong and Muhammad Sarfraz, IntechOpen, DOI: 10.5772/intechopen.75975. Available from:

### Related Content

#### Topics in Splines and Applications

Edited by Young Kinh-Nhue Truong

Next chapter

#### Application of Cubic Spline Interpolation Technique in Power Systems: A Review

By Akhil Prasad, Atul Manmohan, Prabhakar Karthikeyan Shanmugam and Kothari D.P.

#### Recent Advances in Integral Equations

Edited by Francisco Bulnes

First chapter

#### Introductory Chapter: Frontier Research on Integral Equations and Recent Results

By Francisco Bulnes

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

View all Books