Summary statistics for parameter estimation of the simulated data of the model in (4.4) for different sample sizes.

## Abstract

The joint models for longitudinal data and time-to-event data have recently received numerous attention in clinical and epidemiologic studies. Our interest is in modeling the relationship between event time outcomes and internal time-dependent covariates. In practice, the longitudinal responses often show non-linear and fluctuated curves. Therefore, the main aim of this chapter is to use penalized splines with a truncated polynomial basis to parameterize the non-linear longitudinal process. Then, the linear mixed effects model is applied to subject-specific curves and to control the smoothing. The association between the dropout process and longitudinal outcomes is modeled through a proportional hazard model. Two types of baseline risk functions are considered, namely a Gompertz distribution and a piecewise constant model. The resulting models are referred to as penalized spline joint models; an extension of the standard linear joint models.

### Keywords

- survival data
- longitudinal data
- joint models
- time-dependent covariates
- random effects

## 1. Introduction

The joint models for longitudinal data and time-to-event data are aimed to measure the association between the longitudinal marker level and the hazard rate for an event. The longitudinal data are collected repeatedly for several subjects. In this data, there are two types of covariates, namely, time-independent covariates and time-dependent covariates. Furthermore, there are also two different categories of time-dependent covariates, namely, external and internal covariates. In clinical studies, internal time-dependent longitudinal outcomes are often applied to monitor disease progression and failure time.

In modern survival analysis, Cox [1] has been considered as a very popular joint model to be used for time-independent covariates. These models measured the effect of time-independent covariates on the hazard rate for an event. Subsequently, the extended Cox model was developed for external time-dependent covariates. However, these latter models cannot handle longitudinal biomarkers. Therefore, Rizopoulos [2] introduced joint models for internal time-dependent covariates and the risk for an event based on linear mixed-effects models and relative risk models.

The basic assumption for the standard joint models proposed by Rizopoulos [2] is that the hazard rate at a given time of the dropout process is associated with the expected value of the longitudinal responses at the same time. The whole history of response has an influence on the survival function. Thus, it is crucial to obtain good estimates for the subject-specific trajectories in order to have an accurate estimation of the survival function. In addition, an important feature that we need to account for is that many observations in the sample often show non-linear and fluctuated longitudinal trajectories in time. Each observation has its own trajectory. Therefore, flexibility is needed for subject-specific longitudinal submodels in the joint models to improve the predictions.

There are several previous works to flexibly model the subject-specific longitudinal profiles in the joint models. Brown et al. [3] applied B-splines with multidimensional random effects. In particular, Brown et al. [3] assumed that both subject and population trajectories have the same number of basis functions. By doing this, the number of parameters in the longitudinal submodel is reasonably large. If we have to deal with the roughness of the fit for this model, the computational problems will increase especially when the dimension of the random effects vector is large. Ding and Wang [4] proposed the use of B-splines with a single multiplicative random effect to link the population mean function with the subject-specific profile. This simple model can gain an easy estimation for parameters, however may not be appropriate for many practical applications [5]. Rizopoulos [5] considered more flexible models using natural cubic splines with the expansion of the random effects vector. The roughness of the fit is still not mentioned in these models.

In this chapter, we present new approaches to model non-linear shapes of subjects-specific evolutions for joint models by extending the standard joint models of Rizopoulos [2]. In particular, we implement penalized splines using a truncated polynomial basis for the longitudinal submodel. Following this, the linear mixed-effects approach is applied to model the individual trajectories and impose smoothness over adjacent coefficients respectively. The ECM algorithm is used for parameter estimation. In addition, corresponding standard errors are calculated using the observed information matrix. However, as the matrices of random effects covariates in our models are different from the matrices of random effects covariates in the standard joint models, the JM package of Rizopoulos [6] cannot be used for our models. Therefore, a set of R codes are written for the penalized spline joint models to implement the proposed procedures on the simulated data and a case study respectively.

The chapter is organized as follows. Section 2 describes the penalized splines with truncated polynomial basis for the joint models. In this section, the two models are specified as penalized spline joint model with hazard rate at base line having Gompertz distribution (referred to as Model 1) and penalized spline joint model with a piecewise-constant baseline risk function (referred to as Model 2). The joint likelihood, score functions and the ECM algorithm for estimation are presented in Section 3. We then validate the proposed algorithm using extensive simulation studies and then apply it for AIDS data in Section 4. Finally, Section 5 gives concluding remarks.

## 2. The penalized spline joint models

In this section, we introduce the joint models using penalized spline with truncated polynomial basis. The proposed parametrization is based on the standard joint models of Rizopoulos [2] and the regression model of a longitudinal response using penalized spline.

Notations in this section are taken from Rizopoulos [2]. Let

For a longitudinal response, suppose that we have

where

When survival function

for all

On the other hand, endogenous time-varying covariates are the ones that do not satisfy the condition in (2.2). In particular,

In the penalized spline regression models [8, 9], the observed longitudinal covariate is modeled using the truncated power functions with a general power basis of degree

Here, the set

We let

The model in (2.6) can be rewritten in matrix notation as:

where

Here,

Postulating a proportion hazard model, the penalized spline joint models for longitudinal and time-to-event data is defined by

where

Using (2.7), the longitudinal submodel for the

where the covariance matrix of random effects

To complete the specification of the model in (2.8), we now need to define the form for the baseline risk function

Therefore, the proposed penalized spline joint models considered in this chapter are as follows:

*Model 1*: Penalized spline joint model with hazard rate at base line having Gompertz distribution

*Model 2*: Penalized spline joint model with a piecewise-constant baseline risk function

where

## 3. Parameter estimation

After defining the two penalized spline joint models, we now present the joint likelihood and score functions of the parameters in the models. The ECM algorithm is also presented in this section.

### 3.1. Likelihood and score functions

Following Rizopoulos [2], we assume that the vector of time-independent random effects

where

where the conditional density for survival part has the form of

Here,

Moreover, the density for the longitudinal part with the random effects is given by

where

We consider the log likelihood of the (

The function for maximizing the log likelihood involves the density function of survival time and least squares with a penalty term, which is

According to Ruppert et al. [9], the term

Using (3.2), the score vector for the penalized spline joint models can be expressed as:

The requirement for numerical integration with respect to the random effects is one of the main difficulties in the joint models [2]. The maximum likelihood estimation (MLE) is typically obtained using standard maximization algorithms such as expectation maximization algorithm or Newton-Raphson algorithm.

### 3.2. The ECM algorithm

The EM algorithm has been widely used in the joint models, such as for the standard joint model of Rizopoulos [2] and for the generalized linear mixed joint model [10]. The ECM algorithm is a natural extension of EM algorithm for which the maximization process on the M-step is conditional on some functions of the parameters under estimation. It also can reduce computer time. The ECM algorithm will be used to obtain the maximum likelihood estimates of the penalized spline joint models following McLachlan and Krishnan [11] in this chapter.

In these models, the random effects

The following are the steps of this algorithm.

*Step 1*: *Initialization*

We first initialize the parameters. We assume that there are

*Step 2*: *The E-step for the penalized joint models*

We fill in the missing data and replace the log-likelihood function of the observed data with the expected function of the complete data log-likelihood as follows:

*Step 3*: *The conditional M-step for the penalized joint models.*

This step will be implemented in four stages as follows:

3.1 Given the current value of the parameter vector at the

3.2 Propose the new value for the first parameter

3.3 Set

3.4 Similarly, based on the value of the parameter vector

*Step 4*: *Iterate among steps 2–3 until the algorithm numerically converges.*

To update the new values for parameters in the conditional M-step, we have the closed-form estimates for the measurement of error variance

Following Louis [12], standard errors for the parameter estimates can be calculated by using the estimated observed information matrix

where

## 4. Empirical results

This section presents two simulation studies for Model 1, whereas Model 2 will be applied for a case study only. In Section 4.1, we simulate data from Model 1 with three internal knots in the longitudinal submodel and Gompertz distribution for the baseline risk function. In Section 4.2, we simulate data from Model 1 having Gompertz distribution for the baseline risk function and non-linear logarithm subject-specific trajectories. The ECM algorithm, written in R code, is applied to estimate the true values of parameters in both cases.

### 4.1. Simulation study 1

#### 4.1.1. Data description

Recall the penalized spline joint Model 1 of (2.10) with three internal knots in longitudinal submodel and Gompertz distribution for the baseline risk function in the form of

where

where

where the error variable

From this model, the vector of all parameters

To simulate the observed survival time

The sample of simulated data is presented in Appendix A. The curve of Kaplan-Meier estimate for the survival function of simulated data (left panel) and the longitudinal trajectories for the whole simulated sample (right panel) are presented in Figure 1. The dashed lines in the left panel correspond to 95% pointwise confidence intervals. It is clear from the plot of Kaplan-Meier estimator that the survival probability starts from 1 and decreases gradually until at the

#### 4.1.2. Parameter estimation

The ECM algorithm, as described in Section 3.2, is now implemented to estimate all parameters in (4.4). The initial values of the parameters were set at

We now run the simulation for 30 independent samples with different sample sizes (

Parameter | True value | |||||||||
---|---|---|---|---|---|---|---|---|---|---|

Estimate | SD | MSE | Estimate | SD | MSE | Estimate | SD | MSE | ||

5 | 4.21 | 0.72 | 0.76 | 4.68 | 0.50 | 0.32 | 5.10 | 0.30 | 0.27 | |

2 | 1.69 | 0.75 | 0.57 | 1.86 | 0.75 | 0.28 | 2.10 | 0.57 | 0.18 | |

0.1 | 0.12 | 0.13 | 0.00 | 0.12 | 0.12 | 0.00 | 0.11 | 0.10 | 0.00 | |

0.5 | 0.50 | 0.15 | 0.02 | 0.57 | 0.14 | 0.01 | 0.48 | 0.14 | 0.02 | |

0.5 | 0.50 | 0.17 | 0.03 | 0.49 | 0.12 | 0.04 | 0.51 | 0.09 | 0.01 | |

0.05 | 0.03 | 0.04 | 0.00 | 0.04 | 0.05 | 0.00 | 0.04 | 0.04 | 0.00 | |

2 | 2.06 | 0.13 | 0.01 | 2.02 | 0.06 | 0.00 | 2.02 | 0.06 | 0.00 | |

2 | 2.87 | 0.92 | 0.62 | 2.59 | 0.73 | 0.53 | 2.27 | 0.80 | 0.22 | |

2 | 2.03 | 0.42 | 0.16 | 2.21 | 0.46 | 0.23 | 2.10 | 0.43 | 0.05 | |

2 | 2.10 | 0.37 | 0.17 | 0.34 | 0.50 | 0.34 | 2.22 | 0.59 | 0.10 | |

4 | 5.24 | 1.82 | 0.76 | 4.32 | 0.74 | 0.60 | 4.24 | 0.63 | 0.18 |

### 4.2. Simulation study 2

#### 4.2.1. Data description

We now perform a simulation study on proportional hazard model having Gompertz distribution at baseline and non-linear subject-specific trajectory. In particular, the model is in the form of

where

where

Based on the model in (4.5) and the simulation study 1, we simulated survival times

#### 4.2.2. Parameter estimation

We will be using Model 1 in (4.1) and in (4.2) to handle the non-linear longitudinal trajectory in the simulated data in (4.5). In this model, we put three internal knots at 25, 50 and 75%, respectively, of the follow up time. Then, the ECM algorithm, as explained in Section 3, is implemented once again to estimate all parameters in the model.

The results for parameter estimation are presented in Table 2. The means, standard deviations and 95% confidence intervals of parameter estimates are calculated for 30 independent samples. The point estimates for

Parameter | True value | Estimate | SD | 95% CI |
---|---|---|---|---|

— | 3.399 | 0.673 | [3.158;3.640] | |

— | 4.330 | 0.142 | [4.280;4.380] | |

0.01 | 0.013 | 0.021 | [0.007;0.021] | |

0.1 | 0.083 | 0.184 | [0.017;0.148] | |

0.5 | 0.640 | 0.386 | [0.486;0.778] | |

0.2 | 0.186 | 0.142 | [0.136;0.237] | |

2 | 1.993 | 0.061 | [1.971;2.015] | |

— | 0.977 | 0.190 | [0.909;1.044] | |

— | 1.365 | 0.183 | [1.300;1.430] | |

— | 1.976 | 0.154 | [1.921;2.031] | |

— | 1.393 | 0.196 | [1.322;1.463] |

Based on the estimated values of parameters, we generate back the estimated survival time by approximating values of random effects from linear mixed-effects function. The detail of the generation is explained in Appendix C. Then, we use the Kaplan-Meier estimate to compare between the survival function of the simulated dataset (the black solid line) and the estimated survival function (the dashed line) which are presented in the left panel of Figure 5.

Moreover, we also draw the smooth and predicted longitudinal profiles for 12 patients chosen randomly in the right panel of Figure 5. The dot points are the true observed longitudinal values from simulated data. The solid lines are the smooth longitudinal profiles of the true observed longitudinal values using the loess smoother and the dashed lines are the predicted profiles of 12 randomly selected individuals. It is clear that the Kaplan-Meier estimates from simulated data overlaps the Kaplan-Meier estimates based on the predicted value in the left panel of Figure 4. The penalized spline regression model in (2.10) was a good fit for subject-specific curves in the right panel of Figure 5.

In summary, simulation studies have shown the stability of the algorithm and the goodness of fit of the penalized spline models. From the simulation study 1, it is shown that the updating process through the ECM algorithm converges quickly to the true values of the parameters. In addition, the simulation study 2 shows that the model can well predict the survival function and individual trajectories respectively.

### 4.3. The AIDS data set

In the AIDS dataset, there were 467 patients with advanced human immunodeficiency virus infection during antiretroviral treatment who had failed or were intolerant to zidovudine therapy. Patients in the study were randomly assigned to receive either *didanosine* drug (*ddI*) or *zalcitabine* drug (*ddC*). CD4 cells are a type of white blood cells made in the spleen, lymph nodes and thymus gland and are part of the infection-fighting system. CD4 cell counts were recorded at the time of study entry as well as at 2, 6, 12 and 18 months thereafter. The detail regarding the design of this study can be found in Abrams et al. [16]. By the end of the study, there were 188 patients died, resulting in about 59.7% censoring. There were 1405 longitudinal responses recorded.

Previously, Rizopoulos [2] used his standard joint model for the AIDS data which consider the variability between subjects mostly depend on the intercept. However, the model could not predict observed longitudinal data accurately. When the time unit is changed from month to year in the data, the variability between subjects depends not only on the intercept but also on the

Given the non-linearity, it is appropriate to apply our models, Model 1 and Model 2, for the AIDS data. In particular, we use the two joint models in (2.10) and (2.11) with the four internal knots are placed at 20, 40, 60, 80%, respectively of the observed failure times for hazard rate at baseline. Then, the ECM algorithm is implemented to estimate all parameters in the two models. A summary of statistics for parameter estimation using Model 1 and Model 2 is presented in Table 3.

Model 1 | Model 2 | ||||||||
---|---|---|---|---|---|---|---|---|---|

Parameter | Estimate | Std. error | z-value | p-value | Parameter | Estimate | Std. error | z-value | p-value |

7.87 | 0.06 | 127.07 | <0.001 | 7.81 | 0.07 | 114.34 | <0.001 | ||

−1.69 | 0.11 | −14.77 | <0.001 | −1.62 | 0.12 | −12.99 | <0.001 | ||

0.22 | 0.11 | 2.06 | 0.039 | 0.31 | 0.10 | 3.03 | 0.002 | ||

−0.20 | 0.01 | −15.84 | <0.001 | −0.24 | 0.01 | −18.15 | <0.001 | ||

1.68 | 0.07 | 1.04 | 0.11 | ||||||

0.33 | 0.00 | 1.79 | 0.23 | ||||||

2.36 | 0.36 | 1.38 | 0.38 | ||||||

2.18 | 0.14 | 1.67 | 0.42 | ||||||

1.04 | 0.07 | 2.48 | 0.66 | ||||||

0.85 | 0.06 | 2.62 | 0.45 | ||||||

11.87 | 0.78 | 1.02 | 0.07 | ||||||

0.97 | 0.06 | ||||||||

0.99 | 0.07 | ||||||||

11.40 | 0.75 |

Following Rizopoulos [2], in Model 1 and Model 2, the univariate Wald tests are applied for the fixed effects

We conduct the Kaplan-Meier estimate of the survival function from the observed survival time (the light solid line) and the dot lines correspond to 95% pointwise confidence intervals in Figure 6 (left panel). The predicted survival function from Model 1 is the dashed line and the predicted survival function from Model 2 is the bold solid line. The plots show that Model 2 works very well in this case as shown in Figure 7. Moreover, Model 2 is also preferred in practice because

Based on the model of longitudinal regression in (4.2), we also draw the smooth and predicted longitudinal profiles for nine patients from the AIDS dataset as depicted in Figure 7 (right panel). The dot points are the true observed longitudinal values. The solid lines are the smooth longitudinal profiles using the loess smoother and the dashed lines are the predicted profiles of nine randomly selected individuals. Most of the predicted profiles are quite close to the observed ones.

## 5. Discussion

In this chapter, two joint models using a penalized spline with a truncated polynomial basis have been proposed to model a non-linear longitudinal outcome and a time-to-event data. The use of a truncated polynomial basis gives us an intuitive and obvious way to model non-linear longitudinal outcome. By adding some penalties for the coefficients of the knots and using linear mixed-effects models, the smoothing is controlled and the individual curves are specified.

We have conducted a sensitivity analysis on the assumption of normality for either random effects or errors. The t-distribution with the degree of freedom 5 is applied for each of them. The results show that the estimates of parameters are sensitive when both of terms are not normally distributed.

The main findings we may derive from this chapter are, at least, threefold: (1) the ECM algorithm provides a reasonable quick convergence algorithm for the proposed models; (2) the fitted joint models are able to measure the association between the internal time-dependent covariates and the risk for an event and (3) the two models provide a good prediction for both the longitudinal and survival functions, as presented in empirical results.

The limitations of this study are, at least, threefold: (1) the number of internal knots is limited to three due to computational costs; (2) the polynomial power functions can form an ill-conditioned basis for the models and (3) the estimation results are sensitive when both random effects and error are not normally distributed.

Based on the limitations, our future work will focus on using new methods for approximating the integrals to reduce the computational problems or relaxing the normality assumption. Furthermore, we will apply a different basis for joint models, that is the penalized B-spline. In terms of parameter estimation, we are considering a different approach to estimate the parameters in the models using a Bayesian approach, via Markov chain Monte Carlo (MCMC) algorithms.

One sample of simulated data of the penalized spline joint model in (4.4) is presented in Table 4 for the first three patients. The subjects are measured bimonthly and the entry time is 0 for all subjects. *Obstime* variable includes the time points at which these measurements are recorded. *Time* variable includes the observed survival times when subject meets an event. *y* contains the longitudinal responses. *Death* variable is the event status indicator. This variable receives value 1 when the true survival time is less than or equal to the censoring time and 0 otherwise. We define the four random effects variables which are

Id | Obstime | Time | x | y | Death | ||||
---|---|---|---|---|---|---|---|---|---|

1 | 0.0 | 4.97 | 0 | 1.41 | 1 | 0.0 | 0.0 | 0.0 | 1 |

1 | 0.5 | 4.97 | 0 | 6.45 | 1 | 0.0 | 0.0 | 0.0 | 1 |

1 | 1.0 | 4.97 | 0 | 4.10 | 1 | 0.0 | 0.0 | 0.0 | 1 |

1 | 1.5 | 4.97 | 0 | 1.50 | 1 | 0.5 | 0.0 | 0.0 | 1 |

1 | 2.0 | 4.97 | 0 | 4.07 | 1 | 1.0 | 0.0 | 0.0 | 1 |

1 | 2.5 | 4.97 | 0 | 6.16 | 1 | 1.5 | 0.5 | 0.0 | 1 |

1 | 3.0 | 4.97 | 0 | 3.60 | 1 | 2.0 | 1.0 | 0.0 | 1 |

1 | 3.5 | 4.97 | 0 | 8.32 | 1 | 2.5 | 1.5 | 0.5 | 1 |

1 | 4.0 | 4.97 | 0 | 6.32 | 1 | 3.0 | 2.0 | 1.0 | 1 |

2 | 0.0 | 2.79 | 0 | 6.81 | 1 | 0.0 | 0.0 | 0.0 | 1 |

2 | 0.5 | 2.79 | 0 | 7.77 | 1 | 0.0 | 0.0 | 0.0 | 1 |

2 | 1.0 | 2.79 | 0 | 9.75 | 1 | 0.0 | 0.0 | 0.0 | 1 |

2 | 1.5 | 2.79 | 0 | 11.04 | 1 | 0.5 | 0.0 | 0.0 | 1 |

2 | 2.0 | 2.79 | 0 | 7.20 | 1 | 1.0 | 0.0 | 0.0 | 1 |

3 | 0.0 | 1.90 | 0 | −1.84 | 0 | 0.0 | 0.0 | 0.0 | 1 |

3 | 0.5 | 1.90 | 0 | 1.12 | 0 | 0.0 | 0.0 | 0.0 | 1 |

3 | 1.0 | 1.90 | 0 | 0.78 | 0 | 0.0 | 0.0 | 0.0 | 1 |

Parameter | True value | Censored (20%) | Censored (40%) | ||||
---|---|---|---|---|---|---|---|

Estimate | SD | MSE | Estimate | SD | MSE | ||

5 | 4.85 | 0.30 | 0.25 | 5.10 | 0.30 | 0.27 | |

2 | 1.86 | 0.45 | 0.20 | 2.10 | 0.57 | 0.18 | |

0.1 | 0.13 | 0.12 | 0.00 | 0.11 | 0.10 | 0.00 | |

0.5 | 0.52 | 0.07 | 0.00 | 0.49 | 0.14 | 0.02 | |

0.5 | 0.48 | 0.10 | 0.00 | 0.51 | 0.09 | 0.00 | |

0.05 | 0.05 | 0.02 | 0.00 | 0.04 | 0.04 | 0.00 | |

2 | 2.02 | 0.05 | 0.00 | 2.02 | 0.06 | 0.00 | |

2 | 2.21 | 0.67 | 0.17 | 2.27 | 0.80 | 0.22 | |

2 | 2.16 | 0.27 | 0.09 | 2.10 | 0.43 | 0.05 | |

2 | 2.26 | 0.27 | 0.01 | 2.22 | 0.60 | 0.10 | |

4 | 4.20 | 0.53 | 0.20 | 4.24 | 0.63 | 0.18 |

The integrals with respect to the random effects in (3.7) do not have closed-form solutions. Therefore, in this chapter, we implement the Gaussian-Hermite quadrature rule as in Rizopoulos [5] to approximate the integrals. In our simulation study and R coding, we use the Gaussian-Hermite quadrature rule with 10 quadrature points.

The updating formulas of the parameters in Step 3 have different forms for each parameter following Rizopoulos [2]. We have the closed-form estimates for the measurement error variance

where

where

Unfortunately, we cannot obtain closed-form expressions for the fixed effects

where

There are four cases for simulating survival time

When the survival time

Following (5.4), we set

where

The condition

When

where

When

where

When

In particular, Ruppert et al. [9] introduced a default choices for knot location and number of knots. The idea is to choose sufficient knots to resolve the essential structure in the underlying regression function. But for more complicated penalized spline models, there are computational advantages to keeping the number of knots relatively low. A reasonable default is to choose the knots to ensure that there are a fixed number of unique observations, say 4–5, between each knot. For large data sets, this can lead to an excessive numbers of knots; therefore, a maximum number of allowable knots (say, 20–40 total) are recommended.

According to Ruppert et al. [9], the choice for knot position is

The simple choice of K is

See Table 5.