Open access peer-reviewed chapter

Modeling Heterogeneity Using Lindley Distribution

Written By

Arvind Pandey and Lalpawimawha

Reviewed: 07 September 2021 Published: 27 November 2021

DOI: 10.5772/intechopen.100340

From the Edited Volume

Computational Statistics and Applications

Edited by Ricardo López-Ruiz

Abstract

Frailty models are used in survival analysis to account for unobserved heterogeneity among individuals caused by hereditary or environmental factors. A shared frailty model, in which frailty acts multiplicatively on the hazard rate, is used to analyze the data. In this chapter, we introduce a new frailty model, the Lindley shared frailty model, with exponential power and generalized Rayleigh as baseline distributions. The parameters of the model are estimated with a Bayesian Markov chain Monte Carlo (MCMC) method, and simulation studies are carried out to compare the true and estimated parameter values. The proposed models are compared using Bayesian model comparison criteria, and the best model is suggested for infectious disease data.

Keywords

  • Bayesian technique
  • exponential power distribution
  • generalized Rayleigh distribution
  • Lindley frailty
  • MCMC

1. Introduction

The term frailty was coined by Vaupel et al. [1]. A frailty model typically represents frailty as an unobservable random variable that multiplies the hazard function, with the frailty random variable assumed to follow a parametric distribution such as the gamma, log-normal, positive stable, inverse Gaussian or power variance function distribution. Let $Y$ be a continuous random variable denoting the lifetime of an individual, and let $V$ be the frailty random variable (RV). The conditional hazard function (CHF) for a given frailty $V=v$ at time $y>0$ is

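$$m(y \mid v) = v\, m_0(y)\, e^{X\beta}, \tag{1}$$

where $m_0(y)$ is the baseline hazard function (BHF) at time $y>0$, $X$ is a vector of covariates and $\beta$ is the corresponding vector of regression coefficients. The conditional survival function for a given frailty at time $y>0$ is

$$S(y \mid v) = e^{-\int_0^y m(x \mid v)\,dx} = e^{-v\, M_0(y)\, e^{X\beta}}, \tag{2}$$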

where $M_0(y)$ is the cumulative baseline hazard function (CBHF) at time $y>0$. Integrating over the range of the frailty variable $V$ with density $f(v)$, we get the marginal survival function as

$$S(y) = \int_0^\infty S(y \mid v)\, f(v)\, dv \tag{3}$$
$$\phantom{S(y)} = L_V\!\left(M_0(y)\, e^{X\beta}\right), \tag{4}$$

where $L_V(\cdot)$ is the Laplace transform of the distribution of $V$. Once the survival function of an individual's lifetime at time $y>0$ is available, the full probability structure follows and inference can be based on it.
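The step from Eq. (3) to Eq. (4) can be spelled out in one line. This is a short worked sketch, writing $s = M_0(y)e^{X\beta}$ as shorthand; the symbol $s$ is introduced here only for illustration and is not part of the original notation.

$$S(y)=\int_0^\infty e^{-v\,s}\,f(v)\,dv = E\!\left[e^{-sV}\right] = L_V(s), \qquad s = M_0(y)\,e^{X\beta}.$$

In other words, any frailty distribution with a closed-form Laplace transform yields a closed-form marginal survival function.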

Frailty models have gained increasing attention in recent medical research owing to the uniqueness property of the frailty parameter. The gamma, log-normal and inverse Gaussian distributions are the most commonly used frailty distributions [2, 3]. Hanagal and Dabade [4] introduced compound negative binomial shared frailty models for bivariate survival data using Weibull and generalized exponential as baseline distributions. Pandey et al. [5] compared gamma, inverse Gaussian and positive stable frailty models with generalized Pareto as the baseline distribution. Pandey et al. [6] also compared gamma and inverse Gaussian frailty distributions under the additive property.

To explore the features of the Lindley shared frailty model, we use the Lindley distribution as the frailty distribution for right-censored data, with generalized Rayleigh and exponential power as baseline distributions. The survival times within a cluster are dependent because they share a common frailty variable, which follows the Lindley distribution. The variance of the frailty distribution reflects the degree of heterogeneity in the population: the higher the variance, the more heterogeneity there is in the population under study, and the frailty distribution becomes degenerate when the variance is zero. The exponential power distribution is chosen as a baseline distribution because it can exhibit an increasing hazard rate, which is common in real-life data.

The one-parameter Lindley distribution was first proposed by Lindley [7] for analyzing failure time data. It belongs to the exponential family and is used as an alternative to the exponential distribution. The Lindley distribution is attractive because it can model failure time data with increasing, decreasing, unimodal and bathtub-shaped hazard rates. Ghitany et al. [8, 9] discussed different properties of the Lindley distribution and showed that it is better than the exponential distribution for modeling failure time data when the hazard rate is unimodal or bathtub shaped. It has also been shown that the Lindley distribution is more flexible than the exponential distribution in modeling lifetime data. Many authors have discussed and introduced generalizations of the Lindley distribution. Bakouch et al. [10] introduced the extended Lindley distribution, Ghitany et al. [11] proposed the power Lindley distribution, and Shanker et al. [12] proposed a two-parameter Lindley distribution, which reduces to the one-parameter case. The mean of the two-parameter Lindley distribution is always greater than the mode, indicating that the distribution is positively skewed.

The classical and Bayesian approaches are the two most widely used estimation techniques. Because prior distributions can be incorporated, we estimate the model parameters using the Bayesian Markov chain Monte Carlo (MCMC) approach. Furthermore, because samples from the posterior distributions of the parameters are easily generated, the results and model selection criteria can be clearly interpreted. Trace plots, coupling-from-the-past plots, running mean plots and sample autocorrelation plots after thinning are used to examine chain behavior, to determine the burn-in period and autocorrelation lag, and to confirm convergence. We also report simulation studies to assess the model's performance. The estimation procedure is described in detail and applied to kidney infection data.

In Sections 2 and 3, the introduction of the Lindley shared frailty model and baseline distributions are given, followed by proposed models and estimation strategies in Sections 4 and 5. In Sections 6 and 7, application of the proposed model and discussion are given.

2. Lindley shared frailty model

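Let the continuous random variable $V$ follow a two-parameter Lindley distribution (TPLD) with parameters $\alpha$ and $\lambda$. The density function of $V$ is

$$f(v)=\begin{cases}\dfrac{\alpha^{2}}{\alpha\lambda+1}\,(\lambda+v)\,e^{-\alpha v}; & v>0,\ \alpha>0,\ \lambda\alpha>-1\\[4pt] 0; & \text{otherwise},\end{cases} \tag{5}$$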

and the Laplace transform is

$$L_V(s)=\frac{\alpha^{2}\,[1+(s+\alpha)\lambda]}{(s+\alpha)^{2}\,(1+\lambda\alpha)},\qquad s+\alpha>0. \tag{6}$$

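The mean and variance of the frailty variable are $E(V)=\dfrac{\alpha\lambda+2}{\alpha(\alpha\lambda+1)}$ and $\operatorname{Var}(V)=\dfrac{2+4\alpha\lambda+\alpha^{2}\lambda^{2}}{\alpha^{2}(\alpha\lambda+1)^{2}}$. For identifiability, we assume that $V$ has expected value one, i.e. $E(V)=1$, which implies $\alpha=\xi$ and $\lambda=\dfrac{2-\xi}{\xi(\xi-1)}$. Under this restriction the density function and the Laplace transform of the Lindley distribution reduce to

$$f(v)=\begin{cases}\xi\,e^{-\xi v}\left[\xi(\xi-1)\,v+2-\xi\right]; & v>0,\ \xi>0\\[4pt] 0; & \text{otherwise},\end{cases} \tag{7}$$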

and the Laplace transform is

$$L_V(s)=\frac{\xi\,[\xi+s(2-\xi)]}{(s+\xi)^{2}}. \tag{8}$$

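The variance of $V$ is then $\dfrac{4\xi-\xi^{2}-2}{\xi^{2}}$. When this variance is zero, the frailty variable $V$ is degenerate at $V=1$. Replacing the Laplace transform in Eq. (4), we get the unconditional bivariate survival function for the $k$th individual as

$$S(y_{1k},y_{2k})=\frac{\xi\left[\xi+(2-\xi)\,\eta_k\left(M_{01}(y_{1k})+M_{02}(y_{2k})\right)\right]}{\left[\eta_k\left(M_{01}(y_{1k})+M_{02}(y_{2k})\right)+\xi\right]^{2}} \tag{9}$$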

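where $M_{01}(y_{1k})$ and $M_{02}(y_{2k})$ are the cumulative baseline hazard functions of the lifetimes $Y_{1k}$ and $Y_{2k}$, respectively, and $\eta_k=e^{X_k\beta}$ as in Eq. (4).

Without frailty, the model becomes

$$S(y_{1k},y_{2k})=e^{-\eta_k\left[M_{01}(y_{1k})+M_{02}(y_{2k})\right]}. \tag{10}$$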
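The closed forms in Eqs. (7) and (8) can be checked numerically. The following is a minimal sketch in Python (standard library only), not the authors' implementation: it integrates the mean-one Lindley density against $e^{-sv}$ with a trapezoidal rule and compares the result with the closed-form Laplace transform. The function names, the integration grid and the test values of $\xi$ and $s$ are illustrative assumptions. One observation derived here rather than stated in the text: the bracket in Eq. (7) stays non-negative for all $v>0$ only when $1<\xi\le 2$, which is the range in which the posterior estimates reported later fall.

```python
import math

def lindley_density(v, xi):
    """Mean-one Lindley frailty density, Eq. (7)."""
    return xi * math.exp(-xi * v) * (xi * (xi - 1.0) * v + 2.0 - xi)

def lindley_laplace(s, xi):
    """Closed-form Laplace transform, Eq. (8)."""
    return xi * (xi + s * (2.0 - xi)) / (s + xi) ** 2

def laplace_numeric(s, xi, upper=60.0, n=60000):
    """Trapezoidal approximation of the integral of exp(-s v) f(v) on [0, upper].

    The upper limit and grid size are arbitrary choices for this sketch.
    """
    h = upper / n
    total = 0.5 * (lindley_density(0.0, xi)
                   + math.exp(-s * upper) * lindley_density(upper, xi))
    for i in range(1, n):
        v = i * h
        total += math.exp(-s * v) * lindley_density(v, xi)
    return total * h

xi, s = 1.2, 0.7  # illustrative values only
print("closed form:", lindley_laplace(s, xi))
print("numeric    :", laplace_numeric(s, xi))

# E(V) = -L'(0), approximated by a central difference; should be close to 1
h = 1e-5
mean_v = (lindley_laplace(-h, xi) - lindley_laplace(h, xi)) / (2.0 * h)
print("E(V) approx:", mean_v)
```

Setting `s = 0` in `laplace_numeric` gives the total mass of the density, which is a quick way to confirm that Eq. (7) integrates to one for a chosen $\xi$.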

3. Baseline distributions

The first baseline distribution considered is the generalized Rayleigh distribution. Surles and Padgett [13] proposed the two-parameter Burr type X distribution, also called the generalized Rayleigh distribution, and showed that it can describe strength data and general lifetime data quite effectively, which makes it well suited to survival analysis. If a continuous random variable $Y$ follows a two-parameter generalized Rayleigh distribution, its survival function, hazard function and cumulative hazard function are, respectively,

$$S(y)=1-\left(1-e^{-(\lambda y)^{2}}\right)^{\alpha};\quad y>0,\ \lambda>0,\ \alpha>0 \tag{11}$$
$$m(y)=\frac{2\alpha\lambda^{2}\,y\,e^{-(\lambda y)^{2}}\left(1-e^{-(\lambda y)^{2}}\right)^{\alpha-1}}{1-\left(1-e^{-(\lambda y)^{2}}\right)^{\alpha}};\quad y>0,\ \lambda>0,\ \alpha>0 \tag{12}$$
$$M(y)=-\log\left[1-\left(1-e^{-(\lambda y)^{2}}\right)^{\alpha}\right];\quad y>0,\ \lambda>0,\ \alpha>0 \tag{13}$$

where $\alpha$ and $\lambda$ are the shape and scale parameters of the distribution, respectively. Depending on the parameter values, its hazard function is either increasing or bathtub shaped.
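The three generalized Rayleigh functions in Eqs. (11) to (13) are tied together by $M(y)=-\log S(y)$ and $m(y)=M'(y)$. The following Python sketch (standard library only) encodes them and checks the second relation with a finite difference. The function names and the parameter values are illustrative assumptions, not estimates from this chapter.

```python
import math

def gr_survival(y, alpha, lam):
    """Generalized Rayleigh survival function, Eq. (11)."""
    return 1.0 - (1.0 - math.exp(-(lam * y) ** 2)) ** alpha

def gr_hazard(y, alpha, lam):
    """Generalized Rayleigh hazard function, Eq. (12)."""
    u = math.exp(-(lam * y) ** 2)
    numerator = 2.0 * alpha * lam ** 2 * y * u * (1.0 - u) ** (alpha - 1.0)
    return numerator / (1.0 - (1.0 - u) ** alpha)

def gr_cum_hazard(y, alpha, lam):
    """Generalized Rayleigh cumulative hazard function, Eq. (13)."""
    return -math.log(gr_survival(y, alpha, lam))

# Illustrative parameter values (not estimates from the chapter)
alpha, lam, y, h = 0.8, 0.5, 1.5, 1e-5

# Central finite difference of M(y) should reproduce m(y)
numeric_hazard = (gr_cum_hazard(y + h, alpha, lam)
                  - gr_cum_hazard(y - h, alpha, lam)) / (2.0 * h)
print("hazard from Eq. (12):", gr_hazard(y, alpha, lam))
print("d/dy of Eq. (13)    :", numeric_hazard)
```

Evaluating `gr_hazard` on a grid of `y` values for different `alpha` is a simple way to see the increasing and bathtub shapes mentioned above.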

The second baseline distribution considered here is exponential power distribution. A continuous random variable Y is said to follow the exponential power distribution if its survival function, hazard function and cumulative hazard function are, respectively,

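$$S(y)=e^{\,1-e^{\lambda y^{\alpha}}};\quad y>0,\ \lambda>0,\ \alpha>0 \tag{14}$$
$$m(y)=\alpha\lambda\,y^{\alpha-1}e^{\lambda y^{\alpha}};\quad y>0,\ \lambda>0,\ \alpha>0 \tag{15}$$
$$M(y)=e^{\lambda y^{\alpha}}-1 \tag{16}$$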

where $\alpha$ and $\lambda$ are the shape and scale parameters of the exponential power distribution, respectively.

When $\alpha<1$ and $\lambda$ is small, the hazard function is decreasing in $y$; as $\lambda$ increases, the hazard function takes a U shape, and for still larger values of $\lambda$ it becomes increasing.
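The shape behavior described above can be made concrete with a small calculation. Setting the derivative of $\log m(y)$ from Eq. (15) to zero gives a turning point $y^{*}=\left[(1-\alpha)/(\alpha\lambda)\right]^{1/\alpha}$ when $\alpha<1$; this expression is derived here and is not stated in the chapter. Under that derivation, a small $\lambda$ pushes $y^{*}$ far to the right, so the hazard looks decreasing over a typical observation window, while a larger $\lambda$ brings the minimum into view (U shape) or close to zero (mostly increasing). The Python sketch below (standard library only, illustrative names and values) evaluates this.

```python
import math

def ep_hazard(y, alpha, lam):
    """Exponential power hazard function, Eq. (15)."""
    return alpha * lam * y ** (alpha - 1.0) * math.exp(lam * y ** alpha)

def ep_cum_hazard(y, alpha, lam):
    """Exponential power cumulative hazard function, Eq. (16)."""
    return math.exp(lam * y ** alpha) - 1.0

def ep_turning_point(alpha, lam):
    """Location of the hazard minimum for alpha < 1 (derived, see lead-in).

    Returns None when alpha >= 1, where the hazard has no interior minimum.
    """
    if alpha >= 1.0:
        return None
    return ((1.0 - alpha) / (alpha * lam)) ** (1.0 / alpha)

alpha = 0.5  # illustrative shape value
for lam in (0.02, 0.2, 2.0):
    print("lambda =", lam, "turning point y* =", ep_turning_point(alpha, lam))
```

For example, with `alpha = 0.5` and `lam = 0.2` the turning point is at `y* = 25`, so the hazard decreases before that time and increases afterwards.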

4. Proposed models

The unconditional survival function is obtained by replacing the cumulative hazard functions of generalized Rayleigh distribution and exponential power distribution in Eqs. (9) and (10). Then,

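$$S(y_{1k},y_{2k})=\frac{\xi\left[\xi+(2-\xi)\,\eta_k A_k\right]}{\left(\xi+\eta_k A_k\right)^{2}},\qquad A_k=-\log\left[1-\left(1-e^{-(\lambda_1 y_{1k})^{2}}\right)^{\alpha_1}\right]-\log\left[1-\left(1-e^{-(\lambda_2 y_{2k})^{2}}\right)^{\alpha_2}\right] \tag{19}$$
$$S(y_{1k},y_{2k})=\frac{\xi\left[\xi+(2-\xi)\,\eta_k B_k\right]}{\left(\xi+\eta_k B_k\right)^{2}},\qquad B_k=\left(e^{\lambda_1 y_{1k}^{\alpha_1}}-1\right)+\left(e^{\lambda_2 y_{2k}^{\alpha_2}}-1\right) \tag{20}$$
$$S(y_{1k},y_{2k})=e^{-\eta_k A_k} \tag{21}$$
$$S(y_{1k},y_{2k})=e^{-\eta_k B_k} \tag{22}$$

Here $A_k$ and $B_k$ denote the sums $M_{01}(y_{1k})+M_{02}(y_{2k})$ of the cumulative baseline hazards from Eqs. (13) and (16), respectively.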

Eqs. (19) and (20) are the Lindley shared frailty models with generalized Rayleigh and exponential power baseline distributions, referred to as Model-I and Model-II, respectively; Eqs. (21) and (22) are the corresponding models without frailty under the same baseline distributions, referred to as Model-III and Model-IV.
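The four models differ only in the baseline cumulative hazard that is plugged in and in whether the Lindley Laplace transform of Eq. (9) or the plain exponential of Eq. (10) is applied. The following Python sketch (standard library only) illustrates that structure under the assumption $\eta_k=e^{X_k\beta}$. It is not the authors' R code, and all function names and parameter values are hypothetical.

```python
import math

def gr_cum_hazard(y, alpha, lam):
    """Generalized Rayleigh cumulative hazard, Eq. (13)."""
    return -math.log(1.0 - (1.0 - math.exp(-(lam * y) ** 2)) ** alpha)

def ep_cum_hazard(y, alpha, lam):
    """Exponential power cumulative hazard, Eq. (16)."""
    return math.exp(lam * y ** alpha) - 1.0

def bivariate_survival(y1, y2, eta, par1, par2, cum_hazard, xi=None):
    """Unconditional bivariate survival function.

    par1 and par2 are (alpha, lam) pairs for the first and second lifetime.
    With xi given, Eq. (9) is used (Lindley shared frailty);
    with xi=None, Eq. (10) is used (no frailty).
    """
    s = eta * (cum_hazard(y1, *par1) + cum_hazard(y2, *par2))
    if xi is None:
        return math.exp(-s)
    return xi * (xi + (2.0 - xi) * s) / (xi + s) ** 2

# Illustrative values only (not the posterior estimates of this chapter)
par1, par2, eta, xi = (0.8, 0.05), (0.9, 0.04), 1.2, 1.5
models = {
    "Model-I": (gr_cum_hazard, xi),
    "Model-II": (ep_cum_hazard, xi),
    "Model-III": (gr_cum_hazard, None),
    "Model-IV": (ep_cum_hazard, None),
}
for name, (cum_hazard, frailty) in models.items():
    value = bivariate_survival(10.0, 20.0, eta, par1, par2, cum_hazard, frailty)
    print(name, "S(10, 20) =", value)
```

Keeping the baseline as a function argument means a further baseline distribution could be compared by supplying only its cumulative hazard.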

5. Estimation strategies

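Assuming independence between the censoring scheme and individual lifetimes, the likelihood function of the failure and censoring times for the $n$ individuals ($k=1,2,\ldots,n$) is

$$L(\underline{\Psi},\underline{\beta},\xi)=\prod_{k=1}^{n_1}f_1(y_{1k},y_{2k})\prod_{k=1}^{n_2}f_2(y_{1k},d_{2k})\prod_{k=1}^{n_3}f_3(d_{1k},y_{2k})\prod_{k=1}^{n_4}f_4(d_{1k},d_{2k}) \tag{23}$$

where $\underline{\Psi}$, $\underline{\beta}$ and $\xi$ are the vector of baseline parameters, the vector of regression coefficients and the frailty distribution parameter, respectively. The likelihood function without frailty is

$$L(\underline{\Psi},\underline{\beta})=\prod_{k=1}^{n_1}f_1(y_{1k},y_{2k})\prod_{k=1}^{n_2}f_2(y_{1k},d_{2k})\prod_{k=1}^{n_3}f_3(d_{1k},y_{2k})\prod_{k=1}^{n_4}f_4(d_{1k},d_{2k}) \tag{24}$$

Here $d_{1k}$ and $d_{2k}$ denote the censoring times, and $n_1$, $n_2$, $n_3$ and $n_4$ are the numbers of individuals whose observations satisfy $(y_{1k}<d_{1k},\,y_{2k}<d_{2k})$, $(y_{1k}<d_{1k},\,y_{2k}>d_{2k})$, $(y_{1k}>d_{1k},\,y_{2k}<d_{2k})$ and $(y_{1k}>d_{1k},\,y_{2k}>d_{2k})$, respectively. The contributions of the $k$th individual to the likelihood function are

$$\begin{aligned}
f_1(y_{1k},y_{2k})&=\frac{\partial^{2}S(y_{1k},y_{2k})}{\partial y_{1k}\,\partial y_{2k}}, &
f_2(y_{1k},d_{2k})&=-\frac{\partial S(y_{1k},d_{2k})}{\partial y_{1k}},\\
f_3(d_{1k},y_{2k})&=-\frac{\partial S(d_{1k},y_{2k})}{\partial y_{2k}}, &
f_4(d_{1k},d_{2k})&=S(d_{1k},d_{2k}).
\end{aligned} \tag{25}$$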

Substituting Eq. (25), evaluated with the survival functions of Eqs. (19) to (22), into Eqs. (23) and (24), we get the likelihood functions for the Lindley shared frailty models under the generalized Rayleigh and exponential power baseline distributions and the likelihood functions for the models without frailty under the same baseline distributions.
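For the frailty models, the derivatives in Eq. (25) follow from the chain rule, because Eq. (9) is the Laplace transform of Eq. (8) evaluated at $s_k=\eta_k\left[M_{01}(\cdot)+M_{02}(\cdot)\right]$. The following worked sketch is derived here from Eqs. (8), (9) and (25) and is not quoted from the chapter; the shorthand $s_k$ and the baseline hazards $m_{0j}=M_{0j}'$ are introduced for illustration.

$$L_V'(s)=-\frac{\xi\left[\xi^{2}+(2-\xi)s\right]}{(s+\xi)^{3}},\qquad L_V''(s)=\frac{2\xi\left[2\xi^{2}-\xi+(2-\xi)s\right]}{(s+\xi)^{4}},$$

$$\begin{aligned}
f_1(y_{1k},y_{2k})&=\eta_k^{2}\,m_{01}(y_{1k})\,m_{02}(y_{2k})\,L_V''(s_k), &
f_2(y_{1k},d_{2k})&=-\eta_k\,m_{01}(y_{1k})\,L_V'(s_k),\\
f_3(d_{1k},y_{2k})&=-\eta_k\,m_{02}(y_{2k})\,L_V'(s_k), &
f_4(d_{1k},d_{2k})&=L_V(s_k),
\end{aligned}$$

where in each case $s_k$ is evaluated at the arguments of the corresponding $f_j$. As a consistency check, $-L_V'(0)=1=E(V)$ and $L_V''(0)-1=(4\xi-\xi^{2}-2)/\xi^{2}$, which matches the frailty variance given in Section 2.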

The joint posterior density of the parameters given failure times is given as

$$\pi(\alpha_1,\lambda_1,\alpha_2,\lambda_2,\xi,\underline{\beta})\propto L(\alpha_1,\lambda_1,\alpha_2,\lambda_2,\xi,\underline{\beta})\times g_1(\alpha_1)\,g_2(\lambda_1)\,g_3(\alpha_2)\,g_4(\lambda_2)\,g_5(\xi)\prod_{i=1}^{5}p_i(\beta_i)$$

where $g_i(\cdot)$ ($i=1,2,\ldots,5$) denotes the prior density function of the baseline parameters and the frailty parameter, with hyperparameters assumed to be known; $p_i(\cdot)$ is the prior density function of the regression coefficient $\beta_i$; $\underline{\beta}_{(-i)}$ denotes the vector of regression coefficients excluding $\beta_i$, $i=1,2,\ldots,a$; and the likelihood function $L(\cdot)$ is given by Eqs. (23) and (24). All parameters are assumed to be independently distributed a priori.

The likelihood functions in Eqs. (23) and (24) are not easy to maximize using the Newton–Raphson method: because a large number of parameters is involved, the maximum likelihood estimates (MLEs) fail to converge. The Bayesian approach, which does not suffer from these issues, was therefore used to estimate the parameters of the models.

Prior distributions are chosen as follows. For the frailty parameter, a gamma distribution $\Gamma(\Psi,\Psi)$ with mean 1 and large variance is used as the prior, where $\Psi$ is a small value. For the regression coefficients, a normal distribution with mean zero and large variance $\varphi^{2}$ is used. Since we have no prior information about the baseline parameters, we use the same type of prior distributions as Ibrahim et al. [14] and Sahu et al. [15], namely the non-informative priors $\Gamma(a_1,b_1)$ and $U(a_2,b_2)$. All the hyperparameters $\Psi$, $\varphi$, $a_1$, $a_2$, $b_1$ and $b_2$ are assumed to be known in advance. Here $\Gamma(a_1,b_1)$ denotes the gamma distribution with shape parameter $a_1$ and scale parameter $b_1$, and $U(a_2,b_2)$ denotes the uniform distribution over the interval from $a_2$ to $b_2$. We set the hyperparameters to $\Psi=0.0001$, $\varphi^{2}=1000$, $a_1=1$, $b_1=0.0001$, $a_2=0$ and $b_2=100$.

The Metropolis–Hastings algorithm and the Gibbs sampler were used to fit the models with the above prior densities and the likelihood functions of Eqs. (23) and (24). Convergence of the Markov chain to a stationary distribution was monitored using the Geweke test and the Gelman–Rubin statistic, as suggested by Geweke [16] and Gelman and Rubin [17]. We used trace plots, coupling-from-the-past plots and sample autocorrelation plots to examine the chain's behavior and to determine the burn-in period and autocorrelation lag.
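To illustrate the mechanics of burn-in and thinning, the following is a minimal sketch of a random-walk Metropolis sampler in Python (standard library only). It is a generic illustration on a toy one-dimensional target (a standard normal log-density), not the authors' R programs and not the full sampler for the ten-parameter posterior. The function names, step size, chain length, burn-in and thinning lag are all assumptions chosen for the toy example.

```python
import math
import random

def random_walk_metropolis(log_post, start, step, n_iter, seed=0):
    """Random-walk Metropolis for a one-dimensional log-posterior.

    Returns the full chain and the observed acceptance rate.
    """
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
            accepted += 1
        draws.append(x)
    return draws, accepted / n_iter

def toy_log_post(x):
    """Toy target: standard normal log-density up to a constant."""
    return -0.5 * x * x

chain, acc_rate = random_walk_metropolis(toy_log_post, start=3.0,
                                         step=2.4, n_iter=21000, seed=1)
burn_in, lag = 1000, 10           # illustrative values only
kept = chain[burn_in::lag]        # discard burn-in, then thin
post_mean = sum(kept) / len(kept)
post_var = sum((d - post_mean) ** 2 for d in kept) / (len(kept) - 1)
print("kept draws:", len(kept), "acceptance rate:", round(acc_rate, 3))
print("posterior mean:", post_mean, "posterior variance:", post_var)
```

In a multi-parameter model such as the ones fitted here, an update of this kind would typically be applied to one parameter at a time inside a Gibbs-style sweep, with the burn-in and lag chosen from the diagnostic plots.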

To decide which model provides the best fit to the data set, the models were compared using the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the deviance information criterion (DIC) and the Bayes factor.

6. Application in real life data

The applicability of the models was tested on infectious disease data on kidney infection related to catheter insertion [18]. The data contain the first and second recurrence times of infection for 38 patients, arising from catheters used with portable dialysis equipment. The two infection times of each patient form a cluster. Other recorded information includes infection duration, patient age, gender (0 for male and 1 for female) and disease type: Glomerulo Nephritis (GN), Acute Nephritis (AN) and Polycystic Kidney Disease (PKD).

First, the Kolmogorov–Smirnov (K-S) test is used to assess goodness of fit for the kidney infection data. The p-values for the first and second recurrence times, given in Table 1, are large, so we cannot reject the hypothesis that the first and second recurrence times follow the distributions with survival functions given in Eqs. (11) and (14) in the univariate case; the same baselines are assumed to be appropriate in the bivariate case.

The posterior summaries of the proposed models are presented in Tables 2 and 3. They consist of the estimate (posterior mean), standard error, lower and upper 95% credible limits, Gelman–Rubin statistic, and Geweke test values with p-values. Tables 2 and 3 show that the regression coefficients have the same signs and are broadly comparable across the models. For all the proposed models, zero is not contained in any of the credible intervals of the regression coefficients, so all the covariates appear to be significant.

To assess the adequacy of the models, we constructed 95% and 50% predictive intervals from random samples generated from the predictive distribution, as described by Sahu et al. [15], and counted how many of the observed first and second recurrence times fell within them. Out of 76 observations, 76 and 58 fell within the 95% and 50% predictive intervals for Model-I, and 76 and 60 for Model-II. This indicates that both models are appropriate for the data.

Model-I is the better model in terms of AIC, BIC and DIC, since it has lower values of all three criteria than Model-II in Table 4. To complement these criteria, we compare model u with model v using the Bayes factor $B_{uv}$ (Table 5). The value of $2\log B_{uv}$ for Model-I against Model-II exceeds 10, which indicates very strong evidence in favor of Model-I over Model-II for this data set and confirms the conclusion drawn from Table 4. On all the comparison criteria considered, we therefore conclude that Model-I is superior to Model-II for modeling the kidney infection data.

| Distribution | First recurrence time | Second recurrence time |
|---|---|---|
| Generalized Rayleigh | 0.98078 | 0.99889 |
| Exponential power | 0.96291 | 0.75766 |

Table 1.

p-Values of K-S Statistics for goodness of fit test for kidney infection data set.

Burn-in period = 5600; autocorrelation lag = 275

| Parameter | Estimate | Standard error | Lower credible limit | Upper credible limit | Gelman & Rubin values | Geweke values | p values |
|---|---|---|---|---|---|---|---|
| α1 | 0.3716 | 0.0312 | 0.3118 | 0.4283 | 1.0007 | −0.0048 | 0.4980 |
| α2 | 0.4253 | 0.0455 | 0.3326 | 0.5044 | 1.0003 | −0.0045 | 0.4981 |
| λ1 | 0.0032 | 0.0004 | 0.0023 | 0.0041 | 1.0008 | −0.0017 | 0.4992 |
| λ2 | 0.0026 | 0.0004 | 0.0018 | 0.0034 | 1.0031 | −0.0052 | 0.4979 |
| ξ | 1.1287 | 0.0423 | 1.0722 | 1.2196 | 1.0032 | −0.0095 | 0.4961 |
| β1 | 0.0153 | 0.0041 | 0.0083 | 0.0238 | 1.0052 | −0.0042 | 0.4983 |
| β2 | −1.0792 | 0.2740 | −1.6013 | −0.5350 | 1.0001 | 0.0070 | 0.4983 |
| β3 | 0.0021 | 0.0004 | 0.0012 | 0.0029 | 1.0008 | −0.0060 | 0.4975 |
| β4 | 0.0031 | 0.0004 | 0.0022 | 0.0040 | 1.0005 | −0.0010 | 0.4995 |
| β5 | −0.2149 | 0.0514 | −0.3031 | −0.0947 | 1.0008 | 0.0059 | 0.5023 |

Table 2.

Posterior results with baseline generalized Rayleigh distribution.

Burn-in period = 6800; autocorrelation lag = 280

| Parameter | Estimate | Standard error | Lower credible limit | Upper credible limit | Gelman & Rubin values | Geweke values | p values |
|---|---|---|---|---|---|---|---|
| α1 | 0.4440 | 0.0251 | 0.3878 | 0.4905 | 1.0010 | −0.0065 | 0.4973 |
| α2 | 0.5040 | 0.0368 | 0.4263 | 0.5689 | 1.0002 | 0.0032 | 0.5012 |
| λ1 | 0.3120 | 0.0471 | 0.2353 | 0.4021 | 1.0001 | 0.0029 | 0.5011 |
| λ2 | 0.2150 | 0.0445 | 0.1530 | 0.3166 | 1.0016 | −0.0053 | 0.4978 |
| ξ | 1.2061 | 0.0496 | 1.1197 | 1.3013 | 0.9999 | 0.0026 | 0.5010 |
| β1 | 0.0001 | 0.0001 | 1.7e-05 | 0.0002 | 1.0003 | 0.0036 | 0.5014 |
| β2 | −2.5247 | 0.3867 | −3.2854 | −1.7454 | 1.0021 | 0.0071 | 0.5014 |
| β3 | 0.0020 | 0.0004 | 0.0012 | 0.0029 | 1.0006 | 0.0119 | 0.5047 |
| β4 | 0.0031 | 0.0004 | 0.0021 | 0.0040 | 1.0003 | 0.0107 | 0.5042 |
| β5 | −0.9916 | 0.4466 | −1.8481 | −0.1704 | 1.0027 | 0.0001 | 0.5000 |

Table 3.

Posterior results with baseline exponential power distribution.

| Model no. | AIC | BIC | DIC |
|---|---|---|---|
| Model I | 638.5262 | 654.9020 | 625.1190 |
| Model II | 700.3005 | 716.6763 | 686.8069 |
| Model III | 691.5817 | 706.3200 | 720.7843 |
| Model IV | 702.0827 | 716.8210 | 689.8978 |

Table 4.

AIC, BIC and DIC values for all models.
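The AIC and BIC columns of Table 4 share the same $-2\log L$ term and differ only in the penalty: $\mathrm{AIC}=-2\log L+2p$ and $\mathrm{BIC}=-2\log L+p\log n$. The Python sketch below (standard library only) converts one into the other. The parameter counts $p=10$ for the frailty models (four baseline parameters, one frailty parameter, five regression coefficients) and $p=9$ for the models without frailty, and the sample size $n=38$ patients, are assumptions inferred here from Tables 2 and 3 and the data description. They are not stated explicitly in the chapter, although they do reproduce the reported BIC values to the printed precision.

```python
import math

def bic_from_aic(aic, n_params, n_obs):
    """Convert AIC to BIC using the shared -2 log-likelihood term."""
    neg2_loglik = aic - 2.0 * n_params
    return neg2_loglik + n_params * math.log(n_obs)

N_OBS = 38  # assumption: number of patients (clusters)

# (AIC, reported BIC, assumed number of parameters) from Table 4
TABLE4 = {
    "Model I": (638.5262, 654.9020, 10),
    "Model II": (700.3005, 716.6763, 10),
    "Model III": (691.5817, 706.3200, 9),
    "Model IV": (702.0827, 716.8210, 9),
}

for name, (aic, bic_reported, p) in TABLE4.items():
    bic = bic_from_aic(aic, p, N_OBS)
    print(name, "recomputed BIC:", round(bic, 4), "reported:", bic_reported)
```

Because the penalty terms are identical for models with the same number of parameters, the AIC and BIC differences between Model I and Model II coincide and equal the difference in $-2\log L$.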

| Numerator model against denominator model | 2 log_e B_uv | Range | Evidence against model in denominator |
|---|---|---|---|
| Model I against Model II | 63.23936 | >10 | Very strong positive |

Table 5.

Bayes factor values and decision for test of significance for frailty fitted to kidney infection data set.
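The "Range" column in Table 5 corresponds to a threshold scale for $2\log_e B_{uv}$. The sketch below assumes the widely used Kass and Raftery categories (0 to 2: not worth more than a bare mention; 2 to 6: positive; 6 to 10: strong; above 10: very strong). The chapter itself reports only the ">10" row, so the remaining cut-offs and the function name are assumptions of this illustration.

```python
def bayes_factor_evidence(two_log_b):
    """Classify 2*log_e(B_uv) as evidence against the denominator model.

    Cut-offs follow the Kass and Raftery scale (an assumption here).
    """
    if two_log_b < 0:
        return "favors denominator model"
    if two_log_b <= 2:
        return "not worth more than a bare mention"
    if two_log_b <= 6:
        return "positive"
    if two_log_b <= 10:
        return "strong"
    return "very strong"

print(bayes_factor_evidence(63.23936))  # value reported in Table 5
```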

7. Discussion

In this study, we examined a new Lindley shared frailty model under generalized Rayleigh and exponential power as baseline distributions.

The Metropolis–Hastings algorithm and the Gibbs sampler were used to fit all of the proposed models. The proposed models were applied to kidney infection data, and the best model was suggested. The analysis was carried out with programs written by the authors in the R statistical software.

All of the comparison criteria indicated that, for modeling the kidney infection data, the Lindley shared frailty model with the generalized Rayleigh baseline distribution is superior to the model with the exponential power baseline distribution and to the models without frailty under the same baseline distributions. The estimates of the frailty variance are 0.9415 and 0.9739, which are high in both frailty models and provide strong evidence of a high degree of heterogeneity among the patients in the population. Some patients are expected to be much more prone to infection than others with the same covariate values. We can also say that there is a strong positive correlation between the two infection times of the same patient.
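As a quick arithmetic check, the reported frailty variances can be recomputed from the posterior means of ξ in Tables 2 and 3 using the variance expression $(4\xi-\xi^{2}-2)/\xi^{2}$ from Section 2. The Python sketch below assumes the rounded table values ξ = 1.1287 (generalized Rayleigh baseline) and ξ = 1.2061 (exponential power baseline); the function name is illustrative.

```python
def lindley_frailty_variance(xi):
    """Variance of the mean-one Lindley frailty: (4*xi - xi**2 - 2) / xi**2."""
    return (4.0 * xi - xi ** 2 - 2.0) / xi ** 2

xi_hat = {"Model-I": 1.1287, "Model-II": 1.2061}  # posterior means, Tables 2 and 3
variances = {name: lindley_frailty_variance(xi) for name, xi in xi_hat.items()}
for name, var in variances.items():
    print(name, "frailty variance:", var)
```

With these rounded inputs the results agree with the reported 0.9739 and 0.9415 to about three decimal places, with the larger value belonging to the generalized Rayleigh model.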

The main findings of the proposed models that were not reported in previous studies are as follows. The estimates of the frailty variance are higher in all proposed models than in the earlier studies of McGilchrist and Aisbett [18], based on log-normal frailty, and of Hanagal and Bhambure [19]. The disease types GN and AN have lower infection rates compared to the other covariates. All the covariates are significant factors for kidney infection, whereas disease type was insignificant in previously proposed frailty models (see [4]). It is important to note that the Lindley shared frailty model with the generalized Rayleigh baseline distribution performs better in analyzing the kidney infection data than other frailty models [4, 19].

References

  1. Vaupel, J.W., Manton, K.G., Stallard, E. 1979. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16(3), 439-454
  2. Gupta, R.D., Kundu, D. 2001. Exponentiated Exponential Family: An Alternative to Gamma and Weibull Distributions. Biometrical Journal, 43(1), 117-130
  3. Kheiri, S., Kimber, A., Meshkani, M.R. 2007. Bayesian analysis of an inverse Gaussian correlated frailty model. Computational Statistics and Data Analysis, 51, 5317-5326
  4. Hanagal, D.D., Dabade, A.D. 2013. Compound negative binomial shared frailty models for bivariate survival data. Statistics and Probability Letters, 83, 2507-2515
  5. Pandey, A., Bhushan, S., Lalpawimawha, R. 2018. Shared frailty models with baseline generalized Pareto distribution. Communications in Statistics-Theory and Methods, DOI: 10.1080/03610926.2018.1500597
  6. Pandey, A., Bhushan, S., Lalpawimawha, R., Misra, P.K. 2019. Comparison of additive shared frailty models under Lindley baseline distribution. Communications in Statistics-Simulation and Computation, DOI: 10.1080/03610918.2019.1664573
  7. Lindley, D.V. 1958. Fiducial distributions and Bayes' theorem. Journal of the Royal Statistical Society, Series B, 20, 102-107
  8. Ghitany, M.E., Atieh, B., Nadarajah, S. 2008. Lindley Distribution and Its Applications. Mathematics and Computers in Simulation, 78(4), 493-506
  9. Ghitany, M.E., Alqallaf, F., Al-Mutairi, D.K., Hussain, H. 2011. A Two Parameter Weighted Lindley Distribution and Its Applications to Survival Data. Mathematics and Computers in Simulation, 81(6), 1190-1201
  10. Bakouch, H.S., Al-Zahrani, B.M., Al-Sho, A.A., Marchi, A.A., Louzada, F. 2012. An Extended Lindley Distribution. Journal of the Korean Statistical Society, 41(1), 75-85
  11. Ghitany, M., Al-Mutairi, D., Balakrishnan, N., Al-Enezi, I. 2013. Power Lindley distribution and associated inference. Computational Statistics and Data Analysis, 64, 20-33
  12. Shanker, R., Sharma, S., Shanker, R. 2013. A Two-Parameter Lindley Distribution for Modeling Waiting and Survival Times Data. Applied Mathematics, 4, 363-368
  13. Surles, J.G., Padgett, W.J. 2001. Inference for reliability and stress-strength for a scaled Burr Type X distribution. Lifetime Data Analysis, 7, 187-200
  14. Ibrahim, J.G., Chen, M.-H., Sinha, D. 2001. Bayesian Survival Analysis. Springer-Verlag
  15. Sahu, S.K., Dey, D.K., Aslanidou, H., Sinha, D. 1997. A Weibull regression model with gamma frailties for multivariate survival data. Lifetime Data Analysis, 3, 123-137
  16. Geweke, J. 1992. Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments. In Bayesian Statistics 4 (eds. J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith), 169-193. Oxford: Oxford University Press
  17. Gelman, A., Rubin, D.B. 1992. A single series from the Gibbs sampler provides a false sense of security. In Bayesian Statistics 4 (eds. J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith), 625-632. Oxford: Oxford University Press
  18. McGilchrist, C.A., Aisbett, C.W. 1991. Regression with frailty in survival analysis. Biometrics, 47, 461-466
  19. Hanagal, D.D., Bhambure, S.M. 2016. Modeling bivariate survival data using shared inverse Gaussian frailty model. Communications in Statistics-Theory and Methods, 45(17), 4969-4987
