p-Values of K-S Statistics for goodness of fit test for kidney infection data set.
Frailty models are used in survival analysis to account for unobserved heterogeneity among individuals caused by hereditary or environmental factors. A shared frailty model is used to analyze the data; it is based on the idea that frailty acts multiplicatively on the hazard rate. In this manuscript, we introduce a new frailty model, the Lindley shared frailty model, with exponential power and generalized Rayleigh as baseline distributions. The Bayesian Markov chain Monte Carlo (MCMC) approach is used to estimate the model parameters. Simulation studies are carried out to compare the true and estimated values of the parameters. The proposed models are compared using Bayesian model comparison criteria, and the best model is proposed for the infectious disease data.
- Bayesian technique
- exponential power distribution
- generalized Rayleigh distribution
- Lindley frailty
The term frailty was coined by Vaupel et al. The frailty is typically represented as an unobservable random variable that multiplies the hazard function, and it is assumed to follow a parametric distribution such as the gamma, log-normal, positive stable, inverse Gaussian or power variance function distribution. Let $T$ be a continuous random variable denoting the lifetime of an individual and let $Z$ be the frailty random variable (RV). The conditional hazard function (CHF) for a given frailty $Z = z$ at time $t > 0$ is

$$h(t \mid z) = z\, h_0(t)\, e^{X\beta},$$

where $h_0(t)$ is the baseline hazard function (BHF) at time $t > 0$, $X$ is a vector of covariates and $\beta$ is the corresponding vector of regression coefficients. The conditional survival function for a given frailty at time $t > 0$ is

$$S(t \mid z) = e^{-z H_0(t) e^{X\beta}},$$

where $H_0(t)$ is the cumulative baseline hazard function (CBHF) at time $t > 0$. Integrating over the range of the frailty variable $Z$ with density $f_Z(z)$, we get the marginal survival function as

$$S(t) = \int_0^\infty e^{-z H_0(t) e^{X\beta}} f_Z(z)\, dz = L_Z\left(H_0(t)\, e^{X\beta}\right),$$

where $L_Z(\cdot)$ is the Laplace transform of the distribution of $Z$. Once the survival function of the lifetime random variable of an individual is available, one can obtain its probability structure and base inference on it.
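The statement that the marginal survival function is the Laplace transform of the frailty distribution can be checked numerically. The sketch below assumes a gamma frailty with mean 1 and variance `theta` (one of the standard choices listed above, not the Lindley frailty proposed in this chapter). The function names and input values are hypothetical and chosen only for illustration.

```python
import math

def gamma_frailty_pdf(z, theta):
    """Gamma density with mean 1 and variance theta (shape 1/theta, scale theta).
    Assumes theta <= 1 so that the density is finite at z = 0."""
    k = 1.0 / theta
    return z ** (k - 1.0) * math.exp(-z / theta) / (math.gamma(k) * theta ** k)

def gamma_laplace(s, theta):
    """Closed-form Laplace transform E[exp(-s Z)] of the gamma frailty above."""
    return (1.0 + theta * s) ** (-1.0 / theta)

def marginal_survival_numeric(H0_t, eta, theta, upper=40.0, steps=20000):
    """Trapezoidal approximation of the integral of
    exp(-z * H0_t * eta) * f(z) over z in [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        z = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-z * H0_t * eta) * gamma_frailty_pdf(z, theta)
    return total * h

# Hypothetical inputs: cumulative baseline hazard 0.8 and linear predictor 0.3.
H0_t, eta, theta = 0.8, math.exp(0.3), 0.5
numeric = marginal_survival_numeric(H0_t, eta, theta)
closed = gamma_laplace(H0_t * eta, theta)
print(numeric, closed)  # the two values should agree closely
```

The same check applies to any frailty distribution whose Laplace transform is available in closed form, which is the reason such distributions are convenient in frailty modeling.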
Frailty models have gained more attention in recent medical research due to the uniqueness property of the frailty parameter. Generally, the gamma, log-normal and inverse Gaussian distributions are the most commonly used frailty distributions [2, 3]. Hanagal and Dabade introduced new compound negative binomial shared frailty models for bivariate survival data using Weibull and generalized exponential as baseline distributions. Pandey et al. compared gamma, inverse Gaussian and positive stable frailty models with generalized Pareto as the baseline distribution. Pandey et al. also compared gamma and inverse Gaussian frailty distributions under the additive property.
To study the features of the Lindley shared frailty model, we use the Lindley distribution as the frailty distribution with right-censored data under generalized Rayleigh and exponential power baseline distributions. The survival times are dependent in this case because they share a frailty variable that follows the Lindley distribution. The estimated variance of the frailty distribution reflects the degree of heterogeneity in the population: the higher the variance, the more heterogeneity there is in the population under consideration. The frailty distribution becomes degenerate when the variance is zero. The exponential power distribution is chosen as a baseline distribution because it can exhibit an increasing hazard rate, which is typical of real-life data. The Lindley distribution with one parameter was first proposed by Lindley for analyzing failure time data. It belongs to the exponential family and is used as an alternative to the exponential distribution. The Lindley distribution is attractive because of its ability to model failure time data with increasing, decreasing, unimodal and bathtub-shaped hazard rates. Ghitany et al. [8, 9] discussed different properties of the Lindley distribution and showed that it is better than the exponential distribution for modeling failure time data when the hazard rate is unimodal or bathtub shaped. They also showed that the Lindley distribution is more flexible than the exponential distribution in modeling lifetime data. Many authors have discussed and introduced generalizations of the Lindley distribution. Bakouch et al. introduced the extended Lindley distribution. Ghitany et al. proposed the power Lindley distribution, and Shanker et al. proposed the two-parameter Lindley distribution, which reduces to the one-parameter case.
The mean of a two parameter Lindley distribution is always greater than the mode indicating that the distribution is positively skewed.
The classical approach and the Bayesian approach are the two widely used estimation techniques. Because prior distributions can be employed here, we estimate the model parameters using the Bayesian Markov chain Monte Carlo (MCMC) approach. Furthermore, because samples from the posterior distributions of the parameters are easily generated, the results and the model selection criteria can be clearly interpreted. Trace plots, coupling from the past plots, running mean plots and sample autocorrelation plots after thinning are used to examine the behavior of the chains, to decide the burn-in period and the autocorrelation lag, and to check convergence. We also report simulation experiments to support the model's performance. All estimation procedures are described in detail and illustrated on data relating to kidney infections.
In Sections 2 and 3, the Lindley shared frailty model and the baseline distributions are introduced, followed by the proposed models and the estimation strategies in Sections 4 and 5. In Sections 6 and 7, an application of the proposed models and a discussion are given.
2. Lindley shared frailty model
Let a continuous random variable $Z$ follow the two-parameter Lindley distribution (TPLD); then the density function of $Z$ is
and the Laplace transform is
The frailty variable $Z$ has finite mean and variance. For identifiability, we assume that $Z$ has expected value equal to one, i.e. $E(Z) = 1$, which imposes a restriction on the parameters. Under this restriction, the density function of the Lindley distribution reduces to
and the Laplace transform is
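The variance of $Z$ under this restriction is a function of the frailty parameter, and the frailty variable is degenerate when this variance is zero. Replacing the Laplace transform in Eq. (4), we get the unconditional bivariate survival function for the $j$th individual as
where $H_{01}(t_{1j})$ and $H_{02}(t_{2j})$ are the cumulative baseline hazard functions of the lifetimes $T_{1j}$ and $T_{2j}$, respectively.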
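To illustrate how a shared frailty turns a Laplace transform into a bivariate survival function, the sketch below evaluates the Laplace transform at the sum of the two cumulative baseline hazards times the covariate effect. It assumes the classical one-parameter Lindley density $f(z) = \frac{\theta^2}{1+\theta}(1+z)e^{-\theta z}$ purely for illustration. This is not the restricted two-parameter form used in this chapter, and it does not have mean one. All function names and numerical inputs are hypothetical.

```python
import math

def lindley_laplace(s, theta):
    """Laplace transform of the one-parameter Lindley density
    f(z) = theta^2 / (1 + theta) * (1 + z) * exp(-theta * z), z > 0."""
    return theta ** 2 * (theta + s + 1.0) / ((1.0 + theta) * (theta + s) ** 2)

def shared_frailty_survival(H1, H2, eta, theta):
    """Unconditional bivariate survival L_Z((H1 + H2) * eta) under a shared frailty Z."""
    return lindley_laplace((H1 + H2) * eta, theta)

def no_frailty_survival(H1, H2, eta):
    """Bivariate survival when there is no frailty (independent lifetimes)."""
    return math.exp(-(H1 + H2) * eta)

# Hypothetical cumulative baseline hazards at (t1, t2) and covariate effect eta.
H1, H2, eta, theta = 0.4, 0.7, 1.2, 1.5
joint = shared_frailty_survival(H1, H2, eta, theta)
product_of_marginals = (shared_frailty_survival(H1, 0.0, eta, theta)
                        * shared_frailty_survival(0.0, H2, eta, theta))
print(joint, product_of_marginals)
```

In this sketch the joint survival probability is at least as large as the product of the two marginal survival probabilities, which is how a shared frailty induces positive dependence between the two lifetimes of the same individual.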
For the model without frailty, the bivariate survival function becomes
3. Baseline distributions
The first baseline distribution considered is the generalized Rayleigh distribution. Surles and Padgett proposed the two-parameter Burr type X distribution, also called the generalized Rayleigh distribution, and showed that it can describe strength data and general lifetime data quite effectively, which makes it useful in survival analysis. If a continuous random variable $T$ has a two-parameter generalized Rayleigh distribution, its survival function, hazard function and cumulative hazard function are as follows:
where $\alpha$ and $\lambda$ are the shape and scale parameters of the distribution, respectively. Depending on the value of the shape parameter, its hazard function is either increasing or bathtub shaped.
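The three functions of the generalized Rayleigh baseline can be sketched as follows. The code assumes the parameterization with distribution function $F(t) = (1 - e^{-(\lambda t)^2})^{\alpha}$, which is a common form of the two-parameter Burr type X distribution. The exact parameterization used in this chapter may differ, and the function names are hypothetical.

```python
import math

def gr_survival(t, alpha, lam):
    """Survival function S(t) = 1 - (1 - exp(-(lam*t)^2))^alpha."""
    return 1.0 - (1.0 - math.exp(-(lam * t) ** 2)) ** alpha

def gr_cum_hazard(t, alpha, lam):
    """Cumulative hazard H(t) = -log S(t)."""
    return -math.log(gr_survival(t, alpha, lam))

def gr_hazard(t, alpha, lam):
    """Hazard h(t) = f(t) / S(t), with f the density of the assumed form."""
    u = math.exp(-(lam * t) ** 2)
    density = 2.0 * alpha * lam ** 2 * t * u * (1.0 - u) ** (alpha - 1.0)
    return density / gr_survival(t, alpha, lam)

# Hazard on a small grid for two hypothetical shape values.
for alpha in (0.3, 2.0):
    print(alpha, [round(gr_hazard(t, alpha, 1.0), 3) for t in (0.01, 0.5, 1.0, 3.0)])
```

With these inputs the hazard for the smaller shape value first falls and then rises, while the hazard for the larger shape value rises throughout, in line with the two hazard shapes described above.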
The second baseline distribution considered here is the exponential power distribution. A continuous random variable $T$ is said to follow the exponential power distribution if its survival function is
where $\alpha$ and $\lambda$ are the shape and scale parameters of the exponential power distribution. The hazard function and cumulative hazard function are, respectively,
For small values of the shape parameter the hazard function is decreasing at early times; as the shape parameter increases the hazard function takes a U-shaped curve, and a further increase gives an increasing hazard function.
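The shape behavior of the exponential power hazard can be explored with a short sketch. It assumes the parameterization $S(t) = \exp\{1 - e^{(\lambda t)^{\alpha}}\}$, which may differ from the one used in this chapter. The helper `ep_turning_point` is a hypothetical name for the time at which the hazard attains its minimum when the shape parameter is below one.

```python
import math

def ep_survival(t, alpha, lam):
    """Survival function S(t) = exp(1 - exp((lam*t)^alpha))."""
    return math.exp(1.0 - math.exp((lam * t) ** alpha))

def ep_cum_hazard(t, alpha, lam):
    """Cumulative hazard H(t) = exp((lam*t)^alpha) - 1."""
    return math.exp((lam * t) ** alpha) - 1.0

def ep_hazard(t, alpha, lam):
    """Hazard h(t) = alpha * lam^alpha * t^(alpha-1) * exp((lam*t)^alpha)."""
    return alpha * lam ** alpha * t ** (alpha - 1.0) * math.exp((lam * t) ** alpha)

def ep_turning_point(alpha, lam):
    """Time at which the hazard is smallest when alpha < 1 (root of d log h / dt = 0)."""
    if alpha >= 1.0:
        raise ValueError("the hazard is increasing for alpha >= 1; no interior minimum")
    return ((1.0 - alpha) / alpha) ** (1.0 / alpha) / lam

t_star = ep_turning_point(0.5, 1.0)
print(t_star, ep_hazard(0.1, 0.5, 1.0), ep_hazard(t_star, 0.5, 1.0), ep_hazard(5.0, 0.5, 1.0))
```

Under this parameterization a shape value below one gives a hazard that falls up to the turning point and rises afterwards, and a shape value of one or more gives an increasing hazard.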
4. Proposed models
Equations (19) and (20) are the Lindley shared frailty models with generalized Rayleigh and exponential power baseline distributions, called Model-I and Model-II, and Eqs. (21) and (22) are the corresponding models without frailty under the same baseline distributions, called Model-III and Model-IV.
5. Estimation strategies
By assuming independence between the censoring scheme and the individual lifetimes, the likelihood function for the bivariate failure times of the $j$th individual ($j = 1, 2, 3, \ldots, n$) and the censoring times is given by
where $\psi$, $\beta$ and $\theta$ denote the vector of baseline parameters, the vector of regression coefficients and the frailty distribution parameter, respectively. The likelihood function for the models without frailty is given as
where $n_1$, $n_2$, $n_3$ and $n_4$ are the numbers of individuals whose pair of failure times falls in each of the four possible censoring regions (both times observed; only the first observed; only the second observed; both censored), and the contribution of the $j$th individual to the likelihood function is
Substituting the contributions of the individuals, obtained from the survival functions of the proposed models, into Eqs. (23) and (24), we get the likelihood functions for the Lindley shared frailty models under generalized Rayleigh and exponential power baseline distributions and the likelihood functions for the models without frailty under the same baseline distributions.
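The structure of such a likelihood for bivariate right-censored data can be sketched for the simpler case without frailty, where the joint survival function is $e^{-(H_{01}(t_1) + H_{02}(t_2))\eta}$ with $\eta = e^{X\beta}$. Each observed failure time adds a log-hazard term and each censored time contributes only through the survival function. The data layout, the function names and the constant baseline hazards below are assumptions made only for this illustration.

```python
import math

def censoring_counts(data):
    """Return (n1, n2, n3, n4): both times observed, only the first observed,
    only the second observed, both censored."""
    index = {(1, 1): 0, (1, 0): 1, (0, 1): 2, (0, 0): 3}
    counts = [0, 0, 0, 0]
    for _t1, _t2, d1, d2, _x in data:
        counts[index[(d1, d2)]] += 1
    return tuple(counts)

def log_likelihood_no_frailty(data, h01, H01, h02, H02, beta):
    """Log-likelihood without frailty. Each record is (t1, t2, d1, d2, x),
    where d = 1 marks an observed failure and d = 0 a right-censored time."""
    total = 0.0
    for t1, t2, d1, d2, x in data:
        eta = math.exp(sum(b * xi for b, xi in zip(beta, x)))
        total -= (H01(t1) + H02(t2)) * eta   # log of the joint survival function
        if d1 == 1:
            total += math.log(h01(t1) * eta)  # first failure time observed
        if d2 == 1:
            total += math.log(h02(t2) * eta)  # second failure time observed
    return total

# Toy records with one covariate and hypothetical constant baseline hazards.
toy = [(1.0, 2.0, 1, 1, (0.0,)), (0.5, 1.5, 1, 0, (1.0,)), (2.0, 0.7, 0, 0, (0.0,))]
rate1, rate2 = 0.5, 0.25
ll = log_likelihood_no_frailty(
    toy,
    lambda t: rate1, lambda t: rate1 * t,
    lambda t: rate2, lambda t: rate2 * t,
    beta=(0.2,),
)
print(censoring_counts(toy), ll)
```

For the frailty models the same four cases apply, but the contributions are obtained by differentiating the unconditional bivariate survival function, so they no longer factor into separate terms for the two failure times.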
The joint posterior density of the parameters given failure times is given as
where the prior density functions of the baseline parameters and the frailty variance are supposed to have known hyperparameters; each regression coefficient $\beta_i$ has its own prior density function; the remaining regression coefficients are collected in a vector that excludes $\beta_i$; and the likelihood function is given by Eqs. (23) and (24). All of the parameters are assumed to be independently distributed.
The likelihood functions in Eqs. (23) and (24) are not easy to maximize using the Newton–Raphson method: because a large number of parameters is involved, the MLEs fail to converge. As a result, the Bayesian approach, which is free of such issues, is used to estimate the parameters of the models.
Prior distributions are chosen as follows. For the frailty parameter, a gamma distribution with mean 1 and large variance (obtained with a small value of its hyperparameter) is used as the prior distribution. For each regression coefficient, say $\beta_i$, a normal distribution with mean zero and large variance is used as the prior. Because we have no prior information about the baseline parameters, we use the same type of non-informative prior distributions as Ibrahim et al. and Sahu et al., namely $G(a_1, a_2)$ and $U(b_1, b_2)$. Here $G(a, b)$ stands for the gamma distribution with shape parameter $a$ and scale parameter $b$, and $U(b_1, b_2)$ stands for the uniform distribution over the interval $b_1$ to $b_2$. All the hyperparameters are supposed to be known in advance, and fixed values are assigned to them.
The Metropolis-Hastings algorithm and the Gibbs sampler were used to estimate the parameters of the models fitted with the preceding prior density functions and the likelihood functions in Eqs. (23) and (24). The Geweke test and the Gelman-Rubin statistic, as suggested by Geweke and Gelman et al., were used to check that the Markov chain converges to a stationary distribution. We used trace plots, coupling from the past plots and sample autocorrelation plots to examine the chain's behavior, as well as to determine the burn-in period and the autocorrelation lag.
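The sampling and convergence steps described here can be sketched on a toy problem. The code below is a minimal random-walk Metropolis-Hastings sampler for a single positive parameter, followed by burn-in removal, thinning and a basic Gelman-Rubin statistic computed from several chains. The target density (a gamma density with mean 1.5), the proposal scale, the burn-in length, the lag and all function names are assumptions chosen for illustration. They are not the posterior, the settings or the programs used in this chapter.

```python
import math
import random

def log_target(x):
    """Toy log-density up to a constant: gamma with shape 3 and rate 2 (mean 1.5)."""
    return 2.0 * math.log(x) - 2.0 * x

def metropolis_hastings(n_iter, start, step, rng):
    """Random-walk Metropolis-Hastings on the log scale for a positive parameter."""
    x = start
    chain = []
    for _ in range(n_iter):
        y = x * math.exp(step * rng.gauss(0.0, 1.0))  # multiplicative proposal
        # The term log(y) - log(x) corrects for the asymmetric proposal.
        log_ratio = log_target(y) - log_target(x) + math.log(y) - math.log(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
        chain.append(x)
    return chain

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((v - mu) ** 2 for v in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

burn_in, lag = 1000, 5
starts = (0.2, 1.0, 5.0)  # dispersed starting values, one per chain
chains = []
for seed, start in enumerate(starts):
    raw = metropolis_hastings(6000, start, 0.8, random.Random(seed))
    chains.append(raw[burn_in::lag])  # drop burn-in, then thin by the lag

pooled_draws = [v for c in chains for v in c]
posterior_mean = sum(pooled_draws) / len(pooled_draws)
print(posterior_mean, gelman_rubin(chains))
```

A Gelman-Rubin value close to one suggests that chains started from dispersed values have reached the same distribution. In practice the burn-in period and the lag would be read from trace and autocorrelation plots rather than fixed in advance.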
It is important to decide which model provides the best fit to the dataset. The models were compared using the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Deviance Information Criterion (DIC) and the Bayes factor.
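The three information criteria can be computed from log-likelihood values as sketched below. The sketch uses the usual definitions: AIC and BIC penalize the deviance at a point estimate by the number of parameters, and DIC adds the effective number of parameters to the posterior mean deviance. The function names and the numbers in the example are hypothetical, and the exact variants used in this chapter may differ.

```python
import math

def aic(loglik_at_estimate, n_params):
    """AIC = -2 log L + 2 p."""
    return -2.0 * loglik_at_estimate + 2.0 * n_params

def bic(loglik_at_estimate, n_params, n_obs):
    """BIC = -2 log L + p log n."""
    return -2.0 * loglik_at_estimate + n_params * math.log(n_obs)

def dic(loglik_draws, loglik_at_posterior_mean):
    """Return (DIC, p_D), where p_D is the mean deviance minus the deviance
    at the posterior mean, and DIC is the mean deviance plus p_D."""
    mean_deviance = sum(-2.0 * ll for ll in loglik_draws) / len(loglik_draws)
    deviance_at_mean = -2.0 * loglik_at_posterior_mean
    p_d = mean_deviance - deviance_at_mean
    return mean_deviance + p_d, p_d

# Hypothetical log-likelihood values for one fitted model.
print(aic(-100.0, 3), bic(-100.0, 3, 76), dic([-101.0, -103.0], -100.0))
```

For all three criteria a smaller value indicates a better trade-off between fit and complexity, so the model with the lowest value is preferred.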
6. Application in real life data
The applicability of the models was tested on infectious disease data relating to kidney infection that occurred during catheter implantation. The data contain the first and second recurrence times of infection for 38 patients, from catheters used with portable dialysis equipment. The two infection times of each patient form a cluster. Other recorded information includes infection duration, patient age, gender (0 for male and 1 for female), and disease types such as Glomerulonephritis (GN), Acute Nephritis (AN) and Polycystic Kidney Disease (PKD).
To begin, the Kolmogorov–Smirnov (K-S) test is used to assess the goodness of fit for the kidney infection data. The p-values obtained for the first and second recurrence times are large enough that there is no statistical evidence to reject the hypothesis that these times follow the distributions with survival functions given in Eqs. (11) and (14) in the univariate case, and the fit is assumed to be appropriate for the bivariate case as well. The corresponding p-values are given in Table 1. The posterior summaries of the proposed models are presented in Tables 2 and 3. They consist of the estimate (posterior mean), standard error, 95% lower and upper credible limits, Geweke test values with p-values and Gelman-Rubin statistic values. From Tables 2 and 3, we observe that the regression coefficients are more or less the same for all the models. Also, for all the proposed models, zero does not lie in any of the credible intervals of the regression coefficients, so all the covariates appear to be significant. To check the adequacy of the models, we constructed 95% and 50% predictive intervals from random samples generated from the predictive distribution, as described by Sahu et al., and counted the number of observed first and second recurrence times that fell within these intervals. Out of 76 observations, the 95% and 50% predictive intervals contain 76 and 58 observations for Model-I, and 76 and 60 observations for Model-II. This shows that the two models are appropriate for the data. Model-I has lower AIC, BIC and DIC values than Model-II in Table 4 and is therefore the better model by these criteria. However, because the differences in AIC, BIC and DIC between Model-I and Model-II are small, these criteria alone are not sufficient to decide between the two models. To compare model $M_j$ with model $M_k$, we therefore use the Bayes factor $B_{jk}$ (Table 5).
Model-I is better than Model-II, since the corresponding value of $2\log B_{jk}$ is larger than 10, indicating very strong evidence in favor of Model-I over Model-II for the given dataset and confirming the earlier conclusion from Table 4. Based on all of the comparison criteria, we conclude that Model-I is superior to Model-II for modeling the kidney infection data.
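The reading of the Bayes factor on the scale of twice its logarithm can be written as a small lookup. The cut-points below follow the commonly used Kass and Raftery scale; it is an assumption that this is the scale behind Table 5, although the threshold of 10 quoted above is consistent with it. The function names are hypothetical.

```python
def two_log_bayes_factor(log_marginal_numerator, log_marginal_denominator):
    """Twice the log Bayes factor of the numerator model against the denominator model."""
    return 2.0 * (log_marginal_numerator - log_marginal_denominator)

def evidence_against_denominator(two_log_b):
    """Verbal category for a value of twice the log Bayes factor."""
    if two_log_b < 0.0:
        return "negative (supports denominator model)"
    if two_log_b < 2.0:
        return "not worth more than a bare mention"
    if two_log_b < 6.0:
        return "positive"
    if two_log_b < 10.0:
        return "strong"
    return "very strong"

# Value reported for Model-I against Model-II.
print(evidence_against_denominator(63.23936))
```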
|Parameter||Estimate||Standard error||Lower credible limit||Upper credible limit||Geweke values||p values||Gelman & Rubin values|
|Burn in period = 5600; autocorrelation lag = 275|
|Parameter||Estimate||Standard error||Lower credible limit||Upper credible limit||Geweke values||p values||Gelman & Rubin values|
|Burn in period = 6800; autocorrelation lag = 280|
|Numerator model against denominator model||$2\log B_{jk}$||Range||Evidence against model in denominator|
|Model-I against Model-II||63.23936||$> 10$||Very strong positive|
In this study, we examined a new Lindley shared frailty model under generalized Rayleigh and exponential power as baseline distributions.
To fit all of the proposed models, the Metropolis-Hastings algorithm and the Gibbs sampler were used. The proposed models were applied to the kidney infection data, and the best model was suggested. The analysis was carried out with programs written by the authors in the R statistical software.
All of the comparison criteria indicate that the Lindley shared frailty model with generalized Rayleigh baseline distribution is superior to the model with exponential power baseline distribution and to the models without frailty under the same baseline distributions for modeling the kidney infection data. The estimates of the frailty variance are 0.9415 and 0.9739, which are high in all the proposed models, indicating strong evidence of a high degree of heterogeneity among the patients in the population. Some patients are expected to be much more prone to infection than others with the same covariate values. We can also say that there is a strong positive correlation between the two infection times of the same patient.
Two properties of the proposed models were not reported in previous studies. First, the estimates of the frailty variance are high in all the proposed models compared with the earlier analyses of McGilchrist and Aisbett, based on log-normal frailty, and of Hanagal and Bhambure. Second, the disease types GN and AN have lower infection rates as compared to other covariates. All the covariates are significant factors for kidney infection, whereas the disease type was insignificant in the previously proposed frailty models. It is important to mention that the Lindley shared frailty model with generalized Rayleigh baseline distribution performs better in analyzing the kidney infection data than other frailty models [4, 19].