p-Values of K-S Statistics for goodness of fit test for kidney infection data set.

## Abstract

Frailty models are used in survival analysis to account for unobserved heterogeneity among individuals caused by hereditary or environmental factors. A shared frailty model, which assumes that frailty acts multiplicatively on the hazard rate, is used to analyze the data. In this manuscript, we introduce a new frailty model, the Lindley shared frailty model, with exponential power and generalized Rayleigh as baseline distributions. The Bayesian Markov chain Monte Carlo (MCMC) method is used to estimate the model parameters. Simulation studies are carried out to compare the true and estimated parameter values. The proposed models are compared using Bayesian model comparison criteria, and the best model for the infectious disease data is suggested.

### Keywords

- Bayesian technique
- exponential power distribution
- generalized Rayleigh distribution
- Lindley frailty
- MCMC

## 1. Introduction

The term frailty was coined by Vaupel et al. [1]. Frailty is typically represented as an unobservable random variable that multiplies the hazard function, and it is assumed to follow a parametric distribution such as the gamma, log-normal, positive stable, inverse Gaussian, or power variance function distribution. Let

where

where

where

Frailty models have gained more attention in recent medical research due to the uniqueness property of the frailty parameter. The gamma, log-normal and inverse Gaussian distributions are the most commonly used frailty distributions [2, 3]. Hanagal and Dabade [4] introduced new compound negative binomial shared frailty models for bivariate survival data using Weibull and generalized exponential as baseline distributions. Pandey et al. [5] compared gamma, inverse Gaussian and positive frailty models with generalized Pareto as the baseline distribution. Pandey et al. [6] also compared gamma and inverse Gaussian frailty distributions under the additive property.

To study the features of the Lindley shared frailty model, we use the Lindley distribution as the frailty distribution for right-censored data, with generalized Rayleigh and exponential power as baseline distributions. The survival times within a cluster are dependent because they share a frailty variable that follows the Lindley distribution. The variance of the frailty distribution measures the degree of heterogeneity in the population: the higher the variance, the more heterogeneous the population under consideration. When the variance is zero, the frailty distribution becomes degenerate. The exponential power distribution is chosen as a baseline distribution because it exhibits an increasing hazard rate, which is typical of real-life data.

The one-parameter Lindley distribution was first proposed by Lindley [7] for analyzing failure time data. It belongs to the exponential family and is used as an alternative to the exponential distribution. The Lindley distribution is attractive because of its ability to model failure time data with increasing, decreasing, unimodal and bathtub-shaped hazard rates. Ghitany et al. [8, 9] discussed different properties of the Lindley distribution and showed that it is better than the exponential distribution for modeling failure time data when the hazard rate is unimodal or bathtub shaped. They also showed that the Lindley distribution is more flexible than the exponential distribution in modeling lifetime data.

Many authors have introduced and discussed generalizations of the Lindley distribution. Bakouch et al. [10] introduced the extended Lindley distribution. Ghitany et al. [11] proposed the power Lindley distribution, and Shanker et al. [12] proposed the two-parameter Lindley distribution, which reduces to the one-parameter case. The mean of the two-parameter Lindley distribution is always greater than the mode, indicating that the distribution is positively skewed.
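As a point of reference for the Lindley frailty discussed above, the following minimal sketch evaluates the standard one-parameter Lindley density, its closed-form Laplace transform, and its mean and variance, and checks the Laplace transform numerically. The parameter name `theta`, the function names, and the value `theta = 1.5` are illustrative assumptions; the chapter's own parameterization of the frailty distribution may differ.

```python
import math


def lindley_pdf(z, theta):
    """Density of the standard one-parameter Lindley distribution (assumed form)."""
    return theta**2 / (theta + 1.0) * (1.0 + z) * math.exp(-theta * z)


def lindley_laplace(s, theta):
    """Closed-form Laplace transform E[exp(-s Z)] for Z ~ Lindley(theta)."""
    return theta**2 * (theta + s + 1.0) / ((theta + 1.0) * (theta + s) ** 2)


def lindley_mean(theta):
    """Mean of the one-parameter Lindley distribution."""
    return (theta + 2.0) / (theta * (theta + 1.0))


def lindley_variance(theta):
    """Variance of the one-parameter Lindley distribution."""
    return (theta**2 + 4.0 * theta + 2.0) / (theta**2 * (theta + 1.0) ** 2)


def laplace_numeric(s, theta, upper=40.0, steps=20_000):
    """Trapezoidal approximation of the Laplace transform on [0, upper]."""
    h = upper / steps
    total = 0.5 * (lindley_pdf(0.0, theta)
                   + math.exp(-s * upper) * lindley_pdf(upper, theta))
    for i in range(1, steps):
        z = i * h
        total += math.exp(-s * z) * lindley_pdf(z, theta)
    return total * h


if __name__ == "__main__":
    theta = 1.5  # hypothetical value, chosen only for illustration
    print("closed form :", lindley_laplace(0.7, theta))
    print("numerical   :", laplace_numeric(0.7, theta))
    print("mean, var   :", lindley_mean(theta), lindley_variance(theta))
```

The Laplace transform is the quantity that matters for a shared frailty model, because the unconditional survival function is obtained by evaluating it at the cumulative hazard.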

The classical approach and the Bayesian approach are the two most widely used estimation techniques. Because prior distributions can be employed here, we estimate the model parameters using the Bayesian Markov chain Monte Carlo (MCMC) approach. Furthermore, because samples from the posterior distributions of the parameters are easily generated, the results and the model selection criteria can be clearly interpreted. Trace plots, coupling from the past plots, running mean plots after thinning, and sample autocorrelation plots are used to examine the behavior of the chain, to determine the burn-in period and autocorrelation lag, and to confirm convergence. We also carry out simulation experiments to assess the performance of the models. All the estimation procedures are described in detail and applied to data on kidney infections.

Sections 2 and 3 introduce the Lindley shared frailty model and the baseline distributions, followed by the proposed models and estimation strategies in Sections 4 and 5. Sections 6 and 7 present the application of the proposed models and the discussion.

## 2. Lindley shared frailty model

Let a continuous random variable

and the Laplace transform is

The mean and variance of frailty variable are

and the Laplace transform is

with variance of

where

And for without frailty, the model becomes

## 3. Baseline distributions

The first baseline distribution considered here is the generalized Rayleigh distribution. Surles and Padgett [13] proposed the two-parameter Burr type X distribution, which they named the generalized Rayleigh distribution, and demonstrated that it can be used quite effectively in survival analysis to describe strength data as well as general lifetime data. If a continuous random variable

where

The second baseline distribution considered here is exponential power distribution. A continuous random variable

where

The hazard function is decreasing function at time

## 4. Proposed models

The unconditional survival function is obtained by replacing the cumulative hazard functions of generalized Rayleigh distribution and exponential power distribution in Eqs. (9) and (10). Then,

Eqs. (19) and (20) are the Lindley shared frailty models with generalized Rayleigh and exponential power as baseline distributions, called Model-I and Model-II. Eqs. (21) and (22) are the corresponding models without frailty under the same baseline distributions, called Model-III and Model-IV.
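The construction described above can be sketched numerically as follows. This is only an illustration under stated assumptions: the generalized Rayleigh cumulative hazard uses the Burr type X form with CDF (1 − exp(−(λt)²))^α, the exponential power cumulative hazard uses the Smith and Bain form exp((λt)^α) − 1, the frailty is the standard one-parameter Lindley distribution, and the bivariate survival function is the usual shared frailty expression in which the Laplace transform is evaluated at the summed cumulative hazards times exp(x'β). All function names and parameter values are hypothetical, and the chapter's own parameterizations may differ.

```python
import math


def cumhaz_generalized_rayleigh(t, alpha, lam):
    """Cumulative hazard for the assumed CDF F(t) = (1 - exp(-(lam*t)^2))^alpha."""
    cdf = (1.0 - math.exp(-(lam * t) ** 2)) ** alpha
    return -math.log1p(-cdf)  # H(t) = -log(1 - F(t)); valid while F(t) < 1


def cumhaz_exponential_power(t, alpha, lam):
    """Cumulative hazard for the assumed form H(t) = exp((lam*t)^alpha) - 1."""
    return math.expm1((lam * t) ** alpha)


def lindley_laplace(s, theta):
    """Laplace transform of the standard one-parameter Lindley distribution."""
    return theta**2 * (theta + s + 1.0) / ((theta + 1.0) * (theta + s) ** 2)


def shared_frailty_survival(t1, t2, cumhaz1, cumhaz2, eta, theta):
    """Unconditional bivariate survival: Laplace transform at the total hazard.

    eta stands for exp(x' beta), the covariate multiplier shared by the pair.
    """
    total = (cumhaz1(t1) + cumhaz2(t2)) * eta
    return lindley_laplace(total, theta)


def no_frailty_survival(t1, t2, cumhaz1, cumhaz2, eta):
    """Bivariate survival when the two times are independent (no frailty)."""
    return math.exp(-(cumhaz1(t1) + cumhaz2(t2)) * eta)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to produce moderate hazards.
    h1 = lambda t: cumhaz_generalized_rayleigh(t, alpha=0.4, lam=0.003)
    h2 = lambda t: cumhaz_generalized_rayleigh(t, alpha=0.5, lam=0.003)
    for t in (50.0, 100.0, 200.0):
        print(t,
              shared_frailty_survival(t, t, h1, h2, eta=1.0, theta=1.5),
              no_frailty_survival(t, t, h1, h2, eta=1.0))
```

Swapping `cumhaz_generalized_rayleigh` for `cumhaz_exponential_power` gives the analogue of the second frailty model under the same assumptions.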

## 5. Estimation strategies

By assuming independence between censoring scheme and individual lifetimes, the likelihood function associated with failure times for the

where

and

Substituting Eq. (24) into Eqs. (23) and (24), we get the likelihood functions for the Lindley shared frailty models under generalized Rayleigh and exponential power baseline distributions, and the likelihood functions for the models without frailty under the same baseline distributions.

The joint posterior density of the parameters given failure times is given as

where

The likelihood equations obtained from Eqs. (23) and (24) are not easy to solve using the Newton–Raphson method, and the maximum likelihood estimates (MLEs) fail to converge because a large number of parameters is involved. The Bayesian approach, which is free of such issues, was therefore used to estimate the parameters of the models.

Prior distributions are utilized as follows: for a frailty parameter with a small value of

The Metropolis-Hastings algorithm and the Gibbs sampler were used to estimate the parameters of the models, fitted with the preceding prior densities and the likelihoods in Eqs. (23) and (24). The Geweke test and the Gelman-Rubin statistic, as suggested by Geweke [16] and Gelman et al. [17], were used to check that the Markov chain converges to a stationary distribution. We used trace plots, coupling from the past plots, and sample autocorrelation plots to examine the chain's behavior, as well as to determine the burn-in period and autocorrelation lag.
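The sampling scheme described above can be illustrated with a minimal random-walk Metropolis sketch on a toy problem. This is not the authors' sampler: the target is a hypothetical posterior for an exponential rate with a gamma prior, chosen because its exact posterior mean is known, and the data, prior values, step size, burn-in, and lag are all made-up illustrative values.

```python
import math
import random


def log_posterior(lam, data, a=1.0, b=1.0):
    """Log posterior (up to a constant) for an exponential rate with Gamma(a, b) prior."""
    if lam <= 0.0:
        return -math.inf
    return (a - 1.0 + len(data)) * math.log(lam) - (b + sum(data)) * lam


def metropolis(log_post, start, step_sd, n_iter, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    chain = [start]
    current, current_lp = start, log_post(start)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def run_example(seed=1):
    """Run the toy chain, then discard burn-in and thin by a fixed lag."""
    rng = random.Random(seed)
    data = [0.5, 1.2, 0.3, 2.1, 0.8, 1.7, 0.4, 0.9, 1.1, 0.6]  # made-up data
    chain = metropolis(lambda lam: log_posterior(lam, data), 1.0, 0.3, 20_000, rng)
    burn_in, lag = 2_000, 5  # illustrative values only
    draws = chain[burn_in::lag]
    exact_mean = (1.0 + len(data)) / (1.0 + sum(data))
    return draws, exact_mean


if __name__ == "__main__":
    draws, exact_mean = run_example()
    print("posterior mean (MCMC) :", sum(draws) / len(draws))
    print("posterior mean (exact):", exact_mean)
    print("lag-1 autocorrelation of thinned draws:", autocorrelation(draws, 1))
```

In practice the burn-in period and the thinning lag are read off diagnostics such as trace plots and sample autocorrelation plots rather than fixed in advance as they are here.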

It is important to decide which model provides the best fit to the dataset. The models were compared using the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Deviance Information Criterion (DIC) and the Bayes factor.
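For reference, AIC and BIC differ only in their penalty terms, so one can be recovered from the other. The sketch below uses the standard definitions AIC = −2 log L + 2k and BIC = −2 log L + k log n. The values k = 10 (frailty models), k = 9 (models without frailty) and n = 38 are assumptions inferred from the number of rows in the posterior summaries and the number of patients, not values stated in the text. Under those assumptions, the AIC values reported for the kidney infection data reproduce the reported BIC values to rounding.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion for log-likelihood loglik and k parameters."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian Information Criterion for sample size n."""
    return -2.0 * loglik + k * math.log(n)


def bic_from_aic(aic_value, k, n):
    """Recover BIC from AIC by swapping the penalty 2k for k*log(n)."""
    return aic_value + k * (math.log(n) - 2.0)


if __name__ == "__main__":
    n = 38  # assumed: number of patients
    reported = {
        # model: (AIC, BIC, assumed number of parameters k)
        "Model I": (638.5262, 654.9020, 10),
        "Model II": (700.3005, 716.6763, 10),
        "Model III": (691.5817, 706.3200, 9),
        "Model IV": (702.0827, 716.8210, 9),
    }
    for name, (aic_value, bic_value, k) in reported.items():
        print(name, round(bic_from_aic(aic_value, k, n), 4), bic_value)
```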

## 6. Application in real life data

The applicability of the models was tested on infectious disease data relating to kidney infections at the point of catheter insertion [18]. The data consist of the first and second recurrence times of infection for 38 patients using portable dialysis equipment. The two infection times of each patient form a cluster. Other recorded variables include infection duration, patient age, gender (0 for male and 1 for female), and disease type: Glomerulo Nephritis (GN), Acute Nephritis (AN), and Polycystic Kidney Disease (PKD).

First, the Kolmogorov–Smirnov (K-S) test is used to check the goodness of fit for the kidney infection data. The p-values obtained for the first and second recurrence times are large, so we cannot reject the hypothesis that the first and second recurrence times follow the distributions with survival functions given in Eqs. (11) and (14) in the univariate case; the same distributions are assumed to be appropriate in the bivariate case. The corresponding p-values are given in Table 1.

The posterior summaries of the proposed models are presented in Tables 2 and 3. Each summary consists of the estimate (posterior mean), standard error, 95% lower and upper credible limits, Gelman-Rubin (GR) statistic, and Geweke test value with its p-value. From Tables 2 and 3, we observe that the regression coefficients are more or less the same for all the models. Also, for all the proposed models, zero does not lie in any of the credible intervals of the regression coefficients, so all the covariates appear to be significant.

To assess the adequacy of the models, we constructed 95% and 50% predictive intervals from a random sample generated from the predictive distribution, as described by Sahu et al. [15], and counted the observed first and second recurrence times that fell within the intervals. Of the 76 observations, 76 and 58 fall within the 95% and 50% predictive intervals for Model-I, and 76 and 60 for Model-II. This indicates that both models are appropriate for the data. Model-I has lower AIC, BIC, and DIC values than Model-II in Table 4, so it is the better model by these criteria. However, because the differences between the AIC, BIC, and DIC values of Model-I and Model-II are small, these criteria alone are not sufficient to decide between the two models. To compare model

Distribution | First recurrence time | Second recurrence time |
---|---|---|
Generalized Rayleigh | 0.98078 | 0.99889 |
Exponential power | 0.96291 | 0.75766 |

Burn-in period = 5600; autocorrelation lag = 275

| Parameter | Estimate | Standard error | Lower credible limit | Upper credible limit | Gelman & Rubin values | Geweke values | p values |
|---|---|---|---|---|---|---|---|
|  | 0.3716 | 0.0312 | 0.3118 | 0.4283 | 1.0007 | −0.0048 | 0.4980 |
|  | 0.4253 | 0.0455 | 0.3326 | 0.5044 | 1.0003 | −0.0045 | 0.4981 |
|  | 0.0032 | 0.0004 | 0.0023 | 0.0041 | 1.0008 | −0.0017 | 0.4992 |
|  | 0.0026 | 0.0004 | 0.0018 | 0.0034 | 1.0031 | −0.0052 | 0.4979 |
|  | 1.1287 | 0.0423 | 1.0722 | 1.2196 | 1.0032 | −0.0095 | 0.4961 |
|  | 0.0153 | 0.0041 | 0.0083 | 0.0238 | 1.0052 | −0.0042 | 0.4983 |
|  | −1.0792 | 0.2740 | −1.6013 | −0.5350 | 1.0001 | 0.0070 | 0.4983 |
|  | 0.0021 | 0.0004 | 0.0012 | 0.0029 | 1.0008 | −0.0060 | 0.4975 |
|  | 0.0031 | 0.0004 | 0.0022 | 0.0040 | 1.0005 | −0.0010 | 0.4995 |
|  | −0.2149 | 0.0514 | −0.3031 | −0.0947 | 1.0008 | 0.0059 | 0.5023 |

Burn-in period = 6800; autocorrelation lag = 280

| Parameter | Estimate | Standard error | Lower credible limit | Upper credible limit | Gelman & Rubin values | Geweke values | p values |
|---|---|---|---|---|---|---|---|
|  | 0.4440 | 0.0251 | 0.3878 | 0.4905 | 1.0010 | −0.0065 | 0.4973 |
|  | 0.5040 | 0.0368 | 0.4263 | 0.5689 | 1.0002 | 0.0032 | 0.5012 |
|  | 0.3120 | 0.0471 | 0.2353 | 0.4021 | 1.0001 | 0.0029 | 0.5011 |
|  | 0.2150 | 0.0445 | 0.1530 | 0.3166 | 1.0016 | −0.0053 | 0.4978 |
|  | 1.2061 | 0.0496 | 1.1197 | 1.3013 | 0.9999 | 0.0026 | 0.5010 |
|  | 0.0001 | 0.0001 | 1.7e-05 | 0.0002 | 1.0003 | 0.0036 | 0.5014 |
|  | −2.5247 | 0.3867 | −3.2854 | −1.7454 | 1.0021 | 0.0071 | 0.5014 |
|  | 0.0020 | 0.0004 | 0.0012 | 0.0029 | 1.0006 | 0.0119 | 0.5047 |
|  | 0.0031 | 0.0004 | 0.0021 | 0.0040 | 1.0003 | 0.0107 | 0.5042 |
|  | −0.9916 | 0.4466 | −1.8481 | −0.1704 | 1.0027 | 0.0001 | 0.5000 |

Model no. | AIC | BIC | DIC |
---|---|---|---|
Model I | 638.5262 | 654.9020 | 625.1190 |
Model II | 700.3005 | 716.6763 | 686.8069 |
Model III | 691.5817 | 706.3200 | 720.7843 |
Model IV | 702.0827 | 716.8210 | 689.8978 |

Numerator model against denominator model | Range | Evidence against model in denominator | |
---|---|---|---|

63.23936 | Very Strong Positive |

## 7. Discussion

In this study, we examined a new Lindley shared frailty model under generalized Rayleigh and exponential power as baseline distributions.

The Metropolis-Hastings algorithm and the Gibbs sampler were used to fit all of the proposed models. The proposed models were applied to the kidney infection data, and the best model was suggested. The analysis was carried out with self-written programs in the R statistical software.

All of the comparison criteria indicate that, for modeling the kidney infection data, the Lindley shared frailty model with the generalized Rayleigh baseline distribution is superior to the model with the exponential power baseline distribution and to the models without frailty under the same baseline distributions. The estimates of the frailty variance are 0.9415 and 0.9739, which are high in all the proposed models, indicating strong evidence of a high degree of heterogeneity among the patients in the population. Some patients are expected to be much more prone to infection than others with the same covariate values. We can also say that there is a strong positive correlation between the two infection times of the same patient.

The most important findings from the proposed models, which were not reported in previous studies, are as follows. The estimates of the frailty variance are higher in all the proposed models than in the previous studies by McGilchrist and Aisbett [18], who used log-normal frailty, and Hanagal and Bhambure [19]. The disease types GN and AN have lower infection rates as compared to other covariates. All the covariates are significant factors for kidney infection, whereas the disease type was insignificant in the previously proposed frailty models (see [4]). It is worth noting that the Lindley shared frailty model with the generalized Rayleigh baseline distribution performs better in analyzing the kidney infection data than the other frailty models [4, 19].