Bedside Linear Regression Equations to Estimate Equilibrated Blood Urea



Introduction
Three decades ago Sargent and Gotch established the clinical applicability of Kt/V, a dimensionless ratio that combines dialyzer clearance (K), treatment duration (t) and the patient's total body water volume (V), as an index of hemodialysis (HD) adequacy (Gotch & Keen, 2005). This parameter, derived from single-pool (sp) urea (U) kinetic modelling, has become the gold standard for HD dose monitoring and is widely used as a predictor of outcome in HD populations (Locatelli et al., 1999; Eknoyan et al., 2002; Locatelli, 2003). However, spKt/V overestimates the HD dose because it does not take urea rebound (UR) into account. UR begins immediately at the end of the HD session and is complete 30-60 minutes later. It is related to disequilibria between the blood and cell compartments, as well as to flow disequilibria between organs, both produced during HD treatment. Therefore, equilibrated (eq) Kt/V is the true HD dose, and it requires the measurement of a true equilibrated urea (eqU) once UR is complete. Drawing a blood sample to obtain an eqU concentration has several drawbacks that make this option impractical (Gotch & Keen, 2005). For this reason, in the last decade several formulas were developed to predict eqU and eqKt/V, eliminating the need to wait for an equilibrated urea measurement. For instance, the "rate formula" (Daugirdas et al., 1995) is the most popular and validated equation. It predicts eqKt/V as a linear function of spKt/V and the rate of dialysis (K/V). Another approach, proposed by Tattersall, is a robust formula based on double-pool analysis (Smye et al., 1999). However, although this eqU prediction approach is conceptually rigorous, it is not accurate (Gotch, 1990; Guh et al., 1999; Fernandez et al., 2001). Consequently, a model able to predict a subject-specific equilibrated concentration would be very helpful.
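To make the idea of the "rate formula" concrete, the following Python sketch (Python is used only for illustration; the chapter's own implementation is in R) expresses eqKt/V as a linear function of spKt/V and K/V. The slope 0.6 and intercept 0.03 are the values commonly quoted for arterial sampling; they are an assumption of this sketch, not taken from this chapter, and should be checked against Daugirdas et al. (1995). The function name is hypothetical.

```python
def ekt_v_rate(sp_kt_v, hours, slope=0.6, intercept=0.03):
    """Equilibrated Kt/V as a linear function of spKt/V and K/V.

    slope and intercept are ASSUMED coefficients (commonly quoted for
    arterial sampling); they are not given in the chapter.
    """
    if hours <= 0:
        raise ValueError("session length must be positive")
    k_over_v = sp_kt_v / hours  # rate of dialysis K/V, per hour
    return sp_kt_v - slope * k_over_v + intercept


# Hypothetical session: spKt/V of 1.4 delivered over 4 hours (240 min).
example = ekt_v_rate(1.4, 4.0)
print(round(example, 2))
```

For a fixed spKt/V, a shorter session gives a larger K/V and therefore a larger correction, which is how the formula reflects a stronger rebound after fast dialysis.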
Although the behaviour of urea is non-linear, since its extraction from blood follows an exponential-family model as a function of time, we found that its equilibrated concentration after the end of the treatment session can be predicted accurately by means of linear models. In this study, we show how to build linear models to predict equilibrated urea based on two statistical procedures and a machine learning method that can be implemented in hemodialysis centres. The fitted model can be used for daily treatment monitoring and is easily implemented in commonly available spreadsheets.

A linear model is based on linear combinations of unknown parameters which must be estimated from data. The first step in looking for an appropriate model relies on prior knowledge or basic assumptions about the problem at hand, which should be expressed in a hypothesized mathematical structure. The model can be expressed as E(Y) = f(X, β), where E(Y) is the expected value of the output vector and f is a linear function, i.e.

$$f(X, \beta) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

X is a matrix of input variables and β is a vector of parameters that needs to be estimated. In this way a set of potential mappings has been defined. The second step is the estimation of the components of the vector β. This step selects a specific mapping (a 'proper' β) from the set of possible ones, choosing the parameter vector that performs best according to some optimization criterion.

There are several techniques to find a proper estimate β̂ of the β vector when using a linear model. Each of them has its own assumptions and requirements. Here we explore three different approaches for the estimation of the parameters of the β vector: the Ordinary Least Squares (OLS) procedure, based on the minimization of the sum of squared residuals, which assumes independence among the columns of the X matrix; the Partial Least Squares (PLS) method, based on a decomposition scheme that maximizes the estimated covariance between the inputs and the outputs, and which is able to handle co-linearity or lack of independence among the columns of the X matrix; and the Support Vector Machine (SVM) algorithm, which is based on the minimization of the empirical risk over ε-insensitive loss functions. In this study, the three regression procedures were used to estimate the β coefficients in order to predict the equilibrated urea concentration at the end of the dialysis session. The input variables were the intradialysis urea concentrations (U_0, U_120, U_240), the predialysis body weight and the patient's ultrafiltration data.

Data analysis and modeling require several tasks. In this work we use the Knowledge Discovery in Databases (KDD) strategy as an ordered analysis framework. Accordingly, several steps covering the different KDD stages were implemented: problem and data understanding, collection, cleaning, pre-processing, analysis, modeling and interpretation of results.
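The linear structure described above amounts to an intercept plus a weighted sum of the inputs. The minimal Python sketch below (illustrative only; the chapter's code is in R) evaluates such a model for one patient. The coefficient values, the patient values and the function name are hypothetical placeholders, not results from this study.

```python
def predict_linear(beta0, beta, x):
    """Return beta0 + sum_j beta[j] * x[j] for one patient."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    return beta0 + sum(b * v for b, v in zip(beta, x))


# HYPOTHETICAL coefficients and inputs, for illustration only.
# Input order: U_0, U_120, U_240, predialysis body weight, ultrafiltration.
beta0 = 1.0
beta = [0.1, 0.2, 0.7, 0.0, 0.5]
patient = [150.0, 80.0, 50.0, 70.0, 2.0]

u_eq_hat = predict_linear(beta0, beta, patient)
print(u_eq_hat)
```

Because the prediction is a single weighted sum, the same calculation fits in one spreadsheet cell once the coefficients have been estimated.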

Data collection

Patients
One hundred and nine stable patients were selected from two dialysis units as follows: 61 from Unit 1 (mean age 56 ± 3.5 years and mean time on dialysis (MTD) 32 ± 12.3 months) and 48 from Unit 2 (mean age 58 ± 18.0 years and MTD 42 ± 23.5 months). All patients were from Buenos Aires, Argentina, and had been on chronic HD treatment for at least 3 months. The criteria for including patients in the study were: (1) no infection or hospitalization in the previous 30 days; (2) an A-V fistula (70% autologous and 30% prosthetic) with a blood flow rate (QB) ≥ 300 ml/min; and (3) consent to participate in the study. The study protocol complied with the Helsinki Declaration and was approved by the Ethical Committee of the Catholic University of Córdoba. All patients received HD three times a week with current hemodialysis machines using variable bicarbonate and sodium. Hollow-fiber polysulfone and cellulose diacetate dialyzers were used (see Fernandez et al., 2001 for more details). For the purpose of this study, all patients were dialyzed for 240 min, and the blood (QB) and dialysate (QD) flows were fixed at 300 and 500 ml/min, respectively. The hemodialysis dose is known to be influenced by several factors, including dialysis time, hemodialysis schedule and blood and dialysate flows (Daugirdas et al., 1997). To reduce complexity, these variables were handled externally: their values were fixed in order to control their effects on the equilibrated urea prediction model.

The input and output variables
Blood samples were obtained at the mid-week HD session. They were taken from the arterial line at different times to obtain the urea determinations: (1) predialysis urea (U_0), at the beginning of the procedure; (2) intradialysis urea (U_120), in the middle of the HD session (120 min after the beginning); (3) postdialysis urea (U_240), at the end of the HD session. For the intradialysis (U_120) and postdialysis (U_240) samples, QB was slowed to 50 ml/min and blood was sampled 15 seconds later. At this point, access recirculation had ceased and the dialyzer inlet blood reflected the arterial urea concentration. Regarding the protocol for intradialysis samples, it is worth noting that Smye et al. 1997 originally proposed taking them within 60 min of the beginning of the session and 20 min before its end. We decided, however, to take the intradialysis sample 120 min after the beginning of the HD session (U_120), which allowed us to compare our results with those reported by Guh et al. 1999. Urea (U) determinations were performed in triplicate on each blood sample using autoanalyzers (see Fernandez et al., 2001 for more details). The urea averages were calculated and recorded with an accuracy of 1% for both machines.

For information about the pre- and post-treatment status of the patient, we used the pre- and post-dialysis body weights (BW_0, BW_240). Both variables are commonly used in clinical practice to decide the treatment schedule as well as to calculate the treatment dose. These variables were recorded in the same dialysis session in which the blood samples were taken. The output variable was the equilibrated urea. For the purpose of this study, the patients were kept in the dialysis center for one hour and the equilibrated urea sample (U_eq) was taken 60 min after the end of HD. The summary statistics for the input and output variables are shown in

Ordinary least squares
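The Ordinary Least Squares approach estimates the β coefficient vector by minimizing the sum of squared residuals from the data:

$$S(\beta) = \sum_{i=1}^{n} \left( y_i - \tilde{x}_i \beta \right)^2 \qquad (1)$$

where $y_i$ is the observed output for the i-th patient and $\tilde{x}_i = (1, x_i)$, with $x_i$ the i-th row of the input matrix X. The algorithm looks for the β that minimizes (1). This is achieved by taking the derivatives of equation (1) with respect to β and setting them to zero, which yields the following closed-form solution:

$$\hat{\beta} = \left( \tilde{X}^{t} \tilde{X} \right)^{-1} \tilde{X}^{t} Y \qquad (2)$$

where "t" means "transpose" and $\tilde{X}^{t}\tilde{X}$ must be a non-singular matrix, with $\tilde{X}$ the extended input matrix holding a column of ones for the intercept $\beta_0$.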
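The closed-form OLS solution can be sketched in a few lines. The Python example below (illustrative only; the chapter's code is in R) builds the extended matrix with a column of ones, forms the normal equations and solves them by Gaussian elimination instead of computing an explicit inverse. The helper names and the toy data are hypothetical; the toy response is generated exactly as y = 2 + 3·x1 − x2, so the fit should recover those coefficients.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^t X is singular: no unique OLS solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def ols_fit(x_rows, y):
    """Return [beta0, beta1, ..., betap] minimizing the squared residuals."""
    xe = [[1.0] + list(row) for row in x_rows]   # extended matrix
    cols = list(zip(*xe))                        # columns of xe
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


# Toy data (hypothetical): y = 2 + 3*x1 - x2, noise free.
x_rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
y = [2.0 + 3.0 * a - b for a, b in x_rows]
beta_hat = ols_fit(x_rows, y)
print([round(v, 6) for v in beta_hat])
```

The singularity check mirrors the requirement that the cross-product matrix be invertible, which fails when input columns are perfectly co-linear.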

Partial least squares
Partial Least Squares both generalizes and combines features of regression and Principal Component Analysis in order to deal with correlated explanatory variables in linear models (Abdi, 2003; Shawe-Taylor & Cristianini, 2005). It is particularly useful when one or several dependent variables (outputs) must be predicted from a large and potentially highly correlated set of independent variables (inputs). In the PLS algorithm (Wood et al., 2001), X and Y are expressed as:

$$X = T P^{t} + H \qquad (3)$$

$$Y = U C^{t} + R \qquad (4)$$

where A is the number of PLS factors (A ≤ p) and H and R are error matrices. The columns of T and U (the "score" matrices) provide a new representation of the X and Y variables in an orthogonal space. The matrices P and C are the projections ("loadings") of the X and Y columns onto the new set of variables in T and U. The T matrix is calculated as $T = XW$, where $W = U (P^{t} U)^{-1}$. In the PLS algorithm, U and P are built iteratively (Wood et al., 2001) by means of matrix products between consecutive deflations of the original matrices X and Y. The T matrix is thus also a good estimator of Y, so

$$Y = T C^{t} + E \qquad (5)$$

where $C_{1 \times A}$ is the "loadings" matrix of Y that projects it onto the new space represented by T. The error term E represents the deviations between the observed and predicted responses. Replacing T in the above equation yields:

$$\hat{Y} = X W C^{t} \qquad (6)$$

where Ŷ is the predicted output.
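As a rough illustration of the iterative construction described above, the Python sketch below implements the textbook NIPALS variant of PLS for a single response (PLS1). This is a swapped-in standard algorithm, not necessarily the exact procedure of Wood et al. (2001) nor the R routine used in the chapter, and all function and variable names are hypothetical. Each factor is built from the covariance between the current (deflated) inputs and response, after which both are deflated.

```python
import math


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def pls1_fit(x_rows, y, n_factors):
    """Return (beta0, beta) of a PLS1 model with n_factors factors."""
    n, p = len(x_rows), len(x_rows[0])
    x_mean = [sum(r[j] for r in x_rows) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[r[j] - x_mean[j] for j in range(p)] for r in x_rows]  # centred X
    yc = [v - y_mean for v in y]                                 # centred y
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_factors):
        w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(xc[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        p_load = [sum(xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        c = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next factor.
        xc = [[xc[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - c * t[i] for i in range(n)]
        weights.append(w)
        loadings.append(p_load)
        y_loadings.append(c)
    # beta = W (P^t W)^-1 c, with factors stored as rows of the lists.
    a_range = range(n_factors)
    ptw = [[sum(loadings[a][j] * weights[b][j] for j in range(p))
            for b in a_range] for a in a_range]
    z = solve(ptw, y_loadings)
    beta = [sum(weights[a][j] * z[a] for a in a_range) for j in range(p)]
    beta0 = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return beta0, beta


# Toy data (hypothetical): y = 2 + 3*x1 - x2, noise free.
x_rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
y = [2.0 + 3.0 * a - b for a, b in x_rows]
beta0_full, beta_full = pls1_fit(x_rows, y, n_factors=2)
beta0_one, beta_one = pls1_fit(x_rows, y, n_factors=1)
print(beta0_full, beta_full)
```

When the number of factors equals the number of independent input columns, PLS reproduces the least squares fit; using fewer factors gives a more constrained solution, which is the point of selecting A by cross-validation.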
The number of factors chosen affects the estimation of the regression coefficients. In a model with A factors, the β coefficients are calculated as follows:

$$\hat{\beta}_{PLS} = W C^{t} \qquad (7)$$

In the PLS algorithm the input and output data are centred before the different matrices are calculated. In addition, the input training matrix X may be scaled by dividing each column by its standard deviation. In that case, the regression coefficients estimated by means of equation (7) live in the scaled X domain. The values of the β coefficients in the raw data domain are calculated as follows:

$$\hat{Y} = \beta_0^{raw} + X \beta^{raw}, \qquad \beta^{raw} = V^{-1} \hat{\beta}_{PLS}, \qquad \beta_0^{raw} = \bar{Y} - \bar{X} V^{-1} \hat{\beta}_{PLS} \qquad (8)$$

where Ŷ is the estimated U_eq, V is a diagonal matrix of the standard deviations of each column of X and $\bar{X}$ is the vector of column means of X. $\bar{Y}$ is the mean of the response variable in the training data set, and $\beta_0^{raw}$ is the intercept.
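The change from the scaled domain back to the raw data domain is a simple rescaling, sketched below in Python (illustrative only; names and numbers are hypothetical). Each coefficient is divided by the standard deviation of its input column, and the intercept absorbs the column means. The optional `y_sd` argument is an assumption of this sketch that covers the case where the response was also scaled; it defaults to 1 for a response that was only centred.

```python
def to_raw_domain(beta_scaled, x_mean, x_sd, y_mean, y_sd=1.0):
    """Convert coefficients fitted on centred/scaled data to raw units."""
    beta_raw = [y_sd * b / s for b, s in zip(beta_scaled, x_sd)]
    beta0_raw = y_mean - sum(b * m for b, m in zip(beta_raw, x_mean))
    return beta0_raw, beta_raw


# Hypothetical training summary for two inputs.
x_mean, x_sd, y_mean = [10.0, 20.0], [2.0, 4.0], 50.0
beta_scaled = [1.0, -2.0]
beta0_raw, beta_raw = to_raw_domain(beta_scaled, x_mean, x_sd, y_mean)

# Both routes must give the same prediction for a new patient.
x_new = [12.0, 16.0]
scaled_x = [(v - m) / s for v, m, s in zip(x_new, x_mean, x_sd)]
pred_scaled = y_mean + sum(b * v for b, v in zip(beta_scaled, scaled_x))
pred_raw = beta0_raw + sum(b * v for b, v in zip(beta_raw, x_new))
print(pred_scaled, pred_raw)
```

Working in raw units is what allows the final equation to be typed directly into a calculator or spreadsheet without re-scaling each measurement.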

Support vector machine
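In the previous cases, the sum of squared deviations can be viewed as a loss function measuring the loss associated with a particular estimate of β. In the Support Vector Machine framework (Vapnik, 2000), the loss function only takes into account those data points whose deviation exceeds a threshold ε, yielding the ε-insensitive loss:

$$L_{\varepsilon}(y, \hat{y}) = \max\left(0, \; |y - \hat{y}| - \varepsilon \right)$$

The coefficients estimated on centred and scaled data are transformed back to the raw data domain:

$$\hat{Y} = \beta_0^{raw} + X \beta^{raw}, \qquad \beta^{raw} = sd_y V^{-1} \hat{\beta}_{SVM} \qquad (13)$$

where Ŷ is the estimated U_eq, V is a diagonal matrix of the standard deviations of each column of X and $\bar{X}$ is the vector of column means of X. The mean and standard deviation of U_eq in the training data set are $\bar{Y}$ and $sd_y$, respectively. The intercept is expressed as $\beta_0^{raw} = \bar{Y} - sd_y \bar{X} V^{-1} \hat{\beta}_{SVM}$.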
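The difference between the squared loss and the thresholded loss used by support vector regression can be seen in a short Python sketch (illustrative only; function names and numbers are hypothetical). Residuals smaller than ε cost nothing, and larger residuals are penalized linearly rather than quadratically.

```python
def squared_loss(y_true, y_pred):
    """Loss used by least squares: grows quadratically with the residual."""
    return (y_true - y_pred) ** 2


def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps tube, linear in the excess outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical residuals around a true value of 50.
for y_pred in (50.1, 51.0, 55.0):
    print(y_pred,
          squared_loss(50.0, y_pred),
          eps_insensitive_loss(50.0, y_pred, eps=0.25))
```

Points that fall inside the tube do not influence the fitted coefficients, which is why only a subset of the patients ends up acting as support vectors.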

Statistical modeling of equilibrated urea
The three estimation procedures (OLS, PLS and SVM) for obtaining the regression coefficients β of a linear model were applied to build bedside equations that estimate equilibrated urea from intradialysis urea samples and anthropometric data in 109 hemodialyzed patients. Estimation, selection and validation of the models were implemented in the R language (www.r-project.org) (see appendix). Before fitting a model, the appropriate number of factors (A) for PLS and the best pair of cost (C) and epsilon (ε) values for SVM were chosen. For this purpose, a 15-fold cross-validation strategy was applied to a randomly chosen 70% of the patients in the data set. In the PLS case, models including 1 to A factors, with A = 1, 2, 3, 4 and 5, were tested. For each model the cross-validation root mean prediction error (RMPE) was calculated, and the expected value of the RMPE over all partitions was then obtained. The model achieving the smallest mean RMPE was chosen. For the linear SVM case, a 10×10 grid search over C and ε was performed. The ranges were from 4 to 6 for C and from 0.001 to 2 for ε. A linear SVM model was built for each (C, ε) pair, and the cross-validation RMPE was calculated and compared. The smallest mean RMPE was used as the selection criterion.

The predictive ability of the fitted models was evaluated using a 20-fold cross-validation strategy over the whole data set. The data set was split into 20 consecutive sets of equal size; in turn, 19 of them were used for β estimation and the remaining one for prediction from the estimated model.

www.intechopen.com
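The selection and validation scheme described above can be outlined in Python (illustrative only; the chapter's code is in R, and every name here is hypothetical). The sketch assumes that RMPE means the root of the mean squared prediction error, and it lets the first folds be one patient larger when the sample size is not a multiple of the number of folds, as with 109 patients and 20 folds. Any fitting routine can be plugged in through the `fit` and `predict` arguments; a trivial mean-only model is used here as a stand-in.

```python
import math


def linspace(start, stop, num):
    """num equally spaced values from start to stop, both included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def consecutive_folds(n, k):
    """Split indices 0..n-1 into k consecutive, near-equal blocks."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def rmpe(y_true, y_pred):
    """ASSUMED definition: root of the mean squared prediction error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
                     / len(y_true))


def cross_validate(x_rows, y, k, fit, predict):
    """Mean RMPE over k folds; each fold is held out once for prediction."""
    errors = []
    for test_idx in consecutive_folds(len(y), k):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x_rows[i] for i in train], [y[i] for i in train])
        preds = [predict(model, x_rows[i]) for i in test_idx]
        errors.append(rmpe([y[i] for i in test_idx], preds))
    return sum(errors) / len(errors)


# 10 x 10 grid over the ranges quoted in the text.
grid = [(c, e) for c in linspace(4.0, 6.0, 10)
        for e in linspace(0.001, 2.0, 10)]

# Stand-in model (hypothetical): predict the training mean.
fit_mean = lambda x_rows, y: sum(y) / len(y)
predict_mean = lambda model, x: model

folds = consecutive_folds(109, 20)
score = cross_validate([[0.0]] * 10, [5.0] * 10, 5, fit_mean, predict_mean)
print(len(grid), [len(f) for f in folds], score)
```

In a grid search, `cross_validate` would be called once per (C, ε) pair and the pair with the smallest mean RMPE retained.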

Results
Table 2 shows the cross-validation statistics for PLS models with different numbers of factors. In Fig. 1

Once the PLS and SVM models were selected, i.e. a 3-factor PLS model and an SVM trained with C = 4.2222 and ε = 0.2223, the three methods (OLS, PLS with A = 3 and SVM with C = 4.2222, ε = 0.2223) were evaluated over the whole data set with a 20-fold cross-validation strategy. Fig. 2 shows the relative prediction error (%PE) against the true equilibrated urea, together with the corresponding smooth trend, for the three estimation strategies: open circles for OLS (dashed smooth trend), "*" for PLS (dot-dashed smooth trend) and "+" for SVM (dotted smooth trend). OLS and PLS perform almost equally, with a slight tendency towards increased overestimation for PLS at high U_eq values (the PLS smooth trend curve shows a greater %PE than in the other cases). In contrast, SVM performs better for low U_eq values (dotted smooth trend closer to zero %PE). In the mid-range of U_eq the three methods perform similarly. All the methods tend to overestimate small U_eq values and underestimate high U_eq values.

Fig. 2. 20-fold cross-validation % prediction errors (%PE) for each tested model. Open circles for the OLS model, "*" for PLS and "+" for SVM regression. The smooth trend curve for each model is also presented (see text for references).

Table 3 shows summary statistics for the PE and the number of data points with a %PE within the ±10 and ±20 ranges. The PLS model achieves the lowest %PE and SVM the highest, although with a smaller standard deviation across runs. In terms of the median, all the methods tend to overestimate the response; however, SVM presents the lowest median %PE, suggesting robustness to outliers.

Table 3. Summary statistics for prediction errors and number of data points lying in the ±10 and ±20 %PE intervals.

Fig. 3 shows the distribution of the β coefficients that weight each input variable (β_1 for U_0, β_2 for U_120, β_3 for U_240, β_4 for BW_0 and β_5 for Uf) in the input scale (equation 8 for PLS and 13 for SVM). The coefficient β_5 (associated with Uf) is highly variable. This coefficient is mainly estimated as positive by OLS, as negative by PLS and with both signs by SVM. In the first two cases, β_5 was statistically different from zero (t test, p<0.01). The SVM estimation of β seems to be more robust than the others. In particular, the β coefficient related to Uf (β_5) shows significantly less dispersion than in the other models. In the OLS and PLS cases, all coefficients except that of Uf show similar behaviour. The Uf coefficient for PLS is the most variable of all.
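Summary statistics of the kind reported for the prediction errors can be computed as in the Python sketch below (illustrative only). It assumes that %PE is defined as 100 × (predicted − true) / true, so that positive values correspond to overestimation; this sign convention, the function names and the four data points are assumptions of the sketch, not values from the study.

```python
import statistics


def percent_pe(y_true, y_pred):
    """ASSUMED definition: 100 * (predicted - true) / true."""
    return [100.0 * (p - t) / t for t, p in zip(y_true, y_pred)]


def summarize(pe):
    """Mean, median, SD and counts of errors within +/-10 and +/-20 %."""
    return {
        "mean": statistics.mean(pe),
        "median": statistics.median(pe),
        "sd": statistics.stdev(pe),
        "within_10": sum(abs(e) <= 10 for e in pe),
        "within_20": sum(abs(e) <= 20 for e in pe),
    }


# Hypothetical true and predicted equilibrated urea values.
u_eq_true = [50.0, 40.0, 80.0, 100.0]
u_eq_pred = [55.0, 50.0, 76.0, 100.0]

pe = percent_pe(u_eq_true, u_eq_pred)
summary = summarize(pe)
print(pe, summary)
```

Reporting the median alongside the mean is useful here because a few large relative errors at low urea values can dominate the mean.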

Bedside equations for equilibrated urea prediction
The final models were built using all patients and the parameters found in the previous section (for PLS and SVM). We found that the coefficients estimated using the full data set (equations 14 to 15) were similar to the mean of the cross-validation coefficients for OLS and SVM. In contrast, the coefficients estimated by PLS using the whole data set differed from those estimated in the cross-validation evaluation.
In the OLS case the final bedside equation was the following:
The SVM identified 77 support vectors. This means that the β coefficients were estimated using only about 70% of the database (77 of 109 patients). In contrast, the other two methods require the full data set to build the solution.

Discussion
In this work we show how to build linear models using three different linear regression estimation procedures that rely on different optimization algorithms. Ordinary Least Squares is based on the minimization of the sum of squared residuals, while Partial Least Squares maximizes covariance information by means of repeated deflation of the input and output matrices based on correlation. Finally, Support Vector Machine regression is based on the empirical risk minimization of a non-linear loss function. Theoretically, none of the methods requires any specific assumption; however, it is known that if the observed variable (the equilibrated urea in this case) follows a normal distribution, the statistical significance of the β coefficients estimated by OLS and PLS can be tested. Even though all the models predict similarly well, they give estimates that differ not only in value but also in sign for U_0, body weight and ultrafiltration. Analyzing the "raw" data relationships between these variables (see Fig. 4) and urea rebound, $(U_{eq} - U_{240}) / U_{eq}$, it is possible to see the known (Gotch & Keen, 2005) slightly inverse relationship (see smooth trend curves) of BW and Uf with urea rebound. This behaviour seems to be captured for Uf by PLS (negative β_5). The β_5 estimated by the OLS method seems to follow the positive linear relationship mostly found in the Uf vs U_eq pairs plot. The SVM finds a solution in between, estimating much smaller values for β_5 than the other two. For the body weight coefficient (β_4), the estimates by OLS and SVM are smaller than that of PLS; however, the SVM method captures the known small tendency between BW and urea rebound. In this sense, PLS is able to capture known biological relationships while still giving broad ranges for the estimate of the Uf coefficient. In contrast, OLS does not reflect the biological effect of Uf. The SVM method provides an in-between solution with small estimates of the Uf coefficient.
Thus, the methods that account for co-linearity (PLS and, to some extent, SVM) provide better solutions than OLS, which does not account for it.
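The raw-data comparison discussed above can be reproduced with two small helpers, sketched here in Python (illustrative only). Rebound is expressed as a fraction of the equilibrated value; the choice of denominator is an assumption of this sketch, and the function names and patient values are hypothetical, chosen only so that the example shows an inverse relationship.

```python
import math


def urea_rebound(u_240, u_eq):
    """Rebound as a fraction of U_eq (ASSUMED denominator)."""
    if u_eq <= 0:
        raise ValueError("u_eq must be positive")
    return (u_eq - u_240) / u_eq


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# HYPOTHETICAL patients: (U_240, U_eq, ultrafiltration).
patients = [(50.0, 60.0, 1.5), (45.0, 52.0, 2.5), (55.0, 61.0, 3.0)]
rebound = [urea_rebound(u240, ueq) for u240, ueq, _ in patients]
uf = [p[2] for p in patients]
print(rebound, pearson(uf, rebound))
```

A negative correlation in real data would correspond to the inverse relationship noted in the text, although a correlation coefficient only summarizes the linear part of the trend shown by the smooth curves.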
We showed that linear models allow us to build bedside equations that can be easily implemented in any calculator or electronic spreadsheet such as Excel®. All the presented methods performed better than traditional methods (Smye et al., 1999) on the same data (Fernandez et al., 2001), suggesting the appropriateness of simple linear approaches. In addition, each hemodialysis centre can build its own predictor based on its own patient population by following the described process or by using the accompanying source code (see appendix). In this work we show that the use of an intradialysis sample (U_120) provides valuable information for predicting the equilibrated urea. Smye et al. (1999) were the first to use an intradialysis sample to model U_eq. In clinical practice, drawing an additional blood urea sample can be very problematic. In a recent publication (Fernandez et al., 2008) we showed that a linear model built without this urea sample can also provide an accurate U_eq estimate. Future challenges for U_eq prediction by linear models are emerging with the implementation of different HD schedule proposals based on variations in session time and/or weekly frequency.

Appendix: R source code for OLS, PLS and SVM linear models to estimate equilibrated urea
In order to apply the R (www.r-project.org) algorithm to build the linear models presented in this work, we assume that the patient database is stored in a comma-separated values (CSV) file as follows (any electronic spreadsheet program can save files in CSV format).