Early identification of individuals with sepsis assists clinical triage and decision-making, enabling earlier intervention and improved outcomes. This study aims to develop an explainable, clinically interpretable machine learning model that predicts sepsis onset 6 hours in advance, and to validate its risk-prediction power at every time interval after admission to the ICU. This retrospective observational cohort study uses PhysioNet Challenge 2019 ICU data from three distinct hospital systems, A, B, and C. Data from A and B were shared publicly for training and validation, while sequestered data from all three cohorts were used for scoring. This study is limited to the publicly available training data, which contain 1,552,210 hourly records of 40,336 ICU patients with up to 40 clinical variables (recorded for each hour of the ICU stay), divided into two datasets corresponding to hospital systems A and B. Clinical feature exploration and interpretation for early prediction of sepsis are achieved using the proposed framework, the explainable Machine Learning model for Early Prediction of Sepsis (xMLEPS). A total of 85 features, comprising the 40 given clinical variables augmented with 10 derived physiological features and 35 time-lag difference features, are fed to xMLEPS to predict sepsis onset. A ten-fold cross-validation scheme is employed in which an optimal prediction risk threshold is searched for each of the 10 LightGBM models. Each model then uses its own optimal threshold when predicting labels in its fold, which refines predictive power as measured by the utility score. The entire framework is designed via Bayesian optimization and trained on the resulting set of 85 features, yielding an average normalized utility score of 0.4214 and an area under the receiver operating characteristic curve of 0.8591 on the publicly available training data.
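The per-fold threshold search described above can be illustrated with a minimal sketch. It is not the chapter's implementation: the F1 score stands in for the PhysioNet utility score, the candidate grid of thresholds is an assumption, and the fold data, `f1_score`, and `best_threshold` are hypothetical names introduced only for illustration.

```python
def f1_score(labels, preds):
    """F1 score, used here only as a stand-in for the challenge utility score."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_threshold(labels, risks, candidates):
    """Return (threshold, score) maximizing the metric; ties keep the lowest threshold."""
    best_t, best_s = None, -1.0
    for t in candidates:
        # A record is labeled positive when its predicted risk reaches the threshold.
        preds = [1 if r >= t else 0 for r in risks]
        s = f1_score(labels, preds)
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s


# Hypothetical validation outputs for two folds: (true labels, predicted risks).
folds = [
    ([0, 0, 1, 1, 0, 1], [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]),
    ([0, 1, 0, 0, 1, 1], [0.05, 0.60, 0.55, 0.30, 0.90, 0.45]),
]
candidates = [i / 100 for i in range(1, 100)]  # assumed search grid

# One threshold per fold, each later reused by that fold's own model.
fold_thresholds = [best_threshold(y, r, candidates) for y, r in folds]
for k, (t, s) in enumerate(fold_thresholds, start=1):
    print(f"fold {k}: threshold={t:.2f}, score={s:.3f}")
```

In this toy data the two folds select different thresholds, which is the motivation for searching a threshold per model rather than fixing a single cut-off such as 0.5.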
This study establishes a practical and explainable sepsis onset prediction model for ICU data using an applied ML approach, mainly gradient boosting. It highlights the clinical significance of the physiological inter-relations among the given and proposed clinical signs through feature importance and SHapley Additive exPlanations (SHAP) plots for visual interpretation.
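To make the idea behind SHAP attributions concrete, the following sketch computes exact Shapley values for a toy risk function by enumerating all feature subsets. It is an illustration under stated assumptions, not the chapter's method: the SHAP library uses far more efficient algorithms for tree models, and `toy_risk`, the three features, and the patient and reference values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model at x relative to baseline (brute force)."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in the subset take their observed value; the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return model(z)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score over (heart rate, lactate, temperature)."""
    hr, lactate, temp = z
    return 0.01 * hr + 0.2 * lactate + 0.0005 * hr * lactate + 0.0 * temp


patient = [120.0, 4.0, 37.0]    # hypothetical observation
reference = [80.0, 1.0, 37.0]   # hypothetical baseline patient
phi = shapley_values(toy_risk, patient, reference)
print(phi)
```

The attributions sum to the difference between the patient's score and the reference score, and a feature that does not change the output (temperature here) receives zero. These are the properties that make SHAP plots readable as per-feature contributions to a predicted risk.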
Part of the book: Infections and Sepsis Development