Open access peer-reviewed chapter

Developing and Deploying a Sepsis Deterioration Machine Learning Algorithm

Written By

Rohith Mohan, Alexandra King, Sarma Velamuri and Andrew Hudson

Submitted: 10 March 2023 Reviewed: 06 April 2023 Published: 02 May 2023

DOI: 10.5772/intechopen.111557


Abstract

A sepsis deterioration index is a numerical value, produced by a predictive model, that estimates the chance of a patient becoming septic. This model usually has pre-specified input variables that have a high likelihood of predicting the output variable of sepsis. For the purposes of predicting sepsis deterioration, we primarily use regression to determine the association between variables (also known as features) and eventually predict an outcome variable, which in this case is sepsis. Among the cohort examined in our model at Cedars-Sinai, we found patients who met or exceeded the set threshold of 68.8 had an 87% probability of deterioration to sepsis during their hospitalization, with a sensitivity of 39% and a median lead time of 24 hours from when the threshold was first exceeded. There is no easy way to determine an intervention point for the deterioration predictive model. The authors' recommendation is to continually modify this inflection point guided by data from near-misses and mis-categorized patients. Collecting real-time feedback from end-users on alert accuracy is also crucial for a model to survive. An ML deterioration model to predict sepsis produces ample value in a healthcare organization if deployed in conjunction with human intervention and continuous prospective re-assessment.

Keywords

  • sepsis
  • deterioration
  • ML
  • AI in medicine
  • deterioration index
  • algorithm deployment

1. Introduction to sepsis

1.1 Defining sepsis

Sepsis is the body’s exaggerated response to an infection where a cascade of inflammation can potentially lead to multiorgan failure or death [1]. It is a condition that could impact patients across the healthcare continuum whether they are well-appearing neonates or geriatric patients with an abundance of medical problems. It is pervasive in its ability to affect nearly every organ system requiring comprehensive multi-specialty care. Healthcare providers have been grappling with treatment of sepsis for as long as medicine has been practiced. As a field, we have made great strides in the ability to identify and treat sepsis, but it still kills nearly 270,000 people annually in the United States. We have a variety of therapeutics to treat the source of infection but one area that remains elusive is the ability to predict sepsis prior to onset.

1.2 Financial burden of sepsis

“Septicemia” is the most common diagnosis treated in US hospitals, having surpassed osteoarthritis in 2011. The number of aggregate sepsis-related hospitalizations has grown exponentially, with the number of additional hospitalizations roughly tripling from an average of 48,650 hospitalizations/year over 1997–2011 to 160,700 hospitalizations/year over 2011–2018 [2]. In 2018, the US spent more than $41.5 billion on hospital care for patients with sepsis, accounting for a disproportionate share (10.3%) of total hospital costs. Of the top 10 most common diagnoses, it ranks as the second most costly, averaging $18,700/stay, after acute myocardial infarction [2]. Hospitalizations with sepsis as the principal diagnosis also claim the highest 30-day re-admission rate, with 8.3% of patients re-admitted, and the highest average readmission cost at $19,800 per re-admission [2].


2. Current sepsis evaluation and scoring

In the landmark Sepsis-1 paper, the authors stressed the importance of having a specific definition for sepsis to identify where along the sepsis continuum a patient presents [3]. Since the formalized definitions of SIRS and sepsis were published in 1991, a multitude of different scoring systems have been proposed, tested, and validated to predict deterioration and/or risk of mortality. Each system offers a distinct group of variables with weighted sums or point systems attempting to optimally determine which patients are at the highest risk of deterioration.

Variables used as criteria for scoring have transformed with the evolution of sepsis’ definition. The pivot in sepsis’ definition from SIRS with concomitant infection to focusing more on the spectrum of end-organ dysfunction resulted in an increased reliance on laboratory values in diagnosis of sepsis. Inclusion or exclusion of each variable in a scoring system was the result of iterative assessments of the variable’s ability to predict the risk of an adverse outcome (deterioration and/or death) and its sensitivity in allowing for timely intervention (Figure 1).

Figure 1.

Included variables in sepsis scoring systems.

2.1 Systemic inflammatory response syndrome (SIRS)

The presence of SIRS is defined by derangements in temperature, ventilation, increased heart rate, or leukocytes—all markers of systemic inflammation. The authors of Sepsis-1 specified that SIRS was more for the recognition of sepsis rather than a tool for grading severity of sepsis. Despite this, studies have validated SIRS to have prognostic value. Higher SIRS scores correlate with more rapid progression to sepsis and are positively correlated with mortality rates [4, 5]. Higher SIRS scores also showed stepwise increases in mortality as sepsis severity increased, comparing patients with SIRS, sepsis, severe sepsis, and septic shock [4]. However, SIRS scores were found to be overly sensitive and poorly specific in predicting mortality, leading to overdiagnosis and overtreatment of sepsis.

2.2 Sequential organ failure assessment (SOFA)

Recognizing sepsis’ dependence on SIRS and its inherent limitations in characterizing end-organ damage, Sepsis-3 re-defined sepsis as a “life threatening organ dysfunction caused by a dysregulated host response to infection” and endorsed the sequential organ failure assessment (SOFA) score as a scoring system for mortality [6]. SOFA’s summative score of multiple organ systems reflects PaO2/FiO2 (respiratory), platelet count (coagulation), bilirubin (liver), hypotension (cardiovascular), Glasgow coma score (GCS-neurologic), and creatinine or urine output (renal). The creators of SOFA aimed to keep it simple to allow for repeated assessments over time. The worst daily value is used to trend the risk of mortality in patients who are admitted to the intensive care unit. SOFA only uses variables that are obtained routinely and implemented a scoring system from zero to four to stratify a patient’s risk rather than using binary categorization.

Key limitations to SOFA include its simplicity in characterizing only six organ systems as detailed above. It is unclear if bilirubin is the best biomarker for the hepatic system given hyperbilirubinemia takes days to manifest and is also the most frequently missing variable if the lab is not ordered [7]. GCS as a measure of neurologic function is at risk of being uninterpretable in hospitalized patients, a patient population in whom sedatives are frequently used.

2.3 Early warning systems (EWS): modified EWS (MEWS) and national early warning score (NEWS)

Compared with mortality risk scores, early warning systems (EWS) monitor a patient’s vital signs at shorter intervals and screen for early indications of clinical decline. The first EWS described was the modified Early Warning Score (MEWS) in 1997 and it was the first to gain wide acceptance in the United States. MEWS was later modified and then adopted as the United Kingdom’s National Early Warning Score (NEWS). There are over 100 EWS but we will only discuss two here [8]. Figure 2 compares several well-known scoring systems.

Figure 2.

Early warning score performance for sepsis discrimination.

2.4 Modified early warning score (MEWS)

The modified early warning score (MEWS) evaluates temperature, pulse, respiratory rate, systolic blood pressure, and level of consciousness by the AVPU score (Alert, Reacting to Voice, Reacting to Pain, or Unresponsive). Each parameter is assigned a score from zero to three based on the degree of abnormality and then parameters are aggregated as a MEWS score. Different scoring thresholds trigger pre-determined interventions. For example, a score of 3 may recommend increasing frequency of patient assessment to every 8 hours. MEWS has been validated for use in all hospitalized patients with its parameters easily attainable at bedside allowing for generalized use [9]. MEWS at a cut-point of 4 demonstrated higher sensitivity when combined with a clinician's judgment than when used alone (72.4% vs. 56.6%) [10]. This highlights the need for clinical context when utilizing a deterioration score for decision-making.
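
The parameter-by-parameter scoring can be sketched as follows. The cutoffs below are adapted from commonly published MEWS tables and are illustrative only; local implementations vary, so verify against your institution's protocol.

```python
def mews(temp_c, pulse, resp_rate, sbp, avpu):
    """Aggregate a MEWS score from five bedside parameters.

    Thresholds are illustrative approximations of published MEWS
    tables, not a validated clinical implementation.
    """
    score = 0
    # Systolic blood pressure (mmHg)
    if sbp <= 70:
        score += 3
    elif sbp <= 80:
        score += 2
    elif sbp <= 100:
        score += 1
    elif sbp >= 200:
        score += 2
    # Heart rate (bpm)
    if pulse < 40:
        score += 2
    elif pulse <= 50:
        score += 1
    elif pulse <= 100:
        score += 0
    elif pulse <= 110:
        score += 1
    elif pulse <= 129:
        score += 2
    else:
        score += 3
    # Respiratory rate (breaths/min)
    if resp_rate < 9:
        score += 2
    elif resp_rate <= 14:
        score += 0
    elif resp_rate <= 20:
        score += 1
    elif resp_rate <= 29:
        score += 2
    else:
        score += 3
    # Temperature (Celsius)
    if temp_c < 35.0 or temp_c >= 38.5:
        score += 2
    # AVPU consciousness level
    score += {"alert": 0, "voice": 1, "pain": 2, "unresponsive": 3}[avpu]
    return score
```

A threshold check on the returned sum would then trigger the pre-determined intervention for that band, as described above.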

2.5 National early warning score (NEWS)

The national early warning score (NEWS) was adapted from MEWS and encompasses all the same variables (temperature, pulse, respiratory rate, systolic blood pressure, AVPU score) while adding oxygen saturation, shown to be a strong independent predictor of mortality [11], and accounting for the need for respiratory support. It has been validated for use in all admitted patients for predicting deterioration, escalation of care, or death within 24 hours. When compared with thirty-three other EWS on AUROC (area under the receiver operating characteristic curve), NEWS outperformed the others in predicting adverse events and in predicting sepsis prior to onset. The thirty-three EWS compared with NEWS all included the following common variables: heart rate, respiratory rate, systolic blood pressure, temperature, AVPU score for consciousness, oxygen saturation, urine output, and age. Each had specific weighted scores and thresholds assigned for triggering a specific response [12].

2.6 Key variables

Urine Output: Of the EWS sharing NEWS's common variables, the three that also incorporated urine output into their scores showed minor increases in AUROC, particularly for predicting mortality within 24 hours, but not for clinical deterioration [13, 14, 15]. Oliguria and anuria are signs of organ dysfunction and are therefore logically associated with impending mortality.

Age: Age showed statistical significance in predicting the following adverse outcomes: ICU (Intensive Care Unit) admission, attendance of the cardiac arrest team at a cardiorespiratory emergency, or death at 60 days, with a small increase in AUROC from 0.67 to 0.72 [9]. This contradicts a subsequent study comparing multiple EWS, which found that the EWS including age as a covariate did not outperform those excluding age in predicting deterioration [16].

2.7 Pediatric early warning score (PEWS)

Although sepsis is less common in the pediatric population compared with adults, it is more challenging to detect because symptoms like fever and tachycardia, heralding signs in adults, frequently accompany mild illness in pediatric patients. Thus, the ability to detect and differentiate patients at risk for deterioration is even more crucial.

The bedside pediatric early warning score (PEWS) includes seven variables determined by expert consensus: heart rate (HR), capillary refill time (CRT), respiratory rate (RR), respiratory effort, systolic blood pressure (sBP), transcutaneous oxygen saturation, and oxygen therapy [17]. Each variable’s ability to discriminate between control and case patients was assessed by logistic regression. HR, RR, respiratory effort, and oxygen therapy had AUROC > 0.75 while CRT, oxygen saturation, sBP, and temperature had intermediate AUROC scores between 0.65 and 0.74 [17]. Temperature was ultimately excluded as a variable due to little added value. Bedside PEWS’ AUROC was 0.91 with a sensitivity of 82% and a specificity of 93% at a threshold score of eight [17]. Bedside PEWS is sensitive in detecting deterioration, with scores increasing 24 hours prior to urgent escalation of care, and can identify patients at risk at least 1 hour before the onset of sepsis [18].
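
The per-variable AUROC values quoted above can be computed directly from case and control scores. A minimal sketch using the rank-comparison formulation (the probability that a randomly chosen case outscores a randomly chosen control, with ties counting half):

```python
def auroc(scores, labels):
    """AUROC computed pairwise: fraction of (case, control) pairs
    in which the case's score exceeds the control's (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

This pairwise form is O(n²) but makes the statistic's meaning explicit; rank-based implementations scale better on large cohorts.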

2.8 Sepsis deterioration index

A sepsis deterioration index is a numerical value, produced by a predictive model, that estimates the chance of a patient becoming septic. This model usually has pre-specified input variables that have a high likelihood of predicting the output variable of sepsis. With the proliferation of healthcare data in the last two decades due to the mandated use of electronic health records, we are now approaching an era where there is enough data to train machine learning models to predict sepsis. The electronic health record (EHR) system Epic is estimated to have approximately 30,000 data points per patient [19]. While large volumes of data are now becoming available, the data must be formatted in a way that can be processed by machine learning models. Healthcare data within EHR repositories tends to be heterogeneous and requires extensive cleansing before becoming usable for this purpose. Clinical data is rarely standardized and is entered into the EHR without the intention of being utilized for back-end data analysis. Prior studies of de-identified Epic-derived data have characterized these issues and encouraged standardized data entry by clinical staff on the front-end [20]. This is a lofty goal which may be attained at some point in the future. For now, data can be entered into machine learning models through feature extraction followed by creative cleansing and wrangling methods to be discussed. We will first describe in detail the derivation of our institution’s sepsis deterioration index. Afterwards we will discuss how our model was trained and compare it to existing models.


3. Methodology: creating a sepsis deterioration machine learning model

The dataset used to create our Cedars-Sinai Deterioration Index (CS-DI) consisted of 1521 hospital-admitted patients from June 1st, 2021 to September 1st, 2021, representing a standard medical/surgical unit patient population and containing 157,845 encounters. We used 70% (110,492) of encounters for training and 30% (47,353) for testing. The average patient age in the dataset was 63.22 years. 95,844 patients identified as male, 61,203 as female, and 798 as other. 89,517 patients identified as Caucasian, 13,430 as Asian, 23,568 as Black or African American, 401 as American Indian or Alaska Native, and 29,624 as Other/Unknown. The dataset includes lab results, nursing assessments, vital signs, and a predictor for an event: a binary indicator for an escalation of care, classified as a transfer to an Intensive Care Unit (ICU), Respiratory or Cardiac Arrest (Code Blue), or Death (Mortality).
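
The 70/30 partition described above can be sketched as follows. This is a simplified illustration; a real pipeline might split at the patient level so that encounters from the same patient never appear in both sets (avoiding leakage).

```python
import random

def split_encounters(encounter_ids, train_frac=0.7, seed=42):
    """Randomly partition encounter IDs into train and test sets.

    A minimal sketch of a 70/30 split, not the production code;
    splitting by patient rather than encounter avoids leakage.
    """
    ids = list(encounter_ids)
    rng = random.Random(seed)  # fixed seed for reproducibility
    rng.shuffle(ids)
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]
```

Fixing the random seed keeps the split reproducible across retraining runs, which matters when comparing model versions on the same held-out encounters.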

3.1 Key variables for the CS-DI

In our model, the CS-DI, our patient cohort was extracted based on meeting the following inclusion criteria:

  1. Inpatient Hospital Admission

  2. Inpatient Admission Date between 6/1/2021-9/1/2021

  3. Hospital Problem List ICD-10 Diagnosis including the following:

    A41.2 Sepsis due to unspecified staphylococcus.

    A41.51 Sepsis due to Escherichia coli [E. coli].

    A41.52 Sepsis due to Pseudomonas.

    A41.59 Other Gram-negative sepsis.

    A41.81 Sepsis due to Enterococcus.

    A41.9 Sepsis, unspecified organism.

    R65.10 SIRS of noninfectious origin w/o acute organ dysfunction.

    R65.11 SIRS of non-infectious origin w acute organ dysfunction.

    R65.20 Severe sepsis without septic shock.

    R65.21 Severe sepsis with septic shock.

Based on these inclusion criteria, our annual patient cohort ranged from approximately 4462 to 5729 patients per year. The following variables were extracted from our database to be used as features in the CS-DI:

3.2 Demographics

  • Patient MRN/Patient ID

  • Age

3.3 Admission encounters—Diagnosis, ICU LOS

  • Discharge Diagnosis

  • Admission Source

  • Discharge Disposition

  • LOS

  • ICU LOS

  • 30-day readmission (Y/N)

3.4 Clinical variables

  • Respiratory Rate (breaths per minute)

  • Oxygen Saturation SpO2 (%)

  • Temperature (F)

  • Systolic Blood Pressure (mmHg)

  • Heart Rate/Pulse Rate (bpm)

  • Partial Pressure CO2 (mmHg)

  • PaO2

  • Urine Output

  • Consciousness Level/Mental Status

  • A,V,P,U Scale

  • White Blood Cells (mm3)

  • Bands (%)

  • Bilirubin (Liver Function)

  • Platelet Count (Coagulation Function)

  • Serum Creatinine (Renal Function)

3.5 Data wrangling

There are numerous EHR systems within the United States, but training a machine learning model with reasonable predictive power requires a sufficiently large volume of data and a wide variety of features. Epic is one of the largest EHR systems in the United States and had the most data available from its back-end Caboodle warehouse, making it an ideal choice as the data source for our model. Our model was developed at Cedars-Sinai, which uses Epic as its EHR. Our model was trained with patient data from Cedars-Sinai, but our methodologies could be used by other health systems using Epic-derived data if features are defined in a similar fashion to our methodology.

Most machine learning algorithms require data to be converted into numerical values before entry into the model. Clinical data, particularly for lab values, can be extremely noisy with values documented in non-standardized formats in flowsheets. For example, when reporting the results of white blood cell counts in a urine sample, the data could be reported as 0, 1+, 2+, 3+, 4+ or none, or as some, few, many white blood cells with variations in how the text is entered by each technician. A data analyst must go through each data element entered in the algorithm and use code to replace text data or strings into numerical values. This is a painstaking process requiring meticulous data review. Afterwards, a clinician, preferably a clinical informaticist should comb through the data to identify outliers or mis-entered data that would not fit in the dataset with an understanding of the data from a clinical perspective.
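
The urine white-blood-cell example above might be normalized with a simple lookup table. The mapping below is a hypothetical sketch for illustration, not the codes used in the production pipeline:

```python
# Illustrative mapping for noisy urine-WBC flowsheet entries; the
# categories and numeric codes are assumptions for demonstration only.
URINE_WBC_MAP = {
    "none": 0, "0": 0,
    "1+": 1, "some": 1,
    "2+": 2, "few": 2,
    "3+": 3, "many": 3,
    "4+": 4,
}

def clean_urine_wbc(raw):
    """Normalize a free-text flowsheet value to a numeric code.

    Unrecognized strings return None so a clinician or clinical
    informaticist can review them instead of silently coercing.
    """
    if raw is None:
        return None
    key = str(raw).strip().lower()
    return URINE_WBC_MAP.get(key)
```

Returning `None` for unmapped entries supports the review step described above: outliers surface for clinical inspection rather than being forced into the dataset.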

Once individual data elements have been cleansed, data elements from different tables must be converted to a format allowing tables to be joined. When extracted from the Epic Caboodle Data Warehouse, data is often stored in rows for each encounter. To merge with data from another table, such as vital signs, each lab value needs to be converted to a column for interpretation by the machine learning model. We utilized a pivot function to reformat this data from rows into columns using the common identifiers of medical record number, encounter identifier, measurement value, measurement time, and measurement unit. We then used a merge function to combine data elements from different tables to create a usable dataset. Please refer to the following link for details on our code:

https://github.com/rohith-mohan/caboodledatacleanse/commit/7d17c05fc3eeb043d22cb97e454701d2fbe81075
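
A minimal illustration of this pivot-then-merge step, using small hypothetical pandas DataFrames rather than the actual Caboodle extract linked above:

```python
import pandas as pd

# Hypothetical long-format lab rows: one row per (encounter, measurement).
labs = pd.DataFrame({
    "mrn": [1, 1, 2],
    "encounter_id": [10, 10, 20],
    "measurement_name": ["wbc", "creatinine", "wbc"],
    "measurement_value": [11.2, 1.4, 7.8],
})
vitals = pd.DataFrame({
    "mrn": [1, 2],
    "encounter_id": [10, 20],
    "heart_rate": [104, 82],
})

# Pivot lab rows into one column per lab, keyed by patient/encounter.
wide_labs = labs.pivot_table(
    index=["mrn", "encounter_id"],
    columns="measurement_name",
    values="measurement_value",
).reset_index()

# Merge with vitals to build a single model-ready feature table.
features = wide_labs.merge(vitals, on=["mrn", "encounter_id"], how="left")
```

Labs a patient never had (e.g., creatinine for the second patient here) become missing values after the pivot, which is why downstream imputation or review is part of the cleansing workflow.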


4. Choice of machine learning (ML) model

In the realm of data science, the choice of the appropriate machine learning model is critical in gaining the most information out of the data extracted while also being mindful of the computing resources needed to run the model (Figure 3).

Figure 3.

Rationale for selecting a machine learning algorithm.

For the purposes of predicting sepsis deterioration, we will primarily be using regression to determine the association between variables (also known as features) to eventually predict an outcome variable of sepsis. As seen in our section above regarding validated sepsis scoring methodologies, all of our features are numerical making regression a reasonable choice for our model. The mathematics behind most forms of regression are complex but we will go through the basic premise of a few common types of regression.

  • Linear regression is the simplest form of regression and the one most people are familiar with. It uses a set of independent variables (features) with coefficients that dictate the strength of association with the dependent outcome variable (y = mx + b).

  • Bayesian linear regression is useful for small datasets since its predictions combine prior beliefs with a weighted sum of the observed variables, reducing the output's dependence on any single data point [21].

  • Decision forest regression uses sets of binary decision branch points to eventually reach a decision node. It is a very popular choice of model given its efficient use of computing power, accuracy even when presented with heterogeneous data, and speed of training [22].

  • Neural network regression determines the relationship between features and output variables through use of “neurons” in “layers” that associate weights to features in the model. They can be used for structured and unstructured data to create highly accurate models but are slow to train and require high amounts of computational power [23].
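
As a toy example of the simplest case above, an ordinary least-squares fit of y = mx + b can be computed directly with NumPy. The data here are synthetic, not clinical values:

```python
import numpy as np

# Synthetic points lying exactly on y = 2x + 1, so the fit should
# recover m = 2 and b = 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Design matrix with an intercept column, solved via least squares.
A = np.column_stack([x, np.ones_like(x)])
(m, b), *_ = np.linalg.lstsq(A, y, rcond=None)
```

With noisy real-world features, the same call returns the coefficients minimizing squared error; the more elaborate regression families above trade this closed-form simplicity for robustness or flexibility.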

4.1 Epic deterioration index (EDI)

The Epic deterioration index (EDI) is a proprietary prediction model implemented in over 100 U.S. hospitals to provide clinical decision support in the diagnosis of sepsis [24]. The EDI aims to detect patients who are deteriorating and require higher levels of care. Its score ranges from 0 to 100, with higher numbers denoting a greater risk of experiencing a composite adverse outcome of requiring rapid response, resuscitation, ICU-level care, or death in the next 12–38 hours. The EDI uses a cumulative link model, a specific type of ordinal logistic regression, in which two parallel linear combinations of clinical inputs draw two decision boundaries in the prediction space under proportional odds assumptions. The details of the implementation are proprietary, and Epic has not shared this information publicly or described it in published literature, but the reported accuracy of this model is 47.4% [24].
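
Since the EDI's implementation is proprietary, the following is only a schematic of the general cumulative link (proportional odds) form it is described as using: two parallel thresholds on one linear combination of inputs, yielding three ordered risk bands. All coefficients here are invented for illustration.

```python
import math

def cumulative_link_probs(x, thresholds=(-1.0, 1.5), weight=1.0):
    """Schematic proportional-odds model over three ordered risk bands.

    P(Y <= k) = logistic(theta_k - w*x): the two thresholds draw two
    parallel decision boundaries on the same linear predictor. The
    thresholds and weight are made up for demonstration.
    """
    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Cumulative probabilities, ending at 1.0 for the top band.
    cum = [logistic(t - weight * x) for t in thresholds] + [1.0]
    # Difference adjacent cumulative values to get per-band probabilities.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]
```

As the linear predictor x grows, probability mass shifts monotonically from the low-risk band to the high-risk band, which is the behavior a rising 0–100 deterioration score summarizes.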

4.2 Cedars Sinai deterioration index (CS-DI)

After evaluating the accuracy of the EDI model and other early warning systems, a decision was made at our organization to create the Cedars-Sinai deterioration index (CS-DI), a machine learning algorithm that uses data from the patient’s electronic medical record and calculates a percentage value predicting the likelihood of patient deterioration requiring an escalation of care. A predefined intervention point is automatically activated when the calculated deterioration percentage value is reached, generating an alert that notifies care providers to intervene sooner and possibly prevent further deterioration. Once trained, the CS-DI was deployed as a clinical decision support application to identify patients at risk for sepsis in real-time. Seventy percent of the cohort was used as the training set for the model while the other 30% was used as the test set. We used the calculated CS-DI percentage value to predict a composite outcome of further deterioration, intensive care unit-level care, mechanical ventilation, or hospital death.

Among the cohort examined, we found patients who met or exceeded the set threshold of 68.8 had an 87% probability of a composite outcome during their hospitalization, with a sensitivity of 39% and a median lead time of 24 hours from when the threshold was first exceeded. Among patients hospitalized for at least 48 hours who had not experienced a composite outcome, 13% never exceeded 37.9, with a negative predictive value of 90% and a sensitivity above that threshold of 92%. When run against the MEWS early warning system, NEWS early warning system, and the EDI, the CS-DI predicted deterioration on average a full hour ahead of the other deterioration index models.
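
Operating characteristics like those above are derived from a confusion matrix at a chosen score threshold. A minimal sketch of that calculation, using hypothetical scores and outcomes rather than the study cohort:

```python
def threshold_metrics(scores, outcomes, threshold):
    """Sensitivity, PPV, and NPV for alerting at a score threshold.

    scores: model outputs; outcomes: 1 if the composite outcome
    occurred. Illustrative only; not the CS-DI evaluation code.
    """
    tp = fp = tn = fn = 0
    for s, y in zip(scores, outcomes):
        if s >= threshold:  # alert fires
            tp += y
            fp += 1 - y
        else:               # no alert
            fn += y
            tn += 1 - y
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    return sensitivity, ppv, npv
```

Raising the threshold trades sensitivity for positive predictive value, which is exactly the tension behind choosing an intervention point discussed later in the chapter.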

4.3 Unstructured data in ML models

Recent studies have shown that incorporating non-numerical data, including key words from clinical documentation and diagnostic imaging, can increase the accuracy of models [25]. This data is first converted to a format usable by machine learning algorithms via natural language processing (NLP). NLP uses sophisticated methods of text analytics to convert text into numerical data usable by an algorithm [26]. Goh et al. used a method of text analytics known as latent Dirichlet allocation to group similar texts into topics. They identified 100 common text topics that were grouped into one of the following seven categories: (1) clinical status, (2) communication, (3) laboratory tests, (4) non-clinical status, (5) social relationships, (6) symptom, and (7) treatment. The numerical values derived from this text data were combined with structured numerical data like those used in the numerical regression models, such as patient demographics, vitals data, and laboratory data. By adding text data into the mix, the AUC of their sepsis early risk assessment (SERA) model was as high as 0.87 with a lead time of 48 hours before the onset of sepsis.
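
As a deliberately simplified stand-in for this NLP step, free text can first be reduced to numeric token counts before any topic modeling. The function below only counts vocabulary hits; the SERA pipeline's actual latent Dirichlet allocation over topics is considerably more sophisticated.

```python
from collections import Counter
import re

def note_to_counts(note, vocabulary):
    """Convert a free-text clinical note into a numeric count vector.

    A toy bag-of-words featurizer: lowercase the note, tokenize on
    alphabetic runs, and count hits against a fixed vocabulary.
    """
    tokens = re.findall(r"[a-z]+", note.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]
```

The resulting vectors can be concatenated with the structured demographic, vitals, and laboratory features described above before model training.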

Unstructured data in these models increases accuracy and lead time, as expected. Healthcare professionals rely on a multitude of unstructured data, including all of the data above and a physical assessment of the patient. The more of these features we can incorporate into models, the more accurate they can become. Humans cannot continuously monitor all the data they are presented with and make real-time assessments on every patient in the hospital. If we can train a machine to think like an astute healthcare professional with the processing power of a supercomputer, we can ideally reduce the incidence of sepsis in our healthcare systems before it occurs.


5. Deployment of the CS-DI model

5.1 Prospective scaling of CS-DI

Our model has been utilized prospectively to determine the risk of patients deteriorating into sepsis. The data was extracted from the Epic Caboodle Data Warehouse and pushed every 15 minutes to an S3 datastore and then to an Amazon Redshift Cloud Data Warehouse. The code to cleanse the data and run the features through our model was stored in Docker containers to allow the data to be analyzed prospectively and at scale. The algorithm would calculate a percentage value from 0 to 100% and visually display a near-real-time swim lane on an intuitive user interface in our command center. If patients neared a predefined intervention point, a protocol for escalation by the triaging Rapid Response Team (RRT) was initiated.

A crucial step in realizing the potential of ML algorithms is to work closely with the facility’s IT department to integrate them into the clinical workflow while minimizing alert-fatigue. Ultimately, the successful integration of ML algorithms should aim to enhance the productivity of clinical teams while avoiding any attempt to replace them entirely.

5.2 Institutional considerations for deployment of sepsis model at cedars-sinai

The deployment of sepsis AI alerting systems can be categorized into two approaches - passive and active, each with distinct staffing models. The passive approach involves a central hub of trained personnel monitoring sepsis alerts at one location such as a command center.

Despite claims of successful implementation at some institutions, this approach has a heavy dependency on a small group of people and is much more expensive than the second approach, which we will cover shortly.

Effective staffing of this passive model requires careful consideration of the number and distribution of generated alerts. The distribution of alerts over time must also be considered. Due to the workflow of data collection that feeds the alert, the distribution of alerts may be bimodal or trimodal. Most alerts may occur during specific times of day, such as when labs are reported, when vital signs are entered by nursing, or during changes of shift. Staffing requirements should be calculated to adequately manage the expected volume and timing of alerts. Specifically, the team should be capable of handling 30–40 alerts within a 3-hour period, with occasional alerts occurring during off-peak times. In practice, this will likely require hiring an additional three full-time equivalent (FTE) nurses for the day shift and two FTE nurses for the night shift, with float coverage provided during weekends and vacations.

The active approach, or the fractal-behavior model, is one in which humans work collaboratively with the AI model. In this approach, each nurse is responsible for managing the 4 or 5 patients assigned to them during the shift. There are two phases to the management of a patient based on whether they have sepsis or not.

Phase 1—When the sepsis alert is prevented from firing because the nurse has proactively screened the patient for sepsis using a standardized rule-based ML algorithm that uses a multivariate decision tree—i.e., non-linear decision making. In this case each nurse is consistently evaluating every patient at shift change, or when they first have the patient assigned to them. This method captures data before it is readily available in the EHR (e.g., patient’s mental status, clinical appearance, and subjective judgment around source of infection). If a patient screens positive for infection—more action can be taken at that time to implement a diagnostic or treatment workflow.

Phase 2—When the sepsis alert fires—the bedside provider activates a workflow that allows them to perform a secondary clinical evaluation (SCE) to evaluate the alert in the context of the patient’s clinical status. Frequently the decentralized active approach is criticized for failing because bedside nurses and providers fail to respond to alerts due to alert fatigue [26, 27, 28]. However, this approach only fails when the institution is relying solely on the EHR to mobilize the alert.

Hospital systems should consider adopting a user-centered design (UCD) instead of relying on traditional EHR interfaces. UCD involves the development of an interface that is tailored to clinical workflows thereby maximizing efficiency. Ruminski et al. found that displaying a visual monitor significantly reduced the rate of sepsis [27]. Furthermore, studies have shown that color coding and screen positioning in the user’s visual field can improve provider satisfaction and reduce sepsis rates by over 50%. It is vital to align clinical end-users with the facility’s IT department to ensure that the product meets clinical expectations while remaining compatible with the EHR.

This approach establishes a highly reliable two-step method that, when repeated by hundreds of nurses daily, resembles a fractal made of repeated behaviors. It is independent of staffing and nursing ratios, does not require additional FTE hires, and is more economically feasible, costing several million dollars a year less in staff salaries to implement than the passive model.

5.3 Model surveillance

Machine learning (ML) models for sepsis are notorious for creating alerts that are not actionable. In addition, these models’ predictive performance degrades over time especially when deployed on populations not resembling their training sets. Concept drift, or the change in the underlying data distribution over time, is often not considered in the deployment of ML models. Many companies that provide sepsis ML detection systems fail to account for new data or changes in patient demographics.

For example, consider the following scenario (Figure 4) [28]. Models built in states with low death rates will perform poorly when deployed in states with high death rates, and vice versa, due to overfitting to a particular population/dataset. Both data drift and concept drift can occur at the same time, leading to inaccurate predictions and reduced model efficacy. It is crucial to incorporate methods that can handle data drift, concept drift, and population drift in the maintenance and deployment of ML models, especially in the clinical setting where predictions have an impact on patient outcomes. One solution to these issues is continuously incorporating prospective data to re-calibrate the model. In the case of the CS-DI, if the model predicted sepsis when a patient was not septic, the model should eventually be retrained to correctly categorize that patient.

Figure 4.

Septicemia mortality by state.
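One way to operationalize this kind of drift surveillance is sketched below, assuming that feature distributions from the training era have been archived. It computes the Population Stability Index (PSI) between a feature's training-time distribution (lactate is used here purely as an illustration) and its distribution among recent admissions; the conventional 0.2 alert threshold and all variable names are illustrative assumptions, not part of the CS-DI.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Quantify drift between a training-era feature distribution
    (expected) and a newly observed one (actual). PSI > 0.2 is a
    common rule-of-thumb threshold for significant drift."""
    # Bin edges come from the training data so both samples are
    # compared on the same scale.
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    exp_frac = np.histogram(expected, edges)[0] / len(expected)
    act_frac = np.histogram(actual, edges)[0] / len(actual)
    # A small floor avoids log-of-zero in empty bins.
    exp_frac = np.clip(exp_frac, 1e-6, None)
    act_frac = np.clip(act_frac, 1e-6, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

rng = np.random.default_rng(0)
train_lactate = rng.normal(2.0, 0.8, 5000)    # distribution at training time
current_lactate = rng.normal(2.6, 1.1, 5000)  # shifted incoming population

psi = population_stability_index(train_lactate, current_lactate)
if psi > 0.2:
    print(f"PSI={psi:.2f}: significant drift - schedule model recalibration")
```

In practice a check like this would run on a schedule for each input feature, with sustained drift triggering the prospective re-calibration described above rather than silent continued use of the stale model.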

5.4 Governance of the model

Given patient safety concerns, the governance of a sepsis deterioration model falls under the jurisdiction of a medical executive committee (MEC). Additionally, hospitals now have sepsis steering sub-committees and patient safety committees in advisory roles. Patient risk, especially with respect to false negatives, should be presented, and all non-treatment decisions that lead to poor patient outcomes should be examined at least quarterly to ensure patient safety.

Machine learning is how a computer learns to predict a particular outcome based on prior data. Artificial intelligence (AI) is the ability to take that information and translate it into an actionable insight. There remains wariness of AI and fear that it will replace human decision-making capacity. As shown in the development of the model above, AI takes human-derived knowledge but augments the ability to act on that knowledge via computing power. AI is a good servant but a terrible master: all treatment and non-treatment decisions remain with a licensed independent practitioner.

5.5 Determining an intervention point

There is no easy way to determine an intervention point for the predictive model. The beauty of deploying an ML model via the active method described above is that one can set an intervention point at which to alert an end user (theoretically at 90% sensitivity and 90% specificity) and leave the decision to intervene with the clinical end user. The authors' recommendation is to continually adjust this inflection point guided by data from near-misses and mis-categorized patients. Collecting real-time feedback from end-users on alert accuracy is also crucial for a model's survival. In conclusion, an ML deterioration model to predict sepsis produces ample value in a healthcare organization if deployed in conjunction with human intervention and continuous prospective re-assessment.
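The chapter does not prescribe how a threshold such as the CS-DI's 68.8 was derived. As one hedged illustration of picking an initial intervention point, a labeled validation set can be scanned for the score whose sensitivity/specificity pair lies closest to a target operating point (here the theoretical 90%/90% mentioned above); the data, function name, and targets below are all hypothetical.

```python
import numpy as np

def choose_alert_threshold(scores, labels, target_sens=0.90, target_spec=0.90):
    """Scan candidate thresholds on a labeled validation set and return
    the one whose (sensitivity, specificity) lies closest to the target
    operating point."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_dist = None, np.inf
    for t in np.unique(scores):
        pred = scores >= t                      # alert fires at or above t
        sens = pred[labels].mean()              # true positive rate
        spec = (~pred[~labels]).mean()          # true negative rate
        dist = np.hypot(sens - target_sens, spec - target_spec)
        if dist < best_dist:
            best_t, best_dist = t, dist
    return best_t

# Toy validation data: septic patients tend to score higher.
rng = np.random.default_rng(1)
septic = rng.normal(75, 10, 300)
non_septic = rng.normal(45, 12, 1200)
scores = np.concatenate([septic, non_septic])
labels = np.concatenate([np.ones(300, bool), np.zeros(1200, bool)])

threshold = choose_alert_threshold(scores, labels)
```

A threshold chosen this way is only a starting point; per the recommendation above, it should be re-derived as near-miss and mis-categorization data accumulate prospectively.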


6. Conclusion

Sepsis is a ubiquitous condition across the healthcare continuum, causing millions of deaths annually and incurring high costs on the healthcare system. We have made great strides in the ability to identify and treat sepsis, but it still kills nearly 270,000 people annually in the U.S. A sepsis deterioration index is a numerical value, produced by a predictive model, that estimates the chance of a patient becoming septic. Such a model usually has pre-specified input variables that have a high likelihood of predicting the output variable of sepsis. For the purposes of predicting sepsis deterioration, we used regression to determine the association between variables (also known as features) and the outcome of sepsis. Among the cohort examined in our model at Cedars Sinai, we found that patients who met or exceeded the set threshold of 68.8 had an 87% probability of deterioration to sepsis during their hospitalization, with a median lead time of 24 hours from when the threshold was first exceeded. Another model, which incorporated unstructured text, achieved an AUROC (Area Under the Receiver Operator Curve) as high as 0.87 with a lead time of 48 hours before the onset of sepsis. There is no easy way to determine an intervention point for the deterioration predictive model. The authors' recommendation is to continually adjust this inflection point guided by data from near-misses and mis-categorized patients. Collecting real-time feedback from end-users on alert accuracy is also crucial for a model's survival. An ML deterioration model to predict sepsis produces ample value in a healthcare organization if deployed in conjunction with human intervention and continuous prospective re-assessment.


Acronyms and abbreviations

SIRS: systemic inflammatory response syndrome
SOFA: sequential organ failure assessment (formerly sepsis organ failure assessment)
PaO2: partial pressure of oxygen
FiO2: fraction of inspired oxygen
GCS: Glasgow Coma Score
EWS: early warning system
MEWS: Modified Early Warning Score
HR, bpm: heart rate, beats per minute
RR: respiratory rate
sBP: systolic blood pressure
NEWS: National Early Warning Score
AVPU: alert, reacting to voice, reacting to pain, unresponsive
AUROC: area under the receiver operator curve
ICU: intensive care unit
PEWS: bedside Pediatric Early Warning Score
CRT: capillary refill time
EHR: electronic health record
CS-DI: Cedars-Sinai deterioration index
ICD-10: International Classification of Disease, 10th revision
ML: machine learning
EDI: Epic deterioration index
NLP: natural language processing
AUC: area under the curve
SERA: sepsis early risk assessment
RRT: rapid response team
IT: information technology
AI: artificial intelligence
FTE: full-time equivalent

References

  1. What is Sepsis? Available from: https://www.cdc.gov/sepsis/what-is-sepsis.html
  2. Healthcare Cost and Utilization Project (HCUP) Statistical Briefs [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); Feb 2006. Available from: https://www.ncbi.nlm.nih.gov/books/NBK52651/
  3. Bone RC, Sprung CL, Sibbald WJ. Definitions for sepsis and organ failure. Critical Care Medicine. 1992;20(6):724-726
  4. Rangel-Frausto MS et al. The natural history of the systemic inflammatory response syndrome (SIRS). A prospective study. JAMA. 1995;273(2):117-123
  5. Jones GR, Lowes JA. The systemic inflammatory response syndrome as a predictor of bacteraemia and outcome from sepsis. QJM: An International Journal of Medicine. 1996;89(7):515-522
  6. Singer M et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. 2016;315(8):801-810
  7. Moreno R et al. The sequential organ failure assessment (SOFA) score: Has the time come for an update? Critical Care. 2023;27(1):15
  8. Kramer AA, Sebat F, Lissauer M. A review of early warning systems for prompt detection of patients at risk for clinical decline. Journal of Trauma and Acute Care Surgery. 2019;87(1S Suppl 1):S67-S73
  9. Subbe CP et al. Validation of a modified Early Warning Score in medical admissions. QJM: An International Journal of Medicine. 2001;94(10):521-526
  10. Fullerton JN et al. Is the Modified Early Warning Score (MEWS) superior to clinician judgement in detecting critical illness in the pre-hospital environment? Resuscitation. 2012;83(5):557-562
  11. Buist M, Bernard S, Anderson J. Epidemiology and prevention of unexpected in-hospital deaths. The Surgeon. 2003;1(5):265-268
  12. Smith MEB, Chiovaro JC, O’Neil M, et al. Early Warning System Scores: A Systematic Review [Internet]. Washington (DC): Department of Veterans Affairs (US); Jan 2014. Available from: https://www.ncbi.nlm.nih.gov/books/NBK259026/
  13. Goldhill DR, McNarry AF. Physiological abnormalities in early warning scores are related to mortality in adult inpatients. British Journal of Anaesthesia. 2004;92(6):882-884
  14. Barlow GD, Nathwani D, Davey PG. Standardised early warning scoring system. Clinical Medicine (London, England). 2006;6(4):422-423. author reply 423-4
  15. von Lilienfeld-Toal M et al. Observation-based early warning scores to detect impending critical illness predict in-hospital and overall survival in patients undergoing allogeneic stem cell transplantation. Biology of Blood and Marrow Transplantation. 2007;13(5):568-576
  16. Smith GB et al. The ability of the National Early Warning Score (NEWS) to discriminate patients at risk of early cardiac arrest, unanticipated intensive care unit admission, and death. Resuscitation. 2013;84(4):465-470
  17. Parshuram CS, Hutchison J, Middaugh K. Development and initial validation of the Bedside Paediatric Early Warning System score. Critical Care. 2009;13(4):R135
  18. Parshuram CS et al. Implementing the Bedside Paediatric Early Warning System in a community hospital: A prospective observational study. Paediatrics & Child Health. 2011;16(3):e18-e22
  19. Estabrooks PA et al. Harmonized patient-reported data elements in the electronic health record: Supporting meaningful use by primary care action on health behaviors and key psychosocial factors. Journal of the American Medical Informatics Association. 2012;19(4):575-582
  20. Mohan R et al. Deriving a de-identified dataset from EPIC-derived data for use in clinical research: Pearls and pitfalls in database creation and analysis. Pediatrics. 2021;147(3_MeetingAbstract):7-8. DOI: 10.1542/peds.147.3MA1.7
  21. Introduction to Bayesian Linear Regression. Available from: https://www.simplilearn.com/tutorials/data-science-tutorial/bayesian-linear-regression
  22. Decision Forest Regression component. Available from: https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/decision-forest-regression
  23. RS. A Walk-through of Regression Analysis Using Artificial Neural Networks in Tensorflow. Available from: https://www.analyticsvidhya.com/blog/2021/08/a-walk-through-of-regression-analysis-using-artificial-neural-networks-in-tensorflow/
  24. Epicshare. Saving Lives with AI: Using the Deterioration Index Predictive Model to Help Patients Sooner. Available from: https://www.epicshare.org/share-and-learn/saving-lives-with-ai
  25. Goh KH et al. Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare. Nature Communications. 2021;12(1):711
  26. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: An introduction. Journal of the American Medical Informatics Association. 2011;18(5):544-551
  27. Ruminski CM et al. Impact of predictive analytics based on continuous cardiorespiratory monitoring in a surgical and trauma intensive care unit. Journal of Clinical Monitoring and Computing. 2019;33(4):703-711
  28. Stats of the States - Septicemia Mortality. Available from: https://www.cdc.gov/nchs/pressroom/sosmap/septicemia_mortality/septicemia.htm
