## 1. Introduction

Physically demanding tasks, environmental heat and humidity and various clothing requirements combine to create heat stress for workers. The associated physiological responses to that stress, e.g. increased body core temperature (Tc), heart rate (HR) and sweating, are collectively known as physiological strain. Physiological strain rises with the heat stress, and if not controlled, may diminish the quality and productivity of job performance. Left unchecked, high levels of heat strain may also result in increased accident rates and an increased risk of heat-related disorders including unconsciousness and cardiac arrest. Heat casualties are a concern to the military, first responders and industrial workers [1, 2, 3].

High Tc is one of the most reliable predictors of heat-related disorders and the ability to accurately monitor this variable could help mitigate the risk of heat injuries [4]. However, the measurement of Tc in an ambulatory setting is not straightforward. Traditional methods of Tc measurement typically require probes (e.g. rectal and oesophageal) but these are impractical for an ambulatory setting. While ingestible thermometer capsules (e.g. Philips Respironics, Murrysville, PA) have been used with success in laboratory and field settings, these instruments are relatively expensive, are unsuitable for individuals with Food and Drug Administration contraindications, and while still in the stomach or upper intestine can suffer acute inaccuracies when cold fluids are ingested. This means that in many situations, the continuous ambulatory monitoring of Tc is still impractical. Alternative Tc surrogate methods, which seek some non-invasive core temperature correlate (e.g. surface heat flux), can be difficult to use consistently across different environments and lose precision when predicting for individuals [5].

Wearable activity trackers have emerged as an increasingly popular method for individuals to assess their daily physical activity and energy expenditure through sensing of physiological data, e.g. HR and surface skin temperature (ST) [6]. One means of overcoming the Tc measurement problem is to estimate Tc based on other more readily available data obtained from such body-worn sensors. From physiology, both HR and ST are closely related to work and heat stress. Serial HR measurements contain information about heat production [7] and heat transfer since HR is related to skin perfusion [8]. Similarly, because heat can be conducted from deep tissues to skin, an increase in Tc can lead to an elevation of ST over time [9]. Previous studies have also shown the promise of using HR and ST to estimate heat strain [10, 11].

Drawing on the wide availability of physiological measurements from increasingly ubiquitous wearable activity trackers and the physiological basis of the associations of Tc with HR and ST, we applied the Kalman filter (KF) technique to track individual-specific Tc over time using time series observations of HR and ST. KF-based methods utilise a prediction-correction scheme to dynamically track and adjust both the system state (Tc for our application) and its uncertainty to agree with measurements (HR and ST) as they are made [12]. The system model, expressed as a function of the state variable, is used to iterate the distribution of Tc forward in time to produce a prediction, which is then corrected to both adjust the prediction and collapse its uncertainty.

The pursuit of reliable KF models to predict Tc is a subject of active investigation. Buller and co-authors have used the KF technique to estimate Tc by capturing the linear or quadratic relationship between time-varying HR and Tc [13, 14, 15]. Their results have indicated that 95% of all predictions fell within ±0.48–0.63°C for different study cohorts. However, the developmental datasets contained only a limited amount of data at high Tc (≥39°C) and thus most of these statistics are based on the lower Tc values, which may limit the model’s ability to reliably predict hyperthermic body temperatures. Further, the validity of the Tc estimates in human subjects with differing demographics and working in a predominantly hot and humid climate was unclear. We implemented an extended Kalman filter (EKF) model using a non-linear (cubic) state space model (ST versus Tc) with a stage-wise, autoregressive exogenous model (incorporating HR) as the time update model [11]. We showed that the EKF model predicted Tc more precisely [root mean square error (RMSE) was 0.29°C] compared to KF models that relied only on HR as an explanatory variable (RMSE = 0.33°C). However, our model was developed using only laboratory data as developmental data and thus lacked assessment against data measured in field settings.

While practical, the aforementioned KF models require previous estimates of Tc for continuous prediction of this latent variable. One major inherent limitation of such models is that when the forecast horizon increases, errors in the prediction would accumulate, which would progressively increase the prediction uncertainty even with the Kalman gains. This may give rise to grave clinical consequences since large prediction errors at high core temperature zones (for an individual who works continuously) could delay the application of cooling measures on heat casualties.

The main aim of this paper was to develop and investigate the potential of using online Kalman filter (OKF) models to improve the estimation of Tc over long time horizons as encountered during extended duration high intensity physical tasks, e.g. foot march. The OKF models comprised a time update equation that depends on the initial value of Tc and the time-current values of the measurable exogenous variables such that the value of Tc at any time point is directly predicted. The second aim was to assess the comparative accuracy of Tc predictions by the EKF and OKF models vis-à-vis observed Tc.

## 2. Methods

### 2.1. Data

Data for model development were derived from laboratory- and field-based heat strain profiling studies that involved different participants. The study protocols used in all studies were approved by the Institutional Review Board. All volunteers were briefed on the purpose, risks and benefits of the study and each gave their written informed consent prior to participation.

#### 2.1.1. Study 1 (laboratory study)

A total of 29 male volunteers [mean (range); age = 30 (26–33 years), bodyweight = 68.4 (48.9–87.6 kg), height = 1.71 (1.61–1.81 m), body mass index (BMI) = 23.7 (17.3–28.0 kg/m^{2}), body surface area (BSA) = 1.80 (1.52–2.07 m^{2})] performed a military 16 km foot march in a climatic chamber. During the trials, all participants donned a standard infantry full battle order (FBO), comprising camouflage uniform, combat boots, body armour, load bearing vest with standard accessories, Kevlar helmet, rifle replica and a backpack filled with additional accessories, for the foot march. All backpacks used in the study were packed in the same configuration. The foot march was composed of three rounds of 4 km followed by one round each of 3 km and 1 km marches on the treadmill at 5.3 km/h and 0% gradient, with each exercise bout separated by 15 min seated rest. Water was provided *ad libitum* to all participants. Environmental conditions in the climatic chamber represented those present in hot-humid environments, with a mean dry bulb temperature of 32°C, relative humidity of 70%, solar radiation of 250 W/m^{2} and wind speed of 1.5 m/s. The mean completion time of the full 16 km foot march was 255 min.

#### 2.1.2. Study 2 (field study)

A total of 43 male volunteers [age = 24 (18–33 years), bodyweight = 66.4 (49.9–89.3 kg), height = 1.72 (1.58–1.92 m), BMI = 22.4 (17.7–27.6 kg/m^{2}), BSA = 1.79 (1.54–2.09 m^{2})], outfitted in FBO, performed a military 16 km foot march together as a group in the field. The foot march was conducted in the morning with cloudy skies (mean dry bulb temperature, relative humidity and wind speed during the trials were 27°C, 86% and 1.1 m/s, respectively). The foot march was composed of three rounds of 4 km followed by one round each of 3 km and 1 km marches on paved terrain, with each exercise bout separated by 15 min seated rest. All participants had *ad libitum* access to fluid from their water containers, which were refilled during each recess period. The total duration of the trials was approximately 285 min.

#### 2.1.3. Physiological measures

For all heat profiling studies, Tc, HR and ST were recorded every 15 s using a chest belt physiological monitoring system (Equivital EQ02 LifeMonitor^{®}, Hidalgo Ltd., Cambridge, UK) with an associated ingestible thermometer capsule (Philips Respironics, Murrysville, PA). Participants ingested one thermometer capsule at least 8 h prior to the foot march in order to ensure that the capsule had travelled far enough in the intestinal tract to avoid errors from ingested fluids. Each participant’s real-time data were checked for accurate reporting of Tc, HR and ST prior to the trials. Tc data were not used if there were evident signs of fluid signatures (rapid decrease in Tc to below 32°C and slow recovery to normal body temperature).

For data modelling in the present study, Tc, HR and ST measured using the physiological monitoring system were reduced to 1 min intervals by taking the median of four 15 s samples for each 1 min epoch.
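The reduction to 1 min epochs can be sketched as follows. This is a minimal illustration assuming a gap-free stream of 15 s samples; the function name `downsample_median` and the choice to drop an incomplete trailing epoch are assumptions of this sketch, not details given in the study.

```python
import numpy as np

def downsample_median(samples_15s, per_epoch=4):
    """Reduce 15 s samples to 1 min values via the median of each epoch."""
    x = np.asarray(samples_15s, dtype=float)
    n_epochs = len(x) // per_epoch              # assumption: drop an incomplete last epoch
    epochs = x[: n_epochs * per_epoch].reshape(n_epochs, per_epoch)
    return np.median(epochs, axis=1)

# Synthetic HR samples (beats/min); the 180 is a deliberate outlier.
hr_15s = [120, 122, 121, 180, 124, 125, 123, 126]
hr_1min = downsample_median(hr_15s)
```

Taking the median rather than the mean keeps a single spurious sample (such as the 180 above) from distorting the 1 min value.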

### 2.2. Assessment of model performance

Predictive performance of each model against data from study 1 and study 2 was assessed separately using in-sample and out-of-sample analyses. Conducting an in-sample analysis entailed using the model to estimate all observed Tc that formed the database for model training. Out-of-sample analysis, i.e. estimating observed Tc time series that were not part of the database for model training, was implemented using four-fold cross-validation.

For cross-validation, the full dataset from study 1 and study 2 was randomly divided into four groups, each containing 25% of the participants (Tc measurements belonging to the same participant were kept in the same group). Four different subsets of three groups (i.e. 3 × 25% of the studied profiles) were constituted to form four different index groups. Each remaining 25% of the studied profiles constituted a separate test group, generating four independent test groups. Then, a final model was separately identified using the four different index groups. To assess the predictive performance of the final model, the parameter estimates from each of the four subsets (i.e. index group) were used to predict the individual Tc time series in the respective test group.
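A participant-wise four-fold split of this kind could be sketched as below. The helper name `four_fold_participant_split`, the fixed seed and the round-robin assignment after shuffling are assumptions made for illustration; the study does not state how the random division was implemented.

```python
import random

def four_fold_participant_split(participant_ids, n_folds=4, seed=0):
    """Return a list of (index_group, test_group) pairs of participant IDs."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)            # random assignment of participants
    folds = [ids[k::n_folds] for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test_group = folds[k]                   # roughly 25% of participants
        index_group = [p for j, fold in enumerate(folds) if j != k for p in fold]
        splits.append((index_group, test_group))
    return splits

# Example with 29 participant IDs, as in study 1.
splits = four_fold_participant_split(range(1, 30))
```

Splitting by participant ID rather than by individual measurement keeps every Tc time series from one person in the same group, so no test profile contributes to the training of the model that predicts it.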

Various evaluation criteria were used to assess the model performance. These were RMSE, Bland-Altman limits of agreement (LoA) [16] and percentage of prediction-data deviation (i.e. error) that were within ±0.1, 0.3 and 0.5°C [percentage of target attainment (PTA)].

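The prediction error is computed using:

$$e_{t,i} = \widehat{Tc}_{t,i} - Tc_{t,i}$$

where $\widehat{Tc}_{t,i}$ is the predicted value of Tc at time *t* for the *i*th participant and Tc_{t,i} is the measured (based on the thermometer capsule) value of Tc.

RMSE, a measure of the precision in the predicted Tc, is computed using:

$$\mathrm{RMSE} = \sqrt{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} e_{t,i}^{2}}$$

where N and T denote the total number of participants in the relevant dataset and the total number of Tc measurements per participant, respectively.

LoA, which indicates the limits within which 95% of all prediction errors should fall assuming that the errors are normally distributed, is computed using:

$$\mathrm{LoA} = \bar{e} \pm 1.96\, s_{e}$$

where $\bar{e}$ and $s_{e}$ are the mean and the standard deviation of the prediction errors, respectively.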
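The evaluation criteria in this section can be sketched in a few lines. The sketch assumes that the prediction error is defined as predicted minus measured Tc, that the LoA are the mean error ± 1.96 standard deviations, and that "within ±x°C" is inclusive of the bound; the function name `evaluate` and the example values are hypothetical.

```python
import numpy as np

def evaluate(predicted, measured, tolerances=(0.1, 0.3, 0.5)):
    """RMSE, Bland-Altman limits of agreement and PTA for pooled errors."""
    # Assumption: error = predicted - measured, pooled over participants and time.
    err = (np.asarray(predicted, float) - np.asarray(measured, float)).ravel()
    rmse = float(np.sqrt(np.mean(err ** 2)))
    bias = float(np.mean(err))
    half_width = 1.96 * float(np.std(err, ddof=1))
    # PTA: percentage of errors whose magnitude is within each tolerance.
    pta = {tol: 100.0 * float(np.mean(np.abs(err) <= tol)) for tol in tolerances}
    return {"rmse": rmse, "bias": bias,
            "loa": (bias - half_width, bias + half_width), "pta": pta}

# Synthetic example values (°C), not study data.
measured = [37.0, 37.5, 38.0, 38.5]
predicted = [37.05, 37.3, 38.0, 38.9]
result = evaluate(predicted, measured)
```

When the bias is close to zero, the LoA half-width is approximately 1.96 times the RMSE, which is a quick consistency check on reported values.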

## 3. Kalman filter models

In this section, we describe the KF approaches proposed by Buller and his co-authors [13, 14, 15], as well as the EKF [11] and the OKF models developed by our group. In the state-space models, Tc is not directly observed but considered as a latent state variable, while the other measurable physiological variables (e.g. HR, ST) are used as observable exogenous variables.

### 3.1. Kalman filter

The KF algorithm uses observed exogenous variables to estimate the latent or unobservable variable. The algorithm recursively operates on streams of noisy input variables to produce a statistically optimal estimate of the state variable in a hypothesised state system. Without loss of generality, the system can be represented by a state-space model:

where the functions h(·) and g(·) are differentiable for each state. The transition function is derived from the observation function and the time update equations. The innovations v_{t}, ϵ_{t} and ω_{t} are assumed to follow a Gaussian distribution with mean zero and constant variance. The partial derivatives of the Jacobian matrix can be derived as:

The KF algorithm consists of two steps: predict and update. At any forecast origin t, we have:

Predict:

Update:

where the Kalman Gain

Buller et al. [13] proposed a KF model to predict Tc by tracking the observed exogenous HR time series. The KF model is represented as:

To incorporate the nonlinear dependence between Tc and HR, Buller et al. [14, 15] further proposed a quadratic state space model, which was found to provide better fit in real data analysis:
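As a minimal illustration of the predict and update steps described in this section, the sketch below runs a scalar extended Kalman filter that tracks Tc from an HR series. Everything numeric here is an assumption: the random-walk time update, the quadratic observation function `g`, its coefficients and the noise variances `q` and `r` are hypothetical placeholders and are not the published values from [13, 14, 15].

```python
import numpy as np

# Hypothetical quadratic observation model HR = g(Tc); illustrative coefficients only.
def g(tc):
    return 10.0 * (tc - 36.0) ** 2 + 20.0 * (tc - 36.0) + 40.0

def g_prime(tc):
    return 20.0 * (tc - 36.0) + 20.0            # derivative of g, used to linearise

def ekf_step(x_prev, p_prev, hr_obs, q=0.02 ** 2, r=5.0 ** 2):
    """One predict-update cycle for a scalar state (Tc) and variance."""
    # Predict: random-walk time update (assumption); variance grows by q.
    x_pred = x_prev
    p_pred = p_prev + q
    # Update: linearise g around the prediction and apply the Kalman gain.
    c = g_prime(x_pred)
    k = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + k * (hr_obs - g(x_pred))
    p_new = (1.0 - k * c) * p_pred
    return x_new, p_new

# Synthetic Tc profile: a 120-step rise followed by a 60-step plateau.
true_tc = np.concatenate([np.linspace(37.0, 38.5, 120), np.full(60, 38.5)])
hr_series = g(true_tc)                          # noise-free synthetic HR

x, p = 37.0, 0.01                               # seed state and initial variance
estimates = []
for hr in hr_series:
    x, p = ekf_step(x, p, hr)
    estimates.append(x)
```

Because each estimate is built on the previous one, the filter lags the true Tc during the rise and only closes the gap once the profile flattens.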

### 3.2. Extended Kalman filter

Our group extended the aforementioned work by proposing an EKF model in which both HR and ST are considered in the time update function and the nonlinear dependence is used in the time update function [11]. Moreover, work-rest regime-switching models were proposed to describe the different Tc dependency on HR and ST during the march (work) and the recess (rest) states. By permitting different formulations for the march and the rest time periods, we were able to harness the *a priori* knowledge of the work-rest cycles in the developmental data to enhance Tc estimates. Our EKF model is formulated as follows:

EKF:

### 3.3. Online Kalman filter

The classical KF-type models depend on the previous forecasts of Tc, which may introduce significant uncertainty in the estimates when the forecast horizon increases and the prediction errors accumulate. To avoid concatenating forecast errors, we propose using a direct predictive model that relies on the dependence of Tc on its initial value and the latest information of the observed exogenous variables. We name this direct predictive model the online KF (OKF) model. Similar to the EKF model, the OKF model incorporated a regime-switching framework to better account for the varying dependence of Tc on the observed exogenous variables during work and rest periods. At each stage, the initial value of Tc and the latest values of HR and ST are used to predict Tc:

OKF:

The EKF and the OKF models were seeded with the actual starting Tc as measured by the ingestible thermometer capsule, with the assumption that initial Tc during real-life events could be either estimated or measured prior to the start of a physical activity.
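The direct, regime-switching time update described in this section can be sketched as follows. This is only a structural illustration: the linear form, the function names and all coefficient values in `PHI` are hypothetical, the coefficients are simplified to one set per regime, and the measurement correction step of the filter is omitted.

```python
# Hypothetical regime-specific coefficients (phi0, phi1, phi2, phi3);
# these are NOT the fitted values from the study.
PHI = {
    "work": (-1.0, 0.7, 0.010, 0.35),
    "rest": (0.5, 0.7, 0.008, 0.31),
}

def okf_time_update(tc0, hr_t, st_t, regime):
    """Predict Tc at time t from the initial Tc and the current HR and ST."""
    phi0, phi1, phi2, phi3 = PHI[regime]
    return phi0 + phi1 * tc0 + phi2 * hr_t + phi3 * st_t

def okf_series(tc0, hr, st, regimes):
    # Each prediction uses tc0 and the current inputs only, so an error at
    # one time point is not carried forward to the next.
    return [okf_time_update(tc0, h, s, r) for h, s, r in zip(hr, st, regimes)]

# Synthetic inputs: two work epochs followed by one rest epoch.
tc0 = 37.0
hr = [140.0, 150.0, 100.0]
st = [35.0, 35.5, 34.5]
regimes = ["work", "work", "rest"]
predictions = okf_series(tc0, hr, st, regimes)
```

The design point is that no prediction takes an earlier prediction as input, which is what prevents forecast errors from concatenating over a long march.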

## 4. Results

A total of 17,646 Tc-HR-ST data points were available for model development. The mean and range of Tc were 38.2 and [32.0, 40.1]°C, respectively. Approximately 5% of all Tc measurements were greater than or equal to 39.0°C.

### 4.1. Final model

For the sake of illustration, parameter estimates for the final EKF and OKF models trained using data from Study 1 (Laboratory Study) are reproduced in this paper. The EKF model is described in the equations below.

(21)

(22)

The transition functions are:

(23)

(24)

The equations for the final OKF model are provided below, with different values for the four model parameters [φ_{0t}, φ_{1t}, φ_{2t}, φ_{3t}] at different time points. The corresponding author may be contacted for values of these parameters.

### 4.2. In-sample analysis

**Figure 1** and **Table 1** summarise the performance of the final EKF model and the final OKF model on the study 1 data. **Figure 2** and **Table 2** summarise the performance of the final EKF model and the final OKF model on the study 2 data.

**Table 1.** In-sample performance of the final EKF and OKF models on the study 1 data.

Model | RMSE (°C) | LoA (°C) | PTA ± 0.1°C (%) | PTA ± 0.3°C (%) | PTA ± 0.5°C (%) |
---|---|---|---|---|---|
EKF | 0.37 | 0.02 ± 0.72 | 24 | 60 | 82 |
OKF | 0.25 | 0.00 ± 0.49 | 40 | 78 | 95 |

**Table 2.** In-sample performance of the final EKF and OKF models on the study 2 data.

Model | RMSE (°C) | LoA (°C) | PTA ± 0.1°C (%) | PTA ± 0.3°C (%) | PTA ± 0.5°C (%) |
---|---|---|---|---|---|
EKF | 0.51 | 0.07 ± 0.99 | 18 | 49 | 70 |
OKF | 0.27 | 0.00 ± 0.54 | 33 | 75 | 92 |

For both study 1 and study 2, the agreement between the observed and predicted Tc across the range of Tc was greater in the OKF model compared to the EKF model. For instance, under study 1, the LoA attained with the OKF model was [−0.49, 0.49]°C while that derived from the EKF model was [−0.70, 0.74]°C. For study 2, the scatter plot of the observed versus predicted Tc departed markedly from the line of identity (observed Tc = 0.42 × predicted Tc + 22.15; units = °C) under the EKF model. By contrast, the scatter plot of the observed Tc versus the OKF model-predicted Tc for the same set of data was randomly distributed along the line of identity. Averaged across study 1 and study 2, the OKF model reduced the RMSE by 0.18°C. In addition, for both study 1 and study 2, the proportions of prediction errors within ±0.1, 0.3 and 0.5°C under the OKF model were also higher compared to those under the EKF model. In particular, for study 2, the PTA ±0.3°C under the OKF model was 75%, which was about 25 percentage points higher than the PTA ±0.3°C under the EKF model. Collectively, the results indicated that the overall performance of the OKF model was superior to that of the EKF model based on the developmental data.

### 4.3. Out-of-sample analysis

**Tables 3**–**6** report the RMSE, LoA and PTA ±0.1, 0.3 and 0.5°C obtained in each of the four index sets under study 1 and study 2 based on the EKF and OKF approaches. Similar to the in-sample analysis, the comparison between the observed and predicted Tc showed a smaller RMSE and a greater agreement under the OKF model compared to the EKF model.

**Table 3.** Out-of-sample performance of the EKF model on the study 1 data.

Index Set | RMSE (°C) | LoA (°C) | PTA ± 0.1°C (%) | PTA ± 0.3°C (%) | PTA ± 0.5°C (%) |
---|---|---|---|---|---|
1 | 0.41 | 0.19 ± 0.71 | 33 | 62 | 72 |
2 | 0.42 | 0.04 ± 0.82 | 18 | 57 | 78 |
3 | 0.30 | 0.06 ± 0.58 | 34 | 69 | 87 |
4 | 0.37 | −0.13 ± 0.68 | 20 | 53 | 87 |
Overall | 0.38 | 0.04 ± 0.70 | 27 | 60 | 81 |

**Table 4.** Out-of-sample performance of the OKF model on the study 1 data.

Index Set | RMSE (°C) | LoA (°C) | PTA ± 0.1°C (%) | PTA ± 0.3°C (%) | PTA ± 0.5°C (%) |
---|---|---|---|---|---|
1 | 0.23 | −0.06 ± 0.43 | 40 | 80 | 98 |
2 | 0.45 | 0.08 ± 0.87 | 23 | 57 | 75 |
3 | 0.31 | −0.08 ± 0.58 | 32 | 69 | 89 |
4 | 0.34 | 0.05 ± 0.65 | 29 | 74 | 91 |
Overall | 0.33 | 0.00 ± 0.63 | 31 | 70 | 88 |

**Table 5.** Out-of-sample performance of the EKF model on the study 2 data.

Index Set | RMSE (°C) | LoA (°C) | PTA ± 0.1°C (%) | PTA ± 0.3°C (%) | PTA ± 0.5°C (%) |
---|---|---|---|---|---|
1 | 0.47 | −0.02 ± 0.92 | 21 | 55 | 76 |
2 | 0.47 | 0.25 ± 0.78 | 17 | 48 | 70 |
3 | 0.43 | −0.08 ± 0.83 | 23 | 57 | 80 |
4 | 0.58 | 0.07 ± 1.12 | 12 | 36 | 59 |
Overall | 0.49 | 0.06 ± 0.91 | 18 | 49 | 71 |

**Table 6.** Out-of-sample performance of the OKF model on the study 2 data.

Index Set | RMSE (°C) | LoA (°C) | PTA ± 0.1°C (%) | PTA ± 0.3°C (%) | PTA ± 0.5°C (%) |
---|---|---|---|---|---|
1 | 0.53 | 0.00 ± 1.04 | 27 | 70 | 87 |
2 | 0.45 | 0.18 ± 0.81 | 26 | 61 | 83 |
3 | 0.39 | −0.06 ± 0.75 | 31 | 68 | 87 |
4 | 0.56 | −0.06 ± 1.09 | 24 | 65 | 84 |
Overall | 0.48 | 0.01 ± 0.92 | 27 | 66 | 85 |

When averaged across all the index sets and both study 1 and study 2, the RMSE fell by 0.03°C and the PTA ±0.3°C increased by approximately 13 percentage points under the OKF model vis-à-vis the EKF model. In addition, the overall agreement between the observed and predicted Tc was closer under the OKF model. These trends were also evident at the index set level. Using index set 1 of the study 1 dataset as an example, the RMSE under the EKF model was 0.41°C, which was larger than the OKF model’s RMSE (0.23°C). As a further indication of the superior performance of the OKF model, the LoA under the OKF model was narrower compared to that under the EKF model [(−0.49, 0.37)°C versus (−0.52, 0.90)°C].
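The averages quoted above can be reproduced from the "Overall" rows of the out-of-sample tables. The sketch assumes that the four tables are ordered study 1 EKF, study 1 OKF, study 2 EKF and study 2 OKF, and that the two studies are weighted equally; both are assumptions of this illustration, and the helper names are hypothetical.

```python
# "Overall" rows from the out-of-sample tables (RMSE in °C, PTA ±0.3°C in %).
overall = {
    ("study 1", "EKF"): {"rmse": 0.38, "pta03": 60},
    ("study 1", "OKF"): {"rmse": 0.33, "pta03": 70},
    ("study 2", "EKF"): {"rmse": 0.49, "pta03": 49},
    ("study 2", "OKF"): {"rmse": 0.48, "pta03": 66},
}

def mean_over_studies(model, key):
    """Equal-weight mean of one metric across both studies for one model."""
    values = [row[key] for (_, m), row in overall.items() if m == model]
    return sum(values) / len(values)

rmse_drop = mean_over_studies("EKF", "rmse") - mean_over_studies("OKF", "rmse")
pta03_gain = mean_over_studies("OKF", "pta03") - mean_over_studies("EKF", "pta03")

def loa_interval(bias, half_width):
    """Convert a 'bias ± half-width' entry into (lower, upper) limits."""
    return (round(bias - half_width, 2), round(bias + half_width, 2))

ekf_set1 = loa_interval(0.19, 0.71)     # index set 1, study 1, EKF
okf_set1 = loa_interval(-0.06, 0.43)    # index set 1, study 1, OKF
```

Under these assumptions the RMSE difference is about 0.03°C and the PTA ±0.3°C difference about 13.5 percentage points, in line with the figures in the text.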

**Figure 3** shows a comparison between the mean observed and EKF/OKF-predicted Tc time series for study 1 and study 2. The results showed that the mean Tc versus time profile generated by the OKF model largely matched that of the observed mean Tc time series. By contrast, mean Tc predictions produced from the EKF model were observed to deviate from the observed mean Tc and lie outside of the 95% confidence interval of the Tc measurements at various time periods during the foot march.

**Figure 4** compares the mean error time series from the EKF and OKF models in study 1 and study 2. While the mean errors (prediction bias) of the OKF model were generally stable and remained within approximately ±0.1°C at all time points, those of the EKF model were comparatively larger in magnitude. In addition, the mean error from the EKF model grew in magnitude over time for both the laboratory and field datasets.

## 5. Discussion

In this study, the EKF and OKF models were validated against Tc measurements obtained from volunteers who participated in a high intensity foot march typically performed in the military. When pooled across study 1 and study 2, approximately 5% of all Tc measurements were equal to or greater than 39°C. This represented a respectable data volume for model assessment at the high thermal work zone. Using only measures of HR and ST, our results showed that the models estimated Tc with a small overall bias of 0.03°C, which was within the individual physiological variation of ±0.25°C [17]. In addition, the overall RMSE of the EKF and OKF models (0.31 and 0.45°C) were also comparable to those found in other comparisons of different measures of human core temperature (rectal probe versus oesophageal probe, rectal probe versus thermometer capsule and oesophageal probe versus thermometer capsule) [18].

The aforementioned observations notwithstanding, our results clearly indicated differences in the accuracy of the EKF and OKF approaches in Tc time series prediction during the studied high intensity foot march in both laboratory and field conditions. Classical Kalman filter strategies fundamentally rely on known model and noise information. Consequently, as depicted by results from the EKF approach, they cannot compensate for the effect of model-process mismatch and concatenating noise uncertainty. Our results showed that the OKF approach can estimate Tc continuously across time with less error than the EKF model. Moreover, prediction bias arising from the OKF model appeared to be more stable in magnitude over time compared to that of the EKF model. This is significant in practical settings because a progressively larger prediction error under a longer forecast horizon may lead to more false positives or false negatives for high thermal work strain. If the EKF model is deployed for tracking individualised heat strain, healthy workers with no imminent heat injury risk may be withdrawn from the physical activity prematurely (disrupting work processes and reducing efficiency) or actual heat casualties may not be identified, with the second scenario (false negatives) being more problematic than the first (false positives). This makes the OKF method a more promising approach than the EKF method for predicting Tc based on real-time wearable sensor data in a continuous manner.

Technologies that reliably assess Tc in a non-invasive manner are expected to play a crucial role in supporting the development of tools, methods and techniques to enhance productivity, safety and well-being of military, first responders and industrial workers. During military training and operations, real-time monitoring of Tc can allow each soldier’s thermo-physiological state to be assessed, which permits commanders to take effective measures to intervene and mitigate heat injuries. Monitoring the Tc of every firefighter on the fireground can provide objective information to either empower the firefighter to stay in longer to finish a job or warn the firefighter to exit the fireground sooner. In addition, the use of physiological monitoring, coupled with work physiology and ergonomics concepts, can foster the creation of innovative workforce management procedures allowing enhancements not only in productivity, but also in civilian workers’ well-being and safety.

The main limitation of the present study is the usage of only Tc measurements from the military foot march for modelling. Such developmental data may limit the model’s ability to reliably calculate Tc of human subjects in non-military tasks, e.g. first responders operating in uncompensable heat stress environments, civilian construction workers and professional sports athletes geared with light clothing. In the future, the strong influence of ST on Tc in our mathematical model will be verified in human subjects operating in clothing systems that either severely limit heat dissipation or facilitate sweat evaporation under less humid conditions. The current study did not assess the reliability of the model on repeated Tc measures derived on different trial occasions. Future work will include testing our Kalman filter model’s reliability and precision on different test occasions based on repeated measures data from the same subjects. Last, while we showed that the OKF approach can estimate Tc with less error than the EKF model, appreciable variability in the Tc still remains unexplained by HR and ST. Future work will include the evaluation of breathing rate to improve Tc estimations since hyperthermia has been shown to increase ventilation [19].

## 6. Conclusions

In this paper, we have reported two different Kalman filter approaches for predicting real-time Tc trajectories of subjects engaged in a high intensity physical activity. In particular, we introduced the OKF model where the time update equation depends only on the initial value of Tc and time-current values of the exogenous variables. Both models leverage time-varying values of ST and HR to predict subject-specific Tc. Overall, Tc predictions from the OKF model matched the observed Tc better than those from the EKF model. Future work includes testing and qualification of our model against additional heat strain datasets including those derived from non-foot march tasks, and investigation of the influence of further exogenous observations, such as body acceleration, on Tc. While this approach may not be a complete replacement for direct Tc measurement, it offers a simple and promising new method to estimate subject-specific Tc in a non-invasive manner, and is accurate and practical enough for real-time monitoring of thermal work strain.