Performance of plans when the in-control TBEs are Weibull distributed with scale = 0.035 and shape = 1.25.

## Abstract

Control charts exist for Poisson counts, zero-inflated Poisson counts, and overdispersed Poisson counts (negative binomial counts), but not for counting processes in which the time between events (TBE) is Weibull distributed. In our experience, the in-control distribution of the TBEs is often Weibull in applications. When the TBEs are Weibull distributed, the resulting counts are neither Poisson nor negative binomial distributed. This gap in the literature leaves practitioners without guidance in such cases. This book chapter is designed to close the gap and to provide an approach that could be helpful to those applying control charts in these settings.

### Keywords

- average run length
- counts
- monitoring
- time between events

## 1. Introduction

Statistical process control and monitoring (SPCM) methods originally arose in industrial and manufacturing applications and were developed during and after World War II. Since then, they have become a popular way of monitoring processes of all kinds. Today, large volumes of data that need to be monitored are often available from a variety of sources and environments. One therefore needs to make sense of these data and then make efficient monitoring decisions based on them. When constructing and applying monitoring tools, a fundamental assumption, necessary to justify the end results, concerns the distribution from which the data have been generated. This assumption is at the heart of outbreak detection, particularly for estimating the in-control false discovery rates.

Selecting an appropriate probability distribution for the data is one of the most important and challenging aspects of data analysis. The estimates of outbreak false discovery rates often hinge on this crucial selection. We focus on the distribution of the time between events (TBEs) because there are usually many more TBE values than counts, which makes it easier to fit an appropriate distribution. The most commonly assumed distribution in TBE applications is the Weibull distribution, which is asymmetric and sometimes severely skewed. However, depending on the context, other distributions may also be used, such as the exponential distribution (a special case of the Weibull distribution) or the gamma distribution. If the TBEs are exponentially distributed, then the related counts are Poisson distributed. This book chapter focusses on counting processes when the TBE values are known to be Weibull distributed.
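The link between the TBE distribution and the count distribution can be illustrated with a small simulation. The sketch below is not taken from the chapter: the function names, the seed, the 2000-day horizon, and the use of the variance-to-mean ratio as a Poisson check are all assumptions made for illustration. It aggregates simulated TBEs (in days) into daily counts and compares the dispersion of the counts for exponential TBEs (shape 1) and Weibull TBEs (shape 1.25).

```python
import random
from statistics import mean, pvariance

def daily_counts(scale, shape, n_days, rng):
    """Count events per day for a renewal process with Weibull TBEs (in days)."""
    counts = [0] * n_days
    t = rng.weibullvariate(scale, shape)   # time of the first event
    while t < n_days:
        counts[int(t)] += 1                # the event falls in day floor(t)
        t += rng.weibullvariate(scale, shape)
    return counts

def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 for Poisson counts."""
    return pvariance(counts) / mean(counts)

rng = random.Random(2024)                  # arbitrary seed, for repeatability
d_exp = dispersion_index(daily_counts(0.035, 1.0, 2000, rng))    # exponential TBEs
d_wei = dispersion_index(daily_counts(0.035, 1.25, 2000, rng))   # Weibull TBEs
print(f"shape 1.00: {d_exp:.2f}   shape 1.25: {d_wei:.2f}")
```

A ratio near 1 is what a Poisson model implies. A ratio clearly below 1 indicates underdispersed counts, which the negative binomial distribution cannot represent either, since it only allows overdispersion.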

The challenge of making and meeting the distributional assumption is faced by all practitioners and data analysts. In many monitoring settings, event data are collected in a nearly continuous stream, and it is often more meaningful to monitor the individual TBE data [1, 2, 3, 4] when outbreaks are of large magnitude. These individual event data can also be aggregated over fixed time intervals (e.g., daily) to form counts. In this chapter, these counts are monitored to detect outbreaks resulting from small changes in the incidence of events. The focus is on the steady-state situation because this is the most common situation in event monitoring. Note that we cannot stop the process and investigate the out-of-control situation, because in nonmanufacturing settings the process is often not under our control. Events may include warranty claims for a product, health presentations at emergency departments, sales of an online product, etc. Here the term “quality of the process” is used in a general, context-dependent sense. In the case of sales, an outbreak would represent an increased sales opportunity, provided the inventory can support it and the products are not sold out before the next order arrives. For warranty claims, however, an outbreak would represent an undesirable increase in claims, which may require a failure mode and effects analysis [5]. Monitoring of in-control nonhomogeneous counting processes has traditionally been carried out using either the Poisson or the negative binomial distribution for the counts. Many statistical tools used in SPCM for count data are documented in Sparks et al. [1, 2, 3], Sparks et al. [6], Weiß [7, 8], Weiß and Testik [9, 10], Yontay et al. [11], and Albarracin et al. [12]. These control charts are perhaps the most well-known count monitoring methodologies. The control chart graphic is a time series plot of a signal-to-noise ratio designed to help the user make decisions about outbreaks of events.

In any case, before defining an “in-control” process, we need information about the probability distribution of the events being monitored. When this information is available, it is possible to calculate probabilities from the in-control event distribution, which could be defined by the TBEs or by the counts of events within a fixed time interval. Deciding whether the observed events constitute an outbreak is based on whether they are extreme compared to what is usual, i.e., whether counts are higher than expected. This is usually gauged by an upper threshold for the signal-to-noise ratios.

In the vast majority of SPCM applications, it is common to assume that the underlying probability distribution is of a known form. In this chapter we assume that the TBEs follow a Weibull distribution. We explore approaches to monitoring the counts of these events over fixed periods of varying length to find the period width that leads to the earliest detection of outbreaks in terms of the average time to signal (ATS). When the TBEs are Weibull distributed, the distribution of these counts is neither Poisson nor negative binomial. Therefore, this chapter offers a different approach from others in the literature. In addition, the appropriate period of aggregation for the counts is explored in terms of how it influences early detection of outbreak events.

In practice, event data are often collected in a continuous stream defined by the TBEs. Moreover, where outbreaks are of large magnitude, it is more meaningful to monitor these individual TBE values [4]. In such situations, one also needs to deal with autocorrelation, which may be thought of as the effect of time, since data values close together in time and space are likely to be dependent. This violates one of the basic assumptions in process monitoring, and a common way to deal with it is to fit a time series model and monitor the standardized residuals. The assumption of a distribution is also an important part of this analysis. In this chapter, our focus is on the Weibull distributional assumption for the TBE values. However, rather than monitoring the TBEs, we monitor the counts over a fixed time interval, because this improves the early detection of smaller outbreaks (Sparks et al. [4]).

## 2. Monitoring homogeneous counts

Considering a fixed distribution for TBEs during the monitoring period, we assume that the time of the day and date stamp of all events are available. For a series comprised of n events, let the day numbers for events be denoted as

with the event times within days which are defined as

Note that these times are measured in fractions of a day, e.g.,

where * i*th events. We flag an outbreak wherever

Denote these counts as * i*th day. We define the exponentially weighted moving average (EWMA) statistic for these daily homogeneous counts as follows:

where
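As a rough illustration of an EWMA applied to daily counts, the sketch below uses the textbook recursion (new value = weight × current count + (1 − weight) × previous value) with a one-sided upper threshold. The chapter's exact statistic did not survive in this copy, so the recursion, the function name `first_ewma_signal`, the start value, and the made-up counts are all assumptions rather than the authors' definition.

```python
def first_ewma_signal(counts, lam, threshold, start):
    """Return the 1-based day of the first signal, or None if there is none."""
    ewma = start
    for day, count in enumerate(counts, start=1):
        ewma = lam * count + (1.0 - lam) * ewma   # textbook EWMA update
        if ewma > threshold:                      # one-sided: flag increases only
            return day
    return None

counts = [30, 31, 29, 40, 42, 45]   # made-up daily counts, for illustration only
print(first_ewma_signal(counts, lam=0.2, threshold=34.0, start=30.0))   # prints 6
```

With these inputs the EWMA path is 30, 30.2, 29.96, 31.968, 33.9744, 36.1795, so the first value above 34 occurs on day 6. A smaller weight would smooth more heavily and react more slowly to the jump on day 4.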

## 3. Monitoring in-control nonhomogeneous counts

In most real-world cases, the TBE distribution may vary with circumstances even when there is no outbreak. For instance, if we aggregate the TBEs over daily intervals, the timing of events may vary with the day of the week, and working versus non-working days may also affect the distribution of TBEs. As a result, we often face nonhomogeneous counting processes. Hence, we define an adaptive exponentially weighted moving average (AEWMA) statistic for nonhomogeneous daily counts as

where* t*, and

The

## 4. Simulation results

To assess the validity and applicability of our proposed adaptive method, we employ simulation studies. We restrict our attention to plans that have ** –**4. The lowest ATS values are shown in bold black text in the tables below to make it easier to see which plans are more efficient in certain situations.
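A minimal sketch of how an ATS can be estimated by simulation is given here. It is not the authors' simulation code. It assumes a plain EWMA of daily counts with an upper threshold, a process that starts at time zero with the EWMA set to the long-run in-control rate, and that the column labels 0.04 to 0.20 in the tables are EWMA smoothing weights (the symbol in the table header was lost). The function names, seed, and replication count are hypothetical. The in-control scale 0.035, shape 1.25, and threshold 32.807 are taken from the shape = 1.25 table.

```python
import math
import random

def run_length(scale, shape, lam, threshold, start, rng, max_days=5000):
    """Days until the EWMA of daily counts first exceeds the threshold."""
    ewma = start
    t = rng.weibullvariate(scale, shape)       # time of the next event, in days
    for day in range(1, max_days + 1):
        count = 0
        while t < day:                         # events falling in [day - 1, day)
            count += 1
            t += rng.weibullvariate(scale, shape)
        ewma = lam * count + (1.0 - lam) * ewma
        if ewma > threshold:
            return day
    return max_days                            # censored run (no signal)

def estimate_ats(scale, shape, lam, threshold, start, reps=200, seed=1):
    """Average run length over independent simulated runs."""
    rng = random.Random(seed)
    runs = [run_length(scale, shape, lam, threshold, start, rng) for _ in range(reps)]
    return sum(runs) / reps

shape, scale0 = 1.25, 0.035
# Long-run in-control events per day, 1 / E[TBE], used as the EWMA start value.
start = 1.0 / (scale0 * math.gamma(1.0 + 1.0 / shape))
for scale in (0.035, 0.031, 0.027):            # in-control, then two outbreaks
    ats = estimate_ats(scale, shape, lam=0.10, threshold=32.807, start=start)
    print(f"scale {scale:.3f}: estimated ATS {ats:.1f} days")
```

Because the statistic and the start-up conditions here are assumptions, the estimates need not reproduce the tabulated values. The qualitative pattern should hold: a smaller scale means shorter times between events, higher counts, and a shorter ATS.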

| EWMA counts (), shape = 1.25 | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Threshold | 31.835 | 32.162 | 32.524 | 32.807 | 33.207 | 33.470 | 33.803 | 34.108 |
| Scale | 0.04 | 0.06 | 0.08 | 0.10 | 0.125 | 0.15 | 0.175 | 0.20 |
| 0.035 | 100.05 | 101.16 | 100.41 | 101.95 | 100.31 | 100.02 | 100.55 | 101.21 |
| 0.034 | 41.952 | 40.255 | 41.552 | 40.954 | 38.936 | 40.372 | 43.733 | |
| 0.033 | 19.116 | 18.757 | 19.098 | 19.564 | 19.106 | 19.538 | 20.302 | |
| 0.032 | 13.054 | 11.924 | 11.721 | 11.359 | 10.733 | 11.370 | 11.843 | |
| 0.031 | 8.598 | 8.231 | 8.156 | 7.860 | 7.847 | 7.515 | 7.566 | |
| 0.030 | 6.774 | 5.937 | 5.943 | 5.749 | 5.628 | 5.508 | 5.437 | |
| 0.029 | 5.824 | 4.715 | 4.724 | 4.696 | 4.426 | 4.211 | 4.183 | |
| 0.028 | 4.408 | 4.149 | 3.912 | 3.739 | 3.660 | 3.424 | 3.446 | |
| 0.027 | 3.699 | 3.358 | 3.386 | 3.025 | 3.002 | 2.886 | 2.832 | |

| EWMA counts (), shape = 1.15 | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Threshold | 36.20 | 37.058 | 37.4917 | 37.904 | 38.315 | 38.725 | 39.1025 | 39.515 |
| Scale | 0.04 | 0.06 | 0.08 | 0.10 | 0.125 | 0.15 | 0.175 | 0.20 |
| 0.03 | 205.09 | 200.54 | 201.68 | 201.21 | 200.23 | 200.82 | 200.96 | 200.51 |
| 0.029 | 50.792 | 54.813 | 53.764 | 52.916 | 54.227 | 56.616 | 61.987 | |
| 0.028 | 22.027 | 21.034 | 20.747 | 21.624 | 22.292 | 23.361 | 23.265 | |
| 0.027 | 12.791 | 11.790 | 11.953 | 11.655 | 11.344 | 11.397 | 12.319 | |
| 0.026 | 8.678 | 8.414 | 7.811 | 7.414 | 7.211 | 7.333 | 7.497 | |
| 0.025 | 6.718 | 6.163 | 5.718 | 5.508 | 5.444 | 5.243 | 5.162 | |
| 0.024 | 5.078 | 4.650 | 4.462 | 4.283 | 4.228 | 3.921 | 3.920 | |
| 0.023 | 4.482 | 4.010 | 3.742 | 3.558 | 3.465 | 3.190 | 3.155 | |

| EWMA counts (), shape = 0.95 | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Threshold | 41.35 | 41.9028 | 42.4899 | 42.9932 | 43.5669 | 44.1448 | 44.7079 | 45.1527 |
| Scale | 0.04 | 0.06 | 0.08 | 0.10 | 0.125 | 0.15 | 0.175 | 0.20 |
| 0.025 | 302.19 | 301.69 | 302.03 | 301.51 | 302.29 | 301.85 | 301.29 | 301.83 |
| 0.024 | 56.110 | 59.550 | 65.442 | 63.928 | 66.311 | 78.925 | 74.448 | |
| 0.023 | 23.359 | 22.686 | 22.031 | 22.855 | 23.860 | 25.744 | 26.173 | |
| 0.022 | 12.681 | 11.357 | 11.807 | 11.412 | 11.493 | 12.104 | 12.056 | |
| 0.021 | 8.784 | 8.253 | 7.723 | 7.394 | 7.326 | 7.196 | 7.231 | |
| 0.0205 | 7.474 | 6.866 | 6.855 | 6.410 | 5.997 | 6.113 | 5.977 | |
| 0.020 | 6.788 | 5.962 | 5.416 | 5.368 | 5.075 | 5.038 | 5.136 | |
| 0.019 | 4.840 | 4.707 | 4.415 | 4.032 | 3.942 | 3.844 | 3.805 | |

| EWMA counts (), shape = 0.85 | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Threshold | 48.7256 | 49.5976 | 50.3141 | 50.9389 | 51.6532 | 52.35 | 53.0104 | 53.5827 |
| Scale | 0.04 | 0.06 | 0.08 | 0.10 | 0.125 | 0.15 | 0.175 | 0.20 |
| 0.02 | 401.92 | 401.28 | 399.92 | 400.89 | 399.89 | 401.09 | 401.08 | 399.93 |
| 0.0195 | 121.88 | 123.74 | 135.49 | 144.80 | 145.77 | 145.82 | 156.46 | |
| 0.019 | 51.948 | 51.531 | 57.694 | 57.722 | 63.461 | 66.637 | 69.652 | |
| 0.018 | 19.344 | 18.374 | 17.494 | 17.663 | 18.892 | 19.627 | 19.487 | |
| 0.0175 | 12.846 | 12.711 | 12.370 | 12.263 | 12.272 | 12.651 | 12.765 | |
| 0.017 | 10.966 | 10.696 | 9.583 | 9.032 | 9.052 | 9.179 | 9.074 | |
| 0.0165 | 8.261 | 8.510 | 7.599 | 7.301 | 6.946 | 6.948 | 6.954 | |
| 0.016 | 7.191 | 6.386 | 6.198 | 5.803 | 5.666 | 5.429 | 5.456 | |
| 0.015 | 5.474 | 4.868 | 4.354 | 4.119 | 4.031 | 4.047 | 3.862 | |

Table 1 shows the performance of the plans used to monitor a counting process whose in-control TBEs are Weibull distributed with scale and shape parameters of 0.035 and 1.25, respectively.
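As a worked check on this in-control setting, the mean TBE and the implied long-run event rate can be computed from the standard Weibull mean formula. The step from mean TBE to events per day uses the elementary renewal theorem, which is an addition here and is not stated in the chapter.

```latex
\mathrm{E}[\mathrm{TBE}]
  = \text{scale}\cdot\Gamma\!\left(1+\frac{1}{\text{shape}}\right)
  = 0.035\,\Gamma(1.8)
  \approx 0.035 \times 0.9314
  \approx 0.0326 \text{ days},
\qquad
\frac{1}{\mathrm{E}[\mathrm{TBE}]} \approx 30.7 \text{ events per day}.
```

The thresholds for this setting (31.835 to 34.108) therefore sit a little above the long-run in-control daily rate, as one would expect for an upper control limit on smoothed daily counts.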

Table 2 shows the performance results of the plans when the shape and the scale parameters of the Weibull distribution are equal to 1.15 and 0.03, respectively. The

Table 3 shows the performance results of the plans when the shape and the scale parameters of the Weibull distribution are equal to 0.95 and 0.025, respectively. The

## 5. Real-world example

In this section, we apply our proposed method to a real-world example. The counting process to monitor is the number of presentations at the Gold Coast University Hospital emergency department for a broad definition of influenza; that is, the events are presentations at the hospital with flu symptoms. Data were gathered for four consecutive years starting from January 2015. We first check whether the TBEs are Weibull distributed. We fitted a Weibull regression model to the data in which all the parameters of the conditional distribution of the response variable are modeled using explanatory variables: hour-of-the-day harmonics, seasonal harmonics, and day of the week. To do so, we employ the “gamlss” [13] R package for statistical modeling. This package includes functions for fitting the generalized additive models for location, scale, and shape introduced by Rigby and Stasinopoulos [14]. The R procedure for the model fitting is as follows:

where wd, nday, and hr are the weekday, the day number within the year, and the time of day (0–24), respectively. Figure 1 summarizes the analysis of the residuals for the fitted model. As shown in Figure 1, the residuals show no particular pattern and the model describes the response variable quite well, so we conclude that the TBEs are Weibull distributed. Details of the model adequacy analysis are presented in Appendix B.
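For readers who want a quick check without covariates, a simpler diagnostic than the regression above is the Weibull probability plot: if the TBEs are Weibull, the points (log t, log(−log(1 − F))) fall on a straight line whose slope is the shape and whose intercept is −shape × log(scale). The sketch below is not the chapter's gamlss procedure. It is a swapped-in least-squares fit on simulated data, and the function name, the median-rank plotting positions, the seed, and the sample size are assumptions.

```python
import math
import random

def weibull_plot_fit(tbes):
    """Least-squares line on the Weibull probability plot; returns (scale, shape)."""
    xs = sorted(t for t in tbes if t > 0)          # log needs positive times
    n = len(xs)
    pts = []
    for i, t in enumerate(xs, start=1):
        p = (i - 0.3) / (n + 0.4)                  # median-rank plotting position
        pts.append((math.log(t), math.log(-math.log(1.0 - p))))
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    shape = sxy / sxx                              # slope of the fitted line
    intercept = my - shape * mx                    # equals -shape * log(scale)
    scale = math.exp(-intercept / shape)
    return scale, shape

rng = random.Random(7)                             # arbitrary seed
sample = [rng.weibullvariate(0.035, 1.25) for _ in range(5000)]   # simulated TBEs
scale_hat, shape_hat = weibull_plot_fit(sample)
print(f"scale ~ {scale_hat:.4f}, shape ~ {shape_hat:.3f}")
```

Strong curvature in the plotted points would suggest that a single Weibull distribution is inadequate. With real data such as those in this section, covariate effects (hour of day, season, weekday) would first have to be removed, which is what the regression approach handles.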

As mentioned earlier, the threshold,

## 6. Concluding remarks

In this chapter, we proposed an adaptive EWMA surveillance plan to monitor a counting process whose times between events are Weibull distributed. The proposed method can be applied to both homogeneous and nonhomogeneous processes. To implement the proposed surveillance plan, the scale and shape parameters of the underlying TBE distribution are estimated using a distributional regression approach [15]. The threshold for the counts is then established using the estimated parameters and the desired ARL. The proposed plan is applied to both simulated and real data. Simulation results indicate that the proposed method is applicable for detecting outbreaks of any magnitude and signals them in a reasonable time after their onset. In addition, the simulations revealed that plans with a larger smoothing parameter are superior for detecting large outbreaks, whereas smaller smoothing weights are needed for the early detection of small outbreaks. Applying the proposed surveillance method to real data, we conclude that it is capable of detecting outbreaks in nonhomogeneous counting processes.

The thresholds for the linear model fitted

The multiple R-squared for all these models equals 1 when rounded to six decimal places. The square of the fitted values from these models provides a starting estimate of the threshold for the respective in-control ARL of 100, 200, 300, or 400. This estimate can be further revised using simulation (Table A1).

| In-control ARL | 100 | 200 | 300 | 400 |
|---|---|---|---|---|
| (Intercept) | 55.27965623580790 | 14.50063556867970 | 35.4751774678155 | 1.62174314752660 |
| scale | −626.65874427126700 | −750.65355682164500 | −853.0616160220110 | −666.97214806267300 |
| shape | 54.99181051902600 | 6.91623162316955 | 26.9351743566771 | −2.52567948128196 |
| log(scale) | −8.64992403297571 | −8.66215508213318 | −8.6812026091707 | −8.66460977364899 |
| log(shape) | 16.34706116414410 | 6.99770664515964 | 12.4345299969720 | 3.32335746029235 |
| sqrt(shape) | −132.57740079984300 | −43.82444799901730 | −84.7132341475612 | −21.35094544552350 |
| scale:shape | −224.25112834314800 | −263.89677049308800 | −301.0663396807370 | −236.64136324217100 |
| scale:log(scale) | −69.61113437236530 | −70.49847595504570 | −69.8492890948244 | −69.64522997112020 |
| scale:log(shape) | −160.56646815370100 | −201.15404356846600 | −235.8054471494210 | −174.62069894929300 |
| shape:log(scale) | 2.19016501325625 | 2.17689719954047 | 2.2105503473413 | 2.20073389465297 |
| shape:log(shape) | −7.97657987867648 | 4.93365063336234 | 0.1464213924397 | 6.88699339675467 |
| log(scale):log(shape) | −3.19688371718715 | −3.19951671299892 | −3.1780225086612 | −3.20431683519426 |
| scale:sqrt(shape) | 795.53717813564600 | 957.71276755290300 | 1098.4086506150800 | 848.24798318601300 |
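To show how the coefficients in Table A1 might be used, the sketch below evaluates the linear predictor for the in-control ARL of 100 and squares it, following the statement that the square of the fitted values gives a starting threshold. The model formula itself did not survive in this copy, so the code assumes that each row label is a regression term in R notation (`a:b` meaning the product of `a` and `b`) and that `log` is the natural logarithm, as in R. The function name is hypothetical.

```python
import math

# Coefficients copied from the "100" column of Table A1.
COEF_ARL100 = {
    "(Intercept)": 55.27965623580790,
    "scale": -626.65874427126700,
    "shape": 54.99181051902600,
    "log(scale)": -8.64992403297571,
    "log(shape)": 16.34706116414410,
    "sqrt(shape)": -132.57740079984300,
    "scale:shape": -224.25112834314800,
    "scale:log(scale)": -69.61113437236530,
    "scale:log(shape)": -160.56646815370100,
    "shape:log(scale)": 2.19016501325625,
    "shape:log(shape)": -7.97657987867648,
    "log(scale):log(shape)": -3.19688371718715,
    "scale:sqrt(shape)": 795.53717813564600,
}

def threshold_arl100(scale, shape):
    """Starting threshold: the squared fitted value of the assumed linear model."""
    ls, lh, rh = math.log(scale), math.log(shape), math.sqrt(shape)
    terms = {
        "(Intercept)": 1.0,
        "scale": scale,
        "shape": shape,
        "log(scale)": ls,
        "log(shape)": lh,
        "sqrt(shape)": rh,
        "scale:shape": scale * shape,
        "scale:log(scale)": scale * ls,
        "scale:log(shape)": scale * lh,
        "shape:log(scale)": shape * ls,
        "shape:log(shape)": shape * lh,
        "log(scale):log(shape)": ls * lh,
        "scale:sqrt(shape)": scale * rh,
    }
    fitted = sum(COEF_ARL100[name] * value for name, value in terms.items())
    return fitted ** 2

print(f"{threshold_arl100(0.035, 1.25):.3f}")
```

For scale 0.035 and shape 1.25 this evaluates to roughly 32.8, close to the threshold of 32.807 listed in the shape = 1.25 table, which supports the assumed reading of the row labels. The many retained decimal places matter because the large positive and negative terms nearly cancel.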