Simulation-Based Comparative Analysis of Nonparametric Control Charts with Runs-Type Rules

In this chapter, we study well-known distribution-free Shewhart-type monitoring schemes based on order statistics. To improve the in- and out-of-control performance of the control charts under consideration, several runs-type rules are incorporated. The simulation study carried out indicates that the proposed schemes are considerably more efficient at detecting possible shifts in the distribution of the underlying process.


Introduction
Statistical process control is widely applied to monitor the quality of a production process in which, no matter how thoroughly it is maintained, some natural variability always exists. Control charts help practitioners identify assignable causes so that a state of statistical control can be achieved. Generally speaking, when an unwanted shift in the process takes place, a control chart should detect it as quickly as possible and produce an out-of-control signal.
Shewhart-type control charts were introduced in the early work of Shewhart [1], and since then, several modifications have been established and studied in detail. For a thorough study on statistical process control, the interested reader is referred to the classical textbooks of Montgomery [2] or Qiu [3]. Most monitoring schemes are distribution-based procedures, even though the distributional assumption is not always met in practice. To overcome this obstacle without altering the basic structure of the traditional control charts, several nonparametric (or distribution-free) monitoring schemes have been proposed in the literature. The plotted statistics used for constructing such control charts are related to well-known nonparametric testing procedures. Among others, a variety of distribution-free control charts that have already appeared in the literature are based on order statistics; see, e.g., Chakraborti et al. [4], Balakrishnan et al. [5], or Triantafyllou [6,7]. For an up-to-date account of nonparametric statistical process control, the reader is referred to the review chapter of Koutras and Triantafyllou [8], the recent monograph of Chakraborti and Graham [9], or Qiu [10,11].
In the present chapter, we study well-known distribution-free Shewhart-type monitoring schemes based on order statistics. The general setup of the control charts under consideration is presented in Section 2, while their performance characteristics are investigated based on the algorithm described in Section 3. In order to enhance the ability of the monitoring schemes to detect possible shifts of the process distribution, some well-known runs-type rules are considered. In Section 4, we carry out extensive simulation-based numerical comparisons, which reveal that the enhanced control charts outperform the existing ones under several out-of-control scenarios.

The setup of well-known nonparametric control charts based on order statistics
Let us assume that a reference random sample of size $m$, say $X_1, X_2, \ldots, X_m$, is drawn independently from an unknown continuous distribution $F$ when the process is in-control. The control limits of the distribution-free monitoring scheme are determined by exploiting specific order statistics of the reference sample. In the sequel, test samples are drawn independently of each other (and also of the reference sample) from a continuous distribution $G$, and the decision whether the process is still in-control or not rests on suitably chosen test statistics. The framework for constructing nonparametric control charts based on order statistics calls for the following step-by-step procedure.
Step 1. Draw a reference sample of size $m$, namely, $X_1, X_2, \ldots, X_m$, from the process when it is known to be in-control.
Step 2. Form an interval by appropriately choosing a pair of order statistics from the reference sample (say, e.g., $X_{a:m}$ and $X_{b:m}$, where $1 \le a < b \le m$).
Step 3. Independently draw future (test) samples of size $n$, namely, $Y_1, Y_2, \ldots, Y_n$, from the underlying process.
Step 4. Pick out $l$ order statistics ($0 < l \le n$) from each test sample.
Step 5. Determine the number of observations of each test sample, say $R$, that lie between the limits of the interval $(X_{a:m}, X_{b:m})$.
Step 6. Configure the signaling rule by utilizing both the statistic $R$ and the $l$ ordered test sample observations as monitoring statistics.
The implementation of the above mechanism does not require the assumption of any specific probability distribution for the underlying process (measurements). The reference sample (usually of large size) is drawn from the underlying in-control process, while test (Phase II) samples are drawn from the future process in order to decide whether the process remains in-control or has shifted to an out-of-control state. The proposed monitoring scheme is likely to possess the robustness of standard nonparametric procedures and is, consequently, less likely to be affected by outliers or by skewed or heavy-tailed underlying distributions.
Clearly, the proposed framework requires the construction of more than one control chart, all of which monitor the underlying process simultaneously. In fact, the design parameter $l$ determines the number of control charts needed to implement the aforementioned mechanism. Indeed, for each of the $l$ order statistics from the test sample, a separate two-sided control chart should be constructed.
The family of distribution-free monitoring schemes presented earlier includes as special cases some nonparametric control charts that have already been established in the literature. For example, the monitoring scheme established by Balakrishnan et al. [5] calls for the following plotted statistics:
• A quantile $Y_{j:n}$ of the test sample, which is compared with the control limits $(X_{a:m}, X_{b:m})$
• The number of observations from the test sample that lie between the control limits
The control chart introduced by Balakrishnan et al. [5] (Chart 1, hereafter) therefore belongs to the family of monitoring schemes described previously. In fact, Chart 1 could be seen as a special case of the aforementioned class of distribution-free control schemes with $l = 1$. According to Chart 1, the process is declared to be in-control if the following conditions hold true:
$$X_{a:m} \le Y_{j:n} \le X_{b:m} \quad \text{and} \quad R \ge r, \tag{1}$$
where $r$ is a positive integer.
In addition, the monitoring scheme introduced by Triantafyllou [6] (Chart 2, hereafter) takes into account the location of two order statistics of the test sample drawn from the process along with the number of its observations between the control limits. In other words, the aforementioned control chart could be viewed as a member of the general class of nonparametric monitoring schemes with $l = 2$. According to Chart 2, the process is declared to be in-control if the following conditions hold true:
$$X_{a:m} \le Y_{j:n} \le Y_{k:n} \le X_{b:m} \quad \text{and} \quad R \ge r, \tag{2}$$
where $r$ is a positive integer.
In a slightly different framework, Triantafyllou [7] proposed a distribution-free control chart based on order statistics (Chart 3, hereafter) by taking advantage of the position of single ordered observations from both the test and the reference sample.
More precisely, Chart 3 asks for an order statistic of each test sample (say $Y_{j:n}$) to be enveloped by two prespecified observations $X_{a:m}$ and $X_{b:m}$ of the reference sample, while at the same time an ordered observation of the reference sample (say $X_{i:m}$) should be enclosed by two predetermined values $(Y_{c:n}, Y_{d:n})$ of the test sample. Chart 3 makes use of an in-control rule, which embraces the following three conditions:
Condition 1. The statistic $Y_{j:n}$ of the test sample should lie between the observations $X_{a:m}$ and $X_{b:m}$ of the reference sample, namely, $X_{a:m} \le Y_{j:n} \le X_{b:m}$.
Condition 2. The interval $(Y_{c:n}, Y_{d:n})$ formed by two appropriately chosen order statistics of the test sample should enclose the value $X_{i:m}$ of the reference sample, namely, $Y_{c:n} \le X_{i:m} \le Y_{d:n}$.
Condition 3. The number of observations of the Y-sample that lie between the observations $X_{a:m}$ and $X_{b:m}$ should be equal to or more than $r$, namely, $R \ge r$.
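The in-control rules of Charts 1, 2, and 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the parameters a, b, j, k, i, c, d, r follow the notation of the text, and the limits are treated as inclusive (for continuous distributions ties occur with probability zero, so this choice is an assumption that does not affect the results).

```python
from typing import Sequence


def order_stat(sample: Sequence[float], i: int) -> float:
    """Return the i-th order statistic (1-based index) of a sample."""
    return sorted(sample)[i - 1]


def count_between(test: Sequence[float], lower: float, upper: float) -> int:
    """Statistic R: number of test observations inside [lower, upper]."""
    return sum(lower <= y <= upper for y in test)


def chart1_in_control(ref, test, a, b, j, r) -> bool:
    """Chart 1: X_{a:m} <= Y_{j:n} <= X_{b:m} and R >= r."""
    lcl, ucl = order_stat(ref, a), order_stat(ref, b)
    return (lcl <= order_stat(test, j) <= ucl
            and count_between(test, lcl, ucl) >= r)


def chart2_in_control(ref, test, a, b, j, k, r) -> bool:
    """Chart 2: X_{a:m} <= Y_{j:n} <= Y_{k:n} <= X_{b:m} and R >= r."""
    lcl, ucl = order_stat(ref, a), order_stat(ref, b)
    return (lcl <= order_stat(test, j) <= order_stat(test, k) <= ucl
            and count_between(test, lcl, ucl) >= r)


def chart3_in_control(ref, test, a, b, j, i, c, d, r) -> bool:
    """Chart 3: Conditions 1-3 must hold simultaneously."""
    lcl, ucl = order_stat(ref, a), order_stat(ref, b)
    cond1 = lcl <= order_stat(test, j) <= ucl
    cond2 = order_stat(test, c) <= order_stat(ref, i) <= order_stat(test, d)
    cond3 = count_between(test, lcl, ucl) >= r
    return cond1 and cond2 and cond3
```

Each function returns True when the corresponding chart declares the process in-control for the given test sample, so an out-of-control indication is simply the negation of the returned value.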

The simulation procedure and some results
In the present section, we describe the step-by-step procedure that has been followed in order to determine the basic performance characteristics of the monitoring schemes mentioned previously. Two well-known runs-type rules are implemented in order to improve the performance of the control charts being considered. More precisely, if we denote by LCL and UCL the lower and the upper control limit of the underlying monitoring scheme, we apply the following runs rules:
• The 2-of-2 rule. Under this scenario, an out-of-control signal is produced by the control chart whenever two consecutive plotted points both fall on or above the UCL or both fall on or below the LCL (see, e.g., Klein [12]).
• The 2-of-3 rule. Under this scenario, an out-of-control signal is produced by the control chart whenever two out of three consecutive plotted points fall outside the control limits (LCL, UCL) of the corresponding scheme.
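As an illustration of the two rules above, the following Python sketch applies them to a sequence of plotted points. The helper names are hypothetical, and the encoding of each point as +1 (on or above the UCL), -1 (on or below the LCL), or 0 (strictly inside the limits) is an assumption made only for this example.

```python
def classify(point, lcl, ucl):
    """Encode a plotted point: +1 on/above UCL, -1 on/below LCL, 0 otherwise."""
    if point >= ucl:
        return 1
    if point <= lcl:
        return -1
    return 0


def two_of_two(positions):
    """Return the 0-based indices t at which the 2-of-2 rule signals.

    A signal occurs at t when points t-1 and t are both on or above the UCL,
    or both on or below the LCL.
    """
    return [t for t in range(1, len(positions))
            if positions[t] != 0 and positions[t] == positions[t - 1]]


def two_of_three(outside):
    """Return one 0/1 indicator per window of three consecutive points.

    `outside` holds 1 for a point outside (LCL, UCL) and 0 otherwise; the
    indicator of a window equals 1 when at least two of its three points
    are outside the limits.
    """
    return [1 if sum(outside[t:t + 3]) >= 2 else 0
            for t in range(len(outside) - 2)]
```

For a sequence of length k, `two_of_three` returns k - 2 indicators, one per consecutive triplet, which is the form used in the simulation algorithm of this section.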
We next illustrate the detailed procedure for determining the performance of Chart 2 enhanced with the 2-of-3 rule. It goes without saying that a similar algorithm has been constructed in order to study the corresponding characteristics of the remaining control schemes, namely, Chart 1 and Chart 3 enhanced with either the 2-of-2 or the 2-of-3 runs rule.
Step 1. Generate a reference sample of size $m$ from the in-control distribution $F$ and $k_2$ test samples of size $n$ from the out-of-control distribution $G$.
Step 2. Determine the control limits of the monitoring scheme Chart 2 by appropriately selecting the parameters $a, b, r$.
Step 3. Calculate the test statistics $Y_{j:n}, Y_{k:n}, R$ for each test sample, and examine whether Chart 2 produces an out-of-control signal or not, namely, whether at least one of the conditions mentioned in (2) is violated.
Step 4. Define a dummy variable $T_i$, $i = 1, 2, \ldots, k_2$, for each test sample separately. The variable $T_i$ takes on the value 0 when all conditions in (2) are satisfied, while it takes on the value 1 otherwise.
Step 5. Determine all consecutive (uninterrupted) triplets of the $T_i$'s, namely, all triplets $(T_j, T_{j+1}, T_{j+2})$. Define the dummy variable $D_j$, $j = 1, 2, \ldots, k_2 - 2$, for each triplet separately. The variable $D_j$ takes on the value 0 when the triplet contains at least two 0s, while it takes on the value 1 otherwise.
Step 6. Calculate the alarm rate of the monitoring scheme as $AR = \sum_{j=1}^{k_2 - 2} D_j / (k_2 - 2)$. When $F = G$, the aforementioned probability indicates the false alarm rate of the monitoring scheme, while in the case of different distributions $F$, $G$ the AR corresponds to its out-of-control alarm rate.
Step 7. Define a variable $RL_h$, $h = 1, 2, \ldots, H$, which counts the number of $D_j$'s until the first appearance of a $D_j$ equal to 1. The so-called average run length of the monitoring scheme is calculated as $ARL = \sum_{h=1}^{H} RL_h / H$. When $F = G$ ($F \ne G$), the aforementioned quantity indicates the in-control (out-of-control) average run length of the monitoring scheme.
Steps 1-7 are repeated $k_1$ times, and the performance characteristics of the proposed Chart 2 enhanced with the 2-of-3 runs rule, namely, the false alarm rate (FAR, hereafter), the out-of-control alarm rate ($AR_{out}$, hereafter), the in-control average run length ($ARL_{in}$, hereafter), and the out-of-control average run length ($ARL_{out}$, hereafter), are estimated as the mean values of the corresponding $k_1$ results produced by steps 6 and 7, respectively.
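Steps 1-7 can be sketched in Python as follows (the chapter's own computations were carried out in R, so this is not the original code). The function names, the design values in the usage lines, and the handling of the run length are assumptions made for illustration: here each of the $k_1$ replications contributes one run length, taken as the position of the first $D_j$ equal to 1, and replications without any signal among the $k_2$ test samples are dropped, which biases the ARL estimate downward when $k_2$ is small relative to the true ARL.

```python
import random
from statistics import mean


def chart2_signal(ref_sorted, test, a, b, j, k, r):
    """Steps 3-4: return T_i = 0 if all Chart 2 conditions hold, else 1."""
    lcl, ucl = ref_sorted[a - 1], ref_sorted[b - 1]
    y = sorted(test)
    r_count = sum(lcl <= v <= ucl for v in y)  # statistic R
    # Y_{j:n} <= Y_{k:n} holds automatically after sorting when j <= k.
    in_control = lcl <= y[j - 1] and y[k - 1] <= ucl and r_count >= r
    return 0 if in_control else 1


def two_of_three(t):
    """Step 5: D_j = 1 if at least two of (T_j, T_{j+1}, T_{j+2}) equal 1."""
    return [1 if t[i] + t[i + 1] + t[i + 2] >= 2 else 0
            for i in range(len(t) - 2)]


def simulate_once(rng, design, k2, draw_f, draw_g):
    """One replication of Steps 1-7; returns (AR, run length or None)."""
    m, n, a, b, j, k, r = design
    ref_sorted = sorted(draw_f(rng) for _ in range(m))            # Steps 1-2
    t = [chart2_signal(ref_sorted, [draw_g(rng) for _ in range(n)],
                       a, b, j, k, r) for _ in range(k2)]         # Steps 3-4
    d = two_of_three(t)                                           # Step 5
    alarm_rate = sum(d) / len(d)                                  # Step 6
    # Step 7 (assumed reading): position of the first D_j equal to 1.
    run_length = d.index(1) + 1 if 1 in d else None
    return alarm_rate, run_length


def estimate(design, k1, k2, draw_f, draw_g, seed=0):
    """Average the alarm rate and run length over k1 replications."""
    rng = random.Random(seed)
    results = [simulate_once(rng, design, k2, draw_f, draw_g)
               for _ in range(k1)]
    ar = mean(res[0] for res in results)
    observed = [res[1] for res in results if res[1] is not None]
    arl = mean(observed) if observed else None  # signal-free runs are dropped
    return ar, arl


# Hypothetical design (m, n, a, b, j, k, r), not taken from the chapter's tables.
design = (100, 5, 3, 98, 2, 4, 3)
draw_f = lambda rng: rng.expovariate(1 / 2)  # exponential with mean 2
draw_g = lambda rng: rng.expovariate(1.0)    # exponential with mean 1
far, arl_in = estimate(design, k1=50, k2=500, draw_f=draw_f, draw_g=draw_f)
ar_out, arl_out = estimate(design, k1=50, k2=500, draw_f=draw_f, draw_g=draw_g)
```

Passing the same sampler for both `draw_f` and `draw_g` corresponds to the case F = G and yields estimates of the FAR and the in-control ARL, while different samplers yield the out-of-control quantities.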
In order to ascertain the validity of the simulation procedure described above, we first apply the algorithm without any runs-type rule and compare the simulation-based outcomes to the corresponding results produced with the aid of the theoretical approximation given in Triantafyllou [6]. The simulation study was carried out in the R software environment and involves 10,000 replications. Table 1 displays several designs of the monitoring scheme mentioned as Chart 2 with a nominal level of in-control performance. Since we consider the same designs as those presented by Triantafyllou [6], the exact FARs have been taken from Table 1 therein. As is easily observed, the simulation-based results are quite close to the exact values in all cases considered. For example, let us assume that we draw a reference sample of size m = 60 and test samples of size n = 5. In order to achieve a prespecified in-control performance level, namely, FAR equal to 1%, the remaining parameters are determined as a = 1, j = 2, k = 4, and r = 1. Under the aforementioned design, Triantafyllou [6] computed the exact FAR equal to 0.0096, while the simulation-based procedure proposed in the present chapter gives a corresponding FAR value equal to 0.0116.
A different approach for appraising the ability of a monitoring scheme to detect a possible shift in the underlying distribution is based on its run length. We next focus on the waiting time random variable N, which corresponds to the number of test samples drawn until the first out-of-control signal is produced by the monitoring scheme, in order to evaluate its performance. Table 2 displays the exact and the simulation-based average run length for several designs of Chart 2 that meet a desired nominal level of in-control performance. The exact values of ARL needed for building up Table 2 have been taken from Triantafyllou [6], and more specifically from Table 2 therein.
As is readily observed, the simulation-based results are quite close to the exact values in all cases considered. For example, let us assume that we draw a reference sample of size m = 400 and test samples of size n = 5. In order to achieve a prespecified in-control performance level, namely, $ARL_{in}$ equal to 370, the remaining parameters are determined as a = 5, b = 379, j = 2, k = 3, and r = 2. Under the aforementioned design, Triantafyllou [6] computed the exact $ARL_{in}$ equal to 378.7, while the simulation-based procedure proposed in the present chapter gives a corresponding $ARL_{in}$ value equal to 379.1.
We next focus on the ability of the distribution-free monitoring scheme defined in (1), under the assumption that the process has shifted to an out-of-control state. When the process has shifted from distribution $F$ to $G$, the ability of the scheme to detect the underlying alteration is associated with the function $G \circ F^{-1}$. For example, under the well-known Lehmann-type alternative (see, e.g., van der Laan and Chakraborti [13]), the out-of-control distribution function can be expressed as $G = F^{\gamma}$, where $\gamma > 0$. Table 3 sheds light on the out-of-control performance of Chart 2 by offering the corresponding alarm rate of the scheme under the Lehmann alternatives with parameter $\gamma = 0.2$ and $\gamma = 10$. Since we consider the same designs as those presented by Triantafyllou [6], the exact values of $AR_{out}$ have been taken from Table 3 therein, while the simulated results have been produced by following the procedure described earlier. Each cell contains the ARs attained for γ = 0.2 (upper entry) and γ = 10 (lower entry).
Based on the above table, the proposed simulation algorithm agrees closely with the corresponding exact values of the out-of-control alarm rate of Chart 2. For example, for a design (a, j, k, r) = (10, 2, 4, 2) with reference sample size m = 200 and test sample size n = 5, the exact alarm rate for a shift to the Lehmann alternative with parameter γ = 0.2 (10) equals 86.09% (66.31%), while the simulation-based alarm rate of Chart 2 is quite close to the exact one, namely, it equals 86.61% (64.14%).
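Observations from a Lehmann alternative $G = F^{\gamma}$ can be generated by inverse transformation: if $U$ is uniform on (0, 1), then $F^{-1}(U^{1/\gamma})$ has distribution function $F^{\gamma}$. The Python sketch below illustrates this under the assumption that $F$ is the uniform distribution on (0, 1) by default, which is sufficient here because the behavior of the charts depends on $F$ and $G$ only through $G \circ F^{-1}$; the function name and the seed are hypothetical.

```python
import random


def draw_lehmann(rng, gamma, f_inverse=lambda p: p):
    """Draw one observation from G = F**gamma by inverse transformation.

    If U ~ Uniform(0, 1), then f_inverse(U ** (1 / gamma)) has distribution
    function F(y) ** gamma. The default f_inverse is the identity, which
    corresponds to F being Uniform(0, 1).
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return f_inverse(rng.random() ** (1.0 / gamma))


# Quick check for gamma = 0.2 with F uniform: G(0.5) = 0.5 ** 0.2.
rng = random.Random(2020)
sample = [draw_lehmann(rng, 0.2) for _ in range(20000)]
empirical = sum(y <= 0.5 for y in sample) / len(sample)
theoretical = 0.5 ** 0.2
```

For γ < 1 the draws are stochastically smaller than draws from $F$, while for γ > 1 they are stochastically larger, so the two values γ = 0.2 and γ = 10 used in Table 3 represent shifts in opposite directions.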

The proposed control charts enhanced with runs-type rules
In this section, we carry out an extensive numerical experimentation to appraise the ability of the distribution-free monitoring schemes Chart 1, Chart 2, and Chart 3 enhanced with runs-type rules to detect possible shifts of the underlying distribution. The computations have been made with the aid of the simulation procedure presented in Section 3. Tables 4 and 5 display the improved out-of-control performance of Chart 1 when the 2-of-3 runs-type rule is activated. We first compare the performance of the control charts by using a common $ARL_{in}$ and then evaluating the respective $ARL_{out}$ for specific shifts. Consider the case of a process with underlying in-control exponential distribution with mean equal to 2 and out-of-control exponential distribution with mean equal to 1. In Table 4, we present the $ARL_{out}$ values of Chart 1 and of the proposed Chart 1 enhanced with the 2-of-3 runs-type rule, for $ARL_{in} = 370, 500$, $m = 100, 500$, and $n = 5, 11$. The remaining design parameters $a, b, j, r$ were determined appropriately, so that $ARL_{in}$ takes on a value as close to the nominal level as possible. It is evident that the proposed monitoring scheme performs better than the one established by Balakrishnan et al. [5] in all cases considered. The fact that the $ARL_{out}$ values of Chart 1 with the 2-of-3 runs-type rule are smaller than the respective ones of Chart 1 indicates its ability to detect the shift of the process from the in-control distribution faster.
Underlying distributions: exponential with mean equal to 2 (in-control) and 1 (out-of-control), respectively.
Table 6. Comparison of the $AR_{out}$ values with the same FAR for Chart 2.
Nonparametric control charts are robust in the sense that their in-control behavior remains the same for all continuous distributions. However, it is of some interest to examine their out-of-control performance for different underlying distributions. We next study the performance of the proposed Chart 1 enhanced with the 2-of-3 runs-type rule under the normal distribution with parameters (θ, δ). More specifically, the in-control reference sample is drawn from the standard normal distribution, while several combinations of the parameters θ, δ have been examined. Table 5 reveals that the proposed Chart 1 with the 2-of-3 runs-type rule is superior to the existing Chart 1 for almost all shifts of the location parameter θ and the scale parameter δ considered.
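The distribution-free property mentioned above, namely that the in-control behavior is the same for every continuous distribution, can be illustrated numerically. Since the signaling rule of Chart 1 depends only on the relative ordering of the reference and test observations, applying the same increasing transformation to both samples leaves every decision unchanged. The Python sketch below checks this for an exponential and a normal quantile transformation; the design values, the seed, and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def chart1_signal(ref, test, a, b, j, r):
    """Return 0 if Chart 1 declares the process in-control, else 1."""
    ref_sorted, y = sorted(ref), sorted(test)
    lcl, ucl = ref_sorted[a - 1], ref_sorted[b - 1]
    r_count = sum(lcl <= v <= ucl for v in y)  # statistic R
    return 0 if (lcl <= y[j - 1] <= ucl and r_count >= r) else 1


def signals(transform, seed, m=100, n=5, samples=500, design=(3, 98, 3, 3)):
    """Signals of Chart 1 when F = G is the law of transform(U), U uniform.

    design = (a, b, j, r) is a hypothetical choice used only for this check.
    """
    rng = random.Random(seed)

    def u():
        # Keep the uniform draw strictly inside (0, 1) for the quantile maps.
        return min(max(rng.random(), 1e-12), 1 - 1e-12)

    ref = [transform(u()) for _ in range(m)]
    return [chart1_signal(ref, [transform(u()) for _ in range(n)], *design)
            for _ in range(samples)]


to_exponential = lambda p: -2.0 * math.log(1.0 - p)  # exponential, mean 2
to_normal = NormalDist().inv_cdf                     # standard normal

# The same uniform draws are used in both calls, so the two in-control
# signal sequences should coincide.
same = signals(to_exponential, seed=1) == signals(to_normal, seed=1)
```

The same invariance does not carry over to the out-of-control case, where the performance depends on how $G$ differs from $F$, which is why the comparisons in Table 5 are reported for specific shifts (θ, δ).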
We next study the out-of-control performance of Chart 2 presented in Section 2. In Table 6, three different FAR levels and several values of the parameters $m$, $n$ have been considered. For each choice, the AR values under two specific Lehmann alternatives corresponding to $\gamma = 0.4$ and $\gamma = 0.2$ are computed via simulation for both Chart 2 and Chart 2 enhanced with the 2-of-2 runs-type rule. Table 6 clearly indicates that, under a common FAR, the proposed Chart 2 with the 2-of-2 runs-type rule performs better than Chart 2, with respect to AR values, in all cases considered. For example, for a reference sample of size $m = 500$, test samples of size $n = 11$, and nominal FAR = 0.0027, the proposed Chart 2 with the 2-of-2 runs-type rule achieves an alarm rate of 0.6099 (0.9724) for $\gamma = 0.4$ ($\gamma = 0.2$), while the respective alarm rate for Chart 2 is 0.4559 (0.9479).
Tables 7 and 8 shed more light on the out-of-control performance of the proposed Chart 2 enhanced with appropriate runs-type rules. More specifically, the schemes under consideration are designed so that a nominal in-control ARL performance is attained. The numerical comparisons carried out show that Chart 2 with the 2-of-2 rule becomes substantially more efficient than Chart 2. Under the Lehmann alternative with parameter γ = 0.5, the proposed chart exhibits a smaller out-of-control ARL than Chart 2, and therefore it seems more capable of detecting a possible shift of the process distribution.
Table 7. Comparison of the $ARL_{out}$ values with the same $ARL_{in}$ for Chart 2.
In addition, Table 8 depicts the out-of-control ARL performance of Chart 2 under normal distribution. More precisely, several shifts of both location and scale parameter have been considered, and Chart 2 with 2-of-3 rule detects the underlying shift sooner than Chart 2 does in almost all cases examined.
Finally, Tables 9 and 10 present simulation-based comparisons of the nonparametric monitoring scheme Chart 3 with 2-of-2 runs rule versus Chart 3 established by Triantafyllou [7]. For delivering the numerical results displayed in Tables 9 and 10, the Lehmann alternatives have been considered as the out-of-control distribution.

Conclusions
In the present chapter, we investigate the in- and out-of-control performance of distribution-free control charts based on order statistics. Several runs-type rules are employed in order to enhance the ability of the aforementioned nonparametric monitoring schemes to detect possible shifts in the process distribution. The AR and the ARL behavior of the underlying control charts is studied under several out-of-control situations, such as the so-called Lehmann alternatives and the exponential or the normal distribution model. The numerical experimentation carried out demonstrates the improvement of the proposed schemes when they are combined with the runs-type rules. It is of some research interest to extend the incorporation of such runs rules (or even more complicated ones) to additional nonparametric control charts based on well-known test statistics.
Table 8. Comparison of the $ARL_{out}$ values with the same $ARL_{in}$ for Chart 2 under normal distribution (θ, δ).
Table 9. Comparison of the $AR_{out}$ values with the same FAR for Chart 3.

Author details
Ioannis S. Triantafyllou
Department of Computer Science and Biomedical Informatics, University of Thessaly, Greece
*Address all correspondence to: itriantafyllou@uth.gr
© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Each cell contains the ARs attained for γ = 0.5 (upper entry) and γ = 0.2 (lower entry).
Table 10. Comparison of the $ARL_{out}$ values with the same $ARL_{in}$ for Chart 3.