Evaluation of the Long-Term Stability of Metrology Instruments

This chapter aims to emphasize the issue of the long-term stability of instruments used in metrology. This issue is a concern raised in the ISO/IEC 17025:2017 standard and in the JCGM 100:2008 guide. Control charts are mentioned in these key documents as tools to assess whether a measurement process is under statistical control. Control charts (Shewhart charts, the CUSUM chart, and the EWMA chart) are introduced and tested with simulated and real datasets from metrology instruments operating in the Ionizing Radiation Department of the BIPM. The interest and the limits of such statistical analyses are discussed. They are based on a measurement model composed of Gaussian white noise. Although a measurement monitored over a relatively short period may be consistent with this model, it has been observed that the autocorrelation of measurement data acquired over a long period limits the relevance of control charts. In this case, time series analysis seems more appropriate than conventional control charts. As an illustration, an optimal Bayesian smoother is introduced to demonstrate how to deconvolve the low-frequency random noise and refine the evaluation of uncertainty according to the measurement model for long-term measurement.


Introduction
Long-term reproducibility in experimental sciences is a critical metrological topic recently discussed in [1]. In laboratories, the quality system is the framework within which such an issue is controlled. ISO/IEC 17025 [2] notably requires that laboratories ensure the validity of their results. To this end, the operating state of a measurement system must be monitored through periodic checks and statistical analysis of the resulting process control values.
Techniques to assess the reproducibility of a process have been a matter of concern since the industrial era. An important step was taken in 1924 with the invention of control charts by Walter Andrew Shewhart (Bell Telephone). Control charts are efficient tools to verify whether a process is under statistical control; they are now widely used in many domains and are recognized by the Project Management Institute as one of the seven basic tools of quality [3]. Moreover, control charts are explicitly proposed in ISO/IEC 17025 [2] and in the JCGM 100:2008 guide [4] as tools to implement for analyzing periodic checks using check standards.

Introduction to control charts

The x-chart
Periodic control using a check standard consists of periodically reproducing the measurement $x_i$ at the time $t_i$, forming the measurement set $(x_1, x_2, \ldots, x_i, \ldots, x_N)$, where $x_N$ is the last recorded point. In the x-chart, a central line is drawn at the moving empirical average $\bar{x}_N$ computed over a window of 31 samples, such that, for $N > 31$,

$$\bar{x}_N = \frac{1}{31}\sum_{i=N-30}^{N} x_i. \tag{1}$$

The control limits (LCL and UCL) are given by multiples of the moving empirical standard deviation $s(x_N)$, for $N > 31$,

$$s(x_N) = \sqrt{\frac{1}{30}\sum_{i=N-30}^{N}\bigl(x_i - \bar{x}_N\bigr)^2}, \qquad
\begin{cases} \mathrm{UCL} = \bar{x}_N + k\, s(x_N), \\ \mathrm{LCL} = \bar{x}_N - k\, s(x_N), \end{cases} \tag{2}$$

where the multiplication factor $k$ is set to 2 for the first control level and 3 for the second control level.
Failure criteria are defined according to the Gaussian white noise model that will be introduced later. It is important to note that the relevance of the analysis is low when the number of measurements N < 10, moderate for 10 ≤ N ≤ 31, and good for N ≥ 31. Criteria can be found in the ISO 7870 standard [5]. As an example, the following rules are proposed, whereby a measurement process is not considered under statistical control when:
• more than 0.35% of the points are above or below the control limits with the multiplication factor k = 3,
• more than 20% of the points are above or below the control limits with the multiplication factor k = 2,
• more than 9 points in a row are on one side of $\bar{x}_N$,
• more than 6 points in a row are increasing or decreasing,
• more than 14 points in a row are alternating between increase and decrease.
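As an illustration, a minimal Python/NumPy sketch of the x-chart construction and of the "nine points in a row" run rule is given below. The function names, the use of the sample standard deviation, and the handling of the first 31 points are assumptions made for the example, not taken from the chapter.

```python
import numpy as np

def x_chart(x, window=31, k_levels=(2, 3)):
    """Moving central line and control limits of an x-chart.

    x : 1-D array of periodic check measurements.
    Returns the central line and (LCL, UCL) arrays for each k level,
    computed over the last `window` points (NaN while N <= window).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    center = np.full(n, np.nan)
    limits = {k: (np.full(n, np.nan), np.full(n, np.nan)) for k in k_levels}
    for i in range(window, n):
        seg = x[i - window + 1 : i + 1]          # last 31 samples
        m, s = seg.mean(), seg.std(ddof=1)
        center[i] = m
        for k in k_levels:
            limits[k][0][i], limits[k][1][i] = m - k * s, m + k * s
    return center, limits

def nine_in_a_row(x, center):
    """Run rule: more than 9 consecutive points on one side of the central line."""
    side = np.sign(np.asarray(x, float) - center)
    run, worst = 0, 0
    for s_prev, s_cur in zip(side[:-1], side[1:]):
        run = run + 1 if (s_cur == s_prev and s_cur != 0) else 0
        worst = max(worst, run + 1)
    return worst > 9
```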

The R-chart
In the R-chart, the plotted points $R_i$ are the discrete derivatives (differences) of successive measurements:

$$R_i = x_i - x_{i-1}.$$

The central line and the two levels of control limits are obtained in the same manner as for the x-chart, by calculating the moving empirical average and standard deviation of the values $R_i$. The same failure criteria can also be applied.

The CUSUM chart
The cumulative sum of deviations $C_i$ is obtained by the following recursive formula:

$$C_i = C_{i-1} + \bigl(x_i - \bar{x}_N\bigr), \qquad C_0 = 0.$$

In this chart, no central line is displayed, but two control limits $V_i$ are defined by the so-called V-mask, whose arms open backward from the last plotted point with slope $k$ and half-width $d$ at the last point (both expressed in units of the empirical standard deviation):

$$V_i^{\pm} = C_N \pm s(x_N)\,\bigl(d + k\,(N - i)\bigr),$$

where $k = 0.1$ and $d = 147.22$. Any point outside the V-mask means that a significant drift of the process control value is detected.
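A sketch of the CUSUM chart with a V-mask follows. It assumes the common V-mask parametrization (arms of slope k, widening toward the past, in units of the empirical standard deviation); the chapter's exact normalization of k = 0.1 and d = 147.22 may differ, and the function name is illustrative.

```python
import numpy as np

def cusum_v_mask(x, k=0.1, d=147.22):
    """CUSUM of deviations and a V-mask anchored on the last point C_N.

    Returns the cumulative sums C_i and True when at least one point
    falls outside the mask (significant drift detected).
    """
    x = np.asarray(x, dtype=float)
    s = x.std(ddof=1)
    c = np.cumsum(x - x.mean())               # cumulative sum of deviations
    idx = np.arange(x.size)
    half_width = s * (d + k * (idx[-1] - idx))  # arms widen toward the past
    out_of_control = np.any((c > c[-1] + half_width) | (c < c[-1] - half_width))
    return c, out_of_control
```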

The EWMA chart
The EWMA chart is obtained by plotting the exponentially smoothed data points $z_i$ given by the following recursive formula:

$$z_i = \lambda\, x_i + (1 - \lambda)\, z_{i-1},$$

where $\lambda$ is a smoothing parameter. The control limits CL are defined by

$$\mathrm{CL}_i = \bar{x}_N \pm k\, s(x_N)\, \sqrt{\frac{\lambda}{2-\lambda}\,\bigl[1 - (1-\lambda)^{2i}\bigr]},$$

with the smoothing parameter $\lambda = 2/N$. Any point outside these control limits means that a significant drift of the process control value is detected.
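A corresponding EWMA sketch is given below; the starting value of the recursion (the overall mean) and the k = 3 multiplier in the control limits are assumptions not stated in the text.

```python
import numpy as np

def ewma_chart(x, lam=None, k=3):
    """EWMA chart: exponentially smoothed points and control limits.

    x   : 1-D array of check measurements.
    lam : smoothing parameter (defaults to 2/N as in the text).
    Returns z_i and the (LCL_i, UCL_i) arrays.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    lam = 2.0 / n if lam is None else lam
    mean, s = x.mean(), x.std(ddof=1)

    z = np.empty(n)
    z[0] = lam * x[0] + (1 - lam) * mean      # start the recursion at the mean
    for i in range(1, n):
        z[i] = lam * x[i] + (1 - lam) * z[i - 1]

    i = np.arange(1, n + 1)
    width = k * s * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * i)))
    return z, mean - width, mean + width
```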
It must be noted that these two families of control charts, the Shewhart charts on the one hand and the CUSUM and EWMA charts on the other hand, are complementary.
Indeed, the Shewhart charts aim to detect large shifts, above $1.5\, s(x_i)$ (detection of outliers), whereas the CUSUM and EWMA charts aim to detect small shifts, below $1.5\, s(x_i)$, and more generally drifts in the process. In other words, Shewhart charts perform an analysis in the high-frequency domain, whereas the CUSUM and EWMA charts are more suitable for analysis in the low-frequency domain.

The Gaussian white noise model
N = 100 normally distributed data points $x_i$ are simulated using the random generator of the NumPy library.
The model of the simulated process is the following:

$$x_i = \mu_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_i^2), \tag{9}$$

with $\mu_i$ the expectation parameter at the time $t_i$, $N$ the normal distribution, and $\sigma_i^2$ the variance parameter of the measurement noise $\varepsilon_i$ at the time $t_i$. It is referred to below as the Gaussian white noise model.
The reference dataset is defined by setting, for all $i \in ⟦1, N⟧$, the parameters $\mu_i = 0$ and $\sigma_i^2 = 1$. The control charts are presented below, and the tests conclude, as expected, that the simulated process is under statistical control (Figures 1-4).
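A minimal sketch of this simulation with NumPy's random generator is shown below; the seed is arbitrary and chosen only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

N = 100
mu, sigma = 0.0, 1.0                          # mu_i = 0 and sigma_i^2 = 1 for all i
x = rng.normal(loc=mu, scale=sigma, size=N)   # Gaussian white noise model

print(x.mean(), x.std(ddof=1))                # close to 0 and 1, as expected
```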

The Gaussian white noise model with a deterministic drift
Now, a slight positive drift is considered in the following model, referred to as a deterministic drift model in Gaussian white noise, in which the expectation parameter $\mu_i$ is no longer constant but increases slowly and linearly with time. This trend is detected in the x-chart because more than nine points in a row fall on one side of the central line. The CUSUM chart and the EWMA chart also detect the trend, with some points outside their control limits (Figures 5-8).
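The same simulation with a small linear drift added to the expectation might look as follows; the slope value is purely illustrative, since the value used in the chapter is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

N = 100
t = np.arange(N)
slope = 0.02                         # illustrative value only
mu = slope * t                       # expectation increases linearly with time
x_drift = mu + rng.normal(0.0, 1.0, size=N)
```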

The autocorrelated model
The third model studied here is the autocorrelated model (or colored noise model), in which a low-frequency Brownian noise is introduced:

$$x_i = \mu_i + \varepsilon_i, \qquad \mu_i = \mu_{i-1} + \delta_i, \tag{10}$$

where the measurement noise is $\varepsilon_i \sim N(0, \sigma_i^2)$ (as in Eq. (9)) and a process noise $\delta_i \sim N(0, \tau_i^2)$ is added, with a variance parameter $\tau_i^2$. Again, this type of deviation from the Gaussian white noise model is detected by the x-chart, with more than nine points in a row on one side of the central line, and by the CUSUM and EWMA charts (Figures 9-12).
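A sketch of this autocorrelated simulation is given below; the value of τ is illustrative, as the value used in the chapter is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

N = 100
sigma, tau = 1.0, 0.1                # tau chosen for illustration only
delta = rng.normal(0.0, tau, size=N)
mu = np.cumsum(delta)                # Brownian (random-walk) expectation: mu_i = mu_{i-1} + delta_i
x_auto = mu + rng.normal(0.0, sigma, size=N)   # add Gaussian white measurement noise
```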
These three examples underline the fact that control charts are, by definition, built to verify whether a dataset is consistent with a Gaussian white noise model. Any deviation from this model is efficiently revealed by control charts, and an operator could therefore conclude that the process is not under statistical control. This conclusion is appropriate in the case of deterministic trends, which a metrology system seeks to avoid in order to maintain the references it produces. However, it has been shown that autocorrelated noise can also trigger an alert on control charts. Contrary to a deterministic drift, the latter is a natural random process that appears in any system observed over a sufficiently long period. This particular limitation of control charts has already been pointed out by several authors [7-10]. It is emphasized in the real cases presented below.

Test of control charts with real data
Two datasets are considered here. The first one comes from the periodic control of a recent metrology system, called the ESIR (extended SIR), that operates at the BIPM for radionuclide metrology [11, 12]. The second comes from the periodic control of a metrology service, the SIR, which has operated at the BIPM for radionuclide metrology since 1976 [13].

Test with 1-year period data
The ESIR is under development, and periodic control using check standards has been in place for 1 year. Figures 13-16 show the analysis of the measured data by control charts. The monitored value is a comparison indicator obtained using a 14C check source. The analysis from the four control charts leads to the conclusion that the process is under statistical control; in other words, the measurement model complies with a Gaussian white noise model, at least over this short observation period. The activity estimate of the 14C check source is stable and equal to 24.3373 ± 0.0184 A.U. No unit is associated with the measurand because the aim of the ESIR is only to deliver a measure that is proportional to the activity and highly reproducible over time. The relative standard uncertainty associated with this measurement is about 0.08%.

Test with 20-year period data
The SIR has produced international references for decades. A 226Ra source is used as a check standard to monitor the reproducibility of the measured current. In this case, the control charts are applied to a 20-year period of data acquisition. They detect that the process is not under statistical control. However, these low-frequency fluctuations could well be random colored noise rather than a deterministic drift. This phenomenon appears when measurement values are observed over a very long period, as here. The only conclusion that can be drawn from the control charts is that the data do not fit the Gaussian white noise model and that a type A evaluation of the uncertainty associated with the measurement values (from $t_1$ to $t_N$) cannot be applied. Nevertheless, there is no need to refine the measurement model for the SIR, because the quantity delivered by the system is a ratio of point-in-time current measurements (at a time $t_i$) from which the low-frequency component revealed here is suppressed by the method (Figures 17-20).
We have underlined, through these simulated and real datasets, the limitations of control charts in evaluating the long-term stability of metrology instruments. In this case, time series analysis can be used as a more appropriate tool to address this objective.
An example of time series analysis is introduced below.

Introduction to time series analysis applied to a long-term measurement model
It has been seen that the autocorrelated model introduced in Eq. (10) is appropriate to represent measurement data observed over a long-term period. Eq. (10) is rewritten here as a state-space model in which the measurement is decomposed into:
• the fluctuation of the expectation values $(x_1, \ldots, x_i, \ldots, x_N)$, carrying the low-frequency random noise, at the measurement times $(t_1, \ldots, t_i, \ldots, t_N)$,
• and the measurements gathered in the vector $(y_1, \ldots, y_i, \ldots, y_N)$, to which the Gaussian white noise is finally added.
The model becomes

$$x_i = x_{i-1} + \dot{x}_{i-1}\,(t_i - t_{i-1}), \qquad \dot{x}_i = \dot{x}_{i-1} + \delta_i, \qquad y_i = x_i + \varepsilon_i. \tag{12}$$

The drift velocity $\dot{x}_i$ describes a random walk parametrized by a variance $\tau^2$, i.e. $\delta_i \sim N(0, \tau^2)$, and the measurement noise $\varepsilon_i \sim N(0, \sigma^2)$ is Gaussian white noise with a variance $\sigma^2$. The idea behind time series analysis is to deconvolve the low-frequency fluctuation of the expectation values $(x_1, \ldots, x_i, \ldots, x_N)$ from the observations $(y_1, \ldots, y_i, \ldots, y_N)$. First, the measurement variance $\sigma^2$ is evaluated by using the median of all the pointwise type A evaluations of the standard deviation $s(y_i)$, which are (by definition) isolated from the dynamic component:

$$\sigma = \operatorname{median}\bigl(s(y_1), \ldots, s(y_N)\bigr), \tag{13}$$

with

$$s(y_i) = \sqrt{\frac{1}{n_i - 1}\sum_{j=1}^{n_i}\bigl(y_{i,j} - \bar{y}_i\bigr)^2}, \tag{14}$$

where $(y_{i,1}, \ldots, y_{i,n_i})$ is the set of measurements obtained at the time $t_i$ during a period allowing a type A evaluation according to the Gaussian white noise model. Here, control charts can be used to verify this hypothesis. Eq. (12) is then written in the following matrix form:

$$X_i = A_i\, X_{i-1} + \begin{pmatrix} 0 \\ \delta_i \end{pmatrix}, \qquad y_i = H\, X_i + \varepsilon_i, \tag{15}$$

with the "state vector" $X_i = \begin{pmatrix} x_i \\ \dot{x}_i \end{pmatrix}$, the transition matrix $A_i = \begin{pmatrix} 1 & t_i - t_{i-1} \\ 0 & 1 \end{pmatrix}$, the observation matrix $H = \begin{pmatrix} 1 & 0 \end{pmatrix}$, and the process noise covariance $Q = \operatorname{diag}(0, \tau^2)$.
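As an illustration of Eq. (15), the model matrices could be built as in the sketch below. Placing the whole process noise variance τ² on the velocity component is one common discretization choice consistent with Eq. (12); the function name is illustrative.

```python
import numpy as np

def state_space_matrices(dt, sigma, tau):
    """Matrices of the local linear trend model for one time step of length dt.

    State X_i = (x_i, xdot_i): level and drift velocity.
    """
    A = np.array([[1.0, dt],
                  [0.0, 1.0]])           # transition: the level advances with the velocity
    H = np.array([[1.0, 0.0]])           # only the level x_i is observed
    Q = np.array([[0.0, 0.0],
                  [0.0, tau ** 2]])      # process noise acts on the velocity only
    R = np.array([[sigma ** 2]])         # white measurement noise variance
    return A, H, Q, R
```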

The Kalman filter
Given the Gaussianity of the noises (both the dynamic noise $\delta_i$ and the measurement noise $\varepsilon_i$) of this linear model (in the time series sense), the Kalman filter (KF) provides an optimal estimation of the states $X_i$ given the current and past measurements $y_{1:i}$. The optimality of the KF for solving Eq. (15) is established both through a least-squares interpretation and through probabilistic Bayesian analysis [14, 15]. Hence, the KF is a recursive Bayesian filter in which the states $X_i$ and the measurements $y_i$ are Markov processes, so that the KF evaluates, at each time step $i$, the parameters of the Gaussian posterior distribution $p(X_i \mid y_{1:i})$ in two steps:
• The first step consists of predicting the state $X_i$ given the past measurement data $y_{1:i-1}$. The posterior distribution $p(X_{i-1} \mid y_{1:i-1})$ at $t_{i-1}$ is used to build a prior distribution at the time $t_i$ according to the linear model (see Eq. (15)) from $t_{i-1}$ to $t_i$.
• The second step updates the posterior distribution at $t_i$ given the predicted prior $p(X_i \mid y_{1:i-1})$ and the likelihood $p(y_i \mid X_i)$ of the measurement $y_i$ given the estimate $X_i$.
The posterior state estimate $\hat{X}_{i|i}$ and the posterior covariance $P_{i|i}$ of the distribution $p(X_i \mid y_{1:i})$ are evaluated through the Kalman filtering process described below.
From $i = 1$ to $i = N$, the KF predicts the state $X_i$ and the measurement $y_i$ at the time step $i$ given the previous state at $i-1$:

$$\hat{X}_{i|i-1} = A_i\, \hat{X}_{i-1|i-1}, \qquad \hat{y}_{i|i-1} = H\, \hat{X}_{i|i-1},$$

with the state and measurement covariances $P_{i|i-1}$ and $S_i$ such that

$$P_{i|i-1} = A_i\, P_{i-1|i-1}\, A_i^{\mathsf T} + Q, \qquad S_i = H\, P_{i|i-1}\, H^{\mathsf T} + \sigma^2.$$

Then, the pre-fit residual (innovation) $e_i$ of the estimation and the Kalman gain $K_i$ are calculated by

$$e_i = y_i - \hat{y}_{i|i-1}, \qquad K_i = P_{i|i-1}\, H^{\mathsf T} S_i^{-1}.$$

Finally, the updated state estimate is

$$\hat{X}_{i|i} = \hat{X}_{i|i-1} + K_i\, e_i, \qquad P_{i|i} = (I - K_i H)\, P_{i|i-1},$$

with $I$ the identity matrix.
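A minimal, self-contained sketch of this forward pass is given below; the initialization of the state and of its covariance from the first measurement is an assumption, as the chapter does not specify it.

```python
import numpy as np

def kalman_filter(y, t, sigma, tau, x0=None):
    """Forward Kalman filter for the local linear trend model of Eq. (15).

    y, t  : measurements and their times (1-D arrays of equal length).
    sigma : standard deviation of the white measurement noise.
    tau   : standard deviation of the random walk on the drift velocity.
    Returns filtered means/covariances and one-step predicted
    means/covariances (the latter are reused by the RTS smoother).
    """
    y, t = np.asarray(y, float), np.asarray(t, float)
    n = y.size
    H = np.array([[1.0, 0.0]])
    R = np.array([[sigma ** 2]])

    X = np.array([y[0], 0.0]) if x0 is None else np.asarray(x0, float)
    P = np.diag([sigma ** 2, tau ** 2])        # loose initial covariance (assumption)

    Xf = np.zeros((n, 2)); Pf = np.zeros((n, 2, 2))
    Xp = np.zeros((n, 2)); Pp = np.zeros((n, 2, 2))

    for i in range(n):
        if i == 0:
            Xpred, Ppred = X, P                # no prediction before the first point
        else:
            dt = t[i] - t[i - 1]
            A = np.array([[1.0, dt], [0.0, 1.0]])
            Q = np.array([[0.0, 0.0], [0.0, tau ** 2]])
            Xpred = A @ Xf[i - 1]
            Ppred = A @ Pf[i - 1] @ A.T + Q
        Xp[i], Pp[i] = Xpred, Ppred

        e = y[i] - (H @ Xpred)[0]              # innovation (pre-fit residual)
        S = (H @ Ppred @ H.T + R)[0, 0]        # innovation variance
        K = (Ppred @ H.T)[:, 0] / S            # Kalman gain
        Xf[i] = Xpred + K * e                  # updated state estimate
        Pf[i] = (np.eye(2) - np.outer(K, H[0])) @ Ppred
    return Xf, Pf, Xp, Pp
```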

The Rauch-Tung-Striebel smoother
The KF estimates the states $\hat{X}_{i|1:i}$ given the present and past observations from $t_1$ to $t_i$. To refine the estimation, a KF backward pass (smoothing pass) processes the filtered estimates from $t_N$ back to $t_1$. This procedure is called the Rauch-Tung-Striebel (RTS) smoother [16, 17]. Symmetrically to the forward KF optimal filter, the optimal smoothing also consists of:
• a prediction step, where the state $X_{i+1}$ is estimated given the measurement data $y_{1:i}$ and the forecasted state transition of the linear model between $t_i$ and $t_{i+1}$,
• an update step, which refines the posterior distribution at $t_i$ given all the data, $p(X_i \mid y_{1:N})$.
The RTS smoother calculates the maximum a posteriori (MAP) estimate $\hat{X}_{i|1:N}$ of the posterior probability $p(X_i \mid y_{1:N})$ given all the observations (past, present, and future) from $t_1$ to $t_N$.
From $i = N$ down to $i = 1$ (initializing the recursion with $\hat{X}_{N|1:N} = \hat{X}_{N|N}$), the residual $e'_i$ between the smoothed state $\hat{X}_{i+1|1:N}$ and the prediction $\hat{X}_{i+1|i}$ is used to calculate a new gain $G_i$, such that

$$e'_i = \hat{X}_{i+1|1:N} - \hat{X}_{i+1|i}, \qquad G_i = P_{i|i}\, A_{i+1}^{\mathsf T}\, P_{i+1|i}^{-1}.$$

The optimal state estimate is finally given by

$$\hat{X}_{i|1:N} = \hat{X}_{i|i} + G_i\, e'_i, \qquad P_{i|1:N} = P_{i|i} + G_i\,\bigl(P_{i+1|1:N} - P_{i+1|i}\bigr)\, G_i^{\mathsf T}.$$
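A sketch of this backward pass is shown below; it consumes the filtered and predicted quantities returned by the kalman_filter sketch above, so it is not a stand-alone implementation of the chapter's code.

```python
import numpy as np

def rts_smoother(t, Xf, Pf, Xp, Pp):
    """Backward Rauch-Tung-Striebel pass over the Kalman filter output.

    t        : measurement times.
    Xf, Pf   : filtered state means and covariances from the forward pass.
    Xp, Pp   : one-step predicted means and covariances from the forward pass.
    Returns the smoothed means and covariances given all the data y_{1:N}.
    """
    t = np.asarray(t, float)
    n = Xf.shape[0]
    Xs, Ps = Xf.copy(), Pf.copy()              # the last point is already smoothed
    for i in range(n - 2, -1, -1):
        dt = t[i + 1] - t[i]
        A = np.array([[1.0, dt], [0.0, 1.0]])
        G = Pf[i] @ A.T @ np.linalg.inv(Pp[i + 1])       # smoother gain
        Xs[i] = Xf[i] + G @ (Xs[i + 1] - Xp[i + 1])      # correct using the future
        Ps[i] = Pf[i] + G @ (Ps[i + 1] - Pp[i + 1]) @ G.T
    return Xs, Ps
```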

The auto-tuning of the RTS smoother
As the measurement noise variance $\sigma^2$ is evaluated, by definition, from measurements repeated during a period free from long-term fluctuation (Eq. (14)), the variance $\tau^2$ of the dynamic noise is the remaining unknown parameter of the model. The auto-tuning procedure proposed here to evaluate $\tau$ starts with a loss function $L$ defined by

$$L(\tau) = \sum_{i=2}^{N} \bigl(y_i - \hat{y}_{i|i-1}\bigr)^2 + \sum_{i=2}^{N} \bigl(\hat{x}_{i|1:N} - \hat{x}_{i-1|1:N}\bigr)^2.$$

The value $\hat{\tau}$ minimizing $L(\tau)$ minimizes both the residual between the measurements $y_i$ and the predicted measurements $\hat{y}_{i|i-1}$ and the distance between two consecutive smoothed estimates $\hat{x}_{i|1:N}$ and $\hat{x}_{i-1|1:N}$. On the one hand, if the smoothing constraint is not considered, $\tau$ diverges toward high values, describing a totally unstable system. On the other hand, if the prediction error of the model is not considered, $\tau$ is driven to zero, describing a stable system with constant $\hat{x}_{i|i}$ values. This loss function therefore produces a measurement model of a controlled process containing a low-frequency dynamic noise. The variance of the low-frequency dynamic noise is determined through an iterative procedure minimizing $L(\tau)$, such that

$$\hat{\tau} = \underset{\tau}{\arg\min}\, L(\tau).$$

The auto-tuned RTS smoother provides smoothed values $\hat{x}_{i|1:N}$ from which the measurement noise $\varepsilon_i$ has been removed, given the knowledge of the variances of the measurement noise $\sigma^2$ (see Eqs. (13) and (14)) and of the dynamic noise $\hat{\tau}^2$. The global hidden uncertainty component arising from the long-term random process noise can then be evaluated from the deconvolved data $\hat{x}_{i|1:N}$, such that

$$s\bigl(\hat{x}_{i|1:N}\bigr) = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\Bigl(\hat{x}_{i|1:N} - \overline{\hat{x}}_{1:N}\Bigr)^2}.$$

Thanks to this intrinsic unfolding of the measurement noise (Gaussian white) and the process noise (Gaussian red) defined in the dynamic model (Eq. (12)), the standard uncertainty $u(x_{N+1})$ of a new measurement at $t_{N+1}$ can be obtained by the quadratic summation of the type A uncertainty evaluation $s(y_{N+1})$ and of the type B uncertainty evaluation obtained from all the historical data $(y_1, \ldots, y_i, \ldots, y_N)$ through the auto-tuned RTS smoother described above.
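A sketch of this auto-tuning is given below. It reuses the kalman_filter and rts_smoother sketches above, replaces the iterative minimization by a simple grid search, and assumes an unweighted sum of the two terms of the loss function, which the chapter does not specify.

```python
import numpy as np

def loss(tau, y, t, sigma):
    """L(tau): filter prediction error plus smoothness of the RTS estimate.

    The equal weighting of the two terms is an assumption of this sketch.
    """
    Xf, Pf, Xp, Pp = kalman_filter(y, t, sigma, tau)
    Xs, _ = rts_smoother(t, Xf, Pf, Xp, Pp)
    pred_err = np.sum((y[1:] - Xp[1:, 0]) ** 2)      # y_i vs. y_hat_{i|i-1}
    smooth_err = np.sum(np.diff(Xs[:, 0]) ** 2)      # x_hat_{i|1:N} vs. x_hat_{i-1|1:N}
    return pred_err + smooth_err

def auto_tune(y, t, sigma, taus=np.logspace(-4, 1, 60)):
    """Grid search for the tau minimizing L(tau), standing in for the
    iterative minimization described in the text."""
    losses = [loss(tau, y, t, sigma) for tau in taus]
    return taus[int(np.argmin(losses))]
```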
Therefore, the squared standard uncertainty $u^2(x_{N+1})$ of the next measurement at $t_{N+1}$ should be inflated to account for the impact of the dynamic noise, such that

$$u^2(x_{N+1}) = s^2(y_{N+1}) + s^2\bigl(\hat{x}_{i|1:N}\bigr).$$

This auto-tuning of an RTS smoother applied to measurement data recorded over a long-term period fully addresses the recommendation of the JCGM 100:2008 guide [4]: "Because the mathematical model may be incomplete, all relevant quantities should be varied to the fullest practicable extent so that the evaluation of uncertainty can be based as much as possible on observed data. Whenever feasible, the use of empirical models of the measurement founded on long-term quantitative data, and the use of check standards and control charts that can indicate if a measurement is under statistical control, should be part of the effort to obtain reliable evaluations of uncertainty. The mathematical model should always be revised when the observed data, including the result of independent determinations of the same measurand, demonstrate that the model is incomplete. A well-designed experiment can greatly facilitate reliable evaluations of uncertainty and is an important part of the art of measurement."

Application of time series analysis on simulated and real measurement data
The time series analysis through the auto-tuned RTS smoother is presented in Figure 21 for the simulated dataset of Section 3.3, obtained with the autocorrelated model. Black points are the measurement points $y_i$, with error bars representing the standard deviation $s(y_i)$. Green points are the estimates of the low-frequency fluctuation (red noise) of the measurement, $\hat{x}_i$, with error bars representing the standard deviation $s(\hat{x}_i)$. The red line is the global expectation value, the thickness of the line representing ±1 standard deviation.
The good deconvolution of the low-frequency random noise can be seen in the green points. It leads to a standard uncertainty from this autocorrelated noise of $s(\hat{x}_{i|1:N}) = 1.31$. This value is significant when compared to the type A evaluation of the individual measurement points, $\sigma^2 = 1.74$. The time series analysis through the auto-tuned RTS smoother is also presented in Figure 22, applied to the periodic control of the SIR metrology system; the plotting conventions are the same as in Figure 21.
The analysis using control charts (see Section 4.2) led to the conclusion that the system is not under statistical control, although autocorrelated noise is expected for such a long-term monitoring period. The optimal Bayesian smoother properly deconvolves this component (see the green points) and evaluates the standard uncertainty associated with it: $s(\hat{x}_{i|1:N}) = 0.0183$. This value is significant when compared to the type A evaluation of the individual measurement points, $\sigma^2 = 0.025$.

Conclusion
It has been emphasized through simulated and experimental data that control charts (Shewhart, CUSUM, and EWMA charts) are convenient tools to assess whether a measurement system is under statistical control. They test the compliance of the measured dataset with a Gaussian white noise model. They can notably be used to determine whether a type A evaluation of uncertainty can be applied to the measurement data.
However, care should be taken when monitoring over long periods of time. A low-frequency random noise will inevitably appear. Despite its stochastic nature, it will trigger detections by control charts. In this case, it is not appropriate to conclude that the system is out of control, but rather to re-evaluate the measurement model to include this long-term component. To that end, time series analysis can be implemented to better deal with this particular case of measurement data.
An auto-tuned Rauch-Tung-Striebel smoother, based on the optimal Bayesian filter known as the Kalman filter, has been introduced to illustrate how time series analysis can help deconvolve the components of the long-term measurement model. The low-frequency random fluctuation (also known as red noise) can be estimated and used to evaluate the standard uncertainty component it induces on the measurement.