This chapter presents and explains the most used methodologies for the evaluation of measurement uncertainty in metrology with practical examples. The main topics are basic concepts and importance, existing documentation, the harmonized methodology of the Guide to the Expression of Uncertainty in Measurement, types of uncertainty, modeling of measurement systems, use of alternative methods (including the GUM supplement 1 Monte Carlo numerical method), evaluation of uncertainty for calibration curves, correlated uncertainties, uncertainties arising from the calibration of instruments, and the main proposals for the new revised GUM. The chapter also discusses the GUM as a tool for the technical management of measurement processes.
Measurement uncertainty is a quantitative indication of the quality of measurement results, without which they could not be compared with one another, with specified reference values, or with a standard. Uncertainty evaluation is essential to guarantee the metrological traceability of measurement results and to ensure that they are accurate and reliable. In addition, measurement uncertainty must be considered whenever a decision has to be taken based on measurement results, such as in accept/reject or pass/fail processes.
Considering the context of globalization of markets, it is necessary to adopt a universal procedure for evaluating uncertainty of measurements, in view of the need for comparability of results between nations and for mutual recognition in metrology. As an example, laboratories accredited under the ISO/IEC 17025:2017 standard need to demonstrate their technical competence and the ability to properly operate their management systems, and so they are required to evaluate the uncertainty of their measurement results.
In addition, the use of uncertainty evaluation methods as a tool for technical management of measurement processes is extremely important to reduce, for example, the large losses that occur in industry, which can be highly significant in relation to the gross domestic product (GDP) of some countries. One of the probable causes of this waste is the use of instruments whose accuracy is inadequate for the tolerance of a given measurement process.
In this chapter, detailed steps for uncertainty evaluation are given.
2. Main references for uncertainty evaluation
In order to harmonize the uncertainty evaluation process for every laboratory, the Bureau International des Poids et Mesures (BIPM) published in 1980 the Recommendation INC-1 on how to express uncertainty in measurement. This document was further developed and gave rise to the “Guide to the Expression of Uncertainty in Measurement” (GUM) in 1993, which was revised in 1995 and lastly in 2008. This document provides complete guidance and references on how to treat common situations in metrology and how to deal with uncertainties in metrology. Currently, it is published by the International Organization for Standardization (ISO) as the ISO/IEC Guide 98-3 “Uncertainty of measurement - Part 3: Guide to the expression of uncertainty in measurement” (GUM), and by the Joint Committee for Guides in Metrology (JCGM) as the JCGM 100:2008 guide. The JCGM was established by BIPM to maintain and further develop the GUM. It is currently producing a series of documents and supplements to accompany the GUM, four of which are already published [4, 5, 6, 7].
Evaluation of uncertainty, as presented by the JCGM 100:2008, is based on the law of propagation of uncertainties (LPU). This methodology has been successfully applied for several years worldwide for a range of different measurement systems and is currently the most used procedure for uncertainty evaluation in metrology. However, around its twentieth anniversary in 2013, the JCGM decided to revise it again [8, 9, 10]. In this new revision, uncertainty terms and concepts will be aligned with the current International Vocabulary of Metrology (VIM) and with the new GUM supplements [5, 6]. Aspects such as a new Bayesian approach, the redefinition of coverage intervals, and the elimination of the Welch-Satterthwaite formula to evaluate the effective degrees of freedom will be covered. In late 2014, a first draft of the newly revised version of the GUM was circulated among National Metrology Institutes. Remarkable changes were made that could affect the way laboratories deal with measurement uncertainty results. This revision is still being discussed, and some information about it has also been released elsewhere.
In the field of analytical chemistry, another document worth mentioning is the “Quantifying Uncertainty in Analytical Measurement” guide, produced by a joint EURACHEM/CITAC Measurement Uncertainty Working Group. This document was first published in 1995 and further revised in 2000. The 2000 edition was widely implemented and is among the most highly cited publications in the chemical metrology area. A new revised edition was published in 2012 with improved content and added information on developments in uncertainty evaluation. This document basically presents the uncertainty evaluation process following the suggestions of the GUM, but also contains several examples in the analytical chemistry area.
3. Using the GUM approach on uncertainty evaluation
The following main steps summarize the methodology presented by the GUM.
3.1. Definition of the measurand and of input quantities
It must be clear to the analyst which quantity will be the final object of the measurement in question. This quantity is known as the measurand. In addition, it is important to identify all the variables that directly or indirectly influence the measurand. These variables are known as the input quantities. As an example, Eq. (1) shows a measurand $y$ as a function of three different input quantities, $x_1$, $x_2$, and $x_3$:

$$y = f(x_1, x_2, x_3) \tag{1}$$
3.2. Modeling the measurement process
In this step, the measurement procedure should be modeled in order to have a functional relationship expressing the measurand as a result of all the input quantities. The measurand in Eq. (1) could be modeled, for example, as in Eq. (2)
The modeling step is critical for the uncertainty evaluation process as it defines how the input quantities impact the measurand. The better the model is defined, the better its representation of reality will be, including all the sources that impact the measurand on the uncertainty evaluation. The modeling process can be easily visualized by using a cause-effect diagram (Figure 1).
Example: To illustrate these steps, let us consider a measurement model for a torque test. Torque is a quantity that represents the tendency of a force to rotate an object about an axis. It can be mathematically expressed as the product of a force and the lever-arm distance. In metrology, a practical way to measure it is by loading a known mass to the end of a horizontal arm while keeping the other end fixed (Figure 2).
Note: This example is also presented, with a few adaptations, in other publications by the same authors.
A simple model that describes this experiment can be expressed as follows:

$$T = m\,g\,L \tag{3}$$

where $T$ is the torque (N m), $m$ is the mass of the applied load (kg), $g$ is the local gravity acceleration (m/s²), and $L$ is the total length of the arm (m). Thus, $m$, $g$, and $L$ are the input quantities for this model. This example will be further discussed in the subsections ahead.
3.3. Evaluating the uncertainties of the input quantities
This step is also of great importance. Here, uncertainties for all the input quantities are individually evaluated. The GUM classifies uncertainty sources as being of two main types: Type A, which usually originates from some statistical analysis, such as the standard deviation obtained in a repeatability study; and Type B, which is determined from any other source of information, such as a calibration certificate or deduced from personal experience.
Type A uncertainties from repeatability studies are evaluated by the GUM as the standard deviation of the mean obtained from the repeated measurements. For example, if a set of $n$ indications $x_i$ of a quantity $x$ is available, the uncertainty due to repeatability of the measurements can be expressed by $s(\bar{x})$, as shown in Eq. (4):

$$u(x) = s(\bar{x}) = \frac{s(x)}{\sqrt{n}}, \qquad s(x) = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \tag{4}$$

where $\bar{x}$ is the mean value of the repeated measurements, $s(x)$ is its standard deviation, and $s(\bar{x})$ is the standard deviation of the mean. As such, the statistical distribution associated with this input source is considered to be normal or Gaussian.
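As a minimal sketch of the Type A evaluation described above, the following Python snippet computes the mean, the sample standard deviation, and the standard deviation of the mean for a set of repeated indications. The function name and the readings are hypothetical and serve only as an illustration.

```python
import math
import statistics


def type_a_uncertainty(indications):
    """Return (mean, sample standard deviation, standard deviation of the mean)."""
    n = len(indications)
    if n < 2:
        raise ValueError("at least two indications are needed")
    mean = statistics.mean(indications)
    s = statistics.stdev(indications)  # sample standard deviation (n - 1 in the denominator)
    u = s / math.sqrt(n)               # standard deviation of the mean
    return mean, s, u


# Hypothetical repeated indications (illustrative values only)
readings = [10.1, 10.3, 9.9, 10.0, 10.2]
mean, s, u = type_a_uncertainty(readings)
print(f"mean = {mean:.3f}, s = {s:.4f}, u = {u:.4f}")
```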
Note: This evaluation is not consistent with the GUM supplement 1, where repeated indications are treated as Student’s t-distributions to account for the lack of degrees of freedom or a low number of indications. In this way, the new proposal for the draft GUM is to consider repeated indications as t-distributions, just like in supplement 1. Therefore, its uncertainty would be evaluated as in Eq. (5). This equation takes the degrees of freedom for the indications ($\nu = n - 1$) into account, raising the uncertainty for a low number of indications. This correction would then be in accordance with the approach suggested by the other GUM supplements for this type of uncertainty.

$$u(x) = \sqrt{\frac{n-1}{n-3}}\,\frac{s(x)}{\sqrt{n}} = \sqrt{\frac{\nu}{\nu-2}}\,\frac{s(x)}{\sqrt{n}} \tag{5}$$
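The effect of treating repeated indications as a scaled and shifted t-distribution can be sketched as follows. The snippet assumes the standard deviation of that t-distribution, that is, the standard deviation of the mean multiplied by the square root of (n − 1)/(n − 3), which requires at least four indications. The function name and the numerical inputs are illustrative assumptions.

```python
import math


def type_a_uncertainty_t(s, n):
    """Standard uncertainty of the mean of n indications under a t-distribution.

    s is the sample standard deviation of the indications.
    """
    nu = n - 1  # degrees of freedom
    if nu <= 2:
        raise ValueError("the t-based standard deviation needs n >= 4")
    return math.sqrt(nu / (nu - 2)) * s / math.sqrt(n)


# Illustrative values: 10 indications with a standard deviation of 0.0003 (any unit)
s, n = 0.0003, 10
u_gum = s / math.sqrt(n)           # standard deviation of the mean
u_t = type_a_uncertainty_t(s, n)   # inflated to account for the low number of indications
print(f"u (mean) = {u_gum:.3e}, u (t-based) = {u_t:.3e}")
```

The ratio between the two values tends to 1 as the number of indications grows, so the correction matters mainly for small samples.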
It is important to note that the evaluation of uncertainties of Type B input sources must be based on careful analysis of observations or on sound scientific judgment, using all available information about the measurement procedure. This uncertainty type is generally used when repeated experiments are not possible, not available, or would be too costly or time-consuming. In this case, the GUM also suggests the use of two more types of statistical distributions: the uniform and the triangular distributions.
The uniform distribution should be used when only a range of values is available, that is, an interval with the minimum and maximum values, and no detailed information about the probability of values within this interval is available. The standard uncertainty associated with such an interval is given by Eq. (6):

$$u(x) = \frac{b - a}{\sqrt{12}} \tag{6}$$

where $b$ is the maximum and $a$ is the minimum value of the range. For example, if the only information about the room temperature of a laboratory is that it lies between a lower limit $a$ and an upper limit $b$ (in °C), the standard uncertainty associated with the room temperature would be evaluated as $(b - a)/\sqrt{12}$ °C.
The triangular distribution can be used when there is strong evidence that the most probable value lies in the middle of a given interval, but still without knowing exactly how this probability behaves within the interval. In chemistry, for example, the uncertainty associated with the volume of a measuring flask could be evaluated by a triangular distribution. The standard uncertainty for a triangular distribution is given by Eq. (7):

$$u(x) = \frac{a}{\sqrt{6}} \tag{7}$$

where $a$ here denotes the semi-interval (half-width) of the total range of the triangular distribution.
Another common Type B source of uncertainty is due to calibration certificates, related to a standard or to a calibrated instrument. In this case, the standard uncertainty to be used is normally obtained by dividing the expanded uncertainty $U$ by the coverage factor $k$, both provided by the calibration certificate (Eq. (8)):

$$u(x) = \frac{U}{k} \tag{8}$$
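The three Type B evaluations described above (uniform distribution, triangular distribution, and calibration certificate) can be sketched as small helper functions. The function names and the numerical inputs are hypothetical and only illustrate the arithmetic.

```python
import math


def u_uniform(a, b):
    """Standard uncertainty for a uniform distribution between a (min) and b (max)."""
    return (b - a) / math.sqrt(12)


def u_triangular(half_width):
    """Standard uncertainty for a symmetric triangular distribution of given half-width."""
    return half_width / math.sqrt(6)


def u_certificate(expanded_uncertainty, coverage_factor):
    """Standard uncertainty from a calibration certificate (U divided by k)."""
    return expanded_uncertainty / coverage_factor


# Illustrative inputs (hypothetical values)
u_rect = u_uniform(-0.5, 0.5)     # interval of +/- 0.5
u_tri = u_triangular(0.5)         # same half-width, triangular shape
u_cert = u_certificate(0.1, 2.0)  # U = 0.1 with k = 2
print(f"{u_rect:.4f} {u_tri:.4f} {u_cert:.4f}")
```

For the same half-width, the triangular distribution gives a smaller standard uncertainty than the uniform one, which reflects the additional information that values near the center are more probable.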
Example: Returning to the example of torque measurement and considering the model defined in Eq. (3), the following sources of uncertainty are considered:
Mass ($m$). The mass was repeatedly measured 10 times in a calibrated balance. The average mass was 35.7653 kg, with a standard deviation of 0.3 g. This source of uncertainty is purely statistical and is classified as being of Type A according to the GUM. The standard uncertainty ($u_{m,\mathrm{rep}}$) that applies in this case is obtained by Eq. (4), that is, $u_{m,\mathrm{rep}} = 0.0003/\sqrt{10} = 9.49 \times 10^{-5}$ kg.
In addition, the balance used for the measurement has a certificate stating an expanded uncertainty for this range of mass of $U_m$ = 0.1 g, with a coverage factor $k$ = 2 and a coverage probability of 95%. The uncertainty of the mass due to the calibration of the balance constitutes another source of uncertainty involving the same input quantity (mass). In this case, the standard uncertainty ($u_{m,\mathrm{cert}}$) is calculated by using Eq. (8), that is, $u_{m,\mathrm{cert}} = 0.0001/2 = 0.00005$ kg.
Local gravity acceleration ($g$). The value for the local gravity acceleration is stated in a certificate of measurement as 9.80665 m/s², together with its expanded uncertainty of $U_g$ = 0.00002 m/s², for $k$ = 2 and $p$ = 95%. Again, Eq. (8) is used to calculate the standard uncertainty ($u_g$), that is, $u_g = 0.00002/2 = 0.00001$ m/s².
Length of the arm ($L$). Let us suppose that in this hypothetical case, the arm used in the experiment has no calibration certificate stating its length and uncertainty, and that the only measuring method available for the arm’s length is a ruler with a minimum division of 1 mm. The use of the ruler then leads to a measurement value of 2000.0 mm for the length of the arm. However, in this situation, very poor information about the measurement uncertainty of the arm’s length is available. As the minimum division of the ruler is 1 mm, one can assume that the reading can be done with a maximum accuracy of up to 0.5 mm, which can be thought of as an interval of ±0.5 mm as limits for the measurement. As no information about the probabilities within this interval is available, the assumption of a uniform distribution is the best option, in which there is equal probability for the values within the whole interval. Thus, Eq. (6) is used to determine the standard uncertainty ($u_L$), that is, $u_L = (0.5 - (-0.5))/\sqrt{12} = 0.29$ mm.
In practice, one can imagine several more sources of uncertainty for this experiment, such as the thermal expansion of the arm as the room temperature changes. However, the objective here is not to exhaust all the possibilities, but instead to provide basic notions of how to implement the methodology of the GUM on a simple model.
3.4. Propagation of uncertainties
3.4.1. The law of propagation of uncertainties
The GUM uncertainty approach is based on the law of propagation of uncertainties (LPU). This methodology encompasses a set of approximations to simplify the calculations and is valid for a range of simplistic models.
According to the LPU, the propagation of uncertainties is accomplished by expanding the measurand model in a Taylor series and simplifying the expression by considering only the first-order terms. This approximation is viable as uncertainties are very small numbers compared with the values of their corresponding quantities. In this way, the treatment of a model where the measurand $y$ is expressed as a function of $N$ variables $x_1$, …, $x_N$ (Eq. (9)) leads to the general expression for the propagation of uncertainties shown in Eq. (10):

$$y = f(x_1, \ldots, x_N) \tag{9}$$

$$u_c^2(y) = \sum_{i=1}^{N}\left(\frac{\partial f}{\partial x_i}\right)^2 u^2(x_i) + 2\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\frac{\partial f}{\partial x_i}\,\frac{\partial f}{\partial x_j}\,u(x_i, x_j) \tag{10}$$

where $u_c(y)$ is the combined standard uncertainty for the measurand, $u(x_i)$ is the uncertainty for the $i$th input quantity, and $u(x_i, x_j) = u(x_i)\,u(x_j)\,r(x_i, x_j)$ is the covariance between $x_i$ and $x_j$. The second term of Eq. (10) is due to the correlation between the input quantities. If there is no supposed correlation between them, Eq. (10) can be further simplified as

$$u_c^2(y) = \sum_{i=1}^{N}\left(\frac{\partial f}{\partial x_i}\right)^2 u^2(x_i) \tag{11}$$

The partial derivatives of Eq. (11) are known as sensitivity coefficients and describe how the output estimate $y$ varies with changes in the values of the input estimates $x_i$. They also convert the units of the inputs to the unit of the measurand.
Another important observation regarding the sensitivity coefficient occurs when the mathematical model that defines the measurand does not contemplate a given quantity, known as influence quantity. In this case, the determination of the sensitivity coefficient of the measurand in relation to the input quantity must be done experimentally. For example, biodiesel is susceptible to oxidation when exposed to air, and this oxidation process affects fuel quality. The oxidation time is determined by measuring the conductivity of an oil sample when inflated with air at a given flow rate. There are a number of influence quantities that impact the oxidation time of biodiesel such as temperature, air flow, conductivity, sample mass, and so on. In this case, the sensitivity coefficients for oxidation time with respect to each of these influence quantities are determined from an interpolation function obtained with experimental data. For example, Figure 3 presents the table and its resulting graph, which shows the model of the function that relates the oxidation time to the temperature of a biofuel sample (case study of the authors).
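Example: Returning to the torque measurement example and assuming that all the input quantities are independent, the combined standard uncertainty for the torque is calculated by using the LPU (Eq. (11)). The final expression is then

$$u_c(T) = \sqrt{(gL)^2\,u_{m,\mathrm{rep}}^2 + (gL)^2\,u_{m,\mathrm{cert}}^2 + (mL)^2\,u_g^2 + (mg)^2\,u_L^2} \tag{12}$$

where $T$ is the torque, $m$ the mass, $g$ the local gravity acceleration, and $L$ the arm length; $u_{m,\mathrm{rep}}$ and $u_{m,\mathrm{cert}}$ are the standard uncertainties of the mass due to repeatability and to the balance certificate, and $u_g$ and $u_L$ are the standard uncertainties of the local gravity acceleration and of the arm length.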
It is important to note that the terms (not squared) of Eq. (12), that is, each sensitivity coefficient multiplied by its corresponding uncertainty, are known as uncertainty components. These components can be compared to each other as they are in the same units of the measurand. Figure 4 shows the comparison between the uncertainty components for the torque measurement model.
As can be noted, the dominant uncertainty component is due to the uncertainty associated with the measurement of the arm length, which was taken as the resolution of the non-calibrated ruler used in the measurement. This analysis shows to the analyst that, to reduce the final uncertainty and improve the measurement system, a calibrated ruler, with a better uncertainty, should be used. This represents the importance of the GUM as a management tool to the measurement process.
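The uncertainty budget for the torque model can be sketched in a few lines of Python. The snippet takes the input values and standard uncertainties as quoted in the example, uses the analytical sensitivity coefficients of a product model (torque equal to mass times gravity times arm length), and assumes independent inputs. The variable names are illustrative.

```python
import math

# Input estimates as quoted in the example (SI units)
m, g, L = 35.7653, 9.80665, 2.0000  # kg, m/s^2, m

# Standard uncertainties of the input sources
u_m_rep = 0.0003 / math.sqrt(10)               # Type A: repeatability of the mass (kg)
u_m_cert = 0.0001 / 2                          # Type B: balance certificate, U/k (kg)
u_g = 0.00002 / 2                              # Type B: gravity certificate, U/k (m/s^2)
u_L = (0.0005 - (-0.0005)) / math.sqrt(12)     # Type B: uniform, +/- 0.5 mm (m)

# Sensitivity coefficients (partial derivatives of the product m * g * L)
c_m, c_g, c_L = g * L, m * L, m * g

# Uncertainty components, all in N m
components = {
    "mass (repeatability)": c_m * u_m_rep,
    "mass (certificate)": c_m * u_m_cert,
    "local gravity": c_g * u_g,
    "arm length": c_L * u_L,
}

# Combined standard uncertainty for uncorrelated inputs
u_c = math.sqrt(sum(c ** 2 for c in components.values()))

for name, value in components.items():
    print(f"{name:22s} {value:.5f} N m")
print(f"combined standard uncertainty: {u_c:.4f} N m")
```

Listing the components side by side makes the dominant source visible at a glance, which is the basis for deciding where an improvement of the measurement system would pay off.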
3.4.2. The Kragten method
The Kragten method is an approximation that facilitates the calculation of the combined uncertainty by using finite differences in place of the derivatives. The approximation is valid when the uncertainties of the inputs are small compared with the values of the respective input quantities; the result then differs from that of the LPU only in decimal places that can be ignored.
Assuming a measurand $y$, which is calculated from the input quantities $x_1$, $x_2$, and $x_3$ according to the mathematical model of Eq. (2), the uncertainties $u(x_1)$, $u(x_2)$, and $u(x_3)$ for the input quantities are evaluated as explained previously. The measurand is then recalculated once for each input quantity ($x_1$, $x_2$, and $x_3$), each time adding the uncertainty of that quantity to its value, as shown in Eqs. (13)–(15):

$$y_1 = f\big(x_1 + u(x_1),\, x_2,\, x_3\big) \tag{13}$$

$$y_2 = f\big(x_1,\, x_2 + u(x_2),\, x_3\big) \tag{14}$$

$$y_3 = f\big(x_1,\, x_2,\, x_3 + u(x_3)\big) \tag{15}$$

The value of the measurand changes from $y$ to $y_i$ due to the addition of the uncertainty to the value of the respective input quantity. Thus, the uncertainty component of each input source in the unit of the measurand is defined by the difference $y_i - y$, according to Eqs. (16)–(18):

$$u_1(y) = y_1 - y \tag{16}$$

$$u_2(y) = y_2 - y \tag{17}$$

$$u_3(y) = y_3 - y \tag{18}$$

Thus, the combined standard uncertainty of $y$ is finally evaluated as

$$u_c(y) = \sqrt{u_1^2(y) + u_2^2(y) + u_3^2(y)} \tag{19}$$

or by Eq. (20), if there are correlated uncertainties

$$u_c(y) = \sqrt{\sum_{i=1}^{3} u_i^2(y) + 2\sum_{i=1}^{2}\sum_{j=i+1}^{3} r_{ij}\,u_i(y)\,u_j(y)} \tag{20}$$

where $r_{ij}$ is the correlation coefficient between $x_i$ and $x_j$.
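A minimal sketch of the Kragten procedure for uncorrelated inputs is given below. The generic `kragten` function and its argument names are hypothetical; as a usage example it is applied to the torque model (mass times gravity times arm length) with the values quoted earlier, where the two mass contributions are first combined in quadrature into a single uncertainty for the mass.

```python
import math


def kragten(model, values, uncertainties):
    """Combined standard uncertainty by finite differences (uncorrelated inputs).

    Each input is shifted by its standard uncertainty in turn; the change in
    the model output is taken as the uncertainty component of that input.
    """
    y = model(*values)
    components = []
    for i, u in enumerate(uncertainties):
        shifted = list(values)
        shifted[i] += u
        components.append(model(*shifted) - y)
    u_c = math.sqrt(sum(c ** 2 for c in components))
    return y, components, u_c


def torque(m, g, L):
    return m * g * L


# Standard uncertainties of the torque example (SI units)
u_m = math.hypot(0.0003 / math.sqrt(10), 0.0001 / 2)  # repeatability and certificate
u_g = 0.00002 / 2
u_L = 0.001 / math.sqrt(12)

y, components, u_c = kragten(torque, (35.7653, 9.80665, 2.0), (u_m, u_g, u_L))
print([round(c, 5) for c in components], round(u_c, 4))
```

Because the torque model is linear in each input taken separately, the finite differences coincide with the analytical LPU components here; for nonlinear models the two approaches differ slightly.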
3.5. Evaluation of the expanded uncertainty
The result provided by Eqs. (10) and (11) corresponds to an interval that contains only one standard deviation (or approximately 68.3% of the measurements for a normal distribution). In order to have a better coverage probability for the result, the GUM approach expands this interval by assuming that the measurand follows the behavior of a Student’s t-distribution. An effective degrees of freedom $\nu_{eff}$ for the t-distribution can be obtained by using the Welch-Satterthwaite formula (Eq. (21)):

$$\nu_{eff} = \frac{u_c^4(y)}{\displaystyle\sum_{i=1}^{N}\frac{u_i^4(y)}{\nu_i}} \tag{21}$$

where $u_i(y)$ is the uncertainty component of the $i$th input quantity (its sensitivity coefficient multiplied by its standard uncertainty) and $\nu_i$ is the degrees of freedom for the $i$th input quantity.
The effective degrees of freedom is used to obtain a coverage factor $k$ that also depends on a chosen coverage probability $p$, which is often 95%. The expanded uncertainty $U$ is then evaluated by multiplying the combined standard uncertainty by the coverage factor $k$, which finally expands it to a coverage interval delimited by a t-distribution with a coverage probability $p$ (Eq. (22)):

$$U = k\,u_c(y) \tag{22}$$
Note: The draft for the new GUM proposal suggests that the final coverage interval cannot be reliably determined if only an expectation $y$ and a standard deviation $u(y)$ are known, mainly if the final distribution deviates significantly from a normal or a t-distribution. Thus, it proposes distribution-free coverage intervals in the form of $y \pm U_p$, with $U_p = k_p\,u(y)$: (a) if no information is known about the final distribution, then a coverage interval for the measurand for a coverage probability of at least $p$ is determined using $k_p = 1/\sqrt{1-p}$. If $p = 0.95$, a coverage interval of $y \pm 4.47\,u(y)$ is evaluated. (b) If it is known that the distribution is unimodal and symmetric about $y$, then $k_p = 2/(3\sqrt{1-p})$ and the coverage interval $y \pm 2.98\,u(y)$ would correspond to a coverage probability of at least $p = 0.95$.
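The distribution-free coverage factors mentioned in the note can be sketched as follows, assuming that case (a) follows from the Chebyshev inequality and case (b) from the Gauss inequality for unimodal symmetric distributions. The function names are hypothetical.

```python
import math


def k_chebyshev(p):
    """Coverage factor valid for any distribution with finite variance."""
    return 1.0 / math.sqrt(1.0 - p)


def k_gauss(p):
    """Coverage factor for unimodal symmetric distributions (valid for p >= 2/3)."""
    return 2.0 / (3.0 * math.sqrt(1.0 - p))


p = 0.95
print(round(k_chebyshev(p), 2))  # 4.47
print(round(k_gauss(p), 2))      # 2.98
```

Both factors are considerably larger than the value of about 2 obtained when a normal distribution is assumed for the same coverage probability, which is why distribution-free intervals are wider.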
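Example: The effective degrees of freedom for the torque measurement example is calculated using Eq. (21). As the number of degrees of freedom for Type B uncertainties is considered infinite, only Type A uncertainties are accounted for. In this case,

$$\nu_{eff} = \frac{u_c^4(T)}{\dfrac{\left(g\,L\,u_{m,\mathrm{rep}}\right)^4}{\nu_{m,\mathrm{rep}}}}, \qquad \nu_{m,\mathrm{rep}} = 10 - 1 = 9 \tag{23}$$

where $g\,L\,u_{m,\mathrm{rep}}$ is the uncertainty component due to the repeatability of the mass. Since this component is much smaller than the combined uncertainty, $\nu_{eff}$ is a very large number.
Using t-distribution tables, the coverage factor for this value of $\nu_{eff}$ and $p$ = 95% is $k$ = 1.96. Therefore, the expanded uncertainty is calculated as $U = k\,u_c(T) \approx 0.2$ N m, and the measurement result is expressed as 668.0 ± 0.2 N m. The GUM recommends that the final uncertainty should be expressed with one or at most two significant digits.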
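The Welch-Satterthwaite evaluation for the torque example can be sketched as below. The uncertainty components are recomputed from the input values quoted in the example, Type B sources are given infinite degrees of freedom, and the coverage factor of 1.96 is taken from the text rather than computed from a t-distribution. The function name is hypothetical.

```python
import math


def effective_dof(components, dofs):
    """Welch-Satterthwaite effective degrees of freedom.

    components: uncertainty components in the unit of the measurand
    dofs: degrees of freedom of each component (math.inf for Type B sources)
    """
    u_c4 = sum(c ** 2 for c in components) ** 2
    denominator = sum(c ** 4 / nu for c, nu in zip(components, dofs))
    return math.inf if denominator == 0 else u_c4 / denominator


m, g, L = 35.7653, 9.80665, 2.0000
components = [
    g * L * 0.0003 / math.sqrt(10),  # mass, repeatability (Type A, 9 degrees of freedom)
    g * L * 0.0001 / 2,              # mass, certificate (Type B)
    m * L * 0.00002 / 2,             # local gravity (Type B)
    m * g * 0.001 / math.sqrt(12),   # arm length (Type B)
]
dofs = [9, math.inf, math.inf, math.inf]

nu_eff = effective_dof(components, dofs)
u_c = math.sqrt(sum(c ** 2 for c in components))
k = 1.96          # coverage factor for p = 95 % and very large nu_eff, as in the text
U = k * u_c
print(f"nu_eff = {nu_eff:.3g}, U = {U:.2f} N m")
```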
4. Calibration curve and correlated uncertainties
One of the most valuable tools for the metrologist is the calibration curve. It is widely used in measurement systems in which one cannot directly obtain the property value to be measured of an object. Instead, a response from the system is measured. In this way, a calibration curve is used to correlate the response from the system with well-known property values, usually calibration standards (see Figure 5).
With a calibration curve in hand, the property value for a new unknown sample can be directly determined by using the equation of the fitted curve, which is usually obtained by linear regression. However, the calibration curve contains errors due to the lack of fit to the experimental data, which gives rise to an uncertainty source. Thus, when evaluating the uncertainty for a predicted property value $x_0$ corresponding to a new observation $y_0$ (for a new unknown sample, for example), the LPU with correlation terms is applied to the linear regression model in the form of Eq. (24). Eq. (25) represents the model for a predicted value $y_0$ corresponding to a new observed value $x_0$, in the case of the inverse process:

$$x_0 = \frac{y_0 - a}{b} \tag{24}$$

$$y_0 = a + b\,x_0 \tag{25}$$

where $a$ and $b$ are, respectively, the intercept and the slope parameters of the linear regression. The application of the LPU with the correlation term to Eqs. (24) and (25) leads to Eqs. (26) and (27), respectively, for both cases:

$$u^2(x_0) = \frac{u^2(y_0) + u^2(a) + x_0^2\,u^2(b) + 2\,x_0\,u(a)\,u(b)\,r(a,b)}{b^2} \tag{26}$$

$$u^2(y_0) = u^2(a) + x_0^2\,u^2(b) + b^2\,u^2(x_0) + 2\,x_0\,u(a)\,u(b)\,r(a,b) \tag{27}$$

For Eq. (26), $u(x_0)$ is the combined uncertainty for the predicted value $x_0$ and $u(y_0)$ is the uncertainty for the new observed response $y_0$. For Eq. (27), $u(y_0)$ is the combined uncertainty for the predicted value $y_0$ and $u(x_0)$ is the uncertainty for the new observed response $x_0$. In both cases, $u(a)$ and $u(b)$ are, respectively, the uncertainties for the intercept and the slope, and $r(a,b)$ is the correlation coefficient between $a$ and $b$. These last equations can also be found in the ISO/TS 28037, concerning the use of straight-line calibration functions.
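The uncertainties of the intercept and the slope and their correlation coefficient are obtained from the regression as in Eqs. (28)–(30):

$$u(a) = s\,\sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}} \tag{28}$$

$$u(b) = \frac{s}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}} \tag{29}$$

$$r(a,b) = -\frac{\sum_{i=1}^{n} x_i}{\sqrt{n\sum_{i=1}^{n} x_i^2}} \tag{30}$$

where $n$ is the number of points used to construct the curve, $x_i$ are the values for the independent variable of the linear equation for each point $i$, $\bar{x}$ is their mean, and $s^2$ is the residual variance of the fitted curve, obtained by Eq. (31):

$$s^2 = \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{n - 2} \tag{31}$$

where $\hat{y}_i$ are the interpolated values in the fitted curve for each $x_i$, that is, $\hat{y}_i = a + b\,x_i$.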
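The following sketch fits a straight line by ordinary least squares and propagates the uncertainties of the intercept and slope, including their correlation, to a predicted value. It assumes the usual unweighted regression expressions for the parameter uncertainties and the residual variance. The function names and the calibration data are hypothetical and do not correspond to any table in this chapter.

```python
import math


def fit_line(x, y):
    """Unweighted least-squares line y = a + b*x with parameter uncertainties."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Residual variance with n - 2 degrees of freedom
    s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    sum_x2 = sum(xi ** 2 for xi in x)
    return {
        "a": a, "b": b, "s": math.sqrt(s2), "n": n,
        "u_a": math.sqrt(s2 * sum_x2 / (n * sxx)),
        "u_b": math.sqrt(s2 / sxx),
        "r_ab": -sum(x) / math.sqrt(n * sum_x2),
    }


def u_predicted_y(fit, x0, u_x0=0.0):
    """Uncertainty of y0 = a + b*x0, including the a-b correlation term."""
    var = (fit["u_a"] ** 2 + x0 ** 2 * fit["u_b"] ** 2 + fit["b"] ** 2 * u_x0 ** 2
           + 2 * x0 * fit["u_a"] * fit["u_b"] * fit["r_ab"])
    return math.sqrt(var)


def u_predicted_x(fit, y0, u_y0=0.0):
    """Uncertainty of x0 = (y0 - a)/b, including the a-b correlation term."""
    x0 = (y0 - fit["a"]) / fit["b"]
    var = (u_y0 ** 2 + fit["u_a"] ** 2 + x0 ** 2 * fit["u_b"] ** 2
           + 2 * x0 * fit["u_a"] * fit["u_b"] * fit["r_ab"])
    return math.sqrt(var) / abs(fit["b"])


# Hypothetical calibration data (illustrative values only)
x = [0.0, 10.0, 20.0, 30.0, 40.0]
y = [0.1, 10.0, 20.2, 30.1, 40.3]
fit = fit_line(x, y)
u_y_mid = u_predicted_y(fit, 20.0)   # prediction at the mean of the x values
u_y_end = u_predicted_y(fit, 40.0)   # prediction at the end of the range
print(fit["a"], fit["b"], u_y_mid, u_y_end)
```

The correlation term is negative for positive values of the independent variable, so neglecting it would overestimate the uncertainty of the prediction; the uncertainty is smallest at the mean of the calibration points and grows toward the ends of the range.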
Example: This time, let us consider that the calibration certificate of a thermometer presents the results shown in Table 1.
|Indication (xi) (°C)||Reference value (yi) (°C)|
For the data shown in Table 1, the calibration curve of the thermometer is obtained by linear regression of the reference values on the indications. For a temperature value indicated by the thermometer of $x_0$ = 22 °C, applying the equation of the calibration curve yields a reference value of $y_0$ = 22.22 °C.
Considering that there is no uncertainty for the observed point $x_0$ = 22 °C, that is, $u(x_0)$ = 0, the uncertainty of $y_0$ arising from the interpolation process of the point $x_0$ = 22 °C can then be calculated by applying Eq. (27) and the data from Table 2.
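When the property value $x_0$ is predicted from the mean of repeated observations of the response $y_0$, its uncertainty is commonly written as

$$u(x_0) = \frac{s}{b}\,\sqrt{\frac{1}{m} + \frac{1}{n} + \frac{\left(\bar{y}_0 - \bar{y}\right)^2}{b^2\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}}$$

where $s$ is the residual standard deviation of the fitted line, $m$ is the number of observations of $y_0$, $n$ is the number of points composing the calibration curve, and $\bar{y}_0$ is the average value obtained from the observations of $y_0$. In this expression, the uncertainty component due to the observations of $y_0$ is evaluated by $u(\bar{y}_0) = s/\sqrt{m}$.
However, Hibbert suggests that if the standard deviation $s_{y_0}$ of the indications is known from consistent data, $u(\bar{y}_0)$ can be better evaluated by $u(\bar{y}_0) = s_{y_0}/\sqrt{m}$.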
5. Assessment of uncertainty in instrument calibration
The methodology presented in the GUM can also be used to evaluate the uncertainty in the calibration of a measuring instrument. Following the steps of the GUM, the measurand for the model used in the calibration must be defined by the quantity that evaluates the systematic error of an instrument over its entire measurement range. Thus, Eq. (36) can be generally used for the evaluation of uncertainty in a calibration process:

$$E = I - R \tag{36}$$

where $E$ is the systematic error of the instrument for a fixed range, $I$ is the value indicated by the instrument, and $R$ is the reference value corresponding to the indicated value.
The sources of uncertainty in this case involve the repeatability of indicated values, the resolution of the instrument in calibration, and the certificate of calibration of the reference values. Thus, an evaluation of the uncertainty about the systematic error should be done for each nominal value of the instrument in calibration. The combined standard uncertainties for each calibrated nominal value are obtained by applying the LPU, as shown in Eq. (37):

$$u_c(E) = \sqrt{u_{res}^2 + u_{rep}^2 + u_{cert}^2} \tag{37}$$

where $u_{res}$, $u_{rep}$, and $u_{cert}$ are, respectively, standard uncertainties due to resolution of the instrument, repeatability of indication values, and certificate of calibration of the reference. These standard uncertainties are obtained as described in Section 3.
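The evaluation for a single calibration point can be sketched as follows. The snippet assumes that the resolution is treated as a uniform distribution whose full width equals one resolution step, that the repeatability is a Type A evaluation of the mean indication, and that the reference contributes through its certificate. The function name and all numerical values are hypothetical.

```python
import math
import statistics


def calibration_point(indications, reference, resolution, U_ref, k_ref):
    """Systematic error and combined standard uncertainty for one nominal value."""
    n = len(indications)
    error = statistics.mean(indications) - reference   # indicated minus reference value
    u_rep = statistics.stdev(indications) / math.sqrt(n)  # repeatability (Type A)
    u_res = resolution / math.sqrt(12)   # uniform, half-width = resolution / 2 (assumption)
    u_cert = U_ref / k_ref               # certificate of the reference (Type B)
    u_c = math.sqrt(u_rep ** 2 + u_res ** 2 + u_cert ** 2)
    return error, u_c


# Hypothetical calibration data for one nominal value
error, u_c = calibration_point(
    indications=[100.1, 100.2, 100.1, 100.3, 100.2],
    reference=100.0,
    resolution=0.1,
    U_ref=0.05,
    k_ref=2.0,
)
print(f"systematic error = {error:.2f}, combined standard uncertainty = {u_c:.3f}")
```

Repeating this calculation for each nominal value gives the rows of a calibration results table, with the expanded uncertainty obtained by multiplying each combined standard uncertainty by the corresponding coverage factor.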
The final calibration result can then be presented according to Table 3. In addition, correction values or systematic errors can also be reported.
|Range||Indicated value||Reference value||Expanded uncertainty||Coverage factor|
6. Monte Carlo simulation applied to metrology
This section presents the limitations of the GUM and shows an alternative methodology, based on the propagation of distributions, that overcomes those limitations. For further details, please refer to the authors’ publication that addresses the use of the Monte Carlo methodology applied to uncertainty in measurement or to the JCGM 101:2008 guide. Also, in the field of analytical chemistry, the latest version of the EURACHEM/CITAC guide (2012) was updated with procedures to use Monte Carlo simulations.
6.1. Limitations of the GUM approach
As mentioned earlier, the approach to evaluate measurement uncertainties using the LPU as presented by the GUM is based on some approximations that are not valid for every measurement model [5, 20, 21, 22]. These approximations comprise (1) the linearization of the measurement model made by the truncation of the Taylor series, (2) the use of a t-distribution as the distribution for the measurand, and (3) the calculation of an effective degrees of freedom for the measurement model based on the Welch-Satterthwaite formula, which is still an unsolved problem. Moreover, the GUM approach usually gives biased results when one or more of the input uncertainties are much larger than the others, or when an uncertainty has the same order of magnitude as its quantity.
The limitations and approximations of the LPU are overcome when using a methodology that relies on the propagation of distributions. This methodology carries more information than the simple propagation of uncertainties and generally provides results closer to reality. It is described in detail by the JCGM 101:2008 guide (Evaluation of measurement data - Supplement 1 to the “Guide to the expression of uncertainty in measurement” - Propagation of distributions using a Monte Carlo method), which provides basic guidelines for using Monte Carlo numerical simulations for the propagation of distributions in metrology. This method provides reliable results for a wider range of measurement models than the GUM approach and is presented as a fast and robust alternative for cases where the GUM approach does not give good results.
6.2. Running Monte Carlo simulations
The propagation of distributions as presented by the JCGM 101:2008 involves the convolution of the probability distributions for the input sources of uncertainty through the measurement model to generate a distribution for the output (the measurand). In this process, no information is lost due to approximations, and the result is much more consistent with reality.
The main steps of this methodology are similar to those presented in the GUM. The measurand must be defined as a function of the input quantities through a model. Then, for each input, a probability density function (PDF) must be assigned. In this step, the concept of maximum entropy used in the Bayesian statistics should be used to assign a PDF that does not contain more information than that which is known by the analyst. A number of Monte Carlo trials are then chosen and the simulation can be set to run.
Results are expressed in terms of the average value for the output PDF, its standard deviation, and the end points that cover a chosen probability $p$.
Example: Returning once more to the torque measurement example, one can consider the following PDFs for the input sources:
Mass ($m$). For repeated indications, the JCGM 101:2008 suggests the use of a scaled and shifted t-distribution. Thus, the distribution should use 35.7653 kg as its average, a scale value of $0.0003/\sqrt{10} = 9.49 \times 10^{-5}$ kg, and $10 - 1 = 9$ degrees of freedom.
For the calibration component, supplement 1 recommends the use of a normal distribution if the number of degrees of freedom is not available. In this case, the mass value of 35.7653 kg is taken as the mean and a standard deviation of $0.0001/2 = 0.00005$ kg should be used. However, to facilitate the calculation of the final mean value of the measurand, the mean should be shifted to zero, since both values for the mass sources will be added together.
Local gravity acceleration ($g$). This case is similar to the case of the balance certificate, for which we have values of expanded uncertainty and coverage factor without information on the number of effective degrees of freedom. Thus, a normal distribution with a mean of 9.80665 m/s² and a standard deviation of $0.00002/2 = 0.00001$ m/s² is assumed.
Length of the arm ($L$). In this case, as poor information about the interval is available (±0.5 mm), a uniform distribution is assumed with a minimum value of 1999.5 mm and a maximum value of 2000.5 mm.
|Uncertainty source||Type||PDF||PDF parameters|
|Mass (repeatability)||A||t-distribution||Mean: 35.7653 kg; scale: 9.49 × 10⁻⁵ kg; degrees of freedom: 9|
|Mass (certificate)||B||Normal||Mean: 0 kg; standard deviation: 0.00005 kg|
|Local gravity||B||Normal||Mean: 9.80665 m/s²; standard deviation: 0.00001 m/s²|
|Arm length||B||Uniform||Minimum: 1999.5 mm; maximum: 2000.5 mm|
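A Monte Carlo propagation with the PDFs listed above can be sketched with the Python standard library alone. The sketch assumes a product model for the torque (mass times gravity times arm length), draws the t-distributed values from a normal and a chi-square variate, and uses fewer trials than would normally be recommended so that it runs quickly. The function names, the seed, and the number of trials are illustrative choices. With the input values quoted in this section, the simulated mean lies near the product of the nominal inputs, so the sketch is meant to illustrate the procedure and the shape of the output distribution rather than to reproduce tabulated figures digit for digit.

```python
import math
import random


def sample_t(rng, mean, scale, dof):
    """Draw one value from a scaled and shifted Student's t-distribution."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(dof / 2.0, 2.0)  # chi-square with `dof` degrees of freedom
    return mean + scale * z / math.sqrt(chi2 / dof)


def simulate_torque(trials=100_000, seed=1, p=0.95):
    rng = random.Random(seed)
    torques = []
    for _ in range(trials):
        # Mass: t-distributed repeatability plus zero-mean normal certificate term (kg)
        m = sample_t(rng, 35.7653, 0.0003 / math.sqrt(10), 9) + rng.gauss(0.0, 0.00005)
        g = rng.gauss(9.80665, 0.00001)   # local gravity (m/s^2)
        L = rng.uniform(1.9995, 2.0005)   # arm length (m)
        torques.append(m * g * L)
    torques.sort()
    mean = sum(torques) / trials
    std = math.sqrt(sum((t - mean) ** 2 for t in torques) / (trials - 1))
    # Probabilistically symmetric coverage interval for probability p
    low = torques[int((1 - p) / 2 * trials)]
    high = torques[int((1 + p) / 2 * trials)]
    return mean, std, low, high


mean, std, low, high = simulate_torque()
half_width = (high - low) / 2
print(f"mean = {mean:.3f} N m, std = {std:.4f} N m, "
      f"95 % interval = [{low:.3f}, {high:.3f}] N m")
```

Because the arm length, with its uniform PDF, dominates the budget, the output distribution is close to uniform and the half-width of the 95% interval comes out smaller than 1.96 times the standard deviation, which is the value a normal assumption would give.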
Table 5 summarizes the statistical data of the output distribution, including the upper and lower limits of a probabilistically symmetric range for a 95% coverage probability.
|Statistical data||Value (N m)|
|Lower limit for p = 95%||667.812|
|Upper limit for p = 95%||668.129|
Measurement uncertainty and metrological traceability are interdependent concepts. The evaluation of uncertainties of measurement results is essential to ensure that they are reliable and comparable. Moreover, the process that involves the modeling of measurement systems and evaluation of their uncertainties is of great importance for the metrologist as it constitutes a tool for the management of the measurement laboratory, since it can indicate exactly where to invest to get better, more qualified results.
The GUM and the application of the LPU continue to be the most used and widespread methodology for bottom-up uncertainty evaluation in metrology. It is adopted worldwide and provides a strong base for comparability of measurement results between laboratories. On the other hand, a new version for the GUM is currently under revision. This version should be aligned with its supplements in a more harmonized way, incorporating concepts from Bayesian statistics and resolving some inconsistencies. As a consequence, if the mentioned distribution-free coverage intervals are maintained, results for the expanded uncertainty will be greatly overestimated compared to the current version of the GUM.
In this way, the best alternative for a more realistic and lean uncertainty assessment would be through a numerical simulation using the Monte Carlo method, which should lead to a smaller and more reliable uncertainty result.