Open access peer-reviewed chapter

Fitting Models to Data: Residual Analysis, a Primer

Written By

Julia Martin, David Daffos Ruiz de Adana and Agustin G. Asuero

Submitted: 09 December 2016 Reviewed: 21 February 2017 Published: 05 July 2017

DOI: 10.5772/68049

From the Edited Volume

Uncertainty Quantification and Model Calibration

Edited by Jan Peter Hessling


Abstract

The aim of this chapter is to show how to check, via residual plots, the underlying assumptions of a regression analysis (the errors are independent, have zero mean and constant variance, and follow a normal distribution), mainly when fitting a straight-line model to experimental data. Residuals play an essential role in regression diagnostics; no analysis is complete without a thorough examination of the residuals. The residuals should show a trend that tends to confirm the assumptions made in performing the regression analysis or, failing that, should show no trend that contradicts them. Although there are numerical statistical means of verifying observed discrepancies, statisticians often prefer a visual examination of residual plots as a more informative and certainly more convenient methodology. When dealing with small samples, graphical techniques can be very useful. Several examples taken from scientific journals and monographs are selected, dealing with linearity, calibration, heteroscedastic data, errors in the model, transforming data, time-order analysis and non-linear calibration curves.

Keywords

  • least squares method
  • residual analysis
  • weighting
  • transforming data

“Since all models are wrong the scientist cannot obtain a “correct” one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity”.

(Box, G.E.P. Science and Statistics, J. Am. Stat. Ass. 1976, 71, 791–796)


1. Introduction

The purpose of this chapter is to provide an overview of checking the underlying assumptions in a regression analysis (errors normally distributed with zero mean and constant variance σi2, and independent of one another) via the use of basic residual plots, such as plots of residuals versus the independent variable x. Compact formulae for the weighted least squares calculation of the a0 (intercept) and a1 (slope) parameters and their standard errors [1, 2] are shown in Table 1. The similarity with simple linear regression is obvious: it suffices to set the weighting factors wi = 1. A number of selected examples taken from scientific journals and monographs are the subject of study in this chapter. No rigorous mathematical treatment will be given to this interesting topic. Emphasis is mainly placed on a visual examination of residuals to check for model adequacy [3–7] in regression analysis. The role of residuals in regression diagnostics is vital; a thorough examination of them is necessary to consider an analysis complete [8–10].

Equation: $\hat{y}_i = a_0 + a_1 x_i$; Slope: $a_1 = S_{XY}/S_{XX}$
Weights: $w_i = 1/s_i^2$; Intercept: $a_0 = \bar{y} - a_1 \bar{x}$
Explained sum of squares: $SS_{\mathrm{Reg}} = \sum w_i (\hat{y}_i - \bar{y})^2$; Weighted residuals: $w_i^{1/2}(y_i - \hat{y}_i)$
Residual sum of squares: $SSE = \sum w_i (y_i - \hat{y}_i)^2$; Correlation coefficient: $r = S_{XY}/\sqrt{S_{XX} S_{YY}}$
Means: $\bar{x} = \sum w_i x_i / \sum w_i$; $\bar{y} = \sum w_i y_i / \sum w_i$
Sums of squares about the means: $S_{XX} = \sum w_i (x_i - \bar{x})^2$; $S_{YY} = \sum w_i (y_i - \bar{y})^2$; $S_{XY} = \sum w_i (x_i - \bar{x})(y_i - \bar{y})$
Standard errors: $s_{y/x}^2 = \dfrac{SSE}{n - 2} = \dfrac{S_{YY} - a_1^2 S_{XX}}{n - 2}$
$s_{a_0}^2 = \dfrac{s_{y/x}^2 \sum w_i x_i^2}{S_{XX} \sum w_i}$
$s_{a_1}^2 = \dfrac{s_{y/x}^2}{S_{XX}}$
$\mathrm{cov}(a_0, a_1) = -\bar{x}\, s_{y/x}^2 / S_{XX}$

Table 1.

Formulae for calculating statistics for weighted linear regression (WLR).
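The formulae in Table 1 translate directly into a few lines of code for readers who wish to verify them numerically. The following is a minimal sketch (Python with NumPy; the function name and the example data are ours, for illustration only); setting all weights to 1 recovers simple linear regression:

```python
import numpy as np

def weighted_linear_regression(x, y, w):
    """Weighted least squares fit of y = a0 + a1*x using the Table 1 formulae."""
    x, y, w = (np.asarray(v, float) for v in (x, y, w))
    n = len(x)
    xbar = np.sum(w * x) / np.sum(w)                 # weighted mean of x
    ybar = np.sum(w * y) / np.sum(w)                 # weighted mean of y
    Sxx = np.sum(w * (x - xbar) ** 2)                # sums of squares about the means
    Syy = np.sum(w * (y - ybar) ** 2)
    Sxy = np.sum(w * (x - xbar) * (y - ybar))
    a1 = Sxy / Sxx                                   # slope
    a0 = ybar - a1 * xbar                            # intercept
    s2_yx = (Syy - a1 ** 2 * Sxx) / (n - 2)          # regression variance s_{y/x}^2
    s_a1 = np.sqrt(s2_yx / Sxx)                      # standard error of the slope
    s_a0 = np.sqrt(s2_yx * np.sum(w * x ** 2) / (Sxx * np.sum(w)))  # of the intercept
    r = Sxy / np.sqrt(Sxx * Syy)                     # correlation coefficient
    return a0, a1, s_a0, s_a1, r

# Unit weights reduce the calculation to simple (unweighted) linear regression:
print(weighted_linear_regression([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8], [1, 1, 1, 1]))
```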

The residuals are, geometrically, the distances calculated in the y-direction [1, 2, 11, 12] (vertical distances) between the points and the regression line (the independent variable being considered error free)

$r_i = y_i - \hat{y}_i$ (E1)

The calculated regression line

$\hat{y}_i = a_0 + a_1 x_i$ (E2)

corresponds to the model

$y_i = \alpha_0 + \alpha_1 x_i + \varepsilon_i$ (E3)

where εi is the random error (a0 and a1, Eq. (2), are the estimates of the true values α0 and α1), the estimates being those that minimize the sum of squares of the residuals

$Q_{\min} = \left[ \sum r_i^2 \right]_{\min}$ (E4)

Note that the model error is given by

$\varepsilon_i = y_i - E(y_i)$ (E5)

where E(yi) is the expected value of yi [6]. Thus, the residuals ri may be viewed as the differences between what is actually observed and what is predicted by the model equation (i.e. the amount that the regression equation is not able to explain). The residuals ri may be thought of as the observed errors [10] when the model is correct. The residuals reveal the existing asymmetry [13] between the roles of the response and the independent variable in regression problems. A number of assumptions concerning the errors [14, 15] have to be made when performing a regression analysis, for example, normality, independence, zero mean and constant variance (the homoscedasticity property).

The assumption that the errors are normally distributed is not required to obtain the parameter estimates by the least squares method. However, for inferences and estimates about the regression (standard errors, t- and F-tests, confidence intervals), it is necessary to assume that the errors are normally distributed [11]. The assumption of normality is nevertheless plausible, since in many real situations errors tend to be normally distributed as a consequence of the central limit theorem. The assumption that no residual term is correlated with another, combined with the normality assumption, means [10] that the errors are also independent. Constructing a normal probability plot of the residuals [16–18] is a way to verify the assumption of normality: the residuals are ordered and plotted against the corresponding percentage points of the standardized normal distribution (normal quantiles). If the residuals then lie along a straight line, the assumption of normality is satisfied.
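As a sketch of how such a plot can be produced in practice, SciPy's `probplot` returns the ordered data against the corresponding normal quantiles together with a straight-line fit (the residuals below are simulated stand-ins, for illustration only; any real residual set may be substituted):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(size=30)        # stand-in residuals, for illustration only

# Ordered residuals against standard normal quantiles, plus the fitted line
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
plt.plot(osm, osr, "o", label="ordered residuals")
plt.plot(osm, slope * osm + intercept, "-", label="straight-line fit")
plt.xlabel("normal quantiles")
plt.ylabel("ordered residuals")
plt.legend()
plt.show()                             # points on the line -> normality is plausible
```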

A standardized residual is the residual divided by the standard deviation of the regression line

$e_{r_i} = \dfrac{r_i}{s_{y/x}}$ (E6)

The standardized residuals are normally distributed with a mean value of zero and (approximately) unity variance [10, 19]

$\mathrm{Var}(r_i) = \mathrm{Var}(y_i) - \mathrm{Var}(\hat{y}_i) = \sigma^2 - \sigma^2 \left( \dfrac{1}{\sum w_i} + \dfrac{(x_i - \bar{x})^2}{S_{XX}} \right) = \sigma^2 \left( 1 - \dfrac{1}{\sum w_i} - \dfrac{(x_i - \bar{x})^2}{S_{XX}} \right) = \sigma^2 (1 - h_{ii})$ (E7)
$\mathrm{Var}(e_{r_i}) = 1 - h_{ii}$ (E8)

The hii term may be regarded as measuring the leverage of the data point (xi, yi) (see below). The estimated residuals are correlated [10], but this correlation does not invalidate the residual plot when the number of points is large in comparison with the number of parameters estimated by the regression. As pointed out by Behnken and Draper [20]: ‘In many situations little is lost by failing to take into account the differences in variances’. Standardized residuals are useful in looking for outliers. They should take random values, with about 95% of them falling between −2 and +2 for a normal distribution.
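A minimal sketch of Eqs. (6)-(8) for an unweighted straight-line fit follows (Python/NumPy; with wi = 1 the leverage reduces to h_ii = 1/n + (xi − x̄)²/S_XX). Residuals scaled by √(1 − h_ii) correspond to the e_Si of Table 3 below:

```python
import numpy as np

def standardized_residuals(x, y):
    """Residuals scaled as in Eqs. (6)-(8) for an unweighted straight-line fit."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    a1, a0 = np.polyfit(x, y, 1)                   # least squares straight line
    r = y - (a0 + a1 * x)                          # classical residuals, Eq. (1)
    s_yx = np.sqrt(np.sum(r ** 2) / (n - 2))       # standard deviation of the fit
    h = 1 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)  # leverages h_ii
    e_r = r / s_yx                                 # standardized residuals, Eq. (6)
    e_s = r / (s_yx * np.sqrt(1 - h))              # allowing for Var(r_i), Eq. (7)
    return e_r, e_s, h

# About 95% of the scaled residuals should fall between -2 and +2:
e_r, e_s, h = standardized_residuals([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.9])
print(np.round(e_s, 2), np.round(h, 2))
```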

The trends followed by the residuals should confirm the assumptions we have previously made [21], or at least not contradict them. Remember the sentence of Fisher [22, 23]: ‘Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis’. Conformity checks of the assumptions inherent to regression rest mainly on the examination of the residual pattern. Although there are statistical ways of numerically measuring some of the observed discrepancies [24], graphical methods play an important role in data analysis (Table 2) [25–31]. A quick examination of residuals often yields more information than significance tests of some limited null hypothesis. Nevertheless, an objective, unambiguous determination should be based on standard statistical methodology. This chapter is mainly focused on residual plots rather than on formulas or hypothesis testing. As we will see in the selected examples, the plots easily reveal violations of the assumptions when these are severe enough to warrant correction.

Sentence Author(s)/reference
‘Most assumptions required for the least squares analysis of data using the general linear model can be judged using residuals graphically without the need for formal testing’ Darken [25]
‘Graphical methods have an advantage over numerical methods for model validation because they readily illustrate a broad range of complex aspects of the relationship between the model and the data’. NIST/SEMATECH [26]
‘There is no single statistical tool that is as powerful as a well-chosen graph’ Huber [27]
‘Although there are statistical ways of numerically measuring some of the observed discrepancies, statisticians themselves prefer a visual examination of the residual plots as being more informative and certainly more convenient’ Belloto and Sokoloski [28]
‘Eye‐balling can give diagnostic insights no formal diagnostic will ever provide’ Chambers et al. [29]
‘Graphs are essential to good statistical analysis’ Anscombe [30]
‘One picture says more than a thousand equations’ Sillen [31]

Table 2.

Sentences of some authors about the use of graphical methods.

The main forms of representation [10] of residuals are: (i) global; (ii) in temporal sequence, if the order is known; (iii) against the fitted values, ŷ; (iv) against the independent variables xji for j = 1, 2, …, k; and (v) in any way that is sensitive to the particular problem under analysis.

The following points can be verified in the representation of the residuals: (i) the form of the representation; (ii) the numbers of positive and negative residuals should be roughly equal (vanishing median); (iii) the sequence of residual signs must be randomly distributed between + and −; and (iv) it is possible to detect spurious results (outliers), whose magnitudes are greater than those of the rest of the residuals.

Residual plots appear more and more frequently [32–39] in papers published in analytical journals. In general, these plots, as well as those discussed in this chapter, are very basic and can undergo some criticism. For example, the residuals are not totally distributed independently of x, since [10, 19] the substitution of the parameters by their estimates introduces some dependence. However, more sophisticated methods have been developed [40–44] based on standardized, studentized, jack-knife, predictive, recursive residuals, and so on (Table 3). In spite of their importance, they are considered beyond the scope of this contribution.

Symbol | Name | Formula | Comments
$e_i$ | Classical residuals | $e_i$ |
$e_{N_i}$ | Normalized residuals | $e_i / s_{y/x}$ |
$e_{S_i}$ | Standardized residuals | $e_i / \left( s_{y/x} \sqrt{1 - h_{ii}} \right)$ | Identification of heteroscedasticity
$e_{J_i}$ | Jack-knife residuals | $e_{S_i} \sqrt{\dfrac{n - m - 1}{n - m - e_{S_i}^2}}$ | Identification of outliers
$e_{P_i}$ | Predicted residuals | $e_i / (1 - h_{ii})$ |
$e_{R_i}$ | Recursive residuals | | Identification of autocorrelations

Table 3.

Types of residuals and their suitability for diagnostic purposes [42–44].

Despite the frequency with which the correlation coefficient is invoked in the scientific literature as a criterion of linearity, this practice is not free from reservations [1, 45–49], as evidenced several times throughout this chapter.

The study of linearity implies more than a graphical representation of the data. It is also necessary to carry out a statistical check, for example, an analysis of variance [50–54], which requires repeated measurements. This entails the fulfilment of two requirements: the homogeneity (homoscedasticity) of the variances and the normality of the residuals. Incorporating replicates into the calibration offers the possibility of examining the calibration not only in the context of fitting but also in that of measurement uncertainty [15]. However, if replicate measurements are not made, and an estimate of the pure (replication) error variance is not available, the regression variance

$s_{y/x}^2 = \dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2} = \dfrac{\sum r_i^2}{n - 2}$ (E9)

may be compared with the estimated variance around the mean of the yi values [55]

$s_y^2 = \dfrac{\sum (y - \bar{y})^2}{n - 1}$ (E10)

by means of an F-test. The goodness of fit of non-linear calibration curves can be improved by raising the degree of the fitting polynomial, performing an F-test on the quotient of the residual variances for polynomials of degree k and k + 1 [56]. A suitable test can also be carried out according to Mandel [56, 57]. However, this contribution is essentially devoted to the use of basic graphical plots of residuals, a simple and at the same time powerful diagnostic tool, as we will have occasion to demonstrate throughout this chapter.
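A hedged sketch of the F-comparison of Eqs. (9) and (10) follows (Python/SciPy; the function name is ours, and `f.sf` is the survival function of the F distribution):

```python
import numpy as np
from scipy.stats import f

def regression_f_test(x, y):
    """Compare the variance about the mean (Eq. 10) with the regression variance (Eq. 9)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    a1, a0 = np.polyfit(x, y, 1)
    s2_yx = np.sum((y - (a0 + a1 * x)) ** 2) / (n - 2)   # regression variance, Eq. (9)
    s2_y = np.sum((y - y.mean()) ** 2) / (n - 1)         # variance about the mean, Eq. (10)
    F = s2_y / s2_yx        # F >> 1: the fitted line explains most of the spread
    p = f.sf(F, n - 1, n - 2)                            # one-sided p-value
    return F, p
```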

Several examples taken from scientific journals and monographs are selected in order to illustrate this chapter: (1) linearity in calibration methods: fluorescence data [58] as an example; (2) calibration graphs: the question of intercept [59] or non-intercept; (3) the errors are not in the data, but in the model: the CO2 vapour pressure versus temperature dependence [59]; (4) heteroscedastic data: the high-performance liquid chromatography (HPLC) calibration assay [60] of a drug; (5) transforming data: preliminary investigation of a dose-response relationship [61, 62], the microbiological assay of vitamin B12; (6) the variable that has not yet been discovered: the solubility of diazepam [28] in propylene glycol; and (7) no model is perfect: nickel(II) by atomic absorption spectrophotometry.


2. Linearity in calibration methods: fluorescence data as example

Calibration is a crucial step, an essential part, the key element, the very soul of every quantitative analytical method [38, 40, 63–69], and significantly influences the accuracy of the analytical determination. Calibration is an operation that relates an output quantity to an input quantity for a measuring system under given conditions (The International Union of Pure and Applied Chemistry (IUPAC)). The topic has been the subject of a recent review [67] focused on purely practical aspects, leaving aside the mathematical and metrological ones. The main role of calibration is to transform the intensity of the measured signal into the analyte concentration as accurately and precisely as possible. Guidelines for calibration and linearity are shown in Table 4 [70–81].

Scientific Association or Agency Reference
Calibration
International Union of Pure and Applied Chemistry (IUPAC) Guidelines for calibration in analytical chemistry [70]
International Organization for Standardization (ISO) ISO 8466‐1:1990 [70]; ISO 8466‐2:2001 [71]; ISO 11095:1996 [72]; ISO 28037:2010 [73]; ISO 28038:2014 [74]; ISO 11843‐2: 2000 [75]; ISO 11843‐5: 2008 [76]
LGC Standards Proficiency Testing LGC/VAM/2003/032 [77]
Linearity
International Conference on Harmonization (ICH) Guideline Q2A [78]
Clinical Laboratory Standard Institute (CLSI) EP6‐A [79]
Association of Official Analytical Chemists (AOAC) AOAC Guidelines 2002 [80]
European Union EC 2002/657 [81]

Table 4.

Scientific organizations that approve calibration guidelines [70–81].

Linearity is the basis of many analytical procedures. It has been defined as [78] the ability (within a certain range) to obtain test results that are directly proportional to the concentration (amount) of analyte in the sample. Linearity is one of the most important characteristics for the evaluation of accuracy and precision in assay validation and, as a calibration curve is seldom perfectly linear, it is crucial to assess linearity during method validation. Such evaluation is also recommended in regulatory guidelines [78–81]. Although it may seem that everything has been said on the subject of linearity, it is still an open question and subject to debate. It is therefore not surprising that proposals are made from time to time to resolve this issue [54, 82–92].

However, statistical tests of linearity between the variables are rarely performed in analytical calibration studies. When dealing with regression models, the most convenient way of testing linearity, besides a visual assessment, is plotting the residual sequence in the concentration domain. A simple nonparametric statistical test for linearity, known as ‘the sign test’ [9, 16, 28], is based on the examination of the sequence of signs of the residuals (ri).

The residuals should be distributed in a random way; that is, the numbers of plus and minus residual signs should be equal, with the error symmetrically distributed (the null hypothesis for the test), when the variables are connected through a true linear relationship. The probability of getting a random pattern of residual signs is related to the number of runs in the sequence of signs. Intuitively and roughly speaking, the more randomly distributed these sign changes are [93], the better the fit. A run is a sequence of identical signs, irrespective of its length. A pattern of residual signs of the kind [+ − − + + − + − + − +], from independent measurements, is considered random, whereas a pattern like [− − − + + + + + + − −] is not. Though a formal statistical test may be carried out [94] with the information afforded by the residual plot, it requires a number of points greater than is usual in calibration measurements.
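A rough sketch of such a runs test (Wald-Wolfowitz) on the residual signs is given below (Python/SciPy; it relies on the normal approximation to the runs distribution and therefore, as just noted, needs more points than a typical calibration provides):

```python
import numpy as np
from scipy.stats import norm

def runs_test(residuals):
    """Wald-Wolfowitz runs test on the residual signs (normal approximation)."""
    s = np.sign(residuals)
    s = s[s != 0]                               # discard exact zeros
    n_pos, n_neg = np.sum(s > 0), np.sum(s < 0)
    n = n_pos + n_neg
    runs = 1 + np.sum(s[1:] != s[:-1])          # a run = maximal block of equal signs
    mu = 1 + 2 * n_pos * n_neg / n              # expected runs under randomness
    var = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n ** 2 * (n - 1))
    z = (runs - mu) / np.sqrt(var)
    return runs, 2 * norm.sf(abs(z))            # two-sided p-value

print(runs_test([+1, -1, -1, +1, +1, -1, +1, -1, +1, -1, +1]))   # random-looking
print(runs_test([-1, -1, -1, +1, +1, +1, +1, +1, +1, -1, -1]))   # too few runs
```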

The fluorescence, in arbitrary units, of a series of standards is shown in Table 5. A straight line may be fitted to these data, which appear to be curved (Figure 1, top), resulting in an evident lack of fit even though the correlation coefficient (R) of the line equals 0.9952. A plot of the resulting residuals ri against the x-values (reduced residuals on the secondary axis) is also shown in Figure 1 (top) and allows checking for systematic deviations between data and model.

Concentration (μM) Fluorescence (arbitrary units) Concentration (μM) Fluorescence (arbitrary units)
0 0.2 6 20.4
1 3.6 7 22.7
2 7.5 8 25.9
3 11.5 9 27.6
4 15 10 30.2
5 17

Table 5.

Calibration fluorescence data [58].

Figure 1.

Fitting a straight line (top), a quadratic function (middle) and a cubic function (bottom) to fluorescence data compiled in Table 5.

The pattern of the signs of the residuals indicates that fitting the fluorescence data with a straight-line equation is inadequate; higher-order terms should possibly be added to account for the curvature. Note that the straight-line model is not adequate even though the reduced residuals are less than 1.5 in all cases. When an erroneous equation is fitted to the data [95–97], the information contained in the shape of the residual plot is a valuable tool, which indicates how the model equation must be modified to describe the data better. A curved calibration line may be fitted by a power series. The use of a quadratic (second-degree) equation is enough in this case to obtain a good fit: the scattering of the residuals above and below the zero line is similar, as shown in Figure 1 (middle). Thus, when no obvious trends in the residuals are apparent, the model may be considered an adequate description of the data. The simplest model, or the model with the minimum number of parameters that adequately fits the data in question, is usually the best choice: ‘Non sunt multiplicanda entia praeter necessitatem’ (Occam’s razor) [98]. In fact, the order of the polynomial (k) need not rise above 2, since $[s_{y/x}]_{k=2} = 0.3994 < [s_{y/x}]_{k=3} = 0.4142$.
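The degree comparison just described can be reproduced with the Table 5 data in a few lines (a Python/NumPy sketch; sy/x is computed with n − k − 1 degrees of freedom for a degree-k polynomial):

```python
import numpy as np

conc = np.arange(11.0)                               # Table 5 concentrations, uM
fluo = np.array([0.2, 3.6, 7.5, 11.5, 15, 17,
                 20.4, 22.7, 25.9, 27.6, 30.2])      # fluorescence, arbitrary units

for k in (1, 2, 3):                                  # straight line, quadratic, cubic
    coef = np.polyfit(conc, fluo, k)
    resid = fluo - np.polyval(coef, conc)
    s_yx = np.sqrt(np.sum(resid ** 2) / (len(conc) - k - 1))
    signs = "".join("+" if r > 0 else "-" for r in resid)
    print(f"degree {k}: s_y/x = {s_yx:.4f}, residual signs {signs}")
```

The straight line should give long runs of equal residual signs, while the quadratic mixes them; the cubic no longer lowers sy/x, matching the values quoted above.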

In summary, when a correct relationship between the response and the independent variable(s) is assumed, the residual plot should resemble that of Figure 2 (left). All residuals should fall into the shaded band, with no discernible pattern, that is, at random. If the representation of the residuals resembles that of Figure 2 (right), where curvature is apparent, the model can probably be improved by adding a quadratic or higher-order terms to describe the required curvature.

Figure 2.

Residuals in a random (left) and parabolic (right) form.

Calibration curves with a non-linear shape also appear in analytical chemistry [99–104]. When the data in the x-range (calibration) vary greatly, as they do in many real problems, the response becomes non-linear (Table 6) [101, 105–107] at sufficiently large x-values. The linear range of liquid chromatography-tandem mass spectrometry (LC-MS/MS) is typically about three orders of magnitude. The analyst’s usual response in this case is sometimes to restrict the concentration range so as to work with a linear response, thus introducing biases in the determination, since the choice of the ‘linear region’ is usually done in an arbitrary way. The use of a wider range in the standard curve is preferred in order to avoid sample dilution, saving time and labour [108]. An acceptable approach to extend the dynamic range of the standard curve is the use of quadratic regression. Among the possible causes of standard curve non-linearity are saturation during ionization at high concentration, the formation of dimer/multimer/cluster ions, or detector saturation. It has been established that when the analyte concentration is above about $10^{-5}$ M, its response starts to saturate, producing a non-linear response.

Response function | Name | Technique
$y = a_0 + a_1 x$ | Beer law | Absorption spectrophotometry
$y = A + B \log x$ | Nernst equation | Electrochemistry
$y = a x^n$ ($\log y = n \log x + \log a$) | Scheibe-Lomakin | Emission spectroscopy; ESI-MS; ELSD; CAD
$y = a x^n + a_0$ ($0 < y < y^*$); $y = k x + y_0$ ($y > y^*$) | | TLC-densitometry
$b_n y^n + \cdots + b_1 y = x$ | | DAD
$b_1 y + b_0 = x$ | | ESI-MS
$y = a_0 + a_1 x + a_2 x^2$ | Wagenaar et al. | Atomic absorption spectrophotometry; LC-MS/MS
$\log y = a_0 + a_1 \log x + a_2 (\log x)^2$ | | CAD
$y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$ | | Ion-Trap-MS
$y = A + B(1 - e^{-Cx})$ | Andrews et al. | Atomic absorption spectrophotometry
$y = \dfrac{A - D}{1 + (x/C)^B} + D$ | Rodbard | Radioimmunoassay

Table 6.

Response functions used in instrumental analysis [105107].

ESI-MS: electrospray ionization mass spectrometry; TCD: thermal conductivity detector; TLC-densitometry: TLC with detection by optical densitometry; ELSD: evaporative light scattering detector; CAD: charged aerosol detection; DAD: diode array detection.


Quadratic curve fitting of calibration data is more appropriate [104, 109–117] than straight-line linear regression for some quantification methods. Matrix-related non-linearity is typical of methods such as LC-MS/MS. In order to provide an appropriate validation strategy for such cases, the straight-line fit approach has been extended to quadratic calibration functions. When such quadratic terms are included [10, 118–120], precautions should be taken because of the consequent multicollinearity problems.

However, the quadratic regression model is considered less appropriate, or is even viewed with suspicion, by some regulatory agencies and, as a result, is not often used in regulated analysis. In addition, the accuracy around the upper limit of quantitation (ULOQ) can be affected if the curve range is extended to the point where the top of the curve is flat.

Statistical tests may also be considered for probing linearity, like Mandel’s test [57], which compares the residual errors of the quadratic and linear regressions by means of an F-test at a given significance level, or like the lack-of-fit test by analysis of variance (ANOVA), or testing for homoscedasticity (the homogeneity of the residual variances).
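A minimal sketch of Mandel's test along these lines follows (Python/SciPy; the function name is ours): the reduction in the residual sum of squares on passing from the straight line to the quadratic is referred, via F, to the residual variance of the quadratic.

```python
import numpy as np
from scipy.stats import f

def mandel_test(x, y, alpha=0.05):
    """Mandel's fitting test: does a quadratic fit significantly better than a line?"""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    sse1 = np.sum((y - np.polyval(np.polyfit(x, y, 1), x)) ** 2)  # linear SSE
    sse2 = np.sum((y - np.polyval(np.polyfit(x, y, 2), x)) ** 2)  # quadratic SSE
    F = (sse1 - sse2) / (sse2 / (n - 3))   # one extra parameter in the quadratic
    p = f.sf(F, 1, n - 3)
    return F, p, p < alpha                 # True -> curvature is significant
```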


3. Calibration graphs: the question of intercept or non‐intercept

Absorption spectrometry is an important analytical technique and, to be efficient, the calibration must be accomplished with known samples. Data for the calibration of an iron analysis, in which the iron is complexed with thiocyanate, are shown in Table 7. The absorbance of the iron complex is measured and plotted versus the iron concentration in ppm. The standard deviation of the regression line, sy/x, obtained from the experimental data, quantifies the quality of the fit and the ability of the model to predict values of y (the measured quantity) for a given value of x (the independent variable).

Concentration, Fe (ppm) Absorbance Concentration, Fe (ppm) Absorbance
0.3644 0.0268 3.644 0.248
0.7288 0.0506 7.288 0.495
1.083 0.0783 32.8 1.52

Table 7.

Absorbance data for Fe3+ calibration (as Fe‐SCN2+ complex) [59].

The regression line is first computed by forcing it through the coordinate origin (a0 = 0), since the absorbance should be directly proportional to the concentration (at zero concentration, zero absorbance might be expected). However, the fit thus obtained is not very good. The representation of the residuals shows the sign pattern + + + + + − (Figure 3, lower left). If we compute the regression line with an intercept (Figure 3, upper right), the correlation coefficient increases from 0.9908 to 0.9947, but a non-random pattern of signs persists, namely − − − + + − (Figure 3, lower right). What is a reasonable explanation? If the highest concentration point (32.8 ppm) is discarded, all the other points appear to fall on a straight line. However, this point cannot be discarded on the basis of its deviation from the best-fit line, because it is closer to the line than other calibration points in the series. As a matter of fact, the last point (32.8 ppm) defines where the line must pass: being so distant, it has great influence (leverage) and forces the least squares regression line in its direction. This non-robust behaviour is a general weakness of the least squares criterion. Very bad points [59] have a great influence precisely because their deviations from the true line, which enter squared, are very large. One has to be aware of dubious data, correcting obvious copying errors and adopting actions consistent with the identified causes. Improper recording of data (e.g. misreading responses or interchanging digits) is frequently a major component of experimental error [121].

Figure 3.

Calibration curve and residuals for the Fe‐SCN2+ system without intercept (left) and with intercept (right) included in the model for all data compiled in Table 7.

A few high or low points [8] can alter the value of the correlation coefficient to a great extent. Larger deviations present at larger concentrations tend to influence (weight) the regression line more than smaller deviations associated with smaller concentrations, and thus the accuracy at the lower end of the range is impaired. It is therefore very convenient [122–124] to analyse the plotted data and to make sure that they cover the entire range of signal response of the instrument uniformly (approximately equally spaced). Data should be measured in random order (to avoid confusing non-linearity with drift). The individual solutions should be prepared from the same stock solution, thus avoiding the introduction of random errors from weighing small quantities for individual standards. Depending on the location of the outliers, the correlation coefficient may increase or decrease. In fact, a strategically situated point can make the correlation coefficient vary practically between −1 and +1 (Figure 4), so precautions should be taken when interpreting its value. However, influential points (e.g. leverage points and outliers) (Table 8) are rejected only when there is an obvious reason for their anomalous behaviour [125]. The effect of outliers is greater as the sample size decreases. Duplicate measurements, careful scrutiny of the data during collection and checking of discrepant results with the available samples may help to solve problems [28] with outliers.

Gross errors | Caused by outliers in the measured variable or by extreme leverage points
Golden points | Specially chosen points that have been very precisely measured to extend the prediction capability of the system
Latently influential points | Consequence of a poor regression model
According to data location:
Outliers | Differ from the other points in the values of the y-axis
Leverage points | Differ from the other points in the values of the x-axis, or in a combination of these quantities (in the case of multicollinearity)

Table 8.

Influential points [44].

Figure 4.

Influence of an anomalous result on the least squares method (solid line) and on the correlation coefficient.

If the regression analysis is repeated without the influential 32.8 ppm point, forcing the line through the origin, the correlation coefficient reaches a value of 0.99991 (Figure 5, top left). This point was left out because it deviates by several standard deviations from the new line. Perhaps the problem observed with the 32.8 ppm point is due to the thiocyanate not being in sufficient excess to complex all the iron present. However, the inspection of the residuals (+ + + + −) shows systematic, non-random deviations (Figure 5, bottom left), which may indicate an incorrect or inadequate model. Systematic errors of analysis translate into (systematic) deviations from the fitted equation (negative residuals correspond to low estimated values, and positive residuals to high ones). An erroneous omission of the intercept term in the model may be the cause of this effect. The standard deviation of the regression line improves notably, from 0.0026 to 0.0017, when the intercept is introduced (Figure 5, top right) into the model (correlation coefficient equal to 0.99997), the residual pattern now being random (− − + − +) (Figure 5, bottom right). The calibration is then appropriate and linear, at least up to 8 ppm. However, the intercept value, 0.0027, is of the same order of magnitude as the standard deviation, sy/x, of the regression line. A (minor) calibration problem may be present; for example, the spectrophotometer was not properly zeroed or the cuvettes were not properly matched.

Figure 5.

Calibration curve and residuals for the Fe‐SCN2+ system without intercept (left) and with intercept (right) included in the model (data in Table 7 with the exception of the 32.8 ppm point).

Residual analysis with small sample sizes has [126] some complications. Firstly, the residuals are not totally distributed independently of x, because the substitution of the parameters by their estimators introduces [10, 19] a certain dependence. Secondly, a few points far from the bulk of the data may unduly condition the estimators, residuals and inferences.


4. Error is not in the data, but in the model: the CO2 vapour pressure versus temperature case

The linear variation of physical quantities is not a universal rule, although it is often possible to find a coordinate transformation [18] that converts non-linear data into linear ones. The vapour pressure, P, in atmospheres, of (liquid) carbon dioxide as a function of temperature, T, in kelvin, is not linear (Table 9). Carbon dioxide has found use in chemical analysis as a supercritical fluid, for example for extracting caffeine from coffee. On the basis of the Clausius-Clapeyron equation, we may expect to fit the data compiled in Table 9 to an equation of the form

Temperature (K) Vapour pressure (atm) Temperature (K) Vapour pressure (atm)
216.5500 5.11023 266.4944 28.70169
222.0500 6.44393 272.0500 33.39684
227.6056 8.04301 277.6056 38.63636
233.1611 9.92107 283.1611 44.47469
238.7167 12.09853 288.7167 50.93903
244.2722 14.62303 294.2722 58.07022
249.8278 17.50817 299.8278 65.91589
255.3833 20.78797 304.1611 72.76810
260.9389 24.51007

Table 9.

CO2 vapour pressure versus temperature data [59].

$\ln P = A + B/T$ (E11)

This requires a transformation of the data. If we define

$Y = \ln P$ (E12)

and

$X = 1/T$, (E13)

this form is linear,

$Y = A + BX$ (E14)

The resulting graph (Figure 6, middle, solid line) appears fine on examination, as do the calculated statistics, and so there is at first no reason to suspect any problem. The fit yields a correlation coefficient of 0.99998876. This almost perfect fit is nevertheless rather poor when attention is paid to the attainable quality of the fit, as shown by the sinusoidal pattern of the residuals [+ + − + − + + + + + − −], which are incorporated into the figure together with the resulting least squares regression line. As the details of the measurements are unknown, it is not possible to test for systematic error in the experiments. The use of an incorrect or inadequate model is what explains the systematic deviations in this case: the Clausius-Clapeyron equation does not describe the phenomenon exactly when the temperature range is wide. Results similar to those shown in Figure 6 are also obtained by applying weighted linear regression with weighting factors defined by [6, 7, 127–129]

Figure 6.

Top: CO2 vapour pressure against temperature data (top). Middle: representation of ln P against the reciprocal of temperature (Clausius‐Clapeyron equation) including the residuals plot. Bottom: CO2 vapour pressure as a function of temperature according to the expanded (MLR) model.

$w_i = \dfrac{1}{\left( \partial Y_i / \partial y_i \right)^2} = \dfrac{1}{\left( \partial \ln P_i / \partial P_i \right)^2} = P_i^2$ (E15)

on the basis of the transformation used.

The error does not lie in the data then, but in the model. We may try to improve the latter by using a more complete form of the equation

$\ln P = A + B/T + C \ln T + D T$ (E16)

The results now obtained (analysis by multiple linear regression), depicted in Figure 6 (bottom), are better than those obtained with the simple linear regression equation, with the residuals randomly distributed. Values of ln P may be calculated with an accuracy of 0.001 (an accuracy level of 0.1%), as suggested by the standard deviation of the regression line obtained. In addition, as T is used as a variable instead of its inverse, interpolation calculations are carried out more easily.
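Eq. (16) is still linear in its parameters A-D, so it can be fitted by ordinary multiple linear regression. A sketch with the Table 9 data (Python/NumPy; `lstsq` solves the least squares problem for the design matrix):

```python
import numpy as np

T = np.array([216.55, 222.05, 227.6056, 233.1611, 238.7167, 244.2722,
              249.8278, 255.3833, 260.9389, 266.4944, 272.05, 277.6056,
              283.1611, 288.7167, 294.2722, 299.8278, 304.1611])   # K, Table 9
P = np.array([5.11023, 6.44393, 8.04301, 9.92107, 12.09853, 14.62303,
              17.50817, 20.78797, 24.51007, 28.70169, 33.39684, 38.63636,
              44.47469, 50.93903, 58.07022, 65.91589, 72.7681])    # atm, Table 9

# Design matrix for ln P = A + B/T + C ln T + D T (Eq. 16)
X = np.column_stack([np.ones_like(T), 1 / T, np.log(T), T])
coef, *_ = np.linalg.lstsq(X, np.log(P), rcond=None)
resid = np.log(P) - X @ coef
print("A, B, C, D:", coef)
print("s_y/x:", np.sqrt(np.sum(resid ** 2) / (len(T) - 4)))
# The residual signs should no longer show the sinusoidal pattern of Eq. (11)
```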

The moral of this section is that there are no perfect models [130, 131], only models that are more appropriate than others.


5. The heteroscedastic data: HPLC calibration assay of a drug

In those cases in which some of the experimental data used in a given analysis are more reliable than others [6, 61, 63], gross errors may result when the conventional least squares method is applied directly. The assumption of uniform variance of y may not be correct when the experimental measurements cover a wide range of x-values. There are two possible solutions to this non-constant, irregular, non-uniform or heteroscedastic variance problem: data transformation or weighted least squares regression analysis.

The weighted sum of squared residuals [64, 132, 133]

$Q_{\min,w} = \left[ \sum w_i r_i^2 \right]_{\min}$ (E17)
$w_i = \dfrac{1}{\sigma_i^2}$ (E18)

is minimized in the weighted least squares procedure. The idea underlying weighted least squares is to attribute the greatest worth [2, 40, 66, 101, 132, 135, 136] to the most precise data. The greater the deviation from homoscedasticity, the greater the profit that can be extracted from the use of the weighted least squares procedure. The homoscedasticity hypothesis is usually taken for granted in analytical chemistry in the framework of calibration. However, when the range of abscissa values (concentrations) covers several orders of magnitude, for example in the study of (calibration of) drug concentrations in urine or other biological fluids, the precision of the y-values depends strongly on the x ones. In those cases, the homoscedasticity requirement implied in simple linear regression is violated, and the introduction of weighting factors becomes mandatory. Some typical cases of heteroscedasticity arise [137, 138] involving a constant relative standard deviation (Figure 7)

Figure 7.

Left: hypothetical HPLC response versus concentration for a typical serum. Right: examples of relationships between concentration of analyte (x), standard deviation (SD), and coefficient of variation (CV).

$RSD = \dfrac{\sigma_i}{\bar{x}}$ (E19)

or a constant relative variance (radioactive counts, Poisson distribution). Photometric absorbances following Beer’s law cover a wide concentration range and, like chromatographic analyses under certain conditions, tend to be heteroscedastic. Inductively coupled plasma mass spectrometry (ICP-MS) requires weighted least squares estimates even when the calibration covers a relatively small concentration range. The standard deviation (absolute precision) of the measurements, σi, usually increases with the concentration xi, whereas the relative standard deviation (relative precision, RSD) decreases instead.

It is possible to derive [138] relationships between precision and concentration over the concentration range assayed, since chemical methods are applied to analytes present at varying concentrations. A number of different relationships [2, 139–142] have been proposed by different authors (Table 10), and ISO 5725 gives [143] indications to assist in obtaining a given σc = f(C) relationship.

$\sigma_c = p C^k$ ($k = 0.5, 1, >1, \ldots$)
$\sigma_c = p (C + 1)^k$
$\sigma_c = p e^{qC}$
$\sigma_c = p C^k + q$
$\sigma_c = a_0 + a_1 C^q$ ($q = 1, 2$)
$\sigma_c = a_0 + a_1 C + a_2 C^2$
$\sigma_c = p y^k$
$\sigma_c = a_0 + a_1 y + a_2 y^2$

Table 10.

Relationship types between the standard deviation and the concentration of the analyte.

The advantages of the least squares method, despite its power, may be impaired if appropriate weights are not included in the equations. The least squares criterion is highly sensitive to outliers, as we have seen in Figure 4. An undesirable paradox may often occur whereby the experimental data of worst quality contribute most to the estimation of the parameters. Although replication may be severely limited [15, 132], it has the advantage of providing a certain kind of robust regression [144]. The most common way of performing a weighted regression is to use weights reciprocal to the corresponding variances, that is,

$w_i = \dfrac{1}{s_i^2}$ (E20)

where si2 is the experimental estimate of σi2. Eq. (20) ensures that, when replication is used, the lower weights correspond to the outlying yi values. The incorporation of heteroscedasticity into the calibration procedure is recommended [145, 146] by several international standards such as ISO 9169 and ISO/CD 13-752. The International Union of Pure and Applied Chemistry (IUPAC) includes the heteroscedasticity [147, 148] or non-constant variance topic in the calculation of the limits of detection and quantification.

The assumption of constant variance in the physical sciences may be erroneous [34, 149–157]. The data of a calibration curve (Table 11) relating the readings of an HPLC assay to drug concentration in blood (ng/mL) [60] are shown in Figure 8. A reasonable regression model for the mean values is, to a first approximation, y = α0 + α1x. However, the variability of the response increases systematically with the concentration level, indicating that the assumption of constant response variance across the range of concentrations assayed does not hold. In fact, at the highest concentration level, a very large response value occurs. There is no physical justification for excluding this value from the analysis. Assuming constant variance in the first instance, least squares yields the parameter estimates a0 = 0.0033 and a1 = 0.0014, with sy/x = 0.0265. The representation of the residuals versus the values of x should show variability within a constant band if the model were appropriate. Instead, in Figure 9 (bottom left), the funnel (trumpet) shape of the pattern indicates that the measurement error increases with the mean response. The assumption of constant variance is thus not satisfied. On the other hand, the intercept value, 0.0033, is not in good agreement with the mean response value (13 replicates) at zero dose, 0.0021, which poses an additional problem. Ignoring the non-constant variance in this case results in a poor fit of the model. The weighted linear regression straight-line model leads instead (Figure 10, top) to the equation y = 0.0015x + 0.0021, the band of residuals now being rectangular (Figure 10, bottom).

Doses
0 5 15 45 90
Response 0.0016 0.0118 0.0107 0.106 0.106
0.0019 0.0139 0.0670 0.026 0.158
0.0002 0.0092 0.0410 0.088 0.272
0.0030 0.0033 0.0087 0.078 0.121
0.0042 0.0120 0.0410 0.029 0.099
0.0006 0.0070 0.0104 0.063 0.116
0.0006 0.0025 0.0170 0.097 0.117
0.0011 0.0075 0.0320 0.066 0.105
0.0006 0.0130 0.0310 0.052 0.098
0.0013 0.0050
0.0020 0.0180
0.0050
0.0050

Table 11.

Calibration data for an HPLC blood concentration (ng/mL) assay of a drug [60].

Figure 8.

Calibration curve obtained by single linear regression (top) and corresponding residuals plot (bottom).

Figure 9.

Residuals in the form of funnel (left) and ascending (right).

Figure 10.

Response versus concentration obtained by weighted linear regression (top) and residuals plot for this model (bottom).

The weighted least squares method requires a higher number of replicates than the conventional least squares method. Estimates of the minimum number of replicates vary between six and 20, according to different authors. In practice, it is often difficult to reach such a high level of replication [2, 15] for various reasons, such as the cost or availability of calibration standards and reagents, the time demands of the preceding operations, or the recording of the chromatograms.

In order to apply weighted least squares analysis, it is mandatory to assign weighting factors to the corresponding observations. The weighting factor reflects the information contained in the yi value, being proportional to the reciprocal of the variance of yi. The results of a single-trial assay, without additional information, seldom contain enough information to model the variance satisfactorily. Fortunately, the independent variable can usually be chosen by the researcher, and the corresponding values of the dependent variable replicated.

The general phenomenon of non-constant variance is called, as we have previously seen, heteroscedasticity. It can be addressed [2, 15] by the weighted least squares method. A second method of dealing with heteroscedasticity is to transform the response [18] so that the variance of the transformed response is constant, proceeding then in the usual way, as in the following section.
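A sketch of the weighted fit for the Table 11 data follows (Python/NumPy). The replicate variances si2 at each dose supply the weights of Eq. (20); note that `np.polyfit` expects weights proportional to 1/σ rather than 1/σ², so the reciprocal standard deviations are passed:

```python
import numpy as np

# Table 11: dose levels and replicate HPLC responses
reps = {0:  [0.0016, 0.0019, 0.0002, 0.0030, 0.0042, 0.0006, 0.0006,
             0.0011, 0.0006, 0.0013, 0.0020, 0.0050, 0.0050],
        5:  [0.0118, 0.0139, 0.0092, 0.0033, 0.0120, 0.0070, 0.0025,
             0.0075, 0.0130, 0.0050, 0.0180],
        15: [0.0107, 0.0670, 0.0410, 0.0087, 0.0410, 0.0104, 0.0170,
             0.0320, 0.0310],
        45: [0.106, 0.026, 0.088, 0.078, 0.029, 0.063, 0.097, 0.066, 0.052],
        90: [0.106, 0.158, 0.272, 0.121, 0.099, 0.116, 0.117, 0.105, 0.098]}

x, y, w = [], [], []
for dose, responses in reps.items():
    s = np.std(responses, ddof=1)        # replicate standard deviation s_i
    for resp in responses:
        x.append(dose); y.append(resp); w.append(1 / s)   # w ~ 1/sigma for polyfit
x, y, w = map(np.asarray, (x, y, w))

a1, a0 = np.polyfit(x, y, 1, w=w)        # weighted straight-line fit
print(f"weighted fit: y = {a1:.4f} x + {a0:.4f}")   # expected near y = 0.0015x + 0.0021
```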


6. Transforming data: preliminary investigation of a dose‐response relationship

A non-linear relationship between two variables may sometimes be handled as linear by means of [158] a transformation. A transformation consists of the application of a mathematical function to a set of data. The transformation finally leading to a straight-line fit can be carried out on one variable or on both. The transformation of data is sometimes viewed as a trick that statisticians use, a conviction founded on the preconceived idea that the natural scale of measurement [159] is somehow sacrosanct. This is not so; in fact some measurements, for example those of pH, are actually logarithmic, i.e. transformed, values [160].

However much the analyst may want the mould of nature to be linear, truth is often simply found in curves [118, 160]. Real-world systems sometimes do not fulfil the essential requirements for a rigorous, or even approximate, validity of the method of analysis. In many cases, a transformation (change of scale) can be applied to the experimental data [18] in order to allow a conventional analysis. Although it may seem that the best way to estimate the coefficients of a non-linear equation is the direct use of a non-linear regression (NLR) program, NLR itself [161] is not without drawbacks and problems.

The data of turbidimetric measurements of the growth response of Lactobacillus leichmannii to vitamin B12 [61] provide a good illustration of a preliminary investigation of dose‐response relationships. Table 12 shows the responses to the eight different doses of vitamin B12 measured in six independent tubes per dose, which are depicted in Figure 11.

Doses (ng/tube)
0.23 0.35 0.53 0.79 1.19 1.78 2.67 4
Response 0.15 0.28 0.36 0.51 0.68 0.85 1.06 1.21
0.14 0.20 0.36 0.53 0.63 0.80 0.91 1.22
0.19 0.23 0.34 0.54 0.64 0.71 1.09 1.29
0.19 0.25 0.37 0.45 0.61 0.85 0.93 1.24
0.17 0.23 0.33 0.57 0.65 0.94 1.09 1.18
0.16 0.23 0.38 0.49 0.68 0.83 1.12 1.24

Table 12.

Microbiological assay of vitamin B12 [61, 62].

Figure 11.

Representation of the dose‐response data for the microbiological assay of vitamin B12.

The transformation [62]

$z = \log x$ (E21)

can be used (Figure 12). The inspection of Figure 12, however, suggests the existence of a marked curvature. The graph of the residuals, the deviations of each point from the model, indicates that the straight line is incorrect, owing to the observed systematic pattern: there is a tendency towards curvature, and the residuals are not randomly distributed around zero. It must be assumed that the model is susceptible to improvement, requiring either additional higher-order terms or a transformation of the data.

Figure 12.

Top: fitting a straight line to response versus logarithm of dose (microbiological assay of vitamin B12). Bottom: plot of the residuals against the logarithm of the doses for the straight‐line model.

If a second-degree polynomial is fitted to the response data as a function of the logarithm of the dose, the fit seems adequate to the naked eye (Figure 13, top). The representation of the residuals as a function of the abscissa values (Figure 13, bottom), however, adopts a funnel shape. This non-random pattern of residuals carries the message that the assumption of homogeneous (constant) variance is not satisfied, which would require the application of the weighted least squares method rather than simple linear regression.

Figure 13.

Top: fitting a second-degree polynomial to the response versus logarithm of dose (microbiological assay of vitamin B12). Bottom: plot of the residuals against the logarithm of the dose for the second-degree polynomial model.

The shape of Figure 13 (top) suggests a simple possibility: transforming [43] the response as well,

$u = \sqrt{y}$ (E22)

A simple inspection of Figure 14 (top) now shows that the linear regression is valid throughout the entire range. Transformations to achieve homogeneity of variance and to achieve normality (Tables 13 and 14) go hand in hand, and fortunately both postulates are often (almost) simultaneously fulfilled on applying an adequate transformation.

Data type | Transformation
Poisson (counts) (y) | $\sqrt{y}$
Small counts (y) | $\sqrt{y+1}$ or $\sqrt{y} + \sqrt{y+1}$
Binomial ($0 < P < 1$) | $\arcsin \sqrt{P}$
Variance = (mean)² | $\ln y$
Correlation coefficient | $0.5[\ln(1+r) - \ln(1-r)]$

Table 13.

Transformations to correct for homogeneity and approximate normality [18, 162].

Estimated relationship | α | λ = 1 − α | Transformation
$s = k \hat{\bar{y}}^2$ | 2 | −1 | Reciprocal
$s = k \hat{\bar{y}}^{3/2}$ | 3/2 | −1/2 | Inverse square root
$s = k \hat{\bar{y}}$ | 1 | 0 | Logarithmic
$s = k \hat{\bar{y}}^{1/2}$ | 1/2 | 1/2 | Square root
$s = k$ | 0 | 1 | No transformation

Table 14.

Transformations to stabilize variance [18, 158]: $W = (y^\lambda - 1)/\lambda$ ($\lambda \neq 0$); $W = \ln y$ ($\lambda = 0$).

Figure 14.

Top: fitting a straight line to the transformed data, square root of the response against the logarithm of the dose (microbiological assay of vitamin B12). Bottom: plot of the residuals versus the logarithm of the doses for the straight‐line model applied to the data transformed in both axes.

The stabilization of variance usually takes precedence over improving normality [160]. As stated by Acton [86]: ‘The gods who favour statisticians have frequently ordained that the world be well behaved, and so we often find that transformation to obtain one of these desiderata in fact achieves them all (well, almost achieves them!)’.
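The double transformation used above, z = log x on the dose (Eq. 21) and u = √y on the response (Eq. 22), amounts to one line each. A sketch with the Table 12 data (Python/NumPy; base-10 logarithms are assumed here, although any base gives the same quality of fit):

```python
import numpy as np

doses = np.array([0.23, 0.35, 0.53, 0.79, 1.19, 1.78, 2.67, 4.0])  # ng/tube, Table 12
resp = np.array([[0.15, 0.14, 0.19, 0.19, 0.17, 0.16],
                 [0.28, 0.20, 0.23, 0.25, 0.23, 0.23],
                 [0.36, 0.36, 0.34, 0.37, 0.33, 0.38],
                 [0.51, 0.53, 0.54, 0.45, 0.57, 0.49],
                 [0.68, 0.63, 0.64, 0.61, 0.65, 0.68],
                 [0.85, 0.80, 0.71, 0.85, 0.94, 0.83],
                 [1.06, 0.91, 1.09, 0.93, 1.09, 1.12],
                 [1.21, 1.22, 1.29, 1.24, 1.18, 1.24]])             # six tubes per dose

z = np.repeat(np.log10(doses), 6)      # Eq. (21): z = log x, one entry per tube
u = np.sqrt(resp).ravel()              # Eq. (22): u = sqrt(y)
a1, a0 = np.polyfit(z, u, 1)
resid = (u - (a0 + a1 * z)).reshape(8, 6)
print(f"sqrt(y) = {a0:.3f} + {a1:.3f} log10(x)")
# After the transformation the per-dose spread should be roughly constant:
print("per-dose residual SD:", np.round(np.std(resid, axis=1, ddof=1), 3))
```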

Linear regression is a linear (in the parameters) modelling process. However, non-linear terms may be introduced into the linear mathematical context by performing a transformation [162, 163] of the variables (Table 15). Note that when a transformation is used, a transformation-dependent weight (Table 16) should be applied (in addition to any weight based on replicate measurements). When a non-linear function can be transformed into a linear one, it is called ‘intrinsically linear’. Non-linear functions that cannot be transformed into linear ones are instead called ‘intrinsically non-linear’.

Function | Formula | Transformation | Linear form
Power function | $y = \alpha x^\beta$ | $y' = \log y$; $x' = \log x$ | $y' = \log \alpha + \beta x'$
Exponential growth model | $y = \alpha e^{\beta x}$ | $y' = \ln y$ | $y' = \ln \alpha + \beta x$
Logarithmic | $y = \alpha + \beta \log x$ | $x' = \log x$ | $y = \alpha + \beta x'$
Hyperbolic | $y = \dfrac{x}{\alpha x - \beta}$ | $y' = 1/y$; $x' = 1/x$ | $y' = \alpha - \beta x'$
Logit | $y = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$ | $y' = \log \left( \dfrac{y}{1 - y} \right)$ | $y' = \alpha + \beta x$

Table 15.

Linearizable non‐linear functions [18, 162].

Transformation | Weighting factor (*)
$1/y$ | $y^4$
$\ln y$ | $y^2$
$y^2$ | $1/(4y^2)$
$e^y$ | $1/e^{2y}$
logit | $y^2 (1 - y)^2$

Table 16.

Weighting factors associated with a given transformation [2, 127, 128].

(*) In units of $\sigma_0^2/\sigma_y^2$; $\sigma_0^2$ is a proportionality factor, that is, the variance of a function of unit weight [2].



7. The variable that has not yet been discovered: the solubility of diazepam in propylene glycol

The study of the solubility of diazepam in mixed solvents [28] involves checking Beer’s law for a set of data corresponding to the solubility of diazepam in propylene glycol. The experimental data are shown in Table 17.

C (mg/mL) T (min) A C (mg/mL) T (min) A
16.0760 0.00 1.799 12.8608 87.50 1.481
3.2152 6.00 0.335 12.8608 92.75 1.503
6.4304 10.50 0.700 12.8608 97.75 1.522
12.8608 21.50 1.487 16.0760 102.75 1.868
6.4304 33.50 0.670 12.8608 117.25 1.508
9.6500 39.25 1.068 9.6500 122.25 1.108
16.0760 45.75 1.840 9.6500 130.50 1.109
9.6500 50.75 1.088 9.6500 135.75 1.128
16.0760 56.75 1.842 6.4304 141.00 0.720
3.2152 67.25 0.358 6.4304 146.25 0.719
6.4304 71.75 0.703 3.2152 150.75 0.349
16.0760 77.75 1.869 3.2152 155.75 0.367
3.2152 82.50 0.345

Table 17.

Solubility of diazepam in propylene glycol (absorbance as a function of concentration and time) [28].

The relationship obtained between absorbance and concentration is (Figure 15)

Figure 15.

Top: absorbance as a function of concentration for (solubility) of diazepam. Middle: plot of the residuals as a function of the concentration. Bottom: plot of the residuals as a function of the measurement time (measurement order).

$A = 0.11767\,C - 0.003568$ (E23)

These data can be used to corroborate the previously made statement that the correlation coefficient is not necessarily a measure of the suitability of the model. The R2‐value of the above equation is 0.998 (r = 0.999). Many researchers would settle for this, but they would be wrong.

In spite of the high correlation coefficient (r = 0.999), when the residuals are plotted as a function of the numerical order in which the samples were measured, we obtain Figure 15 (bottom). The pattern obtained is not random: the residuals show a trend with a positive slope. This behaviour indicates a situation in which the assumption of independence is not satisfied. The slope in a representation of the residuals as a function of the order of measurement (time) indicates that a linear time term must be included in the model.

Figure 16.

Top: residuals as a function of concentration for the extended model including the measurement time. Bottom: residuals as a function of time (order of measurement) for the model with the time included.

When time is included in the model, Eq. (24) results

$A = -0.070193 + 0.118394\,C + 0.000336936\,t$ (E24)

giving rise to a value of R2 equal to 0.999.
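A sketch reproducing the extended fit of Eq. (24) from the Table 17 data follows (Python/NumPy; the data are entered in measurement order, so the third column of the design matrix doubles as the time-order variable):

```python
import numpy as np

# Table 17, in measurement order: concentration (mg/mL), time (min), absorbance
C = np.array([16.076, 3.2152, 6.4304, 12.8608, 6.4304, 9.65, 16.076, 9.65,
              16.076, 3.2152, 6.4304, 16.076, 3.2152, 12.8608, 12.8608,
              12.8608, 16.076, 12.8608, 9.65, 9.65, 9.65, 6.4304, 6.4304,
              3.2152, 3.2152])
t = np.array([0, 6, 10.5, 21.5, 33.5, 39.25, 45.75, 50.75, 56.75, 67.25,
              71.75, 77.75, 82.5, 87.5, 92.75, 97.75, 102.75, 117.25,
              122.25, 130.5, 135.75, 141, 146.25, 150.75, 155.75])
A = np.array([1.799, 0.335, 0.700, 1.487, 0.670, 1.068, 1.840, 1.088,
              1.842, 0.358, 0.703, 1.869, 0.345, 1.481, 1.503, 1.522,
              1.868, 1.508, 1.108, 1.109, 1.128, 0.720, 0.719, 0.349, 0.367])

X = np.column_stack([np.ones_like(C), C, t])   # model A = b0 + b1*C + b2*t, Eq. (24)
b, *_ = np.linalg.lstsq(X, A, rcond=None)
resid = A - X @ b
print("b0, b1, b2:", b)        # the time coefficient b2 should come out positive
```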

When the residuals for this model are calculated and plotted as a function of the concentration, a graph similar to that of Figure 15 (middle) is obtained (Figure 16, top). However, if the residuals for this model are plotted as a function of time (reflected in the order in which the samples were measured), the pattern of Figure 16 (bottom) results, in which it is observed that the independence of the errors has been accommodated (compare Figure 15, bottom) and the fit has improved, although it could probably improve even more.

The time-order analysis demonstrates the significant fact that a representation of the residuals allows the observation of a time effect that otherwise would not have been perceived. This was possible in the case of diazepam solubility because the researcher was careful to record the times at which the samples were measured.

The appearance of a pattern in the residuals as a function of time in a study of Beer’s law could indicate that some contaminant is interfering, that the light source is decaying, or perhaps that it has not yet warmed up. The pattern of the residuals indicates whether there is a time-dependent variable, but not the reason for that dependency, which must be ascertained separately.


8. Nickel by atomic absorption: all models are wrong

Nickel(II) nitrate hexahydrate analytical reagent (Merck) is used to prepare a standard solution of 1 g/L Ni: 5.0058 g of the salt are weighed on the analytical balance and transferred to a 1-L volumetric flask with ultrapure water. From this solution containing 1000 mg/L, a working solution containing 125 mg/L is prepared. Appropriate volumes of this solution (in triplicate) are added to 25-mL volumetric flasks to obtain the calibration curve, diluting with ultrapure water. The measurements are carried out in an ‘Analyst 200’ atomic spectrometer operating in absorption mode with a Cu-Fe-Ni multi-element Lumina lamp (Perkin Elmer), at 232 nm, with an air-acetylene flame. The absorbances obtained, given below, are higher than those described by Perkin Elmer [164]; the measurements depend, for example, on the flow of the nebulizer system, which differs in each case.

Absorbance data (in arbitrary units) for triplicate aqueous solutions of Ni2+ in mg/L (ppm) are compiled in Table 18. Straight-line (Eq. (2)), third-degree and fourth-degree polynomial models have been fitted (Figure 17; left, mean values; right, individual values), and it is observed that as the degree of the polynomial increases the goodness of fit improves, although the residuals still show a pattern. There are no perfect models, only models more appropriate than others [165, 166]. It is possible to use rational-form polynomials with the SOLVER function of Excel. Even so, the residuals show a pattern similar to that obtained when a fourth-degree polynomial is fitted to the data.

Ni2+ (ppm) Absorbance (arbitrary units) Ni2+ (ppm) Absorbance (arbitrary units)
2.5 0.217 0.207 0.226 17.5 0.743 0.742 0.744
5.0 0.399 0.396 0.389 20.0 0.767 0.767 0.771
7.5 0.523 0.513 0.519 22.5 0.787 0.786 0.789
10.0 0.618 0.615 0.612 25.0 0.808 0.813 0.807
12.5 0.672 0.664 0.664 27.5 0.820 0.821 0.824
15.0 0.713 0.715 0.707 30 0.835 0.835 0.831

Table 18.

Atomic absorption spectrophotometry calibration data of nickel(II).

Figure 17.

Atomic absorption spectrophotometry nickel(II) calibration data.
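The rational-function alternative mentioned above (performed in the chapter with Excel’s SOLVER) can be sketched with SciPy’s `curve_fit`; the particular form y = (a + bx)/(1 + cx) below is our assumption for illustration, not necessarily the one used by the authors:

```python
import numpy as np
from scipy.optimize import curve_fit

# Table 18: triplicate absorbances at each Ni(II) concentration (ppm)
conc = np.repeat([2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20, 22.5, 25, 27.5, 30], 3)
absb = np.array([0.217, 0.207, 0.226, 0.399, 0.396, 0.389, 0.523, 0.513, 0.519,
                 0.618, 0.615, 0.612, 0.672, 0.664, 0.664, 0.713, 0.715, 0.707,
                 0.743, 0.742, 0.744, 0.767, 0.767, 0.771, 0.787, 0.786, 0.789,
                 0.808, 0.813, 0.807, 0.820, 0.821, 0.824, 0.835, 0.835, 0.831])

def rational(x, a, b, c):
    """A simple rational form, chosen here only for illustration."""
    return (a + b * x) / (1 + c * x)

popt, _ = curve_fit(rational, conc, absb, p0=[0.1, 0.1, 0.1])
resid = absb - rational(conc, *popt)
s_yx = np.sqrt(np.sum(resid ** 2) / (len(conc) - 3))
print("a, b, c:", np.round(popt, 4), " s_y/x:", round(float(s_yx), 4))
```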


9. Final comments

Calibration is an essential part of every quantitative analytical method, with the exception of the primary methods of analysis (isotope dilution mass spectrometry, coulometry, gravimetry, titrimetry and a group of colligative methods). The correct performance of calibration is a vital part of method development and validation. Parameter estimation models are often employed to obtain information concerning chemical systems, forming in this way a fundamental part of analytical chemistry. In those cases in which a wrong equation is fitted to the data, the shape of the residual plot contains useful information that helps to modify and improve the model in order to obtain a better explanation of the data. Examples extracted from the literature show how residual plots reveal violations of the assumptions severe enough to deserve correction. As a matter of fact, some authors [12, 25, 28, 59, 96] favour using residuals graphically to evaluate the assumptions inherent in the least squares method.

If there is a true linear relationship between the variables, with the error symmetrically distributed, the signs of the residuals should be distributed at random between plus and minus, with an equal number of each. A plot of residuals allows checking for systematic deviations between data and model. Systematic deviations may indicate either a systematic error in the experiment or an incorrect or inadequate model. A curvilinear pattern in the residual plot shows that the equation being fitted should possibly contain higher-order terms to account for the curvature. A systematic linear trend (descending or ascending) may indicate that an additional term in the model is required. A ‘fan-shaped’ residual pattern shows that the experimental error increases with the mean response (heteroscedasticity), so the constant variance assumption is inappropriate. This phenomenon may be approached by the weighted least squares method or by transforming the response. Time-order analysis sometimes proves the more noteworthy fact that a residual plot permits the observation of a time effect that otherwise might not have become known. However, note that there are no perfect models, only models that are more suitable than others.

Many more sophisticated residual methods have been devised (standardized, studentized, jack-knife, predicted and recursive residuals). However, in spite of their worth and importance, they are beyond the scope of this chapter, which is devoted to a primer on residuals. The analyses presented in this chapter were mainly carried out using an Excel spreadsheet.

References

1. Asuero AG, Sayago A, González AG. The correlation coefficient: an overview. Crit. Rev. Anal. Chem. 2006;36(1):41–59.
2. Asuero AG, González AG. Fitting straight lines with replicated observations by linear regression. III. Weighting data. Crit. Rev. Anal. Chem. 2007;37(3):143–172.
3. Meloun M, Militký J, Kupka K, Brereton RG. The effect of influential data, model and method on the precision of univariate calibration. Talanta 2002;57(4):721–740.
4. Meloun M, Militký J, Hill M, Brereton RG. Crucial problems in regression modeling and their solutions. Analyst 2002;127(4):433–450.
5. Meloun M, Militký J. Detection of single influential points in OLS regression model building. Anal. Chim. Acta 2001;439(2):169–191.
6. Wisniak J, Polishuk A. Analysis of residuals—a useful tool for phase equilibrium data analysis. Fluid Phase Equilibr. 1999;164(1):61–82.
7. Fernandez GCJ. Residual analysis and data transformations: important tools in statistical analysis. Hortscience 1992;27(4):297–300.
8. Chatterjee S, Hadi AS. Sensitivity Analysis in Linear Regression. New York: Wiley, 1988. p. 72.
9. Chatterjee S, Hadi AS. Regression Analysis by Example. 5th ed., New York: Wiley, 2012. p. 98.
10. Draper NR, Smith H. Applied Regression Analysis. 3rd ed., New York: Wiley, 1998.
11. Asuero AG, González AG. Some observations on fitting a straight line to data. Microchem. J. 1989;40(2):216–225.
12. Thompson M. Regression methods in the comparison of accuracy. Analyst 1982;107:1169–1180.
13. Weisberg S. Applied Linear Regression. 3rd ed., New York: Wiley, 2005. p. 23.
14. Asuero AG. Calibración, comparación de métodos y estimación de parámetros en el análisis químico y farmacéutico. Anal. Real Acad. Nac. Farm. 2005;71:153–173.
15. Sayago A, Boccio M, Asuero AG. Fitting straight lines with replicated observations by linear regression: the least squares postulates. Crit. Rev. Anal. Chem. 2004;34(1):39–50.
16. Shapiro SS. How to Test Normality and Other Distributional Assumptions. Milwaukee, WI: American Society for Quality Control (ASQC), 1990.
17. Myers RH, Montgomery DC, Anderson-Cook CM. Response Surface Methodology: Process and Product Optimization Using Designed Experiments. 3rd ed., New York: Wiley, 2009. p. 37.
18. Asuero AG, Martin JB. Fitting straight lines with replicated observations by linear regression. IV. Transforming data. Crit. Rev. Anal. Chem. 2011;41(1):36–69.
19. Sheather SJ. A Modern Approach to Regression with R. New York: Springer-Verlag, 2009.
20. Behnken DW, Draper NR. Residuals and their variance. Technometrics 1972;11(1):101–111.
21. Cornish-Bowden A. Analysis and interpretation of enzyme kinetic data. Perspect. Sci. 2014;1:121–125.
22. Fisher RA. The Design of Experiments. 8th ed., New York: Hafner, 1966. p. 16.
23. Laitinen HA, Harris WE. Chemical Analysis: An Advanced Text and Reference. Chapter 26, New York: McGraw-Hill, 1975. p. 562.
24. Miller JN, Miller JC. Statistics and Chemometrics for Analytical Chemistry. 6th ed., Harlow, England: Prentice Hall, 2010.
25. Darken PF. Evaluating assumptions for least squares analysis using the general linear model: a guide for the pharmaceutical industry statistician. J. Biopharm. Stat. 2004;14(3):803–816.
26. NIST/SEMATECH e-Handbook of Statistical Methods; http://www.itl.nist.gov/div898/handbook/ (date created 6/01/2003; updated April 2012).
27. Huber PJ. Between robustness and diagnostics. In: Stahel W, Weisberg S, editors. Directions in Robust Statistics and Diagnostics. New York: Springer-Verlag, 1991. p. 121.
28. Belloto RJ Jr, Sokoloski TD. Residual analysis in regression. Am. J. Pharm. Educ. 1985;49:295–303.
29. Chambers JM, Cleveland WS, Kleiner B, Tukey PA. Graphical Methods for Data Analysis. Boston: Duxbury Press, 1983.
30. Anscombe FJ. Graphs in statistical analysis. Am. Stat. 1973;27(1):17–21.
31. Sillen LG. Graphic presentation of equilibrium data. In: Kolthoff IM, Elving PJ, editors. Treatise on Analytical Chemistry, Part I, vol. 1, Chapter 8. New York: Interscience, 1959.
32. Brüggemann L, Wennrich R. Application of a special in-house validation procedure for environmental-analytical schemes including a comparison of functions for modelling the repeatability standard deviation. Accred. Qual. Assur. 2011;16(2):89–97.
33. Espinosa-Mansilla A, Muñoz de la Peña A, González-Gómez D. Using univariate linear regression calibration software in the MATLAB environment: application to chemistry laboratory practices. Chem. Educator 2005;10:337–345.
34. da Silva CP, Emidio ES, de Marchi MRR. Method validation using weighted linear regression models for quantification of UV filters in water samples. Talanta 2015;131:221–227.
35. Mermet J-M. Calibration in atomic spectrometry: a tutorial review dealing with quality criteria, weighting procedures and possible curvatures. Spectrochim. Acta B 2010;65(7):509–523.
36. Renger B, Végh Z, Ferenczi-Fodor K. Validation of thin layer and high performance thin layer chromatographic methods. J. Chromatogr. A 2011;1218(19):2712–2721.
37. Sousa JA, Reynolds AM, Ribeiro AS. A comparison in the evaluation of measurement uncertainty in analytical chemistry testing between the use of quality control data and a regression analysis. Accred. Qual. Assur. 2012;17:207–214.
38. Tellinghuisen J. Simple algorithms for nonlinear calibration by the classical and standard additions methods. Analyst 2005;130(3):370–378.
39. Lindner E, Pendley BD. A tutorial on the application of ion-selective electrode potentiometry: an analytical method with unique qualities, unexplored opportunities and potential pitfalls. Anal. Chim. Acta 2013;762:1–13.
40. Baumann K. Regression and calibration techniques. Part II. Validation, weighted and robust regression. Process Contr. Qual. 1997;10(1–2):75–112.
41. Meloun M, Dluhosova Z. Precision limits and interval estimation in the calibration of 1-hydroxypyrene in urine and hexachlorobenzene in water, applying the regression triplet procedure on chromatographic data. Anal. Bioanal. Chem. 2008;390(7):1899–1910.
42. Meloun M, Militký J. Statistical Data Analysis: A Practical Guide. New Delhi: Woodhead Publishing, 2011.
43. Miller JN. Outliers in experimental data and their treatment. Analyst 1993;118(5):455–461.
44. Meloun M, Militký J, Forina M. Chemometrics for Analytical Chemistry, Volume 2: PC-Aided Regression and Related Methods. Hertfordshire: Ellis Horwood, 1994.
45. Badertscher M, Pretsch E. Bad results from good data. Trends Anal. Chem. 2006;25(11):1131–1138.
46. Sonnergaard JM. On the misinterpretation of the correlation coefficient in pharmaceutical sciences. Int. J. Pharm. 2006;321(1–2):12–17.
47. Tellinghuisen J, Bolster CH. Using R2 to compare least-squares fit models: when it must fail. Chemometr. Intell. Lab. Syst. 2011;105:220–222.
48. Van Loco J, Elskens M, Croux C, Beernaert H. Use and misuse of the correlation coefficient. Accred. Qual. Assur. 2002;7:281–285.
49. Raposo F. Evaluation of analytical calibration based on least-squares linear regression for instrumental techniques: a tutorial review. Trends Anal. Chem. 2016;77:167–185.
50. Araujo P. Key aspects of analytical method validation and linearity evaluation. J. Chromatogr. B 2009;877(23):2224–2234.
51. Castillo MA, Castells RC. Initial evaluation of quantitative performance of chromatographic methods using replicates at multiple concentrations. J. Chromatogr. A 2001;921(2):121–133.
52. Coleman DE, Vanatta LE. Lack-of-fit testing of ion chromatographic calibration curves with inexact replicates. J. Chromatogr. A 1999;850(1–2):43–51.
53. Perez Cuadrado JA, Pujol Forn M. Validación de Métodos Analíticos. Barcelona: Asociación Española de Farmacéuticos de la Industria (AEFI), 2001.
54. de Souza SVC, Junqueira RG. A procedure to assess linearity by ordinary least squares. Anal. Chim. Acta 2005;552:25–35.
55. Akhnazarova S, Kafarov V. Experiment Optimization in Chemistry and Chemical Engineering. Moscow: Mir, 1982.
56. Danzer K. Guidelines for calibration in analytical chemistry. Part 1. Fundamentals and single component calibration. IUPAC recommendations 1998. Pure Appl. Chem. 1998;70(4):993–1014.
57. Andrade JM, Gomez-Carracedo MP. Notes on the use of Mandel's test to check for nonlinearity in laboratory calibration. Anal. Methods 2013;5:1145–1149.
58. Miller JN. Calibration methods in spectroscopy. II. Is it a straight line? Spectrosc. Int. 1991;3(4):41–43.
59. Noggle JH. Practical Curve Fitting and Data Analysis: Software and Self-Instruction for Scientists and Engineers. Englewood Cliffs, NJ: Prentice Hall, 1993.
60. Davidian M, Haaland PD. Regression and calibration with nonconstant error variance. Chemometr. Intell. Lab. Syst. 1990;9(3):231–248.
61. Finney DJ. Statistical Method in Biological Assay. 3rd ed., London: Griffin & Co, 1978.
62. Emery WB, Lees KA, Tootill JPR. The assay of vitamin B12. Part IV. The microbiological estimation with Lactobacillus leichmannii 313 by the turbidimetric procedure. Analyst 1951;76(3):141–146.
63. Kóscielniak P, Wieczorek M, Kozak J, Herman M. Generalized calibration strategy in analytical chemistry. Anal. Lett. 2011;44:411–430.
64. Komsta L. Chemometrics and statistical evaluation of calibration curves in pharmaceutical analysis—a short review on trends and recommendations. J. AOAC Int. 2012;95(3):669–672.
65. Burke S. Regression and correlation. LC-GC Europe Online Supplement, Statistics and Data Analysis; http://www.lcgceurope.com/lcgceurope/article/article.List.jsp?categoryId=935
66. Miller JN. Basic statistical methods for analytical chemistry. Part 2. Calibration and regression methods. Analyst 1991;116:3–14.
67. Kóscielniak P, Wieczorek M. Univariate analytical calibration methods and procedures: a review. Anal. Chim. Acta 2016;944:14–28.
68. Olivieri AC. Practical guidelines for reporting results in single- and multi-component analytical calibration: a tutorial. Anal. Chim. Acta 2015;868:10–22.
69. Tellinghuisen J. Least squares in calibration: weights, nonlinearity, and other nuisances. Methods Enzymol. 2009;454:259–285.
70. ISO 8466-1:1990. Water quality - Calibration and evaluation of analytical methods and estimation of performance characteristics. Part 1: Statistical evaluation of the linear calibration function. International Organization for Standardization: Geneva, 1990.
71. ISO 8466-2:2001. Water quality - Calibration and evaluation of analytical methods and estimation of performance characteristics. Part 2: Calibration strategies for non-linear second order calibration functions. International Organization for Standardization: Geneva, 2001.
72. ISO 11095:1996. Linear calibration using reference materials. International Organization for Standardization: Geneva, 1996.
73. ISO/TS 28037:2010. Determination and use of straight-line calibration functions. International Organization for Standardization: Geneva, 2010.
74. ISO/NP TS 28038:2014. Determination and use of polynomial calibration functions. International Organization for Standardization: Geneva, 2014.
75. ISO 11843-2:2000. Capability of detection. Part 2: Methodology in the linear calibration case. International Organization for Standardization: Geneva, 2000.
76. ISO 11843-5:2008. Capability of detection. Part 5: Methodology in the linear and non-linear calibration cases. International Organization for Standardization: Geneva, 2008.
77. LGC. Preparation of Calibration Curves: A Guide to Best Practice. Barwick V, editor. LGC/VAM/2003/032.
78. ICH Expert Working Group. International Conference on Harmonization Tripartite Guideline Q2A: Text on Validation of Analytical Procedures.
79. Tholen DW, Kroll M, Astles JR, Caffo AL, Happe TM, Krouwer J, Lasky F. EP6-A: Evaluation of the Linearity of Quantitative Measurement Procedures: A Statistical Approach; Approved Guideline. Wayne, PA: Clinical Laboratory Standards Institute, 2003.
80. AOAC. Guidelines for single laboratory validation of chemical methods for dietary supplements and botanicals, 2002. Accessed 29/3/2017. https://www.aoac.org/aoac_prod_imis/AOAC_Docs/StandardsDevelopment/SLV_Guidelines_Dietary_Supplements.pdf
81. EC 2002/657. Commission Decision of 12 August 2002 implementing Council Directive 96/23/EC concerning the performance of analytical methods and the interpretation of results. Official Journal of the European Communities, 17.8.2002, L 221/8–L 221/36.
82. Yang H, Novick SJ, LeBlond D. Testing assay linearity over a pre-specified range. J. Biopharm. Stat. 2015;25(2):339–350.
83. Michalowska-Kaczmarczyk A, Asuero AG, Martin J, Alonso E, Jurado JM, Michalowski T. A uniform nonlinearity criterion for rational functions applied to calibration curve and standard addition methods. Talanta 2014;130:307–314.
84. Novick SJ, Yang H. Directly testing the linearity assumption for assay validation. J. Chemometr. 2013;27:117–125.
85. Sanagi MM, Nasir Z, Ling SLL, Hermawan D, Ibrahim WAW, Naim AA. A practical approach for linearity assessment of calibration curves under the International Union of Pure and Applied Chemistry (IUPAC) guidelines for an in-house validation of method of analysis. J. AOAC Int. 2010;93(4):1322–1330.
86. Liu J-P, Chow S-C, Hsieh T-C. Deviations from linearity in assay validation. J. Chemometr. 2009;23:487–494.
87. Hsieh E, Hsiao C-F, Liu J-P. Statistical methods for evaluating the linearity in assay validation. J. Chemometr. 2009;23:56–63.
88. Hsieh E, Liu J-P. On statistical evaluation of the linearity in assay validation. J. Biopharm. Stat. 2008;18(4):677–690.
89. Brüggemann L, Quapp W, Wennrich R. Test for non-linearity concerning linear calibrated chemical measurements. Accred. Qual. Assur. 2006;11:625–631.
90. Mark H. Application of an improved procedure for testing the linearity of analytical methods to pharmaceutical analysis. J. Pharm. Biomed. Anal. 2003;33:7–20.
91. Kuttatharmmakul S, Massart DL, Smeyers-Verbeke J. Influence of precision, sample size and design on the beta error of linearity tests. J. Anal. Atom. Spectrom. 1998;13:109–118.
92. Karolczak M. Why calculate, when to use and how to understand curvature measurements of non-linearity. Curr. Sep. 1995;14(1):10–16.
93. Féménias J-L. Goodness of fit: analysis of residuals. J. Mol. Spectrosc. 2003;217:32–42.
94. Kuzmic P, Lorenz T, Reinstein J. Analysis of residuals from enzyme kinetic and protein folding experiments in the presence of correlated experimental noise. Anal. Biochem. 2009;395:1–7.
95. Brown S, Muhamad N, Pedley KV, Simcock DC. What happens when the wrong equation is fitted to data? Int. J. Emerg. Sci. 2012;2(4):133–142.
96. Ellis KJ, Duggleby RG. What happens when data are fitted to the wrong equation? Biochem. J. 1978;171(3):513–517.
97. Straume M, Johnson ML. Analysis of residuals: criteria for determining goodness of fit. Methods Enzymol. 1992;210:87–105.
98. Bates DM, Watts DG. Nonlinear Regression Analysis and Its Applications. 2nd ed., New York: Wiley, 2007. p. 1.
99. Bonate PL. Chromatographic calibration revisited. J. Chromatogr. Sci. 1990;28(11):559–562.
100. Lavagnini I, Magno F. A statistical overview on univariate calibration, inverse regression, and detection limits: application to gas chromatography/mass spectrometry technique. Mass Spectrom. Rev. 2007;26(1):1–18.
101. Mermet J-M. Quality of calibration in inductively coupled plasma atomic emission spectrometry. Spectrochim. Acta B 1994;49(12–14):1313–1324.
102. Schwartz LM. Calibration curves with nonuniform variance. Anal. Chem. 1979;51(6):723–727.
103. Schwartz LM. Nonlinear calibration. Anal. Chem. 1977;49(13):2062–2068.
104. Tan A, Awaiye K, Trabelsi F. Impact of calibrator concentrations and their distribution on accuracy of quadratic regression for liquid chromatography-mass spectrometry bioanalysis. Anal. Chim. Acta 2014;815:33–41.
105. Asnin LD. Peak measurement and calibration in chromatographic analysis. Trends Anal. Chem. 2016;81:51–62.
106. Findlay JWA, Dillard RF. Appropriate calibration curve fitting in ligand binding assays. AAPS J. 2007;9(2):Article 29; http://www.aapsj.org
107. Kleijburg MR, Pijpers FW. Calibration graphs in atomic absorption spectrophotometry. Analyst 1985;110:147–150.
108. Yuan L, Ji QC. Automation in new frontiers of bioanalysis: a key for quantity and efficiency. Bioanalysis 2012;4(23):2759–2762.
109. Lavagnini I, Magno F, Seraglia R, Traldi P. Quantitative Applications of Mass Spectrometry. New York: Wiley, 2006.
110. Van Loco J, Hanot V, Huysmans G, Elskens M, Degroodt JM, Beernaert H. Estimation of the minimum detectable value for the determination of PCBs in fatty food samples by GC-ECD: a curvilinear calibration. Anal. Chim. Acta 2003;483:413–418.
111. Yuan L, Zhang D, Jemal M, Aubry A-F. Systematic evaluation of the root cause of non-linearity in liquid chromatography/tandem mass spectrometry bioanalytical assays and strategy to predict and extend the linear standard curve. Rapid Commun. Mass Spectrom. 2012;26:1465–1474.
112. Rawski R, Sanecki PT, Kijowska KM, Skital PM, Saletnik DE. Regression analysis in analytical chemistry. Determination and validation of linear and quadratic regression dependences. S. Afr. J. Chem. 2016;69:166–173.
113. Bouklouze A, Kharbah M, Cherrah Y, Heyden YV. Azithromycin assay in drug formulations: validation of a HPTLC method with a quadratic polynomial calibration model using the accuracy profile approach. Ann. Pharm. Fr. 2016;75(2):112–120.
114. Zareba M, Sanecki PT, Rawski R. Simultaneous determination of thimerosal and aluminium in vaccines and pharmaceuticals with the use of HPLC method. Acta Chromatogr. 2016;28(3):299–311.
115. Frisbie SH, Mitchell EJ, Sikora KR, Abualrub MS, Abosalem Y. Using polynomial regression to objectively test the fit of calibration curves in analytical chemistry. Int. J. Appl. Math. Theor. Phys. 2015;1(2):14–18.
116. Kiser M, Dolan JW. Selecting the best curve fit. LC-GC Europe 2004;17(3):138–143.
117. Zscheppank C, Telgheder U, Molt K. Stir-bar sorptive extraction and TDS-IMS for the detection of pesticides in aqueous samples. Int. J. Ion Mob. Spectrom. 2012;15(4):257–264.
118. de Levie R. Collinearity in linear least squares. J. Chem. Educ. 2012;89:68–78.
119. Stewart GW. Collinearity and least squares. Stat. Sci. 1987;2:68–100.
120. Mandel J. The regression analysis of collinear data. J. Res. Nat. Bur. Stand. 1985;90:465–477.
121. Bayne CK, Rubin IB. Practical Experimental Design Methods for Chemists. Deerfield Beach, FL: VCH, 1986. pp. 31–32.
122. Blanco M, Cerda V. Temas Avanzados de Quimiometría. Palma: Universitat de les Illes Balears, 2007.
123. da Silva RJN, Camoes MF. The quality of standards in least squares calibrations. Anal. Lett. 2010;43(7–8):1257–1266.
124. de Beer JO, Naert C, Deconinck E. The quality coefficient as performance assessment parameter of straight line calibration curves in relationship with the number of calibration points. Accred. Qual. Assur. 2012;17(3):265–274.
125. Chatterjee S, Hadi AS. Influential observations, high leverage points, and outliers in linear regression. Stat. Sci. 1986;1(3):379–393.
126. Cook RD, Weisberg S. An Introduction to Regression Graphics. New York: Wiley, 1994. p. 172.
127. de Levie R. When, why, and how to use weighted least squares. J. Chem. Educ. 1986;63(1):10–15.
128. de Levie R. Advanced Excel for Scientific Data Analysis. 3rd ed., Brunswick, ME: Atlantic Academy, 2012.
129. de Levie R. Curve fitting with least squares. Crit. Rev. Anal. Chem. 2000;30(1):59–74.
130. Box GEP, Draper NR. Empirical Model-Building and Response Surfaces. New York: Wiley, 1987.
131. Box GEP, Hunter JS, Hunter WG. Statistics for Experimenters. 2nd ed., New York: Wiley, 2005.
132. Sayago A, Asuero AG. Fitting straight lines with replicated observations by linear regression. Part II. Testing for homogeneity of variances. Crit. Rev. Anal. Chem. 2004;34(2):133–146.
133. Zorn ME, Gibbons RD, Sonzogni WC. Weighted least squares approach to calculating limits of detection and quantification by modeling variability as a function of concentration. Anal. Chem. 1997;69(15):3069–3075.
134. Hibbert DB. The uncertainty of a result from a linear calibration. Analyst 2006;131(12):1273–1278.
135. Penninckx W, Hartmann C, Massart DL, Smeyers-Verbeke J. Validation of the calibration procedure in atomic absorption spectrometric methods. J. Anal. Atom. Spectrom. 1996;11(4):237–246.
136. Taylor PDP, Schutyser P. Weighted linear regression applied in inductively coupled plasma-atomic emission spectrometry - a review of the statistical considerations involved. Spectrochim. Acta B 1986;41(10):1055–1061.
137. Szabo GK, Browne JK, Ajami A, Josephs EG. Alternative to least squares linear regression analysis for the computation of standard curves for quantitation by high performance liquid chromatography: application to clinical pharmacology. J. Clin. Pharmacol. 1994;34(3):242–249.
138. Sadler WA, Smith MH, Legge HM. A method for the direct estimation of imprecision profiles, with reference to immunoassay data. Clin. Chem. 1988;34(6):1058–1061.
139. González AG, Herrador MA. A practical guide to analytical method validation, including measurement uncertainty and accuracy profiles. Trends Anal. Chem. 2007;26(3):227–238.
140. Hwang L-J. Impact of variance function estimation in regression and calibration. Methods Enzymol. 1994;240:150–170.
141. Thompson M. Variation of precision with concentration in an analytical system. Analyst 1988;113(10):1579–1587.
142. Zeng QC, Zhang E, Dong H, Tellinghuisen J. Weighted least squares in calibration: estimating data variance functions in high performance liquid chromatography. J. Chromatogr. A 2008;1206(2):147–152.
143. ISO 5725-5:1998. Accuracy (trueness) and precision of measurement methods and results. Part 5: Alternative methods for the determination of the precision of a standard measurement method. International Organization for Standardization: Geneva, 1998.
144. MacTaggart DL, Farwell SO. Analytical use of linear regression. Part I. Regression procedures for calibration and quantitation. J. AOAC Int. 1992;75(4):594–607.
145. ISO 9169:2006. Air quality - Definition and determination of performance characteristics of an automatic measuring system. International Organization for Standardization: Geneva, 2006.
146. ISO 13752:1998. Air quality - Assessment of uncertainty of a measurement method under field conditions using a second method as reference. International Organization for Standardization: Geneva, 1998.
147. Currie LA. Detection and quantification limits: origins and historical overview. Anal. Chim. Acta 1999;391:127–134.
148. Desimoni E, Brunetti B. About estimating the limit of detection of heteroscedastic analytical systems. Anal. Chim. Acta 2009;655:30–37.
149. Ketkar SN, Bzik TJ. Calibration of analytical instruments. Impact of nonconstant variance in calibration data. Anal. Chem. 2000;72(19):4762–4765.
150. Nascimento R, Froes RES, e Silva NOC, Naveira RLP, Mendes DBC, Neto WB, Silva JBB. Comparison between ordinary least squares regression and weighted least squares regression in the calibration of metals present in human milk determined by ICP-OES. Talanta 2010;80(3):1102–1109.
151. Korany MA, Maher HM, Galal SM, Ragab AA. Comparative study of some robust statistical methods: weighted, parametric, and nonparametric linear regression of HPLC convoluted peak responses using internal standard methods in drug bioavailability studies. Anal. Bioanal. Chem. 2013;405(14):4835–4848.
152. Brasil B, da Silva RJNV, Camoes FGFC, Salgueiro PAS. Weighted calibration with reduced number of signals by weighing factor modeling: application to the identification of explosives by ion chromatography. Anal. Chim. Acta 2013;804:187–295.
153. Lavagnini I, Urbani A, Magno F. Overall calibration procedure via a statistically based matrix-comprehensive approach in the stir bar sorptive extraction-thermal desorption-gas chromatography-mass spectrometry analysis of pesticide residues in fruit-based soft drinks. Talanta 2011;83:1754–1762.
154. Jain RB. Comparison of three weighting schemes in weighted regression analysis for use in a chemistry laboratory. Clin. Chim. Acta 2010;411:270–279.
155. AMC. Why are we weighting? Analytical Methods Committee, AMCTB No. 27, June 2007.
156. Zeng QC, Zhang E, Tellinghuisen J. Univariate calibration by reversed regression of heteroscedastic data: a case study. Analyst 2008;133:1649–1655.
157. Tellinghuisen J. Weighted least-squares in calibration: what difference does it make? Analyst 2007;132:536–543.
158. Cook RD, Weisberg S. Applied Regression Including Computing and Graphics. New York: Wiley, 1999.
159. Altman DG. Practical Statistics for Medical Research. Boca Raton, FL: Chapman & Hall, 1991. p. 145.
160. Acton FS. Analysis of Straight-Line Data. New York: Wiley, 1959.
161. Mager PP. Design Statistics in Pharmacochemistry. New York: Wiley, 1991.
162. Tomassone R, Lesquoy E, Millier C. La Régression: nouveaux regards sur une ancienne méthode statistique. Paris: Masson, 1983.
163. Daniel C, Wood FS. Fitting Equations to Data: Computer Analysis of Multifactor Data. 2nd ed., New York: Wiley, 1999.
164. Perkin Elmer. Analytical Methods for Atomic Absorption Spectrometry. The Perkin Elmer Corporation, 1996. Accessed 29/3/2017. http://eecelabs.seas.wustl.edu/files/Flame%20AA%20Operating%20Manual.pdf
165. Box GEP. Science and statistics. J. Am. Stat. Assoc. 1976;71:791–796.
166. Cook RD, Weisberg S. Residuals and Influence in Regression. New York: Chapman & Hall, 1982.
