Formulae for calculating statistics for weighted linear regression (WLR).

## Abstract

The aim of this chapter is to show checking the underlying assumptions (the errors are independent, have a zero mean, a constant variance and follows a normal distribution) in a regression analysis, mainly fitting a straight‐line model to experimental data, via the residual plots. Residuals play an essential role in regression diagnostics; no analysis is being complete without a thorough examination of residuals. The residuals should show a trend that tends to confirm the assumptions made in performing the regression analysis, or failing them should not show a tendency that denies them. Although there are numerical statistical means of verifying observed discrepancies, statisticians often prefer a visual examination of residual graphs as a more informative and certainly more convenient methodology. When dealing with small samples, the use of the graphic techniques can be very useful. Several examples taken from scientific journals and monographs are selected dealing with linearity, calibration, heteroscedastic data, errors in the model, transforming data, time‐order analysis and non‐linear calibration curves.

### Keywords

- least squares method
- residual analysis
- weighting
- transforming data

“Since all models are wrong the scientist cannot obtain a “correct” one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity”.

(Box, G.E.P. Science and Statistics, J. Am. Stat. Ass. 1976, 71, 791–796)

## 1. Introduction

The purpose of this chapter is to provide an overview of checking the underlying assumptions (errors normally distributed with zero mean and constant variance (*σi*^{2}), being independent one of each other) in a regression analysis, via the use of basic residual plot, such as plots of residuals versus the independent variable *x*. Compact formulae for the weighted least squares calculation of the *a*_{0} (intercept) and *a*_{1} (slope) parameters and their standard errors [1, 2] are shown in Table 1. The similarity with simple linear regression is obvious, simply making the weighting factors *wi* = 1. A number of selected examples taken from scientific journals and monographs are subject of study in this chapter. No rigorous mathematical treatment will be given to this interesting topic. Emphasis is mainly placed on a visual examination of residuals to check for the model adequacy [3–7] in regression analysis. The role of residuals in regression diagnostics is vital, being necessary with their thorough examination to consider an analysis as complete [8–10].

Equation: | Slope: |

Weights: | Intercept: |

Explained sum of squares: | Weighted residuals: |

Residual sum of squares: | Correlation coefficient: |

Mean: Sum of squares about the mean: | Standard errors: |

The residuals are geometrically the distances calculated in the *y*‐direction [1, 2, 11, 12] (vertical distances) between the points and the regression line (error free in the independent variable)

The calculated regression line

corresponds to the model

where *εi* is the random error (*a*_{0} and *a*_{1}, Eq. (2), are the estimates of the true values *α*_{0} and *α*_{1}), leading to a sum of squares of the residuals minimum

Note that the model error is given by

where *E*(*yi*) is the expected value of *yi* [6]. Thus, the residuals *ri* may be viewed as the differences between what is really observed, and what is predicted by the model equation (i.e. the amount that the regression equation is not able to explain). The residuals *ri* may be thought as the observed errors [10] in correct models. The residuals reveal the existing asymmetry [13] in the functions of the response and the independent variable in regression problems. A number of assumptions concerning the errors [14, 15] have to be made when performing a regression analysis, for example, normality, independence, zero mean and constant variance (homoscedasticity property).

An assumption that the errors are normally distributed is not required to obtain the parameter estimates by the least squares method. However, for inferences and estimates (standard errors, *t*‐ and *F*‐test, confidence intervals) to be made about regression, it is necessary to assume that the errors are normally distributed [11]. The assumption of normality, nevertheless, is plausible as in many real situations errors tend to be normally distributed due to the central limit theorem. The assumption that no residual term is correlated with another, combined with the normality assumption, means [10] the errors are also independent. Constructing a normal probability plot of the residuals [16–18] is a way to verify the assumption of normality. Residuals are ordered and plotted against the corresponding percentage points from the standardized normal distribution (normal quantities). If the residuals are then situated along a straight line, the assumption of normality is satisfied.

A standardized residual is the residual divided by the standard deviation of the regression line

The standardized residuals are normally distributed with a mean value of zero and (approximately) unity variance [10, 19]

The *hii* term may be regarded as measuring the leverage of the data point (*xi*,*yi*) (see below). The estimated residuals are correlated [10], but this correlation does not invalidate the residual plot when the number of points is large in comparison with the number of parameters estimated by the regression. As pointed out by Behnken and Draper [20]: ‘*In many situations little is lost by failing to take into account the differences in variances’*. Standardized residuals are useful in looking for outliers. They should have random values, the 95% falling between −2 and 2 for normal distribution.

The tendencies followed by the residuals should confirm the assumptions we have previously made [21], or at least do not deny them. Remember the sentence of Fischer [22, 23]: ‘*Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis’*. Conformity assays of the assumptions inherent to regression fall mainly in the examination of residual pattern. Although there are statistical ways of numerically measuring some of the observed discrepancies [24], graphical methods play an important role in data analysis (Table 2) [25–31]. A quick examination of residuals often allows obtaining more information than significance statistical tests of some limited null hypothesis. Nevertheless, objective, unambiguous determination should be based on standard statistical methodology. This chapter is mainly focused on residual plots rather than on formulas, or hypothesis testing. As we will see in the selected examples, the plots easily reveal violations of the assumptions if they are severe enough as to warrant any correction.

Sentence | Author(s)/reference |
---|---|

‘Most assumptions required for the least squares analysis of data using the general linear model can be judged using residuals graphically without the need for formal testing’ | Darken [25] |

‘Graphical methods have an advantage over numerical methods for model validation because they readily illustrate a broad range of complex aspects of the relationship between the model and the data’. | NIST/SEMATECH [26] |

‘There is no single statistical tool that is a powerful as a well‐chosen graph’ | Huber [27] |

‘Although there are statistical ways of numerically measuring some of the observed discrepancies, statistician themselves prefer a visual information of the residual plots as being more informative and certainly more convenient’ | Belloto and Sokoloski [28] |

‘Eye‐balling can give diagnostic insights no formal diagnostic will ever provide’ | Chambers et al. [29] |

‘Graphs are essential to good statistical analysis’ | Anscombe [30] |

‘One picture says more than a thousand equations’ | Sillen [31] |

The main forms of representation [10] of residuals are (i) global; (ii) in temporal sequence, if its order is known; (iii) faced to the adjusted values, *y*‐hat; (iv) facing the independent variable *xji* for *j* = 1,2… *k*; and (v) in any way that is sensitive to the problem subject of analysis.

The following points can be verified in the representation of the residuals: (i) the form of the representation, (ii) the number of positive and negative residuals should be equivalent of vanishing median, (iii) the sequence of residual signs must be randomly distributed between + and −, and (iv) it is possible to detect spurious results (outliers); their magnitudes are greater than the rest of the residuals.

Residual plots appear more and more frequently [32–39] in papers published in analytical journals. In general, these plots as well as those discussed in this chapter are very basic and can undergo some criticism. For example, the residuals are not totally distributed independent of *x*, since [10, 19] the substitution of the estimates by the parameters introduces some dependence. However, more sophisticated methods have been developed [40–44] based on standardized, studentized, jack‐knife, predictive, recursive residuals, and so on (Table 3). In spite of their importance, they are considered beyond the scope of this contribution.

Symbol | Name | Formula | Comments |
---|---|---|---|

Classical residuals | |||

Normalized residuals | |||

Standardized residuals | Identification of heteroscedasticity | ||

Jack‐knife residuals | Identification of outliers | ||

Predicted residuals | |||

Recursive residuals | Identification of autocorrelations |

Despite the frequency with which the correlation coefficient is referred to in the scientific literature as a criterion of linearity, this assertion is not free from reservations [1, 45–49] as evidenced several times throughout this chapter.

The study of linearity not only implies a graphic representation of the data. It is also necessary to carry out a statistical check, for example, the analysis of the variance [50–54], which requires repeated measurements. This implies the fulfilment of two requirements: the homogeneity (homoscedasticity) of the variances and the normality of the residuals. Incorporating replicates to the calibration estimation offers a possibility to look at the calibration not only in the context of fitting but also of the uncertainty of measurements [15]. However, if replicate measurements are not made, and an estimate of the mean square error (replication variance) is not available, the regression variance

may be compared with the estimated variance around the mean of the *yi* values [55]

by means of an *F*‐test. The goodness of fit of non‐linear calibration curves is improved by raising the degree of the fitting polynomial, performing then an *F*‐test (quotient of the residual variance for the *k*th to *k* + first‐polynomial degree) [56]. A suitable test can also be carried out according to Mandel [56, 57]. However, this contribution is essentially devoted to the use of basic graphical plots of residuals, a simple and at the same time a powerful diagnostic tool, as we will have the occasion to claim through this chapter.

Several examples taken from scientific journals and monographs are selected in order to illustrate this chapter: (1) linearity calibration methods: fluorescence data [58] as an example; (2) calibration graphs: the question of intercept [59] or non‐intercept; (3) errors are not in the data, but in the model: the CO_{2} vapour pressure [59] versus temperature dependence; (4) the heteroscedastic data: high‐performance liquid chromatography (HPLC) calibration assay [60] of a drug; (5) transforming data: preliminary investigation of a dose‐response relationship [61, 62]; the microbiological assay of vitamin B_{12}; (6) the variable that has not yet been discovered; the solubility of diazepan [28] in propylene glycol; and (7) no models perfect: nickel(II) by atomic absorption spectrophotometry.

## 2. Linearity in calibration methods: fluorescence data as example

Calibration is a crucial step, an essential part, the key element, the soul of every quantitative analytical method [38, 40, 63–69], and influences significantly the accuracy of the analytical determination. Calibration is an operation that usually relates an output quantity to an input quantity for a measuring system under given condition (The International Union of Pure and Applied Chemistry (IUPAC)). The topic has been the subject of a recent review [67] focused on purely practical aspects and obviating the mathematical and metrological aspects. The main role of calibration is transforming the intensity of the measured signal into the analyte concentration in a way as accurate and precise as possible. Guidelines for calibration and linearity are shown in Table 4 [70–81].

Scientific Association or Agency | Reference |
---|---|

Calibration | |

International Union of Pure and Applied Chemistry (IUPAC) | Guidelines for calibration in analytical chemistry [70] |

International Organization for Standardization (ISO) | ISO 8466‐1:1990 [70]; ISO 8466‐2:2001 [71]; ISO 11095:1996 [72]; ISO 28037:2010 [73]; ISO 28038:2014 [74]; ISO 11843‐2: 2000 [75]; ISO 11843‐5: 2008 [76] |

LGC Standards Proficiency Testing | LGC/VAM/2003/032 [77] |

Linearity | |

International Conference on Harmonization (ICH) | Guideline Q2A [78] |

Clinical Laboratory Standard Institute (CLSI) | EP6‐A [79] |

Association of Official Analytical Chemists (AOAC) | AOAC Guidelines 2002 [80] |

European Union | EC 2002/657 [81] |

Linearity is the basis of many analytical procedures. It has been defined as [78] the ability (within a certain range) to obtain test results that are directly proportional to the concentration (amount) of analyte in the sample. Linearity is one of the most important characteristics for the evaluation of accuracy and precision in assay validation, and as seldom is the case where a calibration curve is perfectly linear, it is crucial to access linearity during method validation. Such evaluation is also recommended in regulatory guidelines [78–81]. Although it may seem that everything has been said on the subject of linearity, it is still an open question and subject to debate. It is therefore not surprising that some proposals are made from time to time to resolve this issue [54, 82–92].

However, in calibration, statistical linearity tests between the variables are rarely performed in analytical studies. When dealing with regression models, the most convenient way of testing linearity beside a visual assessment is plotting the residual sequence in the concentration domain. A simple nonparametric statistical test for linearity, known as ‘the sign test’ [9, 16, 28], is based on the examination of the residuals (*ri*) sign sequence.

The residuals should be distributed in a random way. That is, the number of plus and minus residuals sign should be equal with the error symmetrically distributed (null hypothesis for the assay) when the variables are connected through a true linear relationship. The probability to get a random residual signs pattern is related to the number of runs in the sequence of signs. Intuitively and roughly speaking, the more these changes are randomly distributed [93] the best is the fit. A run is a sequence of the same sign with independence of its length. A pattern of residual signs of the kind [+ ‐ ‐ + + ‐ + ‐ + ‐ +], from independent measurements, is considered as random, whereas a pattern like this [‐ ‐ ‐ + + + + + + ‐ ‐] is not. Though a formal statistical test may carry out [94] with the information afforded by the residual plot, it is necessary a number of points greater than is usual in calibrate measurements.

The fluorescence in arbitrary units of a series of standards is shown in Table 5. To these data that appear to be curved, a straight line may be fitted (Figure 1, top) which results in an evident lack of fit, though the correlation coefficient (*R*) of the line is equal to 0.995 2. A plot of the resulting residuals *ri* against the *x*‐values (reduced residuals on the secondary axis) is also shown in Figure 1 (top), and allows checking for systematic deviations between data and model.

Concentration (μM) | Fluorescence (arbitrary units) | Concentration (μM) | Fluorescence (arbitrary units) |
---|---|---|---|

0 | 0.2 | 6 | 20.4 |

1 | 3.6 | 7 | 22.7 |

2 | 7.5 | 8 | 25.9 |

3 | 11.5 | 9 | 27.6 |

4 | 15 | 10 | 30.2 |

5 | 17 |

The pattern of the sign of the residuals indicates that fitting the fluorescence data by a straight‐line equation is inadequate, higher‐order terms should possibly be added to account for the curvature. Note that the straight‐line model is not adequate even though the reduced residuals are less than 1.5 in all cases. When an erroneous equation is fitted to the data [95–97], the information contained in the form of the residual plots is a valuable tool, which indicates how the model equation must be modified to describe the data in a better way. A curved calibration line may be fitted to a power series. The use of a quadratic (second‐degree) equation is enough in this case to obtain a good fit: the scattering of the residuals above and below the zero line is similar, as shown in Figure 1 (middle). Then, when no obvious trends in the residuals are apparent, the model may be considered to be an adequate description of the data. The simplest model or the model with the minimum number of parameters that adequately fit the data in question is usually the best choice. ‘*Non sunt multiplicanda entia praeter necessitatem’* (Occam’s razor) [98]. In fact, the order of the polynomial (*k*) must not rise above 2 since [*sy*_{/x}]*k* = 2 = 0.3994 < [*sy*_{/x}]*k* = 3 = 0.4142.

In summary, when it is assumed a correct relationship between the response and the independent variable (s), the residual plot should resemble that of Figure 2 (left). All residuals should fall into the gravelled area, with a non‐discernable pattern, that is, random. If the representation of the residuals resembles that of Figure 2 (right), where curvature is appreciated, the model can probably be improved by adding a quadratic term or higher‐order terms, which should better describe the model with the required curvature.

Calibration curves with a non‐linear shape also appear in analytical chemistry [99–104]. When the data in the *x*‐range (calibration) vary greatly as it does in many real problems, the response becomes non‐linear (Table 6) [101, 105–107] at sufficiently large *x*‐values. The linear range of liquid chromatography‐tandem mass spectrometry (LC‐MS/MS) is typically about three orders of magnitude. The analyst’s usual response in this case is to restrict sometimes the concentration range with the purpose of using a linear response, thus introducing biases in the determination, since the choice of the ‘linear region’ is usually done in an arbitrary way. The use of a wider range in standard curve is preferred in order to avoid the sample dilution, saving time and labour [108]. An acceptable approach to extend the dynamic range of the standard curve is the use of quadratic regression. Among the possible causes of standard curve non‐linearity are saturation at high concentration during ionization, the formation of dimer/multimer/cluster ions, or detector saturation. It has been established that when the analyte concentration is above ∼10^{−5} M, its response starts to saturate providing non‐linear response.

Response function | Name | Technique |
---|---|---|

Beer law | Absorption spectrophotometry | |

Nernst equation | Electrochemistry | |

Scheibe‐Lomakin | Emission spectroscopy ESI‐MS; ELSD; CAD | |

TLC‐densitometry“ | ||

DAD | ||

ESI‐MS | ||

Wagenaar et al. | Atomic absorption spectrophotometry, liquid chromatography/MS/MS | |

CAD | ||

Ion‐Trap‐MS | ||

Andrews et al. | Atomic absorption spectrophotometry | |

Rodbard | Radioimmunoassay |

Quadratic curve‐fitting calibration data are more appropriate [104, 109–117] than straight‐line linear regression, in the case of some quantification methods. Matrix‐related non‐linearity is typical of methods such as LC‐MS/MS. In order to provide an appropriate validation strategy for such cases, the straight‐line fit approximation has been extended to quadratic calibration functions. When such quadratic terms are included [10, 118–120], precautions should be taken because of the consequent multicollinearity problems.

However, the use of quadratic regression model is considered as less appropriate or even viewed with suspicion by some regulatory agencies and, as a result, not often used in regulated analysis. In addition, the accuracy around the upper limit of quantitation (ULOQ) can be affected if the curve range is extended to the point where the top of the curve is flat.

Statistical tests may also be considered for providing linearity, like Mandel’s test [57] for comparison errors of residuals of quadratic and linear regression by means of an *F*‐test at a determined significance level, or like lack‐of‐fit test by analysis of variance (ANOVA) or testing homoscedasticity (the homogeneity of variance residuals).

## 3. Calibration graphs: the question of intercept or non‐intercept

Absorption spectrometry is an important analytical technique, and, to be efficient, the calibration must be accomplished with known samples. Data for the calibration of an iron analysis, in which the iron is complexed with thiocyanate, are shown in Table 7. The absorption of the iron complex is measured and depicted versus iron concentration in ppm. The standard deviation of the regression line, *sy*_{/x}, obtained from the experimental data, quantifies the quality of fit and the accuracy of the model to predict the values of *y* (the measured quantity), for a given value of *x* (the independent variable).

Concentration, Fe (ppm) | Absorbance | Concentration, Fe (ppm) | Absorbance |
---|---|---|---|

0.3644 | 0.0268 | 3.644 | 0.248 |

0.7288 | 0.0506 | 7.288 | 0.495 |

1.083 | 0.0783 | 32.8 | 1.52 |

The regression line is first computed by forcing it to pass through the coordinate origin (*a*_{0} = 0), since the absorbance should be directly proportional to the concentration (at zero concentration, a zero absorbance might be expected). However, the adjustment thus obtained is not very good. The representation of residuals shows the pattern of signs + + + + + ‐ (Figure 3, lower left). If we compute the regression line with intercept (Figure 3, upper right), the correlation coefficient increases from 0.990 8 to 0.994 7, but the pattern of non‐random signs persists, that is, ‐ ‐ ‐ + + ‐ (Figure 3, lower right). What is a reasonable explanation? If the highest concentration point (32.8 ppm) is discarded, all the other points appear to fall on a straight line. However, this point cannot be discarded on the basis of its deviation from the best‐fit line, because it is closer to the line than other calibration points in the series. As a matter of fact, the last point (32.8 ppm) defines where the line must pass: being so distant, it has a great influence (leverage) and forces the least squares regression line in its direction. The non‐robustness behaviour is a general weakness of the least squares criterion. Very bad points [59] have a great influence just because their deviation from the true line, which rises to the square, is very large. One has to be aware of dubious data by correcting obvious copying errors and adopting actions coherent with the identified causes. Improper recording of data (e.g. misreading responses or interchange of digits) is frequently a major component of the experimental error [121].

A few high or low points [8] can alter the value of the correlation coefficient in a great extension. Larger deviations present at larger concentrations tend to influence (weight) the regression line more than smaller deviations associated with smaller concentrations, and thus the accuracy in the lower end of the range is impaired. It is therefore very convenient [122–124] to analyse the plotted data and to make sure that they cover uniformly (approximately equally spaced) the entire range of signal response from the instrument (85). Data should be measured at random (to avoid confusing non‐linearity with drift). The individual solutions should be prepared from the same stock solution, thus avoiding the introduction of random errors from weighing small quantities from individual standards. Depending on the location of the outliers, the correlation coefficient may increase or decrease. In fact, a strategically situated point can make the correlation coefficient varies practically between −1 and +1 (Figure 4), so precautions should be taken when interpreting its value. However, points of influence (e.g. leverage points and outliers) (Table 8) are rejected only when there is an obvious reason for their anomalous [125] behaviour. The effect of outliers is greater as the sample size decreases. Duplicate measurements, careful scrutiny of the data while collecting and testing discrepant results with available samples may aid to solve problems [28] with outliers.

Gross errors | Caused by outliers in the measured variable or by the leverage points extremes |

Golden points | Special chosen points which have been very precisely measured to extend the prediction capability of the system |

Latently influential points | Consequence of a poor regression model |

According to data location | |

Outliers | Differs from the other points in values of the y‐axis |

Leverage points | Differ from the other points in values of the x‐axis or in a combination of these quantities (in the case of multicollinearity) |

If the regression analysis is made without the 32.8 ppm influence point forcing to pass through the origin, the correlation coefficient reaches the 0.999 91 value (Figure 5, top left). This point was not considered because a high deviation standard values were above the new line. Perhaps, the problem observed with the 32.8 ppm point is due to the fact that sulphocyanide (thiocyanate) is not in enough excess to complex all the iron present. However, the inspection of residuals (+ + + + ‐) shows systematic, non‐random deviations (Figure 5, bottom left), which may indicate an incorrect or inadequate model. Systematic errors of analysis translate into (systematic) deviations from the fit equation (negative residuals correspond to low estimated values, and positive residuals to high). An erroneous omission of the intercept term in the model may be the cause of this effect. The standard deviation of the regression line improves notably, from 0.0026 to 0.0017, when the intercept is introduced (Figure 5, top right) in the model (correlation coefficient equals 0.999 97), the residual pattern being now random (‐ ‐ + ‐ +) (Figure 5, bottom right). The calibration is then appropriate and linear, at least up to 8 ppm. However, the intercept value, 0.0027, is of the same order of magnitude as the standard deviation, *sy*_{/x}, of the regression line. A calibration problem (of minor order) may be apparent, for example, the spectrophotometer was not properly set to zero or the cuvettes were not conveniently matched.

Residual analysis of small sample sizes has [126] some complications. Firstly, residuals are not totally distributed in an independent way from *x*, because the substitution of the parameters by the estimators introduces [10, 19] certain dependence. Secondly, a few points far from the bulk of the data may eventually condition the estimators, residuals and inferences.

## 4. Error is not in the data, but in the model: the CO_{2} vapour pressure versus temperature case

The linear variation of physical quantities is not a universal rule, although it is often possible to find a coordinate transformation [18] that converts non‐linear data into linear ones. The vapour pressure, *P*, in atmospheres, of carbon dioxide (liquid) as a function of temperature, *T*, in degrees Kelvin, is not linear (Table 9). Carbon dioxide found its use in chemical analysis as a supercritical fluid for extracting the caffeine from the coffee. We may expect, on the basis of the Clausius‐Clapeyron equation, to fit the data compiled in Table 9 into an equation of the form

Temperature (°K) | Vapour pressure | Temperature (°K) | Vapour pressure |
---|---|---|---|

216.5500 | 5.11023 | 266.4944 | 28.70169 |

222.0500 | 6.44393 | 272.0500 | 33.39684 |

227.6056 | 8.04301 | 277.6056 | 38.63636 |

233.1611 | 9.92107 | 283.1611 | 44.47469 |

238.7167 | 12.09853 | 288.7167 | 50.93903 |

244.2722 | 14.62303 | 294.2722 | 58.07022 |

249.8278 | 17.50817 | 299.8278 | 65.91589 |

255.3833 | 20.78797 | 304.1611 | 72.76810 |

260.9389 | 24.51007 |

This requires a transformation of the data. If we define

and

this form is linear,

The resulting graph (Figure 6, middle solid line) examined, appears to be fine, like calculated statistics, and so there is no reason at first to esteem any problem. Results lead to a correlation coefficient of 0.999 988 76. This almost perfect adjustment is indeed very poor when attention is paid to the potential quality of the fit as shown by the sinusoidal pattern of residuals [+ + ‐ + ‐ + + + + + ‐ ‐], which are incorporated in the figure to the resulting least squares regression line. As the details of measurements are unknown, it is not possible to test for systematic error in the experiments. The use of an incorrect or an inadequate model is the reason, which explains in this case the systematic deviations. The Clausius‐Clapeyron equation does not exactly describe the phenomenon when the temperature range is wide. Results similar to those shown in Figure 6 are also obtained by applying weighted linear regression by using weighting factors defined by [6, 7, 127–129]

on the basis of the transformation used.

The error does not lie in the data then, but in the model. We may try to improve the latter by using a more complete form of the equation

The results now obtained (analysis by multiple linear regression) depicted in Figure 6 (bottom) are better than those obtained by using the single linear regression equation, with the residuals randomly distributed. Values of ln *P* may be calculated with an accuracy of 0.001 (or an accuracy level of 0.1%), as suggested by the standard deviation of the regression line obtained. In addition, as *T* is used as a variable, instead of its inverse, interpolation calculations are carried out in an easier way.

The moral of this section is that there are not perfect models [130, 131], but models that are more appropriate than others.

## 5. The heteroscedastic data: HPLC calibration assay of a drug

In those cases in which the experimental data to be used in a given analysis are more reliable than others [6, 61, 63], gross errors may be involved when the conventional method of least squares is applied in a direct way. The assumption of uniform or regular variance of *y* may be no correct when the experimental measurements cover a wide range of *x*‐values. There are two possible solutions to this non‐constant, irregular, non‐uniform or heteroscedastic variance problem: data transformation or weighted least squares regression analysis.

The squared sum of the weighted residuals [132, 133, 64]

is minimized in the weighted least squares procedure. The idea underlying weighted least squares is to attribute the greatest worth [2, 40, 132, 66, 101, 135, 136] to the most precise data. The greater the deviation from the homoscedasticity, the greater the profit that can be extracted from the use of the weighted least squares procedure. The homoscedasticity hypothesis is usually justified in analytical chemistry in the framework of the calibration. However, when the range of abscissa values (concentration) covers several orders of magnitude, for example, in the study of (calibration) drug concentrations in urine or in other biological fluids, the accuracy of *y*‐values is strongly dependent of the *x* ones. In those cases, the homoscedastic requirement implied in single linear regression is violated, thus the introduction of weighting factors being mandatory. Some typical cases of heteroscedasticity appear [137, 138] involving a constant relative standard deviation (Figure 7)

or a constant relative variance (radioactive accounts, Poisson distribution). Photometric absorbances by Beer’s law cover a wide concentration range and like chromatographic analysis in certain conditions, tend to be heteroscedastic. Inductive plasma‐coupling emission spectrometry coupled to mass spectrometry (ICPMS) requires weighted least squares estimates even when the calibration covers a relatively small concentration range. The standard deviation (absolute precision) of measurements *σi* usually increases with the concentration *xi*, whereas the relative standard deviation (relative precision, RSD) decreases instead.

It is possible to derive [138] relationships between precision and concentration through the concentration range essayed so that chemical methods are applied to found analytes present at varying concentrations. A number of different relationships [2, 139–142] have been proposed (Table 10) for different authors, and ISO 5725 gives [143] indications to assist in obtaining a given *C* = *f*(*σi*) relationship.

The advantages of the least squares method may be impaired if the appropriate weights are not included in the equations, despite being a powerful tool. The least squares criterion is highly sensitive to outliers, as we have seen in Figure 4. An undesirable paradox may often occur consisting in the fact that the experimental data of worst quality contribute most to the estimation of the parameters. Although replication may be severely limited [15, 132], it possesses the advantage to provide a certain kind of robust regression [144]. The most common method of performing a weighted regression is using weights values reciprocal to the corresponding variances values, that is,

where *si*^{2} is the experimental estimate of *σi*^{2}. Eq. (14) warrants that in using replication, the lower weights correspond to the outliers of *yi*. The incorporation of heteroscedasticity into the calibration procedure is preconized [145, 146] by several international organizations such as ISO 9169, and ISO/CD 13‐752. The International Union of Pure and Applied Chemistry (IUPAC) includes the heteroscedasticity [147, 148] or non‐constant variance topic for the calculation of the limits of detection and quantification.

The assumption of constant variance in the physical sciences may be erroneous [34, 149–157]. The data from a calibration curve (Table 11) relating to the readings of an HPLC assay to the drug concentration in ng/mL in blood [60] are shown in Figure 8. A regression model reasonable for the mean values is *y* = *α*_{0} + *α*_{1} *x*, in the first approximation. However, the variability of the response increases in a systematic way with increasing the concentration level. This indicates that the constant variance assumption of the response through the range of concentrations assayed is not followed. In fact at the highest level of concentration, a very large response value is produced. There is no physical justification that allows excluding this value from the others in the analysis. Assuming in the first instance constant variance, the least squares are used to obtain as estimated parameters *a*_{0} = 0.0033 and *a*_{1} = 0.0014, with *sy*_{/x} = 0.0265. The representation of the residuals versus the values of *x* should show variability with a constant band, if the model was appropriate. Note that, in Figure 9 (bottom left), the pattern of funnel shape or trumpet indicates that the measurement error is increasing as it does the mean response. The assumption of constant variance is thus not satisfied. On the other hand, the intercept value, 0.0033, is not in good agreement with the mean response value (13 replicates) at zero dose, 0.0021, which supposes another additional problem. The result of ignoring the non‐constant variance in this case results in a poor fit of the model. The weighted linear regression straight‐line model led instead (Figure 10, top) to the equation *y* = 0.0015 *x* + 0.0021, the band of residuals being now rectangular (Figure 10, bottom).

Doses | |||||
---|---|---|---|---|---|

0 | 5 | 15 | 45 | 90 | |

Response | 0.0016 | 0.0118 | 0.0107 | 0.106 | 0.106 |

0.0019 | 0.0139 | 0.0670 | 0.026 | 0.158 | |

0.0002 | 0.0092 | 0.0410 | 0.088 | 0.272 | |

0.0030 | 0.0033 | 0.0087 | 0.078 | 0.121 | |

0.0042 | 0.0120 | 0.0410 | 0.029 | 0.099 | |

0.0006 | 0.0070 | 0.0104 | 0.063 | 0.116 | |

0.0006 | 0.0025 | 0.0170 | 0.097 | 0.117 | |

0.0011 | 0.0075 | 0.0320 | 0.066 | 0.105 | |

0.0006 | 0.0130 | 0.0310 | 0.052 | 0.098 | |

0.0013 | 0.0050 | ||||

0.0020 | 0.0180 | ||||

0.0050 | |||||

0.0050 |

The weighted least squares method requires a higher number of replicates than the conventional least squares method. The estimation of the minimum number of replicates varies between six and 20, according to different authors. In practice, it is often difficult to reach such high level of replication [2, 15] for different reasons, such as cost or availability of calibration, standards and reagents, time demands on previous operations, or by recording of the chromatograms.

In order to apply the weighted least squares analysis, it is mandatory to assign weighting factors to the corresponding observations. In fact, the weighting factor is related with the information contained in the *yi* value, being proportional to the reciprocal of the variance of *yi*. The results of single‐trial assay without additional information seldom contain enough information as to model the variance in a satisfactory way. The independent variable may be usually choice, fortunately, by the researcher, and the corresponding values for the dependent thus replicated.

The general phenomenon of non‐constancy of variance is called as we have previously seen heteroscedasticity. It can be addressed [2, 15] by the weighted least squares method. A second method of dealing with heteroscedasticity is to transform the response [18] so that the variance of the transformed response is constant, proceeding then in the usual way, as in the following.

## 6. Transforming data: preliminary investigation of a dose‐response relationship

The non‐linear relationship between two variables may be sometimes handled as linear by means of [158] a transformation. A transformation consists in the application of a mathematical function to a set of data. The transformation leading finally to a straight‐line fit to the data can be carried out on a variable or on both. The transformation of data is sometimes understood as a device which statisticians use, a conviction founded on the preconceived idea that the natural scale of measurement [159] is something like sacrosanct. This is not like this, and in fact some measurements, for example, those of pH, are actually logarithmic, transformed values [160]

As much as the analyst wants the mould of nature to be linear, often in the curves truth is simply found [118, 160]. Real‐world systems sometimes do not fulfil the essential requirements for a rigorous or even an approximate validity of the method of analysis. In many cases, a transformation (change of scale) can sometimes be applied to the experimental data [18] in order to carry out a conventional analysis. Although it may seem that the best way to estimate the coefficients of a non‐linear equation is the direct use of a non‐linear regression program (NLR), NLR itself [161] is not without drawbacks and problems.

The data of turbidimetric measurements of the growth response of *Lactobacillus leichmannii* to vitamin B_{12} [61] provide a good illustration of a preliminary investigation of dose‐response relationships. Table 12 shows the responses to the eight different doses of vitamin B_{12} measured in six independent tubes per dose, which are depicted in Figure 11.

Doses (ng/tube) | ||||||||
---|---|---|---|---|---|---|---|---|

0.23 | 0.35 | 0.53 | 0.79 | 1.19 | 1.78 | 2.67 | 4 | |

Response | 0.15 | 0.28 | 0.36 | 0.51 | 0.68 | 0.85 | 1.06 | 1.21 |

0.14 | 0.20 | 0.36 | 0.53 | 0.63 | 0.80 | 0.91 | 1.22 | |

0.19 | 0.23 | 0.34 | 0.54 | 0.64 | 0.71 | 1.09 | 1.29 | |

0.19 | 0.25 | 0.37 | 0.45 | 0.61 | 0.85 | 0.93 | 1.24 | |

0.17 | 0.23 | 0.33 | 0.57 | 0.65 | 0.94 | 1.09 | 1.18 | |

0.16 | 0.23 | 0.38 | 0.49 | 0.68 | 0.83 | 1.12 | 1.24 |

The transformation [62]

can be used (Figure 12). The inspection of Figure 12, however, suggests the existence of a marked curvature. The graph of the residuals, the deviation of each point of the model, indicates that the straight line is incorrect, due to the observed systematic pattern. There is a tendency towards curvature, as it is not randomly distributed around zero. It should be assumed that the model is susceptible to improvement, requiring either higher‐order additional terms or a transformation of the data.

If a second‐degree polynomial is fitted to the response data as a function of the logarithm of the dose, the adjustment to the naked eye seems adequate (Figure 13, top). The representation of the residuals as a function of the abscissa values (Figure 13, bottom), however, adopts a funnel shape. The non‐random pattern of residuals carries the message that the assumption of homogeneous (regular or constant) variance is not satisfied, which would require the application of the weighted least squares method, rather than simple linear regression.

The shape of Figure 13 (top) suggests a simple possibility, that of transformation [43] also in the response

A simple inspection of Figure 14 (top) now shows that the linear regression is valid throughout the entire range. Both transformations to achieve homogeneity of variance and normality (Tables 13 and 14) go together (hand in hand) and then both postulates are (almost) often simultaneously fulfilled, fortunately, on applying an adequate transformation.

Date type | Transformation |
---|---|

Poisson (counts) (y) | |

Small counts (y) | |

Binomial | |

Variance = (mean)^{2} | lny |

Correlation coefficient | 0.5[ln(1+r)−ln(1−r)] |

Estimated relationship | α | Transformation | |
---|---|---|---|

2 | −1 | Reciprocal | |

3/2 | −1/2 | Inverse square root | |

1 | 0 | Logarithmic | |

1/2 | 1/2 | Square root | |

0 | 1 | Without transformation |

The stabilization of variance usually takes precedence over improving normality [160]. As stated by Acton [86] ‘*The gods who favour statisticians have frequently ordained that the world be well behaved, and so we often find that transformation to obtain one of these desiderata in fact achieves them all (well, almost achieves them¡)’*.

Linear regression is a linear (in the parameters) modelling process. However, non‐linear terms may be introduced into the linear mathematical context by performing a transformation [162, 163] of the variables (Table 15). Note that when a transformation is used, a transformation‐dependent weight (Table 16) should be used (in addition to any weight based on replicate measurements). When a non‐linear function is capable of being transformed into another one linear, it is called ‘*intrinsically linear’*. Non‐linear functions that cannot be transformed into linear are instead called ‘*intrinsically non‐linear’.*

Function | Formula | Transformation | Linear form |
---|---|---|---|

Power function | |||

Exponential grow model | |||

Logarithmic | |||

Hyperbolic | |||

Logit |

Transformation | Weighting factor (*) |
---|---|

y^{4} | |

y^{2} | |

y^{2} | |

ey | |

logit |

## 7. The variable that has not yet been discovered: the solubility of diazepan in propylene glycol

The study of the solubility of diazepan in mixed solvents [28] requires the representation of Beer’s law of a set of data corresponding to the solubility of diazepan in propylene glycol. The experimental data are shown in Table 17.

C (mg/mL) | T (min) | A | C (mg/mL) | T (min) | A |
---|---|---|---|---|---|

16.0760 | 0.00 | 1.799 | 12.8608 | 87.50 | 1.481 |

3.2152 | 6.00 | 0.335 | 12.8608 | 92.75 | 1.503 |

6.4304 | 10.50 | 0.700 | 12.8608 | 97.75 | 1.522 |

12.8608 | 21.50 | 1.487 | 16.0760 | 102.75 | 1.868 |

6.4304 | 33.50 | 0.670 | 12.8608 | 117.25 | 1.508 |

9.6500 | 39.25 | 1.068 | 9.6500 | 122.25 | 1.108 |

16.0760 | 45.75 | 1.840 | 9.6500 | 130.50 | 1.109 |

9.6500 | 50.75 | 1.088 | 9.6500 | 135.75 | 1.128 |

16.0760 | 56.75 | 1.842 | 6.4304 | 141.00 | 0.720 |

3,2152 | 67.25 | 0.358 | 6.4304 | 146.25 | 0.719 |

6.4304 | 71.75 | 0.703 | 3.2152 | 150.75 | 0.349 |

16.0760 | 77.75 | 1.869 | 3.2152 | 155.75 | 0.367 |

3.2152 | 82.50 | 0.345 |

The relationship obtained between absorbance and concentration is (Figure 15)

These data can be used to corroborate the previously made statement that the correlation coefficient is not necessarily a measure of the suitability of the model. The *R*^{2}‐value of the above equation is 0.998 (*r* = 0.999). Many researchers would settle for this, but they would be wrong.

In spite of the high coefficient of correlation (*r* = 0.999), when the residuals are represented as a function of the numerical order in which the samples were measured, we obtain Figure 15 (bottom). The pattern obtained is not random by marking the residual trend with a positive slope. This behaviour is indicative of the situation in which the assumption of independence is not satisfied. The slope in a representation of the residuals as a function of the order of measure (time) indicates that a linear term must be included in the model.

When time is included in the model, Eq. (21) results

giving rise to a value of *R*^{2} equal to 0.999.

When the residuals are calculated for this model and are plotted as a function of the concentration, a graph similar to that of Figure 15 (middle) is obtained (Figure 16, top). However, if the residuals are represented for this model as a function of time (which is reflected in the order in which the samples were measured), the resulting pattern is obtained in Figure 16 (bottom), in which it is observed that the independence of the error has been accommodated (compare Figure 15, bottom), and the fit has improved, although it could probably do so even more.

The order in the analysis time demonstrates the significant fact that a representation of the residuals allows the observation of the effect of the time that otherwise would not have been perceived. This is possible in the case of diazepam solubility because the researcher was careful to record the time to which the samples were measured.

The appearance of a pattern in residuals as a function of time in a study of Beer’s law could indicate that some contaminant is affecting, or that the light source is decaying, or perhaps that it has not yet been warmed. The pattern of the residuals indicates if there is a time‐dependent variable, but not the reason for that dependency, which must be ascertained, in its case.

## 8. Nickel by atomic absorption: all models are wrong

Nickel nitrate (II) hexahydrate reagent analysis (Merck) is used to prepare a standard solution of 1 g/L Ni. The salt of 5.0058 g is weighed into the analytical balance and brought into a 1‐L volumetric flask with ultrapure water. From this solution containing 1000 mg/L, a working solution contains 125 mg/L. Appropriate volumes of this solution (triplicates) are added to 25‐mL volumetric flasks to obtain the calibration curve, thinning with ultrapure water. The measurements are carried out in an ‘Analyst 200 Atomic Spectrometer’ operating in absorption mode with a Cu‐Fe‐Ni multi‐element Lumma lamp (Perkin Elmer), at 232 nm, with an acetylene air flame. The obtained absorbances, given below, are superior to those described in Perkin Elmer [164]. The measurements depend on the flow, for example, of the nebulizer system, different in each case.

Absorbance data (in arbitrary units) in the triplicate of aqueous solutions of Ni^{2+} in mg/L (ppm) are compiled in Table 18. It has been tried to adjust Eq. (2), third‐degree and fourth‐degree polynomial models (Figure 17) (left figures with mean and right values with individual values), observing that as the degree of the polynomial increases, the goodness of the adjustment increases, although the residuals detect pattern. There are no perfect models, but models more appropriate than others [165, 166]. It is possible to use rational form polynomials with the SOLVER function of Excel. Even so, the residuals show a pattern similar to that presented when a fourth‐degree polynomial is fitted to the data.

Ni^{2+} (ppm) | Absorbance (arbitrary units) | Ni^{2+} (ppm) | Absorbance (arbitrary units) | ||||
---|---|---|---|---|---|---|---|

2.5 | 0.217 | 0.207 | 0.226 | 17.5 | 0.743 | 0.742 | 0.744 |

5.0 | 0.399 | 0.396 | 0.389 | 20.0 | 0.767 | 0.767 | 0.771 |

7.5 | 0.523 | 0.513 | 0.519 | 22.5 | 0.787 | 0.786 | 0.789 |

10.0 | 0.618 | 0.615 | 0.612 | 25.0 | 0.808 | 0.813 | 0.807 |

12.5 | 0.672 | 0.664 | 0.664 | 27.5 | 0.820 | 0.821 | 0.824 |

15.0 | 0.713 | 0.715 | 0.707 | 30 | 0.835 | 0.835 | 0.831 |

## 9. Final comments

Calibration is an essential part of every quantitative analytical method, with the exception of primary methods of analysis (isotope dilution mass spectrometry, coulometry, gravimetry, titrimetry and a group of colligative methods). The correct performance of calibration is a vital part of method development and validation. Parameter estimation models are often employed to obtain information concerning chemical systems, forming on this way a fundamental part of analytical chemistry. In those cases in which a wrong equation is fitted to data, the form of the residuals plot contains useful information which helps to modify and improve the model in order to get a better explanation of the data. Examples extracted from the literature show how residual plots reveal any violation of the assumptions severe enough as to deserve correction. As a matter of fact, some authors [12, 25, 28, 59, 96] are in favour of using residuals graphically to evaluate the inherent assumptions in the least squares method.

If there is a true linear relationship between the variables with the error symmetrically distributed, the sign of residuals should be distributed at random between plus and minus with an equal number of each. A plot of residuals allows checking for systematic deviation between data and model. Systematic deviations may indicate either a systematic error in the experiment or an incorrect or inadequate model. A curvilinear pattern in the residuals plot shows that the equation being fitted should possibly contain higher‐order terms to account for the curvature. A systematic linear trend (descending or ascending) may indicate that an additional term in the model is required. The ‘fan‐shaped’ residual pattern shows that experimental error increases with mean response (heteroscedasticity) so the constant variance assumption is inappropriate. This phenomenon may be approached by the weighted least squares method or by transforming the response. Time‐order analysis proves sometimes the more noteworthy fact that a residual plot permits the observation of a time effect that otherwise might not have become known. However, note that there are no perfect models, but models that are more suitable than others.

Many more sophisticated methods have been devised (standardized, studentized, jack‐knife, predicted and recursive residuals). However, in spite of their worth and importance they are considered beyond the scope of this chapter, devoted to a primer on residuals. The analyses presented in this chapter were mainly done using an Excel spreadsheet.