Open access peer-reviewed chapter

Youden Two-Sample Method

By Julia Martín, Nieves Velázquez and Agustin G. Asuero

Submitted: May 4th 2016; Reviewed: October 19th 2016; Published: February 22nd 2017

DOI: 10.5772/66397



The results obtained when testing materials, equipment and procedures are generally not identical, because the factors that influence the magnitude of the results are not fully controllable. The interpretation and analysis of results must therefore take into account the variation arising from numerous unavoidable random causes. Intercomparison exercises are important because they allow the analytical process and the results it generates to be examined. The Youden plot is particularly aimed at interlaboratory comparisons. The raw results provided by the participating laboratories are treated by a statistical method applied by the centre organizing the trial. The method requires two similar materials that differ only slightly in the concentration of the characteristic measured. The advantage of Youden analysis is its ability to separate random and systematic errors with minimal effort from the participants. This chapter illustrates the method with data from diverse scientific fields: polyunsaturated fatty acids in fats and oils, total blood cholesterol and aspirin in pharmaceutical preparations. Finally, liquid chromatography with tandem mass spectrometry detection is applied to the determination of an emerging contaminant, methylparaben (MeP), in surface waters.


  • Youden plot
  • confidence ellipse
  • quality control

1. Introduction

The main objective of quality systems implemented in analytical laboratories is to ensure that the results obtained conform to quality standards and show a degree of harmonization [1–4].

To achieve this goal, quality assessment systems are implemented so that the analytical process and the results it generates can be examined.

Quality assessment is defined as the systematic examination carried out by an entity to verify [5–7] that it meets specified requirements (fitness for purpose). This generic concept can be refined by relating it to the specific set of activities planned and executed to ensure that quality control is carried out properly and efficiently.

Quality assessment involves the methodical and continuous checking of the product, system or service. In the laboratory, it refers to the examination of systems and of the analytical results generated, in terms of both accuracy [8, 9] and representativeness.

Intercomparison exercises are framed in this context [10–13]. They establish the procedure for designing and organizing the study and for gathering information from a set of laboratories that analyse the same samples and have their results assessed. An intercomparison exercise is based on the agreement of several laboratories to perform the same analysis under the co-ordination of an organization. The purpose may be to assess the quality of the laboratories' work, to evaluate a measurement method or to determine a property of a material (the content of an element or compound, etc.).

The main mission of the organization is to establish the objectives and the conditions for participation of the laboratories [14–18] while ensuring the quality and stability of the sample under study. The organization is also responsible for the statistical treatment of the results. The participating laboratories, in turn, commit themselves to following the conditions set by the organization, which may vary with the type of exercise.

A very important aspect of intercomparison exercises is the selection of material to be used for the study. In that sense the type of matrix and the nature and range of values of the parameter (or parameters) under study must be defined.

It is worth mentioning at this point that not all materials are suitable to carry out a study of this type. It is essential that the material used is representative, homogeneous and stable. The organization is responsible for ensuring that the criteria mentioned above are met.

The material is prepared in a series of stages and then packaged, and it must remain homogeneous and stable throughout. The submitted sample must be properly identified and packaged to prevent breakage.

The sample is labelled to show the state in which it was sent to the participating laboratory. Guidelines for sample preservation and handling, a description of the analytical methods to be applied and the format for reporting results must also be included in the shipment.

The Youden two-sample method, also known as the Youden two-sample diagram, two-sample collaborative testing, the two-sample plan, the Youden plot or Youden analysis, is particularly aimed at interlaboratory comparisons [19, 20]. The raw results provided by the participating laboratories are treated by a statistical method applied by the centre organizing the trial. The Youden approach and z-scores are among the tools used to treat the results. Finally, based on these statistical treatments, a statement is sent to the laboratories that reported inaccurate results, together with suggestions for improving their work.

A literature search was carried out to assess the current status of the use of the Youden approach. The information gathered is presented in Table 1 [1129].

Key feature in analytical proficiency testing[5]
Comparative studies of Shewhart, Thompson, Howarth and Youden representations: advantages and disadvantages[21]
Implementation of two new graphical methods recommended by the ISO standard, Mandel’s h statistic and the Youden plot, to evaluate the consistency between laboratories and within laboratories for radon and thoron exposures[22]
A proficiency testing scheme (CNAS T0419) is described involving 217 laboratories in China as participants using their regular analytical methods for the determination of lead and arsenic in foundation cream cosmetics[20]
An optimized Youden chart was developed and compared with the traditional and trimmed traditional Youden charts[23]
A robust Youden plot is constructed based on robust statistical parameters since these are scarcely affected by non-normally distributed data, and this approach is applied in an external quality assessment (EQA) programme[24]
Youden representation for mycotoxins (deoxynivalenol and ochratoxin A) and toxins (T-2 and HT-2) in wheat and corn[25]
Metrology statistical manual: the Youden approach with standardized variables[26]
Interlaboratory studies: statistical organization protocol and evaluation[27]
Control blood for an external quality assessment scheme (EQAS) for international normalized ratio (INR) point-of-care testing (POCT) in the Netherlands and to assess the performance of the participants[28]
Application of ISO 13528 robust statistical methods for external quality assessment of blood glucose measurements in China[29]
A study under what conditions of measurement to assess bias and from the results of a six-round blind-duplicated interlaboratory proficiency programme for creatinine in urine shows that bias is present in each individual run with components from that batch and from the laboratory over the rounds of the programme[30]
Brazilian interlaboratory programme study on anion measurement in synthetic water. The programme described is promoted regularly since 2007 and recommended the use of ion chromatography as analytical technique for all participant laboratories[31]
Robust determination of the correlation coefficient, analytically validated using two types of statistical models and computational simulations[32]
Comparison of the statistical Youden method (by Hotelling T2 test and bivariate normal distribution) in interlaboratory studies by ISO and National Association of Testing Authorities (NATA) standards[33]
Collaborative study procedures[14]
A method validation study was conducted according to the IUPAC harmonized protocol for the determination of ochratoxin A in Capsicum spp. (paprika and chilli). The study involved 21 participants representing a cross section of research, private and official control laboratories from 14 EU member states and Singapore[34]
Collaborative test on the count of Escherichia coli[35]
Proficiency test for the determination of heavy metals in mineral feed. The importance of correctly selecting the certified reference materials during method validation[36]
Evaluation tools to understand statistical methods related to the z-score for use in proficiency testing by interlaboratory comparisons[37]
Evaluation of learning outcomes in quantitative analysis lab using Youden plots[38]
State of the art with respect to the selection and use of proficiency testing schemes and the interpretation of results and evaluations given in proficiency testing schemes[39]
Performances of analytical methods for atmospheric deposition and soil analysis assessed through intercomparison exercises[40]
HIV external quality assessment (EQA) results by the KCDC from the 17 HIV testing laboratories that also performed HIV-1 western blot testing of the 585 laboratories[41]
Second interlaboratory exercise on non-steroidal anti-inflammatory drug analysis in environmental aqueous samples[42]
Statistics and chemometrics for analytical chemistry[7]
A proficiency testing scheme was developed for a limited number of analytical laboratories participating in the analysis of natural water in Israel[43]
Proficiency test for heavy metals in feed and food in Europe[44]
A multilaboratory proficiency testing programme was conducted by the National Accreditation Board for Testing and Calibration Laboratories (India) and coordinated by the Institute of Pesticide Formulation Technology. This programme was conducted to compare the performance of individual laboratories in the area of pesticide formulation (Chlorpyrifos 20 EC) analysis. A total of 24 laboratories in India participated[45]
Proficiency testing for the determination of pesticides in mango pulp: a view of the employed chromatographic techniques and the evaluation of laboratories’ performance[46]
Investigations for the improvement of the measurement of volatile organic compounds from floor coverings within the health-related evaluation of construction products: application of the Youden method[47]
Characterization of candidate reference materials for bone lead via interlaboratory study and double isotope dilution mass spectrometry[48]
Implementation and methodology of an interlaboratory system that ensures the quality of glassware calibration and use in a large laboratory[49]
An updated liquid chromatographic assay for the determination of glyphosate in technical material and formulations: application of the Youden method[50]
Collaborative studies for quantitative chemical analytical methods[51]
Description and results of the 2005 interlaboratory comparison exercise for trace elements in marine mammals. Two quality control materials derived from fresh-frozen marine mammal livers were produced and characterized at the NIST and were then distributed to over 30 laboratories[52]
Youden method applied to the external quality control of semen analysis in Germany[53]
Quality assurance in analytical chemistry application in the environmental, food and materials analysis, biotechnology and medical engineering[1]
Brief note on the Youden method[54]
Interlaboratory comparison by means of method performance precision and bias studies and proficiency testing schemes are described. The set-up of the experiments and the evaluation of the data by means of graphical and statistical methods are considered[11]
Practical advice on the Youden plot[55]
Repeatability and reproducibility of determination of the nitrogen content of fishmeal by the combustion (Dumas) method and comparison with the Kjeldahl method: interlaboratory study[56]
Application of the Youden method in clinical chemistry: cortisol determination[57]
An investigation of the capability of the medium resolution imaging spectrometer validation teams to determine chlorophyll a, using the latest measuring protocols and advanced high-performance liquid chromatography and spectrophotometric and fluorometric method has been performed[58]
Standardization of calibration and quality control surface-enhanced laser desorption/ionization time of flight mass spectrometry[59]
A chlorophyll-a interlaboratory comparison was carried out to compare three different analytical chlorophyll-a determination methods: a German standard DIN 38412-16, a method of the HELCOM-Combine-Manual and the different “in-house” methods of participating laboratories[60]
Results for total chloride content in four different types of Portland cement provided by testing laboratories participating in an interlaboratory comparison are presented. The data sets were evaluated by using different statistical methods[61]
A proficiency test on the quantification of trace elements in serum was carried out to verify the performance of about 30 regional laboratories of the network of Italian laboratories. The exercise consisted of four runs in which the laboratories were free in choosing analytical methods to determine trace elements in freeze-dried animal serum. Laboratory performances were evaluated by the study of statistical functions as coefficients of variation (CV), Youden plot and z-score value[62]
Collaborative studies for cereal analysis[10]
Practical digest for evaluating the uncertainty of analytical assays from validation data according to the LGC/VAM protocol[63]
Statistical methods for use in proficiency testing by interlaboratory studies, International Organization for Standardization[13]
Youden analysis of Karl Fischer titration data from an interlaboratory study determining water in animal feed, grain and forage[64]
Establishing measurement traceability in clinical chemistry: cholesterol, progesterone and aldosterone in serum[65]
Worldwide and regional intercomparison for the determination of organochlorine compounds and petroleum hydrocarbons in mussel tissue IAEA-432[66]
Interlaboratory study on the determination of ascorbic acid in serum[67]
An intercomparison of in vitro chlorophyll-a determination[68]
Guide about collaborative studies to validate characteristics of an analytical method[69]
Application of the Youden method to results for total and dissolved organic carbon in surface waters[70]
Interlaboratory exercise conducted within the framework of a hydrological project on underground water[71]
Interlaboratory study on the determination of trace elements in sea water[72]
State of the art with respect to the selection and use of proficiency testing schemes and the interpretation of results and evaluations given in proficiency testing schemes[15]
Interlaboratory studies in analytical chemistry: method performance studies (collaborative trials), laboratory-performance studies (proficiency tests), collaborative bias evaluation, interlaboratory evaluation of to-be standard methods as well as certification studies for reference materials[12]
Intercomparison exercise on the determination of organochlorine compounds and petroleum hydrocarbons in algae[73]
Statistical model assumptions upon which the procedure is based. Provides validity tests for several of these assumptions, explains conditions under which Youden is not consistent with precision estimate and indicates when precision estimates based on the procedure should be interpreted with caution or should not be used[74]
Intralaboratory testing of method accuracy from recovery assays[8]
Application of the Youden method to the mass fraction of protein in fodder[75]
Succinct description of the two-sample Youden method[19]
Performances of analytical methods for freshwater analysis assessed through intercomparison exercises[76]
Proposed guidelines for the internal quality control of analytical results in the medical laboratories[77]
Application and improvement of the Youden analysis in the intercomparison between flowmeter calibration facilities[78]
Basic of interlaboratory studies: the trends in the new ISO 5725 standard edition[79]
Round-robin study of performance evaluation soils vapor-fortified with volatile organic compounds[80]
Protocol for the design, conducting and interpretation of collaborative studies[2]
Application of the Youden method to acid rain analytes[81]
A bivariate control chart for paired measurements[82]
Polystyrene film as a standard for testing FT-IR spectrometers[83]
Graphical diagnosis of interlaboratory quality control data for surface water samples[84]
Nomenclature of interlaboratory analytical studies[85]
Basic method for the determination of repeatability and reproducibility of a standard measurement method[86]
Reviews on the life and work of Youden[87]
Quality control in analytical chemistry[6]
Assessment of overall accuracy of lead isotope ratios determined by inductively coupled plasma mass spectrometry using batch quality control and the Youden two-sample method[9]
World Health Organization international intercalibration study on dioxins and furans in human milk and blood[88]
Analytical quality assurance. A review[89]
Multiway analysis of variance for the interpretation of interlaboratory studies[90]
External quality control study on the reliability of current histamine determinations in European laboratories[91]
Guidelines for the development of standard methods of collaborative study: organization of interlaboratory studies and a simplified approach to the statistical analysis of collaborative study results[16]
Classic paper reprint. The collaborative test of Youden[92]
Robust statistic and functional relationship estimation for comparing the bias of analytical procedures over extended concentration ranges[93]
Bias-free adjustment of analytical methods to laboratory samples in routine analytical procedures[94]
Protocol for the design, conducting and interpretation of method-performance studies[3]
Exchange of comments on a new technique in chemical assay calculations[95]
Measurement, statistics and computation, analytical chemistry by open learning. Application to aspirin preparations[96]
Quality assurance of chemical measurements[4]
The use of statistics to develop and evaluate analytical methods[17]
Interlaboratory evaluation of the high-performance liquid chromatographic determination of nitroorganics in munition plant wastewater[97]
Interlaboratory variability in trace element analysis[98]
The limitations of models and measurements as revealed through chemometrics intercomparison[99]
Considerations about the graphical representation[100]
Reverse-phase HPLC method for analysis of TNT, RDX, HMX and 2,4-DNT in munitions wastewater[101]
Determination of heavy metals in reference marine sediments. Application of the Youden method[102]
Organization and evaluation of interlaboratory comparison studies amongst southern African water analysis laboratories[103]
The use of the Youden plot for internal quality control in the immunoassay laboratory[104]
An annotation on the Youden method: recognition of the systematic and random errors[105]
Testing laboratory performance: evaluation and accreditation[106]
Qualification of estimates for total trace elements in food stuffs using measurement by atomic-absorption spectrophotometry[107]
A collaborative study for measuring polyunsaturated fatty acids in fats and oils[108]
Application of interlaboratory studies on the quality of effluent wastewaters[109]
Statistical techniques for collaborative tests. Planning and analysis of results of collaborative tests[18]
Interpretation and generalization of Youden’s two-sample method[110]
Collaborative analysis and the standardization of analytical methods[111]
Graphical diagnosis of interlaboratory test results (reprinted from industrial quality control)[112]
Systematic versus random error laboratory surveys[113]
Precision measurement and calibration. Statistical concepts and procedure[114]
A graphic display of interlaboratory test results[115]
Determination of systematic and accidental errors of an analytical procedure by the Youden method[116]
Collaborative test[117]
The sample, the procedure and the laboratory[118]
Graphical diagnosis of interlaboratory test results[119]
Statistical aspects of the cement testing programme[120]
Evaluation of chemical analyses on two rocks. A simple graphical technique is proposed to aid in the comparisons between laboratories[121]
A plan for studying the accuracy and precision of an analytical procedure[122]
Design and interpretation of interlaboratory studies of test methods[123]

Table 1.

Some published papers dealing with the Youden approach.

The literature review reveals that the Youden chart has been used successfully in agriculture, environmental chemistry, geochemistry, industry and medicine. The invention of the Youden diagram may be regarded as a landmark in quality control in clinical chemistry [21].

The performance and evaluation of interlaboratory programmes by Youden's method are suited to laboratory monitoring and yield information on both precision and systematic errors from analytical results without much effort.


2. Literature review: the Youden plot

W. J. Youden (1900–1971) began his career as a physical chemist and later became a statistician. He was employed by the National Bureau of Standards (NBS, now the National Institute of Standards and Technology, NIST) from 1948 until his death in 1971. One of his more memorable sentences states that “The best way to find out about some of the difficulties in making measurements is to make measurements” [22, p. 12].

He approached interlaboratory testing as a means of uncovering biases in measurement processes, and the so-called Youden plot has become an accepted design and analysis technique throughout the world for comparing precision and bias amongst laboratories. In 1959, Youden suggested a very simple graphical procedure for plotting results obtained by different laboratories [23–25]. Work on graphical methods, which began with the Youden plot, continues today, notably in recent works by NIST chemists.

The above is also referred to as two-sample collaborative testing, two-sample diagram, two-sample plan or Youden plot.

The method focuses on intercomparison exercises. The main characteristic is its ability to separate the systematic and random errors with minimal effort on the part of the participants.

The method is implemented as follows:

Two nearly identical samples are prepared, divided and sent to each of the participating laboratories, as recommended by Youden. A scatter plot is drawn in which the x-axis indicates one of the reported values and the y-axis the other. The scale units are the same along each axis. Each pair of results, corresponding to a given laboratory, is a point in the Youden plot (see Figure 1).

Figure 1.

Typical Youden plots when (a) random errors are significantly larger than systematic errors due to the analysts and (b) when systematic errors due to the analysts are significantly larger than the random errors [126].

The points will cluster in a circular pattern whose centre is the mean values for the two samples.

Once the results are plotted, the lines through the mean values divide the plot into four quadrants, identified as (+, +), (−, +), (−, −) and (+, −). A plus sign indicates that a laboratory's result exceeds the mean for all laboratories, and a minus sign indicates a value smaller than the mean. If the variation in results is dominated by random errors, the points are expected to be distributed randomly over all quadrants, with a similar number of points in each. When systematic errors are significantly larger than random errors, the points occur primarily in the (+, +) and (−, −) quadrants, forming an elliptical pattern around a line bisecting these quadrants at a 45° angle.
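The quadrant reading described above can be tried out numerically. The following is a minimal sketch, assuming paired results held in two plain lists; the function name `quadrant_counts` and the data are hypothetical and are not taken from this chapter. A result exactly equal to the mean is counted as a minus sign here, which is an arbitrary choice.

```python
def quadrant_counts(x, y):
    """Count laboratories per quadrant relative to the means of samples X and Y."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    counts = {"(+, +)": 0, "(-, +)": 0, "(-, -)": 0, "(+, -)": 0}
    for xi, yi in zip(x, y):
        sx = "+" if xi > x_mean else "-"  # sign for sample X
        sy = "+" if yi > y_mean else "-"  # sign for sample Y
        counts[f"({sx}, {sy})"] += 1
    return counts


# Hypothetical paired results (one pair per laboratory), for illustration only.
x = [10.0, 11.0, 12.5, 13.0, 14.0]
y = [10.2, 10.9, 12.1, 13.2, 13.8]
counts = quadrant_counts(x, y)

# Share of laboratories lying in the (+, +) and (-, -) quadrants.
diagonal_share = (counts["(+, +)"] + counts["(-, -)"]) / len(x)
print(counts, diagonal_share)
```

A share close to 0.5 is what randomly scattered points would give, whereas a share close to 1 suggests that systematic errors dominate. This count is only a qualitative aid and does not replace the F-test.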

The plot is an effective method to qualitatively evaluate the results and the capabilities of the proposed method. As can be seen in Figure 2, the length of a perpendicular line from any point to the 45° line is proportional to the contribution of random error on a given laboratory’s results (red arrow). The distance from the intersection of the axes (mean values for samples X and Y) to the perpendicular projection of a point on the 45° line is proportional to the laboratory’s systematic error (green arrow).

Figure 2.

Relationship between the result for a single laboratory (in blue) and the contribution of random error (red arrow) and the contribution from the laboratory’s systematic error (green arrow) [126].
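The two distances in Figure 2 can be obtained by rotating the coordinates of a point by 45° about the centroid. The sketch below assumes this geometric reading; the function name `decompose` and the numbers are hypothetical. The signed distance along the 45° line is proportional to the systematic error, and the signed perpendicular distance is proportional to the random error.

```python
import math


def decompose(xi, yi, x_mean, y_mean):
    """Split a laboratory's point into components along and across the 45° line."""
    dx, dy = xi - x_mean, yi - y_mean
    along = (dx + dy) / math.sqrt(2)  # projection onto the 45° line (systematic)
    perp = (dx - dy) / math.sqrt(2)   # perpendicular offset from the line (random)
    return along, perp


# Hypothetical laboratory result and centroid, for illustration only.
along, perp = decompose(14.0, 13.0, 12.0, 12.0)
print(round(along, 3), round(perp, 3))
```

Because the transformation is a rotation, the squared distance of the point from the centroid equals the sum of the squares of the two components.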

An ideal standard method has small random and systematic errors, which give a compact, circular cluster of points.

The Youden plot is a special case of the bivariate control chart used to evaluate the performance of several laboratories, and the underlying idea is that of principal component analysis [26].

In 1974, the Youden plot was extended by Mandel and Lashof by using an ellipse instead of a circle [27]. In Youden’s original method, the concentration of the analyte in the two materials was nearly the same, so that the repeatability as well as the laboratory biases would be the same for two materials [28].

Mandel and Lashof investigated the situation in which the two samples do not have similar concentrations, so that random and systematic errors are no longer necessarily the same for both materials. They showed that in all cases the points in the plot fall within an elongated ellipse. When Youden's original design (two similar samples) is used, the major axis of the ellipse forms a 45° angle with the x-axis. In contrast, when the samples are not similar to one another, different angles may be obtained. Their paper contains a procedure to decide whether laboratory bias occurs and also provides estimates of all variance components.
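The dependence of the major-axis angle on the similarity of the two samples can be illustrated with the sample covariance matrix. The sketch below is not the Mandel and Lashof procedure. It only assumes that the major axis is taken as the first principal axis of the covariance of the (X, Y) pairs, whose direction is given by the standard relation θ = ½·atan2(2·sxy, sxx − syy). The function name and the data are hypothetical.

```python
import math


def major_axis_angle_deg(x, y):
    """Angle (degrees from the x-axis) of the first principal axis of the data."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - y_mean) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


# Hypothetical data: equal spread in X and Y (similar samples).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_similar = [2.0, 1.0, 3.0, 5.0, 4.0]
# Hypothetical data: spread in Y doubled (dissimilar samples).
y_dissimilar = [2 * v for v in y_similar]

angle_similar = major_axis_angle_deg(x, y_similar)
angle_dissimilar = major_axis_angle_deg(x, y_dissimilar)
print(angle_similar, angle_dissimilar)
```

With equal variances and positive covariance the angle is 45°, and it moves away from 45° once the spread of one sample differs from that of the other.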

The confidence ellipse has been proposed in ISO 13528:2005 to indicate anomalies in the between- and within-laboratory errors in qualitative terms. For laboratory monitoring, interlaboratory tests performed according to ISO 5725-2 require much effort, especially because the organizer must provide a large volume of samples for K = 4 repeated analyses per laboratory. It is worth noting that such a design is suited only to process standardization.

As already mentioned in the Introduction, the performance and evaluation of interlaboratory programmes by Youden's method are recommended above all for laboratory monitoring, since the method yields information on both precision and systematic errors from analytical results without much effort on the part of organizers and participants alike. The evaluation is also simple, which makes potential manipulation less likely.

Youden’s method has been recommended in modern statistical manuals, procedures and protocols [29, 30] as well as in recent papers and reports and by government agencies, as shown in Table 1 [31–34].

This study discusses the Youden method and elaborates its applications in a number of diverse areas.

The experimental systems selected first for gaining experience and training in the application of the method were the determination of polyunsaturated fatty acids in fats and oils [35], total blood cholesterol [36] and aspirin in pharmaceutical preparations [37], i.e. food, clinical and pharmaceutical applications, respectively.

Finally, a detailed procedure for the determination of methylparaben in surface waters, of special relevance nowadays in the environmental field, has been developed by using liquid chromatography-mass spectrometry.

Lastly, the confidence ellipse proposed by ISO 13528:2005 is described, and a practical case involving concentrations of antibodies for two similar allergens is used as an aid to interpreting the plot.

2.1. Youden plot development

The Youden plot is developed as follows:

  1. Plot the points (X, Y) corresponding to the results submitted by the laboratories and reject any obviously anomalous points.

  2. Calculate the centroid (X-mean, Y-mean) and draw the lines through it. The vertical line is the average value for sample X, and the average value for sample Y is shown by the horizontal line.

  3. Draw the 45° line (parallel to X = Y) passing through the centroid.

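  4. For each laboratory, calculate the difference between its two results, Di = Xi − Yi. Because a laboratory's systematic error affects both results in the same way, it cancels in the difference, so Di reflects random error only. To estimate the total contribution from random error, the standard deviation of these differences, SD, for all laboratories is used as follows:

$$S_D = \sqrt{\frac{\sum_{i=1}^{n}\left(D_i-\bar{D}\right)^2}{2(n-1)}} \approx \sigma_{rand} \qquad (1)$$

where n is the number of laboratories and the factor of 2 is the result of using two values to determine Di.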

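  5. In the same way, the total, T, of each laboratory’s results (Ti = Xi + Yi) contains contributions from both random error and twice the laboratory’s systematic error:

$$\sigma_{tot}^2 = \sigma_{rand}^2 + 2\sigma_{syst}^2 \qquad (2)$$

  6. The standard deviation of the totals, ST, provides an estimate for σtot:

$$S_T = \sqrt{\frac{\sum_{i=1}^{n}\left(T_i-\bar{T}\right)^2}{2(n-1)}} \approx \sigma_{tot} \qquad (3)$$

Again, the factor of 2 in the denominator is the result of using two values to determine Ti.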

  7. If the systematic errors are significantly larger than the random errors, then ST is larger than SD, a hypothesis that can be evaluated using a one-tailed F-test on the ratio F = ST²/SD², where the degrees of freedom for both the numerator and the denominator are n − 1.

  8. If ST is significantly larger than SD, σtot² may be split into components representing random error and systematic error:

$$\sigma_{syst}^2 = \frac{\sigma_{tot}^2-\sigma_{rand}^2}{2} \qquad (4)$$
  9. Calculate the radius of the confidence circle:

$$R = S_D\sqrt{-2\ln\left(1-\frac{\%P}{100}\right)} \qquad (5)$$

where %P is the selected confidence level expressed as a percentage (usually 95%).

  10. Draw a circle with radius R centred on the centroid (X-mean, Y-mean). Laboratories falling outside the 95% circle are said to provide biased results. The radius is a multiple of SD that depends on the percentage of observations expected to fall within the circle for a bivariate normal distribution. A circle whose radius is 2.5 to 3 times SD is the smallest circle expected to contain almost every point in the absence of bias.
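The calculations in the steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation. It assumes the usual two-sample definitions, with SD and ST computed from the differences and totals using 2(n − 1) in the denominator, the systematic variance taken as (ST² − SD²)/2, and the circle radius taken as SD·sqrt(−2·ln(1 − P)) for a bivariate normal distribution. The function name, the dictionary keys and the data are hypothetical. The critical F value is not computed and has to be taken from tables.

```python
import math


def youden_statistics(x, y, confidence=0.95):
    """Two-sample statistics for paired results x[i], y[i] (one pair per laboratory)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two laboratories with paired results")
    d = [xi - yi for xi, yi in zip(x, y)]  # differences: random error only
    t = [xi + yi for xi, yi in zip(x, y)]  # totals: random plus systematic error
    d_mean, t_mean = sum(d) / n, sum(t) / n
    # The factor 2 reflects the two values that enter each D_i and T_i.
    s_d = math.sqrt(sum((v - d_mean) ** 2 for v in d) / (2 * (n - 1)))
    s_t = math.sqrt(sum((v - t_mean) ** 2 for v in t) / (2 * (n - 1)))
    f_ratio = s_t ** 2 / s_d ** 2 if s_d > 0 else math.inf
    # Systematic variance; clipped at zero when S_T is not larger than S_D.
    var_syst = max((s_t ** 2 - s_d ** 2) / 2, 0.0)
    radius = s_d * math.sqrt(-2 * math.log(1 - confidence))
    x_mean, y_mean = sum(x) / n, sum(y) / n
    # Laboratories (numbered from 1) lying outside the confidence circle.
    outside = [i for i, (xi, yi) in enumerate(zip(x, y), start=1)
               if math.hypot(xi - x_mean, yi - y_mean) > radius]
    return {"s_d": s_d, "s_t": s_t, "f_ratio": f_ratio,
            "var_syst": var_syst, "radius": radius, "outside": outside}


# Hypothetical paired results, not data from this chapter.
x = [10.0, 11.0, 12.0, 13.0, 14.0]
y = [10.5, 10.5, 12.5, 12.5, 14.0]
result = youden_statistics(x, y)
print(result)
```

For a 95% confidence level the multiplier of SD is about 2.45, and for 99% it is about 3.03, which agrees with the range of 2.5 to 3 quoted above.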

2.2. An alternative approximation: the Z-score

An alternative to the Youden plot is the z-score. This value is used to score a laboratory's result for a given parameter in a particular round of participation. It is calculated as follows:

  1. Calculate the median of X and Y.

  2. Calculate the total error (εT) for each laboratory:


  3. Calculate the systematic error component (Cs) as


  4. Calculate the random error component (CR) as


  5. Scale the systematic and random components so that their sum is equal to the magnitude of the total error:

Systematic error:


Random error:



  6. Calculate the standard deviation:


where n is the number of laboratories.

  7. Finally, the z-score is calculated as


The results are classified as

|z| ≤ 2: “satisfactory”
2 < |z| ≤ 3: “questionable”
|z| > 3: “unsatisfactory”
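The classification limits listed above translate directly into a small helper. The sketch below covers only the final classification step and takes the z-score as given; the function name `classify_z` is hypothetical.

```python
def classify_z(z):
    """Classify a z-score using the limits |z| <= 2, 2 < |z| <= 3 and |z| > 3."""
    magnitude = abs(z)
    if magnitude <= 2:
        return "satisfactory"
    if magnitude <= 3:
        return "questionable"
    return "unsatisfactory"


# Hypothetical z-scores, for illustration only.
labels = [classify_z(z) for z in (0.4, -2.0, 2.5, -3.2)]
print(labels)
```

The sign of z is ignored in the classification, so results that are too high and too low are treated alike.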


3. Application to experimental systems

3.1. Determination of polyunsaturated fatty acids in fats and oils

To illustrate the procedure, data from an interlaboratory study of a method for determining polyunsaturated fatty acids (PUFA) in fats and oils are used. The procedure consisted of saponifying a sample, treating it with an enzyme and measuring the absorbing product at 234 nm. Palm oil, corn oil and three hydrogenated blends were used in the study. One blend of hydrogenated oil was separated into two parts, designated as samples X and Y. Random subsamples from the two samples were analysed in each of the 17 laboratories as blind duplicates. The test covered an official Food and Drug Administration (FDA) method and a slightly modified one using boron trifluoride-methanol (BF). The aim was to identify the laboratories producing higher-quality results.

The results for the FDA method are shown in Table 2. The data from laboratory 1 were rejected because inaccurate results were provided; those from laboratory 11 (8.2 g/100 g for sample X and 26.3 g/100 g for sample Y) are listed but were not used in the calculations, because they lay well outside the range of the results reported by the other laboratories.

Laboratory | FDA method: Sample X | FDA method: Sample Y | BF method: Sample X | BF method: Sample Y

Table 2.

Determination of cis, cis-PUFA in blind duplicate samples by two methods (g trilinolein/100 g sample).

*Not included in mean.

The vertical line at 28.6 g/100 g is the average value for sample X, and the average value for sample Y is shown by the horizontal line at 28.2 g/100 g. To estimate σrand and σsyst, the values for Di and Ti are calculated first. Next, the standard deviations for the differences, SD, and the totals, ST, are computed using Eqs. (1) and (3), yielding SD = 1.53 and ST = 3.11. To determine if the systematic errors between the laboratories are significant, the F-test is applied so as to compare ST and SD.

Because the F-ratio (4.141) is larger than F(0.05,14,14), which is 2.484, it is concluded that the systematic errors between the laboratories are significant at the 95% confidence level. The magnitude of the systematic error is then estimated using Eq. (4), giving 3.67.
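The arithmetic of this test can be retraced from the rounded standard deviations quoted above. The sketch below assumes that the F-ratio is ST²/SD² and that the value 3.67 corresponds to (ST² − SD²)/2; both forms are assumptions made here for illustration, since Eqs. (1), (3) and (4) are defined elsewhere in the chapter. The critical value is the tabulated one quoted in the text.

```python
s_d = 1.53      # standard deviation of the differences (from the text)
s_t = 3.11      # standard deviation of the totals (from the text)
f_crit = 2.484  # tabulated F(0.05,14,14) quoted in the text

# ASSUMPTION: the F-ratio compares the two variances, ST^2 / SD^2.
f_ratio = s_t**2 / s_d**2
significant = f_ratio > f_crit

# ASSUMPTION: systematic variance taken as (ST^2 - SD^2) / 2.
var_syst = (s_t**2 - s_d**2) / 2

print(f"F = {f_ratio:.2f}, significant = {significant}")
print(f"systematic variance = {var_syst:.2f}")
```

With the rounded inputs the ratio comes out near 4.13 rather than the quoted 4.141, a difference that presumably reflects rounding of SD and ST; the second quantity reproduces the quoted 3.67.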

The results are plotted in Figure 3. The latter reveals that laboratories 11 (aberrant result), 12, 13, 14, 15 and 16 are outside the 95% circle, indicating high systematic errors.

Figure 3.

Youden plot (determination of PUFA, FDA method).

The results for the BF method are shown in Table 2.

Again, the data from laboratory 1 were rejected because of a mistake; those from laboratories 11 and 12 (10.3 and 10.5 g/100 g, respectively, for sample X and 29.6 and 10.1 g/100 g, respectively, for sample Y) are listed but were not used in the calculations, as they lie beyond the set limit [35].

Again, the F-ratio (3.268) is larger than F(0.05,13,13), which is 2.577, so it is concluded that the systematic errors between the laboratories are significant at the 95% confidence level. All the results are plotted in Figure 4, which shows that laboratories 11 and 12 (aberrant results), 2 and 13 are outside the 95% circle and are displaced far from the cluster of the others.

Figure 4.

Youden plot (determination of PUFA, BF method).

Notice that in both methods, about half the points lie above, and about half lie below the horizontal lines through the two means. Likewise, the vertical lines also separate the laboratories into equal groups, as do the 45° lines. However, in neither plot are the results equally distributed amongst the four quadrants; there are more in the upper right and lower left quadrants than in the upper left and lower right. Dispersion along the 45° line indicates that laboratories are high or low on both samples, while dispersion at right angles to the 45° line indicates a lack of agreement between results from the same laboratory.

If there were no systematic variations amongst laboratories, the pattern of points would be expected to be circular. The greater the systematic variations, the more elliptical the pattern will become.
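The quadrant pattern described above can be tallied numerically. The following is a minimal sketch using made-up paired results (not the data of Table 2); the function name `quadrant_counts` is hypothetical. A surplus of points in the (+ +) and (− −) quadrants would point towards systematic between-laboratory variation.

```python
from statistics import mean

def quadrant_counts(xs, ys):
    """Count points in each quadrant defined by the two sample means.
    The first sign refers to sample X, the second to sample Y."""
    x_bar, y_bar = mean(xs), mean(ys)
    counts = {"++": 0, "--": 0, "+-": 0, "-+": 0}
    for x, y in zip(xs, ys):
        key = ("+" if x >= x_bar else "-") + ("+" if y >= y_bar else "-")
        counts[key] += 1
    return counts

# Hypothetical results for six laboratories (illustration only).
xs = [27.0, 28.0, 29.0, 30.0, 31.0, 27.5]
ys = [26.5, 28.5, 28.0, 30.5, 31.0, 29.0]
print(quadrant_counts(xs, ys))
```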

The results obtained by the laboratories may now be compared by using the two methods as follows:

  • Three laboratories (3, 6 and 9) are within the 95% circle and near each other on both plots.

  • Five laboratories (4, 5, 8, 10 and 17) are within the 95% circle on both plots but widely separated from each other.

  • Three laboratories (2, 14, 16) are outside the limit on one plot but not on the other.

  • Three laboratories (11, 12 and 15) are outside the limit on both plots.

  • Laboratory 13 is systematically low using both methods.

  • The two standard deviations are approximately equal.

3.2. Determination of cholesterol levels in the blood

As part of a collaborative study to assess a new method that allows the determination of the total amount of cholesterol in the blood, two samples and the instructions to analyse each sample are sent to ten laboratories [36].

Table 3 shows the results obtained in mg total cholesterol per 100 mL of serum.

Laboratory | Sample X | Sample Y

Table 3.

Determination of cholesterol in serum (mg cholesterol/100 mL of serum).

Figure 5 provides a two-sample plot of the results. The clustering of points suggests that the systematic errors of the analysts are significant. Two laboratories (1, 5) are outside the limit on the plot and widely separated from each other.

Figure 5.

Youden plot (determination of cholesterol).

The vertical line at 248.9 mg/100 mL is the mean value for sample X, whereas the horizontal line at 246.5 mg/100 mL corresponds to the mean value of sample Y. To estimate σrand and σsyst, the values for Di and Ti are calculated first, followed by the standard deviations.

Because the F-ratio (2.530) is lower than F(0.05,9,9), which is 3.179, it is concluded that the systematic errors between the analysts are not significant at the 95% confidence level.

If the true values for both samples are known, it is possible to test for the presence of systematic errors in the method. When there are no systematic method errors, the sum of the true values for samples X and Y,

$$\mu_{tot} = \mu_X + \mu_Y \qquad (16)$$

should fall within the confidence interval around T̄. To determine whether there is evidence for a systematic error in the method, a two-tailed t-test of the following null and alternative hypotheses is applied:

$$H_0: \bar{T} = \mu_{tot} \qquad H_A: \bar{T} \neq \mu_{tot} \qquad (17)$$

The test statistic, texp, is

$$t_{exp} = \frac{\left|\bar{T} - \mu_{tot}\right|\sqrt{n}}{S_T\sqrt{2}}$$

with n−1 degrees of freedom. The √2 is included in the denominator because ST underestimates the standard deviation when comparing T̄ to μtot.

Because this value for texp is smaller than the critical value of 2.26 for t(0.05, 9), there is no evidence for a systematic error in the method at the 95% confidence level.
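A minimal sketch of this test is given below. It assumes the statistic takes the form |T̄ − μtot|·√n / (ST·√2), in line with the remark about the √2 in the denominator. The mean total T̄ is taken as the sum of the two sample means quoted above (248.9 + 246.5); the values of μtot and ST are hypothetical placeholders, since the true values and ST are not quoted in this section.

```python
import math

def t_exp(t_mean, mu_tot, s_t, n):
    """Test statistic for a systematic method error (assumed form)."""
    return abs(t_mean - mu_tot) * math.sqrt(n) / (s_t * math.sqrt(2))

t_mean = 248.9 + 246.5  # mean total, from the two sample means in the text
mu_tot = 500.0          # HYPOTHETICAL sum of true values
s_t = 10.0              # HYPOTHETICAL standard deviation of the totals
n = 10                  # number of laboratories
t_crit = 2.26           # t(0.05, 9) quoted in the text

t_value = t_exp(t_mean, mu_tot, s_t, n)
print(f"t_exp = {t_value:.3f}; evidence of method bias: {t_value > t_crit}")
```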

3.3. Determination of aspirin in pharmaceutical preparations

The results of determinations, made in ten laboratories, on two similar aspirin preparations are given in Table 4 [37].

Laboratory | Sample X | Sample Y

Table 4.

Weight content (%) of aspirin in pharmaceutical preparations [37].

The analysis of data concerning materials X and Y reveals the following:

  • A high level of interlaboratory variation.

  • For each of the laboratories, the observed difference between the aspirin content of the two materials is approximately the same.

The average values for the materials (50.054% for X and 52.068% for Y) are used for the centroid. The results are plotted in Figure 6, which shows that the points fall in either the first or the third quadrant, along the line X = Y. This is a consequence of the fact that laboratories which obtained “high” values for material X also obtained “high” values for material Y. The opposite is also true; laboratories reporting “low” values for X also reported “low” values for Y.

Figure 6.

Youden plot (determination of aspirin).

Only two laboratories (2 and 9) are within the 95% circle. Because the F-ratio (25.834) is larger than F(0.05,9,9), which is 3.179, it is reasoned that the systematic errors between the analysts are significant at the 95% confidence level.

All this suggests that the variability in the difference between the two samples is evident. There are two factors which influence the variability of the differences. These are the random error of measurement and the heterogeneity of the materials. It is easy to infer that if the materials were relatively heterogeneous, then the reported differences would show considerable variability. That this is not so in this example is evidence that the laboratories were dealing with relatively homogeneous materials, i.e. the composition of the samples of materials X and Y received by each laboratory was essentially the same.

The larger the random error of measurement associated with an analytical procedure, the more varied the results of replicate measurements are. A relatively large random error of measurement will also cause the differences between them to vary considerably. The example presented herein indicates that the observed differences are approximately the same. This suggests that the random error experienced in each laboratory is relatively small. Perpendicular dispersions to the bisector, for homogeneous materials, are reflections of the within-laboratory variability. It is worth mentioning at this point that the example introduced here reveals that the within-laboratory variabilities are small, compared with the systematic between-laboratory variabilities.

Finally, it is sometimes believed that the centroid in a Youden diagram gives a good estimate of the true values of the two materials. It is the authors’ view that this may often not be the case. The scatter of the results obtained by several laboratories around the mean value is a result of both random within-laboratory and systematic between-laboratory variability. Averaging may result in the mean value being close to the true value. It may also result in a mean value which is, for example, much higher than the true value. Suppose, for example, that all the laboratories had a similar positive bias in one of the steps of the procedure. The result will be a scatter about a mean value which may be higher than the true value.

3.4. Determination of methylparaben in surface water by liquid chromatography-negative electrospray ionization tandem mass spectrometry

Since the discovery of their antimicrobial activity, parabens have been widely used as bactericides, fungicides and preservative agents in many cosmetics, pharmaceuticals, personal care products and food, amongst other consumer products.

Although the toxicity of these compounds is very low, they present a weak estrogenic activity and are considered endocrine disruptors. That is why they have been classified as emerging contaminants, attracting scientific attention on a global scale. These compounds, after consumption, reach wastewater treatment plants, where they are not efficiently removed; thus, they end up in the environment. They are present in detergents, soaps and other products.

An analytical method for the determination of methylparaben (MeP) in surface water samples was applied over a period of 15 days.

The method is based on solid-phase extraction (SPE) and subsequent analysis by high-performance liquid chromatography-triple quadrupole mass spectrometry (HPLC-QqQ-MS) [38]. The Youden plot has been applied to the results. A detailed description of the completed experimental procedure is shown below:

3.4.1. Experimental part

Materials and reagents

HPLC-grade water, acetone and methanol were purchased from a company in Spain. Analytical-grade formic acid (98%), sulphuric acid (97%) and hydrochloric acid (37%) were acquired from another specialist supplier in Spain. Ammonium acetate and MeP (≥99%) were bought from a firm in the USA.

Three millilitres SPE cartridges, packed with 60 mg of Oasis HLB, were purchased from Waters (Milford, MA, USA).

A stock solution, at a concentration of 1000 mg L−1, was prepared in methanol and stored at 4°C. Working solutions were prepared by diluting the stock standard solution in methanol.

Sample collection

Surface water samples were collected in May 2016 and were taken from Guadalquivir River (Seville, Spain). These samples were collected in amber glass bottles precleaned with acetone and methanol. In order to stabilize them, acetonitrile was immediately added after sampling to achieve a final concentration of 0.5% v/v. Stabilized samples were stored at 4°C until further analysis, which was carried out within 48 h after sample collection. Prior to extraction, samples were filtered through a 1.2 μm glass-fibre membrane filter supplied by a British manufacturer.

Solid-phase extraction

Oasis HLB cartridges were conditioned using 3 mL of methanol followed by 3 mL of 0.5 N hydrochloric acid and 3 mL of de-ionized water. Prior to extraction, the pH of the sample was adjusted to 2 by the addition of sulphuric acid 40% (v/v). The acidified sample (250 mL) was percolated through the cartridge at a flow rate of approximately 10 mL/min. Then, the volumetric flask containing the sample was rinsed with 5 mL of de-ionized water, and the rinse was added to the cartridge.

After loading, the cartridges were washed with 5 mL of de-ionized water and dried for 10 min. The elution of the analytes was carried out with four successive aliquots of 1 mL of methanol at a flow rate of about 1 mL/min. The eluates were collected in 10-mL collection tubes and evaporated to dryness at room temperature by a gentle nitrogen stream. Finally, the extracts were reconstituted in 1 mL of methanol, filtered through a 0.45 μm nylon filter, and a 20-μL aliquot was injected into the HPLC instrument.

High-performance liquid chromatography-mass spectrometry

Separation was carried out using an Agilent 1200 series HPLC chromatography system equipped with a vacuum degasser, a binary pump, an autosampler and a thermostated column compartment. MeP was separated on a Zorbax Eclipse XDB-C18 Rapid Resolution HT (50 mm × 4.6 mm i.d.; 1.8-μm particle size) column, using isocratic elution with methanol (30%) and aqueous 5 mM ammonium acetate solution (70%) as mobile phase. The flow rate was 0.6 mL/min. The injection volume was 20 μL. The column temperature was maintained at 25°C.

The HPLC system was coupled to a 6410 triple quadrupole (QqQ) mass spectrometer (MS) equipped with an electrospray ionization source operating in negative mode. Two transitions were used for the identification (m/z 92.1) and confirmation (m/z 136.1) of MeP.

The ionization of analytes was carried out using the following settings:

  • MS capillary voltage 3000 V.

  • Drying-gas flow rate 9 L/min.

  • Drying-gas temperature 350°C.

  • Fragmentor 70 V.

  • Collision energy 16 V.

  • Nebulizer pressure 40 psi.

Instrument control and data acquisition were carried out with MassHunter software.

3.4.2. Results and discussion

The results for the different days are shown in Table 5.

Day | Area MeP (HPLC-MS/MS) | MeP concentration (ng/L)

Table 5.

Determination of MeP in two similar surface water samples on different days.

Because the F-ratio (3.000) is larger than F(0.05,14,14), which is 2.484, it is concluded that the systematic errors between days are significant at the 95% confidence level.

Figure 7 depicts the results. Most of the points fall in either the first or the third quadrant. Two days (5 and 11) are outside the 95% circle. The following observations may be made:

  • Days 4, 14 and 15 are within the 95% circle, near each other in the first quadrant.

  • Days 2, 3, 8, 10 and 13 are within the 95% circle, close together in the third quadrant, although day 13 lies at the very edge of the circle.

  • Days 1, 9 and 12 are within the 95% circle, close together but in different quadrants.

  • Day 7 is also within the circle but farther away from the other days.

  • Day 6 is within the circle but at the very edge of it.

Figure 7.

Youden plot (determination of MeP).

Finally, the z-score approximation is applied (see Section 2.2) [39]. The results are shown in Table 6. The results obtained for the different days using the Youden plot and the z-score compare as follows:

  • The results obtained with the z-score are in line with those observed in the Youden plot.

  • In both methods, days 5 and 11, whose systematic errors are relatively high compared to their random errors, showed unsatisfactory results.

  • Day 6 is discarded using z-score but not on the Youden plot, although it is found very close to the boundary of the 95% circle.

  • Days 1, 2, 3, 9, 12 and 14 showed satisfactory results with the z-score method applied.

  • Days 4, 7, 8, 10, 13 and 15 showed questionable results. Day 7 is the furthest from the other days in the Youden Plot, and day 13 is on the edge of the 95% circle.


Table 6.

Application of z-score to the determination of MeP in two similar surface water samples.

3.5. Antibody concentrations: an ISO-13528:2005(E) example

Finally, a confidence ellipse, calculated as described in ISO 13528 [40], has been used as an aid to interpreting the plot in those situations in which the two samples differ in the magnitude of the property measured. A Youden plot for the original data may be derived from the z-scores, as explained below. It is constructed by plotting the z-scores obtained on one of the materials against the z-scores obtained on the other material.

For ease of reference, let A and B denote the two materials:

  1. Calculate the averages and standard deviations of the two sets of data and the correlation coefficient (ρ̂).

  2. Calculate the z-scores for the two materials as follows:


  3. Calculate the combined scores for the two materials:


  4. In terms of the standardized variables, the confidence ellipse may be written in terms of Hotelling’s T2:




Here, F(1 − α)(2, p − 1) is the tabulated (1-α)-fractile of the F-distribution with 2 and (p-1) degrees of freedom.
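For reference, one set of relations for the steps above is sketched here. It is a reconstruction under assumptions rather than a quotation of ISO 13528: the expression for T² is the only part that can be checked against the chapter, since it reproduces the values T² = 6.927 and T = 2.632 reported for p = 29 and F = 3.34. The form of the combined score is assumed. The symbols x_A, x_B, their means and s_A, s_B are notation introduced here for the individual results, averages and standard deviations of the two materials.

```latex
% z-scores for the two materials (step 2)
z_A = \frac{x_A - \bar{x}_A}{s_A}, \qquad z_B = \frac{x_B - \bar{x}_B}{s_B}

% combined score (step 3), assumed form
z_{AB} = \sqrt{z_A^2 - 2\hat{\rho}\, z_A z_B + z_B^2}

% confidence ellipse in terms of Hotelling's T^2 (step 4)
z_A^2 - 2\hat{\rho}\, z_A z_B + z_B^2 = (1 - \hat{\rho}^2)\, T^2,
\qquad T^2 = \frac{2(p-1)}{p-2}\, F_{(1-\alpha)}(2,\, p-1)

% solving the ellipse equation for z_B, valid for -T < z_A < T
z_B = \hat{\rho}\, z_A \pm \sqrt{(1 - \hat{\rho}^2)\,(T^2 - z_A^2)}
```

The last line follows algebraically from the ellipse equation and gives the two branches of the curve for each value of z_A.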

As recommended by the International Organization for Standardization (ISO), the ellipse may be drawn on a graph with the z-scores ZA and ZB as the axes by plotting a series of points for −T < ZA < T with


To interpret the Youden Plot, the combined z-scores may be used. The highest combined z-score corresponds to the highest significance level of 100%.

Also, the combined z-scores help to identify the outlying points.

When a Youden Plot is constructed, it may be interpreted as follows:

  • If a point is well separated from the rest of the data, it means that the result is subject to bias because the laboratory did not follow the test method correctly. Points far away from the major axis could also represent laboratories showing a considerable variation and inadequate repeatability outcomes.

  • A positive relationship between the results for the two materials indicates that there is a cause of between-laboratory variation that is common to many of the laboratories, suggesting that the methodology may not have been adequately specified. If the method is reproduced, it may lead to an overall improvement.

Table 7 shows data obtained by testing two similar samples for antibody concentrations and the calculations required to derive the confidence ellipse. With p = 29 laboratories and using a significance level of 100α% = 5%, F(1 − α)(2, p − 1) = 3.34. Hence, T = 2.632. The ellipse is shown, together with the points representing the z-scores, in Figure 8, along with the ellipses for probability levels of 100α% = 1% and 0.1%.
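The value of T quoted above can be retraced numerically. The sketch below assumes T² = 2(p − 1)/(p − 2) · F, a form chosen here because it reproduces the reported T² = 6.927 and T = 2.632. The helper that traces the ellipse takes the correlation coefficient as a parameter, since its value for Table 7 is not quoted in this section; both function names are hypothetical.

```python
import math

def hotelling_t(p, f_crit):
    """T such that T^2 = 2(p - 1)/(p - 2) * F (ASSUMED form, see lead-in)."""
    return math.sqrt(2.0 * (p - 1) / (p - 2) * f_crit)

def ellipse_points(rho, t, steps=200):
    """Return (z_a, z_b_upper, z_b_lower) triples tracing the ellipse
    z_a^2 - 2*rho*z_a*z_b + z_b^2 = (1 - rho^2) * t^2."""
    points = []
    for i in range(steps + 1):
        z_a = -t + 2.0 * t * i / steps
        half = math.sqrt(max(0.0, (1.0 - rho**2) * (t**2 - z_a**2)))
        points.append((z_a, rho * z_a + half, rho * z_a - half))
    return points

t = hotelling_t(29, 3.34)  # p and F as quoted in the text
print(round(t**2, 3), round(t, 3))  # 6.927 2.632
```

Passing the estimated correlation coefficient and this T to `ellipse_points` yields the upper and lower branches of the 5% ellipse; repeating the calculation with the tabulated F for 1% and 0.1% would give the outer ellipses.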

Allergen A (U) | Allergen B (U) | ZA | ZB | ZAB (combined z-score)
Standard deviation: 3.294 | 2.897 | 1.000 | 1.000
Hotelling’s T2: 6.927
Units (U) in thousands (k) per litre (l) of sample, where a unit is defined by the concentration of an international reference material.

Table 7.

Data and calculations on concentrations of antibodies for two similar allergens.

Figure 8.

Youden plot of z-scores from Table 7 (concentrations of antibodies for two similar allergens).

As depicted in Figure 8, laboratories 5 and 23, with combined z-scores of 1.641 and 2.099, respectively, are found in the top right-hand quadrant. Laboratory 26 has a high z-score on material B (2.019) compared to material A (−0.055) and a combined z-score of 2.059 followed by laboratory 8 with a combined z-score of 1.501. The points for laboratories 23 and 26 fall between the ellipses for the 5% and 1% probability levels. Thus, the results may be perceived as giving rise to warning signals.


4. Conclusions

Intercomparison exercises are of great value in systems of quality assessment allowing the examination of the analytical process and the generated results. Youden plot or Youden analysis is particularly aimed at interlaboratory comparisons, obtaining accurate information without much effort. The main characteristic is its ability to separate the systematic and random errors with minimal effort on the part of the participants. To implement the method:

  • Two similar materials (samples A and B) with small differences in the concentration of the characteristics (magnitude) are required with the purpose of determining their content [1993].

  • A scatter plot is drawn in which the x-axis indicates one of the reported values and the y-axis the other, with the same scale units on both axes. Each pair of results, corresponding to a given laboratory, is a point in the Youden plot 2.0.

  • The points occur mainly in the (+ +) and (− −) quadrants, forming an elliptical pattern around a line bisecting these quadrants at a 45° angle, when systematic errors are larger than random errors. The circle centred at the intersection of the medians of the reported values (once outliers have been removed) affords a test of the randomness of the results, its radius being a multiple of the within-laboratory standard deviation. The Youden method has been applied to results from the food, clinical and pharmaceutical fields, namely the determination of polyunsaturated fatty acids in fats and oils, of total blood cholesterol and of aspirin in pharmaceutical preparations, respectively.

  • In most cases, it is observed that systematic errors may be regarded as the main cause of variation, with most of the points in quadrants (+ +) and (− −). Finally, a detailed procedure for the determination of methylparaben in surface waters, of special relevance nowadays in the environmental field, has been developed by liquid chromatography-tandem mass spectrometry. In this experimental system, an alternative to the Youden method based on the z-score has also been assessed, showing no discrepancy between the two methods.

  • A confidence ellipse is proposed in ISO 13528:2005 to deal with those situations where the two samples differ in the magnitude of the property measured. The extension of the Youden method based on the confidence ellipse may be used as a platform for further studies, incorporating, amongst others, three- and four-dimensional Youden plots.


Abbreviation list


BF: Boron trifluoride-methanol

FDA: Food and Drug Administration

ISO: International Organization for Standardization

MS: Mass spectrometer

PUFA: Polyunsaturated fatty acids

QqQ: Triple quadrupole

SPE: Solid-phase extraction

© 2017 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


Julia Martín, Nieves Velázquez and Agustin G. Asuero (February 22nd 2017). Youden Two-Sample Method, Quality Control and Assurance - An Ancient Greek Term Re-Mastered, Leo D. Kounis, IntechOpen, DOI: 10.5772/66397. Available from:
