Open access peer-reviewed chapter

High-Order Calibration and Data Analysis in Chromatography

By Hai-Long Wu, Xiao-Dong Sun, Huan Fang and Ru-Qing Yu

Submitted: November 29th 2017; Reviewed: May 11th 2018; Published: November 5th 2018

DOI: 10.5772/intechopen.78624


Abstract

Multiway data analysis and tensorial calibration are gaining widespread acceptance with the rapid development of multichannel chromatographic instruments. By combining chromatographic techniques with chemometric methods based on high-order calibration, some long-standing problems in chromatographic analysis, such as complicated pretreatment steps, long elution times, and unsatisfactory analytical results, can be avoided or alleviated. This chapter presents an overview of second-order and third-order data, covering theory and applications together with the corresponding data processing in chromatography.

Keywords

  • chemometrics
  • second-order advantages
  • trilinear model
  • high-performance liquid chromatography-diode array detection
  • liquid chromatography-mass spectrometry

1. Introduction

Chromatography, first employed in Russia by the Italian-born scientist Mikhail Tsvet in 1900, is a laboratory technique for separating mixtures. In the first decade of the twentieth century, scientists used chromatography primarily to separate plant pigments such as chlorophyll, carotenes, and xanthophylls. Because the separated pigments showed different colors (green, orange, and yellow, respectively), they gave the technique its name (from the Greek chroma, “color,” and graphein, “to write”). After various types of chromatography emerged in the 1930s and 1940s, the technique became useful for many separation processes. Many chromatographic techniques have since been developed, and they can be classified by different properties. By the shape of the chromatographic bed, they can be divided into column chromatography and planar chromatography; by the physical state of the mobile phase, into gas chromatography and liquid chromatography. Other classifications exist (e.g., by separation mechanism or special technique), but chromatographic classification is beyond the scope of this chapter.

The purpose of chromatography is to separate the components of a mixture for later use. The mixture is dissolved in a fluid called the mobile phase, which carries it through a structure holding another material called the stationary phase. The separation is based on differential partitioning between the mobile and stationary phases. Subtle differences in a compound’s partition coefficient result in differential retention on the stationary phase and thus affect the separation.

Nowadays, owing to its excellent separation capability, chromatography has become an indispensable tool for routine analysis and research in the pharmaceutical, biomedical, food, and environmental fields [1]. However, two main drawbacks remain to be solved or alleviated. The first concerns the sample itself. When samples with complex matrices are analyzed, tedious pretreatment procedures, such as extraction and purification, are necessary to remove the potential interferences contained in the matrices. Optimizing these procedures is laborious and inevitably consumes large amounts of solvent, which makes the analysis uneconomical and environmentally unfriendly. Moreover, in traditional chromatographic analysis of a complex sample, the analytes frequently overlap with matrix constituents, so a long run time or much more complex chromatographic conditions are required for the separation. In general, the elution of each sample takes 30–50 min, which is time consuming and inefficient. At the same time, other problems, such as baseline drift, changes in peak shape, incomplete extraction of the analytes, and shifts in elution times, may also degrade the quality of the final analytical result. The second drawback stems from the diversity of the technique itself. Hundreds of different chromatographic columns are commercially available, and new ones are being developed constantly [2, 3]. Faced with so many possible columns, analysts find it hard to select the most appropriate one for a given task. Meanwhile, many laboratories and public institutions may not possess all available stationary phases, and column performance may deteriorate during long-term storage and/or usage in LC analysis [4].
Thus, analysts often spend a lot of time searching for the most appropriate stationary phase among several candidates. All of these shortcomings may hinder the further development of chromatographic applications.

A current trend in quantitative analysis is to avoid tedious sample preprocessing steps and long chromatographic elution by exploiting modern data processing tools for the mathematical resolution of coeluting components [5]. Combining suitable chemometric tools with chromatographic-spectral or chromatographic-mass data can solve or alleviate the problems described above, giving better quantitative results with less time and solvent consumption. Multiway (second- and third-order) calibration based on “mathematical separation” is a powerful approach in chemical analysis: it can handle potential interferences and resolve coeluting peaks in real samples with minimal sample preparation, and it can also provide accurate concentration profiles of the individual components of interest. This property is generally referred to as the “second-order advantage,” which has great potential in multiway analysis and has become a focus of recent theoretical research and practical use. Combining chromatography with multiway calibration simplifies both the tedious multistep pretreatment and the exploration of complicated chromatographic separation conditions, and it allows various samples containing different interferences to be analyzed at once. Tedious pretreatment or purification procedures can thus be largely replaced, with “mathematical separation” taking the place of traditional “physical/chemical separation.” HPLC coupled with second-order calibration methods is especially popular because it can rapidly and simultaneously determine multiple compounds in complex backgrounds with unknown interferences, resolve coeluted peaks, and remove baseline drifts [6, 7, 8, 9, 10].

So far, many algorithms for decomposing multiway data arrays have been proposed, providing analytical chemists with convenient tools for studying multiway data. Several of these methodologies have been described at some length in the “Encyclopedia of Analytical Chemistry” [11] and “Factor Analysis in Chemistry” [12]. To help readers understand the relevant algorithms systematically and in depth, this chapter gives a detailed description of multilinear models, the multiway cyclic symmetry property, the algorithms for multiway calibration, the estimation of the chemical rank, the toolbox for multiway calibration, and other fundamental issues and applications in chromatography.

2. Terminology and nomenclature in multiway data

To help readers perform multivariate analysis on multiway data arrays, the terminology and nomenclature used for multiway data are introduced below.

2.1. Terminology

The relationship and difference between the concepts of “data order” and “data way” should be clarified first. The term “order” refers to the number of dimensions of the data recorded for a single sample, whereas “way” refers to the number of dimensions of the array formed by stacking the data of all samples with similar properties. As shown in Figure 1, zeroth-order data correspond to instruments that produce a single response per sample, such as the reading of a pH meter or the absorbance at a single wavelength. First-order data are arranged as a vector (first-order tensor) for a single sample, such as UV, fluorescence, infrared, and nuclear magnetic resonance spectra. Second-order data are obtained when a matrix is recorded for a single sample. There are two ways to obtain second-order data: (i) using a single instrument, such as a spectrofluorimeter recording excitation-emission matrices (EEMs), or a diode-array spectrophotometer monitoring the kinetics of a chemical reaction, and (ii) using hyphenated instruments, such as high-performance liquid chromatography with photodiode array detection (HPLC-DAD) or liquid chromatography-mass spectrometry (LC-MS). When the second-order data obtained from a series of samples (calibration and prediction samples) are stacked along one direction, a three-dimensional (three-way) array is obtained, and the corresponding data are known as three-way data. Hence, stacking the zeroth-order (scalar), first-order (vector), second-order (matrix), third-order (three-way array), and higher-order tensors of a series of samples yields one-way, two-way, three-way, four-way, and N-way data sets, respectively. Zeroth-order calibration is also called univariate calibration. Its application is severely restricted because it requires full selectivity for the signals of the target analytes.
All calibrations other than univariate calibration are known as multivariate calibration, and the calibration of second-order and higher-order tensors is called multiway multivariate or multicomponent calibration.
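To make the order/way distinction concrete, the following minimal NumPy sketch stacks hypothetical per-sample responses (random numbers standing in for real spectra and HPLC-DAD matrices): first-order data from K samples form a two-way set, and second-order data from K samples form a three-way array.

```python
import numpy as np

rng = np.random.default_rng(0)
n_times, n_wavelengths, n_samples = 50, 20, 6

# Random numbers stand in for real instrument responses (illustration only).
spectra = [rng.random(n_wavelengths) for _ in range(n_samples)]               # first-order data
matrices = [rng.random((n_times, n_wavelengths)) for _ in range(n_samples)]   # second-order data

two_way = np.stack(spectra, axis=1)      # J x K: first-order data of K samples
three_way = np.stack(matrices, axis=2)   # I x J x K: second-order data of K samples

print(spectra[0].ndim, two_way.shape)     # order 1 per sample -> two-way data set
print(matrices[0].ndim, three_way.shape)  # order 2 per sample -> three-way data set
```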

Figure 1.

Relationships and differences between the concepts of “data order” and “data way” described with symbols.

A description of the various sample types is also needed. By function, samples can be divided into calibration, prediction, and actual sets. Actual sets include prediction samples (in which the target analyte(s) is (are) known to be present) and/or real samples (in which the presence of the target analyte(s) is unknown). Constituents present in the samples used for calibration and validation are usually called “known” or “expected,” because they are also expected to be present in the actual samples. The expected constituents can be further divided into “calibrated” and “uncalibrated” ones. The calibrated constituents include the analyte(s) of interest: their concentrations in the calibration samples are predesigned and known, and their concentrations in the actual sets can be obtained through calibration. By contrast, constituents that occur only in the actual sets are called “unknown” or “unexpected”; they are potential interferences.

2.2. Nomenclature

In this chapter, lowercase italics represent scalars, bold capitals denote two-way matrices, and underlined bold capitals designate three-way arrays; the superscript T denotes the transpose of a matrix, the superscript + denotes the Moore-Penrose generalized inverse of a matrix, and ||·||_F designates the Frobenius norm. For a better understanding of multiway calibration, readers are advised to become familiar with the inner cyclic symmetry property of trilinear decomposition, proposed by our laboratory in 1996 and also called three-way cyclic symmetry. As shown in Figure 2, the elements, vectors, subscripts, and physical modes in the resolved, sliced, and unfolded matrices, together with the residue and resolution formulas, all obey this cyclic symmetry: they rotate cyclically in the same direction. Table 1 lists the nomenclature in detail. Four-way and five-way cyclic symmetries for quadrilinear and quinquelinear decompositions can be obtained in the same way, simply by exchanging symbols as in the three-way case. These regularities help standardize the symbol system in multiway data analysis, clarify the essence of multiway multilinear decomposition, and assist in developing new multiway calibration algorithms and exploring multilinear algebra.

Figure 2.

Schematic representation for three-way cyclic symmetry property.

Symbol | Meaning
X | Three-way data array
I, J, K | The dimensions of the three modes of X
x_ijk | The (i, j, k)th element of X
A (I × N), B (J × N), C (K × N) | The three underlying profile matrices of X, of size I × N, J × N, and K × N, respectively
a_in, b_jn, c_kn | The (i, n)th, (j, n)th, and (k, n)th elements of the profile matrices A, B, and C, respectively
a_(i), b_(j), c_(k) | The ith, jth, and kth row vectors of the profile matrices A, B, and C, respectively
diag(a_(i)), diag(b_(j)), diag(c_(k)) | Diagonal matrices whose diagonal elements equal the elements of a_(i), b_(j), and c_(k), respectively
X_i.., X_.j., X_..k | The ith horizontal, jth lateral, and kth frontal slices of X, respectively
E_i.., E_.j., E_..k | The ith horizontal, jth lateral, and kth frontal slices of the three-way residue array E, respectively
e_ijk | The (i, j, k)th element of the three-way residue array E

Table 1.

Detailed information of the nomenclature mentioned.

3. Theory

3.1. Multilinear models

According to the data type and its inner cyclic symmetry property, multilinear models can be divided into trilinear, quadrilinear, quinquelinear, and even higher-order multilinear models. In chromatographic analysis combined with multiway calibration, the trilinear and quadrilinear models are the most commonly used.

3.1.1. Trilinear model

The PARAFAC (PARallel FACtor analysis) model was proposed in 1970 by Harshman [13] and, independently, by Carroll and Chang [14], who named it CANDECOMP (canonical decomposition). In this trilinear model, each element x_ijk of a three-way array X (I × J × K) is modeled as:

$$x_{ijk} = \sum_{n=1}^{N} a_{in}\, b_{jn}\, c_{kn} + e_{ijk}, \qquad i = 1, 2, \ldots, I;\; j = 1, 2, \ldots, J;\; k = 1, 2, \ldots, K. \tag{1}$$

where N is the total number of detectable components, including the component(s) of interest, uncalibrated background(s), and unknown interferences. Figure 3 gives a graphical representation of the trilinear model of the three-way data array X. A, B, and C are the three underlying profile matrices of X, of size I × N, J × N, and K × N, respectively; the core array I is the three-way superdiagonal array of size N × N × N, with ones on the superdiagonal and zeros elsewhere; and E is the three-way residue array of size I × J × K. To clarify the mathematical meaning of this graphical representation, Figure 4 shows how the data array X is recovered from the model in three steps.

Figure 3.

Schematic representation of trilinear model.

Figure 4.

The inverse procedures to return three-way data array X.
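Eq. (1) can be checked numerically. The sketch below, assuming randomly generated profile matrices A, B, and C (hypothetical values), builds the noise-free part of the trilinear model with `numpy.einsum` and confirms that it equals the sum of N rank-one (outer-product) terms depicted in Figure 3.

```python
import numpy as np

def trilinear_array(A, B, C):
    """Noise-free part of Eq. (1): x_ijk = sum_n a_in * b_jn * c_kn."""
    return np.einsum('in,jn,kn->ijk', A, B, C)

rng = np.random.default_rng(1)
I, J, K, N = 30, 20, 5, 3
A, B, C = rng.random((I, N)), rng.random((J, N)), rng.random((K, N))
X = trilinear_array(A, B, C)

# The same array written as a sum of N rank-one terms outer(a_n, b_n, c_n), as in Figure 3.
X_outer = sum(np.multiply.outer(np.multiply.outer(A[:, n], B[:, n]), C[:, n])
              for n in range(N))
```

In measured data, the residue array E of Eq. (1) would be added to `X`; the decomposition algorithms below try to recover A, B, and C from such a noisy array.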

3.1.2. Quadrilinear model

Consider the real-valued four-way data array X (I × J × K × L), in which each element x_ijkl can be expressed as [8, 15]:

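$$x_{ijkl} = \sum_{n=1}^{N} a_{in}\, b_{jn}\, c_{kn}\, d_{ln} + e_{ijkl}, \qquad i = 1, \ldots, I;\; j = 1, \ldots, J;\; k = 1, \ldots, K;\; l = 1, \ldots, L. \tag{2}$$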

where a_in, b_jn, c_kn, and d_ln are the elements of the underlying profile matrices A (I × N), B (J × N), C (K × N), and D (L × N) of X (I × J × K × L), respectively, and e_ijkl is an element of the four-way residue array E (I × J × K × L). The modeled part of x_ijkl is thus quadrilinear in the parameter sets a_in, b_jn, c_kn, and d_ln. Figure 5 shows the graphical representation of the quadrilinear model of the four-way data array X.

Figure 5.

Schematic representation of quadrilinear model.
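Likewise, here is a minimal sketch of the noise-free quadrilinear model of Eq. (2), with random profile matrices assumed for illustration. It also shows that fixing two indices leaves a matrix slice with the same diagonal-matrix structure used in the slice-wise notation of Section 3.3, and that fixing one index leaves a trilinear three-way array.

```python
import numpy as np

def quadrilinear_array(A, B, C, D):
    """Noise-free part of Eq. (2): x_ijkl = sum_n a_in * b_jn * c_kn * d_ln."""
    return np.einsum('in,jn,kn,ln->ijkl', A, B, C, D)

rng = np.random.default_rng(2)
I, J, K, L, N = 25, 15, 6, 4, 2
A, B, C, D = (rng.random((m, N)) for m in (I, J, K, L))
X = quadrilinear_array(A, B, C, D)

# Fixing k and l leaves an I x J slice: X[:, :, k, l] = A diag(d_(l)) diag(c_(k)) B^T.
k, l = 3, 1
slice_kl = A @ np.diag(D[l]) @ np.diag(C[k]) @ B.T
# Fixing only l leaves a trilinear array with profiles A, B and C diag(d_(l)).
X_l = np.einsum('in,jn,kn->ijk', A, B, C * D[l])
```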

3.2. Data preprocessing

A multilinear decomposition gives correct results only if the data actually follow the multilinear model. However, some non-multilinear factors can cause the data to deviate from multilinearity. For example, for chromatographic trilinear data such as HPLC-DAD and LC-MS data, time shifts and baseline changes among different runs cause the data to deviate from trilinearity. Therefore, the data arrays must be properly preprocessed before multilinear decomposition. The entire chemometrics-assisted LC-DAD and LC-MS analytical strategies are shown schematically in Figures 6 and 7, respectively.

Figure 6.

Schematic representation of entire chemometrics-assisted LC-DAD analytical strategy.

Figure 7.

Schematic representation of entire chemometrics-assisted LC-MS analytical strategy.
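The following sketch illustrates why a retention time shift breaks trilinearity. It assumes hypothetical Gaussian elution and spectral profiles for two components in four samples. Without shifts, the time-mode unfolding X_(I×JK) has rank 2; shifting one peak by two points in a single run means that component now needs a second elution profile, and the rank rises to 3. The profile shapes and the shift size are illustrative choices, not data from this chapter.

```python
import numpy as np

t = np.arange(60.0)   # retention-time points (mode I)
w = np.arange(25.0)   # wavelength points (mode J)

def gauss(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

spectra = np.column_stack([gauss(w, 8, 4), gauss(w, 16, 4)])        # J x 2
conc = np.array([[1.0, 0.5], [0.7, 1.2], [1.3, 0.9], [0.4, 1.1]])  # K x 2

def build(shift_in_last_sample=0.0):
    """Stack K = 4 LC-DAD-like samples; optionally shift peak 1 in the last run."""
    slices = []
    for k, (c1, c2) in enumerate(conc):
        s = shift_in_last_sample if k == len(conc) - 1 else 0.0
        elution = np.column_stack([gauss(t, 20 + s, 3), gauss(t, 28, 3)])  # I x 2
        slices.append(elution @ np.diag([c1, c2]) @ spectra.T)            # I x J
    return np.stack(slices, axis=2)                                       # I x J x K

def time_mode_rank(X):
    """Numerical rank of the I x JK unfolding (number of distinct elution profiles)."""
    I, J, K = X.shape
    return np.linalg.matrix_rank(X.reshape(I, J * K))

print(time_mode_rank(build(0.0)), time_mode_rank(build(2.0)))
```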

3.3. Algorithm

3.3.1. ATLD

The ATLD (alternating trilinear decomposition) algorithm is a universal second-order calibration method for decomposing three-way data arrays. It is based on an unconstrained alternating least squares principle and an improved iterative procedure that uses the Moore-Penrose generalized inverse computed by singular value decomposition. It has been widely used in three-way data analysis because it is insensitive to an overestimated number of components and converges quickly.

According to its cyclic symmetry property, the trilinear model can also be expressed in matrix notation as follows:

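$$\mathbf{X}_{i..} = \mathbf{B}\,\operatorname{diag}(\mathbf{a}_{(i)})\,\mathbf{C}^{T} + \mathbf{E}_{i..}, \quad i = 1, 2, \ldots, I, \tag{3}$$
$$\mathbf{X}_{.j.} = \mathbf{C}\,\operatorname{diag}(\mathbf{b}_{(j)})\,\mathbf{A}^{T} + \mathbf{E}_{.j.}, \quad j = 1, 2, \ldots, J, \tag{4}$$
$$\mathbf{X}_{..k} = \mathbf{A}\,\operatorname{diag}(\mathbf{c}_{(k)})\,\mathbf{B}^{T} + \mathbf{E}_{..k}, \quad k = 1, 2, \ldots, K. \tag{5}$$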
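A quick numerical check of Eqs. (3)–(5), assuming random profile matrices: each slice of a noise-free trilinear array has the stated form, and cyclically permuting the modes of X turns one kind of slice into another, which is the cyclic symmetry the algorithms below exploit.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K, N = 12, 10, 8, 3
A, B, C = rng.random((I, N)), rng.random((J, N)), rng.random((K, N))
X = np.einsum('in,jn,kn->ijk', A, B, C)   # noise-free trilinear array

i, j, k = 5, 4, 2
horizontal = X[i, :, :]      # X_i.. (J x K)
lateral = X[:, j, :].T       # X_.j. (K x I)
frontal = X[:, :, k]         # X_..k (I x J)

eq3 = np.allclose(horizontal, B @ np.diag(A[i]) @ C.T)
eq4 = np.allclose(lateral, C @ np.diag(B[j]) @ A.T)
eq5 = np.allclose(frontal, A @ np.diag(C[k]) @ B.T)

# Cyclic symmetry: relabelling the modes (I, J, K) -> (J, K, I) with the profiles
# (A, B, C) -> (B, C, A) turns each horizontal slice into a frontal slice.
X_cyc = np.transpose(X, (1, 2, 0))        # X_cyc[j, k, i] == X[i, j, k]
cyc = np.allclose(X_cyc[:, :, i], B @ np.diag(A[i]) @ C.T)
```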

Because of the cyclic symmetry of the trilinear model, the three expressions are mathematically equivalent. Based on Eqs. (3)–(5), the loss functions to be minimized are the sums of squares of the elements of the residual matrices:

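$$\sigma(\mathbf{A}) = \sum_{i=1}^{I}\left\|\mathbf{X}_{i..} - \mathbf{B}\,\operatorname{diag}(\mathbf{a}_{(i)})\,\mathbf{C}^{T}\right\|_F^{2}, \tag{6}$$
$$\sigma(\mathbf{B}) = \sum_{j=1}^{J}\left\|\mathbf{X}_{.j.} - \mathbf{C}\,\operatorname{diag}(\mathbf{b}_{(j)})\,\mathbf{A}^{T}\right\|_F^{2}, \tag{7}$$
$$\sigma(\mathbf{C}) = \sum_{k=1}^{K}\left\|\mathbf{X}_{..k} - \mathbf{A}\,\operatorname{diag}(\mathbf{c}_{(k)})\,\mathbf{B}^{T}\right\|_F^{2}, \tag{8}$$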

Using these loss functions, ATLD alternately minimizes the three objective functions: over C with A and B fixed, over A with B and C fixed, and then over B with C and A fixed. The least-squares updates of the three profile matrices A, B, and C are:

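$$\mathbf{a}_{(i)}^{T} = \operatorname{diagm}\!\left(\mathbf{B}^{+}\,\mathbf{X}_{i..}\,(\mathbf{C}^{T})^{+}\right), \quad i = 1, 2, \ldots, I, \tag{9}$$
$$\mathbf{b}_{(j)}^{T} = \operatorname{diagm}\!\left(\mathbf{C}^{+}\,\mathbf{X}_{.j.}\,(\mathbf{A}^{T})^{+}\right), \quad j = 1, 2, \ldots, J, \tag{10}$$
$$\mathbf{c}_{(k)}^{T} = \operatorname{diagm}\!\left(\mathbf{A}^{+}\,\mathbf{X}_{..k}\,(\mathbf{B}^{T})^{+}\right), \quad k = 1, 2, \ldots, K, \tag{11}$$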

where diagm(·) returns a column N-vector whose elements are the diagonal elements of a square matrix. In every iteration cycle, the columns of A and B are normalized to unit length. The concentrations of the analytes of interest in the actual samples are then obtained from the resolved profile matrix C: for each analyte, the corresponding column of C for the calibration samples is regressed against the standard concentrations, and the resulting calibration line is used to predict the concentrations in the actual samples.
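The ATLD updates of Eqs. (9)–(11) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the random initialization, the relative tolerance 1e-10 used to truncate small singular values in the pseudoinverse, the convergence test on the change in residual sum of squares, and the helper names `truncated_pinv`, `atld`, and `predict_concentration` are all choices made for this sketch. The synthetic data (Gaussian elution profiles and spectra, random concentrations, five calibration and three "actual" samples) are hypothetical.

```python
import numpy as np

def truncated_pinv(M, tol=1e-10):
    """Moore-Penrose inverse from the SVD, dropping singular values below tol * s_max."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    keep = s > tol * s[0]
    return (Vt[keep].T / s[keep]) @ U[:, keep].T

def atld(X, N, max_iter=1000, tol=1e-12, seed=0):
    """Sketch of ATLD for X (I x J x K) with frontal slices X[:, :, k] ~ A diag(c_(k)) B^T."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B = rng.random((I, N)), rng.random((J, N))   # random start (assumption)
    ssx = np.sum(X ** 2)

    def update_C(A, B):
        # Eq. (11): c_(k)^T = diagm(A^+ X_..k (B^T)^+), for all k at once
        return np.einsum('ni,ijk,jn->kn', truncated_pinv(A), X, truncated_pinv(B.T))

    C = update_C(A, B)
    prev = np.inf
    for it in range(1, max_iter + 1):
        # Eq. (9): a_(i)^T = diagm(B^+ X_i.. (C^T)^+), where X_i.. = X[i] (J x K)
        A = np.einsum('nj,ijk,kn->in', truncated_pinv(B), X, truncated_pinv(C.T))
        A /= np.linalg.norm(A, axis=0)
        # Eq. (10): b_(j)^T = diagm(C^+ X_.j. (A^T)^+), where X_.j. = X[:, j, :].T (K x I)
        B = np.einsum('nk,ijk,in->jn', truncated_pinv(C), X, truncated_pinv(A.T))
        B /= np.linalg.norm(B, axis=0)
        C = update_C(A, B)
        ssr = np.sum((X - np.einsum('in,jn,kn->ijk', A, B, C)) ** 2)
        if abs(prev - ssr) <= tol * ssx:
            break
        prev = ssr
    return A, B, C, it

def predict_concentration(scores_cal, conc_cal, scores_unknown):
    """Calibration line of resolved relative concentrations (one column of C) vs. standards."""
    slope, intercept = np.polyfit(conc_cal, scores_cal, 1)
    return (np.asarray(scores_unknown) - intercept) / slope

# Hypothetical LC-DAD-like data: I = 40 times, J = 30 wavelengths, K = 8 samples, N = 3.
t, w = np.arange(40.0), np.arange(30.0)
A_true = np.exp(-0.5 * ((t[:, None] - np.array([10.0, 18.0, 26.0])) / 3.0) ** 2)
B_true = np.exp(-0.5 * ((w[:, None] - np.array([8.0, 15.0, 22.0])) / 5.0) ** 2)
C_true = np.random.default_rng(1).uniform(0.5, 2.0, size=(8, 3))
X = np.einsum('in,jn,kn->ijk', A_true, B_true, C_true)

A, B, C, n_iter = atld(X, N=3)
# Pick the resolved component whose spectrum best matches analyte 0 (columns of B are unit length).
n0 = np.argmax(np.abs(B.T @ B_true[:, 0]))
pred = predict_concentration(C[:5, n0], C_true[:5, 0], C[5:, n0])
```

Because A and B are normalized to unit column length, the scale of each component ends up in C; the calibration line of resolved scores against standard concentrations absorbs this scale (and any sign flip), which is why a univariate regression on one column of C suffices.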

ATLD converges quickly because it operates on sliced matrices of small size and because of two other major strategies. First, the truncated least squares method uses a tolerance to discard small singular values in the singular value decomposition. Second, selecting only diagonal elements makes ATLD retain the trilinear structure and become insensitive to an overestimated number of components. These advantages have been reviewed by Fleming and Kowalski [16]. Owing to them, ATLD is well suited to second-order data obtained from hyphenated instruments, such as HPLC-DAD, LC-MS, and GC-MS.

3.3.2. SWATLD

The SWATLD (self-weighted alternating trilinear decomposition) algorithm, a derivative of ATLD, is also widely used because it yields better results in many cases. It alternately minimizes three intrinsically related objective functions and likewise converges quickly and is insensitive to an overestimated number of components. Detailed explanations of these properties are given in the original paper [17]. The three new residue expressions are:

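$$\mathbf{B}^{+}\mathbf{X}_{i..} = \operatorname{diag}(\mathbf{a}_{(i)})\,\mathbf{C}^{T} + \mathbf{B}^{+}\mathbf{E}_{i..}, \qquad \mathbf{X}_{i..}(\mathbf{C}^{T})^{+} = \mathbf{B}\,\operatorname{diag}(\mathbf{a}_{(i)}) + \mathbf{E}_{i..}(\mathbf{C}^{T})^{+}, \quad i = 1, 2, \ldots, I, \tag{12}$$
$$\mathbf{C}^{+}\mathbf{X}_{.j.} = \operatorname{diag}(\mathbf{b}_{(j)})\,\mathbf{A}^{T} + \mathbf{C}^{+}\mathbf{E}_{.j.}, \qquad \mathbf{X}_{.j.}(\mathbf{A}^{T})^{+} = \mathbf{C}\,\operatorname{diag}(\mathbf{b}_{(j)}) + \mathbf{E}_{.j.}(\mathbf{A}^{T})^{+}, \quad j = 1, 2, \ldots, J, \tag{13}$$
$$\mathbf{A}^{+}\mathbf{X}_{..k} = \operatorname{diag}(\mathbf{c}_{(k)})\,\mathbf{B}^{T} + \mathbf{A}^{+}\mathbf{E}_{..k}, \qquad \mathbf{X}_{..k}(\mathbf{B}^{T})^{+} = \mathbf{A}\,\operatorname{diag}(\mathbf{c}_{(k)}) + \mathbf{E}_{..k}(\mathbf{B}^{T})^{+}, \quad k = 1, 2, \ldots, K, \tag{14}$$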

By introducing suitable weights, three new objective functions are established:

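$$S(\mathbf{A}) = \sum_{i=1}^{I}\left\|\left(\mathbf{B}^{+}\mathbf{X}_{i..} - \operatorname{diag}(\mathbf{a}_{(i)})\mathbf{C}^{T}\right)^{T}\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{C}^{T}\mathbf{C})}\right)\right\|_F^{2} + \sum_{i=1}^{I}\left\|\left(\mathbf{X}_{i..}(\mathbf{C}^{T})^{+} - \mathbf{B}\operatorname{diag}(\mathbf{a}_{(i)})\right)\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{B}^{T}\mathbf{B})}\right)\right\|_F^{2}, \tag{15}$$
$$S(\mathbf{B}) = \sum_{j=1}^{J}\left\|\left(\mathbf{C}^{+}\mathbf{X}_{.j.} - \operatorname{diag}(\mathbf{b}_{(j)})\mathbf{A}^{T}\right)^{T}\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{A}^{T}\mathbf{A})}\right)\right\|_F^{2} + \sum_{j=1}^{J}\left\|\left(\mathbf{X}_{.j.}(\mathbf{A}^{T})^{+} - \mathbf{C}\operatorname{diag}(\mathbf{b}_{(j)})\right)\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{C}^{T}\mathbf{C})}\right)\right\|_F^{2}, \tag{16}$$
$$S(\mathbf{C}) = \sum_{k=1}^{K}\left\|\left(\mathbf{A}^{+}\mathbf{X}_{..k} - \operatorname{diag}(\mathbf{c}_{(k)})\mathbf{B}^{T}\right)^{T}\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{B}^{T}\mathbf{B})}\right)\right\|_F^{2} + \sum_{k=1}^{K}\left\|\left(\mathbf{X}_{..k}(\mathbf{B}^{T})^{+} - \mathbf{A}\operatorname{diag}(\mathbf{c}_{(k)})\right)\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{A}^{T}\mathbf{A})}\right)\right\|_F^{2}. \tag{17}$$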

Owing to this optimization strategy, SWATLD is computationally efficient and can provide more satisfactory results than ATLD at moderate noise levels. It can also deal with moderate collinearity, although it is less effective when the data are severely collinear.
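Setting the gradient of an objective such as Eq. (17) to zero gives a closed-form update. The sketch below uses the column-by-column solution derived here (an assumption worth checking against [17]): the average of two weighted diagonal terms, r_(m)^T = 0.5 × [diagm(P^+ X_..m Q) ./ diagm(Q^T Q) + diagm(P^T X_..m (Q^T)^+) ./ diagm(P^T P)]. By cyclic symmetry, the same function updates A, B, and C when the modes of X are permuted. The initialization, normalization, stopping rule, helper names, and synthetic data are hypothetical choices for illustration.

```python
import numpy as np

def swatld_update(X, P, Q):
    """Minimizer of an S(C)-type objective (Eq. (17) pattern) for X[:, :, m] ~ P diag(r_(m)) Q^T."""
    t1 = np.einsum('ni,ijm,jn->mn', np.linalg.pinv(P), X, Q) / np.sum(Q * Q, axis=0)
    t2 = np.einsum('in,ijm,jn->mn', P, X, np.linalg.pinv(Q.T)) / np.sum(P * P, axis=0)
    return 0.5 * (t1 + t2)

def swatld(X, N, max_iter=1000, tol=1e-12, seed=0):
    """Sketch of SWATLD: one update rule serves all three modes via cyclic permutation."""
    rng = np.random.default_rng(seed)
    A, B = rng.random((X.shape[0], N)), rng.random((X.shape[1], N))
    X_jki = np.transpose(X, (1, 2, 0))   # frontal slices are X_i.. = B diag(a_(i)) C^T
    X_kij = np.transpose(X, (2, 0, 1))   # frontal slices are X_.j. = C diag(b_(j)) A^T
    C = swatld_update(X, A, B)
    ssx, prev = np.sum(X ** 2), np.inf
    for it in range(1, max_iter + 1):
        A = swatld_update(X_jki, B, C)   # minimizes S(A), Eq. (15)
        A /= np.linalg.norm(A, axis=0)
        B = swatld_update(X_kij, C, A)   # minimizes S(B), Eq. (16)
        B /= np.linalg.norm(B, axis=0)
        C = swatld_update(X, A, B)       # minimizes S(C), Eq. (17)
        ssr = np.sum((X - np.einsum('in,jn,kn->ijk', A, B, C)) ** 2)
        if abs(prev - ssr) <= tol * ssx:
            break
        prev = ssr
    return A, B, C, it

# Hypothetical noise-free data with Gaussian profiles.
t, w = np.arange(40.0), np.arange(30.0)
A_true = np.exp(-0.5 * ((t[:, None] - np.array([10.0, 18.0, 26.0])) / 3.0) ** 2)
B_true = np.exp(-0.5 * ((w[:, None] - np.array([8.0, 15.0, 22.0])) / 5.0) ** 2)
C_true = np.random.default_rng(1).uniform(0.5, 2.0, size=(8, 3))
X = np.einsum('in,jn,kn->ijk', A_true, B_true, C_true)

A, B, C, n_iter = swatld(X, N=3)
```

For noise-free data, calling `swatld_update(X, A_true, B_true)` returns `C_true` exactly, which is a convenient sanity check of the derived update.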

3.3.3. APTLD

The APTLD algorithm was developed by Xia et al. [18] and offers several improved properties. It alternately minimizes three new least-squares-based objective functions in which constraint functions are added to the PARAFAC error as penalty terms. The residue expressions of Eqs. (12)–(14) supply these constraint functions. By adding them, multiplied by large penalty factors, to the residue functions, APTLD transforms the constrained problems into unconstrained ones and then alternately minimizes the following three objective functions to resolve the model:

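$$S(\mathbf{A}) = \sum_{k=1}^{K}\left\|\mathbf{X}_{..k} - \mathbf{A}\operatorname{diag}(\mathbf{c}_{(k)})\mathbf{B}^{T}\right\|_F^{2} + q\left(\sum_{j=1}^{J}\left\|\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{B}^{T}\mathbf{B})}\right)\left(\mathbf{C}^{+}\mathbf{X}_{.j.} - \operatorname{diag}(\mathbf{b}_{(j)})\mathbf{A}^{T}\right)\right\|_F^{2} + \sum_{k=1}^{K}\left\|\left(\mathbf{X}_{..k}(\mathbf{B}^{T})^{+} - \mathbf{A}\operatorname{diag}(\mathbf{c}_{(k)})\right)\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{C}^{T}\mathbf{C})}\right)\right\|_F^{2}\right), \tag{18}$$
$$S(\mathbf{B}) = \sum_{i=1}^{I}\left\|\mathbf{X}_{i..} - \mathbf{B}\operatorname{diag}(\mathbf{a}_{(i)})\mathbf{C}^{T}\right\|_F^{2} + r\left(\sum_{k=1}^{K}\left\|\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{C}^{T}\mathbf{C})}\right)\left(\mathbf{A}^{+}\mathbf{X}_{..k} - \operatorname{diag}(\mathbf{c}_{(k)})\mathbf{B}^{T}\right)\right\|_F^{2} + \sum_{i=1}^{I}\left\|\left(\mathbf{X}_{i..}(\mathbf{C}^{T})^{+} - \mathbf{B}\operatorname{diag}(\mathbf{a}_{(i)})\right)\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{A}^{T}\mathbf{A})}\right)\right\|_F^{2}\right), \tag{19}$$
$$S(\mathbf{C}) = \sum_{j=1}^{J}\left\|\mathbf{X}_{.j.} - \mathbf{C}\operatorname{diag}(\mathbf{b}_{(j)})\mathbf{A}^{T}\right\|_F^{2} + p\left(\sum_{i=1}^{I}\left\|\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{A}^{T}\mathbf{A})}\right)\left(\mathbf{B}^{+}\mathbf{X}_{i..} - \operatorname{diag}(\mathbf{a}_{(i)})\mathbf{C}^{T}\right)\right\|_F^{2} + \sum_{j=1}^{J}\left\|\left(\mathbf{X}_{.j.}(\mathbf{A}^{T})^{+} - \mathbf{C}\operatorname{diag}(\mathbf{b}_{(j)})\right)\operatorname{diag}\!\left(\sqrt{1./\operatorname{diagm}(\mathbf{B}^{T}\mathbf{B})}\right)\right\|_F^{2}\right), \tag{20}$$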

where p, q, and r are penalty factors, and the performance of APTLD depends on their values. With very small values, APTLD behaves much like PARAFAC: it needs many iterations and is sensitive to excess factors; in particular, when p = q = r = 0, APTLD can be regarded as a variant of PARAFAC. With larger values of p, q, and r, the algorithm becomes insensitive to excess factors and converges faster. Judged by the variance among repeated trials and by the computational burden, further increasing p, q, and r should in theory make APTLD perform better. Its performance can therefore be fine-tuned by adjusting the penalty factors p, q, and r to particular circumstances and needs.
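For APTLD, the S(C) objective in Eq. (20) is quadratic in each row of C, so its minimizer can be written in closed form. The sketch below uses normal equations derived here from Eq. (20) (an assumption, not taken from [18]): R [(P^T P) * (Q^T Q) + 2 × penalty × I] = (least-squares term) + penalty × (t1 + t2), where * is the elementwise product and t1, t2 are the SWATLD-type terms. With penalty 0, the update reduces to the PARAFAC alternating least squares step; as the penalty grows, it approaches the SWATLD-type average (t1 + t2)/2, mirroring the behaviour described above. The helper names, initialization, normalization, stopping rule, and synthetic data are hypothetical choices.

```python
import numpy as np

def aptld_update(X, P, Q, penalty):
    """Closed-form minimizer (Eq. (20) pattern) of R in X[:, :, m] ~ P diag(r_(m)) Q^T."""
    G = (P.T @ P) * (Q.T @ Q)                           # elementwise product, N x N
    ls = np.einsum('ijm,in,jn->mn', X, P, Q)            # PARAFAC least-squares term
    t1 = np.einsum('ni,ijm,jn->mn', np.linalg.pinv(P), X, Q) / np.sum(Q * Q, axis=0)
    t2 = np.einsum('in,ijm,jn->mn', P, X, np.linalg.pinv(Q.T)) / np.sum(P * P, axis=0)
    lhs = G + 2.0 * penalty * np.eye(G.shape[0])        # symmetric N x N
    return np.linalg.solve(lhs, (ls + penalty * (t1 + t2)).T).T

def aptld(X, N, p=1.0, q=1.0, r=1.0, max_iter=1000, tol=1e-12, seed=0):
    """Sketch of APTLD: q, r, and p penalize the A-, B-, and C-updates (Eqs. (18)-(20))."""
    rng = np.random.default_rng(seed)
    A, B = rng.random((X.shape[0], N)), rng.random((X.shape[1], N))
    X_jki, X_kij = np.transpose(X, (1, 2, 0)), np.transpose(X, (2, 0, 1))
    C = aptld_update(X, A, B, p)
    ssx, prev = np.sum(X ** 2), np.inf
    for it in range(1, max_iter + 1):
        A = aptld_update(X_jki, B, C, q)
        A /= np.linalg.norm(A, axis=0)
        B = aptld_update(X_kij, C, A, r)
        B /= np.linalg.norm(B, axis=0)
        C = aptld_update(X, A, B, p)
        ssr = np.sum((X - np.einsum('in,jn,kn->ijk', A, B, C)) ** 2)
        if abs(prev - ssr) <= tol * ssx:
            break
        prev = ssr
    return A, B, C, it

# Hypothetical noise-free data with Gaussian profiles.
t, w = np.arange(40.0), np.arange(30.0)
A_true = np.exp(-0.5 * ((t[:, None] - np.array([10.0, 18.0, 26.0])) / 3.0) ** 2)
B_true = np.exp(-0.5 * ((w[:, None] - np.array([8.0, 15.0, 22.0])) / 5.0) ** 2)
C_true = np.random.default_rng(1).uniform(0.5, 2.0, size=(8, 3))
X = np.einsum('in,jn,kn->ijk', A_true, B_true, C_true)

A, B, C, n_iter = aptld(X, N=3)
```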

3.3.4. APQLD

The APQLD algorithm [19], an extension of APTLD to the decomposition of quadrilinear data, is applied to third-order calibration. As with APTLD, four objective functions are obtained; those for A, B, and D are:

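$$S(\mathbf{A}) = \sum_{k=1}^{K}\sum_{l=1}^{L}\left\|\mathbf{X}_{..kl} - \mathbf{A}\operatorname{diag}(\mathbf{d}_{(l)})\operatorname{diag}(\mathbf{c}_{(k)})\mathbf{B}^{T}\right\|_F^{2} + q\left(\sum_{k=1}^{K}\sum_{l=1}^{L}\left\|\left(\mathbf{X}_{..kl}(\mathbf{B}^{T})^{+} - \mathbf{A}\operatorname{diag}(\mathbf{d}_{(l)})\operatorname{diag}(\mathbf{c}_{(k)})\right)\sqrt{\mathbf{W}_{B}}\right\|_F^{2} + \sum_{j=1}^{J}\sum_{k=1}^{K}\left\|\sqrt{\mathbf{W}_{D}}\left(\mathbf{D}^{+}\mathbf{X}_{.jk.} - \operatorname{diag}(\mathbf{c}_{(k)})\operatorname{diag}(\mathbf{b}_{(j)})\mathbf{A}^{T}\right)\right\|_F^{2}\right), \tag{21}$$
$$S(\mathbf{B}) = \sum_{l=1}^{L}\sum_{i=1}^{I}\left\|\mathbf{X}_{i..l} - \mathbf{B}\operatorname{diag}(\mathbf{a}_{(i)})\operatorname{diag}(\mathbf{d}_{(l)})\mathbf{C}^{T}\right\|_F^{2} + r\left(\sum_{l=1}^{L}\sum_{i=1}^{I}\left\|\left(\mathbf{X}_{i..l}(\mathbf{C}^{T})^{+} - \mathbf{B}\operatorname{diag}(\mathbf{a}_{(i)})\operatorname{diag}(\mathbf{d}_{(l)})\right)\sqrt{\mathbf{W}_{C}}\right\|_F^{2} + \sum_{k=1}^{K}\sum_{l=1}^{L}\left\|\sqrt{\mathbf{W}_{A}}\left(\mathbf{A}^{+}\mathbf{X}_{..kl} - \operatorname{diag}(\mathbf{d}_{(l)})\operatorname{diag}(\mathbf{c}_{(k)})\mathbf{B}^{T}\right)\right\|_F^{2}\right), \tag{22}$$
$$S(\mathbf{D}) = \sum_{j=1}^{J}\sum_{k=1}^{K}\left\|\mathbf{X}_{.jk.} - \mathbf{D}\operatorname{diag}(\mathbf{c}_{(k)})\operatorname{diag}(\mathbf{b}_{(j)})\mathbf{A}^{T}\right\|_F^{2} + p\left(\sum_{j=1}^{J}\sum_{k=1}^{K}\left\|\left(\mathbf{X}_{.jk.}(\mathbf{A}^{T})^{+} - \mathbf{D}\operatorname{diag}(\mathbf{c}_{(k)})\operatorname{diag}(\mathbf{b}_{(j)})\right)\sqrt{\mathbf{W}_{A}}\right\|_F^{2} + \sum_{i=1}^{I}\sum_{j=1}^{J}\left\|\sqrt{\mathbf{W}_{C}}\left(\mathbf{C}^{+}\mathbf{X}_{ij..} - \operatorname{diag}(\mathbf{b}_{(j)})\operatorname{diag}(\mathbf{a}_{(i)})\mathbf{D}^{T}\right)\right\|_F^{2}\right), \tag{23}$$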

where W_A = diag(1./diagm(A^T A)), W_B = diag(1./diagm(B^T B)), W_C = diag(1./diagm(C^T C)), and W_D = diag(1./diagm(D^T D)). APQLD decomposes the quadrilinear model by alternately minimizing the four objective functions. Its performance also depends on the choice of the penalty factors p, q, r, and s; when all four penalty factors equal 0, it can be regarded as a variant of four-way PARAFAC.

APQLD retains the second-order advantage of second-order calibration and offers an additional advantage: the extra (fourth) mode can relieve severe collinearity that three-way algorithms often cannot resolve.

3.4. Rank estimation

Estimating the chemical rank (the number of factors or components) of the trilinear model before decomposing a three-way data array is an important and difficult problem. In principle, it can be sidestepped by choosing algorithms that are insensitive to an overestimated number of components (chemical rank). Nevertheless, these algorithms still require that the chosen number of components be no smaller than the underlying one. Moreover, when the chosen number far exceeds the actual one, model fitting errors and large deviations in the predicted results may occur. Conversely, the algorithms give much more accurate solutions when the most appropriate number of factors is chosen.

Accordingly, many methods have been developed for estimating chemical ranks. They can be roughly divided into two main categories. The first category is based on the trilinear model and includes split-half analysis [20], Wu’s maximum rank method [21], the core consistency diagnostic (CORCONDIA) [22], ADD-ONE-UP [23], and self-weighted alternating trilinear decomposition combined with Monte Carlo simulation (SWATLD-MCS) [24]. Split-half analysis relies on a relatively complex splitting scheme, so its result depends strongly on how the data are split. CORCONDIA and ADD-ONE-UP are two of the most commonly used methods for determining chemical ranks; however, they can be quite time consuming, and severely collinear data may impose a heavy computational burden and even lead to wrong results. SWATLD-MCS operates in two main steps. First, Monte Carlo simulation is used to generate a pseudo three-way data array. Second, SWATLD decomposes this array to obtain sorted mean relative concentration values, and the chemical rank is determined by comparing these values.

The second category comprises model-free methods, such as the orthogonal projection approach (OPA) [25], two-mode subspace comparison (TMSC) [26], the factor indicator function (IND) [27], subspace projection of pseudo high-way array (SPPH) [28], the linear transform method incorporating Monte Carlo simulation (LTMC) [29], and the region-based moving window subspace projection technique (RMWSPT) [30]. Although all of these methods can be applied to rank estimation, none of them guarantees correct results in all situations. In practice, more than one method is often used to ensure accurate analytical results [8, 15].

3.4.1. Maximum rank method

The maximum rank method was first proposed by Wu et al. [21] to estimate the chemical rank for ATLD and its variants:

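$$\operatorname{rank}(\underline{\mathbf{X}}) = \max\left\{\operatorname{rank}(\mathbf{X}_{I\times JK}),\ \operatorname{rank}(\mathbf{X}_{J\times KI}),\ \operatorname{rank}(\mathbf{X}_{K\times IJ})\right\}, \tag{24}$$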

In practice, the number of factors can also be determined as follows:

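$$\operatorname{rank}(\underline{\mathbf{X}}) = \max\left\{\operatorname{rank}\Big(\sum_{i=1}^{I}\mathbf{X}_{i..}\Big),\ \operatorname{rank}\Big(\sum_{j=1}^{J}\mathbf{X}_{.j.}\Big),\ \operatorname{rank}\Big(\sum_{k=1}^{K}\mathbf{X}_{..k}\Big)\right\}, \tag{25}$$
$$\operatorname{rank}(\underline{\mathbf{X}}) = \max\left\{\operatorname{rank}(\mathbf{X}_{I\times JK}\mathbf{X}_{I\times JK}^{T}),\ \operatorname{rank}(\mathbf{X}_{J\times KI}\mathbf{X}_{J\times KI}^{T}),\ \operatorname{rank}(\mathbf{X}_{K\times IJ}\mathbf{X}_{K\times IJ}^{T})\right\}, \tag{26}$$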

where rank(·) denotes the numerical rank of a matrix, estimated by singular value decomposition with a default tolerance. The method is simple and generally applicable, and it usually gives satisfactory estimates of the chemical rank of three-way data.
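Eq. (24) translates directly into NumPy. In this sketch, `numpy.linalg.matrix_rank` plays the role of rank(·). Its default tolerance is appropriate only for noise-free data, so for measured data a tolerance above the noise level has to be supplied; the value 1e-2 below is an assumption tied to the noise level of the hypothetical synthetic data.

```python
import numpy as np

def max_rank_estimate(X, tol=None):
    """Eq. (24): the largest numerical rank among the three unfoldings of X (I x J x K).

    tol is passed to numpy.linalg.matrix_rank; tol=None uses NumPy's default threshold
    (largest singular value * max(matrix shape) * machine epsilon), suited to noise-free data.
    """
    I, J, K = X.shape
    unfoldings = (X.reshape(I, J * K),                           # X_{I x JK}
                  np.transpose(X, (1, 2, 0)).reshape(J, K * I),  # X_{J x KI}
                  np.transpose(X, (2, 0, 1)).reshape(K, I * J))  # X_{K x IJ}
    return max(int(np.linalg.matrix_rank(M, tol=tol)) for M in unfoldings)

# Hypothetical three-component LC-DAD-like array with Gaussian profiles.
t, w = np.arange(40.0), np.arange(30.0)
A = np.exp(-0.5 * ((t[:, None] - np.array([10.0, 18.0, 26.0])) / 3.0) ** 2)
B = np.exp(-0.5 * ((w[:, None] - np.array([8.0, 15.0, 22.0])) / 5.0) ** 2)
C = np.random.default_rng(1).uniform(0.5, 2.0, size=(8, 3))
X = np.einsum('in,jn,kn->ijk', A, B, C)
X_noisy = X + 1e-6 * np.random.default_rng(2).standard_normal(X.shape)

rank_clean = max_rank_estimate(X)                   # default tolerance
rank_noisy = max_rank_estimate(X_noisy, tol=1e-2)   # tolerance chosen above the noise level
```

With noise present and the default tolerance, every noise direction counts toward the rank, which is why the tolerance is the critical choice in practice.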

3.4.2. ADD-ONE-UP

ADD-ONE-UP was proposed by Chen et al. [23] for determining the chemical rank. It fits two series of reconstructed three-way data arrays by PARAFAC with a gradually increasing number of components and then determines the chemical rank by examining the residual sum of squares (SSR). The method is convenient and powerful and can handle some nonideal experimental conditions (such as slight collinearity and unknown backgrounds). Its steps are as follows.

  1. Unfold the three-way data array X into a two-way matrix X_(I×JK).

  2. Decompose X_(I×JK) by SVD: X_(I×JK) = U S V^T.

  3. Define X_c = U_c S_c V_c^T, where U_c and V_c consist of the first c columns of U and V, respectively, and S_c is a diagonal matrix whose diagonal elements are the first c diagonal elements of S.

  4. Fold X_c into a three-way data array, then resolve it by PARAFAC with N = c (c = 1, 2, 3, …). Denote the residual sum of squares by SSR_c.

  5. Repeat steps 3 and 4 until SSR_c reaches its minimum or the following conditions are satisfied: SSR_c1 < s_c1^2, SSR_(c1+1) > s_(c1+1)^2, and SSR_(c1+2) > s_(c1+2)^2, where s_i is the ith diagonal element of S and s_c1^2 is the variance accounted for by including the c1th component in the truncation step.

  6. Unfold X in another direction to obtain X_(IK×J), and perform steps 2 to 5 to obtain c2, which satisfies analogous relationships.

  7. The number of factors used to decompose the trilinear data array X is the smaller of c1 and c2, i.e., F = min(c1, c2).
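The control flow of steps 1 to 7 can be sketched as follows. No PARAFAC routine is included: `fit_ssr(Xc, c)` is a hypothetical callable that should fit PARAFAC with N = c to the folded array and return SSR_c. The fallback to the minimum SSR_c when no c satisfies the three inequalities is this sketch's reading of step 5, and the synthetic array is hypothetical.

```python
import numpy as np

def unfold(X, mode):
    """Step 1 (mode 'I'): X_(I x JK); step 6 (mode 'J'): X_(IK x J)."""
    I, J, K = X.shape
    if mode == "I":
        return X.reshape(I, J * K)
    return np.transpose(X, (0, 2, 1)).reshape(I * K, J)

def fold(M, mode, shape):
    """Inverse of unfold."""
    I, J, K = shape
    if mode == "I":
        return M.reshape(I, J, K)
    return np.transpose(M.reshape(I, K, J), (0, 2, 1))

def choose_c(X, mode, fit_ssr, c_max):
    """Steps 2-5 for one unfolding; fit_ssr(Xc, c) must return SSR_c of a PARAFAC fit."""
    U, s, Vt = np.linalg.svd(unfold(X, mode), full_matrices=False)   # step 2
    ssr = []
    for c in range(1, c_max + 1):
        Xc = fold((U[:, :c] * s[:c]) @ Vt[:c], mode, X.shape)         # steps 3-4: truncate, fold
        ssr.append(fit_ssr(Xc, c))                                    # step 4: SSR_c
    for c in range(1, c_max - 1):                                     # step 5 needs c, c+1, c+2
        if ssr[c - 1] < s[c - 1] ** 2 and ssr[c] > s[c] ** 2 and ssr[c + 1] > s[c + 1] ** 2:
            return c
    return int(np.argmin(ssr)) + 1                                    # otherwise: minimum SSR_c

def add_one_up(X, fit_ssr, c_max=6):
    """Step 7: F = min(c1, c2)."""
    return min(choose_c(X, "I", fit_ssr, c_max), choose_c(X, "J", fit_ssr, c_max))

# Hypothetical three-component array with Gaussian profiles.
t, w = np.arange(40.0), np.arange(30.0)
A = np.exp(-0.5 * ((t[:, None] - np.array([10.0, 18.0, 26.0])) / 3.0) ** 2)
B = np.exp(-0.5 * ((w[:, None] - np.array([8.0, 15.0, 22.0])) / 5.0) ** 2)
C = np.random.default_rng(1).uniform(0.5, 2.0, size=(8, 3))
X = np.einsum('in,jn,kn->ijk', A, B, C)
```

Any PARAFAC implementation that reports its residual sum of squares can be passed in as `fit_ssr`; the repeated PARAFAC fits inside the loop are exactly the cost that makes ADD-ONE-UP slow.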

This method combines the eigenvalues from factor analysis with the residuals of trilinear decomposition, and it can cope with nonideal experimental conditions such as varying backgrounds and moderate collinearity. However, because it is based on the PARAFAC algorithm, ADD-ONE-UP has some drawbacks. It is rather time consuming because PARAFAC must be run many times, and two-factor degeneracies may impose a heavy computational burden and lead to inaccurate results.

3.4.3. CORCONDIA

The principle of CORCONDIA is to assess, for a gradually increasing number of components, the similarity between the superdiagonal core array T and the core array G obtained by least-squares fitting of the data given the model’s loadings. CORCONDIA is defined as:

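$$\text{core consistency} = 100 \times \left(1 - \frac{\sum_{d=1}^{N}\sum_{e=1}^{N}\sum_{f=1}^{N}\left(g_{def} - t_{def}\right)^{2}}{\sum_{d=1}^{N}\sum_{e=1}^{N}\sum_{f=1}^{N} t_{def}^{2}}\right), \tag{27}$$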

where g_def is an element of G, t_def is an element of T, and N is the number of factors in the model.

For an ideal trilinear model, g_def equals t_def and the core consistency equals 100%. Usually, a model with a core consistency above 90% can be regarded as “very trilinear,” whereas a value around 50% indicates a problematic model containing both trilinear and non-trilinear variation. A value close to zero, or even negative, means that the model is not valid. Although CORCONDIA is effective, it inherits the drawbacks of PARAFAC.
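Eq. (27) can be computed once the loadings of a fitted model are available. In this sketch, the loadings are random and hypothetical rather than the output of an actual PARAFAC fit. The least-squares core G is obtained by multiplying X by the pseudoinverses of A, B, and C along the three modes. For perfectly trilinear data with the true loadings, the core consistency is 100%; adding a single off-superdiagonal interaction lowers it (here to about 91.7%).

```python
import numpy as np

def core_consistency(X, A, B, C):
    """Eq. (27) for loadings A (I x N), B (J x N), C (K x N) of a trilinear model of X."""
    N = A.shape[1]
    # Least-squares core: G = X x1 A^+ x2 B^+ x3 C^+ (the pseudoinverse of a Kronecker
    # product is the Kronecker product of the pseudoinverses).
    G = np.einsum('di,ej,fk,ijk->def',
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C), X)
    T = np.zeros((N, N, N))
    T[np.arange(N), np.arange(N), np.arange(N)] = 1.0   # superdiagonal core
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / np.sum(T ** 2))

rng = np.random.default_rng(4)
A, B, C = rng.random((20, 3)), rng.random((15, 3)), rng.random((6, 3))
X = np.einsum('in,jn,kn->ijk', A, B, C)
cc_true = core_consistency(X, A, B, C)

# An interaction between a_1, b_2, and c_3 is not trilinear in these loadings.
X_nontri = X + 0.5 * np.einsum('i,j,k->ijk', A[:, 0], B[:, 1], C[:, 2])
cc_nontri = core_consistency(X_nontri, A, B, C)
```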

3.5. Some related fundamental issues

3.5.1. Chromatographic peak alignment procedure

Chromatographic peak alignment is a challenge when complex systems are analyzed by multiway calibration methods. Several peak alignment methods have been developed for second-order instruments, which generate a data matrix per sample. These methods [31], for example iterative target factor analysis coupled with correlation optimized warping (ITTFA-COW), rank minimization (RM), parallel factor analysis alignment, and other recently proposed methods based on multivariate curve resolution-alternating least squares, use the two-way structure of the signals to align chromatographic peak shifts. In principle, these methods align local chromatographic regions, so they can give satisfactory results even when the time shifts vary across the whole chromatogram. They can achieve accurate time alignment regardless of the presence of unknown interferences. Recently, Yu and co-workers developed a new chromatographic peak alignment algorithm derived from the well-known rank minimization method. It aligns the time shifts among samples and then uses a trilinear decomposition algorithm to resolve the overlapping chromatographic peaks and quantify the target analytes [31].

Figure 8(A) illustrates the rank minimization (RM) method. A significant advantage of this method is that alignment succeeds even when potential interferences coelute with the analyte of interest. As shown in Figure 8(A), a fixed-size time window (rectangle) is moved along the retention time direction. The red rectangle M0 covers the retention time range of the analyte in the response of the reference sample, and the range between the green and blue rectangles in the response of a test sample is the range of possible time shifts of the analyte. Moving the fixed-size window row-wise along the retention time direction of the test sample extracts the submatrices M1 to Mn, from which the augmented matrices [M0 | M1], …, [M0 | Mn] are formed (stage 2 in Figure 8(A)). Finally, singular value decomposition is performed on each augmented matrix, giving a list of residual variances. Plotting the percentage of residual variance against each candidate time shift clearly marks the time shift correction as the point of minimum residual variance.

Figure 8.

Graphical illustration of RM (A) and ASSD (B).

The abstract subspace difference (ASSD) method uses abstract chromatographic profiles for alignment. These profiles are obtained by writing the response matrix X in singular value decomposition (SVD) form:

$$\mathbf{X} = \mathbf{U}\mathbf{S}\mathbf{V}^{\mathrm{T}} + \mathbf{E} \qquad (28)$$

Here, the columns of U are the abstract chromatographic profiles and the columns of V are the abstract spectral profiles. Strictly speaking, neither necessarily corresponds to the real profiles. Suppose that two data matrices have been collected. The first is a reference matrix Xref, which contains only the analyte. The second is a test matrix Xtest, which contains the analyte together with unknown interferences. Applying SVD to each matrix gives the abstract chromatographic profiles of the reference and test samples separately:

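$$\mathbf{X}_{\mathrm{ref}} = \mathbf{U}_{\mathrm{ref}}\mathbf{S}_{\mathrm{ref}}\mathbf{V}_{\mathrm{ref}}^{\mathrm{T}} + \mathbf{E}_{\mathrm{ref}}, \qquad (29)$$
$$\mathbf{X}_{\mathrm{test}} = \mathbf{U}_{\mathrm{test}}\mathbf{S}_{\mathrm{test}}\mathbf{V}_{\mathrm{test}}^{\mathrm{T}} + \mathbf{E}_{\mathrm{test}}. \qquad (30)$$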

In the ideal situation there is no noise and no time shift between the reference and test samples. The mathematical rank of the augmented matrix [Uref | Utest] is then identical to that of Utest. When the chromatographic retention time of the analyte differs between the reference and test samples, however, the rank of [Uref | Utest] becomes larger than that of Utest. The core of the ASSD method is therefore to look for the augmented matrix with minimum mathematical rank. This is the same principle as the rank minimization method, except that ASSD uses the abstract chromatographic profiles instead of the raw response windows.
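For the simplest case the rank argument can be made explicit. The short derivation below assumes, for illustration only, that the reference contains a single analyte, so that Uref reduces to one unit column u, and that Utest = Q has k orthonormal columns. The cross-product matrix of the augmented matrix is

$$\begin{bmatrix}\mathbf{u} & \mathbf{Q}\end{bmatrix}^{\mathrm{T}}\begin{bmatrix}\mathbf{u} & \mathbf{Q}\end{bmatrix} = \begin{bmatrix}1 & \mathbf{b}^{\mathrm{T}}\\ \mathbf{b} & \mathbf{I}_k\end{bmatrix}, \qquad \mathbf{b} = \mathbf{Q}^{\mathrm{T}}\mathbf{u},$$

whose eigenvalues are 1 (with multiplicity k − 1) and 1 ± ‖b‖. Since ‖b‖ = cos θ, where θ is the angle between u and the column space of Q,

$$\sigma_{\min}\left(\begin{bmatrix}\mathbf{u} & \mathbf{Q}\end{bmatrix}\right) = \sqrt{1-\cos\theta}, \qquad \sum_{i=1}^{k+1}\sigma_i^2 = k+1.$$

In the noise-free, aligned case, u lies in the column space of Q. Then θ = 0, the smallest singular value vanishes, and the rank of [u | Q] equals k, the rank of Q. A time shift makes θ > 0 and raises the rank to k + 1. The sum of squared singular values stays fixed at the number of columns throughout.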

Figure 8(B) gives the graphical illustration of the ASSD method. An additional SVD step is introduced in Stage 1 of Figure 8(B) to calculate the abstract chromatographic profiles of each extracted matrix M1 to Mn. In the last stage, the method uses the last singular value as the criterion for time shift correction, instead of the percentage of residual variance. In real measurements, noise makes the exact mathematical rank of the augmented matrix an impractical criterion for aligning the target analyte between the reference and a test sample. Once the time shift has been correctly aligned, however, the augmented matrix [Uref | Utest] becomes severely ill-conditioned. Chromatographic peak alignment can therefore be recast as finding the most ill-conditioned matrix among the augmented matrices, as shown in Stage 3 of Figure 8(B). Because every column of Uref and Utest has unit length, the total variance is constant and equal to the number of columns. Here the total variance is the sum of the squared elements of [Uref | Utest]. Hence, a smaller last singular value necessarily corresponds to a more ill-conditioned matrix.
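An ASSD-style search can be sketched in the same way. This is a hypothetical illustration, not the authors' code. Each window is replaced by its leading left singular vectors (abstract chromatographic profiles), and each candidate shift is scored by the last singular value of [Uref | Utest]. The factor counts `n_ref` and `n_test`, the simulated data, and the helper names (`abstract_profiles`, `assd_shift`) are all assumptions.

```python
import numpy as np


def gaussian(t, center, width):
    """Gaussian elution profile sampled at retention times t."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)


def abstract_profiles(window, n_factors):
    """Leading left singular vectors of a window (abstract chromatographic profiles)."""
    u, _, _ = np.linalg.svd(window, full_matrices=False)
    return u[:, :n_factors]


def assd_shift(ref_window, test, start, max_shift, n_ref, n_test):
    """ASSD-style alignment (sketch): minimize the last singular value of [U_ref | U_test]."""
    rows = ref_window.shape[0]
    u_ref = abstract_profiles(ref_window, n_ref)
    shifts = np.arange(-max_shift, max_shift + 1)
    last_sv = []
    for s in shifts:
        u_test = abstract_profiles(test[start + s:start + s + rows], n_test)
        sv = np.linalg.svd(np.hstack([u_ref, u_test]), compute_uv=False)
        # Orthonormal columns => sum(sv**2) == n_ref + n_test at every shift,
        # so the last singular value alone measures ill-conditioning.
        last_sv.append(sv[-1])
    last_sv = np.array(last_sv)
    return int(shifts[np.argmin(last_sv)]), last_sv


# Hypothetical data: analyte shifted by +6 scans in the test sample, plus an interferent.
rng = np.random.default_rng(0)
t = np.arange(200.0)
spec_analyte = rng.uniform(0.1, 1.0, 30)
spec_interf = rng.uniform(0.1, 1.0, 30)
reference = np.outer(gaussian(t, 100, 5), spec_analyte)
test = (0.8 * np.outer(gaussian(t, 106, 5), spec_analyte)
        + np.outer(gaussian(t, 112, 8), spec_interf))
test += rng.normal(scale=1e-4, size=test.shape)

start, rows = 80, 40
ref_window = reference[start:start + rows]
best_shift, last_sv = assd_shift(ref_window, test, start, max_shift=10, n_ref=1, n_test=2)
print(best_shift)  # the simulated shift is +6 scans
```

Because the total variance is fixed by the column count, no normalization is needed before comparing the last singular values across shifts. This is the practical appeal of working with abstract profiles.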

3.5.2. Background drift

Non-trilinear factors such as background drift are sometimes unavoidable in chromatographic analysis. Their causes include the changing mobile-phase composition during gradient elution and/or the nature of complicated sample matrices. Such factors may lead the aforementioned chemometric algorithms to wrong results. Amigo and co-workers have reviewed the graphical and mathematical approaches used to handle such issues in chromatographic data [32]; multivariate curve resolution (MCR) methods are a typical example.

A chromatographic background drift correction strategy [33] was developed by our group in 2007 for LC × LC × DAD data. Its core idea is to perform a trilinear decomposition of the instrumental response data based on the alternating trilinear decomposition (ATLD) algorithm, regarding the background drift as an extra component (factor). After the decomposition resolves the raw data, the background component is extracted and subtracted from the raw data. This recovers the analyte signals with a flat baseline. Figure 9 gives a detailed schematic of how the background drift is subtracted from raw three-way chromatographic data.
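The idea of treating drift as one extra trilinear factor can be illustrated with a small simulation. This sketch is only a stand-in under loud assumptions. It uses plain PARAFAC-ALS rather than the ATLD algorithm of [33]. The drift is purely linear and identical in shape across samples, so it is exactly trilinear. The drift factor is recognized by a hypothetical rule that works only for this simulated case. All arrays and names are illustrative.

```python
import numpy as np


def parafac_als(x, rank, n_iter=2000, seed=0):
    """Plain PARAFAC-ALS: x[i, j, k] ~ sum_r a[i, r] * b[j, r] * c[k, r]."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(x.shape[0], rank))
    b = rng.uniform(size=(x.shape[1], rank))
    c = rng.uniform(size=(x.shape[2], rank))
    for _ in range(n_iter):
        # Each update is the least-squares solution for one mode with the other two fixed.
        a = np.einsum("ijk,jr,kr->ir", x, b, c) @ np.linalg.pinv((b.T @ b) * (c.T @ c))
        b = np.einsum("ijk,ir,kr->jr", x, a, c) @ np.linalg.pinv((a.T @ a) * (c.T @ c))
        c = np.einsum("ijk,ir,jr->kr", x, a, b) @ np.linalg.pinv((a.T @ a) * (b.T @ b))
    return a, b, c


# Hypothetical HPLC-DAD-like array: 60 scans x 25 wavelengths x 5 samples.
t = np.arange(60.0)
wl = np.arange(25.0)
peak = np.exp(-0.5 * ((t - 30) / 4) ** 2)          # analyte elution profile
drift = t / t.max()                                 # linear baseline drift (assumed)
spec_analyte = np.exp(-0.5 * ((wl - 8) / 3) ** 2)   # analyte spectrum
spec_bg = 0.5 + 0.02 * wl                           # smooth background spectrum
conc = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
bg_level = np.array([1.5, 0.5, 1.2, 0.8, 1.0])

analyte = np.einsum("i,j,k->ijk", peak, spec_analyte, conc)
background = np.einsum("i,j,k->ijk", drift, spec_bg, bg_level)
x = analyte + background

# Resolve analyte + drift as two trilinear factors.
a, b, c = parafac_als(x, rank=2)

# Hypothetical rule valid for this simulation only: the drift factor is the
# one whose elution-mode profile correlates best with a straight line.
corr = [abs(np.corrcoef(a[:, r], t)[0, 1]) for r in range(2)]
bg = int(np.argmax(corr))

# Subtract the resolved background factor from the raw data.
corrected = x - np.einsum("i,j,k->ijk", a[:, bg], b[:, bg], c[:, bg])
rel_err = np.linalg.norm(corrected - analyte) / np.linalg.norm(analyte)
print(f"relative error after drift removal: {rel_err:.2e}")
```

The subtracted term is the outer product of the three loadings of the drift factor. It is therefore unaffected by the scale ambiguity among the modes. Real drift that changes shape from sample to sample would not be captured by a single trilinear factor.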

Figure 9.

Schematic description on how to remove the background drift from three-dimensional instrumental data.

Recently, a method based on orthogonal spectral signal projection (OSSP) was studied for correcting various kinds of chromatographic background drift simultaneously [33]. The results indicated that OSSP coupled with PARAFAC can handle coelution and background drift problems together in chromatographic analysis. With this combination, more accurate results can be obtained even in the presence of background drift and unknown interferences.
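The general principle of removing background by projection in the spectral mode can be sketched as below. This is a generic orthogonal projection under stated assumptions, and not necessarily the published OSSP procedure. The background spectra are assumed to span a low-dimensional subspace, two dimensions in this simulation. That subspace is estimated by SVD from scans assumed to be free of analyte (rows 0 to 19 here). All data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100.0)   # scans
n_wl = 40              # wavelength channels

# Analyte: one elution peak times one spectrum (hypothetical shapes).
analyte = np.outer(np.exp(-0.5 * ((t - 60) / 5) ** 2), rng.uniform(0.1, 1.0, n_wl))
# Background drift: two drift profiles, each carrying its own spectrum.
drift_profiles = np.column_stack([t / 100, (t / 100) ** 2])
drift_spectra = rng.uniform(0.1, 1.0, (2, n_wl))
background = drift_profiles @ drift_spectra
data = analyte + background

# 1) Estimate the background spectral subspace from scans assumed analyte-free (0-19).
_, _, vt = np.linalg.svd(data[:20], full_matrices=False)
v_bg = vt[:2].T                                 # n_wl x 2 orthonormal basis

# 2) Project every spectrum onto the orthogonal complement of that subspace.
projector = np.eye(n_wl) - v_bg @ v_bg.T
projected = data @ projector

print(np.abs(background @ projector).max())     # background remaining after projection
```

The projection also changes the analyte spectra, which become their components orthogonal to the background subspace. Any subsequent trilinear decomposition, such as PARAFAC, therefore works with projected spectra, while the elution profiles are left unchanged.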

4. Application

Based on the “second-order or higher-order advantages” provided by chemometric methods, many practical applications have been developed for the analysis of pharmaceuticals, biological matrices, foods, cosmetics, environmental matrices, and other samples. Multiway calibration algorithms have been used to enhance selectivity. They also yield accurate concentration predictions for the analyte(s) of interest, free from interference by the sample matrix. The applications summarized in Table 2 are reviewed below by field of application.

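| Field | Type of data | Algorithm | Analytes | Ref. |
|---|---|---|---|---|
| Pharmaceuticals | HPLC-DAD | ATLD | Puerarin, daidzin, and daidzein | [34] |
| Pharmaceuticals | HPLC-DAD | ATLD | Costunolide and dehydrocostuslactone | [35] |
| Pharmaceuticals | HPLC-DAD | ATLD, SWATLD, AFR | Isoniazid and pyrazinamide | [36] |
| Biological matrices | HPLC-DAD | ATLD | Eleven antihypertensives | [37] |
| Biological matrices | HPLC-DAD | ATLD | Four tyrosine kinase inhibitors | [38] |
| Biological matrices | LC-MS | ATLD | Ten β-blockers | [40] |
| Biological matrices | LC-MS | ATLD | Six antidiabetic agents | [48] |
| Biological matrices | HPLC-DAD | ATLD | Five vinca alkaloids | [39] |
| Foods | HPLC-DAD | PARAFAC, ATLD, SWATLD | Sudan I and Sudan II | [49] |
| Foods | HPLC-DAD | ATLD | Six synthetic colorants | [1] |
| Foods | HPLC-DAD | APTLD | Synthetic phenolic antioxidants | [9] |
| Foods | HPLC-DAD | ATLD, PCA | Eight coeluted compounds in tea | [7] |
| Foods | HPLC-DAD | ATLD | Eight flavonoids | [42] |
| Foods | HPLC-DAD | ATLD | Nine polyphenols | [43] |
| Foods | HPLC-DAD | ATLD | Twelve quinolones | [44] |
| Foods | HPLC-DAD | ATLD, PCA-LDA | Thirteen phenolic compounds | [45] |
| Foods | HPLC-DAD | ATLD | Twelve polyphenols | [46] |
| Foods | LC-MS | ATLD | Ten mycotoxins | [47] |
| Environmental matrices | HPLC-DAD | SWATLD | Three pre-emergence herbicides | [50] |
| Environmental matrices | HPLC | ATLD | 1-Chloro-2,4-dinitrobenzene and 3,5-dinitrobenzoic acid | [51] |
| Environmental matrices | HPLC | ATLD | Five dimethylphenol isomers | [53] |
| Environmental matrices | HPLC | ATLD | Catechol, resorcinol, and hydroquinone | [52] |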

Table 2.

Reviewed applications.
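The "second-order advantage" underlying these applications can be illustrated with a small simulation. The calibration standards contain only the analyte, whereas the unknown sample also contains a coeluting interferent that was never calibrated. The sketch stacks the HPLC-DAD-like matrices into a three-way array and decomposes it with plain PARAFAC-ALS, used here as a stand-in for ATLD, SWATLD, or APTLD. It then regresses the analyte's sample-mode scores on the known standard concentrations. All profiles, concentrations, and helper names are hypothetical.

```python
import numpy as np


def parafac_als(x, rank, n_iter=3000, seed=0):
    """Plain PARAFAC-ALS: x[i, j, k] ~ sum_r a[i, r] * b[j, r] * c[k, r]."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(x.shape[0], rank))
    b = rng.uniform(size=(x.shape[1], rank))
    c = rng.uniform(size=(x.shape[2], rank))
    for _ in range(n_iter):
        a = np.einsum("ijk,jr,kr->ir", x, b, c) @ np.linalg.pinv((b.T @ b) * (c.T @ c))
        b = np.einsum("ijk,ir,kr->jr", x, a, c) @ np.linalg.pinv((a.T @ a) * (c.T @ c))
        c = np.einsum("ijk,ir,jr->kr", x, a, b) @ np.linalg.pinv((a.T @ a) * (b.T @ b))
    return a, b, c


def gauss(axis, center, width):
    return np.exp(-0.5 * ((axis - center) / width) ** 2)


t = np.arange(50.0)    # retention-time scans (hypothetical)
wl = np.arange(30.0)   # wavelength channels (hypothetical)
analyte = np.outer(gauss(t, 22, 4), gauss(wl, 10, 4))
interferent = np.outer(gauss(t, 27, 5), gauss(wl, 18, 5))    # coelutes with the analyte

cal_conc = np.array([1.0, 2.0, 3.0, 4.0])    # standards contain the analyte only
true_unknown = 2.5
slices = [conc * analyte for conc in cal_conc]
slices.append(true_unknown * analyte + 1.7 * interferent)   # uncalibrated interferent
x = np.stack(slices, axis=2)                 # scans x wavelengths x samples

a, b, c = parafac_als(x, rank=2)

# The analyte factor is the one with non-negligible scores in the standards.
an = int(np.argmax(np.abs(c[:4]).sum(axis=0)))
slope, intercept = np.polyfit(cal_conc, c[:4, an], 1)
predicted = (c[4, an] - intercept) / slope
print(f"predicted concentration: {predicted:.4f}")   # simulated true value: 2.5
```

The interferent's sample-mode score is essentially zero in the standards, so the analyte factor can be picked out automatically here. With real data, this assignment usually relies on the resolved spectra or elution profiles instead.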

4.1. Pharmaceuticals

In this field, two or three drugs have been determined simultaneously in aqueous solution or in traditional Chinese medicine. The data analyzed are second-order (matrix) data obtained for each sample by high-performance liquid chromatography with photodiode array detection (HPLC-DAD).

Su et al. proposed a method for simultaneously quantifying the main active constituents, namely puerarin, daidzin, and daidzein, in the traditional Chinese medicine kudzuvine root by HPLC-DAD with the ATLD algorithm [34].

Traditional Chinese medicine (TCM) plays an important role in the healthcare system. Considerable attention has therefore been paid to Chinese patent medicines (CPMs), which generally consist of several TCMs and other ingredients. Quantifying the constituents of CPMs, and their levels in plasma, is important for pharmacological analysis. Liu et al. determined two active constituents, costunolide and dehydrocostuslactone, in plasma samples and in the Chinese patent medicine Xiang Sha Yang Wei capsule. They used HPLC-DAD coupled with the alternating trilinear decomposition (ATLD) algorithm [35].

Ding et al. determined isoniazid and pyrazinamide by HPLC-DAD coupled with three second-order calibration algorithms: ATLD, alternating fitting residue (AFR), and self-weighted alternating trilinear decomposition (SWATLD). All three algorithms successfully resolved the overlapped chromatograms in the presence of unknown interferences, and AFR gave slightly better results in this case [36].

4.2. Biological matrices

Biological samples often contain various endogenous substances such as amino acids, hormones, and neurotransmitters. Determining the concentrations of these molecules or their metabolites is an integral part of clinical research. It also helps in understanding the pathophysiology and mechanisms of diseases. Human urine and plasma are the most commonly studied matrices.

Hypertension, commonly called high blood pressure, is a chronic cardiovascular disease characterized by a sustained rise in systemic arterial blood pressure. Zhao et al. carried out the simultaneous quantification of 11 antihypertensives in human serum, health product, and Chinese patent medicine samples by HPLC-DAD with second-order calibration based on the ATLD algorithm [37].

Tyrosine kinases are critical regulators of cell growth and differentiation. Measuring the concentrations of tyrosine kinase inhibitors (TKIs) in different biofluids plays a significant role in optimizing individual dosage regimens and reducing the risk of inappropriate dosing. Four TKIs in different plasma samples were analyzed by HPLC-DAD without complete chromatographic separation, relying on the ATLD algorithm to resolve the overlapped signals. The contents of the four TKIs in these complex plasma samples could be accurately determined [38].

Liu et al. simultaneously determined vincristine, vinblastine, vindoline, catharanthine, and yohimbine in Catharanthus roseus and human serum samples. They applied the ATLD algorithm to the three-way data array built by stacking the HPLC-DAD data matrices [39].

β-Blockers are first-line therapeutic agents for treating cardiovascular diseases and also a class of prohibited substances in athletic competitions. Rapid screening for multiple β-blockers in a single analysis is therefore in growing demand in clinical toxicology, forensic science, and antidoping control. Gu et al. proposed a strategy for the simultaneous determination of 10 β-blockers in human urine and plasma samples [40]. It combines three-way liquid chromatography-mass spectrometry (LC-MS) data with second-order calibration based on the alternating trilinear decomposition (ATLD) algorithm. The quantitative results were validated by LC-MS/MS operated in multiple reaction monitoring (MRM) mode.

4.3. Foods

The applications in this field cover the analysis of contaminants, essential ingredients, and additives.

Synthetic phenolic antioxidants used as food additives were successfully determined in edible vegetable oil by HPLC-DAD combined with the alternating penalty trilinear decomposition (APTLD) algorithm [9]. No extraction procedure to isolate the antioxidants of interest was needed, and all 10 antioxidants were eluted within 6 min.

Yin et al. proposed a strategy that combines HPLC-DAD with the ATLD algorithm to cope with interference patterns that vary with the chromatographic column and the sample matrix. The strategy allowed rapid simultaneous determination of six synthetic colorants in beverages with little sample pretreatment [1].

Tea is one of the most widely consumed beverages in the world. Numerous studies have reported its biological activities, including anti-inflammatory, antiatherosclerotic, antioxidant, anticancer, antiobesity, and antiviral properties. These beneficial effects are related to the presence of purine alkaloids and polyphenols in tea. Yin et al. proposed a chemometrics-enhanced HPLC-DAD strategy for the simultaneous and fast determination of eight coeluted compounds in 10 kinds of Chinese teas: gallic acid, caffeine, and six catechins. The strategy used second-order calibration based on the ATLD algorithm [41]. Principal component analysis (PCA) of the quantitative results was then used to cluster these Chinese teas.

Propolis is a naturally occurring resinous hive product gathered by worker honeybees from buds and barks of different plant species. Sun et al. developed a fast analytical strategy by combining HPLC-DAD with ATLD algorithm for simultaneous determination of eight flavonoids in propolis capsule samples [42].

Honey is a wholesome natural food well known for its high nutritional value. The antioxidant capacity of a number of honeys has been found to correlate significantly with their polyphenol content. Polyphenols therefore affect the quality of honey and honey products, as well as their benefits for overall health and disease prevention. Using second-order calibration to develop an HPLC-DAD method, Zhang et al. successfully quantified nine polyphenols in five kinds of honey samples [43]. Quinolones are a class of antibacterial agents widely used in agriculture for their high antimicrobial activity. They were also determined in honey samples by HPLC-DAD with the ATLD algorithm [44].

Phenolic compounds in wine, as secondary metabolites and functional components, determine important sensory characteristics such as mouthfeel, fragrance, and color. HPLC-DAD combined with second-order calibration based on ATLD has been used to determine 13 phenolic compounds in red wines. Principal component analysis-linear discriminant analysis (PCA-LDA) was then applied to distinguish wines of different ages [45]. Wang et al. used the same strategy to simultaneously quantify 12 polyphenols in peel and pulp samples of different kinds of apple [46].

Mycotoxins are a class of highly carcinogenic substances that often occur naturally in moldy foods, especially cereals. Liu et al. proposed a strategy for direct, fast, and interference-free determination of multiclass regulated mycotoxins in complex cereal samples [47]. The strategy combines three-way LC-MS data with second-order calibration based on the ATLD algorithm. Ten mycotoxins with different properties were rapidly eluted and detected by full-scan MS, with a segmented fragment program used to enhance sensitivity.

Gu et al. developed a green method for the simultaneous determination of six coeluted sulfonylurea-type oral antidiabetic agents in healthy herbal teas and human plasma samples [48]. The method combines LC-MS with second-order calibration based on the ATLD algorithm. The strategy proved promising for resolving and determining coeluted analytes of interest in complex samples. It avoids elaborate sample pretreatment steps, complicated experimental conditions, and more sophisticated, high-cost instrumentation.

For the determination of Sudan dyes in hot chilli samples, HPLC-DAD was employed without complete chromatographic separation, using PARAFAC, ATLD, and SWATLD [49]. Low contents of Sudan I and Sudan II could be accurately determined in complex chilli matrices.

4.4. Environmental matrices

In this field, the analytes, mainly organic contaminants and pesticides, were determined in aqueous solution, soil, tap water, river water, and effluent water.

Herbicides are chemicals employed to kill weeds without injuring desirable vegetation, and they are widely used. Their widespread use may lead to accumulation in the environment, causing persistent and serious pollution or even toxicity to crops and humans. Qing et al. developed a strategy for the analysis of three pre-emergence herbicides in environmental samples using HPLC-DAD with the SWATLD algorithm [50].

Chemometrics-assisted HPLC-DAD strategies have great potential for the analysis of target analytes in complex environmental matrices. So far, such strategies have been used successfully to determine the following analytes in environmental samples:

- 1-chloro-2,4-dinitrobenzene and 3,5-dinitrobenzoic acid [51]
- catechol, resorcinol, and hydroquinone [52]
- five dimethylphenol isomers [53]

5. Conclusion

This chapter has described in detail various multiway chemometric methodologies and their applications in chromatography. We have built a more canonical system of notation and noted the inherent cyclic symmetry of multilinear decomposition. We have also introduced several multiway calibration algorithms, explored rank estimation for multiway data arrays, and analyzed numerous real systems with in-house methods. Fundamental issues in chromatographic analysis, such as peak alignment and background drift, were also discussed, and methods to address them were presented. Combining chromatographic techniques with chemometrics based on multiway calibration can greatly simplify complicated and tedious sample pretreatment and avoid long chromatographic elution. The applications described above are broadly applicable, rapid, and sensitive for the determination of a variety of analytes in complex matrices.

Acknowledgments

The authors gratefully acknowledge the National Natural Science Foundation of China (Grant Nos. 21575039 and 21775039) and the Foundation for Innovative Research Groups of NSFC (Grant No. 21521063) for financial support.

Conflict of interest

There are no conflicts to declare.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


Hai-Long Wu, Xiao-Dong Sun, Huan Fang and Ru-Qing Yu (November 5th 2018). High-Order Calibration and Data Analysis in Chromatography, Chemometrics and Data Analysis in Chromatography, Vu Dang Hoang, IntechOpen, DOI: 10.5772/intechopen.78624. Available from:
