Multivariate Data Processing in Spectrophotometric Analysis of Complex Chemical Systems

A great variety of methods exists for processing analytical spectroscopy data, and they are especially useful in multicomponent systems [Ewing et al., 1953; Garrido et al., 2004; Lykkesfeld, 2001; Oka et al., 1991; Sanchez & Kowalski, 1986]. These methods are essentially based on different mathematical strategies, including the specific formalisms of mathematical statistics and of matrix algebra [Garrido et al., 2004; Szabadai, 2005]. The matrix-based methods refer to quantitative analysis [Bosch-Reigh et al., 1991; Garrido et al., 2008; Li et al., 2011; Lozano et al., 2009; Ruckenbusch et al., 2006; Szabadai, 2005], to the determination of the number of independent chemical equilibria in multicomponent systems [Szabadai, 2005] and to the correction of the effect of various perturbing factors such as stray light or background absorption [Burnius, 1959; Fox & Mueller, 1950; Melnick, 1952; Morton & Stubbs, 1946, 1947, 1948; Owen, 1995; Page & Berkovitz, 1943; Szabadai, 2005].

In the present chapter, original approaches to the matrix treatment of the aforementioned items are presented, with special consideration given to the simultaneous assay of compounds in a mixture, to background correction procedures and to the standard addition method in a generalized form.

Simultaneous assay of nonreacting compounds in a mixture
The quantitative analysis of a mixture whose components do not interact chemically can be approached, in a rigorous and general manner, with the help of matrix computation [Ewing et al., 1953; Garrido et al., 2004, 2008; Lozano et al., 2009; Lykkesfeld, 2001; Oka et al., 1991; Ruckenbusch et al., 2006; Sánchez & Kowalski, 1986; Szabadai, 2005]. In the case of a mixture with M components, the quantitative determination of the components requires measuring the absorbance at Λ distinct wavelengths (Λ > M). Given a set of N standard solutions (N > M, supposing that, as a rule, each standard solution may contain all M chemical components of interest in known concentrations), the absorbances of the sample are to be measured at the same set of wavelengths and under the same conditions as those of the standard solutions.
The following notation will be used: X^n_m(λ) represents a quantity X referring to the standard mixture number n (superscript index) and to the individual chemical component number m (subscript index), measured at the wavelength number λ (in parentheses). The left side of relation (1) includes the matrix of absorbances of the standard solutions and the optical path length travelled by the radiation, "d" (i.e. the width of the cell used).
It may be assumed that the absorbance of the sample, measured at the same set of wavelengths as the standard solutions, consists of the weighted contributions of the standard solutions. The contribution weight of each standard solution to the absorbance of the sample depends on the concentrations of the chemical components in the sample under analysis and in the individual standard solutions. This is expressed, in matrix form, by relation (2).
In what follows, bold characters are used for denoting matrices: the matrix of the absorbances of the sample is denoted by A, the matrix of the absorbances of the standard solutions by A_st, the matrices of the concentrations of the chemical components in the analysed sample and in the standard solutions by C and C_st respectively, the matrix of the molar absorptivities by E, and the matrix of the contribution weights of the standard solutions, which generate the absorbance of the sample, by P. To make the matrix formalism easier to follow, the symbol of each matrix is followed (in square brackets) by the number of rows and columns of that matrix. Therefore, matrix A_st, made up of Λ rows and N columns, is denoted A_st[Λ,N]. Relations (1) and (2) are equivalent to matrix expressions (3) and (4).
The matrices A[Λ,1], A_st[Λ,N] and C_st[M,N] are known, and the further aim is to calculate the elements of matrix C[M,1]. These matrices satisfy relations (6) and (7). In what follows, the goal is to eliminate matrix E[Λ,M] from these two matrix relations and to solve the resulting relation for matrix C[M,1].
In order to solve the above system of equations for matrix C[M,1], both members of equation (6) are multiplied on the right by the transpose of matrix C_st[M,N]. Both members of equation (7) are then processed in a similar way, leading to relation (16), which expresses matrix C[M,1] through the measured absorbances and the known concentrations of the standard solutions.
A particular case of the above reasoning is that in which each standard solution contains only one dissolved chemical component (different from those present in the other standard solutions), so that N = M. In this case the notation S refers to their common value (N = M = S).
Consequently, matrix C_st[S,S] of the concentrations of the components in the standard solutions is square and diagonal (only the elements on the main diagonal differ from zero), as shown in relation (17).
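The elimination of matrix E described above can be sketched numerically. The following is a minimal illustration, not the chapter's own implementation: it assumes that the absorbances have already been divided by the optical path length d, that the standards obey A_st = E · C_st and the sample obeys A = E · C, and that both steps are solved in the least-squares sense through the products with the transposed matrices. All function names and all numeric values are hypothetical.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a X = b (a square, b with one or more columns) by
    Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("singular matrix: data sets are not independent")
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def molar_absorptivities(a_st, c_st):
    """Estimate E from E (C_st C_st^T) = A_st C_st^T (assumed form)."""
    ct = transpose(c_st)
    # C_st C_st^T is symmetric, so the transposed system gives E^T directly.
    e_t = solve(matmul(c_st, ct), transpose(matmul(a_st, ct)))
    return transpose(e_t)


def sample_concentrations(e, a):
    """Estimate C from (E^T E) C = E^T A (assumed form)."""
    e_t = transpose(e)
    col = solve(matmul(e_t, e), matmul(e_t, [[v] for v in a]))
    return [row[0] for row in col]


# Hypothetical, noise-free data: M = 2 components, N = 3 standards, 4 wavelengths.
e_true = [[1.0, 0.2], [0.5, 0.8], [0.1, 1.2], [0.3, 0.3]]
c_st = [[1.0, 2.0, 1.0], [1.0, 1.0, 3.0]]
a_st = matmul(e_true, c_st)
c_true = [1.5, 2.5]
a_sample = [row[0] for row in matmul(e_true, [[c] for c in c_true])]

e_est = molar_absorptivities(a_st, c_st)
c_est = sample_concentrations(e_est, a_sample)
print(c_est)
```

With noise-free synthetic data the estimated concentrations coincide with the ones used to build the sample spectrum; with real readings the same calls return the least-squares estimate.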
If the input data (the absorbance readings at the selected wavelengths and the concentrations of the standard solutions) do not form relevant data sets, singular matrices (matrices whose determinant is null and which therefore do not admit an inverse) may be obtained when processing the data. In order to avoid this failure, the condition Λ ≥ N ≥ M is imposed. This is the necessary (but not sufficient) condition for avoiding the appearance of singular matrices. The necessity of this condition results from inspecting relations (10) and (12). In relation (10), the appropriate choice of the concentrations of the standard solutions is imposed, so that matrix C_st[M,N] has rank M, i.e. its M rows are linearly independent. Otherwise expressed, it is essential that there be no significant intercorrelation between different rows of the matrix of standard concentrations (in algebraic terms, the concentration vectors of the standard solutions must span the M-dimensional linear space). Similarly, the spectra of the individual chemical components should differ significantly in the spectral range chosen for analysis (more precisely, at the selected set of wavelengths). The relevance of the choice of the wavelength set can be tested by calculating the eigenvalues of the square, symmetric matrix that has to be inverted in relation (12). If one eigenvalue of this matrix is null (or very close to null), the selected wavelength set is not adequate for the intended analysis, and another wavelength set must be selected. The general issue of row (or column) intercorrelation is solved in linear algebra by considering eigenvalues and eigenvectors. However, the complete and rigorous mathematical treatment of basis vectors in linear algebra goes beyond the purpose of the present work.

Example
Consider N = 5 standard solutions containing M = 3 components in known concentrations. The concentrations, expressed in mg/l, are included in matrix C_st[3,5]. As illustrated by this matrix, each of the 5 standard solutions contains (in different and known concentrations) all three dissolved chemical components.
The eigenvalues of the square, symmetric matrix formed from C_st[3,5] and its transpose are collected in the column matrix EV[3,1]. All three eigenvalues are different from zero (taking into account the concentration values and the precision with which they are expressed), so the rank of matrix C_st[3,5] is 3. In other words, the set of concentration values allows the quantitative determination of all three chemical components in their mixture (provided that the wavelength set at which the absorbance values are going to be measured is chosen correctly).
The situation would differ if the matrix of concentrations of the standard solutions contained the following values: In this case the rank of matrix C_st[3,5] is only two, because the second element of the column matrix of eigenvalues (EV[3,1]) is much smaller than the elements of the initial matrix and much smaller than the estimated accepted errors in expressing the standard concentrations. Consequently, even though N = 5 standard solutions are used (with the considered concentrations), the concentrations of the three components in their mixture cannot be determined (irrespective of the wavelength set chosen for measuring the absorbances), because the concentrations of the standard solutions have not been chosen properly.
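The eigenvalue test used in this example can be sketched as follows. This is an illustrative sketch under stated assumptions: the tested matrix is taken to be the product of the concentration matrix with its transpose, the eigenvalues are obtained with the cyclic Jacobi rotation method (a standard technique for symmetric matrices, not necessarily the one used by the author), and the relative tolerance as well as both concentration tables are hypothetical values chosen only for demonstration.

```python
import math


def gram(rows):
    """Return the square, symmetric matrix R R^T for a list of row vectors."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def jacobi_eigenvalues(sym, sweeps=30):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [list(row) for row in sym]
    n = len(a)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                d = a[q][q] - a[p][p]
                sign = 1.0 if d >= 0 else -1.0
                # Rotation angle (|theta| <= pi/4) that zeroes element (p, q).
                theta = 0.5 * math.atan2(sign * 2.0 * a[p][q], abs(d))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def rank_from_eigenvalues(c_st, rel_tol=1e-9):
    """Count eigenvalues of C_st C_st^T that are not negligible."""
    ev = jacobi_eigenvalues(gram(c_st))
    return sum(1 for v in ev if v > rel_tol * max(ev))


# Hypothetical concentration tables (3 components, 5 standards).
c_ok = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [5.0, 3.0, 1.0, 2.0, 4.0],
        [2.0, 5.0, 4.0, 1.0, 3.0]]
c_bad = [[1.0, 2.0, 3.0, 4.0, 5.0],
         [2.0, 4.0, 6.0, 8.0, 10.0],  # proportional to the first row
         [2.0, 5.0, 4.0, 1.0, 3.0]]

print(rank_from_eigenvalues(c_ok), rank_from_eigenvalues(c_bad))
```

In practice the tolerance should reflect the accepted error of the standard concentrations rather than the floating-point precision used here.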

Example
For a numeric illustration of spectrophotometric data processing with the matrix formalism, measurement data obtained by analysing a mixture of salicylic acid, caffeine and acetaminophen (paracetamol) are presented [Szabadai, 2005]. The number of standard solutions is N = 5 and each standard solution contains all three components (in known concentrations). Table 1 contains the absorbance values for the 5 standard solutions (A_st) and for the mixture of the three substances (A), recorded at the same set of 18 wavelengths. Table 1 also presents the known concentrations of the three components in the five standard solutions (elements of matrix C_st[3,5]), i.e. M = 3, N = 5, Λ = 18. The calculation starts from the matrix of concentrations of the standard solutions C_st[3,5], the matrix of absorbances of the standard solutions A_st[Λ,5] and the matrix of absorbances of the sample A[Λ,1]. After performing the matrix operations in relation (16), the elements of matrix C[3,1] are obtained. They represent the concentrations, expressed in mg/l, of the three components of interest (salicylic acid, caffeine and acetaminophen) in the analysed sample. 3.538 1.553

Generalization of the 3-point method to correct background absorption
Before dealing generally with the issue of foreign components in the sample (components which cannot be found in the standard solutions and which may cause deviations from the hypothesis that the sample spectrum is formed by adding, with different weights, the spectra of the standard solutions), the quantitative analysis method and the baseline correction algorithm suggested by Morton and Stubbs [Burnius, 1959; Ewing et al., 1953; Fox & Mueller, 1950; Melnick et al., 1952; Morton & Stubbs, 1946, 1947, 1948; Owen, 1995; Page & Berkovitz, 1943; Szabadai, 2005] (also known as the "3-point method") will be presented.
The Morton-Stubbs method takes into account that the sample often contains, besides the chemical substance of interest, other foreign absorbing chemical components. If the chemical removal of these foreign components is difficult, eliminating (or at least minimising) their contribution to the final result of the analysis by correcting the absorbance reading can be a convenient solution. According to the original form of the Morton and Stubbs method [Morton & Stubbs, 1946, 1947, 1948], it is possible to eliminate the disturbing effect of a foreign component only when the absorption of the disturbing component, in the spectral range taken into consideration, does not show an absorption maximum but appears as a baseline absorption that depends linearly on the wavelength and overlaps the absorption spectrum of the chemical component of interest.
The absorption spectrum of the component of interest is deformed by the background absorption (linearly dependent on the wavelength), and the effect of this deformation is eliminated through a special method of processing the measured absorbance values. According to the original Morton-Stubbs formalism, it is essential to determine the absorbance of the sample at no fewer than three wavelengths [Morton & Stubbs, 1946]. The wavelengths are selected as follows: one wavelength (λ_max) is that at which the standard solution of the substance of interest (where the disturbing component is not present) shows a local absorbance maximum; the other two wavelengths (λ_1 and λ_2, with λ_max lying between them) are those at which the substance of interest shows equal molar absorptivities (A'(λ_1) = A'(λ_2)). Figure 1 represents the spectrum of the standard solution by a dotted line, whereas the spectrum of the mixture, in which the quantitative determination of the substance of interest is intended, is represented by a continuous line. The absorbance values corresponding to the three selected wavelengths (λ_1, λ_2 and λ_max) are denoted A(λ_1), A(λ_2) and A(λ_max) in the spectrum of the sample and A'(λ_1), A'(λ_2) and A'(λ_max) in the spectrum of the pure (standard) component. The purpose is to calculate the quantity A'(λ_max) (namely the absorbance associated with the substance of interest, without the background absorbance) from the measured values A(λ_1), A(λ_2) and A(λ_max).
The absorbance A'(λ_max) is obtained by subtracting from the measured value A(λ_max) the value denoted by x + y in Figure 1.
The value x is expressed from the similarity of two conveniently chosen triangles. For calculating the value y in expression (18), the ratio of the absorbances A'(λ_max) and A'(λ_2) is needed, which can be determined from the spectrum of the standard solution. When an analytical method is developed for a certain substance of interest as a standardized working procedure, this ratio, once determined, can be used for subsequent analyses, provided that the analyses are performed under strictly unchanged conditions (in the same solvent, at the same pH and temperature, with the same slit program of the spectrophotometer, and preferably with the same type of spectrophotometer as the one used for determining the ratio). With this ratio known, the value y is obtained from relations (18) and (21).
A'(λ_2) = A(λ_2) - y   (21)
Dividing relations (18) and (21) member by member yields an equation in which y is the only remaining unknown.
The linearity of the background absorption over a large spectral range is not always satisfied. In the case of wide absorption bands it is recommended to measure the absorbance of the sample at several wavelengths; in these cases, however, the processing of the measured absorbance values requires more elaborate mathematical methods.
As can be noticed, the Morton-Stubbs formalism tolerates the presence, in the spectrum of the sample, of a linear background (a foreign spectrum that is linear in the wavelength) which cannot be attributed to any component of the standard solutions, while still ensuring corrected results (sample concentrations of the components of interest).
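The 3-point correction can be illustrated with a short sketch. It is a reconstruction under stated assumptions rather than the original formulas: y is taken to be the background absorbance at λ_2 (so that A'(λ_2) = A(λ_2) - y), x is the difference between the linear background at λ_max and at λ_2, and the argument named ratio stands for A'(λ_max)/A'(λ_2) measured on the standard. The function name, the argument names and the numeric values are hypothetical.

```python
def morton_stubbs(a1, a2, a_max, lam1, lam2, lam_max, ratio):
    """Return the background-corrected absorbance A'(lam_max).

    a1, a2, a_max : sample absorbances at lam1, lam2 and lam_max
    ratio         : A'(lam_max) / A'(lam2), taken from the standard spectrum
    Assumes a background that is linear in the wavelength and equal
    absorbances of the pure component at lam1 and lam2.
    """
    if ratio == 1.0:
        raise ValueError("ratio must differ from 1 for the correction to work")
    # Because A'(lam1) = A'(lam2), the difference a1 - a2 is pure background;
    # similar triangles scale it to the interval between lam2 and lam_max.
    x = (a1 - a2) * (lam2 - lam_max) / (lam2 - lam1)
    # From ratio = (a_max - x - y) / (a2 - y), solved for y.
    y = (ratio * a2 - a_max + x) / (ratio - 1.0)
    return a_max - x - y


# Hypothetical check: pure-component absorbances 0.30, 0.60, 0.30 at
# 250, 270 and 300 nm, plus a background of 0.02 + 0.0005 * wavelength.
corrected = morton_stubbs(a1=0.445, a2=0.470, a_max=0.755,
                          lam1=250.0, lam2=300.0, lam_max=270.0, ratio=2.0)
print(corrected)
```

The wavelengths only enter through a ratio of differences, so any wavelength unit can be used as long as it is the same for all three values.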
The original algorithm may be extended so that corrected results are obtained when the sample spectrum contains, besides the chemical components represented in the standard spectra, a baseline that is a polynomial of degree G in the wavelength. The spectrum of the sample is thus considered to consist of the spectra of the standard solutions and of the background spectrum, the latter being approximated by an adequate polynomial of degree G (relation 24), where A_k(λ_i) denotes the absorbance of standard solution k at wavelength λ_i.
A(\lambda_i) = \sum_{k=1}^{K} p_k A_k(\lambda_i) + \sum_{g=0}^{G} q_g \lambda_i^{g} ; \quad (i = 1, 2, \ldots, N) \qquad (24)
The purpose is to calculate the contribution weight p_k of each standard solution to the spectrum of the sample, namely the coefficients p_k (k = 1, 2, . . . , K). In the ideal case, when the spectrum of the sample does not contain a foreign baseline but only the components represented in the standard solutions, the coefficients q_g (g = 0, 1, 2, . . . , G) are all null. Because of inherent measurement errors these coefficients are not null, but if the polynomial (25) is positive and has small values (for all selected wavelengths λ_i) in relation to the measured absorbances, the approach is correct and there are still chances of removing, by calculation, the effect of the polynomial background (of degree G) in the spectrum of the sample on the results. On the contrary, if the polynomial (25) has a high or a negative value, even for one wavelength (one value of i), the foreign background cannot be approximated by a polynomial of degree G, and forcing the algorithm might lead to an erroneous result.
\sum_{g=0}^{G} q_g \lambda_i^{g} \qquad (25)
Obviously, the higher the degree G of the polynomial (25) which corrects the foreign background in the spectrum of the sample, the more flexible the correction algorithm for a real background absorption, but the more wavelengths must be selected for the absorbance readings (in other words, the inequality N > G + K + 1 is imposed in practice in order to obtain, from the measured absorbance values, an overdetermined system of equations).
For the statistical processing of the set of N absorbance values obtained for the sample and of the N·K absorbance values for the K standard solutions, the function (26) is defined, imposing that the function F(p_k, q_g) should present a local minimum for those values p_k (k = 1, 2, . . . , K) and q_g (g = 0, 1, . . . , G) which ensure the best global correspondence between the measured absorbances of the sample and the absorbances approximated with relation (24). This condition is equivalent to cancelling the partial derivatives of the function (26) with respect to p_k (k = 1, 2, . . . , K) and q_g (g = 0, 1, . . . , G). The cancellation of the partial derivatives is the necessary condition for a local minimum of the function (26); when (26) is a sum of squared deviations, and hence quadratic in the unknowns, any such stationary point is in fact a minimum.
After differentiating and setting the derivatives equal to zero, a system of K + G + 1 linear equations with the same number of unknowns is obtained (27).
In order to express the wavelength (and its different powers) any unit of measure can be used, provided that the same unit of measure is used in all equations and for all wavelengths.
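The polynomial background correction can be sketched numerically as a least-squares fit. The sketch below assumes that the minimised function is the sum of squared differences between the measured sample absorbances and the model made of K weighted standard spectra plus a polynomial of degree G in the wavelength, and it solves the resulting K + G + 1 normal equations by Gaussian elimination. Function names and data are hypothetical; the wavelengths are given in micrometres only to keep the powers of the wavelength small.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("singular system: choose other wavelengths")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_with_background(sample, standards, wavelengths, degree):
    """Return (p, q): weights of the K standards and polynomial coefficients.

    sample      : N absorbances of the sample
    standards   : K lists of N absorbances (one list per standard solution)
    wavelengths : N wavelength values, any consistent unit
    degree      : G, degree of the background polynomial
    """
    k = len(standards)
    # Row i: [A_1(lam_i), ..., A_K(lam_i), 1, lam_i, ..., lam_i**G]
    y = [[s[i] for s in standards] + [lam ** g for g in range(degree + 1)]
         for i, lam in enumerate(wavelengths)]
    cols = k + degree + 1
    yty = [[sum(row[i] * row[j] for row in y) for j in range(cols)]
           for i in range(cols)]
    ytx = [sum(row[i] * xi for row, xi in zip(y, sample)) for i in range(cols)]
    z = solve(yty, ytx)  # normal equations (Y^T Y) z = Y^T x
    return z[:k], z[k:]


# Hypothetical data: one standard spectrum and a linear background (G = 1).
lams = [0.24, 0.25, 0.26, 0.27, 0.28, 0.29]
std = [0.10, 0.30, 0.55, 0.40, 0.20, 0.05]
smp = [1.5 * a + 0.02 + 0.1 * lam for a, lam in zip(std, lams)]

p, q = fit_with_background(smp, [std], lams, degree=1)
print(p, q)
```

For higher polynomial degrees or wavelengths expressed in nanometres the normal equations become poorly conditioned, so rescaling the wavelength axis (which the text allows, since any consistent unit may be used) is a reasonable precaution.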
The generalisation of the Morton-Stubbs algorithm for the polynomial correction of the spectrum of the sample can also be presented in matrix form. The equation system (24), written in conventional algebraic form, is equivalent to matrix relation (28).
If the matrix of the absorbance values of the sample (in the left member of equation (28)) is denoted by X[N,1], the first matrix factor in the right member by Y[N,K+G+1] and the second matrix factor in the right member by Z[K+G+1,1], equation (28) takes the form (29).

X[N,1] = Y[N,K+G+1] · Z[K+G+1,1]   (29)
The unknowns of interest are found in matrix Z[K+G+1,1]; the relative weights are p_k (k = 1, 2, . . . , K). In order to express the elements of matrix Z explicitly, both members of relation (29) are multiplied on the left by the transpose of matrix Y[N,K+G+1], after which the resulting square matrix can be inverted.
It is decisively important to determine the correct set of wavelengths at which the absorbance values should be measured for a concrete analytical problem. The choice of the optimal wavelength (or wavelengths) is often difficult even in the case of a single component of interest. In real samples the component of interest may be accompanied by various other components without analytical interest ("the sample ballast"), which can nevertheless modify the molar absorptivity of the component of interest and hence the sensitivity of the spectral response of the chemical substance being analysed. If one can identify a wavelength at which the absorption of the sample ballast is negligible and the absorption of the component of interest is considerable, that wavelength is recommended for the determination. If the component of interest also has a local absorption maximum at this wavelength, this is an additional advantage, because there the absorbance value depends least on possible errors in the wavelength setting of the spectrophotometer. In the less fortunate case, where the absorption of the sample ballast covers the entire available spectral range, several wavelengths are selected for determining the component in the sample, in order to improve the specificity of the spectral response in favour of the component of interest.
When the absorption of the component of interest and that of the ballast cannot be separated, a set of wavelengths can often be chosen so that the absorbances measured express the concentration of the component of interest through a multilinear relation (32).
The aim is to determine numerically the coefficients f(λ_i) (i = 1, 2, . . . , N) for each wavelength in the spectral range considered (the entire spectrum is taken to consist of N absorbance values associated with N discrete wavelength values), so that the concentration can be calculated according to relation (32). Applied to S standard samples of known concentration, relation (32) yields the equation system (33). If S = N, the number of unknowns equals the number of equations, so the minimum number of equations necessary to solve the system (33) for the N unknowns is available. For the reasons discussed above, the creation of an overdetermined system of equations (S > N) is preferred, together with the search for the solution giving the best global fit by the least squares method. Where the coefficients f(λ_i) in relation (32) are insignificant, they can be considered null, and the respective wavelengths are not relevant for the intended quantitative analysis. Therefore, each wavelength in the spectrum is associated with a coefficient expressing the relevance of that wavelength for the quantitative analysis of the component of interest in the presence of the sample matrix (ballast) included in the calibration stage.
By excluding the irrelevant wavelengths, which do not improve the selectivity of the analytical method, one may reduce the number of wavelengths at which the measurement of absorbances is imposed when executing a real sample analysis.
Once the coefficients f(λ_i) (i = 1, 2, . . . , N) are known, the concentrations in the standard samples can be recalculated with relation (33) (the concentrations obtained are denoted c_1, c_2, . . . , c_s, . . . , c_S). Ideally, the known concentrations of all standard samples are recovered. In reality, the correspondence between the set of existing (and known) concentrations in the S standard samples and the set of concentrations recalculated with relation (33) is not perfect. The success of the calibration can be expressed through the value of the linear correlation coefficient between the set of existing concentrations in the standard samples and the recalculated ones. Since the arithmetic mean of the existing (and known) concentrations in the S standard samples and the arithmetic mean of the concentrations recalculated with relation (33) are equal (according to a known theorem of mathematical statistics), their notation with a common symbol is justified. If the correlation coefficient (34) has a statistically acceptable value (for example r > 0.95), it is likely that the set of coefficients f(λ_i) (i = 1, 2, . . . , N) obtained by solving the equation system (33) will allow the correct concentration of the substance of interest to be found in real samples, provided that the ballast of the real sample is not completely different from the ballast range covered when calibrating the method (when determining the coefficients f(λ_i)). This requirement is met to a certain extent in serial analyses, where the nature of the individual samples does not differ much, meaning that their ballast is similar.
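The calibration check described above can be sketched with a plain Pearson correlation coefficient between the known and the recalculated standard concentrations. This is only an illustration: the function name and the numeric values are hypothetical, and the two means are computed separately rather than being assumed equal.

```python
import math


def correlation(known, recalculated):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    n = len(known)
    mean_k = sum(known) / n
    mean_r = sum(recalculated) / n
    cov = sum((k - mean_k) * (r - mean_r) for k, r in zip(known, recalculated))
    var_k = sum((k - mean_k) ** 2 for k in known)
    var_r = sum((r - mean_r) ** 2 for r in recalculated)
    return cov / math.sqrt(var_k * var_r)


# Hypothetical concentrations of S = 5 standard samples (mg/l) and the
# values recalculated from the fitted coefficients.
known = [2.0, 4.0, 6.0, 8.0, 10.0]
recalc = [2.1, 3.9, 6.2, 7.8, 10.0]

r = correlation(known, recalc)
calibration_accepted = r > 0.95  # threshold quoted in the text as an example
print(r, calibration_accepted)
```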
Presenting a spectrum in a spectral range through pairs of wavelength-absorbance values (A(λ) vs. λ, "digitized presentation") implies a large amount of data (for a faithful representation of a spectrum the number N of sampling points is large). It follows that, in order to generate an overdetermined equation system (33), an even larger number of standard samples is necessary (S > N). This is generally inconvenient in practice because it requires too many standard samples.
If S < N, the equation system (33) admits several sets of wavelengths for which the concentrations in the standard samples correlate satisfactorily with the absorbance values, and the remaining problem is to identify at least one of these sets. This method is frequently used in practice, and establishing a suitable set of wavelengths involves several stages. The correlation coefficients calculated with relation (37) for j = 1, 2, . . . , S are compared, and the value "j" for which the correlation coefficient is highest is retained (denoted by 1); if equal values of the correlation coefficient are obtained for several "j" values, one of these "j" values is retained arbitrarily.
In relation (38) A -( i ) and B j represent the arithmetic means of the corresponding matrix elements in columns of order "i", and "j" respectively.
In each column "j" of the matrix (39) an element r ij with maximum absolute value is sought. The set of order numbers "i", which associates an "i" for each column"j", corresponds to the researched set of wavelengths.
In order for the equation system (40) to be solvable for the unknowns f(λ_1), f(λ_2), . . . , f(λ_J), it is necessary that the number of selected wavelengths (J) be smaller than (or equal to) the number of standard samples (S). It is also essential that the determinant of the matrix of this system differ from zero.
In the simultaneous determination of several chemical components which do not interact chemically, the equation system (1) and (2) has been set up with the help of N standard solutions, measured at Λ distinct wavelengths. In order to solve the analytical problem correctly, it is advisable that the spectra of the N standard solutions be "as distinct as possible", because in the extreme (and imaginary) case where two standard solutions had identical spectra, the equation system would be undetermined and therefore impossible to solve. The requirement that the spectra be "as different as possible" needs to be expressed rigorously. One method of characterizing the difference between spectra consists in considering the absorbances of a standard solution, measured at the selected set of wavelengths, as the components of a vector in the Λ-dimensional space. The N spectra of the standard solutions thus form a set of N vectors.
The value of the Gram determinant (41) of the vector set expresses quantitatively the difference between the vectors. The higher the value of the determinant (41), the better the requirement that the standard spectra be "as different as possible" is satisfied. At a higher value of the Gram determinant, the absorbance measurement error affects the precision of the final results to a smaller extent.
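The Gram determinant of a set of spectra can be computed as sketched below. The sketch treats each standard spectrum as a vector of absorbances, builds the matrix of all pairwise scalar products and evaluates its determinant by Gaussian elimination; the function name and the example spectra are hypothetical.

```python
def gram_determinant(spectra):
    """Determinant of the Gram matrix G[i][j] = <spectrum_i, spectrum_j>."""
    g = [[sum(x * y for x, y in zip(u, v)) for v in spectra] for u in spectra]
    n = len(g)
    det = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(g[r][col]))
        if g[piv][col] == 0.0:
            return 0.0  # linearly dependent spectra
        if piv != col:
            g[col], g[piv] = g[piv], g[col]
            det = -det  # a row swap changes the sign of the determinant
        det *= g[col][col]
        for r in range(col + 1, n):
            f = g[r][col] / g[col][col]
            g[r] = [a - f * b for a, b in zip(g[r], g[col])]
    return det


# Hypothetical absorbance vectors measured at three wavelengths.
distinct = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
identical = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]

print(gram_determinant(distinct), gram_determinant(identical))
```

Because the determinant scales with the magnitude of the absorbances, comparisons between candidate wavelength sets are meaningful only for spectra expressed on the same scale.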

Generalization of the standard addition method for several components of interest
In a real sample subjected to analysis, one must take into consideration that the sample contains, besides the substance of interest, various other ingredients. Although it is possible to choose a wavelength at which the absorbance of the substance of interest is significant and the absorbance of the ingredients negligible, the ingredients may, through their presence, modify the molar absorptivity of the component of interest and thus the sensitivity of the spectrophotometric response to that component. This possibility is more plausible in real pharmaceutical products.
If the column matrix in the left member of equation (42) is denoted by A[Λ,1] and the matrix of molar absorptivities in the right member by E, equation (42) can be written in matrix form.
The particular case of the standard addition method applied to a system with two components to be determined is illustrated graphically in Figure 2. In this case, the procedure reduces to determining the plane passing through a number of figurative points and to reading the intersection points of this plane with the negative semi-axes of the concentrations. In the graphic representation of the absorbances A_i(λ) vs. the concentration increases c_1i and c_2i (i = 1, 2, . . . , n), the figurative points are theoretically situated on a plane (shown in Figure 2). The axis of absorbances is intersected by this plane at point P, corresponding to the absorbance A_0(λ), measured for the solution with i = 0. If at the selected wavelength (λ) the absorbance of the ingredients can be neglected, the points X and Y, situated at the intersection of the plane with the negative parts of axes c_1 and c_2, have the coordinates -c_10 and -c_20 respectively (in other words, the lengths of the segments OX and OY are proportional to the concentrations c_10 and c_20).
From the values c_10 and c_20, and knowing the volumes V_a, v and V_b, one may calculate the concentrations c_1 and c_2 of the components of interest in the first solution, and finally their content in the primary sample.

Example
In order to illustrate the application of the standard addition method and of the subsequent data processing procedure, let us consider the mixture of salicylic acid, caffeine and acetaminophen discussed in a previous example. The aim is to determine the concentrations of the three chemical components. Table 2 includes the modifications of the component concentrations (5 modifications are performed) and the absorbances both for the original solution (where the concentrations have not been modified) and for the five solutions in which the concentrations of the three chemical components have been modified. All absorbance values are read at the same set of 18 wavelengths (Λ = 18).
The elements of matrix E are calculated with relation (49) and are expressed in l/(mol·cm), the tolerated unit of measure employed in spectrophotometric practice, and the elements of matrix C, calculated with relation (51), are expressed in mol/l.
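The data processing of a multicomponent standard addition experiment can be sketched as follows. The exact form of relations (49) and (51) is not assumed here; the sketch relies on the hypotheses that the absorbance increase caused by each addition equals E times the concentration increase (unit path length, non-absorbing ballast), that E is estimated by least squares from the additions, and that the initial concentrations are then obtained by least squares from the absorbance of the unmodified solution. All names and numeric values are hypothetical.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a X = b (a square) by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("singular matrix: additions are not independent")
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def standard_addition(a0, a_added, delta_c):
    """Return the initial concentrations of the M components.

    a0      : absorbances of the unmodified solution (one per wavelength)
    a_added : absorbance matrix [wavelength][addition]
    delta_c : concentration increases [component][addition]
    """
    delta_a = [[v - a0[i] for v in row] for i, row in enumerate(a_added)]
    dct = transpose(delta_c)
    # E (dC dC^T) = dA dC^T, solved through the transposed system.
    e = transpose(solve(matmul(delta_c, dct), transpose(matmul(delta_a, dct))))
    e_t = transpose(e)
    c0 = solve(matmul(e_t, e), matmul(e_t, [[v] for v in a0]))
    return [row[0] for row in c0]


# Hypothetical, noise-free experiment: 2 components, 3 additions, 4 wavelengths.
e_true = [[1.0, 0.2], [0.5, 0.8], [0.1, 1.2], [0.3, 0.3]]
c0_true = [0.8, 1.2]
delta_c = [[0.5, 0.0, 0.5], [0.0, 0.5, 1.0]]
a0 = [row[0] for row in matmul(e_true, [[c] for c in c0_true])]
totals = [[c0_true[m] + dc for dc in delta_c[m]] for m in range(2)]
a_added = matmul(e_true, totals)

c0_est = standard_addition(a0, a_added, delta_c)
print(c0_est)
```

At least as many linearly independent additions as components are needed; otherwise the elimination step reports a singular matrix.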

Conclusions
The application of matrix algebra to quantitative spectrophotometry provides a unified formalism for treating the mathematical issues involved. Unlike the usual mathematical approaches, the matrix description of the phenomena behind analytical spectrophotometry promises new dimensions for the automatic processing of results.