Applications of NIRS-PLS regression for determination of trace metals in environmental samples.
Environmental contamination by trace elements is becoming an increasingly important problem worldwide. Trace metals such as cadmium, copper, lead, chromium, and mercury are major environmental pollutants that are predominantly found in areas with high anthropogenic activity. Therefore, there is a need for rapid and reliable tools to assess and monitor the concentration of heavy metals in environmental matrices. A nondestructive, cost-effective, and environmentally friendly procedure based on near-infrared reflectance spectroscopy (NIRS) and chemometric tools has been used as an alternative technique for the simultaneous estimation of various heavy metal concentrations in environmental samples. The metal content is estimated from the absorption features of metals associated with molecular vibrations of organic and inorganic functional groups in organic matter, silicates, carbonates, and water at 780–2500 nm in the near-infrared region. This chapter reviews the application of NIRS combined with chemometric tools such as multiple linear regression (MLR), principal component regression (PCR), and partial least squares (PLS) regression. The advantages and disadvantages of each chemometric tool are discussed briefly.
- near-infrared spectroscopy
- principal component regression
- partial least squares
- multiple linear regression
- trace metals
Owing to the rapid industrial development and growth that have taken place in most areas of the world in recent years, water and soil are receiving large amounts of pollutants such as trace elements from different sources. This can lead to environmental contamination, thus affecting the ecosystem. Contamination refers to the condition of land or water where any chemical substance or waste has been added at above-background levels. The significance of water, air, and land pollution lies in its adverse health or environmental impact [2–4]. Runoff, aerial deposition of chemicals used for agriculture or industry, materials stored or dumped on the site, contaminants in imported fill, and building demolition can also result in contamination of the soil and water that are close to residential communities. Contaminants such as trace metals may be introduced into drinking water via the aforementioned activities or leached from the soil into groundwater. Additionally, trace metals occur naturally in the earth’s crust. For this reason, they can be present in soils at a background level. Trace metals persist for a long time in the environment because they are not degradable. In addition, they are translocated to different environmental components, thus affecting the biota [2–4]. The persistence of trace metals can result in bioaccumulation and biomagnification, causing heavier exposure for some living organisms that are present in the environment.
Trace metal contamination threatens agriculture and other food sources for the human population, and it causes poor vegetation growth and lowers plant resistance against pests. This situation poses different kinds of challenges for remediation. Furthermore, people can be exposed to contaminants in the soil in different ways. These include dermal exposure, inhalation, and penetration via the skin or eyes (including exposure to dust). Trace metal exposure is normally chronic (exposure over a longer period of time), owing to food chain transfer. Acute (immediate) poisoning through ingestion or dermal contact is rare but possible. The toxicity of trace metals is one of the major environmental health concerns and is potentially dangerous because of bioaccumulation through the food chain.
In view of the abovementioned challenges, the development of sensitive and selective analytical procedures for the determination of trace metals is of great importance. Flame atomic absorption spectrometry (FAAS), graphite furnace atomic absorption spectrometry (GFAAS), cathodic and anodic stripping voltammetry [12, 13], inductively coupled plasma optical emission spectrometry (ICP-OES), and inductively coupled plasma mass spectrometry (ICP-MS) are the most widely used analytical techniques for the determination of trace metals in different matrices. However, these techniques are expensive, tedious, complex, and highly time-consuming [15–17]. In addition, investigating the distribution of trace metal concentrations in environmental matrices requires numerous samples and laboratory analyses. Therefore, a rapid, reliable, and environmentally friendly method is required to detect and survey the distribution of trace metals in environmental matrices, in order to diagnose suspected contaminated areas and to control the rehabilitation processes.
Reflectance spectroscopy is the study of the absorption and emission of light and other radiation by matter, as related to the dependence of these processes on the wavelength of the radiation. It is based on the distinct vibrations and electronic processes of chemical bonds in molecules. These vibrations can be observed in three regions, namely, far IR (25 × 10³ nm–1 × 10⁶ nm), mid-IR (25 × 10² nm–25 × 10³ nm), and near IR (8 × 10² nm–25 × 10² nm), with mid-IR and near IR being the most useful for qualitative and quantitative analysis. Mid-infrared spectroscopy (MIRS) is known to have greater predictive ability for soil geochemistry than visible-near-infrared (vis-NIR) spectroscopy [22, 23]. However, there is more interest in NIRS because it is generally cheaper. In the NIR region, many trace elements are spectrally featureless and only exhibit characteristic spectral features at high concentrations (>4000 mg kg−1) [23, 24]. Therefore, because of this high direct detection threshold, low levels of heavy metals can only be determined indirectly from spectra, through their association with Fe oxides, clays, and organic matter. For this reason, cost-effective and nondestructive analytical techniques based on near-infrared (NIR) spectroscopy coupled with chemometrics have been developed to overcome the problems encountered when using traditional methods [17, 25].
The aim of this chapter is to review the application of near-infrared spectroscopy (NIRS) combined with chemometrics for the estimation of the concentration and distribution of trace elements in environmental matrices. The disadvantages of NIRS without chemometrics for analysis of trace elements are discussed. In addition, this chapter aims at promoting the application of simpler and greener methods such as the combination of NIRS and multivariate tools for monitoring of trace metal contaminations in different matrices.
2. Application of near-infrared spectroscopy (NIRS) for analysis of trace metals
Near-infrared spectroscopy (NIRS) is a fast and nondestructive analytical technique that provides multi-constituent analysis of almost all types of sample matrices [15–17, 25]. The technique covers the wavelength range that lies between the mid-infrared and the visible region [26, 27]. NIRS is primarily based on absorbance characteristics caused by vibrations of covalent bonds between H, C, O, S, and N, which are the main components of organic matter. Pure metals do not absorb in the NIR region. However, their indirect detection is possible via their complexation with organic molecules containing C–H, N–H, and O–H bonds, which are detectable. This concept has been termed “aquaphotomics” and is based on the fact that the characteristic absorbance pattern of water (O–H overtones) can change as a consequence of the binding reaction with the metal [31, 32]. This effect is described and demonstrated in the study carried out by Putra et al.; although some complexes might be similar in different samples, slight differences in spectral features, such as shifts in peak wavelength, may still be seen depending on the nature of the cation. In addition, the electromagnetic spectrum in the near-infrared region contains useful information about the constituents of environmental samples such as soil, which can be used for prediction of metal concentrations [25, 28, 33]. For instance, the absorption features associated with electronic transitions of Fe3+ and Fe2+ ions in Fe-bearing minerals can be found in the near-infrared region at 780–1200 nm [33–35]. In addition, the absorption features of metals associated with molecular vibrations of organic and inorganic functional groups in organic matter, silicates, carbonates, and water can be found in the near-infrared region at 780–2500 nm [24, 33, 34, 36].
Wu et al. [24, 36] reported the feasibility of using NIRS for monitoring and predicting trace metals in suspended solids, sediment, and soil. This was achieved by quantitative evaluation of the spectral activity of sediment and soil properties [23–37]. However, owing to challenges such as collinearity, band overlaps, and interactions among some soil properties, the spectra of soil, sediment, or suspended solids are often broad and nonspecific. To overcome these challenges, chemometric tools have been applied to the quantitative analysis of the spectroscopic data. These tools include multiple linear regression (MLR), principal component regression (PCR) [40, 41], and partial least squares (PLS) regression. They have been used to characterize soil spectra and to build models for estimating trace metal concentrations in soil, sediments, and other matrices.
2.1. Applications of NIRS-multiple linear regression for determination of trace metals in environmental samples
Multiple linear regression (MLR) is a conventional chemometric method that relates the heavy metal concentration to a linear combination of several selected spectral bands or indices that correlate highly with it. The disadvantage of MLR is that it does not perform well with hyperspectral measurements, because NIR spectral data usually exhibit high collinearity [18, 39]. This challenge has been addressed by applying the enter and stepwise MLR approaches [18, 39]. In the enter-MLR approach, a variable selection procedure is adopted first, and the selected variables are then used to calibrate the MLR model. In the stepwise-MLR approach, on the other hand, a forward or backward method is applied to progressively select the independent variables according to a tolerance significance level, which is generally set to 0.05 [18, 39]. Because of the challenges associated with MLR, there are very few reports on its application together with NIRS for determination of trace elements in environmental samples.
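The forward variant of stepwise MLR described above can be sketched as follows. This is a minimal illustration on synthetic data, not the procedure used in the cited studies. The function names are hypothetical, and instead of computing p-values against the 0.05 level it uses a fixed F-to-enter threshold of 4.0, a common rough stand-in for that level when there are a few dozen samples or more.

```python
import numpy as np

def rss_of_fit(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_stepwise(X, y, f_to_enter=4.0):
    """Add one band at a time while its partial F-statistic exceeds f_to_enter.

    f_to_enter=4.0 is an assumption standing in for the 0.05 significance level.
    """
    n, p = X.shape
    selected = []
    current_rss = float(((y - y.mean()) ** 2).sum())
    while len(selected) < min(p, n - 2):
        best = None
        for j in range(p):
            if j in selected:
                continue
            new_rss = rss_of_fit(X[:, selected + [j]], y)
            dof = n - (len(selected) + 1) - 1      # residual degrees of freedom
            f_stat = (current_rss - new_rss) / (max(new_rss, 1e-12) / dof)
            if best is None or f_stat > best[1]:
                best = (j, f_stat, new_rss)
        if best[1] < f_to_enter:                   # no remaining band is significant
            break
        selected.append(best[0])
        current_rss = best[2]
    return selected

# Synthetic demo (not real spectra): only bands 2 and 5 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + 0.1 * rng.normal(size=60)
selected_bands = forward_stepwise(X, y)
print(selected_bands)
```

With real NIR spectra the candidate bands are highly collinear, so the selected set can be unstable from one calibration set to another, which is the weakness of MLR noted above.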
Kemper and Sommer explored the possibility of adapting chemometric approaches for the quantitative estimation of As, Cd, Cu, Fe, Hg, Pb, S, Sb, and Zn in polluted soils using stepwise multiple linear regression (MLR) analysis and an artificial neural network (ANN) approach. The authors reported that the models predicted six out of nine elements with high accuracy. In addition, most of the wavelengths important for prediction were attributed to absorption features of iron and iron oxides. Furthermore, their results demonstrated the feasibility of predicting heavy metals in contaminated soils using rapid and cost-effective NIRS. Other applications are reported by Malley et al. and Choe et al.
2.2. Applications of NIRS-PCR for determination of trace metals in environmental samples
Principal component regression (PCR) is a chemometric tool that combines principal component analysis and MLR [18, 42]. In this method, the independent variables are first decomposed into orthogonal principal components using the nonlinear iterative partial least squares (NIPALS) algorithm and full cross validation of the calibration set. The number of principal components to retain is then chosen as the one that minimizes the root-mean-square error of cross validation. In the final step, the chosen principal components are used to calibrate the MLR model. The advantage of PCR over ordinary MLR is that the principal components are uncorrelated and the noise is filtered. Very little information is available on the application of NIRS-PCR to the analysis of trace metals in different matrices, although some reports can be found in the literature [42, 43].
Wu et al. reported the practicality of using NIRS for the determination of Hg concentration in agricultural soil samples. The accuracy of the prediction models was optimized by applying several spectral pretreatments to the reflectance spectra. Univariate regression and principal component regression were used for the prediction of Hg concentration. According to their results, the optimal model was achieved using PCR combined with the Kubelka-Munk transformation. In addition, the correlation analysis revealed that Hg concentration correlated negatively with soil reflectance and positively with the absorption depths of goethite at 0.496 μm and clay minerals at 2.21 μm. The findings suggest that the adsorption of Hg by clay-size mineral accumulations in soils is the mechanism that makes Hg predictable from the spectra.
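The three PCR steps above (decomposition, choice of the number of components by cross validation, regression on the scores) can be sketched as below. This is an illustrative sketch under stated assumptions: the principal components are obtained with a singular value decomposition rather than NIPALS (both yield the same components), "full cross validation" is taken to mean leave-one-out, the function names are hypothetical, and the data are synthetic.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Fit PCR with k components; returns (x_mean, y_mean, beta)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                   # loadings, shape (n_wavelengths, k)
    T = Xc @ P                                     # orthogonal scores
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return x_mean, y_mean, P @ q                   # coefficients in wavelength space

def pcr_predict(model, X):
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean

def rmsecv(X, y, k, n_folds=None):
    """Root-mean-square error of cross validation (leave-one-out by default)."""
    idx = np.arange(len(y))
    n_folds = len(y) if n_folds is None else n_folds
    sq_err = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        model = pcr_fit(X[train], y[train], k)
        sq_err += float(((pcr_predict(model, X[test]) - y[test]) ** 2).sum())
    return float(np.sqrt(sq_err / len(y)))

# Synthetic "spectra" driven by three latent factors (an assumption for the demo).
rng = np.random.default_rng(1)
scores = rng.normal(size=(40, 3))
loadings = rng.normal(size=(3, 100))
X = scores @ loadings + 0.01 * rng.normal(size=(40, 100))
y = scores @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=40)

errors = {k: rmsecv(X, y, k) for k in range(1, 11)}
best_k = min(errors, key=errors.get)               # component count with minimum RMSECV
print(best_k, errors[best_k])
```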
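For reference, the Kubelka-Munk transformation mentioned above is commonly written as F(R) = (1 − R)² / (2R), where R is the diffuse reflectance. The short sketch below applies that standard form; whether the cited study used exactly this variant is an assumption, and the function name is hypothetical.

```python
import numpy as np

def kubelka_munk(reflectance):
    """Kubelka-Munk function F(R) = (1 - R)^2 / (2R) for reflectance R in (0, 1]."""
    R = np.asarray(reflectance, dtype=float)
    return (1.0 - R) ** 2 / (2.0 * R)

km_values = kubelka_munk([0.5, 1.0])
print(km_values)
```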
2.3. Applications of NIRS-partial least squares regression for determination of trace metals in environmental samples
Partial least squares regression (PLSR) is a chemometric method that is widely used to derive quantitative information from NIR spectra [18, 44]. PLSR allows a refined statistical approach that uses the full spectral region rather than unique and isolated analytical bands. The principle of PLSR is based on incorporating the dependent variables into the calculation of the principal components [45, 46]. For this reason, PLSR is able to handle data with strong collinearity and noise [18, 44]. In addition, PLSR can handle cases where the number of variables significantly exceeds the number of available samples. The applications of NIRS combined with PLSR for the determination of metals in different environmental matrices have been widely reported in the literature (see Table 1).
Table 1. Applications of NIRS combined with PLSR for the determination of metals in environmental matrices.
| Analytes | Matrix | Concentration range |
|---|---|---|
| Cr, Co, Ni, Cu, Zn, As, Se, Cd, and Tl | Soil | 0.32–110 μg g−1 |
| Zn, Pb, and Cd | Soil | 0.17–6530 mg kg−1 |
| Pb2+, Zn2+, Cu2+, Cd2+, and Cr3+ | Water | |
| Cd and Zn | Soil | 2.25–51.48 mg L−1 |
| K, Ca, Mg, Fe, and Zn | Manure compost | 0.676–80.97 mg kg−1 |
| Cd, Cu, Zn, Pb, Ni, Mn, and Fe | Freshwater sediments | 7.63–198.20 g kg−1 |
| Zn and Pb | Soil | 2–425 mg g−1 |
| Al, Ag, As, Ba, Be, Bi, Ca, Cd, Ce, Cs, Co, Cr, Cu, Fe, Ga, In, K, La, Li, Mg, Mn, Mo, Na, Nb, Ni, P, Pb, Rb, S, Sb, Sc, Sn, Sr, Te, Th, Ti, Tl, U, V, W, Y, and Zn | Soil | 1672–4601 mg kg−1 |
| Fe, Zn, Mn, and Cu | Agricultural soils | 50–100 mg kg−1 |
| Cu, Mn, Zn, and Fe | Water and HNO3 | – |
| | | 1–10,000 μg L−1 |
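The PLSR principle described in this section, in which the dependent variable takes part in the construction of the components, can be illustrated with a single-response (PLS1) NIPALS sketch. It is a minimal example on synthetic data with more wavelengths than samples. The function names and the choice of three components are assumptions made for illustration, not settings taken from the studies in Table 1.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS; returns (x_mean, y_mean, beta)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # weights: direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # regression vector in wavelength space
    return x_mean, y_mean, beta

def pls1_predict(model, X):
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean

# Synthetic demo: 200 "wavelengths", only 30 calibration samples.
rng = np.random.default_rng(2)
scores = rng.normal(size=(50, 3))
X = scores @ rng.normal(size=(3, 200)) + 0.01 * rng.normal(size=(50, 200))
y = scores @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=50)

model = pls1_fit(X[:30], y[:30], n_components=3)
y_hat = pls1_predict(model, X[30:])
r2_test = 1 - ((y[30:] - y_hat) ** 2).sum() / ((y[30:] - y[30:].mean()) ** 2).sum()
print(r2_test)
```

In practice the number of components would be chosen by cross validation, in the same way as for PCR.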
Moros et al. evaluated the potential of near-infrared (NIR) diffuse reflectance infrared Fourier transform spectroscopy (DRIFTS) combined with PLSR for nondestructive determination of trace elements in foods. The analysis was achieved without physical or chemical sample pretreatment. The authors compared two spectral pretreatments, namely multiplicative signal correction (MSC) and standard normal variate (SNV). Their results revealed that the PLS models built after SNV provided the best prediction results for the determination of arsenic and lead in powdered red paprika samples. The results showed that NIR diffuse reflectance spectroscopy combined with PLS could be used to estimate the concentration of As and Pb at the 100 μg kg−1 level with a standard error of prediction of 39 and 50 μg kg−1 for As and Pb, respectively. Furthermore, the estimated percentage errors were lower than 25% without the need for sophisticated and high-cost instrumentation (such as ICP-MS and GFAAS) or for tedious and expensive digestion procedures for sample preparation. Moreover, the suggested NIRS-PLSR methodology was found to be an important tool for laboratory screening of trace elements in foods.
In another study, Li et al. reported a method for simultaneous determination of mercury, lead, and cadmium ions in water samples using solid-phase extraction and near-infrared diffuse reflectance spectroscopy (NIRDRS). In order to analyze the trace metal content of water samples using NIRS, thiol-functionalized magnesium phyllosilicate (Mg-MTMS) was used as an adsorbent to extract the target analytes from aqueous solution. The adsorbed metals were measured using NIRDRS combined with PLS models. This combination resulted in fast and simultaneous quantitative prediction. The metal ions interacted with a functional group of the adsorbent, and their absorption bands were observed in the spectra, thus leading to efficient and precise prediction models. The concentration of the three metals that could be correctly determined was found to range between 4 and 6 mg L−1. This showed that the adsorbent used (thiol-functionalized magnesium phyllosilicate) had a high efficiency for the enrichment of Hg, Pb, and Cd in dilute solution. Furthermore, the results demonstrated the feasibility of NIRDRS-PLSR for quantitative analysis of metal ions in river water. Other applications of NIRS combined with PLSR are presented in Table 1.
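The two pretreatments compared above can be sketched in their textbook forms. SNV centers and scales each spectrum by its own mean and standard deviation, while MSC regresses each spectrum on a reference spectrum (here the mean spectrum) and removes the fitted offset and slope. The function names and the synthetic spectra are assumptions for illustration, and the cited study may have used different implementation details.

```python
import numpy as np

def snv(X):
    """Standard normal variate: row-wise centering and scaling of each spectrum."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def msc(X, reference=None):
    """Multiplicative signal correction against a reference (mean) spectrum."""
    ref = X.mean(axis=0) if reference is None else reference
    out = np.empty_like(X, dtype=float)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, 1)   # fit x ≈ intercept + slope * ref
        out[i] = (x - intercept) / slope
    return out

# Synthetic demo: one underlying spectrum with different offsets and gains,
# mimicking additive and multiplicative scatter effects.
base = np.sin(np.linspace(0, 3, 50)) + 2.0
offsets = np.array([[0.0], [0.3], [-0.2]])
gains = np.array([[1.0], [1.5], [0.8]])
X = offsets + gains * base

X_snv, X_msc = snv(X), msc(X)
print(np.ptp(X_msc, axis=0).max())   # spread between corrected spectra
```

In this idealized case both corrections collapse the three spectra onto a single curve; with real spectra the chemical differences remain after the scatter effects are reduced.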
2.4. Nonlinear calibration models for near-infrared spectroscopy
The abovementioned linear calibration models (especially PLS and PCR) have been used extensively owing to their ease of use, fast computation, good predictive performance, and easily interpretable representations. However, the linearity assumption is not always valid, and when the spectra exhibit nonlinearities, these models tend to give nonoptimal results. In such cases, it is therefore important to develop robust models based on nonlinear calibration [49, 50]. These models include kernel PLS (KPLS), support vector machines (SVM), least squares SVM (LS-SVM), and artificial neural networks (ANN), among others. Brief descriptions of these models are given in the subsequent paragraphs.
Kernel PLS is a nonlinear extension of linear PLS in which the input data are transformed into a high-dimensional feature space via nonlinear mapping. Briefly, KPLS consists of two steps. In the first step, the data are embedded from the input space into the feature space via the nonlinear mapping. In the second step, a linear algorithm is used to discover the linear relationship in that feature space. The ANN, on the other hand, is a flexible mathematical structure capable of identifying complex nonlinear interactions between input and output data sets. This method is reported to be useful and efficient, especially for problems whose characteristics are difficult to describe using physical equations. The ANN model has been reported to perform better than linear models.
Support vector machines are models whose training involves solving a quadratic programming problem, leading to global solutions that are often unique. The application of this type of model is further discussed and investigated by the authors in Refs. [56–59]. Finally, the LS-SVM model simplifies the computations of SVM by implementing a least squares version of SVM. Least squares SVM is capable of dealing with both linear and nonlinear multivariate calibration problems relatively fast. In LS-SVM, a linear estimation is done in a kernel-induced feature space; the use of LS-SVM with NIR has been investigated by Borin et al.
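To make the simplification concrete: in the usual LS-SVM regression formulation, the quadratic program of SVM is replaced by a single linear system in the bias b and the coefficients α, with the kernel matrix K regularized by I/γ. The sketch below follows that standard formulation with a Gaussian (RBF) kernel. The function names, the values of γ and σ, and the synthetic nonlinear data are assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                       # bias b, coefficients alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Linear estimate in the kernel-induced feature space."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Synthetic nonlinear calibration problem (not spectral data).
X_train = np.linspace(0, 2 * np.pi, 40).reshape(-1, 1)
y_train = np.sin(X_train).ravel()
b, alpha = lssvm_fit(X_train, y_train)

X_new = np.linspace(0.1, 6.1, 25).reshape(-1, 1)
y_new = lssvm_predict(X_train, b, alpha, X_new)
rmse = float(np.sqrt(np.mean((y_new - np.sin(X_new).ravel()) ** 2)))
print(rmse)
```

Because only one linear system is solved, training is fast, but unlike standard SVM every calibration sample receives a nonzero coefficient.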
Few studies have reported the application of nonlinear multivariate calibration in NIR spectroscopy for the analysis of trace metals. For instance, Shao and He investigated two sensitive wavelength (SW) selection methods combined with visible-near-infrared (vis/NIR) spectroscopy to determine the levels of some trace elements (Fe, Zn) in rice leaf. Calibration models were developed using SWs selected by latent variables analysis (LVA) and independent component analysis (ICA), together with nonlinear regression by a least squares support vector machine (LS-SVM). Among the nonlinear models, the six SWs selected by ICA provided the optimal ICA-LS-SVM model when compared with LV-LS-SVM. The coefficients of determination (R2), root-mean-square error of prediction (RMSEP), and bias obtained by ICA-LS-SVM were 0.6189, 20.6510, and −12.1549 ppm, respectively, for Fe, and 0.6731, 5.5919, and 1.5232 ppm, respectively, for Zn. The overall results indicated that vis-NIR spectroscopy combined with ICA-LS-SVM provided accurate determination of trace elements in rice leaf. Other methods are reported by Xu et al. and Barbosa et al., among others.
This chapter has shown that NIRS combined with multivariate tools has great potential to improve the understanding of trace metal concentrations in environmental matrices. It was also concluded that the use of chemometrics offers a rapid and cost-effective alternative for measuring multiple elements, particularly in soils and sediments. However, according to the literature, chemometric models such as MLR and PCR used with NIRS are less reliable than PLSR. Therefore, it is important for researchers to select the proper chemometric tool for their application. Because of the limitations encountered when using some of the chemometric tools, it is necessary to develop new chemometric methods or to modify the conventional ones so as to improve their reliability and accuracy. It is reported in the literature that a limitation of NIRS is that some trace metals and other mineral compounds such as phosphorus do not absorb radiation in the NIR region. However, in most cases, this problem is solved by using the absorption features of metals associated with molecular vibrations of organic and inorganic functional groups in organic matter, silicates, carbonates, and water in the near-infrared region (780–2500 nm). Alternatively, some researchers combine different detection techniques such as UV-visible with NIRS [17, 64].
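The three figures of merit quoted above can be computed as in the sketch below. The definitions are the common ones and are assumptions here: R2 as one minus the ratio of residual to total sum of squares (some studies report the squared correlation instead), RMSEP as the root-mean-square residual on the prediction set, and bias as the mean of predicted minus observed values. The function name and the numbers in the demo are hypothetical.

```python
import numpy as np

def prediction_metrics(observed, predicted):
    """Return R2, RMSEP, and bias for a prediction set."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    resid = pred - obs                               # sign convention: predicted - observed
    rmsep = float(np.sqrt(np.mean(resid ** 2)))
    bias = float(np.mean(resid))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((obs - obs.mean()) ** 2))
    return {"R2": r2, "RMSEP": rmsep, "bias": bias}

# Made-up values: every prediction is 0.5 units too high.
metrics = prediction_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(metrics)
```

A bias close in magnitude to the RMSEP, as in this toy case, indicates that most of the prediction error is systematic rather than random.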