Raman spectroscopy is a noninvasive optical technique that can be used as an aid in diagnosing certain diseases and as an alternative to more invasive diagnostic techniques such as biopsy. Because of these characteristics, Raman spectroscopy is also known as an optical biopsy technique. The success of Raman spectroscopy in biomedical applications rests on the fact that the molecular composition of healthy tissue differs from that of diseased tissue; in addition, several disease biomarkers can be identified in Raman spectra and used to diagnose or monitor the progress of certain medical conditions. This chapter presents an overview of the use of Raman spectroscopy for in vivo medical diagnostics and demonstrates the potential of this technique to address biomedical issues related to human health.
- Raman spectroscopy
Raman spectroscopy is based on the inelastic scattering of photons, also known as the Raman effect, discovered by C. V. Raman in 1928. When a sample is illuminated with a light source, the incoming photons are absorbed or scattered. If a photon is absorbed, its energy is transferred to the molecules, whereas if a photon is scattered and its energy is conserved, the process is called elastic scattering. However, a small portion of the scattered photons (1 in every 10 billion photons) can be scattered inelastically, which means a slight change in the photon energy. This small energy difference between the incident and the scattered photon is the Raman effect. Raman spectroscopy has several advantages for biomedical applications: it is nondestructive, spectra are relatively fast to acquire, and it provides information at the molecular level. Additionally, water produces weak Raman scattering, which means the presence of water in the sample does not interfere with the spectrum that is being analyzed. The main disadvantages of Raman spectroscopy are the extremely weak Raman signal and the presence of undesirable noise sources such as the intense fluorescence background present in biological samples.
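The energy difference described above is usually reported as a Raman shift in wavenumbers (cm−1), computed as the difference between the reciprocal excitation wavelength and the reciprocal scattered wavelength. The following minimal sketch shows the conversion; the function names and the 785 nm excitation wavelength are illustrative assumptions and are not taken from the chapter.

```python
# Conversion between wavelengths (nm) and Raman shift (cm^-1).
# 1 cm = 1e7 nm, so a wavelength in nm corresponds to 1e7 / wavelength cm^-1.

def raman_shift_cm1(excitation_nm: float, scattered_nm: float) -> float:
    """Raman shift (cm^-1) from excitation and scattered wavelengths (nm)."""
    return 1e7 / excitation_nm - 1e7 / scattered_nm


def scattered_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Wavelength (nm) of Stokes-scattered light for a given Raman shift."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)


# Assumed example: 785 nm laser and the 1450 cm^-1 band.
wavelength = scattered_wavelength_nm(785.0, 1450.0)
print(f"Scattered wavelength: {wavelength:.2f} nm")
```

For Stokes scattering the scattered photon has lower energy than the incident one, so the scattered wavelength is longer than the excitation wavelength.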
A Raman spectrometer useful for
3. Data preprocessing
A big issue in biological Raman spectroscopy is the presence of undesirable background elements related to different sources such as intrinsic fluorescence, noise introduced by the equipment used, and the noise generated by external sources.
3.1. Smoothing and denoising
The main sources of noise present in Raman spectra from biological samples are shot noise, fluorescence background, flicker noise, dark current, and thermal noise. One way to reduce thermal noise and dark signal is to use a Raman system with high-quality, thermoelectrically cooled spectrometers. In most Raman spectra, shot noise, which arises from the particle nature of light, is the predominant noise source. The approximate shot noise associated with a measurement of n counts is √n. The signal-to-noise ratio (S/N) therefore scales as n/√n = √n and can be improved by increasing the number of counts n. In other words, S/N can be improved by increasing the averaging time, because the signal increases proportionally with time. There are several noise removal techniques that can be applied to Raman spectra. Smoothing is often employed to remove high-frequency components from Raman spectra, based on the fact that noise appears as high-frequency fluctuations, whereas signals are assumed to be low frequency. One smoothing technique is Fourier filtering. In this technique, the higher frequency fluctuations, which are considered to be only noise, are removed and the lower frequency ones are used to reconstruct the Raman spectra without noise. One drawback of this method is that removing the higher frequency noise may introduce artifacts and distortion in the Raman spectra. A commonly used smoothing technique is Savitzky-Golay (SG) filtering. The SG filter is a local polynomial fitting procedure based on a moving window. As the moving window size increases, some of the Raman bands may disappear. Therefore, it is very important to choose appropriate parameters, such as the polynomial order and the moving window size, to avoid loss of Raman data.
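The moving-window polynomial fit behind the SG filter can be sketched in a few lines. The example below is a minimal illustration, assuming NumPy is available; the function names, the synthetic band, and the parameter values (window of 11 points, polynomial order 3) are assumptions chosen for the example and are not taken from the chapter. SciPy's scipy.signal.savgol_filter offers a library implementation of the same idea.

```python
import numpy as np


def savgol_kernel(window: int, polyorder: int) -> np.ndarray:
    """Weights that give the fitted polynomial value at the window center."""
    if window % 2 == 0 or polyorder >= window:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps the window samples to the constant
    # term of the least-squares polynomial, i.e. its value at offset 0.
    return np.linalg.pinv(design)[0]


def savgol_smooth(y, window: int = 11, polyorder: int = 3) -> np.ndarray:
    """Smooth a spectrum with a Savitzky-Golay filter (edge values repeated)."""
    y = np.asarray(y, dtype=float)
    kernel = savgol_kernel(window, polyorder)
    padded = np.pad(y, window // 2, mode="edge")
    # Reversing the kernel turns np.convolve into a sliding weighted sum.
    return np.convolve(padded, kernel[::-1], mode="valid")


# Synthetic example: one band on a flat offset, with Poisson (shot) noise.
rng = np.random.default_rng(0)
x = np.arange(200)
truth = 50.0 + 400.0 * np.exp(-0.5 * ((x - 100) / 8.0) ** 2)
noisy = rng.poisson(truth).astype(float)
smoothed = savgol_smooth(noisy, window=11, polyorder=3)
```

Increasing the window relative to the band width strengthens the smoothing but flattens narrow bands, which is the trade-off described in the text.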
Other smoothing methods are locally weighted scatter plot smoothing (LOWESS) and wavelet filtering, whereby the spectrum is decomposed using the discrete wavelet transform in order to isolate the noise by localizing it in space and frequency. Once the noise is isolated, it can be set to zero and the inverse wavelet transform is used to reconstruct the data. In all the methods mentioned, parameters have to be chosen carefully to avoid eliminating important Raman bands during smoothing.
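The wavelet filtering steps described above (decompose, set the noise coefficients to zero, reconstruct) can be sketched as follows. This is a minimal illustration, assuming NumPy is available and using the Haar wavelet only because it is the shortest to write; other wavelet families may be preferable in practice. The threshold rule (noise level estimated from the median absolute value of the finest detail coefficients, multiplied by the square root of 2 ln n) is a common choice and is an assumption here, not a procedure taken from the chapter.

```python
import numpy as np


def haar_forward(signal, levels: int):
    """Multi-level Haar transform: (approximation, details coarse to fine)."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details[::-1]


def haar_inverse(approx, details):
    """Invert haar_forward; details must be ordered coarse to fine."""
    for detail in details:
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx


def wavelet_denoise(signal, levels: int = 3) -> np.ndarray:
    """Zero the small detail coefficients and reconstruct the signal."""
    n = len(signal)
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = haar_forward(signal, levels)
    # Assumed noise estimate: median absolute finest-scale coefficient.
    sigma = np.median(np.abs(details[-1])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(n))
    kept = [np.where(np.abs(d) > threshold, d, 0.0) for d in details]
    return haar_inverse(approx, kept)


# Synthetic example: one broad band with additive Gaussian noise.
rng = np.random.default_rng(0)
axis = np.arange(256)
clean = 100.0 * np.exp(-0.5 * ((axis - 128) / 12.0) ** 2)
noisy = clean + rng.normal(0.0, 5.0, axis.size)
denoised = wavelet_denoise(noisy, levels=3)
```

The number of decomposition levels and the threshold play the role of the parameters that, as noted above, must be chosen carefully so that real Raman bands are not removed.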
3.2. Background removal
As mentioned in the last section, one noise source in biological Raman spectra is the fluorescence background. This intrinsic fluorescence emission is several orders of magnitude greater than the Raman scattering intensity of biological tissues; therefore, fluorescence appears as a strong band that obscures Raman signals and must be removed before the Raman spectra can be analyzed. Background elimination has been performed using two approaches: experimental and computational. The experimental methods rely on changes in the instrumentation and include shifted excitation, photobleaching, and time gating. Drawbacks of these methods are the relatively complex instrumentation, the long acquisition times, and alterations in the sample that could make the analysis of biological samples difficult. On the other hand, computational background removal has the advantages of being easy to implement, inexpensive, and fast. Such methods include polynomial fitting [10, 11, 12], Fourier transform, wavelet transform, first- and second-order differentiation, multiplicative signal correction, linear programming, geometric approach, asymmetric least squares, methods based on iterative reweighted quantile regression, iterative exponential smoothing, and morphology operators [21, 22]. However, the most widely used method is polynomial fitting because of its simplicity. In this method, a polynomial is fitted to the Raman spectrum and subsequently subtracted from it to eliminate background effects. The selection of the polynomial order is extremely important, because a high-order polynomial may fit Raman bands as if they were background and may be affected by high-frequency noise. To solve this issue, some modified polynomial fitting methods were proposed. Figure 2 shows the Raman spectra of
For example, the algorithm proposed by Zhao et al., also known as the Vancouver Raman algorithm (VRA), is widely used for baseline correction in biomedical applications because of its effectiveness and simplicity. The main advantage of this method is that it accounts for noise effects and the Raman signal contribution.
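The idea behind modified polynomial fitting can be illustrated with a simplified iterative scheme: fit a polynomial, clip every point that lies above the fit down to the fit, and repeat, so that Raman bands progressively stop pulling the baseline upward. The sketch below assumes NumPy is available and is not the Vancouver Raman algorithm, which additionally accounts for noise; the function name, polynomial order, tolerance, and synthetic spectrum are assumptions made for the example.

```python
import numpy as np


def iterative_polyfit_baseline(x, y, order=5, max_iter=100, tol=1e-3):
    """Estimate a smooth background by repeated polynomial fit and clipping."""
    x = np.asarray(x, dtype=float)
    work = np.asarray(y, dtype=float).copy()
    # Rescale the axis to [-1, 1] to keep the polynomial fit well conditioned.
    xs = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    for _ in range(max_iter):
        fit = np.polyval(np.polyfit(xs, work, order), xs)
        clipped = np.minimum(work, fit)  # points above the fit are lowered
        change = np.linalg.norm(clipped - work) / np.linalg.norm(work)
        work = clipped
        if change < tol:
            break
    return fit


# Synthetic example: quadratic background plus three Raman-like bands.
shift = np.linspace(400.0, 1800.0, 700)
u = (shift - 1100.0) / 700.0
true_baseline = 200.0 + 80.0 * u + 60.0 * u ** 2
bands = sum(h * np.exp(-0.5 * ((shift - c) / 10.0) ** 2)
            for c, h in [(800.0, 50.0), (1100.0, 80.0), (1450.0, 60.0)])
spectrum = true_baseline + bands

baseline = iterative_polyfit_baseline(shift, spectrum, order=2)
corrected = spectrum - baseline
```

In this noise-free example the polynomial order matches the simulated background; with real spectra the order remains a parameter the user has to choose, as discussed above.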
3.3. Normalization
Raman spectra from the same sample can have different intensity levels if they were acquired at different times or under different experimental parameters, such as different laser power levels. Normalization deals with these differences by making the intensity of a specific Raman band of the same material as similar as possible in all the spectra recorded under the same experimental parameters. One approach is normalization to area. In this method, the intensity at each frequency in the spectrum is divided by the square root of the sum of the squares of all intensities (strictly, this divisor gives the spectrum unit Euclidean length, which is often called vector normalization; dividing by the sum of all intensities instead gives unit area). This normalization is useful when the spectra do not share a common band and it is better to normalize the spectra so that the total area under the spectrum is 1.0. This method has the advantage that it does not depend on any single band, but one disadvantage is that the background can contribute to the normalization. Another approach is peak normalization, which uses the intensity at the central frequency of a particular Raman band as a reference (internal or external). The 1660 cm−1 band (amide I) and the 1450 cm−1 band (C─H vibrations) are commonly used as references because their intensities are not significantly affected by other changes in the sample. This method assumes that the reference does not change from one spectrum to another and is therefore not suitable when the nature of the samples could lead to a shift in the band position.
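The normalization options discussed above differ only in the divisor. The following minimal sketch, assuming NumPy is available, shows normalization to unit sum (unit area for evenly spaced points, up to the point spacing), to unit Euclidean length, and to the maximum of a reference band. The function names, the 10 cm−1 half-width of the band window, and the synthetic spectrum are assumptions made for the example.

```python
import numpy as np


def normalize_area(spectrum) -> np.ndarray:
    """Divide by the sum of intensities so that the spectrum sums to 1."""
    s = np.asarray(spectrum, dtype=float)
    return s / s.sum()


def normalize_vector(spectrum) -> np.ndarray:
    """Divide by the square root of the sum of squared intensities."""
    s = np.asarray(spectrum, dtype=float)
    return s / np.sqrt(np.sum(s ** 2))


def normalize_peak(shift, spectrum, band=1450.0, half_width=10.0) -> np.ndarray:
    """Divide by the maximum intensity found within a reference band."""
    shift = np.asarray(shift, dtype=float)
    s = np.asarray(spectrum, dtype=float)
    in_band = np.abs(shift - band) <= half_width
    return s / s[in_band].max()


# Synthetic example with bands near 1450 and 1660 cm^-1.
shift = np.linspace(400.0, 1800.0, 701)
spectrum = (30.0 * np.exp(-0.5 * ((shift - 1450.0) / 12.0) ** 2)
            + 50.0 * np.exp(-0.5 * ((shift - 1660.0) / 15.0) ** 2)
            + 5.0)

by_area = normalize_area(spectrum)
by_vector = normalize_vector(spectrum)
by_peak = normalize_peak(shift, spectrum, band=1450.0)
```

All three results are unchanged if the raw spectrum is multiplied by a constant, which is what removes the effect of, for example, a change in laser power.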
Chemometrics uses mathematical and statistical methods to extract chemical and physical information from chemical data or, for the subject under consideration, from spectroscopic data. One way to identify components in a sample is to use individual bands, but this approach is not the best option because a single band is not specific to one molecule: many molecules have a band at the same position. A more precise identification uses multiple bands or the complete spectrum. Such an approach considers each point in a spectrum as a variable, and the spectroscopic data can be displayed as a matrix in which the columns represent the variables (Raman shift or wavenumber) and the rows represent the observations (Raman spectra). To analyze data with more than one variable, multivariate data analysis is used. There are many multivariate data analysis techniques available, and their correct use depends on the objective of the analysis. The objective can be data description or exploratory analysis, discrimination, classification, clustering, regression, or prediction. The data analysis methods can also be divided into unsupervised and supervised methods. Unsupervised methods are used when no a priori knowledge is available; they are very useful for finding hidden structures in unlabeled data and are sometimes used as a first step before supervised methods. Hierarchical cluster analysis (HCA) and principal component analysis (PCA) are examples of unsupervised methods. Supervised methods, on the other hand, need a priori information such as class labels, and the analysis involves the use of a training data set to find the patterns in the data and the later validation of the model using a test set. One example of a supervised method is partial least squares (PLS).
4.1. Principal component analysis (PCA)
Principal component analysis (PCA) is an unsupervised method often used to reduce the number of variables and for exploratory analysis of data. PCA is based on the decomposition of the covariance matrix of the spectral data matrix into eigenvectors and eigenvalues. The eigenvectors (or principal components) define orthogonal axes in the N-dimensional variable space and are ordered by decreasing value of the associated eigenvalue. This means the principal components are uncorrelated with each other, as opposed to the original variables, which may be correlated. Their decreasing order also means that the first principal component explains the maximum amount of variance of the original data, the second one explains more variance than the third, and so on. The original data can be considered as an M×N matrix of M spectra sampled at N wavenumbers. Applying PCA to this matrix yields three results: N principal components, an N×N matrix containing the coefficients for the transformation between the original data and the principal components, and N eigenvalues describing the importance of the corresponding principal components. The original M experimental spectra are thus expressed in terms of a new set of N ‘synthetic’ spectra called principal components. In summary, one advantage of PCA is that by evaluating the relative importance of the consecutive principal components, it is possible to reduce the dimension of the original dataset by finding a smaller collection of variables that explain the highest amount of variance. Additionally, because changes in the Raman signal are uncorrelated with the noise in the spectra, the random noise and the significant spectral changes are separated into different principal components. Therefore, many principal components can be discarded, removing noise without losing useful information from the Raman signal.
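The eigendecomposition described above can be written compactly. The sketch below, assuming NumPy is available, computes the principal components of an M×N matrix of spectra from the covariance matrix and returns the scores, the loadings, and the fraction of variance explained. The function name and the synthetic two-component mixtures are assumptions made for the example.

```python
import numpy as np


def pca(spectra, n_components: int):
    """PCA of an (M spectra x N wavenumbers) matrix via the covariance matrix."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    covariance = np.cov(centered, rowvar=False)        # N x N
    eigvals, eigvecs = np.linalg.eigh(covariance)       # ascending order
    order = np.argsort(eigvals)[::-1]                    # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]                 # principal components
    scores = centered @ loadings                         # coordinates of spectra
    explained = eigvals[:n_components] / eigvals.sum()   # variance fractions
    return scores, loadings, explained, mean


# Synthetic example: 40 mixtures of two bands plus a little random noise.
rng = np.random.default_rng(1)
axis = np.arange(100)
pure = np.vstack([np.exp(-0.5 * ((axis - 30) / 5.0) ** 2),
                  np.exp(-0.5 * ((axis - 70) / 5.0) ** 2)])
concentrations = rng.uniform(0.0, 1.0, size=(40, 2))
spectra = concentrations @ pure + 0.01 * rng.normal(size=(40, 100))

scores, loadings, explained, mean = pca(spectra, n_components=2)
reconstructed = scores @ loadings.T + mean   # noise-reduced spectra
```

Because only two components generate the simulated spectra, keeping two principal components retains almost all the variance, and the reconstruction from those two components discards most of the random noise.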
4.2. Partial least squares (PLS)
PLS is one of the multivariate data analysis techniques most widely used with vibrational spectroscopy to estimate and quantify components in a sample. Because PLS is a supervised method, the concentrations of the constituents in the calibration samples must be known. As with PCA, the noise observed in the spectra is isolated into separate latent variables (LVs), which are left out of the calibration, improving prediction precision. Nonlinear relationships between the properties of interest and intensity can be accommodated in a PLS model by including multiple LVs.
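A PLS calibration for a single property can be sketched with the classical NIPALS procedure, in which each latent variable is the direction in the spectra that has maximum covariance with the remaining part of the concentration. The example below assumes NumPy is available and is a minimal illustration under simplifying assumptions (one response variable, noise-free synthetic mixtures, hypothetical function names), not the implementation used in any of the cited studies. scikit-learn's PLSRegression provides a library alternative.

```python
import numpy as np


def pls1_fit(X, y, n_components: int):
    """Fit a single-response PLS model (NIPALS); returns (coef, x_mean, y_mean)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W = np.zeros((X.shape[1], n_components))   # weight vectors
    P = np.zeros_like(W)                       # spectral loadings
    q = np.zeros(n_components)                 # response loadings
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                             # scores of this latent variable
        tt = t @ t
        P[:, a] = Xr.T @ t / tt
        q[a] = yr @ t / tt
        W[:, a] = w
        Xr = Xr - np.outer(t, P[:, a])         # deflate spectra
        yr = yr - q[a] * t                     # deflate response
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def pls1_predict(X, coef, x_mean, y_mean):
    """Predict the response for new spectra."""
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


# Synthetic calibration: mixtures of two overlapping bands, known concentrations.
rng = np.random.default_rng(2)
axis = np.arange(100)
pure = np.vstack([np.exp(-0.5 * ((axis - 40) / 10.0) ** 2),
                  np.exp(-0.5 * ((axis - 55) / 10.0) ** 2)])
c_train = rng.uniform(0.0, 1.0, size=(30, 2))
c_test = rng.uniform(0.0, 1.0, size=(10, 2))
X_train, X_test = c_train @ pure, c_test @ pure

coef, x_mean, y_mean = pls1_fit(X_train, c_train[:, 0], n_components=2)
predicted = pls1_predict(X_test, coef, x_mean, y_mean)
```

With real spectra the number of latent variables is usually selected by validation on spectra not used for calibration, so that latent variables dominated by noise are left out.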
4.3. Classification and clustering models
Several data analysis methods focus on looking for differences between spectra so that groups of spectra can be identified and classified. The most common methods used in biomedical Raman spectroscopy are k-nearest neighbors (KNN), hierarchical cluster analysis (HCA), artificial neural networks (ANN), discriminant analysis (DA), and support vector machines (SVM). The KNN method compares all spectra in the dataset using a similarity metric between spectra, such as the Euclidean distance. This method has been used in combination with PCA and Raman spectroscopy for the diagnosis of colon cancer. HCA uses a variety of multivariate distance calculations, such as Euclidean and Mahalanobis metrics, to identify similar spectra and is one of the methods commonly used in Raman and IR imaging. Similarly, artificial neural networks can be used to identify clusters or to find patterns in complex data. ANNs are computational models inspired by the functionality and structure of the central nervous system; the networks consist of an interconnected group of nodes, or neurons, which have different functions such as data input, output, storage, or forwarding. The layout of an ANN is defined by the number of layers and the number of neurons per layer. The use of ANNs in the data analysis of blood serum Raman spectra allows for the differentiation between patients with Alzheimer’s disease, patients with other types of dementia, and healthy individuals. DA is a supervised data analysis technique, which requires a priori knowledge of the group membership of each sample. DA computes a set of discriminant functions based on linear combinations of variables that maximize the variance between groups and minimize the variance within groups according to Fisher’s criterion. Sometimes it is very useful to combine PCA with linear discriminant analysis (LDA), an approach called the PC-LDA model, which improves the efficiency of classification because it automatically finds the most diagnostically significant features [29, 30, 31].
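Fisher's criterion for two groups has a closed-form solution: the discriminant direction is the inverse of the within-group scatter matrix applied to the difference between the group means. The sketch below, assuming NumPy is available, applies it to low-dimensional features such as principal component scores, which is the situation in a PC-LDA model. The function names, the midpoint decision threshold (which assumes equal group sizes and priors), and the simulated scores are assumptions made for the example.

```python
import numpy as np


def fisher_lda_fit(features, labels):
    """Two-class Fisher discriminant; returns (direction, threshold)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    group0, group1 = features[labels == 0], features[labels == 1]
    mean0, mean1 = group0.mean(axis=0), group1.mean(axis=0)
    # Within-group scatter: sum of the scatter matrices of both groups.
    within = ((group0 - mean0).T @ (group0 - mean0)
              + (group1 - mean1).T @ (group1 - mean1))
    direction = np.linalg.solve(within, mean1 - mean0)
    # Assumed decision rule: midpoint between the projected group means.
    threshold = 0.5 * (direction @ mean0 + direction @ mean1)
    return direction, threshold


def fisher_lda_predict(features, direction, threshold):
    """Assign label 1 to samples projected above the threshold, else 0."""
    return (np.asarray(features, dtype=float) @ direction > threshold).astype(int)


# Simulated two-dimensional scores for two groups (hypothetical data).
rng = np.random.default_rng(3)
scores = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
                    rng.normal([3.0, 3.0], 1.0, size=(100, 2))])
labels = np.repeat([0, 1], 100)

direction, threshold = fisher_lda_fit(scores, labels)
accuracy = np.mean(fisher_lda_predict(scores, direction, threshold) == labels)
```

Applying the discriminant to a few principal component scores rather than to the full spectra keeps the within-group scatter matrix small and invertible, which is one practical reason for combining PCA and LDA.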
SVMs are kernel-based algorithms that transform data into a high-dimensional space and construct a hyperplane that maximizes the distance to the nearest data point of any of the input classes. Raman spectroscopy and SVM have been used together as a method for cancer screening.
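As an illustration of the distance-based classification described in this section, the following minimal sketch implements k-nearest neighbors with the Euclidean distance, assuming NumPy is available. The function name, the choice of k = 3, and the toy two-dimensional data (which could stand for principal component scores) are assumptions made for the example.

```python
import numpy as np


def knn_predict(train_features, train_labels, queries, k: int = 3) -> np.ndarray:
    """Classify each query by majority vote among its k nearest neighbors."""
    train_features = np.asarray(train_features, dtype=float)
    train_labels = np.asarray(train_labels, dtype=int)
    predictions = []
    for query in np.asarray(queries, dtype=float):
        distances = np.linalg.norm(train_features - query, axis=1)
        nearest = np.argsort(distances)[:k]
        votes = np.bincount(train_labels[nearest])
        predictions.append(int(np.argmax(votes)))
    return np.array(predictions)


# Toy data: two well-separated groups labeled 0 and 1.
train_features = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_labels = [0, 0, 0, 1, 1, 1]
queries = [[0.2, 0.2], [5.5, 5.5]]

predicted = knn_predict(train_features, train_labels, queries, k=3)
```

The first query lies among the points labeled 0 and the second among those labeled 1, so the two predictions are 0 and 1, respectively.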
The importance of the
5.1. Cancer diagnosis
One of the most common clinical targets under investigation with Raman spectroscopy is cancer due to the possibility to measure biological samples minimally invasive,
5.1.1. Lung cancer
Short et al. designed a Raman probe for
5.1.2. Gastrointestinal cancer
In 2014, Bergholt et al.  performed an
5.1.3. Oral cancer
In a study conducted by Guze et al., Raman spectra of oral diseases from 18 patients were classified into a benign or malignant category using PCA-LDA, and the method provided 100% specificity with 77% sensitivity. Murali Krishna et al. reported the potential of Raman spectroscopy to identify early changes in oral mucosa and the efficacy of this approach in oral cancer applications. Comparing noncancer locations in smoking and nonsmoking populations demonstrated prediction accuracies from 75 to 98%. Another group reported the discrimination of normal oral tissue from different lesion categories with accuracies ranging from 82 to 89%. Recently, Lin et al. reported the utility of fiber-optic–based Raman spectroscopy for real-time
5.1.4. Skin cancer
A clinical study of 453 patients to investigate different types of skin cancer was published in 2012 by Lui et al. The instrument used by the authors allowed an acquisition time of approximately 1 s, and the software preprocessed the spectra immediately, which allowed skin lesions to be investigated in real time. Benign and malignant skin lesions including melanomas, basal cell carcinomas, squamous cell carcinomas, actinic keratoses, atypical nevi, melanocytic nevi, blue nevi, and seborrheic keratoses were investigated and discriminated by multivariate analysis tools with sensitivities between 95 and 99%. Lim et al. determined the diagnostic capability of a multimodal spectral diagnosis for
5.2. Skin diseases
5.2.1. Atopic dermatitis
Several published works have used Raman spectroscopy to analyze the molecular composition of skin and correlate it with a history of atopic dermatitis (AD) and filaggrin gene (FLG) mutations. Kezic et al. measured natural moisturizing factors (NMFs) noninvasively on the skin of 137 Irish children with a history of moderate to severe AD. González et al. detected the presence of the protein filaggrin in the skin of newborns using Raman spectroscopy and PCA as an early detection procedure for filaggrin-related AD. In order to detect the presence of filaggrin in the Raman spectra, the coefficients of the principal components for each of the skin spectra from newborns were calculated. The first and second principal components accounted for 93.86% of all the explained variance of the original data. Figure 3 shows a graph of these two principal components, also known as a scores plot. In the figure, the gray solid circles correspond to those infants who developed AD; the rest of the subjects are grouped together around the location of the filaggrin spectrum, represented as a black solid circle. The geometrical distance of each Raman spectrum to the spectrum of filaggrin in the principal component plane indicates the amount of filaggrin in the subjects. Shorter distances indicate a higher amount of filaggrin, and longer distances indicate a lower amount of filaggrin or a filaggrin with a molecular structure different from that of the molecule taken as the reference spectrum.
This result indicates that this approach can be used to identify the persons who are more susceptible to developing AD, making it possible to use this technique as a method for early detection of AD. González et al. validated the use of Raman spectroscopy as a noninvasive tool to detect filaggrin gene mutations. In this study, the amount of filaggrin was estimated by computing the correlation between the pure filaggrin Raman spectrum and the skin spectra obtained from Mexican patients with AD; the genetic analysis showed that 8 out of the 19 patients (42%) presented an FLG mutation. These 8 patients presented the 2282del4 FLG mutation, 2 of which (10.5%) were homozygous and 6 (31.5%) heterozygous, whereas 1 (5.2%) resulted in a compound heterozygote for the 2282del4 and the R501X mutations. These genetic results were compared to the estimated filaggrin amount; a lower correlation value of the spectra with the filaggrin spectrum indicates a lower filaggrin concentration. Figure 4 shows the results of the correlation for the patients with an FLG mutation (FLG –) and without an FLG mutation (FLG +). The patients with an FLG mutation presented an average correlation of 0.286, while the patients without an FLG mutation showed an average correlation of 0.4. Their results show that the correlation of the filaggrin Raman spectrum with the Raman spectra of skin can be an indicator of filaggrin gene mutations. In another work, Baclig et al. used a genetic algorithm to demonstrate that strongly reduced Raman spectral information is sufficient for clinical diagnosis of atopic dermatitis.
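The correlation between a reference spectrum and a measured spectrum can be computed as sketched below. The chapter does not specify which correlation measure was used, so the Pearson coefficient here is an assumption, as are the function name and the synthetic spectra, which are not real filaggrin or skin data.

```python
import math


def pearson_correlation(a, b) -> float:
    """Pearson correlation coefficient between two equally sampled spectra."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of points")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return numerator / denominator


def gaussian_band(center: float, width: float, n_points: int = 200):
    """Synthetic band used only to build the illustrative spectra."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n_points)]


# Hypothetical reference spectrum and two mixed spectra with different
# fractions of the reference component.
reference = gaussian_band(60.0, 6.0)
other = gaussian_band(140.0, 6.0)
high_content = [0.6 * r + 0.4 * o for r, o in zip(reference, other)]
low_content = [0.2 * r + 0.8 * o for r, o in zip(reference, other)]

corr_high = pearson_correlation(reference, high_content)
corr_low = pearson_correlation(reference, low_content)
```

In this synthetic case the spectrum containing the larger fraction of the reference component gives the higher correlation, which mirrors the interpretation given above that a lower correlation indicates a lower filaggrin concentration.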
5.2.2. Skin aging
Tfayli et al. reported slight variability in skin lipids upon aging. The Raman spectral features indicated a shift in the lateral packing of the skin lipids with increasing age of the volunteers. González et al. differentiated between chronological aging and photoinduced skin damage by PCA of
5.2.3. Nickel allergy
Alda et al. detected biochemical differences in the structure of the skin of subjects with nickel allergy compared with healthy subjects. The Raman spectral differences between groups were classified using PCA.
5.2.4. Melasma
Moncada et al. used Raman spectroscopy in melasma patients treated with a triple combination cream (tretinoin, fluocinolone, and hydroquinone) and found that the Raman skin spectra of the melasma patients showed differences in the peaks associated with melanin at 1352 and 1580 cm−1 (Figure 5). The Raman skin spectrum of patients who did not respond to treatment (Figure 1B) showed peaks that are not well defined, which is consistent with molecule degradation and protein breakdown. These results are consistent with the results reported previously by González et al.
5.2.5. Other in vivo applications: UV/Vis Raman, Raman imaging, and SERS
In most of the
Other alternatives for
Among the disadvantages of Raman spectroscopy for biomedical applications is the weakness of the Raman effect, which is often accompanied by a stronger background signal, particularly in biological samples. Experimental background removal requires changes in instrumentation, which means complex and costly systems. One alternative is algorithm-based fluorescence background removal. However, these methods cannot deal with all types of fluorescence without user intervention to adjust the algorithm parameters. Additionally, the complexity of the fitting algorithms makes them difficult for nonexperts to use. Another limitation is that not all molecules are Raman active, which means that some molecules do not give a Raman signal. The potential of damaging the sample due to the laser exposure, which depends on the excitation wavelength, has to be taken for
7. Conclusions and outlook
From the applications described in this chapter, it is clear that Raman spectroscopy has a great potential for
Raman spectroscopy is likely to become a key player for
An area of development that would accelerate the use of Raman spectroscopy in a clinical environment is the design of low-cost and portable Raman spectrometers, which would make the technique more appealing to the medical community. Research in this area could also lead to an integrated-optics Raman spectrometer, which would make the technique usable in wearable health devices and for monitoring health parameters in a clinical environment.
It is the authors’ belief that the combination of optimized instrumentation, standardized measurement procedures, preprocessing, and data analysis will allow Raman spectroscopy to become a powerful tool for disease diagnostics and a common clinical tool in a hospital environment.
This work was supported by the Consejo Nacional de Ciencia y Tecnología (CONACyT), Mexico, through Grants CEMIE-Sol 32 and by the National Laboratory program from CONACyT, through the Terahertz Science and Technology National Lab (LANCYTT).