## 1. Introduction

### 1.1. Development of terahertz spectroscopy in biomolecular detection and medical diagnosis

Terahertz (THz) radiation (1 THz = 10^{12} Hz) is the electromagnetic radiation lying between the infrared and microwave regions of the spectrum, conventionally defined as the frequency range from 0.1 to 10 THz. In this region, molecules absorb through low-frequency vibrational modes, such as torsional and collective vibrational modes and hydrogen-bond modes, and through rotational modes [1, 2]. THz waves are low in energy and non-ionizing, which makes them suitable for analyzing living tissues without harm [3, 4]. THz spectroscopic methods have been used in the biological sciences to investigate DNA [5], proteins [6, 7], and tissues [8].

The characterization and quantification of DNA are often regarded as complex and laborious in bioscience. A number of techniques are available for characterizing DNA, such as spectrophotometry [9], UV-induced fluorescence [10, 11], and chip-based nucleic acid analysis systems [12]. Several alternative methods, including fluorescent chromophore labeling and techniques that use terahertz radiation, have been proposed and are currently in use [13]. Terahertz spectroscopy is a promising candidate because it can characterize DNA samples quickly and sensitively without pretreatment. Nagel et al. reported a promising approach for the label-free analysis of DNA molecules that uses terahertz spectroscopy to probe the binding state of DNA directly [14]. Compared with the free-space detection scheme used previously, this method greatly improves sensitivity and enables analysis down to femtomole levels. Polley et al. reported a terahertz dielectric relaxation study of the extended hydration sheath in dilute aqueous solutions of salmon sperm (SS) and calf thymus (CT) DNA, which are widely used as model samples [15]. They fitted the frequency-dependent complex dielectric response of SS DNA and CT DNA with a Debye relaxation model that assumes three relaxation modes. The observed relaxation time constants are close to those of bulk water and do not follow any particular trend, which indicates that the extended hydrogen-bonded network is only marginally modified. Although a variety of methods have been established for characterizing DNA, they have disadvantages such as alteration of the nucleic acid sequence, the requirement of a thick DNA testing layer, and the complexity of the conductor structure. In this respect, THz spectroscopy has advantages for DNA detection.

THz frequencies also correspond to global correlated protein motions and to interactions between protein molecules, which have been proposed to be essential to functional conformational changes. Niessen et al. used THz microscopy to test the sensitivity to inhibitor binding and the reproducibility of the narrow-band resonances of lysozyme protein crystals. To support the analysis of the THz spectra, they applied a rapid data acquisition technique. The THz spectra changed dramatically, and reproducibly, with inhibitor binding [16]. Chen et al. proposed an approach for the automatic identification of biomolecule THz spectra based on widely used chemometric methods such as principal component analysis (PCA) and fuzzy pattern recognition [17]. They investigated THz transmittance spectra of saccharides and some typical amino acids, and their results demonstrate that THz spectroscopy can identify biomolecules efficiently.

In medical testing and diagnosis, THz spectroscopy and THz imaging have been applied to complex analytical problems [18]. THz imaging has been used to detect micrometastatic foci of early-stage cervical cancer in the lymph nodes [19]. Because tissues are heterogeneous, the spectroscopic response of in vivo tissue depends strongly on the constituent materials and their physical arrangement. Measurements on in vivo tissues will therefore differ from spectroscopic measurements on homogeneous samples of DNA, saccharide, fat, or protein. THz interactions with the biological components of tissue were reviewed by Smye et al. [20]. Woodward et al. demonstrated terahertz pulse imaging (TPI) of skin and related cancer tissues, using reflection-mode imaging to study skin tissue and the corresponding cancer tissue both in vitro and in vivo. The sensitivity of terahertz radiation to polar molecules allows THz spectroscopy and imaging to be used for analyzing hydration levels in the skin, and the technique also has potential for the preoperative determination of the lateral spread of skin cancer. The terahertz pulse shape in the time domain was studied, and the results show that diseased and normal tissues could be differentiated in a study of basal cell carcinoma [21].

Nazarov et al. showed that, in the terahertz frequency range, small organic molecules have characteristic absorption lines, whereas large molecules and tissues show absorption that increases linearly with frequency. THz reflection spectroscopy makes it possible to study strongly absorbing substances [22]. Pathologic discrimination between normal and cancerous tissues based on differences in THz absorbance has been reported in the medical literature [23, 24]. Knobloch’s results show that different kinds of tissue from both the larynx of a pig and a cancerous human liver can be clearly distinguished using THz spectroscopy. Cherkasova et al. used THz spectroscopy to study human and rat skin reflection spectra in vivo and the effect of glucose and glycerol on these spectra [25]. Variations in skin optical properties were found at a frequency of 0.1 THz.

Hyperspectral imaging provides information in both the spatial and spectral dimensions. As a harmless approach, THz imaging has considerable potential for tissue analysis and medical diagnosis. Future THz imaging systems will need high frequency resolution together with a cost-effective and much more compact setup that does not necessarily require a laboratory environment, which depends on the further development of THz techniques.

### 1.2. Application of chemometrics on terahertz spectra analysis

THz spectroscopy should take advantage of chemometrics, which has been applied in other fields of spectroscopy such as infrared, near-infrared, Raman, and fluorescence. Chemometrics [26] provides multivariate tools, including classifiers, for exploring the relationships among the objects and the measured variables in a collected dataset. It has been applied to the qualitative and quantitative analysis of the THz spectra of mixture systems that are not very complex. The absorption intensity of THz spectra is proportional to the concentration of the analytes over commonly used dynamic ranges; thus, the usual linear modeling methods of chemometrics can be used for calibration and prediction [27].

A review of terahertz pulsed spectroscopy summarized the chemometric methods most commonly applied to THz spectra, including quantitative univariate and multivariate methods [28]. Two quantitative approaches are mostly used: the first is applied to a single spectrum without any calibration and uses the intensity of the spectrum for modeling, and the second is a calibration based on a series of THz spectra of reference samples, which is then used to predict quantitative information for unknown samples.

THz spectroscopy combined with chemometrics has been reported for the quantitative or qualitative analysis of mixture systems in environmental, food, agricultural, materials, biological, and medical applications. Otsuka et al. carried out a quantitative analysis of mefenamic acid polymorphs by terahertz spectroscopy with chemometric methods [29] and studied the effect of spectral preprocessing on the chemometric parameters of the calibration models. Hua et al. used regression models such as partial least squares (PLS) and principal component regression (PCR) for the quantitative evaluation of cyfluthrin in n-hexane by THz time-domain spectroscopy (THz-TDS) [30]. PLS is one of the most effective and reliable methods for the quantitative analysis of various kinds of spectra. El Haddad et al. applied principal component analysis (PCA), PLS, and artificial neural networks (ANN) to the quantitative analysis of ternary mixtures by THz-TDS [31] and obtained good results. Ellrich et al. presented a postscanner based on THz spectroscopy that uses chemometric methods to evaluate the detected THz fingerprints [32].

The combination of THz spectroscopy with chemometrics, including preprocessing, data selection, and calibration methods, is still under study. We strongly believe that THz spectroscopy should take advantage of the multivariate data processing, classification, and calibration methods of chemometrics.

## 2. Instrumentation

All of the work introduced in this chapter used a transmission THz-TDS cell configuration, as depicted in **Figure 1**. **Figure 2** gives a schematic of the terahertz time-domain transmission spectrometer used in this chapter. The system is equipped with a commercially available femtosecond laser (SPECIM, MaiTai) for generating the THz pulse. The femtosecond laser light is separated into two beams using a prism. One beam is the probe: it travels through free space and is focused on the detecting antenna, with a periodically varied relative time delay. The other beam is directed onto a GaAs-based semiconductor antenna to generate the THz pulse. A parabolic mirror with a hemispherical silicon lens is then used to improve the coupling efficiency of the THz radiation. The THz beam passes through the sample, which is placed at the focus of the parabolic mirror, and is collected by another parabolic mirror. Finally, a photoconductive detector is used for signal collection [33]. In the experiments, the volume of the THz system was filled with dry nitrogen (N_{2}) to reduce absorption by water vapor in the air.

## 3. Theory and practical applications

### 3.1. Parameter extraction from THz spectra

Both a reference pulse and a sample pulse must be measured to calculate the THz absorption coefficient of a sample. When tissues are analyzed by THz spectroscopy for medical testing or diagnosis, the sample pulse is the THz signal transmitted through the tissue slide, and the reference pulse is the THz signal transmitted without the tissue slide. The THz electric field is recorded as a function of time for both the sample and the reference, and the frequency-domain spectra are obtained by the fast Fourier transform (FFT) in this work. The refractive index *n*(*ω*), which describes the dispersion, and the absorption coefficient *α*(*ω*), which describes the absorption characteristics, can be calculated through the following equations, given here in the form commonly used for transmission THz-TDS [33]:

$$n(\omega) = \frac{\phi(\omega)\,c}{\omega d} + 1$$

$$k(\omega) = \ln\!\left[\frac{4\,n(\omega)}{\rho(\omega)\,\bigl(n(\omega)+1\bigr)^{2}}\right]\frac{c}{\omega d}$$

$$\alpha(\omega) = \frac{2\,k(\omega)\,\omega}{c}$$

for which *ω* is the angular frequency, and *ρ*(*ω*), *k*(*ω*), and *ϕ*(*ω*) are the amplitude ratio, extinction coefficient, and phase difference of the sample and reference signals, respectively. *d* is the thickness of the sample, and *c* is the velocity of light in vacuum.
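The extraction can be sketched numerically as follows. This is a minimal illustration, not the processing used in this chapter: it assumes the commonly used thin-slab relations *n* = *ϕc*/(*ωd*) + 1, *k* = ln[4*n*/(*ρ*(*n* + 1)^{2})]·*c*/(*ωd*), and *α* = 2*kω*/*c*, and the function name, the band limits, and the synthetic slab (*n* = 1.5, *α* = 500 m^{-1}, *d* = 1 mm) are hypothetical values chosen only to check the arithmetic.

```python
import numpy as np

C_LIGHT = 2.998e8  # speed of light in vacuum, m/s


def extract_optical_constants(t, e_ref, e_sam, d, f_min, f_max):
    """Return (freq, n, alpha) on the band f_min..f_max (Hz).

    t     : evenly spaced time axis in seconds
    e_ref : reference pulse (no sample in the beam)
    e_sam : sample pulse
    d     : sample thickness in metres
    """
    dt = t[1] - t[0]
    freq = np.fft.rfftfreq(len(t), dt)[1:]          # drop the DC bin
    ratio = np.fft.rfft(e_sam)[1:] / np.fft.rfft(e_ref)[1:]
    rho = np.abs(ratio)                             # amplitude ratio
    # numpy's FFT kernel is exp(-i*omega*t), so a delayed pulse has a negative
    # phase; flip the sign to get the phase delay phi, then unwrap it.
    phi = -np.unwrap(np.angle(ratio))
    band = (freq >= f_min) & (freq <= f_max)
    freq, rho, phi = freq[band], rho[band], phi[band]
    omega = 2.0 * np.pi * freq
    n = phi * C_LIGHT / (omega * d) + 1.0
    k = np.log(4.0 * n / (rho * (n + 1.0) ** 2)) * C_LIGHT / (omega * d)
    alpha = 2.0 * k * omega / C_LIGHT               # absorption coefficient, 1/m
    return freq, n, alpha


# Synthetic check: a Gaussian reference pulse sent through a hypothetical slab.
t = np.arange(2000) * 0.05e-12
e_ref = np.exp(-((t - 10e-12) / 0.1e-12) ** 2)
d_true, n_true, alpha_true = 1.0e-3, 1.5, 500.0
w = 2.0 * np.pi * np.fft.rfftfreq(len(t), t[1] - t[0])
transfer = (
    (4.0 * n_true / (n_true + 1.0) ** 2)            # Fresnel transmission losses
    * np.exp(-alpha_true * d_true / 2.0)            # field attenuation
    * np.exp(-1j * w * (n_true - 1.0) * d_true / C_LIGHT)  # extra delay
)
e_sam = np.fft.irfft(np.fft.rfft(e_ref) * transfer, n=len(t))

freq, n, alpha = extract_optical_constants(t, e_ref, e_sam, d_true, 0.2e12, 2.0e12)
```

The phase is unwrapped from the lowest frequency before the band is selected, because selecting the band first could hide a 2π jump at the band edge. On measured data the usable band is limited by the signal-to-noise ratio, so `f_min` and `f_max` would have to be chosen from the reference spectrum.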

### 3.2. Chemometric methods applied in this chapter

#### 3.2.1. Data preprocessing methods

The Savitzky-Golay (SG) method is a polynomial filter that performs smoothing and numerical differentiation [34]. The filter reduces the noise in the dataset and simplifies the computation during model building. It processes signals with little delay and without shifting the peaks, and it is computationally efficient because it applies least squares to subsets of the data. A window of 2*m* + 1 points moves along the data, and the points in the window are fitted by a polynomial of degree *p* (with *p* ≤ 2*m*). The *d*th (0 ≤ *d* ≤ *p*) derivative of the original data at the midpoint is obtained by differentiating the fitted polynomial. The running least-squares polynomial fit is equivalent to convolving the entire input with a digital filter of length 2*m* + 1 [35, 36].
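The window fit described above can be sketched with NumPy alone. The function names are hypothetical, unit spacing between points is assumed, and the ends of the signal are handled by repeating the edge values, which is only one of several possible choices.

```python
from math import factorial

import numpy as np


def savgol_weights(m, p, d=0):
    """Weights of the length-(2m+1) SG filter for the d-th derivative."""
    if not 0 <= d <= p <= 2 * m:
        raise ValueError("need 0 <= d <= p <= 2m")
    z = np.arange(-m, m + 1, dtype=float)          # window positions, unit spacing
    A = np.vander(z, p + 1, increasing=True)       # columns z**0 ... z**p
    # pinv(A) @ window gives the least-squares polynomial coefficients; the
    # d-th derivative at the midpoint z = 0 is d! times coefficient number d.
    return factorial(d) * np.linalg.pinv(A)[d]


def savgol_apply(y, m, p, d=0):
    """Filter a 1-D signal; the ends are padded by repeating the edge values."""
    weights = savgol_weights(m, p, d)
    padded = np.pad(np.asarray(y, dtype=float), m, mode="edge")
    # 'valid' correlation slides the window so that output[k] is centred on y[k]
    return np.correlate(padded, weights, mode="valid")


# Usage on a noise-free quadratic: a degree-2 fit reproduces it exactly
# away from the padded ends, and d=1 returns its slope 3 + x.
x = np.arange(50, dtype=float)
y = 2.0 + 3.0 * x + 0.5 * x ** 2
smoothed = savgol_apply(y, m=3, p=2)
first_derivative = savgol_apply(y, m=3, p=2, d=1)
```

Because the weights depend only on *m*, *p*, and *d*, they are computed once and reused for every window position, which is what makes the filter a plain convolution.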

Multiplicative scatter correction (MSC) is a transformation used to compensate for scaling and offset effects, that is, additive and/or multiplicative effects, and it is widely applied in spectral data analysis [37]. MSC assumes that each spectrum is determined partly by the actual characteristics of the sample and partly by the particle size. It can decrease or remove physical effects such as particle size and surface blaze, and it corrects differences in the baseline and in the trend. Sample preparation for THz measurements commonly introduces differences in particle size during grinding and in compactness during tableting. An advantage of MSC is that the transformed spectra remain similar to the original spectra but are free of the baseline and trend effects relative to a standard spectrum, so an optical interpretation is more easily accessible [38].
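A minimal sketch of the correction follows, assuming the usual formulation in which each spectrum is regressed on a reference spectrum (here the mean spectrum unless one is supplied) and then corrected with the fitted offset and slope. The function name and the synthetic band are hypothetical.

```python
import numpy as np


def msc(spectra, reference=None):
    """Multiplicative scatter correction of the rows of `spectra`."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        ref = spectra.mean(axis=0)                 # mean spectrum as the standard
    else:
        ref = np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, offset = np.polyfit(ref, row, 1)    # row ≈ offset + slope * ref
        corrected[i] = (row - offset) / slope
    return corrected


# Usage: one band shape recorded with different made-up offsets and scalings
axis = np.linspace(0.2, 2.0, 120)
band = np.exp(-((axis - 1.2) / 0.15) ** 2)
raw = np.array([a + b * band for a, b in [(0.0, 1.0), (0.3, 0.8), (-0.1, 1.5)]])
corrected = msc(raw)
```

In this idealized case the three rows differ only by offset and scaling, so after correction they all coincide with the mean spectrum. For prediction, the reference from the training set would be passed in explicitly rather than recomputed.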

Orthogonal signal correction (OSC), a data processing technique introduced by Fearn [39], can be applied to remove systematic noise such as baseline variation and multiplicative scatter effects. The basic idea of OSC is to remove the systematic variation in the collected data that is orthogonal, that is, unrelated, to the dependent variables. The removed information can be structured noise, such as baseline, instrument variation, and measurement conditions. Some reports show that OSC does not necessarily yield calibration models with lower prediction errors than models based on raw data; its advantage lies in the analysis and interpretation of the corrected data rather than in decreased prediction errors. By removing orthogonal information, the important calibration information is concentrated in fewer principal components instead of being distributed among many linearly dependent variables.

The PC-OSC method uses the constraints of OSC together with the theory of principal component analysis (PCA). The detailed procedure of PC-OSC is given in Ref. [40]. The emphatic orthogonal signal correction (EOSC) method can be used for the baseline correction of Raman or near-infrared spectra, and it can be extended to THz spectra. The theory and procedures of EOSC can be found in Refs. [4, 41].

The asymmetric least squares (AsLS) method estimates complex baseline shapes by adjusting two parameters: the asymmetry parameter, which is related to the position of the baseline, and the smoothness parameter, which is related to the flexibility of the baseline shape. The AsLS method has been proposed by Zhang et al. [42]. By minimizing a penalized least squares function based on the Whittaker smoother, AsLS estimates a background contribution and removes or decreases the baseline [43]. An application of this method can be found in Ref. [44]. The initial range of the smoothness parameter lambda was 10^{2}–10^{5}, and the asymmetry parameter p was 0.099; these are empirical values based on the literature and our laboratory experience. We applied each setting to PLS models and obtained the best result when lambda was 10^{3} [27].
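The iteration can be sketched as follows, assuming the usual formulation of the method: a second-difference (Whittaker) penalty weighted by lambda, with weight p for points above the current baseline and 1 − p for points below it. Dense matrices are used for brevity, and the function name, the synthetic spectrum, and the parameter values in the usage lines are illustrative only.

```python
import numpy as np


def asls_baseline(y, lam, p, n_iter=10):
    """Estimate a baseline by asymmetrically weighted penalized least squares."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)        # second-difference operator, (n-2) x n
    penalty = lam * D.T @ D                    # smoothness term of the Whittaker smoother
    w = np.ones(n)
    for _ in range(n_iter):
        # minimize sum(w * (y - z)**2) + lam * sum((second difference of z)**2)
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # points above the baseline (peaks) get the small weight p
        w = np.where(y > z, p, 1.0 - p)
    return z


# Usage: a sloping baseline plus one absorption band (made-up numbers)
x = np.arange(200, dtype=float)
true_baseline = 0.5 + 0.002 * x
y = true_baseline + np.exp(-((x - 100.0) / 3.0) ** 2)
baseline = asls_baseline(y, lam=1e5, p=0.01)
corrected = y - baseline
```

A larger lambda makes the baseline stiffer, and a smaller p pushes it further below the peaks; for long spectra a sparse solver would replace the dense `np.linalg.solve`.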

The wavelet transform can both compress the data to extract feature information and remove noise from spectral data. In brief, wavelet analysis decomposes the signal at different scales (frequencies) into high-frequency and low-frequency parts, while the position of each component on the time axis remains the same. A more detailed description of the wavelet analysis algorithm can be found elsewhere [45].

**Figure 3a** displays the raw THz absorption spectra of the prepared binary amino acid mixtures of L-glutamic acid and L-glutamine. **Figure 3b**, **3c**, and **3d** display the THz spectra of the same mixtures after SG smoothing, MSC, and AsLS, respectively. **Figure 3** shows that SG smoothing reduces the noise and displays the spectral characteristics more clearly. MSC removes the scaling and offset effects; OSC normally gives results very similar to those of MSC. AsLS eliminates or decreases the baselines in the THz spectra of the samples, so the absorption bands are more easily assigned to particular wavelengths in this work. The binary mixtures at different concentration ratios display a quantitative relationship with the THz absorption coefficients, which is the basis for quantitative analysis by THz-TDS transmission spectroscopy. We did not apply the wavelet transform in this chapter because the other preprocessing methods already gave good results. The wavelet transform should still be considered in future work, since it is a powerful method for de-noising, compressing the dataset, and removing background information.

#### 3.2.2. Examples of chemometric methods for regression model building of THz spectra

##### 3.2.2.1. Principal component analysis

Principal component analysis (PCA) is commonly used to reduce the number of predictive variables. PCA condenses the spectral information into a few latent variables, each a linear combination of the original variables. These latent variables summarize the data without losing much information while removing noise in the process [46]. PCA can be used to identify the underlying structure of large datasets and to identify groups within data from complex mixtures.
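As a minimal sketch, PCA can be computed from the singular value decomposition of the mean-centered data matrix. The function name and the synthetic two-component mixture spectra below are hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Scores, loadings, explained-variance ratios, and mean of the rows of X."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # sample coordinates
    loadings = Vt[:n_components]                        # linear combinations of variables
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained, mean


# Usage: noise-free binary mixtures whose fractions sum to one
axis = np.linspace(0.2, 2.0, 150)
s_a = np.exp(-((axis - 0.8) / 0.1) ** 2)
s_b = np.exp(-((axis - 1.4) / 0.1) ** 2)
frac_a = np.linspace(0.0, 1.0, 11)
X = np.outer(frac_a, s_a) + np.outer(1.0 - frac_a, s_b)
scores, loadings, explained, mean = pca(X, n_components=1)
reconstructed = scores @ loadings + mean
```

Because the two fractions sum to one, the centered data of this ideal binary system vary along a single direction, so one principal component reconstructs all 150 variables. Real spectra need more components to absorb noise and baseline variation.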

##### 3.2.2.2. Partial least squares

Partial least squares (PLS) is one of the most widely used chemometric techniques for building quantitative models and is related to principal component analysis and principal component regression. PLS extracts orthogonal features from the spectra and then models the relationship between the spectral matrix (independent variables, X) and the concentration matrix (dependent variables, Y). The detailed procedure of PLS is given in Ref. [47].
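For a single dependent variable, the calibration can be sketched with the NIPALS form of PLS (often called PLS1). This is an illustrative implementation, not the software used for the results in this chapter; the function names, the synthetic spectra, and the noise level are assumptions.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1: returns (regression vector, X mean, y mean)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                 # weight vector
        t = Xr @ w                             # scores
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        q_k = yr @ t / tt                      # y loading
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - q_k * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(q_k)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # regression vector in X space
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


# Usage: calibrate on synthetic binary mixtures, predict three new ones
rng = np.random.default_rng(0)
axis = np.linspace(0.2, 2.0, 150)
s_a = np.exp(-((axis - 0.8) / 0.1) ** 2)
s_b = np.exp(-((axis - 1.4) / 0.1) ** 2)


def mixtures(frac_a):
    clean = np.outer(frac_a, s_a) + np.outer(1.0 - frac_a, s_b)
    return clean + 0.005 * rng.standard_normal(clean.shape)


frac_train = np.linspace(0.0, 1.0, 11)
frac_test = np.array([0.15, 0.45, 0.85])
model = pls1_fit(mixtures(frac_train), frac_train, n_components=1)
predicted = pls1_predict(model, mixtures(frac_test))
rmsep = np.sqrt(np.mean((predicted - frac_test) ** 2))
```

The number of components is the main tuning parameter; in practice it is chosen by cross validation rather than fixed in advance as it is here.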

We have investigated PLS for the quantitative analysis of L-glutamic acid and L-glutamine using THz spectroscopy, and we compared interval PLS (iPLS) with conventional PLS. iPLS divides the whole spectrum into several intervals and builds a PLS model for each interval to find the sub-dataset that gives the most stable model. The interval or intervals with the lowest root mean square error of cross validation (RMSECV) are chosen for building the PLS model [27]. Compared with the conventional PLS models, iPLS yielded better results for glutamine and glutamic acid, with lower root mean square errors of prediction (RMSEP; 0.39 ± 0.02% and 0.39 ± 0.02%) and higher R^{2} values (0.9904 and 0.9906).

We also quantitatively analyzed binary mixtures of the saccharide isomers D-(−)fructose and anhydrous D-(+)galactose using THz-TDS combined with PLS. The correlation coefficient (R^{2}) between the true and predicted values was higher than 0.9773, and the mean root mean square error of prediction (RMSEP) in the cross-validation set was less than 1.26%.

Therefore, THz-TDS combined with chemometrics is feasible for the quantitative analysis of biomolecular mixtures and may be extended to mixtures with more components.

#### 3.2.3. Examples of chemometric methods for classification model building of THz spectra

##### 3.2.3.1. Partial least squares-discriminant analysis

Partial least squares-discriminant analysis (PLS-DA) is a supervised classification method based on partial least squares regression [47]. The PLS-DA algorithm models the relationship between the measured variables of the dataset and target variables that encode the class labels [48]. Like principal component analysis, PLS-DA reduces the dimension of the dataset by extracting latent variables, but it seeks the maximum separation among the classes. The latent variables both explain the variance of the THz spectral data and correlate strongly with the response matrix that encodes the class membership [33]. The number of components, which is closely related to the accuracy rate and to the percentage of explained variance, is an important parameter for the prediction accuracy and interpretation of the model and must be estimated for PLS-DA.

##### 3.2.3.2. Support vector machine

The support vector machine (SVM) is a powerful machine learning method used for classification and regression analysis [49]. For classification tasks, following the structural risk minimization principle, the method attempts to find the separating hyperplane that has the largest distance from the nearest training data points. LIBSVM is one of the most widely used toolboxes, and the SVM calculations in the example in this chapter used the linear kernel function [50].

Two essential factors affect SVM classification performance: (1) the error penalty parameter C, which sets the compromise between the proportion of misclassified samples and the complexity of the algorithm, and (2) the form of the kernel function and its parameters. Different kernel functions influence the classification performance, and different parameters of the same kernel function may also affect the results.
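The role of C can be illustrated with a small linear soft-margin SVM trained by sub-gradient descent on the primal objective 0.5‖w‖² + C Σ max(0, 1 − yᵢ(w·xᵢ + b)). This is a sketch for illustration only and is not how LIBSVM solves the problem (LIBSVM works on the dual); the function name, learning rate, number of epochs, and the two synthetic clusters are assumptions.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, lr=0.001, n_epochs=2000):
    """Linear soft-margin SVM; labels y must be -1 or +1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        inside = margins < 1.0                  # samples violating the margin
        # sub-gradient: the ||w||^2 term plus C times the hinge-loss term
        grad_w = w - C * (y[inside][:, None] * X[inside]).sum(axis=0)
        grad_b = -C * y[inside].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Usage: two well-separated synthetic clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.3, (20, 2)), rng.normal(2.0, 0.3, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
w, b = train_linear_svm(X, y, C=1.0)
accuracy = np.mean(np.sign(X @ w + b) == y)
```

A large C penalizes every margin violation heavily and narrows the margin, whereas a small C tolerates more violations in exchange for a wider margin.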

##### 3.2.3.3. Fuzzy rule-building expert system (FuRES)

FuRES is a classification tree model based on the fuzzy entropy of classification, in which each rule is a temperature-constrained sigmoid logistic function [51]. The rules of FuRES are similar to the processing units in most feed-forward artificial neural networks, but FuRES processing units differ in that the weight vector is constrained to unit length. The inductive dichotomizer 3 (ID3) algorithm is used for classification by minimizing *H*(*C*|*A*), the classification entropy. The weight vector *w* should be normalized before modeling. A temperature parameter *t* is used to control the fuzziness of each rule. By optimizing the computational temperature, the extent of the entropy for classification can be maximized [4]. In the form commonly given for FuRES, the logistic function is

$$\chi_{A}(x_{k}) = \left[1 + \exp\!\left(-\frac{x_{k}w^{\mathrm{T}} + a}{t}\right)\right]^{-1}$$

for which *a* is the bias value and *χ*_{A}(*x*_{k}) is the degree of fuzzy membership of object *x*_{k} [4]. The conditional probability *p*(*c*_{i}|*a*_{j}) is obtained by summing the membership functions of the objects with attribute *a*_{j} that belong to class *c*_{i}:

$$p(c_{i}\,|\,a_{j}) = \frac{\sum_{k=1}^{n_{i}} \chi_{a_{j}}(x_{k})}{\sum_{k=1}^{n} \chi_{a_{j}}(x_{k})}$$

where *n*_{i} is the number of objects in class *c*_{i} and *n* is the total number of objects. The classification entropy *H*(*C*|*a*_{j}) of the attribute *a*_{j} is given by

$$H(C\,|\,a_{j}) = -\sum_{i} p(c_{i}\,|\,a_{j}) \ln p(c_{i}\,|\,a_{j})$$

The classification entropy *H*(*C*|*A*) of the system is the sum of the entropies of the attributes, each weighted by the probability *p*(*a*_{j}) of the attribute:

$$H(C\,|\,A) = \sum_{j} p(a_{j})\, H(C\,|\,a_{j})$$

The FuRES model provides inductive logic in the tree structure of the classifier. In this way, it can accommodate overlapping data and, through the temperature constraint, avoid overfitting. The advantage of fuzzy classification trees over network classifiers is that they furnish a simple inductive structure that is amenable to interpretation.
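The entropy that a single rule minimizes can be sketched as follows. The sketch assumes the logistic membership 1/(1 + exp(−(x·w + a)/t)) with a unit-length weight vector, treats the rule as two fuzzy attributes (the membership and its complement), and weights each attribute by its share of the total membership. It evaluates one rule only and does not build the tree or optimize *w*, *a*, or *t*; the function name and the four-object example are hypothetical.

```python
import numpy as np


def fuzzy_rule_entropy(X, labels, w, a, t):
    """Fuzzy classification entropy H(C|A) of one logistic rule."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)                       # weight vector of unit length
    chi = 1.0 / (1.0 + np.exp(-(X @ w + a) / t))    # membership in the first attribute
    classes = np.unique(labels)
    entropy = 0.0
    for membership in (chi, 1.0 - chi):             # the attribute and its complement
        p_attr = membership.sum() / len(labels)     # weight of this attribute
        p_class = np.array([membership[labels == c].sum() for c in classes])
        p_class = p_class / membership.sum()        # p(c_i | a_j)
        p_class = p_class[p_class > 0]              # treat 0 * ln(0) as 0
        entropy += p_attr * -np.sum(p_class * np.log(p_class))
    return entropy


# Usage: two classes separated along one variable
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
labels = np.array([0, 0, 1, 1])
crisp = fuzzy_rule_entropy(X, labels, w=[1.0], a=0.0, t=0.01)
fuzzy = fuzzy_rule_entropy(X, labels, w=[1.0], a=0.0, t=1000.0)
```

At a low temperature the rule is nearly crisp and the entropy of this separable example approaches zero; at a very high temperature every membership approaches 0.5 and the entropy approaches ln 2, the entropy of two equally likely classes.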

##### 3.2.3.4. Fuzzy optimal associative memory (FOAM)

An optimal associative memory (OAM) uses a two-way binary image of the encoded multivariate data in place of the one-way data, and FOAM applies a fuzzy method to this encoding [52]. A bipolar matrix of equally sized grid cells is built first: a vector of ν variables is converted to a ν × *h* bipolar matrix. After the *u* unused grid cells are removed, the number of grid cells is *k* = (ν × *h*) − *u* [4]. The FOAM stores patterns in a weight matrix **W**, which is expressed by

for which **y**_{i} is the *i*th bipolarly encoded pattern [4]. The stored grid-encoded spectra are orthogonalized to form a basis using singular value decomposition. The encoded predicted background scan *z*_{f} can be obtained by

for which **V** is a matrix following the orthogonalized pattern of the collected data variables. Then, *z*_{f} is decoded to a spectrum vector by reversing the gridding procedure. After reconstruction of the raw data, a data object is assigned to the class whose model fits it with the minimum error. In this chapter, FOAM used its standard configuration of 100 intensity grids and a 19-point triangular fuzzy membership function [4].

##### 3.2.3.5. Validation

Bootstrapped Latin partitions (BLPs) are a general approach for evaluating classification and calibration methods that combines cross validation with random sampling [53]. Unbiased evaluation of classification or calibration methods is important, especially as these methods are applied to increasingly complex, underdetermined datasets such as THz spectra. Precision bounds, such as confidence intervals, are required for interpreting any experimental result. BLPs give an unbiased and reliable evaluation by systematically building and testing models with samples drawn from an arbitrary discrete distribution.
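The partitioning scheme can be sketched as follows, under the assumption that a Latin partition keeps the class proportions roughly equal in every partition and uses each object exactly once for testing. The function names and the `count_correct` callback, which stands in for training and testing any classifier, are hypothetical.

```python
import numpy as np


def latin_partition(labels, n_partitions, rng):
    """Assign each object to one partition, spreading every class evenly."""
    labels = np.asarray(labels)
    fold = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        members = rng.permutation(np.flatnonzero(labels == cls))
        fold[members] = np.arange(len(members)) % n_partitions
    return fold


def bootstrapped_latin_partitions(labels, n_partitions, n_bootstraps,
                                  count_correct, seed=0):
    """Pooled prediction rate of each bootstrap.

    count_correct(train_mask, test_mask) must train on the objects selected by
    train_mask and return how many objects in test_mask it classifies correctly.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    rates = np.empty(n_bootstraps)
    for b in range(n_bootstraps):
        fold = latin_partition(labels, n_partitions, rng)
        correct = 0
        for k in range(n_partitions):
            test = fold == k
            correct += count_correct(~test, test)   # pool over the partitions
        rates[b] = correct / len(labels)
    return rates


# Usage with a placeholder "classifier" that is always right
labels = np.array([1] * 15 + [2] * 10)
fold = latin_partition(labels, 5, np.random.default_rng(1))
rates = bootstrapped_latin_partitions(
    labels, 5, 50, lambda train, test: int(test.sum())
)
```

The mean of `rates` is the average pooled prediction rate, and its spread across the bootstraps is what a confidence interval would be computed from.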

##### 3.2.3.6. Example of diagnosis of cervical cancer

In this work, a THz-TDS system was applied to normal and malignant tissue sections as an example of chemometric applications in THz spectral analysis. Classification models combined with different pretreatment methods were established to build a new technique for cervical cancer diagnosis based on terahertz spectroscopy. The effects of different preprocessing methods on de-noising, baseline removal, and model optimization were investigated.

The normal and cancerous cervical tissues were collected and provided by Beijing Haidian Maternal & Child Health Hospital. For preservation, all the cervical tissues were placed in 4% formaldehyde solution. Before analysis, the tissues were dehydrated by washing with ethanol solutions, cleared in xylene, and then embedded in paraffin wax before being sliced into 8 μm thick sections. The water-flattened sections were spread on quartz plates and dried in a temperature-regulated oven. Two replicate slides were prepared for each of the tissue sections [33].

To establish a model for the diagnosis of cervical cancer, PLS-DA, FuRES, FOAM, and SVM were used to build classification models. The parameter for FuRES and FOAM was determined using a self-optimizing PLS-DA from the training datasets. Bootstrapped Latin partitions, a method for verifying the accuracy of classification and calibration models, were used for cross validation of the calibration dataset. For the PLS-DA and SVM models, the classes were coded numerically: one for the normal samples and two for the cancer tissue samples. When the value predicted by the PLS-DA model for an external test tissue sample was smaller than 1.5, the sample was assigned to the normal class; otherwise, it was assigned to the cancer class [33].

The classification results of these methods were compared for data processed with different preprocessing methods. MSC, SG smoothing, SG first derivative, EOSC, and PC-OSC were each used for pretreatment of the THz spectra, and the data were normalized before modeling. The results of the modeling approaches after pretreatment were evaluated by the pooled prediction rates. The raw data were divided into training and prediction sets by the KS method. The pretreatments were constructed from the training datasets and applied to the prediction sets. For cross validation, five Latin partitions bootstrapped 50 times were used to evaluate the prediction accuracy of the classification models with different parameters and different pretreatments. For each bootstrap, the data were separated into training and test sets such that each spectrum was used only once in a test set: each time, four Latin partitions were used for calibration during model building, and the fifth was used for prediction. The predicted results of the five test sets were pooled after each validation was finished. This approach was used to evaluate all the classification models in this work. The average prediction results were calculated over the 50 bootstraps to give 95% confidence intervals [33]. The number of components of the OSC model was selected by finding the maximum average classification rate across internal 100×5 bootstrapped Latin partitions. All model optimization and construction were performed in MATLAB 7.14.0.334 (The MathWorks Inc.).
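Assuming that "KS" denotes the Kennard-Stone algorithm, which is the usual meaning in chemometric sample splitting, the division into training and prediction sets can be sketched as follows. The function name and the five-point example are hypothetical, and Euclidean distance between spectra is assumed.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return (selected indices, remaining indices) by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # start with the two objects that are farthest apart
    first, second = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(first), int(second)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # distance from each remaining object to its nearest selected object
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]   # the most isolated object
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)


# Usage: five one-variable "spectra"; three go to the training set
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
train_idx, test_idx = kennard_stone(X, n_select=3)
```

The rule spreads the training set over the whole measured range, so the prediction set is drawn from the interior of the calibration space.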

The results showed that the FuRES and FOAM models using the SG first derivative and PC-OSC as pretreatment methods gave good predictive results, with classification rates of 92.9 ± 0.4% and 92.5 ± 0.4%, respectively [4]. These results show that terahertz spectroscopy combined with fuzzy classifiers has potential for the diagnosis of cancerous tissue. With the SG first derivative combined with PC-OSC as the signal pretreatment procedure, the prediction accuracies of the optimal SVM and PLS-DA models were 94.0 ± 0.4% and 94.0 ± 0.5%, respectively. Therefore, SVM and PLS-DA applied to terahertz spectra of tissue pretreated with the SG first derivative and PC-OSC can also provide a good approach to the diagnosis of cervical carcinoma.

A comparison of the classification accuracies obtained with different preprocessing methods indicates that classification models based on terahertz spectroscopy of tissue can provide high classification accuracies for the early diagnosis of cervical carcinoma. Coupled with terahertz technology, the proposed procedure could provide a convenient, solvent-free, and environmentally friendly approach with potential for development as a cancer diagnosis method.

#### 3.2.4. Examples of chemometric methods for resolution of THz spectra data

##### 3.2.4.1. Multivariate curve resolution

Multivariate curve resolution (MCR) is designed to analyze mixture systems that follow a bilinear model. MCR methods decompose the raw mixture measurement data into matrices of pure concentration profiles and pure spectra. When alternating least squares (ALS) is applied, constraints that reflect physical and chemical properties can be imposed flexibly during the iterations while the maximum variance of the raw measurement data is explained. The profiles resolved by MCR in the bilinear model are physically and chemically meaningful and correspond to interpretable patterns of variation of the components [27, 54, 55].

In our study, MCR-ALS was applied to resolve binary amino acid mixtures of L-glutamic acid and L-glutamine analyzed by THz-TDS. A non-negativity constraint was applied to both the spectral and concentration profiles during the iterations. The pure-analyte spectra obtained from MCR correspond to glutamine and glutamic acid. At the optimum, the fitting error in % (exp) was 6.731 and the percent of variance explained (R^{2}) was 99.55, which shows that the MCR model fits the raw data well in this case. The correlation coefficients (r^{2}) between the reference THz spectra of the pure analytes and the pure spectra resolved by MCR are 0.9990 and 0.9979 for glutamine and glutamic acid, respectively. The spectrum of glutamine resolved by MCR is in good agreement with that measured by THz in the laboratory. The fitting constant is 0.9999.
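The alternating scheme with non-negativity in both directions can be sketched as follows. This is a simplified illustration rather than the MCR-ALS software used in our study: non-negativity is imposed by clipping each unconstrained least-squares estimate at zero (a non-negative least-squares solver would be the stricter choice), the initial spectra are taken from the two most extreme mixtures, and the data are synthetic. The function name and all numerical values are assumptions.

```python
import numpy as np


def mcr_als(D, S_init, n_iter=50):
    """Resolve D (samples x channels) into C (samples x k) and S (k x channels)."""
    D = np.asarray(D, dtype=float)
    S = np.clip(np.asarray(S_init, dtype=float), 0.0, None)
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S), 0.0, None)   # concentrations given spectra
        S = np.clip(np.linalg.pinv(C) @ D, 0.0, None)   # spectra given concentrations
    residual = D - C @ S
    lack_of_fit = 100.0 * np.sqrt(np.sum(residual ** 2) / np.sum(D ** 2))
    return C, S, lack_of_fit


# Usage: synthetic binary mixtures with a little noise
rng = np.random.default_rng(0)
axis = np.linspace(0.2, 2.0, 150)
s_a = np.exp(-((axis - 0.8) / 0.1) ** 2)
s_b = np.exp(-((axis - 1.4) / 0.1) ** 2)
frac_a = np.linspace(0.0, 1.0, 11)
D = np.outer(frac_a, s_a) + np.outer(1.0 - frac_a, s_b)
D = D + 0.001 * rng.standard_normal(D.shape)

# initial estimates: the spectra of the two most extreme mixtures
C, S, lack_of_fit = mcr_als(D, S_init=D[[-1, 0]])
corr_spectrum_a = np.corrcoef(S[0], s_a)[0, 1]
corr_conc_a = np.corrcoef(C[:, 0], frac_a)[0, 1]
```

The resolved profiles are determined only up to a scale factor, which is why they are compared with the reference spectra and concentrations through correlation coefficients.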

MCR-ALS was also applied to resolve binary isomer mixtures of D-(−)fructose and D-(+)galactose analyzed by THz-TDS. The absorption spectra of the two components obtained by MCR-ALS fitted well the spectra of pure anhydrous D-(−)fructose and D-(+)galactose, which were measured under the same conditions. The correlation coefficients between the THz spectra obtained by MCR-ALS and those of the pure D-(−)fructose and D-(+)galactose samples are 0.9974 and 0.9933, respectively. The relative concentrations of the two components resolved by MCR-ALS also fit the true concentrations well. MCR-ALS thus successfully resolved the pure THz spectra of the components in the binary isomer mixtures and their corresponding concentrations.

Hyperspectral imaging based on infrared, near-infrared, Raman, and fluorescence spectroscopy is an active area of research that has grown quickly over the past decade [56]. THz time-domain imaging is an emerging modality that has attracted considerable interest following the success of THz spectroscopy [57]. MCR-ALS can resolve the pure spectra and their corresponding concentration distributions from hyperspectral images, and external spectral information can be used for local rank constraints. In the future, with the development of THz hyperspectral imaging, MCR-ALS could be used to resolve biomedical images based on THz spectroscopy [56]. We believe that THz time-domain imaging has potential for medical diagnosis.
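To apply a bilinear method to an image, the data cube is usually unfolded into a pixels × channels matrix, resolved, and the concentration columns are folded back into maps. The sketch below illustrates only this unfolding and refolding step on synthetic data. The helper names `unfold` and `refold`, the cube dimensions, and the spectra are hypothetical. For brevity, concentrations are recovered by a single least-squares step with known spectra rather than by a full MCR-ALS run.

```python
import numpy as np

def unfold(cube):
    """Reshape an image cube (rows, cols, channels) to (pixels, channels)."""
    rows, cols, channels = cube.shape
    return cube.reshape(rows * cols, channels), (rows, cols)

def refold(C, shape):
    """Reshape concentrations (pixels, k) into k maps of size (rows, cols)."""
    rows, cols = shape
    return C.reshape(rows, cols, -1).transpose(2, 0, 1)

# Synthetic cube: two components with known maps and spectra (made up).
rng = np.random.default_rng(1)
maps_true = rng.uniform(0.0, 1.0, size=(2, 8, 10))    # (k, rows, cols)
x = np.linspace(0.0, 1.0, 50)
S = np.vstack([np.exp(-((x - 0.3) / 0.08) ** 2),
               np.exp(-((x - 0.7) / 0.10) ** 2)])     # (k, channels)
cube = np.einsum("kij,kc->ijc", maps_true, S)         # (rows, cols, channels)

D, shape = unfold(cube)                # (80, 50) matrix of pixel spectra
C = D @ np.linalg.pinv(S)              # least-squares concentrations per pixel
maps_hat = refold(C, shape)            # back to (k, rows, cols)
```

In a real analysis, `C` would come from the constrained resolution of `D`, and each slice of `maps_hat` would be displayed as the distribution map of one resolved component.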

## 4. Does terahertz radiation lead to DNA or tissue damage?

Investigations of the interaction between nonionizing electromagnetic radiation and biological systems are necessary before electromagnetic radiation is applied in medical diagnosis. The recent emergence and growing use of terahertz radiation for medical imaging and public security screening raise questions about reasonable exposure levels and health consequences. In particular, picosecond-duration THz pulses have shown promise for novel diagnostic imaging techniques. Different studies have reached different conclusions on the effects of THz pulses on human cells and tissues. Titova et al. studied the biological effects of THz radiation on artificial human skin tissues [58], and their work shows that exposure to intense THz pulses may cause DNA damage in exposed skin tissue. They consider that DNA damage repair mechanisms are quickly activated after THz irradiation, and they found that the cellular response to pulsed THz radiation is significantly different from that induced by exposure to UVA (400 nm). In contrast, Hintzsche et al. investigated power intensities ranging from 0.03 to 0.9 mW/cm^{2}, with cells exposed for 2 and 8 h. Chromosomal damage and DNA damage were not detected under these conditions, and cell proliferation was also found to be unaffected by the exposure [59]. Bogomazova et al. studied DNA damage and transcriptome responses in human embryonic stem cells (hESCs) and did not observe any effect on the mitotic index or morphology of the hESCs following THz exposure [60]. Overall, THz remains one of the least harmful techniques for biomolecular analysis and medical diagnosis, and it is a powerful means of obtaining information from DNA, proteins, or tissues.

## 5. Conclusion

Terahertz technology has progressed in biological and medical diagnosis in recent years. Because it is a nondestructive technique, it has the potential to be applied to the analysis of biological molecules, cells, tissues, and organs. The studies presented in this chapter indicate that terahertz spectroscopy and terahertz imaging combined with chemometrics can accurately classify normal and cancerous tissues, predict the concentrations of different compounds, and resolve the pure THz spectra of biomolecules in mixture systems. Comparing the classification accuracies obtained with different preprocessing methods indicated that models built on terahertz spectroscopy could support medical diagnosis. Terahertz remains one of the least harmful techniques for medical diagnosis, and it is hoped that it will be developed into a powerful technique for obtaining information from biomolecules or tissues in the future.