The results of PLSR models for predicting MC in tea buds based on full spectra and the effective wavelengths.
This work employed hyperspectral imaging to map the spatial distribution of moisture content (MC) in tea buds during dehydration. Hyperspectral images (874–1734 nm) of tea buds were acquired at six dehydration times (0, 3, 6, 9, 14 and 21 min) at 80°C. The spectral reflectance of the tea buds was extracted from regions of interest (ROIs) in the hyperspectral images. Competitive adaptive reweighted sampling (CARS) was used to select effective wavelengths (EWs), and ten representative wavelengths were selected. The quantitative relationship between spectral reflectance and the measured MC values of tea buds was built using partial least squares regression (PLSR) based on the full spectra and on the EWs. The model established using the EWs, with a coefficient of correlation for prediction (RP) of 0.941 and a root mean square error of prediction (RMSEP) of 5.31%, was considered the optimal model for mapping MC distribution. The optimal model was finally applied to predict the MC of each pixel within the tea bud samples, and MC distribution maps were built with a purpose-developed image processing procedure. The results demonstrated that hyperspectral imaging has the potential to map the spatial distribution of MC in tea buds during the dehydration process.
- NIR hyperspectral imaging
- tea buds
- moisture content
- spatial distribution
- dehydration process
Tea, one of the most popular beverages worldwide, is of great interest due to its beneficial medicinal properties [1, 2]. Tea products are mainly made from the processed buds or fresh leaves of the plant Camellia sinensis. During tea processing, a large amount of moisture is lost, accompanied by a series of physical and chemical reactions. Especially in the drying stage, where thermochemical reactions take place at high temperature, variations of moisture content (MC) in tea can directly affect smell, taste and other quality characteristics. With the growing consumption of tea products, their quality has become increasingly important. Therefore, in order to produce high-quality tea products and prolong their shelf life, determining the MC distribution in tea is of practical value.
The conventional ways to analyze MC include the gravimetric method, oven drying, freeze drying (lyophilization), electronic moisture analyzers, and so on. Those methods are time-consuming and tedious, and they fail to meet the requirements of real-time, on-line detection of MC in tea processing. In addition, the same sample cannot be reused for any other purpose, and those methods may degrade the quality of tea products through direct contact.
In recent years, spectroscopy has proved to be a powerful tool for detecting MC in tea products and agricultural sideline products. For example, Mizukami et al. developed a new method for measuring the moisture in tea leaves using electrical spectroscopy. Diffuse reflectance spectroscopy combined with chemometric analysis was employed to investigate MC in tea. Sinija and Mishra employed Fourier transform near infrared (FTNIR) spectroscopy to measure MC in green tea. However, spectroscopy cannot provide spatial information on quality parameters, which greatly limits its use for quantifying spatial distribution.
Hyperspectral imaging, a powerful analytical tool, has attracted a great deal of attention for the safety detection of agricultural and sideline products. It integrates conventional spectroscopy and digital imaging into one system, which makes it possible to obtain both spectral and spatial information of an object simultaneously. Over the past several years, hyperspectral imaging has found many applications in quantifying and controlling quality parameters with good precision. It has been widely applied to the evaluation of various agricultural products, such as beef, pork and lamb, moisture in prawn, mushroom, moisture in banana, strawberry, maturity and firmness of apple, and texture analysis to classify green tea.
However, to the best of our knowledge, no application of hyperspectral imaging to determining the moisture distribution in tea buds has been reported to date. The NIR region contains broad peaks related to the overtone and combination vibrations of hydrogen-containing bonds, such as O–H, C–H and N–H. Water (O–H) in tea buds gives rise to two characteristic wavelengths in the NIR region, at 980 and 1450 nm (O–H stretching second and first overtones). Therefore, this research employed NIR hyperspectral imaging for predicting and mapping the distribution of MC in tea buds. The steps of the work were to: (1) obtain hyperspectral images of tea buds in the NIR region of 874–1734 nm and measure the MC of tea bud samples during dehydration; (2) extract spectral data from the regions of interest (ROIs) in the acquired hyperspectral images; (3) select the effective wavelengths carrying the most valuable information for MC prediction and build the quantitative models; (4) develop an image processing procedure for mapping the spatial distribution of MC in tea buds. The main steps involved in building MC distribution maps are presented in Figure 1.
2. Materials and methods
2.1 Pretreatment of tea bud samples
In this research, buds of tea bushes (C. sinensis cv. Longjing 43) were prepared for the experiment. The tea bushes had been growing for about 6 years on the Zijingang campus of Zhejiang University, Hangzhou (30°16′N, 120°20′E), China. Each tea bud sample contained three fresh leaf blades. A total of 216 tea bud samples were randomly collected from different tea clusters on 2 April 2013. First, 36 tea buds were randomly selected for acquiring hyperspectral images of the fresh samples (0 min). The remaining 180 samples were then randomly divided into five groups. These five groups were dehydrated at 80°C on an attemperator (IKA C-MAG HS4, Germany) for five dehydration times (3, 6, 9, 14 and 21 min), respectively. All samples were scanned to acquire hyperspectral images at the corresponding dehydration time.
After the hyperspectral images were acquired, the MC of all samples was measured by the gravimetric method according to the Chinese National Standard GB8304-87. In detail, all samples were dried in a constant-temperature oven at 103°C for 18 h. An electronic balance with an accuracy of 0.0001 g was used to weigh all samples after hyperspectral image acquisition and again after drying. All measurements were carried out in a room at an approximately constant temperature of 25°C and a relative humidity of 35–45%. In addition, the 216 tea bud samples were divided into a calibration set (162 samples) and a prediction set (54 samples) by the Kennard-Stone (K-S) algorithm.
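The Kennard-Stone split can be illustrated with a minimal NumPy sketch. It assumes Euclidean distance between mean spectra and uses random placeholder data in place of the 216 measured spectra; the function name `kennard_stone` is illustrative and is not taken from the original work.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Split row indices of X into (selected, remaining) by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via the Gram matrix (keeps memory modest).
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    dist = np.sqrt(d2)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample.
        d_min = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that is farthest from the selected set.
        pick = remaining[int(np.argmax(d_min))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Placeholder data: 216 samples x 214 wavebands (random, for illustration only).
rng = np.random.default_rng(0)
spectra = rng.random((216, 214))
cal_idx, pred_idx = kennard_stone(spectra, 162)
```

The selected indices would form the calibration set and the remaining ones the prediction set, so the calibration samples span the spectral space as evenly as possible.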
2.2 Hyperspectral imaging acquiring equipment
In this study, a laboratory pushbroom hyperspectral imaging system (Figure 2) operating in reflectance mode was employed to scan all the samples. As Yu et al. described, the core components of the system were: a conveyor belt driven by a stepper motor (IRCP0076, Isuzu Optics Corp, Taiwan, China); an illumination unit consisting of two 150-W quartz tungsten halogen lamps (Fiber-Lite DC950 Illuminator, Dolan Jenner Industries Inc., USA); an imaging spectrograph (ImSpector N17E, Spectral Imaging Ltd., Finland) covering a spectral range of 874–1734 nm; a CCD camera (C8484-05, Hamamatsu, Hamamatsu city, Japan) coupled with a camera lens (OLES23; Specim, Spectral Imaging Ltd., Oulu, Finland); and a computer running the spectral-cube data acquisition software (Isuzu Optics Corp, Taiwan, China), which was used to set the conveyor belt speed, exposure time, binning mode and wavelength range, and to handle image acquisition and image calibration. All components except the computer were fixed inside a dark chamber to avoid stray light, which might affect the accuracy of the hyperspectral imaging system.
In order to acquire clear and undistorted hyperspectral images, some parameters of the system needed to be adjusted before image acquisition. Firstly, the illumination unit was set to an appropriate intensity and a proper angle so that the light was concentrated on a linear area of the conveyor belt just below the imaging spectrograph. Then, two reference panels with reflectances of 99.9 and 0% were adopted for white and dark reflectance calibration, respectively. In this study, the distance between the samples and the lens was 165 mm. All samples were placed on the conveyor belt and moved at a speed of 14.5 mm/s, and they were scanned with an exposure time of 5 ms. Line by line, a hyperspectral image called a “hypercube” with dimensions (x, y, λ) was built. The hyperspectral images had 320 pixels in the x-direction, n pixels in the y-direction (depending on the length of each sample) and 256 wavelengths in the λ-direction.
2.3 Calibration of hyperspectral images
Because of the dark current in the CCD camera and the uneven illumination intensity across bands, bands with weaker light intensity contained more noise. The raw hyperspectral images (Iraw) therefore needed to be calibrated, which was done using Eq. (1) [20, 21]:

$$R = \frac{I_{raw} - I_{dark}}{I_{white} - I_{dark}} \tag{1}$$

where R is the calibrated hyperspectral image of the sample; Idark is the dark reference image (~0% reflectance) obtained with the light source off; and Iwhite is the white reference image (~99% reflectance) acquired from a white reference ceramic tile. The calibrated images were then used as the basis for subsequent processing and analysis.
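A minimal NumPy sketch of this dark/white correction is given here for illustration. The array layout (scan lines, spatial pixels, bands) and the digital-number values are assumptions, and the function name `calibrate` is hypothetical.

```python
import numpy as np

def calibrate(raw, white, dark, eps=1e-12):
    """Relative reflectance: (raw - dark) / (white - dark), element-wise."""
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    denom = white - dark
    # Mark dead pixels/bands (white == dark) as NaN instead of dividing by zero.
    denom = np.where(np.abs(denom) < eps, np.nan, denom)
    return (raw - dark) / denom

# Assumed layout: raw cube is (lines, 320 pixels, 256 bands); the references
# are one averaged line each, shape (320, 256), broadcast over all scan lines.
dark = np.full((320, 256), 100.0)    # illustrative dark-current level
white = np.full((320, 256), 4100.0)  # illustrative white-tile level
raw = np.full((4, 320, 256), 2100.0)
R = calibrate(raw, white, dark)      # 0.5 everywhere for these values
```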
2.4 ROIs identification and spectral data extraction
Spectral data were extracted with the region of interest (ROI) function of the ENVI software. An irregular ROI following the shape of the tea bud was first identified in each hyperspectral image. Then, the mean relative reflectance of each image was calculated by averaging the spectral responses of all pixels in the ROI. Following this procedure, a total of 216 mean reflectance spectra were obtained from the hyperspectral images of tea bud samples. Because of the response of the CCD detector and the presence of strong noise, the reflectance in the two regions of 874–950 and 1670–1734 nm was rather low and noisy. Therefore, the hyperspectral images were cropped to the spectral range of 950–1670 nm, with a total of 214 wavebands.
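The ROI averaging and band cropping were done in ENVI; the following NumPy sketch only mirrors the idea. It assumes a boolean ROI mask and 256 evenly spaced band centers between 874 and 1734 nm, which is an assumption about the instrument, and the function names are hypothetical.

```python
import numpy as np

def roi_mean_spectrum(cube, mask):
    """Mean spectrum over ROI pixels; cube is (rows, cols, bands), mask is (rows, cols)."""
    return cube[mask].mean(axis=0)

def crop_bands(spectrum, wavelengths, lo=950.0, hi=1670.0):
    """Keep only the bands whose center wavelength lies in [lo, hi] nm."""
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return spectrum[..., keep], wavelengths[keep]

# Assumed wavelength axis: 256 evenly spaced band centers over 874-1734 nm.
wavelengths = np.linspace(874.0, 1734.0, 256)

# Toy cube: reflectance 1.0 everywhere except one ROI pixel at 3.0.
cube = np.ones((5, 4, 256))
cube[2, 1] = 3.0
mask = np.zeros((5, 4), dtype=bool)
mask[2, 1] = True
mask[2, 2] = True

spectrum = roi_mean_spectrum(cube, mask)            # mean of 3.0 and 1.0 per band
cropped, kept_wl = crop_bands(spectrum, wavelengths)
```

With this assumed wavelength axis the crop keeps 214 bands, which is consistent with the band count reported above.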
2.5 Chemometric processing of spectral data
Competitive adaptive reweighted sampling (CARS), a novel algorithm for selecting important variables, was employed in this study to select the effective wavelengths from the full-range spectra of the calibration set. Details of the CARS methodology can be found in Li et al.
Partial least squares regression (PLSR), one of the most robust and reliable analytical tools for modeling, is a linear and supervised multivariate calibration method. PLSR projects the spectral data onto a set of orthogonal factors called latent variables (LVs) and seeks the model that minimizes the sum of squared errors; the optimal number of LVs is typically chosen by cross-validation. Because the method is supervised, the LVs are extracted taking the response variable into account. In this research, the quantitative model between the spectral reflectance and MC was established using PLSR.
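For illustration, a compact single-response PLS (PLS1) fit with the NIPALS algorithm and a k-fold RMSECV helper can be sketched as follows. This is a generic textbook formulation, not the implementation in The Unscrambler used in this work; the function names, the interleaved fold assignment and the synthetic data are assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS; return (coef, intercept) so that y_hat = X @ coef + intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centered data, deflated in the loop
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f                        # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                          # scores of this latent variable
        tt = t @ t
        p = E.T @ t / tt                   # X loadings
        c = f @ t / tt                     # y loading
        E = E - np.outer(t, p)             # deflate X
        f = f - c * t                      # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def rmsecv(X, y, n_lv, n_folds=5):
    """k-fold cross-validated RMSE with interleaved folds (an assumed fold scheme)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    idx = np.arange(len(y))
    err = np.empty(len(y))
    for k in range(n_folds):
        test = idx[k::n_folds]
        train = np.setdiff1d(idx, test)
        coef, b0 = pls1_fit(X[train], y[train], n_lv)
        err[test] = X[test] @ coef + b0 - y[test]
    return float(np.sqrt(np.mean(err ** 2)))

# Synthetic, noise-free example: 40 samples, 3 variables.
rng = np.random.default_rng(1)
X_demo = rng.random((40, 3))
true_coef = np.array([1.0, -2.0, 0.5])
y_demo = X_demo @ true_coef + 3.0
coef, intercept = pls1_fit(X_demo, y_demo, n_lv=3)
```

In practice the number of LVs would be chosen by evaluating `rmsecv` for increasing `n_lv` and keeping the value at or near the minimum.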
The performance of a calibration model is usually evaluated by the coefficient of correlation (R) and the root mean square error (RMSE) in calibration (RC, RMSEC), in cross-validation (RCV, RMSECV) and in prediction (RP, RMSEP). Generally speaking, a good model has large values of RC, RCV and RP, small values of RMSEC, RMSECV and RMSEP, and small differences among RMSEC, RMSECV and RMSEP.
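Both statistics are simple to compute. The sketch below assumes that R is the Pearson correlation between measured and predicted values and that RMSE divides by the number of samples n; some software divides by the degrees of freedom instead, so values may differ slightly. The function names are illustrative.

```python
import numpy as np

def correlation_coefficient(y_true, y_pred):
    """Pearson correlation between measured and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse(y_true, y_pred):
    """Root mean square error, dividing by the number of samples."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Illustrative values only (percent MC), not data from this study.
measured = [60.0, 50.0, 40.0, 30.0]
predicted = [58.0, 52.0, 41.0, 27.0]
r_p = correlation_coefficient(measured, predicted)
rmsep = rmse(measured, predicted)
```

The same two functions apply to the calibration, cross-validation and prediction sets; only the pairs of measured and predicted values change.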
In this research, data extractions, statistical calculations and multivariate data analyses were executed with ENVI 4.6 software (ITT Visual Information Solutions, Boulder, CO, USA), “The Unscrambler X 10.1” (CAMO PROCESS AS, Oslo, Norway) and MATLAB 7.8 (R2009a) software (The Math Works, Inc., Natick, MA, USA). The developed procedures for mapping MC distribution were completed in MATLAB.
3. Results and discussion
3.1 Spectral features of tea buds and statistics of measured MC
In general, the NIR spectral region contains more information relevant to hydrogen-containing bonds than other spectral regions. To compare spectral trends over the six dehydration periods, the mean spectral values of the pixels within the ROIs of the tea bud samples were calculated. Those values exhibited some variance and overlap between adjacent dehydration periods (not shown here). The mean spectral reflectance curves are illustrated in Figure 3. There were some broad peaks in the NIR region related to the overtone and combination vibrations of hydrogen-containing bonds, such as O-H, C-H and N-H. As shown in Figure 3, water in the tea buds produced two characteristic wavelengths around 980 and 1450 nm (O-H stretching second and first overtones). Additionally, the absorption peak around 1200 nm (C-H stretching second overtone) was due to the organic matter content of the tea buds. Because of the complex chemical composition of tea buds (possibly including C-H and N-H bonds), it was hard to find a clear trend of the curves with MC within 950–1100 nm. However, the spectral reflectance curves showed a clear upward trend in the vicinity of 1450–1650 nm over the six dehydration periods (0, 3, 6, 9, 14 and 21 min).
In addition, Figure 4 summarizes the statistics of MC, including the mean, maximum, minimum and standard deviation (SD) values, of tea bud samples in the six dehydration periods. The mean, maximum and minimum values of MC showed a clearly decreasing trend. In the mean values in particular, a marked gradient (a decline of about 10%) was easily observed.
3.2 Variables selection
In this study, CARS was employed to select the effective variables. During the CARS process, key variables survived while uninformative variables were sifted out. Figure 5 shows the process of variable selection by CARS.
Figure 5(a) illustrates that the number of sampled variables decreased quickly in the first stage of the exponentially decreasing function (EDF) and then slowly in the second stage, corresponding to “fast selection” and “refined selection”, respectively. In Figure 5(b), as the number of sampling runs increased, the RMSECV values first decreased in sampling runs 1–4, then fluctuated gently in sampling runs 5–33, and finally increased quickly in sampling runs 34–50. In this process, most of the uninformative variables were eliminated first, and the RMSECV value finally increased because some key variables were lost. The optimal variable subset was the one with the minimal 5-fold RMSECV value, marked by the vertical blue asterisk line in Figure 5(c). The regression coefficient path of each wavelength is also shown in Figure 5(c), where the colored lines record the coefficient value of each variable at the different sampling runs. At the beginning of the sampling runs, the absolute value of the regression coefficient of each wavelength was very low. After that, the coefficients of some variables grew, while those of the remaining variables became smaller and smaller and eventually dropped to zero (these variables were eliminated). In other words, the larger the absolute coefficient, the more likely the corresponding wavelength was to survive.
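The two-stage behaviour of the EDF can be reproduced with a short sketch. It assumes the commonly cited CARS form r_i = a·exp(−k·i), with a and k chosen so that all variables are kept in the first run and two in the last; the 214 variables and 50 runs follow the numbers mentioned in the text. The function name is hypothetical, and the adaptive reweighted sampling step that follows the EDF in CARS is not shown.

```python
import math

def edf_ratios(n_vars, n_runs):
    """Fraction of variables kept at each sampling run under the EDF r_i = a * exp(-k * i)."""
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = math.log(n_vars / 2) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]

ratios = edf_ratios(n_vars=214, n_runs=50)
# Number of wavelengths retained by the EDF step at each run.
kept = [round(r * 214) for r in ratios]
```

The retained count drops steeply in the early runs and flattens later, which corresponds to the fast and refined selection stages described for Figure 5(a).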
Based on the CARS calculation, ten wavelengths at 1133, 1173, 1332, 1372, 1419, 1446, 1450, 1507, 1538 and 1595 nm were identified as the EWs for predicting the MC of tea buds. The distribution of the selected EWs is shown in Figure 6.
Four of the selected EWs (1419, 1446, 1450 and 1507 nm) were scattered around the O-H stretching first overtone (1450 nm). By comparison, only two EWs (1133 and 1173 nm) lay near the C-H stretching second overtone (1200 nm), which might be related to the organic matter of tea buds.
3.3 Modeling of MC in tea buds by PLSR
In this research, multivariate models were established by the PLSR algorithm with the full spectra and with the EWs, respectively. The PLSR models were built on the calibration set to relate the spectral reflectance to the corresponding measured MC of the tea bud samples. Table 1 displays the statistical results for the prediction of MC in tea buds using the full spectra and the EWs.
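| Model | RC | RMSEC (%) | RCV | RMSECV (%) | RP | RMSEP (%) |
|---|---|---|---|---|---|---|
| F-PLSR | 0.956 | 5.22 | 0.908 | 5.93 | 0.946 | 5.07 |
| CARS-PLSR | | | | | 0.941 | 5.31 |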
From Table 1, the PLSR model based on the full spectra (F-PLSR), with RC = 0.956, RMSEC = 5.22%, RCV = 0.908, RMSECV = 5.93%, RP = 0.946 and RMSEP = 5.07%, gave the better result for predicting MC in tea buds. Compared with the F-PLSR model, the CARS-PLSR model showed only slight drops of 0.008 in RC and 0.005 in RP. These results indicated that the model using the full spectra had good predictive accuracy and robustness.
However, the full spectra are high-dimensional, and the F-PLSR model could not provide a simple linear function relating spectral reflectance to the MC of tea buds. In contrast, the ten selected EWs had minimal redundancy and offered a commendable prediction performance (RP = 0.941). Hence, the CARS-PLSR model was considered the ideal model for predicting the MC of tea buds during the six dehydration periods. The function obtained from the CARS-PLSR model is given as Eq. (2):
where λi nm is the spectral reflectance at the wavelength of i nm, and Ymoisture is the predicted moisture content of tea buds. This function was also used for mapping the spatial distribution of MC in tea buds.
3.4 Distribution maps of MC in tea buds
To predict the MC at every point of a sample, the CARS-PLSR model was then applied to each pixel of the image. By multiplying the model’s regression coefficients by the spectrum of each pixel in the image, a prediction image (called a distribution map) was built that exhibited the spatial distribution of MC in the sample. In the final distribution map, pixels with similar spectral characteristics yield similar predicted MC values and therefore appear in similar colors [7, 25].
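The pixel-wise application of a linear model can be sketched in NumPy as follows. The band indices, coefficients and intercept are hypothetical placeholders, not the coefficients of the CARS-PLSR model in this work, and the function name `predict_map` is illustrative; the original procedure was implemented in MATLAB.

```python
import numpy as np

def predict_map(cube, mask, band_idx, coef, intercept):
    """Apply y = spectrum[band_idx] @ coef + intercept to every ROI pixel.

    cube: (rows, cols, bands) calibrated reflectance; mask: (rows, cols) boolean ROI.
    Background pixels are set to NaN so they can be left blank when plotting.
    """
    mc_map = np.full(cube.shape[:2], np.nan)
    mc_map[mask] = cube[mask][:, band_idx] @ coef + intercept
    return mc_map

# Toy data: random reflectance cube and a 3 x 3 ROI.
rng = np.random.default_rng(0)
cube = rng.random((6, 5, 214))
mask = np.zeros((6, 5), dtype=bool)
mask[1:4, 1:4] = True

band_idx = [10, 50, 120]               # hypothetical indices of selected wavelengths
coef = np.array([-20.0, 35.0, 15.0])   # hypothetical regression coefficients
intercept = 40.0                       # hypothetical intercept

mc_map = predict_map(cube, mask, band_idx, coef, intercept)
```

The resulting array could then be displayed with a linear color scale (for example with an image plotting routine) to obtain a distribution map of the kind described here.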
Figure 7 shows examples of spatial distribution maps of tea buds with different MC levels in the six dehydration periods. Figure 7(a) shows the pseudo-color images of six tea bud samples with different MC values. The values above the tea buds represent the average MC of the whole samples. The color of the samples changed from emerald to silver as the MC values decreased. However, it was hard to discern point-to-point differences in MC by the naked eye from the pseudo-color images. In contrast, the different MC levels among the samples were readily discerned from the final distribution maps shown in Figure 7(b). A linear color scale was used, in which MC values from low to high are shown in colors from blue to red. The difference in MC level within a sample could also be easily identified. The mix of colors in a map indicated a heterogeneous distribution of MC within the samples. Many pixels in the fresh tea bud (0 min) were red or orange, consistent with its average MC value of 64.67%. As the dehydration time increased, the color of the pixels changed from red and orange through pale bluish green to blue, which indicated that the tea bud gradually lost moisture. It is worth noting that the edge of the blade turned blue first, then the vein, and finally the petiole. Especially in two dehydration periods (9 and 14 min), the colors of the leaf petiole and blade were obviously different. This gives a clear indication of the MC status during the dehydration of tea buds.
In fact, the water in the blade is mainly free water in mesophyll cells, while the water in the xylem, leaf vein and petiole exists as bound water. Free water can be removed in a short time at a lower temperature, whereas evaporating bound water requires a longer time at a higher temperature. The MC of the blade is therefore lower than that of the leaf vein and petiole in the early stage of drying. Through such visualization, the spatial variation of MC can be detected intuitively, which provides vital information for understanding the drying dynamics of tea leaves and for optimizing the tea leaf drying process.
This study was conducted to evaluate the capability of NIR hyperspectral imaging for mapping the spatial distribution of MC in tea buds during dehydration. The results demonstrated that hyperspectral imaging is a promising technology that can map the MC distribution in tea buds during the drying process. In this research, CARS was employed to select the EWs, and the PLSR algorithm was then used to establish the quantitative relationship between the spectral reflectance and the measured MC of tea buds. Finally, the MC of all pixels in the tea buds was calculated with the optimal PLSR model, and the spatial distribution maps were built using a purpose-developed image processing procedure. The spatial variation of MC revealed the differences in MC within tea buds at different dehydration periods, which offers an approach to the kinetic analysis of MC during drying and provides important information for optimizing tea processing techniques.
In further research, more types of tea samples from different geographical locations, plant ages and harvest times should be taken into account to establish a more robust and general MC determination model, which could further help optimize the dehydration process of agricultural products.
This work was supported by the National Natural Science Foundation of China (Program No. 61705188), the Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2017JQ3008), the China Postdoctoral Science Foundation (2017M613218), the Shaanxi Postdoctoral Science Foundation, the Fundamental Research Funds for the Central Universities (2452017125), the Doctoral Scientific Research Foundation of Northwest A&F University (2452016157), and the Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, P. R. China.
Conflict of interest
The authors have declared no conflict of interest.