Sequential Classification of Hyperspectral Images

Hyperspectral imaging has become increasingly popular in applications such as agriculture, food, and environmental monitoring. The rich spectral information of hyperspectral images opens up new possibilities and new challenges in data processing. In this chapter, we consider hyperspectral classification problems under sequential data collection, a frequent setting in industrial pushbroom imaging systems. We present the relevant techniques, including data normalization, dimension reduction, classification, and spatial information integration, and show how to adapt these techniques to the context of sequential data collection and processing. The proposed scheme is validated with real data collected in our laboratory. The methodology of result assessment is also presented.


Introduction
Hyperspectral imaging is a continuously growing area that has received considerable attention in the last decade. Hyperspectral data provide a wide spectral range coupled with a high spectral resolution. These characteristics make it suitable for the detection and classification of surfaces and chemical elements in the observed scene. Rich information in the spectral dimension provides solutions to many problems that cannot be solved by traditional RGB or multispectral imaging.
Applications include land use analysis, pollution monitoring, wide-area reconnaissance, and field surveillance, to cite a few. Typical cases related to food quality, agriculture, and the environment include the following:

1. Food safety plays an important role in our daily life. We often use a combination of the appearance, hand-feel, and smell of a product to judge the quality of fruits or vegetables, but this is not enough to detect abnormalities, deformations, or even visible defects. Growing awareness of food safety has created demand for rapid and accurate hyperspectral detection systems [1].

2. Precision agriculture is a farming management concept based on observing, measuring, and responding to inter- and intrafield variability in crops. In precision agriculture, hyperspectral remote sensing data are acquired and processed to derive maps of crop biophysical parameters, to measure the amount of plant cover, and to distinguish between crops and weeds [2].

3. Due to the pressures of overconsumption, population growth, and technology, the biophysical environment is being degraded, sometimes permanently. Many of the earth's resources are on the verge of exhaustion because of human impacts across many countries [3]. Many attempts are made to prevent damage or to manage the impacts of human activity on natural resources. Hyperspectral classification can make resource recovery rapid and efficient.
One of the most important tasks in hyperspectral image processing is image classification. The rich spectral information of hyperspectral images makes it possible to classify materials that are difficult to distinguish with other imaging techniques. Over the past decades, many hyperspectral classification methods have been proposed [4][5][6][7][8][9]. However, existing methods may not be suitable for a real-time material sorting system. Pushbroom imaging systems are frequently used in industrial sorting; such a system collects the columns of an image one after another in a sequential manner (see Figure 1). It is thus necessary to design a framework for online classification tasks and to adapt conventional algorithms to the sequential processing setting.
In this chapter, we present a scheme of sequential classification for hyperspectral sorting systems. This scheme can be used in various fields, such as food quality measurement and resource recovery. We present the main techniques of this sorting and processing pipeline, including data normalization, dimension reduction, classification, and spatial information integration, and show how to adapt these techniques to the context of sequential data collection and processing.
The rest of this chapter is organized as follows. In Section 2, we propose the main steps of the sequential hyperspectral classification processing system. In Section 3, detailed methods are presented for sequential hyperspectral image processing and sorting. Experimental results are then discussed in Section 4. Section 5 concludes the chapter.

Figure 1. Sequential hyperspectral data collecting and processing by a pushbroom system. The hyperspectral camera captures data x_k at time instant k, which is one of the sequential columns of the entire image; y_k is the result after processing (a classification label in this case).

System overview
Before elaborating the proposed sequential hyperspectral image classification method, we first present the notation and the data model used in this work. We consider a hyperspectral image with h pixels per column and w pixels per row, where h is fixed by the spatial resolution of the camera, and w increases toward infinity as the pushbroom system moves. Each pixel consists of a reflectance vector of p contiguous spectral bands. Then, let

• N = h × w be the total number of pixels.
• X_k(i, j) represent a pixel value, where the subscript k denotes the index of the spectral band, and i and j represent the location of the pixel in the spatial domain.
The data collecting and processing of a real-time hyperspectral sorting system consist of the following major steps.
The hyperspectral data used in this work are collected by the GaiaField system in our laboratory. The parameters of the system are provided in Table 1. Our online processing is based on windowed columns: after collecting each column, we use it together with several previous columns to form a window and perform the data processing steps within this window. Black-white normalization is used for basic data normalization. PCA and fuzzy-set-based spectral decorrelation are used for dimension reduction [10]. Typical techniques such as GML and SVM are presented for material classification. Considering the positive effect of spatial information on processing results [11], we also propose to integrate the spatial and spectral dimensions to achieve an enhanced classification accuracy. Finally, classification accuracy is characterized by metrics such as the confusion matrix and the κ coefficient. Details of the techniques and the results are provided later.

Data preprocessing
Data preprocessing steps include basic data normalization and spectral decorrelation. They are performed one after another as described later.

Basic data normalization
An important preprocessing step is the so-called black-white calibration. This calibration is carried out by recording one image for black and another for white, as described below, to remove the effect of the dark current of the camera sensor and to compensate for the uneven light intensity of each band.
In an offline phase, the black image (B) is acquired by turning off the light source and covering the camera lens with its cap. The white image (W) is acquired by imaging a standard white ceramic tile under the same conditions as the raw image. Then, image correction is performed by [12]

I = (I_0 − B) / (W − B),

where I is the hyperspectral image after normalization, I_0 is the original hyperspectral image captured in our laboratory, B is the black reference image, and W is the white reference image.
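As a rough illustration (not code from the chapter), the black-white correction can be sketched in Python; the function name, the small epsilon guard against division by zero, and the synthetic arrays are ours:

```python
import numpy as np

def black_white_normalize(I0, B, W, eps=1e-12):
    """Normalize a raw hyperspectral cube I0 with black (B) and white (W)
    reference images: I = (I0 - B) / (W - B)."""
    I0 = np.asarray(I0, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    W = np.asarray(W, dtype=np.float64)
    return (I0 - B) / (W - B + eps)

# Tiny synthetic example: 2 pixels x 3 bands.
I0 = np.array([[10.0, 20.0, 30.0], [5.0, 15.0, 25.0]])
B = np.full_like(I0, 5.0)    # dark-current level
W = np.full_like(I0, 55.0)   # white-reference level
I = black_white_normalize(I0, B, W)
```

In a real system, B and W would be recorded once offline and reused for every incoming column.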

Data dimension reduction
The high spectral resolution of hyperspectral data enables us to classify materials that are indistinguishable with conventional methods. However, a large number of spectral channels causes difficulties in classifier training (the Hughes phenomenon) and computational burdens. Data dimension reduction is therefore justified both by these difficulties and by the information redundancy across bands.

PCA
PCA is one of the most popular methods for data dimension reduction. PCA computes a linear transformation of high-dimensional input vectors that maps the data into a low-dimensional orthogonal subspace. For simplicity, we assume that the data samples have zero mean; otherwise, we can center the data by subtracting the mean. Principal component analysis is based on the eigenstructure of the data. We therefore calculate the covariance matrix of the data matrix Y and perform an eigendecomposition on it. The ith eigenvector of the covariance matrix is denoted by a_i, with associated eigenvalue λ_i.
To reduce the dimension of the data, we select an appropriate number of eigenvectors a_i, taken in decreasing order of their eigenvalues λ_i, to form the representation matrix A [13]. The reduced data are obtained by projecting onto these components,

Z = A^T Y,

where Z is the hyperspectral image data after decorrelation.
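The projection above can be sketched in a few lines of numpy; this is our own minimal illustration (function name and random test data are not from the chapter), assuming data arranged as a p × n matrix with one spectrum per column:

```python
import numpy as np

def pca_reduce(Y, d):
    """Project data Y (p bands x n samples) onto its top-d principal
    components. Returns Z (d x n) and the loading matrix A (p x d)."""
    Yc = Y - Y.mean(axis=1, keepdims=True)    # centre each band
    C = np.cov(Yc)                            # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:d]     # indices of the d largest
    A = eigvecs[:, order]                     # p x d loading matrix
    Z = A.T @ Yc                              # d x n decorrelated data
    return Z, A

# Example: reduce synthetic 10-band data to 3 components.
rng = np.random.default_rng(0)
Y = rng.standard_normal((10, 200))
Z, A = pca_reduce(Y, 3)
```

Because the eigenvectors of the covariance matrix are orthogonal, the rows of Z are mutually decorrelated, with variances equal to the retained eigenvalues.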

Fuzzy sets
Using fuzzy sets to decorrelate hyperspectral data is based on the prior knowledge that adjacent wavelengths of the spectrum are more correlated than distant pairs, as the spectral information varies smoothly and successively. We sample the spectral characteristics with groups of adjacent spectral bands, obtained by dividing the spectrum into separate groups to attain the desired spectral selectivity. We propose separating the hyperspectral data into M fuzzy groups, where each group covers a range of wavelengths [14]. The contribution of each wavelength is modeled by a membership function Mf_i(λ). We use a triangular membership function, shown in Figure 2,

Mf_i(λ) = max(0, 1 − |λ − λ_i| / D),

where λ_i is the central wavelength of fuzzy set i, and D is the distance between the central wavelengths of two adjacent fuzzy sets.
The spectral wavelengths have different membership degrees in different fuzzy sets. Each wavelength has a nonzero degree of membership in at most two adjacent fuzzy sets, while its membership degree in the remaining fuzzy sets is 0 (Figure 3). The energy of each fuzzy set is calculated by weighting the intensity of each spectral element by the membership function of that fuzzy set, i.e.,

X_i = Σ_λ Mf_i(λ) · L(λ),

where X_i is the energy of fuzzy set i, and L(λ) is the intensity of the spectral element at wavelength λ.
Based on the energy values of the fuzzy sets, we obtain useful information about the spectral characteristics. In this way, each hyperspectral image pixel can be represented by a vector containing the energy values of the M fuzzy sets, X = [X_1, X_2, …, X_M]^T.
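The triangular membership and the fuzzy-set energies can be sketched as follows; this is our own illustration (function names and the toy uniform spectrum are not from the chapter):

```python
import numpy as np

def triangular_membership(wavelengths, centre, D):
    """Mf_i(lambda) = max(0, 1 - |lambda - centre| / D)."""
    return np.maximum(0.0, 1.0 - np.abs(wavelengths - centre) / D)

def fuzzy_energies(spectrum, wavelengths, centres, D):
    """Energy of each fuzzy set: X_i = sum_lambda Mf_i(lambda) * L(lambda)."""
    return np.array([
        np.sum(triangular_membership(wavelengths, c, D) * spectrum)
        for c in centres
    ])

# Toy example: 5 wavelength samples, 3 fuzzy sets spaced D = 2 apart,
# and a flat spectrum of ones.
wavelengths = np.arange(5.0)
E = fuzzy_energies(np.ones(5), wavelengths, centres=[0.0, 2.0, 4.0], D=2.0)
```

With a flat spectrum, the energy of each interior fuzzy set is just the sum of its membership weights, while sets at the spectrum's edges are truncated.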

Material classification
In this section, we present the algorithms used to classify/sort the captured data using the features (data of reduced dimension) extracted by PCA or the fuzzy-set method. We first review two popular classification methods in a general manner. Then we introduce how to incorporate spatial information into the classification. Finally, sequential processing with a window-based method is discussed.

Gaussian maximum likelihood classification
The spectra of distinct materials in hyperspectral data form clusters in the feature space, and we assume that the data features of each material approximately follow a multivariate normal distribution. Specifically, the features of material i follow the p-dimensional probability density function

p(X | ω_i) = (2π)^(−p/2) |Σ_i|^(−1/2) exp( −(1/2) (X − μ_i)^T Σ_i^(−1) (X − μ_i) ),

where μ_i and Σ_i are the mean vector and the covariance matrix of class i, respectively [15]. Each pixel in the hyperspectral image is assigned the label of the class that achieves the maximum probability.
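A minimal GML classifier (our own sketch, not the chapter's implementation) compares the Gaussian log-likelihood of each class and picks the largest; the function name and the two synthetic classes below are illustrative:

```python
import numpy as np

def gml_classify(X, means, covs):
    """Assign each feature vector (rows of X, n x p) to the class with the
    highest Gaussian log-likelihood."""
    n, p = X.shape
    scores = []
    for mu, S in zip(means, covs):
        Sinv = np.linalg.inv(S)
        _, logdet = np.linalg.slogdet(S)
        d = X - mu
        # Mahalanobis distance of every row: d_i^T Sinv d_i
        maha = np.einsum('ij,jk,ik->i', d, Sinv, d)
        scores.append(-0.5 * (maha + logdet + p * np.log(2 * np.pi)))
    return np.argmax(np.stack(scores, axis=1), axis=1)

# Two well-separated synthetic classes in 2-D feature space.
means = [np.zeros(2), np.full(2, 10.0)]
covs = [np.eye(2), np.eye(2)]
pred = gml_classify(np.array([[0.1, -0.2], [9.8, 10.3], [0.0, 0.5]]),
                    means, covs)
```

Working with log-likelihoods avoids numerical underflow for high-dimensional features; the means and covariances would be estimated from the training pixels of each material.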

Support vector machine
SVM is one of the most effective and widely used methods in statistical learning. SVM aims to find the best tradeoff between model complexity and learning ability given limited sample information. SVM can effectively mitigate the Hughes phenomenon caused by insufficient samples in hyperspectral classification.
The goal of the training algorithm is to find a linear optimal separating hyperplane [16]. This method constructs a hyperplane that maximizes the margin between classes, specified by a (usually small) subset of the data that defines the position of the separator; these points are referred to as the support vectors [17]. The decision function is as follows:

f(x) = sgn( Σ_{i=1}^{N} α_i y_i ⟨x_i, x⟩ + b ),

where α_i is the ith Lagrange coefficient, y_i is the corresponding classification label, x_i is the ith support vector, x is the input pixel vector, N is the number of support vectors, and b is the decision offset. For two-class hyperspectral classification, f(x) takes the value +1 or −1, indicating the class to which the current pixel belongs. For multiclass classification, we can use one-versus-one, one-versus-rest, hierarchical support vector machines, or another strategy to obtain the multiclass label.
Sometimes, the data cannot be separated by a linear classifier. In that case, kernel methods are used to map the data from the original input space to a higher-dimensional space. Thanks to the kernel trick, we only need to know the form of the inner product in that space instead of the explicit map [16]. Popular kernel functions include the following:

Linear kernel: K(x_i, x_j) = x_i^T x_j.

Polynomial kernel: K(x_i, x_j) = (x_i^T x_j + 1)^q, where q is the polynomial order.

Radial basis function kernel: K(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) ), where σ² is the kernel bandwidth.

Sigmoid kernel: K(x_i, x_j) = tanh( v x_i^T x_j + c ), for appropriate values of v and c, so that Mercer's conditions are satisfied [16].
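The kernels and the kernelized decision function can be sketched directly; this is our own illustration (function names, the toy 1-D support vectors, and the fixed coefficients are not from the chapter, and a real system would obtain α_i and b from an SVM solver):

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def polynomial_kernel(xi, xj, q=2):
    return (xi @ xj + 1.0) ** q

def rbf_kernel(xi, xj, sigma2=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma2))

def sigmoid_kernel(xi, xj, v=1.0, c=0.0):
    return np.tanh(v * (xi @ xj) + c)

def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """f(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b )."""
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return np.sign(s + b)

# Toy 1-D example with two support vectors at -1 and +1.
sv = [np.array([-1.0]), np.array([1.0])]
f_pos = svm_decision(np.array([2.0]), sv, alphas=[1.0, 1.0],
                     labels=[-1.0, 1.0], b=0.0, kernel=linear_kernel)
```

Swapping `linear_kernel` for `rbf_kernel` changes the decision boundary without changing the decision-function code, which is the point of the kernel trick.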

Incorporating spatial information
Conventionally, hyperspectral classification algorithms are developed from a spectroscopic viewpoint and ignore the spatial information embedded in neighboring pixels [18]. Integrating spatial and spectral information may improve processing performance. We propose combining spatial and spectral information to improve the classification accuracy. The proposed method exploits spatial information based on connected component labeling, as follows. We generate a mean image by averaging the dimension-reduced data over the spectral bands. A connected component labeling algorithm is then applied to the binarized mean image. In our system, if an object is marked by connected component labeling and over 60% of its pixels are labeled as one class, we consider that all pixels within this connected region belong to the associated material. This strategy improves the classification accuracy.
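The majority-vote relabeling over connected regions can be sketched as follows. This is our own minimal version (function name, the pure-Python BFS labeling, and the toy arrays are ours); the chapter's system uses an 8-connected labeling, which the neighborhood loop below reproduces:

```python
import numpy as np
from collections import deque, Counter

def relabel_by_majority(labels, mask, threshold=0.6):
    """For each 8-connected foreground region in `mask`, if more than
    `threshold` of its pixels share one class in `labels`, assign that
    class to the whole region."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    out = labels.copy()
    for si in range(h):
        for sj in range(w):
            if mask[si, sj] and not seen[si, sj]:
                # BFS over one 8-connected region.
                region = []
                q = deque([(si, sj)])
                seen[si, sj] = True
                while q:
                    i, j = q.popleft()
                    region.append((i, j))
                    for di in (-1, 0, 1):
                        for dj in (-1, 0, 1):
                            ni, nj = i + di, j + dj
                            if (0 <= ni < h and 0 <= nj < w
                                    and mask[ni, nj] and not seen[ni, nj]):
                                seen[ni, nj] = True
                                q.append((ni, nj))
                counts = Counter(labels[i, j] for i, j in region)
                cls, n = counts.most_common(1)[0]
                if n / len(region) > threshold:
                    for i, j in region:
                        out[i, j] = cls
    return out

# One 3x3 object whose centre pixel was misclassified as class 2.
labels_img = np.array([[1, 1, 1], [1, 2, 1], [1, 1, 1]])
mask = np.ones((3, 3), dtype=bool)
cleaned = relabel_by_majority(labels_img, mask)
```

Since 8 of the 9 pixels agree on class 1 (above the 60% threshold), the lone misclassified pixel is corrected.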

Sequential processing
Our hyperspectral images are captured by a pushbroom system, where the columns of the image are collected sequentially one after another. After collecting each column, we use it together with several previous columns to form a sliding window and perform the data processing steps within this window. The sliding window facilitates incorporating spatial information in the processing. Its width L should be chosen by considering the data acquisition rate, the data processing speed, and the spatial correlation of the observed scene. In our system, we set L = 15 (Figure 4).
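The column-by-column windowing can be sketched as a small driver loop; this is our own illustration (function name, the placeholder `process` callback, and the synthetic column stream are not from the chapter):

```python
import numpy as np
from collections import deque

def sequential_process(column_stream, L=15, process=None):
    """Consume columns (each an h x p array) one at a time, keeping the most
    recent L columns in a sliding window; `process` is called on each full
    h x L x p window."""
    window = deque(maxlen=L)   # old columns fall off automatically
    results = []
    for col in column_stream:
        window.append(col)
        if len(window) == L:
            cube = np.stack(window, axis=1)   # h x L x p window
            results.append(process(cube))
    return results

# 20 synthetic columns of height 4 with 3 bands, window of width 5.
cols = [np.full((4, 3), float(t)) for t in range(20)]
shapes = sequential_process(cols, L=5, process=lambda cube: cube.shape)
```

In the real system, `process` would run the normalization, dimension reduction, and classification steps described above on each window.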

Experimental results
We collect the hyperspectral data with our Gaia pushbroom system. The images are acquired in the 400-1000 nm wavelength range, with a spectral resolution of 7 nm, for a total of 128 bands (p = 128). The image resolution is h = 650 by w = 348 pixels (650 × 348). The hyperspectral data include four kinds of fruits: tomato, jujube, lemon, and orange.
In this study, we use a sliding window of size 15 for online processing of data. Twenty-three sequential hyperspectral images are extracted for classification. The datasets captured are divided into training and testing sets, where 300 pixels of each material are used for training and 30,603 pixels are used for testing.
After data preprocessing, we select 300 pixels of each material from the training set as sample points to form a hyperspectral image. The pixels of this image are unfolded, row by row, into vectors to form a two-dimensional matrix, which is used for dimension reduction. The test set is processed in the same way as the training set.
After the PCA transformation, the eigenvalue distribution is shown in Figure 5. This scree plot shows that the first eight components explain most of the variability; the remaining components explain a very small proportion and are likely unimportant. We select the principal components that account for 99% of the total eigenvalue sum as the data after dimensionality reduction. For fuzzy-set data reduction, we fold the 128 bands with a triangular window of length 32 and sample every 16 points, so that the data dimension also reduces to 8. We use the eight-connected component labeling method to remove the background from the data after dimension reduction.
We then study the classification results of GML and SVM. We classify the data obtained after dimension reduction and background removal (see Figures 6 and 7). The result of classification with spatial information (connected region labeling) is shown in Figure 8.

Confusion matrix
A confusion matrix is a table that is often used to describe the performance of a classifier on a set of test data for which the true values are known. It compares the classification result with a reference image in which the true label of each point is known. The confusion matrices of our experiment are shown in Table 2.
In the confusion matrix, m_ij is the number of pixels that belong to class i but are assigned to class j, and k is the number of classes in the classification results (Figures 9 and 10). The κ coefficient is computed by the following equation:

κ = ( N Σ_{i=1}^{k} m_ii − Σ_{i=1}^{k} m_i+ m_+i ) / ( N² − Σ_{i=1}^{k} m_i+ m_+i ),

where m_i+ is the sum of row i of the confusion matrix, m_+i is the sum of column i, and N is the total number of pixels.
The κ of GML based on PCA dimensionality reduction is 98.93%, and that of SVM is 92.55%. The κ of GML based on the fuzzy-set reduction technique is 98.56%, and that of SVM is 74.69%. From these results, we can see that classification based on PCA is better than that based on fuzzy sets, GML is better than SVM, and GML based on PCA is the best method for sequential classification of hyperspectral images.
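The κ computation reduces to a few array operations; the sketch below is our own (function name and the toy matrices are illustrative), using the standard observed-versus-chance agreement form of the equation above:

```python
import numpy as np

def kappa_coefficient(M):
    """Cohen's kappa from a confusion matrix M (rows: reference classes,
    columns: assigned classes)."""
    M = np.asarray(M, dtype=np.float64)
    N = M.sum()
    observed = np.trace(M) / N                            # overall agreement
    expected = np.sum(M.sum(axis=1) * M.sum(axis=0)) / N**2  # chance agreement
    return (observed - expected) / (1.0 - expected)

# A perfect classifier and a slightly noisy one, two classes each.
k_perfect = kappa_coefficient([[5, 0], [0, 5]])
k_mixed = kappa_coefficient([[4, 1], [1, 4]])
```

Unlike raw accuracy, κ discounts the agreement expected by chance, which is why it is a stricter summary of the confusion matrix.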

Other metrics
Other metrics include the overall classification accuracy, the producer's accuracy (PA), the user's accuracy (UA), omission errors (OEs), and commission errors (CEs). The classification accuracy indicates the overall correct rate of the classifier, as illustrated in Eq. (16).

PA indicates the rate at which reference pixels of a class are correctly classified, as illustrated in Eq. (17). UA indicates the ratio of the pixels correctly assigned to class i to the total number of pixels assigned to class i, as shown in Eq. (18).

OEs represent the fraction of pixels of class i that are incorrectly assigned to other classes, as shown in Eq. (19). CEs indicate the fraction of pixels of other classes that are incorrectly assigned to class i, as illustrated in Eq. (20).
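Under the convention used here (m_ij = reference class i assigned to class j, so rows hold reference totals and columns hold assigned totals), the metrics above can be sketched as follows; the function name and the toy matrix are our own illustration:

```python
import numpy as np

def accuracy_metrics(M):
    """Per-class metrics from a confusion matrix M with m_ij = reference
    class i assigned to class j. Returns (OA, PA, UA, OE, CE)."""
    M = np.asarray(M, dtype=np.float64)
    diag = np.diag(M)
    oa = diag.sum() / M.sum()        # overall classification accuracy
    pa = diag / M.sum(axis=1)        # producer's accuracy (per reference class)
    ua = diag / M.sum(axis=0)        # user's accuracy (per assigned class)
    return oa, pa, ua, 1.0 - pa, 1.0 - ua  # OE = 1 - PA, CE = 1 - UA

# Two-class example: 8/10 and 9/10 reference pixels correctly classified.
oa, pa, ua, oe, ce = accuracy_metrics([[8, 2], [1, 9]])
```

Omission and commission errors are simply the complements of producer's and user's accuracy, so all five metrics come from one pass over the confusion matrix.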
The classification accuracy of GML based on PCA dimensionality reduction is 99.26%, and that of SVM is 94.80%. The classification accuracy of GML based on the fuzzy-set reduction technique is 99.00%, and that of SVM is 82.03%. From this evaluation and Table 3, GML based on PCA dimensionality reduction is the proposed solution for sequential classification of hyperspectral images.

Conclusion
The major objective of this chapter is to build a sequential hyperspectral classification method for an industrial material sorting system. The hyperspectral images are captured by a pushbroom system in which the columns of the image are collected sequentially, one after another. PCA and fuzzy sets are used for data decorrelation. We study GML and SVM classification on the data obtained after dimension reduction and background removal and carry out a performance analysis. The results show that the accuracy of GML based on PCA dimensionality reduction is 99.26% and that of SVM is 94.80%; the accuracy of GML based on the fuzzy-set reduction technique is 99.00% and that of SVM is 82.03%. After combining the spatial and spectral information, the classification accuracy can reach 100%.
The designed framework shows advantages in terms of processing speed, efficiency, and accuracy. It may play an important role in industrial sorting of agricultural products, food, and industrial waste.