An Unsupervised Classification Method for Hyperspectral Remote Sensing Image Based on Spectral Data Mining

© 2012 Wen and Yang, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Introduction
Hyperspectral remote sensing is one of the most significant recent breakthroughs in remote sensing. It acquires images in a large number (usually more than 40) of narrow (typically 10 to 20 nm spectral resolution), contiguous spectral bands. This enables spectral information to be extracted at the pixel scale and yields data with sufficient spectral resolution for the direct recognition of materials that have diagnostic spectral features [1]. Classification methods for hyperspectral remote sensing data are usually divided into two categories [2]: subpixel classification techniques [3] and spectral matching techniques [4]. With the former, the image does not need to be atmospherically corrected. However, the high dimensionality of hyperspectral images leads to the curse of dimensionality and the Hughes phenomenon [5,6]: as the number of spectral bands increases, the sample size required for the training set grows exponentially. The usual remedy is to increase the sample size, which costs considerable human and material resources. Another simple and sometimes effective remedy is to reduce the dimensionality of the hyperspectral data, but some useful information is then lost. Furthermore, mixed pixels remain hard to resolve. With the latter, matched filtering has been used successfully to extract information from hyperspectral remote sensing images. It classifies a pixel by computing the similarity between the pixel spectrum and a reference spectrum. It needs no sample data, but the image must be atmospherically corrected beforehand. These methods rest on the hypothesis that the sensor dark current and the path radiance have been removed and that all spectra have been calibrated to apparent reflectance. This is only the ideal case: these effects are hard to remove completely, so atmospheric influence causes some errors, especially for ground objects with low reflectivity.
This chapter proposes an unsupervised classification method for hyperspectral remote sensing images. Using spectral data mining, it can effectively extract low-reflectivity ground objects, such as water or vegetation in shadowed areas, from hyperspectral remote sensing data. First, more than 40 endmembers are extracted from the hyperspectral image using the Pixel Purity Index (PPI), the spectral angle between each pixel spectrum and each endmember spectrum is calculated, and each pixel is assigned to the endmember class with the smallest spectral angle. Then, the endmember spectra are clustered using the K-means algorithm. Finally, pixels whose endmember classes fall in the same K-means cluster are combined into one class, and the final classification result is projected and output. The classification result agrees with the field data. The method produces an objective result without manual intervention, and it can be an efficient information extraction method for hyperspectral remote sensing data.

The study area
The study area is located in Heqing County, Yunnan Province, in southwest China (25º38'N-26º30'N; 99º58'E-100º15'E).

Remote sensing data
The image investigated in this chapter was acquired by the Hyperion sensor on board the EO-1 satellite on November 11, 2004. It covers the 0.4 to 2.5 micrometer spectral range with 242 spectral bands at roughly 30 m spatial resolution and 10 nm spectral resolution, over a 7.5 km wide swath from a 705 km orbit. The system has two grating spectrometers: one visible/near-infrared (VNIR) spectrometer (approximately 0.4-1.0 micrometers) and one short-wave infrared (SWIR) spectrometer (approximately 0.9-2.5 micrometers) (figure 2). Data are calibrated to radiance using both pre-mission and on-orbit measurements. Key Hyperion characteristics are discussed by Green et al. [8]. The image has a total of 242 bands, but only 198 bands are calibrated. Because of an overlap between the VNIR and SWIR focal planes, there are only 196 unique channels [8,9]. Because of water vapor absorption, some bands near 0.94, 1.38 and 1.87 micrometers are also unusable. The remaining 163 bands can be used for research. Some pre-processing steps are necessary before the image can be used. First, each bad pixel value in the original image was replaced by the mean of the values of the two pixels on either side of it. Then, the image was radiometrically corrected using the calibration coefficients. Finally, the image was atmospherically corrected using the FLAASH model [10]. Figure 3 and figure 4 show the spectra of different ground objects before and after atmospheric correction. After atmospheric correction, the shapes of the ground object spectra are similar to the shapes of standard laboratory spectra for the same ground object types.
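The bad-pixel replacement step can be illustrated with a minimal Python sketch. It assumes that bad pixels occur in isolated columns of a single image row and that the list of bad columns is already known; the function name, the edge handling, and the sample values are hypothetical and are not taken from the chapter.

```python
def repair_bad_pixels(row, bad_columns):
    """Replace each flagged value with the mean of its left and right neighbours."""
    repaired = list(row)
    last = len(row) - 1
    for col in bad_columns:
        # Assumption: at the image edge only one neighbour exists, so it is used twice.
        left = row[col - 1] if col > 0 else row[col + 1]
        right = row[col + 1] if col < last else row[col - 1]
        repaired[col] = (left + right) / 2.0
    return repaired


# Hypothetical radiance values for one image row; column 2 is a dropped sample.
row = [120.0, 118.0, 0.0, 122.0, 125.0]
fixed = repair_bad_pixels(row, bad_columns=[2])
```

If two adjacent columns were both bad, this sketch would average in a bad value, so a real pre-processing chain would need a wider neighbourhood for such cases.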

Spectral angle mapper (SAM)
The Spectral Angle Mapper (SAM) algorithm has been used successfully for matched filtering of hyperspectral remote sensing images [4,11-18]. It computes the "spectral angle" between the pixel spectrum and the endmember spectrum. When used on calibrated data, this technique is comparatively insensitive to illumination and albedo effects. Smaller angles represent closer matches to the reference spectrum. The result is the spectral angle in radians, computed using the following equation:

$$\alpha = \cos^{-1}\left( \frac{\sum_{i=1}^{m} t_i r_i}{\sqrt{\sum_{i=1}^{m} t_i^{2}}\,\sqrt{\sum_{i=1}^{m} r_i^{2}}} \right)$$

where m is the number of bands, t_i is the value of the pixel spectrum in band i, r_i is the value of the reference spectrum in band i, and α is the spectral angle. Usually, a constant threshold is assigned first. When α is lower than the constant threshold, the pixel spectrum and the reference spectrum are considered similar, and the pixel is assigned to the class of the reference spectrum.
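The spectral angle can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' implementation: the function name, the handling of all-zero spectra, and the five-band sample spectra are assumptions made for the example.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum."""
    dot = sum(t * r for t, r in zip(pixel, reference))
    norm_t = math.sqrt(sum(t * t for t in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_t == 0.0 or norm_r == 0.0:
        # Assumption: an all-zero spectrum is treated as maximally dissimilar.
        return math.pi / 2
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_t * norm_r)))
    return math.acos(cos_angle)


# Hypothetical five-band reflectance spectra.
sunlit = [0.05, 0.08, 0.04, 0.45, 0.30]
shadowed = [0.4 * v for v in sunlit]      # same shape, lower brightness
water = [0.06, 0.05, 0.03, 0.01, 0.01]

angle_shadow = spectral_angle(shadowed, sunlit)
angle_water = spectral_angle(water, sunlit)
```

Scaling a spectrum by a constant leaves the angle essentially at zero, which is why the measure is comparatively insensitive to illumination, while a spectrum with a different shape gives a large angle.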

Extracting endmembers
The reference spectra can be selected from spectral libraries, acquired with a handheld spectroradiometer, or extracted from the image itself. The most common technique is to extract the reference spectra from the image, because the endmembers are then collected under the same atmospheric conditions and at the same pixel scale as the data to be classified. A variety of methods have been used to find endmembers in multispectral and hyperspectral images. Iterated Constrained Endmembers (ICE) is an automated statistical method for extracting endmembers from hyperspectral images [19]. The method in [20] finds a unique set of purest pixels based on the geometry of convex sets. The Pixel Purity Index (PPI) is probably the most widely used algorithm [21]. In this chapter, PPI was used to find the most spectrally pure pixels in the hyperspectral image as reference spectra. First, dimensionality analysis and noise whitening were applied to the image using the Minimum Noise Fraction (MNF) transform [22,23]. Then, the data were repeatedly projected onto random unit vectors, and the total number of times each pixel was marked as an extreme pixel was recorded. Finally, the purest pixels in the scene were identified from these counts. In this chapter, 48 endmember spectra were extracted from the hyperspectral image.
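The projection-and-counting step of PPI can be sketched as follows. This is a simplified illustration under stated assumptions: the MNF transform is omitted, only the single minimum and maximum of each projection are counted as extreme, and the function name, parameters, and three-band toy spectra are hypothetical.

```python
import math
import random


def pixel_purity_counts(spectra, n_projections=500, seed=0):
    """Count how often each spectrum is an extreme of a random projection."""
    rng = random.Random(seed)
    n_bands = len(spectra[0])
    counts = [0] * len(spectra)
    for _ in range(n_projections):
        # Random direction: Gaussian components normalised to unit length.
        vec = [rng.gauss(0.0, 1.0) for _ in range(n_bands)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        proj = [sum(s * v for s, v in zip(spec, vec)) for spec in spectra]
        # The pixels at both ends of the projection are marked as extreme.
        counts[proj.index(min(proj))] += 1
        counts[proj.index(max(proj))] += 1
    return counts


# Toy scene: three pure spectra followed by three linear mixtures of them.
pure = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mixed = [[0.5, 0.5, 0.0], [0.3, 0.3, 0.4], [0.2, 0.6, 0.2]]
counts = pixel_purity_counts(pure + mixed, n_projections=200)
```

Because a mixture is a convex combination of the pure spectra, its projection always lies between theirs, so only the pure spectra accumulate counts. Pixels with the highest counts would then be taken as endmember candidates.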
SAM was used to match each pixel spectrum to the 48 endmembers. Figure 5(b) shows the classification result obtained with the constant threshold. Different colors represent different classes, and black represents unclassified pixels. A comparison of figure 5(a) with figure 5(b) shows that most water and most vegetation in shadowed regions are left unclassified, so some vegetation and water information is lost. One reason is that the radiance of low-reflectivity ground objects, such as water and vegetation in shadowed areas, is severely weakened by atmospheric influence before it reaches the satellite. As figure 4 shows, the digital number values of vegetation in unshadowed areas are two times higher than those in shadowed regions. After atmospheric correction, the reflectance of vegetation in shadowed areas is clearly lower than in unshadowed regions, and the reflectance in some spectral ranges (0.46-0.68 and 1.98-2.37 micrometers) is near zero. However, the shapes of the vegetation spectra in shadowed and unshadowed areas are similar. It is therefore possible to identify low-reflectance ground objects with the SAM algorithm, because it is relatively insensitive to illumination and albedo effects, but the constant threshold is not suitable. In this chapter, 48 endmember spectra were used as reference spectra, so the hypothesis that the 48 endmember spectra include all land cover types is reasonable. The spectral angle between every pixel and every endmember was calculated, and each pixel was assigned to the class with the smallest spectral angle. Figure 5(c) shows the classification result obtained with this adjustable threshold. Figure 5(b) and figure 5(c) are the same, except that the pixels left unclassified in figure 5(b) are assigned to a land cover type in figure 5(c). This method improves on the constant-threshold SAM classification result, so it is more effective than using a constant threshold.
Figure 5(c) contains many classes, and some of them may belong to the same land cover type. In this chapter, these classes were clustered using the K-means algorithm [24], which is a straightforward and effective algorithm for finding clusters in data. It partitions the data, based on their features, into k groups, each represented by a centroid. The initial centroids should be placed as far from each other as possible; each point in the data set is then associated with the nearest centroid. The algorithm proceeds as follows. First, the number of classes k into which the data should be partitioned is specified, and k records are randomly chosen as the initial cluster centers. Then, for each record, the nearest cluster center is found. For each of the k clusters, the cluster centroid is computed, and the location of the cluster center is updated to the new value of the centroid. These steps are repeated until convergence or termination. In this chapter, the endmember spectra were clustered using the K-means algorithm, and five final spectral classes were output. Then, the classes of the adjustable-threshold classification result were merged according to the K-means result. The final classification result is shown in figure 6. The classification result agrees with the field data.
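The difference between the constant threshold and the smallest-angle assignment can be shown with a short Python sketch. The threshold of 0.10 rad, the use of -1 for unclassified pixels, and the five-band spectra are hypothetical values chosen for illustration, not values reported in the chapter.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two non-zero spectra."""
    dot = sum(t * r for t, r in zip(pixel, reference))
    norm_t = math.sqrt(sum(t * t for t in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    return math.acos(max(-1.0, min(1.0, dot / (norm_t * norm_r))))


def classify(pixel, endmembers, threshold=None):
    """Return the index of the closest endmember, or -1 if it fails the threshold."""
    angles = [spectral_angle(pixel, e) for e in endmembers]
    best = min(range(len(angles)), key=angles.__getitem__)
    if threshold is not None and angles[best] > threshold:
        return -1  # constant-threshold SAM leaves the pixel unclassified
    return best


# Hypothetical endmembers: index 0 = vegetation, index 1 = water.
endmembers = [[0.05, 0.08, 0.04, 0.45, 0.30],
              [0.06, 0.05, 0.03, 0.01, 0.01]]
# Hypothetical shadowed vegetation pixel: dark and slightly distorted in shape.
shadowed_vegetation = [0.02, 0.025, 0.015, 0.09, 0.05]

with_threshold = classify(shadowed_vegetation, endmembers, threshold=0.10)
smallest_angle = classify(shadowed_vegetation, endmembers)
```

In this example the pixel is about 0.17 rad from the vegetation endmember, so a 0.10 rad threshold rejects it, while the smallest-angle rule still assigns it to vegetation. The smallest-angle rule never leaves a pixel unclassified, which is only reasonable if the endmember set covers all land cover types in the scene.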
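The clustering and merging steps can be sketched in plain Python. This is a minimal illustration under assumptions: Euclidean distance on raw spectra, random initial centers from a fixed seed, six hypothetical three-band endmembers instead of 48, and k = 2 instead of 5. The function names and the small label map are invented for the example.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Return a cluster label for each point using Lloyd's K-means iteration."""
    rng = random.Random(seed)
    # k records are randomly chosen as the initial cluster centers.
    centers = [list(points[i]) for i in rng.sample(range(len(points)), k)]
    labels = []
    for _ in range(n_iter):
        # Assign every record to its nearest cluster center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Move each center to the centroid of its members (empty clusters keep theirs).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


# Hypothetical endmembers: 0-2 are vegetation-like, 3-5 are water-like.
endmember_spectra = [[0.05, 0.45, 0.30], [0.06, 0.50, 0.28], [0.04, 0.40, 0.32],
                     [0.06, 0.02, 0.01], [0.05, 0.03, 0.01], [0.07, 0.02, 0.02]]
groups = kmeans(endmember_spectra, k=2)

# Per-pixel endmember indices from the smallest-angle classification (toy map).
endmember_map = [[0, 1, 3], [2, 4, 5]]
merged_map = [[groups[e] for e in row] for row in endmember_map]
```

Only the endmember spectra are clustered, not the pixels, so the merge is a simple lookup from endmember index to cluster label. The numeric cluster labels depend on the random initialization, so they carry no meaning beyond grouping.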

Results and discussions
Matched filtering is a mature technique for spectral classification of hyperspectral remote sensing images. However, because of atmospheric effects, it is hard to extract low-reflectivity ground objects with it. This chapter proposed an unsupervised classification method. First, the hyperspectral remote sensing image was atmospherically corrected. Accurate atmospheric correction is the key to the classification. Then, endmember spectra were extracted using the PPI algorithm, and the image was classified using SAM. Traditionally, the SAM algorithm uses a constant threshold. This chapter instead used an adjustable threshold, in which each pixel is assigned to the class with the smallest spectral angle. Finally, the endmember spectra were clustered using the K-means algorithm, and the classes were combined according to the K-means result. The final classification map was projected and output. It is an effective classification method, especially for hyperspectral remote sensing images. Users can also adjust the number of endmembers and the number of classes according to their applications.