Land Cover/Land Use Mapping Using Soft Computing Techniques with Optimized Features

This chapter discusses soft computing techniques for solving complex computational tasks. It highlights fuzzy logic, genetic algorithms, artificial neural networks, and machine learning. Classifying remotely sensed images is a tedious task, so we explain how these soft computing techniques can be used for image classification. Image classification depends mainly on the feature extraction process, because efficiently extracted features improve classification accuracy. Hence, the different kinds of features and the methods for extracting them are explained. The best of the extracted features are selected using a genetic algorithm. Various algorithms are shown and compared. Finally, the application of the results is illustrated with a hypothetical case study.


Introduction
A remote sensing (RS) image has millions of details hidden in it, and the interpretation of RS images leads to a variety of improvements in our daily life. Since a single RS image covers a large area, great care has to be taken in handling each pixel [25,41,42,61]. Feature extraction also plays an important role: using the extracted features, a particular pixel can be classified easily [32,33,46,55,56]. Deciding which features to extract is important, and the decision has to be based on the application and the type of image.
The classified output has several uses in civil engineering. It is also useful in planning for large airports, industrial estates, and harbors and the construction of dams, bridges, and pipelines. It also provides valuable data for the process and design of roads and highways. The application areas also extend to extracting building footprints, detecting roads, and outlining urban changes from a pair of images taken at different dates. It also extends to the field of forest investigation, water management, and disaster management.
Similarly, the interpretation of RS images has many applications [34]. They include the study of forests, where investigating the landscape of a forest area can help to avoid deforestation and degradation. Forest land cover describes the physiographical characteristics of the environment, from bare rock to tropical forest, so classifying it leads to an understanding of the variety and type of land cover. Another important benefit of forest land cover mapping is the identification of very specific habitats and of the distribution of both individual species and species assemblies. In urban planning, year-wise RS images are analyzed to find out whether settlement is growing in the right place. When planning the use of an urban area, the government can work from the RS image, which makes it easier to plan road construction, water pipelines, and power supply connections. If urban settlement is expanding into a vegetation area, this should be addressed and construction should be directed to other areas.
RS images are also used in water management systems to clearly display sediment pollution and oil spills over water bodies and to help monitor the quality of water resources. They are also used in disaster management. In the case of natural disasters, risk-prone areas are detected and risk management is undertaken. When a sudden natural disaster happens, it is difficult for humans to collect data at that moment, and RS technology makes it possible to handle the situation.
The application area also covers hazard management. Water-related natural hazards occur due to a number of factors, such as structure, drainage, slope, land use, and road network, and these factors must be taken into account when assessing a region's instability and potential hazard risks. This is essential because proper hazard management helps us take timely measures to prevent flooding and subsequent landslides.
The chapter is organized in the following way. Section 2 explains the feature extraction process, Section 3 explains feature subset selection, Section 4 explains feature classification, and Section 5 concludes the chapter.

Feature extraction
Features are very important for classifying or segmenting the different objects in a digital image, and texture is one such important feature. Texture is useful because it can be expressed in terms of smoothness, coarseness, fineness, linearity, granularity, and randomness. Texture analysis requires the identification of features that differentiate the textures for classification, segmentation, and recognition [17-19, 22, 23, 26, 35-37, 43]. Scale is another important property of texture: the appearance of a texture changes when it is viewed at different resolutions. Remotely sensed images are commonly analyzed using gray-level co-occurrence features and features extracted from Gabor filters, and many other feature extraction methods exist.

Extraction of features using wavelet packet transform
The main reason for using wavelet-based multi-resolution analysis [7-10, 12, 27,29,30,39] in remote sensing is that the resolution of remotely sensed imagery often differs from case to case, and it is important to understand how information changes over different scales of imagery.
The work in [1] proposed a system in which statistical and co-occurrence features of the input patterns are first extracted and then used for classification [11-13, 20, 38, 48]. The continuous wavelet transform of a 1-D signal $f(x)$ is defined using Eq. (1):

$$(W_a f)(b) = \int f(x)\, \psi^{*}_{a,b}(x)\, dx \tag{1}$$

where the wavelet $\psi_{a,b}$ is obtained from the mother wavelet $\psi$ by translation and dilation, as in Eq. (2):

$$\psi_{a,b}(x) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{x-b}{a}\right) \tag{2}$$

The mother wavelet $\psi$ has to satisfy the admissibility criterion to ensure that it is a localized zero-mean function [39]. Typically, some more constraints are imposed on $\psi$ to ensure that the transform is non-redundant and complete and constitutes a multi-resolution representation of the original signal. This results in an efficient real-space implementation of the transform using quadrature mirror filters. The image is convolved with the filters: the result of the low-pass filter is called the approximation image, and the results of the high-pass filter in specific directions are called detail images. At the first level, the image is split into an approximation image and detail images. The approximation is then itself split into a second level of approximation and details. For an n-level decomposition, the signal decomposition can be represented using Eq. (3):

$$\begin{aligned}
A_n &= \left[H_x * \left[H_y * A_{n-1}\right]_{\downarrow 2,1}\right]_{\downarrow 1,2} \\
D_{n1} &= \left[H_x * \left[G_y * A_{n-1}\right]_{\downarrow 2,1}\right]_{\downarrow 1,2} \\
D_{n2} &= \left[G_x * \left[H_y * A_{n-1}\right]_{\downarrow 2,1}\right]_{\downarrow 1,2} \\
D_{n3} &= \left[G_x * \left[G_y * A_{n-1}\right]_{\downarrow 2,1}\right]_{\downarrow 1,2}
\end{aligned} \tag{3}$$

where "*" denotes the convolution operator, "↓2,1" ("↓1,2") denotes downsampling along the rows (columns), $A_0 = I(x, y)$ is the original image, $H$ and $G$ are the low-pass and high-pass filters, respectively, and the subscripts $x$ and $y$ indicate the direction in which a filter is applied. $A_n$ is obtained by low-pass filtering and is the approximation image at scale n. The detail images $D_{ni}$ are obtained by band-pass filtering in a specific direction (i = 1, 2, 3 for the vertical, horizontal, and diagonal directions, respectively) and thus contain directional detail information at scale n. The original image $I$ is thus represented by a set of subimages at several scales: $\{A_n, D_{ni}\}$.
The wavelet packet decomposition offers a richer signal analysis. Here, both the detail images and the approximation image are split at every level, which results in a wavelet decomposition tree. The information in the detail images is helpful for texture analysis and discrimination, so the features derived from the detail images are used to characterize a texture. The remainder of this section discusses how the features from the wavelet-transformed image are used for classification.
The choice of filter and its order may vary with the application. Here, two levels of wavelet packet decomposition are carried out with different wavelet families, as shown in Figure 1. There is no need for a deeper decomposition because, after the second level, the images become too small and no more valuable information is obtained. The second level of decomposition produces sixteen wavelet coefficient matrices containing texture information.
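The two-level packet decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration only: it assumes the Haar filter pair for simplicity (the chapter itself uses DB2, Sym2, Coif2, and Bi-or2.2), and the function names `haar_split` and `wavelet_packet` are hypothetical.

```python
import numpy as np

def haar_split(img):
    """One level of 2-D Haar analysis; returns four half-size subbands."""
    s = np.sqrt(2.0)
    # Low-pass / high-pass along each row, keeping every second sample
    lo = (img[:, 0::2] + img[:, 1::2]) / s
    hi = (img[:, 0::2] - img[:, 1::2]) / s
    # Low-pass / high-pass along each column, keeping every second sample
    a = (lo[0::2, :] + lo[1::2, :]) / s    # approximation
    d1 = (lo[0::2, :] - lo[1::2, :]) / s   # detail subband 1
    d2 = (hi[0::2, :] + hi[1::2, :]) / s   # detail subband 2
    d3 = (hi[0::2, :] - hi[1::2, :]) / s   # detail subband 3
    return a, d1, d2, d3

def wavelet_packet(img, levels=2):
    """Packet decomposition: every subband is split again at each level."""
    bands = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        bands = [sub for band in bands for sub in haar_split(band)]
    return bands

rng = np.random.default_rng(0)
image = rng.random((64, 64))             # stand-in for one image band
subbands = wavelet_packet(image, levels=2)
print(len(subbands), subbands[0].shape)  # 16 (16, 16)
```

Each level halves the side length, which is why a third level on a small tile leaves very few coefficients per subband.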
In texture training, the known texture images are decomposed using the discrete wavelet packet decomposition (DWPD). To create the feature database, a set of wavelet packet statistical features (WPSF), such as mean and standard deviation, is calculated from the original image, and a set of wavelet packet co-occurrence features and the spectral feature NDVI (normalized difference vegetation index) are calculated using Eqs. (4)-(11) and Eq. (12), respectively. These features are saved for further use in texture classification.
The input Madurai LISS IV image is shown in Figure 2. The procedure for classification is explained later in the chapter, but the results are presented here for better understanding. The LISS IV Madurai image is classified using the wavelet filters Daubechies (DB2), symlet (Sym2), Coiflet (Coif2), and biorthogonal (Bi-or2.2), and the results are shown in Figure 3(a)-(e).
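As a rough illustration of the three feature families named above, the sketch below computes mean and standard deviation, the standard NDVI ratio (NIR − Red) / (NIR + Red), and two common co-occurrence measures. Contrast and energy are chosen here only as examples and are not necessarily the features of Eqs. (4)-(11); all function names are hypothetical.

```python
import numpy as np

def stat_features(band):
    """Statistical features of a subband: (mean, standard deviation)."""
    return float(np.mean(band)), float(np.std(band))

def ndvi(nir, red, eps=1e-12):
    """NDVI = (NIR - Red) / (NIR + Red); eps guards against division by zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

def glcm(q, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix of an integer-quantized image q
    for the pixel offset (dy, dx)."""
    h, w = q.shape
    src = q[0:h - dy, 0:w - dx]
    dst = q[dy:h, dx:w]
    m = np.zeros((levels, levels))
    np.add.at(m, (src.ravel(), dst.ravel()), 1)  # count gray-level pairs
    return m / m.sum()

def glcm_features(p):
    """Two illustrative co-occurrence features: (contrast, energy)."""
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))
    energy = float(np.sum(p ** 2))
    return contrast, energy

rng = np.random.default_rng(0)
quantized = rng.integers(0, 8, size=(32, 32))  # toy 8-level image
features = stat_features(quantized) + glcm_features(glcm(quantized, levels=8))
```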

Extraction of deep features
Deep feature learning plays an important role in image classification. A convolutional neural network (CNN) is used to extract different features automatically [2]. The architecture of the CNN is shown in Figure 4. In the convolution layer, features are extracted by applying different filters to the input image. The ReLU layer processes the output of the convolutional layer by setting negative values to zero while leaving the dimensions of the matrix unchanged. Pooling retains the most important information while reducing the size of the feature map. The same processes are applied to each training sample, resulting in different feature sets.
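The three operations just described can be sketched with plain NumPy. This is a single-channel, single-filter toy and not the network of Figure 4; the filter values and function names are assumptions made for illustration.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide the kernel over the image (cross-correlation, as is usual in
    CNNs) without padding."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Set negative values to zero; the shape is unchanged."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Keep the largest value in each non-overlapping 2 x 2 window."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # simple horizontal-gradient filter
feature_map = relu(conv2d_valid(img, kernel))
pooled = max_pool2x2(feature_map)
print(feature_map.shape, pooled.shape)      # (4, 4) (2, 2)
```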

Feature subset selection
Feature subset selection is the process of selecting, from all available features, those that are most useful for a particular classification problem. The most popular method for feature reduction in remote sensing is the principal components transform [6]. The principal components (PC) transformation maps the original data onto a new, smaller set of variables that are mutually uncorrelated. A reduced number of new variables therefore represents the information content of the original set. However, although frequently used, the PC transform is not appropriate for feature extraction in classification, because it considers only the data set as a whole and not the classes of interest. It may therefore fail to produce the optimum subspace for classification. For this reason, we use a genetic algorithm (GA) for feature subset selection [3, 49-54, 57-60, 63-66].
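For comparison with the GA approach, a PC transform can be sketched as an eigen-decomposition of the feature covariance matrix. The function name `pca_transform` and the toy data are assumptions; note that no class labels enter the computation, which is the limitation pointed out above.

```python
import numpy as np

def pca_transform(X, k):
    """Project the rows of X (samples x features) onto the k leading
    principal components; returns the projected data and their variances."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]     # indices of the k largest
    return Xc @ vecs[:, order], vals[order]

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
# Toy data: four strongly correlated features built from two latent ones
X = np.column_stack([base[:, 0], base[:, 0] + 0.1 * base[:, 1],
                     base[:, 1], base[:, 1] - 0.1 * base[:, 0]])
Z, variances = pca_transform(X, k=2)
```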

Genetic algorithms
Computational studies of Darwinian evolution and natural selection have led to numerous models for computer optimization. GAs comprise a subset of these evolution-based optimization techniques, focusing on the application of selection, mutation, and recombination to a population of competing problem solutions. Because a GA performs a directed search rather than an exhaustive one, population members cluster near good solutions; however, the GA's stochastic component does not rule out wildly different solutions, which may turn out to be better. This has the benefit that, given enough time and a well-bounded problem, the algorithm can find a global optimum. GAs are therefore well suited to feature selection problems, and they can find near-optimum solutions using little or no prior knowledge.
There are three major design decisions to consider when implementing a GA to solve a particular problem. A representation for candidate solutions must be chosen and encoded on the GA chromosome; an objective (fitness) function must be specified to evaluate the quality of each candidate solution; and finally the GA run parameters must be specified, including which genetic operators to use, such as crossover, mutation, and selection, and their probabilities of occurrence. The fitness-dependent selection and the application of genetic operators to generate successive generations of individuals are repeated many times until a satisfactory solution is found.
In the problem of feature selection, feature subsets are represented as binary strings, where a value of 1 represents the inclusion of a particular feature in the training process and a value of 0 represents its absence. Since a chromosome is represented as a binary string, the genetic algorithm operates on a pool of binary strings. The mutation and crossover operators work in the following way. Mutation operates on a single string and generally changes a bit at random. Thus, the string 10010 may, as a consequence of random mutation, be changed to 10110. Crossover on two parent strings produces two offspring. With a randomly chosen crossover position of 2, the two strings 01101 and 11000 yield the offspring 01000 and 11101.
If the feature set X obtained using the wavelet-based technique contains 45 features for each pixel of an image of size 400 × 400 pixels, then X has dimension 160,000 × 45, where each row holds the features of the respective pixel. Using GA, the feature set X of size 45 × 400 × 400 is mapped into a new feature set Y of size 17 × 400 × 400. This reduction of the feature set improves the overall execution speed and the classification accuracy [52]. The classification results (for both the full feature set and the optimal feature set) are shown in Figure 5.
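The operators above, and a small GA loop built from them, can be sketched as follows. This is one possible arrangement under stated assumptions: tournament selection, one-point crossover, and the parameter values are illustrative choices, and the toy fitness function merely stands in for the classification accuracy that would be used in practice.

```python
import random

def mutate(bits, pos):
    """Flip the bit at position pos of a binary string."""
    flipped = "1" if bits[pos] == "0" else "0"
    return bits[:pos] + flipped + bits[pos + 1:]

def crossover(a, b, point):
    """One-point crossover: swap the tails of two parents after 'point'."""
    return a[:point] + b[point:], b[:point] + a[point:]

print(mutate("10010", 2))            # 10110
print(crossover("01101", "11000", 2))  # ('01000', '11101')

def run_ga(fitness, n_features, pop_size=20, generations=30,
           p_cross=0.8, p_mut=0.02, seed=0):
    """Return the best feature mask (binary string) seen during the run."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(n_features))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            # Tournament selection: the fitter of two random individuals
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            if rng.random() < p_cross:
                p1, p2 = crossover(p1, p2, rng.randrange(1, n_features))
            for child in (p1, p2):
                for pos in range(n_features):
                    if rng.random() < p_mut:
                        child = mutate(child, pos)
                nxt.append(child)
        pop = nxt[:pop_size]
        best = max(pop + [best], key=fitness)
    return best

USEFUL = {0, 3, 7}  # hypothetical: pretend only these features help

def toy_fitness(bits):
    """Reward useful features and penalize the size of the subset."""
    return sum(bits[i] == "1" for i in USEFUL) - 0.1 * bits.count("1")

best_mask = run_ga(toy_fitness, n_features=10)
```

A 1 in `best_mask` marks a retained feature, so the reduced feature matrix would be obtained by keeping only the corresponding columns of X.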
The accuracy assessments are made using accuracy indices, namely, overall accuracy, producer accuracy, user accuracy, and kappa coefficient and are listed in Table 1.
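The four indices can be computed from a confusion matrix as sketched below. The layout is an assumption (rows are classified labels, columns are reference labels), and the 2 × 2 matrix is made-up toy data, not a result from Table 1.

```python
import numpy as np

def accuracy_indices(cm):
    """Overall, producer's, and user's accuracy plus the kappa coefficient.
    Assumed layout: rows = classified labels, columns = reference labels."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    overall = diag.sum() / n
    producer = diag / cm.sum(axis=0)   # correct / reference total per class
    user = diag / cm.sum(axis=1)       # correct / classified total per class
    # Agreement expected by chance, from the row and column marginals
    chance = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n ** 2
    kappa = (overall - chance) / (1.0 - chance)
    return overall, producer, user, kappa

toy_cm = [[45, 5],
          [10, 40]]
overall, producer, user, kappa = accuracy_indices(toy_cm)
```

For this toy matrix the overall accuracy is 0.85 and kappa is 0.7, since the chance agreement is 0.5.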

Classification
Classification is carried out using the features obtained so far. A classifier is an algorithm that maps the input data to a specified category, and we use several different classifiers.

Classification based on Mahalanobis distance
In this method, the test texture image is decomposed using DWPD. In the same manner as in training, a set of features is obtained and compared with the stored feature values. Let C denote the set of texture classes and $m_c$ the mean signature of class c. The Mahalanobis distance between the feature vector $x$ of the test image and class c is given by

$$D(c) = \sqrt{(x - m_c)^{T}\, \Sigma_c^{-1}\, (x - m_c)}$$

where $\Sigma_c$ is the covariance matrix of the features of class c. If the distance D(i) is the smallest, then the test texture image is classified as the ith texture [15,21,47]. Features are obtained using several wavelet filters, and each set is then passed to the classification process [14, 28,44]. The overall, user, producer, and kappa accuracy indices obtained for the different wavelet filters, presented in Table 2, show that the DB2 wavelet filter gives better results than the other wavelet filters. Thus, the DB2 wavelet filter is the most useful of these for land cover/land use mapping.
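A minimum-distance classifier of this kind can be sketched as follows. The class labels, the toy training data, and the use of a per-class covariance matrix are assumptions made for illustration; the squared distance is used because it gives the same ranking as the distance itself.

```python
import numpy as np

def fit_classes(train):
    """train: dict mapping a class label to an (n_samples, n_features) array.
    Stores the mean signature and inverse covariance of each class."""
    model = {}
    for label, X in train.items():
        mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        model[label] = (mean, np.linalg.pinv(cov))  # pinv tolerates singular cov
    return model

def classify(x, model):
    """Assign x to the class with the smallest (squared) Mahalanobis distance."""
    def sq_dist(label):
        mean, inv_cov = model[label]
        d = x - mean
        return float(d @ inv_cov @ d)
    return min(model, key=sq_dist)

rng = np.random.default_rng(0)
train = {
    "water": rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    "urban": rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2)),
}
model = fit_classes(train)
label = classify(np.array([0.1, -0.2]), model)
```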

Classification based on adaptive neuro-fuzzy inference system (ANFIS)
The adaptive network-based fuzzy inference system (ANFIS) is a useful neural network approach for solving function approximation problems [4, 31,40,45,62]. ANFIS uses the mapping between the input and output data to determine the optimal distribution of membership functions. Its architecture combines an artificial neural network (ANN) and fuzzy logic (FL). The system consists of five layers, each containing several nodes that are described by a node function. The inputs of each layer are obtained from the previous layer. For example, consider a system with two inputs (x, y) and one output (z). The rule base contains fuzzy if-then rules. Thus, the two rules are:
• Rule 1: If x is A1 and y is B1, then z is f1(x, y).
• Rule 2: If x is A2 and y is B2, then z is f2(x, y).
where x and y are the inputs, Ai and Bi are the fuzzy sets, and fi(x, y) is the output of rule i. The feature extraction is done using the DB2 wavelet filter, and the optimum features are obtained using GA [16]. The classification is then done using GA with a neural network and GA with ANFIS, and the results are shown in Figure 6. The classified output shows that the combination of GA and ANFIS gives the better classification.
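The forward pass through the five layers for the two rules above can be sketched as follows. Several choices here are assumptions and not taken from the chapter: Gaussian membership functions, the product as the AND operator, and first-order (linear) rule outputs fi(x, y) = pi·x + qi·y + ri. All parameter values are made up.

```python
import numpy as np

def gauss(v, center, sigma):
    """Gaussian membership grade of v in a fuzzy set."""
    return np.exp(-0.5 * ((v - center) / sigma) ** 2)

def anfis_forward(x, y, mf_x, mf_y, consequents):
    """mf_x, mf_y: (center, sigma) of A_i and B_i for each rule.
    consequents: (p, q, r) of each rule output f_i = p*x + q*y + r."""
    # Layer 1: membership grades of the inputs
    mu_a = [gauss(x, c, s) for c, s in mf_x]
    mu_b = [gauss(y, c, s) for c, s in mf_y]
    # Layer 2: firing strength of each rule (product as AND)
    w = np.array([a * b for a, b in zip(mu_a, mu_b)])
    # Layer 3: normalized firing strengths
    w_bar = w / w.sum()
    # Layer 4: rule outputs f_i(x, y)
    f = np.array([p * x + q * y + r for p, q, r in consequents])
    # Layer 5: overall output as the weighted sum
    return float(np.sum(w_bar * f))

z = anfis_forward(
    x=1.0, y=1.0,
    mf_x=[(0.0, 1.0), (2.0, 1.0)],              # A1, A2
    mf_y=[(0.0, 1.0), (2.0, 1.0)],              # B1, B2
    consequents=[(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)],
)
```

In this example both rules fire equally, so the output is the average of f1 = 2 and f2 = 3. Training would adjust the membership and consequent parameters, which this sketch does not cover.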

Classification using CNN
Classification using CNN is based on the deep features obtained in the training phase of the CNN [2,5,24]. During training, the network carries out the predefined processing with one or more layers. In a fully connected layer, every neuron is connected to every neuron of the previous layer. Softmax is the final layer, and it calculates a probability value for each class. The class with the highest probability becomes the output. After training, the system is able to classify an image automatically without human intervention. The classification is done for Vaihingen city, and the results are displayed in Figure 7.
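The last two stages described above, a fully connected layer followed by softmax, can be sketched as below. The weights, feature values, and class names are placeholders chosen for illustration, not values from the trained network.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))   # subtracting the maximum avoids overflow
    return e / e.sum()

def predict(features, W, b, class_names):
    """Fully connected layer followed by softmax; returns the most
    probable class and the full probability vector."""
    probs = softmax(W @ features + b)
    return class_names[int(np.argmax(probs))], probs

class_names = ["water", "vegetation", "built-up"]  # hypothetical classes
W = np.eye(3)                                      # placeholder weights
b = np.zeros(3)
label, probs = predict(np.array([0.2, 1.5, 0.1]), W, b, class_names)
```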

Classification using multilayer perceptron layer
The multilayer perceptron (MLP) layer performs the classification using the features from the wavelet layer. The training parameters of the MLP are shown in Table 3. These parameters, such as the number of hidden layers, the size of the hidden layers, the values of the momentum constant and learning rate, and the type of activation functions, were selected after several experiments to give the best performance.
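To show where these training parameters enter, the sketch below trains a one-hidden-layer MLP with gradient descent and momentum on toy two-class data. The hidden size, learning rate, momentum constant, and activation functions are placeholder values and not those of Table 3.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.05, momentum=0.9, epochs=300, seed=0):
    """Train a tanh/sigmoid MLP; returns a predict function and the loss history."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(0, 0.5, (X.shape[1], hidden)), np.zeros(hidden),
              rng.normal(0, 0.5, (hidden, 1)), np.zeros(1)]
    velocity = [np.zeros_like(p) for p in params]
    t = y.reshape(-1, 1).astype(float)
    losses = []

    def forward(inputs):
        W1, b1, W2, b2 = params
        h = np.tanh(inputs @ W1 + b1)                  # hidden layer
        return h, 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output

    for _ in range(epochs):
        h, p = forward(X)
        losses.append(float(-np.mean(t * np.log(p + 1e-12)
                                     + (1 - t) * np.log(1 - p + 1e-12))))
        # Backpropagation of the cross-entropy loss
        dz2 = (p - t) / len(X)
        dh = (dz2 @ params[2].T) * (1.0 - h ** 2)
        grads = [X.T @ dh, dh.sum(axis=0), h.T @ dz2, dz2.sum(axis=0)]
        # Momentum update: velocity accumulates past gradients
        for prm, vel, g in zip(params, velocity, grads):
            vel *= momentum
            vel -= lr * g
            prm += vel
    return (lambda inputs: forward(inputs)[1].ravel()), losses

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
predict_proba, losses = train_mlp(X, y)
accuracy = float(np.mean((predict_proba(X) > 0.5) == y))
```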

Genetic algorithm
In GA, choosing an unsuitable fitness function may affect the solution of the problem. Other parameters, such as population size, mutation rate, and crossover rate, also play an important role in reaching a solution. GA belongs to the class of non-deterministic algorithms, so the solution it returns may vary from run to run for the very same input data.

Convolutional neural network
A CNN requires a lot of training and a large amount of training data. Convolution is a computationally expensive operation, and the deeper the network, the longer its processing time.

ANFIS
Defining the membership function remains a difficult task.

Field survey
The results of the entire work are verified with the help of ground truth. Ground truthing is the on-site process in which a pixel on a satellite image is compared with what is there in reality, in order to verify the contents of that pixel on the image. For an image of size 400 × 400, we have taken 500 points as ground truth data. These data are collected by field visits, and the outcome of each method is verified against those points.
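The point-based check can be sketched as below. This is a simulation only: real ground truth exists just at the visited points, whereas here a full reference raster and random point selection are assumed so that the example is self-contained.

```python
import numpy as np

def sample_agreement(classified, reference, n_points=500, seed=0):
    """Compare a classified map with reference labels at n_points distinct
    pixel locations; returns the fraction that agree and the locations."""
    rng = np.random.default_rng(seed)
    flat = rng.choice(classified.size, size=n_points, replace=False)
    rows, cols = np.unravel_index(flat, classified.shape)
    agree = classified[rows, cols] == reference[rows, cols]
    return float(agree.mean()), rows, cols

rng = np.random.default_rng(42)
reference = rng.integers(0, 4, size=(400, 400))  # toy map with 4 classes
classified = reference.copy()
classified[:40, :] = 0                           # simulate errors in one strip
agreement, rows, cols = sample_agreement(classified, reference)
```

The label pairs at the sampled points are also what would fill a confusion matrix for the accuracy indices.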

A hypothetical case study
A hypothetical case study is presented to show the application of land cover/land use mapping in a real-life scenario. Assume that the XXX company wants to plan the construction of a production center in Madurai city. Establishing the production center requires a large area, so the unoccupied areas in the city have to be investigated. The center also emits toxic waste material, which should not be present in urban areas. The company's products are sent to other parts of the country and some are exported, so road routes also have to be checked. Initially, therefore, a site has to be selected and a plan made accordingly. To plan the construction, the company acquires a satellite image of Madurai city. The features are then obtained using the wavelet feature extraction method, and the classified output is obtained using adaptive neuro-fuzzy inference system classification. The classified image can be clearly understood and can be given to the construction planning team for further processing. In addition, fusing the PAN image and the MS image of Madurai city before classification would yield still better classification results.

Conclusion
This chapter focused on methods for obtaining accurate classification of RS images. It discussed various methods for feature extraction. After feature extraction, the number of features is reduced using feature subset selection: the best features are retained, and the features that contribute less are discarded. The optimal features are then used in the classification process, for which several techniques that give efficient results were discussed. The methods were implemented using a LISS IV image of Madurai city. The classified outputs are shown wherever necessary, and accuracy assessments are also calculated. Thus, the chapter gives an overall idea of how to handle RS images using optimal features.