Analysis of Land Cover Classification in Arid Environment: A Comparison Performance of Four Classifiers

An arid environment is a dry landscape or region that receives an extremely low amount of precipitation. Arid areas are found where vegetation cover is sparse to almost nonexistent. Almost one third of the Earth's land surface is arid or desert. Over desert areas, a number of land cover patterns can be observed. One example is given here for the Arabian Peninsula; the location of the study area is shown in Fig. 1. This pattern does not correlate with vegetation, because the area is extremely arid with little or no vegetation. Land cover is defined as the observed physical layer, including natural and planted vegetation and human constructions, that covers the surface of the Earth. Land cover classification is a tool that fills an important informational niche for natural resource managers, decision-makers, and stakeholders. It serves to categorize natural ecosystems, managed crops, and urban areas. In general, land cover classifications provide the elemental information needed to appraise the impact of human interactions with the environment and to assess the scientific foundations for sustainability, vulnerability and resilience of land systems and their use.

The Advanced Land Observing Satellite (ALOS) has been operating since January 24, 2006. The mission objectives of ALOS include cartography and disaster monitoring. In particular, geographical information such as elevation, topography, land use and land cover maps is necessary basic information in many practical applications and research areas. To achieve these objectives, ALOS carries three mission instruments: two optical instruments, the Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM) and the Advanced Visible and Near-Infrared Radiometer type 2 (AVNIR-2), and the Phased Array type L-band Synthetic Aperture Radar (PALSAR) (Tadono et al., 2009). This article is concerned only with the AVNIR-2 sensor. AVNIR-2 has four spectral bands with an instantaneous field of view (IFOV) of about 10 m, a field of view (FOV) of 70 km (7100 pixels), and a mechanical pointing function (by a moving mirror) along the cross-track direction (±44°) for effective global land observation (Murakami et al., 2009). One of the purposes of this sensor is to provide land cover and land-use classification maps for monitoring at regional levels. The instrument, however, does not have shortwave infrared (SWIR) capabilities (Wulder et al., 2008). Information on the sensor can be found in Table 1, while the ALOS satellite with its three instruments is shown in Fig. 2.

Classification methodology
The processing of the raw satellite data involved three stages: data pre-processing, image classification and data analysis, as shown in Fig. 3.

Image preprocessing
The application of raw remote sensing images for spatial analysis requires several preprocessing procedures. These procedures are used to subset the images from the original scene, to correct geometric distortion and to remove noise caused by sensor errors. In the sub-setting process, the larger original scene was cut down to a smaller image covering the desired area. Geometric correction was done using a second-order polynomial coordinate transformation to relate locations in the reference image to the equivalent row and column positions in the ALOS AVNIR-2 images. A total of 23 ground control points were used in this process, and an error of 0.45 pixel was obtained. A filtering procedure was then used to remove or reduce noise in the imagery. A 7x7 low-pass averaging filter was selected as the window to smooth the imagery. The filter computes the average of the pixel values under the selected window and replaces the central pixel with the new value.
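The averaging step can be illustrated with a minimal sketch. The function name `mean_filter` and the handling of image borders (averaging only the pixels that fall inside the image) are assumptions made for illustration; the chapter does not state how the software used treated the borders.

```python
def mean_filter(image, size=7):
    """Smooth a 2-D list of pixel values with a size x size averaging window.

    Border handling is an assumption: pixels near the edge are averaged
    over the part of the window that lies inside the image.
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    smoothed = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            # Visit every pixel under the window centred on (r, c)
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    total += image[rr][cc]
                    count += 1
            # Replace the central pixel with the window average
            smoothed[r][c] = total / count
    return smoothed
```

A larger window suppresses more noise but also blurs class boundaries, which is one reason the window size matters later when the classifiers are compared.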

Supervised image classification
The aim of the image classification process is to categorize all pixels in an image into their respective classes. There are two basic ways to perform the classification: supervised and unsupervised methods. Supervised classification requires a sufficient number of training pixels for each class to create a representative signature. In contrast, unsupervised or clustering methods require neither prior knowledge nor training sets to produce a classification map. The image is automatically divided into spectrally distinct classes that still need to be interpreted in terms of land cover classes (Han et al., 2004). According to Cihlar et al. (1998), supervised classification methods are more effective than unsupervised approaches in identifying complex land cover classes, provided that detailed a priori knowledge of the study area and good training data exist. Moreover, the classification results are influenced by a variety of factors, including the availability of remotely sensed data, landscape complexity, image band selection, the classification algorithm used, the analyst's knowledge of the study area, and the analyst's experience with the classifiers used (Lu et al., 2004). Different classifiers have their own advantages and disadvantages, and selecting the classifier best suited to the characteristics of a given study area can significantly improve the classification results. A comparative study of different classifiers is therefore necessary to understand which classifier is most suitable for a specific landscape. Hence, four classifiers, ranging from the simple minimum distance (MD) classifier to the complex neural network (NN), are analyzed in this article.
Image classification rests on the fact that the spectral signature of each pixel contains information on the physical characteristics of the observed materials underlying the pixel. By analyzing such information from satellite images, we can infer the type of material associated with that pixel. However, the major problem is that spectral non-homogeneity within a particular type of material or land cover makes the classification of land cover difficult (Ju et al., 2005). Taking into account the physical characteristics of Mecca city, we chose to classify the following land cover features: urban, mountain, land, vegetation, ritual area and shadow. Although the ritual area could be grouped under the urban class, the authors decided to treat it as a separate class because of its special character, so that it appears in the classified map. The ritual area, which includes a grand mosque and thousands of tents, is a holy area for Muslims, and pilgrims need to visit these places as part of their religious observance during the Hajj season. Similarly, although shadow is not a true land cover type and mostly appears in the mountainous area, the authors decided to treat it as a separate class because it is spectrally different from mountain. To classify the images into these six classes, the statistical minimum distance and maximum likelihood techniques, representing the traditional methods, and the artificial neural network and contextual techniques, representing the advanced methods, were applied. The four classifiers are described in the following sections.

Data analysis
In this section, the results of all classifiers will be presented. The performance of the four classifiers is discussed in detail in section 8.

Traditional method
Of the many classifiers, MD and ML may be the most popular because of their simple theory and their availability in almost any image processing or GIS software package. Both classifiers are also recognised as statistical methods.

Minimum distance to mean (MD)
MD is a non-parametric classifier that makes no assumption about the distribution of the data for the features of interest. It is computationally simple and fast, requiring only the mean vectors for each band from the training data. Candidate pixels are assigned to the class whose sample mean is spectrally closest. This method does not consider class variability; thus, large differences in the variance of the classes often lead to misclassification (Lu et al., 2004). The minimum distance algorithm allocates a pixel by its minimum Euclidean distance to the center of each class. The pixel is assigned to the closest class, or marked as unknown if it is farther than a predefined distance from any class mean. However, if a pixel lies on the edge of a class, its value may be closer to the mean of a neighboring class, and it will then be assigned to that neighboring class (Avelar et al., 2009).
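The decision rule described above can be sketched in a few lines. This is a minimal illustration, not the software implementation used in the study; the function names, the `max_distance` parameter and the two-band training values are hypothetical.

```python
import math

def class_means(training):
    """Mean vector per class; training maps class name -> list of pixels,
    each pixel being a list of band values."""
    means = {}
    for name, pixels in training.items():
        n_bands = len(pixels[0])
        means[name] = [sum(p[b] for p in pixels) / len(pixels) for b in range(n_bands)]
    return means

def classify_md(pixel, means, max_distance=None):
    """Assign the pixel to the class with the nearest mean (Euclidean distance).

    If max_distance is given and the nearest mean is farther away,
    the pixel is marked as "unknown".
    """
    best_name, best_dist = None, math.inf
    for name, mean in means.items():
        dist = math.dist(pixel, mean)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if max_distance is not None and best_dist > max_distance:
        return "unknown"
    return best_name

# Hypothetical two-band training pixels, for illustration only
training = {
    "vegetation": [[30, 90], [34, 94]],
    "urban": [[120, 110], [124, 114]],
}
means = class_means(training)
label = classify_md([40, 95], means)
```

Because only the class means enter the rule, two classes with very different spreads are treated identically, which is the weakness noted by Lu et al. (2004).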

Maximum likelihood (ML)
ML is a parametric classifier that assumes a normal spectral distribution of the data within each class. An equal prior probability among the classes is also assumed. This classifier is based on the probability that a pixel belongs to a particular class. It takes the variability of the classes into account by using the covariance matrix; thus, it requires more computation per pixel than MD. The ML classifier considers that the geometrical shape of the set of pixels belonging to a class can be described by an ellipsoid. Pixels are grouped according to their position in the influence zone of a class ellipsoid. The probability that a pixel is a member of each class is evaluated, and the pixel is assigned to the class with the highest probability value or left as unknown if the probability value lies below a pre-defined threshold (Avelar et al., 2009). (Aplin et al., 1999; Lu and Weng, 2007). Therefore, the ML classifier needs more training data to characterize the classes than the other methods (Pignatti et al., 2009).
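To show how the covariance changes the decision, the following sketch implements a Gaussian maximum likelihood rule for the simplified case of two bands, with equal priors as stated above. The restriction to two bands (so that the covariance matrix can be inverted by hand), the function names and the `min_log_likelihood` threshold are assumptions made for illustration.

```python
import math

def fit_gaussian(pixels):
    """Mean vector and 2x2 sample covariance of two-band training pixels."""
    n = len(pixels)
    m0 = sum(p[0] for p in pixels) / n
    m1 = sum(p[1] for p in pixels) / n
    s00 = sum((p[0] - m0) ** 2 for p in pixels) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in pixels) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in pixels) / (n - 1)
    return (m0, m1), (s00, s01, s11)

def log_likelihood(pixel, mean, cov):
    """Log of the bivariate normal density at the pixel."""
    s00, s01, s11 = cov
    det = s00 * s11 - s01 * s01
    d0, d1 = pixel[0] - mean[0], pixel[1] - mean[1]
    # Squared Mahalanobis distance using the explicit 2x2 inverse
    maha = (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det
    return -0.5 * (math.log(det) + maha) - math.log(2 * math.pi)

def classify_ml(pixel, models, min_log_likelihood=None):
    """Pick the class with the highest likelihood (equal priors assumed)."""
    scores = {name: log_likelihood(pixel, m, c) for name, (m, c) in models.items()}
    best = max(scores, key=scores.get)
    if min_log_likelihood is not None and scores[best] < min_log_likelihood:
        return "unknown"
    return best

# Hypothetical classes: "compact" has low variance, "broad" has high variance
models = {
    "compact": fit_gaussian([(9, 10), (11, 10), (10, 9), (10, 11)]),
    "broad": fit_gaussian([(20, 10), (40, 10), (30, 0), (30, 20)]),
}
label = classify_ml((19, 10), models)
```

In this example the pixel (19, 10) is nearer in Euclidean terms to the mean of the compact class, so a minimum distance rule would choose that class, but the likelihood rule assigns it to the broad class because the compact class's ellipsoid does not extend that far.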

Advanced method
In recent years, many advanced methods have been applied in remote sensing image classification, each of which has both strengths and limitations. We examined two classification methods on the ALOS AVNIR-2 data sets: the artificial neural network with the back-propagation algorithm, and contextual classification using a frequency-based approach.

Neural network (NN)
Artificial neural networks (NN) are computational systems inspired by biological neurons, in which the neurons provide the information processing ability (Khan et al., 2010). NNs, like people, learn by example. An NN is configured for a specific application, such as pattern recognition or data classification, through a learning process. In the last decade, NNs have gained momentum in the remote sensing field because of the good results obtained in many applications. NN models have two important properties: the ability to learn from input data, and the ability to generalize and predict unseen patterns based on the data source rather than on any particular a priori model. Although there is a wide range of network types and possible applications in remote sensing, most attention has focused on Multilayer Perceptron (MLP) networks trained with a back-propagation learning algorithm for supervised classification. Generally, an NN requires three or more layers of processing nodes: an input layer, which accepts the input variables (e.g., satellite image band values) used in the classification procedure; one or more hidden layers, which identify the internal structure of the input data; and an output layer. The number of nodes (also called processing units or neurons) in the input layer is equal to the dimensionality of the input vector. For land cover classification, the number of nodes in the output layer is the same as the number of classes in the classification scheme. The size of the hidden layer can be a crucial question in network design and needs to be determined carefully. Nodes in any two consecutive layers are fully connected, with connection weights controlling the strength of the connections. The relationships between the input and hidden layers and between the hidden and output layers are given by Equations 1 and 2 (Sarkheil et al., 2009), where $a_i$ is input node $i$ of the input layer, $b_j$ is output node $j$ of the hidden layer, $W_{ij}$ is the weight between the input and hidden layers, and $V_{ji}$ is the weight between the hidden and output layers.
The complexity of the MLP network can be changed by varying the number of layers and the number of units in each layer. Hence, the right NN structure has to be found by experiment. Several researchers (Lippmann, 1987; Cybenko, 1989) have reported that a single hidden layer should usually be sufficient for most problems, especially for classification tasks. The major efforts were focused on controlling the complexity of the model in order to avoid an overly complex structure, which may lead to an over-fitted ANN model (Niska et al., 2010).
Non-parametric neural network classifiers have numerous advantages over statistical methods, such as requiring no assumption about the probabilistic model of the data, the ability to generalize in noisy environments, and the ability to learn complex patterns. In addition, NNs can classify data with a smaller training set than conventional classifiers and are more tolerant of noise present in the training patterns (Mather, 1999).
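The layered structure described in this section can be sketched as a single forward pass. The sketch below is illustrative only: the sigmoid activation, the omission of bias terms, the random weight range and the sample pixel values are assumptions, and the code is not a reproduction of the equations of Sarkheil et al. (2009). The layer sizes (four inputs, twelve hidden nodes, six outputs) follow the architecture reported later in this chapter.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    """Fully connected layer: weights[j][i] links input i to node j.
    Returns the activated output of every node (no bias term, by assumption)."""
    return [sigmoid(sum(w * a for w, a in zip(row, inputs))) for row in weights]

def init_weights(n_out, n_in, rng):
    """Small random starting weights, as is usual before training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def forward(pixel, w_hidden, v_output):
    """Propagate one pixel through the hidden layer and then the output layer."""
    b = layer(pixel, w_hidden)   # hidden-layer outputs
    c = layer(b, v_output)       # one output value per land cover class
    return b, c

rng = random.Random(0)
W = init_weights(12, 4, rng)     # input (4 bands) -> hidden (12 nodes)
V = init_weights(6, 12, rng)     # hidden (12 nodes) -> output (6 classes)
pixel = [0.21, 0.35, 0.40, 0.55]  # hypothetical band values scaled to 0..1
hidden, output = forward(pixel, W, V)
predicted_class = output.index(max(output))
```

With untrained random weights the predicted class is meaningless; back-propagation would adjust `W` and `V` until the output node of the correct class has the largest value for the training pixels.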

Frequency-based contextual (FBC)
Unlike the three methods previously discussed, the contextual technique considers both spectral and spatial information in the classification process instead of depending on the spectral component alone (Mustapha et al., 2011). Classification results from spectral data can be improved by bringing other information into the analysis of the original image. The simplest way is to incorporate spatial information from the neighboring pixels. Contextual information, or context for simplicity, may be defined as how the probability of presence of one object (or objects) is affected by its (their) neighbors (Tso & Olsen, 2005).
There are many contextual classification approaches, but in this article we are concerned only with the FBC approach. Frequency-based contextual classification of multispectral imagery is performed using a grey-level-reduced image and a set of training site bitmaps.
The input layer must be 8-bit data; any 16-bit or 32-bit data layers should be scaled to 8 bits.

Training areas development
Training is the identification of a sample of pixels of known class membership obtained from reference data. These training pixels are used to derive spectral signatures for classification, and the signature statistics are evaluated to ensure adequate separability. The pixels of the image are then allocated to the class with the greatest similarity to the training data metrics (Alberti et al., 2004). The training stage of a supervised classification is designed to provide this information: the training sites are used to train the supervised classification algorithm. In remote sensing, the aim of the training stage has typically been the production of descriptive statistics for each class, which may then be used by the selected classifier to determine class membership (Foody & Mathur, 2006). Obtaining enough training data has long been a difficult problem in land cover applications. Two sets of training data were finally prepared. The first set was prepared for the traditional methods, while the second set was used for the advanced methods. The use of different datasets for classifying the same area with different classifiers is discussed in section 8.3. (Piper, 1992; Van Niel et al., 2005). In addition, all training and test sample sites were revisited on the ground to confirm the accuracy of measurement.

Accuracy assessment
Accuracy assessment is an important aspect of land cover mapping because it serves as a guide to map quality. The accuracy assessment sites were used to provide a statistical assessment of the accuracy of each of the classification approaches tested in this project.
These sites were set aside until the map was completed and the accuracy assessment was performed. This process ensured that the accuracy data were completely independent of the training data (Thomas et al., 2003).
The error matrix is the standard method used to assess classification accuracy. In the error matrix, the columns represent the reference data, while the rows represent the classified data (Table 3). It is typical to extract several statistics from the error matrix: overall accuracy, Kappa coefficient, producer's accuracy and user's accuracy. To conduct the accuracy assessment, a total of 500 sample plots covering the different land cover types were randomly allocated and examined using field data, a SPOT-5 image with 5 m spatial resolution and high-resolution Google Earth maps. Luedeling & Buerkert (2008) also used Google Earth maps as one of their validation methods. The sampling pixels used for accuracy assessment were selected using the stratified random sampling method. In addition, the test pixels were uniformly distributed over the entire image.

Table 3. Population error matrix with p_ij representing the proportion of area in the mapped land cover category i and the reference land cover category j.
Overall accuracy is the simplest and one of the most popular accuracy measures. It is computed by dividing the total number of correctly classified pixels (i.e., the sum of the major diagonal) by the total number of pixels in the error matrix (Congalton, 1991). Rosenfield and Fitzpatrick-Lins (1986) identified the Kappa coefficient as a suitable accuracy measure in thematic classification for representing class accuracy. Its strength lies in the fact that it takes all the elements (diagonal and non-diagonal) of the confusion matrix into consideration, in contrast to overall accuracy, which considers only the diagonal elements of the matrix. In addition, two types of thematic error can be measured in a confusion matrix, and these take into account the accuracy of individual categories. One is given by the producer's accuracy, which indicates the proportion of ground-based reference samples correctly assigned. It details errors of omission, i.e., when a pixel is omitted from its correct category.
The other is given by the user's accuracy, which indicates the proportion of data from the estimation map representing that category on the ground. It is a measure of errors of commission, i.e., when a pixel is committed to an incorrect category (Avelar et al., 2009).
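The four statistics can be computed directly from an error matrix, as in the following minimal sketch. The function name and the three-class matrix are hypothetical and are not taken from Tables 4-7; rows are the classified data and columns are the reference data, following the convention stated above.

```python
def accuracy_metrics(matrix):
    """Overall accuracy, kappa, user's and producer's accuracies.

    matrix[i][j] is the number of samples mapped as class i (row)
    whose reference class is j (column).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = diagonal / total
    # Agreement expected by chance, from the row and column marginals
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    # User's accuracy: correct / all samples mapped as the class (commission)
    users = [matrix[i][i] / row_totals[i] for i in range(n)]
    # Producer's accuracy: correct / all reference samples of the class (omission)
    producers = [matrix[j][j] / col_totals[j] for j in range(n)]
    return overall, kappa, users, producers

# Hypothetical three-class error matrix, for illustration only
example = [
    [45, 4, 1],
    [6, 38, 6],
    [0, 5, 45],
]
overall, kappa, users, producers = accuracy_metrics(example)
```

For this example matrix the overall accuracy is 128/150 (about 85.3%) and kappa is 0.78; kappa is lower because part of the diagonal agreement would be expected by chance from the row and column totals alone.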

Performance evaluation
The six classes (urban, mountain, land, vegetation, ritual area and shadow) were classified using four different classifiers, and classification accuracy assessments were conducted (Tables 4-7). The performance of each classifier is analysed in terms of three factors: classification accuracy, training samples and performance in heterogeneous areas. The area of each class estimated by the various techniques was compared with the corresponding actual area obtained from the reference data. For lack of additional satellite data concurrent with the periods of the field surveys, a reference dataset was generated from the ordered SPOT-5 satellite data and expert knowledge.

Classification accuracy
From the perspective of classification accuracy, four parameters can be discussed: overall accuracy, kappa coefficient, and user's and producer's accuracies (analysis per class). These parameters are calculated from the error matrix, which was computed for each classifier for quantitative accuracy assessment. The dominant land cover types in the selected area were urban, mountain and land, which together account for 95% of the entire image. The remaining 5% of the image consists of vegetation, ritual area and shadow.
For the MD algorithm, the simplest of the classifiers, the overall accuracy was 64.2% with a kappa coefficient of 0.479. The user's accuracy varied between 50.7% for the urban class and 100.0% for the vegetation and ritual area classes. The mountain, land and shadow classes recorded 79.6%, 62.2% and 73.1% respectively. The producer's accuracy for each class using the MD approach was as follows: 67.1% for urban, 57.9% for mountain, 70.9% for land, 45.5% for vegetation (lowest), 66.7% for ritual area and 95.0% for shadow (highest). A total of 500 random sample points were tested to verify the classification result, of which 321 points were correctly classified. For the urban class, only about half of the tested pixels were correctly classified, with most of the misclassified pixels going to the mountain class. For the mountain class, 129 out of 162 observations were correctly classified and 33 points were wrongly classified, of which 26 points were misclassified as urban class. The high number of pixels wrongly assigned to the urban class is due to the fact that mountainous areas in the arid environment are not covered by trees but by stones and rocks, which have spectral characteristics similar to those of urban areas. Nevertheless, the vegetation and ritual area classes gave perfect results, with all tested points correctly classified. Both classes are easy to classify because their spectral characteristics differ significantly from those of the other classes. On the other hand, 56 out of 90 observations for ...

Table 7. Error matrix derived from Frequency-based Contextual classifier

In addition, 126 out of 151 observations were correctly classified for the mountain class, whereas 63 out of 76 observations were correctly classified for the land class. Most of the misclassifications for both classes go to the urban class. For the vegetation class, although the percentage is lower than with the MD classifier, the result is still good, exceeding 90% with only 2 out of 27 observations wrongly classified. The ritual area class was another category whose percentage was lower than with the MD classifier. Although the class was easy to classify, it recorded only 66.7% when the validation process was performed.
The lower accuracy for the ritual area class is explained by the fact that only 3 of the 500 points fell in that class, meaning that it had insufficient validation points. The result would be expected to be higher if more validation points were added, as this class is a homogeneous category. On the other hand, the shadow class gave a perfect result, with all tested pixels correctly classified. The NN approach, one of the advanced methods tested in this project, demonstrated superior results in terms of overall accuracy and outperformed the other classifiers on this factor. The overall accuracy was 84.2% with a kappa coefficient of 0.757. The NN method seems to do a much better job of classifying all classes than the other methods, which seems to be the primary reason for its high overall accuracy. The success of this classifier is due to the fact that four classes (land, vegetation, ritual area and shadow) were classified perfectly (100% was obtained in the analysis per class). In addition, the mountain class achieved a high user's accuracy (95.3%) with the NN method, with 162 out of 170 observations correctly classified. The urban class recorded 72.5% in the analysis per class, but this is still acceptable and highest among other classes. From the viewpoint of statistical analysis, most of the confusion in the urban class was with mountain (48 points) and land (23 points). The reason why every classifier gave the urban class a lower percentage than the other classes is explained in the next sub-section. Meanwhile, producer's accuracy varied between 59.7% for the land class and 100% for the ritual area class. The urban, mountain, vegetation and shadow classes recorded 98.9%, 77.1%, 93.8% and 92.3% respectively.
The network architecture for the NN had three layers, with four units in the input layer (one for each spectral band), twelve units in the hidden layer, and six units in the output layer (one for each class). The other parameters used in the NN algorithm are shown in Table 8. The network structure was determined through trial and error; that is, the number of hidden units used in this application was determined through experimental simulations. Fig. 6 shows the variation of the RMSE at convergence as a function of the number of hidden nodes. The experiments were performed with a maximum of 1000 iterations, and the final RMSE was between 0.009 and 0.109, with the number of hidden nodes ranging from 3 to 15 in increments of 3. The minimum RMSE with the smallest number of nodes was attained with an architecture of 12 hidden nodes. This was the architecture finally adopted for the learning and classification process.
For the FBC classifier, the window size must be determined before the classification process begins. The window size is very important because it determines how much spatial information is included in the decision-making process. The exact size is not easy to determine, so it was found by experiment. The experimental simulations identified a 9x9 window as the optimum. However, we chose a 7x7 window instead, because the images used for the other three classifiers were filtered with a 7x7 averaging filter to reduce noise in the original image. Moreover, there are no significant differences in overall accuracy between these two window sizes, or for window sizes beyond 9x9, as shown in Fig. 7. The FBC technique used in this project achieved a higher overall accuracy and kappa coefficient (81.6% and 0.722) than the traditional methods. Even though it cannot match the performance of NN, its result is still good and acceptable. Further evaluation of the error matrix shows that the addition of contextual information increases map accuracy. The high quality of the spatial information had a large impact on the success of this method. However, some types of confusion are extremely difficult to resolve. Desert landscaping often consists of gravel, and certain types of gravel can be spectrally indistinguishable from urban surfaces. To make the confusion even more complex, some parts of the mountainous area are located within the urban area. This situation leads to misclassification, since their spectral characteristics are similar. The class-specific producer's accuracy varied between 28.6% for the vegetation class and 100.0% for the ritual area class. The user's accuracy reached its highest value of 100.0% for the vegetation class, and the lowest value, 60.0%, was obtained for the ritual area class. The integration of contextual information showed its benefits in the sharp improvement in accuracy for the mountain and land classes compared with the traditional methods. Another behavior of the FBC method visible in the classification result is that pixels at the edges between different land cover types are mostly misclassified, while at the center of each land cover type most pixels are correctly classified. Evaluation of the error matrix revealed that the urban and mountain classes were confused with each other. The shadow, land and vegetation classes were easily classified, with all observation points classified correctly for the vegetation class. Nevertheless, an unexpected result was obtained for the ritual area class, which gave a lower result (60%) although this class was considered a homogeneous area with uniform spectral characteristics. The sharp decrease in accuracy for that class is explained by the fact that the current window size was not suitable given the homogeneous behavior of the class. For this situation, a smaller window size is more suitable and could be expected to increase the class accuracy. In general, the NN approach provided the highest accuracies for all classes.
Considering the overall accuracy, NN provided the best classification results with 84.2%, and MD provided the poorest results with an overall accuracy of 64.2%. Fig. 8 compares the kappa and overall accuracy results of the different classifiers. It indicates that NN and FBC have significantly better accuracy than the MD and ML classifiers. MD produced lower classification accuracy because it uses only the mean vector and ignores the covariance of the classes. ML produced a relatively higher accuracy than MD because it takes the covariance into account in its algorithm. However, ML assumes a normal distribution for the histograms of the classes, which is not always true. Both MD and ML consider only per-pixel information, ignoring texture or contextual information.
Comparing the two approaches (traditional and advanced methods), the NN classifier proved to be the more effective, with increases in overall accuracy of 6.6 and 20.0 percentage points over the ML and MD classifiers respectively, whereas FBC increased accuracy by 4.0 and 17.4 percentage points over the same classifiers.
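The differences quoted above follow directly from the overall accuracies reported in this chapter (the ML value of 77.6% is the one given in the training sample discussion). The short check below is only an arithmetic illustration; the dictionary and function names are hypothetical.

```python
# Overall accuracies (%) as reported in the chapter
overall_accuracy = {"MD": 64.2, "ML": 77.6, "FBC": 81.6, "NN": 84.2}

def gain(better, worse):
    """Difference in overall accuracy, in percentage points."""
    return round(overall_accuracy[better] - overall_accuracy[worse], 1)

gains = {
    ("NN", "ML"): gain("NN", "ML"),
    ("NN", "MD"): gain("NN", "MD"),
    ("FBC", "ML"): gain("FBC", "ML"),
    ("FBC", "MD"): gain("FBC", "MD"),
}

# The MD overall accuracy also matches 321 correct points out of 500
md_from_counts = round(100 * 321 / 500, 1)
```

These are differences in percentage points, not relative improvements; for example, the 20.0-point gain of NN over MD corresponds to a relative increase of about 31% in overall accuracy.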

Performances in heterogeneous area
Heterogeneity in an image, where a pixel contains more than one land cover class, is a major problem in classification. In our case, the urban class is considered a heterogeneous area, whereas the other five classes are homogeneous. In this section, we explain why the accuracy of the urban class in this work is always lower than that of the other classes. This is due to the nature of urban areas. The spectral characteristics of urban surfaces are known to be complex, because a great deal of information is contained in the urban class. Urban areas are characterized by a large variety of built-up environments and natural vegetation covers, which not only determine the surface features of a city, such as land use patterns, but also influence ecological, climatic and energetic conditions of land surface processes (Chen et al., 2009). For instance, sites under construction possess a more varied, high reflectance resulting from building foundations and construction materials. Cleared land exhibits the high, uniform spectral reflectance characteristic of bare soil, while some vegetated areas are also located in the urban environment. This amount of information in a single class creates a high probability of mixed pixels. The mixed pixel problem increases the difficulty of the classification process, leads to misclassified pixels and reduces the classification accuracy. As stated by Small (2005), the highly heterogeneous nature of urban surface materials is problematic at multiple spatial scales, resulting in a high percentage of mixed pixels in moderate resolution imagery and even limiting the utility of high spatial resolution imagery. Furthermore, Alberti et al. (2004) noted that the interpretation and analysis of urban landscapes from remote sensing present unique challenges due to the characteristics of urban land cover, which amplify the spectral heterogeneity of urban surfaces and make it extremely difficult to identify the source of observed reflectance.
The greatest challenge for each of the classifiers is to accurately determine the various materials that make up urban surface reflectance. Hence, the classification result could be improved greatly if a classifier performed well on these mixed urban surfaces, as they represent almost one third of the entire image. ... mountain classes, as their spectral characteristics are very similar. Therefore, traditional per-pixel classifiers such as ML and MD are not recommended when the image contains a large portion of heterogeneous surfaces. The MD classifier classified too broadly, often overlapping one class with another, because it lacks sophisticated spectral discrimination between very complex features. The ML classifier is more sophisticated but, being a per-pixel classifier, it created a salt-and-pepper pattern in the classification, which indicates that misclassification has occurred.

Training sample
In the supervised classification approach, the training stage is a major part of the decision-making process, as it affects the outcome of the classification. To analyse the performance of the four classifiers in terms of training samples, two sets of training data were prepared. Training samples were chosen across the study area, and the number of samples for each land cover type is listed in Table 9. The classification results depend greatly on the quality of the training datasets and require abundant and accurate field measurements from all classes of interest. One difficulty encountered in particularly heterogeneous areas, such as the urban class, is identifying a sufficient number of pure pixels for classifier training and validation. In contrast, the other classes, particularly the vegetation, ritual area and shadow classes, were easy to identify because they are spectrally distinct from each other. Different training data sets were used to classify the same images because the classifiers behave differently in the decision-making process. For example, the traditional methods need more training data because they are statistical approaches: a large number of training samples is needed to generate the statistical information used in the classification process. The advanced methods, in contrast, do not require a large number of training samples because they are not statistical approaches; they handle the training stage in their own way. For instance, the training of a network by back-propagation involves three stages: the feed forward of the input training pattern, the calculation and back-propagation of the associated error, and the adjustment of the weights (Rezapour et al., 2010). The weights are usually randomized at the beginning of the training.
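The three back-propagation stages can be sketched with a very small network. The network size, learning rate, number of epochs and toy data below are assumptions chosen only for illustration; they are not the settings used in this study.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Stage 1: feed the input pattern forward (last weight in a row is the bias)."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output

def mse(data, w_hidden, w_out):
    """Mean square error between target and network output over all patterns."""
    return sum((t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data) / len(data)

def train(data, n_hidden=3, rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Weights are randomized at the beginning of the training.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    initial = mse(data, w_hidden, w_out)
    for _ in range(epochs):
        for x, target in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Stage 2: compute the output error and propagate it back.
            delta_out = (target - out) * out * (1.0 - out)
            delta_hidden = [
                delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                for j in range(n_hidden)
            ]
            # Stage 3: adjust the weights to reduce the error.
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] += rate * delta_out * h
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] += rate * delta_hidden[j] * v
    return initial, mse(data, w_hidden, w_out)

# Toy two-input patterns with a single target each (illustrative only).
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
initial_mse, final_mse = train(data)
print(initial_mse, final_mse)
```

Because the starting weights are random, repeating a run exactly requires both the parameter settings and the same initial weights, which is why the sketch fixes a seed.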
Table 9 shows that the traditional methods need almost double the number of training pixels used by the advanced methods. We also ran the traditional methods on the data set prepared for the advanced methods (data set 2). Neither classifier performed well with this training data set: overall accuracy decreased from 77.6% to 68.0% for ML and from 64.2% to 57.0% for MD, reductions of 9.6 and 7.2 percentage points respectively. This indicates that the small number of training samples is not sufficient for either classifier, and it provides strong evidence that the traditional classifiers need a large number of training samples to perform the classification.
In addition, the training samples of ML and MD were selected in a raster layer, so experiments can be repeated without difficulty. The training process does not take long to complete, even though a large number of training samples is used. For NN and FBC, in contrast, the training samples were collected in bitmap layers, and the number of bitmap layers corresponds to the number of intended classes.

Conclusion
In this article, four different approaches to the classification of complex areas using multispectral data have been described. The main purpose of our investigation was to quantitatively assess, also from the viewpoint of statistical significance, the capabilities of the four approaches to exploit ALOS AVNIR-2 satellite data in an effective way. Some interesting conclusions can be drawn from the obtained results. Different classifiers have their own advantages and disadvantages. For a given research topic, deciding which classifier is more appropriate depends on a variety of factors. Even though some classifiers provide more accurate results than others, all four used in this research are useful in extracting land-cover information. However, of the four classifiers tested, NN and FBC are the two most recommended approaches for classifying an image of a desert environment, especially for the urban class. Experimental results confirm the significant superiority of the advanced methods over the conventional classification methodologies in the context of multispectral data classification. Sophisticated algorithms are needed to successfully discriminate distinct features in complex environments. In this case, classification problems will be related either to spatial/spectral aspects or to spectral mixtures at a given resolution. Our results show that NN and FBC performed best in addressing the land cover heterogeneity of the study area. These two classification approaches have proved to be suited to the classification of complex areas. The NN method was preferred because it is capable of handling large amounts of data and does not require simplifying hypotheses on the statistical distribution. The NN approach provided better overall accuracy than the FBC, ML and MD approaches. On the other hand, the NN approach requires a complex and expensive design phase (e.g., concerning the correct size of the hidden layers and parameter settings) and a much longer training time. For FBC, the contribution of spatial information (neighboring pixels) was very valuable for land cover mapping from digital satellite imagery, compared with depending on multispectral data alone. Although there are several limitations, the classification results show improved accuracy compared with the traditional methods. The traditional classification methods, in this case ML and MD, reach their limitations in urban systems due to the high spectral heterogeneity of urban features. The misclassification of some urban features therefore came as no surprise, since high-quality buildings, streets and their surroundings are very heterogeneous.
In conclusion, remote sensing has been shown to be a useful tool for evaluating the performance of different classifiers in an arid environment. Remote sensing classifications should be considered the technique of choice for land cover study and monitoring. In many instances remotely sensed data are used to derive information on a specific land cover class of interest. Although a conventional classifier may be used to derive this information, it cannot handle a complex mixed environment and consistently produced a noisy image in such an environment, for example in the urban class. Urban environments represent one of the most challenging areas for remote sensing analysis due to the high spatial and spectral diversity of surface materials. Finally, future studies are planned that will compare the results of this study with those that can be obtained using object-based approaches. Additionally, research will be conducted on the use of high-resolution imagery and on applying the approach to more extensive remote sensing data such as hyperspectral images.

Fig. 4. Basic neural network architecture

There are a number of factors affecting the land cover classification accuracy of the FBC. For instance, the collection of the training area and the selection of the pixel window size are very important for this approach. The training area must be representative and of a reasonable size to capture the spatial structure of any land cover type in an image. In addition, the pixel window size determines the amount of spatial information that can be included in the classification. Because the optimal pixel window varies with the individual class and image resolution, it is usually difficult to determine before image classification. Therefore, an appropriate window size is usually determined empirically. The pixel window size must be specified explicitly when performing contextual classification on each pixel. Users may have to run the contextual classifier with the same input data but with different window sizes until a desirable output is produced. In general, contextual classification performs better with a larger window size, especially if the original input image contains complicated mixed classes (such as urban areas). If the classes are uniform and spectrally pure, a smaller window size may be sufficient. A few examples of different window sizes are shown in Fig. 5. It seems clear that including information on the spatial arrangement of gray-level values in a pixel neighborhood can considerably improve the performance of the FBC, as expected by Gong and Howarth (1992). However, this classifier also has a drawback. Contextual classification cannot classify pixels along the edges of the image. If the output window borders the edge of the image file, then the output pixels along the edge are set to zero, to indicate unclassified or unknown pixels. The error patterns caused by the contextual classification algorithm are usually located systematically along the class boundaries. Meanwhile, the classification results demonstrate that a significant increase in overall accuracy over the traditional methods can be achieved by combining spatial data with spectral data, although it cannot overtake the performance of the neural network algorithm.
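The role of the pixel window and the treatment of edge pixels can be sketched as follows. This is a simplified illustration, assuming that each class is described by a table of gray-level frequencies and that a window is assigned to the class with the smallest city-block distance between frequency tables; the class tables, image values and function names are hypothetical.

```python
def window_frequencies(image, row, col, size, n_levels):
    """Relative frequency of each gray level in the size x size window."""
    half = size // 2
    counts = [0] * n_levels
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            counts[image[r][c]] += 1
    total = size * size
    return [k / total for k in counts]

def city_block(a, b):
    """City-block (L1) distance between two frequency tables."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify_contextual(image, class_tables, size, n_levels):
    """Label each interior pixel; edge pixels stay 0 (unclassified)."""
    rows, cols = len(image), len(image[0])
    half = size // 2
    labels = [[0] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            freq = window_frequencies(image, r, c, size, n_levels)
            labels[r][c] = min(
                class_tables, key=lambda k: city_block(freq, class_tables[k])
            )
    return labels

# Hypothetical image with three gray levels: a uniform area on the left
# and a mixed, checkerboard-like area on the right.
image = [
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 2, 1, 2],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 2, 1, 2],
    [0, 0, 0, 1, 2, 1],
]
# Hypothetical class frequency tables: class 1 is uniform, class 2 is mixed.
class_tables = {1: [1.0, 0.0, 0.0], 2: [0.0, 0.5, 0.5]}

for line in classify_contextual(image, class_tables, size=3, n_levels=3):
    print(line)
```

The mixed area is recognised as one class because the window sees the mixture as a whole, while the outer ring of pixels remains zero. A larger window would widen that unclassified border, which is one reason the window size has to be tuned empirically.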

Fig. 5. Examples of window sizes used in the frequency-based contextual method (3x3, 5x5, 7x7). The black pixel indicates the center pixel of each window.
For the advanced methods, knowledge of the statistical distribution is not required. Rather, NNs learn it from a representative training set. In our case, the training phase of the NN was based on the back-propagation (BP) learning rule to minimize the mean square error (MSE) between the desired target vectors and the actual output vectors. Training patterns were presented to the network, and the weights of each node were adjusted so that the approximation created by the NN minimized the error between the desired output and the actual output created by the network. In a network each connecting line has an associated weight. NNs are trained by adjusting these input weights (connection weights) so that the calculated outputs approximate the desired outputs. In the learning phase, input patterns from the training data are fed forward through a network initiated with random synapse weights. The root-mean-square error (RMSE) is calculated between the network outputs and the desired outputs. The errors are back-propagated through the network and the synapse weights are adjusted in order to reduce the total RMSE. This process continues until a convergence criterion is satisfied (Rumelhart et al., 1986). The successful generalization of the NNs used in this application is indicated by the low residual RMS errors. The training is finished when the output value is equal to the ideal output value. The mean square of the network errors (MSE) is given by Equation 3 (Moghadassi et al., 2009):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\tau_i - \alpha_i\right)^2 \qquad (3)$$

where $\tau_i$ is the target output, $\alpha_i$ is the output from the neuron and $N$ is the number of target-output pairs.

Meanwhile, the selection of training sets was based on field surveys, reference information from SPOT-5 images and visual inspection of the image of the particular area. Only the training samples believed to be the most useful and informative were selected for the classification. Training data acquisition can be a very costly process, and training data that are not carefully selected may introduce error. The collection of training data is the crucial step for image classification and it directly influences the classification accuracy (Wang et al., 2007). Training set size can have a great impact on the classification result. However, size is only one attribute of a training set. Some of the literature suggests the use of a minimum of 10-30p cases per class for training, where p is the number of wavebands used
Fig. 6. Graph of RMSE versus number of nodes

Fig. 7. Graph showing experimental results of FBC using different window sizes

Fig. 8. Comparison of overall accuracy and kappa coefficient using different classifiers

Fig. 9. Comparison of four classifiers for each class

Table 2. Detailed description of the classes

Tables 4, 5, 6 and 7 present the error matrices derived from the MD, ML, NN and FBC classifiers, respectively.

Table 4. Error matrix derived from Minimum Distance-to-Mean classifier

Further evaluation of the error matrix shows that 388 out of 500 points used from the same random samples were correctly classified (77.6%). The classifier had some difficulty separating cleared land from land under construction (urban) and mountain from urban area, as shown by the error matrix, in which 68 points were wrongly assigned to these two classes (53 points to mountain, 15 points to land). This is understandable because their spectral characteristics are very similar. However, the result for the urban class shows a significant improvement (nearly 20%) compared to the MD classifier.
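The summary measures taken from an error matrix can be computed as in the minimal sketch below. The 3 x 3 matrix is hypothetical and is not one of the matrices in Tables 4 to 7; only the 388 out of 500 figure comes from the text.

```python
def overall_accuracy(matrix):
    """Share of reference points on the diagonal of the error matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def kappa(matrix):
    """Kappa coefficient: agreement corrected for chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    # Chance agreement from the products of row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(matrix[r][i] for r in range(n)) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1.0 - expected)

# Hypothetical error matrix (rows: classified, columns: reference).
example = [
    [50, 10, 0],
    [5, 40, 5],
    [5, 0, 35],
]
print(overall_accuracy(example))
print(kappa(example))

# Overall accuracy for the figure quoted in the text: 388 correct of 500 points.
print(388 / 500)
```

The kappa coefficient is lower than the overall accuracy because it discounts the agreement that would be expected by chance from the row and column totals alone.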

Table 5. Error matrix derived from Maximum Likelihood classifier

Table 6. Error matrix derived from Back-propagation Neural Network classifier

Table 9. Number of training samples for each class

The training process is time consuming, especially for the NN classifier. This is because repeating an experiment requires all the parameter settings and also the first set of random weights. If the structure has more than one hidden layer, more time is needed to finish the training process. Lippman (1987) suggested that NNs with more than one hidden layer are harder to use because they add the problem of hidden structures and lengthen training time. FBC also takes longer in the training stage, but not as long as NN. Thus, NN was found to be the least user-friendly in training and the most expensive in terms of time, although it uses fewer training samples.