Mapping urban form at regional and local scales is crucial for discerning the influence of urban expansion on ecosystems and the surrounding environment. Remotely sensed imagery is well suited to detecting and monitoring the urban areas that emerge from incessant urbanization. However, converting satellite imagery into an urban form map with the existing methods of manual interpretation and parametric digital image classification is a lengthy process. In this work, classification techniques for high-resolution satellite imagery were used to map 50 selected cities of the National Urban System in Mexico for 2015–2016. To process the information, 140 RapidEye Ortho Tile multispectral satellite images with a pixel size of 5 m were downloaded and divided into 5 × 5 km tiles, which yielded 639 tiles. On each tile, four classification methods were tested: artificial neural networks (ANN), support vector machines (SVM), decision trees (DT), and maximum likelihood (ML); each produced urban and nonurban categories. The result was validated with an accuracy assessment based on a stratified random sample of 16 points per tile. These results are expected to support the construction of spatial metrics that explain the differences among Mexican urban areas.
- urban form
- remote sensing
- high-resolution satellite imagery
- advanced classification methods
- GIS integration
Urbanization, a process that manifests itself through the concentration of population in cities, is considered one of the most powerful and visible anthropogenic forces on the planet. Its influence extends to topics ranging from environmental change at global, regional, and local scales [1, 2] and socioeconomic problems to urban planning. Accordingly, several investigations use maps of urban areas to assess the influence of urbanization on natural and human environments and to estimate important aspects of urbanization, such as its composition, size, scale, and form.
Urban form is the most visible result of the economic, social, cultural, and environmental driving forces of urban development. It is therefore a spatial reflection of different processes across the evolution of a city, and its characterization is a valuable source of information for urban planning. Ultimately, urban form is the result of the symbiotic interactions of infrastructures, people, and economic activities in a city that is constantly evolving in response to social, environmental, economic, and technological development.
In the cities, urban form is materialized by the heterogeneous physical alignment and characteristics of buildings, streets, and open spaces at different levels of spatial resolution. This high heterogeneity of materials and urban objects in terms of size, forms, and urban fabric morphology of the cities can be detected through the use of remote sensing imagery. This type of research provides very important information in relation to urban issues on planning, housing, health, transportation, and economic policies; especially for regions in developing countries that are less documented.
Most research efforts have addressed the mapping of urban landscapes at various scales and the spatial resolution requirements of such mapping. Different remote sensing techniques have already shown their value in mapping urban areas with different spatial, geometric, spectral, and temporal resolutions for different purposes. Therefore, the selection of an appropriate estimation method based on remotely sensed data characteristics is important.
A review of the traditional remote sensing literature suggests that the major approaches include pixel-based image classification [9, 10], spectral indices [11, 12], object-oriented algorithms [13, 14], and machine-learning methods such as artificial neural networks and decision tree classification algorithms. Techniques such as data/image fusion have also been explored. Recent research has used high and very high spatial resolution remote sensing imagery to quantitatively describe the spatial structure of urban environments and characterize patterns of urban morphology.
Compared with traditional methods for mapping urban form, the remote sensing approach offers advantages in convenience, efficiency, and coverage. For this reason, the detection of urban form and its derived attributes from different types of satellite images is attracting growing interest [16, 20, 21, 22, 23].
Satellite imagery classification methods for urban form detection can be divided into two categories: supervised and unsupervised. Supervised methods usually produce more reliable results; nevertheless, they require more processing steps to build the training data.
Among the supervised methods, classifiers based on support vector machines (SVM) are very popular due to their good performance and robustness [24, 25]. Methods based on artificial neural networks (ANN) are also widely used for the classification of urban areas. For example, Dridi et al. combine multiple SVMs to map urban extensions in the city of Algeria and compare them with ANN to support an experimental analysis for monitoring the spatiotemporal phenomenon of urban sprawl. Other supervised classification methods, such as decision tree (DT), regression model (RM), and maximum likelihood (ML), can also provide plausible results in the mapping of urban areas.
In this work, we evaluated four supervised classification methods (SVM, ANN, DT, and ML) on earth observation satellite images and integrated their results with a GIS approach to map the urban form of 50 Mexican cities. The rest of this document is organized as follows. Section 2 briefly presents the context of the cities selected for the test and the dataset used. Section 3 describes the methodology and the proposed classification strategy for urban mapping, which includes the preprocessing of RapidEye images, the collection of training samples, the classification methods, the validation strategy, and the postprocessing GIS approach. The experimental results and their discussion are presented in Section 4. Finally, the conclusions of the work are given in Section 5.
2.1 Study area
In Mexico, urbanization has been associated with increased prosperity and improvements in quality of life. Urban areas lead in expanding coverage of basic and social services and also offer better access to other services and amenities, including health care and education. Moreover, Mexico’s growing middle class and declining inequality in recent decades seem to be decidedly urban phenomena.
There have been important changes in the spatial form of Mexican cities over the past 30 years: most notably, urban growth is characterized as distant, dispersed, and disconnected. Between 1980 and 2010, the built-up area of Mexican cities expanded on average by a factor of seven, and the urbanized area of the 11 biggest metropolitan areas with more than 1 million inhabitants in 2010 grew by a factor of nine (SEDESOL 2012). This rapid spatial transformation of most Mexican cities presents important challenges for their potential to promote green and inclusive growth. To address these problems, different initiatives have made significant efforts to put in place measurement systems and to broaden information about urban dynamics.
An ambitious national initiative, the National Urban System (NUS) is a unified platform to support decision-making for urban and housing policies. The NUS, launched by Mexican federal agencies in 2012, exemplifies a significant effort to broaden information and understanding about urban dynamics and has been recognized as innovative among Latin American urban initiatives. This system is a reference for analyzing the spatial patterns of Mexican cities, their causes, and their impact, and it provides an analytical basis for understanding urban phenomena.
The National Population Council (Consejo Nacional de Población, CONAPO) and the Secretariat of Social Development (Secretaria de Desarrollo Social, SEDESOL) put together the NUS on the basis of data from the Population and Housing Census (2010) with the objective of creating a system to support strategic planning and decision-making in urban areas and to provide all sectors (state governments, municipalities, academia, private sector, and general users) with integrated metropolitan and urban information on demographic and socioeconomic variables. The NUS comprises 384 cities with over 15,000 inhabitants each, out of which 59 are metropolitan areas, 78 conurbations (suburban centers), and 247 urban centers. About 81.2 million people or 72.3% of the country’s population live in these 384 cities.
The study area corresponds to a sample of 50 NUS cities that includes three types of cities, classified on the basis of the geographical delimitations defined by the NUS (Figure 1).
These 50 urban areas include:
12 metropolitan areas defined as a group of municipalities that share a central city and are highly integrated with more than 250,000 residents: (1) Aguascalientes, (2) Monclova-Frontera, (3) Juárez, (4) San Francisco del Rincón, (5) Moroleón-Uriangato, (6) Tula, (7) Tehuacán, (8) Rioverde Ciudad Fernández, (9) Nuevo Laredo, (10) Coatzacoalcos, (11) Tianguistenco, and (12) Teziutlán.
16 urban conurbations that extend across more than one locality and have more than 15,000 residents: (13) Ensenada, (14) Campeche, (15) Manzanillo, (16) Tapachula de Córdova y Ordóñez, (17) Guanajuato, (18) Irapuato, (19) Chilpancingo de los Bravo, (20) Ciudad Lázaro Cárdenas, (21) Uruapan, (22) Zitácuaro, (23) San Juan Bautista Tuxtepec, (24) Chetumal, (25) Ciudad Obregón, (26) Cárdenas, (27) Túxpam de Rodríguez Cano, and (28) Fresnillo.
22 urban centers that have more than 15,000 residents and that do not extend beyond the boundaries of their locality: (29) La Paz, (30) Ciudad del Carmen, (31) Ciudad Acuña, (32) Comitán de Domínguez, (33) San Cristóbal de las Casas, (34) Cuauhtémoc, (35) Delicias, (36) Hidalgo del Parral, (37) Victoria de Durango, (38) Salamanca, (39) Iguala de la Independencia, (40) Ciudad Guzmán, (41) Lagos de Moreno, (42) Apatzingán, (43) San Juan del Río, (44) Ciudad Valles, (45) Los Mochis, (46) Culiacán Rosales, (47) Mazatlán, (48) Navojoa, (49) Heroica Nogales, and (50) Ciudad Victoria.
Urban areas were identified by looking at the layer of urban polygons of the geostatistical framework, version 5.0 of the National Institute of Statistics and Geography (Instituto Nacional de Estadística y Geografía, INEGI). Later, satellite images were obtained for the binary classification between urban and nonurban areas that covered the 50 study cities, for which 140 RapidEye images of the period 2015–2016 were acquired, through the Planet platform (
The main characteristics of these images are: (a) spatial resolution of 5 m and a coverage of 25 × 25 km per image; (b) 5-band spectral resolution (blue 440–510 nm, green 520–590 nm, red 630–685 nm, red edge 690–730 nm, and near-infrared 760–850 nm); (c) 12-bit radiometric resolution; and (d) Universal Transverse Mercator (UTM) projection and WGS84 horizontal datum.
Additionally, a digital elevation model (DEM) of the Mexican territory was downloaded to perform the radiometric and atmospheric corrections. Finally, for the collection of training samples, a Web Map Service (WMS) of a SPOT satellite images mosaic provided by the Mexico Reception Station (Estación de Recepción México, ERMEX) was used, at a resolution of 1.5 m in true color.
The methodology is split into five main steps as follows: strategy for satellite imagery download and preprocessing, training and validation sample selection, classification methods, GIS integration, and results evaluation.
3.1 Strategy for satellite imagery download and preprocessing
In the first step, the entire Mexican territory was divided into nonoverlapping 5 × 5 km blocks, with the purpose of selecting blocks that cover the mosaics of the images related to the urban areas selected. A total of 639 blocks were selected to cover the 50 urban areas. Then, 140 RapidEye Ortho Tile multispectral scenes were downloaded through the Planet platform (
Radiometric and atmospheric corrections were conducted to retrieve surface reflectance values using the atmospheric and topographic correction software ATCOR3, run in the ENVI IDL virtual machine. Finally, mosaics by blocks were prepared for each of the 50 cities.
3.2 Training and validation sample selection
To obtain training and validation samples, the blocks generated in the previous stage were used to cover the satellite imagery mosaics of the selected cities. Training and validation data should be representative of the study area and of the classification scheme. Because urban is often a relatively rare class that covers only a small proportion of the landscape, spatial stratification with proportional class allocation (SpatialProp) was selected to obtain a high user’s accuracy for the urban class.
In the SpatialProp strategy, the sample size is allocated to each class in proportion to its areal coverage in the reference set, with the constraint that each spatial stratum receives an equal total sample size. For example, if the urban and nonurban classes comprised 25 and 75% of the area of the entire region, respectively, the sample allocation in each spatial stratum would be 25% urban and 75% nonurban. Following Jin et al., in each 5 × 5 km block, 16 random samples are assigned to the urban and nonurban strata by proportional allocation. In this hypothetical situation, a total sample size of 16 yields 12 nonurban pixels and 4 urban pixels under the SpatialProp design.
For the 639 blocks employed for the 50 selected urban areas, 20,448 training and validation points were assigned. Each data point was then verified against its category using the RapidEye mosaic and the Web Map Service (WMS) of SPOT imagery.
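The proportional allocation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `spatialprop_allocation` and the largest-remainder rounding used when the proportions do not divide the budget evenly are assumptions introduced here.

```python
def spatialprop_allocation(class_fractions, samples_per_block=16):
    """Split a block's fixed sample budget across classes in proportion to area.

    class_fractions maps a class name to its areal fraction in the reference
    set (fractions are expected to sum to 1). Largest-remainder rounding
    (an assumption of this sketch) keeps the total equal to samples_per_block.
    """
    raw = {c: f * samples_per_block for c, f in class_fractions.items()}
    alloc = {c: int(v) for c, v in raw.items()}  # round down first
    leftover = samples_per_block - sum(alloc.values())
    # Hand the remaining samples to the classes with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda c: raw[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


# The hypothetical 25% / 75% case from the text.
allocation = spatialprop_allocation({"urban": 0.25, "nonurban": 0.75})
```

With the 25%/75% split, the allocation is 4 urban and 12 nonurban samples per block, which matches the worked example in the text.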
3.3 Classification methods
Machine-learning classification has become a major focus of the remote-sensing literature since it is generally able to model complex class signatures without making assumptions about the data distribution, i.e., it is nonparametric. A wide range of studies have found that these methods tend to produce higher accuracy than traditional parametric classifiers, especially for complex data with a high-dimensional feature space [32, 33].
However, the parametric maximum likelihood (ML) classifier remains the most commonly used remote-sensing classification method. In this work, we evaluate artificial neural networks (ANN), support vector machines (SVM), decision tree (DT), and maximum likelihood (ML) for each city. For each of these classifiers, accuracy is measured with an error matrix. A brief description of each method follows.
3.3.1 Artificial neural networks (ANN)
An artificial neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. The model is formed by artificial neurons that emulate biological neurons and the synaptic connections among them; these connections are adjusted during the process of solving a problem.
The network needs to be “trained” with a sufficiently large number of examples in order to make the appropriate inferences. The training procedure involves groups of input data together with the expected output data. Once the system of neurons has been trained, the network allows the processing of imprecise information, the generalization of known responses to new situations, and the prediction of outcomes. ANNs are appropriate models for dealing with a large set of variables, and their nonlinearity is convenient for the assessment of complex systems.
The links with the neurons located in the so-called hidden layer take on different weights, which are adjusted according to the required output; thus, the network can model complex relationships among variables. Training requires feedforward and backpropagation processes. Progress in this stage is monitored through error analysis: once the error becomes small and levels off asymptotically, the network is ready to receive new input data and to predict an output.
The ANN models used in this study are of the multilayer perceptron type, a model in which every neuron is fully connected to the neurons of the adjacent layers, while neurons within the same layer are not connected to each other [39, 40]. There are three types of layers in a typical multilayer perceptron network: input layer, hidden layer, and output layer. This architecture is shown in Figure 2. In each case, the proposed network was trained with a backpropagation algorithm, which is a supervised learning procedure.
In remote sensing data analysis, standard backpropagation ANNs for supervised learning have mainly been applied to classification, most commonly land cover classification [42, 43], unmixing [44, 45], and retrieval of biophysical parameters of cover. Other applications of ANNs are also reported in change detection, data fusion, forecasting, preprocessing, georeferencing, and object recognition.
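To make the feedforward and backpropagation steps concrete, the following is a minimal pure-Python sketch of a multilayer perceptron with one hidden layer. It is not the network used in this study: the class name `TinyMLP`, the layer sizes, the learning rate, the squared-error loss, and the two-band reflectance samples are all hypothetical values chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units and a single sigmoid output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one neuron; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        # Feedforward pass: input layer -> hidden layer -> output layer.
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        # Backpropagation of the squared error 0.5 * (out - target) ** 2.
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas are computed before the output weights are changed.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] -= lr * delta_hidden[j] * v
            row[-1] -= lr * delta_hidden[j]

    def mean_squared_error(self, samples):
        return sum((self.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)


# Hypothetical (red, near-infrared) reflectances; 1 = urban, 0 = nonurban.
samples = [([0.30, 0.35], 1), ([0.28, 0.30], 1), ([0.35, 0.40], 1),
           ([0.05, 0.50], 0), ([0.08, 0.60], 0), ([0.04, 0.45], 0)]

net = TinyMLP(n_inputs=2, n_hidden=3)
error_before = net.mean_squared_error(samples)
for _ in range(2000):
    for x, target in samples:
        net.train_step(x, target)
error_after = net.mean_squared_error(samples)
```

Comparing `error_before` with `error_after` gives the kind of error analysis mentioned above: training is considered finished once the error stops decreasing appreciably.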
3.3.2 Support vector machines (SVMs)
Support vector machines are a supervised nonparametric statistical learning technique that makes no assumption about the underlying data distribution. Initially, the method is presented with a set of labeled data instances, and the SVM training algorithm aims to find a hyperplane that separates the dataset into a discrete, predefined number of classes in a fashion consistent with the training examples. The term optimal separation hyperplane refers to the decision boundary that minimizes misclassifications, obtained in the training step, and learning refers to the iterative process of finding a classifier with an optimal decision boundary to separate the training patterns (in a potentially high-dimensional space) and then to separate simulation data under the same configurations (dimensions).
In its simplest form, an SVM is a linear binary classifier that assigns a given test sample one of two possible labels. Figure 3 illustrates a simple two-class separable classification problem in a two-dimensional input space, in which the subset of points that lie on the margin (called support vectors) is the only one that defines the hyperplane of maximum margin.
An important generalization aspect of SVMs is that frequently not all the available training examples are used in the description and specification of the separating hyperplane; only the support vectors define it. If the two classes are not linearly separable, the SVM tries to find the hyperplane that maximizes the margin while, at the same time, minimizing a quantity proportional to the number of misclassification errors. The tradeoff between margin and misclassification error is controlled by a user-defined constant. SVM can also be extended to handle nonlinear decision surfaces. Boser et al. propose a method of projecting the input data onto a high-dimensional feature space using kernel functions and formulating a linear classification problem in that feature space.
In the case of nonlinear classification, SVM can use various types of kernels, which turn nonlinear boundaries into linear ones in the high-dimensional space, to define the optimal hyperplane. In this study, four types of kernels (linear, polynomial, radial basis function, and sigmoid) were used for the SVM classification.
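For reference, the four kernel types named above can be written in their commonly used forms as follows. This is an illustrative sketch only: the parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions, since the settings used in the study are not stated.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    # K(u, v) = u . v
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    # K(u, v) = (gamma * u . v + coef0) ** degree
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    # K(u, v) = exp(-gamma * ||u - v||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    # K(u, v) = tanh(gamma * u . v + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)
```

Each function takes two pixel vectors (for example, the five RapidEye band values) and returns a similarity value; the SVM solves a linear problem in the feature space implied by the chosen kernel.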
3.3.3 Decision tree (DT)
A decision tree is a flowchart-like tree structure, defined as a classification procedure that recursively partitions a dataset into smaller subdivisions on the basis of a set of tests defined at each branch (or node) in the tree. Figure 4 illustrates a tree composed of a root node (formed from all of the data), a set of internal nodes (splits), and a set of terminal nodes (leaves). Each circle is a node at which tests (T) are applied recursively in order to split the data into smaller groups. The labels (A, B, C) at each leaf node refer to the class label assigned to each observation.
In this framework, a DT classifier performs multistage classifications by using a series of binary decisions to place pixels into classes. Each decision divides the pixels in a set of images into two classes based on an expression. Each new class can be divided into two more classes based on another expression, so that as many decision nodes as needed can be defined. Decision trees have significant intuitive appeal because the classification structure is explicit and therefore easily interpretable, since the results of the decisions are always classes. Furthermore, it is possible to use data from many different sources and files together to make a single DT classifier.
The construction of a decision tree classifier does not require any domain knowledge or parameter setting and is therefore appropriate for satellite imagery classification. The learning and classification steps of decision tree induction are simple and fast. In general, decision tree classifiers have good accuracy. Decision tree induction algorithms have been used for classification in many application areas, including remote sensing. Decision trees have several advantages over traditional classification procedures used in remote sensing, such as ISODATA clustering and maximum likelihood classifier algorithms. In particular, decision trees are strictly nonparametric and do not require assumptions regarding the distributions of the input data. In addition, they handle nonlinear relations between features and classes, they handle missing values, and they are capable of handling both numeric and categorical inputs in a natural manner.
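The series of binary decisions described in this section can be illustrated with a hand-written two-node tree. This sketch only shows the structure of the classifier: in practice the tests are induced from training data, and the NDVI expression, the near-infrared test, and both thresholds below are hypothetical choices, not values from this study.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index from red and near-infrared reflectance."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def classify_pixel(red, nir, ndvi_threshold=0.3, dark_nir_threshold=0.1):
    """Place a pixel into a class through two successive binary decisions."""
    # Node 1: strongly vegetated pixels leave the tree as nonurban.
    if ndvi(red, nir) > ndvi_threshold:
        return "nonurban"
    # Node 2: very dark near-infrared pixels (e.g., water) are also nonurban.
    if nir < dark_nir_threshold:
        return "nonurban"
    # Leaf: everything that fails both tests is labeled urban.
    return "urban"


labels = [classify_pixel(0.05, 0.50),   # vegetation-like pixel
          classify_pixel(0.30, 0.35),   # bright, low-NDVI pixel
          classify_pixel(0.04, 0.03)]   # dark pixel
```

Every path through the tree ends in a class label, which is what makes the resulting classification rules easy to read and audit.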
3.3.4 Maximum likelihood (ML)
Among the classic remote sensing image classification techniques, the maximum likelihood (ML) classifier, widely implemented in commercial image-processing software packages, is the method most frequently used for pixel-wise classification. The ML classifier assumes that the statistics for each class in each band are normally distributed and calculates the probability that a given pixel belongs to a specific class. Unless a probability threshold is selected, all pixels are classified. Each pixel is assigned to the class that has the highest probability, that is, the maximum likelihood.
Statistical techniques such as ML estimation usually assume that the data distribution is known a priori. The ML algorithm in remote sensing classification is parametric: each class is represented by a Gaussian probability density function, which is completely described by the mean vector and variance–covariance matrix computed from all available spectral bands and, if possible, ancillary information (Figure 5). The maximum likelihood classifier is based on an estimated probability density function for each of the reference classes under consideration, where the class statistics are obtained from the training data. Given these parameters, it is possible to compute the statistical likelihood of a pixel vector as a member of each spectral class.
The maximum likelihood classifier is simple and robust enough to accommodate modifications. With the advent of commercial high and very high spatial resolution sensor data, the ML classifier remains appropriate for many urban applications. Data from the new generation of very high spatial resolution commercial satellite sensors are high in volume and capture large spectral variations in urban land cover; in the absence of classifiers designed specifically for such data, the simplicity of the maximum likelihood classifier allows it to accommodate large datasets and the modifications outlined.
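The Gaussian model described above can be sketched for a two-band case, where the inverse of the variance–covariance matrix has a simple closed form. This is an illustration under stated assumptions rather than the implementation used in the study: the function names, the equal class priors, the optional `min_log_likelihood` threshold, and the training reflectances are all hypothetical.

```python
import math


def fit_gaussian(samples):
    """Mean vector and 2x2 variance-covariance matrix of two-band training pixels."""
    n = len(samples)
    m0 = sum(s[0] for s in samples) / n
    m1 = sum(s[1] for s in samples) / n
    c00 = sum((s[0] - m0) ** 2 for s in samples) / (n - 1)
    c11 = sum((s[1] - m1) ** 2 for s in samples) / (n - 1)
    c01 = sum((s[0] - m0) * (s[1] - m1) for s in samples) / (n - 1)
    return (m0, m1), ((c00, c01), (c01, c11))


def log_likelihood(x, mean, cov):
    """Log of the bivariate Gaussian density evaluated at pixel vector x."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (d * dx0 * dx0 - 2.0 * b * dx0 * dx1 + a * dx1 * dx1) / det
    return -0.5 * (maha + math.log(det)) - math.log(2.0 * math.pi)


def classify(x, class_stats, min_log_likelihood=None):
    """Assign x to the class with the highest likelihood (equal priors assumed)."""
    scores = {name: log_likelihood(x, m, c) for name, (m, c) in class_stats.items()}
    best = max(scores, key=scores.get)
    if min_log_likelihood is not None and scores[best] < min_log_likelihood:
        return "unclassified"
    return best


# Hypothetical (red, near-infrared) training reflectances for each class.
training = {
    "urban": [(0.30, 0.35), (0.28, 0.30), (0.35, 0.40), (0.32, 0.33)],
    "nonurban": [(0.05, 0.50), (0.08, 0.60), (0.04, 0.45), (0.06, 0.48)],
}
class_stats = {name: fit_gaussian(pixels) for name, pixels in training.items()}
label = classify((0.31, 0.34), class_stats)
```

When no threshold is passed, every pixel receives a class; passing `min_log_likelihood` leaves pixels that are unlikely under all classes unclassified, as described in the text.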
3.4 Validation strategy
In this step, the overall classification accuracies were determined from the error matrix by calculating the total percentage of pixels correctly classified for each classification method: (i) artificial neural networks (ANN); (ii) support vector machines (SVM) with linear (ML), polynomial (MP), radial basis function (MRBF), and sigmoid (MS) kernels; (iii) decision tree (DT); and (iv) maximum likelihood (ML). Since this assessment takes only the diagonal of the matrix into account, the Kappa coefficient, which is based on all the elements in the confusion matrix, was also calculated. The overall accuracy and kappa values were determined using test datasets obtained with the SpatialProp strategy for training and validation samples described in Section 3.2.
With the arrival of more advanced digital satellite remote sensing techniques, the need to perform an accuracy assessment has received renewed interest. Accuracy assessment, or validation, is an important step in the processing of remote sensing data. At present, the geographic information systems and remote sensing communities are increasingly interested in accuracy issues, and technological developments in data processing offer more and more possibilities. In this work, the samples were collected from a Web Map Service (WMS) of a SPOT satellite image mosaic at a resolution of 1.5 m in true color. The data collected by this method are comparable to the field data employed to assess the accuracy of these remote sensing products.
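Both measures can be computed directly from an error matrix, as in the short sketch below. The function names and the example matrix are hypothetical; rows are taken as the classified labels and columns as the reference labels, although both measures give the same value if the matrix is transposed.

```python
def overall_accuracy(matrix):
    """Sum of the major diagonal divided by the total number of samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_coefficient(matrix):
    """Cohen's kappa: agreement beyond what the row and column totals imply by chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class error matrix: [urban, nonurban] x [urban, nonurban].
error_matrix = [[40, 10],
                [5, 45]]
accuracy = overall_accuracy(error_matrix)
kappa = kappa_coefficient(error_matrix)
```

For this example matrix the overall accuracy is 0.85 and kappa is 0.70; kappa is lower because half of the agreement would be expected by chance given the row and column totals.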
3.5 GIS integration
The classifiers implemented in this work, namely the nonparametric artificial neural network, decision tree, and support vector machine classifiers and the traditional maximum likelihood classifier, each have their own strengths and limitations. For example, when sufficient training samples are available and the features of land covers in a dataset are normally distributed, a maximum likelihood classifier may yield an accurate classification result. In contrast, when the image data are anomalously distributed, neural network and decision tree classifiers may demonstrate a better classification result [65, 66]. In other cases, machine-learning approaches provide a better classification result than ML, although some tradeoffs exist in classification accuracy, time consumption, and computing resources.
Previous research has indicated that the integration of two or more classifiers provides improved classification accuracy compared to the use of a single classifier [67, 68, 69]. A critical step is to develop suitable rules to combine the classification results from different classifiers. Some previous research has explored different techniques, such as a production rule, a sum rule, stacked regression methods, majority voting, and thresholds, to combine multiple classification results [69, 70].
In this step, we employed a GIS approach to integrate the results of the ANN, SVM, DT, and ML classifiers and produce a better final map of urban form. Hybrid approaches to urban mapping have already been combined to achieve better results [71, 72]. In our approach, the results on which two or more of the evaluated methods coincide are combined, through the superposition (overlay) function, with the results of the best evaluated method. Subsequently, through a selection on these attributes, the urban and nonurban pixels identified as the best results of the combination are extracted within the GIS environment. The resulting map was validated again, which showed that the most likely characteristics of urban and nonurban uses were present in the combined pixels. This GIS integration approach improved the results of the urban area classification for the selected cities. We suggest that it can be implemented economically and immediately in a standard GIS software package to produce more accurate urban form maps from high spatial resolution satellite images for the Mexican National Urban System.
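One possible reading of this overlay rule is sketched below on small binary rasters. It is an interpretation, not the authors' GIS workflow: the sketch assumes that a pixel takes the label on which most methods coincide and that ties are resolved with the label of the best evaluated method. The function name `integrate_maps` and the 2 × 2 example maps are hypothetical.

```python
def integrate_maps(class_maps, best_method):
    """Overlay binary urban maps (1 = urban, 0 = nonurban) pixel by pixel.

    The label shared by most methods is kept; when the methods split evenly,
    the label of `best_method` is used (an assumption of this sketch).
    """
    names = list(class_maps)
    reference = class_maps[best_method]
    combined = []
    for r in range(len(reference)):
        out_row = []
        for c in range(len(reference[0])):
            urban_votes = sum(class_maps[name][r][c] for name in names)
            if 2 * urban_votes > len(names):
                out_row.append(1)
            elif 2 * urban_votes < len(names):
                out_row.append(0)
            else:
                out_row.append(reference[r][c])  # tie: defer to the best method
        combined.append(out_row)
    return combined


# Hypothetical 2 x 2 classification results from the four methods.
class_maps = {
    "ANN": [[1, 1], [1, 0]],
    "SVM": [[1, 0], [1, 1]],
    "DT":  [[1, 0], [0, 1]],
    "ML":  [[0, 0], [0, 0]],
}
urban_form = integrate_maps(class_maps, best_method="ANN")
```

In this example the upper-right pixel is labeled nonurban even though the ANN map marks it urban, because the other three methods coincide, while the two evenly split pixels in the bottom row follow the ANN result.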
4. Results and discussion
In this study, four supervised classification methods were integrated to map the urban form of 50 selected cities of the National Urban System in Mexico: the maximum likelihood classifier, which is a conventional classification method, and three advanced classification methods, namely artificial neural networks, decision tree, and support vector machines with linear (ML), polynomial (MP), RBF (MR), and sigmoid (MS) kernels. The artificial neural network classifier (overall accuracy of 92.2%) turned out to be the best single classification method. Support vector machine (overall accuracy of 89.8%) and maximum likelihood (overall accuracy of 89.2%) had similar results. The decision tree (overall accuracy of 87.8%) was the least accurate method. The results were evaluated by the overall accuracy, which is computed by dividing the total number of correct pixels (i.e., the sum of the major diagonal) by the total number of pixels in the error matrix. Overall accuracy for the ANN, DT, selected SVM models, and ML classifiers is summarized in Figure 6.
When the results were integrated by city with the GIS approach, it became clear that each evaluated method affects the spatial extent of the mapped urban form, which is an important finding. The GIS approach yielded an overall accuracy above the average accuracy of the individual methods: across all the cities, the evaluated methods averaged 89.8%, whereas the GIS approach reached 91.2%, and its accuracy was higher in 38 of the 50 cities evaluated. The approach used in this work has shown good results, although all the classifiers showed very small differences in the spatial extent of the urban class (within ±4%). The results for the 50 selected cities are shown in Figure 7: Figure 7a shows the metropolitan areas, Figure 7b the urban conurbations, and Figure 7c the urban centers.
Information about urban form mapping is essential for proper planning and for examining how recent urban growth has affected the economic performance and livability of cities. This methodological approach offers spatially explicit inputs for adjusting urban policy frameworks and instruments in ways that support sustainable spatial development and make cities more productive and inclusive.
In this work, different advanced classification methods have been tested for the classification of high-resolution satellite imagery for urban form detection. The SVM method proved to perform well on two-class classification problems; its major advantage is the small number of parameters needed to make it operational and reach high accuracy rates. The methodology shows great potential for urban form mapping and could help urban planners understand and interpret complex urban characteristics with greater precision, an area where problems with satellite-based remotely sensed imagery are often cited.
Furthermore, the proposed approach for integrating results in a GIS environment provides a robust framework for addressing integrated classification problems in the field of remote sensing. The approach yields better results because each of the integrated classification methods contributes its best results to a more accurate urban form classification.
Therefore, we believe the proposed approach has great practical value for several remote sensing problems and could be improved and applied to various urban applications in the near future. In this respect, the integration approach can be strengthened by implementing learning methods to manage the integration of the data and thereby obtain more reliable results. Finally, we are also interested in analyzing the morphological characteristics of urban form through metrics that take the results of this work as their primary input.
The authors thank the anonymous reviewers for their comments and suggestions. We also acknowledge the financial support granted by the Fondo Sectorial INEGI-CONACYT (278953-S0025-2016-1) project. Throughout the project we had the technical assistance of the Centro de Investigación en Ciencias de Información Geoespacial. For technical support, we thank Sandra Medina and Gerardo Ávila, and we especially thank Gabriela Quiroz for the map making and visual design.