In this approach, cognitive and statistical classifiers were implemented to verify the regions estimated and selected in images of unstructured environments. Inspecting crops in natural scenes demands complex image processing and segmentation algorithms, since these computational methods must evaluate and predict physical characteristics of the environment, such as color elements, the composition of complex objects, shadows, brightness and inhomogeneous region colors for texture. The JSEG segmentation algorithm was therefore adopted to segment the scenes, and ANN and Bayes recognition models were used to classify the images into predetermined classes (e.g. fruits, plants and general crops). Segment classification deploys a customized MLP topology to classify and characterize the segments. It relies on supervised learning by error correction, in which pattern inputs are propagated and synaptic weights are changed in a cyclic process, and it offers accurate recognition as well as easy parameter adjustment. The training is an enhancement of the iRPROP algorithm (improved resilient back-propagation) (Igel and Hüsken, 2003), derived from the back-propagation algorithm, which provides a faster identification mapping process for verifying which region maps have similar matches across the explored environment. The Bayes statistical models had process variables added as set parameters for predictive error correction.
To carry out this task, a feature vector is built from the color channel histograms (for each primary-color layer of a digital image, a histogram counts how many pixels lie at each level between black and white). After the training process, the mean squared error (MSE) indicates the best results achieved by segment classification, which are used to create the image-class map representing the segments as distinct feature vectors. Furthermore, a language dictionary is used to expand the main results, in which semantic regions and negation detection are applied as a data mining process with the cognitive and statistical classifiers.
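As an illustration of the feature vector described above, the following minimal sketch builds one histogram per color channel and concatenates them. The function names, the normalization by pixel count and the assumption of integer channel values in the range 0 to 255 are illustrative choices, not details taken from the original implementation.

```python
def channel_histogram(values, bins=256, max_value=255):
    """Count how many integer pixel values fall into each of `bins` equal-width levels."""
    counts = [0] * bins
    for v in values:
        index = min(v * bins // (max_value + 1), bins - 1)
        counts[index] += 1
    total = len(values) or 1
    # Normalizing by the pixel count is an assumption, so that segments
    # of different sizes yield comparable vectors.
    return [c / total for c in counts]


def feature_vector(pixels, bins=256):
    """Concatenate the per-channel histograms of a list of 3-channel pixels."""
    vector = []
    for channel in range(3):
        vector.extend(channel_histogram([p[channel] for p in pixels], bins))
    return vector
```

With 256 bins and three channels, each segment is described by a 768-dimensional vector; using only one or two channels shortens it accordingly.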
2. JSEG image segmentation
Color images with homogeneous regions are segmented with an algorithm that generates clusters in the color space/class (classes of different measures in the spectral distribution, with distinct intensities of visible electromagnetic radiation at many discrete wavelengths) (Deng et al., 1999a). One way to segment textured images is to consider the spatial arrangement of pixels using a region-growing technique, in which a homogeneity criterion is defined and pixels are grouped into the segmented region. Furthermore, segmenting texture images requires considering the images at different scales.
The JSEG algorithm segments images of natural scenes properly, without manual parameter adjustment for each image, and simplifies texture and color. Segmentation with this algorithm passes through three stages: color space quantization (reducing the number of distinct colors in a given image), region growing, and merging of regions with similar colors.
In the first stage, the color space is quantized with little perceptual degradation by using the quantization algorithm (Deng et al., 1999b) with a minimal number of colors. Each color is associated with a class, and the pixels of the original image are replaced by their classes to form the class-map used in the next stage. Before region growing is performed, the J-image must be created: a map computed from the class-map over windowed color regions, whose high and low values represent, respectively, the edges and the textured interiors of the image being processed. Its pixel values, called J-values, serve as the similarity measure for region growing, and each one is calculated from a window placed on the quantized image around the pixel to which the J-value belongs.
2.1. Segmentation algorithm evaluation
Natural scenes are acquired as color images with 24-bit chromatic resolution, which are coarsely quantized while preserving most of their quality. The main idea behind a good segmentation criterion is to extract, in an unsupervised manner, representative colors that differentiate neighboring regions in the acquired image.
To this end, color quantization using peer group filtering (Deng et al., 2001) is applied through perceptual weighting of individual pixels, to smooth the image and remove existing noise. New values indicating the smoothness of the local areas are then obtained, and a weight is assigned to each pixel according to whether it lies in a textured or a smooth area. The pixel colors are then vector-quantized with the Generalized Lloyd Algorithm (GLA) (Gersho and Gray, 1999) in the perceptually uniform L*u*v* color space, which gives the overall distortion D:

$$D = \sum_i D_i = \sum_i \sum_{x(n) \in C_i} v(n) \, \lVert x(n) - c_i \rVert^2 \qquad (1)$$

The centroid update rule derived from it is:

$$c_i = \frac{\sum_{x(n) \in C_i} v(n) \, x(n)}{\sum_{x(n) \in C_i} v(n)} \qquad (2)$$

Here ci is the centroid of cluster Ci, x(n) and v(n) are the color vector and the perceptual weight of pixel n, and Di is the total distortion for Ci.
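The role of the perceptual weights can be made concrete with a small sketch. It assumes the usual weighted forms, in which the centroid ci is the weighted mean of the color vectors in a cluster and the distortion Di is the weighted sum of squared distances to that centroid; the function names are hypothetical.

```python
def weighted_centroid(colors, weights):
    """Weighted mean c_i of the color vectors x(n) with perceptual weights v(n)."""
    total = sum(weights)
    dims = len(colors[0])
    return tuple(
        sum(w * c[d] for c, w in zip(colors, weights)) / total
        for d in range(dims)
    )


def cluster_distortion(colors, weights, centroid):
    """Weighted sum of squared distances D_i between the colors and the centroid."""
    return sum(
        w * sum((c[d] - centroid[d]) ** 2 for d in range(len(centroid)))
        for c, w in zip(colors, weights)
    )
```

For two colors (0, 0, 0) and (2, 0, 0) with weights 1 and 3, the centroid is pulled towards the more heavily weighted color, at (1.5, 0, 0).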
With the centroid value given by Equation (2), after vector quantization and cluster merging, pixels with the same color may still be split over two or more clusters, an effect of the global distortion minimized by GLA. To merge close clusters, i.e. those whose centroids are closer than a preset threshold, an agglomerative clustering algorithm is performed on ci (Duda and Hart, 1970), yielding the quantization parameter needed for the spatial distribution.
After cluster merging for color quantization, a label is assigned to each quantized color, representing a color class for the image pixels quantized to that color. The image pixel colors are then replaced by their corresponding color class labels, creating the class-map.
To calculate the J-value, let Z be the set of all points of the quantized image, with z = (x, y), z ∈ Z, and let m be the mean of all elements of Z. C is the number of classes obtained in the quantization. Z is then partitioned into C classes: Zi denotes the elements of Z belonging to class i, where i = 1, ..., C, and mi is the mean of the elements of Zi.
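The creation of the class-map can be sketched as follows, assuming each pixel is assigned the label of the nearest quantized color by squared Euclidean distance. The names `build_class_map` and `palette` are illustrative and do not come from the original implementation.

```python
def build_class_map(image, palette):
    """Replace each pixel color by the index (class label) of its nearest palette color.

    image   : 2-D list of color tuples
    palette : list of quantized color tuples, one per class
    """
    def nearest_label(pixel):
        return min(
            range(len(palette)),
            key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])),
        )

    return [[nearest_label(pixel) for pixel in row] for row in image]
```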
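The J-value is defined as follows:

$$S_T = \sum_{z \in Z} \lVert z - m \rVert^2, \qquad S_W = \sum_{i=1}^{C} \sum_{z \in Z_i} \lVert z - m_i \rVert^2, \qquad J = \frac{S_B}{S_W} = \frac{S_T - S_W}{S_W}$$

The parameter ST is the total scatter, i.e. the sum of squared distances of the quantized image points from the mean m of all elements of Z, and SW is the corresponding sum within each class, taken about the class means mi. The ratio between SB = ST − SW and SW thus measures the distance between the classes relative to the spread within them, and holds for arbitrary nonlinear class distributions. Higher values of J indicate an increasing distance between the classes, with the points of each class close to one another, as in images made of homogeneous color regions. This distance, and consequently the J-value, decreases for images in which the color classes are uniformly distributed.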
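A short sketch may help to see how the J-value behaves. It assumes the definition J = (ST − SW) / SW used by JSEG, with ST the total scatter of the pixel positions and SW the scatter within each class; the function name and the handling of the degenerate case SW = 0 are illustrative choices.

```python
def j_value(class_map):
    """Compute J = (S_T - S_W) / S_W for a 2-D list of class labels."""
    by_class = {}
    for y, row in enumerate(class_map):
        for x, label in enumerate(row):
            by_class.setdefault(label, []).append((x, y))
    all_points = [z for points in by_class.values() for z in points]

    def scatter(points):
        # Sum of squared distances of the points from their own mean.
        mx = sum(x for x, _ in points) / len(points)
        my = sum(y for _, y in points) / len(points)
        return sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points)

    s_t = scatter(all_points)
    s_w = sum(scatter(points) for points in by_class.values())
    if s_w == 0:
        # Degenerate case (assumption): no within-class spread at all.
        return 0.0 if s_t == 0 else float("inf")
    return (s_t - s_w) / s_w
```

A class-map split into two compact halves, such as `[[0, 0, 1, 1], [0, 0, 1, 1]]`, gives a larger J than an interleaved one such as `[[0, 1, 0, 1], [1, 0, 1, 0]]`, in which the classes are spread uniformly.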
J can also be recalculated over each segmented region, instead of over the entire class-map, and the results averaged:

$$\bar{J} = \frac{1}{N} \sum_k M_k J_k$$

Here Jk represents J calculated over region k, Mk is the number of points in region k, N is the total number of points in the class-map, and the summation runs over all regions in the class-map.
For a fixed number of regions, a better segmentation tends to have a lower value of the average J, which is therefore used as the criterion to be minimized.
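The averaged criterion can be sketched in a few lines, assuming it is the mean of the per-region values Jk weighted by the region sizes Mk. The function name and the input format are hypothetical.

```python
def average_j(regions):
    """Size-weighted mean of per-region J values.

    regions : list of (M_k, J_k) pairs, where M_k is the number of points
              in region k and J_k is the J-value computed over that region.
    """
    n = sum(m for m, _ in regions)  # N, total number of points in the class-map
    return sum(m * j for m, j in regions) / n
```

Because each region contributes in proportion to its size, a large region with a high Jk penalizes the segmentation more than a small one.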
2.2. Spatial segmentation technique
Global minimization of the average J-value over all regions is not practical; J becomes useful when applied to a local area of the class-map. Therefore, the idea of the J-image is to generate a gray-scale image whose pixel values are the J-values calculated over local windows centered on those pixels. The higher the J-image value, the more likely the pixel is to be near a region boundary.
The size of the local window determines the size of the image regions that can be detected: smaller windows are suited to intensity and color edges, while larger windows are suited to detecting texture boundaries.
With a region-growing method, the image is initially considered as one single region. The spatial segmentation algorithm starts by segmenting all the regions in the image at an initial large scale and repeats the process until the minimum specified scale is reached. This final scale is set manually according to the image size. The initial scale 1 corresponds to an image size of 64x64, scale 2 to 128x128 and scale 3 to 256x256, with the image size doubling for each further scale.
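The relation between image size and initial scale can be written as a small helper. This is only a sketch of the doubling rule stated above; the function name, the use of the shorter image side for non-square images and the fallback to scale 1 for images smaller than 64x64 are assumptions.

```python
def initial_scale(width, height, base=64):
    """Return the largest scale s such that base * 2**(s - 1) fits the shorter side."""
    side = min(width, height)  # assumption: the shorter side decides
    scale = 1
    while base * 2 ** scale <= side:
        scale += 1
    return scale
```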
Below, the spatial segmentation algorithm is structured in flow steps.
3. Image processing (spatial distribution and objects quantification)
The image sequences show not only the color quantization (spatial distributions forming a map of classes), but also the spatial segmentation (J-image representing edges and textured regions).
Several window sizes are used to compute the J-values: the largest detects region boundaries by referring to texture parameters, while the smallest detects changes in color and/or light intensity. Each window size is associated with one scale of image analysis. The concept of the J-image, together with the different scales, allows regions to be segmented by referring to texture parameters.
Regions with the lowest J-image values are called valleys, and they are found with a heuristic algorithm. They provide efficient starting points for region growth, which proceeds by adding similar neighboring valleys. The algorithm ends when there are no spare pixels left to be added to those regions.
It was observed that the oranges account for the largest number of image pixels, given their high contrast with the other objects in the scene.
Fig. 3, above, shows three types of scenes in orchards. The first identifies the largest part of the tree. In this category, the quantization threshold was set to higher values so that regions with the same color tone among branches, leaves and ground would not be merged. The second scene shows the detailed set of regions in orchards, excluding darker regions. Not only are the irregularities of each leaf segmented, but also abnormalities in the color tones of the fruit itself, allowing later analysis of disease characteristics. The third category identifies most of the trees, but with a higher incidence of top and bottom regions.
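The growth from valleys can be sketched as a priority-driven flood: unassigned pixels bordering a region are added in ascending order of their J-image value until none remain. This is a simplified illustration under stated assumptions (valleys already labelled, 4-connectivity, no intermediate valley merging), and the function name is hypothetical.

```python
import heapq


def grow_regions(j_image, labels):
    """Grow labelled valleys over the unassigned pixels of a J-image.

    j_image : 2-D list of J-values
    labels  : 2-D list, 0 for unassigned pixels, a positive id for valley pixels
    Returns a new label map in which every reachable pixel is assigned.
    """
    rows, cols = len(j_image), len(j_image[0])
    result = [row[:] for row in labels]

    def neighbours(r, c):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc

    # Seed the queue with the unassigned pixels that touch a valley.
    heap = []
    for r in range(rows):
        for c in range(cols):
            if result[r][c]:
                for nr, nc in neighbours(r, c):
                    if not result[nr][nc]:
                        heapq.heappush(heap, (j_image[nr][nc], nr, nc, result[r][c]))

    # Always assign the candidate pixel with the lowest J-value first.
    while heap:
        _, r, c, label = heapq.heappop(heap)
        if result[r][c]:
            continue  # already claimed by another region
        result[r][c] = label
        for nr, nc in neighbours(r, c):
            if not result[nr][nc]:
                heapq.heappush(heap, (j_image[nr][nc], nr, nc, label))
    return result
```

Pixels with high J-values, which lie near boundaries, are assigned last, so the regions tend to meet along the ridges of the J-image.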
4. Artificial Neural Networks (ANN) – MLP customized algorithm
It is fundamental to use an ANN-based classification method in association with statistical pattern recognition. The Multi-Layer Perceptron (MLP) (Haykin, 1999; Haykin, 2008) is a suitable default ANN topology, implemented here through a customized back-propagation algorithm for complex patterns (Costa and Cesar Junior, 2001).
The most appropriate segment and topology classifications are those using vectors extracted from the HSV color space (Hue, Saturation, Value), matched against the components of the RGB color space (Red, Green, Blue). In addition, the network with the lowest MSE for its ratio of neurons to color-space channels is used to classify the entities.
Derived from back-propagation, the iRPROP algorithm (improved resilient back-propagation) (Lulio, 2010) is both fast and accurate, with easy parameter adjustment. An Octave (Eaton, 2006) module implementing it was adopted for the purposes of this work. The classifier takes as input the histograms of the HSV (H: hue, S: saturation, V: value) color space channels, with 256 categories each, and networks with 32, 64, 128 and 256 neurons in the hidden layer were trained for each channel combination: H, HS and HSV. The output layer has three neurons, each corresponding to a predetermined class.
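The weight update at the core of resilient back-propagation can be sketched as below. The sketch follows the iRPROP− variant described by Igel and Hüsken, in which each weight has its own step size that grows while the gradient keeps its sign and shrinks when the sign flips; whether this exact variant was used here is an assumption, and the default constants are commonly quoted values rather than the settings of this work.

```python
def sign(x):
    return (x > 0) - (x < 0)


def irprop_minus_step(weights, grads, prev_grads, steps,
                      eta_plus=1.2, eta_minus=0.5,
                      step_min=1e-6, step_max=50.0):
    """Update weights, prev_grads and steps in place from the current gradients."""
    for i, g in enumerate(grads):
        product = g * prev_grads[i]
        if product > 0:
            # Same sign as last time: accelerate.
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif product < 0:
            # Sign change: the minimum was overshot, so slow down and
            # suppress the update for this iteration.
            steps[i] = max(steps[i] * eta_minus, step_min)
            g = 0.0
        # Only the sign of the gradient is used, not its magnitude.
        weights[i] -= sign(g) * steps[i]
        prev_grads[i] = g
```

As a usage example, minimizing the one-dimensional function (w − 3)² from w = 0 only requires calling the step repeatedly with the gradient 2(w − 3).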
The charts below (Figures 5, 6, 7, 8) plot the mean squared error (MSE) against the number of epochs needed to reach the best performance index on the validation data, alongside the training and test sets.
All ANN-based topologies were trained until the mean squared error (MSE) fell below a threshold of 0.0001. The synaptic weights were initialized with random values, and the other algorithm parameters were set with the Fast Artificial Neural Network (FANN) library (Nissen, 2006) for the Matlab (Mathworks Inc.) platform, together with its Neural Network toolbox. The network with the lowest MSE for each class was then selected: H-64 to classify the planting area, HSV-256 for the navigable area (soil) class, and HS-32 for the sky class.
Figures 9 and 10 show the regression of the ANN classifier outputs against the targets for the RGB and HSV classes. The more the data concentrate along the line Y = T (output equal to target), the smaller the deviation of the fitted linear regression, as assessed with the confusion matrices for each set of dimensions.
The response times are given for combinations of training, testing, validation and all data sets.
5. Statistical pattern recognition
Statistical methods are combined with the ANN results to show how accuracy on non-linear feature vectors can be improved when an MLP algorithm is given a statistical enhancement, in a setting where processing speed is essential for pattern classification. Bayes Theorem and Naive Bayes (Comaniciu and Meer, 1997) both use a technique for inspecting iterations, namely PCA (Principal Component Analysis), a linear transformation that minimizes covariance while maximizing variance. The features found through this transformation are totally uncorrelated, so redundancy between them is avoided. The components (features) thus represent the key information contained in the data while reducing the number of dimensions. The RGB color space is therefore used to compare the total number of dimensions in the feature vectors with HSV. With a smaller number of dimensions to iterate over, HSV is chosen as the default color space in most applications (Grasso and Recce, 1996).
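The idea of finding the direction of maximum variance can be illustrated with a minimal sketch that estimates the first principal component by power iteration on the sample covariance matrix. This is a teaching example and not the procedure used in this work; the function name is hypothetical, and the fixed starting vector would fail in the rare case where it is orthogonal to the principal direction.

```python
import math


def first_principal_component(samples, iterations=100):
    """Return a unit vector along the direction of largest variance."""
    n, dims = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dims)]
    centered = [[s[d] - mean[d] for d in range(dims)] for s in samples]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    vector = [1.0] * dims  # assumption: not orthogonal to the principal direction
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0:
            break
        vector = [p / norm for p in product]
    return vector
```

Projecting each feature vector onto the leading components yields uncorrelated features and a lower-dimensional input for the classifier.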
Bayes Theorem introduces a modified mathematical equation for the Probability Density Function (PDF), which is estimated from the training set as a conditional statistic. Equation (8) gives the solution for p(Ci|y), relating the PDF to the conditional class i (the classes in the natural scene), where y is an n-dimensional feature vector. Naive Bayes assumes independence between the vector features, which means that each class uses a conditional PDF that factorizes over the features, following Equation (9) (Morimoto et al., 2000).
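A compact sketch of a Naive Bayes classifier may clarify how the posterior p(Ci|y) is evaluated in practice. It assumes the standard formulation, in which the posterior is proportional to the class prior times the product of per-feature conditional PDFs, and it further assumes a Gaussian PDF for each feature; the function names and the small variance floor are illustrative choices.

```python
import math


def train_naive_bayes(samples, labels):
    """Estimate the prior and per-feature mean and variance for each class."""
    model = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
        # A small floor keeps the variance strictly positive (assumption).
        variances = [
            sum((r[d] - means[d]) ** 2 for r in rows) / len(rows) + 1e-9
            for d in range(dims)
        ]
        model[label] = (len(rows) / len(samples), means, variances)
    return model


def classify(model, y):
    """Return the class with the highest log-posterior for feature vector y."""
    def log_posterior(prior, means, variances):
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(y, means, variances)
        )
        return math.log(prior) + log_likelihood

    return max(model, key=lambda label: log_posterior(*model[label]))
```

Working with log-probabilities avoids numerical underflow when the feature vector has many dimensions, as with 256-bin histograms.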
In Fig. 11, for the location of fruits in the RGB case, the discrimination of the classes fruit, sky, and leaves, twigs and branches stays at constant amounts proportional to the growth of the training sets. In the HSV case this amount is reduced for the fruit class, as the dispersion of pixels is greater in this color space. In Fig. 12, in the RGB case, the best results were obtained with the Bayes classifier, which has a smaller estimation ratio in relation to the number of components analyzed. In this color space, the estimate used in recognizing fruit-related objects is given by the PDF of each dimension, correcting the current values by the expected value of each area not matched to the respective class.
Also in Fig. 12, fruit recognition in the HSV case shows balanced results between the two classifiers, although the success rate is offset by lower margins of the estimation ratio for the Bayes classifier. This allows the next results to be corrected by approximating the prior estimate in the PDF of each dimension.
In all cases, the estimation ratio must become smaller as the number of dimensions, and their subsequent classification, increases.
6. Objects quantification (post-processing)
The class maps are then processed so that area filling (flood fill) leaves only solid regions, which are quantified. First, the image is converted to gray level in order to threshold the outlined regions. Next, the connected elements are labelled, and objects larger than 200 to 300 pixels, depending on the focal length, are excluded. Each element smaller than this threshold is then identified and its properties are calculated, such as area, centroid and boundary region. As a result, the objects whose areas approximate a circular geometry are labelled and quantified as fruits.
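The post-processing steps above can be sketched on a binary mask as follows. The sketch labels 4-connected components, filters them by area, computes the centroid, and keeps objects that are close to circular. The roundness measure (area divided by the area of the smallest centroid-centered disc covering the object), the 0.7 cut-off and all names are assumptions made for illustration, not the metric used in this work.

```python
import math
from collections import deque


def quantify_objects(mask, min_area, max_area, min_roundness=0.7):
    """Return area, centroid and roundness of near-circular objects in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill collects one connected component.
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            area = len(pixels)
            if not min_area <= area <= max_area:
                continue
            cr = sum(r for r, _ in pixels) / area
            cc = sum(c for _, c in pixels) / area
            # Half a pixel is added so the radius reaches the pixel edge.
            radius = max(math.hypot(r - cr, c - cc) for r, c in pixels) + 0.5
            roundness = area / (math.pi * radius ** 2)
            if roundness >= min_roundness:
                objects.append({"area": area, "centroid": (cr, cc), "roundness": roundness})
    return objects
```

The number of returned objects is then the fruit count for the class map, and the centroids give their locations.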
To determine the metrics and define the objects of the orange crop, graph-based segmentation (Gonzalez and Woods, 2007) was applied. This technique provides the adjacency relation between the binary values of the pixels and their respective positions, highlighting the local geometric properties of the image.
In the first case, areas corresponding to small regions, such as partially hidden fruits (oranges) whose texture and color properties are equivalent to those of leaves, are excluded. Then, estimated elements are fully grouped when they overlap the representative segments that denote an orange fruit. Lastly, grouping is applied to regions that contain two or more representative segments, denoting another orange fruit.
Since the best classification results in the second approach were obtained with Bayes in the HSV color space, only the class maps from this classifier are presented for the localization and quantification of objects, compared with the RGB case.
Figures 13 to 21 then present, for the RGB and HSV cases, the images in their respective class maps, the thresholding pre-processing for areas smaller than 100 and greater than 300, the geometric approximation metrics for detecting circular objects, the boundary regions with the centroid of each object, and finally the label associated with each fruit.
This chapter presented merged techniques for the segmentation and statistical classification of scenes of agricultural orange crops, making it possible to run multiple segmentation tests with the JSEG algorithm. As the data show, the resulting algorithms meet expectations as far as segmentation is concerned, sorting the segments into the appropriate classes (fruits; leaves and branches; sky). As a result, a modular strategy based on the Bayes statistical theorem can be an option for classifying segments in combination with the cognitive approach.