Image segmentation is the process of partitioning a digital image into multiple regions (sets of pixels); the pixels in each region have similar attributes. It is often used to separate an image into regions in terms of surfaces, objects, and scenes, especially for object location and boundary extraction. Many general-purpose algorithms and techniques have been proposed for image segmentation. Typical traditional methods are: (1) threshold-based methods; (2) edge-based methods; and (3) region-based methods. In this chapter, we propose an approach to image segmentation based on a mathematical morphology operator, the toggle operator. The experimental results show that the proposed method can segment natural scene images into homogeneous regions effectively.
- homogeneous region
- image segmentation
- mathematical morphology
- toggle operator
Image segmentation is typically used to partition an image into meaningful parts. Thus, it has a significant application in image analysis and understanding. The result of image segmentation is a set of regions (each region is a set of pixels) that collectively cover the entire image, or a set of contours (i.e., boundaries, consisting of lines, curves, etc.) extracted from the image. The pixels in one region have similar characteristics in terms of color, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s) [1, 2].
Until now, a great variety of algorithms have been proposed for image segmentation [3, 4]. These methods are generally classified into three categories: thresholding segmentation, edge-based segmentation, and region-based segmentation. Each method has its own advantages and disadvantages; there is no single method that can be applied effectively for segmenting all kinds of images. This is because different images have different features and properties. Therefore, for different images, the segmentation requires different techniques. Images can be divided into the following categories according to their attributes and characteristics. From the color point of view, images include grayscale images, binary images, and color images; from the texture point of view, images include texture images and nontexture images. Based on the image features, the following subsections will consider some proposed classical approaches in more detail.
1.1. Threshold-based segmentation
Thresholding segmentation is a pixel-based method for image segmentation. It is the simplest method and relies on the difference in intensity between the object pixels and the background pixels. Therefore, it is often used to separate out the regions of an image that correspond to the objects we are interested in.
In order to differentiate the pixels that are located in the region of interest from the rest, a comparison is performed for each pixel intensity value with respect to a threshold. In this method, pixels are divided into two classes that are typically named “foreground” and “background.” Pixels with values less than threshold are placed in one class, and the rest are placed in the other class. Therefore, this method is often used to convert a grayscale image into a binary image.
$$g(x, y) = \begin{cases} 1, & f(x, y) \ge T \\ 0, & f(x, y) < T \end{cases}$$

where $f$ is the grayscale image, $g$ is the binary image, $(x, y)$ is the coordinate of the target pixel, and $T$ is the threshold value. This method is most effective for images with high levels of contrast. However, the key to this method is selecting a suitable threshold value.
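The comparison described above can be sketched in a few lines. This is a minimal illustration, assuming that pixels at or above the threshold are labeled 1 (which class receives which label is our assumption); the function name `threshold` and the sample patch are hypothetical.

```python
import numpy as np

def threshold(gray, t):
    """Return a binary image: 1 where gray >= t, 0 elsewhere."""
    gray = np.asarray(gray)
    return (gray >= t).astype(np.uint8)

# Hypothetical 3x3 grayscale patch, used only for illustration.
patch = np.array([[10, 200, 30],
                  [220, 40, 250],
                  [15, 180, 20]])
binary = threshold(patch, 128)
```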
Many researchers have worked on selecting the threshold automatically. Their thresholding methods can be categorized into the following groups based on the information the algorithm manipulates [5–9]:
Histogram shape-based methods: the peaks, valleys, and curvatures of the smoothed histogram are analyzed.
Double-peak threshold method: suppose the histogram of the image is a bimodal distribution (regions with uniform intensity give rise to strong peaks in the histogram), then the value of valley point can be chosen as the threshold.
Minimum variance method: if a region has relatively homogeneous gray values, it makes sense to select a threshold that minimizes the variance of the gray values within regions.
Maximum variance method (usually called the Otsu method): chooses a good threshold value by maximizing the variance between objects and background. The histogram is divided into two classes such that the between-class variance is maximized.
Clustering method: representing the image as a set of clusters, an ideal threshold value is determined by iteratively reducing the clusters until there are two clusters left. The two remaining clusters are the background and the foreground (object).
Maximum-entropy-based method: determines an ideal threshold value by maximizing entropy for the probability distribution of the foreground and background regions, which are represented in the histogram.
Local thresholding method: chooses the threshold value for each pixel according to the local image characteristics. In this method, a different threshold value is selected for each pixel in the image.
Watershed method: considers the gradient magnitude of an image as a topographic surface where high gradient denotes peaks, while low gradient denotes valleys. Start by filling every isolated valley with different colored water. As the water rises, water from different valleys will start to merge. To avoid that, barriers are built in the locations where water merges. Continue the work of filling water and building barriers until all the peaks are under water. Then the created barriers give the segmentation result.
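As an illustration of one entry in the list above, the maximum variance (Otsu) criterion can be sketched as an exhaustive search over all gray levels. This is a simplified sketch for 8-bit images, not an optimized implementation; the function name `otsu_threshold` is ours.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t (1..255) that maximizes the between-class variance.

    Pixels with value < t form one class, pixels with value >= t the other.
    """
    hist = np.bincount(np.asarray(gray).astype(np.uint8).ravel(), minlength=256)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = prob[:t].sum()           # weight of the class below t
        w1 = 1.0 - w0                 # weight of the class at or above t
        if w0 == 0 or w1 == 0:
            continue                  # one class is empty, nothing to compare
        mu0 = (levels[:t] * prob[:t]).sum() / w0
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t
```

For a clearly bimodal image, the returned value falls between the two modes.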
Thresholding methods are applied to segment not only grayscale images but also color images. For color images, one approach is to determine a separate threshold for each color channel of the image and then combine the results with an AND operation. Segmentation based on color information (e.g., RGB, HSL, and HSV color models) may be more accurate than segmentation based on grayscale images [11, 12].
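The per-channel approach can be sketched as follows. The function name `threshold_color` and the convention that a pixel passes when every channel is at or above its threshold are assumptions made for illustration.

```python
import numpy as np

def threshold_color(rgb, thresholds):
    """Threshold each channel separately and combine the three masks with AND."""
    rgb = np.asarray(rgb)                           # shape (H, W, 3)
    t = np.asarray(thresholds).reshape(1, 1, 3)     # one threshold per channel
    return np.all(rgb >= t, axis=2).astype(np.uint8)

# Hypothetical 1x2 image: the second pixel fails the green-channel test.
pixels = np.array([[[200, 200, 200], [200, 10, 200]]])
mask = threshold_color(pixels, (100, 100, 100))
```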
1.2. Edge-based segmentation
An edge is the boundary between two regions with different properties; it represents the change from one object or surface to another. Edges are used to characterize the physical extent of objects, since there is often a sharp change in intensity at the region boundaries. Thus, detection of edges is a very important step toward understanding image features; it is often used to divide images into areas corresponding to different objects.
The main idea of most edge-detection techniques is the computation of a local derivative of an image, including first- and second-order derivatives. The first-order derivative of choice in image processing is the gradient; it can be used to detect the presence of an edge in an image. Second-order derivatives in image processing generally are computed using the Laplacian. The sign of the second derivative can be used to determine whether an edge pixel is on the dark or light side of an edge [13–17]. Gradient operator and Laplacian operator are defined as follows:
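(1) Gradient operators: the gradient of an image, $f(x, y)$, at location $(x, y)$ is defined as the vector

$$\nabla f = \begin{bmatrix} G_x \\ G_y \end{bmatrix} = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \end{bmatrix}.$$

The magnitude of this vector is computed from

$$|\nabla f| = \sqrt{G_x^2 + G_y^2}.$$

To simplify this computation, this quantity is approximated by using the following expression:

$$|\nabla f| \approx |G_x| + |G_y|.$$

The gradient vector points in the direction of the maximum rate of change of $f$ at coordinates $(x, y)$; its direction angle is

$$\alpha(x, y) = \arctan\left(\frac{G_y}{G_x}\right).$$

For a discrete image, the gradient can be calculated by the following expressions:

$$G_x = f(x+1, y) - f(x, y), \qquad G_y = f(x, y+1) - f(x, y).$$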
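A discrete gradient of this kind can be sketched with NumPy. The sketch assumes forward differences, takes the first array axis as $x$ and the second as $y$, and sets the last row and column of the differences to zero; these conventions and the function name `gradient_forward` are our assumptions.

```python
import numpy as np

def gradient_forward(f):
    """Forward-difference gradient of a 2-D image (last row/column left at 0)."""
    f = np.asarray(f, dtype=float)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:-1, :] = f[1:, :] - f[:-1, :]     # difference along the first axis (x)
    gy[:, :-1] = f[:, 1:] - f[:, :-1]     # difference along the second axis (y)
    magnitude = np.abs(gx) + np.abs(gy)   # |Gx| + |Gy| approximation
    angle = np.arctan2(gy, gx)            # direction angle of the gradient vector
    return gx, gy, magnitude, angle

# Hypothetical image with a step between the second and third columns.
step = np.array([[0, 0, 10],
                 [0, 0, 10]])
gx, gy, magnitude, angle = gradient_forward(step)
```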
Edges can be detected with the help of first-order derivative type operators, as follows:
Sobel edge detector: using convolutions with row and column edge gradient masks, it is suitable to detect edges along the horizontal and vertical axis.
Roberts detector: calculates the square root of the sum of the squared responses of the row and column edge detectors; it is able to detect edges that run along the diagonal axes of 45° and 135°.
Prewitt operator: detects edges by calculating the Prewitt compass gradient filters that return the result for the largest filter response.
Kirsch edge detector: performs convolution using eight filters that are applied to calculate gradient.
Frei-Chen edge operator: uses only the row and column filters.
The gradient method detects the edges by looking for the maximum and minimum in the first derivative of the image. However, these gradient operators tend to be sensitive to noise.
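As an example of the first-order operators listed above, the Sobel detector can be sketched with two 3 × 3 masks. The sketch uses correlation rather than flipped-kernel convolution (only the sign of the response differs, not the magnitude) and replicates border pixels; both choices, and the function names, are assumptions made for illustration.

```python
import numpy as np

SOBEL_ROW = np.array([[-1, -2, -1],
                      [ 0,  0,  0],
                      [ 1,  2,  1]], dtype=float)   # responds to horizontal edges
SOBEL_COL = SOBEL_ROW.T                              # responds to vertical edges

def filter3x3(image, kernel):
    """Correlate a 2-D image with a 3x3 kernel, replicating border pixels."""
    padded = np.pad(np.asarray(image, dtype=float), 1, mode="edge")
    h, w = padded.shape[0] - 2, padded.shape[1] - 2
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def sobel_magnitude(image):
    """Gradient magnitude from the row and column Sobel responses."""
    gr = filter3x3(image, SOBEL_ROW)
    gc = filter3x3(image, SOBEL_COL)
    return np.hypot(gr, gc)
```

On an image with a single vertical step, only the pixels on either side of the step respond.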
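(2) Laplacian operator: the Laplacian of an image $f(x, y)$ is defined as

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2},$$

where, for a discrete image, $\partial^2 f / \partial x^2 = f(x+1, y) + f(x-1, y) - 2f(x, y)$ and $\partial^2 f / \partial y^2 = f(x, y+1) + f(x, y-1) - 2f(x, y)$, so that

$$\nabla^2 f = f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 4f(x, y).$$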
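The discrete Laplacian can be sketched as the standard five-point stencil. Replicating the border pixels and the function name `laplacian` are assumptions made for illustration.

```python
import numpy as np

def laplacian(f):
    """Five-point discrete Laplacian with replicated borders."""
    p = np.pad(np.asarray(f, dtype=float), 1, mode="edge")
    return (p[2:, 1:-1] + p[:-2, 1:-1] +
            p[1:-1, 2:] + p[1:-1, :-2] - 4.0 * p[1:-1, 1:-1])

# Hypothetical image with a single bright pixel in the center.
spot = np.zeros((3, 3))
spot[1, 1] = 1.0
lap = laplacian(spot)
```

The response is negative at the bright pixel and positive at its four direct neighbors, which illustrates the double-edge behavior of a second-order derivative.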
The Laplacian is seldom used directly for edge detection because, as a second-order derivative, it is unacceptably sensitive to noise, its magnitude produces double edges, and it is unable to detect edge direction. In order to improve the effectiveness of edge detection, the following algorithms are proposed:
Laplacian of Gaussian (LoG): Gaussian filtering is combined with the Laplacian so that noise is reduced before the intensity variations are located, which allows the edges to be detected effectively. However, this operator cannot find the orientation of an edge because it uses the Laplacian filter.
Canny edge detector (colored edge detectors): uses a multistage algorithm to detect a wide range of edges in images. First, a Gaussian convolution filter is applied to smooth the image in order to reduce the noise. Second, a first-derivative operator (e.g., the Sobel or Roberts operator) is applied to the smoothed image in order to output a gradient magnitude image. Third, nonmaximum suppression is applied to get rid of spurious responses and to give thin lines in the output. Fourth, a double threshold is applied to determine potential edges. Last, edge tracking by hysteresis finalizes the detection by suppressing the weak edges that are not connected to strong edges. The Canny edge detector performs well; its main drawback is that it is more complex and takes more time to compute [18, 19].
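The last two Canny stages (double threshold and edge tracking by hysteresis) can be sketched on their own. The sketch takes a gradient magnitude image as input and assumes 8-connectivity; the function name `hysteresis` and the threshold names `low` and `high` are ours.

```python
from collections import deque
import numpy as np

def hysteresis(magnitude, low, high):
    """Keep strong pixels (>= high) and weak pixels (>= low) connected to them."""
    mag = np.asarray(magnitude, dtype=float)
    strong = mag >= high
    candidate = mag >= low
    edges = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))   # start tracking from strong pixels
    h, w = mag.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < h and 0 <= cc < w
                        and candidate[rr, cc] and not edges[rr, cc]):
                    edges[rr, cc] = True
                    queue.append((rr, cc))
    return edges

# Hypothetical magnitude row: the last weak pixel is isolated and is dropped.
row = np.array([[0, 5, 9, 5, 0, 5]])
tracked = hysteresis(row, low=4, high=8)
```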
1.3. Region-based segmentation
An edge-based technique may attempt to find the object boundaries and then locate the object itself by filling them in; a region-based technique takes the opposite approach.
Region-based segmentation algorithms operate iteratively by grouping together neighboring pixels that have similar properties (such as gray level, texture, color, shape) and splitting groups of pixels that are dissimilar in value [20, 21]. There are a variety of approaches of region-based segmentation. These methods can be classified into two categories:
(1) Region growing method: This is the simplest region-based segmentation method. It is also classified as a pixel-based segmentation method since it involves the selection of initial seed points.
This approach groups pixels or subregions into larger regions based on predefined criteria. First, a set of seed points is selected based on some user criterion (e.g., pixels in a certain grayscale range). Second, the regions are grown from these seed pixels to neighboring pixels, which are examined to ascertain whether they should be added to the region according to a region membership criterion (e.g., pixel intensity, grayscale texture, or color). Third, the second step is iterated in the same manner as in general data clustering algorithms.
Region-growing-based techniques are better than edge-based techniques for noisy images, where edges are difficult to detect. However, they are computationally expensive.
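Region growing from a single seed can be sketched as a breadth-first search. The membership criterion used here (absolute difference from the seed intensity within a tolerance `tol`, 4-connectivity) is only one possible choice and is an assumption, as is the function name `region_grow`.

```python
from collections import deque
import numpy as np

def region_grow(gray, seed, tol):
    """Grow a 4-connected region from seed; accept pixels within tol of the seed value."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= rr < h and 0 <= cc < w and not region[rr, cc]
                    and abs(gray[rr, cc] - gray[seed]) <= tol):
                region[rr, cc] = True
                queue.append((rr, cc))
    return region

# Hypothetical patch: the dark pixel at (2, 2) is similar but not connected.
patch = np.array([[10, 11, 50],
                  [10, 12, 50],
                  [50, 50, 10]])
grown = region_grow(patch, seed=(0, 0), tol=5)
```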
(2) Split-and-merge method: This method consists of two steps: region splitting and region merging.
Region splitting starts with the whole image as a single region and subdivides it into subsidiary regions recursively while a condition of homogeneity is not satisfied.
Region merging is the opposite of region splitting and works as a way of avoiding over-segmentation. It starts with small regions and merges the regions that have similar characteristics (such as gray level, variance).
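The splitting step can be sketched as a recursive quadtree decomposition. The homogeneity condition used here (standard deviation of intensity not exceeding `max_std`) is an assumption, the function names are ours, and the merging step is not shown.

```python
import numpy as np

def split(gray, r0, c0, h, w, max_std, out):
    """Recursively quarter a block until it is homogeneous or too small to quarter."""
    block = gray[r0:r0 + h, c0:c0 + w]
    if block.std() <= max_std or min(h, w) < 2:
        out.append((r0, c0, h, w))        # record the block as (row, col, height, width)
        return
    h2, w2 = h // 2, w // 2
    split(gray, r0, c0, h2, w2, max_std, out)
    split(gray, r0, c0 + w2, h2, w - w2, max_std, out)
    split(gray, r0 + h2, c0, h - h2, w2, max_std, out)
    split(gray, r0 + h2, c0 + w2, h - h2, w - w2, max_std, out)

def split_image(gray, max_std):
    """Start from the whole image as a single region and return the final blocks."""
    gray = np.asarray(gray, dtype=float)
    out = []
    split(gray, 0, 0, gray.shape[0], gray.shape[1], max_std, out)
    return out

# Hypothetical 4x4 image: left half dark, right half bright.
halves = np.tile(np.array([0, 0, 100, 100]), (4, 1))
blocks = split_image(halves, max_std=1.0)
```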
Besides the methods described above, a great number of other approaches have been proposed [22–26], as well as improvements of the above methods [27–37]; examples include matching-based segmentation, clustering-based segmentation, fuzzy-inference-based segmentation, and generalized PCA (principal component analysis)-based segmentation. Each segmentation method has its advantages and disadvantages. A universal segmentation algorithm does not exist, as each type of image calls for a specific approach. Therefore, the choice of technique depends on the particular characteristics of the individual problem. This chapter focuses on an improved method of scene image segmentation based on a mathematical morphological operator, the toggle operator.
This chapter is organized as follows: Section 1 presents an overview of methodologies and algorithms for image segmentation. A new proposed image segmentation method is then introduced in Section 2. In Section 3, the experimental results are analyzed to prove the validity of the proposed method. Finally, the chapter concludes in Section 4.
2. Scene image segmentation based on mathematical morphology
Signs and public notices are ubiquitous indoors and outdoors, and they are often used for route finding and for locating public places and other destinations. The texts in natural scene images contain important information. Therefore, text detection has attracted wide interest due to its usefulness in a variety of real-world applications, such as robot navigation, assisting visually impaired people, tourist navigation, and enhancing safe vehicle driving [38, 39]. To date, a great number of algorithms have been proposed for detecting text in scene images or video [40–49]. However, most approaches proposed in past research detect the text regions by analyzing the entire image: the image is segmented into text regions and non-text regions according to their features. The performance of these methods relies on the text detection algorithm and the image complexity. In practice, scene text is usually presented on signboards. Because the background of a signboard usually has a uniform color, a natural way to extract text from scene images is to cut out the signboard regions first and then detect text within those regions. Thus, this chapter proposes an algorithm for segmenting a natural scene image into homogeneous regions. In our method, we first perform the image segmentation in order to detect homogeneous regions. Signboard regions are then detected with a simple criterion in order to remove noise, such as trees and other non-signboard areas. In the following subsections, the proposed method is described and discussed in detail.
2.1. Image smoothing preprocessing
A natural scene image, $I$, is assumed to be a bitmap image based on the RGB (red-green-blue) color model. A smoothing process, the edge preserving smoothing filter (EPSF), is first applied to $I$.
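The EPSF is applied independently to every pixel using different coefficients as shown in the following convolution mask:

$$\frac{1}{\sum_{i=1}^{8} c_i} \begin{bmatrix} c_1 & c_2 & c_3 \\ c_4 & 0 & c_5 \\ c_6 & c_7 & c_8 \end{bmatrix} \tag{10}$$

where $c_i$ ($i = 1, \ldots, 8$) are calculated using the following equation:

$$c_i = (1 - d_i)^p \tag{11}$$

where $d_i$ ($i = 1, \ldots, 8$) are the Manhattan color distances, which are extracted between the target pixel and the 8 neighboring pixels in a 3 × 3 window. That is,

$$d_i = \frac{|R_0 - R_i| + |G_0 - G_i| + |B_0 - B_i|}{3 \times 255} \tag{12}$$

where $R_0$, $G_0$, $B_0$ are the RGB color values of the target pixel, and $R_i$, $G_i$, $B_i$ are the RGB color values of the $i$th neighboring pixel.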
The filtering of the image is achieved by applying the convolution mask, Eq. (10), to each of the three color channels. The factor p in Eq. (11) scales the color differences exponentially; it controls the amount of blurring performed on the image. A fixed value p = 13 is used for all of our experiments because it gave very good performance. The center coefficient of the convolution mask, which corresponds to the target pixel, is set to zero to remove impulsive noise.
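A filter of this kind can be sketched as follows. The sketch assumes that each neighbor coefficient has the form $(1 - d)^p$, where $d$ is the Manhattan color distance divided by $3 \times 255$, and that the coefficients are normalized by their sum; the border handling, the treatment of the degenerate case where all coefficients vanish, and the function name `epsf` are likewise assumptions made for illustration.

```python
import numpy as np

def epsf(rgb, p=13):
    """Edge-preserving smoothing of an RGB image (H x W x 3, values 0..255)."""
    img = np.asarray(rgb, dtype=float)
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    num = np.zeros_like(img)
    den = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            if di == 1 and dj == 1:
                continue                      # center coefficient is zero
            nb = padded[di:di + h, dj:dj + w]
            d = np.abs(nb - img).sum(axis=2) / (3 * 255.0)   # normalized Manhattan distance
            c = (1.0 - d) ** p                # similar neighbors get weights near 1
            num += c[..., None] * nb          # same coefficient for all three channels
            den += c
    # If every coefficient is zero, keep the original pixel (assumed behavior).
    safe = np.where(den > 0, den, 1.0)
    return np.where(den[..., None] > 0, num / safe[..., None], img)
```

A uniform image passes through unchanged, while an isolated pixel that differs from all of its neighbors is replaced by a weighted average of those neighbors.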
Finally, a smoothed image, $I_s$, is obtained. $I_s$ is then converted to a grayscale image, $I_g$: the value $I_g(x, y)$ at the target pixel $(x, y)$ is computed from $R_s(x, y)$, $G_s(x, y)$, and $B_s(x, y)$, the intensities for red, green, and blue, respectively, of the smoothed image $I_s$.
2.2. Homogeneous region segmentation
A measure of region homogeneity is variance (i.e., regions with high homogeneity will have low variance). In this section, a mathematical morphological operator, Toggle Mapping (TM), is introduced as a simple way to segment a grayscale image into homogeneous regions according to the pixel intensity. This operator, Eq. (14), is defined in terms of a threshold value and the dilation image and erosion image of the input image, and it outputs a binary image taking two values.
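A toggle-style operator built from dilation, erosion, and a threshold can be sketched as follows. The exact rule used here (output 1 where the difference between dilation and erosion is below the threshold `t`, 0 otherwise) is our assumption and may differ from the operator actually used in this chapter, as may the 3 × 3 structuring element and the function names.

```python
import numpy as np

def dilate_erode3x3(gray):
    """Grayscale dilation (max) and erosion (min) over a 3x3 window."""
    g = np.asarray(gray, dtype=float)
    p = np.pad(g, 1, mode="edge")
    h, w = g.shape
    windows = np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    return windows.max(axis=0), windows.min(axis=0)

def toggle(gray, t):
    """Assumed rule: 1 where dilation - erosion < t (locally flat), else 0."""
    d, e = dilate_erode3x3(gray)
    return ((d - e) < t).astype(np.uint8)

# Hypothetical image with one vertical step edge.
step = np.tile(np.array([0, 0, 100, 100]), (3, 1))
low_t = toggle(step, 10)     # pixels next to the step are excluded
high_t = toggle(step, 101)   # a larger threshold admits every pixel
```

Under this assumed rule, raising the threshold can only enlarge the set of pixels labeled 1, so the connected components grow as the threshold increases.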
In order to meet the needs of the application, Dorini and Fabrizio have modified and improved this operator by adding new factors or weight coefficients. In their algorithms, the toggle operator is applied only once to segment an image, so the values of the thresholds and coefficients are fixed. However, the optimal values of the thresholds and coefficients differ from image to image. In order to overcome over-segmentation and under-segmentation, we propose a new algorithm for grayscale image segmentation. In our method, the toggle operator is applied iteratively to the input image, and the threshold value is changed in each iteration step. This is because, when Eq. (14) is applied to a grayscale image, the area of each connected component in the output image increases as the threshold value increases. Figure 1 shows an example for increasing threshold values. Based on this feature, we propose an approach that searches for homogeneous regions by calculating the standard deviation of intensity for the connected components. The detailed procedure of our proposed algorithm is described in the following steps.
Step 1: Initializing , , , , , and applying Eq. (14) on then gets a set of connected components , where is the number of elements of .
Step 2: Updating , , and performing Eq. (14) on then gets a set of connected components , where is the number of elements of . Setting .
Step 3: For connected component , where , calculating its standard deviation of intensity . If , go to Step 4; else go to Step 5.
Step 4: Supposing , it means that are covered by , where is a subset of . Updating and .
Step 5: Updating . If , go to Step 3; else go to Step 6.
Step 6: Calculating the number of pixels for and for , respectively. If , go to Step 7; else go to Step 2.
Step 7: Terminating. Finally, is a set of homogeneous regions.
As shown in Figure 2, the natural scene image can be segmented into homogeneous regions. The result showed that our proposed method can work effectively with high accuracy.
3. Experiment and results
3.1. Experimental images
In our experiment, 500 natural scene images are captured with various signboards, shop names, traffic signs, and more. All the original images are saved in RGB24 bitmap format with a size of pixels. In order to provide a wide range of real-life scenarios, images are captured with different compact digital cameras at different angles, positions, and under variable lighting and weather conditions. Figure 3 shows some examples used in this experiment. Table 1 shows the experimental environment.
| OS | Microsoft Windows 10 Enterprise |
| CPU | Intel(R) Xeon(R) E5-2620 v4 2.10 GHz (dual processor) |
3.2. Evaluation of image segmentation
In this subsection, in order to evaluate its accuracy, our proposed method is compared with watershed segmentation using the gradient, Canny edge detection, and region growing. Our method and the other three methods all include several parameters. Therefore, 100 images, selected from the total of 500 images, are used for training, and the parameter values are decided by grid search. The remaining 400 images, none of which belongs to the training set, are used in the experiment to evaluate the accuracy of segmentation.
The purpose of our research is to help visually impaired people access scene text. This chapter aims to segment natural scene images into homogeneous regions because, after the segmentation, specified criteria can be applied to select the signboard regions, and the text can then be extracted from these regions. Therefore, in the experiment, we focus only on the accuracy of signboard segmentation.
The result of segmentation relies not only on the segmentation algorithm but also on the quality of the images. From observation, the segmentation results for signboard regions can be classified into four categories: PERFECT, FRAGMENT, EXCALATION, and EXCALATION and FRAGMENT.
PERFECT: a signboard has been segmented correctly, as shown in Figure 4(a).
FRAGMENT: a signboard is segmented out in fragments, as shown in Figure 4(b), where the extracted results are part of one signboard.
EXCALATION: one part of the signboard is extracted, but the others are lost, as shown in Figure 4(c), where the extracted region is part of one signboard.
EXCALATION and FRAGMENT: one part of a signboard is segmented into fragments, but the other part is lost, as shown in Figure 4(d).
In the experiment, PERFECT and FRAGMENT are evaluated as correct results, whereas EXCALATION and EXCALATION and FRAGMENT are evaluated as incorrect results.
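Under this rule, the accuracy is the share of target signboards whose result falls into one of the two correct categories. The sketch below assumes that the per-category counts are available as a dictionary; the function name `signboard_accuracy` and the counts shown are hypothetical and are not experimental results.

```python
def signboard_accuracy(counts):
    """Percentage of signboards whose result is PERFECT or FRAGMENT."""
    correct = counts.get("PERFECT", 0) + counts.get("FRAGMENT", 0)
    total = sum(counts.values())
    return 100.0 * correct / total if total else 0.0

# Hypothetical counts for ten signboards, for illustration only.
example_counts = {
    "PERFECT": 6,
    "FRAGMENT": 2,
    "EXCALATION": 1,
    "EXCALATION and FRAGMENT": 1,
}
example_accuracy = signboard_accuracy(example_counts)
```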
There are 482 target signboards in 400 experimental images. After the experiment, the accuracy of signboard segmentation and the average image processing time are calculated, respectively. The results are shown in Table 2.
| Methods | Accuracy (%) | Average execution time (s) |
| Canny edge detector | 91.4 | 0.73 |
As shown in Table 2, the average processing time of our proposed method is not particularly short. This is because our algorithm applies the toggle operator iteratively to segment the image and find homogeneous regions, which is time-consuming. The region-growing method first searches for seeds in an image and then performs the growing process, which is also time-consuming.
Figures 5–8 show the segmentation results for the images in Figure 3 obtained by applying our method, watershed segmentation, Canny edge detection, and the region-growing method, respectively. From observation, our proposed algorithm can segment an image into homogeneous regions effectively, and some of its results are better than those obtained with the Canny edge detector and the region-growing method, because the Canny edge operator does not always detect a closed object boundary and the result of the region-growing method depends heavily on the selection of the initial seeds.
Each method can achieve high accuracy if the quality of the images is very good. However, if the images include much noise, the accuracy of segmentation is very low for any method. The signboard regions cannot be segmented completely for the following reasons: (1) the surface of the signboard is corroded, for example, Figure 9(a); (2) a shadow falls on the signboard, for example, Figure 9(b); and (3) the signboard is reflective, for example, Figure 9(c).
This chapter proposes a method of natural scene image segmentation based on mathematical morphology. First, a number of typical segmentation algorithms were reviewed and discussed, and the objective of this chapter was introduced. Second, our proposed method was described. Third, the experiment was carried out and discussed.
The proposed method was tested on different images, and the results showed that our method can be an effective way to segment scene images. However, the results also indicated that signboard regions were extracted with low accuracy when the signboards were shadowed or corroded.
In order to improve the accuracy of segmentation results, in the near future, we will introduce techniques for removing shadows and reflections in images.
This work is supported by the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (Grant No. 17KJB520007, 17KJB470002); Doctoral Research Foundation of Jiangsu University of Science and Technology, China (Grant No. 1624821607–9); and Natural Science Foundation of Jiangsu Province, China (Grant No. BK20150471).