Open access peer-reviewed chapter

Image Segmentation Based on Mathematical Morphological Operator

By Jianjun Chen, Haijian Shao and Chunlong Hu

Submitted: May 28th 2017. Reviewed: November 21st 2017. Published: December 20th 2017.

DOI: 10.5772/intechopen.72603



Image segmentation is the process of partitioning a digital image into multiple regions (sets of pixels); the pixels in each region have similar attributes. It is often used to separate an image into regions in terms of surfaces, objects, and scenes, especially for object location and boundary extraction. Many general-purpose algorithms and techniques have been proposed for image segmentation. The typical traditional methods are: (1) threshold-based methods; (2) edge-based methods; and (3) region-based methods. In this chapter, we propose an image segmentation approach based on a mathematical morphology operator, the toggle operator. The experimental results show that the proposed method can segment natural scene images into homogeneous regions effectively.


  • homogeneous region
  • image segmentation
  • mathematical morphology
  • toggle operator

1. Introduction

Image segmentation is typically used to partition an image into meaningful parts. Thus, it has a significant application in image analysis and understanding. The result of image segmentation is a set of regions (each region is a set of pixels) that collectively cover the entire image, or a set of contours (i.e., boundaries, consisting of lines, curves, etc.) extracted from the image. The pixels in one region have similar characteristics in terms of color, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s) [1, 2].

Until now, a great variety of algorithms have been proposed for image segmentation [3, 4]. These methods are generally classified into three categories: thresholding segmentation, edge-based segmentation, and region-based segmentation. Each method has its own advantages and disadvantages; there is no single method that can be applied effectively for segmenting all kinds of images. This is because different images have different features and properties. Therefore, for different images, the segmentation requires different techniques. Images can be divided into the following categories according to their attributes and characteristics. From the color point of view, images include grayscale images, binary images, and color images; from the texture point of view, images include texture images and nontexture images. Based on the image features, the following subsections will consider some proposed classical approaches in more detail.

1.1. Threshold-based segmentation

Thresholding segmentation is a pixel-based method for image segmentation. It is the simplest method based on the variation of intensity between the object pixels and background pixels. Therefore, it is often used to separate out regions of an image corresponding to objects that we are interested in.

In order to differentiate the pixels that are located in the region of interest from the rest, each pixel intensity value is compared with a threshold. In this method, pixels are divided into two classes that are typically named “foreground” and “background.” Pixels with values less than the threshold are placed in one class, and the rest are placed in the other class. Therefore, this method is often used to convert a grayscale image into a binary image:

$$I_{bw}(x,y) = \begin{cases} 1, & I_{gray}(x,y) \ge T \\ 0, & I_{gray}(x,y) < T \end{cases} \tag{1}$$

where $I_{gray}$ is the grayscale image, $I_{bw}$ is the binary image, $(x,y)$ is the coordinate of the target pixel, and $T$ is the threshold value. This method is most effective for images with high levels of contrast. However, the key to this method is selecting a well-suited threshold value.
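The comparison described above can be sketched in a few lines. This is a minimal illustration using plain Python lists as the image; the function name `threshold` and the convention that pixels at or above the threshold map to 1 are assumptions made for the example.

```python
def threshold(gray, t):
    """Binarize a grayscale image given as a list of rows of intensities.

    Assumed convention: pixels >= t become 1 (one class),
    pixels < t become 0 (the other class).
    """
    return [[1 if value >= t else 0 for value in row] for row in gray]


# A tiny 3 x 3 image with a bright object on a dark background.
gray = [
    [10, 12, 200],
    [11, 220, 210],
    [9, 13, 14],
]
binary = threshold(gray, 128)
```

Swapping the comparison inverts which class is labeled foreground; the partition of the pixels is the same.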

Many researchers have worked on selecting the threshold $T$ automatically. Their thresholding methods can be categorized into the following groups based on the information the algorithm manipulates [5–9]:

  1. Histogram shape-based methods: the peaks, valleys, and curvatures of the smoothed histogram are analyzed.

    1. Double-peak threshold method: suppose the histogram of the image is a bimodal distribution (regions with uniform intensity give rise to strong peaks in the histogram), then the value of valley point can be chosen as the threshold.

    2. Minimum variance method: suppose that a region has relatively homogeneous gray values, it will make sense to select a threshold that minimizes the variance of the gray values within regions.

    3. Maximum variance method (usually called the Otsu method [10]): chooses a threshold value by maximizing the variance between objects and background. The histogram is divided into two classes such that the between-class variance is maximized.

  2. Clustering method: representing the image as a set of clusters, an ideal threshold value is determined by iteratively reducing the clusters until there are two clusters left. The two remaining clusters are the background and the foreground (object).

  3. Maximum-entropy-based method: determines an ideal threshold value by maximizing entropy for the probability distribution of the foreground and background regions, which are represented in the histogram.

  4. Local thresholding method: chooses the threshold value for each pixel according to the local image characteristics. In this method, a different threshold value is selected for each pixel in the image.

  5. Watershed method: considers the gradient magnitude of an image as a topographic surface where high gradient denotes peaks, while low gradient denotes valleys. Start by filling every isolated valley with different colored water. As the water rises, water from different valleys will start to merge. To avoid that, barriers are built in the locations where water merges. Continue the work of filling water and building barriers until all the peaks are under water. Then the created barriers give the segmentation result.
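As an illustration of automatic threshold selection, the maximum variance (Otsu) idea from the list above can be sketched as an exhaustive search over candidate thresholds. This is a simplified sketch, not a tuned implementation: the function name `otsu_threshold` is hypothetical, the image is assumed to hold integer intensities in the range 0 to `levels - 1`, and the class split (below `t` versus at or above `t`) is an assumed convention.

```python
def otsu_threshold(gray, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with intensity < t form one class, pixels >= t the other.
    """
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # sum of intensities in the low class
    for t in range(1, levels):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


image = [[10, 12, 200], [11, 220, 210]]
t = otsu_threshold(image)
```

For this bimodal toy image, any threshold between the two intensity groups gives the same score, and the search keeps the first such value.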

Thresholding methods can be applied to segment not only grayscale images but also color images. For color images, one approach is to determine a separate threshold for each color channel of the image and then combine the results with an AND operation. Segmentation based on color information (e.g., RGB, HSL, and HSV color models) may be more accurate than segmentation of grayscale images [11, 12].
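The per-channel approach can be sketched as follows. The function name `threshold_rgb`, the pixel representation (tuples of red, green, and blue values), and the threshold values are assumptions chosen only for illustration.

```python
def threshold_rgb(rgb, thresholds):
    """Threshold each channel separately, then AND the three binary results."""
    t_r, t_g, t_b = thresholds
    return [
        [1 if (r >= t_r and g >= t_g and b >= t_b) else 0 for (r, g, b) in row]
        for row in rgb
    ]


rgb = [
    [(200, 30, 20), (210, 180, 40)],
    [(50, 190, 30), (220, 200, 10)],
]
# Keep pixels that are bright in both the red and the green channel.
mask = threshold_rgb(rgb, (128, 128, 0))
```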

1.2. Edge-based segmentation

An edge is the boundary between two regions with different properties; it represents the change from one object or surface to another. Edges are used to characterize the physical extent of objects, since there is often a sharp change in intensity at the region boundaries. Thus, detection of edges is a very important step toward image feature understanding; it is often used to divide images into areas corresponding to different objects.

The main idea of most edge-detection techniques is the computation of a local derivative of an image, including first- and second-order derivatives. The first-order derivative of choice in image processing is the gradient; it can be used to detect the presence of an edge in an image. Second-order derivatives in image processing generally are computed using the Laplacian. The sign of the second derivative can be used to determine whether an edge pixel is on the dark or light side of an edge [13–17]. The gradient and Laplacian operators are defined as follows:

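(1) Gradient operators: the gradient of an image $f(x,y)$ at location $(x,y)$ is defined as the vector

$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[2mm] \dfrac{\partial f}{\partial y} \end{bmatrix} \tag{2}$$

The magnitude of this vector is computed from

$$|\nabla f| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2} \tag{3}$$

To simplify this computation, this quantity is approximated by using the following expression:

$$|\nabla f| \approx \left|\frac{\partial f}{\partial x}\right| + \left|\frac{\partial f}{\partial y}\right| \tag{4}$$

The gradient vector points in the direction of the maximum rate of change of $f$ at coordinates $(x,y)$; its direction angle is

$$\alpha(x,y) = \arctan\left(\frac{\partial f / \partial y}{\partial f / \partial x}\right) \tag{5}$$

For a discrete image, the gradient can be calculated by the following expressions:

$$\frac{\partial f(x,y)}{\partial x} = f(x+1,y) - f(x,y) \tag{6}$$

$$\frac{\partial f(x,y)}{\partial y} = f(x,y+1) - f(x,y) \tag{7}$$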

Edges can be detected with the help of first-order derivative type operators, as follows:

  1. Sobel edge detector: uses convolutions with row and column edge gradient masks; it is suited to detecting edges along the horizontal and vertical axes.

  2. Roberts detector: calculates the square root of the sum of the squared convolutions with the row and column edge detectors; it is able to detect edges that run along the diagonal directions of 45° and 135°.

  3. Prewitt operator: detects edges by calculating the Prewitt compass gradient filters that return the result for the largest filter response.

  4. Kirsch edge detector: performs convolution using eight filters that are applied to calculate gradient.

  5. Frei-Chen edge operator: uses only the row and column filters.

The gradient method detects the edges by looking for the maximum and minimum in the first derivative of the image. However, these gradient operators tend to be sensitive to noise.
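To make the first-derivative operators concrete, the sketch below applies the two standard 3 × 3 Sobel masks and combines their responses with the absolute-value approximation of the gradient magnitude. The function name `sobel_magnitude`, the list-of-rows image layout, and the choice to leave border pixels at zero are assumptions for illustration.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))    # responds to change along x
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))    # responds to change along y


def sobel_magnitude(gray):
    """Approximate gradient magnitude |Gx| + |Gy| at interior pixels.

    `gray` is a list of rows; border pixels are left at 0 for simplicity.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    v = gray[y + dy][x + dx]
                    gx += SOBEL_X[dy + 1][dx + 1] * v
                    gy += SOBEL_Y[dy + 1][dx + 1] * v
            out[y][x] = abs(gx) + abs(gy)
    return out


# A vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10] for _ in range(4)]
mag = sobel_magnitude(step)
flat = sobel_magnitude([[7] * 4 for _ in range(4)])
```

The response is large on both sides of the step and zero on the flat image, which is why a threshold on the magnitude can mark edge pixels. Noise produces small local steps as well, which illustrates the noise sensitivity mentioned above.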

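(2) Laplacian operator: the Laplacian of an image $f(x,y)$ is defined as

$$\nabla^2 f(x,y) = \frac{\partial^2 f(x,y)}{\partial x^2} + \frac{\partial^2 f(x,y)}{\partial y^2} \tag{8}$$

where $\dfrac{\partial^2 f(x,y)}{\partial x^2} = f(x+1,y) + f(x-1,y) - 2f(x,y)$ and $\dfrac{\partial^2 f(x,y)}{\partial y^2} = f(x,y+1) + f(x,y-1) - 2f(x,y)$, so that

$$\nabla^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4f(x,y) \tag{9}$$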

The Laplacian is seldom used directly for edge detection because, as a second-order derivative, it is unacceptably sensitive to noise, its magnitude produces double edges, and it is unable to detect edge direction. In order to improve the effectiveness of edge detection, the following algorithms are proposed:

  1. Laplacian of Gaussian (LoG): the image is first smoothed with a Gaussian filter, and the Laplacian is then applied to find where the intensity changes rapidly, so that edges are detected more reliably [16]. However, this operator cannot find the orientation of an edge because it uses the Laplacian filter [17].

  2. Canny edge detector (colored edge detectors): uses a multistage algorithm to detect a wide range of edges in images. First, a Gaussian convolution filter is applied to smooth the image in order to reduce noise. Second, a first-derivative operator (e.g., the Sobel or Roberts operator) is applied to the smoothed image to produce a gradient magnitude image. Third, nonmaximum suppression is applied to remove spurious responses so that edges appear as thin lines in the output. Fourth, a double threshold is applied to determine potential edges. Last, edge tracking by hysteresis finalizes the detection by suppressing weak edges that are not connected to strong edges. The Canny edge detector performs well; its main drawback is that it is more complex and takes more time to compute [18, 19].
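The last two stages of the Canny detector, the double threshold and edge tracking by hysteresis, can be sketched as below. The sketch assumes a gradient magnitude image is already available and uses 8-connectivity; the function name `hysteresis` and the threshold values in the example are hypothetical.

```python
from collections import deque


def hysteresis(mag, low, high):
    """Keep strong pixels (>= high) plus weak pixels (>= low) that are
    8-connected, directly or through other weak pixels, to a strong pixel."""
    h, w = len(mag), len(mag[0])
    edges = [[0] * w for _ in range(h)]
    queue = deque()
    # Seed the search with every strong pixel.
    for y in range(h):
        for x in range(w):
            if mag[y][x] >= high:
                edges[y][x] = 1
                queue.append((y, x))
    # Grow along weak pixels that touch an already accepted pixel.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and not edges[ny][nx] and mag[ny][nx] >= low):
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges


# Index 2 is strong; indices 1 and 3 are weak but connected; index 5 is isolated.
magnitude = [[0, 5, 9, 5, 0, 5]]
edge_map = hysteresis(magnitude, low=4, high=8)
```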

1.3. Region-based segmentation

An edge-based technique may attempt to find the object boundaries and then locate the object itself by filling them in; a region-based technique takes the opposite approach.

Region-based segmentation algorithms operate iteratively by grouping together neighboring pixels that have similar properties (such as gray level, texture, color, shape) and splitting groups of pixels that are dissimilar in value [20, 21]. There are a variety of approaches to region-based segmentation. These methods can be classified into two categories:

(1) Region growing method: This is the simplest region-based segmentation method. It is also classified as a pixel-based segmentation method since it involves the selection of initial seed points.

This approach groups pixels or subregions into larger regions based on predefined criteria. First, a set of seed points is selected based on some user criterion (e.g., pixels in a certain grayscale range). Second, the regions are grown from these seed pixels to neighboring pixels, which are examined to ascertain whether they should be added to the region according to a region membership criterion (e.g., pixel intensity, grayscale texture, or color). Third, the second step is repeated, in the same manner as general data clustering algorithms.

Region-growing-based techniques perform better than edge-based techniques on noisy images where edges are difficult to detect. However, they are computationally expensive.
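A minimal sketch of region growing from a single seed is given below. The membership criterion used here (absolute difference from the seed intensity no larger than a tolerance), the 4-connectivity, and the function name `region_grow` are assumptions; other criteria such as texture or color can be substituted.

```python
from collections import deque


def region_grow(gray, seed, tol):
    """Grow a 4-connected region from `seed`, given as (row, col).

    A neighbor joins the region when its intensity differs from the
    seed intensity by at most `tol`.
    """
    h, w = len(gray), len(gray[0])
    seed_value = gray[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and (ny, nx) not in region
                    and abs(gray[ny][nx] - seed_value) <= tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region


gray = [
    [10, 11, 90],
    [12, 10, 95],
    [91, 92, 94],
]
dark = region_grow(gray, seed=(0, 0), tol=5)
```

Starting from a different seed, for example `(2, 2)`, yields the bright region instead, which shows how strongly the result depends on seed selection.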

(2) Split-and-merge method: This method consists of two steps: region splitting and region merging.

Region splitting starts with the whole image as a single region and subdivides it into subsidiary regions recursively while a condition of homogeneity is not satisfied.

Region merging is the opposite of region splitting and works as a way of avoiding over-segmentation. It starts with small regions and merges the regions that have similar characteristics (such as gray level, variance).
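The splitting step can be sketched as a recursive quadtree decomposition. In this sketch the homogeneity condition is assumed to be "maximum minus minimum intensity no larger than a tolerance", and the function name `split` is hypothetical; the merging step, which would join adjacent blocks with similar statistics, is omitted for brevity.

```python
def split(gray, y, x, h, w, tol, out):
    """Recursively split the block with top-left corner (y, x) and size
    h x w until every block satisfies max - min <= tol.

    Homogeneous blocks are appended to `out` as (y, x, h, w) tuples.
    """
    values = [gray[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    if max(values) - min(values) <= tol or (h == 1 and w == 1):
        out.append((y, x, h, w))
        return
    h2, w2 = (h + 1) // 2, (w + 1) // 2
    for by, bh in ((y, h2), (y + h2, h - h2)):
        for bx, bw in ((x, w2), (x + w2, w - w2)):
            if bh > 0 and bw > 0:
                split(gray, by, bx, bh, bw, tol, out)


# A 4 x 4 image whose top-left quadrant is bright.
image = [
    [100, 100, 0, 0],
    [100, 100, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
blocks = []
split(image, 0, 0, 4, 4, tol=10, out=blocks)
```

Here the three dark quadrants are reported as separate blocks, which is the over-segmentation that a subsequent merging pass is meant to remove.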

In addition to the methods described above, there are many other approaches [22–26] as well as improvements of the above methods [27–37]; for example, matching-based segmentation, clustering-based segmentation, fuzzy-inference-based segmentation, and generalized PCA (principal component analysis)-based segmentation. Each segmentation method has its advantages and disadvantages. A universal segmentation algorithm does not exist, as each type of image calls for a specific approach. Therefore, the choice of technique depends on the particular characteristics of the individual problem. The emphasis of this chapter lies on an improved method of scene image segmentation based on a mathematical morphological operator, the toggle operator.

This chapter is organized as follows: Section 1 presents an overview of methodologies and algorithms for image segmentation. A new proposed image segmentation method is then introduced in Section 2. In Section 3, the experimental results are analyzed to prove the validity of the proposed method. Finally, the chapter concludes in Section 4.

2. Scene image segmentation based on mathematical morphology

Signs and public notices are ubiquitous indoors and outdoors, and they are often used for route finding and for locating public places and other destinations. The texts in natural scene images contain important information. Therefore, text detection has attracted wide interest due to its usefulness in a variety of real-world applications, such as robot navigation, assisting visually impaired people, tourist navigation, enhancing safe vehicle driving, and so on [38, 39]. To date, a great number of algorithms have been proposed for detecting text in scene images or video [40–49]. However, most approaches proposed in past research detect the text regions by analyzing the entire image. The image is segmented into text regions and non-text regions according to their respective features. The performance of these methods depends on the text detection algorithm and the image complexity. In practice, scene text is usually presented on signboards. Because the background of a signboard has a uniform color, the ideal way to extract text from scene images is to cut out the signboard regions first and then detect text within them. Thus, this chapter aims to propose an algorithm for segmenting a natural scene image into homogeneous regions. In our method, we first perform the image segmentation in order to detect homogeneous regions. Signboard regions are then detected with a simple criterion in order to remove noise, such as trees and other non-signboard areas. In the following subsections, the proposed method is described and discussed in detail.

2.1. Image smoothing preprocessing

A natural scene image, $I_{rgb}$, is assumed to be a bitmap image based on the RGB (red-green-blue) color model. A smoothing process, the edge preserving smoothing filter (EPSF) [20], is first applied to $I_{rgb}$.

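The EPSF is applied independently to every pixel using different coefficients as shown in the following convolution mask:

$$\frac{1}{\sum_{i=1}^{8} c_i} \begin{bmatrix} c_1 & c_2 & c_3 \\ c_4 & 0 & c_5 \\ c_6 & c_7 & c_8 \end{bmatrix} \tag{10}$$

where $c_i$ $(i = 1, 2, 3, \ldots, 8)$ are calculated using the following equation:

$$c_i = (1 - d_i)^p \tag{11}$$

where $d_i$ $(i = 1, 2, 3, \ldots, 8)$ are the Manhattan color distances, normalized to $[0, 1]$, which are extracted between the target pixel and the 8 neighboring pixels in a 3 × 3 window. That is,

$$d_i = \frac{|I_{R0} - I_{Ri}| + |I_{G0} - I_{Gi}| + |I_{B0} - I_{Bi}|}{3 \times 255} \tag{12}$$

where $I_{R0}$, $I_{G0}$, $I_{B0}$ are the RGB color values of the target pixel, and $I_{Ri}$, $I_{Gi}$, $I_{Bi}$ are the RGB color values of the $i$th neighboring pixel.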

The filtering of the image is achieved by applying the convolution mask, Eq. (10), on each of the three color channels. Factor p in Eq. (11) scales exponentially the color differences; it controls the amount of blurring performed on the image. A fixed value p = 13 is used for all of our experiments because this results in very good performance. The target pixel of the convolution mask is set to zero to remove impulsive noise.
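A minimal sketch of this filter is given below. It assumes coefficients of the form c_i = (1 − d_i)^p, with d_i the Manhattan color distance divided by 3 × 255 so that it lies in [0, 1]; it also assumes 8-bit channels and simply copies border pixels. The function name `epsf` and these choices are illustrative and should be checked against [20] before reuse.

```python
def epsf(rgb, p=13):
    """Edge preserving smoothing of an RGB image (list of rows of (r, g, b)).

    Assumptions: c_i = (1 - d_i) ** p, where d_i is the Manhattan color
    distance divided by 3 * 255; the center coefficient is 0; border
    pixels are copied unchanged.
    """
    h, w = len(rgb), len(rgb[0])
    out = [list(row) for row in rgb]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r0, g0, b0 = rgb[y][x]
            acc = [0.0, 0.0, 0.0]
            weight_sum = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue  # the target pixel has coefficient zero
                    r, g, b = rgb[y + dy][x + dx]
                    d = (abs(r0 - r) + abs(g0 - g) + abs(b0 - b)) / (3 * 255)
                    c = (1.0 - d) ** p
                    weight_sum += c
                    acc[0] += c * r
                    acc[1] += c * g
                    acc[2] += c * b
            if weight_sum > 0:
                out[y][x] = tuple(round(a / weight_sum) for a in acc)
    return out


dark, bright = (0, 0, 0), (200, 200, 200)
# A vertical edge: smoothing should not blur it.
edge = [[dark, dark, bright, bright] for _ in range(3)]
smoothed_edge = epsf(edge)
# An isolated bright pixel on a dark background: it should be removed.
spot = [[dark, dark, dark], [dark, bright, dark], [dark, dark, dark]]
smoothed_spot = epsf(spot)
```

Neighbors with very different colors receive coefficients close to zero, so the average is dominated by pixels on the same side of an edge, while an isolated outlier is replaced by its surroundings because its own value carries zero weight.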

Finally, a smoothed image, $I_{EPSF}$, is obtained. $I_{EPSF}$ is then converted to a grayscale image $I_{gray}$ through the following equation:

where $(x,y)$ is the coordinate of the target pixel. $I_{EPSF}^{R}(x,y)$, $I_{EPSF}^{G}(x,y)$, and $I_{EPSF}^{B}(x,y)$ are the intensities for red, green, and blue, respectively, of the smoothed image $I_{EPSF}$.
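A conversion of this kind is a weighted sum of the three channel intensities at each pixel. The sketch below assumes the widely used ITU-R BT.601 luma weights (0.299, 0.587, 0.114); these weights are an assumption made for illustration, and an unweighted channel average would fit the description equally well. The function name `to_gray` is hypothetical.

```python
def to_gray(rgb, weights=(0.299, 0.587, 0.114)):
    """Convert an RGB image (list of rows of (r, g, b)) to grayscale.

    The default weights are the BT.601 luma coefficients, used here as
    an assumed example; pass other weights to change the conversion.
    """
    w_r, w_g, w_b = weights
    return [
        [round(w_r * r + w_g * g + w_b * b) for (r, g, b) in row]
        for row in rgb
    ]


gray_row = to_gray([[(255, 255, 255), (0, 0, 0), (100, 100, 100)]])
```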

2.2. Homogeneous region segmentation

A measure of region homogeneity is variance (i.e., regions with high homogeneity will have low variance). In this section, a mathematical morphological operator, Toggle Mapping (TM) [34], is introduced as a simple way to segment a grayscale image into homogeneous regions according to the pixel intensity. This operator is defined as follows:

where $H$ is a binary image taking two values, $th$ is a threshold value, and $D$ and $E$ are the dilation image and erosion image of the input image, respectively.
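To illustrate how a binary map can be built from the dilation and erosion images, the sketch below marks a pixel as 1 when the local contrast D − E is below the threshold and as 0 otherwise. This particular rule is an assumption made for the example, chosen because it reproduces the behavior in which the marked area grows as the threshold increases; it is not necessarily the exact form of the operator in [34]. The 3 × 3 square structuring element and the names `dilate`, `erode`, and `toggle_map` are likewise assumptions.

```python
def _local_extreme(gray, reducer):
    """Apply `reducer` (max or min) over each 3 x 3 neighborhood,
    clipping the window at the image border."""
    h, w = len(gray), len(gray[0])
    return [
        [
            reducer(
                gray[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def dilate(gray):
    return _local_extreme(gray, max)


def erode(gray):
    return _local_extreme(gray, min)


def toggle_map(gray, th):
    """Assumed rule: H = 1 where D - E < th (locally flat), else 0."""
    d, e = dilate(gray), erode(gray)
    return [
        [1 if d[y][x] - e[y][x] < th else 0 for x in range(len(gray[0]))]
        for y in range(len(gray))
    ]


# Two nearly flat areas separated by a sharp vertical boundary.
gray = [[10, 11, 10, 50, 51, 50] for _ in range(3)]
h_small = toggle_map(gray, th=5)
h_large = toggle_map(gray, th=45)
```

With the small threshold the two flat areas are marked and the boundary columns are not; with the large threshold every pixel is marked, so the two components merge into one.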

In order to meet the needs of their applications, Dorini [35] and Fabrizio [36] have modified and improved this operator by adding new factors or weight coefficients. In their algorithms, the toggle operator is applied only once to segment an image, so the values of the thresholds and coefficients are fixed. However, the optimal values for the thresholds and coefficients differ from image to image. In order to overcome over-segmentation or under-segmentation, we propose a new algorithm for grayscale image segmentation. In our method, the toggle operator is applied iteratively to the input image, and the threshold value is changed in each iteration step. This is because, when Eq. (14) is applied to a grayscale image, the area of the connected components in the output image increases as the threshold value increases. Figure 1 shows an example for increasing threshold values. Based on this feature, we propose an approach that searches for homogeneous regions by calculating the standard deviation of intensity for connected components. The detailed procedure of our proposed algorithm is described in the following steps.

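  1. Step 1: Initializing $U \leftarrow \varnothing$, $TH_{SD} = 20$, $Th_{Ratio} = 0.99$, $i \leftarrow 1$, $th \leftarrow i$, and applying Eq. (14) on $I_{gray}$ then gets a set of connected components $CC^{1} = \{C_1^{1}, C_2^{1}, \ldots, C_{n_1}^{1}\}$, where $n_1$ is the number of elements of $CC^{1}$.

  2. Step 2: Updating $i \leftarrow i + 1$, $th \leftarrow i$, and performing Eq. (14) on $I_{gray}$ then gets a set of connected components $CC^{i} = \{C_1^{i}, C_2^{i}, \ldots, C_{n_i}^{i}\}$, where $n_i$ is the number of elements of $CC^{i}$. Setting $j \leftarrow 1$.

  3. Step 3: For connected component $C_j^{i}$, where $C_j^{i} \in CC^{i}$, calculating its standard deviation of intensity $SD\_C_j^{i}$. If $SD\_C_j^{i} > TH_{SD}$, go to Step 4; else go to Step 5.

  4. Step 4: Supposing $C_j^{i} \supseteq C_{j_1}^{i-1} \cup C_{j_2}^{i-1} \cup \cdots \cup C_{j_m}^{i-1}$, it means that $C_{j_1}^{i-1}, C_{j_2}^{i-1}, \ldots, C_{j_m}^{i-1}$ are covered by $C_j^{i}$, where $\{C_{j_1}^{i-1}, C_{j_2}^{i-1}, \ldots, C_{j_m}^{i-1}\}$ is a subset of $CC^{i-1}$. Updating $U \leftarrow U \cup \{C_{j_1}^{i-1}, C_{j_2}^{i-1}, \ldots, C_{j_m}^{i-1}\}$ and $CC^{i} \leftarrow CC^{i} \setminus \{C_j^{i}\}$.

  5. Step 5: Updating $j \leftarrow j + 1$. If $j \le n_i$, go to Step 3; else go to Step 6.

  6. Step 6: Calculating the number of pixels $NP\_CC^{i-1}$ for $CC^{i-1}$ and $NP\_CC^{i}$ for $CC^{i}$, respectively. If $\dfrac{\min(NP\_CC^{i-1}, NP\_CC^{i})}{\max(NP\_CC^{i-1}, NP\_CC^{i})} > Th_{Ratio}$, go to Step 7; else go to Step 2.

  7. Step 7: Terminating. Finally, $U$ is a set of homogeneous regions.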
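Two building blocks of the steps above, the homogeneity test of Step 3 and the stopping ratio of Step 6, can be sketched as follows. This is not the complete iterative procedure; it only shows how connected components can be labeled and checked. The 4-connectivity, the use of the population standard deviation, and the names `connected_components`, `is_homogeneous`, and `size_ratio` are assumptions made for illustration.

```python
from collections import deque
from statistics import pstdev


def connected_components(binary):
    """Return the 4-connected components of the 1-pixels in `binary`,
    each as a list of (row, col) coordinates."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                component, queue = [], deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    component.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(component)
    return components


def is_homogeneous(gray, component, th_sd=20):
    """Step 3 test: intensity standard deviation not above TH_SD."""
    return pstdev([gray[y][x] for y, x in component]) <= th_sd


def size_ratio(np_prev, np_curr):
    """Step 6 ratio: min / max of two pixel counts (0.0 if both are 0)."""
    largest = max(np_prev, np_curr)
    return min(np_prev, np_curr) / largest if largest else 0.0


binary = [[1, 1, 0, 1],
          [1, 1, 0, 1]]
gray = [[10, 12, 0, 10],
        [11, 13, 0, 90]]
components = connected_components(binary)
flags = [is_homogeneous(gray, c) for c in components]
```

In the example, the left component has nearly constant intensity and passes the test, while the right component mixes a dark and a bright pixel and fails it, which is the situation in which Step 4 falls back to the components of the previous iteration.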

Figure 1.

An example for result of Toggle Mapping (a) th = i−1, (b) th = i, (c) th = i + 1.

As shown in Figure 2, the natural scene image can be segmented into homogeneous regions. The result showed that our proposed method can work effectively with high accuracy.

Figure 2.

An example for natural scene image segmentation (a) Original Image, (b) Grayscale Image, (c) Labeling Image for Homogeneous Regions.

3. Experiment and results

3.1. Experimental images

In our experiment, 500 natural scene images are captured with various signboards, shop names, traffic signs, and more. All the original images are saved in RGB24 bitmap format with a size of 1000 × 1500 pixels. In order to provide a wide range of real-life scenarios, images are captured with different compact digital cameras at different angles and positions, and under variable lighting and weather conditions. Figure 3 shows some examples used in this experiment. Table 1 shows the experimental environment.

Figure 3.

Examples of scene images.

OS                      Microsoft Windows 10 Enterprise
CPU                     Intel(R) Xeon(R) E5–2620 v4 2.10GHz (dual processor)
Programming Language    MATLAB

Table 1.

Experimental environment.

3.2. Evaluation of image segmentation

In this subsection, our proposed method is compared with watershed segmentation using the gradient [1], Canny edge detection [18], and region growing [1] in order to evaluate its accuracy. Both our method and the other three methods include many parameters. Therefore, 100 images, selected from the total of 500 images, are used for training and for deciding the parameter values with a grid search. The remaining 400 images are used in the experiment to evaluate the accuracy of segmentation.

The purpose of our research is to help visually impaired people access scene text. This chapter aims to segment natural scene images into homogeneous regions because, after the segmentation, specified criteria can be applied to select the signboard regions and the text can then be extracted from these regions. Therefore, in the experiment, we only focus on the accuracy of signboard segmentation.

The result of segmentation relies not only on the segmentation algorithm but also on the quality of the images. From observation, the segmentation results for signboard regions fall into four categories: PERFECT, FRAGMENT, EXCALATION, and EXCALATION and FRAGMENT.

PERFECT: a signboard has been segmented correctly, as shown in Figure 4(a).

Figure 4.

Cases of signboard segmentation (a) Complete Signboard, (b) Fragmentary Signboard, (c) Partial Signboard, (d) Partially & Fragmentary.

FRAGMENT: a signboard is segmented out in fragments, as shown in Figure 4(b), where the extracted results are part of one signboard.

EXCALATION: one part of the signboard is extracted, but the others are lost, as shown in Figure 4(c), where the extracted region is part of one signboard.

EXCALATION and FRAGMENT: one part of a signboard is segmented into fragments, but the other part is lost, as shown in Figure 4(d).

In the experiment, PERFECT and FRAGMENT are evaluated as correct results, whereas EXCALATION and EXCALATION and FRAGMENT are evaluated as incorrect results.

There are 482 target signboards in 400 experimental images. After the experiment, the accuracy of signboard segmentation and the average image processing time are calculated, respectively. The results are shown in Table 2.

Methods                   Accuracy (%)    Average execution time (s)
Our method                94.5            3.21
Watershed segmentation    83.6            2.82
Canny edge detector       91.4            0.73
Region growing            92.2            4.08

Table 2.

Segmentation accuracy and average processing time.

As shown in Table 2, the average processing time of our proposed method (3.21 s) is longer than that of watershed segmentation and Canny edge detection. This is because our algorithm iteratively applies the toggle operator to segment the image and find homogeneous regions, which is time-consuming. The region-growing method first searches for seeds in an image and then performs the growing process, which is also time-consuming.

Figures 5–8 show the segmentation results for Figure 3 obtained by applying our method, watershed segmentation, Canny edge detection, and the region-growing method, respectively. From observation, our proposed algorithm can segment an image into homogeneous regions effectively, and some results are better than those of the Canny edge detector and the region-growing method, because the Canny edge operator does not always detect a closed object boundary and the result of the region-growing method depends heavily on the initial seed selection.

Figure 5.

Result of our method for segmenting Figure 3; panels (a)–(f) show the homogeneous regions.

Figure 6.

Watershed segmentation result of Figure 3; panels (a)–(f) show the homogeneous regions.

Figure 7.

Canny edge detector for segmentation of Figure 3; panels (a)–(f) show the homogeneous regions.

Figure 8.

Region growing for segmentation of Figure 3; panels (a)–(f) show the homogeneous regions.

Figure 9.

Examples of scene images.

Each method can achieve high accuracy if the quality of the images is very good. However, if the images include much noise, the accuracy of segmentation is very low with any method. The signboard regions cannot be segmented completely for the following reasons: (1) the surface of the signboard is corroded, for example, Figure 9(a); (2) a shadow falls on the signboard, for example, Figure 9(b); and (3) the signboard is reflective, for example, Figure 9(c).

4. Summary

This chapter proposes a method of mathematical morphology-based natural scene image segmentation. First, a number of typical segmentation algorithms were reviewed and discussed, and the objective of this chapter was introduced. Second, our proposed method was described. Third, the experiment was carried out and discussed.

The proposed method was tested on different images, and the results showed that it can be an effective way to segment scene images. However, the results also indicated that signboard regions were extracted with low accuracy when shadows, reflections, or corrosion were present on the signboards.

In order to improve the accuracy of segmentation results, in the near future, we will introduce techniques for removing shadows and reflections in images.


This work is supported by the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (Grant No. 17KJB520007, 17KJB470002); Doctoral Research Foundation of Jiangsu University of Science and Technology, China (Grant No. 1624821607–9); and Natural Science Foundation of Jiangsu Province, China (Grant No. BK20150471).

© 2017 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


Jianjun Chen, Haijian Shao and Chunlong Hu (December 20th 2017). Image Segmentation Based on Mathematical Morphological Operator, Colorimetry and Image Processing, Carlos M. Travieso-Gonzalez, IntechOpen, DOI: 10.5772/intechopen.72603. Available from:
