The labels and examples of the test database.
With the growth in the number of color images, developing an efficient image retrieval system has received much attention in recent years. The first step in retrieving relevant information from image and video databases is the selection of appropriate feature representations (e.g., color, texture, shape) so that the feature attributes are both consistent in feature space and perceptually close to the user. Many content-based image retrieval (CBIR) systems, which adopt different low-level features and similarity measures, have been proposed in the literature [2-5]. In general, perceptually similar images are not necessarily similar in terms of low-level features. Hence, these content-based systems capture pre-attentive similarity rather than semantic similarity. To achieve a more efficient CBIR system, active research is currently focused on two complementary approaches: the region-based approach [4, 8-10] and relevance feedback [6, 11-13].
Typically, region-based approaches segment each image into several regions with homogeneous visual properties, and enable users to rate the relevant regions for constructing a new query. In general, an incorrect segmentation may result in an inaccurate representation. However, automatically extracting image objects is still a challenging issue, especially for a database containing a collection of heterogeneous images. For example, Jing et al. integrate several effective relevance feedback algorithms into a region-based image retrieval system, which incorporates the properties of all the segmented regions to perform a many-to-many regional similarity measure. However, some semantic information will be disregarded without considering similar regions in the same image. In another study, Vu et al. proposed a region-of-interest (ROI) technique, a sampling-based matching framework called SamMatch. This method can prevent incorrect detection of the visual features.
On the other hand, relevance feedback is an online-learning technique that can capture the inherent subjectivity of the user’s perception during a retrieval session. In Power Tool, the user is allowed to give relevance scores to the best matched images, and the system adjusts the weights by putting more emphasis on the specific features. Cox et al. propose an alternative way to achieve CBIR that predicts the possible image targets by Bayes’ rule rather than providing segmented regions of the query image. However, the feedback information in such systems could be ignored if the most likely images and irrelevant images have similar features.
In this chapter, a novel region-based relevance feedback system is proposed that incorporates several feature vectors. First, unsupervised texture segmentation for natural images is used to partition an image into several homogeneous regions. Then we propose an efficient dominant color descriptor (DCD) to represent the partitioned regions in an image. Next, a regional similarity matrix model is introduced to rank the images. To cope with possible segmentation failures and to simplify user operations, we propose a foreground assumption that separates an image into two parts: foreground and background. The background can be regarded as the irrelevant region that confuses the query semantics during retrieval. The main objective of this approach is to exclude irrelevant regions (background) from contributing to the image-to-image similarity model. Furthermore, global features extracted from the entire image are used to compensate for the inaccuracy due to imperfect segmentation. The details are presented in the following sections. Experimental results show that our framework improves the accuracy of relevance-feedback retrieval.
The chapter is organized as follows. Section 2 describes the key observations that form the basis of our algorithm. In Section 3, we first present a quantization scheme for extracting the representative colors from images, and then introduce a modified similarity measure for DCD. In Section 4, image segmentation and region representation based on our modified dominant color descriptor and local binary pattern are described, followed by the image representation and the foreground assumption in Section 4.3. Our integrated region-based relevance feedback strategies, which consider the pseudo query image and relevant images as the relevance information, are introduced in Section 5. Experimental results and discussions of the framework are given in Section 6, followed by a short conclusion.
2. Problem statement
The major goal in region-based relevance feedback for image retrieval is to search for perceptually similar images with good accuracy and short response time. For natural image retrieval, conventional region-based relevance feedback systems use multiple features (e.g., color, shape, texture, size) and a weight-updating scheme. In this context, our algorithm is motivated by the following viewpoints.
Computational cost increases as the number of selected features increases. However, an algorithm with a large number of features does not guarantee an improvement in retrieval performance. In theory, the retrieval performance can be enhanced by choosing more compact feature vectors.
CBIR systems retrieve similar images according to user-defined feature vectors. To improve the accuracy, region-based approaches [14, 15] segment each image into several regions, and then extract image features such as the dominant color, texture or shape. However, the correct detection of semantic objects is affected by many conditions, such as lighting, occlusion and inaccurate segmentation. Since no automatic segmentation algorithm currently achieves satisfactory performance, segmented regions are commonly provided by the user to support the image retrieval. However, semantically correct segmentation is a severe challenge for the user, even though some systems provide segmentation tools.
The CBIR technique helps the system to learn how to retrieve the results that users are looking for. Therefore, there is an urgent need to develop a convenient technique for region-of-interest analysis.
3. A modified dominant color descriptor
Color is one of the most widely used visual features for retrieving images from common semantic categories. MPEG-7 specifies several color descriptors, such as dominant colors, scalable color histogram, color structure, color layout and GoF/GoP color. The human visual system captures dominant colors in images and eliminates the fine details in small areas. In MPEG-7, DCD provides a compact color representation, and describes the color distribution in an image. The dominant color descriptor in MPEG-7 is defined as
In order to extract the dominant colors from an image, a color quantization algorithm has to be predetermined. A commonly used approach is the modified generalized Lloyd algorithm (GLA), which is a color quantization algorithm with cluster merging. This method can reduce the large number of colors to a small number of representative colors. However, the GLA has several intrinsic problems, as follows.
It may give different clustering results when the number of clusters is changed.
Correct initialization of the cluster centroids is crucial because some clusters may be empty if their initial centers lie far from the distribution of the data.
The criterion of the GLA depends on the cluster “distance”; therefore, different initial parameters of an image may cause different clustering results.
In general, conventional clustering algorithms are very time consuming [2, 21-24]. On the other hand, the quadratic-like measure [2, 17, 25] for the dominant color descriptor in MPEG-7 does not match human perception very well, and it could cause incorrect ranks for images with similar color distributions [3, 20, 26]. In this chapter, we adopt the linear block algorithm (LBA) to extract the representative colors, and measure the perceptual similarity of dominant colors by the modified similarity measure.
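Considering two dominant color features $F_1 = \{(c_i, p_i),\ i = 1, \dots, N_1\}$ and $F_2 = \{(b_j, q_j),\ j = 1, \dots, N_2\}$, the quadratic-like dissimilarity measure between two images $F_1$ and $F_2$ is calculated by:
$$D^2(F_1, F_2) = \sum_{i=1}^{N_1} p_i^2 + \sum_{j=1}^{N_2} q_j^2 - \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} 2\, a_{i,j}\, p_i\, q_j \qquad (2)$$
where $a_{i,j}$ is the similarity coefficient between color clusters $c_i$ and $b_j$, and it is given by
$$a_{i,j} = \begin{cases} 1 - d_{i,j} / d_{\max}, & d_{i,j} \le T_d \\ 0, & d_{i,j} > T_d \end{cases} \qquad (3)$$
The threshold $T_d$ is the maximum distance used to judge whether two color clusters are similar, and $d_{i,j} = \lVert c_i - b_j \rVert$ is the Euclidean distance between two color clusters $c_i$ and $b_j$; $d_{\max} = \alpha T_d$, where $\alpha$ is a parameter that is set to 2.0 in this work.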
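The quadratic-like measure described above can be sketched in a few lines. This is a minimal illustration that assumes the standard MPEG-7 form of the measure (squared percentages minus twice the coefficient-weighted cross terms, with a coefficient of 1 - d/d_max when d does not exceed the threshold and d_max equal to the parameter times the threshold). The function names, the list-of-pairs descriptor layout and the default values 25 and 2.0 (taken from the settings quoted in this section) are illustrative choices, not the authors' code.

```python
import math

def similarity_coefficient(c, b, t_d=25.0, alpha=2.0):
    """Coefficient between two colors: 1 - d/d_max if d <= t_d, else 0 (assumed MPEG-7 form)."""
    d = math.dist(c, b)       # Euclidean distance between the two color vectors
    d_max = alpha * t_d       # assumed relation between d_max and the threshold
    return 1.0 - d / d_max if d <= t_d else 0.0

def quadratic_dissimilarity(f1, f2, t_d=25.0, alpha=2.0):
    """Quadratic-like dissimilarity between descriptors given as [(color, percentage), ...]."""
    total = sum(p * p for _, p in f1) + sum(q * q for _, q in f2)
    for c, p in f1:
        for b, q in f2:
            total -= 2.0 * similarity_coefficient(c, b, t_d, alpha) * p * q
    return total
```

With this form, two identical descriptors whose colors are farther apart than the threshold give a value of 0, and two single-color descriptors with unrelated colors give the maximum value of 2.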
The quadratic-like distance measure in Eq. (2) may incorrectly reflect the distance between two images. The improper results are mainly caused by two factors. 1) If the number of dominant colors in the target image increases, the measure might produce incorrect results. 2) If a dominant color is found in both the target image and the query image, a high percentage of that color in the target image might produce improper results. In our earlier work, we proposed a modified distance measure that considers not only the similarity of dominant colors but also the difference of color percentages between images. The experimental results show that this measure provides a better match to human perception in judging image similarity than the MPEG-7 DCD. The modified similarity measure between two images is calculated by:
whereandare the percentages of the
In Fig. 1, we work through an example using both the modified measure and the quadratic-like measure for comparison. In order to properly reflect the similarity coefficient between two color clusters, the parameter is set to 2 and the threshold is set to 25 in Eq. (3). Since the pairwise distance between
However, using the quadratic-like dissimilarity measure between the Q and is:
It can be seen that the comparison result of is not consistent with human perception. Whereas, using the dissimilarity measure in , we have
In DCD, the quadratic-like measure results in incorrect matches when the target image contains a high percentage of the same color. For example, consider the quantized images in Fig. 2. We can see that the percentages of the dominant colors of the rose and gorilla images are 82.21% and 92.72%, respectively. In human perception, Q is more similar to. However, the quadratic-like similarity measure is. Obviously, the result causes a wrong rank. The robust similarity measure captures human perception more accurately than that of MPEG-7 DCD. In our experiments, the modified DCD achieves 16.7% and 3% average retrieval rate (ARR) improvements over Ma and Mojsilovic, respectively. In this chapter, the modified dominant color descriptor is chosen to support the proposed CBIR system.
4. Image segmentation and region representation
4.1. Image segmentation
It has been mentioned that segmentation is necessary for region-based image retrieval systems. Nevertheless, automatic segmentation is still impractical for region-based image retrieval (RBIR) systems [8, 30-32]. Although many systems provide segmentation tools, they usually need complicated user interaction to achieve image retrieval. Therefore, the process is inefficient and time consuming for the user. In the following, a new approach is proposed to overcome this problem. In our algorithm, the user does not need to provide precisely segmented regions; instead, a boundary checking algorithm is used to support the segmented regions.
For region-based image retrieval, we adopt the unsupervised texture segmentation method [30, 33]. Ojala et al. use the nonparametric log-likelihood-ratio test and the G statistic to compare the similarity of feature distributions. The method is efficient for finding homogeneously textured image regions. Based on this method, a boundary checking algorithm has been proposed to improve the segmentation accuracy and computational cost. For more details about our segmentation algorithm, we refer the reader to . In this chapter, the weighted distributions of global information CIH (color index histogram) and local information LBP (local binary pattern) are applied to measure the similarity of two adjacent regions.
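As an illustration of how two feature distributions can be compared with the G statistic, the sketch below implements the usual two-sample log-likelihood-ratio form on count histograms. It is a generic version under that assumption; the function name is hypothetical and the weighting of CIH and LBP used in this chapter is not reproduced.

```python
import math

def g_statistic(hist_a, hist_b):
    """Two-sample G statistic (log-likelihood ratio) between two count histograms."""
    def xlogx(x):
        # By convention 0 * log(0) is taken as 0.
        return x * math.log(x) if x > 0 else 0.0

    cells = sum(xlogx(f) for f in hist_a) + sum(xlogx(f) for f in hist_b)
    samples = xlogx(sum(hist_a)) + xlogx(sum(hist_b))
    bins = sum(xlogx(a + b) for a, b in zip(hist_a, hist_b))
    total = xlogx(sum(hist_a) + sum(hist_b))
    return 2.0 * (cells - samples - bins + total)
```

A value close to 0 means the two histograms are likely drawn from the same distribution, so two adjacent regions with a small G value are candidates for merging.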
An example is shown in Fig. 3. It can be seen that the boundary checking algorithm segments the test image correctly, and it costs only about 1/20 of the processing time of the method in . For color image segmentation, another example is shown in Fig. 4. In Fig. 4(c) and Fig. 4(c’), we can see that the boundary checking algorithm achieves robust segmentation for the test image “Akiyo” and another natural image.
4.2. Region representation
To achieve region-based image retrieval, we use two compact and intuitive visual features to describe a segmented region: the dominant color descriptor (DCD) and texture. For the first one, we use our modified dominant color descriptor in [19, 26]. The feature representation of a segmented region
where and are the
For the second one, the texture feature of a region is characterized by the weighted distribution of local binary pattern (LBP) [6, 25, 32]. The advantages of LBP include its invariance to illumination change and its low computational cost. The value of
where represents the frequency of LBP value at
In addition, we define a feature to represent the
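To make the texture feature concrete, the following sketch computes a normalized histogram of basic 8-neighbour LBP codes for a grayscale patch. It shows the plain LBP operator only; the weighting applied in this chapter and the exact neighbourhood used by the authors are not recoverable from the text, so the neighbour ordering and the 256-bin layout here are assumptions.

```python
def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes.

    `gray` is a 2-D list of intensities; border pixels are skipped.
    """
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(gray), len(gray[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbour at least as bright as the center sets its bit.
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    n = sum(hist)
    return [h / n for h in hist] if n else hist
```

Because each code depends only on the sign of the difference between a neighbour and the center, adding a constant to all intensities leaves the histogram unchanged, which is the illumination invariance mentioned above.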
4.3. Image representation and definition of the foreground assumption
For image retrieval, each image in database is described by a set of its non-overlapping regions. For an image
The main goal of the foreground assumption is to simply distinguish main objects from irrelevant regions in images. Assume that we can divide an image into two parts: foreground and background. In general, the foreground corresponds to the central region of an image. To emphasize the importance of the central region of an image, we define
where and are the occupied regions of foreground and background, respectively; and is height and width of the image.
In the region-based retrieval procedure, segmented regions are required. They can be provided by the users or generated by the system automatically. However, the criterion for the similarity measure is based on the overall distances between feature vectors. If an image in the database has background regions that are similar to the foreground object of the query image, this image will be considered a similar image based on the similarity measure. In this case, the accuracy of the region-based retrieval system decreases. Therefore, we modify our region representation by adding a Boolean model to determine whether the segmented region belongs to the background of the query image or not.
Note that the variable is designed to reduce the segmentation error.
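The foreground assumption and the Boolean model can be illustrated as follows. The exact definition of the central region and the decision rule did not survive in the text, so this sketch assumes a central window that leaves a fixed fractional margin on every side and a majority rule for labelling a region. The `margin` parameter, the function names and the coding 1 = foreground, 0 = background are all hypothetical.

```python
def in_foreground(y, x, height, width, margin=0.25):
    """True if pixel (y, x) lies in the central window (margin is a hypothetical fraction)."""
    return (margin * height <= y < (1 - margin) * height
            and margin * width <= x < (1 - margin) * width)

def boolean_value(region_pixels, height, width, margin=0.25):
    """Return 1 if most pixels of the region fall in the central window, else 0 (assumed rule)."""
    inside = sum(in_foreground(y, x, height, width, margin) for y, x in region_pixels)
    return 1 if 2 * inside > len(region_pixels) else 0
```

A region labelled 0 under this rule would then be left out of the region matching, which is the role the text assigns to the Boolean model.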
On the other hand, we extract global features from an image to compensate for the inaccuracy of segmentation algorithms. The features include three feature sets: 1) dominant color for each region, 2) texture for each region, and 3) dominant color.
5. Integrated region-based relevance feedback framework
In region-based image retrieval, an image is considered relevant if it contains some regions with satisfactory similarity to the query image. The retrieval system can reconstruct a new query that includes only the relevant regions according to the user’s feedback. In this way, the system can capture the user’s query concept automatically. For example, Jing et al. suggest that information in every region could be helpful in retrieval, and iteratively group all regions of positive examples by the K-means algorithm to ensure that the distance between all the clusters does not exceed a predefined threshold. Then, all regions within a cluster are merged into a new region. However, the computational cost of merging new regions is proportional to the number of positive examples. Moreover, users might be more interested in some specified regions or main objects rather than the positive examples.
To speed up the system, we introduce a similarity matrix model to infer the region-of-interest sets. Inspired by the query-point movement method [8, 31], the proposed system performs similarity comparisons by analyzing the salient regions in the pseudo query image and the relevant images based on the user’s feedback information.
5.1. The formation of region-of-interest set
5.1.1. Region-based similarity measure
In order to perform region-of-interest (ROI) queries, the relevant regions are obtained by measuring the region-based color similarity and the region-based texture similarity in Eq. (14) and (15), respectively. This similarity measure allows users to select their relevant regions accurately. Note that the conventional color histogram cannot be applied to DCD directly because images do not have the same number of dominant colors. The region-based color similarity between two segmented regions is calculated by
where and represent the number of pixels in regions
Theoretically, visual similarity is achieved when both color and texture are similar. For example, two regions should be considered non-similar if they are similar in terms of color but not texture. This can be achieved by imposing
5.1.2. Similarity matrix model
In the following, we introduce a region-based similarity matrix model. The regions of positive examples, which help the system to find the intention of the user’s query, make it possible to exclude the irrelevant regions flexibly. The proposed similarity matrix model is described as follows.
The region similarity measure is performed for all regions. The relevant image set is denoted as, where
As an example, let the relevant image set contain three relevant images. Our similarity matrix model for inferring the user’s query concept is shown in Fig. 7, where the symbol “1” means that two regions are regarded as similar, and the symbol “0” means that two regions are not similar in content.
To support ROI queries, we use the one-to-many relationships to find a collection of similar region sets (see Fig. 8). After this step, several region-of-interest sets can be obtained by merging all similar region sets. For example, the first set contains three similar regions. Each region will be merged together with the above eight similar region sets. In this example, three region-of-interest sets are obtained by the merging operation. Since the user may be interested in repeated similar regions, a set containing a single region is assumed to be irrelevant in our approach. Therefore, two sets remain, as shown in Fig. 8. The two sets are considered as regions of interest that reflect the user’s query perception.
If users are interested in many regions, the simple merging process can be used to capture the query concept. In Fig. 8, for example, and are the regions belong to the same relevant image and, respectively. It can be seen that the similarity matrix approach is consistent with human perception and is efficient for region-based comparison.
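The merging of similar region sets into region-of-interest sets can be sketched as a grouping over the binary similarity matrix. The version below is one possible reading of the merging step, not the authors' implementation: it assumes that similarity is merged transitively (regions linked through a chain of "1" entries end up in the same set) and uses a small union-find structure; the function name and the flat indexing of regions across all relevant images are illustrative.

```python
def roi_sets(similar):
    """Group region indices into ROI sets from a symmetric 0/1 similarity matrix."""
    n = len(similar)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similar[i][j]:
                parent[find(i)] = find(j)   # merge the two similar region sets

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Sets containing a single region are treated as irrelevant and dropped.
    return sorted(g for g in groups.values() if len(g) > 1)
```

For six regions where regions 0, 1 and 2 are linked, regions 3 and 4 are linked, and region 5 matches nothing, the function returns two region-of-interest sets and discards the isolated region.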
5.1.3. Salient region model
To improve retrieval performance, all the region-of-interest sets from the relevant image set are integrated for the next step of relevance feedback. As described in the previous subsection, each region-of-interest set can be regarded as a collection of regions, and the extracted information can be used to identify the user’s query concept. However, correctly capturing the semantic concept from the similar regions is still a difficult task. At this stage, we define a salient region as the combination of all similar regions within each ROI set. The features of the new region are equal to the weighted average of the features of the individual regions.
In order to emphasize the percentage of area feature, we modified the dominant color descriptor in Eq. (1). The feature representation of the salient region
where is the
All similar regions in ROI can be determined from the eight uniformly divided partitions in RGB color space as shown in Fig. 9.
where is the number of dominant colors in cluster ; , and represent the dominant color components of R, G and B located within partition
5.2. The pseudo query image and region weighting scheme
To capture the inherent subjectivity of user perception, we define a pseudo image as the set of salient regions, . The feature representation of can be written as
During retrieval, the user chooses the best matched regions that he/she is looking for. However, the retrieval system cannot precisely capture the user’s query intention at the first or second step of relevance feedback. As the number of returned positive images increases, query vectors are constructed that give better results. Taking the average of all the feedback information could introduce redundancy, i.e., information from irrelevant regions. Motivated by this observation, we suggest that each similar region in an ROI should be properly weighted according to the number of similar regions. For example, the in Fig. 8 is more important than in . The weights associated with the significance of SR in can be dynamically updated as
where represents the number of similar regions in region-of-interest set , and is the number of region-of-interest sets.
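A weighting of this kind can be sketched as follows. The update formula itself is missing from the text, so this example assumes the simplest normalization consistent with the description: the weight of a salient region is the number of similar regions in its region-of-interest set divided by the total over all sets. The function name and this normalization are assumptions.

```python
def region_weights(roi_sets):
    """Weight each ROI set by its share of all similar regions (assumed normalization)."""
    counts = [len(s) for s in roi_sets]   # number of similar regions per ROI set
    total = sum(counts)
    return [c / total for c in counts] if total else []
```

Under this assumption, an ROI set with three similar regions receives a weight of 0.6 when the only other set has two, so salient regions supported by more feedback regions contribute more to the next query.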
5.3. Region-based relevance feedback
In reality, inaccurate segmentation leads to poor matching results. However, it is difficult to ask users for precisely segmented regions. Based on the foreground assumption, we define three feature vectors, which are extracted from the entire image (i.e., global dominant color), the foreground and the background, respectively. The advantage of this approach is that it provides an estimate that minimizes the influence of inaccurate segmentation. To integrate the two regional approaches, we summarize our relevance feedback as follows.
For the initial query, the similarity between the initial query image and each target image in the database is computed by using Eq. (4), so that a coarse relevant-image set can be obtained. Then, all regions in the initial query image and in the positive images indicated by the user’s feedback are merged into the relevant image set. The proposed region-based similarity matrix model applies Eq. (14) and (15) to find the collection of similar regions. The similar regions can be determined by Eq. (16), and then be merged into salient region
It should be noted that and defined above both contain the relevance information that reflects human semantics. The similarity measure for pseudo query image and target image is calculated by
where is the number of salient region sets in ; is the number of color/texture segmented regions in target image ; is the weight of salient region . In Eq. (21), the image-to-image similarity matching maximizes the value of region based color similarity by using Eq. (14). If the Boolean model for a partitioned region in target image, then the background of the image will be excluded for matching in Eq. (21).
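The matching between the pseudo query image and a target image can be sketched as a weighted best-match sum. Eq. (21) is not recoverable from the text, so the code only follows the verbal description: each salient region is matched to the most similar target region, background regions are excluded through the Boolean value, and the best matches are combined with the salient-region weights. The function name, the `(feature, bv)` pair layout, the coding of 1 for foreground and the pluggable `region_similarity` callable are assumptions.

```python
def pseudo_query_similarity(salient, weights, target_regions, region_similarity):
    """Weighted sum, over salient regions, of the best match among foreground target regions.

    target_regions: list of (feature, bv) pairs; bv == 1 marks foreground (assumed coding).
    region_similarity: callable returning a similarity score for two region features.
    """
    # Regions flagged as background are excluded from matching.
    candidates = [feat for feat, bv in target_regions if bv == 1]
    score = 0.0
    for sr, w in zip(salient, weights):
        best = max((region_similarity(sr, feat) for feat in candidates), default=0.0)
        score += w * best
    return score
```

Any region-level similarity can be plugged in for `region_similarity`; a target image whose only similar region lies in the background then receives a low score instead of being ranked as a close match.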
On the other hand, is a collection of relevant images based on the user’s feedback information. Since poor matches arise from inaccurate image segmentations, three global features , and in Eq. (13) are extracted to compensate the inaccuracy. The similarity between the relevant image set and target image in database is calculated by
where, and are dominant colors, foreground and background for the
It is worth mentioning that our region-based relevance feedback approach defined above is able to reflect human semantics. In other words, the user might recognize some relevant images from the initial query, and then provide some positive images.
To capture the user’s perception more precisely, the system determines the retrieval rank according to the average of the region-based image similarity measure in Eq. (21) and the foreground-based similarity measure in Eq. (23).
6. Experimental results
We use a general-purpose image database from Corel’s photo collection (31 categories, about 3991 images) to evaluate the performance of the proposed framework. The database has a variety of images including animals, plants, vehicles, architecture, scenes, etc. It has the advantages of large size and wide coverage. Table 1 lists the labels for the 31 classes. The effectiveness of our proposed region-based relevance feedback approach is evaluated.
In order to make a comparison on the retrieval performance, both average retrieval rate (ARR) and average normalized modified retrieval rank (ANMRR)  are applied. An ideal performance will consist of ARR values equal to 1 for all values of recall. A high ARR value represents a good performance for retrieval rate, and a low ANMRR value indicates a good performance for retrieval rank. The brief definitions are given as follows. For a query q, the ARR and ANMRR are defined as:
where is total number of queries; is the number of the ground truth images for a query. The notation
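The two evaluation measures can be sketched per query as follows. The defining equations are missing here, so the code assumes the commonly cited MPEG-7 forms: the retrieval rate is the fraction of ground-truth images found among the first alpha * NG results, and NMRR uses the cutoff K = min(4 * NG, 2 * GTM), a penalty rank of 1.25 * K for missed items, and normalization by 1.25 * K - 0.5 * (1 + NG). ARR and ANMRR are then the averages of these values over all queries. The function names and the default alpha = 1 are illustrative.

```python
def retrieval_rate(ranked, ground_truth, alpha=1.0):
    """Fraction of ground-truth items found in the first alpha * NG results."""
    ng = len(ground_truth)
    top = ranked[: int(alpha * ng)]
    return sum(1 for item in top if item in ground_truth) / ng

def nmrr(ranked, ground_truth, gtm):
    """Normalized modified retrieval rank of one query (0 is best, 1 is worst).

    gtm is the largest ground-truth set size over all queries.
    """
    ng = len(ground_truth)
    k = min(4 * ng, 2 * gtm)                      # rank cutoff
    position = {item: r for r, item in enumerate(ranked, start=1)}
    ranks = []
    for item in ground_truth:
        r = position.get(item)
        # Items missing from the first k results receive the penalty rank 1.25 * k.
        ranks.append(r if r is not None and r <= k else 1.25 * k)
    avr = sum(ranks) / ng                         # average rank
    mrr = avr - 0.5 - ng / 2                      # modified retrieval rank
    return mrr / (1.25 * k - 0.5 * (1 + ng))
```

A query whose ground-truth images occupy the first NG positions gives a retrieval rate of 1 and an NMRR of 0, while a query that retrieves none of them within the cutoff gives an NMRR of 1.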
To test the performance of our integrated approach for region-based relevance feedback, we first query with an image of a gorilla sitting on grass, as shown in Fig. 10(a).
As mentioned in Section 5.3, the dominant color of the query image and target image is used for the similarity measure in the initial query. The retrieval results are shown in Fig. 10(b); the top 20 matching images are arranged from left to right and top to bottom in order of decreasing similarity score.
For a better understanding of the retrieval results, the DCD vectors of the query image, the 6th-ranked image and the 8th-ranked image are listed in Fig. 11. It can be seen that the query image and the image “lemon” are very similar in the first dominant color (marked by a box). If we use the global DCD as the only feature for image retrieval, the system returns only eleven correct matches. Therefore, further investigation on extracting comprehensive image features is needed.
Assume that the user has selected the five best matched images, marked by red boxes, as shown in Fig. 10(a). In the conventional region-based relevance feedback approach, all regions in the initial query image and the five positive images are merged into the relevant image set. The proposed similarity matrix model is able to find the region-of-interest sets. For the next query, the pseudo query image can be regarded as a new query image composed of salient regions. The retrieval results based on the new query image are shown in Fig. 12 and are discussed as follows.
The pseudo query image is capable of reflecting the user's query perception. Without considering the Boolean model in Eq. (21), the similarity measure by Eq. (21) returns 16 correct matches as shown in Fig. 12.
Using the pseudo image as query image, the initial query image is not ranked first but fifth, as shown in Fig. 12.
The retrieval results include three dissimilar images (marked by red rectangular boxes), whose ranks are 7th, 8th and 12th, respectively.
To analyze the improper result, the dominant color vectors and percentage of area of “cucumber” and “lemon” are listed. See Fig. 13. We can see that each of the images “gorilla”, “cucumber” and “lemon” contains three segmented regions. For each region, the number of the dominant colors, percentage of area and BV value are listed and colored red. For similarity matching, the dominant colors (i.e. region#1, region#2 and region#3) of initial image “gorilla” are similar to the dominant color (marked by red rectangle box) of the image “cucumber”. In addition, the percentages of area (0.393911, 0.316813, 0.289276) of initial image “gorilla” are similar to the percentage of area (region#2, 0.264008) of the image “cucumber”. The other similarity comparisons between “gorilla” and “cucumber” image are not presented here because the maximum similarity between two regions in Eq. (14) is very small. In brief, without considering the exclusion of irrelevant regions, the region-based image-to-image similarity model in Eq. (21) could cause improper ranks in visualization.
The retrieval performance can be improved by automatically determining the user’s query perception. In the following, we would like to evaluate the advantages of our proposed relevance feedback approach. For the second query, the integrated region-based relevance feedback contains not only the salient-region information, but also the “specified-region” information based on relevant images set . The retrieval results based on our integrated region-based relevance feedback are shown in Fig. 14. Observations and discussions are described as follows.
The system returns 18 correct matches as shown in Fig. 14.
In Fig. 13, region#1 and region#3 in query image are two grass-like regions, which are labeled as inner region, i.e., . On the other hand, the region#2 in image “cucumber” is a green region that is similar to the grass-like regions in query image. In our method, this problem can be solved by examining the BV value in Eq. (21). As we can see, none of the three incorrect images including “cucumber”, “lemon” and “carrot” in Fig. 12 appears in the top 20 images in Fig. 14.
In contrast, it is possible that the grass-like regions are part of the user’s interest. In this case, the three feature vectors extracted from the entire image, foreground and background can be used to compensate for the loss of generality. The retrieval results in Fig. 14 indicate that high performance is achieved by using these features.
Our proposed relevance feedback approach can capture the query concept effectively. In Fig. 14, it can be seen that most of the retrieval results are considered to be highly correlated. In this example, 90% of top 20 images are correct images. In general, the features in all retrieval results look similar to gorilla or grass. The results reveal that the proposed method improves the performance of the region-based image retrieval.
In Figs. 15-17, further examples are tested to evaluate the performance of the integrated region-based relevance feedback for natural images. In Fig. 15, the query image shows a red car on a country road beside grasslands. If the user is only interested in the red car, four positive images marked by red boxes will be selected as shown in Fig. 15(b). In this case, the retrieval results (RR=0.25, NMRR=0.7841) are far from satisfactory for the initial query.
After the submission of the pseudo query image and the relevant image set based on the user’s feedback information, the first feedback retrieval returns 10 images containing “red car” as shown in Fig. 16. For this example, the first feedback retrieval achieves an ARR improvement of 28.6%. More precise results can be achieved by increasing the number of region-of-interest sets and enlarging the relevant image set for the second feedback retrieval, as shown in Fig. 17. The second feedback retrieval returns 11 images containing “red car”, and achieves an NMRR improvement of 35% compared to the initial query. Furthermore, the rank order in Fig. 17 is more reasonable than that in Fig. 16.
To show the effectiveness of our proposed region-based relevance feedback approach, the quantitative results for individual classes and the average performance (ARR, ANMRR) are listed in Tables 2 and 3, which compare the performance for each query. It can be seen that the retrieval precision and rank are relatively poor for the initial query. As the user adds positive examples, the feedback information has more potential for finding the user’s query concept by means of the optimal pseudo query image and relevant image set as described in Section 5.3. In summary, the first feedback query yields a 30.8% ARR gain and a 28% ANMRR gain, and the second feedback query yields a further 10.6% ARR gain and 11% ANMRR gain compared with the first feedback query. Although the improvement in retrieval efficiency decreases progressively after two or three feedback queries, the proposed technique is able to provide satisfactory retrieval results within those few feedback queries.
Existing region-based relevance feedback approaches work well in some specific applications; however, their performance depends on the accuracy of the segmentation technique. To solve this problem, we have introduced a novel region-based relevance feedback method for image retrieval with the modified dominant color descriptor. The term “specified area”, which combines main objects and irrelevant regions in an image, has been defined to compensate for the inaccuracy of the segmentation algorithm. In order to form the optimal query, we have proposed the similarity matrix model to construct the salient region sets. Our integrated region-based relevance feedback approach contains relevance information including the pseudo query image and the relevant image set, which are capable of reflecting the user's query perception. Experimental results indicate that the proposed technique achieves precise results on a general-purpose image database.
This work was supported by the National Science Council of the Republic of China under Grant NSC 97-2221-E-214-053-.
: Boolean model, which is used to determine whether the segmented region belongs to the background or foreground.
: dominant color descriptor
: similarity measure (dominant color descriptor)
: the ith non-overlapping region in I
: dominant color descriptor (DCD) of a segmented region R
: the value of kth bin in LBP histogram
: region-based color similarity
: the maximum similarity between two regions in similar color percentage
: similarity of the area percentage
: region based texture similarity
: defined background based on foreground assumption
: defined foreground based on foreground assumption
: the percentage of area for region R in the image
: relevant image set
: texture feature of region R
: similarity coefficient between two color clusters (dominant color descriptor)
: dominant color vector (dominant color descriptor)
: Euclidean distance between two color clusters (dominant color descriptor)
: percentage of each dominant color (dominant color descriptor)