Standard deviations of the recognition rates
Ear biometrics has received comparatively little attention next to the more popular techniques of face, eye, or fingerprint recognition. Nevertheless, the ear as a biometric is no longer in its infancy and has shown encouraging progress so far. Ears have played an important role in forensic science for many years, especially in the United States, where an ear classification system based on manual measurements was developed by Iannarelli (Iannarelli, 1989). In recent years, biometric recognition technology has been widely investigated and developed. The human ear, as a new biometric, not only extends existing biometrics but also has characteristics of its own that distinguish it from the others. Iannarelli has shown that the human ear is one of the representative human biometrics with uniqueness and stability (Iannarelli, 1989). Since the ear was first measured as a major feature for human identification by Alphonse Bertillon in 1890, so-called ear prints have long been used in forensic science (Bertillon, 1890). Ears have certain advantages over the more established biometrics; as Bertillon pointed out, they have a rich and stable structure that is not affected by changes in age, skin color, cosmetics, or hairstyle. The ear is also unaffected by changes in facial expression, and it is firmly fixed in the middle of the side of the head, so the background is more predictable than in face recognition, which usually requires the face to be captured against a controlled background. The ear is large compared with the iris, retina, and fingerprint and is therefore more easily captured at a distance.
We present a Gabor-based region covariance matrix as an efficient feature for ear recognition. In this method, we construct a region covariance matrix from Gabor features, the illumination intensity component, and pixel location, and use it as an efficient and robust ear descriptor for recognizing people. The feasibility of the proposed method has been tested on ear recognition using two USTB databases, with a total of 488 ear images corresponding to 137 persons. The effectiveness of the proposed method is shown through its comparative performance against several popular ear recognition methods.
This chapter is organized as follows. Section 2 presents related works. Section 3 presents the region covariance matrix (RCM) and the method for fast RCM computation. Section 4 presents the proposed method in detail. Section 5 introduces the ear image databases. Section 6 reports and discusses the experimental results. Section 7 concludes the chapter.
2. Related works
Ear recognition depends heavily on the particular choice of features used in the ear biometric system. Principal Component Analysis (PCA) is a classical statistical feature extraction method. The PCA transformation (Xu, 1994; Abdi & Williams, 2010) is based on second-order statistics and is commonly used in biometric systems. Second-order methods use the information contained in the covariance matrix of the data to find a description with minimum reconstruction error. They assume that all the information of zero-mean Gaussian variables is contained in the covariance matrix. Independent Component Analysis (ICA) is another popular feature extraction method. ICA (Comon, 1994; Stone, 2005) provides a linear representation that minimizes the statistical dependencies among its components and is based on higher-order statistics of the data. It is a statistical method for transforming an observed multidimensional random vector into components that are as statistically independent from each other as possible; the dependencies among higher-order features are eliminated by isolating independent components. The ability of ICA to handle higher-order statistics in addition to second-order statistics is useful in achieving an effective separation of the feature space for given data, and higher-order features are capable of capturing invariant features of natural images. In (Zhang & Mu, 2008), PCA and ICA are used to extract features and a radial basis function network (RBFN) is used as the classifier. In this chapter, these two methods are denoted by PCA+RBFN and ICA+RBFN, respectively.
Hmax+SVM is another popular method for ear recognition (Yaqubi et al., 2008). It combines the Hmax model, which is motivated by a quantitative model of the visual cortex (Riesenhuber & Poggio, 1999), with support vector machines (SVMs), classifiers that have demonstrated high generalization capabilities in many different tasks, including object recognition. Hmax introduces a new set of features for human identification; each element of this set is a complex feature obtained by combining position- and scale-tolerant edge detectors over neighboring positions and multiple orientations.
Another feature extraction method for ear recognition, called Local Similarity Binary Pattern (LSBP), is presented by (Guo & Xu, 2008). LSBP considers both connectivity and similarity information in its representation: the LSBP histogram captures structures such as lines and connected areas. To make the representation more effective, the histograms encode not only local information but also spatial information, obtained by image decomposition. Because of the special characteristics of ear images, the connectivity and similarity of intensity play a significant role in ear recognition, and both can be encoded by LSBP.
3.1. Covariance matrix as a region descriptor
The covariance matrix is a symmetric matrix. Its diagonal entries represent the variance of each feature and its off-diagonal entries represent the correlations between features. Using covariance matrices as region descriptors has many advantages. The covariance matrix offers a natural way of fusing multiple features without normalizing them or using blending weights. It embodies the information embedded within the histograms as well as the information that can be derived from the appearance models. In general, a single covariance matrix extracted from a region is enough to match that region in different views and poses. The noise corrupting individual samples is largely filtered out by the averaging performed during covariance computation. Because the covariance matrix of any region has the same size, any two regions can be compared without being restricted to a constant window size. If the raw features, such as image gradients and orientations, are extracted according to the scale difference, the covariance matrix is also scale invariant over regions in different images.
The covariance matrix can also be invariant to rotations. However, if information regarding the orientation of the points is embedded within the feature vector, it is possible to detect rotational discrepancies. The covariance is also invariant to mean changes, such as an identical shift of all color values. This is an advantageous property when objects are tracked under varying illumination conditions. The region covariance matrix (RCM) was presented by (Tuzel et al., 2006). An RCM is a covariance matrix of many image statistics computed within a region.
Let $I$ be a one-dimensional, unit-normalized intensity image. The method can be generalized to other types of images, such as a 2D intensity image, a 3D color image, or a multispectral image. Let $F$ be the $W \times H \times d$ dimensional feature image extracted from $I$,
\[ F(x, y) = \phi(I, x, y) \qquad (1) \]
where the function $\phi$ can be any mapping, such as color, image gradients, edge magnitude, edge orientation, or filter responses. This pixel-wise mapping list can be extended by including higher-order derivatives, radial distances, texture scores, angles, and, when video data is available, temporal frame differences.
For a given rectangular window $R \subset F$, let $\{z_k\}_{k=1,\dots,n}$ be the $d$-dimensional feature vectors inside $R$.
Each feature vector $z_k$ corresponds to a pixel $(x, y)$ within that window. Since we extract the mutual covariance of the features, the windows can actually have any shape, not only rectangles. Covariance is a statistical measure of how much two variables vary together. It can be negative, positive, or zero, depending on the relation between the two features (Forsyth & Ponce, 2002). If the features increase together, the covariance is positive. If one feature increases while the other decreases, the covariance is negative, and if the two features are independent, the covariance is zero. We represent each window $R$ with the $d \times d$ covariance matrix of its features,
\[ C_R = \frac{1}{n-1} \sum_{k=1}^{n} (z_k - \mu)(z_k - \mu)^T \qquad (2) \]
where $\mu$ is the mean vector of the corresponding features for the points within the region. The diagonal coefficients represent the variances of the corresponding features; for example, the $j$-th diagonal element is the variance of the $j$-th feature. The off-diagonal elements represent the covariances between two different features.
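The descriptor described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the `(H, W, d)` array layout, the function name `region_covariance`, and the toy feature image (pixel coordinates plus a random intensity) are assumptions made here, and the $1/(n-1)$ normalization follows the usual sample covariance.

```python
import numpy as np

def region_covariance(features, x0, y0, x1, y1):
    """Covariance of the d-dimensional feature vectors inside a window.

    features: array of shape (H, W, d) (assumed layout).
    The window covers rows y0..y1-1 and columns x0..x1-1.
    """
    d = features.shape[2]
    z = features[y0:y1, x0:x1].reshape(-1, d)   # n feature vectors, one per pixel
    mu = z.mean(axis=0)                         # mean vector of the region
    centered = z - mu
    return centered.T @ centered / (z.shape[0] - 1)

# Hypothetical toy feature image: x, y and a synthetic intensity channel.
H, W = 32, 24
ys, xs = np.mgrid[0:H, 0:W].astype(float)
intensity = np.random.default_rng(0).random((H, W))
F = np.stack([xs, ys, intensity], axis=-1)

C = region_covariance(F, 4, 8, 20, 28)          # 3 x 3 symmetric matrix
```

Whatever the window size, the result is always a $d \times d$ matrix, which is why regions of different sizes can be compared directly.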
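The feature vectors can be constructed with different types of mapping functions, using attributes such as pixel coordinates, color intensity, and gradients,
\[ z_k = [\, x \;\; y \;\; I(x, y) \;\; I_x(x, y) \;\; \dots \,]^T \]
or they can be constructed using polar coordinates,
\[ z_k = [\, r(x', y') \;\; \theta(x', y') \;\; I(x, y) \;\; I_x(x, y) \;\; \dots \,]^T \]
where
\[ (x', y') = (x - x_0,\; y - y_0) \]
are the relative coordinates with respect to the window center $(x_0, y_0)$,
\[ r(x', y') = \sqrt{x'^2 + y'^2} \]
is the distance from $(x_0, y_0)$, and
\[ \theta(x', y') = \arctan\left(\frac{y'}{x'}\right) \]
is the orientation component. For the human detection problem, (Tuzel et al., 2007) introduced the mapping function
\[ \phi(I, x, y) = \left[\, x \;\; y \;\; |I_x| \;\; |I_y| \;\; \sqrt{I_x^2 + I_y^2} \;\; |I_{xx}| \;\; |I_{yy}| \;\; \arctan\frac{|I_x|}{|I_y|} \,\right]^T \]
where $|\cdot|$ denotes the absolute value operator. First- and second-order gradients and pixel location were used in this function to construct the RCM. Another form of the feature mapping function, for gray-level images, was introduced by (Tuzel et al., 2006).
Figure 1 shows a sample covariance matrix for a given image.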
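To make the idea of a pixel-wise mapping function concrete, the following sketch builds a feature image from pixel coordinates, intensity, and the absolute first- and second-order derivatives, and also computes the polar coordinates relative to the window center. The particular feature list, the use of `np.gradient` as the derivative filter, and the function names are assumptions chosen for illustration; `arctan2` is used in place of a plain arctangent to avoid division by zero at the center.

```python
import numpy as np

def gradient_feature_image(image):
    """Map each pixel to [x, y, I, |Ix|, |Iy|, |Ixx|, |Iyy|] (illustrative choice)."""
    image = np.asarray(image, dtype=float)
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    Ix = np.gradient(image, axis=1)      # first derivative along x (columns)
    Iy = np.gradient(image, axis=0)      # first derivative along y (rows)
    Ixx = np.gradient(Ix, axis=1)        # second derivative along x
    Iyy = np.gradient(Iy, axis=0)        # second derivative along y
    return np.stack(
        [xs, ys, image, np.abs(Ix), np.abs(Iy), np.abs(Ixx), np.abs(Iyy)],
        axis=-1,
    )

def polar_coordinates(H, W):
    """Distance and orientation of every pixel relative to the window center."""
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    xr = xs - (W - 1) / 2.0              # relative x coordinate
    yr = ys - (H - 1) / 2.0              # relative y coordinate
    return np.hypot(xr, yr), np.arctan2(yr, xr)
```

Either set of spatial attributes (Cartesian or polar) can be stacked with the appearance attributes before the covariance is computed.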
Despite the advantages of the RCM, computing the covariance matrices of all rectangular regions within an image is computationally prohibitive with conventional methods. Several applications, such as detection, segmentation, and recognition, require the computation and comparison of covariance matrices of regions. Conventional methods, however, disregard the fact that these regions overlap heavily and that the statistical moments extracted for the overlapping areas can be reused to speed up the computation.
3.2. Fast covariance computation using integral images
Instead of repeating the summation for each possible window, we can calculate the sums of the values within rectangular windows in linear time, as described by (Veksler, 2003; Porikli, 2005): after a single pass over the image, each rectangular window needs only a constant number of operations, no matter how many rectangles are queried. First, we define the cumulative image function. Each element of this function is equal to the sum of all values to the left of and above the pixel, including the value of the pixel itself. The cumulative image can be calculated for every pixel with four arithmetic operations per pixel. The sum of the image function over a rectangle can then be computed with another four arithmetic operations, with some modifications at the border. Therefore, after a linear amount of computation, the sum of the image function over any rectangle can be calculated in constant time.
Integral images are intermediate image representations used for the fast calculation of region sums (Viola & Jones, 2001). Porikli (Porikli, 2005) later extended this idea to the fast calculation of region covariances, showing that the covariances can be obtained with a few arithmetic operations from a series of integral images.
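The cumulative image and the rectangle sum can be sketched as follows. The function names and the extra zero row and column (a common way of handling the border case) are assumptions of this sketch rather than details taken from the cited papers.

```python
import numpy as np

def integral_image(f):
    """Cumulative image with a leading zero row and column for border handling."""
    ii = np.zeros((f.shape[0] + 1, f.shape[1] + 1))
    ii[1:, 1:] = f.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of f over rows y0..y1-1 and columns x0..x1-1 using four lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

f = np.arange(30, dtype=float).reshape(5, 6)   # small example image
ii = integral_image(f)
window_sum = rect_sum(ii, 2, 1, 5, 4)          # same value as f[1:4, 2:5].sum()
```

Building `ii` costs one pass over the image; every later call to `rect_sum` costs the same four lookups regardless of the window size.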
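The $(i, j)$-th element of the covariance matrix defined in (2) can be written as
\[ C_R(i, j) = \frac{1}{n-1} \sum_{k=1}^{n} \big(z_k(i) - \mu(i)\big)\big(z_k(j) - \mu(j)\big). \]
Expanding the mean and rearranging the terms gives
\[ C_R(i, j) = \frac{1}{n-1} \left[ \sum_{k=1}^{n} z_k(i)\, z_k(j) - \frac{1}{n} \sum_{k=1}^{n} z_k(i) \sum_{k=1}^{n} z_k(j) \right]. \]
To compute the covariance of a rectangular region $R$, we therefore need the sum of each feature dimension as well as the sum of the product of any two feature dimensions. These sums can be computed with a few arithmetic operations from a series of integral images.
We construct one integral image for each feature dimension $z(i)$ and one for each product of two feature dimensions $z(i)z(j)$, which gives $d + d^2$ integral images in total. Define $P$ as the $W \times H \times d$ tensor of the integral images along each feature dimension,
\[ P(x', y', i) = \sum_{x \le x',\; y \le y'} F(x, y, i), \qquad i = 1, \dots, d, \]
and define $Q$ as the $W \times H \times d \times d$ tensor of the second-order integral images,
\[ Q(x', y', i, j) = \sum_{x \le x',\; y \le y'} F(x, y, i)\, F(x, y, j), \qquad i, j = 1, \dots, d. \]
Let $p_{x,y} = [P(x, y, 1) \dots P(x, y, d)]^T$ denote the $d$-dimensional vector and $Q_{x,y}$ the $d \times d$ matrix with entries $Q(x, y, i, j)$ at pixel $(x, y)$. For the rectangular regions shown in Figure 2, the covariance of the region bounded by $(1, 1)$ and $(x', y')$ is
\[ C_{R(1,1;\,x',y')} = \frac{1}{n-1} \left[ Q_{x',y'} - \frac{1}{n}\, p_{x',y'}\, p_{x',y'}^T \right] \]
where $n = x' \cdot y'$. In the same way, the covariance of the region $R(x', y'; x'', y'')$, with upper-left corner $(x', y')$ and lower-right corner $(x'', y'')$, is
\[ C_{R(x',y';\,x'',y'')} = \frac{1}{n-1} \left[ Q_{x'',y''} + Q_{x',y'} - Q_{x'',y'} - Q_{x',y''} - \frac{1}{n} \big(p_{x'',y''} + p_{x',y'} - p_{x',y''} - p_{x'',y'}\big)\big(p_{x'',y''} + p_{x',y'} - p_{x',y''} - p_{x'',y'}\big)^T \right] \]
where $n = (x'' - x')(y'' - y')$. Therefore, by using the integral images, the covariance of each rectangular region can be computed in $O(d^2)$ time. In our method, we use this integral-image-based computation as a fast approach for computing the RCM of the given features.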
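A minimal NumPy sketch of the integral-image approach to region covariance is given below. It assumes a feature image stored as an `(H, W, d)` array and pads the first- and second-order integral tensors with a zero row and column, so that the window covers rows `y0..y1-1` and columns `x0..x1-1`; these conventions and the function names are illustrative assumptions.

```python
import numpy as np

def integral_tensors(F):
    """First-order (P) and second-order (Q) integral images of a feature image."""
    H, W, d = F.shape
    P = np.zeros((H + 1, W + 1, d))
    Q = np.zeros((H + 1, W + 1, d, d))
    P[1:, 1:] = F.cumsum(axis=0).cumsum(axis=1)
    outer = F[..., :, None] * F[..., None, :]        # per-pixel products z(i) z(j)
    Q[1:, 1:] = outer.cumsum(axis=0).cumsum(axis=1)
    return P, Q

def fast_region_covariance(P, Q, x0, y0, x1, y1):
    """Covariance of the window rows y0..y1-1, cols x0..x1-1 from corner lookups."""
    n = (x1 - x0) * (y1 - y0)
    p = P[y1, x1] + P[y0, x0] - P[y0, x1] - P[y1, x0]   # sum of feature vectors
    q = Q[y1, x1] + Q[y0, x0] - Q[y0, x1] - Q[y1, x0]   # sum of outer products
    return (q - np.outer(p, p) / n) / (n - 1)
```

Once `P` and `Q` are built, the cost of each window depends only on the feature dimension `d`, not on the number of pixels in the window. The result should agree, up to floating-point error, with the covariance computed directly from the pixels of the window.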
3.3. Covariance matrix distance calculation
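Since RCMs lie on a connected Riemannian manifold rather than in a Euclidean space, the Euclidean distance is not appropriate for our features; for instance, the space of covariance matrices is not closed under multiplication by negative scalars. We use the distance measure presented in (Forstner & Moonen, 1999) to compute the dissimilarity of two covariance matrices $C_1$ and $C_2$,
\[ \rho(C_1, C_2) = \sqrt{\sum_{i=1}^{d} \ln^2 \lambda_i(C_1, C_2)} \]
where $\{\lambda_i(C_1, C_2)\}_{i=1,\dots,d}$ are the generalized eigenvalues of $C_1$ and $C_2$, computed from
\[ \lambda_i C_1 x_i - C_2 x_i = 0, \qquad i = 1, \dots, d, \]
where $x_i \neq 0$ are the generalized eigenvectors.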
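One way to evaluate this dissimilarity with NumPy alone is sketched below. It assumes both matrices are symmetric positive definite and obtains the generalized eigenvalues by whitening with the Cholesky factor of the first matrix; this is one standard numerical route and not necessarily the one used in the original experiments.

```python
import numpy as np

def covariance_distance(C1, C2):
    """Square root of the sum of squared log generalized eigenvalues of (C1, C2).

    Assumes C1 and C2 are symmetric positive definite.
    """
    L = np.linalg.cholesky(C1)            # C1 = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ C2 @ L_inv.T              # same eigenvalues as C1^{-1} C2
    M = (M + M.T) / 2.0                   # remove round-off asymmetry
    lam = np.linalg.eigvalsh(M)           # generalized eigenvalues, all positive
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

The measure is zero when the two matrices are equal and is symmetric in its arguments, because swapping them replaces every eigenvalue by its reciprocal, which leaves the squared logarithms unchanged.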
4. Gabor-based region covariance matrix
4.1. Gabor features extraction
The RCM-based methods with the feature mapping functions (9) and (10) have had great success in people detection, object tracking, and texture classification (Tuzel et al., 2006; Tuzel et al., 2007). However, our experimental results showed that the recognition rates of these methods are very low when they are applied to ear recognition, which is a very difficult task from the classification point of view. To obtain better results in ear recognition, we construct effective features for the RCM from Gabor features, pixel location, and the illumination intensity component. The biological relevance and computational properties of Gabor wavelets for image analysis have been investigated in (Jones & Palmer, 1987).
The Gabor features of ear images are robust against illumination changes. Gabor representation facilitates recognition without correspondence, because it captures the local structure corresponding to spatial frequency (scale), spatial localization, and orientation selectivity (Schiele & Crowley, 2000).
Daugman (Daugman, 1985) modeled the responses of the visual cortex by Gabor functions because they are similar to the receptive field profiles of mammalian cortical simple cells. He developed the 2D Gabor functions (a series of local spatial bandpass filters), which have good spatial localization, orientation selectivity, and frequency selectivity. Lee (Lee, 2003) gave a good description of image representation using Gabor functions. A Gabor (wavelet, kernel, or filter) function is the product of an elliptical Gaussian envelope and a complex plane wave,
\[ \psi_{u,v}(z) = \frac{\|k_{u,v}\|^2}{\sigma^2} \exp\left(-\frac{\|k_{u,v}\|^2 \|z\|^2}{2\sigma^2}\right) \left[ \exp(i\, k_{u,v} \cdot z) - \exp\left(-\frac{\sigma^2}{2}\right) \right] \]
where $z = (x, y)$ is the variable in the spatial domain and $k_{u,v}$ is the frequency vector, which determines the scale and direction of the Gabor functions: $k_{u,v} = k_v e^{i\varphi_u}$, with $k_v = k_{max}/f^v$ and $\varphi_u = \pi u/8$. In our application, $k_{max} = \pi/2$ and $f = \sqrt{2}$. The term $\exp(-\sigma^2/2)$ is subtracted in order to make the kernel DC-free and, thus, insensitive to illumination. Examples of the real part of the Gabor functions used in this chapter are shown in Figure 3. We use Gabor functions with five different scales, $v \in \{0, \dots, 4\}$, and eight different orientations, $u \in \{0, \dots, 7\}$, making a total of 40 Gabor functions. The number of oscillations under the Gaussian envelope is determined by $\sigma = 2\pi$.
The Gabor features are obtained by convolving the Gabor kernels with the image,
\[ g_{u,v}(x, y) = \left| I(x, y) * \psi_{u,v}(x, y) \right| \]
where $|\cdot|$ is the magnitude operator and $g_{u,v}$ is the Gabor representation of the image at orientation $u$ and scale $v$. Figure 4 shows the magnitude of the Gabor representation of an ear image.
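The construction of the filter bank and of the magnitude responses can be sketched with NumPy as follows. The parameter values ($k_{max} = \pi/2$, $f = \sqrt{2}$, $\sigma = 2\pi$) are the ones commonly used with this family of kernels and are assumed here; the 33 x 33 kernel support, the FFT-based convolution, and the function names are further assumptions of this sketch.

```python
import numpy as np

def gabor_kernel(u, v, size=33, kmax=np.pi / 2, f=np.sqrt(2), sigma=2 * np.pi):
    """Complex Gabor kernel at orientation u (0..7) and scale v (0..4)."""
    k = kmax / f ** v                       # magnitude of the frequency vector
    phi = np.pi * u / 8                     # orientation angle
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = (k ** 2 / sigma ** 2) * np.exp(-(k ** 2) * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    wave = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)   # DC-free term
    return envelope * wave

def gabor_magnitudes(image, size=33):
    """Magnitudes of the 40 Gabor responses, shape (H, W, 40), 'same' cropping."""
    image = np.asarray(image, dtype=float)
    H, W = image.shape
    shape = (H + size - 1, W + size - 1)    # zero-padded size for linear convolution
    image_f = np.fft.fft2(image, shape)
    half = size // 2
    out = np.empty((H, W, 40))
    for v in range(5):
        for u in range(8):
            kernel_f = np.fft.fft2(gabor_kernel(u, v, size), shape)
            response = np.fft.ifft2(image_f * kernel_f)
            out[..., 8 * v + u] = np.abs(response[half:half + H, half:half + W])
    return out
```

On a finite support the kernels at the coarsest scales are noticeably truncated, so a larger `size` may be preferable in practice; the value used here only keeps the sketch small.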
4.2. Gabor based RCM
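We propose a new Gabor-based feature mapping function to construct an effective and robust RCM,
\[ \phi(I, x, y) = [\, x \;\; y \;\; I(x, y) \;\; g_{0,0}(x, y) \;\; g_{0,1}(x, y) \;\; \dots \;\; g_{7,4}(x, y) \,]^T \qquad (24) \]
where $I(x, y)$ is the pixel illumination intensity and $g_{u,v}(x, y)$ are the Gabor representations of the ear image. By substituting (24) into (2), we obtain the Gabor-based region covariance matrix of region $R$. Its dimensionality is $43 \times 43$: two pixel coordinates, one intensity component, and 40 Gabor responses.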
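The dimensionality of the resulting descriptor can be checked with a short sketch. The 40 Gabor magnitude maps are replaced here by random stand-in values, purely to illustrate how pixel location, intensity, and Gabor responses are stacked; the function name and array layout are assumptions.

```python
import numpy as np

def gabor_rcm_features(image, gabor_mags):
    """Stack x, y, intensity and the Gabor magnitudes into one feature image."""
    image = np.asarray(image, dtype=float)
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    return np.concatenate(
        [xs[..., None], ys[..., None], image[..., None], gabor_mags], axis=-1
    )

rng = np.random.default_rng(1)
image = rng.random((48, 32))                 # stand-in ear image
gabor_mags = rng.random((48, 32, 40))        # stand-in for the 40 Gabor magnitude maps
F = gabor_rcm_features(image, gabor_mags)    # shape (48, 32, 43)
C = np.cov(F.reshape(-1, F.shape[-1]), rowvar=False)   # 43 x 43 region covariance
```

The covariance is taken over the whole image here; restricting `F` to a sub-window before reshaping gives the descriptor of a local region.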
In our method, we represent each ear image with five RCMs extracted from five different regions. The first RCM is defined over the whole ear image and gives a global representation of the ear image. The other four RCMs are each defined over a part of the ear image and give a part-based representation. To increase the robustness of our method against illumination variations, we use both the global and the part-based representations. Figure 5 shows these five regions.
To compute the distance between the RCMs of a gallery image and those of a probe image, we use
\[ d(C^G, C^P) = \min_{j} \left[ \sum_{i=1}^{5} \rho(C_i^G, C_i^P) - \rho(C_j^G, C_j^P) \right] \qquad (25) \]
where $C_i^G$ and $C_i^P$ are the RCMs of the $i$-th region from the gallery and probe sets, and $\rho$ is the covariance matrix distance of Section 3.3.
Sometimes one local RCM is affected so strongly by illumination variation or noise that its corresponding distance becomes unreliable. For this reason, (25) subtracts the most unreliable (largest) distance from the sum of all distances between the gallery and probe RCMs. We use a nearest neighbor classifier with the distance in (25).
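The fusion rule and the nearest neighbor decision can be sketched in plain Python. The sketch takes the per-region distance as a callable `dist`, so any covariance dissimilarity can be plugged in; the function names and the gallery layout (a list of `(label, rcms)` pairs) are assumptions for illustration.

```python
def fused_distance(gallery_rcms, probe_rcms, dist):
    """Sum of per-region distances with the largest (least reliable) one removed."""
    distances = [dist(g, p) for g, p in zip(gallery_rcms, probe_rcms)]
    # Minimizing (sum - d_j) over j is the same as subtracting the maximum.
    return sum(distances) - max(distances)

def nearest_neighbor(gallery, probe_rcms, dist):
    """Label of the gallery entry with the smallest fused distance to the probe.

    gallery: list of (label, rcms) pairs, one list of region RCMs per image.
    """
    best = min(gallery, key=lambda item: fused_distance(item[1], probe_rcms, dist))
    return best[0]
```

Dropping the largest term means that a single corrupted region, for example one affected by a strong shadow, cannot dominate the decision.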
Our method was tested on two USTB databases (Yuan et al., 2005). Database 1 includes 180 ear images of 60 individuals, with three images per person. All the images in database 1 were acquired under standard conditions with small variations. Figure 6 shows sample ear images from database 1.
Database 2 includes 308 ear images of 77 individuals, with four images per person. All the images in database 2 were acquired under illumination variation and pose variations of 30 degrees. Figure 7 shows sample ear images from database 2.
6. Experimental results
In our experimental studies, we compared our method with several ear recognition algorithms: the PCA+RBFN method (Zhang & Mu, 2008), the ICA+RBFN method (Zhang & Mu, 2008), the Hmax+SVM method (Yaqubi et al., 2008), the LSBP method (Guo & Xu, 2008), and four RCM-based methods (Tuzel et al., 2007; Pang et al., 2008). To compare the recognition performance of our method with these methods, we used the USTB databases (Yuan et al., 2005). In database 1, with a total of 60 persons, two images per person were randomly selected for training and the remaining image was used for testing. There are three different ways of selecting two training images from three images. In database 2, with a total of 77 persons, three images per person were randomly selected for training and the remaining image was used for testing. There are four different ways of selecting three training images from four images.
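The number of possible training selections follows from simple combinatorics, which the following sketch enumerates. The helper names are hypothetical, and the use of the sample standard deviation in `summarize` is an assumption, since the text does not state which estimator was used.

```python
from itertools import combinations
from statistics import mean, stdev

def training_splits(images_per_person, train_per_person):
    """All ways to pick training image indices; the rest are used for testing."""
    indices = range(images_per_person)
    return [
        (train, tuple(i for i in indices if i not in train))
        for train in combinations(indices, train_per_person)
    ]

def summarize(rates):
    """Mean and sample standard deviation of recognition rates over the splits."""
    return mean(rates), stdev(rates)

splits_db1 = training_splits(3, 2)   # three splits for database 1
splits_db2 = training_splits(4, 3)   # four splits for database 2
```

Running one experiment per split and passing the resulting rates to `summarize` gives the kind of mean and standard deviation figures discussed in this section.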
For simplicity, the RCM-based methods associated with (9), (10), (11), and (12) are denoted by RCM1, RCM2, RCM3, and RCM4, respectively. RCM3 is RCM1 without the intensity component, and RCM2 is RCM4 without the intensity component.
Figures 8 and 9 show the mean recognition rates on databases 1 and 2. The recognition performance of the four RCM-based methods was worse than that of the other methods, so the discrimination power of these RCM-based methods is weak for the recognition task. To assess the effect of the intensity component on the recognition rate, we compare the result of RCM1 with RCM3 and the result of RCM2 with RCM4. We can conclude that the intensity $I(x, y)$ is an important feature in RCMs and that it increases the recognition performance of RCM-based methods. We therefore used the illumination intensity component in our mapping function to increase the accuracy of our method.
Table 1 compares the standard deviations of the recognition rates of all the discussed methods on databases 1 and 2. The standard deviation of our method on database 1 is low, and our method performed better than all the other methods on database 1. The mean recognition rates of our method on databases 1 and 2 are 93.33% and 87.98%, respectively. On database 2, whose images contain pose variations, our method outperforms all the other methods in terms of average accuracy except the LSBP and ICA methods.
Table 1. Standard deviations of the recognition rates on Database 1 and Database 2.
Overall, these results indicate that using Gabor features as the main features in constructing RCMs improves the discrimination ability for recognizing ear images and yields a better recognition rate than previous methods.
In this chapter, we proposed Gabor-based region covariance matrices for ear recognition. In this method, we form a region covariance matrix from Gabor features, the illumination intensity component, and pixel location, and utilize it as an efficient ear descriptor. We compared our method with the PCA+RBFN method (Zhang & Mu, 2008), the ICA+RBFN method (Zhang & Mu, 2008), the Hmax+SVM method (Yaqubi et al., 2008), the LSBP method (Guo & Xu, 2008), and four RCM-based methods (Tuzel et al., 2007; Pang et al., 2008), using two USTB databases.
Unlike the previous RCM-based methods, which have very low recognition rates when applied to ear recognition, our RCM-based method, which uses Gabor features as the main features for constructing the RCM, showed better results in ear recognition. The experimental results showed that our method improved the recognition rate compared with the other methods. Our method obtains average accuracies of 93.33% and 87.98% on databases 1 and 2, respectively.