Adaptive Fitness Approach - an Application for Video-Based Face Recognition

Written By

Alaa Eleyan, Hüseyin Özkaramanli and Hasan Demirel

Submitted: 06 November 2010 Published: 01 August 2011

DOI: 10.5772/20085

From the Edited Volume

New Approaches to Characterization and Recognition of Faces

Edited by Peter Corcoran


1. Introduction

In the last two decades face recognition has emerged as an important research area with many potential applications that can ease and help safeguard our everyday lives (Zhao et al., 2003; Kirby & Sirovich, 1990; Turk & Pentland, 1991; Martinez & Kak, 2001; Belhumeur et al., 1997; Phillips et al., 2000; Eleyan & Demirel, 2007, 2011; Brunelli & Poggio, 1993; Wiskott et al., 1997). Face recognition from still images has been extensively studied (Sinha et al., 2006; Eleyan et al., 2008), while face recognition from video has more recently attracted the attention of many researchers (Zhou et al., 2003; Li & Chellappa, 2002; Wechsler et al., 1997; Steffens et al., 1998; Eleyan et al., 2009). Video is inherently richer in information content than still images: it offers temporal continuity, dynamics, and the possibility of constructing 3D models of faces. On the other hand, facial data acquired from video are normally of low quality and low resolution, which degrades the performance of recognition algorithms. The temporal continuity and dynamics of a person captured on video make it easier for humans to recognize people, and humans are usually able to recognize faces even in very low resolution images. The same cannot be said for computer-based techniques, which have been shown to be quite capable of recognizing faces from still images but struggle with video. Exploiting these properties for more efficient and higher-performance face recognition algorithms requires approaches that differ from the traditional ones.

There are many reasons why humans are so successful at recognizing faces in video while computers are not. Among them: 1) humans use a collection (flow) of data over time rather than an individual video image during both training and testing; 2) humans are superbly capable of tracking objects and, in doing so, can make excellent use of this flow of data.

In the training stage, when a new person is to be “memorized”, many features such as appearance, gestures and gait are encoded. Each person in the human memory (gallery) is encoded differently, and humans memorize a large number of people. In the testing (recognition) stage, human beings compare these features and make a decision on the identity of a person. This process, however, is not a “one shot” comparison; it is made continually based on the flow of data. When the person is far away, for example, it is difficult to discern the facial features, yet from the gait and gestures the human brain is able to extract important information to identify an approaching person. Based on this information the brain automatically deems some of the people in memory as unlikely candidates to match the approaching person, and those candidates are not considered in further comparisons. As the person approaches, the brain restricts the comparison to a reduced set of likely candidates in memory.

Inspired by this biological process of making comparisons and decisions based on a reduced set of candidates at the testing stage, we propose in this chapter an analogous structure for computer-based face recognition from video, whereby the gallery is continually updated as the frames of the probe video are processed. To demonstrate the effectiveness of the proposed approach we employ features derived from PCA or LBP. After every probe frame, the feature vector is compared with the feature vectors of the gallery images, and unlikely images in the gallery are discarded based on the accumulated fitness of the gallery images. An updated set of features is then derived from the remaining gallery images and used to test the next frame in the probe video. The results obtained with this updated gallery set indicate that a significant improvement in recognition performance can be achieved. The adaptive fitness approach (AFA) is also tested without updating the gallery set; this scheme with a fixed gallery set gives performance comparable to the scheme with an updated gallery set.

The rest of the chapter is organized as follows. Section 2 briefly reviews feature extraction. Section 3 presents the face video database. Section 4 introduces the adaptive fitness update approach. Section 5 reports our experimental results and discussions, and Section 6 concludes this chapter.

2. Feature extraction

Feature extraction is a crucial stage of data preparation for subsequent processing such as detection, estimation and recognition. It largely determines the robustness and performance of the system that utilizes the extracted features, so it is important to choose feature extractors carefully for the intended application. Since a pattern often contains redundant information, mapping it to a feature vector can remove this redundancy while preserving most of the intrinsic information content of the pattern. The extracted features play a major role in distinguishing between input patterns.

In this work, instead of using more biologically oriented features, for reasons of simplicity we employ features derived from principal component analysis (PCA) (Kirby & Sirovich, 1990; Turk & Pentland, 1991) and local binary patterns (LBP) (Ahonen et al., 2004; Ojala et al., 2002). The recognition framework, however, allows the incorporation of other features. In the PCA case, one needs to prepare a projection space using the training set and use it to form the feature vectors of both the training and test sets. With LBP, every image is processed independently to form its feature vector. Thus, when the size of the training set changes, as it does in AFA, a new projection space has to be prepared if PCA is used, whereas the LBP feature vectors remain unchanged.
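To make the two extractors concrete, the following minimal numpy sketch shows one way they could be realized. The function names, the number of retained components, and the use of a single global LBP histogram (rather than the regional histograms of Ahonen et al.) are illustrative simplifications, not the chapter's exact implementation:

```python
import numpy as np

def pca_features(gallery, num_components=50):
    """Project flattened face images onto the top principal components.

    gallery: (num_images, h*w) array of flattened grayscale faces.
    Returns the projection basis, the mean face, and the gallery features.
    """
    mean_face = gallery.mean(axis=0)
    centered = gallery - mean_face
    # Economy-size SVD: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_components]              # (num_components, h*w)
    features = centered @ basis.T            # (num_images, num_components)
    return basis, mean_face, features

def lbp_histogram(image, bins=256):
    """Basic 3x3 LBP code image reduced to a global histogram feature."""
    c = image[1:-1, 1:-1]                    # centre pixels
    # Eight neighbours, each contributing one bit of the LBP code.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = image[1 + dy:image.shape[0] - 1 + dy,
                   1 + dx:image.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    hist, _ = np.histogram(code, bins=bins, range=(0, bins))
    return hist / hist.sum()                 # normalised histogram
```

Note that the PCA features depend on the whole gallery through the projection basis, while each LBP histogram is computed from one image alone; this is exactly the asymmetry the paragraph above points out.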

3. Video face database

In this study we used the BANCA database (Popovici et al., 2003), a multimodal database recorded with various acquisition devices (2 cameras and 2 microphones) under several scenarios (controlled, degraded and adverse). The videos were collected for 52 individuals (26 male and 26 female) on 12 different occasions (4 recordings for each scenario). In this work we use the video sequences of all 52 individuals under the three scenarios. In the degraded scenario a webcam was used, while a higher-quality camera was used in the controlled and adverse scenarios. Figure 1 shows samples from the database for the three scenarios.

Figure 1.

Samples of the BANCA database images. Left: controlled, middle: degraded, right: adverse scenarios.

Figure 2.

Example of using face detection algorithm to crop the face region from the whole frame.

As it was computationally expensive to use all the frames in each individual's video sequence, we selected 60 frames, corresponding to every other frame of the sequence. The face images from the first n frames (n = {1, 2, ..., 10}) of each video sequence were used to form the gallery set and train the system, while the rest were used for testing.

It was essential to run face detection on the extracted frames in the pre-processing stage in order to prepare them for recognition. For this purpose, the local Successive Mean Quantization Transform (SMQT) (Nilsson et al., 2007) was adopted for face detection and cropping, owing to its robustness to illumination changes. Cropped faces were converted to grayscale and histogram equalized to minimize illumination problems. Bicubic interpolation was used to resize the resulting face images to the reference resolution of the gallery images (128 × 128). Figure 2 shows an example of this detection, cropping and resizing pre-processing for one of the images in the BANCA database.
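A hedged sketch of this pre-processing chain is shown below. Since no standard SMQT/SNoW implementation ships with common libraries, an OpenCV Haar cascade stands in here for the face detector; the grayscale conversion, histogram equalization and bicubic resizing follow the description above:

```python
import cv2

def preprocess_frame(frame, size=128):
    """Detect, crop, and normalise the face in one video frame.

    The chapter uses the local SMQT/SNoW detector; an OpenCV Haar
    cascade is used here only as a readily available substitute.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                          # no face found in this frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest detection
    face = gray[y:y + h, x:x + w]
    face = cv2.equalizeHist(face)            # reduce illumination effects
    # Bicubic interpolation to the 128x128 reference resolution.
    return cv2.resize(face, (size, size), interpolation=cv2.INTER_CUBIC)
```

In practice the cascade would be loaded once outside the per-frame loop; it is kept inline here only to make the sketch self-contained.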

4. Adaptive fitness based updating

4.1. Adaptive Fitness Approach (AFA)

The features of each subject in the gallery are derived from the first n frames (n = {1, 2, ..., 10}) of each subject's video sequences using PCA and LBP. Each frame of the test/probe video is treated as a single still image, and its feature vector is formed using PCA or LBP. Comparing this feature vector with the gallery encodes the similarity of the test frame to each of the gallery images. Naturally, some gallery images will have high similarity with the frame under test while others will have low similarity. One can thus establish with some confidence that gallery images with very low similarity are very unlikely to match the probe frame. When processing the next frame of the test video, one can therefore reduce the size of the gallery by discarding those unlikely candidates, shrinking the gallery set after each tested frame. It is well known that the discriminating power of algorithms such as PCA improves when the gallery set is reduced. Discarding images from the gallery set mimics, in a way, the mechanism the human brain employs when recognizing an approaching person. When the person is far away (low resolution) the brain uses global features for identification and eliminates people in its gallery who are unlikely to form a match. As the person gets closer, the gallery is automatically updated and the approaching person is compared against a smaller number of people. Eventually, when the person is very close, the gallery is reduced to just a few candidates.

Inspired by this biological process employed by the human brain in recognition tasks, we propose a simple approach to adaptively shrink the size of the gallery set after each frame of the test video is processed. A fitness measure Φi,k is defined using the Euclidean distance as

$$
\Phi_{i,k} =
\begin{cases}
\dfrac{\bar{\delta}_{k} - \delta_{i,k}}{\bar{\delta}_{k}}, & k = 1 \\[1.5ex]
\Phi_{i,k-1} + \dfrac{\bar{\delta}_{k} - \delta_{i,k}}{\bar{\delta}_{k}}, & 2 \le k \le N
\end{cases}
\tag{1}
$$

where δi,k is the Euclidean distance between the i-th gallery image and the k-th test frame (the distances of all gallery images at frame k form a vector), δ̄k is the mean value of that distance vector, and Φi,k denotes the accumulated fitness between the i-th gallery image and the k-th test frame. At the first frame of the test video the fitness is simply the normalized distance, as the first line of Eq. (1) indicates. The normalization is achieved by subtracting δi,k from the mean distance δ̄k and dividing by δ̄k, in order to reduce the effect of outliers.
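In code, one step of Eq. (1) could look as follows (a minimal numpy sketch; starting the running fitness at zero makes the k = 1 case coincide with the first line of the equation):

```python
import numpy as np

def update_fitness(fitness, gallery_features, frame_feature):
    """One step of Eq. (1): add the normalized-distance increment for the
    current probe frame to the accumulated fitness of every gallery image.

    fitness          -- running fitness, one value per gallery image
                        (all zeros before the first frame)
    gallery_features -- (num_gallery, d) array of gallery feature vectors
    frame_feature    -- (d,) feature vector of the current probe frame
    """
    dist = np.linalg.norm(gallery_features - frame_feature, axis=1)
    # Normalize by the mean distance over the current gallery: images
    # closer than average get a positive increment, outliers a negative one.
    return fitness + (dist.mean() - dist) / dist.mean()
```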

The accumulated fitness measure forms the basis for shrinking the gallery by discarding candidates that are unlikely to match the probe video frame. After eliminating unlikely candidates, a new set of features is formed from the remaining, fitter candidates. For example, if PCA is used for feature extraction, the eigenspace is updated after elimination and new feature vectors are formed for the remaining images. If LBP is used, on the other hand, discarding an image simply amounts to discarding the corresponding feature vector, so there is no need to recalculate the feature set. Eventually, the continuous updating of the gallery promises to leave behind a few candidates that are very likely to match the person under test.

This approach has several advantages. First, it resembles the way human beings perform recognition. Second, it promises to speed up the recognition process, since unfit images are discarded from the gallery. It should be pointed out, however, that discarding images from the gallery may, although it is very unlikely, lead to throwing out some of the correct gallery images.

The number of images discarded from the gallery set at each processed frame depends on the standard deviation of the accumulated fitness values at that particular frame. The standard deviation of this distribution is used to establish a fitness threshold Φc for discarding gallery images. The critical fitness value Φc is picked conservatively, to ensure with almost 100% confidence that the correct gallery images are not eliminated. This forces one to process almost all the frames before reaching a decision, since with a low Φc few images are discarded from the gallery; it also leads to a higher computational burden. This undesirable situation can be avoided by picking a higher threshold Φc.
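One plausible reading of this pruning rule is sketched below; the exact thresholding logic lives in the pseudocode of Figure 3, so the mean-minus-alpha-standard-deviations rule and the default alpha = 1.0 are illustrative assumptions:

```python
import numpy as np

def prune_gallery(fitness, identities, alpha=1.0):
    """Discard gallery images whose accumulated fitness falls below a
    threshold derived from the spread of the current fitness values.
    The rule here (mean minus alpha standard deviations) is one plausible
    reading of the chapter; alpha controls the critical value Φc.
    """
    threshold = fitness.mean() - alpha * fitness.std()
    keep = fitness >= threshold
    # The boolean mask is returned so the caller can prune the feature
    # matrix (and, for PCA, rebuild the eigenspace) in step.
    return fitness[keep], identities[keep], keep
```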

The adaptive fitness approach can also be used without updating the gallery. In this scheme one simply processes all the frames of the probe video and accumulates the fitness measure with the originally prepared feature vectors. This scheme, with a fixed gallery and no updating, is computationally more efficient than the scheme in which the gallery and the feature vectors are updated. The advantage is not significant, however, since the feature vectors can be updated incrementally after the gallery shrinks, without much computational burden. Furthermore, with gallery updating one does not need to process all the video frames to reach a decision. Figures 3 and 4 give the algorithms of these two schemes step by step.

Figure 3.

Pseudo code for Adaptive Fitness Approach (AFA) with updated gallery set, N = 50.

Figure 4.

Pseudo code for Adaptive Fitness Approach (AFA) with fixed gallery set, N= 50.
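Since the figures reproduce here only as captions, the sketch below assembles the two helpers defined earlier into the overall decision loop; update_gallery=True corresponds to the scheme of Figure 3 and update_gallery=False to that of Figure 4. The early-exit condition is an assumption consistent with the text:

```python
import numpy as np

def afa_identify(probe_features, gallery_features, identities,
                 update_gallery=True, alpha=1.0):
    """AFA decision loop built from update_fitness and prune_gallery above.

    probe_features -- iterable of per-frame feature vectors of the probe video
    identities     -- array of subject labels aligned with gallery_features
    Returns the identity with the highest accumulated fitness.
    """
    fitness = np.zeros(len(gallery_features))
    for frame_feature in probe_features:
        fitness = update_fitness(fitness, gallery_features, frame_feature)
        if update_gallery and len(fitness) > 1:
            fitness, identities, keep = prune_gallery(fitness, identities, alpha)
            gallery_features = gallery_features[keep]
            # With PCA features the eigenspace would be rebuilt here from
            # the surviving gallery images; LBP feature vectors are
            # per-image and need no recomputation.
        if update_gallery and len(fitness) == 1:
            break                            # a single candidate remains
    return identities[int(np.argmax(fitness))]
```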

An example of how the accumulated fitness measure is employed in the video recognition process with gallery updating is depicted in Figures 5 and 6. The feature vectors in Figure 5 are derived from PCA, whereas in Figure 6 they come from LBP. In this example the probe video belongs to person #1. The accumulated fitness in both figures clearly increases for person #1 while remaining insignificant for all other people. The number of training images for each person in this example was n = 1, using the controlled scenario (see the first row of Table 1).

Figure 5.

Example of fitness accumulation through the frames for 1st video sequence of person number 1 using AFA with PCA.

4.2. Adaptive Fitness Fusion Approach (AFFA)

To recognize an individual, human beings use more than one feature, such as gait, face, body shape and even clothing. Here, a simple fusion technique is employed. The individual fitness

Figure 6.

Example of fitness accumulation through the frames for 1st video sequence of person number 1 using AFA with LBP.

measures coming from PCA and LBP are simply added. The recognition system based on the fused fitness is otherwise the same as before: at the end of processing all the frames, the individual with the highest fitness value is declared to be the correct subject. Figures 7 and 8 show the pseudocode for the proposed fitness fusion with fixed and updated gallery sets, respectively.
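For the fixed-gallery scheme this fusion reduces to adding the two normalized per-frame increments, as in the following sketch (the updated-gallery variant would additionally prune both feature sets in step):

```python
import numpy as np

def affa_identify(pca_frames, lbp_frames, gallery_pca, gallery_lbp, identities):
    """AFFA with a fixed gallery: the PCA-based and LBP-based fitness
    increments of Eq. (1) are accumulated into a single fused fitness."""
    fitness = np.zeros(len(identities))
    for f_pca, f_lbp in zip(pca_frames, lbp_frames):
        fitness = update_fitness(fitness, gallery_pca, f_pca)
        fitness = update_fitness(fitness, gallery_lbp, f_lbp)
    # The subject with the highest fused fitness is declared the match.
    return identities[int(np.argmax(fitness))]
```

Because each increment in update_fitness is already normalized by the mean distance of the current frame, the PCA and LBP terms live on comparable scales, which is what makes simple addition a reasonable fusion rule.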

Figure 7.

Pseudo code for AFFA approach with fixed gallery set, N= 50.

Figure 8.

Pseudo code for AFFA Approach with updated gallery set, N= 50.

5. Simulation results and discussions

Figures 9 to 14 show the performance of the proposed AFA with updated and fixed galleries. AFA was run with both LBP and PCA for feature extraction, and the results were compared against single-frame-based PCA and LBP methods, respectively. The three scenarios are shown in these figures with 1 and 5 training images in the gallery set. Both the updated and the fixed gallery give highly competitive results.

The performance of the system is tested using the BANCA database under 3 scenarios: controlled, degraded and adverse. For each scenario there are 52 people, and for each individual there are 4 videos. The initial gallery is formed from a varying number of training images per individual; in this study the number ranged from 1 to 10, as the second column of Table 1 shows.

Usually human beings recognize people by fusing more than one feature. Here we show how the simple approach can be extended to benefit from different feature vectors. This fusion further improves the performance significantly. Again we employ features derived from PCA and LBP for simplicity and convenience.

Because the performance of AFA without fusion was very high (almost 100%), we enlarged the video database in order to see the improvement brought by fusion faithfully. As explained in Section 3, the BANCA database consists of 52 people with 3 scenarios and 4 recordings per scenario. We treated the 4 recordings of each individual in each scenario as different individuals. This modification amounts to using 208 subjects with 3 different

Figure 9.

Performance in controlled scenario with 1 training image per video with updated and fixed gallery set using PCA.

Figure 10.

Performance in controlled scenario with 1 training image per video with updated and fixed gallery set using LBP.

Figure 11.

Performance in adverse scenario with 1 training image per video with updated and fixed gallery set using PCA.

Figure 12.

Performance in adverse scenario with 1 training image per video with updated and fixed gallery set using LBP.

Figure 13.

Performance in degraded scenario with 1 training image per video with updated and fixed gallery set using PCA.

Figure 14.

Performance in degraded scenario with 1 training image per video with updated and fixed gallery set using LBP.

Recognition performance (%), reported as PCA / LBP; n is the number of gallery images per individual.

Scenario     n    AFA, updated gallery set   AFA, fixed gallery set   Single-frame
                  (PCA / LBP)                (PCA / LBP)              (PCA / LBP)
Controlled   1    95.67 / 97.12              97.60 / 100              66.25 / 85.62
             2    97.60 / 98.56              98.56 / 100              77.17 / 90.51
             3    98.08 / 99.04              99.04 / 100              82.13 / 93.28
             5    99.04 / 100                99.52 / 100              89.14 / 96.58
             10   100 / 100                  100 / 100                96.16 / 98.38
Degraded     1    90.39 / 94.23              89.90 / 97.60            63.06 / 84.09
             2    95.67 / 95.67              96.15 / 98.08            73.48 / 88.30
             3    96.63 / 98.56              96.63 / 98.56            78.25 / 91.30
             5    98.08 / 99.52              97.12 / 99.04            84.76 / 94.43
             10   100 / 100                  100 / 100                97.15 / 97.63
Adverse      1    91.83 / 96.63              92.31 / 97.60            68.17 / 87.24
             2    95.19 / 99.52              96.63 / 98.56            78.65 / 92.76
             3    98.56 / 100                98.08 / 99.56            84.25 / 95.41
             5    99.52 / 100                99.04 / 100              90.35 / 97.39
             10   100 / 100                  100 / 100                99.04 / 98.89

Table 1.

Performance of the adaptive fitness approach (AFA) with updated and fixed gallery sets, compared against single-frame PCA/LBP, for different numbers of training images from the BANCA database.

Recognition performance (%); n is the number of gallery images per individual.

Scenario     n    Updated gallery set             Fixed gallery set
                  AFA-PCA  AFA-LBP  AFFA          AFA-PCA  AFA-LBP  AFFA
Controlled   1    68.27    80.77    83.77         68.75    80.77    84.25
             2    79.33    88.94    89.90         78.85    88.94    89.90
             3    83.66    90.87    92.79         83.17    90.87    92.79
             5    89.90    93.27    95.67         90.38    93.27    96.15
             10   95.67    97.60    100           94.23    97.60    100
Degraded     1    70.67    75.85    78.85         71.15    75.85    79.33
             2    80.29    84.62    87.50         78.85    84.62    87.02
             3    83.17    89.90    90.38         83.65    89.90    90.87
             5    89.90    92.79    94.71         88.46    92.79    94.23
             10   91.83    95.19    100           92.31    95.19    99.52
Adverse      1    68.75    74.04    78.85         69.23    74.04    79.33
             2    73.08    77.88    83.25         73.56    77.88    83.73
             3    80.29    85.10    89.54         79.81    85.10    89.54
             5    86.06    90.87    95.67         84.62    90.87    95.19
             10   93.27    96.63    100           92.79    96.63    100

Table 2.

Performance of the adaptive fitness fusion approach (AFFA) for different numbers of training images from the BANCA database, with fixed and updated gallery sets.

scenarios. This is far more challenging, since the 4 recordings of each individual are quite similar in terms of feature vectors. The results on this enlarged database, comparing the adaptive fitness fusion approach (AFFA) against AFA with LBP and PCA, with updated and fixed gallery sets, are shown in Table 2.

The graphs in Figures 15 to 20 show examples of the performance of the proposed fitness fusion in the three database scenarios (controlled, degraded, adverse) with 1 and 5 training images. The results in these figures were obtained with the fixed gallery set. In all figures it is clear that fusing the fitness values obtained separately from the PCA and LBP feature vectors improves the performance of the system. For example, in Figure 15 the performance of the AFA approach in the degraded scenario with 1 training image was 71.16% and 75.85% using PCA and LBP, respectively; when the AFFA fusion technique was applied, the performance increased to 79.33%. In Figure 16 the number of training images was increased from 1 to 5 in the same scenario: with AFA the performance was 88.46% and 92.79% using PCA and LBP, respectively, and with AFFA it reached 94.23%. The same observation can be made for the other two database scenarios with different numbers of training images in Figures 17 to 20.

Figure 15.

Performance in degraded scenario with 1 training image per video with fixed gallery set.

Figure 16.

Performance in degraded scenario with 5 training images per video with fixed gallery set.

Figure 17.

Performance in controlled scenario with 1 training image per video with fixed gallery set.

Figure 18.

Performance in controlled scenario with 5 training images per video with fixed gallery set.

Figure 19.

Performance in adverse scenario with 1 training image per video with fixed gallery set.

Figure 20.

Performance in adverse scenario with 5 training images per video with fixed gallery set.

6. Conclusion

In this chapter a new biologically inspired approach, the Adaptive Fitness Approach (AFA), for identifying faces from video sequences is proposed. The fitness value of each image in the gallery set is calculated and accumulated as the probe video frames are processed. Two schemes are used with the AFA approach. The first scheme discards unfit images from the gallery, followed by an update of the feature vectors. In the second scheme the gallery, and thus the feature vectors, are kept fixed.

To demonstrate the proposed AFA approach with the updated and fixed gallery schemes, PCA- and LBP-derived features were employed for convenience. The performance of both schemes is far superior to single-frame-based PCA or LBP approaches, even for a very small number of training images. The adaptive fitness framework is also shown to conveniently accommodate the fusion of different feature vectors, with a further significant improvement in recognition performance over AFA with a single feature.

References

1. Ahonen, T., Hadid, A. & Pietikainen, M. (2004). Face Recognition with Local Binary Patterns, in: Proceedings of the European Conference on Computer Vision, pp. 469–481.
2. Belhumeur, P., Hespanha, J. & Kriegman, D. (1997). Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), pp. 711–720.
3. Brunelli, R. & Poggio, T. (1993). Face Recognition: Features Versus Templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(10), pp. 1042–1052.
4. Eleyan, A. & Demirel, H. (2007). PCA and LDA Based Neural Networks for Human Face Recognition, in: Face Recognition, K. Delac & M. Grgic (Eds.), pp. 93–106, ISBN 978-3-90261-303-5, I-Tech Education and Publishing, Croatia.
5. Eleyan, A., Ozkaramanli, H. & Demirel, H. (2008). Complex Wavelet Transform-Based Face Recognition, EURASIP Journal on Advances in Signal Processing, vol. 2008, Article ID 185281, 13 pages.
6. Eleyan, A., Ozkaramanli, H. & Demirel, H. (2009). Adaptive and Fixed Eigenspace Methods with a Novel Fitness Measure for Video Based Face Recognition, in: Proceedings of the 24th International Symposium on Computer and Information Sciences, pp. 636–640.
7. Eleyan, A. & Demirel, H. (2011). Co-Occurrence Matrix and Its Statistical Features as a New Approach for Face Recognition, Turkish Journal of Electrical Engineering & Computer Sciences, 19, pp. 97–107.
8. Kirby, M. & Sirovich, L. (1990). Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, pp. 103–108.
9. Li, B. & Chellappa, R. (2002). A Generic Approach to Simultaneous Tracking and Verification in Video, IEEE Transactions on Image Processing, 11, pp. 530–544.
10. Martinez, A. M. & Kak, A. C. (2001). PCA versus LDA, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, pp. 228–233.
11. Nilsson, M., Nordberg, J. & Claesson, I. (2007). Face Detection Using Local SMQT Features and Split Up SNoW Classifier, in: Proceedings of ICASSP 2007, vol. 2, pp. II-589–II-592.
12. Ojala, T., Pietikainen, M. & Maenpaa, T. (2002). Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), pp. 971–987.
13. Phillips, P. J., Moon, H., Rizvi, S. & Rauss, P. (2000). The FERET Evaluation Methodology for Face-Recognition Algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, pp. 1090–1104.
14. Popovici, V., Thiran, J., Bailly-Bailliere, E., Bengio, S., Bimbot, F., Hamouz, M., Kittler, J., Mariethoz, J., Matas, J., Messer, K., Ruiz, B. & Poiree, F. (2003). The BANCA Database and Evaluation Protocol, in: Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication, pp. 625–638.
15. Steffens, J., Elagin, E. & Neven, H. (1998). PersonSpotter: Fast and Robust System for Human Detection, Tracking, and Recognition, in: Proceedings of the International Conference on Automatic Face and Gesture Recognition, pp. 516–521.
16. Sinha, P., Balas, B., Ostrovsky, Y. & Russell, R. (2006). Face Recognition by Humans: Nineteen Results All Computer Vision Researchers Should Know About, Proceedings of the IEEE, 94(11), pp. 1948–1962.
17. Turk, M. & Pentland, A. (1991). Eigenfaces for Recognition, Journal of Cognitive Neuroscience, 3, pp. 71–86.
18. Wechsler, H., Kakkad, V., Huang, J., Gutta, S. & Chen, V. (1997). Automatic Video-Based Person Authentication Using the RBF Network, in: Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication, pp. 85–92.
19. Wiskott, L., Fellous, J. M., Kruger, N. & von der Malsburg, C. (1997). Face Recognition by Elastic Bunch Graph Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), pp. 775–780.
20. Zhao, W., Chellappa, R., Rosenfeld, A. & Phillips, P. J. (2003). Face Recognition: A Literature Survey, ACM Computing Surveys, 35(4), pp. 399–458.
21. Zhou, S., Krueger, V. & Chellappa, R. (2003). Probabilistic Recognition of Human Faces from Video, Computer Vision and Image Understanding, 91, pp. 214–245.
