Face recognition methods overview.
Face recognition, as one of the most successful applications of image analysis, has recently gained significant attention, largely owing to the availability of feasible technologies, including mobile solutions. Research in automatic face recognition has been conducted since the 1960s, but the problem remains largely unsolved. The last decade has brought significant progress in this area owing to advances in face modelling and analysis techniques. Although systems have been developed for face detection and tracking, reliable face recognition still poses a great challenge to computer vision and pattern recognition researchers. There are several reasons for the recent increased interest in face recognition, including rising public concern for security, the need for identity verification in the digital world, and the role of face analysis and modelling techniques in multimedia data management and computer entertainment. In this chapter, we discuss face recognition processing, including major components such as face detection, tracking, alignment and feature extraction, and point out the technical challenges of building a face recognition system. We focus on the most successful solutions available so far. The final part of the chapter describes selected face recognition methods and applications and their potential use in areas not related to face recognition.
- face recognition
- biometric identification
- image processing
Recent advances in automated face analysis, pattern recognition and machine learning have made it possible to develop automatic face recognition systems to address these applications. On the one hand, recognising a face is a natural process that people usually perform effortlessly and without much conscious thought. On the other hand, reproducing this process in computer vision remains a difficult problem. As part of biometric technology, automated face recognition has many desirable properties, chief among them an important advantage: non-invasiveness. Biometric methods can be divided into physiological (fingerprint, DNA, face) and behavioural (keystroke, voice print) categories. The physiological approaches are more stable and non-alterable, except by severe injury, whereas behavioural patterns are more sensitive to a person's overall condition, such as stress, illness or fatigue.
A brief analysis of face detection techniques that use effective statistical learning methods is therefore essential, as these provide practical and robust solutions.
Figure 1 shows the basic elements of a typical face recognition system.
Face detection performance is a key issue, so techniques for dealing with non-frontal face detection are discussed. Subspace modelling and learning-based dimension reduction methods are fundamental to many current face recognition techniques. Discovering such subspaces, so as to extract effective features and construct robust classifiers, is another challenge in this area. Face recognition combines high accuracy with low intrusiveness, so it has drawn the attention of researchers in fields ranging from psychology and image processing to computer vision.
The first stage is face detection in the acquired image, regardless of scale and location. It often uses an advanced filtering procedure to distinguish locations that may represent faces and then filters them with accurate classifiers. Notably, all translation, scaling and rotational variations have to be dealt with in the face detection phase. For example, according to [1,2], facial expression and hairstyle changes, such as smiling and frowning, remain important sources of variation during the pattern recognition stage.
In the next step, an anthropometric data-set-based system predicts the approximate locations of the principal features such as the eyes, nose and mouth. The whole procedure is then repeated to predict the subfeatures, relative to the principal features, and the results are verified with collocation statistics to reject any mislocated features.
Dedicated anchor points are generated as the result of geometric combinations in the face image, and then the actual recognition process begins. It is carried out by finding a local representation of the facial appearance at each of the anchor points. The representation scheme depends on the approach. In order to deal with such complications and find the true invariants for recognition, researchers have developed various recognition algorithms.
Current face recognition technology still has several limitations. An early benchmark of face recognition technologies, the FERET evaluation, was provided in [3,4]. While performance is excellent under ideal conditions, it decreases significantly under changing illumination, expression, resolution, distance or aging. Face recognition systems are still not very robust to deviations from the ideal face image. Another problem is finding an effective way of storing, and granting access to, the facial code (or facial template), stored as a set of features extracted from an image or video.
Considering the elements of the complex face recognition process outlined above, a number of limitations and imperfections can be seen. They require clarification or replacement by new algorithms, methods or even technologies.
In this chapter, we discuss face recognition processing, including major components such as face detection, tracking, alignment and feature extraction, and point out the technical challenges of building a face recognition system. We focus on the most successful solutions available so far.
The final part of the chapter describes selected face recognition methods and applications and their potential use in areas not related to face recognition.
The need for this study is justified by an invitation to participate in the further development of a very interesting technology: face recognition.
Although performance continues to improve in several areas of face recognition technology, it is worth noting that current applications also impose new requirements for its further development.
2. Previous methods
2.1. Classical face recognition algorithms
There has been rapid development of reliable face recognition algorithms in the last decade. Traditional face recognition algorithms can be divided into two categories: holistic feature and local feature approaches. The holistic group can be further divided into linear and nonlinear projection methods.
Many applications have shown good results for linear projection appearance-based methods such as principal component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA) [7,8], 2DPCA and the linear regression classifier (LRC).
However, due to large variations in illumination conditions, facial expression and other factors, these methods may fail to adequately represent the faces. The main reason is that the face patterns lie on a complex nonlinear and non‐convex manifold in the high‐dimensional space.
In order to deal with such cases, nonlinear extensions have been proposed, such as kernel PCA (KPCA), kernel LDA (KLDA) or locally linear embedding (LLE). Most nonlinear methods use kernel techniques, in which the general idea is to map the input face images into a higher-dimensional space where the manifold of the faces becomes linear and simplified, so that the traditional linear methods can be applied.
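The kernel idea described above can be sketched in a few lines of NumPy. The snippet below is a minimal, illustrative KPCA with an RBF kernel applied to synthetic "face vectors"; the function name, parameters and data are our own assumptions for illustration, not taken from any cited method.

```python
import numpy as np

def kernel_pca(X, n_components, gamma=None):
    """Project rows of X onto the top principal components in an
    RBF-kernel-induced feature space (a minimal KPCA sketch)."""
    n, d = X.shape
    if gamma is None:
        gamma = 1.0 / d
    # Pairwise squared distances and the RBF (Gaussian) kernel matrix
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Centre the kernel matrix in feature space
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigendecomposition; keep the leading components
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ alphas  # projected training samples

rng = np.random.default_rng(0)
faces = rng.normal(size=(20, 64))   # 20 toy "face vectors"
proj = kernel_pca(faces, n_components=5)
print(proj.shape)                   # (20, 5)
```

Once the data are projected this way, any linear classifier can be applied in the new space, which is exactly the simplification the kernel trick provides.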
Although PCA, LDA and LRC are all considered linear subspace learning algorithms, it is notable that the PCA and LDA methods focus on the global structure of the Euclidean space, whereas the LRC approach focuses on the local structure of the manifold.
These methods project a face onto a linear subspace spanned by the eigenface images. The distance from face space is the orthogonal distance to the plane, whereas the distance in face space is the distance along the plane from the mean image. Both of these distances can be turned into Mahalanobis distances and given probabilistic interpretations.
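The two distances above can be made concrete with a short NumPy sketch. The code below fits eigenfaces by SVD on toy data and computes the distance in face space (along the subspace from the mean) and the distance from face space (the orthogonal residual); all names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_eigenfaces(X, k):
    """PCA on row-vectorised face images: returns the mean face and
    the top-k eigenfaces (rows of Vt from the SVD of centred data)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def face_space_distances(x, mean, W):
    """Distance *in* face space and distance *from* face space."""
    xc = x - mean
    coeffs = W @ xc                      # projection onto eigenfaces
    recon = W.T @ coeffs                 # reconstruction inside the subspace
    d_in = np.linalg.norm(coeffs)        # distance in face space
    d_from = np.linalg.norm(xc - recon)  # orthogonal distance from face space
    return d_in, d_from

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 100))           # 30 toy face vectors
mean, W = fit_eigenfaces(X, k=8)
d_in, d_from = face_space_distances(X[0], mean, W)
```

Dividing each coefficient by the corresponding singular value before taking the norm would turn `d_in` into the Mahalanobis-style distance mentioned in the text.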
Despite the strong theoretical foundation of kernel-based methods, their practical application to face recognition problems does not produce a significant improvement over linear methods.
Another family of nonlinear projection methods has been introduced that inherits the simplicity of the linear methods and the ability to deal with complex data from the nonlinear ones. Among these methods, LLE and locality preserving projection (LPP) are worth highlighting. They produce a projection scheme for the training data only, so their capability to project new data items is questionable.
In the second category, local appearance features have certain advantages over holistic features: these methods are more stable to local changes such as expression, occlusion and misalignment. A common representative method is local binary patterns (LBPs) [19,20]. LBP describes the changes around a central pixel in a simple but effective way; it is invariant to monotonic intensity transformations and tolerates small illumination variations. Many LBP variants have been proposed to improve on the original LBP, such as the histogram of Gabor phase patterns and the local Gabor binary pattern histogram sequence [22,23]. Generally, LBP is utilised to model the neighbouring relationships jointly in the spatial, frequency and orientation domains.
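The basic LBP operator is simple enough to show directly. The sketch below computes the classic 3x3 LBP code of one pixel and illustrates the invariance to monotonic intensity changes mentioned above; the bit ordering chosen here is one common convention, not the only one.

```python
import numpy as np

def lbp_code(img, r, c):
    """Basic 3x3 LBP: threshold the 8 neighbours of pixel (r, c) against
    the centre and pack the bits (clockwise from top-left) into one byte."""
    center = img[r, c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr, c + dc] >= center:
            code |= 1 << bit
    return code

img = np.array([[5, 9, 1],
                [3, 6, 7],
                [2, 6, 8]])
print(lbp_code(img, 1, 1))          # prints 58
print(lbp_code(img * 2 + 1, 1, 1))  # same code: monotonic transform invariance
```

Because the code depends only on sign comparisons, any strictly increasing transformation of the grey levels (such as a brightness or contrast change) leaves it unchanged.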
LBP allows discriminant and robust information in the pattern to be explored efficiently. A further development of the subspace approaches mentioned above is the discriminant common vectors (DCVs) approach.
The DCV method collects the similarities among the elements of the same class and drops their dissimilarities. Thus, each class can be represented by a common vector computed from the within-class scatter matrix.
When an unknown face is tested, the corresponding feature vector is computed and assigned to the class with the nearest common vector. Kernel discriminative common vectors, or improved discriminative common vectors combined with a support vector machine (SVM), are sometimes introduced for the face recognition task.
Similarly to the LLE method, neighbourhood preserving projection (NPP) and orthogonal NPP (ONPP) are introduced in [27,28]. These approaches preserve the local structure between samples. To reflect the intrinsic geometry of the local neighbourhoods, they use data-driven weights obtained by solving a least-squares problem. ONPP forces the mapping to be orthogonal and then solves an ordinary eigenvalue problem, whereas NPP imposes the orthogonality condition on the projected data and therefore requires solving a generalised eigenvalue problem.
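The data-driven weights that LLE, NPP and ONPP share can be sketched as follows: for each sample, solve a small constrained least-squares problem for the weights that best reconstruct it from its neighbours. This is a minimal illustration with made-up data; the regularisation constant is an assumption for numerical stability.

```python
import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    """Constrained least squares used by LLE-style methods: find weights w
    minimising ||x - sum_j w_j * neighbors[j]||^2 subject to sum(w) = 1."""
    Z = neighbors - x                     # neighbours shifted relative to x
    G = Z @ Z.T                           # local Gram matrix
    G = G + reg * np.trace(G) * np.eye(len(neighbors))  # regularise
    w = np.linalg.solve(G, np.ones(len(neighbors)))
    return w / w.sum()                    # enforce the sum-to-one constraint

rng = np.random.default_rng(2)
nbrs = rng.normal(size=(4, 10))           # 4 neighbouring samples in 10-D
x = nbrs.mean(axis=0)                     # a point at the neighbours' centroid
w = reconstruction_weights(x, nbrs)
```

For a point at the exact centroid of its neighbours, the optimal weights are uniform, which is a handy sanity check on the solver.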
A block diagram of the traditional face recognition approaches is presented in Figure 2.
However, it is still unclear how to select the neighbourhood size and how to assign optimal values to the other hyper-parameters. Sparsity preserving projections [29,30] and LPPs are also applied to face recognition.
In , a multi-linear extension of the LDA method, called discriminant analysis with tensor representation, is proposed. It differs from the preserving projection methods in that it implements discriminant analysis directly on the natural tensorial data to preserve the neighbourhood structure of the tensor feature space. Another method, supervised and unsupervised multi-linear NPP (MNPP) for face recognition, is presented in . A survey of multi-linear methods can be found in . These methods operate directly on tensorial data rather than vectors or matrices and solve the problems of tensorial representation for multidimensional feature extraction and recognition. Multiple interrelated subspaces are obtained in the MNPP method by unfolding the tensor along different tensorial directions; the order of the tensor space determines the number of subspaces derived by MNPP [34,35].
2.2. Artificial neural networks in face recognition
A radial basis function neural network integrated with non-negative matrix factorisation for recognising faces is presented in . Moreover, a momentum back-propagation neural network is utilised for face and speech verification in . A non-negative sparse coding method for learning facial features, using different distance metrics and normalised cross-correlation, is applied to face recognition in .
A posterior union decision-based artificial neural network approach is proposed in [33,34]. It combines elements of both neural networks and statistical approaches and complements methods for recognising face images with partial distortion and occlusion.
2.3. Gabor wavelet‐based solutions
Gabor wavelets have been widely used for face representation by face recognition researchers [44,45,46], and Gabor features are recognised as a better representation for face recognition in terms of (rank-1) recognition rate. Moreover, they have been demonstrated to be discriminative and robust to illumination and expression variations. For the case when only one sample image per enrolled subject is available, an adaptively weighted sub-Gabor array for face representation and recognition is proposed in .
Moreover, two strategies for capturing Gabor texture information, Gabor magnitude-based texture representation (GMTR) and Gabor phase-based texture representation (GPTR), are proposed in .
The GMTR approach is characterised by a Gamma density that models the Gabor magnitude distribution, whereas GPTR is characterised by a generalised Gaussian density that models the Gabor phase distribution. This allows the estimated model parameters to serve as the texture representation of the face.
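The magnitude and phase channels that GMTR and GPTR build on come straight from a complex Gabor kernel. The sketch below constructs one such kernel in NumPy and splits it into magnitude and phase; the parameter values are arbitrary illustrative choices, not those of the cited work.

```python
import numpy as np

def gabor_kernel(size=15, wavelength=4.0, theta=0.0, sigma=3.0):
    """Complex 2-D Gabor kernel: a plane wave at orientation `theta`
    modulated by an isotropic Gaussian envelope (a minimal sketch)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)        # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.exp(2j * np.pi * xr / wavelength)
    return envelope * carrier

k = gabor_kernel()
magnitude = np.abs(k)     # the quantity GMTR-style features model (Gamma density)
phase = np.angle(k)       # the quantity GPTR-style features model (gen. Gaussian)
```

Convolving a face image with a bank of such kernels at several scales and orientations, then fitting the stated densities to the resulting magnitude and phase responses, yields the texture representations described above.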
Applying the Gabor wavelet at fixed positions, corresponding to the nodes of a square-meshed grid superimposed on the face image, is presented in . Each subpattern of the partitioned face image is defined as the extracted Gabor features belonging to the same row of the square-meshed grid, which are then projected to a lower-dimensional space by the Karhunen-Loeve transform. The obtained features of each subpattern, weighted using a genetic algorithm (GA), are used to train a Parzen window classifier. Finally, matching is performed by combining the classifiers using a weighted sum rule.
A learning approach based on Gabor features and kernel supervised Laplacian faces for face recognition under the classifier fusion framework is introduced in . The Gabor features obtained from each channel are used as new samples of the same class to implement the classifier fusion strategy. Such an approach is useful for improving recognition performance.
A histogram of Gabor phase features is proposed in . In [54,55,56,57,58], patch-based histograms of local patterns are concatenated to form the representation of the face image via learned local Gabor patterns. Addressing the feature representation problem with a learning method, instead of simple concatenation or histogram features, is presented in . In , Gabor features were adopted for sparse representation (SR)-based classification, and a Gabor occlusion dictionary was learned under the well-known SR framework.
The main drawback of Gabor-based methods is that the dimensionality of the Gabor feature space is very high, since the face images are convolved with a bank of Gabor filters.
Moreover, selecting the most useful features from among so many Gabor features is very time-consuming. Furthermore, extracting the Gabor features is computationally intensive, which currently makes them impractical for real-time applications. A simplified version of Gabor wavelets is introduced in ; unfortunately, the simplified Gabor features are more sensitive to lighting variations than the original Gabor features.
2.4. Face descriptor‐based methods
Local feature-based face image description builds a global description from local features: local features are evaluated in the neighbouring pixels and then aggregated to form the final global description [65,66]. This is unlike global methods, in which the entire image is used to produce each feature. The first step is a description of the face at the pixel level, making use of the local neighbourhood of each pixel. The image is then divided into a number of subregions, and for each subregion a local description is formed as a histogram of the pixel-level descriptions calculated in the previous step. Finally, the information from the regions is combined into the final descriptor by concatenating the partial histograms [67,68].
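The pixel-level-to-histogram pipeline just described can be sketched compactly. The code below takes a map of per-pixel codes (for instance LBP codes), splits it into a grid of subregions, histograms each region and concatenates the results; grid size, bin count and data are illustrative assumptions.

```python
import numpy as np

def regional_histogram_descriptor(codes, grid=(2, 2), n_bins=16):
    """Build a global face descriptor by splitting a per-pixel code map
    into subregions, histogramming each region and concatenating."""
    h, w = codes.shape
    gh, gw = grid
    hists = []
    for i in range(gh):
        for j in range(gw):
            region = codes[i * h // gh:(i + 1) * h // gh,
                           j * w // gw:(j + 1) * w // gw]
            hist, _ = np.histogram(region, bins=n_bins, range=(0, n_bins))
            hists.append(hist / max(region.size, 1))  # per-region normalisation
    return np.concatenate(hists)

rng = np.random.default_rng(3)
codes = rng.integers(0, 16, size=(8, 8))   # toy per-pixel LBP-like codes
desc = regional_histogram_descriptor(codes)
print(desc.shape)    # (64,) = 4 regions x 16 bins
```

Two such descriptors are typically compared with a histogram distance (e.g. chi-squared), which is what makes the representation robust to small local changes.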
The main idea of the approach is to learn the most discriminant local features: those that minimise the difference between images of the same individual and maximise the difference between images of different people. The nature of these descriptors is that they compute an image representation from local patch statistics.
Multiple local descriptors designed to capture the statistics of local patch similarities, whose face verification accuracy ranks highly on the LFW benchmark, are proposed in . Enhancing face recognition performance by introducing discriminative learning into the three steps of LBP-like feature extraction is presented in .
The discriminant image filters, the optimal soft sampling matrix and the dominant patterns are all learned from images. The general advantage of these methods is a compact, highly discriminative and easy-to-extract learning-based descriptor. These methods are discriminative and robust to illumination and expression changes.
2.5. 3D‐based face recognition
As the 3D capturing process is becoming cheaper and faster, it is commonly thought that 3D sensing has the potential for greater recognition accuracy than 2D. The advantage of using 3D data is that depth information does not depend on pose and illumination; therefore, the representation of the object does not change with these parameters, making the whole system more robust. 3D-based techniques can achieve better robustness to the pose variation problem than 2D-based ones. A comprehensive survey of 3D face recognition approaches is presented in .
A method for face recognition across variations in pose, which combines deformable 3D models with a computer graphics simulation of projection and illumination, can be found in . In this method, faces are represented by model parameters for 3D shape and texture. Their 3D morphable models are combined with spherical harmonics illumination representation  to recognise faces under arbitrary unknown lighting.
Using facial symmetry to handle pose variation in 3D face recognition is presented in , where an automatic landmark detector is used. It helps to estimate pose and detects occluded areas for each facial scan. Subsequently, an annotated face model is registered and fitted to the scan. During fitting, facial symmetry is used to overcome the challenges of missing data .
A generic 3D elastic model for pose-invariant face recognition is proposed in . It is constructed for each subject in the database using only a single 2D image, by applying the 3D generic elastic model (3DGEM) approach. Each 3D model is subsequently rendered at different poses within a limited search space around the estimated pose, and the resulting images are matched against the test query. Finally, the distances between the synthesised images and the test query are computed using a simple normalised correlation matcher, which demonstrates the effectiveness of the pose synthesis method on real-world data.
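The "simple normalised correlation matcher" mentioned above amounts to comparing zero-mean, unit-variance versions of two images. A minimal sketch, with synthetic data standing in for rendered and query images:

```python
import numpy as np

def normalised_correlation(a, b):
    """Normalised correlation between two equal-sized images: 1.0 means
    identical up to brightness (offset) and contrast (scale) changes."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

rng = np.random.default_rng(4)
img = rng.normal(size=(16, 16))
same = 2.0 * img + 5.0               # brightness/contrast-shifted copy
other = rng.normal(size=(16, 16))
print(normalised_correlation(img, same))   # close to 1.0
```

Its invariance to affine intensity changes is exactly why it is a reasonable matcher for images synthesised under differing illumination.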
In , a geometric framework for analysing 3D faces, with the specific goals of comparing, matching and averaging their shapes, is proposed to represent facial surfaces by radial curves emanating from the nose tips.
A 3D face recognition approach based on local geometrical signatures, called the facial angular radial signature (ARS), which can approximate the semi-rigid region of the 3D face, is proposed in . The authors employed KPCA to map the raw ARS facial features to mid-level features to improve the discriminating power. Finally, the resulting mid-level features are combined into a single feature vector and fed into an SVM to perform face recognition [80, 81, 82, 83, 84, 85, 86].
The drawback of using 3D data in face recognition is that these approaches need all the elements of the system to be well calibrated and synchronised to acquire accurate 3D data (texture and depth maps). Existing 3D face recognition approaches rely on surface registration or on complex feature (surface descriptor) extraction and matching techniques. They are, therefore, computationally expensive and not suitable for practical applications. Moreover, they require the cooperation of the subject, making them unsuitable for uncontrolled or semi-controlled scenarios where the only input to the algorithms is a 2D intensity image acquired from a single camera.
2.6. Video‐based face recognition
The analysis of video streams of face images has received increasing attention in biometrics. An immediate advantage of using video information is the possibility of employing the redundancy present in the video sequence to improve still-image systems. Although a significant amount of research has been done on matching still face images, the use of videos for face recognition is relatively unexplored. The first stage of video-based face recognition (VFR) is to perform re-identification, where a collection of videos is cross-matched to locate all occurrences of the person of interest.
Generally, VFR approaches can be classified into two categories based on how they leverage the multitude of information available in a video sequence: (i) sequence based and (ii) set based, where at a high level, what most distinguishes these two approaches is whether or not they utilise temporal information [90, 91].
The formulation of a probabilistic appearance-based face recognition approach, originally defined to perform recognition from a single still image as previously explained, is extended in to work with multiple images and video sequences. In , the subspace spanned by the face images of a clip is constrained into a convex hull, and the nearest distance between two convex hulls is calculated as the between-set similarity. Thus, each test and training example is a set of images of a subject's face, not just a single image, so recognition decisions need to be based on comparisons of image sets.
In , the VFR task is converted into the problem of measuring the similarity of two image sets, where the examples from a video clip constitute one image set. The authors consider the face images from each clip as an ensemble and formulate VFR as a joint sparse representation (JSR) problem. In JSR, to adaptively learn the sparse representation of a probe clip, they simultaneously consider class-level and atom-level sparsity: the former structures the enrolled clips using the structured sparse regulariser, and the latter seeks a few related examples using the sparse regulariser.
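To make the set-based idea concrete, the sketch below uses the simplest possible set-to-set similarity, the minimum pairwise distance between feature vectors of two clips. This is a hedged stand-in for the convex-hull and JSR distances described above, with made-up feature data.

```python
import numpy as np

def set_distance(set_a, set_b):
    """Simplest set-based VFR dissimilarity: the minimum pairwise Euclidean
    distance between feature vectors of two image sets (an illustrative
    stand-in for convex-hull or sparse-representation set distances)."""
    diffs = set_a[:, None, :] - set_b[None, :, :]
    d = np.sqrt(np.sum(diffs**2, axis=2))
    return float(d.min())

rng = np.random.default_rng(5)
clip_a = rng.normal(size=(6, 32))                        # frames of one clip
clip_b = clip_a[:3] + 0.01 * rng.normal(size=(3, 32))    # overlapping clip
clip_c = rng.normal(loc=5.0, size=(4, 32))               # different subject
```

Clips of the same subject yield a much smaller set distance than clips of different subjects, which is the signal set-based VFR methods refine with hulls or sparsity.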
To identify their most important advantages and imperfections, the methods discussed above are summarised in Table 1.
| No. | Method group | Advantages | Disadvantages |
|---|---|---|---|
| 1. | Classical algorithms | LRC focuses on the local structure of the manifold. These methods project the face onto a linear subspace spanned by the eigenface images; the distance from face space is orthogonal to the plane of the mean image, so it may easily be turned into a Mahalanobis distance with a probabilistic interpretation | These methods may fail to adequately represent faces when large variations in illumination, facial expression and other factors occur. Applying kernel-based nonlinear methods does not produce a significant improvement over linear methods. LLE, LPP and LBP brought a simple and effective way to describe neighbouring changes in the face description. Subspace approaches were applied in DCV- and SVM-based methods. Preserving the local structure between samples is the domain of the NPP and ONPP methods; the problem is that it is still unclear how to select the neighbourhood size or assign optimal values for the hyper-parameters |
| 2. | Artificial neural networks | A radial basis function neural network is naturally integrated into such approaches, simplifying processing thanks to ANNs' native linearisation and computational speed-up. An ideal solution, especially for recognising face images with partial distortion and occlusion | The main disadvantage of this approach is the requirement for a greater number of training samples (instead of one or a limited number). It is inaccurate in the same way as other statistically based methods |
| 3. | Gabor wavelets | The Gabor wavelets exhibit desirable characteristics of capturing salient visual properties such as spatial localisation, orientation selectivity and spatial frequency, which favour this approach | The drawback of Gabor-based methods is the very high dimensionality of the Gabor feature space, since the face image is convolved with a bank of Gabor filters. The approach is computationally intensive and impractical for real-time applications. Gabor features are sensitive to lighting variations |
| 4. | Face descriptor-based methods | The main idea behind developing image descriptors is to learn the most discriminant local features that minimise the difference between images of the same individual and maximise that between images of other people. These methods are discriminative and robust to illumination and expression changes. They offer a compact, easy-to-extract and highly discriminative descriptor | The approach is computationally intensive during the descriptor extraction stage, but offers encouraging simplicity and performance for online applications |
| 5. | 3D-based recognition | The 3D capturing process is becoming cheaper and faster than the 2D one and has greater potential for accuracy. Depth information does not depend on the pose, making the solution more robust | Requires all the elements of the 3D face recognition system to be well calibrated and synchronised to acquire accurate 3D data. Computationally expensive and not suitable for practical applications |
| 6. | Video-based recognition | The main advantage of the approach is the possibility of employing the redundancy present in the video to improve still-image systems | Relatively poorly investigated. Multiple problems with measuring the similarity of two (or more) image sets |
The methods listed in Table 1 illustrate the evolution of face recognition technology. The huge potential of face descriptor-based methods should be emphasised, given that the local descriptor idea has recently been recognised as the most important design framework for face identification and verification tasks.
3. Face recognition applications
Many published works mention numerous applications in which face recognition technology is already utilised, including entry to secured high-risk spaces such as border crossings, as well as access to restricted resources [95, 96, 97]. On the other hand, there are other application areas in which face recognition has not yet been used. The potential application areas of face recognition technology can be outlined as follows:
Automated surveillance, where the objective is to recognise and track people .
Monitoring closed-circuit television (CCTV): facial recognition capability can be embedded into existing CCTV networks to look for lost children or other missing persons, or to track known or suspected criminals.
Image database investigations: searching image databases of licensed drivers and benefit recipients, finding people in large news photograph and video collections [99, 100], and searching the Facebook social networking web site.
Multimedia environments with adaptive human-computer interfaces (part of ubiquitous or context-aware systems, behaviour monitoring at childcare or elderly care centres, recognising customers and assessing their needs).
Airplane boarding gates, where face recognition may be used for random checks merely to screen passengers for further investigation. Similarly, in casinos, strategic design of betting floors incorporating cameras at face height with good lighting could be used not only to scan faces for identification purposes, but possibly also to capture images for building a comprehensive gallery for future watch-list, identification and authentication tasks.
Sketch-based face reconstruction, where law enforcement agencies around the world rely on practical methods to help crime witnesses reconstruct likenesses of faces. These methods range from sketch artistry to proprietary computerised composite systems [105, 106, 107].
Forensic applications, where a forensic artist often works with an eyewitness to draw a sketch that depicts the facial appearance of the culprit according to his/her verbal description. This forensic sketch is later matched against large facial image databases to identify criminals [108, 109]. Yet there is no existing face recognition system that can be used for identification or verification in crime investigations, such as comparing images taken by CCTV with an available database of mugshots. Thus, utilising face recognition technology in forensic applications is a must, as discussed in [110, 111].
Face spoofing and anti-spoofing, where a photograph or video of an authorised person's face could be used to gain access to facilities or services. A spoofing attack consists of using forged biometric traits to gain illegitimate access to secured resources protected by a biometric authentication system [112, 113]. It is a direct attack on the sensory input of a biometric system, and the attacker does not need prior knowledge of the recognition algorithm. Research on face spoof detection has recently attracted increasing attention, introducing a number of face spoof detection techniques [115, 116, 117]. Nevertheless, anti-spoofing research is still in its infancy, and further work is needed on face spoofing applications [118, 119].
Many applications have been envisaged for face recognition, but most commercial ones exploit only superficially the great potential of this technology. Most applications are notably limited in their ability to handle pose, lighting changes or aging.
Most physical access control systems use face recognition in combination with other biometrics, for example speaker identification and lip motion.
One of the greatest application-domain interests in face recognition is associated with surveillance. Given the rich information it contains, video is the medium of choice for surveillance, and for applications that require identification, face recognition is the best biometric for video data. The biggest advantage of this approach is the passive participation of the subject: the whole process of recognition and identification can be carried out without the person's knowledge.
Although the development of face recognition surveillance systems has already begun, the technology does not yet seem accurate enough. It also brings additional problems concerning the highly demanding data gathering and computing side of such complex solutions.
Another future domain where face recognition is expected to become important is pervasive or ubiquitous computing. Computing devices equipped with sensors and networked together are becoming more widespread. This lets us envisage a future where most everyday objects have some computational power, allowing them to precisely adapt their behaviour to various factors, including time, user, user control or host.
This vision assumes easy information exchange, including images, between devices of different types.
Currently, most devices have a simple user interface, controlled only by active commands from the user. Some devices are able to sense the environment and acquire information about the physical world and the people within their region of interest. A crucial part of human awareness in smart devices is knowing the identity of the users close to a device; this is already implemented in several smartphones, with varying results. Owing to its passive nature, face recognition is particularly valuable when combined with other biometrics.
Face recognition is still a challenging problem in the field of computer vision. It has received a great deal of attention over the past years because of its many applications in various domains. Although there is strong research effort in this area, face recognition systems are far from being able to perform adequately in all real-world situations. This chapter presented a brief survey of issues, methods and applications in the area of face recognition. Much work remains to be done in order to develop methods that reflect how humans recognise faces and optimally make use of the temporal evolution of facial appearance for recognition.