Open access peer-reviewed chapter

Spatial Domain Representation for Face Recognition

By Toshanlal Meenpal, Aarti Goyal and Moumita Mukherjee

Submitted: November 16th 2018
Reviewed: February 22nd 2019
Published: September 27th 2019

DOI: 10.5772/intechopen.85382


Abstract

Spatial domain representation for face recognition characterizes the spatial facial features extracted for recognition. This chapter provides a thorough overview of well-known and recently explored spatial domain representations for face recognition. Over the last two decades, scale-invariant feature transform (SIFT), histogram of oriented gradients (HOG) and local binary patterns (LBP) have emerged as promising spatial feature extraction techniques for face recognition. SIFT and HOG are effective techniques for face recognition under variations in scale, rotation, and illumination. LBP is a texture-based descriptor that is effective for extracting the texture information of a face. Other relevant spatial domain representations are spatial pyramid learning (SPLE), local phase quantization (LPQ), and variants of LBP such as improved local binary pattern (ILBP), compound local binary pattern (CLBP), local ternary pattern (LTP), three-patch local binary patterns (TPLBP) and four-patch local binary patterns (FPLBP). Most of these representations build on SIFT and LBP and have reported improved results for face recognition. A detailed analysis of these methods, basic face recognition results, and possible applications are presented in this chapter.

Keywords

  • spatial domain representation
  • face recognition
  • scale-invariant feature transform
  • histogram of oriented gradients
  • local binary patterns

1. Introduction

Face recognition is a powerful biometric technology in today's highly technological world. It is widely preferred over other biometric modalities such as fingerprint, iris, or speech recognition for security, surveillance, and commercial applications. A face recognition system generally consists of several major stages: face detection, preprocessing, feature extraction, and verification. The complete structure of a face recognition system is shown in Figure 1. Face detection locates a single face or multiple faces in a given image. Viola-Jones face detection using Haar features [1], the Faster R-CNN face detector [2], and face detection based on histograms of oriented gradients [3] are popular methods for detecting faces in an image. Images are generally captured in unconstrained environments and therefore need to be preprocessed before the feature extraction stage. Preprocessing mainly aims to reduce the effects of noise and of variations in illumination, color intensity, background, and orientation. Correct recognition depends on the quality of the captured image, the lighting conditions, and similar factors [4], so the recognition rate can be improved by preprocessing the captured image. Common preprocessing techniques include cropping, image resizing, histogram equalization, and de-noising filtering, as described below.

  1. Face Detection and Cropping: Face detection locates the face region within the whole image. Cropping can be based on one or more facial features such as the eyes, lips, or nose.

  2. Image Resizing: Variations in face image size, shape, pose, etc. make face recognition algorithms harder to design, so images should be resized before feature extraction. For this, face images are cropped again to a standard size. An affine transformation with bilinear interpolation can be applied to the face.

  3. Image Equalization: Illumination variation in the resized image is reduced using histogram equalization.

  4. Image De-noising and Filtering: Raw images contain noise introduced during capture and later processing. Wiener and median filters are used to remove this noise [5].
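The resizing, equalization, and de-noising steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the chapter's pipeline: the face is assumed to be already detected and cropped, and the 64 × 64 target size, the 3 × 3 median window, and all function names are hypothetical choices. Wiener filtering is omitted.

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D grayscale image using bilinear interpolation."""
    in_h, in_w = img.shape
    # Map output pixel centres back to (clipped) input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    f = img.astype(np.float64)
    top = f[np.ix_(y0, x0)] * (1 - wx) + f[np.ix_(y0, x1)] * wx
    bottom = f[np.ix_(y1, x0)] * (1 - wx) + f[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy

def equalize_histogram(img):
    """Histogram equalization of an 8-bit grayscale image via its cumulative histogram."""
    img = img.astype(np.uint8)
    cdf = np.cumsum(np.bincount(img.ravel(), minlength=256))
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / max(cdf[-1] - cdf_min, 1) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[img]

def median_filter(img, k=3):
    """k x k median filter with edge replication at the borders."""
    padded = np.pad(img, k // 2, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.median(windows, axis=(-2, -1)).astype(img.dtype)

def preprocess(face, size=64):
    """Resize -> histogram-equalize -> median-filter an already cropped face."""
    resized = np.clip(np.round(resize_bilinear(face, size, size)), 0, 255).astype(np.uint8)
    return median_filter(equalize_histogram(resized), k=3)
```

A median filter is used here because it removes isolated impulse noise while preserving edges; a Wiener filter would additionally need an estimate of the noise power.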

Figure 1.

A complete structure of face recognition system.

Feature extraction, considered the most important stage of a face recognition system, extracts discriminative facial features. The extracted features are represented as a feature vector and fed to the verification stage. Feature selection is an optional stage before verification that reduces the feature vector dimension using dimensionality reduction techniques [6]. The final stage, verification, identifies an unknown face by finding its closest match in the gallery.
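The closest-match search of the verification stage can be illustrated with a nearest-neighbour rule over the gallery. The chi-square distance used below is a common choice for histogram features such as LBP, but it is an assumption here: the chapter does not prescribe a specific metric, and the function names and toy gallery are hypothetical.

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two non-negative histogram feature vectors."""
    h1 = np.asarray(h1, dtype=np.float64)
    h2 = np.asarray(h2, dtype=np.float64)
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def identify(probe, gallery_features, gallery_labels):
    """Return the label of the gallery vector closest to the probe, and its distance."""
    distances = [chi_square_distance(probe, g) for g in gallery_features]
    best = int(np.argmin(distances))
    return gallery_labels[best], distances[best]

# Usage: three enrolled identities with toy 4-bin histograms.
gallery = [np.array([5, 1, 0, 2]), np.array([0, 4, 4, 0]), np.array([1, 1, 1, 5])]
labels = ["alice", "bob", "carol"]
label, dist = identify(np.array([0, 5, 3, 0]), gallery, labels)
print(label)   # bob
```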

2. Existing face databases

Researchers use a number of benchmark face databases for fair evaluation of face recognition methods. These databases contain images or videos of many individuals captured under varying conditions and resolutions. A summary of benchmark face databases is given in Table 1.

Database | No. of individuals | Conditions | Image resolution | Images
AT&T Database [7] | 40 | Lighting; open/closed eyes; smiling/not smiling; glasses/no glasses | 92 × 112 | 400
CAS-PEAL-R1 [8] | 1040 (pose), 377 (facial expressions), 438 (accessory), 233 (illumination), 297 (background), 296 (distance), 66 (time) | Pose; facial expressions; accessory; illumination; background; distance; time | 360 × 480 | 30,900
CMU Multi-PIE Database [9] | 68 | Pose; illumination; facial expressions | 640 × 486 | 41,368
FERET [10] | 1199 | Pose; illumination; facial expressions; time | 256 × 384 | 14,051
Korean Face Database (KFDB) [11] | 1000 | Pose; illumination; facial expressions | 640 × 480 | 52,000
Yale Face Database B [12] | 10 | Pose; illumination | 640 × 480 | 5850

Table 1.

Summary of benchmark face recognition databases.

Details of some of these face databases are provided below.

2.1 AT&T Database

The AT&T Database, originally known as the ORL database, contains face images captured between April 1992 and April 1994. It was collected by researchers of the Cambridge University Engineering Department for a face recognition project. The database contains 400 images in total: 10 different images of each of 40 individuals. All images were captured against a dark homogeneous background at a resolution of 92 × 112 pixels. The images vary in capture time, lighting, eyes (open or closed), expression (smiling or not smiling), and glasses (with or without), and some images also include rotation. The database has 40 directories, each containing 10 images of one individual stored in .pgm format. Sample images from the AT&T database are shown in Figure 2.

Figure 2.

Sample images from the AT&T database under 10 varying conditions [7].

2.2 CAS-PEAL-R1

The CAS-PEAL-R1 database was collected by the Face Recognition Group of JDL, ICT, CAS, under the sponsorship of the National Hi-Tech Program and ISVISION. It contains 30,900 images of 1040 individuals captured under variations in pose, facial expression, accessory, illumination, background, distance, and time. For pose variation, each of the 1040 individuals has approximately 21 different poses. Facial expression variation is captured for 377 individuals with 6 different expressions; similarly, the accessory subset contains 6 images with different accessories for each of 438 individuals. The illumination subset has images of 233 individuals captured under a minimum of 10 and a maximum of 31 lighting variations. The background subset has images of 297 individuals against 2 to 4 different backgrounds. The distance and time subsets contain 296 and 66 individuals, respectively, with the time subset captured at an interval of 6 months. Sample images from the CAS-PEAL-R1 database are shown in Figure 3.

Figure 3.

Samples of images of CAS-PEAL-R1 database [8].

2.3 CMU Multi-PIE Database

The CMU Multi-PIE Database was collected from October 2000 to December 2000 and contains 41,368 images of 68 individuals under 14 different poses, 43 illumination conditions, and 4 different expressions. The name PIE reflects its varying conditions: pose, illumination, and expression. The image resolution is 640 × 486 pixels. Sample images from the CMU Multi-PIE database are shown in Figure 4.

Figure 4.

Samples of images of CMU Multi-PIE Database [9].

This chapter mainly focuses on the feature extraction stage of face recognition. It presents some well-known and recently explored spatial domain representations for face recognition. Scale-invariant feature transform (SIFT), histogram of oriented gradients (HOG), and local binary patterns (LBP) have been the most commonly used spatial feature representations over the past decade. Recently, other relevant feature representations such as spatial pyramid learning (SPLE), local phase quantization (LPQ), and variants of LBP such as improved local binary pattern (ILBP), compound local binary pattern (CLBP), local ternary pattern (LTP), three-patch local binary patterns (TPLBP), and four-patch local binary patterns (FPLBP) have been used effectively for face recognition.

3. Histogram of oriented gradients (HOG)

Histogram of oriented gradients (HOG) was introduced by Dalal et al. [13] in 2005 for human detection. HOG is an effective descriptor for face recognition that computes normalized histograms of gradient orientations over a dense grid [14]. It characterizes the local appearance and shape of the face through the distribution of local intensity gradients. HOG computation involves gradient computation, fine orientation binning, normalization, and descriptor blocks.

The steps for extracting HOG features for face recognition are:

  1. The facial image is first divided into small, non-overlapping regions called cells. For an image of size 64 × 64, cells of 8 × 8 pixels are used. Gradients are computed over the pixels of each cell using simple 1-D derivative masks in the horizontal and vertical directions:

    $$D_x = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix} \tag{1}$$
    $$D_y = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix}^{T} \tag{2}$$

    Results for a sample facial image using the horizontal ($D_x$) and vertical ($D_y$) derivative masks are shown in Figure 5.

  2. The next step is fine orientation binning. Histogram channels are evenly spread over 0–180° for unsigned gradients or 0–360° for signed gradients. Each pixel in a cell casts a vote for an orientation channel, weighted by its gradient magnitude, the square root of the magnitude, or the square of the magnitude. In general, the gradient magnitude itself yields the best results, while the square root slightly reduces performance [13].

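  3. For local contrast normalization, cells are grouped into larger, overlapping blocks, and the histograms of the cells in each block are normalized together. The normalized block histograms from all blocks are concatenated to form the HOG feature vector. Dalal et al. [13] used 9 histogram channels (bins) for unsigned gradients. For a 64 × 64 image with 8 × 8-pixel cells and 2 × 2-cell (16 × 16-pixel) blocks with 50% overlap, this gives a 1764-dimensional HOG feature vector representing the full facial appearance:

    $$\left(\frac{64 - 16}{8} + 1\right) \times \left(\frac{64 - 16}{8} + 1\right) = 7 \times 7 = 49 \text{ blocks} \tag{3}$$
    $$49 \text{ blocks} \times 4 \text{ cells} \times 9 \text{ bins} = 1764\text{-dimensional HOG vector} \tag{4}$$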

    Different block normalization schemes are presented in [15]. Let $\nu$ be the un-normalized block vector, $\lVert\nu\rVert_k$ its $k$-norm for $k = 1, 2$, and $\varrho$ a small constant. The schemes are L1-norm, $\nu \rightarrow \nu / (\lVert\nu\rVert_1 + \varrho)$; L1-sqrt, $\nu \rightarrow \sqrt{\nu / (\lVert\nu\rVert_1 + \varrho)}$; L2-norm, $\nu \rightarrow \nu / \sqrt{\lVert\nu\rVert_2^2 + \varrho^2}$; and L2-hys. L2-hys is generally used for block normalization: the L2-norm is computed first, the entries of $\nu$ are then clipped to a maximum of 0.2, and the result is renormalized.

Figure 5.

Sample facial image and resultant derivatives. (a) Horizontal derivative. (b) Vertical derivative.

A sample input facial image and the resultant HOG features are shown in Figure 6.

Figure 6.

Example of (a) an input facial image of size 64 × 64 and (b) the resultant HOG features (1764 dimensions).
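The HOG steps above can be sketched in NumPy for the 1764-dimensional configuration (8 × 8 cells, 2 × 2-cell blocks with a one-cell stride, 9 unsigned bins, L2-hys with clipping at 0.2). To stay short, this sketch assigns each vote to a single bin, whereas Dalal et al. [13] interpolate votes between neighboring bins and cells; the function name `hog_descriptor` is hypothetical.

```python
import numpy as np

def hog_descriptor(img, cell=8, block=2, nbins=9, clip=0.2, eps=1e-5):
    """Simplified HOG: [-1, 0, 1] gradients, unsigned 9-bin cell histograms,
    2x2-cell blocks with 50% overlap, and L2-hys block normalization."""
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal mask D_x = [-1, 0, 1]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical mask D_y = [-1, 0, 1]^T
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0          # unsigned, 0-180 degrees
    bin_idx = np.minimum((angle // (180.0 / nbins)).astype(int), nbins - 1)

    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, nbins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            sl = (slice(cy * cell, (cy + 1) * cell), slice(cx * cell, (cx + 1) * cell))
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                       weights=magnitude[sl].ravel(), minlength=nbins)

    features = []
    for by in range(n_cy - block + 1):       # stride of one cell = 50% block overlap
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)      # L2-norm (eps plays the role of rho)
            v = np.minimum(v, clip)                         # clip entries at 0.2
            v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)      # renormalize -> L2-hys
            features.append(v)
    return np.concatenate(features)

face = np.random.default_rng(0).integers(0, 256, size=(64, 64))
print(hog_descriptor(face).shape)   # (1764,)
```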

4. Scale invariant feature transform (SIFT)

The scale invariant feature transform (SIFT) was introduced by Lowe et al. [16] for extracting discriminative, invariant features from an image. The SIFT descriptor is widely used for facial feature representation, extracting blob-like local features [17]. These features are invariant to scale, translation, and rotation, which results in reliable matching. SIFT is described in four stages: (1) detection of scale-space extrema, (2) detection of local extrema, (3) orientation assignment, and (4) keypoint descriptor representation.

4.1 Detection of scale-space extrema

The first step is to identify keypoints in the scale space of the grayscale input image $f(a, b)$, which is defined as:

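$$L(a, b, \sigma) = G(a, b, \sigma) * f(a, b) \tag{5}$$

such that

$$G(a, b, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(a^2 + b^2)/(2\sigma^2)} \tag{6}$$

where $*$ denotes convolution and $\sigma$ is the standard deviation of the Gaussian $G(a, b, \sigma)$.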

Two nearby scales separated by a constant multiplicative factor $k$ are used to detect extrema efficiently in scale space. The difference of Gaussian (DoG) is computed as the difference between the input image convolved with the Gaussians at these two scales:

$$D(a, b, \sigma) = \big(G(a, b, k\sigma) - G(a, b, \sigma)\big) * f(a, b) = L(a, b, k\sigma) - L(a, b, \sigma) \tag{7}$$

4.2 Detection of local extrema

Local extrema (maxima or minima) of $D(a, b, \sigma)$ are found by comparing each sample pixel with its eight neighbors in the 3 × 3 patch at the same scale and with the nine neighbors in each of the scales above and below. A sample point is selected as a local minimum if it is smaller than all 26 neighbors, and as a local maximum if it is larger than all of them. After keypoint localization, low-contrast points are removed by computing $|D(a, b, \sigma)|$ and discarding points whose value falls below a defined threshold; poorly localized points along edges are also discarded [16].
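Sections 4.1 and 4.2 can be illustrated together with a single-octave sketch. It assumes a grayscale image scaled to [0, 1], a base scale σ = 1.6, a scale factor k = √2, and a contrast threshold of 0.03; these values are assumptions for illustration, and the sketch omits octave downsampling, sub-pixel localization, and edge rejection.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """L = G * f with a separable, normalized Gaussian kernel (edge-padded borders)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    # 'valid' convolution along rows, then columns, removes the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def dog_stack(img, sigma=1.6, k=np.sqrt(2), levels=5):
    """One octave of Eq. (7): D_i = L(k^(i+1) sigma) - L(k^i sigma)."""
    blurred = [gaussian_blur(img, sigma * k ** i) for i in range(levels)]
    return np.stack([blurred[i + 1] - blurred[i] for i in range(levels - 1)])

def local_extrema(dog, threshold=0.03):
    """Return (scale, row, col) of points larger or smaller than all 26 neighbours
    in the 3x3x3 scale-space cube, skipping low-contrast points with |D| < threshold."""
    keypoints = []
    n_s, h, w = dog.shape
    for s in range(1, n_s - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                value = dog[s, y, x]
                if abs(value) < threshold:
                    continue
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                neighbours = np.delete(cube, 13)   # index 13 is the centre sample
                if value > neighbours.max() or value < neighbours.min():
                    keypoints.append((s, y, x))
    return keypoints

# Usage on a synthetic image in [0, 1] containing one bright blob.
yy, xx = np.mgrid[0:32, 0:32]
image = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 3.0 ** 2))
keypoints = local_extrema(dog_stack(image))
```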

4.3 Orientation assignment

Assigning an orientation to each keypoint provides rotation invariance. For each Gaussian-smoothed image $L(a, b)$, the gradient magnitude $m(a, b)$ and gradient direction $\theta(a, b)$ are computed from neighboring pixels using Eqs. (8) and (9), respectively:

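$$m(a, b) = \sqrt{\big(L(a+1, b) - L(a-1, b)\big)^2 + \big(L(a, b+1) - L(a, b-1)\big)^2} \tag{8}$$
$$\theta(a, b) = \tan^{-1}\!\left(\frac{L(a, b+1) - L(a, b-1)}{L(a+1, b) - L(a-1, b)}\right) \tag{9}$$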
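Eqs. (8) and (9), followed by the choice of a dominant orientation, can be sketched as follows. Following Lowe's description, orientations around the keypoint are accumulated in a 36-bin histogram weighted by gradient magnitude and a Gaussian window, and the peak bin gives the keypoint orientation; the window radius and σ below are illustrative assumptions. `np.arctan2` is used as a quadrant-aware form of the $\tan^{-1}$ in Eq. (9).

```python
import numpy as np

def gradient_polar(L):
    """Eqs. (8)-(9) with central differences. Arrays are indexed L[b, a]:
    a is the column (horizontal) index and b the row index."""
    L = L.astype(np.float64)
    da = np.zeros_like(L)
    db = np.zeros_like(L)
    da[:, 1:-1] = L[:, 2:] - L[:, :-2]      # L(a+1, b) - L(a-1, b)
    db[1:-1, :] = L[2:, :] - L[:-2, :]      # L(a, b+1) - L(a, b-1)
    m = np.sqrt(da ** 2 + db ** 2)
    theta = np.degrees(np.arctan2(db, da)) % 360.0
    return m, theta

def dominant_orientation(L, a, b, radius=8, sigma=4.0, nbins=36):
    """Peak of a magnitude- and Gaussian-weighted orientation histogram around (a, b),
    returned as the bin centre in degrees."""
    m, theta = gradient_polar(L)
    b0, b1 = max(b - radius, 0), min(b + radius + 1, L.shape[0])
    a0, a1 = max(a - radius, 0), min(a + radius + 1, L.shape[1])
    bb, aa = np.mgrid[b0:b1, a0:a1]
    window = np.exp(-((aa - a) ** 2 + (bb - b) ** 2) / (2 * sigma ** 2))
    bins = (theta[b0:b1, a0:a1] // (360.0 / nbins)).astype(int) % nbins
    hist = np.bincount(bins.ravel(), weights=(m[b0:b1, a0:a1] * window).ravel(),
                       minlength=nbins)
    return (np.argmax(hist) + 0.5) * (360.0 / nbins)

# Usage: a ramp L(a, b) = a + 0.5 b has gradient direction atan(1/2), about 26.6 degrees.
idx = np.arange(32, dtype=np.float64)
ramp = idx[None, :] + 0.5 * idx[:, None]
print(dominant_orientation(ramp, 16, 16))   # 25.0 (centre of the 20-30 degree bin)
```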

4.4 Keypoint descriptor representation

Finally, each detected keypoint is represented as a 128-dimensional feature vector, computed from the gradient magnitude and orientation at each point of a 16 × 16 patch around the keypoint. Each 16 × 16 patch is divided into a 4 × 4 grid of non-overlapping regions of 4 × 4 pixels, and each region is represented by an 8-bin orientation histogram. Hence, each keypoint descriptor is a vector of length 4 × 4 × 8 = 128.
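The 4 × 4 × 8 layout of the descriptor can be sketched directly. This simplified version assumes the 16 × 16 patch has already been rotated to the keypoint orientation and omits the Gaussian weighting and trilinear interpolation of Lowe's method; it keeps the normalize, clip at 0.2, and renormalize step. The function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-D descriptor from a 16x16 patch: a 4x4 grid of 4x4-pixel regions,
    each summarised by an 8-bin orientation histogram weighted by gradient magnitude."""
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    p = np.pad(patch.astype(np.float64), 1, mode="edge")
    da = p[1:-1, 2:] - p[1:-1, :-2]          # horizontal central difference
    db = p[2:, 1:-1] - p[:-2, 1:-1]          # vertical central difference
    m = np.hypot(da, db)
    theta = np.degrees(np.arctan2(db, da)) % 360.0
    bins = (theta // 45.0).astype(int) % 8   # 8 orientation bins of 45 degrees each
    desc = np.zeros((4, 4, 8))
    for i in range(4):
        for j in range(4):
            sl = (slice(4 * i, 4 * i + 4), slice(4 * j, 4 * j + 4))
            desc[i, j] = np.bincount(bins[sl].ravel(), weights=m[sl].ravel(), minlength=8)
    v = desc.ravel()                         # 4 x 4 x 8 = 128 values
    v = v / max(np.linalg.norm(v), 1e-12)    # unit length
    v = np.minimum(v, 0.2)                   # damp large gradient magnitudes
    return v / max(np.linalg.norm(v), 1e-12)
```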

Figure 7 shows an example SIFT descriptor computed from an 8 × 8 neighborhood. The length of each arrow corresponds to the sum of gradient magnitudes in a specific direction within a 4 × 4 region.

Figure 7.

Example of (a) image gradients of a 2 × 2 patch computed from an 8 × 8 neighborhood and (b) the resultant SIFT keypoint descriptor.

The processing flow for generating SIFT features for face recognition is shown in Figure 8. The original input image is first preprocessed, and a difference of Gaussian pyramid is generated as in Figure 8(c). The resultant SIFT keypoints are then represented as a feature vector and fed to a classifier for face recognition.

Figure 8.

Processing flow of SIFT for face recognition. (a) Original image. (b) Processed image. (c) Difference of Gaussian pyramid. (d) SIFT keypoints.

5. Local phase quantization (LPQ)

Local phase quantization (LPQ), introduced by Ojansivu et al. [18, 19], is a blur-tolerant texture descriptor. LPQ is based on the blur invariance property of the phase spectrum of an image in the frequency domain. Ahonen et al. [20] investigated LPQ for face recognition and reported improved results for blurred facial images.

LPQ is applied at an image pixel by computing the short-term Fourier transform (STFT) over an M × M patch centered at that pixel, at four low frequencies. The real and imaginary parts of the resulting coefficients are then whitened (decorrelated) and binary quantized to generate the LPQ code for that pixel. The complete process for obtaining the LPQ code of an image pixel is detailed in Figure 9 [21]. The final LPQ feature vector is obtained by sliding the M × M patch over the entire image.

Figure 9.

LPQ encoding scheme. (a) Input 5 × 5 patch. (b) Frequency domain representation. (c) LPQ code.
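The LPQ encoding can be sketched in NumPy as follows. For brevity the sketch skips the whitening (decorrelation) step, so it corresponds to the basic, non-decorrelated form of LPQ. The window size M = 5 (matching Figure 9), the frequency set $u_1 = (a, 0)$, $u_2 = (0, a)$, $u_3 = (a, a)$, $u_4 = (a, -a)$ with $a = 1/M$, and the function name are assumptions. Four complex coefficients give 8 sign bits, so each pixel receives a code between 0 and 255.

```python
import numpy as np

def lpq_histogram(img, M=5):
    """Basic LPQ without whitening: STFT over an MxM window at four low frequencies,
    sign-quantize real and imaginary parts into an 8-bit code per pixel, and return
    the code image together with its normalized 256-bin histogram."""
    img = img.astype(np.float64)
    r = M // 2
    a = 1.0 / M                                        # lowest non-zero frequency
    offsets = np.arange(-r, r + 1)
    yy, xx = np.meshgrid(offsets, offsets, indexing="ij")
    freqs = [(a, 0.0), (0.0, a), (a, a), (a, -a)]      # (u_x, u_y) for u1..u4
    windows = np.lib.stride_tricks.sliding_window_view(img, (M, M))
    codes = np.zeros(windows.shape[:2], dtype=np.int64)
    bit = 0
    for ux, uy in freqs:
        basis = np.exp(-2j * np.pi * (ux * xx + uy * yy))      # STFT kernel
        coeff = np.einsum("hwij,ij->hw", windows, basis)       # one coefficient per pixel
        for part in (coeff.real, coeff.imag):
            codes |= (part >= 0).astype(np.int64) << bit        # binary quantization
            bit += 1
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return codes, hist / hist.sum()
```

Border pixels without a full M × M window are skipped, so a 16 × 16 image yields a 12 × 12 code image for M = 5.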

Spatial blurring is modeled by convolving the grayscale input image $f(a, b)$ with a point spread function (PSF). In the frequency domain this becomes:

$$H(u, v) = F(u, v) \cdot P(u, v) \tag{10}$$

where $F(u, v)$ and $P(u, v)$ are the DFTs of the original image and the PSF, respectively, and $H(u, v)$ is the DFT of the resultant blurred image.

The phase spectrum is obtained as:

$$\angle H(u, v) = \angle F(u, v) + \angle P(u, v) \tag{11}$$

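If the PSF is centrally symmetric (even), $P(u, v)$ is real-valued, so its phase $\angle P(u, v)$ is either 0 or $\pi$: $\angle P(u, v) = 0$ for $P(u, v) \ge 0$, while $\angle P(u, v) = \pi$ for $P(u, v) < 0$.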

Since the shape of $P(u, v)$ is generally similar to a Gaussian function, $P(u, v)$ is positive at low frequencies. There $\angle P(u, v) = 0$, and Eq. (11) becomes $\angle H(u, v) = \angle F(u, v)$. The low-frequency phase is therefore blur invariant, and LPQ, which quantizes this phase, inherits the blur invariance property. A detailed mathematical analysis of LPQ can be found in [21].
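The blur-invariance argument can be checked numerically. The sketch below blurs a random stand-in image with a centrally symmetric Gaussian PSF through Eq. (10) and checks that $P(u, v)$ is real and positive at a few low frequencies, and that the phase there is unchanged. The image size, PSF width, and chosen frequencies are arbitrary assumptions; phases are compared as unit phasors to avoid wrap-around at $\pm\pi$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32
f = rng.random((N, N))                               # stand-in for the image f(a, b)

# Centrally symmetric Gaussian PSF (sigma = 1): circular distances make p(-a, -b) = p(a, b).
d = np.minimum(np.arange(N), N - np.arange(N))
psf = np.exp(-(d[:, None] ** 2 + d[None, :] ** 2) / 2.0)
psf /= psf.sum()

F = np.fft.fft2(f)
P = np.fft.fft2(psf)
H = F * P                                            # Eq. (10): blur as a product of DFTs

low = (slice(0, 3), slice(0, 3))                     # a few low frequencies (u, v)
print("P is real:", np.abs(P.imag).max() < 1e-12)
print("P > 0 at low frequencies:", bool(np.all(P.real[low] > 0)))
same_phase = np.allclose(H[low] / np.abs(H[low]), F[low] / np.abs(F[low]))
print("low-frequency phase unchanged:", same_phase)
```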

6. Local binary patterns (LBP)

Local binary patterns (LBP) were introduced by Ojala et al. [22] as a rotation-invariant, texture-based feature descriptor. LBP as a feature representation for face recognition was proposed by Ahonen et al. [23], who argued that texture analysis of a local facial region represents its local appearance, and that fusing all regions encodes the global geometry of the face.

Consider an input image and let $f(a, b)$ be its preprocessed version. The basic LBP operator on a 3 × 3 neighborhood of $f(a, b)$ and the decimal code it generates for the center pixel are shown in Figure 10. The LBP operator replaces each pixel of $f(a, b)$ with a computed decimal code, resulting in the LBP-encoded image $f_{LBP}(a, b)$. Each pixel of the 3 × 3 neighborhood is thresholded against the center pixel; the resulting binary code is converted into its decimal value, which replaces the center pixel. The LBP code assigned to the center pixel is given by Eq. (12), where $i_c$ is the center pixel, $c_n$ is the gray level of the $n$-th neighbor pixel, and $c_p$ is the gray level of the center pixel.

Figure 10.

Basic LBP operator on a 3 × 3 neighborhood of $f(a, b)$. (a) Preprocessed image. (b) 3 × 3 neighborhood. (c) Corresponding gray levels of each pixel. (d) Result after thresholding. Finally, the center pixel is replaced by code 42.

$$LBP_{P,R}(i_c) = \sum_{n=0}^{P-1} s(c_n - c_p)\, 2^n, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{12}$$
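Eq. (12) can be sketched for a single 3 × 3 neighborhood and for a whole image, using the usual convention $s(x) = 1$ for $x \ge 0$. The neighbor ordering (clockwise from the top-left pixel, first neighbor as the least significant bit) and the example gray levels are assumptions; different orderings give different decimal codes for the same pattern, so the value in Figure 10 is not reproduced here.

```python
import numpy as np

# Neighbour offsets (row, col), clockwise from the top-left pixel; bit n has weight 2**n.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(patch):
    """Eq. (12) for one 3x3 neighbourhood."""
    c_p = patch[1, 1]
    return sum(int(patch[1 + dr, 1 + dc] >= c_p) << n for n, (dr, dc) in enumerate(OFFSETS))

def lbp_image(img):
    """LBP-encode every interior pixel of a grayscale image (border pixels are dropped)."""
    img = img.astype(np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    for n, (dr, dc) in enumerate(OFFSETS):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]   # shifted view of the image
        codes |= (neighbour >= center).astype(np.int32) << n
    return codes

patch = np.array([[90, 20, 60], [10, 50, 70], [55, 30, 40]])
print(lbp_code(patch))   # 77  (binary 01001101, least significant bit first: 1,0,1,1,0,0,1,0)
```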

Ahonen et al. [23] proposed that the LBP operator can be used with varying neighborhood sizes M × M and radii $R$ to handle different image scales. The notation $(P, R)$ denotes $P$ sampling points (neighbor pixels) on a circle of radius $R$ around the center pixel. Thresholding is then performed by comparing the center pixel with the $P$ neighbor pixels. Examples of some selected values of $(P, R)$ are shown in Figure 11.

Figure 11.

Different P and R combinations for the LBP operator.
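For general $(P, R)$, the neighbors lie on a circle and are commonly obtained by bilinear interpolation when they fall between pixels. The sketch below assumes that convention, a counter-clockwise ordering starting from the right-hand neighbor, and a small tolerance for interpolation round-off; the function name `circular_lbp` is hypothetical.

```python
import numpy as np

def circular_lbp(img, P=8, R=1.0):
    """LBP_{P,R}: P samples on a circle of radius R around each pixel, bilinearly
    interpolated; bit n corresponds to angle 2*pi*n/P (counter-clockwise from the right)."""
    img = img.astype(np.float64)
    h, w = img.shape
    margin = int(np.ceil(R))                     # keeps every sample inside the image
    ys, xs = np.mgrid[margin:h - margin, margin:w - margin]
    center = img[margin:h - margin, margin:w - margin]
    codes = np.zeros(center.shape, dtype=np.int64)
    for n in range(P):
        angle = 2 * np.pi * n / P
        # Round the offsets so that axis-aligned samples land exactly on pixels.
        dy, dx = round(-R * np.sin(angle), 10), round(R * np.cos(angle), 10)
        sy, sx = ys + dy, xs + dx
        y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
        y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
        fy, fx = sy - y0, sx - x0
        value = (img[y0, x0] * (1 - fy) * (1 - fx) + img[y0, x1] * (1 - fy) * fx
                 + img[y1, x0] * fy * (1 - fx) + img[y1, x1] * fy * fx)
        # The small tolerance absorbs floating-point round-off from the interpolation.
        codes |= (value >= center - 1e-9).astype(np.int64) << n
    return codes
```

Larger $P$ yields longer codes ($2^P$ possible patterns), which is the feature-size growth noted for LBP in Table 2.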

For face recognition, LBP builds local descriptors that represent local regions and then combines them into a global representation of the entire face. The encoded image $f_{LBP}(a, b)$ is evenly divided into non-overlapping blocks, a histogram is calculated for each block, and the final LBP feature vector is built by concatenating all regional histograms. This regional design preserves spatial information that plays a key role in face recognition. The complete processing flow for generating the LBP feature vector is shown in Figure 12.

Figure 12.

Processing flow of LBP for face recognition. (a) Original input image. (b) Preprocessed image. (c) LBP encoded image. (d) Divided non-overlapping patches of the encoded image. (e) Histogram of a selected non-overlapping patch. (f) Final LBP feature vector obtained by concatenating the histograms of all patches in the image.
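The regional histogram step can be sketched as follows; it applies to any integer-coded image (LBP, TPLBP, FPLBP). The 8 × 8 grid, the per-block normalization, and the 256-bin default are assumptions chosen for illustration, and the function name is hypothetical.

```python
import numpy as np

def regional_histograms(codes, grid=(8, 8), n_bins=256):
    """Split an encoded image into non-overlapping blocks, histogram each block,
    and concatenate the block histograms into one feature vector."""
    codes = np.asarray(codes)
    if codes.max() >= n_bins:
        raise ValueError("code values must be smaller than n_bins")
    h, w = codes.shape
    gy, gx = grid
    bh, bw = h // gy, w // gx                       # block size (remainder rows/cols dropped)
    feats = []
    for i in range(gy):
        for j in range(gx):
            block = codes[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist = np.bincount(block.ravel(), minlength=n_bins).astype(np.float64)
            feats.append(hist / max(hist.sum(), 1.0))   # per-block normalization
    return np.concatenate(feats)
```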

The major advantages of LBP over other spatial feature representations are simple computation, a comparatively small feature vector, and robustness to noise and illumination changes. In recent years, various LBP variants have been widely applied in texture analysis. Local ternary patterns (LTP), proposed by Tan et al. [24], are based on a ternary threshold operator; LTP improves on LBP by using two LBP-like codes to build one LTP representation. Other LBP variants are compound local binary pattern (CLBP) [25], three-patch LBP (TPLBP) [26], four-patch LBP (FPLBP) [26], and improved local binary pattern (ILBP) [27]. These representations have been reported to be more robust than LBP under illumination changes and noise.

7. Local ternary patterns (LTP)

Local ternary patterns (LTP) [24] generalize LBP with reduced sensitivity to noise and illumination variations. LTP generates a 3-valued code by introducing a threshold zone around zero, which improves resistance to noise. LTP works well for noisy images and under different lighting conditions.

In LBP, neighbor pixels are compared directly with the center pixel, so a small variation in pixel values due to noise can change the LBP code drastically. To overcome this limitation, LTP introduces a threshold $\pm t$ around the center pixel $i_c$, and neighbor pixels are compared against this band to generate a 3-valued ternary code:

$$LTP_{P,R}(i_c) = \sum_{n=0}^{P-1} s(c_n, c_p, t)\, 2^n \tag{13}$$
$$s(c_n, c_p, t) = \begin{cases} 1, & c_n \ge c_p + t \\ 0, & |c_n - c_p| < t \\ -1, & c_n \le c_p - t \end{cases} \tag{14}$$

Here, $c_p$ and $c_n$ represent the gray levels of the center pixel and the neighbor pixels, respectively. The LTP encoding scheme that generates the ternary LTP code is illustrated in Figure 13. The threshold $t$ is set to 5, so with a center pixel value of 40 the tolerance band is 35 to 45. Neighbor pixels with gray levels strictly inside this band are replaced by 0, those at or above its upper limit by 1, and those at or below its lower limit by −1, as described in Eq. (14).

Figure 13.

LTP encoding scheme to generate the ternary LTP code. (a) Preprocessed image. (b) 3 × 3 neighborhood. (c) Corresponding gray levels of each pixel. (d) Ternary LTP code after thresholding.

The resultant ternary LTP code is split into two binary sub-LTP codes, which are treated as two separate channels, as shown in Figure 14. The upper sub-LTP code is set to 1 where the ternary code is 1 and to 0 elsewhere; the lower sub-LTP code is set to 1 where the ternary code is −1 and to 0 elsewhere. Hence, LTP represents each original image by two encoded images.

Figure 14.

Splitting of the ternary LTP code into lower and upper sub-LTP codes. (a) 3 × 3 neighborhood of an image. (b) Ternary LTP code. (c) Lower sub-LTP code. (d) Upper sub-LTP code. The resulting lower and upper sub-LTP codes are 7 and 168, respectively.
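The thresholding of Eq. (14) and the split into two binary codes can be sketched as follows, using Tan and Triggs' formulation [24]: the upper code marks positions where the ternary value is +1 and the lower code marks positions where it is −1. The neighbor ordering and example gray levels are hypothetical (they are not the values in Figures 13 and 14), so the decimal codes differ from those in the figures.

```python
import numpy as np

# Neighbour offsets (row, col), clockwise from the top-left pixel; bit n has weight 2**n.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def ltp_codes(patch, t=5):
    """Ternary LTP code of a 3x3 patch, split into upper and lower binary codes."""
    c_p = int(patch[1, 1])
    ternary, upper, lower = [], 0, 0
    for n, (dr, dc) in enumerate(OFFSETS):
        c_n = int(patch[1 + dr, 1 + dc])
        if c_n >= c_p + t:
            s = 1
        elif c_n <= c_p - t:
            s = -1
        else:
            s = 0                      # |c_n - c_p| < t: inside the tolerance band
        ternary.append(s)
        upper |= int(s == 1) << n      # upper half: 1 where s = +1
        lower |= int(s == -1) << n     # lower half: 1 where s = -1
    return ternary, upper, lower

patch = np.array([[50, 38, 30], [42, 40, 46], [35, 44, 20]])
print(ltp_codes(patch))   # ([1, 0, -1, 1, -1, 0, -1, 0], 9, 84)
```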

8. Compound local binary pattern (CLBP)

Compound local binary pattern (CLBP), proposed by Ahmed et al. [25], is an improved LBP variant that uses a 2P-bit code. CLBP addresses a limitation of LBP by improving performance on flat images: LBP performs poorly on images with bright spots or dark patches, i.e., it fails on flat images, as shown in Figure 15.

Figure 15.

LBP code for a flat image. (a) 3 × 3 neighborhood of an image. (b) LBP encoded image.

The original LBP generates a P-bit code from the gray-level differences between the center pixel and its P neighbor pixels (sampling points). CLBP extends LBP by generating a 2P-bit code for P neighbor pixels, where the extra P bits encode the magnitude of the difference between the center pixel and each neighbor. In this way, CLBP makes the texture representation more robust, mainly for flat images.

To generate the 2P-bit code, CLBP represents each neighbor pixel with two bits carrying sign and magnitude information. The first bit is the same as the LBP bit and represents the sign of the difference between the respective neighbor pixel and the center pixel. The second bit encodes the magnitude of the difference with respect to a threshold $M_{ab}$, obtained as the mean of the magnitudes of the differences between the center pixel and all P neighbor pixels.

The first bit is set to 1 if the gray level of the neighbor pixel is greater than or equal to that of the center pixel, and 0 otherwise. The second bit is 1 if the absolute difference between the neighbor pixel and the center pixel is greater than the threshold, and 0 otherwise. The CLBP code is therefore:

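$$CLBP_{P,R}(i_c) = \sum_{n=0}^{P-1} s(c_n, c_p)\, 2^{2n} \tag{15}$$

$$s(c_n, c_p) = \begin{cases} 00, & c_n - c_p < 0,\ |c_n - c_p| \le M_{ab} \\ 01, & c_n - c_p < 0,\ |c_n - c_p| > M_{ab} \\ 10, & c_n - c_p \ge 0,\ |c_n - c_p| \le M_{ab} \\ 11, & \text{otherwise} \end{cases} \tag{16}$$

Each two-bit value of $s(c_n, c_p)$ (first bit: sign, second bit: magnitude) occupies bit positions $2n + 1$ and $2n$ of the 2P-bit code.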

The CLBP encoding scheme that generates the 2P-bit code for a 3 × 3 neighborhood of an image is shown in Figure 16. A 16-bit CLBP code is generated after thresholding using Eq. (16). The CLBP code is then split into two 8-bit sub-CLBP codes, reducing the number of possible binary patterns from $2^{16}$ to $2 \times 2^{8}$. The first 8-bit code concatenates the bits from the pixels marked red in Figure 16(c), and the second 8-bit code concatenates the bits from the remaining pixels. Finally, these sub-CLBP codes are treated as channels of the final feature vector representation.

Figure 16.

CLBP encoding scheme to generate the 2P-bit code. (a) 3 × 3 neighborhood of an image. (b) 2P-bit CLBP code after thresholding. (c) Separated sub-CLBP codes. (d) Resultant two 8-bit sub-CLBP codes.
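A sketch of the CLBP computation for one 3 × 3 neighborhood follows. The threshold $M_{ab}$ is the mean absolute difference, as described above. The assignment of neighbors to the two sub-codes (corner neighbors to the first, edge-adjacent neighbors to the second) is a hypothetical stand-in for the red/remaining split in Figure 16(c), as are the neighbor ordering and the example values.

```python
import numpy as np

# Neighbour offsets (row, col), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def pack_bits(bits):
    """Pack a bit sequence into an integer, first bit least significant."""
    return sum(int(b) << k for k, b in enumerate(bits))

def clbp_codes(patch):
    """CLBP for one 3x3 patch: (sign, magnitude) bits per neighbour as in Eq. (16),
    then a split of the 16 bits into two 8-bit sub-codes."""
    c_p = float(patch[1, 1])
    diffs = np.array([float(patch[1 + dr, 1 + dc]) - c_p for dr, dc in OFFSETS])
    m_ab = np.mean(np.abs(diffs))                        # threshold M_ab
    sign_bits = (diffs >= 0).astype(int)                 # first bit: same as the LBP bit
    mag_bits = (np.abs(diffs) > m_ab).astype(int)        # second bit: magnitude test
    pairs = np.stack([sign_bits, mag_bits], axis=1)      # row n = bits of neighbour n
    # Hypothetical split: corner neighbours (even n) -> sub-code 1, others -> sub-code 2.
    sub1 = pack_bits(pairs[0::2].ravel())
    sub2 = pack_bits(pairs[1::2].ravel())
    return pairs, sub1, sub2

patch = np.array([[90, 20, 60], [10, 50, 70], [55, 30, 40]])
pairs, sub1, sub2 = clbp_codes(patch)
print(sub1, sub2)   # 71 134
```

The sign column of `pairs` alone reproduces the plain LBP code, which is exactly the information CLBP extends with the magnitude column.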

The processing flow for generating histograms of the CLBP encoded image for face recognition is shown in Figure 17, which illustrates how each pixel of the original image is converted into the CLBP encoded image. Figure 17(c) shows the two sub-CLBP encoded images, and the histogram of each encoded image is obtained as in Figure 17(d). These histograms can be used individually as separate feature vectors for face recognition or concatenated into a single final vector.

Figure 17.

Processing flow of CLBP for face recognition. (a) Original image. (b) Preprocessed image. (c) Separated sub-CLBP encoded images. (d) Respective histograms of each encoded image. (e) Concatenated histogram.

9. Three-patch LBP (TPLBP)

The original LBP and its variants generate a 1-bit value (or a 2-bit value for CLBP) by comparing two pixels: the center pixel and one of the P neighbor pixels. Wolf et al. [26] proposed two further LBP variants, three-patch LBP (TPLBP) and four-patch LBP (FPLBP), which derive each bit from comparisons among three or four image patches.

TPLBP assigns each pixel of the encoded image a code in which each bit is obtained by comparing three patches. For each center pixel $i_c$, an M × M patch centered at $i_c$ is considered, together with P additional patches of the same size whose centers lie on a ring of radius R. The patch at $i_c$ is compared with two ring patches that are δ positions apart along the ring. In this way, TPLBP generates a P-bit code for $i_c$:

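$$TPLBP_{P,R,M,\delta}(i_c) = \sum_{m=0}^{P-1} f\big(d(c_m, c_p) - d(c_{(m+\delta) \bmod P}, c_p)\big)\, 2^m \tag{17}$$

Here, $c_p$, $c_m$ and $c_{(m+\delta) \bmod P}$ are the M × M patches centered at $i_c$ and at the $m$-th and $(m+\delta)$-th ring positions, respectively, $d(\cdot,\cdot)$ is the $L_2$ distance between patches, and $f$ is given as:

$$f(a) = \begin{cases} 1, & a \ge \tau \\ 0, & a < \tau \end{cases} \tag{18}$$

$\tau$ is a user-specified threshold chosen slightly greater than zero (e.g., $\tau = 0.01$) to obtain stability in flat regions. Figure 18 shows an example of generating the TPLBP code for $P = 8$, $\delta = 2$, $M = 3$. Using Eq. (17), the TPLBP code for this sample is: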

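Figure 18.

TPLBP code generation for $P = 8$, $\delta = 2$, $M = 3$.

$$\begin{aligned} TPLBP(i_c) ={}& f\big(d(c_0, c_p) - d(c_2, c_p)\big) 2^0 + f\big(d(c_1, c_p) - d(c_3, c_p)\big) 2^1 + f\big(d(c_2, c_p) - d(c_4, c_p)\big) 2^2 \\ &+ f\big(d(c_3, c_p) - d(c_5, c_p)\big) 2^3 + f\big(d(c_4, c_p) - d(c_6, c_p)\big) 2^4 + f\big(d(c_5, c_p) - d(c_7, c_p)\big) 2^5 \\ &+ f\big(d(c_6, c_p) - d(c_0, c_p)\big) 2^6 + f\big(d(c_7, c_p) - d(c_1, c_p)\big) 2^7 \end{aligned} \tag{19}$$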
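TPLBP as defined in Eqs. (17) and (18) can be sketched with explicit loops for clarity. The sketch assumes that ring patch centers are rounded to integer pixel positions and that $d(\cdot,\cdot)$ is the Euclidean distance between M × M patches; the defaults $P = 8$, $R = 2$, $M = 3$, $\delta = 2$, $\tau = 0.01$ and the function name are illustrative.

```python
import numpy as np

def tplbp_image(img, P=8, R=2, M=3, delta=2, tau=0.01):
    """Three-patch LBP, Eqs. (17)-(18): bit m is f(d(c_m, c_p) - d(c_(m+delta) mod P, c_p))."""
    img = img.astype(np.float64)
    h, w = img.shape
    half = M // 2
    margin = R + half                        # keeps every patch inside the image
    # Ring patch centres, rounded to integer pixel offsets (a simplifying assumption).
    ring = [(int(round(-R * np.sin(2 * np.pi * m / P))),
             int(round(R * np.cos(2 * np.pi * m / P)))) for m in range(P)]

    def patch(y, x):
        return img[y - half:y + half + 1, x - half:x + half + 1]

    codes = np.zeros((h - 2 * margin, w - 2 * margin), dtype=np.int64)
    for y in range(margin, h - margin):
        for x in range(margin, w - margin):
            c_p = patch(y, x)
            # d(., .): Euclidean distance between the centre patch and each ring patch.
            d = [np.linalg.norm(patch(y + dy, x + dx) - c_p) for dy, dx in ring]
            code = 0
            for m in range(P):
                if d[m] - d[(m + delta) % P] >= tau:   # f(a) = 1 if a >= tau
                    code |= 1 << m
            codes[y - margin, x - margin] = code
    return codes
```

The resulting code image can then be divided into blocks and histogrammed in the same way as for LBP.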

The processing flow for obtaining the TPLBP feature vector for face recognition is shown in Figure 19. An input facial image of size 64 × 64 is first converted into a TPLBP encoded image, as in Figure 19(c). The TPLBP encoded image is then divided into non-overlapping patches of the same size, and a histogram is computed for each patch. These histograms are normalized and truncated at a value of 0.2. Finally, the TPLBP feature vector is obtained by concatenating all histograms.

Figure 19.

Processing flow of TPLBP for face recognition. (a) Original input image. (b) Preprocessed image. (c) TPLBP encoded image. (d) Divided non-overlapping patches of the encoded image. (e) Histogram of a selected non-overlapping patch. (f) Final TPLBP feature vector obtained by concatenating the histograms of all patches in the image.

10. Four-patch LBP (FPLBP)

Four-patch LBP (FPLBP) [26] extends TPLBP by comparing four patches to generate each bit. Two rings with radii $R_1$ and $R_2$ ($R_1 < R_2$), each carrying P patches of size M × M, are placed around the center pixel $i_c$. Two center-symmetric patches in the inner ring are compared with the corresponding patches in the outer ring, located δ positions further along the circle. In this way, FPLBP generates a P/2-bit code for $i_c$ from P/2 such pairs:

$$FPLBP_{P,R_1,R_2,M,\delta}(i_c) = \sum_{m=0}^{P/2-1} f\big(d(c_{i,m}, c_{o,(m+\delta) \bmod P}) - d(c_{i,m+P/2}, c_{o,(m+P/2+\delta) \bmod P})\big)\, 2^m \tag{20}$$

Here, $c_{i,m}$ and $c_{o,(m+\delta) \bmod P}$ are the $m$-th patch in the inner ring and the $(m+\delta)$-th patch in the outer ring, respectively. Similarly, $c_{i,m+P/2}$ and $c_{o,(m+P/2+\delta) \bmod P}$ are the center-symmetric $(m+P/2)$-th patch in the inner ring and the $(m+P/2+\delta)$-th patch in the outer ring. Figure 20 shows an example of generating the FPLBP code for $P = 8$, $\delta = 2$, $M = 3$; using Eq. (20), the FPLBP code for this sample is:

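Figure 20.

FPLBP code generation for $P = 8$, $\delta = 2$, $M = 3$.

$$\begin{aligned} FPLBP(i_c) ={}& f\big(d(c_{i,0}, c_{o,2}) - d(c_{i,4}, c_{o,6})\big) 2^0 + f\big(d(c_{i,1}, c_{o,3}) - d(c_{i,5}, c_{o,7})\big) 2^1 \\ &+ f\big(d(c_{i,2}, c_{o,4}) - d(c_{i,6}, c_{o,0})\big) 2^2 + f\big(d(c_{i,3}, c_{o,5}) - d(c_{i,7}, c_{o,1})\big) 2^3 \end{aligned} \tag{21}$$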
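FPLBP, Eq. (20), can be sketched in the same loop-based style. The sketch assumes ring patch centers rounded to integer pixel positions and a Euclidean distance between M × M patches; the radii $R_1 = 1$, $R_2 = 2$, the other defaults, and the function name are illustrative only.

```python
import numpy as np

def fplbp_image(img, P=8, R1=1, R2=2, M=3, delta=2, tau=0.01):
    """Four-patch LBP, Eq. (20): compare centre-symmetric inner-ring patches with
    outer-ring patches delta positions further along, giving a P/2-bit code."""
    img = img.astype(np.float64)
    h, w = img.shape
    half = M // 2
    margin = R2 + half                       # keeps every patch inside the image

    def ring(R):
        # Integer (rounded) offsets of P patch centres on a circle of radius R.
        return [(int(round(-R * np.sin(2 * np.pi * m / P))),
                 int(round(R * np.cos(2 * np.pi * m / P)))) for m in range(P)]

    inner, outer = ring(R1), ring(R2)

    def patch(y, x):
        return img[y - half:y + half + 1, x - half:x + half + 1]

    codes = np.zeros((h - 2 * margin, w - 2 * margin), dtype=np.int64)
    for y in range(margin, h - margin):
        for x in range(margin, w - margin):
            ci = [patch(y + dy, x + dx) for dy, dx in inner]
            co = [patch(y + dy, x + dx) for dy, dx in outer]
            code = 0
            for m in range(P // 2):
                d1 = np.linalg.norm(ci[m] - co[(m + delta) % P])
                d2 = np.linalg.norm(ci[m + P // 2] - co[(m + P // 2 + delta) % P])
                if d1 - d2 >= tau:               # f(a) = 1 if a >= tau
                    code |= 1 << m
            codes[y - margin, x - margin] = code
    return codes
```

With P = 8 each pixel receives a 4-bit code, so FPLBP histograms need only 16 bins per block.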

The processing flow for obtaining the FPLBP feature vector of a sample facial image, which is similar to that of TPLBP, is shown in Figure 21.

Figure 21.

Processing flow of FPLBP for face recognition. (a) Original input image. (b) Preprocessed image. (c) FPLBP encoded image. (d) Divided non-overlapping patches of the encoded image. (e) Histogram of a selected non-overlapping patch. (f) Final FPLBP feature vector obtained by concatenating the histograms of all patches in the image.

11. Improved LBP (ILBP)

Improved LBP (ILBP), originally named CLBP (completed LBP), was proposed by Guo et al. [27]; it is called ILBP here to distinguish it from compound LBP (CLBP). In ILBP, a local region is represented by its center pixel and a local difference sign-magnitude transform (LDSMT). The complete processing flow for generating the ILBP code is shown in Figure 22. The image is first represented in terms of local and global thresholds, and the local differences are then decomposed into sign and magnitude components. Consequently, three representations are obtained, namely ILBP_Sign (ILBP_S), ILBP_Magnitude (ILBP_M) and ILBP_Global (ILBP_G), and combined to form the ILBP code. ILBP_S and ILBP_M each contribute P bits for P neighbor pixels, while ILBP_G encodes the center pixel against a global threshold [27].

Figure 22.

Complete processing flow to generate the ILBP code.

Let $c_p$ and $c_n$ represent the gray levels of the center pixel $i_c$ and of its P neighbor pixels, respectively. The local difference is computed as $s_p = c_n - c_p$, and the difference vector $s_p$ is further decomposed into a magnitude component ($m_p$) and a sign component ($q_p$):

$$s_p = q_p \cdot m_p, \quad \text{where } q_p = \operatorname{sign}(s_p),\ m_p = |s_p| \tag{22}$$

$$q_p = \begin{cases} 1, & s_p \ge 0 \\ -1, & s_p < 0 \end{cases} \tag{23}$$

The ILBP encoding scheme is illustrated in Figure 23. Figure 23(a) shows a 3 × 3 neighborhood with center pixel value 50, and the local differences after thresholding are shown in Figure 23(b) as [−38, −15, 20, 15, 22, −6, −41, 35]. The LDSMT then yields the sign and magnitude vectors. Original LBP uses only the sign vector, with −1 encoded as 0: the LBP code for this block is [0, 0, 1, 1, 1, 0, 0, 1]. Hence, LBP considers only the sign component of the differences, while ILBP combines three representations: ILBP_S, ILBP_M and ILBP_G. The local region around the center pixel is represented by the LDSMT; thresholding the sign component gives ILBP_S, and thresholding the magnitude component gives ILBP_M. Similarly, encoding the image with a global threshold gives ILBP_G.

Figure 23.

ILBP encoding scheme. (a) 3 × 3 neighborhood of an image. (b) ILBP encoded image after thresholding. (c) Sign component. (d) Magnitude component.
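The LDSMT decomposition of Eqs. (22) and (23) can be reproduced on the example differences above. The 3 × 3 patch is hypothetical: its neighbors are placed clockwise from the top-left so that their differences from the center value 50 equal [−38, −15, 20, 15, 22, −6, −41, 35]. Following Guo et al. [27], ILBP_M thresholds magnitudes against a mean magnitude and ILBP_G compares the center against a global mean gray level; here the local mean magnitude and a global mean of 128 stand in for the image-wide statistics, which is an assumption.

```python
import numpy as np

def ilbp_components(patch, global_mean, magnitude_threshold=None):
    """LDSMT of a 3x3 patch: sign (ILBP_S), magnitude (ILBP_M) and centre (ILBP_G) parts."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    c_p = float(patch[1, 1])
    s = np.array([float(patch[1 + dr, 1 + dc]) - c_p for dr, dc in offsets])  # s_p = c_n - c_p
    q = np.where(s >= 0, 1, -1)               # Eq. (23): sign component
    m = np.abs(s)                             # magnitude component
    c = m.mean() if magnitude_threshold is None else magnitude_threshold
    ilbp_s = [int(v == 1) for v in q]         # -1 encoded as 0: identical to the LBP code
    ilbp_m = [int(v >= c) for v in m]         # magnitude against threshold c
    ilbp_g = int(c_p >= global_mean)          # centre against the global threshold
    return q.tolist(), m.tolist(), ilbp_s, ilbp_m, ilbp_g

patch = np.array([[12, 35, 70], [85, 50, 65], [9, 44, 72]])
q, m, s_bits, m_bits, g_bit = ilbp_components(patch, global_mean=128.0)
print(s_bits)   # [0, 0, 1, 1, 1, 0, 0, 1]
print(m_bits)   # [1, 0, 0, 0, 0, 0, 1, 1]  (mean magnitude is 24)
```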

    A comparative analysis of various spatial domain feature representations is given in Table 2.

    | Feature | Advantages | Disadvantages |
    |---|---|---|
    | HOG | Invariant to local geometric and photometric transformations, except object orientation. | Very sensitive to image rotation; not a good choice for classifying textures or objects. |
    | SIFT | Rotation and scale invariant. | Mathematically complicated and computationally heavy; not effective on low-powered devices. |
    | LBP | High discriminative power; computational simplicity. | Not invariant to rotation. Feature vector size grows exponentially with the number of neighbors (2^P bins for P neighbors), increasing time and space complexity. Captures limited structural information: only the sign of the pixel difference is used and magnitude information is ignored. Performance decreases on flat images. |
    | LPQ | Better performance than LBP on images with blur, illumination and facial expression variations. | The LPQ vector is about four times longer than a uniform LBP vector with 8 neighbors. |
    | CLBP | Better performance than LBP, as it uses both the sign and the magnitude of the differences. | Long feature vector, which increases computation time. |
    | LTP | Resistant to noise. | Not invariant to gray-scale transforms of intensity values, as its encoding relies on a fixed, predefined threshold. |
    | TPLBP | Rotation-invariant texture descriptor; captures both micro- and macrostructure information. | Increased complexity. |
    | FPLBP | Rotation-invariant texture descriptor; captures both micro- and macrostructure information. | More complex. |

    Table 2.

    Comparative analysis of spatial domain feature representations.
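    Two of the LTP entries in Table 2 can be checked numerically. The sketch below (Python/NumPy; the neighbor ordering, the threshold t = 5, the sample blocks and the helper names are assumptions for illustration) encodes a 3 × 3 block with basic LBP and with LTP. LTP assigns +1 to neighbors at least t above the center, −1 to neighbors at least t below it and 0 otherwise, and the ternary pattern is split into upper and lower binary codes. Small noise on a flat patch changes the LBP code but not the LTP codes, whereas doubling all intensities changes the LTP codes while leaving LBP unchanged.

```python
import numpy as np

# Clockwise neighbor order starting at the top-left pixel (assumed convention).
OFFSETS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
WEIGHTS = 2 ** np.arange(8)  # bit p carries weight 2**p


def local_diffs(block):
    """Differences g_p - g_c between the 8 neighbors and the center of a 3x3 block."""
    block = np.asarray(block, dtype=float)
    return np.array([block[r, c] for r, c in OFFSETS]) - block[1, 1]


def lbp_code(block):
    """Basic LBP: bit p is 1 when neighbor p >= center."""
    return int(((local_diffs(block) >= 0) * WEIGHTS).sum())


def ltp_codes(block, t=5):
    """LTP with threshold t, returned as (upper, lower) binary codes."""
    d = local_diffs(block)
    ternary = np.where(d >= t, 1, np.where(d <= -t, -1, 0))
    upper = int(((ternary == 1) * WEIGHTS).sum())   # positive half
    lower = int(((ternary == -1) * WEIGHTS).sum())  # negative half
    return upper, lower


flat = np.full((3, 3), 50)
noisy = np.array([[50, 51, 49], [52, 50, 48], [50, 49, 51]])  # flat patch plus small noise
low_contrast = np.array([[47, 52, 54], [50, 50, 49], [53, 42, 56]])
stretched = 2 * low_contrast  # a simple gray-scale (contrast) transform

print(lbp_code(flat), lbp_code(noisy))                # 255 211: noise flips LBP bits
print(ltp_codes(flat), ltp_codes(noisy))              # (0, 0) (0, 0): LTP unaffected
print(lbp_code(low_contrast), lbp_code(stretched))    # 214 214: LBP unchanged
print(ltp_codes(low_contrast), ltp_codes(stretched))  # (16, 32) (84, 33): LTP changes
```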

    12. Result analysis for face recognition

    Face recognition has been explored for many years, and a large body of research exists in this domain. In this section, we present existing face recognition results and analysis based on different spatial domain representations. Deniz et al. [28] proposed face recognition using HOG features extracted from image patches of different sizes, which improved accuracy. Recognition accuracy was evaluated on the FERET database, with a best result of 95.4%. In related research, [29] used EBGM-HOG and showed robustness to changes in illumination, rotation and small displacements. Existing works on face recognition using SIFT features include [30, 31]. These works also used variants of SIFT, such as volume-SIFT (VSIFT), partial-descriptor-SIFT (PDSIFT) and SIFT learned at specific locations, to improve verification accuracy.

    Face recognition using LPQ feature representation is inspired by [18, 19], which used LPQ as a blur-invariant descriptor. Damane et al. [32] presented face recognition using LPQ under varying illumination and blur conditions. Experiments were performed on the Extended Yale-B, CMU-PIE and CAS-PEAL-R1 face databases, and the results showed that LPQ is more robust to illumination variation. Chan et al. [33] presented multiscale LPQ for face recognition and evaluated it on the FERET and BANCA face databases. Multiscale LPQ is obtained by applying filters of varying size and combining the resulting LPQ images, which are then projected into LDA space. Best results of 99.2% for FB, 92% for DP1 and 88% for DP2 were achieved on the FERET probe sets.
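    The basic LPQ construction evaluates a short-term Fourier transform over an M × M window at the four frequencies (a, 0), (0, a), (a, a) and (a, −a), with a = 1/M, and quantizes the signs of their real and imaginary parts into an 8-bit code whose histogram has 256 bins. The sketch below follows that construction in NumPy under stated assumptions: it omits the optional decorrelation step of the original LPQ, it forms the multiscale descriptor by simply concatenating histograms for several window sizes, and it leaves out the LDA projection used by Chan et al. [33]. The function names and the random stand-in image are hypothetical.

```python
import numpy as np


def _conv_rows(img, kernel):
    """1-D 'valid' convolution of every row with a (complex) kernel."""
    return np.array([np.convolve(row, kernel, mode="valid") for row in img])


def _conv_cols(img, kernel):
    """1-D 'valid' convolution of every column."""
    return _conv_rows(img.T, kernel).T


def lpq_histogram(img, win_size=7):
    """256-bin LPQ histogram without decorrelation, normalized to sum to 1."""
    if win_size % 2 == 0:
        raise ValueError("win_size must be odd")
    img = np.asarray(img, dtype=float)
    r = win_size // 2
    x = np.arange(-r, r + 1)
    a = 1.0 / win_size                     # lowest non-zero frequency
    w0 = np.ones(win_size, dtype=complex)  # frequency 0
    w1 = np.exp(-2j * np.pi * a * x)       # frequency a
    w2 = np.conj(w1)                       # frequency -a
    # (vertical, horizontal) separable filter pairs for the four frequency points
    pairs = [(w0, w1), (w1, w0), (w1, w1), (w1, w2)]
    code = np.zeros((img.shape[0] - 2 * r, img.shape[1] - 2 * r), dtype=np.int64)
    bit = 0
    for v_k, h_k in pairs:
        resp = _conv_rows(_conv_cols(img, v_k), h_k)  # local STFT coefficient
        for part in (resp.real, resp.imag):
            code |= (part > 0).astype(np.int64) << bit  # sign quantization
            bit += 1
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def multiscale_lpq(img, win_sizes=(3, 5, 7)):
    """Hypothetical multiscale descriptor: concatenated LPQ histograms."""
    return np.concatenate([lpq_histogram(img, m) for m in win_sizes])


rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(32, 32))  # stand-in for a cropped face image
desc = multiscale_lpq(face)
print(desc.shape)  # (768,)
```

    In a full pipeline, descriptors like `desc` from training faces would then be projected into an LDA space before matching, as described for [33].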

    Face recognition using LBP feature representation is one of the most researched areas [34, 35, 36, 37, 38]. Tan et al. [24] evaluated face recognition under varying lighting conditions using LTP feature representation on the Extended Yale-B and CMU PIE face databases. They showed that LTP is more discriminant and less sensitive to noise in uniform regions, and that it improves results on flat images. Wolf et al. [26] proposed TPLBP and FPLBP features for face recognition and validated their accuracy on two well-known databases, Labeled Faces in the Wild (LFW) and Multi-PIE. They showed that combining several descriptors from the same LBP family boosts the recognition rate, and reported best accuracies of 80.75% for TPLBP and 75.57% for FPLBP, obtained by combining ITML with MultiOSS (ID and pose). Ahmed et al. [25] proposed CLBP features, an extension of LBP, for facial expression recognition. Results were verified on the Cohn-Kanade (CK) facial expression database, with CLBP features classified by an SVM classifier. They showed that the classification rate is affected by the number of regions into which the expression images are partitioned. To study this, they considered three cases, dividing images into 3 × 3, 5 × 5 and 7 × 6 regions. The best accuracy for CLBP, 94.4%, was obtained with 5 × 5 regions.
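    The region-partitioning step studied by Ahmed et al. [25] follows the usual recipe for LBP-family descriptors: encode the image, split the code image into a grid of regions, compute one histogram per region and concatenate the histograms. The sketch below illustrates this in NumPy with plain 8-neighbor LBP standing in for CLBP, whose exact encoding is not reproduced here; the function names, the neighbor ordering and the 66 × 66 random test image are assumptions for illustration only, and the SVM classification stage is omitted.

```python
import numpy as np

# (row, col) offsets of the 8 neighbors, clockwise from top-left (assumed order)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(img):
    """Basic radius-1, 8-neighbor LBP code for every interior pixel."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    code = np.zeros(center.shape, dtype=np.int64)
    for p, (dr, dc) in enumerate(OFFSETS):
        neighbor = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        code |= (neighbor >= center).astype(np.int64) << p  # bit p has weight 2**p
    return code


def regional_histograms(code, grid=(5, 5), bins=256):
    """Split a code image into grid[0] x grid[1] regions and concatenate
    the normalized per-region histograms."""
    feats = []
    for band in np.array_split(code, grid[0], axis=0):
        for cell in np.array_split(band, grid[1], axis=1):
            hist = np.bincount(cell.ravel(), minlength=bins).astype(float)
            feats.append(hist / hist.sum())
    return np.concatenate(feats)


rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(66, 66))  # stand-in for an expression image
codes = lbp_image(face)                     # 64 x 64 code image
for grid in [(3, 3), (5, 5), (7, 6)]:
    print(grid, regional_histograms(codes, grid).shape)
# (3, 3) (2304,)
# (5, 5) (6400,)
# (7, 6) (10752,)
```

    Finer grids retain more spatial layout but produce longer vectors and sparser per-region histograms, a trade-off that may help explain why the grid size affects the classification rate.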

    13. Conclusion

    This chapter presents well-known and some recently explored spatial feature representations for face recognition. Depending on the method, these representations offer invariance to scale, translation or rotation of 2-D face images. The chapter covers HOG, SIFT and LBP feature representations and the complete processing flow for generating feature vectors with them for face recognition. SIFT and HOG, which are based on image gradients and (for SIFT) local extrema, are commonly used feature representations for face recognition. LBP performs texture-based analysis to represent local facial appearance as an encoded facial image. Other relevant spatial domain representations, such as LPQ and variants of LBP, are explained and analyzed for face recognition. LPQ is blur invariant and provides improved results for blurred facial images. Variants of LBP such as LTP, CLBP, TPLBP and FPLBP are more robust to noise and lighting conditions. These representations characterize facial features more effectively and yield discriminative feature vectors for face recognition.

    Acknowledgments

    This research work is supported by the Science and Engineering Research Board (SERB), Department of Science and Technology (DST), Government of India, under research grant reference no. ECR/2016/001659. The sanctioned project title is “Design and development of an Automatic Kinship Verification system for Indian faces with possible integration of AADHAR Database.”

    Conflict of interest

    The authors have no conflict of interest.

    How to cite and reference

    Toshanlal Meenpal, Aarti Goyal and Moumita Mukherjee (September 27th 2019). Spatial Domain Representation for Face Recognition, Visual Object Tracking with Deep Neural Networks, Pier Luigi Mazzeo, Srinivasan Ramakrishnan and Paolo Spagnolo, IntechOpen, DOI: 10.5772/intechopen.85382. Available from:
