Open access peer-reviewed chapter - ONLINE FIRST

Matrix Factorization on Complex Domain for Face Recognition

By Viet-Hang Duong, Manh-Quan Bui and Jia-Ching Wang

Submitted: December 4th 2018Reviewed: February 13th 2019Published: September 4th 2019

DOI: 10.5772/intechopen.85182

Downloaded: 35

Abstract

Matrix factorization on complex domain is a natural extension of nonnegative matrix factorization, but it is still a very new trend in face recognition. In this chapter, we present two complex matrix factorization-based models for face recognition, in which the objective functions are the real-valued functions of complex variables. Our first model aims to build a learned base, which is embedded within original space. The second model finds the base whose volume is maximized. Experimental results on datasets with and without outliers show that our proposed algorithms are more effective than competitive algorithms.

Keywords

  • complex matrix factorization
  • face recognition
  • nonnegative matrix factorization
  • projected gradient descent

1. Introduction

Face recognition is a central issue in computer vision and pattern recognition. The variations in lighting conditions, pose and viewpoint changes, facial expressions, makeup, aging, and occlusion are challenges that significantly affect recognition accuracy. Generally, the challenges in face recognition can be classified into four main categories as follows:

Illumination variations: The face of a person can appear dramatically different when illumination changes. This occurs because of spectra or source distribution and intensity changes. In practice, many two-dimensional (2D) methods show that recognition performance is notably decreased when illumination strongly occurs [1, 2]. Therefore, the problem of lighting variation is considered as one of the key challenges for face recognition system designer. Several methods have been proposed to handle variable illuminations such as extraction of illumination invariant features [3, 4, 5, 6, 7]; images with variable illuminations transformed to a canonical representation [8, 9]; modeling the illumination variations [10, 11]; facial shapes and albedos are based on 3D face models [12].

Pose/viewpoint changes: Deformed face and self-occluded face usually occur by pose or viewpoint changes which affect the recognition process [13]. Generally, viewpoint face recognition approaches are divided into two categories: viewpoint-transformed and cross-pose based [14]. Viewpoint transformed recognition methods aim to transform the probe image to match the gallery image in the pose, whereas cross-pose-based approaches attempt to estimate the light field of the face [15, 16]. Besides, other approaches integrated 2D and 3D information [17, 18] in order to cope with pose and illumination variations.

Facial expression: Face recognition tasks are more challenging when dealing with emotional states of a person in an image. In addition, hairstyle or facial hair such as beard and mustache can change facial appearance. To handle with difficulties of expression, facial expression recognition (FER) systems, including static image FER [19, 20, 21], and dynamic sequence FER [22, 23, 24] are designed. In static-based methods, the spatial information from the current single image is extracted to obtain the feature representation. In contrary, the dynamic-based methods consider the temporal relation among adjacent frames in the sequence of input facial expression.

Occlusion: Faces may be partially occluded by other objects such as sunglasses, scarf [62], etc. Other situations of occlusion are some faces may be occluded by other faces of a group of people [25]. It is very difficult to be observed and recognized because the available part of the face is very small. Therefore, occlusion problems become harder and need to be solved in face recognition.

In face recognition, image representation (IR) techniques play an important role in improving the accuracy performance. Commonly, an IR system is to transform the input signal into a new representation which reduces its dimensionality and explicates its latent structures. Over the past decades, the subspace methods, such as principal component analysis (PCA) [26], linear discriminant analysis (LAD) [27, 28], and nonnegative matrix factorization (NMF) [29, 30] have been successfully used in feature extraction. In particularly, PCA is known as a powerful technique for dimensionality reduction and multivariate analysis. PCA seeks a linear combination of variables such that the maximum variance is extracted from the variables by projecting data onto an orthogonal base which is represented in the directions of largest variance. In image representation, eigenfaces (PCA) result in dense representations for facial images, which mainly applied the global structure of the whole facial image. Likewise, LAD finds a linear transformation that maximizes discrimination between classes.

NMF is known as an unsupervised data-driven approach in which all elements of the decomposed matrix and the obtained matrix factors are forced to be nonnegative. Furthermore, NMF is able to represent an object as various parts, for instance, a human face can be decomposed into eyes, lips, and other elements. In order to make NMF algorithms more efficient, one has proposed some constraints into the cost function such as sparsity [31, 32], orthogonally [33], discrimination [34], graph regularization [35, 36], and pixel dispersion penalty [37]. Additionally, proposing an appropriate distance metric for an NMF model plays an important role in enhancing the efficacy of the estimated linear subspace of the given data. NMF techniques commonly apply the squared Frobenius norm (Fr) or the generalized Kullback–Leibler (KL) divergence for the independent and identically distributed noise data. But in many cases, they produce an arbitrarily biased subspace when data is corrupted by outliers [38]. To overcome this drawback, L2 and L1 norms were proposed by Kong et al. [39] to obtain a robust NMF, in which the noise was assumed to follow the Laplacian distribution. Similarly, the earth mover’s distance (EMD) and the Manhattan distance were also suggested in the work of Sandler et al. [40] and Guan et al. [41], respectively. A family of cost functions parameterized by a single shape parameter beta, called the beta-divergence [42], is commonly used on NMF approaches. Although NMFs are able to learn part-based representations and capture the Euclidean structure of high-dimensional data space, they are still limited to comprise the nonlinear sub-manifold structure behind the data.

Recently, matrix factorization techniques have been extended to complex matrix factorizations (CMFs) where the input data are complex matrices. These models have been obtaining promising results in facial expression recognition and data representation tasks [43, 44, 45]. The main idea of complex methods for face and facial expression recognition is that the original signal is projected on to the complex field by a mapping such that the distances of two data points in the original space and projection space are equivalent. Particularly, by transforming the real values of pixel intensive to complex domain, it is shown that the squared Frobenius norm of corresponding complex vectors and the cosine dissimilarity of real-valued vectors are equivalent. As a result, the real optimization problem with cosine divergence is replaced by optimizing a complex function with the Frobenius norm. Most of the mentioned CMF models were applied to facial expression and object recognition.

In this chapter, we present two complex matrix factorization-based models for face recognition. In the following sections, we denote M-dimensional column vector y=y1yMTR+Mto be an observed sample. Let Y be a dataset comprising of N-observations; Y is expressed in the matrix form as Y=y1yNR+M×N, where R+denotes the set of nonnegative real numbers. In the proposed models, the real data set Y is transformed to the complex domain, and the complex data matrix Z is factorized under imitating NMF frameworks. The contributions of this chapter are summarized as follows:

  1. The image analysis methods on the complex domain, which are called structured complex matrix factorization (StCMF) and constrained complex matrix factorization (CoCMF), are proposed.

  2. In complex domain, the updating rule for StCMF and CoCMF is derived based on gradient descent method.

  3. A thorough experimental study on face recognition is conducted, the results show that the proposed StCMF and CoCMF yield better performance compared to extensions of the real NMFs.

2. Background

2.1 Nonnegative matrix factorization

Assume that we are given an initial data matrix YR+M×Nand a positive integer K≪ min{M, N}. NMF methods aim to find a basis matrix UR+M×Kand a coding variable matrix VR+K×N, such that YUV.The standard NMF is usually formulated as an optimization:

minU,VDYUVs.t.U0,V0E1

where DYUVis a divergence function to measure the distance between Y and UV.

Most NMF techniques estimate the linear subspace of the given data by the Frobenius norm (F) or the generalized Kullback–Leibler (KL) divergence which have the following forms:

DFAB=ABF2=i,jAijBij2E2
DKLAB=limβ0DβAB=i,jAijlogAijBijAij+BijE3

The problem (1) is non-convex; thus, it may result in several local minimal solutions. To find an optimization solution, the iterative methods are commonly used. Generally, there are three classes of algorithms for solving this problem including multiplicative update, gradient descent, and alternating nonnegative least squares algorithms. The most popular approach to solve (1) is the multiplicative update rules proposed by Lee and Seung [30]. For example, the iteratively updating rules of a Frobenius NMF cost function are given by

VijtVijt1Ut1TYijUijt1TUt1Vij;E4
UijtUijt1YVt1TijUt1VijVijt1T;E5

2.2 The cosine divergence

Given the representations of two images, It and Is are M-dimensional vectors yt, ys in the lexicographic order, respectively. First, yt,ysRMis normalized to get the values ytc,ysc01, where c is the element vector index or the vector spatial location. The correlation between images It and Is through the cosine dissimilarity between yt and ys, is introduced by

DCytys=c=1M1cosαπytcαπyscE6

One of interesting properties of the cosine distance measurement is suppression outlier which is proved in [46]. The comparison between Frobenius norm and cosine divergence is showed in Figure 1. Liwiki et al. [46] show that the Frobenius distance between the original and the same subject is smaller; in contrary, a large distance between the original image and the image of a different person or occlusion image results from the cosine-based measure.

Figure 1.

Sample images for making comparison between dissimilarity measures.

2.3 Euler’s formula and a space transformation

Let us consider two mappings:

g:RMR2Msuch that

gyt=1NcosytTsinytTT;ytRNE7
wherecosyt=cosyt1cosyt2cosytMTE8
sinyt=sinyt1sinyt2sinytMTE9
gyt=1E10

and h:RMCMis defined by

zt=hyt=12eiαπyt=12eiαπyt1eiαπytME11

The nonlinear function h is to transform the real-valued features to complex feature space. In other words, a complex vector space with M-dimensions can be regarded as a 2 M-dimensional real vector space.

It is proven that the cosine dissimilarity distance of a pair of data in the input real space equals to the Frobenius distance of the corresponding data in complex domain [47]. This observation is the first motivation of StCMF and CoCMF by mapping the samples into the complex space with a nonlinear mapping function h and performing matrix factorization in this complex feature space.

2.4 Wirtinger calculus

Any function of a complex variable z can be defined as fzz=x+iy=Fxy=Uxy+iVxy, where i2 = −1 and x,yR. Palka et al. [48] defined the complex differentiability as follows:

Definition 1. Let ΑCbe an open set. The function f:ΑCis said to be differentiable at z0Αif there is a limit limzz0fzfz0zz0which exists independently on the manner where zz0.

A necessary condition for f being holomorphic is that the Cauchy-Riemann equations hold, that is, Ux=Vyand Uy=Vx; otherwise, it is nonholomorphic. In statistical signal processing, the functions of interest are real-valued and have complex arguments z and hence are not analytic on complex plane. In this case we can use Wirtinger calculus [49], which writes the expansions in conjugate coordinate system by considering the function f(z) as a bivariate function f(z, z*) and treating z and z* as independent arguments.

Definition 2. The pair of partial derivative operators for function fz=fzzreferred to as the Wirtinger derivative [49] is defined by

fz=12fxify,fz=12fx+ifyE12

In case of real-valued function of complex variables, we also have one special property which is useful for optimization theory described later.

Lemma 1. The differential df of a real-valued function f:ΑRwith complex valued zACcan be expressed as

df=2RefzzdzE13

3. Complex matrix factorization

Let the input data matrix Y = (Y1, Y2,…, YN) contain N data vectors as columns. As described in previous sections, the elements of real matrix Y are normalized and transformed into a complex number field to yield the complex data matrix Z. Two unconstraint and constrained optimization problems in an unordered complex field is introduced in the following sections, respectively.

3.1 Structured complex matrix factorization (StCMF)

The idea of structured complex matrix factorization (StCMF) is to build a learned base which is embedded within original space. The basis matrix in StCMF is constructed by the linear combination of the complex training examples. Given the complex data matrix ZCM×N, StCMF factorizes Z into the encoding matrix VCK×Nand the exemplar-embed basis matrix U=ZWwhere WCM×K.Therefore, the objective function of StCMF problem can be formulated as follows:

minW,VfStCMFWV=minW,V12ZZWVF2E14

where .Fdenotes the Frobenius norm and K≪ min{N, M}

and ZZWVF2=TrZZWVHZZWV=TrZHZVHWHZHZZHZWV+VHWHZHZWVEQ13

3.2 Constrained complex matrix factorization (CoCMF)

Considering a dataset of N complex vectors Z = [Z1, Z2,…, ZN], each of Zi represents a data instance. The proposed CoCMF model decomposes Z into a product of two matrices W and V such that each instance Zi is a convex combination of latent components W. We call V and W the encoding matrix and the basis matrix, respectively. Geometrically, the data points Zi, i = 1, 2, ..., N all lie in or on the surface of a simplicial cone SW, whose vertices correspond to the columns of W and

SW=z:z=i=1KWiviviR+E15

Note that SW lies in the positive orthant and the volume of SW (Vol (SW)) is given by the following formula [48]:

VolSW=detWK1!E16

In [51], Zhou et al. illustrated that the small-cone constraint on the bases W will impose suitable sparseness on V. Inversely, the large-cone penalty will result in sparseness on the bases of factorization and the reconstruction errors on the training data, and the test data will be simultaneously decreased [50, 52]. Therefore, all observed data can be reconstructed by linearly combining the bases of a dictionary. Combining the goals of enlarging the volume of the simplex base, the constrained complex matrix factorization (CoCMF) problem is formulated as follows:

minW,VfCoCMFWV=minW,V12ZWVF2detWK1!E17
s.tWCM×K,VR+K×Nandi=1KVij=1 j

Since 0 < det(WTW) ≤ 1 holds under the assumptions 1TWi = 1. To simply the model, in this work, the log-determinant function is exploited to modify the volume penalty, and Eq. (17) can be written as the following form:

minW,VfCoCMFWV=minW,V12ZWVF2logdetWTWE18
s.tWCM×K,VR+K×Ni=1KVij=1,andi=1KWij=1j

3.3 Complex matrix factorization via projected gradient descent

It can be seen that (12) and (16) are non-convex minimization problems with respect to both variables W and V, so they are impractical to obtain the optimal solution. These NP-hard problems can be tackled by applying the block coordinate descent (BCD) with two matrix blocks [53] to obtain a local solution. The specific problems (14) and (18) were solved by the following scheme:

Fixing W and solving the following one variable optimization problems

minVfStCMF_VV=minV12ZZWVF2E19
minVfCoCMF_VV=minV12ZWVF2E20
s.tVR+K×N,i=1KVij=1 j
Algorithm 1: Complex projected gradient (CPG) with Armijo rule
Input: Z, W

Output: v

1. Initialize any feasible V0,0<β<1,0<σ<1

2. Iterations, for k = 1, 2, …
Vk+1=PVkαkVfWVk
where αk=μtk, tkis the first nonnegative integer such that
fWVk+1fWVk2σReVfWVkVk+1Vk

Then, W is updated based on the Moore-Penrose pseudoinverse [54], which is dented by and W = (ZZ)Vfor Eq. (14) and W = ZVfor Eq. (18) with fixed V.

Taking advanced of Wirtinger calculus, the gradient is evaluated in the forms

VfStCMF_VV=WHZHZ+WHZHZWVE21
VfCoCMF_VV¯=WHWV¯WHZE22
whereV¯=V1V11V2V21VNVN1;V¯0E23

We summarize the projected gradient method for optimizing (21) and (22) in Algorithm 1.

4. Experiments

To investigate the recognition performance of the proposed StCMF and CoCMF methods, we have conducted extensive experiments on the ORL dataset [55] and the Georgia Tech face dataset [56] in two scenarios for face recognitions including holistic face and key point occluded face.

First, we give brief description about the data collections and experiment setting. Second, the performance comparisons and corresponding results are shown.

4.1 Datasets and experiment setting

The ORL dataset contains 400 grayscale images corresponding to 40 people’s face. The images were captured at different times, under different lighting conditions, with different facial expression (open or close eyes, smiling or non-smiling) and facial details (glasses or no glasses). All the face images are manually aligned and cropped. For the computational efficiency, each cropped image is resized to 28 × 23 for face recognition without occlusion and 32 × 32 pixels for face recognition with occlusion. Figure 2 shows some instances of such random face on ORL dataset.

Figure 2.

Sample facial images from ORL dataset [55].

The Georgia Tech face dataset (GT) contains images of 50 people taken during 1999 and stored in JPEG format. For each individual, there are 15 color images captured at resolution of 640 × 480 pixels. Most of the images were taken in two different sessions to take into account the variations in illumination conditions, facial expression, and appearance. In our experiments, original images are normalized, cropped and scaled into 31 × 23 pixels, and finally converted into gray level images. Examples of GT dataset are shown in Figure 3.

Figure 3.

Sample facial images from GT dataset [56].

We use the nearest neighbor (NN) classifier for all face recognition with/without occlusion experiments. The platform was a 3.0 GHz Pentium V with 1024 MB RAM running Windows. Code was written in MATLAB.

4.2 Performance and comparison

4.2.1 Face recognition on ORL dataset

For this case, in order to evaluate the performance of the proposed StCMF and CoCMF, we make the comparisons with seven representative algorithms, namely, NMF [29], P-NMF [57], P-NMF (Fr) [58], P-NMF (KL) [58], OPNMF (Fr) [59], OPNMF (KL) [59], NNDSVD-NMF [60], and GPNMF [60]. Different training numbers ranging from five to nine images were randomly chosen from each individual to construct the training set, and the rest images constitute the test set which was used to estimate the accuracy of face recognition [61]. The learning basic images in all selected algorithms are K = 40, and the mean recognition rate are described in Table 1.

No. TrainsStCMFCoCMFGPNMFNMFPNMFP-NMF (Fr)P-NMF (KL)OPNMF (Fr)OPNMF (KL)NNDSVD-NMF
590.8590.3086.584.582.483.785.080.079.043.0
691.7592.2587.584.485.818584.483.082.039.3
791.1794.7587.583.387.3385.685.984.480.036.8
893.7593.8888.7588.7588.588.888.084.383.040.8
997.5095.5092.58590.7587.2587.584.083.042.3
Avg.93.0093.3488.5585.1986.9686.0786.1683.1481.440.44

Table 1.

Face recognition accuracy on the ORL dataset with different train numbers.

Table 1 shows the detailed recognition accuracies of compared algorithms. As can be seen, our algorithms significantly outperform the other algorithms in all the cases. Almost algorithms achieve the best accuracy when the number of training face images per class is eight exceptionally our proposed methods and GPNMF. Besides, there is the same trend between the number of training images and accuracy rate; that is, the lower training numbers lead to a decreasing rate of recognition. StCMF achieves the best performance (97.50%) when the number of training samples is chosen largest. However, CoCMF achieves higher improvement in general.

It is observed that the above-selected algorithms employ a different kind of measurements such as Frobenius (Fr) and Kullback–Leibler (KL) and add more graph to regularize as well as adjust basic NMF to projective NMF. In a reprocessing image, centered aligning image technique is applied for other methods to enhance effective recognition rate that cannot be focused on our StCMF and CoCMF models. However, the best recognition rate of all obtained by our proposed CoCMF method which has extra regularizes term.

One of the difficulties in NMF is the estimation of the number of components or K. The choice of K results in a compromise between data fitting and model complexity; that is, a greater K leads to a better data approximation, but a smaller K makes a model being easier to estimate and fewer parameters to transmit. In almost NMFs, K is typically chosen such that it is larger than the estimated number of sources and follows the constraint N+MKNM. This limit of NMFs illustrated by the observation that among all results, the lowest rate belongs to NNDSVD-NMF, one NMF method utilizes SVD to get initialization which results from significant independency of NNDSVD-NMF on the number of bases K.

4.2.2 Face recognition on GT dataset

Table 2 shows the recognition rates versus feature dimension by the competing methods on GT dataset. GT dataset exists with many challenging samples that are harder to recognize. Thus, the performance of all methods is lower than those of ORL dataset. In this dataset, the implement was done similarly as those in the previous section in choosing algorithms to compare as well as dividing randomly into two different sets, each containing a different number of testing and training images. In our experiments, we set K = 50 and range the number training being five odd numbers as {5, 7, 9, 11, 13}. The experimental results show that as the number of training images increases, the efficiency of the recognition system also increases. We can see that CoCMF method achieves the best performance and StCMF holds the second place in overall. All the methods obtain their best results when 13 training samples are used (the largest number of training sample in our experiment). In this case, the highest recognition rate belongs to the StCMF method again.

No. TrainsStCMFCoCMFGPNMFNMFPNMFP-NMF (Fr)P-NMF (KL)OPNMF (Fr)OPNMF (KL)NNDSVD-NMF
539.6459.4059.1454.7046.8458.9057.9757.8948.0823.80
754.8062.2560.9659.3852.5060.2060.8860.4448.6823.83
975.2069.6762.562.4054.9364.0363.3562.4848.8424.30
1169.5070.5065.3765.2057.2563.7563.3863.1749.3627.35
1377.6073.0069.0067.4061.6065.6064.0563.5049.5030.20
Avg.63.3566.9663.3961.8254.6362.5061.9361.5048.9025.90

Table 2.

Face recognition accuracy on the GT dataset with different train numbers.

4.2.3 Face recognition on occluded ORL images

For a more convincing experimental assessment of the power of our proposed models in occlusion processing, we test the performance on occluded images of ORL database. In cropped 112 × 92 dimension test image gallery, occlusion was simulated by using a sheltering patch with different size ranges in set {10 × 10, 15 × 15, 20 × 20, 25 × 25, 30 × 30} and placed at random locations before resized in 28 × 21. Figure 4 shows examples of occluded ORL images.

Figure 4.

Occluded face samples from ORL dataset with patch sizes of 15 × 15, 20 × 20, 25 × 25, 30 × 30, and 35 × 35.

In this experiment, we take randomly the training images with the ratio 4:6 for training/testing and test several times on each sort of percent of randomly occluded test image. Table 3 shows the detailed recognition accuracy on all selected algorithms and our proposed methods. It can be seen that the recognition rate of all methods is increased when the size of occlusion batch is decreased. Obviously, StCMF and CoCMF outperform other tested approaches even if occlusion. This reveals that StCMF and CoCMF are more robust outlier than the other.

Occluded SizeStCMFCoCMFGPNMFNMFPNMFP-NMF (Fr)P-NMF (KL)OPNMF (Fr)OPNMF (KL)NNDSVD-NMF
15×1579.5880.2175.1674.3272.5569.1671.2574.1845.1654.46
20×2072.0873.7964.5265.4562.1567.5271.2365.0041.5225.62
25×2570.0071.1765.5455.1852.3865.5462.1955.0035.5419.83
30×3052.0861.5454.5345.6243.8748.5355.2145.8928.5313.22
35×3539.1741.0043.2533.6331.0643.2538.7933.3923.2516.13
Avg.62.5865.5460.6054.8452.4058.8059.7354.6934.8025.85

Table 3.

Face recognition accuracy on the occluded ORL image with different occlusion sizes.

5. Summary and discussion

In this paper, we have proposed a new approach to complex matrix factorization to face recognition. Preliminary experimental results show that StCMF and CoCMF achieve promising results for face recognition by utilizing the robustness of cosine-based dissimilarity and extend the main spirits of NMF from real number field to complex field which adds flexible constraints for the real-valued function of complex variables. We have also noted how strong is the proficiency of StCMF as well as CoCMF on face recognition task. Our proposed methods are simple frameworks which do not need more complicated regularizes like NMFs in the real domain. We believe that this capability of proposed methods will be stable in other application tasks. In future work, three aspects of the proposed system will be centered on. First, we add more regularized rules into objective function to a range of further application such as speech and sound processing. Second, we employ other classifiers such as complex neural network or complex SVM to treat well the complex-valued feature. Last, kernel methods will be exploited in both feature extraction and classification of StCMF and CoCMF constructed paradigm to develop the performance of nonlinear contexts.

Acknowledgments

This research is partially supported by the Ministry of Science and Technology under Grant Number 108-2634-F-008-004 through Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan.

Download

chapter PDF

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Link to this chapter Copy to clipboard

Cite this chapter Copy to clipboard

Viet-Hang Duong, Manh-Quan Bui and Jia-Ching Wang (September 4th 2019). Matrix Factorization on Complex Domain for Face Recognition [Online First], IntechOpen, DOI: 10.5772/intechopen.85182. Available from:

chapter statistics

35total chapter downloads

More statistics for editors and authors

Login to your personal dashboard for more detailed statistics on your publications.

Access personal reporting

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

More About Us