Face recognition accuracy on the ORL dataset with different train numbers.

## Abstract

Matrix factorization on complex domain is a natural extension of nonnegative matrix factorization, but it is still a very new trend in face recognition. In this chapter, we present two complex matrix factorization-based models for face recognition, in which the objective functions are the real-valued functions of complex variables. Our first model aims to build a learned base, which is embedded within original space. The second model finds the base whose volume is maximized. Experimental results on datasets with and without outliers show that our proposed algorithms are more effective than competitive algorithms.

### Keywords

- complex matrix factorization
- face recognition
- nonnegative matrix factorization
- projected gradient descent

## 1. Introduction

Face recognition is a central issue in computer vision and pattern recognition. The variations in lighting conditions, pose and viewpoint changes, facial expressions, makeup, aging, and occlusion are challenges that significantly affect recognition accuracy. Generally, the challenges in face recognition can be classified into four main categories as follows:

*Illumination variations*: The face of a person can appear dramatically different when illumination changes. This occurs because of spectra or source distribution and intensity changes. In practice, many two-dimensional (2D) methods show that recognition performance is notably decreased when illumination strongly occurs [1, 2]. Therefore, the problem of lighting variation is considered as one of the key challenges for face recognition system designer. Several methods have been proposed to handle variable illuminations such as extraction of illumination invariant features [3, 4, 5, 6, 7]; images with variable illuminations transformed to a canonical representation [8, 9]; modeling the illumination variations [10, 11]; facial shapes and albedos are based on 3D face models [12].

*Pose/viewpoint changes*: Deformed face and self-occluded face usually occur by pose or viewpoint changes which affect the recognition process [13]. Generally, viewpoint face recognition approaches are divided into two categories: viewpoint-transformed and cross-pose based [14]. Viewpoint transformed recognition methods aim to transform the probe image to match the gallery image in the pose, whereas cross-pose-based approaches attempt to estimate the light field of the face [15, 16]. Besides, other approaches integrated 2D and 3D information [17, 18] in order to cope with pose and illumination variations.

*Facial expression*: Face recognition tasks are more challenging when dealing with emotional states of a person in an image. In addition, hairstyle or facial hair such as beard and mustache can change facial appearance. To handle with difficulties of expression, facial expression recognition (FER) systems, including static image FER [19, 20, 21], and dynamic sequence FER [22, 23, 24] are designed. In static-based methods, the spatial information from the current single image is extracted to obtain the feature representation. In contrary, the dynamic-based methods consider the temporal relation among adjacent frames in the sequence of input facial expression.

*Occlusion*: Faces may be partially occluded by other objects such as sunglasses, scarf [62], etc. Other situations of occlusion are some faces may be occluded by other faces of a group of people [25]. It is very difficult to be observed and recognized because the available part of the face is very small. Therefore, occlusion problems become harder and need to be solved in face recognition.

In face recognition, image representation (IR) techniques play an important role in improving the accuracy performance. Commonly, an IR system is to transform the input signal into a new representation which reduces its dimensionality and explicates its latent structures. Over the past decades, the subspace methods, such as principal component analysis (PCA) [26], linear discriminant analysis (LAD) [27, 28], and nonnegative matrix factorization (NMF) [29, 30] have been successfully used in feature extraction. In particularly, PCA is known as a powerful technique for dimensionality reduction and multivariate analysis. PCA seeks a linear combination of variables such that the maximum variance is extracted from the variables by projecting data onto an orthogonal base which is represented in the directions of largest variance. In image representation, eigenfaces (PCA) result in dense representations for facial images, which mainly applied the global structure of the whole facial image. Likewise, LAD finds a linear transformation that maximizes discrimination between classes.

NMF is known as an unsupervised data-driven approach in which all elements of the decomposed matrix and the obtained matrix factors are forced to be nonnegative. Furthermore, NMF is able to represent an object as various parts, for instance, a human face can be decomposed into eyes, lips, and other elements. In order to make NMF algorithms more efficient, one has proposed some constraints into the cost function such as sparsity [31, 32], orthogonally [33], discrimination [34], graph regularization [35, 36], and pixel dispersion penalty [37]. Additionally, proposing an appropriate distance metric for an NMF model plays an important role in enhancing the efficacy of the estimated linear subspace of the given data. NMF techniques commonly apply the squared Frobenius norm (Fr) or the generalized Kullback–Leibler (KL) divergence for the independent and identically distributed noise data. But in many cases, they produce an arbitrarily biased subspace when data is corrupted by outliers [38]. To overcome this drawback, *L*_{2} and *L*_{1} norms were proposed by Kong et al. [39] to obtain a robust NMF, in which the noise was assumed to follow the Laplacian distribution. Similarly, the earth mover’s distance (EMD) and the Manhattan distance were also suggested in the work of Sandler et al. [40] and Guan et al. [41], respectively. A family of cost functions parameterized by a single shape parameter beta, called the beta-divergence [42], is commonly used on NMF approaches. Although NMFs are able to learn part-based representations and capture the Euclidean structure of high-dimensional data space, they are still limited to comprise the nonlinear sub-manifold structure behind the data.

Recently, matrix factorization techniques have been extended to complex matrix factorizations (CMFs) where the input data are complex matrices. These models have been obtaining promising results in facial expression recognition and data representation tasks [43, 44, 45]. The main idea of complex methods for face and facial expression recognition is that the original signal is projected on to the complex field by a mapping such that the distances of two data points in the original space and projection space are equivalent. Particularly, by transforming the real values of pixel intensive to complex domain, it is shown that the squared Frobenius norm of corresponding complex vectors and the cosine dissimilarity of real-valued vectors are equivalent. As a result, the real optimization problem with cosine divergence is replaced by optimizing a complex function with the Frobenius norm. Most of the mentioned CMF models were applied to facial expression and object recognition.

In this chapter, we present two complex matrix factorization-based models for face recognition. In the following sections, we denote *M*-dimensional column vector **Y** be a dataset comprising of *N*-observations; **Y** is expressed in the matrix form as **Y** is transformed to the complex domain, and the complex data matrix **Z** is factorized under imitating NMF frameworks. The contributions of this chapter are summarized as follows:

The image analysis methods on the complex domain, which are called structured complex matrix factorization (StCMF) and constrained complex matrix factorization (CoCMF), are proposed.

In complex domain, the updating rule for StCMF and CoCMF is derived based on gradient descent method.

A thorough experimental study on face recognition is conducted, the results show that the proposed StCMF and CoCMF yield better performance compared to extensions of the real NMFs.

## 2. Background

### 2.1 Nonnegative matrix factorization

Assume that we are given an initial data matrix *K*≪ min{**M**, **N**}. NMF methods aim to find a basis matrix

where **Y** and **UV**.

Most NMF techniques estimate the linear subspace of the given data by the Frobenius norm (F) or the generalized Kullback–Leibler (KL) divergence which have the following forms:

The problem (1) is non-convex; thus, it may result in several local minimal solutions. To find an optimization solution, the iterative methods are commonly used. Generally, there are three classes of algorithms for solving this problem including multiplicative update, gradient descent, and alternating nonnegative least squares algorithms. The most popular approach to solve (1) is the multiplicative update rules proposed by Lee and Seung [30]. For example, the iteratively updating rules of a Frobenius NMF cost function are given by

### 2.2 The cosine divergence

Given the representations of two images, **I***t* and **I***s* are *M*-dimensional vectors **y***t*, **y***s* in the lexicographic order, respectively. First, *c* is the element vector index or the vector spatial location. The correlation between images **I***t* and **I***s* through the cosine dissimilarity between **y***t* and **y***s*, is introduced by

One of interesting properties of the cosine distance measurement is suppression outlier which is proved in [46]. The comparison between Frobenius norm and cosine divergence is showed in Figure 1. Liwiki et al. [46] show that the Frobenius distance between the original and the same subject is smaller; in contrary, a large distance between the original image and the image of a different person or occlusion image results from the cosine-based measure.

### 2.3 Euler’s formula and a space transformation

Let us consider two mappings:

and

The nonlinear function *h* is to transform the real-valued features to complex feature space. In other words, a complex vector space with *M*-dimensions can be regarded as a 2 *M*-dimensional real vector space.

It is proven that the cosine dissimilarity distance of a pair of data in the input real space equals to the Frobenius distance of the corresponding data in complex domain [47]. This observation is the first motivation of StCMF and CoCMF by mapping the samples into the complex space with a nonlinear mapping function *h* and performing matrix factorization in this complex feature space.

### 2.4 Wirtinger calculus

Any function of a complex variable *z* can be defined as *i*^{2} = −1 and

**Definition 1.** Let

A necessary condition for *f* being holomorphic is that the Cauchy-Riemann equations hold, that is, *f*(*z*) as a bivariate function *f*(*z*, *z**) and treating *z* and *z** as independent arguments.

**Definition 2.** The pair of partial derivative operators for function

In case of real-valued function of complex variables, we also have one special property which is useful for optimization theory described later.

**Lemma 1.** The differential *df* of a real-valued function

## 3. Complex matrix factorization

Let the input data matrix **Y** = (**Y**_{1}, **Y**_{2},…, **Y***N*) contain *N* data vectors as columns. As described in previous sections, the elements of real matrix **Y** are normalized and transformed into a complex number field to yield the complex data matrix **Z**. Two unconstraint and constrained optimization problems in an unordered complex field is introduced in the following sections, respectively.

### 3.1 Structured complex matrix factorization (StCMF)

The idea of structured complex matrix factorization (StCMF) is to build a learned base which is embedded within original space. The basis matrix in StCMF is constructed by the linear combination of the complex training examples. Given the complex data matrix **Z** into the encoding matrix

where *K*≪ min{*N*, *M*}

### 3.2 Constrained complex matrix factorization (CoCMF)

Considering a dataset of *N* complex vectors **Z** = [**Z**_{1}, **Z**_{2},…, **Z***N*], each of **Z***i* represents a data instance. The proposed CoCMF model decomposes **Z** into a product of two matrices **W** and **V** such that each instance **Z***i* is a convex combination of latent components **W**. We call **V** and **W** the encoding matrix and the basis matrix, respectively. Geometrically, the data points **Z***i*, i = 1, 2, ..., *N* all lie in or on the surface of a simplicial cone **S**_{W}, whose vertices correspond to the columns of **W** and

Note that **S**_{W} lies in the positive orthant and the volume of **S**_{W} (*Vol* (**S**_{W})) is given by the following formula [48]:

In [51], Zhou et al. illustrated that the small-cone constraint on the bases **W** will impose suitable sparseness on **V**. Inversely, the large-cone penalty will result in sparseness on the bases of factorization and the reconstruction errors on the training data, and the test data will be simultaneously decreased [50, 52]. Therefore, all observed data can be reconstructed by linearly combining the bases of a dictionary. Combining the goals of enlarging the volume of the simplex base, the constrained complex matrix factorization (CoCMF) problem is formulated as follows:

Since 0 < det(**W**^{T}**W**) ≤ 1 holds under the assumptions 1^{T}**W***i* = 1. To simply the model, in this work, the log-determinant function is exploited to modify the volume penalty, and Eq. (17) can be written as the following form:

### 3.3 Complex matrix factorization via projected gradient descent

It can be seen that (12) and (16) are non-convex minimization problems with respect to both variables **W** and **V**, so they are impractical to obtain the optimal solution. These NP-hard problems can be tackled by applying the block coordinate descent (BCD) with two matrix blocks [53] to obtain a local solution. The specific problems (14) and (18) were solved by the following scheme:

Fixing **W** and solving the following one variable optimization problems

Algorithm 1: Complex projected gradient (CPG) with Armijo rule |
---|

Input: Z, WOutput: v1. Initialize any feasible 2. Iterations, for k = 1, 2, … |

Then, **W** is updated based on the Moore-Penrose pseudoinverse [54], which is dented by **W** = (**Z****Z**)**V****W** = **ZV****V**.

Taking advanced of Wirtinger calculus, the gradient is evaluated in the forms

We summarize the projected gradient method for optimizing (21) and (22) in **Algorithm 1**.

## 4. Experiments

To investigate the recognition performance of the proposed StCMF and CoCMF methods, we have conducted extensive experiments on the ORL dataset [55] and the Georgia Tech face dataset [56] in two scenarios for face recognitions including holistic face and key point occluded face.

First, we give brief description about the data collections and experiment setting. Second, the performance comparisons and corresponding results are shown.

### 4.1 Datasets and experiment setting

The ORL dataset contains 400 grayscale images corresponding to 40 people’s face. The images were captured at different times, under different lighting conditions, with different facial expression (open or close eyes, smiling or non-smiling) and facial details (glasses or no glasses). All the face images are manually aligned and cropped. For the computational efficiency, each cropped image is resized to 28 × 23 for face recognition without occlusion and 32 × 32 pixels for face recognition with occlusion. Figure 2 shows some instances of such random face on ORL dataset.

The Georgia Tech face dataset (GT) contains images of 50 people taken during 1999 and stored in JPEG format. For each individual, there are 15 color images captured at resolution of 640 × 480 pixels. Most of the images were taken in two different sessions to take into account the variations in illumination conditions, facial expression, and appearance. In our experiments, original images are normalized, cropped and scaled into 31 × 23 pixels, and finally converted into gray level images. Examples of GT dataset are shown in Figure 3.

We use the nearest neighbor (NN) classifier for all face recognition with/without occlusion experiments. The platform was a 3.0 GHz Pentium V with 1024 MB RAM running Windows. Code was written in MATLAB.

### 4.2 Performance and comparison

#### 4.2.1 Face recognition on ORL dataset

For this case, in order to evaluate the performance of the proposed StCMF and CoCMF, we make the comparisons with seven representative algorithms, namely, NMF [29], P-NMF [57], P-NMF (Fr) [58], P-NMF (KL) [58], OPNMF (Fr) [59], OPNMF (KL) [59], NNDSVD-NMF [60], and GPNMF [60]. Different training numbers ranging from five to nine images were randomly chosen from each individual to construct the training set, and the rest images constitute the test set which was used to estimate the accuracy of face recognition [61]. The learning basic images in all selected algorithms are *K* = 40, and the mean recognition rate are described in Table 1.

No. Trains | StCMF | CoCMF | GPNMF | NMF | PNMF | P-NMF (Fr) | P-NMF (KL) | OPNMF (Fr) | OPNMF (KL) | NNDSVD-NMF |
---|---|---|---|---|---|---|---|---|---|---|

5 | 90.85 | 90.30 | 86.5 | 84.5 | 82.4 | 83.7 | 85.0 | 80.0 | 79.0 | 43.0 |

6 | 91.75 | 92.25 | 87.5 | 84.4 | 85.81 | 85 | 84.4 | 83.0 | 82.0 | 39.3 |

7 | 91.17 | 94.75 | 87.5 | 83.3 | 87.33 | 85.6 | 85.9 | 84.4 | 80.0 | 36.8 |

8 | 93.75 | 93.88 | 88.75 | 88.75 | 88.5 | 88.8 | 88.0 | 84.3 | 83.0 | 40.8 |

9 | 97.50 | 95.50 | 92.5 | 85 | 90.75 | 87.25 | 87.5 | 84.0 | 83.0 | 42.3 |

Avg. | 93.00 | 93.34 | 88.55 | 85.19 | 86.96 | 86.07 | 86.16 | 83.14 | 81.4 | 40.44 |

Table 1 shows the detailed recognition accuracies of compared algorithms. As can be seen, our algorithms significantly outperform the other algorithms in all the cases. Almost algorithms achieve the best accuracy when the number of training face images per class is eight exceptionally our proposed methods and GPNMF. Besides, there is the same trend between the number of training images and accuracy rate; that is, the lower training numbers lead to a decreasing rate of recognition. StCMF achieves the best performance (97.50%) when the number of training samples is chosen largest. However, CoCMF achieves higher improvement in general.

It is observed that the above-selected algorithms employ a different kind of measurements such as Frobenius (Fr) and Kullback–Leibler (KL) and add more graph to regularize as well as adjust basic NMF to projective NMF. In a reprocessing image, centered aligning image technique is applied for other methods to enhance effective recognition rate that cannot be focused on our StCMF and CoCMF models. However, the best recognition rate of all obtained by our proposed CoCMF method which has extra regularizes term.

One of the difficulties in NMF is the estimation of the number of components or *K*. The choice of *K* results in a compromise between data fitting and model complexity; that is, a greater *K* leads to a better data approximation, but a smaller *K* makes a model being easier to estimate and fewer parameters to transmit. In almost NMFs, *K* is typically chosen such that it is larger than the estimated number of sources and follows the constraint *K*.

#### 4.2.2 Face recognition on GT dataset

Table 2 shows the recognition rates versus feature dimension by the competing methods on GT dataset. GT dataset exists with many challenging samples that are harder to recognize. Thus, the performance of all methods is lower than those of ORL dataset. In this dataset, the implement was done similarly as those in the previous section in choosing algorithms to compare as well as dividing randomly into two different sets, each containing a different number of testing and training images. In our experiments, we set *K* = 50 and range the number training being five odd numbers as {5, 7, 9, 11, 13}. The experimental results show that as the number of training images increases, the efficiency of the recognition system also increases. We can see that CoCMF method achieves the best performance and StCMF holds the second place in overall. All the methods obtain their best results when 13 training samples are used (the largest number of training sample in our experiment). In this case, the highest recognition rate belongs to the StCMF method again.

No. Trains | StCMF | CoCMF | GPNMF | NMF | PNMF | P-NMF (Fr) | P-NMF (KL) | OPNMF (Fr) | OPNMF (KL) | NNDSVD-NMF |
---|---|---|---|---|---|---|---|---|---|---|

5 | 39.64 | 59.40 | 59.14 | 54.70 | 46.84 | 58.90 | 57.97 | 57.89 | 48.08 | 23.80 |

7 | 54.80 | 62.25 | 60.96 | 59.38 | 52.50 | 60.20 | 60.88 | 60.44 | 48.68 | 23.83 |

9 | 75.20 | 69.67 | 62.5 | 62.40 | 54.93 | 64.03 | 63.35 | 62.48 | 48.84 | 24.30 |

11 | 69.50 | 70.50 | 65.37 | 65.20 | 57.25 | 63.75 | 63.38 | 63.17 | 49.36 | 27.35 |

13 | 77.60 | 73.00 | 69.00 | 67.40 | 61.60 | 65.60 | 64.05 | 63.50 | 49.50 | 30.20 |

Avg. | 63.35 | 66.96 | 63.39 | 61.82 | 54.63 | 62.50 | 61.93 | 61.50 | 48.90 | 25.90 |

#### 4.2.3 Face recognition on occluded ORL images

For a more convincing experimental assessment of the power of our proposed models in occlusion processing, we test the performance on occluded images of ORL database. In cropped 112 × 92 dimension test image gallery, occlusion was simulated by using a sheltering patch with different size ranges in set {10 × 10, 15 × 15, 20 × 20, 25 × 25, 30 × 30} and placed at random locations before resized in 28 × 21. Figure 4 shows examples of occluded ORL images.

In this experiment, we take randomly the training images with the ratio 4:6 for training/testing and test several times on each sort of percent of randomly occluded test image. Table 3 shows the detailed recognition accuracy on all selected algorithms and our proposed methods. It can be seen that the recognition rate of all methods is increased when the size of occlusion batch is decreased. Obviously, StCMF and CoCMF outperform other tested approaches even if occlusion. This reveals that StCMF and CoCMF are more robust outlier than the other.

Occluded Size | StCMF | CoCMF | GPNMF | NMF | PNMF | P-NMF (Fr) | P-NMF (KL) | OPNMF (Fr) | OPNMF (KL) | NNDSVD-NMF |
---|---|---|---|---|---|---|---|---|---|---|

15×15 | 79.58 | 80.21 | 75.16 | 74.32 | 72.55 | 69.16 | 71.25 | 74.18 | 45.16 | 54.46 |

20×20 | 72.08 | 73.79 | 64.52 | 65.45 | 62.15 | 67.52 | 71.23 | 65.00 | 41.52 | 25.62 |

25×25 | 70.00 | 71.17 | 65.54 | 55.18 | 52.38 | 65.54 | 62.19 | 55.00 | 35.54 | 19.83 |

30×30 | 52.08 | 61.54 | 54.53 | 45.62 | 43.87 | 48.53 | 55.21 | 45.89 | 28.53 | 13.22 |

35×35 | 39.17 | 41.00 | 43.25 | 33.63 | 31.06 | 43.25 | 38.79 | 33.39 | 23.25 | 16.13 |

Avg. | 62.58 | 65.54 | 60.60 | 54.84 | 52.40 | 58.80 | 59.73 | 54.69 | 34.80 | 25.85 |

## 5. Summary and discussion

In this paper, we have proposed a new approach to complex matrix factorization to face recognition. Preliminary experimental results show that StCMF and CoCMF achieve promising results for face recognition by utilizing the robustness of cosine-based dissimilarity and extend the main spirits of NMF from real number field to complex field which adds flexible constraints for the real-valued function of complex variables. We have also noted how strong is the proficiency of StCMF as well as CoCMF on face recognition task. Our proposed methods are simple frameworks which do not need more complicated regularizes like NMFs in the real domain. We believe that this capability of proposed methods will be stable in other application tasks. In future work, three aspects of the proposed system will be centered on. First, we add more regularized rules into objective function to a range of further application such as speech and sound processing. Second, we employ other classifiers such as complex neural network or complex SVM to treat well the complex-valued feature. Last, kernel methods will be exploited in both feature extraction and classification of StCMF and CoCMF constructed paradigm to develop the performance of nonlinear contexts.

## Acknowledgments

This research is partially supported by the Ministry of Science and Technology under Grant Number 108-2634-F-008-004 through Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan.