## 1. Introduction

Detection of faces and facial patterns in static or video images is an important but challenging problem in computer vision, as faces may present in different scales, orientations, positions and poses in an uncontrolled background. To date, numerous approaches have been implemented for face and facial pattern detection [1-4]. These approaches have been grouped into four broad categories [2]. These are knowledge-based, feature based, appearance based and template based methods.

In this chapter, we present a simple yet robust algorithm for template matching method for face detection through eyes localization by using Dynamic Time Warping (DTW) algorithm with different poses and variations. By using DTW, we overcome some of the main limitations of Classical template matching.

We elaborate through discussion and experiments’ results that how our image processing strategy applied to DTW algorithm effectively increase the performance and detection rate. Using such an approach does not require translation of human knowledge about the face and facial patterns to computer representation, training data set and training mechanisms or face geometry, and does not require large numbers of model templates to handle different pose and variation changes shown by facial features. In the coming section, short study of Dynamic Time Warping algorithm (DTW) is presented. Proceeding to the next section we discuss the necessary steps for our image processing strategy. This is followed by presenting our approach of image partitioning, and in the next section we present dynamic time warping algorithm application in our work. Towards the end of this paper, results and discussion are put forward and conclusion is presented.

## 2. Visiting Dynamic Time Warping review

Dynamic Time Warping (DTW) is a fast and efficient algorithm for measuring similarity between two sequences. It is used to find the optimal alignment between two time series, if one time series may be warped non-linearly along its time axis. This warping between the two time series can be used to determine the similarity which may vary in time or speed. Similarity is measured by aligning two sequences and computing a distance between them [5]. This technique has made it possible to make a matching between two curves that are “subject not only to alteration by the usual additive random error but also to variations in speed from one portion to another” [5]. Dynamic Time Warping (DTW) was initially introduced to recognize spoken words [7]. Since then it has been used and proved useful in different applications such as: handwriting recognition [6, 8], signature verification [9-12], fingers print verification [13], face recognition [14, 15], speech and gesture recognition [16], protein structure [17], gene expression [18, 19], chromosome classification [20], object detection and classification [21, 22], curve matching [23,24] control system [25], image and shape matching [24,26] database and data mining [27- 29, 31, 32,34-37), brain activity classification [38], voice control [39] and so on.

The DTW algorithm uses a dynamic programming technique to align time series with a given template so that a total distance measure is minimized [25]. To align these sequences (the reference/template vector X of length M and the test vector Y of length N), a local cost matrix is defined where each cell (i,j) represents the pair wise distances between the i^{th} component of X and the corresponding component j^{th} of vector Y. Generally the Euclidean distance is suggested as a distance cost function in DTW algorithm to build a local cost matrix [30], the cost (distance) function has a small value when sequences are similar and large value if they are different. After computing the local cost matrix between two 1-dimenasional features’ vectors X and Y, a warping path is constructed by building a cumulative/global cost matrix D(i,j) [6,7]. The cumulative distances matrix D(i,j) is defined as the sum of the local distance d(i,j) found in the current cell and the minimum of the cumulative distances of the adjacent cells. A warping path is a concatenation of cells starting from (1,1) to the cell (M,N). The objective is to search an optimal path for which a least cost is associated. This path provides the low cost areas between the two feature-vectors (see section 3.5 for more details of building the local and global matrices).

In DTW, contsriants are applied for constructing optimal warping path [7, 27]. Constraints are widely used to speed up DTW [7, 27, 30, 33] by reducing the number of paths to be considered during the computation. “These constraints serve to reduce the search space- the space of possible warping paths. Searching through all possible warping paths is computationally expensive. Therefore, out of concern for efficiency, “it is important to restrict the space of possible warping paths” [27]. Several well-known constraints have been applied to the problem to restrict the moves that can be made from any point in the path. Sakoe and Chiba applied a pattern matching algorithm with a nonlinear time-normalization effect using dynamic programming to spoken word recognition [7]. They introduced several kinds of reasonable constraints as outlined below [27]:

Monotonicity: The alignment path doesn’t go back in time index, i

_{k-1}≤ i_{k}and j_{k-1}≤ j_{k.}

This guarantees that features are not repeated in the alignment.

Continuity: The alignment doesn’t jump in time index, i

_{k}- i_{k-1}≤ 1 and j_{k-}j_{k-1}≤ 1. This guarantees that important features are not omitted.Boundary: The alignment starts at the top-left and ends at the bottom-right. This guarantees that the sequences are not considered only partially.

Warping window: A good alignment path is unlikely to wander too far from the diagonal. This guarantees that the alignment doesn’t try to skip different features or get stuck at similar features, and allowable points must fall within a given window, |i

_{k}- i_{k-1}| ≤w; where w is a positive integer window width (threshold).

A new restriction called a slope constraint was introduced in their work and it was claimed that symmetric form and slope constraint are effective, which resulted in optimum performance in comparison to several dynamic programming algorithms [7]. Slope constraint was also used later by Wang et al. [23], to prevent unrealistic warping.

Berndt and Clifford [27] applied the constraints introduced in Sakoe and Chiba work [7] using symmetric form to find patterns in time series for archiving purpose. They used tracing backward technique to find the optimal warping path. Daniel et al. [33] proposed the threshold DTW (TDTW) algorithm based on the DTW technique, for two dimensional spatial activity recognition, by introducing a user defined threshold to the diagonal matching condition of the DTW. Their work showed that the TDTW is less affected to noise and they accomplished higher classification accuracy than DTW. David Clifford et al. [36] used DTW for aligning chromatogram signals. A variable penalty was introduced into the DTW that was added to the distance metric to reduce the number of non-diagonal paths. The percentage of non-diagonal moves taken during the usual DTW ranges from 52% to 70%. After adding the variable penalty to DTW, This range was reduced and it ranges from 1% to 5%. It was shown in the experiments that penalized DTW is highly significant and achieves good alignment of peaks in chromatograms.

These constraints are best visualized in [27, 23, 40]. Using constraints helps speed up DTW execution time by a constant factor but the DTW algorithm time complexity remains O(N^{2}V) [40].

Some of the principal advantages of using Dynamic Time Warping include its capability to handle different scale and translation, its ability to compare two sequences of signals with different elongations. It does not require complex mathematical models.And “it is more robust against noise and provides scaling along the time axis. This ability allows DTW to identify similarities far more accurately and so enhances the functionality of the applications that use it” [31]. Furthermore, a simple pattern representation template is enough for similarity matching and detecting purposes with DTW [28]. However, DTW algorithm poses limitations in terms of time and space complexity [28, 40]. The time and space complexity of DTW is approximately *O (NMV)*, where *N* and *M* are sequences lengths (*O (N*^{2}*V)* if the two sequences is equal in length), and *V* is the number of templates to be considered. This means that practical applications may be quite slow. To overcome these limitations, researchers proposed different methods to make DTW faster and to reduce the complexity of time and calculations required for alignment process.

Another significant issue with DTW is to achieve the higher accuracy for classification problems. This issue is dealt on application by application or domain by domain basis. Research efforts are being put forward to address this issue [48,49].

### 2.1. Dynamic Time Warping and face detection

Dynamic time warping gained significant attention in the last decade as a classification algorithm and similarity measure technique for different applications in different domains. Researchers have also attempted to use DTW with different existing algorithm within the domain of face localization and face recognition.

Luis et *al.* [43] introduced a face localization algorithm using DTW supplemented with skin color filter in RGB space to reduce the computational complexity. The template used in DTW is an average face created by manually extracting and aligning 40 different human faces with a constant size of 80 × 83 from 66 still images. The averaged face is quantized and smoothed using median filter prior to extracting the feature vectors. The feature vectors, in their work, consist of vertical projection of the template and two horizontal vectors one for the eyebrows and eye region and the later for the nose and mouth region. Prior to DTW classification algorithm, they had segmented the image using a morphological skin color filter in RGB space. Then they had extracted the large regions, which contain the skin color as a candidate face region and restricted the DTW classification algorithm to the extracted regions that had reduced the total computational complexity of their approach. Finally the DTW is used to calculate the Pearson correlation coefficient as a similarity distance measure for every extracted feature vector in stepwise. Alexander et *al.* [15] proposed DTW and LSTM ANN algorithms for face recognition. For DTW, they used the Euclidian distance as similarity measure. The templates consists of 50 face images of the size of 19 × 19 pixels converted into 1-D 361 element feature vector. The test set consists of 50 face images for the same persons in the templates and the DTW classification algorithm is applied to calculate the similarity measure for every test image with all the templates. To validate their approach, they repeated the experiments in three scenarios:

The templates and the test images are free of noise.

The templates are free of noise and the test images are degraded with Gaussian noise (σ

^{2}=0.1).The templates and the test images are degraded with Gaussian noise (σ

^{2}=0.1).

The first scenario showed a 100% detection rate and the path sizes matrix showed a perfect alignment of the templates and the test faces. The performance of the algorithm was reduced to 94% when applied to the second scenario; however, it shows the robustness of the algorithm in the presence of noise. The last scenario the performance is further reduced to 70%. All the scenarios are repeated using MSE as similarity measure and the performances were reported as 92%, 72%, and 34% respectively. For LSTM neural network, they used 50 faces converted to 1×361 feature vector as training set to represent the main classes. Eigen faces were extracted using PCA to reduce the dimensionality of the input space. The experiment is performed in two scenarios:

Using 10 PCA, and

Using 20 PCA.

Selecting the learning rate to be 0.02 and training the LSTM ANN for 50000 epochs in both scenarios to reach 0.05 mean square error. In the first scenario the detection rate was 96% and 88% for the second one. Computational complexity of their approach was not investigated. The proposed LSTM ANN exhibited high performance and robustness.

## 3. Method for face detection

The eyes region has the most edge information in a facial image [42], and hence, it can be considered as a vital signature of a human face. Thus, the eyes region was chosen in this study as a facial pattern to apply the proposed method.

To have an eye region as a facial pattern template, a face image of a person from a data set (reference) has been cropped to eyes. Significant information (features) is then extracted from this particular template and transformed into a 1-Dimendioanl (1-D) series sequence. In addition, this facial pattern is used as a reference to detect eyes in the input images from the data set. On the other hand, an input face image from the image data set is gone through a partitioning process (see sub-section 3.2.3). This process produces virtual partitions of an input image. Each partition is equivalent to the template image in size. For each partitioned region, significant information is extracted and converted into a 1-D vector sequence to be aligned with the reference sequence (from template image) by DTW. The partition which belongs to a sequence that best matches the reference sequence is selected as the eyes/face region of the facial image. The proposed image processing strategy consists of the following steps:

Edge Detection,

Histogram Equalization,

Image Partitioning, and

Features-Vectors Extraction.

Figures 1 and 2 present the functional flow of the image processing strategy and the image processing strategy respectively. All the steps stated above are described in the following sections.

### 3.1 Edge detection and cropping rules

The first step of proposed strategy is edge detection. An input image is processed for edge detection to find the region boundaries. Standard Sobel operators are used with 3X3 filters for edge detection. Once the boundaries of the image are determined, a cropping step is applied.

Cropping an image is important in reducing the size of the data and iterations required to scan full image by DTW. For this purpose, a cropping rule is set instead of the manual cropping. An image is cropped till the first boundary region determined in the previous step is obtained. Figure 3 shows the original input image and the edge image for the same image, as well as the result of the cropping rules.

### 3.2. Histogram equalization

Intensities in the images are highly sensitive to external factors, such as illuminations, pose, rotations of the images, etc. These external factors affect the distribution of intensities in the histogram of the images [44], which in turn affects the detection rate to a great extent. In order to make the intensities relatively insensitive to a particular contrast, and brightness of the original image, etc., a histogram equalization is applied with a flat envelop on the image to re-distribute the intensities throughout the range. Figure 4 shows the histogram of an image before applying the histogram equalization step. In comparison to the histogram shown in Figure 4, the histogram of the same image is shown in Figure 5 after applying the histogram equalization. It can be seen in Figure 5 that the intensities of the image have been distributed throughout the range of the histogram.

### 3.3. Image partitioning

Im.age partitioning is an important step of the proposed strategy for face detection using DTW. In this step, virtual image partitions are extracted from an image. It is applied only on the test (input) images, which have been cropped through the steps of the image processing strategy. The image partitioning performed by our heuristic is given in Figure 6 below.

All partitions are equal in the size of the template image. Partitioning of an image is performed from the beginning of the first row and column and these are continued till the last row and column. All image partitions are further processed through the steps of the proposed image processing strategy, as shown in Figure 1. These image partitions are transformed into 1-D feature vectors (see section 3.4). It is important to note that at this point, DTW algorithm is used to measure the similarity between the template features vector and the vectors of the partition features to detect facial patterns and faces.

Partitioning process has been elaborated with the example given in Figure 7. In this figure, the first input image partition is virtually extracted (equivalent in the size of the template) from the first row and the first column, as shown by the sky blue colour. The second partition is extracted from the first row and the second column, which is shown in white (see Figure 7). In this way, the partitioning of an input test image goes through each column and row in a test image to find the optimal path between the two sequences. Thus, horizontally, a column and vertically a row becomes a *step size*.

### 3.4. Features vector extraction

To extract a particular feature from the image using 1-D DTW algorithm, horizontal and vertical projections are computed by integrating the columns and rows to produce 1-D vectors. The horizontal and vertical integral projections are presented in Equations 1 and 2, respectively.

In Equations 1 and 2, N is the width of the template/input partition image (the total number of columns available), M is the height of the template/input partition image (total number of rows). Figure 8 and 9 show the horizontal and vertical relief of the image (eyes region) respectively.

The feature vector is formed by concatenating the y_{-relief} and x_{-relief} vectors. From Equations 1 and 2, the feature vector f can be obtained, as follows:

Where ⊕ denotes vector concatenation and k=N+M.

Equation 3 represents the feature vector extracted from a 2-D image. In the present work, the last element of the row sequence was joined to the first element of the column sequence. It was found through the experiment that an advantage of concatanating the horizontal and vertical projections in the features would include the cross correlation between the values in the rows and columns in the resulted accomulative distance matrix (in DTW algorithm). This row-column cross correlation has been found to make the classification less sensitive to the variation in the pose and orientation of the images. Figure 10 shows the result of cocatnation the vertical and horisontal vectors of the image template (eyes in this case).

### 3.5. Dynamic Time Warping algorithm

Assume that there are two image vectors, I (the input image partition) and T (representing template image) each having noise

In Equations 4 and 5, m denotes the number of columns (the length) in the input image’s partition I, and k denotes the number of column in the template image T (both vectors I and T have same length in the proposed method), whereas I_{n} and T_{n} are the input image partition and template image vectors with noise, respectively.

To align the image sequences, first a grid, I× T, local cost matrix is defined, where each cell represents the distance between the corresponding indices of the two feature-vectors. Normally, Euclidean distance is chosen to calculate the distance between the two elements in the matrix, d. The local distance matrix d can be computed as follows:

The d matrix is of the size of m×k, presenting the cost values between each element of vector I and T, as shown in Equation 3.7 below:

Rearranging the terms in each element in Equation 7, the local distance matrix can be written as shown in Equation 8:

After computing and building the local distance cost matrix (d) between two 1-D features’ vectors I and T, a warping path (WP) is constructed by building a cumultaive/global cost matrix D(i,j). As previously mentioned (see section 2), the cumulative distance matrix D(i,j) is defined as the sum of the local distance d(i,j) found in the current cell and the minimum of the cumulative distances of the adjacent cells. A warping path is a concatenation of the cells starting from (1,1) to the cell (m,k). Let’s define the minimum operator as follows:

Where M_{ij} is the minimum of the three cells in the global cost matrix D, from where the cell can be reached (depicted in Figure 11).

Therefore, the global distance matrix (or the accumulative distance) can be formed from *d* (*I, T*) in Equation 7, as presented in Equation 10.

Applying the minimum operator and the steps to create the global distance matrix will result in building matrix D. Equations 11-21 represent the process of building the global cost matrix D.

The first element in the D matrix is equivalent to zero.

The entire first row and first column of the global matrix D is filled by infinity to ensure that the first element D (1,1) is the minimum. The second element of D, D (2, 2) is filled as follows:

Proceeding in the same manner for the second column of D and the second row of D:

In the same manner, the contents of the second column can be written as:

Once again, the second row elements of the matrix D can be expressed in the same manner, as:

The rest of D elements can be computed on the same manner, D will be:

Constraints are applied for constructing the optimal warping path (Sakoe & Chiba 1978), (see section 2 for more details).

Monotonicity: The alignment path does not go back in the time index, i

_{l-1}≤ i_{l}and j_{w-1}≤ j_{w.}This guarantees that the features are not repeated in the alignment (see Figure 3.12).

Continuity: The alignment does not jump in the time index, i

_{l}– i_{l-1}≤ 1 and j_{w-}j_{w-1}≤ 1. This guarantees that the important features are not omitted. Continuity condition is shown in Figure 12, along with the monotonicity constraint.Boundary: The alignment starts at the beginning of the matrix and ends at the bottom-right. This guarantees that the sequences are not considered only partially. The blue boxes in Figure 12 show that the path starts at D (1,1) and ends at D (m,k).

Warping window: A good alignment path is unlikely to wander too far from the diagonal. This guarantees that the alignment does not try to skip different features or get stuck at similar features, and allowable points must fall within a given window, |i

_{l}– i_{l-1}| ≤w; where w is a positive integer window width (threshold). Windowing constraint appears in Figure 12, as the two dark red lines surrounded the purple diagonal line.

Figure 12 illustrates the global cost matrix together with the warping path and the constraints applied to the paths.

## 4. Working example of the Dynamic Time Warping algorithm for face detection

The proposed algorithm is implemented in MatLab 9 running on a computer with 2.0GHz dual core OPTERON processor and 1.87GB RAM. To evaluate the proposed techniques and method, two publicly available databases were used in this study. One database, containing face images with a resolution of 512 x 342 pixels under controlled lighting conditions and different rotations, is available from University of Bern, Switzerland [47]. Additionally, face images have changes in their facial expressions and immediate changes in head poses (right, left, up down and straight). The second one is the BioID face images database with the resolution of 384 x 286 pixels [46]. This database has face images with a complex background, along with different expressions and various illumination conditions, poses, and rotations. We conducted the experiments in different settings to evaluate the efficacy of our approach. In first setting, we applied the technique for eyes localization using DTW without our image processing strategy; the detection rate was very low. In second setting, we applied our image processing strategy with histogram equalization and DTW algorithm on the same dataset, the detection rate improved within 35%. Figure 13 shows the arbitrary eyes template chosen for our experiment.

With our experimental analysis and observation, histogram equalization reduced the effect of light on intensity distribution.

Comparing to other template matching methods [45] our system is simple fast and easy to implement with no training data set and shape model is needed, and template of eyes is manually cropped. The proposed method is showing acceptable detection rate and less time complexity comparing to other classifier and distance measurement systems.

Using the proposed image processing strategy, features vector were extracted (see section 3.4). With these extracted features, histogram equalization was applied with a flat envelop for both the template image and input test image(s) so as to re-distribute the intensities throughout the range. Histogram equalization distributes the intensities of an image throughout a range of histogram. The better distribution of the intensities has helped to detect facial patterns and face in bright background with existing illumination and lighting conditions. For example, the face image shown in Figure 14(a) was processed in Experiment setting 1 for the eye detection. Figure 14(a) shows that the eyes were not detected before histogram equalization was applied. Figure 14(b) illustrates the alignment between both the template and input image template vectors.Before thatfigure 14(c) depicts the histogram distribution for the image shown in Figure 14(a). It is obvious that both vector sequences are not well aligned. With our experimental analysis, histogram equalization was applied to reduce the effect of light on intensity distribution. The same facial image is shown in Figure 15(a) after applying histogram equalization. From the figure, it can be seen that the eyes have been detected correctly. Furthermore, both the vectors have been well aligned, as shown Figure 15(b) below:

Figure 16 and 17 show some of the image faces in different poses and lighting conditions, which are detected correctly and the alignment between the template image and the test images corresponding to these images, respectively.

## 5. Conclusions

We have presented our approach of template-based matching method and Dynamic time warpping to efficiently enhance the detection and localization of eyes in complex background with face images under differnt illumination, light conditions, expressions and poses. This method overcomes some of the limitations of tempalte matching for face detection. With more developed DTW technique this method helps in getting high accuracy in matching process. Results of our approach show that supplementing DTW with proposed image processing strategy enhances the detection of facial feature patterns and face detection by overcoming intensities and face pose variations. Future work involves the improvement of DTW algorithm and Dynamic Programming constraints to further increase the detection rate.