Transfer Learning of Pre-Trained CNN Models for Fingerprint Liveness Detection

Machine learning experts have predicted that transfer learning will be the next research frontier. Indeed, in the era of deep learning and big data, many powerful pre-trained CNN models have been deployed, and with transfer learning these models can be re-trained to tackle new pattern recognition problems. This work therefore investigates the application of a transferred VGG19-based CNN model to the problem of fingerprint liveness detection. In particular, the transferred VGG19-based CNN model is modified, re-trained, and fine-tuned to distinguish real from fake fingerprint images. Moreover, different architectures of the transferred VGG19-based CNN model are examined, namely shallow, medium, and deep models. The LivDet2009 database was used to assess the performance of each architecture. Reported results indicate that the best recognition rate, 92% accuracy, was achieved by the shallow VGG19-based CNN model.


Introduction
Recently, deep CNN models have been successfully applied to many pattern recognition problems, such as human facial expression recognition [1], vehicle detection [2], and lung disease diagnosis [3]. The application of CNN models to fake fingerprint recognition was investigated by Nogueira et al. [4]. In particular, they studied the effectiveness of different schemes, including Local Binary Patterns (LBP), SVM, VGG, and AlexNet models. These models were evaluated on the datasets of the Liveness Detection Competition (LivDet) for the years 2009, 2011, and 2013. In terms of the Average Classification Error (ACE), the best result, an ACE of 3.4, was obtained by the VGG-based deep model. A further anti-spoofing approach for fingerprint recognition was proposed by Uliyan [5], in which deep Restricted Boltzmann Machines (RBMs) were used to encode and represent the features, and a KNN classifier then classified the input pattern as real or fake. The RBM-KNN model in [5] was assessed on the LivDet dataset, and the reported results showed an ACE of 3.6 on the LivDet 2013 benchmark images.
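Since ACE is the measure quoted for [4] and [5], the following minimal Python sketch shows how it is computed, assuming the usual LivDet definition: the mean of the percentage of misclassified live fingerprints (Ferrlive) and the percentage of misclassified fake fingerprints (Ferrfake), so lower values are better. The function name and the counts in the example are hypothetical and are not taken from [4] or [5].

```python
def average_classification_error(live_total, live_wrong, fake_total, fake_wrong):
    """ACE in percent, assuming ACE = (Ferrlive + Ferrfake) / 2 as in LivDet."""
    ferr_live = 100.0 * live_wrong / live_total   # % of live samples classified as fake
    ferr_fake = 100.0 * fake_wrong / fake_total   # % of fake samples classified as live
    return (ferr_live + ferr_fake) / 2.0


# Hypothetical counts: 30 of 1000 live and 38 of 1000 fake images misclassified
ace = average_classification_error(1000, 30, 1000, 38)
print(f"ACE = {ace:.1f}")   # ACE = 3.4
```

Because ACE averages the two per-class error rates, it is not dominated by whichever class has more test samples.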
An incremental learning approach was proposed by Kho et al. [6]. The key idea is to construct an ensemble of SVM classifiers using a boosting technique, where each base classifier in the ensemble is trained on a different subset of the given training set. For feature extraction, three types of handcrafted features were used, namely LPQ, LBP, and BSIF. Experimental results indicated that the ensemble model outperforms a single SVM classifier. In addition, the authors investigated the performance of a CNN as a feature extractor combined with the ensemble model as the classifier. The outcomes show the superiority of deep CNN features over the classical handcrafted features, that is, LPQ, LBP, and BSIF. A recent deep CNN-based approach was discussed by Fei et al. [7], in which the VGG19, AlexNet, and MobileNet CNN models were employed and retrained with LivDet2013 and LivDet2015 images. The results indicated that VGG19 achieved the best accuracy among the CNN-based models.
Transfer learning has become a promising technique for reusing powerful pre-trained CNN models to handle different pattern recognition problems. For example, transferred CNN models have been applied to brain tumor recognition [8], wildfire detection [9], pneumonia diagnosis [10], seizure classification [11], remote sensing image retrieval [12], and bearing fault detection [13]. Nevertheless, transfer learning of pre-trained CNN networks is still relatively new and has not been widely studied for liveness detection. This work therefore investigates the transfer of various architectures of the VGG19 CNN model to the problem of liveness detection. The remainder of this chapter is organized as follows. The proposed transferred model is explained in Section 2. Section 3 presents a series of experiments conducted to evaluate the effectiveness of the proposed approach. Section 4 summarizes the research findings and conclusions of this study.

Architecture of pre-trained VGG19 CNN model
The basic architecture of the VGG19 CNN model is given in Figure 1. As can be seen, the VGG network contains four types of layers, namely convolution layers, max-pooling layers, fully connected (FC) layers, and a soft-max classification layer. A convolution layer convolves a set of pre-trained filters with its input. As indicated in Figure 1, the input image size is 224 × 224 × 3, and the first layer consists of 64 filters of size 3 × 3. Going deeper into VGG, the number of convolution filters increases from 64 to 512, as shown in Figure 1.
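To make the convolution step concrete, here is a minimal pure-Python sketch of one 3 × 3 filter applied to a single-channel input with a stride of 1 and one pixel of zero padding, which is how VGG keeps the spatial size unchanged inside a block. As in most CNN libraries, the operation is a cross-correlation (the kernel is not flipped). A real VGG19 layer additionally sums over all input channels, adds a bias, applies a ReLU, and repeats this for each of its 64 to 512 filters. The function name `conv2d_same` is hypothetical.

```python
def conv2d_same(image, kernel):
    """Slide an odd-sized kernel over a 2-D list with stride 1 and zero padding,
    so the output has the same height and width as the input."""
    h, w = len(image), len(image[0])
    k = len(kernel)                      # 3 for VGG
    pad = k // 2                         # 1 pixel of zero padding on each side
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:   # padded (outside) pixels contribute zero
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


ones = [[1.0] * 3 for _ in range(3)]
print(conv2d_same(ones, ones))   # [[4.0, 6.0, 4.0], [6.0, 9.0, 6.0], [4.0, 6.0, 4.0]]
```

The corner outputs are smaller than the centre one because part of their window falls on the zero padding.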
The max-pooling layers in VGG19 reduce the spatial dimensions of the data. In particular, a 2 × 2 window slides over each feature map with a stride of 2, and the maximum value within each window represents the reduced data. As such, after a max-pooling operation, a 224 × 224 feature map is halved in each dimension to 112 × 112. These CNN operations, that is, convolution and max-pooling, are repeated until the feature maps of the last convolution block reach 14 × 14, as shown in Figure 1.
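The pooling step can be sketched in a few lines of pure Python, assuming a 2 × 2 window with a stride of 2 (the VGG setting); the function name `max_pool_2x2` is hypothetical. The second part simply repeats the halving to show how four pooling steps take a 224 × 224 map down to 14 × 14.

```python
def max_pool_2x2(fmap):
    """2x2 max-pooling with stride 2 on a 2-D list; each output value is the
    maximum of one non-overlapping 2x2 window (a trailing odd row/column is dropped)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 4, 6]]
print(max_pool_2x2(fmap))          # [[4, 5], [3, 7]]

size = 224
sizes = [size]
for _ in range(4):                 # four pooling steps take 224 down to 14
    size //= 2
    sizes.append(size)
print(sizes)                       # [224, 112, 56, 28, 14]
```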
After that, a final max-pooling layer reduces the 14 × 14 × 512 feature maps to 7 × 7 × 512, and a flattening operation reshapes them into a 1-D vector of 7 × 7 × 512 = 25,088 values.
The first FC layer in VGG19 takes this 1-D vector as input and feeds it to 4096 fully connected neurons. It should be noted that VGG19 contains two consecutive FC layers of the same size (4096 neurons each), as shown in Figure 1. Finally, a soft-max classifier performs the classification, so the input image is assigned to one of 1000 different classes, such as car or dog.
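The following sketch traces feature-map sizes and parameter counts through VGG19, assuming the standard published configuration: five convolution blocks with 2, 2, 4, 4, and 4 layers of 3 × 3 filters (64, 128, 256, 512, and 512 filters), size-preserving padding, a 2 × 2 max-pooling layer after each block, and FC layers of 4096, 4096, and 1000 neurons. Under that assumption, the flattened vector has 25,088 values and the network has roughly 143.7 million parameters, most of them in the first FC layer.

```python
# Standard VGG19 configuration: (number of 3x3 conv layers, filters) per block
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]
FC_SIZES = [4096, 4096, 1000]          # fc1, fc2, 1000-way soft-max output


def trace_vgg19(input_size=224, in_channels=3):
    """Print the feature-map shape after each block; return (flattened size, total params)."""
    size, channels, params = input_size, in_channels, 0
    for block, (n_convs, filters) in enumerate(VGG19_BLOCKS, start=1):
        for _ in range(n_convs):
            params += 3 * 3 * channels * filters + filters  # weights + biases
            channels = filters                               # padding keeps height/width
        pooled = size // 2                                   # 2x2 max-pool, stride 2
        print(f"block{block}: conv {size}x{size}x{channels} -> pool {pooled}x{pooled}x{channels}")
        size = pooled
    flat = size * size * channels                            # flattening
    features = flat
    for units in FC_SIZES:
        params += features * units + units                   # dense weights + biases
        features = units
    print(f"flatten: {flat} values, total parameters: {params:,}")
    return flat, params


flat, params = trace_vgg19()
# Last lines printed:
# block5: conv 14x14x512 -> pool 7x7x512
# flatten: 25088 values, total parameters: 143,667,240
```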

Transfer learning of pre-trained VGG19 CNN model
The basic idea of transfer learning is to take a pre-trained network, such as VGG19, and replace its last layer, that is, the soft-max classifier. The new classification layer is set according to the number of classes in the target problem. Finally, the model is re-trained with a new training set. This idea is illustrated in Figure 2.
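The head-replacement idea can be sketched without any deep-learning library by treating the network as a list of layer descriptions. Everything here is hypothetical and for illustration only: the `Layer` record, the layer names (which loosely follow a `blockN_convM` pattern), the flatten layer inserted before a truncated head, and the optional `freeze` flag (whether pre-trained layers are frozen during re-training is a design choice this sketch leaves open). The `cut_after` option shows how the same helper could keep only the first blocks of VGG19.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Layer:
    name: str
    kind: str              # "conv", "pool", "flatten" or "dense"
    units: int = 0         # filters for conv, neurons for dense
    trainable: bool = True


def vgg19_layers():
    """Hypothetical description of the VGG19 layer stack."""
    layers = []
    for b, (n_convs, filters) in enumerate([(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)], 1):
        layers += [Layer(f"block{b}_conv{c}", "conv", filters) for c in range(1, n_convs + 1)]
        layers.append(Layer(f"block{b}_pool", "pool"))
    layers += [Layer("flatten", "flatten"), Layer("fc1", "dense", 4096),
               Layer("fc2", "dense", 4096), Layer("predictions", "dense", 1000)]
    return layers


def transfer(layers, num_classes, cut_after=None, freeze=False):
    """Drop the original soft-max layer, optionally truncate after `cut_after`,
    and append a new soft-max layer with `num_classes` neurons."""
    body = layers[:-1]                                     # remove the 1000-way classifier
    if cut_after is not None:
        keep = [layer.name for layer in body].index(cut_after) + 1
        body = body[:keep] + [Layer("flatten", "flatten")]  # flatten before the new head
    body = [replace(layer, trainable=not freeze) for layer in body]
    return body + [Layer("new_softmax", "dense", num_classes)]  # new head is always trained


deep = transfer(vgg19_layers(), num_classes=2)
shallow = transfer(vgg19_layers(), num_classes=2, cut_after="block2_pool")
print(len(deep), len(shallow))              # 25 8
print([layer.name for layer in shallow])
```

Keeping the layer list separate from any framework makes the cut point explicit; in a real implementation the same decision would be made on the pre-trained model object.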
In this study, the performance of three different VGG19-based architectures is investigated: the shallow, medium, and deep models shown in Figure 3. For example, the shallow VGG19-based CNN model contains the first and second blocks of VGG19. In addition, the soft-max classifier is replaced with a new classifier with two classes, that is, two neurons. One soft-max neuron gives the probability that the input fingerprint is fake, while the second neuron gives the probability that it is real. It should be noted that the deep VGG19-based CNN model contains all layers of VGG19 except the original classification layer, which is replaced with two neurons as explained previously.
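The two-neuron output can be illustrated with a minimal soft-max sketch in Python. The neuron order (index 0 for fake, index 1 for real) and the example scores are assumptions for illustration; the sketch only shows how two raw outputs become two probabilities that sum to one, with the larger one deciding the label.

```python
import math

CLASSES = ("fake", "real")     # assumed order of the two output neurons


def softmax(logits):
    """Numerically stable soft-max: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the predicted label and the per-class probabilities."""
    probs = softmax(logits)
    return CLASSES[probs.index(max(probs))], probs


label, probs = classify([0.4, 2.1])    # hypothetical raw outputs of the two neurons
print(label, [round(p, 3) for p in probs])   # real [0.154, 0.846]
```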

Experimental analysis
This study uses the LivDet2009 database [14]. A few samples of real and fake images are shown in Figure 4. As described in [14], fake images were acquired from fingerprints cloned in silicone. In total, 1040 images were used for training and 2953 images for testing.
The analysis examined three VGG19-based architectures: the shallow, medium, and deep CNN models. Besides that, a new CNN model was created from scratch with the same architecture as the shallow model. Each CNN model in this experiment was trained on the same training set. Table 1 shows the outcomes for each model. As can be seen from the reported results, the CNN created from scratch produced the worst performance on all examined measures. This is due to the small number of training images, since building deep CNN models from scratch usually requires large training sets. On the other hand, the transferred shallow VGG19-based CNN model achieved the best performance in terms of accuracy, precision, recall, and F1 score. The deep model achieved the lowest performance among the transferred models because it generalizes less well than the shallow model.
Additional analysis was conducted by computing the confusion matrix for each model, as reported in Tables 2-5. As shown by the results, the highest number of true positives (TP) was achieved by the shallow model, which correctly classified 1308 positive cases and missed only 165. In addition, the shallow model produced the fewest false alarms, only 70 cases, as given in Table 3. This is attributed to the benefit of transfer learning and to the better generalization ability of the shallow model compared with the deeper CNN models.
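The measures in Table 1 follow from confusion-matrix counts. As a worked example, the sketch below plugs in the shallow-model counts quoted above (1308 true positives, 165 missed cases, 70 false alarms). The number of true negatives is not stated in the text; here it is assumed that every one of the 2953 test images appears in the confusion matrix, giving 2953 - 1308 - 165 - 70 = 1410. With that assumption the accuracy comes out at about 92%, consistent with the abstract; precision and recall do not depend on the assumed value.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


TEST_IMAGES = 2953
tp, fn, fp = 1308, 165, 70               # shallow-model counts quoted in the text
tn = TEST_IMAGES - tp - fn - fp          # assumption: all remaining test images are true negatives
acc, prec, rec, f1 = classification_metrics(tp, fn, fp, tn)
print(f"TN={tn} accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} F1={f1:.3f}")
# TN=1410 accuracy=0.920 precision=0.949 recall=0.888 F1=0.918
```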
Further analysis was conducted by computing the receiver operating characteristic (ROC) curve for each studied model. The ROC curves are shown in Figure 5: the shallow and medium models achieve very close results, while the worst performance is produced by the CNN model created from scratch.
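An ROC curve is obtained by sweeping a decision threshold over the classifier's scores and recording the false positive rate and true positive rate at each threshold; the area under the curve (AUC) summarizes it as a single number. The following minimal sketch computes both with the trapezoidal rule. The scores and labels are made up for illustration and are not the data behind Figure 5.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score.
    labels: 1 = positive class, 0 = negative class."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores, for illustration only
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1,    1,    0,    1,    1,    0,    0,    0]
pts = roc_points(scores, labels)
auc = auc_trapezoid(pts)
print(pts)
print(auc)   # 0.875
```

An AUC of 1.0 means every positive is scored above every negative, while 0.5 corresponds to random guessing.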
Finally, the area under the curve (AUC) was computed for each model, as given in Table 6. As can be seen, the AUC values of the transferred models exceed that of the CNN model created from scratch. This implies that transfer learning of pre-trained models is a good alternative to building a new CNN model from scratch, which requires a huge amount of training data.
Figure 6 visualizes the intermediate layers of the transferred VGG19 model for the three studied architectures, that is, the shallow, medium, and deep models. As can be seen in Figure 6, the fine details of the fingerprint disappear at deep layers. This is due to the max-pooling operations, which shrink the feature maps. This implies that shallow and intermediate layers produce better recognition results because they keep the content and details of the convolved input image, as shown in
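A toy example shows why fine detail fades with depth. In the sketch below, a hypothetical 8 × 8 pattern alternates ridge (1) and valley (0) columns one pixel apart; after a single 2 × 2 max-pooling step every window contains a ridge pixel, so the alternation disappears entirely. Real networks pool learned feature maps rather than raw pixels, so this only illustrates the loss of spatial resolution, not the exact behavior seen in Figure 6. The function name is hypothetical.

```python
def max_pool_2x2(fmap):
    """2x2 max-pooling with stride 2 on a 2-D list."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


# Hypothetical ridge pattern: 1 = ridge, 0 = valley, alternating every column
ridges = [[1 if j % 2 == 0 else 0 for j in range(8)] for _ in range(8)]
pooled = max_pool_2x2(ridges)
print(ridges[0])    # [1, 0, 1, 0, 1, 0, 1, 0]
print(pooled[0])    # [1, 1, 1, 1]  (ridge/valley alternation is gone)
```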

Conclusion
This chapter discussed transfer learning of a pre-trained VGG19 model to handle the problem of fingerprint liveness detection. Three different architectures of VGG19 were examined: shallow, medium, and deep CNN models. The reported results confirmed the better performance of the transferred VGG19 models compared with a CNN model created from scratch. Among the transferred VGG19 models, the shallow model showed the best performance in terms of accuracy, precision, recall, and F1 score.

Conflict of interest
The authors declare no conflict of interest.