Abstract
Many palmprint databases are now publicly available. However, not all of them provide the corresponding region of interest (ROI) images. If researchers test performance on their own extracted ROI images, the reported accuracies are not strictly comparable. ROI localization is a critical stage of palmprint recognition, and its precision has a significant impact on the final recognition accuracy, especially in unconstrained scenarios. This problem has limited the applications of palmprint recognition. Nevertheless, many published surveys focus only on feature extraction and classification methods, even though many new ROI localization methods have been proposed in recent years. In this chapter, we group the existing ROI localization methods into categories, analyze their basic ideas, reproduce some of the code, compare their performance, and suggest further directions. We hope this will be a useful reference for further research.
Keywords
- biometrics
- palmprint recognition
- palmprint database
- region of interest localization
- palm region segmentation
1. Introduction
Palm-related biometrics can easily reach high accuracy for two reasons. First, the palmprint contains plenty of features, such as principal lines, wrinkles, ridges and valleys, and minutiae points. Second, the regions of interest (ROIs) can be aligned with the help of the finger valley points. Since the captured palms may have different rotations and scales, the extracted palmprint images should be aligned with each other to obtain high accuracy. This means the palmprint region should be localized in a relative coordinate system, which is established based on the keypoints of the finger valleys. Most current palmprint recognition algorithms are based on the direction information of the palmprint lines and textures [1, 2]. Hence, misalignment significantly affects the final matching score. A robust and precise ROI localization method is essential for palmprint recognition, especially for touchless applications. Many organizations have collected palmprint databases for different research targets. More and more novel databases have appeared in recent years; some are captured across different devices, some under different illuminations, and some at different distances. In the following section, we review the current palmprint databases and ROI localization methods.
2. Palmprint databases and ROI localization methods
2.1 Comparisons of the current palmprint databases
Table 1 summarizes the current palmprint databases. Some basic information is compared. In Table 1, official ROI means whether the official ROI images are provided; localization code means whether the corresponding ROI extraction code is released. Some sample images of these databases are shown in Figure 1.
Dataset | No. of palms | No. of images | No. of sessions | Image size | Format | Official ROI | Localization code | Ref. | |
---|---|---|---|---|---|---|---|---|---|
PolyU | 386 | 7752 | 2 | BMP | [3, 4] | ||||
PolyU-MS | 500 | 24,000 | 2 | JPG | ✓ | [5, 6] | |||
CASIA | 624 | 5502 | 1 | JPG | [7, 8] | ||||
CASIA-MS | 200 | 7200 | 2 | JPG | [9, 10] | ||||
IITD | 470 | 2601 | 1 | BMP | ✓ | [11, 12, 13] | |||
PolyU-IITD | 1400 | 14,000 | — | JPG | — | [14] | |||
COEP | 167 | 1305 | 1 | JPG | [15] | ||||
KTU | 145 | 1752 | 1 | BMP | ✓ | [16, 17] | |||
GPDS | 100 | 2000 | 1 | BMP | ✓ | [18, 19] | |||
Tongji | 600 | 12,000 | 2 | TIFF | ✓ | [20, 21, 22] | |||
MPD | 400 | 16,000 | 2 | JPG | ✓ | ✓ | [23] | ||
NTU-CP-v1 | 655 | 2478 | 2 | 420 | JPG | [24] | |||
NTU-PI-v1 | 2035 | 7781 | — | 30 | JPG | ✓ | [24, 25] |
Table 1.
Comparisons of the publicly available palmprint datasets.

Figure 1.
Image samples of different databases. (a) PolyU Palmprint Database; (b) PolyU Multi-spectral Palmprint Database; (c) CASIA Palmprint Database; (d) CASIA Multi-spectral Palmprint Database; (e) COEP Palmprint Database; (f) GPDS100Contactlesshands2Band Palmprint Database; (g) KTU Touchless Palmprint Database; (h) NTU Contactless Palmprint Database; (i) NTU Palmprint Database from Internet; (j) IITD Touchless Palmprint Database; (k) PolyU-IITD Contactless Palmprint Database; (l) Tongji Palmprint Database; (m) Tongji MPD.
2.1.1 The Hong Kong Polytechnic University (PolyU) Palmprint Database
The PolyU Palmprint Database is the first publicly available palmprint database. It contains 7752 images captured from 386 hands in two sessions; around 10 images are collected for each palm in each session. The acquisition device is contact-based and consists of a high-quality industrial monochrome camera and a well-selected ring light source. The palm pose is also restricted by pegs, so the captured images are of high quality. However, the released image resolution,
2.1.2 PolyU Palmprint Multi-spectral Database
Authentication based only on RGB or gray images may not be safe enough, because fake palm images and videos can easily spoof the system. Hence, multi-spectral palmprint recognition has started to draw attention. Four spectrums (red, green, blue, and near-infrared (NIR)) are used to establish the PolyU Multi-spectral Palmprint Database. A contact-based device is employed to capture images from 250 volunteers in two sessions. In total, 24,000 images are captured from 500 palms; each palm contributes six images in each session under each spectrum. Our observation shows that the images captured under blue light have the highest sharpness, while those captured under NIR light have the lowest.
2.1.3 CASIA Palmprint Image Database
In the CASIA Palmprint Database, 5502 images are captured from 312 subjects, around 9 images for each palm. The authors built their own image acquisition device, which has a big enclosure and a black backboard. During capture, the user puts his/her hand into the enclosure with its back against the unicolor board; the ambient light is blocked by the enclosure. Hence, the palm images are captured in an ideal environment, but their sharpness is not very high. Besides, some palm images are captured with significant rotations, and some fingers have moved out of the imaging window. These factors make it difficult to localize the palmprint ROIs.
2.1.4 CASIA Multi-spectral Palmprint Image Database V1.0
CASIA-MS-Palmprint V1 is a touchless multi-spectral palmprint database collected under six spectrums in two sessions. The self-developed device uses the 460, 630, 700, 850, and 940 nm spectrums as well as white light. However, the image sharpness is not very high compared with the PolyU multi-spectral database, in which the palm is captured by a contact-based device. The database contains 7200 palm images captured from 200 different palms. In each session, under each light spectrum, three images are captured from each palm.
2.1.5 IIT Delhi (IITD) Touchless Palmprint Database version 1.0
The palm images in the IITD palmprint database are captured with large rotation variations. The touchless imaging setup consists of a big black box, a digital camera, and a circular fluorescent light source. It provides 2601 palm images collected from 470 hands, including 1301 left-palm images and 1300 right-palm images. The official ROI images have been normalized and thus show obvious principal lines and wrinkles.
2.1.6 PolyU-IITD Contactless Palmprint Images Database version 3.0
This database was collected over several years from volunteers in two countries, China and India, using a general-purpose handheld camera. In total, 14,000 images are captured from 1400 palms. The distinguishing characteristic of this database is that the images are collected across different locations, times, occupations, and age ranges. Both normal and abnormal hands are included (as is shown in Figure 1(k)).
2.1.7 COEP Palmprint Database
The palm images in this database have high resolutions. They are captured by Canon PowerShot SX120 IS, the image resolution is
2.1.8 GPDS100Contactlesshands2Band Database
Both a visible light camera and an infrared (IR) camera are used to collect 1000 visible light palm images and 1000 IR palm images from 100 volunteers. Each palm contributes 10 visible light images and 10 IR images. The user holds his/her palm over the camera without touching it and adjusts the position and pose of the hand so that it overlaps with the hand mask drawn on the device screen. The image sharpness is not very high. However, it is a meaningful database because the image quality is closer to that of images captured in real-world applications.
2.1.9 KTU CVPR Lab. Contactless Palmprint Database
The author made a new device by a low-cost camera to capture palm images with a resolution of
2.1.10 Tongji Palmprint Database
It was the biggest touchless palmprint database in 2017. The authors built a novel palmprint acquisition device that consists of a digital camera, a ring light source, a screen, and a vertical enclosure. This device can capture both visible light palmprint images and infrared palm vein images. During collection, the user's palm is put into the enclosure to avoid ambient light. At the same time, the upper screen shows the palm in real time, so that the user knows how to place his/her hand and when to stop and hold. In total, 12,000 images are captured from 600 hands in 2 sessions. For each palm, 10 palmprint images are collected in each session.
2.1.11 Tongji Mobile Palmprint Dataset
The device used for the Tongji Palmprint Database provides a stable environment for palmprint acquisition, which helps ensure the final recognition performance. However, the big enclosure also limits its applications. The Tongji group therefore collected another database with widely used mobile phones. The palm images are captured in a natural indoor environment. Two mobile phones are used, one from HUAWEI and one from Xiaomi. This dataset contains 16,000 palmprint images from 400 palms collected in 2 sessions. In each session, each mobile phone captures 10 images for each palm. All palm images are labeled, and the corresponding code is released on the author's homepage [23].
2.1.12 Xi’an Jiao Tong University (XJTU) Unconstrained Palmprint Database
The XJTU-UP databases [26] were collected with five mobile phones (iPhone 6S, HUAWEI Mate 8, LG G4, Samsung Galaxy Note5, and Xiaomi MI8), with and without the built-in flashlight. There are 100 volunteers; each palm provides 10 images for each phone under each illumination condition. In total, 20,000 images are captured (
2.1.13 Nanyang Technological University (NTU) Palmprint databases version 1
The NTU Palmprints from the Internet (NTU-PI-v1) database consists of 7781 hand images collected from the Internet. Hence, the palm images are captured in an uncontrolled and uncooperative environment. The images come from 2035 different palms of 1093 subjects of different ethnicity, sex, and age. Around four images are collected for each palm. It is the first large database established for studying palmprint recognition in the wild, but the image sharpness is relatively low compared with normal palmprint images. The NTU Contactless Palmprint Database (NTU-CP-v1) contains 2478 palm images captured from 655 palms of 328 subjects using Canon EOS 500D or NIKON D70s cameras, around four images for each palm. Currently, the samples for each category are relatively few compared with the other databases.
2.2 Related work on ROI localization
The most widely used ROI localization method was proposed in [4]. Its main idea is to first detect the keypoints of the finger valleys and then establish a local coordinate system based on the detected keypoints, so that the ROI coordinates are determined by the palm direction and position (as is shown in Figure 2). Most current ROI localization methods [10, 27, 28, 29, 30, 31, 32] are based on this strategy. The main problem of ROI localization is keypoint detection. There are two approaches to localizing the landmarks: one first segments the palm region and then searches for landmarks on the detected edges using digital image processing techniques; the other directly regresses the landmarks using both the hand shape and texture information.
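The local coordinate system described above can be sketched in a few lines. The snippet below is only an illustration, not the implementation from [4]: the function name `roi_corners`, the parameters `offset_ratio` and `side_ratio`, and their default values are assumptions made for this example, and the caller is assumed to order the two valley points so that the computed normal points toward the palm.

```python
import math


def roi_corners(p1, p2, offset_ratio=0.2, side_ratio=1.0):
    """Return the four corners of a square ROI defined by two valley points.

    p1, p2: (x, y) valley keypoints that span the reference line.
    offset_ratio: gap between the reference line and the ROI, as a fraction
        of the reference line length (hypothetical default).
    side_ratio: ROI side length as a fraction of the reference line length
        (hypothetical default).
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("the two valley points must be distinct")

    # Axes of the local coordinate system: u runs along the reference line,
    # n is perpendicular to it (assumed to point toward the palm).
    ux, uy = dx / length, dy / length
    nx, ny = -uy, ux

    side = side_ratio * length
    gap = offset_ratio * length
    half = side / 2.0

    # Midpoint of the ROI edge nearest to the fingers.
    mx, my = (p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0
    cx, cy = mx + gap * nx, my + gap * ny

    c0 = (cx - half * ux, cy - half * uy)
    c1 = (cx + half * ux, cy + half * uy)
    c2 = (c1[0] + side * nx, c1[1] + side * ny)
    c3 = (c0[0] + side * nx, c0[1] + side * ny)
    return [c0, c1, c2, c3]
```

Because the square is expressed in the axes derived from the two keypoints, rotating or scaling the hand rotates and scales the ROI with it, which is what makes the extracted regions comparable across images.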

Figure 2.
The classical keypoint detection methods for ROI localization. (a) Local-extremum-based keypoint detection for a palm sample with big rotation; (b) distance curve for fingertips and finger valley points detection; (c) line-scan-based keypoint detection method; (d) notations for ROI coordinates computation; (e) the normalized ROI image.
2.2.1 Classical methods
One important goal of the first strategy mentioned above is simplifying the background. There exist three approaches:
- Capture the palm with a unicolor backboard [4, 6, 8, 12, 21, 33]
- Employ an IR camera or a depth camera to capture an IR image or a depth image to assist palm segmentation [34, 10, 30, 35]
- Enhance the contrast of the foreground and background by setting a strong light source intensity and a short exposure time
Their target is enhancing the contrast of the palm region and the background. For example, in [28], the mobile phone’s built-in LED flash is utilized for palm segmentation. When the flash is turned on, the palm surface is much brighter than the background, because the palm is much closer to the camera than the background. The built-in auto-exposure control function of the image signal processor (ISP) on the camera chip will automatically decrease the exposure time to capture proper palm images; the palm region should fall into the proper grayscale range. As a result, the captured background is very dark.
After hardware and acquisition mode optimization, the palm region can be segmented by skin-color thresholding or Otsu-based methods [30, 31, 36, 37]. Maximum-connected-region detection is useful for removing background noise. After the palm region image is obtained, there are four approaches to detecting the valley points:
- Competitive valley detection algorithm [35], which traverses each contour pixel by testing and comparing its neighbor pixels' grayscale values. After palm segmentation, a binary palm image is obtained. Each pixel on the palm contour is tested in turn: taking the current contour pixel as the center point, 4, 8, and 16 testing points are placed around it, respectively. If the pixel values meet the predefined conditions in all three tests, a line is drawn from the center point toward the non-hand region. If this line does not cross any hand region, the center pixel is considered a valley location. The other candidate valley points are found in the same way.
- Line-scan-based methods [4, 27, 33]. After rotation normalization, the pixels are tested along a row or a column according to the specific hand orientation. In the segmented hand image, the hand region pixels are set to white, and the background pixels are set to black. Once the pixel value changes from white to black or from black to white, a keypoint of the finger contour is detected. The finger valleys can then be obtained by edge tracking (as is shown in Figure 2(c)).
- Local-extremum-based methods [30, 33, 38, 39, 40, 41]. As is shown in Figure 2(a), after selecting a start point, we calculate the distances between the start point and all the palm contour points to generate a distance curve. On the distance curve, the local maximum points correspond to fingertips, and the local minimum points correspond to finger valley points. The finger valleys can be segmented from the palm contour around the detected valley points. The tangent line of the two finger valleys can then be detected as the reference line.
- Convex hull-based methods [28, 42, 43]. The minimum convex polygon that encloses the palm contour is detected. Generally, the fingertips are vertices of the convex hull. The finger valleys and the valley points can then be obtained with the methods mentioned above.
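To make the local-extremum idea concrete, the following sketch builds a distance curve from a start point and an ordered list of contour points, then reports the indices of its strict local maxima (fingertip candidates) and minima (valley candidates). The function names are hypothetical, and the smoothing and plateau handling that real, noisy contours would need are deliberately left out.

```python
import math


def distance_curve(start, contour):
    """Distance from the start point to every contour point, in contour order."""
    sx, sy = start
    return [math.hypot(x - sx, y - sy) for x, y in contour]


def local_extrema(curve):
    """Return (maxima, minima) as index lists of strict interior extrema.

    Plateaus (equal neighbouring values) are ignored in this sketch.
    """
    maxima, minima = [], []
    for i in range(1, len(curve) - 1):
        if curve[i - 1] < curve[i] > curve[i + 1]:
            maxima.append(i)   # fingertip candidate
        elif curve[i - 1] > curve[i] < curve[i + 1]:
            minima.append(i)   # finger valley candidate
    return maxima, minima


# Toy contour: distances from the origin rise and fall like fingers and valleys.
toy_contour = [(0, 5), (0, 9), (0, 5), (0, 8), (0, 4), (0, 7), (0, 6)]
tips, valleys = local_extrema(distance_curve((0, 0), toy_contour))
```

In practice the curve is usually smoothed first, because every small bump on the contour otherwise produces a spurious extremum.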
Generally, after the four finger valley points are obtained, we need to know whether the hand is left or right so that the two desired valley points can be determined. To distinguish left from right hands, [35] uses geometric rules on the coordinates; [30] uses the valley areas, since the valley area between the thumb and index finger is generally bigger than that between the little and ring fingers; [41] trains a CNN classifier; and the method proposed in [44] does not need the left/right information.
Rotation normalization and scale normalization are two key problems in palmprint preprocessing. In [33], the authors analyzed the existing methods and provided optimized solutions for palm width detection and center point generation. Rotation normalization aims to rotate all the palms to a standard direction. Many methods have been proposed to determine the main direction of the palm. In [40], principal component analysis (PCA) is used to estimate the rotation angle of the palm. In [17, 44], the authors used the training set to learn a regression model that maps the landmarks' coordinates to the palm's main direction. Hence, after landmark detection, the palm direction can be obtained from the regression model. In [30], the line crossing the middle fingertip and the palm center point is treated as the palm's center line, and the palm's orientation is estimated from the line's slope.
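As an illustration of the PCA-based option, the principal axis of a set of 2D points (for example, the coordinates of all palm pixels) has a closed form in terms of the covariance entries. The sketch below is a generic PCA orientation estimate, not the exact procedure of [40]; the function name and the choice of returning degrees are assumptions, and the sign convention depends on whether the image y-axis points down.

```python
import math


def principal_angle(points):
    """Angle (degrees) of the first principal axis of a 2D point set.

    For the covariance matrix [[sxx, sxy], [sxy, syy]], the major axis
    direction is theta = 0.5 * atan2(2 * sxy, sxx - syy).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))
```

Rotating the image by the negative of this angle (plus whatever offset defines the standard direction) would bring the palm's main axis to that direction. Note that PCA gives an axis, not a direction, so a 180-degree ambiguity remains and has to be resolved by other cues such as the fingertip positions.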
The center point of the palm could be determined by different methods, such as the centroid of the palm region [40], the center point of the palm’s maximum inscribed circle [30], the point which reaches the maximum distance value after distance transform [45, 46], or the shift from the middle point of the palm width line detected based on heart line [33].
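The distance-transform option can be sketched as follows: every palm pixel is assigned its distance to the nearest background pixel, and the pixel with the largest value is taken as the center. This example uses a breadth-first search with city-block distance as a simple stand-in for the Euclidean distance transform, treats everything outside the image as background, and uses a hypothetical function name; it is not the implementation from [45, 46].

```python
from collections import deque


def palm_center(mask):
    """Return ((x, y), distance) of the palm pixel farthest from the background.

    mask: list of rows, truthy values mark palm pixels.
    Distances are city-block distances to the nearest background pixel.
    """
    h, w = len(mask), len(mask[0])
    # Pad with a background ring so the image border counts as background.
    height, width = h + 2, w + 2
    dist = [[0] * width for _ in range(height)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dist[y + 1][x + 1] = -1  # not yet visited

    queue = deque((y, x) for y in range(height) for x in range(width)
                  if dist[y][x] == 0)
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width and dist[ny][nx] == -1:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))

    best, best_xy = 0, None
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if dist[y][x] > best:
                best, best_xy = dist[y][x], (x - 1, y - 1)
    if best_xy is None:
        raise ValueError("mask contains no palm pixels")
    return best_xy, best
```

The returned distance plays the role of the radius of the largest inscribed shape around the center, which is why this approach and the maximum-inscribed-circle approach tend to agree on well-segmented palms.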
Once the hand rotation angle is known, the palm image can be normalized to the standard direction. The next step is scale normalization, which means determining the side length of the ROI. The work reported in [10, 17, 21, 30, 33, 47] uses the palm width to determine the size of the ROI, while the work reported in [4, 27, 28, 29] uses the length of the tangent line. In [30], the author found that a larger ROI performs better, perhaps because larger regions reduce the influence of misalignment. Here, we provide two examples to illustrate the whole process of ROI extraction.
In [27], the center block (
In [30], the palm is segmented from the IR palm image by the Otsu and maximum connected domain algorithms. The center point of the palm is then determined by the maximum inscribed circle. A start point is set to the right of the center point. Then, a two-phase keypoint detection method is used to detect the fingertips and valleys. First, the distance curve is generated from the start point and the palm contour points, and the fingertip of the middle finger is obtained. Based on this fingertip and the center point, a new reference point is generated to replace the start point used in the first phase. Then, with the palm orientation information, a new distance curve is generated. The precise fingertips and valley points are finally detected as the extremum points of this new distance curve. The tangent line of the valleys around the two detected valley points is obtained (as is shown in Figure 2(a)), and the palm region is scanned using lines parallel to the tangent line. Each line provides a palm width value, and the final palm width is their median. Last, the ROI is derived from the reference line and the palm width.
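Two steps of this example, Otsu thresholding and the median-based palm width, are easy to illustrate in isolation. The sketch below assumes an 8-bit grayscale image in which the palm is brighter than the background and has already been rotated so that the scan lines are image rows; the function names are hypothetical, and the maximum-connected-domain filtering used in [30] is omitted.

```python
from statistics import median


def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def palm_width(mask, rows):
    """Median palm width over the given scan rows of a binary mask."""
    widths = []
    for r in rows:
        xs = [x for x, v in enumerate(mask[r]) if v]
        if xs:
            widths.append(xs[-1] - xs[0] + 1)  # span between outer palm pixels
    if not widths:
        raise ValueError("no palm pixels on the scanned rows")
    return median(widths)


# Toy image: bright palm (200) on a dark background (20), one row per scan line.
image = [[200] * w + [20] * (10 - w) for w in (4, 5, 5, 6, 9)]
t = otsu_threshold([v for row in image for v in row])
mask = [[v > t for v in row] for row in image]
width = palm_width(mask, range(len(mask)))
```

Using the median rather than the mean keeps a single bad scan line, such as the widest row in the toy image, from distorting the width estimate.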
2.2.2 New-generation methods
The methods mentioned above are all based on traditional digital image processing techniques. Most of them use only the edge information of the palm. However, this is not sufficient, and it makes the algorithms sensitive to palm postures and background objects. In recent years, many new methods have been proposed, such as the active shape model (ASM)-based methods [48, 49], the active appearance model (AAM)-based methods [17, 29, 50], the regression tree-based methods [47], and the deep learning-based methods [24, 41]. The new-generation methods use both the edge and texture information to learn much more robust models for regressing the landmarks. The main stages of palmprint ROI localization are detecting the palm region in the whole image, regressing the landmarks, determining the palm orientation and width, establishing the local coordinate system, and computing the ROI locations.
In [17, 44], 25 hand landmarks are selected to form a shape, including 10 end points and 15 landmarks of the finger valleys and palm boundary. This shape covers the finger roots and the interdigital regions of the palm. With the AAM algorithm, both the hand shape and the palm texture information are used, and the shape and its landmark points automatically deform to fit the real hand contour. To evaluate the localization performance, the authors proposed a modified point-to-curve distance and a margin width metric. Since the initial position of the shape model is critical to the regression performance, the fitting process is divided into two stages. First, five rotations and five scale factors are used to generate 25 initial shapes. After regression, only the 15 shape models with the lowest reconstruction errors are passed to the second stage for fine-grained regression.
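The two-stage initialization can be pictured with a small sketch: a mean shape is rotated and scaled about its centroid for every combination of five angles and five scale factors, and the 15 candidates with the lowest error are kept. The specific angles, scale factors, and function names below are assumptions for illustration, and the error function is supplied by the caller; in [17, 44] it would be the AAM reconstruction error after the first-stage regression, which is not reproduced here.

```python
import math
from itertools import product


def transform(shape, angle_deg, scale):
    """Rotate and scale a list of (x, y) landmarks about their centroid."""
    cx = sum(x for x, _ in shape) / len(shape)
    cy = sum(y for _, y in shape) / len(shape)
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(cx + scale * (c * (x - cx) - s * (y - cy)),
             cy + scale * (s * (x - cx) + c * (y - cy)))
            for x, y in shape]


def initial_shapes(mean_shape, angles, scales):
    """One candidate shape per (angle, scale) combination."""
    return [transform(mean_shape, a, s) for a, s in product(angles, scales)]


def keep_best(shapes, error_fn, k=15):
    """Keep the k candidates with the lowest error."""
    return sorted(shapes, key=error_fn)[:k]


# Hypothetical values: five rotations and five scale factors give 25 candidates.
ANGLES = (-30, -15, 0, 15, 30)
SCALES = (0.8, 0.9, 1.0, 1.1, 1.2)
```

Starting from several poses makes the fit less dependent on a single initial guess, while pruning to the best candidates limits the cost of the second, fine-grained stage.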
In [41], the authors proposed a CNN framework based on LeNet [51] to detect the finger valley points. The proposed network involves convolutional layers and fully connected layers; the output is a six-dimensional vector corresponding to the three valley points between fingers excluding the thumb. In their work, two neural networks are designed: one is for identifying whether the hand is left or right, and another is for landmark localization. According to their experiments, the first network can perfectly identify the hand being a left or right hand, and the landmark localization performance is better than the classical method which is based on Otsu segmentation and Zhang’s ROI localization algorithm [4].
In [24], based on VGG-16 [52], the authors designed an end-to-end neural network to localize the hand landmarks, generate the aligned ROI, and do feature extraction and recognition tasks at the same time. The hand region is extracted from the original Internet image, and then it is resized to
In [47], the palm position is first detected using a sliding window, histogram of oriented gradient (HOG) features [53], and a support vector machine (SVM). In the training set, 14 landmark points are defined and labeled manually. After landmark point regression, the reference line is established by the two valley points. The position of the center point and the side length of the ROI are both determined by the palm width.
However, there still exist some challenging problems in ROI localization waiting for better solutions:
- Palm region segmentation under complex backgrounds
- Keypoint detection on palms with closed or incomplete fingers
- Keypoint detection on palms with big rotations
- Left and right hand detection on palms having long thumbs
- Palm scale (palm width) determination under various palm and finger poses in touchless scenarios
Compared with the new-generation methods, which require manually labeled landmarks and a trained regression model, the classical methods based on hardware and capture mode optimization and digital image processing algorithms are easier to use. They can also achieve high localization precision thanks to their strict imaging conditions. Hence, in this chapter we still use the classical methods to extract the ROIs for different palmprint databases.
3. The method
3.1 The ROI localization method
As discussed above, palmprint localization involves four main stages: (1) palm region segmentation, (2) palm contour and finger valley landmark detection, (3) ROI coordinate computation, and (4) abnormal detection. The method used in this chapter is modified from [4, 30]. For palm segmentation, Otsu-based methods achieve good results on IR images, but for visible light images, the segmentation results are disturbed by shadow regions on the palm surface. To achieve a high success rate of ROI localization, a skin-color-based classifier is used to separate the palm region. The main work of landmark localization is detecting the finger valleys between the index and middle fingers and between the ring and little fingers (as is shown in Figure 2(a) and (d)). For contact-based palmprint images, the palm position is restricted by the pegs, so the finger valleys can be easily localized by line-scan-based methods (as is shown in Figure 2(c)). The pixels are tested from top to bottom; once the value changes from white to black, point
3.2 Abnormal detection and iterative localization
The keypoint detection method described above is based on local vision. The algorithms know whether the predefined keypoints,

Figure 3.
Abnormal ROIs caused by complex background objects, difficult hand poses, and bad illuminations.
4. Experiments on different databases
In this section, the performance of the designed method is tested on different palmprint databases. More details and further updates can be found at [56]. The error rates of ROI localization are enumerated in Table 2. For IITD and COEP, the numbers in hard samples stand for PalmID_SampleID; for PolyU, the numbers stand for PalmID. After each experiment, the error cases are analyzed in detail.
Database | Subset | Error rate (%) | Hard samples
---|---|---|---
IITD | Left | 0.38 | 0037_0006, 0107_0002, 0152_0003, 0181_0001, 0209_0003
IITD | Right | 0.31 | 0137_0001, 0140_0004, 0204_0003, 0204_0004
COEP | | 0.31 | 0103_0004, 0145_0003, 0159_0006, 0164_0002
PolyU | | 0.54 | 0004, 0039, 0073, 0109, 0127, 0187, 0223, 0224, 0245, 0246, 0259, 0271, 0273, 0287, 0293, 0307, 0311, 0328, 0379
Table 2.
ROI localization results of different databases.
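As a quick consistency check, the error rates in Table 2 can be recomputed from counts stated elsewhere in this chapter: IITD has 1301 left-palm and 1300 right-palm images, COEP has 1305 images, and 42 of the 7752 PolyU images failed. For IITD and COEP, the failure counts below are assumed to equal the number of hard samples listed in the table; the helper name is hypothetical.

```python
def error_rate(failed, total):
    """Localization error rate in percent, rounded to two decimals."""
    return round(100.0 * failed / total, 2)


# (failed images, total images) per database subset.
COUNTS = {
    "IITD left": (5, 1301),
    "IITD right": (4, 1300),
    "COEP": (4, 1305),
    "PolyU": (42, 7752),
}

for name, (failed, total) in COUNTS.items():
    print(f"{name}: {error_rate(failed, total)}%")
```

All four recomputed values match the table, which also confirms that the PolyU row lists palm IDs rather than individual images: 19 IDs account for the 42 failed samples.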
4.1 ROI extraction for IITD Touchless Palmprint Database
Figure 4 shows the ROIs localized by the proposed method. Although the palm images in the IITD Touchless Palmprint Database are captured in a black box, it is still difficult to segment the palm region. The strong light source and the small enclosure lead to light reflections and ray occlusions, which generate many bright regions in the background and dark regions on the palm surface. Hence, the brightness information alone is not sufficient for segmenting the palm region, and the color information should also be used. As is shown in Figure 5, we randomly cropped some palm skin and background image patches to build a training set for segmentation. An SVM-based binary classifier is learned from the training dataset; the segmented palm region can be seen in Figure 6. For palm skin patches, both bright and dark regions are selected in order to learn a precise classification plane. In the color space, the palm can be easily segmented from the unicolor background. After palm region segmentation,

Figure 4.
ROI localization results on IITD. (a) Left hand; (b) right hand.

Figure 5.
Image patches cropped from the palm skin and the black box. (a) Patches of the palm surface; (b) patches of the black box.

Figure 6.
Images that could not be localized in the IITD database. (a) and (c) are samples for which finger detection failed; (b) and (d) are samples with ROI localization errors.
4.2 ROI extraction for COEP Palmprint Database
The line-scan-based keypoint detection method is used for COEP. Figure 7 shows the ROI localization results on the COEP database. Since the pegs used in the imaging setup may interfere with the keypoint detection algorithm, we delete them first. The pegs are green, blue, and yellow. After removing the bright yellow pixels in the image, we extract the red channel from the original RGB image and run the ROI localization algorithm on it. In this way, the green and blue pegs are removed automatically (as is shown in Figure 7). Results: as is shown in Figure 8, four images failed to be correctly localized. All four error cases are caused by closed fingers.
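The peg-removal step can be illustrated on a tiny RGB image stored as nested lists. The thresholds that define "bright yellow" and the function name are assumptions for this sketch; the chapter does not state the values actually used.

```python
def red_channel_without_pegs(image, high=180, low=100):
    """Return the red channel with bright yellow pixels set to zero.

    image: list of rows of (r, g, b) tuples with 8-bit values.
    high, low: hypothetical thresholds; a pixel counts as bright yellow
        when red and green are both >= high and blue is <= low.
    """
    result = []
    for row in image:
        out_row = []
        for r, g, b in row:
            is_yellow = r >= high and g >= high and b <= low
            out_row.append(0 if is_yellow else r)
        result.append(out_row)
    return result


# One row: skin, yellow peg, green peg, blue peg.
sample = [[(200, 150, 130), (230, 220, 40), (30, 180, 40), (20, 40, 200)]]
red = red_channel_without_pegs(sample)
```

Skin keeps a high red value, while green and blue pegs are already dark in the red channel, so only the yellow pegs (which are bright in red) need explicit masking before thresholding.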

Figure 7.
ROI localization result of the COEP database.

Figure 8.
Images that could not be localized in the COEP database. (a)–(d), (e)–(h), and (i)–(l) are the original, binary, and ROI localization images, respectively. The image ID of (a), (e), and (i) is 0103_0004; the image ID of (b), (f), and (j) is 0145_0003; the image ID of (c), (g), and (k) is 0159_0006; and the image ID of (d), (h), and (l) is 0164_0002.
4.3 ROI extraction for PolyU Palmprint Database
For the PolyU database, which contains 7752 images, the line-scan-based method is used to localize the ROI. In the end, 42 samples failed to be localized. As is shown in Figure 9, most failures are caused by small finger valleys (palm pose) and imperfect palm region segmentation (only grayscale information is available). In Table 2, only the palm ID is listed for the PolyU database.
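The core of a line scan is detecting where a scan line enters and leaves the hand region. The sketch below scans one column of a binary mask from top to bottom and records both kinds of transition; the function name is hypothetical, and the choice of scan column and the subsequent edge tracking toward the valley points are not shown.

```python
def column_transitions(mask, x):
    """Scan column x from top to bottom and return transition rows.

    mask: list of rows, truthy = hand (white), falsy = background (black).
    Returns (white_to_black, black_to_white), each a list of row indices
    of the first pixel after the change.
    """
    white_to_black, black_to_white = [], []
    for y in range(1, len(mask)):
        prev, cur = bool(mask[y - 1][x]), bool(mask[y][x])
        if prev and not cur:
            white_to_black.append(y)   # leaving a finger
        elif cur and not prev:
            black_to_white.append(y)   # entering a finger
    return white_to_black, black_to_white


# Toy column crossing three fingers separated by two gaps.
column = [0, 1, 1, 0, 0, 1, 1, 0, 1, 0]
toy_mask = [[v] for v in column]
leaving, entering = column_transitions(toy_mask, 0)
```

Each white-to-black row followed by the next black-to-white row brackets one gap between fingers, which is where edge tracking would start searching for the valley point.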

Figure 9.
Hard samples of PolyU.
5. Conclusions
The motivation of this chapter is to provide a uniform ROI localization method for extracting standard ROI images. This is very meaningful for comparing newly proposed feature extraction and identification algorithms. It can also lower the threshold of palmprint research for beginners, because preprocessing is complex and time-consuming. The method used in this chapter is not intended for real-world applications; it is only an ROI extraction tool for the publicly available databases. With this goal in mind, a simple method based on classical digital image processing and machine learning techniques is selected in this chapter.
Acknowledgments
We would like to thank all the volunteers who contributed their palm images to establish these palmprint databases and thank all the organizations who shared their databases. This work is supported in part by the Shenzhen Institute of Artificial Intelligence and Robotics for Society.