The face of an individual is a biometric trait that can be used in computer-based automatic security systems to identify or authenticate that individual. When a machine recognizes a face, the main challenge is to accurately match the input face with the image of the same person already stored in the system's face database. Not only computer scientists but also neuroscientists and psychologists take an interest in developing and improving face recognition. Its numerous applications relate mainly to security. Despite these applications, face recognition systems face challenges and have both strengths and weaknesses. The face image of a subject is the basic input of any face recognition system. Face images may be of different types, such as visual, thermal, sketch and fused images. A face recognition system suffers from some typical problems. For example, recognition from visual images performs poorly under illumination variations (such as indoor versus outdoor lighting, or low lighting) and under changes in pose, aging, disguise, etc. The main aim is therefore to tackle these problems to achieve accurate automatic face recognition. Many of these problems can be addressed using thermal images, or images fused from visual and thermal images. A fused image combines the information in both the visual and the thermal image and thus provides more detailed and reliable information, which helps in constructing a more efficient face recognition system. The objective of this chapter is to introduce the role of the different IR spectra, their applications, some interesting critical observations, the available thermal databases, review works, and some experimental results on thermal faces as well as on fused visual and thermal faces in the face recognition field, and finally to identify their limitations.
2. Thermal face recognition
A typical face image is a complex pattern consisting of hair, forehead, eyebrows, eyes, nose, ears, cheeks, mouth, lips, philtrum, teeth, skin, and chin. The human face has additional features such as expression, appearance, adornments, beard, moustache, etc. The face is the feature that best distinguishes a person, and there are "special" regions of the human brain, such as the fusiform face area (FFA), which, when damaged, prevent the recognition of the faces of even close family members. The patterns of specific organs, such as the eyes or parts thereof, are used in biometric identification to uniquely identify individuals.
Thermal face recognition deals with face recognition systems that take a thermal face image as input. The concept of thermal images is made clearer in the description that follows. Thermal images of the human face are generated from the body's heat pattern. Thermal infrared (IR) imagery is independent of ambient lighting conditions, as thermal IR sensors capture only the heat pattern emitted by the object. Different objects emit different ranges of infrared energy according to their temperature and characteristics. The range of human face and body temperature is nearly the same and quite uniform, varying from 35.5°C to 37.5°C, which provides a consistent thermal signature. The thermal patterns of faces are derived primarily from the pattern of superficial blood vessels under the skin. The vein and tissue structure of the face is unique to each person and, therefore, the IR images are also unique. Fig. 1 shows a thermal image alongside the corresponding visual image.
In Latin ‘infra’ means "below" and hence the name 'Infrared' means below red. ‘Red’ is the color of the longest wavelengths of visible light. Infrared light has a longer wavelength (and so a lower frequency) than that of red light visible to humans, hence the literal meaning of below red.
'Infrared' (IR) light is electromagnetic radiation with a wavelength between 0.7 and 300 micrometers, which equates to a frequency range between approximately 1 and 430 THz. IR wavelengths are longer than those of visible light, but shorter than those of terahertz radiation and microwaves.
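The wavelength and frequency limits quoted above are related by f = c / λ. The following minimal sketch checks the conversion; the function name is introduced here purely for illustration, and vacuum wavelengths are assumed.

```python
# Speed of light in vacuum in metres per second (exact SI value).
C = 299_792_458.0


def wavelength_um_to_thz(wavelength_um: float) -> float:
    """Convert a vacuum wavelength in micrometres to a frequency in terahertz."""
    wavelength_m = wavelength_um * 1e-6
    return C / wavelength_m / 1e12


# The two limits of the infrared range quoted in the text.
for wl in (0.7, 300.0):
    print(f"{wl:6.1f} um -> {wavelength_um_to_thz(wl):7.2f} THz")
```

The short-wavelength limit of 0.7 µm lands just below 430 THz and the long-wavelength limit of 300 µm lands at about 1 THz, in line with the approximate range given above.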
2.1.1. Infrared bands and thermal spectrum
Objects generally emit infrared radiation across a spectrum of wavelengths, but only a specific region of the spectrum is of interest because sensors are usually designed only to collect radiation within a specific bandwidth. As a result, the infrared band is often subdivided into smaller sections.
The International Commission on Illumination (CIE) recommended the division of infrared radiation into three bands namely, IR-A that ranges from 700 nm–1400 nm (0.7 µm – 1.4 µm), IR-B that ranges from 1400 nm–3000 nm (1.4 µm – 3 µm) and IR-C that ranges from 3000 nm–1 mm (3 µm – 1000 µm).
A commonly used sub-division scheme can be given as follows:
Near-infrared (NIR, IR-A DIN): This band spans 0.7-1.0 µm in wavelength, is defined by water absorption, and is commonly used in fiber optic telecommunication because of low attenuation losses in the SiO2 glass (silica) medium. Image intensifiers are sensitive to this area of the spectrum. Examples include night vision devices such as night vision cameras.
Short-wavelength infrared (SWIR, IR-B DIN): This band spans 1-3 µm. Water absorption increases significantly at 1,450 nm. The 1,530 to 1,560 nm range is the dominant spectral region for long-distance telecommunications.
Mid-wavelength infrared (MWIR, IR-C DIN) or intermediate infrared (IIR): This band spans 3-5 µm. In guided missile technology, the 3-5 µm band is the atmospheric window in which the homing heads of passive IR 'heat seeking' missiles are designed to work, homing on to the IR signature of the target aircraft, typically the jet engine exhaust plume.
Long-wavelength infrared (LWIR, IR-C DIN): This band spans 8-14 µm. This is the "thermal imaging" region, in which sensors can obtain a completely passive picture of the outside world based on thermal emissions only and require no external light or thermal source such as the sun, moon or an infrared illuminator. Forward-looking infrared (FLIR) systems use this area of the spectrum. It is sometimes also called "far infrared".
Very long-wave infrared (VLWIR): This band spans 14-1,000 µm.
NIR and SWIR are sometimes called "reflected infrared", while MWIR and LWIR are sometimes referred to as "thermal infrared". Due to the nature of the blackbody radiation curves, typical 'hot' objects, such as exhaust pipes, often appear brighter in the MW than in the LW.
Now, we can summarize the wavelength ranges of different infrared spectrums as in Table 1.
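As an illustration, the sub-division scheme listed above can be encoded as a small lookup. This is only a sketch: the half-open interval convention at the band edges and the names `IR_BANDS` and `classify_band` are assumptions made here, not part of the scheme itself.

```python
# Band limits in micrometres, taken from the sub-division scheme in the text.
# Each entry is (name, lower bound inclusive, upper bound exclusive); the
# handling of the exact boundary values is an assumption of this sketch.
IR_BANDS = [
    ("NIR", 0.7, 1.0),
    ("SWIR", 1.0, 3.0),
    ("MWIR", 3.0, 5.0),
    ("LWIR", 8.0, 14.0),
    ("VLWIR", 14.0, 1000.0),
]


def classify_band(wavelength_um):
    """Return the band name for a wavelength, or None if no band covers it."""
    for name, lower, upper in IR_BANDS:
        if lower <= wavelength_um < upper:
            return name
    return None


for wl in (0.85, 1.55, 4.0, 6.0, 10.0):
    print(wl, classify_band(wl))
```

Wavelengths between 5 and 8 µm fall into none of the listed bands, so the lookup returns `None` for them.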
Developments in infrared camera technology over the last decade have given computer vision researchers a whole new diversity of imaging options, particularly in the infrared spectrum. Conventional video cameras use photosensitive silicon that is typically able to measure energy at electromagnetic wavelengths from 0.4 µm to just over 1.0 µm. Multiple technologies, with decreasing cost and increasing performance, are currently available that are capable of image measurement in different regions of the infrared spectrum, as shown in Fig. 2.
At wavelengths of 3 µm and longer, the imaged radiation from objects becomes significantly emissive due to temperature, and is hence generally termed thermal infrared. The thermal infrared spectrum is divided into two primary spectra (Wolff et al., 2006): the MWIR and the LWIR. Between these spectra lies a strong atmospheric absorption band, between approximately 5 and 8 µm, where imaging becomes extremely difficult because air is nearly completely opaque. The range beyond 14 µm is termed the very long-wave infrared (VLWIR); it has received increased attention in recent years. The amount of emitted radiation depends on both the temperature and the emissivity of the material. Emissivity in the thermal infrared is conversely analogous to the notion of reflective albedo used in the computer vision literature (Horn, 1977, Horn et al., 1979). For instance, a Lambertian reflector can appear white or grey depending on its efficiency in reflecting light energy. The more efficient it is in reflecting energy (higher reflectance albedo), the less efficient it is in thermally emitting energy for its temperature (lower emissivity). Objects with a perfect emissivity of 1.0 are completely black. Many materials that are poor absorbers transmit most light energy while reflecting only a small portion. This applies to a variety of types of glass and plastics in the visible spectrum.
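The inverse relationship between reflectance and emissivity can be stated compactly. The following is a sketch under stated assumptions: the surface is in thermal equilibrium so that Kirchhoff's law of thermal radiation applies, and the symbols α (absorptivity), ρ (reflectance), τ (transmittance) and ε (emissivity) are introduced here for illustration.

```latex
\alpha(\lambda) + \rho(\lambda) + \tau(\lambda) = 1, \qquad
\varepsilon(\lambda) = \alpha(\lambda)
\;\;\Longrightarrow\;\;
\varepsilon(\lambda) = 1 - \rho(\lambda) \quad \text{when } \tau(\lambda) = 0 .
```

For an opaque surface, a higher reflectance therefore implies a lower emissivity, and an emissivity of 1.0 corresponds to zero reflectance, that is, a perfectly black object. A material that transmits most of the incident energy has a large τ, so both its absorptivity and its emissivity are small.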
Different types of thermal imaging are shown in Fig. 3.
2.1.2. Different types of thermal spectrums
Thermal face images are captured mainly in three spectral classes:
SWIR (Short-wave Infrared): Short-wave Infrared ranges from 1-3 µm (micro-meter/micron). SWIR has its own characteristics which are suitable for human face images. These characteristics are:
Light in the SWIR band is not visible to the human eye: The visible spectrum ranges from wavelengths of 0.4 microns (blue, nearly ultraviolet to the eye) to 0.7 microns (deep red). Longer wavelengths can only be seen by dedicated sensors, such as InGaAs. Although short-wave infrared light interacts with objects in much the same way as visible light does, the human eye cannot see it. SWIR light is reflective light; it bounces off objects much like visible light. As a result of its reflective nature, SWIR imagery has shadows and contrast. Images from an InGaAs camera are comparable to visible images in resolution and detail; however, SWIR images are not in color. This makes objects easily recognizable and yields one of the tactical advantages of SWIR, namely, object or individual identification (http://www.sensorsinc/whyswir).
InGaAs sensors can be made extremely sensitive: literally counting individual photons. Thus, when built as focal plane arrays with thousands or millions of tiny point sensors, or sensor pixels, SWIR cameras will work in very dark conditions (http://www.sensorsinc/whyswir).
SWIR Works at Night: An atmospheric phenomenon called night sky radiance emits five to seven times more illumination than starlight, nearly all of it in the SWIR wavelengths. So, with a SWIR camera and this night radiance (often called nightglow), we can "see" objects with great clarity on moonless nights and share these images across networks as no other imaging device can. Fig. 4 illustrates the clarity that SWIR cameras can achieve even at night.
Low Power: InGaAs cameras can be small and use very little power, but give good results. InGaAs found early applications in the telecom industry as it is sensitive to the light used in long distance fiber optics communications, usually around 1550 nm.
Ability to Image through Glass: One major benefit of SWIR imaging that is unmatched by other technologies is the ability to image through glass. In short, SWIR imaging is used mainly because of its high sensitivity, high resolution, ability to see by night glow (night sky radiance), day-to-night imaging, covert illumination, ability to see covert lasers and beacons, lack of need for cryogenic cooling, use of conventional low-cost visible-spectrum lenses, small size and low power.
These characteristics have made SWIR useful in the area of face recognition as well. Detecting disguises at border and immigration security checkpoints using short-wave infrared (SWIR) cameras is one such application in homeland defense (http://sensorsinc/border_security1). SWIR has the unique ability to distinguish artificial from natural materials. This helps to detect disguises: artificial hair appears much darker than natural hair, sometimes almost black. Fig. 5 shows how an actor in disguise appears when imaged in SWIR.
Translating this to a border crossing or checkpoint situation, one can assume that anyone wearing a disguise as they approach the border has some form of criminal or even hostile intent. Employed clandestinely, SWIR imaging can add a very valuable layer of protection in the defense of the homeland.
MWIR (Middle-Wave Infrared). MWIR ranges from 3-5 µm (micrometers / microns). For night-vision applications, SWIR imaging with InGaAs technology is complemented by thermal imaging cameras for MWIR and LWIR, in the form of uncooled microbolometers or cooled infrared cameras. Their thermal detectors only show the presence of warm objects against a cooler background. In combination with thermal images, SWIR cameras thus simplify the identification of objects that are more difficult to recognize in the thermal image alone. LWIR is discussed next.
LWIR (Long-Wave Infrared). LWIR ranges from 8-14 µm (micrometers / microns). The sensor elements of microbolometer cameras for LWIR are made of IR-absorbing conductors or semiconductors, whose radiation-dependent resistance is measured. Because polysilicon is also suitable as an absorber, the elements can also be made from polysilicon as MEMS and combined with evaluation circuits in CMOS technology. Fig. 6 shows an LWIR image.
2.1.3. Best thermal spectrum for face recognition purpose
The spectral distribution of energy emitted by an object is simply the product of the Planck distribution for a given temperature, with the emissivity of the object as function of wavelength (Siegal & Howell, 1981). In the vicinity of human body temperature (37 °C), the Planck distribution has a maximum in the LWIR around 9µm, and is approximately one-sixth of this maximum in the MWIR. The emissivity of human skin in the MWIR is at least 0.91, and at least 0.97 in the LWIR. Therefore, face recognition in the thermal infrared favors the LWIR, since LWIR emission is much higher than that in the MWIR.
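The claim about the Planck distribution near body temperature can be checked numerically. The sketch below assumes an ideal black body at 37 °C (310.15 K); the emissivity factors quoted above are not applied, the 4.5 µm sample point is an arbitrary representative of the MWIR band chosen here, and the function names are illustrative.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 299_792_458.0    # speed of light in vacuum, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def planck_radiance(wavelength_um, temp_k):
    """Black-body spectral radiance B(lambda, T) in W / (m^2 sr m)."""
    lam = wavelength_um * 1e-6
    exponent = H * C / (lam * K_B * temp_k)
    return (2.0 * H * C ** 2) / (lam ** 5 * math.expm1(exponent))


def peak_wavelength_um(temp_k, lo_um=1.0, hi_um=30.0, step_um=0.01):
    """Locate the radiance maximum by a simple grid search over wavelength."""
    n = int(round((hi_um - lo_um) / step_um))
    grid = [lo_um + i * step_um for i in range(n + 1)]
    return max(grid, key=lambda wl: planck_radiance(wl, temp_k))


SKIN_T = 310.15  # 37 degrees Celsius expressed in kelvin
peak = peak_wavelength_um(SKIN_T)
ratio = planck_radiance(4.5, SKIN_T) / planck_radiance(peak, SKIN_T)
print(f"peak near {peak:.2f} um; B(4.5 um) / B(peak) = {ratio:.2f}")
```

The maximum falls between 9 and 10 µm, inside the LWIR band, and the radiance in the middle of the MWIR band is a small fraction of that maximum, which is consistent with the argument that faces emit much more strongly in the LWIR.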
Thermal IR and particularly Long Wave Infra-Red (LWIR) imagery is independent of illumination since thermal IR sensors operating at particular wavelength bands measure heat energy emitted and not the light reflected from the objects. More importantly IR energy can be viewed in any light conditions and is less subject to scattering and absorption by smoke or dust than visible light. Hence thermal imaging has great advantages in face recognition under low illumination conditions and even in total darkness (without the need for IR illumination), where visual face recognition techniques fail.
3D position variations naturally give rise to 2D image variations (Friedrich & Yeshurun, 2002). IR-based face recognition is more invariant than CCD-based recognition under various conditions, specifically varying 3D head orientation and facial expressions. Changes in facial expression and head orientation cause direct 3D structural changes, as well as changes of shadow contours in CCD images, which degrade the accuracy of any classification method. In an IR image this effect is greatly reduced.
Anatomical features of faces, useful for identification, can be measured at a distance using passive IR sensor technology with or without the cooperation of the subject (Gupta & Majumdar). The thermal infrared (IR) spectrum comprises mid-wave infrared (MWIR) and long-wave infrared (LWIR), both of longer wavelength than the visible spectrum.
Accelerated developments in camera technology over the last decade have given computer vision researchers a whole new diversity of imaging options, particularly in the infrared spectrum. Conventional video cameras use photosensitive silicon that is typically able to measure energy at electromagnetic wavelengths from 0.4 µm to just over 1.0 µm. Multiple technologies, with dwindling cost and increasing performance, are currently available that are capable of image measurement in different regions of the infrared spectrum, as shown in Fig. 7. Fig. 8 shows the different appearances of a human face in the visible, short-wave infrared (SWIR), mid-wave infrared (MWIR) and long-wave infrared (LWIR) spectra. Although they lie in the infrared, the near-infrared (NIR) and SWIR spectra are still reflective, and differences in appearance between the visible, NIR and SWIR are due to reflective material properties. Both NIR and SWIR have been found to have advantages over imaging in the visible for face detection (Dowdall et al., 2002) and detecting disguise (Pavlidis & Symosek, 2000).
Thermal infrared imaging for face recognition first used MWIR platinum silicide detectors in the early 1990s (Prokoski, 1992). At that time, cooled LWIR technology was very expensive. By the late 1990s, uncooled micro bolometer imaging technology in the LWIR became more accessible and affordable, enabling wider experimental applications in this regime. At that time, cooled MWIR technology was about ten times more sensitive than uncooled micro bolometer LWIR technology, and even though faces are more emissive in the LWIR, in the late 1990s MWIR could still discern more image detail of the human face. At present, uncooled micro bolometer LWIR technology coming off the assembly lines is rapidly approaching one-half of the sensitivity of cooled MWIR. For face recognition in the thermal infrared, this is a turning point as for the first time the most appropriate thermal infrared imaging technology (i.e. LWIR) for studying human faces is also the most affordable.
2.2. Advantages of thermal face over visual face
Visual images are considered the best in some respects, such as making facial features easy to locate and extract. Another advantage is that visual cameras are less expensive. However, visual images have some problems of their own (Kong et al., 2005):
Visual images result in poor performance under illumination variations, such as indoor and outdoor lighting conditions.
They are not efficient enough to distinguish different facial expressions.
It is difficult to segment faces out of a cluttered scene.
Visual images are useless in very low lighting.
Visual images are unable to detect disguise.
The challenges are even more profound when one considers the large variations in the visual stimulus due to
viewing directions or poses,
disguises such as facial hair, glasses, or cosmetics.
As an alternative to the visible spectrum, recognition of faces using different multi-spectral imaging modalities, particularly infrared (IR) imaging sensors (Yoshitomi et al., 1997; Prokoski, 2000; Selinger & Socolinsky, 2001; Wolff et al., 2006; Wolff et al., 2001; Heo et al., 2004), has become an area of growing interest. To address the challenges that visual images pose for face recognition systems, thermal images are used for the following reasons (Kong et al., 2005):
Face (and skin) detection, location, and segmentation are easier when using thermal images.
Within-class variance is smaller.
Nearly invariant to illumination changes and facial expressions.
Works even in total darkness.
Useful for detecting disguises.
Fig. 9 shows that thermal face images of a human being are unaffected by illumination.
2.3. Some critical observations on thermal face
However, thermal imaging needs to solve some challenging problems, which are discussed next.
2.3.1. Identical twins
Although identical twins have somewhat different thermal patterns, there are exceptions too. The twins’ images in Fig. 10 are not necessarily substantially different from each other.
2.3.2. Exhale-inhale effect
High temporal frequency thermal variation is associated with breathing. The nose or mouth appears cooler as the subject inhales and warmer as he or she exhales, since exhaled air is at core body temperature, which is several degrees warmer than skin temperature. Fig. 11 shows different thermal images of the same person while exhaling and inhaling.
2.3.3. Metabolism effect
Symptoms such as alertness and anxiety can be used as a biometric, which is difficult to conceal as redistribution of blood flow in blood vessels causes abrupt changes in the local skin temperature. Thermal signatures can be changed significantly according to different body temperatures caused by physical exercise or ambient temperatures (Heo et al., 2005).
Fig. 11 shows comparable variability within data collected with an LWIR sensor. Each column shows images acquired in different sessions. It is clear that thermal emission patterns around the eyes, nose and mouth are rather different in different sessions. Such variations can be induced by changing environmental conditions. For example, when the face is exposed to cold or wind, capillary vessels at the surface of the skin contract, reducing the effective blood flow and thereby the surface temperature of the face. Conversely, when a subject moves from a cold outdoor environment to a warm indoor one, the reverse process occurs: capillaries dilate suddenly, flushing the skin with warm blood in the body’s effort to regain normal temperature.
Additional fluctuations in thermal appearance are unrelated to ambient conditions, but are rather related to the subject’s metabolism. Vigorous physical activity, consumption of food, alcohol or caffeine may all affect the thermal appearance of a subject’s face.
2.3.4. Effects of using glasses
Thermal images of a subject wearing eyeglasses may lose information around the eyes since glass blocks a large portion of thermal energy; in fact most of the thermal energy is blocked. Thermal imaging has difficulty in recognizing people inside a moving vehicle (because of speed, glass).
2.3.5. Liveness solution
A spoofing attack (or copy attack) is a fatal threat to biometric authentication systems. Liveness detection, which aims to recognize human physiological activity as an indicator of liveness in order to prevent spoofing attacks, is becoming a very active topic in fingerprint recognition and iris recognition. Liveness detection methods differentiate live human characteristics from characteristics coming from other sources. In the face recognition community, although numerous recognition approaches have been presented, the effort on anti-spoofing is still very limited. The anti-spoofing problem should therefore be well solved before face recognition can be widely applied in daily life. If a face image serves as part of a biometric identity, it is easy for someone to spoof that image and retrieve sensitive information by breaking into the victim's account. There are various methods to spoof an image. A photo attack is the cheapest and easiest spoofing approach, since one's facial image is usually easily available in public. Video spoofing is another big threat to face recognition systems, because a video is very similar to a live face and can be shot in front of the legal user’s face with a needle camera. A video carries many physiological clues that a photo does not have, such as head movement, facial expression, blinking, etc.
Thermal images can be a solution to the spoofing problem and to detecting live faces: a thermal sensor captures only emitted heat, so the thermal image generated from the heat emitted by a photograph or a video will be totally different from the thermal image of a real human face.
2.4. Drawbacks of thermal images
Though thermal images have been found to be useful for accurate recognition of human faces, they have some limitations.
Redistribution of blood flow due to alertness and anxiety causes abrupt changes in the local skin temperature.
Thermal calibration is mandatory, since ambient temperature or activity level may change thermal characteristics.
Energetic physical activity, consumption of food, alcohol, caffeine etc. may also affect the thermal characteristics.
During breathing, exhaling results in a big change in skin temperature.
Glasses block most thermal energy.
Not appropriate for recognition of vehicle occupants (because of speed, glass).
Thermal images have low resolution.
Thermal cameras are expensive.
2.5. Problems of face recognition solvable by thermal images
In contrast to visual face images, thermal face images have certain characteristics that enable them to handle some of the difficulties mentioned earlier. They do have limitations, however: problems such as pose variation and aging cannot be solved by thermal images. Table 2 describes the problems in face recognition that can or cannot be solved using thermal images.
2.6. Thermal face image databases
There are only two publicly available thermal face image databases: the IRIS (Imaging, Robotics and Intelligent System) Thermal/Visible Face Database and the Terravic Facial IR Database. Of the two, the IRIS database has almost all the features that should be present in a standard face database. A short overview of these two databases is given in Table 3.
|Name of the Database|No. of Subjects|Pose|Illumination|Facial Expression|Time|
|IRIS Thermal/Visible Database|30|11|6|3|1|
|Terravic Facial Infrared Database|20|3|---|1|1|
These two databases are publicly available benchmark datasets for testing and evaluating novel and state-of-the-art thermal face recognition algorithms. The benchmark contains videos and images recorded in and beyond the visible spectrum and is available free of charge to all researchers in the international computer vision community. It also allows participants in a large spectrum of IEEE and SPIE vision conferences and workshops to explore the benefits of the non-visible spectrum in real-world applications, contribute to the OTCBVS workshop series, and boost this research field significantly.
2.6.1. IRIS (Imaging, Robotics and Intelligent System) thermal/visual database
In the IRIS database, unregistered thermal and visible face images are acquired simultaneously under variable illumination, expressions, and poses. RGB color images of 30 individuals are available with three expressions: Exp1 (surprised), Exp2 (laughing) and Exp3 (anger). The resolution of each image is 320 x 240. The illumination types available in this database are left light on, right light on, both lights on, dark room, and left and right lights off, with varying poses such as left, right, mid, mid-left, and mid-right. Two different sensors are used to capture this database: a Raytheon Palm-IR-Pro for thermal images and a Panasonic WV-CP234 for visible images. Table 4 gives an overview of the IRIS database (http://cse.ohio-state.edu/otcbvs-bench/Data/02/download).
|No. of Subjects|Condition|No. of Conditions|Image Resolution|Image Type|Total Number of Images|
|30|Facial Expression|3|320 x 240|Thermal|1529|
Setup of the Camera. Fig. 13 shows the camera set up used for preparing the IRIS Thermal/Visible database.
It contains images of 30 individuals (28 men and 2 women).
Imaging and recording conditions: camera parameters, illumination setting, camera distance.
A total of 176-250 images per person are captured, with 11 images per rotation (poses for each expression and each illumination).
The database images are BMP color images, 320 x 240 pixels in size.
The subjects were recorded with 3 different expressions, Exp-1 (surprised), Exp-2 (laughing) and Exp-3 (anger), and under 5 different illuminations, Lon (left light on), Ron (right light on), 2on (both lights on), dark (dark room) and off (left and right lights off), with varying poses.
The size of this database is 1.83 GB.
The number of images available per class varies.
A total of 3058 images are available; 1529 images are thermal and the others are visual.
Not every class contains every type of illumination.
This database also has disguised faces. Samples of images with different facial expressions, different illumination conditions, and different disguises are given in Fig. 14, Fig. 15, and Fig. 16, respectively.
2.6.2. Terravic facial infrared database
The Terravic Facial Infrared database contains a total of 20 classes (19 men and 1 woman) of 8-bit grayscale JPEG thermal faces. The size of the database is 298 MB. Left, right and frontal face images are available, along with images with different accessories such as glasses and hats. Table 5 gives an overview of the Terravic Facial Infrared database (http://cse.ohio-state.edu/otcbvs-bench/Data/02/download).
Sensors Used. Raytheon L-3 Thermal-Eye 2000AS.
The database provides a total of 20 classes.
The images cover different variations: front, left, right; indoor, outdoor; glasses, hat, both.
The images are thermal, in JPEG format.
The size of the database is 298 MB.
The total number of images is 21,308.
Fig. 17 shows some sample images from the Terravic Facial Infrared database.
2.7. Some review work on thermal face recognition
Over the last few years, many researchers have investigated the use of thermal infrared face images for person identification to tackle illumination variation, facial hair, hairstyle etc. (Chen et al. 2003; Buddharaju et al., 2004; Socolinsky & Selinger, 2004, Singh et al., 2004, Buddharaju et al., 2007).
It has been found that face recognition across different expressions with visible-light imagery outperforms that with thermal imagery when both gallery and probe images are acquired indoors; however, if the probe image, or both the gallery and probe images, are acquired outdoors, the performance achievable with IR can exceed that with visible light. IR imagery represents a feasible substitute for visible imaging in the search for a robust and practical identification system. Leonardo Trujillo et al. proposed an unsupervised local and global feature extraction paradigm for the problem of facial expression recognition using thermal images (Trujillo et al., 2005).
They first localize facial features with a novel interest point detection and clustering approach, then apply PCA for feature extraction, and finally use support vector machines (SVM) for facial expression classification. For facial expression recognition (FER) they use the IRIS dataset. Their experimental results show that the performance of their FER system clearly degrades when classifying the “happy” expression of the dataset. Xin Chen et al. developed a face recognition technique with PCA. They used PCA to compare and combine infrared and visible images and to study the effects of lighting, facial expression change, and the time difference between gallery and probe images (Chen et al., 2003).
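To make the PCA feature-extraction step concrete, the sketch below estimates the leading principal component of a small data set by power iteration and projects a sample onto it. It is a generic illustration under stated assumptions, not the implementation used in the cited works: the function names and the toy data are hypothetical, and only the first component is computed.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every sample; return data and mean."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return centered, mean


def first_principal_component(rows, iterations=200):
    """Estimate the leading PCA direction by power iteration on X^T X."""
    centered, _ = center(rows)
    d = len(centered[0])
    # Arbitrary non-uniform start vector; a start that is exactly orthogonal
    # to the true component would not converge (a limitation of this sketch).
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        # Compute X^T (X v) without forming the covariance matrix explicitly.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        u = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            break
        v = [x / norm for x in u]
    return v


def project(sample, mean, component):
    """Scalar PCA feature: projection of the centered sample on the component."""
    return sum((s - m) * c for s, m, c in zip(sample, mean, component))


# Toy two-dimensional "images" lying along one direction (hypothetical data).
data = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [8.0, 4.0]]
_, mean = center(data)
pc1 = first_principal_component(data)
print(pc1, project([8.0, 4.0], mean, pc1))
```

In a face recognition setting each row would be a flattened face image, several leading components would be retained, and the resulting projections would be passed to a classifier.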
Socolinsky & Selinger (Socolinsky & Selinger, 2004) report performance statistics for outdoor face recognition and recognition across multiple sessions. A few experimental results with thermal images in face recognition are recorded in Table 6. All the results support the conclusion that face recognition performance with thermal infrared imagery is stable over multiple sessions.
3. Fused images
Another type of face image used in face recognition is the fused image. A fused image is composed from more than one source image, so the resulting image is more informative than any of the individual images. In face recognition, visual images are sometimes more helpful than thermal images. For example, variation in temperature does not affect visual images, whereas it does affect thermal images. On the other hand, thermal images are best for detecting liveness. The concept of fusion can therefore be applied in face recognition to improve person recognition: fusing thermal and visual images generates a new image that carries the information of both the thermal and the visual face. Fig. 18 shows a pictorial example of a fused image.
Image fusion methods mainly fall under two categories viz. spatial domain fusion and transform domain fusion. Many scientists have been involved in research and development in the area of data fusion for over a decade.
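One of the simplest spatial domain fusion rules is a pixel-wise weighted average of the two images. The sketch below illustrates that rule only; it is not presented as the fusion method used in this chapter. The function name, the weight `alpha`, and the tiny sample images are hypothetical, and the two images are assumed to be registered and of equal size already.

```python
def fuse_weighted(visual, thermal, alpha=0.5):
    """Pixel-wise weighted average: alpha * visual + (1 - alpha) * thermal.

    Both images are lists of rows of grey levels (0-255) and are assumed to
    be registered to each other and to have identical dimensions.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    same_shape = len(visual) == len(thermal) and all(
        len(v_row) == len(t_row) for v_row, t_row in zip(visual, thermal)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return [
        [int(round(alpha * v + (1.0 - alpha) * t)) for v, t in zip(v_row, t_row)]
        for v_row, t_row in zip(visual, thermal)
    ]


visual = [[0, 100], [200, 255]]
thermal = [[100, 100], [0, 255]]
print(fuse_weighted(visual, thermal, alpha=0.5))
```

Setting `alpha` to 1.0 returns the visual image and 0.0 returns the thermal image; intermediate values trade the contribution of one modality against the other.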
3.1. Different types of fusion techniques
Generally, three types of fusion techniques can be used in face recognition:
3.1.1. Feature level fusion
Before features from the source data can be merged, they must first be extracted from each source. Fusion at feature level involves the integration of feature sets corresponding to different sensors. These feature vectors are often fused to form joint feature vectors from which the classification is made. The first hurdle for feature level fusion is effective feature detection. Once features are selected, the role of feature-level fusion is to establish boundaries in feature space and separate patterns belonging to different classes. Thus two main issues are involved: feature detection and the use of distance metrics for clustering. In general, signal features can be classified into the following 3 categories:
Time Domain Features that describe waveform characteristics (slopes, amplitude values, maxima/minima and zero crossing rates) and statistics (mean, standard deviation, energy, kurtosis, etc).
Frequency Domain Features (periodic structures, Fourier coefficients, spectral density)
Hybrid Features that cover both time and frequency domains (Wavelet representations, Wigner-Ville distributions, etc.)
Since the feature set contains richer information about the raw biometric data than the match score or the final decision, integration at this level is expected to provide better recognition results. However, fusion at this level is difficult to achieve in practice for the following reasons:
The feature sets of multiple modalities may be incompatible, for example the minutiae set of fingerprints and the eigen-coefficients of faces.
The relationship between the feature spaces of different biometric systems may not be known.
Concatenating two feature vectors may result in a feature vector with very large dimensionality leading to the ‘curse of dimensionality’ problem.
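The joint feature vector described above can be illustrated with a minimal sketch. It assumes that features have already been extracted from each sensor, and it normalizes each vector before concatenation so that neither sensor dominates because of its scale. The function names and the sample values are hypothetical and serve only as illustration.

```python
import math

def z_normalize(features):
    """Scale a feature vector to zero mean and unit variance."""
    n = len(features)
    mean = sum(features) / n
    variance = sum((f - mean) ** 2 for f in features) / n
    std = math.sqrt(variance) or 1.0  # fall back to 1.0 for constant vectors
    return [(f - mean) / std for f in features]

def fuse_features(visual_features, thermal_features):
    """Concatenate normalized per-sensor vectors into one joint vector."""
    return z_normalize(visual_features) + z_normalize(thermal_features)

# Hypothetical feature vectors from two sensors on very different scales
visual = [120.0, 135.0, 150.0]
thermal = [0.2, 0.4, 0.9, 0.5]
joint = fuse_features(visual, thermal)
print(len(joint))  # length is len(visual) + len(thermal)
```

The length of the joint vector is the sum of the input lengths, which is why concatenation quickly runs into the dimensionality problem noted above.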
3.1.2. Decision level fusion
Decision level fusion combines the results from multiple algorithms to yield a final fused decision. Decision level fusion is generally based on a joint declaration of multiple single source results (or decisions) to achieve an improved classification or event detection. At the decision level, prior knowledge and domain specific information can also be incorporated. Widely used methods for decision level fusion include the following:
Bayesian inference (MacKay, Information Theory, Inference, and Learning Algorithms).
Classical inference: computing a joint probability given an assumed hypothesis, usually using Maximum A Posteriori (MAP) or maximum likelihood decision rules.
Decision-level fusion schemes can be broadly categorized according to the type of information the decision makers output. Abstract-level fusion algorithms fuse individual experts that produce only class labels. In this category, plurality voting is the most commonly used scheme; it simply outputs the class label that receives the most votes.
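Plurality voting over class labels can be sketched in a few lines. The example assumes that each expert (for instance, a classifier working on the visual image and others working on the thermal image) has already produced one label. The tie-breaking rule, which favours the label seen first, is an assumption of this sketch and is not specified in the text.

```python
from collections import Counter

def plurality_vote(labels):
    """Return the class label chosen by the largest number of experts.

    Ties go to the label that appears first in `labels` (an assumption).
    """
    if not labels:
        raise ValueError("at least one expert decision is required")
    counts = Counter(labels)
    # most_common keeps first-seen order among labels with equal counts
    return counts.most_common(1)[0][0]

# Hypothetical decisions from three experts
decisions = ["subject_07", "subject_12", "subject_07"]
print(plurality_vote(decisions))
```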
3.1.3. Pixel/data level fusion
Pixel level fusion is the combination of the raw data from multiple source images into a single image. It combines information into a single image from a set of image sources using pixel, feature or decision level techniques.
The task of interpreting images, either visual images alone or thermal images alone, is an under-constrained problem. A thermal image can at best yield estimates of surface temperature, which in general are not specific enough to distinguish between object classes. The features extracted from visual intensity images also lack the specificity required for uniquely determining the identity of the imaged object.
The interpretation of each type of image thus leads to ambiguous inferences about the nature of the objects in the scene. The use of thermal data gathered by an infrared camera, along with the visual image, is seen as a way of resolving some of these ambiguities. Thermal images are obtained by sensing radiation in the infrared spectrum. The radiation sensed is either emitted by an object at a non-zero absolute temperature, or reflected by it. The mechanisms that produce thermal and visual images are different from each other, and the thermal image produced by an object’s surface can be interpreted to identify these mechanisms. Thus, thermal images can provide information about the object being imaged which is not available from a visual image (Yin & Malcolm, 2000).
A great deal of effort has been expended on automated scene analysis using visual images, and some work has been done on recognizing objects in a scene using infrared images. However, there has been little effort on interpreting thermal images of outdoor scenes based on a study of the mechanisms that give rise to the differences in the thermal behavior of object surfaces in the scene. Nor has much effort been made to integrate information extracted from the two imaging modalities.
In image fusion, pixel data from, for example, 70% of the visual image and 30% of the thermal image of the same class or the same subject is brought together into a common operating image, now commonly referred to as a Common Relevant Operating Picture (CROP) (Hughes, 2006). This implies that an additional degree of filtering and intelligence is applied to the pixel streams to present pertinent information to the user. Image pixel fusion thus has the capacity to enable seamless working in a heterogeneous work environment with more complex data. Accurate and effective face recognition requires more informative images. An image from one source may lack some information that is available in an image from another source (e.g. visual). If the features of both face images can be combined, efficient, robust, and accurate face recognition can be developed.
Ideally, the fusion of common pixels can be done by pixel-wise weighted summation of visual and thermal images (Horn & Sjoberg, 1979), as below:
$$F(x, y) = a(x, y)\,V(x, y) + b(x, y)\,T(x, y)$$
where F(x, y) is the fused output of a visual image, V(x, y), and a thermal image, T(x, y), while a(x, y) and b(x, y) represent the weighting factors for the visual and thermal images respectively.
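The pixel-wise weighted summation can be sketched as follows. This is a minimal illustration, not the implementation used in the cited work. It assumes that the two images are already registered to the same size, that each image is a list of rows of grey levels, and that the weights are constant over the image (a = 0.7, b = 0.3, following the 70%/30% split mentioned earlier) rather than varying per pixel.

```python
def fuse_pixels(visual, thermal, a=0.7, b=0.3):
    """Compute F(x, y) = a * V(x, y) + b * T(x, y) with constant weights."""
    same_shape = len(visual) == len(thermal) and all(
        len(v_row) == len(t_row) for v_row, t_row in zip(visual, thermal)
    )
    if not same_shape:
        raise ValueError("images must be registered to the same size")
    return [
        [a * v + b * t for v, t in zip(v_row, t_row)]
        for v_row, t_row in zip(visual, thermal)
    ]

# Hypothetical 2 x 2 grey-level images
V = [[100, 200], [50, 0]]
T = [[0, 100], [50, 255]]
F = fuse_pixels(V, T)
print(F)
```

Choosing a + b = 1 keeps the fused grey levels within the range of the inputs; per-pixel weights a(x, y) and b(x, y) would replace the two constants.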
Inspired by the characteristics of thermal and fused images that help to solve different problems of face recognition, we have used thermal and fused images in our research work so far. This section briefly discusses some important observations and outcomes of that work.
To handle the challenges of face recognition, which include pose variations, changes in facial expression, partial occlusions, variations in illumination, rotation through different angles, and changes in scale, two techniques have been applied. In the first method, log-polar transformation is applied to the fused images obtained by fusing visual and thermal images, whereas in the second method fusion is applied to the individually log-polar transformed visual and thermal images. Log-polar transformed images are capable of handling the complications introduced by scaling and rotation. The second method has shown better performance, with a maximum correct recognition rate of 95.71% and an average of 93.81% on the Object Tracking and Classification Beyond Visible Spectrum (OTCBVS) database (Bhowmik et al., April 11 – 15, 2011).
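The reason log-polar images cope with scaling and rotation is that a rotation about the image centre becomes a shift along the angle axis, and a change of scale becomes a shift along the log-radius axis. The sketch below resamples a grey-level image onto a log-polar grid. It is only an illustration under stated assumptions: the grid size, the choice of the image centre as origin, and nearest-neighbour sampling are all assumptions, and the function name is hypothetical.

```python
import math

def log_polar(image, n_rho=32, n_theta=64):
    """Resample a 2-D image (list of rows) onto a log-polar grid.

    Rows of the output index log-radius, columns index angle.
    Samples that fall outside the image are left at 0.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    max_radius = math.hypot(cx, cy)
    if max_radius <= 1.0:
        raise ValueError("image is too small for a log-polar grid")
    log_step = math.log(max_radius) / n_rho
    output = [[0] * n_theta for _ in range(n_rho)]
    for i in range(n_rho):
        # Radii grow exponentially, so rows are evenly spaced in log(r)
        radius = math.exp(log_step * (i + 1))
        for j in range(n_theta):
            angle = 2.0 * math.pi * j / n_theta
            x = int(round(cx + radius * math.cos(angle)))
            y = int(round(cy + radius * math.sin(angle)))
            if 0 <= x < width and 0 <= y < height:
                output[i][j] = image[y][x]
    return output

# Hypothetical 9 x 9 image of constant grey level
flat = [[5] * 9 for _ in range(9)]
lp = log_polar(flat)
print(len(lp), len(lp[0]))
```

In the two methods described above, this transform would be applied either to the fused image or to the visual and thermal images separately before fusion.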
In another experiment, the aim was to recognize thermal face images using line features, with a Radial Basis Function (RBF) neural network as the classifier. The proposed method works in three steps. In the first step, line features are extracted from thermal polar images and feature vectors are constructed from these lines. In the second step, the feature vectors thus obtained are passed through eigenspace projection to reduce their dimensionality. Finally, the images projected into eigenspace are classified using the RBF neural network.
Verification and identification experiments were performed on the OTCBVS database; the maximum success rate is 100%, and the average is 94.44% (Bhowmik et al., April 25 – 29, 2011a).
To achieve a better recognition rate, an image fusion technique based on the weighted average of Daubechies wavelet transform (db2) coefficients from visual face images and their corresponding thermal images has been applied. Both PCA and ICA have been applied separately for dimension reduction (Bhowmik et al., April 25 – 29, 2011b). The resulting fused images have then been classified using a multi-layer perceptron (MLP). Experimental results show that ICA architecture-I performs better than the other two approaches, i.e. PCA and ICA-II. The average success rates for PCA, ICA-I and ICA-II are 91.13%, 94.44% and 89.72% respectively. However, the approaches presented here achieve a maximum success rate of 100% in some cases, especially under varying illumination, on the IRIS Thermal/Visual Face Database.
Thermal images minimize the effect of illumination changes and of occlusion due to moustaches, beards, adornments etc. The training and testing sets of thermal images are registered in polar coordinates, which can handle the complications introduced by scaling and rotation; the polar images are then projected into eigenspace and finally classified using a multi-layer perceptron. Verification and identification performance improves significantly, and the success rate is 97.05% on the OTCBVS database (Bhowmik et al., 2008).
Fused images generated from visual and thermal ones are projected into eigenspace and finally classified using a radial basis function neural network, which is useful for recognizing unknown individuals, with a maximum success rate of 96% on the Object Tracking and Classification Beyond Visible Spectrum (OTCBVS) database (Bhowmik et al., 2009).
A biometric serves as a unique identity for each person where security and authentication are concerned. Based on machine learning algorithms, a general framework has been designed to show the effectiveness of a biometric system using different levels of pixel fusion. One of the most popular feature extraction algorithms, together with a multilayer perceptron (MLP) for classification, has been used on the OTCBVS database, leading to an optimal recognition result (Bhowmik et al., 2011).
When only one-sided semi-profile thermal images of an individual are available, a mosaicing technique can be applied to build an apparent 2-D frontal view of that person (Majumder et al., 2011). The available semi-profile image is converted into a mirror image, which resembles the opposite semi-profile thermal image of that person, because the human face is approximately symmetric. Simple concatenation of these two images (the original side view image and its mirror, or opposite side view, image) therefore produces an apparent 2-D frontal thermal mosaiced image of that person. This mosaiced image can then be input to any simple thermal face recognition system.
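The mirror-and-concatenate step can be sketched as below. This is a simplified illustration rather than the procedure of the cited work: it assumes the available image is a list of pixel rows, that it has already been cropped so that its last column lies on the facial midline, and it ignores any pose correction. The function name is hypothetical.

```python
def mosaic_from_half(half_face):
    """Build an apparent frontal view from one half-face image.

    Each row is joined with its own left-right mirror, so the output
    is twice as wide as the input and symmetric about its centre.
    """
    return [row + row[::-1] for row in half_face]

# Hypothetical 2 x 3 half-face image
half = [[1, 2, 3],
        [4, 5, 6]]
for row in mosaic_from_half(half):
    print(row)
```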
As part of future work on the aforesaid thermal face mosaicing, the mosaiced faces are being tested for face recognition, where we check the similarity between mirror face images and normal face images. Here thermal and visual faces are fused together, mosaicing is then applied to construct an apparent or complete frontal view of an individual, and the mosaiced faces are finally classified using a support vector machine classifier.