Open access peer-reviewed chapter - ONLINE FIRST

Latest Advancements in Perception Algorithms for ADAS and AV Systems Using Infrared Images and Deep Learning

Written By

Suganthi Srinivasan, Rakesh Rajegowda and Eshwar Udhayakumar

Submitted: 05 October 2023 Reviewed: 11 October 2023 Published: 07 December 2023

DOI: 10.5772/intechopen.1003683


From the Edited Volume

Digital Image Processing - Latest Advances and Applications [Working Title]

Dr. Francisco Javier Cuevas


Abstract

The perception system plays an important role in advanced driver assistance systems (ADAS) and autonomous vehicles (AV), enabling them to understand the surrounding environment and navigate through it. Achieving accurate perception for the ego vehicle that mimics human vision is highly challenging. Available ADAS and AV solutions are able to perceive the environment to some extent using multiple sensors such as lidars, radars and cameras. Crash reports for ADAS and AV systems published by the National Highway Traffic Safety Administration show that complete autonomy is difficult to achieve with the existing sensor suite. In particular, extreme weather, low-light and night scenarios call for additional perception sensors, and the infrared camera appears to be one of the most promising sensors for addressing such extreme and corner cases. This chapter discusses the advantages of adding infrared sensors to perceive the environment accurately and how advancements in deep learning can be leveraged to enhance ADAS features. The limitations of current sensors, the need for infrared sensors and technology, artificial intelligence and current research using IR images are also discussed in detail. The literature indicates that adding an IR sensor to the existing sensor suite may pave the way toward achieving level 3 and above autonomous driving.

Keywords

  • perception
  • ADAS
  • autonomous vehicle
  • visible image
  • infrared image
  • automotive sensors
  • deep learning
  • object detection

1. Introduction

Recent developments in sensor technology, processors and the computer vision algorithms used to process captured data have led to the rapid development of perception solutions for Advanced driver assistance systems (ADAS) and autonomous vehicles (AV). The drive for autonomous vehicles was triggered by the DARPA Grand Challenge. The US Armed Forces and the Defense Advanced Research Projects Agency (DARPA) conducted a robotic challenge aimed at developing unmanned autonomous systems that could eventually replace human drivers in combat zones and hazardous areas without remote operators. Several DARPA Grand Challenges were organized to develop the technology for fully autonomous ground vehicles through collaboration across diverse fields. The first challenge took place in 2004, when 15 self-driving vehicles competed to navigate around 228 km of desert near Primm, Nevada. None of the teams succeeded due to the technological hurdles involved. The second event was held in 2005 in southern Nevada, where teams competed to navigate 212 km; with improved technology, five teams completed the course and Stanford University's Stanley won the prize money. In 2007, the third event took place in an urban environment and is commonly known as the DARPA Urban Challenge. Here, the teams needed to showcase autonomous driving capability in live traffic scenarios and perform complete maneuvers, including braking and parking. The Boss vehicle from Carnegie Mellon University won first prize and the Junior vehicle from Stanford University claimed second prize [1]. Since the DARPA Grand Challenges, accurate perception and increasingly autonomous navigation have become one of the most active fields in research and industry. The Society of Automotive Engineers (SAE) international standard defines six levels of automation from ADAS to fully autonomous driving, as shown in Figure 1. Up to level 3, the driver must be present to take control of the vehicle whenever needed. Levels 4 and 5 allow fully autonomous driving with and without a driver, respectively [3].

Figure 1.

Levels of automation according to SAE standards [2].

Worldwide, a large share of major accidents is attributed to human error, often with fatal outcomes. Considering the safety of drivers and passengers, ADAS systems have been targeted by leading manufacturers to support drivers during unpredictable circumstances. The most common ADAS features include lane departure warning, forward collision warning, high beam assist, traffic sign recognition, adaptive cruise control and so on. ADAS systems are semi-autonomous driving concepts that assist drivers while driving. The objective is to automate, adapt and enhance safety by reducing human errors. A fully autonomous vehicle is capable of sensing the environment and navigating without human intervention under all environmental circumstances. Here, the vehicle perceives the environment, reasons about it to take decisions and controls itself autonomously, similar to a human driver [4]. The major components of autonomous vehicles include sensing, perception, decision making such as path and motion planning, and actuation through steering and braking controls.

Like a human driver, ADAS and AV systems aim to perceive the environment using various sensors, which help them navigate autonomously. Passive sensors such as visible cameras and active sensors such as lidars, radars and ultrasonic sensors are the most commonly preferred sensors for perceiving the surroundings. Multiple sensors of different types are configured so that the complete 360° surrounding environment can be perceived.

Figure 2 shows a state-of-the-art ADAS solution and the sensors used to perceive and derive information from the environment. This article focuses mainly on accurate perception of the environment using various sensors, including vision, lidar, radar and ultrasonic sensors. Vision sensors are used either as monocular and/or stereo cameras. They are mostly used for 2D/3D object and pedestrian detection, and for lane, parking slot, traffic sign and traffic signal detection and recognition. Both 2D and 3D laser scanners (lidar) are precise in determining an object's position, orientation and dimensions. All moving and static objects, including buildings and surroundings along with road markings and kerbs, can be detected using laser scanners. Automotive lidars are capable of scanning the environment both vertically and horizontally, covering anything from a limited field of view to a full 360° view. Radars, on the other hand, generate accurate position and relative velocity information for the detected objects. Each sensor exhibits its own advantages and disadvantages [3, 4]. For example, in adverse weather conditions, the camera sensor may not perform as accurately as the radar sensor. Similarly, the camera sensor lacks depth estimation, whereas radars and lidars estimate depth more accurately. Individual sensors may not be sufficient to achieve level 2 and above ADAS solutions. Hence, to achieve level 3 and above autonomy, multiple sensor modalities in different configurations are beneficial [3, 4, 5]. The fusion of complementary and redundant information from various sensors helps generate the complete perception required by ADAS and AV systems [4, 6]. In the automotive industry, many original equipment manufacturers (OEMs) and Tier-1 suppliers are extensively concentrating on the research and development of successful autonomous driving concepts and level 3 and above ADAS. Waymo, Uber, Tesla, Zoox, etc. are a few of the companies engaged in autonomous vehicle research and development. Every autonomous vehicle is equipped with one or more automotive sensors, including cameras, lidars and radars, to sense the environment as a perception system, supported by guidance and navigation systems. Manufacturers and operators of ADAS (level 2 and above) and AV systems need to report crashes to the US agency as per a general standing order issued by the National Highway Traffic Safety Administration (NHTSA) in 2021 [2]. An average of 14 accidents per month have been reported since July 2021, with a maximum of 22 and a minimum of 8 crashes in a month, for AV-equipped vehicles (level 3 and above) still in development. Similarly, an average of 44 crashes per month have been reported since July 2021, with a maximum of 62 and a minimum of 26 in a month, for level 2 ADAS-equipped vehicles. These collision reports show that the on-road experience of supposedly mature ADAS and autonomous driving technology still needs improvement to match human driver perception. Numerous major and minor crashes are reported by self-driving cars on test drives. This shows that the existing sensor suites claimed to achieve level 4 autonomous driving lack performance, especially during extreme weather conditions, dark/night scenarios, glare and so on. This calls for more robust sensors to achieve fully autonomous driving irrespective of environmental conditions.

Figure 2.

State-of-the art-ADAS sensors.

Recent advancements in artificial intelligence (AI) have had a significant impact on the fast development and deployment of level 3 and above ADAS and AV solutions. In particular, to generate precise information about the surrounding environment, large volumes of data from different sensing modalities and advanced computing resources play a key role in making AI an essential component of ADAS and AV perception systems [7]. Extensive research and development is currently invested in analyzing the effective use of AI in various AV functions such as perception, planning and control, localization and mapping, and decision making.

In this article, the limitations and challenges of the existing sensor suite, the need for infrared cameras, infrared technology, the applications of AI in the development of perception systems and the need for a multi-sensor fusion strategy are presented in detail. Recent research on infrared sensors and deep learning approaches for ADAS and AV systems is also discussed.


2. Perception sensors for ADAS and AV systems

In ADAS and AV systems, sensors are the equivalent of the human driver's eyes and ears for sensing and perceiving the environment. Different aspects of the environment are sensed and monitored using various types of sensors, and the information is shared with the driver or the electronic control unit. This section introduces commonly used automotive sensors and their functionality in achieving level 2 and above ADAS and AV solutions. Figure 3 shows a representative image of a vehicle equipped with perception sensors for one or more ADAS and AV solutions [9].

Figure 3.

Representative figure of AV/ADAS vehicle with various perception sensors. Figure from [8].

2.1 Vision sensors

RGB or visible cameras are the most commonly used sensors in ADAS and AV systems due to their low cost and easy installation. Normally, more than one camera is used to capture the complete environment. Images captured by vision sensors are processed by an embedded system to detect, analyze, understand and track various objects in the environment. The captured images are rich in information such as color, contrast, texture and detail, features that are unique compared to other sensors. Visible cameras are used either as a single-lens (monocular) camera or as a two-lens stereo setup. Monocular cameras are low cost and require less processing power. They are commonly used for object detection and classification, lane and parking line detection, traffic sign recognition, etc. However, monocular cameras lack distance/depth information compared to active sensors; a few techniques exist to estimate distance, but they are not as accurate as ADAS and AV systems require for autonomous maneuvering. Stereo cameras, on the other hand, are useful for extracting depth or distance information, as the system consists of two lenses separated by a baseline, resembling the two eyes of a human. Such systems detect and classify objects along with depth/distance information more accurately than monocular cameras. Compared to other automotive sensors, depth estimation using stereo cameras is reliable only over short distances, up to about 30 m [10]. Autonomous vehicles demand accurate distance estimation at far greater distances, especially on highways (Figure 4) [3, 10].

Figure 4.

Common adverse scenarios where current ADAS/AV sensor suite struggles to perform.

2.1.1 Lidar

LiDAR stands for light detection and ranging. It is an active sensor that emits a laser beam which is reflected by objects in the scene. The time elapsed between the emitted laser pulse and its reflection gives the distance to the object. These sensors are capable of generating high-resolution 3D point clouds and operate at longer ranges than vision sensors. They can also generate 360° 3D images surrounding the ego vehicle with accurate depth information. In recent autonomous vehicles, the lidar sensor plays a major role in driving the vehicle autonomously by generating accurate and precise environment perception. ADAS functions such as autonomous braking, parking solutions, collision avoidance and object detection can be achieved with higher accuracy using lidar sensors. The major drawbacks of lidars are their bulky size and high cost. Also, extreme weather conditions such as rain and fog can impact their performance. Due to the latest advancements in semiconductor technology, significantly smaller and less expensive lidars may become available in the future [3, 10].

2.1.2 Radar

Radar stands for radio detection and ranging. It is an active sensor that works on the principle of the Doppler effect. Radars emit microwave energy and measure the frequency difference between the emitted and reflected beams in order to estimate the speed and distance of the object from which the energy is reflected. Radar can detect objects at longer distances than lidar and vision sensors, and it performs consistently in all weather conditions, including extremes such as rain and fog. Radars are classified as short-, medium- and long-range sensors. Short- and medium-range sensors are mostly used for blind spot detection and cross-traffic alert and are mounted at the corners of the vehicle. Long-range radars are mostly used for adaptive cruise control and are mounted near the front/rear bumpers [3, 10].
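To make the ranging principles above concrete, the sketch below computes distance from a lidar time-of-flight measurement and radial velocity from a radar Doppler shift. The numeric inputs are hypothetical examples chosen only to illustrate the formulas, not values from the cited sources.

```python
# Illustrative sketch: lidar time-of-flight ranging and radar Doppler velocity.
# Input values are hypothetical examples, not vendor specifications.

C = 299_792_458.0  # speed of light, m/s

def lidar_range(round_trip_time_s: float) -> float:
    """Distance = c * t / 2, since the laser pulse travels to the target and back."""
    return C * round_trip_time_s / 2.0

def radar_radial_velocity(doppler_shift_hz: float, carrier_freq_hz: float) -> float:
    """Radial velocity from the Doppler shift: v = f_d * c / (2 * f_c)."""
    return doppler_shift_hz * C / (2.0 * carrier_freq_hz)

if __name__ == "__main__":
    print(f"Lidar target at ~{lidar_range(666e-9):.1f} m")                       # ~100 m
    print(f"Radar target closing at ~{radar_radial_velocity(4.6e3, 77e9):.1f} m/s")  # 77 GHz automotive radar
```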

2.1.3 Ultrasonic sensors

Ultrasonic sensors are active sensors that use sound waves to measure the distance between the ego vehicle and surrounding objects. They are most commonly used to detect objects close to the vehicle, such as kerbs, especially in parking spaces [3, 10].

Other sensors such as GPS and IMU are also used in most ADAS and AV use cases, mainly for measuring the position and localizing the ego vehicle. Table 1 summarizes the advantages and disadvantages of the various sensors used in ADAS and AV systems.

Sensor | Advantages | Disadvantages
Radar | Long range | Low accuracy and resolution
 | Works in poor visibility | Mutual interference of radars
 | Consumes less power | Poor azimuthal and elevation resolution
 | Robust to failure | Error-prone object classification
 | Small, lightweight and affordable | No surround-view perception
Lidar | Long range | Expensive and high maintenance
 | Surround-view perception | Transmission is sparse
 | Good accuracy and resolution | Affected by varying climatic conditions
 | No significant interference | Poor small-object detection
Ultrasonic | Low cost | Only short-range distances
 | Small in dimensions | Sensitive to temperature
 | Higher resolution at short ranges | Prone to interference and reverberation
 | Overcomes pedestrian occlusion problems | Picks up noise from the environment
Camera | High resolution and color scales | Requires powerful computation
 | Surround view and 3D information | Poor distance estimation
 | Low cost and maintenance | Sensitive to adverse conditions
 | Small size and easy to deploy | Inaccurate during low light
Infrared | Works in all light conditions | Demanding computation resources
 | Sensing range can cover up to 200 m | Classification issues in cold conditions
 | Better vision through dust, fog and snow | Challenging to detect and classify

Table 1.

List of advantages and disadvantages of various sensors for ADAS/AV perception systems [11].

2.1.4 Challenges with existing sensor suite

Human-perceivable sunlight is only a small part of the solar irradiance spectrum, which contains approximately 5% ultraviolet, 43% visible and 52% infrared wavelengths [12]. At night, streetlamps and headlamps are the primary sources of light. However, the lighting pattern of vehicle headlamps is strictly regulated for safety reasons: the range of low and high beams can only vary from 60 m to 150 m. The visibility of targets of interest varies with the light reflections from their surfaces. Diffuse reflection is observed when the surface is rough, like asphalt, clothing or wood, and specular reflection is observed when the surface is mirror-like, wet, epoxy or metallic. As shown in Figure 4, adverse conditions such as direct sun glare, dense fog, heavy rain, high-beam glare, surface reflections and low light cause light reflections that reduce visibility for RGB cameras.

Figure 5.

Electromagnetic spectrum: representation of the IR range.

Existing ADAS solutions in the market predominantly use vision and ultrasonic sensors to achieve level 1 features, such as warnings and alerts for traffic lights and obstacle detection while reversing the vehicle. To achieve level 2 and above ADAS features, vision, radar and ultrasonic sensors are used either as individual sensor modalities or in combination. For example, a vision sensor can generate accurate detection and classification of on-road and static objects, whereas a radar sensor can generate accurate position, velocity and distance of objects relative to the vehicle. These two pieces of information are fused to generate a combined representation of the detected objects with their position, velocity and class. Each object is thus represented with richer information so that the guidance and navigation module can make proper decisions and plans for the vehicle to move autonomously. Fusion of information from multiple sensor modalities implies a combined representation of complementary and redundant information that represents the environment more precisely. It also extends the feasibility of ADAS and AV systems functioning in all weather and lighting conditions [6]. However, the sensor fusion described above is best suited to address challenges during daylight and is not fully adequate at night. The sensor combinations in the latest ADAS and AV solutions are still unable to generate an environment perception as precise as that of a human driver. The latest standing general order crash report [2], available on the National Highway Traffic Safety Administration web page, clearly shows that the existing sensor suite is not sufficient to achieve level 4 and above autonomous driving. The system fails especially during adverse weather, low-light and dark scenarios, extreme sun glare and so on. After the accident of the autonomous Uber car in 2018 [13], the research community started considering including the infrared sensor in the ADAS sensor fusion suite. This shows that there is a need for another sensor that can complement the existing information, especially during extreme weather and lighting conditions. To enable level 4 or 5 ADAS functionality with zero human intervention, it is necessary to make the system more robust to various weather and lighting conditions [11, 14].

2.1.5 Need for infrared sensors for ADAS and AV

Most state-of-the-art approaches use camera and lidar sensors as the major sensing modalities for object detection, mostly because these sensors provide dense image pixel information or high-density point cloud data. However, lidar and radar sensors are costly and computationally expensive. Vision sensor-based perception algorithms depend on the brightness and contrast of the captured images. Vision sensors are cost-effective and use either image processing or deep learning-based techniques for object detection. Due to limited image features, CNN-based algorithms can detect objects only under good lighting conditions or above a minimum lux level. Moreover, vision sensor-based approaches often fail under daytime glare, night-time glare, fog, rain and strong or direct sunlight. Performance may also degrade in poorly lit environments such as dark scenes, dams, tunnels and parking garages. Similarly, contamination of the camera lens and increased scene complexity, such as detecting pedestrians in crowded environments, are always challenging for any vision-based perception algorithm [15].

Apart from the challenges mentioned above, vision sensors are also limited in range. Reasonable performance can be expected up to about 20 meters using visible images; hence, recognition of distant objects is limited. Vision sensors also struggle to sense the environment precisely in low light, fog and rain. These limitations can be partially overcome by adding sensors such as lidars and radars that can detect objects at long distances even in fog and rain. Recent accidents involving Uber and Tesla autonomous vehicles indicate that a sensor suite comprising vision sensors, lidars and radars is not sufficient, especially for detecting cars and pedestrians during extreme weather and lighting conditions. A robust perception algorithm is expected to work in all lighting and adverse weather conditions, such as those shown in Figure 4. Hence, the ADAS/autonomous driving sensor suite requires a night-vision-capable sensor, such as an infrared or thermal camera. Infrared (IR) sensors look promising, especially in extreme conditions such as poor lighting, night, bright sun glare and inclement weather. IR sensors are capable of classifying vehicles, pedestrians, animals and other objects in common driving conditions. They perform equally well in daylight and dark scenarios, and they outperform the other sensors used for ADAS and AV applications in low-light and dark conditions [15].


3. Infrared sensor technology

Infrared falls within the electromagnetic spectrum, with wavelengths longer than the visible spectrum and shorter than radio waves; it ranges from 0.75 μm to 1000 μm. Any object with an absolute temperature above 0 K radiates infrared energy. This radiation is a measure of internal energy due to the acceleration of electrically charged particles, and hotter objects radiate more energy. It is invisible to human eyes but is sensed as warmth on the skin. The electromagnetic spectrum and the IR range are shown in Figure 5. An infrared sensor is an electronic device capable of emitting and/or detecting infrared radiation within the IR range. It works on three basic laws of physics [16, 17, 18]:

  1. Planck's law of radiation—It states that any object whose temperature is not equal to absolute zero (0 K) emits radiation

  2. Stefan-Boltzmann law—It states that the total energy emitted by a black body at all wavelengths is proportional to the fourth power of its absolute temperature

  3. Wien's displacement law—It states that objects at different temperatures emit spectra whose peaks occur at wavelengths that are inversely proportional to the temperature.
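For reference, the standard textbook forms of these three laws can be written as follows; these expressions are supplied here for clarity and do not appear in [16, 17, 18] in exactly this notation.

```latex
% Spectral radiant exitance, total exitance and peak wavelength of a black body
\begin{align*}
  M_\lambda(\lambda, T) &= \frac{2\pi h c^2}{\lambda^5}\,\frac{1}{e^{hc/(\lambda k T)} - 1}
      && \text{(Planck)} \\
  M(T) &= \sigma T^4, \quad \sigma \approx 5.67\times10^{-8}\ \mathrm{W\,m^{-2}\,K^{-4}}
      && \text{(Stefan-Boltzmann)} \\
  \lambda_{\max} &= \frac{b}{T}, \quad b \approx 2898\ \mathrm{\mu m\,K}
      && \text{(Wien)}
\end{align*}
```

For human skin at roughly 300 K, Wien's law gives λ_max ≈ 2898/300 ≈ 9.7 μm, consistent with the LWIR band discussed later in this chapter.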

Figure 6 shows the radiant exitance of a perfect black body according to Planck's law; the peak wavelength is inversely proportional to the temperature, as per Wien's displacement law.

Figure 6.

Radiant exitance of a black body according to Planck's law [16].

Based on their operating principles, IR sensors are broadly classified into thermal and photonic detectors. In thermal detectors, the infrared radiation emitted by objects is absorbed and converted into heat, which in turn produces a change in resistance or a thermo-electromotive force from which the output is extracted. Photonic (quantum) detectors use the photoconductive and photovoltaic effects in semiconductors and PN junctions. For ADAS and AV applications, thermal detectors are the most widely accepted. Another common classification divides IR detectors into cooled and uncooled detectors based on operating temperature. Based on the detector's construction, they can be further classified into single, linear or array detectors. The most commonly used arrangement is the focal plane array (FPA), which consists of multiple single detectors arranged in a matrix [16, 17, 18]. A further classification of IR sensors is based on the operating frequency band, as IR detectors operate in bands where maximum transmission with minimal absorption is possible [16, 17, 18]. They are generally classified as:

  1. NIR—Near infrared radiation falls in the range from 0.75 μm to 1 μm

  2. SWIR—Short wave Infrared Radiation falls in the range between 1 μm and 2.5 μm

  3. MWIR—Mid-wave infrared radiation falls in the range between 3 μm and 5 μm

  4. LWIR—Long Wave Infrared Radiation falls in the range between 8 μm and 12 μm

A pictorial representation of the IR sensor types based on the operating frequency band is shown in Figure 7, and Table 2 summarizes the specifications of the various IR sensor types. NIR and SWIR cameras are known as 'reflective infrared' cameras: like RGB cameras, they require an external light source for illumination. NIR is mainly used in in-cabin applications for driver monitoring systems, whereas SWIR provides more contextual information, such as lane markings and traffic signs. Unfortunately, SWIR camera applications remain uncommon due to the high cost of indium gallium arsenide (InGaAs) detectors [19]. LWIR cameras are commonly referred to as 'thermal infrared' cameras as they operate solely on thermal emission and do not require any external illumination source [20].

Figure 7.

Infrared sensor types based on operational frequency bands.

Specification | NIR | SWIR | LWIR
Known as | Reflected IR | Reflected IR | Thermal IR
Wavelength (μm) | 0.7–1.4 | 1.4–3 | 8–14
Imager | Silicon-based | InGaAs detectors | Photon detectors
Detection range | Short | Short | Long (165 m)
Usage | Night vision | Feature extraction | Heat detection
Applications | Driver monitoring | Road marking detection | Pedestrian and object detection

Table 2.

Details of various types of infrared images used in automotive industry.

Representative images from visible (RGB), near-infrared (NIR), short-wavelength infrared (SWIR), mid-wavelength infrared (MWIR) and long-wavelength infrared (LWIR) cameras are shown in Figure 8 [16, 17, 18].

Figure 8.

Representative images for visible camera (RGB), near-infrared (NIR), short-wavelength infrared (SWIR), mid-wavelength infrared and long-wavelength infrared (LWIR) cameras [20].

The basic components of an infrared imaging system are shown in Figure 9. The system measures the IR radiation emitted from objects and converts it into electrical signals using an IR detector. The converted signal is transformed into a temperature map, taking ambient and atmospheric effects into account. The temperature map is displayed as an image, which can be color-coded to produce thermograms (thermal or IR images) using an imaging algorithm. The IR detector acts as a transducer that converts radiation into electrical signals. Microbolometer detectors are the most widely used IR detectors as they can operate at room temperature. A microbolometer is basically a resistor with a very small heat capacity and a high negative temperature coefficient of resistivity. IR radiation received by the detector changes the resistance of the microbolometer and produces a corresponding electrical output. Considering the ambient and atmospheric effects between the object and the IR detector, an infrared measurement model is used to convert the detected heat map into a temperature map. For FLIR thermal cameras, the measurement model depends on the emissivity of the object, the atmospheric and ambient temperature, the relative humidity and the distance between the object and the detector; it varies between manufacturers and detector characteristics. The measured temperature map is visually represented as a grayscale image. For industrial applications, pseudo color-coded images, called thermograms, can also be generated by the thermal imaging system to represent differences in temperature distribution more clearly. Figure 10 shows a representative RGB image and the corresponding IR thermal image used in ADAS and AV applications [17].
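To make the measurement model above concrete, the following sketch applies a simplified radiometric compensation: the measured signal is treated as a mix of object emission, reflected ambient radiation and atmospheric emission, weighted by emissivity and atmospheric transmission, and the object temperature is recovered with a Stefan-Boltzmann approximation. This is an illustrative assumption, not the exact band-limited calibration model used by FLIR or any other manufacturer.

```python
# Simplified radiometric model sketch (hypothetical, band-integrated):
#   measured = eps*tau*W(T_obj) + (1 - eps)*tau*W(T_refl) + (1 - tau)*W(T_atm)
# Real cameras use band-limited calibration curves; Stefan-Boltzmann is used
# here only to keep the example self-contained.

SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def exitance(temp_k: float) -> float:
    """Total exitance of a black body at temp_k (Stefan-Boltzmann approximation)."""
    return SIGMA * temp_k ** 4

def object_temperature(measured: float, emissivity: float, transmission: float,
                       t_reflected_k: float, t_atmosphere_k: float) -> float:
    """Invert the simplified measurement model to recover the object temperature."""
    w_obj = (measured
             - (1.0 - emissivity) * transmission * exitance(t_reflected_k)
             - (1.0 - transmission) * exitance(t_atmosphere_k))
    w_obj /= emissivity * transmission
    return (w_obj / SIGMA) ** 0.25

if __name__ == "__main__":
    # Forward-simulate a 310 K pedestrian seen through hazy air, then invert.
    eps, tau, t_refl, t_atm = 0.98, 0.85, 295.0, 293.0
    measured = (eps * tau * exitance(310.0)
                + (1 - eps) * tau * exitance(t_refl)
                + (1 - tau) * exitance(t_atm))
    print(f"{object_temperature(measured, eps, tau, t_refl, t_atm):.1f} K")  # ~310 K
```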

Figure 9.

Basic components of IR imaging system.

Figure 10.

Representative RGB and IR images used in ADAS and AV applications, from the KAIST multispectral pedestrian detection dataset [21].


4. Role of infrared sensors in ADAS and AV applications

In the automotive domain, it is highly challenging to achieve level 3 and above autonomous driving under all weather and lighting conditions while delivering reliable environment perception. ADAS and AV systems are expected to support all driving conditions, including highly complex roads and totally unpredictable situations, and vehicles must be equipped with cost-effective sensor suites capable of extracting the maximum information possible to make accurate decisions. The perceived environment is also expected to represent the scene adequately so that computer vision algorithms can detect and classify objects. This ensures precise autonomous navigation and control and provides safe ADAS and AV systems. SAE automation level 2 systems on the commercial market already include vision, ultrasonic and radar sensors. The next level of SAE automation can be achieved by adding more sensors, including lidar, to the existing suite. Each sensor has its own limitations and advantages [16, 17, 18]. Thus, sensor fusion comes into play, where the advantages of different sensor modalities are used to address each other's limitations. Even so, NHTSA data show that existing ADAS and AV solutions fall short of the expected performance, with crashes reported across vehicles supporting ADAS and AV features [2]. The Uber and Tesla accidents also clearly show that the current SAE level 2 and 3 sensor suites do not provide accurate detection of cars and pedestrians. In particular, vulnerable road users (VRUs) such as pedestrians, animals and bicyclists are difficult to detect and classify accurately. Classification of these objects is challenging in poor lighting, low-light and dark scenarios, with direct sunlight in the driving direction, and in extreme weather such as fog, rain and snow. The performance of vision sensors does not meet the requirements of autonomous driving under such conditions, and the performance of the other sensors is also affected, so they fail to provide the complete environment perception needed for autonomous navigation. A combination of low-light vision sensors, lidar and radar can handle night scenarios up to about 50 meters; beyond that, driving the vehicle autonomously becomes challenging [15]. Infrared sensor technology overcomes the aforementioned challenges and reliably detects and classifies cars, pedestrians, animals and other objects common in driving scenarios. IR sensors also perform equally well in daylight, providing redundant information for the existing sensor suite and thereby increasing confidence in detection and classification algorithms. They can be used effectively to address the limitations of vision and other sensors. The real-time performance of an IR camera is not affected by low-light and dark scenarios, sun glare, or reflections from vehicle headlights and brake lights, and it can be considered a potential solution in extreme weather conditions such as snow, fog and rain. Uncooled thermal imaging systems available in the market are the most affordable IR sensors, thanks to advancements in microbolometer technology. These sensors generate a temperature map as an image for visual analysis and further processing of environmental information. In existing automotive sensor suites, such sensors can supplement or even replace existing technology: because they sense the infrared emission of objects, they operate independently of illumination conditions and thereby provide a promising, consistent technology for achieving more precise environment perception systems [15].

The NTSB report on the fatal collision between an Uber level 3 autonomous test vehicle and a pedestrian in Tempe, Arizona, is instructive. This vehicle used lidar, radar and vision sensors. The report shows that the incident happened at night, with only street lighting. The system first classified the pedestrian as an unknown object, then as a car, then as a bicycle and finally as a person. This scenario was recreated and tested by FLIR using a wide field-of-view thermal camera with a basic classifier. The system was capable of detecting the person at a distance of approximately 85.4 m, which is twice the required stopping distance for a vehicle at 43 mph. Using narrow-FOV cameras, FLIR IR cameras have demonstrated pedestrian detection at distances four times greater than that required by the decision algorithms in autonomous vehicles [15].

Vehicles currently on the road with ADAS systems supporting SAE level 2 (partial automation) and level 3 (conditional automation) do not include an IR sensor in their sensor suites. The AWARE (All Weather All Roads Enhanced) vision project was executed in 2016 to test the potential of sensors operating in four different bands of the electromagnetic spectrum: visible RGB, near infrared (NIR), short-wave infrared (SWIR) and long-wave infrared (LWIR), especially in challenging conditions such as fog, snow and rain. It was reported that the LWIR camera performed well in detecting pedestrians in extreme fog (visibility range = 15 ± 4 m) compared to NIR and SWIR, whereas the vision sensor showed the lowest detection performance. Similarly, the LWIR camera was capable of detecting pedestrians in extremely dark scenarios and in the presence of reflections from other vehicles' headlights in fog, whereas the other sensors failed to detect pedestrians hidden by headlight glare and reflections [15, 22]. Also, as per Wien's displacement law, the peak radiation for human skin with emissivity 0.99 is 9.8 μm at room temperature, which falls in the LWIR camera operating range. Therefore, being completely passive sensors, LWIR cameras take advantage of sensing the IR radiation emitted by objects irrespective of extreme weather conditions and illumination [16]. The distance at which an IR sensor can detect and classify an object depends on the field of view (FOV) of the camera. Narrow-FOV cameras can detect objects at far distances, whereas wide-FOV cameras can detect objects over a greater angle of view. IR sensors also require the target object to span roughly 20 × 8 pixels to reliably detect and classify it; an IR sensor with a narrow-FOV lens is capable of detecting and classifying an object of 20 × 8 pixel size at a distance greater than 186 meters. Therefore, IR sensors with narrow FOV can be used on highways to detect far objects, whereas wide-FOV cameras can be used in urban or city driving scenarios [15]. In the automotive domain, IR sensors can be effectively used in both in-cabin sensing and driving applications. In in-cabin applications, IR sensors are mounted inside the vehicle for driver drowsiness and fatigue detection, eye gaze localization, face recognition, occupant gender classification, facial expression and emotion detection, etc. In ADAS and AV systems, IR sensors can be used to generate precise and accurate perception of the surrounding environment through one or more cameras mounted on the vehicle, most commonly for object detection, classification and semantic segmentation [23].
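The 20 × 8 pixel criterion above can be turned into a rough range estimate from the camera's horizontal FOV and resolution. The sketch below uses a simple instantaneous-field-of-view (IFOV) approximation; the sensor parameters and target width are illustrative assumptions, not vendor specifications, and the narrow-FOV case merely reproduces the order of magnitude of the ~186 m figure quoted above.

```python
import math

def pixels_on_target(target_width_m: float, distance_m: float,
                     hfov_deg: float, h_resolution_px: int) -> float:
    """Approximate horizontal pixel count subtended by a target at a given distance."""
    ifov_rad = math.radians(hfov_deg) / h_resolution_px           # angle seen by one pixel
    target_angle_rad = 2.0 * math.atan(target_width_m / (2.0 * distance_m))
    return target_angle_rad / ifov_rad

def max_classification_range(target_width_m: float, min_pixels: float,
                             hfov_deg: float, h_resolution_px: int) -> float:
    """Farthest distance at which the target still spans `min_pixels` pixels."""
    ifov_rad = math.radians(hfov_deg) / h_resolution_px
    return target_width_m / (2.0 * math.tan(min_pixels * ifov_rad / 2.0))

if __name__ == "__main__":
    # Hypothetical 640 px LWIR imager, 0.5 m wide pedestrian torso, 8 px minimum width.
    for hfov in (12.0, 50.0):   # narrow vs. wide field of view
        rng = max_classification_range(0.5, 8.0, hfov, 640)
        print(f"HFOV {hfov:>4.0f} deg -> ~{rng:.0f} m")   # ~191 m and ~46 m
```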


5. AI applications in the development of ADAS and AV systems

In the automotive domain, the goal of full level 5 autonomous driving appears achievable thanks to advances in sensing technologies and artificial intelligence (AI). Advances in AI address key AV requirements such as perceiving, thinking and reasoning. In addition, advanced computing resources for processing the huge amount of data sensed through multiple sensors of different modalities play an important role in moving the research and development community toward autonomous vehicles. In ADAS and AV systems, AI has become the gold standard approach for perceiving the surrounding environment so that proper planning and vehicle motion can be achieved. In level 3 and above ADAS and AV systems, AI is used primarily in perception, localization and mapping, and decision making. There is a need to understand the various aspects of AI applied in AV development and the current practices for bringing AI systems into autonomous driving. In particular, deep learning (DL) based algorithms are capable of handling challenging tasks such as accurate and precise object detection and classification and appropriate control of the steering wheel, acceleration and deceleration. AI research focuses on DL approaches, including convolutional neural networks, LSTMs and deep belief networks, for vehicle perception, motion planning, path planning and decision making [4, 7]. Generally, DL-based approaches are used to recognize objects on the road within a pipeline that includes a perception module and a localization and mapping module. On-road objects are detected, classified, fused and tracked in the perception module, which receives input from multiple sensors of different modalities such as lidars, radars, vision and IR cameras. In addition to on-road objects, vision sensor-based traffic sign detection, lane detection and parking line detection are also achieved using DL approaches. DL can be used to generate object-level information such as position, size, class, distance and orientation relative to the ego vehicle, as well as semantic information in the form of pixel-wise object classes. Convolutional neural networks (CNNs) are most commonly used for object detection and classification tasks. A generic representation of a convolutional neural network and its components is shown in Figure 11. A deep learning model has an input layer, several hidden layers and a final fully connected layer called the output layer. The input layer takes an image as input and the output layer defines the detected objects along with their confidence scores. A combination of a convolution layer, a pooling layer and an activation layer represents one hidden layer; in deep networks, multiple such feature extraction layers are stacked to extract coarse-to-fine information. Finally, a softmax function classifies the detected objects with corresponding confidence scores based on feature similarity; the object with the highest confidence score is reported as the object class together with the detected bounding box [24]. CNN, R-CNN, Fast R-CNN, Faster R-CNN, SSD, YOLO, YOLOv2, YOLOv3, etc. are a few of the DL approaches most commonly used for the recognition of road objects. In level 3 and above ADAS and AV systems, multi-task networks are predominantly used, where a common network architecture is trained to perform multiple tasks.
End-to-end AI systems can also be used to generate the complete perception for ADAS and AV systems, including perception, localization and mapping [7].
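As an illustration of the generic CNN structure described above (input layer, stacked convolution/activation/pooling blocks, a fully connected output layer and a softmax over classes), here is a minimal PyTorch sketch. The class name, layer sizes and class count are arbitrary choices for illustration, not the networks used in the cited works.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Minimal CNN mirroring the generic structure of Figure 11 (illustrative only)."""

    def __init__(self, num_classes: int = 4, in_channels: int = 3):
        super().__init__()
        # Two hidden blocks: convolution -> activation -> pooling.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
        )
        # Fully connected output layer producing one score per class.
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        return self.classifier(x)                 # raw class scores (logits)

if __name__ == "__main__":
    model = TinyClassifier()
    image = torch.randn(1, 3, 64, 64)             # dummy RGB or thermal crop
    probs = torch.softmax(model(image), dim=1)    # softmax gives class confidences
    print(probs)
```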

Figure 11.

Representative convolutional neural network and its components.


6. Survey on deep learning approaches using infrared images for ADAS and AV applications

Training, testing and validation are the most important steps for any deep learning-based perception algorithm. After proper training, a network can be deployed in real time to perceive the environment. The captured dataset and its size used for training the model play a critical role in real-time deployment. Convolutional neural networks (CNNs) have significantly improved the performance of many ADAS and AV applications, but they require significant amounts of training data to obtain optimum performance and reliable validation outcomes. Fortunately, there are several large-scale, publicly available, annotated thermal datasets that can be used for training CNNs. However, compared to visible imaging data, far fewer 2D thermal datasets for automotive applications are available on the open internet. Table 3 lists available datasets captured by various types of thermal sensors under different environmental conditions. These datasets are widely used for building pre-trained CNN models for applications such as pedestrian detection, vehicle detection and classification, and small-object detection in the thermal spectrum on GPU/edge-GPU devices for the automotive sensor suite [31]. There are also inherent challenges associated with training and validating CNN models on thermal data: a limited number of publicly available datasets, little variability in scene conditions such as weather, lighting and heat, and difficulties in reusing RGB pre-trained CNN models for thermal data.

Dataset | Size | Resolution | Thermal Images | Content
C3I [25] | 0.5 GB | 640×480 | 39,000 | Person, Vehicles, Bicycles, Bikes
LITIV [26] | 0.7 GB | 320×430 | 6000 | Person
CVC [27] | 5 GB | 640×480 | 11,000 | Person, Vehicles, Bicycles, Bikes, Poles
TIV [28] | 5.4 GB | 1024×1024 | 63,000 | Person, Vehicles, Bicycles
FLIR [29] | 16 GB | 640×512 | 14,000 | Person, Vehicles, Bicycles, Bikes, Poles, Dogs
KAIST [30] | 37 GB | 320×256 | 95,000 | Person, Vehicles, Bicycles, Bikes, Poles

Table 3.

Details of various open access infrared datasets available related to ADAS applications.

6.1 Object detection in the thermal spectrum

Object detection is an important part of autonomous driving; it must accurately and rapidly detect vulnerable road users (VRUs) such as pedestrians and cyclists, as well as vehicles, traffic signals, sign boards, animals, etc., in all lighting and weather conditions. The detection results are used for motion tracking and pose estimation of the objects and subsequently to take appropriate actions when ADAS functions such as cruise control, lane departure warning and emergency braking are active.

6.1.1 Classical detection techniques

Frame acquisition, region-of-interest selection, feature extraction and classification are the generic steps followed for VRU detection. Identifying the region of interest (ROI) [32] is the first stage of detecting the desired objects; feature extraction then captures edges, shapes, curvature, etc. These extracted features are further used for object classification. Background subtraction is the most commonly used technique for detecting moving objects, while more advanced techniques such as sliding windows, objectness and selective search are used to counter adverse conditions. Histogram of Oriented Gradients (HOG) features [33] and Local Binary Patterns (LBP) [34] are basic hand-crafted image processing techniques for feature extraction and object classification, but they are limited when complex features need to be extracted, whereas deep learning-based techniques let the network extract features that carry a higher level of information. A support vector machine, a decision tree or a deep neural network is then used to classify the object. DL-based techniques have been found to outperform the traditional methods [35].
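For context, a classical HOG-plus-SVM pedestrian detector of the kind described above can be run in a few lines with OpenCV's built-in people detector. This is a generic sketch on a visible-spectrum frame with a placeholder file path, not the pipeline used in the cited references.

```python
import cv2

# Classical detection sketch: HOG features + a pre-trained linear SVM
# (OpenCV's default people detector).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frame.png")  # placeholder path to an input image
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                      padding=(8, 8), scale=1.05)

# Draw every detected pedestrian window and save the annotated frame.
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.png", frame)
```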

6.1.2 Deep learning approach for object detection

The two commonly used deep learning approaches for VRU detection are two-stage detectors (region proposal approaches) and single-stage detectors (non-region proposal approaches). In the two-stage or region proposal approach, region candidates are generated in the first stage, for example using hand-crafted techniques such as HOG or LBP, and a CNN performs classification in the second stage; examples include region-based CNN (R-CNN) [36], region-based fully convolutional networks (R-FCN) [37] and Faster R-CNN [38]. A single-stage detector, in contrast, performs region proposal, feature extraction and classification in a single step; non-region proposal-based approaches include the single shot detector (SSD) [39] and you only look once (YOLO) [40]. The advantages and disadvantages of two-stage and single-stage detectors are presented in Table 4.

Type | Examples | Advantages | Trade-offs
Single-stage | SSD, YOLO | Higher speeds | Information loss, large number of false positives
Two-stage | R-CNN, R-FCN | Increased accuracy | Slower speeds
 | Faster R-CNN | Information rich | Complex computation

Table 4.

Deep learning-based object detection types.

6.1.3 Deep learning based sensor fusion techniques

Many studies have investigated the most reliable way to use both the color information from visible cameras and the thermal information from thermal cameras [41, 42, 43]. These studies commonly highlight the illumination dependency of visible cameras and their limitations during adverse weather, as well as the benefit of including thermal data for better performance. The fusion of visible and thermal cameras helps reduce object detection errors, especially at night. Fusion of this sensor information is possible at various levels: pixel level, feature level or decision level. In pixel-level fusion, the thermal image intensities are fused with the intensity (I) component of the visible image, and the fused image is reconstructed with the new I value. Pixel-level fusion is generally performed using wavelet transform, curvelet transform or Laplacian pyramid fusion. Typically, pixel-level fusion is not combined with deep learning-based sensor fusion as it takes place outside the neural network.
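As a very simple pixel-level example (a weighted average in the intensity channel, much cruder than the wavelet, curvelet or Laplacian-pyramid methods listed above), the visible image can be converted to an intensity-plus-chroma space and its intensity blended with a registered thermal frame. The file paths and blending weight below are illustrative assumptions.

```python
import cv2
import numpy as np

def pixel_level_fusion(visible_bgr: np.ndarray, thermal_gray: np.ndarray,
                       alpha: float = 0.5) -> np.ndarray:
    """Blend normalized thermal intensities into the V channel of the visible image.

    Assumes the two frames are already co-registered and have the same size.
    """
    hsv = cv2.cvtColor(visible_bgr, cv2.COLOR_BGR2HSV)
    thermal = cv2.normalize(thermal_gray, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    hsv[:, :, 2] = cv2.addWeighted(hsv[:, :, 2], 1.0 - alpha, thermal, alpha, 0)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

if __name__ == "__main__":
    vis = cv2.imread("visible.png")                       # placeholder paths
    ir = cv2.imread("thermal.png", cv2.IMREAD_GRAYSCALE)
    cv2.imwrite("fused.png", pixel_level_fusion(vis, ir))
```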

The typical architectures for deep learning-based sensor fusion are early fusion, late fusion and halfway fusion, as shown in Figure 12. Early fusion, also called feature-level fusion, combines visible and thermal images into a 4-channel (red, green, blue, intensity, RGBI) input so that the deep learning network can learn the relationship between the image sources. Late fusion, also called decision-level fusion, extracts features from the visible and thermal images in separate subnetworks and fuses them just before the object classification layer. Halfway fusion feeds the visible and thermal information separately into the same network, and the fusion happens inside the network itself. Various studies have demonstrated the benefits of multispectral detection, which produced the best overall results when combining visible and thermal images. However, in night conditions, thermal cameras alone performed better than the fused data: at low light, the fused data performed worse, with an average miss rate of 3%, and an overall decrease of 5% during daytime was observed. Also, using multiple sensors increases system complexity due to differences in sensor positions, alignment, synchronization and camera resolutions [35].
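An early-fusion (RGBI) input can be sketched by widening the first convolution of an off-the-shelf backbone to four channels. This is a minimal illustration under the assumption of pre-registered image pairs and an arbitrary class count; it is not one of the architectures studied in [41, 42, 43].

```python
import torch
import torch.nn as nn
from torchvision import models

# Early fusion sketch: stack RGB (3 ch) and thermal intensity (1 ch) into a
# 4-channel tensor and let a standard backbone learn the joint features.
backbone = models.resnet18(weights=None)
backbone.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
backbone.fc = nn.Linear(backbone.fc.in_features, 5)   # e.g. 5 object classes (assumed)

rgb = torch.randn(2, 3, 224, 224)        # batch of visible images
thermal = torch.randn(2, 1, 224, 224)    # co-registered thermal images
rgbi = torch.cat([rgb, thermal], dim=1)  # 4-channel early-fusion input
logits = backbone(rgbi)
print(logits.shape)                      # torch.Size([2, 5])
```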

Figure 12.

Sensor fusion techniques [35].

By far, halfway fusion is considered the most effective of the three techniques, with a 3.5% lower miss rate; using stand-alone visible or thermal information performs worse than halfway fusion by 11% [44]. Wagner et al. [45] investigated optimal fusion techniques for pedestrian detection using Faster R-CNN and the KAIST dataset and found that multispectral information with single-stage and halfway fusion achieves better performance. Similarly, various studies have examined detection stages and sensor fusion techniques under different lighting conditions and drawn similar conclusions [46]. However, these deep learning models must estimate the bounding box of each object and compute the probability of the class it belongs to through neural networks, which makes many of them unsuitable for real-time applications [40].

DenseFuse, a deep learning architecture proposed by Li et al. [47] to extract more useful features, uses a combination of CNNs, fusion layers and dense blocks to create a reconstructed fused image that outperforms existing fusion methods. The SeAFusion network combines image and semantic segmentation information and uses gradient residual blocks to enhance the image fusion process [48]. An unsupervised fusion network called U2Fusion was proposed by Xu et al. [49] to estimate the fusion process while taking the importance of each source image into account. Similarly, multiple image fusion networks have been proposed, such as the end-to-end fusion network (RFN-Net), the effective bilateral mechanism (BAM) and the bilateral ReLU residual network (BRRLNet) [50, 51]. Despite this progress in deep learning-based fusion architectures, none of them is a lightweight, real-time application; they are all limited by hyper-parameter selection and significant memory utilization. The advantages and disadvantages reported in the image fusion literature are summarized in Table 5.

Literature | Advantages | Disadvantages
DenseFuse [47] | Extracts useful features | Loss of contrast and brightness
SeAFusion [48] | Combines fusion and semantic segmentation | Cannot handle complex scenes
U2Fusion [49] | Adapts to new fusion tasks | Not robust to noise
Y-shaped net [50] | Extracts local features and context info | Introduces artifacts or blur
RFN-Net [51] | Two-stage training strategy | Large amount of training data and time

Table 5.

Summary of image fusion model-related literature.

6.1.4 Real-time object detection

In the YOLO framework [40], bounding box creation and classification are treated as a single regression problem to increase inference speed, and the neural network is trained as a whole. YOLO divides the input image into an m×n grid, then predicts N bounding boxes and estimates a confidence score for each bounding box (BB) using the CNN. Each BB consists of its central coordinates (x, y), its width and height (w, h) and a class probability value. The intersection over union (IOU) is calculated from the overlap of the detected BB and the ground truth (GT); the IOU value indicates how accurately the BB is predicted. The confidence of a BB is expressed as the product of the probability of the object and the IOU. If the central coordinates of the predicted BB and the GT lie within the overlapping region, the detection is assumed successful and Pr(Object) is set to 1; otherwise it is set to 0. If there are i classes, the conditional class probability is expressed as Pr(Class_i | Object). The BB with the highest probability for the classified object among all N bounding boxes is taken as the best-fit BB for the object concerned.
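The IOU and confidence computation described above can be written compactly. The sketch below uses (x, y, w, h) boxes with center coordinates, as in YOLO; the numeric boxes are illustrative and the code is not an excerpt of [40].

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x_center, y_center, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def yolo_confidence(pr_object: float, predicted_box, ground_truth_box) -> float:
    """Confidence = Pr(Object) * IOU(predicted BB, ground truth BB)."""
    return pr_object * iou(predicted_box, ground_truth_box)

if __name__ == "__main__":
    pred, gt = (50, 50, 40, 80), (55, 52, 42, 76)     # example boxes
    print(f"IOU = {iou(pred, gt):.2f}, confidence = {yolo_confidence(1.0, pred, gt):.2f}")
```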

Yoon and Cho [52] proposed a multimodal YOLO-based object detection method based on late fusion with non-maximum suppression, which efficiently combines the color information from visible cameras with the boundary information from thermal cameras. The architectural block diagram is shown in Figure 13. Non-maximum suppression is generally employed in the second half of the detection pipeline to improve the object detection performance of models such as YOLO and SSD.
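A minimal non-maximum suppression routine of the kind used in such a late-fusion step could look like the following: boxes from the visible and thermal branches are pooled and overlapping detections of the same object are collapsed to the highest-scoring one. This is a generic sketch with made-up boxes, not the authors' implementation.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5) -> list:
    """Greedy NMS on (x1, y1, x2, y2) boxes; returns indices of kept detections."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        if order.size == 1:
            break
        rest = order[1:]
        xx1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        order = rest[iou <= iou_threshold]
    return keep

# Late-fusion usage: pool detections from both modalities, then suppress duplicates.
rgb_boxes, rgb_scores = np.array([[10, 10, 60, 120]]), np.array([0.80])
ir_boxes, ir_scores = np.array([[12, 11, 62, 118]]), np.array([0.90])
boxes = np.vstack([rgb_boxes, ir_boxes])
scores = np.concatenate([rgb_scores, ir_scores])
print(nms(boxes, scores))   # keeps only the higher-scoring thermal detection
```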

Figure 13.

Block diagram of the multimodal YOLO-based object detection method based on late fusion [52].

Further, they proposed an improved deep multimodal object detection strategy by introducing a dehazing network to enhance the model's performance during reduced visibility. The dehaze network comprises haze level classification, light scattering coefficient estimation from visible images and depth estimation from thermal images. Detailed performance metrics for the dense haze condition from Yoon and Cho [52] are presented in Table 6 for YOLO trained on (a) visible, (b) IR/thermal, (c) visible and IR, and (d) visible, IR and the dehaze network. Example output images from Yoon and Cho [52] for vehicle detection based on the YOLO model trained on visible alone, IR/thermal alone, fused visible and IR, and fused visible, IR and dehaze are shown in Figure 14, where missed detections are marked with red boxes and correct detections with blue boxes. The accuracy of the vehicle detection model improved from 81.11% for the fusion model to 84.02% with the dehaze model, but the run time increased dramatically, making the dehaze model unfit for real-time applications.

Metrics | Visible | IR/Thermal | Visible+IR | Visible+IR+Dehaze
Accuracy (%) | 61.55 | 78.6 | 81.11 | 84.02
TP | 1264 | 1626 | 1683 | 1747
FP | 30 | 182 | 238 | 269
Precision | 0.97 | 0.90 | 0.87 | 0.86
Recall | 0.61 | 0.79 | 0.82 | 0.85
Run time (ms) | 27.82 | 27.82 | 27.89 | 686.99

Table 6.

Performance of vehicle detection model during dense haze condition based on YOLO trained for (a) visible (b) IR/thermal (c) visible and IR and (d) visible, IR and dehaze network. Results from [52].

Figure 14.

Examples of vehicle detection results based on YOLO model trained for (a) visible (b) IR/thermal (c) visible and IR and (d) visible, IR and dehaze network. Missed detection is marked in red box and correct detection is marked in blue box. Results from [52].

6.1.5 Real-time pedestrian detection

Chen et al. [53] proposed a thermal-based R-CNN model for pedestrian detection using VGG-16 as the backbone network, since its good network stability enables the integration of new branch networks. To address the pedestrian occlusion problem, they proposed a part-model architecture with a new aspect ratio and a block model to strengthen the network's generalization. The presence and appearance of a pedestrian are completely lost if the occlusion rate exceeds 80%; therefore, only pedestrians with less than 80% occlusion are considered for training. Figure 15 illustrates the possible types of pedestrian occlusion, with ground truth labels and possible detections represented as green and red rectangles, respectively. Figure 16 shows the architecture of the thermal R-CNN fusion model for improved pedestrian detection proposed in [53], which comprises a full-body and region decomposition branch to extract full-body pedestrian features and a segmentation head branch to extract individual pedestrians from crowded scenes. The loss function combines five components: BB loss, classification loss, segmentation loss, pixel-level loss and fusion loss.

Figure 15.

Illustration of types of pedestrian occlusion. (a) The green rectangle represents the full pedestrian and the red rectangle represents the detectable pedestrian. (b) Top six types of pedestrian occlusion [53].

Figure 16.

The architecture of the thermal R-CNN fusion model for improved pedestrian detection [53].

Table 7 presents the performance comparison from [53] between the thermal R-CNN fusion pedestrian detection model and state-of-the-art deep learning models, showing that the thermal R-CNN fusion model is effective and performs better. The thermal R-CNN fusion model is sensitive to regional features, which can easily lead to misjudged images; however, the semantic segmentation feature enhances the information for the complete pedestrian bounding box, so the final output is accurate and the precision is higher. Figure 17 shows example images from [53] that demonstrate the improved pedestrian detection by the thermal R-CNN fusion model compared to the ground truth and the benchmarked modified R-CNN model. It is evident from the results that the modified R-CNN only partially detects occluded pedestrians and in some cases creates duplicate partial bounding boxes for a single pedestrian, whereas the thermal R-CNN fusion model addresses this issue and improves pedestrian detection.

Metrics | Faster R-CNN | Mask R-CNN | YOLO | Thermal R-CNN fusion
TP | 992 | 1050 | 1250 | 1317
FP | 740 | 720 | 954 | 334
FN | 995 | 937 | 734 | 670
Precision | 0.572 | 0.593 | 0.567 | 0.797
Recall | 0.499 | 0.528 | 0.630 | 0.662
F1-score | 0.533 | 0.558 | 0.661 | 0.724

Table 7.

Performance comparison of various pedestrian detection models. Results from [53].
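
The precision, recall and F1-score values in Tables 6 and 7 follow the standard detection-metric definitions. As a quick check, the snippet below reproduces the thermal R-CNN fusion column of Table 7 from its TP/FP/FN counts; the agreement is exact up to rounding.

```python
def detection_metrics(tp, fp, fn):
    """Standard detection metrics computed from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Thermal R-CNN fusion column of Table 7: TP = 1317, FP = 334, FN = 670
p, r, f1 = detection_metrics(1317, 334, 670)
print(round(p, 3), round(r, 3), round(f1, 3))  # ~0.798, 0.663, 0.724 (matches Table 7 up to rounding)
```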

Figure 17.

Example images demonstrating the improved pedestrian detection of (c) the thermal R-CNN fusion model compared with (a) the ground truth and (b) the modified R-CNN. Results from [53].

6.1.6 Knowledge distillation

Knowledge distillation is widely known as a student-teacher approach to model compression in which a small student network is trained to match a large pre-trained teacher network. Knowledge is transferred from the teacher to the student by minimizing a distillation loss between their outputs in addition to the usual supervised loss on the GT labels. Chen et al. [54] proposed using low-level features from the teacher network to supervise deeper features of the student network, resulting in improved performance. To address the low-resolution issues of IR-visible fused images, Xiao et al. [55] introduced a heterogeneous knowledge distillation network with multi-layer attention embedding, consisting of a high-resolution fusion teacher network and a student network performing joint low-resolution fusion and super-resolution. Liu et al. [56] proposed a perceptual distillation method that trains image fusion networks without GTs, pairing a teacher network with a self-supervised multi-autoencoder student network. Several further studies along these lines are summarized in Table 8.

Literature | Advantages | Disadvantages
Original knowledge distillation [54] | Compresses a large model | Loses information
Cross-stage connection path [55] | Uses low-level features to supervise deeper features | Increases the complexity
Heterogeneous [56] | Joint fusion and super-resolution | Depends on teacher network quality
Perceptual distillation [57] | Trains image fusion networks without ground truths | Depends on teacher network quality
Depth-distilled multi-focus image fusion [58] | Transfers depth knowledge to improve fusion accuracy | Depends on accuracy of depth knowledge

Table 8.

Summary of knowledge-distillation network-related literature [57].
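
For reference, the classic student-teacher objective described at the start of this subsection can be sketched as a temperature-softened KL term between teacher and student outputs plus the usual cross-entropy on the GT labels. The temperature and mixing weight below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic knowledge distillation: soften teacher/student outputs with
    temperature T, match them with KL divergence, and mix in the usual
    cross-entropy on ground-truth labels."""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Illustrative usage with random logits for a 10-class problem
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```

Scaling the KL term by T^2 keeps its gradient magnitude comparable to that of the hard-label term as the temperature changes.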

6.1.7 Adaptive mechanism

Over the years, many researchers have proposed adaptive mechanisms for image fusion networks, since they allow the system to adjust its behavior to varying environmental or operating conditions. For a better fusion effect, Xia et al. [59] introduced a parameter-adaptive pulse-coupled neural network, while Kong et al. [61] proposed an adaptive normalization mechanism-based fusion method. To extract features and estimate similarity, Lu et al. [60] integrated adaptive features into a new image retrieval strategy; in brief, this adaptive feature mechanism blends the strengths of multiple features and outperforms single-feature retrieval techniques. Such mechanisms can improve image fusion precision, reduce noise interference and enhance the real-time performance of the network. In addition, various adaptive mechanisms such as adaptive loss, activation and sampling functions have been proposed to optimize deep learning networks. The advantages and disadvantages of adaptive mechanism-related literature are summarized in Table 9.

Literature | Advantages | Disadvantages
Parameter-adaptive pulse-coupled neural network [59] | Better image fusion | Sensitive to parameter settings
Adaptive features and information entropy [60] | Effective feature extraction and good similarity estimation | Poor handling of complex scenes
Adaptive normalization mechanism-based fusion [61] | Injects detailed features into structured fusion | Introduces artifacts or distortion
Adaptive loss, activation and sampling functions [62] | Optimizes the performance | Needs more computational resources
Global group sparse coding [63] | Automatic network depth estimation by learning inter-layer connections | Suffers from sparsity or redundancy issues
Novel network structures [64] | Outperform traditional ones | Difficult to design

Table 9.

Summary of adaptive mechanism network-related literature.
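
To illustrate the adaptive idea shared by several entries in Table 9, the sketch below derives a per-pixel fusion weight from local gradient activity so that the more informative modality dominates at each location. The activity measure and weighting rule are deliberately simple assumptions, not a reproduction of any specific method from the table.

```python
import numpy as np

def adaptive_fusion(ir, vis, eps=1e-6):
    """Blend IR and visible images with per-pixel weights derived from
    local activity (gradient magnitude), instead of a fixed fusion rule."""
    def activity(img):
        gy, gx = np.gradient(img.astype(np.float64))
        return np.hypot(gx, gy)

    a_ir, a_vis = activity(ir), activity(vis)
    w_ir = a_ir / (a_ir + a_vis + eps)   # adaptive, content-dependent weight map
    return w_ir * ir + (1.0 - w_ir) * vis

# Illustrative usage with synthetic single-channel frames in [0, 1]
ir = np.random.rand(240, 320)
vis = np.random.rand(240, 320)
fused = adaptive_fusion(ir, vis)
```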

6.1.8 Generative adversarial network

Ma et al. [65] introduced the generative adversarial network (GAN) to the field of image fusion: the generator synthesizes fused images with good texture from the visible and thermal cameras, guided by a discriminator. A typical GAN-based image fusion framework is shown in Figure 18. They also proposed a detail loss and an edge-enhancement loss to improve fine details and sharpen the edges and features in the target images, and further improved the framework with a dual-discriminator conditional GAN. Li et al. [66] showed improved capture of the regions of interest by integrating a multi-scale attention mechanism branch into the GAN-based image fusion framework. These networks generate fused images of excellent visual quality, which are useful for entertainment and human perception; however, they are less suited to demanding machine-vision tasks such as those in automotive applications.

Figure 18.

Generative adversarial network based image fusion framework.
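
A minimal sketch of a FusionGAN-style generator objective is shown below: an adversarial term that pushes the fused image to fool the discriminator, plus a content term that preserves thermal intensities and visible-image gradients. The tiny network, the least-squares adversarial loss and the weighting are assumptions for illustration rather than the exact formulation of [65].

```python
import torch
import torch.nn as nn

class FusionGenerator(nn.Module):
    """Toy generator: takes concatenated IR + visible channels, outputs a fused image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, ir, vis):
        return self.net(torch.cat([ir, vis], dim=1))

def gradient(img):
    """Horizontal/vertical finite differences used as a simple texture proxy."""
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx, dy

def generator_loss(fused, ir, vis, d_fake, lam=100.0):
    """Adversarial term (fool the discriminator) plus content term
    (keep IR intensities and visible-image gradients)."""
    adv = torch.mean((d_fake - 1.0) ** 2)  # least-squares GAN loss for the generator
    fdx, fdy = gradient(fused)
    vdx, vdy = gradient(vis)
    content = (torch.mean((fused - ir) ** 2)
               + torch.mean((fdx - vdx) ** 2) + torch.mean((fdy - vdy) ** 2))
    return adv + lam * content

# Illustrative usage with random inputs
gen = FusionGenerator()
ir, vis = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
fused = gen(ir, vis)
d_fake = torch.rand(2, 1)  # stand-in for the discriminator's output on the fused image
loss = generator_loss(fused, ir, vis, d_fake)
```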

6.1.9 Hybrid model

Hybrid models [67] are created by combining models that perform well in specific scenarios in order to improve performance in generic situations. They typically combine multi-scale transformation with expression detection, saliency detection, sparse representation or pulse-coupled neural networks. Such hybrid models effectively improve the fused output by enhancing the clarity and texture of the fused images, although model complexity and computational cost must be considered carefully when designing them.


7. Conclusions

Level 3 and above ADAS and AV systems demand accurate and precise perception of the surrounding environment in order to drive the vehicle autonomously. This can be achieved using multiple sensors of different modalities such as vision, lidars and radars. ADAS and AV systems provided by various OEMs and Tier 1 companies show a lack of performance in extreme weather and lighting conditions, especially dark scenarios, sun glare, rain, fog and snow. Performance can be improved by adding a sensor to the existing suite that provides complementary and redundant information in such extreme environments. The properties and characteristics of infrared sensors look promising: they detect naturally emitted IR radiation and represent it as an image indicating a relative temperature map, and they can detect objects at longer distances than other sensors. Recent advances in AI have paved the way for efficient algorithms that detect and classify objects accurately irrespective of weather and lighting conditions. The literature shows that fusing information from IR sensors with other sensor data yields more precise results, paving the way towards autonomous driving. The research community still needs IR datasets as extensive as those available for RGB images to enable quick and easy deployment of IR in ADAS and AV applications. Integration in the automotive domain remains challenging, since an IR camera requires a separate calibration setup and the sensor array makes the technology expensive. More intensive research into IR technology and deep learning models would be highly beneficial for making effective use of IR cameras in ADAS and AV systems.


Abbreviations

ADAS: advanced driver assistance system
AV: autonomous vehicle
AD: autonomous driving
NHTSA: National Highway Traffic Safety Administration
OEM: original equipment manufacturer
SAE: Society of Automotive Engineers
2D/3D: two dimensional/three dimensional
GPS: global positioning system
IMU: inertial measurement unit
IR: infrared
NIR: near infrared
SWIR: short wave infrared
FLIR: forward looking infrared
MWIR: mid wave infrared
LWIR: long wave infrared
Lidar: light detection and ranging
Radar: radio detection and ranging
FOV: field of view
AI: artificial intelligence
VRU: vulnerable road user
ROI: region of interest
BB: bounding box
GT: ground truth
IoU: intersection over union
CNN: convolutional neural network
DL: deep learning
GAN: generative adversarial network
RGBI: red, green, blue, intensity
HOG: histogram of oriented gradients
LBP: local binary patterns
R-CNN: region-based CNN
R-FCN: region-based fully convolutional network
SSD: single shot detector
YOLO: you only look once

References

  1. 1. Williams M. The drive for autonomous vehicles: The DARPA grand challenge. Available from: https://www.herox.com/blog/159-the-drive-for-autonomous-vehicles-the-darpa-grand
  2. 2. US Department of Transportation. Standing general order on crash reporting: For incidents involving ADS and level 2 ADAS. Jun 2022. Available from: https://www.nhtsa.gov/laws-regulations/standing-general-order-crash-reporting
  3. 3. Kukkala VK, Tunnell J, Pasricha S, Bradley T. Advanced driver-assistance systems: A path toward autonomous vehicles. IEEE Consumer Electronics Magazine. 2018;7(5):18-25
  4. 4. Moller DP, Haas RE. Advanced driver assistance systems and autonomous driving. In: Guide to Automotive Connectivity and Cybersecurity: Trends, Technologies, Innovations and Applications. Cham: Springer; 2019. pp. 513-580
  5. 5. Rosique F, Navarro PJ, Fernández C, Padilla A. A systematic review of perception system and simulators for autonomous vehicles research. Sensors. 2019;19(3):648
  6. 6. Odukha O. How sensor fusion for autonomous cars helps avoid deaths on the road. Intellias; Aug 2023. Available from: https://intellias.com/sensor-fusion-autonomous-cars-helps-avoid-deaths-road/
  7. 7. Ma Y, Wang Z, Yang H, Yang L. Artificial intelligence applications in the development of autonomous vehicles: A survey. IEEE/CAA Journal of Automatica Sinica. 2020;7(2):315-329
  8. 8. Website blog. Available from: https://interplex.com/trends/proliferation-of-sensors-in-next-gen-automobiles-is-raising-the-b [Accessed: September 28, 2023]
  9. 9. Ondruš J, Kolla E, Vertaľ P, Šarić Ž. How do autonomous cars work? Transportation Research Procedia. 2020;44:226-233
  10. 10. Thakur R. Infrared sensors for autonomous vehicles. Recent Development in Optoelectronic Devices. 29 Aug 2018;84
  11. 11. Mohammed AS, Amamou A, Ayevide FK, Kelouwani S, Agbossou K, Zioui N. The perception system of intelligent ground vehicles in all weather conditions: A systematic literature review. Sensors. 2020;20:6532. DOI: 10.3390/s20226532
  12. 12. ASTM G173-03. Standard Tables for Reference Solar Spectral Irradiances: Direct Normal and Hemispherical on 37deg Tilted Surface. American Society for Testing Materials; 2012. Available from: https://www.astm.org/g0173-03r20.html
  13. 13. Uber Accident. 2018. Available from: https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg
  14. 14. Image Engineering. Challenges for cameras in automotive applications. Feb 2022. Available from: https://www.image-engineering.de/library/blog/articles/1157-challenges-for-cameras-in-automotive-applications
  15. 15. Why ADAS and autonomous vehicles need thermal infrared cameras. 2018. Available from: https://www.flir.com/ [Accessed: September 25, 2023]
  16. 16. Minkina W, Dudzik S. Infrared Thermography: Errors and Uncertainties. Hoboken, New Jersey, United States: John Wiley & Sons; 2009
  17. 17. Vollmer M. Infrared thermal imaging. In: Computer Vision: A Reference Guide. Cham: Springer International Publishing; 2021. pp. 666-670
  18. 18. Teledyne FLIR commercial System. The Ultimate Infrared Handbook for R & D Professionals. 2018. Available from: https://www.flir.com/ [Accessed: September 25, 2023]
  19. 19. Li Y, Moreau J, Ibanez-Guzman J. Emergent visual sensors for autonomous vehicles. IEEE Transactions on Intelligent Transportation Systems. 2023;24(5):4716-4737. Available from: https://ieeexplore.ieee.org/document/10092278
  20. 20. Nicolas Pinchon M, Ibn-Khedher OC, Nicolas A, Bernardin F, et al. All-weather vision for automotive safety: Which spectral band?. SIA Vision 2016. In: International Conference Night Drive Tests and Exhibition, Oct 2016, Paris, France.
  21. 21. Hwang S, Park J, Kim N, Choi Y, So Kweon I. Multispectral pedestrian detection: Benchmark dataset and baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. pp. 1037-1045. Available from: https://soonminhwang.github.io/rgbt-ped-detection/
  22. 22. Nicolas Pinchon M, Ibn-Khedher OC, Nicolas A, Bernardin F, et al. All-weather vision for automotive safety: Which spectral band? In: SIA Vision 2016 - International Conference Night Drive Tests and Exhibition, Paris, France. 2016. p. 7. Available from: https://hal.science/hal-01406023/document
  23. 23. Farooq MA, Shariff W, O’Callaghan D, Merla A, Corcoran P. On the Role of Thermal Imaging in Automotive Applications: A Critical Review. IEEE Access; 2023
  24. 24. Shahriar N. What is convolutional neural network – CNN (Deep Learning). Available from: https://nafizshahriar.medium.com/what-is-convolutional-neural-network-cnn-deep-learning-b3921bdd82d5
  25. 25. Farooq MA, Shariff W, Khan F, Corcoran P, Rotariu C. C3I thermal automotive dataset. IEEE Dataport; 2022. DOI: 10.21227/jf21-rt22. Available from: https://ieee-dataport.org/documents/c3i-thermal-automotivedataset
  26. 26. Torabi A, Masse G, Bilodeau G-A. An iterative integrated framework for thermal visible image registration, sensor fusion, and people tracking for video surveillance applications. Computer Vision and Image Understanding. 2021;116(2):210-221
  27. 27. Chen Y, Shin H. Pedestrian detection at night in infrared images using an attention-guided encoder-decoder convolutional neural network. Applied Sciences. 23 Jan 2020;10(3):809
  28. 28. Wu Z et al. A thermal infrared video benchmark for visual analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Columbus, Ohio: IEEE; 2014. pp. 201-208
  29. 29. Krišto M, Ivašić-Kos M. Thermal imaging dataset for person detection. In: 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). FLIR thermal dataset; 20 May 2019. pp. 1126-1131. Available from: https://www.flir.com/oem/adas/adas-dataset-form/
  30. 30. Hwang S, Park J, Kim N, Choi Y, Kweon IS. Multispectral pedestrian detection: Benchmark dataset and baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE; 2015. pp. 1037-1045
  31. 31. Farooq MA, Shariff W, Ocallaghan D, Merla A, Corcoran P. On the Role of Thermal Imaging in Automotive Applications: A critical Review. Vol.11. IEEE Access; 2023. pp. 25152-25173. Available from: https://ieeexplore.ieee.org/document/10064306
  32. 32. Solichin A, Harjoko A, Eko A. A survey of pedestrian detection in video. International Journal of Advanced Computer Science and Applications. 2014;5(10). DOI: 10.14569/IJACSA.2014.051007. Available from: https://thesai.org/Publications/ViewPaper?Volume=5&Issue=10&Code=ijacsa&SerialNo=7
  33. 33. Chavez-Garcia RO, Aycard O. Multiple sensor fusion and classification for moving object detection and tracking. IEEE Transactions on Intelligent Transportation Systems. 2016;17:525-534. Available from: https://ieeexplore.ieee.org/document/7283636
  34. 34. Wang X, Han TX, Yan S. An HOG-LBP human detector with partial occlusion handling. In: Proceedings of the IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September-2 October 2009. Japan: IEEE; 2009. pp. 32-39. Available from: https://ieeexplore.ieee.org/document/5459207
  35. 35. Ahmed S, Huda MN, Rajbhandari S, Saha C, Elshaw M, Kanarachos S. Pedestrian and cyclist detection and intent estimation for autonomous vehicles: A survey. Applied Sciences. 2019;9:2335. DOI: 10.3390/app9112335
  36. 36. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24-27 June 2014. Columbus, Ohio: IEEE; 2014. pp. 580-587
  37. 37. Dai J, Li Y, He K, Sun J. R-FCN: Object detection via region-based fully convolutional networks. In: Proceedings of the IEEE conference on Advances in Neural Information Processing, Barcelona, Spain. Spain: IEEE; 2016. pp. 379-387
  38. 38. Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems. 2015;39:1137-1149
  39. 39. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, et al. SSD: Single shot multibox detector. In: European Conference on Computer Vision. Cham, Switzerland: Springer; 2016. pp. 21-37
  40. 40. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. arXiv 2015, arXiv:1506.02640
  41. 41. Geronimo D, Lopez AM, Sappa AD, Graf T. Survey of pedestrian detection for advanced driver assistance systems. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2010;32:1239-1258
  42. 42. Enzweiler M, Gavrila DM. Monocular pedestrian detection: Survey and experiments. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2009;31:2179-2195
  43. 43. Dollár P, Wojek C, Schiele B, Perona P. Pedestrian detection: An evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011
  44. 44. Hou YL, Song Y, Hao X, Shen Y, Qian M. Multispectral Pedestrian Detection Based on Deep Convolutional Neural Networks. In: Proceedings of the IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Xiamen, China. 2017. pp. 22-25
  45. 45. Wagner J, Fischer V, Herman M. Multispectral pedestrian detection using deep fusion convolutional neural networks. In: Proceedings of the European Symposium on Artificial Neural Networks, Bruges, Belgium. Belgium: ESANN; 2016. pp. 27-29
  46. 46. Du X, El-Khamy M, Lee J, Davis L. Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection. In: Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017, Santa Rosa, CA, USA. CA, USA: IEEE; 2017. pp. 953-961
  47. 47. Li H, Wu XJ. DenseFuse: A fusion approach to infrared and visible images. IEEE Transactions on Image Processing. 2018;28:2614-2623
  48. 48. Tang L, Yuan J, Ma J. Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network. Information Fusion. 2022;82:28-42
  49. 49. Xu H, Ma J, Jiang J, Guo X, Ling H. U2Fusion: A unified unsupervised image fusion network. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020;44:502-518
  50. 50. Tang W, He F, Liu Y. YDTR: Infrared and visible image fusion via Y-shape dynamic transformer. IEEE Transactions on Multimedia. 2023;25:5413-5428. DOI: 10.1109/TMM.2022.3192661
  51. 51. Hui L, Xjw A, Jk B. RFN-Nest: An end-to-end residual fusion network for infrared and visible images. Information Fusion. 2021;73:72-86
  52. 52. Yoon S, Cho J. Deep multimodal detection in reduced visibility using thermal depth estimation for autonomous driving. Sensors. 2022;22:5084. DOI: 10.3390/s22145084
  53. 53. Chen Y, Shin H. Pedestrian detection at night in infrared images using an attention-guided encoder-decoder convolutional neural network. Applied Sciences. 2020;10:809. DOI: 10.3390/app10030809
  54. 54. Chen P, Liu S, Zhao H, Jia J. Distilling knowledge via knowledge review. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA. IEEE; 2021. pp. 5006-5015
  55. 55. Xiao W, Zhang Y, Wang H, Li F, Jin H. Heterogeneous knowledge distillation for simultaneous infrared-visible image fusion and super-resolution. IEEE Transactions on Instrumentation and Measurement. 2022;71:1-15
  56. 56. Liu X, Hirota K, Jia Z, Dai Y. A multi-autoencoder fusion network guided by perceptual distillation. Information Sciences. 2022;606:1-20
  57. 57. Zhao Z, Su S, Wei J, Tong X, Gao W. Lightweight infrared and visible image fusion via adaptive DenseNet with knowledge distillation. Electronics. 2023;12:2773. DOI: 10.3390/electronics12132773
  58. 58. Mi J, Wang L, Liu Y, Zhang J. KDE-GAN: A multimodal medical image-fusion model based on knowledge distillation and explainable AI modules. Computers in Biology and Medicine. 2022;151:106273
  59. 59. Xia J, Lu Y, Tan L. Research of multimodal medical image fusion based on parameter-adaptive pulse-coupled neural network and convolutional sparse representation. Computational and Mathematical Methods in Medicine. 2020;2020:3290136
  60. 60. Lu X, Zhang L, Niu L, Chen Q, Wang J. A novel adaptive feature fusion strategy for image retrieval. Entropy. 2021;23:1670
  61. 61. Wang L, Hu Z, Kong Q, Qi Q, Liao Q. Infrared and visible image fusion via attention-based adaptive feature fusion. Entropy. 2023;25:407
  62. 62. Zeng S, Zhang Z, Zou Q. Adaptive deep neural networks methods for high-dimensional partial differential equations. Journal of Computational Physics. 2022;463:111232
  63. 63. Yuan J, Pan F, Zhou C, Qin T, Liu TY. Learning Structures for deep neural networks. 27 May 2021. arXiv arXiv:2105.13905
  64. 64. Li H, Yang Y, Chen D, Lin Z. Optimization algorithm inspired deep neural network structure design. In: Asian Conference on Machine Learning. PMLR; 4 Nov 2018. pp. 614-629. arXiv 2018, arXiv:1810.01638
  65. 65. Ma J, Yu W, Liang P, Li C, Jiang J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Information Fusion. 2019;48:11-26
  66. 66. Li J, Huo H, Li C, Wang R, Feng Q. AttentionFGAN: Infrared and visible image fusion using attention-based generative adversarial networks. IEEE Transactions on Multimedia. 2021;23:1383-1396
  67. 67. Ma W, Wang K, Li J, Yang SX, Li J, Song L, et al. Infrared and visible image fusion technology and application: A review. Sensors. 2023;23:599. DOI: 10.3390/s23020599

Written By

Suganthi Srinivasan, Rakesh Rajegowda and Eshwar Udhayakumar

Submitted: 05 October 2023 Reviewed: 11 October 2023 Published: 07 December 2023