Alternative Position Estimation Systems for Micro Air Vehicles

Micro air vehicles (MAVs) are a technology that is becoming increasingly important and popular. They are used as tools for tasks that were not feasible in the past. For most MAV models, the GPS sensor is the only means of estimating the vehicle's pose in the environment. Relying on GPS alone, with no secondary position estimation system, is risky because the GPS may fail like any other sensor. To overcome this weakness and make MAVs more robust in autonomous tasks, the research community has proposed many localization systems for different constraints. This chapter reviews the most popular, recent, and important MAV localization systems, as well as promising future work in this field.


Introduction
One of the first uses of micro air vehicles (MAVs) was during World War I [1]. Since then, MAVs have been considered a promising technology, and nowadays they are used in several different tasks, such as agriculture [2], patrolling [3], mapping [4], and delivery [5]. Compared to conventional human-crewed aerial vehicles, MAVs are a low-cost and well-suited alternative for repetitive tasks or tasks that demand high precision. They are also recommended for low-altitude flights and for flights that demand a wide range of maneuvers.
Estimating the MAV's position, i.e., its localization in the world, is the main requirement common to all the aforementioned tasks, and it would remain so even if they were addressed by other types of mobile robots. For such complex tasks, localization and navigation are fundamental capabilities that allow MAVs to accomplish their mission [6]. Localization for MAVs is usually solved by the global positioning system (GPS) [7], in which an embedded GPS sensor receives signals from satellites orbiting the Earth to estimate the position. Other MAV models rely on different ways of estimating their position, such as inertial sensors or visual odometry algorithms. However, most of these models do not have a second position estimation system in addition to the primary one, i.e., the GPS [7]. Therefore, MAVs that depend exclusively on GPS to estimate their position are more likely to fail in their tasks or missions: like any other sensor, the GPS might fail, and they have no redundant position estimation system.
Even though it is widely used in different situations and for distinct goals, the GPS sensor is vulnerable to some problems [8,9]. The certainty of the position estimate depends on the number of satellites available to the GPS sensor and on the quality of the signal between them. The signal might be affected by the weather, such as on cloudy and rainy days, and by obstacles, such as tall buildings or hills. Hence, the higher the number of connected satellites and the stronger the signal, the lower the GPS position estimation error. In addition to this weakness, another problem might disturb the GPS position estimation: so-called jammer guns. While the GPS sensor is reading the satellite signal to estimate the position, these devices jam the signal, and hence, the estimation becomes unreliable [10,11].
The mobile robotics research community has investigated the MAV position estimation problem, and valuable works have been proposed. In general, these works treat it as the localization problem from the mobile robotics field, in which the goal is to estimate the pose of a robot in an a priori known map, based on the readings of its sensors [12]. The proposed works cover a considerable variety of approaches, whose main differences are the kind of data used to represent the environment and the technique used to estimate the pose. Despite this diversity, one characteristic that most of them share is the use of visual data from cameras to estimate the localization. This choice is due to the advantages of cameras over other sensors for this problem, such as their low weight, the variety of information in a single image (color, depth, intensity, etc.), and the long-distance range of the readings.
This chapter covers the most important works proposed to deal with the visual MAV localization problem. As mentioned above, two main topics are worth covering when presenting this kind of work: the data used as a map and how the estimation is calculated. Therefore, this chapter first discusses the different maps used so far and then reviews the localization estimation itself. In addition to detailing and comparing these works, it also presents the next trends and future work for this problem.

Localization problem
Mobile robots that aim to perform tasks without human interference, i.e., autonomously, must know their pose within the environment. The same necessity applies to MAVs whose only position estimation system is a GPS sensor. Determining the robot's pose would be a simple task if all the sensors of the robot were perfect and the environment were fully static. This scenario is not realistic: the sensor readings are not precise, and many agents move through the environment. The difficulty of the localization problem therefore increases, and the robot's pose cannot be read directly but must be estimated.
Despite its difficulty, localization is a fundamental problem in the mobile robotics field [13]. It is defined as the estimation of the robot's pose relative to a previously known map of the environment [12]. Even though its definition is simple, this problem has two main variations: local and global localization. In the former, the initial robot pose in the map is known, and the local localization approach only tracks the robot as it moves through the environment. The error of the first pose estimate is low, and the goal is to keep it low using the sensor readings and the motion information from the robot. The second variation is significantly harder. In this case, the initial robot pose is unknown, and hence, the error of the pose estimate is initially high. Instead of considering just a small part of the map at the beginning, global localization must consider the whole map, since the initial pose is unknown [14]. Figure 1 illustrates the differences between local and global localization. Global localization is illustrated in Figure 1(a), in which the estimation error is high at the beginning, and the goal is to reduce it as the robot moves through the environment. The opposite happens in Figure 1(b), which depicts local localization. The estimation error begins considerably low, and even though it increases over time, the goal, as in global localization, is to reduce it.
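The difference between the two variations can be illustrated by how a sampling-based estimator might be initialized. The following is a minimal sketch, not taken from any of the reviewed works: the function names, the 100 x 100 map size, and the Gaussian spread are illustrative assumptions. In the global case the pose hypotheses cover the whole map, whereas in the local case they are concentrated around the known initial pose.

```python
import random

def init_global(num_particles, width, height, rng):
    """Global localization: initial pose unknown, so cover the whole map."""
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height))
            for _ in range(num_particles)]

def init_local(num_particles, start_xy, sigma, rng):
    """Local localization: initial pose known, so sample close to it."""
    x0, y0 = start_xy
    return [(rng.gauss(x0, sigma), rng.gauss(y0, sigma))
            for _ in range(num_particles)]

def spread(particles):
    """Root-mean-square distance to the mean, a crude measure of uncertainty."""
    n = len(particles)
    mean_x = sum(p[0] for p in particles) / n
    mean_y = sum(p[1] for p in particles) / n
    return (sum((p[0] - mean_x) ** 2 + (p[1] - mean_y) ** 2
                for p in particles) / n) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
global_particles = init_global(500, 100.0, 100.0, rng)      # hypothetical map size
local_particles = init_local(500, (20.0, 30.0), 1.0, rng)   # hypothetical start pose
print(spread(global_particles), spread(local_particles))
```

The initial spread of the global hypotheses is much larger than that of the local ones, which mirrors the high initial estimation error described for global localization.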
The most popular approaches to localization in the mobile robotics field are grouped as either probabilistic or deterministic. In the first group, two main approaches are worth mentioning: the Kalman filter [15] and the particle filter (Monte Carlo) [16]. Even though both implement the Bayes filter, each one has its specific advantages, and therefore, they are suitable for different situations and constraints. The most popular approach in the second group is based on interval analysis, in which the estimate is defined by boxes that must be minimized [17]. As the goal of this chapter is not to go deep into these approaches, the reader is invited to consult the references for more details.
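To give an intuition of how a Bayes filter alternates between motion and measurement, the following is a minimal one-dimensional Kalman filter sketch. It assumes a scalar position, additive Gaussian noise, and hypothetical function names; real MAV systems estimate a full pose with matrix-valued versions of the same two steps.

```python
def kalman_predict(mean, var, motion, motion_var):
    """Prediction step: apply the motion and let the uncertainty grow."""
    return mean + motion, var + motion_var

def kalman_update(mean, var, measurement, meas_var):
    """Update step: blend prediction and measurement by their certainty."""
    gain = var / (var + meas_var)  # Kalman gain, between 0 and 1
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var

# Prior at 0.0 with variance 1.0, then a measurement of 2.0 with variance 1.0.
mean, var = kalman_update(0.0, 1.0, 2.0, 1.0)
print(mean, var)  # 1.0 0.5
# Move forward by 1.0 with motion variance 0.25.
mean, var = kalman_predict(mean, var, 1.0, 0.25)
print(mean, var)  # 2.0 0.75
```

The update reduces the variance while the prediction increases it, which is the same keep-the-error-low cycle that the local localization variation relies on.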
Regardless of the approach used, all localization systems share the same characteristics: as input, they require an environment representation, a sensor to read the environment, and odometry data; as output, they provide the estimated position of the robot in the environment representation, as shown in Figure 2. Hence, localization systems try to find the pose in the map that best fits both the sensor reading and the odometry information. The better the system is, the more accurate the pose estimate. Figure 2 presents an example using a Lidar and a 2D map, but the same idea also applies to other sensors and types of maps.
Even though the setup of the mobile robot localization problem seems quite simple, with well-defined input and output, the difficulty level is considerably high. The robot's pose is not sensed directly; it must be estimated, and that is where the problem lies. Usually, the robot's sensors, both those that read the environment and those that measure the odometry, are noisy, and hence, the data that they provide do not correctly represent the real world. In addition, some types of robots have restrictions on which kinds of sensor they support, so they cannot carry the best sensor for their tasks. In MAVs, for example, the camera is the most popular sensor for this purpose, since cameras are smaller, lighter, and cheaper than most laser range finders. However, using images to estimate the odometry information is not ideal, although there are algorithms that compute this estimation.
Localization approaches overcome the noisy data problem by modeling the error of the sensors. Besides, since a single reading is insufficient for the pose estimation, these approaches also have to integrate the data over time to reduce the estimation error. In environments with different regions that look alike, such as a building with many corridors and doors, it is nearly impossible to estimate the robot's pose from just one reading. For example, imagine that at some point, the robot is observing a door after having observed a wall and a frame. Then, instead of searching for all the spots in the map that contain a door, the localization system searches for spots that also match the wall and the frame. In this way, the past observations are also considered when estimating the robot's pose.
Beyond the generic localization problem presented so far, the research community has explored the UAV localization problem throughout the years, and many different approaches have been proposed. The most significant differences between them lie in the map representation and in the method used to compare the sensor readings with small parts of the map. The next section covers the most popular approaches proposed to deal with this problem and how they differ from each other.
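The door, wall, and frame example can be sketched with a discrete (histogram) Bayes filter over a one-dimensional corridor. This is an illustrative toy under stated assumptions: the corridor layout, the sensor probabilities `p_hit` and `p_miss`, and the noise-free one-cell motion are all hypothetical.

```python
CORRIDOR = ["wall", "frame", "door", "wall", "door", "wall", "door", "wall"]

def sense(belief, world, observation, p_hit=0.9, p_miss=0.1):
    """Measurement update: weight each cell by how well it explains the reading."""
    weighted = [b * (p_hit if cell == observation else p_miss)
                for b, cell in zip(belief, world)]
    total = sum(weighted)
    return [w / total for w in weighted]

def move_right(belief):
    """Motion update: shift the belief one cell (assumed noise-free motion)."""
    shifted = [0.0] + belief[:-1]
    total = sum(shifted)
    return [s / total for s in shifted]

def localize(world, observations):
    """Start from a uniform belief and integrate the observations over time."""
    belief = [1.0 / len(world)] * len(world)
    for step, observation in enumerate(observations):
        if step > 0:
            belief = move_right(belief)
        belief = sense(belief, world, observation)
    return belief

single = localize(CORRIDOR, ["door"])
sequence = localize(CORRIDOR, ["wall", "frame", "door"])
print([round(b, 3) for b in single])    # the three doors are equally likely
print([round(b, 3) for b in sequence])  # the door at index 2 dominates
```

With a single "door" reading the belief is split evenly between the three doors, whereas the sequence wall, frame, door is consistent with only one of them.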

Alternatives for UAV localization systems
In this section, we review the most important works proposed to deal with the UAV localization problem. First, this section analyzes the environment representations that these works use as a map, along with their advantages and disadvantages. Second, it discusses the methods that these works use to compare the sensor readings with different parts of the map. Then, Table 1 in this section introduces other qualitative comparisons between these works, such as single or multiple MAV pose estimation and indoor or outdoor localization.

Environment representation
The first environment representation presented here is the 2D satellite image map. In general, it is downloaded beforehand from an imagery map source, either entirely or divided into many small images to be stitched later, and then used by the localization approaches for the pose estimation [18,21,26,31]. Although the research community has not explored this option much, it is also possible to fly the MAV over the region of interest, build a 2D image of the environment, and then use it as a map for the localization estimation. Another common choice among the approaches that adopt this kind of map is to point the MAV camera downwards. The MAV images are then compared to different patches from the satellite image map by different comparison methods. The advantages of using a 2D satellite image as a map are the free access to this kind of data through Google Maps or any geographic information system (GIS), the rich representation of the environment by color images, and the worldwide coverage. Usually, a GIS also provides the geographic coordinates of the satellite images, which makes it possible to infer the latitude and longitude of each pixel of an image. Therefore, when a localization system estimates which pixel of the satellite image map corresponds to the MAV's pose, it also estimates the pose in relation to the world, thanks to the latitude and longitude associated with that pixel. In contrast, the disadvantages of this kind of map are the limited point of view (2D) and the fact that satellites do not often revisit some places of the world, so the images of those places are outdated. The comparison methods of the works mentioned above are designed to be robust against such differences between the outdated 2D satellite images and the MAV image [18,21,26,31].
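The pixel-to-coordinate inference mentioned above can be sketched as a linear interpolation between the geographic bounds of the image. This is a simplified illustration, assuming a north-up image that covers an area small enough for map projection distortion to be ignored; the function name and parameters are hypothetical and do not correspond to any specific GIS API.

```python
def pixel_to_latlon(col, row, width, height, north, south, west, east):
    """Return the (latitude, longitude) of the center of pixel (col, row).

    Assumes a north-up image whose edges lie at the given bounds, in degrees,
    and a linear relation between pixels and degrees (small-area assumption).
    """
    lon = west + (col + 0.5) / width * (east - west)
    lat = north - (row + 0.5) / height * (north - south)  # rows grow southward
    return lat, lon

# Hypothetical 100 x 100 image covering one degree in each direction.
print(pixel_to_latlon(0, 0, 100, 100, 10.0, 9.0, 20.0, 21.0))    # near the north-west corner
print(pixel_to_latlon(99, 99, 100, 100, 10.0, 9.0, 20.0, 21.0))  # near the south-east corner
```

Once the localization system has selected a pixel, a conversion of this kind is what turns the estimate in the image into an estimate in world coordinates.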
The limited point of view of 2D satellite images motivated researchers to investigate the benefits of 3D maps [19,20,23,30]. The authors argue that 3D maps make it possible to take advantage of the structures of the environment, in addition to the color of the map, to estimate the localization. Usually, the localization estimation is based on 3D structure alignment or point cloud matching. In the 3D map case, the MAV camera can be set at different angles to explore different views. Like 2D satellite images, this kind of 3D representation can be either built right before the localization estimation, as done in [20,23], or downloaded from a GIS, as in [19]. Despite these advantages over 2D maps, 3D maps generally require more computational resources, both to be stored and to be manipulated, and they are not as easy to find as 2D maps, which limits the places where the MAV's pose can be estimated.
It is important to highlight that even though flying the MAV before the localization estimation to build the map provides an up-to-date map, in both the 2D and 3D cases, this option presents a trade-off. First, a human must pilot the MAV to gather 2D or 3D data from the environment and then submit it to a mapping approach. Second, it takes longer to start the localization algorithm, since flying the MAV over an area takes more time than downloading a map from a GIS.
In contrast to these two types of maps, which represent the whole environment, other map options are simpler in terms of detail and content. Instead of illustrating all the obstacles, free spaces, and so on, these simple maps only show the position of a few markers. In this case, the idea is to measure the distance between the MAV and the markers within the map and then estimate the MAV's pose. The type of marker also varies considerably. One example is WLAN access points [28], which are fixed at some spots of the environment and whose received signal strength is measured as part of the localization estimation. Another is ultraviolet LED markers [24], which emit light at frequencies that are less common in nature than visible light or infrared radiation, which increases the precision of the distance measurement. In that work, the LED markers are not fixed; they are embedded in every MAV, and the MAVs perform a mutual relative localization [24]. In more detail, a MAV's pose is estimated relative to another MAV instead of the global coordinate system. Another marker worth mentioning is the tether [27]. The tether reel is fixed at a specific position, and the tether is attached to the MAV. In this case, the MAV is localized relative to the tether reel by using a mechanics model. In general, marker maps are adopted for indoor localization, since the sensors that measure the markers have a more limited range than the cameras used with the 2D or 3D maps presented earlier. The work that relies on ultraviolet LED markers is an exception to this indoor limitation, but only because the MAV's pose is estimated relative to other MAVs, not to the environment.
Besides the maps presented in this section, the research community has tested other types in the MAV localization problem. However, they are very specific to one kind of sensor or configuration, and our goal here is to cover the most popular and recent ones. Each type of map presented here has its advantages and disadvantages, as well as specific constraints that make it fit some situations better than others. For instance, the 2D satellite image is freely available online but is not suitable for indoor localization. On the other hand, the marker map is the option most used for indoor localization, but it usually requires many markers spread through the environment, and the markers can only be detected at short range. Given that the map is essential for the estimation in a MAV localization system, the type of MAV, the environment, and the embedded sensor must be taken into account to choose the type of map that best fits the constraints.
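One generic way to turn marker distances into a position is trilateration. The sketch below is not the method of [28] or of any other reviewed work: it assumes a planar problem, three non-collinear markers with known positions, and the standard log-distance path-loss model with hypothetical calibration values (`rssi_at_1m_dbm`, `path_loss_exponent`) to convert received signal strength into distance.

```python
import math

def rssi_to_distance(rssi_dbm, rssi_at_1m_dbm=-40.0, path_loss_exponent=2.0):
    """Log-distance path-loss model; returns meters. Calibration values are assumed."""
    return 10.0 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))

def trilaterate(markers, distances):
    """Solve for (x, y) from three marker positions and measured distances.

    Subtracting the first circle equation from the other two gives a
    2 x 2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = markers
    r1, r2, r3 = distances
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = r1 ** 2 - r2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = r1 ** 2 - r3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("markers are collinear; position is ambiguous")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

markers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # hypothetical marker map
true_position = (3.0, 4.0)
distances = [math.hypot(true_position[0] - mx, true_position[1] - my)
             for mx, my in markers]
print(trilaterate(markers, distances))  # close to (3.0, 4.0)
print(rssi_to_distance(-60.0))          # 10.0 m with the assumed calibration
```

In practice the distances are noisy, so more than three markers and a least-squares or probabilistic estimator would normally be used; this is one reason why marker maps tend to need many markers spread through the environment.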

Localization estimation methods
In addition to the environment representation, i.e., the map, the methods that estimate the MAV's pose also play an important role in localization systems. Usually, they receive as input the map of the environment and the sensor reading, and the goal is to find the part of the map that best matches the sensor reading. In Figure 2, the localization estimation method lies within the localization system, Figure 2(b), and together with the motion model, it makes it possible to properly estimate the MAV's pose, Figure 2(c).
Unlike the previous section, which introduced the maps used by MAV localization systems in large groups, such as 2D satellite images or marker maps, the localization estimation methods do not share enough similarities to be grouped in the same way. Therefore, they are discussed individually here, and as in the previous section, their advantages and disadvantages are highlighted.
It is natural that the works that rely on a 2D satellite image as a map have an estimation method based on image comparison, since their sensor readings are images. The general idea is to compare every MAV image with patches taken from different poses within the map; the most similar patch probably represents the MAV's pose in the map. To extract these patches, some global localization works use the Monte Carlo algorithm to sample patches from the whole map [18,31], whereas the local localization ones, given that the MAV's initial pose is known, only extract a patch at the initial pose and keep extracting patches as the MAV moves through the environment [21,26]. To compare the MAV images against the patches, each Monte Carlo-based work proposed a novel measurement model: one introduces a new image descriptor called abBRIEF to robustly compute an image signature for the comparison [31], and the other uses SURF descriptors [32] and machine learning to compare MAV images and patches [18]. Both approaches can compute the similarity between all pairs of MAV image and patch and find the most similar pair. On the other hand, the local localization approaches have a reduced search space, since they know the initial patch from the satellite image. In this case, the image comparison is mainly made by two methods: either template matching [22], in which a pixel-by-pixel comparison is performed between two images, or feature matching [26], which involves the detection, extraction, and matching of features from the two images. In general, given that all these works rely on images, they use either features or image descriptors to represent the images and then perform the comparison. Their proposals aim to overcome the illumination and color changes that occur when dealing with images of outdoor environments, as is the case for MAV localization systems.
Alignment techniques are also used to estimate the MAV's pose, including by approaches that rely on 3D maps. In [23], the alignment is made between the 2D keypoints from the MAV image and the 3D landmarks of the 3D map. To do so, the authors cluster the landmarks into visual words to speed up the matching and alignment with a nearest neighbour search. This 2D-to-3D alignment, or transformation, is also applied in another work [19]. Given that the map in this work is a 3D representation of the environment but the MAV image is a simple 2D RGB image, the authors have to transform the MAV image into 3D data and then align the lines and edges detected in both. As both works perform local localization, they start with a reduced search space, which helps them obtain a good alignment at the beginning.
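The pixel-by-pixel comparison behind template matching can be sketched as follows. This is a minimal illustration rather than the implementation of any cited work: it assumes grayscale images stored as nested lists and uses normalized cross-correlation (NCC) as the similarity score, which is one common choice because it is unaffected by a uniform change in brightness or contrast. The function names and the toy map are hypothetical.

```python
import math

def ncc(patch_a, patch_b):
    """Normalized cross-correlation between two equally sized patches."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0.0 else 0.0

def best_match(map_image, template):
    """Slide the template over the map and return ((row, col), score)."""
    t_rows, t_cols = len(template), len(template[0])
    best_pos, best_score = None, -2.0  # NCC is never below -1
    for r in range(len(map_image) - t_rows + 1):
        for c in range(len(map_image[0]) - t_cols + 1):
            patch = [row[c:c + t_cols] for row in map_image[r:r + t_rows]]
            score = ncc(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score

MAP = [
    [1, 2, 3, 4, 5],
    [5, 1, 9, 2, 7],
    [3, 8, 4, 6, 1],
    [2, 7, 5, 0, 9],
    [6, 3, 1, 8, 2],
]
# MAV view of the patch at row 2, column 1, under brighter illumination (2 * v + 10).
mav_view = [[26, 18], [24, 20]]
print(best_match(MAP, mav_view))  # ((2, 1), 1.0)
```

An exhaustive scan like this grows with the size of the map, which is why the local approaches restrict the search to the neighbourhood of the previous pose and the global ones sample candidate patches instead.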
Besides the estimation methods presented so far, other ones are even more specific. In [25], for instance, a robust and quick response landing pattern is designed to be visually detected in images and then assist the MAV during its landing. In such a case, the pattern is the map, and the computer vision method proposed by the authors can detect the scale of the map and then estimate the MAV localization. In [24], a marker detection-based approach is also proposed to estimate the MAV's pose. However, in contrast to [25], the markers in [24] are ultraviolet LEDs embedded in the MAVs. Hence, the estimation in this case is a mutual one, i.e., one MAV estimates its pose in relation to another one and vice versa. First, their algorithm detects the size of the markers in the image, and then it estimates the internal distance between a pair of markers. From these, they can calculate the distance between two MAVs and their pose. In addition, in [27], tether-based feedback and inertial sensing are used to estimate the MAV's pose. In more detail, the length, azimuth, and elevation angle of the tether are the input for a mechanics model that calculates the absolutely straight tether between the origin and the MAV. The work [28] also relies on an uncommon sensor to estimate the MAV's pose, and the goal is to detect access points (APs) and measure the received signal strength. Then, the MAV's pose can be estimated relative to the APs, whose positions are well defined in the map. A similar approach is proposed in [30], in which the MAV's pose is estimated in an urban environment through the transmission of beacons. The beacons are located in different buildings, and they provide a local frame of reference, supporting the MAVs in their location estimation by providing details of the area and height of the buildings. Sonar is another type of sensor not commonly found on MAVs, and it is the sensor used in [29]. To estimate the MAV's pose, the authors proposed a multi-ray model based on the four sonar sensors embedded in a MAV. This model approximates a beam pattern accurately, and it does not require high computational power.
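For the tether-based estimation described above, the geometric core can be sketched as a spherical-to-Cartesian conversion. This is a deliberately simplified illustration, not the mechanics model of [27]: it assumes the tether is perfectly straight and taut (no sag), that the azimuth is measured in the horizontal plane from the x axis, and that the elevation is measured upward from the horizontal. The function name is hypothetical.

```python
import math

def tether_to_position(length, azimuth_rad, elevation_rad):
    """Position of the MAV relative to the tether reel, straight-tether assumption."""
    horizontal = length * math.cos(elevation_rad)  # projection on the ground plane
    x = horizontal * math.cos(azimuth_rad)
    y = horizontal * math.sin(azimuth_rad)
    z = length * math.sin(elevation_rad)
    return x, y, z

# A 10 m tether, 90 degrees azimuth, 30 degrees elevation.
print(tether_to_position(10.0, math.radians(90.0), math.radians(30.0)))
```

Because the reel is fixed at a known position, adding that position to the result gives the MAV's location in the map frame.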
In general, the localization estimation methods are responsible for comparing the sensor reading with a sample of the map. Put another way, this comparison can be seen as a transformation from the local coordinate system, i.e., the robot sensor reading, to a global one, i.e., the map. From the works presented in this section, we can notice that sometimes the sensor reading and the map are different kinds of data, such as 2D images from a regular RGB camera and a 3D map. Also, in some cases, these methods have to estimate the MAV's pose based on an outdated map. Because of that, they have to be robust against differences between the real and the mapped environment: even when the sensor reading and the map do not depict the area in the same way, the pose should still be estimated. Table 1 compiles the information presented in Section 2.2. This review shows how the type of map and the sensor change from system to system. Even though some approaches seem very similar, such as [18,31], they are still different. Hence, they have their own advantages and disadvantages.

The future of UAV localization systems
Despite the great effort of the research community to deal with the MAV localization problem, no solution works for all possible environments and constraints. The problem varies considerably in several respects: the environment, which can be either indoor or outdoor; the knowledge about the initial position, which may or may not be known; and the number of MAVs being localized, which can be one or many. In addition to these difficulties, there is also the map issue, which is caused either by a low updating frequency, such as a satellite that takes some time to revisit a specific area, or by a quick environmental change, such as a snowy day that turns the whole environment white.
To deal with the problem of the type of environment, the approaches that seem most likely to work are those that recognize objects within the environment and those that deal with 3D structures. Whether the environment is indoor or outdoor, it would be possible to recognize objects or detect the shape of the environment and then continue the pose estimation. On the other hand, the problem of the outdated map could be overcome by using deep learning, which provides robust solutions for different seasons or illumination changes in images [33]. Another solution made possible by deep learning is to train a network to differentiate roads, buildings, and forest, and then segment both the map and the MAV sensor readings. Hence, instead of, for instance, matching the colors of pixels from MAV images and patches from the 2D satellite image map, the matching would consider the classes of the environment, avoiding the problems caused by color or illumination changes.
Because MAVs are a popular technology used in many different tasks, another promising matter that should be investigated is the localization of multiple MAVs. As even more MAVs will be flying and cooperating in the future, it is essential to have localization approaches that take advantage of the large number of MAVs available in the air and thereby improve the pose estimation.
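The class-based matching suggested above can be sketched once a segmentation is available. The example below is hypothetical throughout: it assumes that some segmentation network (not shown) has already labeled each cell of both the map and the MAV view, and it scores candidate patches by the fraction of cells whose class labels agree.

```python
def class_agreement(labels_a, labels_b):
    """Fraction of cells with the same class label in two equally sized grids."""
    total = 0
    same = 0
    for row_a, row_b in zip(labels_a, labels_b):
        for a, b in zip(row_a, row_b):
            total += 1
            same += (a == b)
    return same / total

def best_patch_by_class(map_labels, view_labels):
    """Slide the labeled view over the labeled map; return ((row, col), score)."""
    v_rows, v_cols = len(view_labels), len(view_labels[0])
    best_pos, best_score = None, -1.0
    for r in range(len(map_labels) - v_rows + 1):
        for c in range(len(map_labels[0]) - v_cols + 1):
            patch = [row[c:c + v_cols] for row in map_labels[r:r + v_rows]]
            score = class_agreement(patch, view_labels)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score

MAP_LABELS = [
    ["road", "road", "forest", "forest"],
    ["road", "building", "building", "forest"],
    ["road", "building", "building", "forest"],
    ["road", "road", "road", "forest"],
]
view = [["building", "forest"], ["road", "forest"]]  # assumed segmented MAV image
print(best_patch_by_class(MAP_LABELS, view))  # ((2, 2), 1.0)
```

Since the score depends only on class labels, a forest that is green in the map and white with snow in the MAV image would still match, provided that the segmentation itself is robust to the change.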