Abstract
The sensitivity of global navigation satellite systems to disruptions precludes their use in conditions of armed conflict with an opponent possessing comparable technical capabilities. In military unmanned aerial vehicles (UAVs), the aim is therefore to obtain the navigational data needed to determine the location and correct the flight route by means of other types of navigation systems. To correct the position of a UAV relative to a given trajectory, systems that associate reference terrain maps with image information can be used. Over the last dozen or so years, new, effective algorithms for matching digital images have been developed. Their reported effectiveness, however, is based on images that are fragments taken from source files and therefore have qualitatively identical counterparts in the reference images. In practice, the differences between the reference image stored in the memory of the navigation system and the image recorded by the sensor can be significant. In this paper, modern methods of image registration and matching for UAV position refinement are compared, and the adaptation of available methods to the operating conditions of the UAV navigation system is discussed.
Keywords
- digital image processing
- image matching
- terrain-aided navigation
- unmanned aerial vehicle
- cruise missile
1. Introduction
Global navigation satellite systems are widely used in both civil and military technology. The advantage of such systems is very high accuracy in determining coordinates. However, the ease with which they can be interfered with precludes their use in conditions of armed conflict with an opponent equipped with comparable technical capabilities. In the case of military autonomous unmanned aerial vehicles (UAVs), in particular cruise missiles (CM), the aim is therefore to determine the navigation data for specifying the position and correcting the flight path by means of other types of navigation and self-guidance systems.
Such systems are usually based on inertial navigation systems (INS), which use accelerometers, angular rate gyroscopes and magnetometers to provide relatively accurate tracking of an object’s position and orientation in space. However, they are subject to drift and systematic sensor errors, so the divergence between the actual and the measured position of the object increases steadily with time. This results in a significant navigational error.
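The growth of the inertial position error can be illustrated with a minimal sketch. It assumes a single axis and a constant accelerometer bias; the bias value of 0.01 m/s² and the function name `position_error_from_bias` are hypothetical and chosen only for illustration. Double integration of the bias gives an error that grows roughly with the square of the flight time.

```python
def position_error_from_bias(bias_mps2, duration_s, dt=0.01):
    """Double-integrate a constant accelerometer bias (hypothetical value).

    Returns the accumulated position error in metres after duration_s seconds.
    """
    velocity_error = 0.0
    position_error = 0.0
    steps = int(round(duration_s / dt))
    for _ in range(steps):
        velocity_error += bias_mps2 * dt       # first integration: velocity
        position_error += velocity_error * dt  # second integration: position
    return position_error


# Assumed bias of 0.01 m/s^2 evaluated after 10 s, 60 s and 600 s of flight.
errors = {t: position_error_from_bias(0.01, t) for t in (10, 60, 600)}
for duration, error in errors.items():
    print(f"{duration:4d} s -> {error:8.1f} m")
```

Under these assumptions the error after ten minutes is about a hundred times larger than after one minute, which is why an external position correction is needed.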
Therefore, two types of systems designed to correct the position of an object in relation to a given trajectory are normally used in the solutions of the UAV/CM navigation and self-guidance systems. The first group contains systems whose task is to determine the position on the basis of data obtained from radio altimeters, related to reference height maps. Such systems include, for example: TERCOM (terrain contour matching), used in Tomahawk cruise missiles, SITAN (Sandia inertial terrain-aided navigation), using terrain gradients as input for the modified extended Kalman filter (EKF) estimating the position of the object, and VATAN (Viterbi-algorithm terrain-aided navigation), a version of the system based on the Viterbi algorithm and characterised – in relation to the SITAN system – with lower mean square error of position estimation [1, 2, 3, 4, 5]. The main disadvantage of these solutions is the active operation of measuring devices, which reveals the position of the object in space and eliminates the advantages associated with the use of a passive (and therefore undetectable) inertial navigation system. The second group consists of systems associating reference terrain maps with image information obtained by means of visible, residual or infrared light cameras [6, 7]. Such systems include the American DSMAC (digital scene matching area correlator), also used in Tomahawk missiles [8, 9], and its Russian counterpart used in Kalibr (aka Club) missiles. Their advantage is both the accuracy of positioning and the secrecy (understood as passivity) of operation.
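The idea behind the first group of systems can be sketched in a simplified form. The example below is not the TERCOM algorithm itself; it only assumes that a short profile of measured terrain heights is slid along one row of a reference height map and that the position with the lowest mean absolute difference is selected. The function name, the criterion and all height values are illustrative assumptions.

```python
def best_profile_offset(reference_heights, measured_heights):
    """Return (offset, score) of the best fit of the measured profile.

    The score is the mean absolute difference between the measured heights
    and the reference heights in the window starting at the given offset.
    """
    n = len(measured_heights)
    best_offset, best_score = None, float("inf")
    for offset in range(len(reference_heights) - n + 1):
        window = reference_heights[offset:offset + n]
        score = sum(abs(r - m) for r, m in zip(window, measured_heights)) / n
        if score < best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Hypothetical reference row (metres) and noisy readings taken over cells 4..7.
reference_row = [120, 122, 130, 145, 160, 158, 150, 149, 155, 170, 190, 185]
measured = [159, 157, 151, 148]
offset, score = best_profile_offset(reference_row, measured)
print(offset, score)
```

A flat terrain would give nearly equal scores at every offset, which is one reason why such systems depend on sufficiently varied relief.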
Due to the dynamic development of UAVs/CMs equipped with navigation systems operating independently of satellite systems, and the number of problems associated with implementing such systems, the sensitivity of selected methods to environmental conditions and to constraints in the measurement system, which often negatively affect the results obtained, has been assessed. The essence of the work is to consider issues related to the processing of image information obtained from optical sensors carried by a UAV/CM and its association with terrain reference images. In particular, the correctness of image data matching and the limitations of assessing image similarity are considered. The article compares modern image matching methods under realistic conditions of information acquisition. The main goal set by the authors is to verify selected algorithms, identify the key aspects determining their effectiveness and indicate potential directions of their development.
2. State of the art
The operation of classic object identification algorithms, which indicate the similarities between the recorded and reference images (the so-called patterns), is mainly based on correlation methods. These algorithms, although effectively implemented in solutions to typical technical problems, are insufficiently effective in topographic navigation. This is related to, inter alia, the limitations and conditions in the measurement system, the environmental conditions and the characteristics of the detected objects, all of which have a strong negative impact on the correlation results. This rules out their direct use in matching reference terrain maps with the acquired image information.
A particularly significant obstacle is the fact that the sensory elements of navigation systems installed on UAV/CM record image data in various environmental and lighting conditions [10]. Frequently, reference data of high informative value, due to various conditions, constitute a pattern of little use or even lead to incorrect results. This is the case, for example, when the reconnaissance is conducted in different weather conditions than those in which the UAV/CM mission takes place (Figure 1). Therefore, image feature matching becomes a complex issue. The conditions related to the image recording parameters, e.g. variable view angle, maintaining scale or using various types of sensors, turn out to be equally important.

Figure 1.
Images of the same fragment of the Earth’s surface taken under different weather and lighting conditions.
Image matching methods began to develop rapidly with the spread of digital imaging in technology. Initially, the classical Fourier and correlation methods were used. However, these methods did not allow for successful multi-modal, multi-temporal and multi-perspective matching of different images. A taxonomy of the classical methods used in the image matching process was presented in the early 1990s [11]. The image feature space, considered as the source of information necessary for image matching, was defined, and local variations in the image were identified as the greatest difficulty in the matching process. In the 21st century, the development of methods based on image features continued [12]. It should be emphasised that most feature-based image matching methods include four consecutive basic steps: feature detection, feature matching, estimation of the model of mutual image transformation, and final transformation of the analysed image. These methods became an alternative to the correlation and Fourier methods. For over a dozen years, new, effective algorithms for processing and matching digital images have been developed, using statistical methods based on matching local features in images [11, 12, 13, 14], cf. Figure 2. Their authors point to the greater invariance of the proposed algorithms to perspective distortions, rotation, translation, scaling and lighting changes. Given their high reliability under static conditions, as well as their low sensitivity to changes in the optical system’s position, including translation, orientation and scale, it is justified to verify their usefulness and effectiveness. The paper focuses on modern image matching algorithms that can potentially be used in topographic navigation. It should be stressed that in this context the problem is substantially different from the typical use of these algorithms.
This is because, although the matched images represent the same area of terrain, the manner and time of their recording differ significantly. As this is not a typical application of these algorithms, their effectiveness can be expected to be limited.

Figure 2.
Classification of the selected methods of image feature matching.
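The common feature of all methods is the use of the so-called scale space described in [15], which allows the decimation of image data and the examination of similarities between images of different scales. A significant step in the development of image matching methods based on local features was the Scale-Invariant Feature Transform (SIFT) algorithm [16]. In this algorithm, the features are selected locally and their position does not change when the image is scaled. They are indicated by determining the local extrema of the difference-of-Gaussians function
$$D(x, y, \sigma) = \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma),$$
where $I(x, y)$ is the input image, $G(x, y, \sigma)$ is the Gaussian kernel with standard deviation $\sigma$, $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$ is the image smoothed at scale $\sigma$, $k$ is the constant factor separating adjacent scales, and $*$ denotes convolution.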
A more numerically efficient version of the SIFT algorithm, called Speeded-Up Robust Features (SURF) is based on the so-called integral images [17]. Both methods use the basic processing steps described in [12]. Additionally, in order to ensure the effectiveness of feature detection in images of different resolutions, a scale space, consisting of octaves which represent the series of responses of a convolutional filter with a variable size, was introduced.
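The benefit of integral images can be shown with a minimal sketch. It assumes a small greyscale image stored as a list of rows; the function names are illustrative. Once the summed-area table is built, the sum over any rectangular box, and hence the response of a box filter of any size, requires only four table lookups.

```python
def integral_image(image):
    """Build a summed-area table with an extra leading row and column of zeros."""
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of the pixels in rows top..bottom-1 and columns left..right-1."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
table = integral_image(image)
inner = box_sum(table, 1, 1, 3, 3)   # pixels 6, 7, 10 and 11
total = box_sum(table, 0, 0, 4, 4)   # the whole image
print(inner, total)
```

Because the cost of `box_sum` does not depend on the box size, the filter can be enlarged from octave to octave without resampling the image.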
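Simply put, the detection of a characteristic point is based on the determinant of the Hessian matrix
$$H(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix},$$
where $L_{xx}(\mathbf{x}, \sigma)$ is the convolution of the second-order Gaussian derivative $\partial^2 G(\sigma) / \partial x^2$ with the image at point $\mathbf{x} = (x, y)$, and similarly for $L_{xy}(\mathbf{x}, \sigma)$ and $L_{yy}(\mathbf{x}, \sigma)$.
The determinant of the Hessian matrix after approximation using box filters and the Frobenius norm is given as
$$\det(H_{\mathrm{approx}}) = D_{xx} D_{yy} - (w D_{xy})^2,$$
where $D_{xx}$, $D_{yy}$ and $D_{xy}$ are the box-filter approximations of $L_{xx}$, $L_{yy}$ and $L_{xy}$, and $w \approx 0.9$ is the relative weight of the filter responses.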
After detecting the local extremes of
In 2011, an alternative method to SIFT and SURF, Oriented FAST and Rotated BRIEF (ORB), was proposed [19]. The method is based on a modified Features from Accelerated Segment Test (FAST) detector [20, 21], enabling corner and edge detection, and a modified Binary Robust Independent Elementary Features (BRIEF) descriptor [22]. The approach used in these methods changes the scale of the image by blurring it with a Gaussian filter of increasing size. Although this reduces noise and enhances the uniformity of areas interpreted by human beings as distinct (e.g. the surface of a lake, the wall of a building, the shape of a vehicle), it also blurs their edges. This often makes it impossible to indicate the boundaries between areas and to define characteristic points in their neighbourhood.
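The solution to this problem was proposed in the KAZE method (Japanese for “wind”) [23]. Unlike the SIFT and SURF methods, which use the Gaussian function causing isotropic diffusion of luminance to generalise the image, in the KAZE method the generalisation is based on nonlinear diffusion in consecutive octaves of the scale [24]. The anisotropic image blurring in this method depends on the local luminance distribution. Nonlinear diffusion can be presented in the following equation:
$$\frac{\partial L}{\partial t} = \operatorname{div}\bigl(c(x, y, t)\, \nabla L\bigr).$$
The blur intensity can be adapted by the introduced conductivity function
$$c(x, y, t) = g\bigl(\lvert \nabla L_{\sigma}(x, y, t) \rvert\bigr),$$
where $L$ is the image luminance, $t$ is the scale parameter, $\operatorname{div}$ and $\nabla$ are the divergence and gradient operators, and $\nabla L_{\sigma}$ is the gradient of a Gaussian-smoothed version of the image.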
This function allows for blurring the image while maintaining the edges of structures. As a result, more features can be detected at different image scales. However, it involves the use of a gradient, which in the case of intense image disturbance, e.g. in the form of a shadow, may cause an unfavourable (due to the subsequent detection of features) distribution of diffusion in the image.
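The edge-preserving behaviour can be illustrated on a one-dimensional signal. The sketch below is an assumption-laden simplification: it uses the conductivity $g = 1 / (1 + \lvert \nabla L \rvert^2 / k^2)$, which is one of the Perona-Malik functions available in KAZE and not necessarily the one used in this study, a simple explicit update instead of the solver used by KAZE, and hypothetical values for the contrast parameter `k`, the step size and the signal.

```python
def conductivity(gradient, k):
    """Assumed conductivity: close to 1 in flat areas, close to 0 at strong edges."""
    return 1.0 / (1.0 + (gradient / k) ** 2)


def diffuse_1d(signal, k, step=0.2, iterations=50):
    """Explicit nonlinear diffusion of a 1D signal; the end samples stay fixed."""
    values = list(signal)
    for _ in range(iterations):
        updated = values[:]
        for i in range(1, len(values) - 1):
            right = values[i + 1] - values[i]
            left = values[i - 1] - values[i]
            flux = conductivity(right, k) * right + conductivity(left, k) * left
            updated[i] = values[i] + step * flux
        values = updated
    return values


# Noisy dark region followed by a noisy bright region (a step edge of about 90).
noisy_edge = [10, 12, 9, 11, 10, 100, 101, 99, 102, 100]
smoothed = diffuse_1d(noisy_edge, k=5.0)
edge_height = smoothed[5] - smoothed[4]
dark_spread = max(smoothed[:5]) - min(smoothed[:5])
print(round(edge_height, 1), round(dark_spread, 2))
```

The small fluctuations inside each region are smoothed out, while the large gradient at the step yields a conductivity close to zero, so the edge is largely retained.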
An important stage of the considered methods is the description of a characteristic point by means of a vector containing information about its surroundings. The SIFT method uses a luminance gradient and in the SURF method the image response to horizontally and vertically oriented Haar wavelet is applied. In general, around the characteristic point in the area with a defined radius dependent on the
where
The SIFT method creates thereunder a gradient histogram that sums up the determined values in four cells. In analogous cells, according to the SURF method, the responses to Haar wavelets distributed along the radii in the neighbourhood of the point with an interval of
where
In the KAZE method, the procedure is similar to that of the SURF method, except that first-order derivatives of the image function are used. The point description operation is performed for all levels of the adopted scale space, thereby creating a pyramid of vectors assigned to subsequent levels containing an increasingly generalised image.
The Maximally Stable Extremal Regions (MSER) method introduced in [27] has a different approach to the detection and description of local features. In this method, regions (shapes), referred to as maximally stable, are selected as the characteristics of the image. The image in this method is treated as a function
where
Regions (areas, shapes) with a specific (typically average) luminance level can be determined in the image. Region
In the feature description stage, a vector using image moments is determined for each region. Based on the moments
The orientation
The use of moments and the centre of gravity is also a feature of the ORB method, which uses a machine learning approach for corner detection. After their detection, based on the image moments, the centre of gravity
where
On the basis of the corner’s position and centre of gravity, the orientation of the feature is determined as shown in the equation:
The feature description step uses the assigned orientation to complete the binary BRIEF descriptor [22], with the condition of verifying the belonging of the point
The matrix
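The orientation assignment based on image moments can be sketched as follows. The example assumes the intensity-centroid formulation commonly associated with ORB, in which the first-order moments of a square patch centred on the corner give the direction from the corner to the centre of gravity. The function name and the test patches are illustrative, and image coordinates are assumed to have the y axis pointing down.

```python
import math


def patch_orientation(patch):
    """Angle in radians from the patch centre to its intensity centroid."""
    size = len(patch)
    centre = (size - 1) / 2.0
    m10 = 0.0  # first-order moment along x (columns)
    m01 = 0.0  # first-order moment along y (rows)
    for row, line in enumerate(patch):
        for col, value in enumerate(line):
            m10 += (col - centre) * value
            m01 += (row - centre) * value
    return math.atan2(m01, m10)


bright_right = [[0, 0, 0],
                [0, 0, 9],
                [0, 0, 0]]
bright_bottom = [[0, 0, 0],
                 [0, 0, 0],
                 [0, 9, 0]]
angle_right = patch_orientation(bright_right)
angle_bottom = patch_orientation(bright_bottom)
print(angle_right, angle_bottom)
```

Rotating the descriptor sampling pattern by this angle is what makes the binary descriptor comparable between images taken at different headings.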
The common element for the described methods is the stage of comparing the distinguished features detected on the reference and registered images. It is of fundamental importance in the field of absolute terrain position designation, because the location of the matched features is the source of determining the matrix of mutual image transformation. In this comparison, vectors describing the features in a given method, e.g. feature metric and its orientation, are taken into account.
The determination of the similarity between the feature description vectors
The third frequently used norm for binary vectors is the Hamming distance given as:
Another approach for matching two features is the nearest neighbour algorithm based on the ratio of the distances
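Both comparison criteria can be sketched for binary descriptors. The example assumes descriptors packed into Python integers and a ratio threshold of 0.8; the threshold, the function names and the descriptor values are illustrative assumptions rather than the settings used in this study. A match is accepted only if the nearest candidate is clearly closer than the second nearest one.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two binary descriptors differ."""
    return bin(a ^ b).count("1")


def ratio_test_match(query, candidates, max_ratio=0.8):
    """Return the index of the nearest candidate, or None if it is ambiguous.

    Requires at least two candidates; max_ratio is an assumed threshold.
    """
    distances = sorted((hamming_distance(query, c), i)
                       for i, c in enumerate(candidates))
    (best, best_index), (second, _) = distances[0], distances[1]
    if second == 0:
        return None  # two identical candidates cannot be told apart
    return best_index if best / second < max_ratio else None


query = 0b10110010
distinct = [0b10110011, 0b01001101, 0b00001111]   # distances 1, 8 and 6
ambiguous = [0b10110011, 0b10110000]              # distances 1 and 1
accepted = ratio_test_match(query, distinct)
rejected = ratio_test_match(query, ambiguous)
print(accepted, rejected)
```

Rejecting ambiguous pairs reduces the number of mismatches passed to the subsequent verification step, at the cost of discarding some correct matches in repetitive terrain.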
The final step in all the discussed methods is the statistical verification of the set of matched local features. The initial comparison of the feature description vectors may indicate mismatches resulting from the acquisition conditions described above. Therefore, after the pre-processing step, additional criteria are applied to distinguish matches from mismatches, e.g. based on the Random Sample Consensus (RANSAC) method [29]. This method allows for the estimation of a mathematical model describing the location of local features in the image, provided that most of the matched points fit this model (within the assumed maximum error). The points that do not fit the estimated model are then discarded in the step of determining the image transformation matrix.
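The verification step can be sketched with a deliberately simplified model. The example assumes that the two images differ only by a translation, so a single match is enough to hypothesise a model; a practical system would more likely estimate an affine or projective transformation. The tolerance, the number of iterations, the fixed random seed and the coordinates are illustrative assumptions.

```python
import random


def ransac_translation(matches, tolerance=2.0, iterations=100, seed=0):
    """Estimate a translation (dx, dy) from matches and return it with its inliers.

    matches is a list of ((x_ref, y_ref), (x_img, y_img)) point pairs.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        (xr, yr), (xi, yi) = rng.choice(matches)  # minimal sample: one match
        dx, dy = xi - xr, yi - yr
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - dx) <= tolerance
                   and abs(m[1][1] - m[0][1] - dy) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the model as the mean shift of the consensus set.
    n = len(best_inliers)
    dx = sum(m[1][0] - m[0][0] for m in best_inliers) / n
    dy = sum(m[1][1] - m[0][1] for m in best_inliers) / n
    return (dx, dy), best_inliers


# Six matches consistent with a shift of about (10, -5) and three mismatches.
matches = [((0, 0), (10, -5)), ((20, 5), (31, 0)), ((40, 10), (50, 6)),
           ((5, 30), (14.5, 25)), ((60, 60), (70, 54.5)), ((15, 45), (25.5, 40)),
           ((30, 30), (-10, 60)), ((50, 20), (75, 80)), ((10, 70), (7, 0))]
(dx, dy), inliers = ransac_translation(matches)
print(round(dx, 2), round(dy, 2), len(inliers))
```

The sketch also shows why the method depends on the ratio of matches to mismatches: if mismatches dominated, the largest consensus set could belong to a wrong model.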
3. Problem formulation
The following set is considered:
Elements of
The mean square error is determined by the formula:
in which
where
in which
For such defined initial conditions, the best match of the subsequent elements of the set
The term best match is understood as defining certain vectors
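The mean square error and a quantity derived from it can be computed as in the sketch below. It assumes greyscale images of equal size stored as lists of rows and, for the peak signal-to-noise ratio (PSNR), an 8-bit peak value of 255; PSNR is included as an assumption, since the second row of Table 1 is numerically consistent with this formula applied to the first row. The function names and pixel values are illustrative.

```python
import math


def mean_square_error(image_a, image_b):
    """Mean of squared pixel differences between two equally sized images."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr_from_mse(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


image_a = [[10, 20], [30, 40]]
image_b = [[12, 18], [30, 44]]
mse = mean_square_error(image_a, image_b)
psnr = psnr_from_mse(mse)
print(mse, round(psnr, 2))
```

Both measures compare pixels at identical positions, so they are sensitive to any shift, rotation or lighting change between the reference and the recorded image.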
4. Performance analysis
In order to verify the sensitivity of the selected methods to limitations in the measurement system and environmental changes, a number of studies taking into account the actual conditions of obtaining information were conducted. Due to their difficult nature, they were performed with the use of computer simulation methods. The research was carried out in three stages. In the first stage, a detailed analysis of the test sets, using the values of the similarity indexes of the elements defined in the article, was completed. On the basis of the performed tests, special cases were selected and subjected to detailed analysis. In the further part of the study, the methods and verification of the correctness of image data matching in the scope of mutual matching of the sets presented to the analysis were compared. Finally, the influence of changes in the contrast of the acquired image on the number of features detected and the subsequent matching results was examined.
4.1 Analysis of test set elements
For the initial numerical tests, the test set

Figure 3.
Test image set I: (a) reference image I0, (b)-(d) test images I1,I2,I3 (source: Google Earth).
The object is located in a natural environment characteristic for tundra and therefore distinguished by a rocky ground with a very low plant cover, dominated by mosses and lichens. Image
MSE | 0 | 2.23E03 | 1.46E04 | 1.19E04 | 0 | 2.99E03 |
PSNR [dB] | | 14.65 | 6.47 | 7.36 | | 13.37 |
 | 1 | 0.4373 | 0.0742 | 0.0755 | 1 | 0.3378 |
Table 1.
Similarity index measures for selected pairs of the set
Based on the obtained results, it can be shown that the elements constituting the test set
It should be noted that
4.2 Comparison of the selected methods of image feature matching
The test set

Figure 4.
Image pair I0I1 matching result for: (a) SURF, (b) KAZE, and (c) MSER method.

Figure 5.
Image pair I0I1 matching result for ORB method.

Figure 6.
Image pair I0I2 matching result for: (a) SURF, (b) KAZE, and (c) MSER method.

Figure 7.
Image pair I0I3 matching result for: (a) SURF, (b) KAZE, and (c) MSER method.

Figure 8.
Image pair I2I3 matching result for: (a) SURF, (b) KAZE, and (c) MSER method.
 | SURF | KAZE | MSER | ORB |
---|---|---|---|---|
Correct matches | 9 | 489 | 2 | 11 |
Mismatches | 0 | 0 | 1 | 0 |
Percentage of correct matches | 100% | 100% | 67% | 100% |
Table 2.
Image pair
 | SURF | KAZE | MSER | ORB |
---|---|---|---|---|
Correct matches | 0 | 6 | 0 | 0 |
Mismatches | 2 | 0 | 1 | 0 |
Percentage of correct matches | 0% | 100% | 0% | 0% |
Table 3.
Image pair
 | SURF | KAZE | MSER | ORB |
---|---|---|---|---|
Correct matches | 0 | 2 | 1 | 0 |
Mismatches | 9 | 7 | 2 | 0 |
Percentage of correct matches | 0% | 29% | 33% | 0% |
Table 4.
Image pair
 | SURF | KAZE | MSER | ORB |
---|---|---|---|---|
Correct matches | 0 | 3 | 2 | 0 |
Mismatches | 8 | 1 | 1 | 0 |
Percentage of correct matches | 0% | 75% | 67% | 0% |
Table 5.
Image pair
Analysis of the matching results has shown that the selected algorithms are not effective when the matched images, despite the same content, differ significantly, cf. pair
In general, the KAZE method proved to be the most effective, while the ORB method showed the least processing efficiency of the set
4.3 Effect of contrast change on the number of the detected features
The research focused on the analysis of the effect of contrast change on the number of features detected in the image. For this purpose, the contrast of the image

Figure 9.
Effect of contrast change on the number of the detected image features.
On the basis of the results obtained, it can be concluded that the number of features detected by the examined methods decreases as the image contrast is reduced, which results in a smaller statistical sample being processed in each subsequent step of these methods. This may be the cause of the lower matching efficiency of the considered methods for images that differ significantly from each other.
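The mechanism can be illustrated with a stand-in detector. The sketch below does not use SURF, KAZE, MSER or ORB; it assumes that a feature is reported wherever the local gradient magnitude reaches a fixed threshold, which mimics the response thresholds used by such detectors. The image, the threshold of 28 and the contrast factors are hypothetical.

```python
def scale_contrast(image, factor):
    """Scale pixel deviations from the image mean; factor < 1 reduces contrast."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[mean + factor * (p - mean) for p in row] for row in image]


def count_strong_gradients(image, threshold):
    """Count pixels whose forward-difference gradient magnitude reaches the threshold."""
    count = 0
    for r in range(len(image) - 1):
        for c in range(len(image[0]) - 1):
            gx = image[r][c + 1] - image[r][c]
            gy = image[r + 1][c] - image[r][c]
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                count += 1
    return count


image = [[10, 10, 60, 60],
         [10, 10, 60, 200],
         [30, 30, 90, 200],
         [30, 30, 90, 90]]
factors = [1.0, 0.5, 0.25, 0.1]
counts = [count_strong_gradients(scale_contrast(image, f), 28.0) for f in factors]
print(dict(zip(factors, counts)))
```

Since reducing the contrast scales every gradient by the same factor while the threshold stays fixed, fewer locations pass the test, which is consistent with the trend described above.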
5. Conclusions and final remarks
The results of the algorithms presented in the literature usually relate to images that are fragments of source images, i.e. images that have qualitatively identical counterparts in the reference images. In the analysed cases, the differences between the reference image stored in the memory of the navigation system and the image recorded by the sensor are significant. As a consequence, images representing the same field object often cannot be matched effectively. This is due to real environmental conditions and restrictions on obtaining information. The measurement system parameters and the quality of the images taken have a direct impact on the number of detected features. For example, the lack of complete information about the accuracy of the field object’s image mapping makes it impossible to properly select the size of the filters. This results in the detection of objects that are irrelevant to the issue considered, such as bushes, leaves or grass blades, which are highly variable over time. Consequently, it has a significant impact on the performance of the individual algorithms.
The study concluded that the use of statistical algorithms such as RANSAC improves the effectiveness of the selected methods. However, the results obtained strongly depend on the size of the set taken into consideration and on the match/mismatch ratio. Therefore, in terrain image processing, it is necessary to analyse the informational characteristics of the examined objects and the conditions of acquisition. This allows for extracting characteristic points whose description does not change significantly with atmospheric conditions.
The results of the simulation tests lead to the general conclusion that the methods considered are often insufficient to determine the coordinates of a UAV/CM flying under unfavourable environmental conditions. Among the implementations examined in this work, the greatest development potential is shown by methods based on anisotropic diffusion, which proved the most effective in the simulation studies. Therefore, it seems justified to focus the research effort on the further development of new image processing methods within the group of anisotropic diffusion methods. In particular, it is proposed to use the informative character of terrain images to determine the input parameters of the designed processing methods, to apply pre-processing methods aimed at decimation of the input data, their segmentation and determination of the main components, and to extend the designed methods with additional criteria increasing the effectiveness of image feature detection and matching. The newly developed methods should aim to improve the efficiency of feature detection in terrain images and to select the processing parameters with regard to environmental conditions as well as the limitations and conditions in the measurement system.
Acknowledgments
This work is financed by the National Centre of Research and Development of the Republic of Poland as part of the scientific research program for the defence and security named Future Technologies for Defence – Young Scientist Contest (Grant No. DOB-2P/03/06/2018).
Conflict of interest
The authors declare no conflict of interest.