Similarity index measures for selected pairs of the set
Abstract
The susceptibility of global navigation satellite systems to disruption precludes their use in armed conflict with an opponent possessing comparable technical capabilities. Military unmanned aerial vehicles (UAVs) therefore need to obtain the navigational data used to determine location and correct flight routes from other types of navigation systems. To correct the position of a UAV relative to a given trajectory, systems that associate reference terrain maps with image information can be used. Over the last dozen or so years, new, effective algorithms for matching digital images have been developed. Their reported effectiveness is based on images that are fragments taken from source files, so qualitatively identical counterparts exist in the reference images. However, the differences between the reference image stored in the memory of the navigation system and the image recorded by the sensor can be significant. In this paper, modern methods of image registration and matching for UAV position refinement are compared, and the adaptation of available methods to the operating conditions of the UAV navigation system is discussed.
Keywords
- digital image processing
- image matching
- terrain-aided navigation
- unmanned aerial vehicle
- cruise missile
1. Introduction
Global navigation satellite systems are widely used in both civil and military technology. Their advantage is very high accuracy in determining coordinates; however, the ease with which they can be interfered with precludes their use in armed conflict with an opponent equipped with comparable technical capabilities. In the case of military autonomous unmanned aerial vehicles (UAVs), in particular cruise missiles (CM), the aim is therefore to obtain the navigation data needed to determine the position and correct the flight path by means of other types of navigation and self-guidance systems.
Such systems are usually based on inertial navigation systems (INS), which use accelerometers, angular rate gyroscopes and magnetometers to provide relatively accurate tracking of an object’s position and orientation in space. However, they are subject to drift and systematic sensor errors, so the divergence between the actual and the measured position of the object increases continuously with time. This results in a significant navigational error.
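The growth of this error can be illustrated with a minimal sketch. It assumes a single accelerometer axis with a constant, uncompensated bias; the bias value of 0.01 m/s² and the function names are purely illustrative and are not taken from the paper. Integrating the bias twice gives a position error of ½·b·t², which the second function confirms numerically.

```python
def position_error(bias_mps2: float, t_s: float) -> float:
    """Closed-form position error [m] after t_s seconds of constant bias."""
    return 0.5 * bias_mps2 * t_s ** 2


def integrate_error(bias_mps2: float, t_s: float, dt: float = 0.01) -> float:
    """The same error obtained by integrating the bias twice (Euler steps)."""
    velocity_error = 0.0
    position = 0.0
    for _ in range(int(round(t_s / dt))):
        velocity_error += bias_mps2 * dt   # first integration: velocity error
        position += velocity_error * dt    # second integration: position error
    return position


if __name__ == "__main__":
    # Hypothetical bias of 0.01 m/s^2 (assumption, for illustration only).
    for t in (60, 600, 3600):
        print(f"t = {t:5d} s -> error = {position_error(0.01, t):10.1f} m")
```

Under these assumptions the error grows quadratically with flight time, which is why an external position fix is needed at intervals.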
Therefore, two types of systems designed to correct the position of an object in relation to a given trajectory are normally used in UAV/CM navigation and self-guidance systems. The first group contains systems that determine the position on the basis of data obtained from radio altimeters and related to reference height maps. Such systems include, for example: TERCOM (terrain contour matching), used in Tomahawk cruise missiles; SITAN (Sandia inertial terrain-aided navigation), which uses terrain gradients as input for a modified extended Kalman filter (EKF) estimating the position of the object; and VATAN (Viterbi-algorithm terrain-aided navigation), a version of the system based on the Viterbi algorithm and characterised, in relation to the SITAN system, by a lower mean square error of position estimation [1, 2, 3, 4, 5]. The main disadvantage of these solutions is the active operation of the measuring devices, which reveals the position of the object in space and eliminates the advantages associated with the use of a passive (and therefore undetectable) inertial navigation system. The second group consists of systems associating reference terrain maps with image information obtained by means of visible, residual or infrared light cameras [6, 7]. Such systems include the American DSMAC (digital scene matching area correlator), also used in Tomahawk missiles [8, 9], and its Russian counterpart used in Kalibr (aka Club) missiles. Their advantage is both the accuracy of positioning and the secrecy (understood as passivity) of operation.
Given the dynamic development of UAVs/CMs equipped with navigation systems operating independently of satellite systems, and the number of problems associated with implementing them, the sensitivity of selected methods to environmental conditions and to constraints in the measurement systems, which often negatively affect the results obtained, has been assessed. The essence of the work is to consider issues related to the processing of image information obtained from optical sensors carried by a UAV/CM and its association with terrain reference images. In particular, the correctness of image data matching and the limits on assessing image similarity are considered. The article compares modern image matching methods assuming real conditions for obtaining information. The main goal set by the authors is to verify selected algorithms, identify the key aspects determining the effectiveness of their operation and indicate potential directions of their development.
2. State of the art
The operation of classic object identification algorithms, indicating the similarities between the recorded and reference images (the so-called
A particularly significant obstacle is the fact that the sensory elements of navigation systems installed on UAV/CM record image data in various environmental and lighting conditions [10]. Frequently, reference data of high informative value, due to various conditions, constitute a pattern of little use or even lead to incorrect results. This is the case, for example, when the reconnaissance is conducted in different weather conditions than those in which the UAV/CM mission takes place (Figure 1). Therefore, image feature matching becomes a complex issue. The conditions related to the image recording parameters, e.g. variable view angle, maintaining scale or using various types of sensors, turn out to be equally important.

Figure 1.
Images of the same fragment of the Earth’s surface taken under different weather and lighting conditions.
Image matching methods developed rapidly as digital imaging became widespread. Initially, the classical Fourier and correlation methods were used. However, these methods did not allow for successful multi-modal, multi-temporal and multi-perspective matching of different images. A taxonomy of the classical methods used in the image matching process was presented in the early 1990s [11]. It defined the image feature space, considered as the source of information necessary for image matching, and identified local variations in the image as the greatest difficulty in the matching process. In the 21st century, methods based on image features were developed further [12]. Most of them comprise four consecutive basic steps: feature detection, feature matching, estimation of the model of mutual image transformation and final transformation of the analysed image. These methods became an alternative to the correlation and Fourier methods. For over a dozen years, new, effective algorithms for processing and matching digital images have been developed, using statistical methods based on matching local features in images [11, 12, 13, 14], cf. Figure 2. Their authors point to the greater invariance of the proposed algorithms to perspective distortions, rotation, translation, scaling and lighting changes. Given their high reliability under static conditions and their low sensitivity to changes in the position of the optical system, including translation, orientation and scale, it is justified to verify their usefulness and effectiveness. The paper focuses on modern image matching algorithms that can potentially be used in topographic navigation. It should be stressed that in this context the problem is entirely different.
This is due to the fact that although the matched images represent the same area of the terrain, the manner and time of the recording differ significantly from each other. This is not a typical application of these algorithms, hence a limited effectiveness of their operation can be expected.

Figure 2.
Classification of the selected methods of image feature matching.
The common feature of all methods is the use of the so-called
where
A more numerically efficient version of the SIFT algorithm, called Speeded-Up Robust Features (SURF) is based on the so-called
Simply put, the detection of the characteristic point is based on the use of the determinant of a Hessian matrix
where
The determinant of the Hessian matrix after approximation using box filters and the Frobenius norm is given as
After detecting the local extremes of
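As background to the SURF detector, the sketch below shows the two ingredients described in [17] and [18]: an integral image, which lets any box-filter sum be read from four table look-ups, and the approximated Hessian determinant built from the box-filter responses. The weight 0.9 is the value given in [17]; the function names and the list-of-lists image representation are assumptions made for illustration.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def hessian_response(dxx, dyy, dxy, w=0.9):
    """Approximated Hessian determinant from box-filter responses, after [17]."""
    return dxx * dyy - (w * dxy) ** 2


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    table = integral_image(image)
    print(box_sum(table, 1, 1, 3, 3))  # sum of the lower-right 2x2 block
```

Because the cost of `box_sum` does not depend on the box size, larger filters can be applied to the original image instead of repeatedly downsampling it.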
In 2011, an alternative method to SIFT and SURF, Oriented FAST and Rotated BRIEF (ORB), was proposed [19]. The method was based on a modified Features from Accelerated Segment Test (FAST) detector [20, 21], enabling corner and edge detection, and a modified Binary Robust Independent Elementary Features (BRIEF) descriptor [22]. This approach involves changing the scale of the image by blurring it with a Gaussian filter of increasing width. Although this reduces noise and enhances the uniformity of areas interpreted by human beings as unique (e.g. surface of a lake, wall of a building, shape of a vehicle, etc.), it also blurs their edges. This often makes it impossible to indicate the boundaries between areas and to define characteristic points in their neighbourhood.
The solution to this problem was proposed in the KAZE method (Japanese for “wind”) [23]. Unlike the SIFT and SURF methods, which use the Gaussian function causing isotropic diffusion of luminance to generalise the image, in the KAZE method the generalisation is based on nonlinear diffusion in consecutive octaves of the scale [24]. The anisotropic image blurring in this method depends on the local luminance distribution. Nonlinear diffusion can be presented in the following equation:
The blur intensity can be adapted by the introduced conductivity function
where
This function allows for blurring the image while maintaining the edges of structures. As a result, more features can be detected at different image scales. However, it involves the use of a gradient, which in the case of intense image disturbance, e.g. in the form of a shadow, may cause an unfavourable (due to the subsequent detection of features) distribution of diffusion in the image.
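The edge-preserving effect can be illustrated with a one-dimensional sketch. It assumes the Perona-Malik conductivity g = 1 / (1 + (|∇L| / k)²) from [24]; which conductivity function the authors used is not recoverable from the text, and the signal, contrast parameter `k`, step size and iteration count are all illustrative assumptions.

```python
def conductivity(grad, k):
    """Perona-Malik conductivity: close to 1 in flat areas, near 0 at edges."""
    return 1.0 / (1.0 + (grad / k) ** 2)


def diffuse(signal, k, step=0.2, iterations=20, linear=False):
    """Explicit 1-D diffusion with fixed end points.

    linear=True sets the conductivity to 1 everywhere, which corresponds to
    ordinary (Gaussian-like) blurring and is included for comparison.
    """
    s = [float(v) for v in signal]
    for _ in range(iterations):
        new = s[:]
        for i in range(1, len(s) - 1):
            right = s[i + 1] - s[i]
            left = s[i - 1] - s[i]
            g_r = 1.0 if linear else conductivity(abs(right), k)
            g_l = 1.0 if linear else conductivity(abs(left), k)
            new[i] = s[i] + step * (g_r * right + g_l * left)
        s = new
    return s


if __name__ == "__main__":
    noisy_step = [10, 12, 9, 11, 10, 100, 101, 99, 100, 102]
    nonlinear = diffuse(noisy_step, k=5.0)
    gaussian_like = diffuse(noisy_step, k=5.0, linear=True)
    print("edge height, nonlinear:", round(nonlinear[5] - nonlinear[4], 1))
    print("edge height, linear:   ", round(gaussian_like[5] - gaussian_like[4], 1))
```

In this sketch the small fluctuations on either side of the step are smoothed in both cases, but only the nonlinear variant keeps the step itself sharp.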
An important stage of the considered methods is the description of a characteristic point by means of a vector containing information about its surroundings. The SIFT method uses a luminance gradient and in the SURF method the image response to horizontally and vertically oriented Haar wavelet is applied. In general, around the characteristic point in the area with a defined radius dependent on the
where
The SIFT method creates thereunder a gradient histogram that sums up the determined values in four cells. In analogous cells, according to the SURF method, the responses to Haar wavelets distributed along the radii in the neighbourhood of the point with an interval of
where
In the KAZE method, the procedure is similar to that of the SURF method, with the difference that the first-order derivatives of the image function are used. The point description operation is performed for all levels in the adopted scale space, thereby creating a pyramid of vectors assigned to subsequent levels containing an increasingly generalised image.
The Maximally Stable Extremal Regions (MSER) method introduced in [27] has a different approach to the detection and description of local features. In this method, regions (shapes), referred to as
where
Regions (areas, shapes) with a specific (typically average) luminance level can be determined in the image. Region
In the feature description stage, a vector using image moments is determined for each region. Based on the moments
The orientation
The use of moments and the centre of gravity is also a feature of the ORB method, which uses a machine learning approach for corner detection. After their detection, based on the image moments, the centre of gravity
where
On the basis of the corner’s position and centre of gravity, the orientation of the feature is determined as shown in the equation:
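The intensity-centroid orientation described in [19] can be sketched as follows. The sketch assumes a small square patch centred on the detected corner, with coordinates measured from the patch centre; the function names and the list-of-lists patch representation are illustrative. The orientation is the angle of the vector from the corner to the centre of gravity, computed as atan2(m01, m10).

```python
import math


def patch_moment(patch, p, q):
    """Raw moment m_pq with x and y measured from the patch centre."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return sum(((x - cx) ** p) * ((y - cy) ** q) * patch[y][x]
               for y in range(h) for x in range(w))


def orientation(patch):
    """Return the centre of gravity and the orientation angle in radians."""
    m00 = patch_moment(patch, 0, 0)
    if m00 == 0:
        raise ValueError("patch has zero total intensity")
    m10 = patch_moment(patch, 1, 0)
    m01 = patch_moment(patch, 0, 1)
    centroid = (m10 / m00, m01 / m00)
    return centroid, math.atan2(m01, m10)


if __name__ == "__main__":
    bright_right = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
    print(orientation(bright_right))
```

Rotating the sampling pattern of the descriptor by this angle is what makes the binary description insensitive to in-plane rotation.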
The feature description step uses the assigned orientation to complete the binary BRIEF descriptor [22], with the condition of verifying the belonging of the point
The matrix
The common element for the described methods is the stage of comparing the distinguished features detected on the reference and registered images. It is of fundamental importance in the field of absolute terrain position designation, because the location of the matched features is the source of determining the matrix of mutual image transformation. In this comparison, vectors describing the features in a given method, e.g. feature metric and its orientation, are taken into account.
The determination of the similarity between the feature description vectors
The third frequently used norm for binary vectors is the Hamming distance given as:
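For binary descriptors stored as packed bytes, the Hamming distance is simply the number of bit positions in which the two vectors differ. The following minimal sketch assumes descriptors of equal length represented as `bytes` objects; the representation is an assumption, not something specified in the text.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equally long binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 in every position where the bits differ.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    d1 = bytes([0b10110000, 0b00001111])
    d2 = bytes([0b10011000, 0b00001111])
    print(hamming_distance(d1, d2))
```

Because it needs only XOR and bit counting, this distance is much cheaper to evaluate than norms on floating-point descriptors.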
Another approach for matching two features is the nearest neighbour algorithm based on the ratio of the distances
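A distance-ratio test of this kind can be sketched as below. The sketch assumes Euclidean distances between real-valued descriptors and an acceptance threshold of 0.8, the value suggested in [16]; the threshold used by the authors is not stated in the text, and the function names are illustrative.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ratio_test_matches(query, reference, max_ratio=0.8):
    """Keep a match only if the nearest neighbour is clearly closer than
    the second nearest one (d1 / d2 < max_ratio)."""
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((euclidean(q, r), ri) for ri, r in enumerate(reference))
        if len(dists) < 2:
            continue
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < max_ratio * d2:
            matches.append((qi, best))
    return matches


if __name__ == "__main__":
    query = [(0.0, 0.0), (5.0, 5.0)]
    reference = [(0.1, 0.0), (10.0, 10.0), (5.0, 4.0), (5.0, 6.0)]
    print(ratio_test_matches(query, reference))
```

In the usage example the second query descriptor has two equally close candidates, so it is rejected as ambiguous rather than matched arbitrarily.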
The final step in all the discussed methods is the statistical verification of a set of matched local features. It happens that, as a result of the initial comparison of the vectors which describe the features, mismatches resulting from the acquisition conditions described above are indicated. Therefore, after the pre-processing step, additional criteria are applied to distinguish matches from mismatches, e.g. based on the Random Sample Consensus (RANSAC) method [29]. This method allows for the estimation of a mathematical model describing the location of local features in the image provided that most of the matched points fit into this model (with the assumed maximum error). Then those points that do not fit into the estimated model are discarded in the step of determining the image transformation matrix.
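The verification step can be illustrated with a deliberately simplified RANSAC sketch. It assumes that the two images differ only by a translation, so a single matched pair is enough to hypothesise a model; a real image transformation matrix would need more points per sample. The error tolerance, iteration count and function name are illustrative assumptions.

```python
import random


def ransac_translation(pairs, max_error=2.0, iterations=100, seed=0):
    """Estimate a translation (dx, dy) from putative matches.

    pairs: list of ((x, y), (x2, y2)) point correspondences.
    Returns the refined translation and the indices of the inliers.
    """
    if not pairs:
        raise ValueError("at least one correspondence is required")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        (x, y), (x2, y2) = rng.choice(pairs)      # minimal sample: one pair
        dx, dy = x2 - x, y2 - y                   # hypothesised model
        inliers = [i for i, ((a, b), (a2, b2)) in enumerate(pairs)
                   if (a2 - a - dx) ** 2 + (b2 - b - dy) ** 2 <= max_error ** 2]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the model on the consensus set only; mismatches are discarded.
    n = len(best_inliers)
    dx = sum(pairs[i][1][0] - pairs[i][0][0] for i in best_inliers) / n
    dy = sum(pairs[i][1][1] - pairs[i][0][1] for i in best_inliers) / n
    return (dx, dy), best_inliers


if __name__ == "__main__":
    matches = [((0, 0), (10, -5)), ((1, 2), (11, -3)), ((3, 1), (13, -4)),
               ((5, 5), (15, 0)), ((2, 7), (12, 2)), ((8, 3), (18, -2)),
               ((4, 4), (40, 40)), ((6, 1), (-20, 9))]   # last two are mismatches
    print(ransac_translation(matches))
```

The sketch also shows why the result depends on the proportion of correct matches: if mismatches dominate the set, no hypothesis gathers a meaningful consensus.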
3. Problem formulation
The following set is considered:
Elements of
The mean square error is determined by the formula:
in which
where
in which
For such defined initial conditions, the best match of the subsequent elements of the set
The term
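Two of the similarity measures can be sketched in a few lines. The mean square error follows its standard definition; the peak signal-to-noise ratio is computed under the assumption of 8-bit images (peak value 255), which reproduces the values reported in Table 1 from the corresponding mean square errors. The function names and the list-of-lists image representation are illustrative.

```python
import math


def mse(img_a, img_b):
    """Mean square error between two equally sized greyscale images."""
    h, w = len(img_a), len(img_a[0])
    if len(img_b) != h or len(img_b[0]) != w:
        raise ValueError("images must have the same size")
    total = sum((img_a[y][x] - img_b[y][x]) ** 2
                for y in range(h) for x in range(w))
    return total / (h * w)


def psnr(mse_value, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    if mse_value == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse_value)


if __name__ == "__main__":
    print(round(psnr(2.23e3), 2))  # MSE value taken from Table 1
```

For identical images the mean square error is zero and the ratio is unbounded, which is why no finite value can be reported for such pairs.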
4. Performance analysis
In order to verify the sensitivity of the selected methods to limitations in the measurement system and to environmental changes, a number of studies taking into account the actual conditions of obtaining information were conducted. Because such conditions are difficult to reproduce, the studies were performed by computer simulation. The research was carried out in three stages. In the first stage, a detailed analysis of the test sets was completed using the values of the similarity indexes defined in the article. On the basis of these tests, special cases were selected and subjected to detailed analysis. In the second stage, the methods were compared and the correctness of image data matching was verified for the mutual matching of the analysed sets. Finally, the influence of changes in the contrast of the acquired image on the number of features detected and on the subsequent matching results was examined.
4.1 Analysis of test set elements
For the initial numerical tests, the test set

Figure 3.
Test image set
The object is located in a natural environment characteristic of tundra and is therefore distinguished by rocky ground with very sparse plant cover, dominated by mosses and lichens. Image
Measure | | | | | | |
---|---|---|---|---|---|---|
MSE | 0 | 2.23E03 | 1.46E04 | 1.19E04 | 0 | 2.99E03 |
PSNR [dB] | | 14.65 | 6.47 | 7.36 | | 13.37 |
SSIM | 1 | 0.4373 | 0.0742 | 0.0755 | 1 | 0.3378 |
Table 1.
Based on the obtained results, it can be shown that the elements constituting the test set
It should be noted that
4.2 Comparison of the selected methods of image feature matching
The test set

Figure 4.
Image pair

Figure 5.
Image pair

Figure 6.
Image pair

Figure 7.
Image pair

Figure 8.
Image pair
Result | SURF | KAZE | MSER | ORB |
---|---|---|---|---|
Correct matches | 9 | 489 | 2 | 11 |
Mismatches | 0 | 0 | 1 | 0 |
Percentage of correct matches | 100% | 100% | 67% | 100% |
Table 2.
Image pair
Result | SURF | KAZE | MSER | ORB |
---|---|---|---|---|
Correct matches | 0 | 6 | 0 | 0 |
Mismatches | 2 | 0 | 1 | 0 |
Percentage of correct matches | 0% | 100% | 0% | 0% |
Table 3.
Image pair
Result | SURF | KAZE | MSER | ORB |
---|---|---|---|---|
Correct matches | 0 | 2 | 1 | 0 |
Mismatches | 9 | 7 | 2 | 0 |
Percentage of correct matches | 0% | 29% | 33% | 0% |
Table 4.
Image pair
Result | SURF | KAZE | MSER | ORB |
---|---|---|---|---|
Correct matches | 0 | 3 | 2 | 0 |
Mismatches | 8 | 1 | 1 | 0 |
Percentage of correct matches | 0% | 75% | 67% | 0% |
Table 5.
Image pair
Analysis of the matching results has shown that the selected algorithms are not effective when the matched images, despite the same content, differ significantly, cf. pair
In general, the KAZE method proved to be the most effective, while the ORB method showed the least processing efficiency of the set
4.3 Effect of contrast change on the number of the detected features
The research focused on the analysis of the effect of contrast change on the number of features detected in the image. For this purpose, the contrast of the image

Figure 9.
Effect of contrast change on the number of the detected image features.
On the basis of the results obtained, it can be concluded that the number of features detected by the examined methods decreases as image contrast is reduced, which leaves a smaller statistical sample for each subsequent step of these methods. This may explain the lower matching efficiency of the methods considered for images that differ significantly from each other.
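The mechanism behind this trend can be illustrated with a toy experiment. It is only a sketch under strong assumptions: the image is synthetic random noise, contrast is reduced by scaling pixel values about the image mean, and the count of pixels whose gradient magnitude exceeds a fixed threshold stands in for the number of detected features. None of these choices, nor the threshold of 20, is taken from the paper.

```python
import random


def synthetic_image(size=32, seed=1):
    """Random 8-bit test image (a stand-in for a terrain image)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(size)] for _ in range(size)]


def scale_contrast(img, factor):
    """Scale pixel values about the image mean; factor < 1 lowers contrast."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[mean + factor * (p - mean) for p in row] for row in img]


def count_strong_gradients(img, threshold=20.0):
    """Count interior pixels whose central-difference gradient magnitude
    exceeds a fixed threshold (proxy for a detector response threshold)."""
    h, w = len(img), len(img[0])
    count = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                count += 1
    return count


if __name__ == "__main__":
    image = synthetic_image()
    for factor in (1.0, 0.5, 0.25, 0.1):
        n = count_strong_gradients(scale_contrast(image, factor))
        print(f"contrast factor {factor:4.2f}: {n} candidate points")
```

Since the gradient scales linearly with the contrast factor while the threshold stays fixed, fewer and fewer points pass the test as contrast drops.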
5. Conclusions and final remarks
The results of the algorithms presented in the literature usually relate to images that are fragments of source images, i.e. that have qualitatively identical counterparts in the reference images. In the analysed cases, the differences between the reference image stored in the memory of the navigation system and that recorded by the sensor are significant. As a consequence, images representing the same field object often cannot be matched effectively. This is due to real environmental conditions and restrictions on obtaining information. The measurement system parameters and the quality of the images taken have a direct impact on the number of detected features. For example, the lack of complete information about the accuracy with which the field object is mapped in the image makes it impossible to properly select the size of the filters. This results in the detection of objects that are completely irrelevant to the issue considered, such as bushes, leaves or grass blades, which are highly variable over time. Consequently, it has a significant impact on the performance of individual algorithms.
The study concluded that the use of statistical algorithms such as RANSAC improves the effectiveness of the selected methods. However, the results obtained strongly depend on the size of the set taken into consideration and on the match/mismatch ratio. Therefore, in terrain image processing, it is necessary to analyse the informational characteristics of the examined objects and the conditions of acquisition. This allows for extracting characteristic points whose description does not change significantly due to atmospheric conditions.
The results of the simulation tests lead to the general conclusion that the methods considered are often insufficient to determine the coordinates of a UAV/CM flying under unfavourable environmental conditions. Among the implementations examined in this work, the methods based on anisotropic diffusion have the greatest development potential, as they showed the highest effectiveness in the simulation studies. It therefore seems justified to focus the research effort on further development of new image processing methods within the group of anisotropic diffusion methods. In particular, it is proposed to take the informative character of terrain images into account when determining the input parameters of the designed processing methods, to apply pre-processing methods aimed at decimation of the input data, their segmentation and determination of the main components, and to extend the definition of the designed methods with additional criteria increasing the effectiveness of detection and image feature matching. The newly developed methods should be aimed at improving feature detection efficiency in terrain images and at selecting processing parameters that take into account environmental conditions as well as limitations and conditions in the measurement system.
Acknowledgments
This work is financed by the National Centre of Research and Development of the Republic of Poland as part of the scientific research program for the defence and security named
References
- 1. Boozer DD, Fellerhoff JR. Terrain-Aided Navigation Test Results in the AFTI/F-16 Aircraft. Navigation – Journal of The Institute of Navigation. 1988;35(2):161–175. DOI: 10.1002/j.2161-4296.1988.tb00949.x
- 2. Enns R. Terrain-aided navigation using the Viterbi algorithm. Journal of Guidance, Control, and Dynamics. 1995;18(6):1444–1449. DOI: 10.2514/3.21566
- 3. Han Y, Wang B, Deng Z, Fu M. An improved TERCOM-based algorithm for gravity-aided navigation. IEEE Sensors Journal. 2016;16(8):2537–2544. DOI: 10.1109/JSEN.2016.2518686
- 4. Hua Z, Xiulin H. A height-measuring algorithm applied to TERCOM radar altimeter. In: Proc. of the 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE); 20–22 August 2010; Chengdu (China). New York: IEEE, 2010. p. (V5–43)-(V5–46). DOI: 10.1109/ICACTE.2010.5579215
- 5. Wei E, Dong C, Liu J, Tang S. An improved TERCOM algorithm for gravity-aided inertial navigation system. Journal of Geomatics. 2017;42(6):29–31. DOI: 10.14188/j.2095-6045.2016190
- 6. Naimark L, Webb H, Wang T. Vision-Aided Navigation for Aerial Platforms. In: Proc. of the ION 2017 Pacific PNT Meeting; 1–4 May 2017; Honolulu (USA). Manassas: ION, 2017. p. 70–76. DOI: 10.33012/2017.15051
- 7. Yang C, Vadlamani A, Soloviev A, Veth M, Taylor C. Feature matching error analysis and modeling for consistent estimation in vision-aided navigation. Navigation. 2018;65:609–628. DOI: 10.1002/navi.265
- 8. Carr JR, Sobek JS. Digital Scene Matching Area Correlator (DSMAC). In: Proc. of 24th Annual Technical Symposium, SPIE 0238, Image Processing for Missile Guidance; 23 December 1980; San Diego (USA). Bellingham: SPIE, 1980. DOI: 10.1117/12.959130
- 9. Irani GB, Christ JP. Image processing for Tomahawk scene matching. Johns Hopkins APL Technical Digest. 1994;15(3):250–264.
- 10. Turek P, Bużantowicz W. Image matching constraints in unmanned aerial vehicle terrain-aided navigation. In: Proc. of the 2nd Aviation and Space Congress; 18–20 September 2019; Cedzyna (Poland). p. 206–208.
- 11. Brown LG. A survey of image registration techniques. ACM Computing Surveys. 1992;24(4):325–376. DOI: 10.1145/146370.146374
- 12. Zitová B, Flusser J. Image registration methods: A survey. Image and Vision Computing. 2003;21(11):977–1000. DOI: 10.1016/S0262-8856(03)00137-9
- 13. Bouchiha R, Besbes K. Automatic Remote-Sensing Image Registration Using SURF. International Journal of Computer Theory and Engineering. 2013;5(1):88–92. DOI: 10.7763/IJCTE.2013.V5.653
- 14. Kashif M, Deserno TM, Haak D, Jonas S. Feature description with SIFT, SURF, BRIEF, BRISK, or FREAK? A general question answered for bone age assessment. Computers in Biology and Medicine. 2016;68:67–75. DOI: 10.1016/j.compbiomed.2015.11.006
- 15. Lindeberg T. Scale-space theory: A basic tool for analysing structures at different scales. Journal of Applied Statistics. 1994;21(2):224–270. DOI: 10.1080/757582976
- 16. Lowe DG. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision. 2004;60:91–110. DOI: 10.1023/B:VISI.0000029664.99615.94
- 17. Bay H, Ess A, Tuytelaars T, Van Gool L. Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding. 2008;110(3):346–359. DOI: 10.1016/j.cviu.2007.09.014
- 18. Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. In: Proc. of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition; 8–14 December 2001; Kauai (USA). p. (I-511)-(I-518). DOI: 10.1109/CVPR.2001.990517
- 19. Rublee E, Rabaud V, Konolige K, Bradski G. ORB: An efficient alternative to SIFT or SURF. In: Proc. of the 13th International Conference on Computer Vision; 6–13 November 2011; Barcelona (Spain). p. 2564–2571. DOI: 10.1109/ICCV.2011.6126544
- 20. Rosten E, Drummond T. Machine Learning for High-Speed Corner Detection. In: Leonardis A, Bischof H, Pinz A, editors. Proc. of the 9th European Conference on Computer Vision 2006 – Lecture Notes in Computer Science, vol. 3951. Berlin-Heidelberg: Springer; 2006. p. 430–443. DOI: 10.1007/11744023_34
- 21. McIlroy P, Rosten E, Taylor S, Drummond T. Deterministic sample consensus with multiple match hypotheses. In: Proc. of the 21st British Machine Vision Conference; 31 August – 3 September 2010; Aberystwyth (UK). p. 111.1–111.11. DOI: 10.5244/C.24.111
- 22. Calonder M, Lepetit V, Strecha C, Fua P. BRIEF: Binary Robust Independent Elementary Features. In: Daniilidis K, Maragos P, Paragios N, editors. Proc. of the 11th European Conference on Computer Vision 2010 – Lecture Notes in Computer Science, vol. 6314. Berlin-Heidelberg: Springer; 2010. p. 778–792. DOI: 10.1007/978-3-642-15561-1_56
- 23. Alcantarilla PF, Bartoli A, Davison AJ. KAZE Features. In: Fitzgibbon A, Lazebnik S, Perona P, Sato Y, Schmid C, editors. Proc. of the 12th European Conference on Computer Vision 2012 – Lecture Notes in Computer Science, vol. 7577. Berlin-Heidelberg: Springer; 2012. p. 214–227. DOI: 10.1007/978-3-642-33783-3_16
- 24. Perona P, Malik J. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1990;12(7):629–639. DOI: 10.1109/34.56205
- 25. Weickert J. Efficient image segmentation using partial differential equations and morphology. Pattern Recognition. 2001;34:1813–1824. DOI: 10.1016/S0031-3203(00)00109-6
- 26. Charbonnier P, Blanc-Feraud L, Aubert G, Barlaud M. Deterministic edge-preserving regularization in computed imaging. IEEE Transactions on Image Processing. 1997;6(2):298–311. DOI: 10.1109/83.551699
- 27. Matas J, Chum O, Urban M, Pajdla T. Robust wide baseline stereo from maximally stable extremal regions. In: Proc. of the 13th British Machine Vision Conference; 2–5 September 2002; Cardiff (UK). p. 384–396.
- 28. Chaumette F. Image moments: a general and useful set of features for visual servoing. IEEE Transactions on Robotics. 2004;20(4):713–723. DOI: 10.1109/TRO.2004.829463
- 29. Fischler MA, Bolles RC. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM. 1981;24(6):381–395.
- 30. Wang Z, Bovik AC. Mean squared error: love it or leave it?. IEEE Signal Processing Magazine. 2009;26(1):98–117. DOI: 10.1109/MSP.2008.930649
- 31. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing. 2004;13(4):600–612.
- 32. Horé A, Ziou D. Image quality metrics: PSNR vs. SSIM. In: Proc. of the 20th IAPR International Conference on Pattern Recognition; 23–26 August 2010; Istanbul (Turkey). p. 2366–2369. DOI: 10.1109/ICPR.2010.579