Real-time visual object tracking is an open problem in computer vision, with multiple applications in the industry, such as autonomous vehicles, human-machine interaction, intelligent cinematography, automated surveillance, and autonomous social navigation. The challenge of tracking a target of interest is critical to all of these applications. Recently, tracking algorithms that use siamese neural networks trained offline on large-scale datasets of image pairs have achieved the best performance exceeding real-time speed on multiple benchmarks. Results show that siamese approaches can be applied to enhance the tracking capabilities by learning deeper features of the object’s appearance. SiamMask utilized the power of siamese networks and supervised learning approaches to solve the problem of arbitrary object tracking in real-time speed. However, its practical applications are limited due to failures encountered during testing. In order to improve the robustness of the tracker and make it applicable for the intended real-world application, two improvements have been incorporated, each addressing a different aspect of the tracking task. The first one is a data augmentation strategy to consider both motion-blur and low-resolution during training. It aims to increase the robustness of the tracker against a motion-blurred and low-resolution frames during inference. The second improvement is a target template update strategy that utilizes both the initial ground truth template and a supplementary updatable template, which considers the score of the predicted target for an efficient template update strategy by avoiding template updates during severe occlusion. All of the improvements were extensively evaluated and have achieved state-of-the-art performance in the VOT2018 and VOT2019 benchmarks. Our method (VPU-SiamM) has been submitted to the VOT-ST 2020 challenge, and it is ranked 16th out of 38 submitted tracking methods according to the Expected average overlap (EAO) metrics. VPU_SiamM Implementation can be found from the VOT2020 Trackers repository1.
Part of the book: Information Extraction and Object Tracking in Digital Video