Recently, Siamese neural networks have been widely used in visual object tracking to leverage the template matching mechanism. Siamese network architecture contains two parallel streams to estimate the similarity between two inputs and has the ability to learn their discriminative features. Various deep Siamese-based tracking frameworks have been proposed to estimate the similarity between the target and the search region. In this chapter, we categorize deep Siamese networks into three categories by the position of the merging layers as late merge, intermediate merge and early merge architectures. In the late merge architecture, inputs are processed as two separate streams and merged at the end of the network, while in the intermediate merge architecture, inputs are initially processed separately and merged intermediate well before the final layer. Whereas in the early merge architecture, inputs are combined at the start of the network and a unified data stream is processed by a single convolutional neural network. We evaluate the performance of deep Siamese trackers based on the merge architectures and their output such as similarity score, response map, and bounding box in various tracking challenges. This chapter will give an overview of the recent development in deep Siamese trackers and provide insights for the new developments in the tracking field.
Part of the book: Visual Object Tracking with Deep Neural Networks
Tracking with the siamese network has recently gained enormous popularity in visual object tracking by using the template-matching mechanism. However, using only the template-matching process is susceptible to robust target tracking because of its inability to learn better discrimination between target and background. Several attention-learning are introduced to the underlying siamese network to enhance the target feature representation, which helps to improve the discrimination ability of the tracking framework. The attention mechanism is beneficial for focusing on the particular target feature by utilizing relevant weight gain. This chapter presents an in-depth overview and analysis of attention learning-based siamese trackers. We also perform extensive experiments to compare state-of-the-art methods. Furthermore, we also summarize our study by highlighting the key findings to provide insights into future visual object tracking developments.
Part of the book: Information Extraction and Object Tracking in Digital Video