Object tracking remains a critical and challenging problem in computer vision. A growing number of researchers apply deep learning to obtain powerful features for robust tracking, and feature fusion has become an essential part of Siamese tracking architectures. However, existing feature fusion methods usually apply a fixed linear aggregation of feature maps, and such a combination may not suit a specific object. In this paper, we propose a twofold Siamese network, named SD-Siam, to extract object features effectively. The template branch and the search branch each consist of a deep-layer sub-network and a shallow-layer sub-network, whose outputs are fused to combine features from different network layers. Moreover, an attentional feature fusion scheme, built on a multi-scale channel attention module, is employed to better fuse features of inconsistent scales. In addition, we compute similarity measures separately for the deep-layer features and for the fused features of the template and search branches, and then add the two resulting similarity response maps to obtain the tracking result. Experiments show that the proposed SD-Siam outperforms representative trackers on several challenging benchmarks.
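The abstract does not give the module equations, so the following is only a minimal NumPy sketch of the described pipeline under stated assumptions, not the authors' implementation. We assume the common attentional-feature-fusion form z = M(x + y) * x + (1 - M(x + y)) * y, where M is a channel attention map combining a local (per-pixel) and a global (average-pooled) context path; each path is reduced here to a single channel-mixing matrix, whereas a full module would typically use a deeper bottleneck. We also assume the deep and shallow features have already been brought to a common shape (C, H, W), that the template and search branches share the fusion weights, and that the similarity measure is dense cross-correlation. All names (`ms_cam`, `aff`, `xcorr`, `sd_response`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ms_cam(x, w_local, w_global):
    """Simplified multi-scale channel attention for a feature map x of shape (C, H, W).

    Local context: per-pixel channel mixing (equivalent to a 1x1 convolution).
    Global context: the same kind of mixing applied to the global average-pooled vector.
    Returns attention weights in (0, 1) with the same shape as x.
    """
    local = np.einsum("dc,chw->dhw", w_local, x)        # (C, H, W)
    pooled = x.mean(axis=(1, 2))                        # (C,)
    glob = (w_global @ pooled)[:, None, None]           # (C, 1, 1), broadcast over H and W
    return sigmoid(local + glob)


def aff(x, y, w_local, w_global):
    """Attentional feature fusion (assumed form): M * x + (1 - M) * y, M = MS-CAM(x + y)."""
    m = ms_cam(x + y, w_local, w_global)
    return m * x + (1.0 - m) * y


def xcorr(template, search):
    """Dense cross-correlation: slide template (C, h, w) over search (C, H, W)."""
    _, h, w = template.shape
    _, H, W = search.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return out


def sd_response(z_deep, z_shallow, x_deep, x_shallow, w_local, w_global):
    """Hypothetical two-response head: deep-only response plus fused-feature response."""
    # Assumption: template (z) and search (x) branches share the fusion weights.
    z_fused = aff(z_deep, z_shallow, w_local, w_global)
    x_fused = aff(x_deep, x_shallow, w_local, w_global)
    r_deep = xcorr(z_deep, x_deep)
    r_fused = xcorr(z_fused, x_fused)
    return r_deep + r_fused  # the two similarity response maps are added


# Usage with random stand-in features (no trained backbone involved).
rng = np.random.default_rng(0)
C = 8
w_local = 0.1 * rng.standard_normal((C, C))
w_global = 0.1 * rng.standard_normal((C, C))
z_d, z_s = rng.standard_normal((2, C, 6, 6))     # template deep / shallow features
x_d, x_s = rng.standard_normal((2, C, 22, 22))   # search deep / shallow features
resp = sd_response(z_d, z_s, x_d, x_s, w_local, w_global)
peak = np.unravel_index(np.argmax(resp), resp.shape)  # predicted target location
print(resp.shape, peak)  # resp.shape is (17, 17)
```

The explicit double loop in `xcorr` is chosen for readability; a practical tracker would express the same cross-correlation as a batched convolution. Because the attention map M varies per channel and per position, the fusion weight given to deep versus shallow features adapts to the input, in contrast to the fixed linear aggregation the abstract criticizes.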
Qiongrui Liu, Xiyi Wang, Wenjie Wu, Xilin Zhu
Junyan Gao, Zhenguo Yang, Wenyin Liu
Lijun Zhou, Hongyun Li, Jianlin Zhang
Da Li, Yabing Kang, Xing Xiang, Wensheng Tao, Jiwei Hu