Object tracking is an important problem in computer vision, and fully-convolutional Siamese networks have recently received considerable research attention. Such a network uses two deep convolutional branches with shared parameters, trained offline, to solve a general similarity-learning problem. However, in fully-convolutional Siamese networks, not all channels of the feature map contain useful information for tracking. In this paper, we introduce a channel-wise attention mechanism that helps the network learn to select the most informative and discriminative channels in the feature map. In addition, we propose a novel multi-scale feature fusion method that uses a top-down structure with lateral connections to construct high-level semantic feature maps at multiple scales. Experiments show that the proposed method achieves remarkable improvements in both tracking success rate and accuracy.
Xue Shangjie, Wenjin Yao, Yang Wenjun
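
To make the idea of channel-wise attention in the abstract concrete, the following is a minimal sketch in plain Python. It assumes a squeeze-and-excitation-style gate (global average pooling, a small two-layer bottleneck, and a sigmoid), which the abstract does not confirm as the authors' design. All function names, weights, and the toy feature map are hypothetical and chosen only for illustration.

```python
import math


def global_avg_pool(feature_map):
    """Squeeze: reduce each channel (an H x W grid) to its mean activation."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * v for w, v in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


def relu(vec):
    return [max(0.0, v) for v in vec]


def sigmoid(vec):
    return [1.0 / (1.0 + math.exp(-v)) for v in vec]


def channel_attention(feature_map, w1, b1, w2, b2):
    """Rescale every channel by a learned weight in (0, 1).

    Assumed design (not taken from the paper): squeeze by average pooling,
    then a bottleneck dense -> ReLU -> dense -> sigmoid gate.
    """
    squeezed = global_avg_pool(feature_map)
    hidden = relu(dense(squeezed, w1, b1))
    weights = sigmoid(dense(hidden, w2, b2))
    reweighted = [
        [[w * x for x in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]
    return reweighted, weights


# Toy input: 2 channels of size 2 x 2; the second channel carries no signal.
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
# Hypothetical hand-picked parameters (a trained network would learn these).
w1, b1 = [[1.0, 1.0]], [0.0]           # 2 channels -> 1 hidden unit
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]   # 1 hidden unit -> 2 channel weights

out, weights = channel_attention(fmap, w1, b1, w2, b2)
print(weights)
```

With these hand-picked parameters the gate assigns a larger weight to the first channel than to the second, so the channel with signal is emphasized and the empty one is suppressed. In a tracker, such a gate would typically be applied to the backbone features before the template and search-region features are compared.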