Visual object tracking with siamese networks has recently attracted much attention because of its favourable performance. Although spatial and temporal constraints have been widely used in other network architectures, existing siamese networks usually ignore the spatial-temporal correlation in the tracking process. As a result, these models are often sensitive to complex background interference and target deformation, which leads to tracking drift. To address this, we propose a novel similarity metric for siamese networks that integrates spatial and temporal correlations simultaneously. In the spatial domain, a distractor-aware module is introduced to generate semantically strong hard negative samples in each frame from object detection and segmentation results. In the temporal domain, an online target template update strategy is devised to capture the appearance and scale variations of the target across consecutive frames. Qualitative and quantitative results on the VOT2016, OTB50 and OTB100 datasets demonstrate that the proposed method achieves superior tracking results.
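The abstract does not give the exact scoring formula or update rule, so the following is only a minimal sketch under stated assumptions. Features are treated as plain vectors and similarity as an inner product (what a siamese cross-correlation computes at each spatial offset). The distractor penalty subtracts a weighted mean similarity to the hard negatives, similar in spirit to the distractor-aware formulation used by DaSiamRPN, and the template update is a simple linear running average. The names `distractor_aware_score`, `update_template` and `track_step`, and the parameters `alpha` and `rate`, are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product; a siamese cross-correlation evaluates this at every spatial offset."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def distractor_aware_score(template: Sequence[float],
                           candidate: Sequence[float],
                           distractors: Sequence[Sequence[float]],
                           alpha: float = 0.5) -> float:
    """Spatial term: similarity to the template minus the mean similarity to hard negatives.

    Assumption: `alpha` is a hypothetical penalty weight; the paper's formula is not given.
    """
    score = dot(template, candidate)
    if distractors:
        penalty = sum(dot(d, candidate) for d in distractors) / len(distractors)
        score -= alpha * penalty
    return score


def update_template(template: Sequence[float],
                    new_feature: Sequence[float],
                    rate: float = 0.1) -> Vector:
    """Temporal term (assumed linear update): z_t = (1 - rate) * z_{t-1} + rate * f_t."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    if len(template) != len(new_feature):
        raise ValueError("feature vectors must have the same length")
    return [(1.0 - rate) * z + rate * f for z, f in zip(template, new_feature)]


def track_step(template: Sequence[float],
               candidates: Sequence[Sequence[float]],
               distractors: Sequence[Sequence[float]],
               alpha: float = 0.5,
               rate: float = 0.1) -> Tuple[int, Vector]:
    """Pick the highest-scoring candidate, then blend its feature into the template."""
    scores = [distractor_aware_score(template, c, distractors, alpha) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, update_template(template, candidates[best], rate)
```

With `template = [1.0, 1.0]`, candidates `[[1.0, 0.5], [0.5, 1.2]]` and one hard negative `[0.0, 1.0]`, plain similarity (`alpha=0.0`) prefers the distractor-like second candidate, while `alpha=0.5` selects the first one; the template then moves slightly towards the selected feature. A small `rate` keeps the template stable against occasional wrong selections, at the cost of adapting more slowly to deformation.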
Shishun Tian, Zixi Chen, Bolin Chen, Wenbin Zou, Xia Li
Junxu Wei, Lifeng Yang, Tian Pu, Jian Li, Zhenming Peng
Jianlong Zhang, Qiao Li, Bin Wang, Chen Chen, Tianhong Wang, Yang Zhou, Ji Li
Ke Liang, Xiaoying Liao, Guangming Liang