Abstract

Deep Siamese matching networks have recently attracted increasing attention for visual tracking. Despite their demonstrated success, Siamese trackers do not take full advantage of the structural information of target objects, and they tend to drift in the presence of non-rigid deformation or partial occlusion. In this paper, we propose to advance Siamese trackers with graph convolutional networks, which pay more attention to the structural layout of target objects, in order to learn features that are robust to large appearance changes over time. Specifically, we divide the target object into several sub-parts and design an attentive graph convolutional network to model the relationships between parts. The attention coefficients of the graph are updated incrementally at each frame by the attention scheme, and the network is trained end to end. To further improve localization accuracy, we propose a learnable cascade regression algorithm based on deep reinforcement learning to refine the predicted bounding boxes. Extensive experiments on seven challenging benchmark datasets, i.e., OTB-100, TC-128, VOT2018, VOT2019, TrackingNet, GOT-10k, and LaSOT, demonstrate that the proposed tracking method performs favorably against state-of-the-art approaches.
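
To make the part-based graph idea concrete, the following is a minimal NumPy sketch of one attention-weighted graph convolution over target parts. It is an illustration under stated assumptions, not the architecture from the paper: the 3x3 grid split, the fully connected part graph, the scaled dot-product attention, and the names `split_into_parts` and `attentive_graph_layer` are all hypothetical choices made here, and the weights are random rather than learned.

```python
import numpy as np


def split_into_parts(feature_map, grid=(3, 3)):
    """Average-pool a (C, H, W) feature map into grid[0] * grid[1] part vectors.

    Assumption: parts are cells of a regular grid; the abstract does not
    specify how the target is divided.
    """
    _, h, w = feature_map.shape
    rows = np.array_split(np.arange(h), grid[0])
    cols = np.array_split(np.arange(w), grid[1])
    parts = [
        feature_map[:, r[0]:r[-1] + 1, q[0]:q[-1] + 1].mean(axis=(1, 2))
        for r in rows
        for q in cols
    ]
    return np.stack(parts)  # shape (N, C), one row per part


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attentive_graph_layer(parts, weight):
    """One attention-weighted graph convolution on a fully connected part graph.

    Assumption: attention coefficients come from scaled dot products of the
    projected part features, a stand-in for whatever scheme the paper uses.
    """
    h = parts @ weight                      # project each part: (N, D)
    scores = h @ h.T / np.sqrt(h.shape[1])  # pairwise part affinities: (N, N)
    attn = softmax(scores, axis=1)          # each row sums to 1
    out = np.maximum(attn @ h, 0.0)         # aggregate neighbours, then ReLU
    return out, attn


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((16, 6, 6))  # toy (C, H, W) target features
weight = rng.standard_normal((16, 8)) * 0.1    # random, untrained projection

parts = split_into_parts(feature_map)
out, attn = attentive_graph_layer(parts, weight)
print(parts.shape, out.shape, attn.shape)
```

In a tracker, the attention matrix would be recomputed for each new frame so that the part relationships follow the target's current appearance; here it is computed once on random features purely to show the data flow.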

Keywords:
Computer science; Graph; Artificial intelligence; Theoretical computer science
