Xinglong Sun, Haijiang Sun, Bo Liu, Shan Jiang, Jiacheng Wang, Daqun Li
Trackers based on lightweight neural networks have achieved great success in aerial object tracking, and most of them aggregate multilayer deep features to improve tracking quality. However, existing algorithms generally combine appearance features only in a unidirectional manner, ignoring that diverse kinds of features are required to identify and locate the object simultaneously, which limits the robustness and precision of tracking. In this article, we propose a novel target-aware bidirectional fusion transformer for UAV tracking. Specifically, we first present a two-stream fusion model based on linear separable attention, which combines shallow and deep features in both forward and backward directions, providing adjusted local cues for localization and global semantics for identification, respectively. In addition, a target-aware positional encoding strategy is designed for the above-mentioned fusion model, which helps extract object-related attributes during the fusion phase. Finally, we evaluate the proposed method on several popular UAV benchmarks, including UAV123, UAV20L, UAVTrack112, DTB70, and UAVDT. Extensive experimental results demonstrate that our approach has stronger tracking capability than other state-of-the-art trackers and can run at an average speed of 30.5 FPS on an embedded platform, which makes it appropriate for practical drone deployments.
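The abstract's core idea, bidirectional fusion of shallow and deep feature streams via linear attention, can be illustrated with a minimal sketch. This is not the paper's actual module: the feature map `phi`, the tensor shapes, and the function names below are illustrative assumptions, using a generic kernelized linear attention as a stand-in for the "linear separable attention" the authors describe.

```python
import numpy as np

def linear_attention(q, k, v):
    """Kernelized linear attention: O(N * d^2) instead of O(N^2 * d).

    q: (Nq, d) queries; k: (Nk, d) keys; v: (Nk, d) values.
    phi is a simple positive feature map (an assumption, not the paper's choice).
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-6
    qp, kp = phi(q), phi(k)
    kv = kp.T @ v                    # (d, d): aggregate key-value statistics once
    z = qp @ kp.sum(axis=0)          # (Nq,): per-query normalizer
    return (qp @ kv) / z[:, None]    # (Nq, d)

def bidirectional_fusion(shallow, deep):
    """Two-stream fusion: each stream attends to the other.

    Forward: shallow (local detail) queries deep semantics -> identification cues.
    Backward: deep (semantic) queries shallow detail -> localization cues.
    """
    fwd = linear_attention(shallow, deep, deep)      # (N_shallow, d)
    bwd = linear_attention(deep, shallow, shallow)   # (N_deep, d)
    return fwd, bwd
```

Running the two attentions in opposite directions is what makes the fusion bidirectional: each stream is adjusted by cues from the other, rather than features flowing only from shallow to deep layers as in unidirectional aggregation.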