RGB-T tracking has attracted increasing attention due to its strong performance. However, fully exploiting the complementary advantages of visible-light and thermal-infrared images in RGB-T tracking, without losing those advantages during deep feature learning, remains a challenge. This paper proposes a Cross-modal Attention Network in which the features extracted from each modality are refined by triple attention to obtain richer modality-specific information. A parallel, layer-by-layer interactive network then realizes feature complementarity between the two modalities and ensures that their complementary advantages are preserved throughout deep learning. Extensive experiments on two RGB-T benchmark datasets verify the effectiveness of the proposed algorithm.
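The fusion scheme described above can be illustrated with a minimal sketch: each modality's feature map is first corrected by attention, and the two corrected streams are then combined. The specific gate functions, tensor shapes, and the additive fusion rule below are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W); reweight each channel by its global average response
    w = sigmoid(feat.mean(axis=(1, 2)))   # (C,)
    return feat * w[:, None, None]

def spatial_attention(feat):
    # reweight each spatial location by its mean activation across channels
    w = sigmoid(feat.mean(axis=0))        # (H, W)
    return feat * w[None, :, :]

def fuse(rgb_feat, t_feat):
    # correct each modality with attention, then exchange information;
    # a real network would repeat this interaction layer by layer
    r = spatial_attention(channel_attention(rgb_feat))
    t = spatial_attention(channel_attention(t_feat))
    return r + t  # simple additive cross-modal interaction (illustrative)

rgb = np.random.rand(8, 16, 16)   # hypothetical visible-light features
th = np.random.rand(8, 16, 16)    # hypothetical thermal-infrared features
fused = fuse(rgb, th)
print(fused.shape)  # prints (8, 16, 16)
```

The attention blocks act as correction steps before fusion, so each modality contributes features already weighted by their informativeness.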
Jun Liu, Wei Ke, Shuai Wang, Da Yang, Sizhe Wang