Abstract

RGB-T tracking has attracted increasing attention due to its strong performance. However, fully exploiting the complementary advantages of visible-light and thermal-infrared images without losing those advantages during deep feature learning remains a challenge. This paper proposes a Cross-modal Attention Network, in which the features extracted from each modality are corrected by triple attention to obtain richer modal feature information. A parallel, layer-by-layer interactive network then realizes feature complementarity between the two modalities and ensures that the complementary advantages are preserved through deep learning. Extensive experiments on two RGB-T benchmark datasets verify the effectiveness of the proposed algorithm.
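The abstract does not specify the triple-attention or layer-wise interaction details, but the core idea of attention-weighted fusion of two modality features can be illustrated with a minimal sketch. The scalar gating function below is a placeholder assumption, not the paper's actual attention module; the `fuse` function merely shows how a softmax over per-modality scores produces a convex combination of visible and thermal features.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_score(feat):
    # placeholder gate: global average of the feature vector
    # (stands in for the paper's unspecified triple-attention correction)
    return sum(feat) / len(feat)

def fuse(rgb_feat, tir_feat):
    # weight each modality by a softmax over its attention score, then take
    # the weighted sum so each element stays between the two modality values
    w_rgb, w_tir = softmax([attention_score(rgb_feat),
                            attention_score(tir_feat)])
    return [w_rgb * r + w_tir * t for r, t in zip(rgb_feat, tir_feat)]
```

For example, `fuse([1.0, 2.0], [3.0, 4.0])` leans toward the thermal features because their average activation (and hence attention score) is higher; in a full tracker this fusion would be applied at every backbone layer rather than once.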

Keywords:
RGB-T tracking; cross-modal attention; feature fusion; deep learning; computer vision

Metrics

Cited by: 4
FWCI (Field-Weighted Citation Impact): 0.41
References: 40
Citation Normalized Percentile: 0.61

Topics

Video Surveillance and Tracking Methods (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Image Enhancement Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Infrared Target Detection Methodologies (Physical Sciences → Engineering → Aerospace Engineering)
© 2026 ScienceGate Book Chapters — All rights reserved.