BOOK-CHAPTER

Unidirectional Cross-Modal Fusion for RGB-T Tracking

Abstract

The key issue in RGB-T tracking is obtaining an effective multimodal representation of the target by exploiting complementary information from the RGB and TIR (thermal infrared) modalities. Previous methods based on template fusion or bidirectional search-template interaction can degrade the target representation, because noise from both the templates and the search regions is mixed in. Meanwhile, directly fusing search features alone, without interaction with the templates, cannot fully exploit target-relevant contextual information. To mitigate these issues, we present UCTrack, which fuses complementary multimodal search features conditioned on undisturbed RGB and TIR template features. Specifically, we design a Unidirectional Cross-modal Fusion (UCF) module that minimizes the influence of background noise on the templates by pruning the unnecessary template-to-search cross-modal interaction, and that mutually enhances the RGB and TIR search features with target-relevant information through multimodal spatial fusion. Furthermore, this module is integrated into different layers of a ViT backbone to facilitate both feature extraction and cross-modal fusion for RGB-T tracking. With the UCF module, UCTrack can represent multimodal target features effectively and accurately without the unnecessary template-to-search interaction flow and without direct template fusion. To the best of our knowledge, this is the first unidirectional cross-modal fusion paradigm proposed for RGB-T tracking. Extensive experiments on three popular RGB-T tracking benchmarks demonstrate that our method achieves state-of-the-art performance.
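The abstract does not give the mechanics of the UCF module, so the following is only a minimal sketch of one way a unidirectional interaction could be expressed as an attention mask. It is not the authors' implementation. It assumes that "template-to-search interaction" means template queries attending to search keys, that tokens are ordered as RGB template, TIR template, RGB search, TIR search, and that each template attends only to itself. The names `build_unidirectional_mask` and `masked_attention` are hypothetical, and the attention uses identity projections purely for illustration.

```python
import math


def build_unidirectional_mask(n_template, n_search):
    """Return (groups, mask) for tokens ordered [z_rgb, z_tir, x_rgb, x_tir].

    mask[q][k] is True when query token q may attend to key token k.
    """
    groups = (["z_rgb"] * n_template + ["z_tir"] * n_template
              + ["x_rgb"] * n_search + ["x_tir"] * n_search)
    mask = []
    for q in groups:
        if q.startswith("z"):
            # Assumption: a template attends only to its own tokens, so it is
            # never mixed with search regions or with the other template.
            mask.append([k == q for k in groups])
        else:
            # Assumption: search tokens of either modality attend to all tokens.
            mask.append([True] * len(groups))
    return groups, mask


def masked_attention(tokens, mask):
    """Single-head dot-product attention with identity projections (toy only)."""
    scale = math.sqrt(len(tokens[0]))
    outputs = []
    for query, allowed in zip(tokens, mask):
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in tokens]
        # Softmax restricted to the keys this query is allowed to see.
        peak = max(s for s, ok in zip(scores, allowed) if ok)
        weights = [math.exp(s - peak) if ok else 0.0
                   for s, ok in zip(scores, allowed)]
        total = sum(weights)
        outputs.append([sum(w * v[d] for w, v in zip(weights, tokens)) / total
                        for d in range(len(query))])
    return outputs


# Toy check: swap the search tokens and compare the template outputs.
groups, mask = build_unidirectional_mask(n_template=2, n_search=2)
templates = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
search_a = [[0.3, 0.1], [0.9, 0.4], [0.2, 0.7], [0.6, 0.6]]
search_b = [[5.0, -2.0], [1.0, 3.0], [-4.0, 0.5], [2.0, 2.0]]
out_a = masked_attention(templates + search_a, mask)
out_b = masked_attention(templates + search_b, mask)
print(out_a[:4] == out_b[:4])  # template rows compared across both runs
```

Under these assumptions, the template rows of the output depend only on the template tokens, while the search rows are conditioned on both templates and both search regions. In a real tracker the same mask would be passed to the attention layers of the backbone rather than to this toy function.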

Keywords:
Modal; Tracking (education); Fusion; Computer vision; Computer science; Artificial intelligence; Psychology; Materials science; Philosophy; Linguistics

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 2.77
Refs: 0
Citation Normalized Percentile: 0.90

Topics

Infrared Target Detection Methodologies: Physical Sciences → Engineering → Aerospace Engineering
Video Surveillance and Tracking Methods: Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Industrial Vision Systems and Defect Detection: Physical Sciences → Engineering → Industrial and Manufacturing Engineering