JOURNAL ARTICLE

Efficient RGB-T Tracking via Cross-Modality Distillation

Abstract

Most current RGB-T trackers adopt a two-stream structure to extract unimodal RGB and thermal features, together with complex fusion strategies to achieve multi-modal feature fusion. Such designs require a huge number of parameters, which hinders their real-life application. On the other hand, a compact RGB-T tracker may be computationally efficient but suffers non-negligible performance degradation because of its weaker feature representation ability. To remedy this situation, a cross-modality distillation framework is presented to bridge the performance gap between a compact tracker and a powerful tracker. Specifically, a specific-common feature distillation module is proposed to transfer the modality-common information as well as the modality-specific information from a deeper two-stream network to a shallower single-stream network. In addition, a multi-path selection distillation module is proposed to instruct a simple fusion module to learn more accurate multi-modal information from a well-designed fusion mechanism by using multiple paths. We validate the effectiveness of our method with extensive experiments on three RGB-T benchmarks, on which it achieves state-of-the-art performance while consuming far fewer computational resources.
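The abstract describes the specific-common feature distillation module only at a high level. As a minimal sketch under stated assumptions, the Python below shows one way such a loss could be formed: the teacher's modality-common target is taken as the element-wise mean of its RGB and thermal features, each modality-specific target is that stream's residual from the mean, and the student's hypothetical common and specific outputs are matched to these targets with mean squared error. The decomposition, the function names, and the weights `alpha` and `beta` are illustrative assumptions, not the authors' formulation.

```python
from typing import List, Sequence

Vector = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def common_part(rgb: Sequence[float], tir: Sequence[float]) -> Vector:
    """Assumed modality-common target: element-wise mean of the two teacher streams."""
    if len(rgb) != len(tir):
        raise ValueError("RGB and thermal features must have equal length")
    return [(r + t) / 2.0 for r, t in zip(rgb, tir)]


def specific_part(stream: Sequence[float], common: Sequence[float]) -> Vector:
    """Assumed modality-specific target: what a stream carries beyond the common part."""
    return [s - c for s, c in zip(stream, common)]


def specific_common_distill_loss(
    student_common: Sequence[float],
    student_rgb_specific: Sequence[float],
    student_tir_specific: Sequence[float],
    teacher_rgb: Sequence[float],
    teacher_tir: Sequence[float],
    alpha: float = 1.0,  # hypothetical weight for the common term
    beta: float = 1.0,   # hypothetical weight for the specific terms
) -> float:
    """Hypothetical specific-common distillation loss (illustrative only)."""
    # Targets are derived from the (frozen) two-stream teacher's features.
    t_common = common_part(teacher_rgb, teacher_tir)
    t_rgb_specific = specific_part(teacher_rgb, t_common)
    t_tir_specific = specific_part(teacher_tir, t_common)
    # The single-stream student's outputs are pulled toward each target with MSE.
    loss_common = mse(student_common, t_common)
    loss_specific = mse(student_rgb_specific, t_rgb_specific) + mse(
        student_tir_specific, t_tir_specific
    )
    return alpha * loss_common + beta * loss_specific


if __name__ == "__main__":
    loss = specific_common_distill_loss(
        student_common=[2.0, 2.0],
        student_rgb_specific=[0.0, 0.0],
        student_tir_specific=[0.0, 0.0],
        teacher_rgb=[1.0, 3.0],
        teacher_tir=[3.0, 1.0],
    )
    print(loss)  # 2.0
```

In a real tracker these vectors would be feature maps produced by network layers, and the loss would update only the single-stream student while the teacher stays fixed; plain lists are used here solely to keep the sketch self-contained.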

Keywords:
Computer science; RGB color model; Artificial intelligence; Modality; Tracker; Feature; Distillation; Representation; Computer vision; Tracking; Fusion mechanism; Pattern recognition; Eye tracking; Fusion

Metrics

Cited by: 82
FWCI (Field Weighted Citation Impact): 14.92
References: 45
Citation Normalized Percentile: 0.99 (in the top 1%)

Topics (all under Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Video Surveillance and Tracking Methods
Visual Attention and Saliency Detection
Image Enhancement Techniques
