Efficient RGB-T Tracking via Cross-Modality Distillation

Tianlu Zhang; Hongyuan Guo; Qiang Jiao; Qiang Zhang; Jungong Han

doi:10.1109/cvpr52729.2023.00523

JOURNAL ARTICLE

Year: 2023 Pages: 5404-5413

Abstract

Most current RGB-T trackers adopt a two-stream structure to extract unimodal RGB and thermal features, together with complex fusion strategies to fuse the multi-modal features. These designs require a huge number of parameters, which hinders their real-life application. A compact RGB-T tracker, on the other hand, may be computationally efficient but suffers non-negligible performance degradation because its feature representation ability is weakened. To remedy this situation, a cross-modality distillation framework is presented to bridge the performance gap between a compact tracker and a powerful tracker. Specifically, a specific-common feature distillation module is proposed to transfer both the modality-common and the modality-specific information from a deeper two-stream network to a shallower single-stream network. In addition, a multi-path selection distillation module is proposed to instruct a simple fusion module to learn more accurate multi-modal information from a well-designed fusion mechanism by using multiple paths. Extensive experiments on three RGB-T benchmarks validate the effectiveness of our method, which achieves state-of-the-art performance while consuming far fewer computational resources.
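The abstract does not give the loss functions, so the following is only a minimal sketch of generic feature-mimicking distillation, not the paper's specific-common feature distillation module. It assumes a single-stream student that outputs one feature vector and a two-stream teacher that outputs one RGB and one thermal feature vector. The function names, the use of mean squared error, the element-wise mean as a stand-in for "modality-common" information, and the `specific_weight` parameter are all hypothetical choices made for illustration.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty feature vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_distillation_loss(
    student: Sequence[float],
    teacher_rgb: Sequence[float],
    teacher_thermal: Sequence[float],
    specific_weight: float = 0.5,
) -> float:
    """Hypothetical loss pulling a single-stream student toward a two-stream teacher.

    Assumption (not from the paper): the shared target is the element-wise
    mean of the two teacher streams, and each stream also acts as its own
    modality-specific target.
    """
    # Illustrative "common" target: average of the RGB and thermal features.
    common_target = [(r + t) / 2 for r, t in zip(teacher_rgb, teacher_thermal)]
    common_term = mse(student, common_target)

    # Illustrative "specific" term: average distance to each teacher stream.
    specific_term = 0.5 * (mse(student, teacher_rgb) + mse(student, teacher_thermal))

    return common_term + specific_weight * specific_term
```

In a training loop, a term of this kind would typically be added to the ordinary tracking loss of the student, so that the compact network is supervised both by the ground truth and by the features of the larger teacher.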

Keywords: Computer science; RGB color model; Artificial intelligence; Modality (human–computer interaction); BitTorrent tracker; Feature (linguistics); Distillation; Representation (politics); Computer vision; Tracking (education); Fusion mechanism; Pattern recognition (psychology); Eye tracking; Fusion

Metrics

FWCI (Field Weighted Citation Impact): 14.92

Citation Normalized Percentile: 0.99

Topics

Video Surveillance and Tracking Methods

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Visual Attention and Saliency Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Enhancement Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Robust RGB-T Tracking via Adaptive Modality Weight Correlation Filters and Cross-modality Learning

Cross-Modality Distillation for Multi-Modal Tracking

Dual-Level Modality De-Biasing for RGB-T Tracking

CMRFusion: Efficient Feature Decomposition for RGB-T Fusion via Cross Modality Mask Reconstruction

AMNet: Learning to Align Multi-Modality for RGB-T Tracking