JOURNAL ARTICLE

Exploring Multi-Modal Spatial–Temporal Contexts for High-Performance RGB-T Tracking

Tianlu Zhang, Qiang Jiao, Qiang Zhang, Jungong Han

Year: 2024 Journal: IEEE Transactions on Image Processing Vol: 33 Pages: 4303-4318 Publisher: Institute of Electrical and Electronics Engineers

Abstract

In RGB-T tracking, there exist rich spatial relationships between the target and backgrounds within multi-modal data as well as sound consistencies of spatial relationships among successive frames, which are crucial for boosting the tracking performance. However, most existing RGB-T trackers overlook such multi-modal spatial relationships and temporal consistencies within RGB-T videos, hindering them from robust tracking and practical applications in complex scenarios. In this paper, we propose a novel Multi-modal Spatial-Temporal Context (MMSTC) network for RGB-T tracking, which employs a Transformer architecture for the construction of reliable multi-modal spatial context information and the effective propagation of temporal context information. Specifically, a Multi-modal Transformer Encoder (MMTE) is designed to achieve the encoding of reliable multi-modal spatial contexts as well as the fusion of multi-modal features. Furthermore, a Quality-aware Transformer Decoder (QATD) is proposed to effectively propagate the tracking cues from historical frames to the current frame, which facilitates the object searching process. Moreover, the proposed MMSTC network can be easily extended to various tracking frameworks. New state-of-the-art results on five prevalent RGB-T tracking benchmarks demonstrate the superiorities of our proposed trackers over existing ones.

Keywords:
Computer science; Computer vision; Artificial intelligence; Modal; Tracking (education); RGB color model; Pattern recognition (psychology)

Metrics

Cited By: 11
FWCI (Field Weighted Citation Impact): 7.49
Refs: 74
Citation Normalized Percentile: 0.95

Topics

Industrial Vision Systems and Defect Detection
Physical Sciences → Engineering → Industrial and Manufacturing Engineering
Advanced Vision and Imaging
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Optical Sensing Technologies
Physical Sciences → Physics and Astronomy → Instrumentation

Related Documents

JOURNAL ARTICLE

Multi-modal adapter for RGB-T tracking

He Wang, Tianyang Xu, Zhangyong Tang, Xiao-Jun Wu, Josef Kittler

Journal: Information Fusion Year: 2025 Vol: 118 Pages: 102940-102940
JOURNAL ARTICLE

RGB-T Tracking via Multi-expert Correlation Filters using Spatial-temporal Robustness

Fei Zhang, Shiping Ma, Zhijun Li, Yule Zhang

Journal: 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE) Year: 2020 Pages: 360-364
JOURNAL ARTICLE

RGB-T tracking network based on multi-modal feature fusion

Jing Jin, Jian-Qin Liu, Fengwen Zhai

Journal: Optics and Precision Engineering Year: 2025 Vol: 33 (12) Pages: 1940-1954