JOURNAL ARTICLE

Generative-Based Fusion Mechanism for Multi-Modal Tracking

Zhangyong Tang, Tianyang Xu, Xiao‐Jun Wu, Xuefeng Zhu, Josef Kittler

Year: 2024 Journal: Proceedings of the AAAI Conference on Artificial Intelligence Vol: 38 (6) Pages: 5189-5197 Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Generative models (GMs) have received increasing research interest for their remarkable capacity to achieve comprehensive understanding. However, their potential application in the domain of multi-modal tracking has remained unexplored. In this context, we seek to uncover the potential of harnessing generative techniques to address the critical challenge, information fusion, in multi-modal tracking. In this paper, we delve into two prominent GM techniques, namely, Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs). Different from the standard fusion process where the features from each modality are directly fed into the fusion block, we combine these multi-modal features with random noise in the GM framework, effectively transforming the original training samples into harder instances. This design excels at extracting discriminative clues from the features, enhancing the ultimate tracking performance. Based on this, we conduct extensive experiments across two multi-modal tracking tasks, three baseline methods, and four challenging benchmarks. The experimental results demonstrate that the proposed generative-based fusion mechanism achieves state-of-the-art performance by setting new records on GTOT, LasHeR and RGBD1K. Code will be available at https://github.com/Zhangyong-Tang/GMMT.
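The noise-injection idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (the GMMT repository linked above is the reference for that). It assumes a standard DDPM-style forward noising step with a linear schedule, an element-wise mean as a stand-in for the clean fused feature, and flat feature vectors in place of feature maps. The names `linear_alpha_bar` and `noisy_fusion_input` are hypothetical.

```python
import math
import random


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear noise schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def noisy_fusion_input(rgb_feat, aux_feat, t, alpha_bar, rng):
    """Build a noise-perturbed input for a hypothetical fusion block.

    rgb_feat, aux_feat: equal-length lists of floats, one per modality.
    t: integer step index into alpha_bar.
    Returns (conditioned_input, noise), where conditioned_input is the
    noisy fused vector followed by both modality vectors.
    """
    # Assumption: the "clean" fused feature is the element-wise mean.
    fused = [0.5 * (r + a) for r, a in zip(rgb_feat, aux_feat)]
    noise = [rng.gauss(0.0, 1.0) for _ in fused]
    # Forward noising: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps
    keep = math.sqrt(alpha_bar[t])
    spread = math.sqrt(1.0 - alpha_bar[t])
    noisy = [keep * f + spread * e for f, e in zip(fused, noise)]
    # Condition on both modalities by concatenation.
    return noisy + list(rgb_feat) + list(aux_feat), noise
```

In a setup like this, a denoising network would receive the concatenated vector and be trained to recover the added noise or the clean fused feature, so the fusion block only ever sees perturbed, harder inputs during training. A larger `t` gives a noisier input.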

Keywords:
Mechanism (biology), Generative grammar, Modal, Computer science, Fusion, Artificial intelligence, Tracking (education), Fusion mechanism, Lipid bilayer fusion, Psychology, Physics, Materials science, Linguistics, Philosophy

Metrics

Cited By: 43
FWCI (Field Weighted Citation Impact): 30.22
Refs: 80
Citation Normalized Percentile: 0.99

Topics

Industrial Technology and Control Systems
Physical Sciences → Engineering → Control and Systems Engineering
Advanced Measurement and Detection Methods
Physical Sciences → Engineering → Electrical and Electronic Engineering

Related Documents

JOURNAL ARTICLE

Tracking Humans using Multi-modal Fusion

Xiaotao Zou, Bir Bhanu

Year: 2006 Vol: 3 Pages: 4-4

JOURNAL ARTICLE

Multi-feature fusion tracking algorithm based on generative compression network

Peng Wang, Huitong Fu, Xiaoyan Li, Jia Guo, Zhigang Lv, Ruohai Di

Journal: Future Generation Computer Systems Year: 2021 Vol: 124 Pages: 206-214

JOURNAL ARTICLE

RGB-T tracking network based on multi-modal feature fusion

Jing Jin, Jian‐Qin Liu, Fengwen Zhai

Journal: Optics and Precision Engineering Year: 2025 Vol: 33 (12) Pages: 1940-1954

BOOK-CHAPTER

Speaker Tracking Using Multi-modal Fusion Framework

Saeed Anwar, Ayoub Al-Hamadi, Michael Heuer

Lecture Notes in Computer Science Year: 2012 Pages: 539-546