JOURNAL ARTICLE

Tracking Humans using Multi-modal Fusion

Abstract

Human motion detection plays an important role in automated surveillance systems. However, robustly detecting non-rigid moving objects (e.g. humans) in a cluttered environment is challenging. In this paper, we compare two approaches for detecting walking humans using multi-modal measurements: video and audio sequences. The first approach is based on the Time-Delay Neural Network (TDNN), which fuses the audio and visual data at the feature level to detect the walking human. The second approach employs a Bayesian Network (BN) to jointly model the video and audio signals. The parameters of the graphical model are estimated using the Expectation-Maximization (EM) algorithm, and the location of the target is tracked by Bayesian inference. Experiments are performed in several indoor and outdoor scenarios, including a lab setting, more than one person walking, and occlusion by bushes. A comparison of the performance and efficiency of the two approaches is also presented.
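To make the idea of tracking a target by Bayesian inference over two modalities concrete, here is a minimal sketch. It is not the paper's model: the abstract does not give the network structure or its parameters, so the 1-D position grid, the Gaussian observation likelihoods, the random-walk motion model, the noise values, and the sample readings are all assumptions made for illustration. The parameters are fixed by hand here, whereas the abstract describes estimating them with EM.

```python
import math

def gaussian(x, mean, std):
    """Unnormalized Gaussian likelihood of observing x given a true value `mean`."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2)

def normalize(weights):
    """Scale a list of non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def predict(belief, spread=0.1):
    """Diffuse belief to neighbouring cells (assumed random-walk motion model)."""
    n = len(belief)
    out = [0.0] * n
    for i, b in enumerate(belief):
        out[i] += (1 - 2 * spread) * b
        out[max(i - 1, 0)] += spread * b       # mass that would leave the grid stays at the edge
        out[min(i + 1, n - 1)] += spread * b
    return out

def fuse_step(belief, grid, video_obs, audio_obs, video_std=0.3, audio_std=1.0):
    """One Bayes update: posterior is proportional to prior * p(video|x) * p(audio|x).

    Assumes the two observations are conditionally independent given position x.
    """
    prior = predict(belief)
    posterior = [
        p * gaussian(video_obs, x, video_std) * gaussian(audio_obs, x, audio_std)
        for p, x in zip(prior, grid)
    ]
    return normalize(posterior)

# Hypothetical setup: candidate positions 0.0 .. 10.0 in steps of 0.5.
grid = [i * 0.5 for i in range(21)]
belief = normalize([1.0] * len(grid))  # uniform prior over positions

# Made-up (video, audio) position readings for three time steps.
observations = [(2.1, 2.8), (2.6, 2.2), (3.0, 3.5)]

estimates = []
for video_obs, audio_obs in observations:
    belief = fuse_step(belief, grid, video_obs, audio_obs)
    estimates.append(grid[belief.index(max(belief))])  # MAP position estimate
```

In this sketch the video reading is given a smaller standard deviation than the audio reading, so it dominates the fused estimate; that weighting is a design assumption, not a result from the paper.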

Keywords:
Modal; Fusion; Tracking (education); Computer science; Sensor fusion; Artificial intelligence; Computer vision; Materials science

Metrics

Cited By: 42
FWCI (Field Weighted Citation Impact): 4.54
Refs: 16
Citation Normalized Percentile: 0.95

Topics

Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

BOOK-CHAPTER

Speaker Tracking Using Multi-modal Fusion Framework

Saeed Anwar, Ayoub Al-Hamadi, Michael Heuer

Lecture Notes in Computer Science Year: 2012 Pages: 539-546

JOURNAL ARTICLE

Multi-modal multi-task feature fusion for RGBT tracking

Yujue Cai, Xiubao Sui, Guohua Gu

Journal: Information Fusion Year: 2023 Vol: 97 Pages: 101816-101816

JOURNAL ARTICLE

Generative-Based Fusion Mechanism for Multi-Modal Tracking

Zhangyong Tang, Tianyang Xu, Xiao‐Jun Wu, Xuefeng Zhu, Josef Kittler

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2024 Vol: 38 (6) Pages: 5189-5197

JOURNAL ARTICLE

Hierarchical multi-modal feature fusion for RGBT tracking

Na Li, Kai Huang, Zihang Wang, Yuquan Gan, Jinglu He

Journal: Signal, Image and Video Processing Year: 2025 Vol: 19 (13)