JOURNAL ARTICLE

3D Audio-Visual Speaker Tracking with a Two-Layer Particle Filter

Abstract

Audio-visual speaker tracking in 3D space is a challenging problem. Although classical particle-filter-based methods have proven effective for audio-visual speaker tracking, their performance degrades considerably when the measurements are disturbed by noise. To this end, a novel two-layer particle filter is proposed for 3D audio-visual speaker tracking. First, two groups of particles, generated from the audio and video streams respectively, are propagated independently in the audio layer and the visual layer. Then, the audio and visual likelihoods are combined through an adaptive sigmoid function, which adjusts the particle weights according to the confidence of the two modalities. Finally, an optimal particle set, selected from the two groups of particles, determines the speaker position and resets the particle positions for the next frame. Experiments on the AV16.3 database show that our method outperforms trackers using individual modalities as well as existing approaches, both in 3D space and on the image plane.
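The abstract does not reproduce the paper's equations, so the following is only a minimal sketch of the fusion idea it describes: two independently propagated particle groups, an adaptive sigmoid that shifts weight toward the more confident modality, and an "optimal" subset of the pooled particles used to estimate the 3D speaker position. The confidence-gap form of `alpha`, the toy Gaussian likelihoods, and all parameter values are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fused_weights(audio_lik, visual_lik, audio_conf, visual_conf):
    """Combine per-particle audio and visual likelihoods.

    alpha is a hypothetical adaptive factor: a sigmoid of the
    confidence gap pushes weight toward the more reliable modality.
    """
    alpha = sigmoid(audio_conf - visual_conf)          # in (0, 1)
    w = audio_lik ** alpha * visual_lik ** (1.0 - alpha)
    return w / w.sum()                                  # normalized weights

# Two particle groups propagated independently (audio layer / visual layer),
# here simply sampled around a toy ground-truth 3D speaker position.
truth = np.array([1.0, 2.0, 1.5])
audio_particles = rng.normal(truth, 0.10, size=(100, 3))
visual_particles = rng.normal(truth, 0.05, size=(100, 3))

def likelihood(particles, sigma):
    """Toy isotropic Gaussian likelihood of each particle (assumption)."""
    d2 = ((particles - truth) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

particles = np.vstack([audio_particles, visual_particles])
a_lik = likelihood(particles, 0.3)
v_lik = likelihood(particles, 0.2)

# Visual modality assumed more confident in this toy frame.
w = fused_weights(a_lik, v_lik, audio_conf=0.4, visual_conf=0.8)

# "Optimal particle set": keep the top-weighted particles from the pooled
# audio + visual groups; their weighted mean is the 3D position estimate.
top = np.argsort(w)[-50:]
estimate = np.average(particles[top], axis=0, weights=w[top])
```

In a full tracker the `estimate` would also reset the particle positions for the next frame, as the abstract states; the sketch covers a single measurement update only.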

Keywords:
Particle filter, Computer science, Computer vision, Artificial intelligence, Tracking, Filter (signal processing), Speech recognition, Eye tracking

Metrics

Cited by: 13
FWCI (Field-Weighted Citation Impact): 0.99
References: 25
Citation Normalized Percentile: 0.76

Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Video Surveillance and Tracking Methods (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Indoor and Outdoor Localization Technologies (Physical Sciences → Engineering → Electrical and Electronic Engineering)
© 2026 ScienceGate Book Chapters. All rights reserved.