Abstract

This paper proposes a system for tracking people in three dimensions using audiovisual information from multiple acoustic and video sensors. The system comprises a video subsystem and an audio subsystem, whose outputs are combined using a Kalman filter. The video subsystem combines in 3D the outputs of a number of 2D trackers based on a variation of Stauffer's adaptive background algorithm with spatio-temporal adaptation of the learning parameters and a Kalman tracker in a feedback configuration. The audio subsystem applies an information-theoretic metric to a pair of microphones to estimate the direction from which sound arrives. Combining measurements from a series of microphone pairs yields the speaker's actual coordinates in space. Experiments show that gains are to be expected when the separate tracking systems are fused.
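As an illustration of the fusion step only, the following minimal sketch shows how a Kalman filter can combine a video-derived and an audio-derived 3D position estimate. It is not the filter used in the paper: the random-walk motion model, the per-axis independence, the names `AxisKalman` and `fuse_step`, and all noise variances are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class AxisKalman:
    """Scalar Kalman filter for one coordinate (assumed random-walk model)."""
    x: float         # current position estimate along this axis
    p: float         # variance of that estimate
    q: float = 0.01  # process noise variance per step (assumed value)

    def predict(self) -> None:
        # Random walk: the mean stays put, the uncertainty grows.
        self.p += self.q

    def update(self, z: float, r: float) -> None:
        # Standard scalar Kalman update for a measurement z with variance r.
        k = self.p / (self.p + r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k


def fuse_step(
    filters: List[AxisKalman],
    video_xyz: Optional[Sequence[float]],
    audio_xyz: Optional[Sequence[float]],
    r_video: float = 0.05,  # assumed video measurement variance
    r_audio: float = 0.20,  # assumed audio measurement variance
) -> List[float]:
    """Advance one time step and fuse whichever measurements are available."""
    for i, f in enumerate(filters):
        f.predict()
        if video_xyz is not None:
            f.update(video_xyz[i], r_video)
        if audio_xyz is not None:  # e.g. None while the person is silent
            f.update(audio_xyz[i], r_audio)
    return [f.x for f in filters]


# One filter per axis (x, y, z), started with a vague prior.
filters = [AxisKalman(x=0.0, p=1.0) for _ in range(3)]
estimate = fuse_step(filters, (1.0, 2.0, 1.5), (1.2, 2.2, 1.7))
```

Applying the two updates in sequence weights each modality by its assumed reliability, so the fused variance ends up below that of either measurement alone, and a time step with only one modality available still produces an estimate.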

Keywords:
Computer science; Kalman filter; BitTorrent tracker; Tracking (education); Computer vision; Artificial intelligence; Metric (unit); Tracking system; Sensor fusion; Audio signal; Eye tracking; Speech recognition; Engineering; Speech coding

Metrics

Cited By: 12
FWCI (Field Weighted Citation Impact): 0.32
Refs: 21
Citation Normalized Percentile: 0.52

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing