JOURNAL ARTICLE

Audio-visual multi-person tracking and identification for smart environments

Abstract

This paper presents a novel system for the automatic and unobtrusive tracking and identification of multiple persons in an indoor environment. Information from several fixed cameras is fused in a particle filter framework to simultaneously track multiple occupants. A set of steerable fuzzy-controlled pan-tilt-zoom cameras serves to smoothly track persons of interest and opportunistically capture facial close-ups for face identification. In parallel, speech segmentation, sound source localization and speaker identification are performed using several far-field microphones and arrays. The information arriving asynchronously and sporadically from several sources, such as track updates and spatio-temporally localized visual and acoustic identification cues, is fused at a higher level to gradually refine the global scene model and increase the system's confidence in the set of recognized identities. The system was trained on a small set of users' faces and/or voices and, in natural meeting scenarios, performed well at quickly acquiring their identities and at filling in identity information missing from any single modality.
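
As a rough illustration of how sporadic identification cues could gradually raise confidence in a track's identity, the following minimal sketch applies a plain Bayesian update per cue. It is not the fusion scheme used in the paper, which the abstract does not specify. The function name, the user names and the score values are all hypothetical, and the cues are assumed to be conditionally independent given the true identity.

```python
from typing import Dict


def fuse_identity_cue(
    belief: Dict[str, float],
    likelihood: Dict[str, float],
    floor: float = 1e-6,
) -> Dict[str, float]:
    """Bayesian update of one track's identity belief with a single cue.

    belief     -- current probability per enrolled identity (sums to 1)
    likelihood -- score per identity for this cue (hypothetical face- or
                  speaker-ID output); missing or tiny scores are clamped
                  to `floor` so no identity is ruled out permanently
    """
    unnormalized = {
        name: prior * max(likelihood.get(name, floor), floor)
        for name, prior in belief.items()
    }
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


# Uniform prior over three hypothetical enrolled users.
belief = {name: 1.0 / 3.0 for name in ("alice", "bob", "carol")}

# Cues arrive sporadically, from whichever modality happens to fire.
face_cue = {"alice": 0.6, "bob": 0.3, "carol": 0.1}   # assumed scores
voice_cue = {"alice": 0.7, "bob": 0.2, "carol": 0.1}  # assumed scores

for cue in (face_cue, voice_cue):
    belief = fuse_identity_cue(belief, cue)

best = max(belief, key=belief.get)
print(best, round(belief[best], 3))
```

Each cue on its own favors the same user only moderately, but the product of the two pushes the belief well above either single score. This mirrors the abstract's point that one modality can complement identity information missing in the other.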

Keywords:
Computer science; Identification (biology); Computer vision; Zoom; Particle filter; Artificial intelligence; Set (abstract data type); Track (disk drive); Segmentation; Tracking (education); Filter (signal processing); Engineering

Metrics

Cited By: 38
FWCI (Field Weighted Citation Impact): 5.90
Refs: 26
Citation Normalized Percentile: 0.97

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Indoor and Outdoor Localization Technologies
Physical Sciences →  Engineering →  Electrical and Electronic Engineering