JOURNAL ARTICLE

Audio-visual human tracking for active robot perception

Abstract

In this paper, a multimodal active audio-visual system is designed to improve the perceptual capability of a robot in a noisy environment. The real-time system consists of 1) an audition modality, 2) a complementary vision modality, and 3) a motion modality that incorporates intelligent behaviors based on the data obtained from both sensory modalities. The audition and vision modalities each detect, localize, and track a speaker independently. The motion modality uses the localization results of the sensor fusion to give the robot intelligent, human-like behaviors. The system is implemented on a mobile robot platform operating in real time, and the speaker-tracking performance of the fused system is confirmed to improve on that of each individual sensory modality.
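As a rough illustration of the kind of audio-visual fusion the abstract describes, the following Python sketch fuses a noisy audio bearing and a more precise visual bearing of a speaker using a scalar Kalman filter over azimuth. The class, noise levels, field-of-view gate, and simulated measurements are assumptions for illustration only and are not the system presented in the paper.

    # A minimal sketch (not the authors' implementation) of audio-visual speaker
    # tracking via a 1-D Kalman filter over the speaker's azimuth. All names and
    # noise values below are illustrative assumptions.

    import math
    import random


    class AzimuthKalmanFilter:
        """Tracks a single speaker's azimuth (radians) from noisy bearings."""

        def __init__(self, process_var=0.01):
            self.angle = 0.0          # current azimuth estimate
            self.var = math.pi ** 2   # large initial uncertainty
            self.process_var = process_var

        def predict(self):
            # Constant-position model: only the uncertainty grows between updates.
            self.var += self.process_var

        def update(self, measurement, meas_var):
            # Standard scalar Kalman update; meas_var encodes how much we trust
            # the modality (vision is usually more precise than audition).
            gain = self.var / (self.var + meas_var)
            self.angle += gain * (measurement - self.angle)
            self.var *= (1.0 - gain)
            return self.angle


    if __name__ == "__main__":
        true_azimuth = 0.6           # speaker at roughly 34 degrees (simulated)
        tracker = AzimuthKalmanFilter()

        for step in range(20):
            tracker.predict()

            # Hypothetical audio bearing (e.g., from interaural time difference):
            # available at every step, but noisy.
            audio_bearing = true_azimuth + random.gauss(0.0, 0.2)
            tracker.update(audio_bearing, meas_var=0.2 ** 2)

            # Hypothetical visual bearing (e.g., from face detection): more
            # accurate, but only usable when the speaker is near the camera's
            # field of view.
            if abs(true_azimuth - tracker.angle) < 0.5:
                visual_bearing = true_azimuth + random.gauss(0.0, 0.05)
                tracker.update(visual_bearing, meas_var=0.05 ** 2)

            print(f"step {step:2d}: fused azimuth = {tracker.angle:+.3f} rad")

In such a scheme, the fused estimate can also drive a motion behavior, for example turning the robot toward the estimated azimuth so that the speaker stays within the camera's field of view; this is one plausible reading of how the motion modality could exploit the fused localization, not a description of the authors' controller.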

Keywords:
Modality (human–computer interaction), Computer vision, Computer science, Artificial intelligence, Modalities, Mobile robot, Sensor fusion, Robot, Perception, Stimulus modality, Active perception, Audio-visual, Tracking, Multimedia

Metrics

Cited by: 3
FWCI (Field-Weighted Citation Impact): 0.54
References: 0
Citation Normalized Percentile: 0.71

Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Robotics and Sensor-Based Localization (Physical Sciences → Engineering → Aerospace Engineering)