In this paper, a multimodal system is designed in the form of an active audio-vision in order to improve the perceptual capability of a robot in a noisy environment. The system running in real-time consists of 1) audition modality, 2) a complementary vision modality and 3) motion modality incorporating intelligent behaviors based on the data obtained from both modalities. The tasks of audition and vision are to detect, localize and track a speaker independently. The aim of motion modality is to enable a robot to have intelligent and human-like behaviors by using localization results from the sensor fusion. The system is implemented on a mobile robot platform in a real-time environment and the speaker tracking performance of the fusion is confirmed to be improved compared to each of sensory modalities.
Gaurav SinghPaul GhanemTaşkın Padır