Michael Heuer, Ayoub Al-Hamadi, Bernd Michaelis, Andreas Wendemuth
This paper describes a methodology for meaningfully fusing multimodal data in order to detect and track a speaker with a conventional sensor setup. We use Gaussian mixtures to combine the sensor information within a particle filter, such that a single speaker can be identified in the presence of multiple visual observations. The main advantages are that the design lets the system run in real time and that the framework is easily extensible. In addition, we substantially reduce noise, which yields a more dependable prediction. Results illustrate the localization estimates in a two-person and a three-person scenario.
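The fusion idea can be illustrated with a minimal sketch. It is not the authors' implementation, and every modelling choice in it is an assumption made for illustration: the state is a single azimuth angle in degrees, the visual detections form an equal-weight Gaussian mixture, the audio cue is one direction-of-arrival Gaussian, the motion model is a random walk, and all names and standard deviations are hypothetical.

```python
import numpy as np


def gaussian(x, mean, std):
    """Normal probability density evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def mixture_likelihood(x, means, std):
    """Equal-weight Gaussian mixture, one component per visual detection."""
    return np.mean([gaussian(x, m, std) for m in means], axis=0)


def particle_filter_step(particles, faces, audio_doa, rng,
                         motion_std=2.0, face_std=3.0, audio_std=15.0):
    """One predict / update / resample cycle (all parameters are assumptions)."""
    # Predict: random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: fuse modalities by multiplying the visual mixture
    # with the (much broader) audio likelihood.
    weights = (mixture_likelihood(particles, faces, face_std)
               * gaussian(particles, audio_doa, audio_std))
    weights = weights + 1e-300  # guard against an all-zero weight vector
    weights /= weights.sum()
    # Point estimate: weighted mean of the particle positions.
    estimate = float(np.sum(weights * particles))
    # Resample: multinomial resampling proportional to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], estimate


def track_speaker(faces, audio_doa, steps=10, n_particles=2000, seed=0):
    """Run the filter on static observations and return the final estimate."""
    rng = np.random.default_rng(seed)
    particles = rng.uniform(-90.0, 90.0, size=n_particles)
    estimate = 0.0
    for _ in range(steps):
        particles, estimate = particle_filter_step(particles, faces, audio_doa, rng)
    return estimate


if __name__ == "__main__":
    # Two visible faces; the coarse audio cue points toward the right-hand one.
    print(track_speaker(faces=[-40.0, 35.0], audio_doa=30.0))
```

In this sketch the imprecise audio cue only has to disambiguate between the sharply localized visual candidates, so the estimate settles near the face closest to the sound direction.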
Saeed Anwar, Ayoub Al-Hamadi, Michael Heuer
Yoon Seob Lim, JongSuk Choi, Munsang Kim
Zhiyu Zhou, Dichong Wu, Xiaomei Peng, Zefei Zhu, Chuanyu Wu, Jinbin Wu