Audio-visual speaker tracking in 3D space is a challenging problem. Although classical particle-filter-based methods have shown effectiveness in audio-visual speaker tracking, their performance degrades considerably when the measurements are disturbed by noise. To this end, a novel two-layer particle filter is proposed for 3D audio-visual speaker tracking. First, two groups of particles, generated from the audio and video streams respectively, are propagated independently in the audio layer and the visual layer. Then, the audio and visual likelihoods are combined through an adaptive sigmoid function, which adjusts particle weights according to the confidence of the two modalities. Finally, an optimal particle set, selected from the two groups of particles, is used to determine the speaker position and to reset the particle positions for the next frame. Experiments on the AV16.3 database show that our method outperforms trackers using individual modalities as well as existing approaches, both in 3D space and on the image plane.
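The adaptive fusion step described above can be illustrated with a minimal sketch. The function and parameter names below (and the particular form of the confidence-driven weight) are illustrative assumptions for exposition, not the paper's actual formulation: a sigmoid of the confidence gap between the two modalities produces a mixing weight, so the more reliable modality dominates the particle weights when the other is noisy.

```python
import numpy as np

def sigmoid(x, k=1.0, x0=0.0):
    """Logistic function with slope k and midpoint x0."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

def fuse_likelihoods(p_audio, p_video, conf_audio, conf_video):
    """Combine per-particle audio and visual likelihoods.

    conf_audio / conf_video are per-frame confidences in [0, 1].
    A sigmoid of their difference yields an adaptive mixing weight
    (hypothetical form, assumed here for illustration).
    """
    # alpha -> 1 when audio is much more confident, -> 0 when video is;
    # the slope k sharpens the switch between modalities.
    alpha = sigmoid(conf_audio - conf_video, k=5.0)
    w = alpha * p_audio + (1.0 - alpha) * p_video
    return w / np.sum(w)  # normalize into particle weights

# Toy usage: 4 particles, video currently more reliable than audio,
# so the fused weights follow the visual likelihood.
p_a = np.array([0.1, 0.2, 0.3, 0.4])
p_v = np.array([0.4, 0.3, 0.2, 0.1])
weights = fuse_likelihoods(p_a, p_v, conf_audio=0.2, conf_video=0.9)
```

With the video confidence high, the fused weights stay close to the visual likelihoods, which is the behavior the adaptive weighting is meant to provide under audio noise.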
Yidi Li, Hong Liu, Bing Yang, Runwei Ding, Yang Chen