In this paper, we present an algorithm for real-time multi-person tracking with a humanoid sensor head equipped with a stereo camera and multiple microphones. The algorithm relies on a dynamic combination of simple but fast features, which lets it cope with limited on-board resources. By combining democratic integration with layered sampling, it handles both the deficiencies of individual features and partial occlusion with the same dynamic fusion mechanism. Audio and video signals are both processed to form a joint attention map of the surroundings. This map allows us to initialize tracks automatically and to control the robot's focus of attention dynamically.
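To illustrate the dynamic fusion idea, the following is a minimal sketch of adaptive cue weighting in the spirit of democratic integration. It is not the formulation used in this paper: the function names (`fuse`, `update_weights`), the quality measure, and the `rate` parameter are all assumptions chosen for illustration, and layered sampling is not modeled. Each cue scores a set of candidate hypotheses; cues that agree with the fused winner gain weight, while cues that disagree (for example, because of occlusion) lose weight.

```python
def fuse(scores, weights):
    """Weighted sum of per-cue scores for every hypothesis.

    scores[i][h] is the score cue i assigns to hypothesis h (hypothetical layout).
    """
    n_hyp = len(scores[0])
    return [sum(w * s[h] for w, s in zip(weights, scores)) for h in range(n_hyp)]


def update_weights(scores, weights, rate=0.2):
    """Move each cue weight toward its normalized agreement with the fused result."""
    fused = fuse(scores, weights)
    best = max(range(len(fused)), key=fused.__getitem__)

    # Assumed quality measure: how far a cue's score at the fused winner
    # lies above that cue's own mean score, clipped at zero.
    quality = [max(0.0, s[best] - sum(s) / len(s)) for s in scores]
    total = sum(quality)
    if total == 0.0:
        return list(weights)  # no cue is informative; keep the weights

    target = [q / total for q in quality]
    # First-order relaxation; weights keep summing to 1 if they started that way.
    return [w + rate * (t - w) for w, t in zip(weights, target)]


if __name__ == "__main__":
    # Cue 0 peaks at hypothesis 1; cue 1 is misled (e.g. occluded) and peaks at 0.
    scores = [[0.1, 0.9, 0.1], [0.9, 0.1, 0.1]]
    weights = [0.7, 0.3]
    for _ in range(5):
        weights = update_weights(scores, weights)
    print(weights)
```

Because the same update rule handles both an unreliable cue and an occluded one, a single mechanism covers both cases, which is the property the abstract attributes to the fusion scheme.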
Nickel, Kai; Stiefelhagen, Rainer
Fotios Talantzis, Aristodemos Pnevmatikakis, A.G. Constantinides
Andrew Chen, Morteza Biglari-Abhari, Kevin I‐Kai Wang
Xiaofeng Wang, Lilian Zhang, Duo Wang, Xiao Hu