Humans locate and track objects and other humans in their surroundings using audio, vision, or a combination of the two sensory modalities. A common strategy for humans searching for others in an indoor environment is to rely on a sound's Direction of Arrival (DoA), as well as their knowledge of whether a room was previously occupied. In this paper, a similar search behavior is implemented on a mobile robot for the purpose of tracking other humans. To produce this search behavior, we develop an algorithm that performs probabilistic inference of human presence in a specific map region using two sensory cues: the DoA of sound and a vision-based estimate of human proximity. A key characteristic of this approach is that the robot can navigate towards a human irrespective of whether the sound signal is continuous, sporadic, or absent altogether. We deploy the proposed search behavior on a robot and evaluate its efficacy at finding a target person across multiple rooms, considering varying levels of sound produced by the target (e.g., calling out occasionally, calling once, or not at all). Our experimental findings indicate that while audio signals are not vital for localizing the target person, they greatly reduce the search time.
Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard
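To make the fusion idea concrete, below is a minimal sketch of how a per-room presence belief could be updated from the two cues the abstract names: a sound DoA measurement and a vision-based detection. The Bayes-filter formulation, likelihood models, function names, and parameters here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Illustrative sketch: Bayes-filter update over room-occupancy beliefs.
# The likelihood models and parameters are assumptions for exposition,
# not the paper's formulation.

def update_belief(belief, likelihoods):
    """Multiply the prior belief by observation likelihoods and renormalize."""
    posterior = belief * likelihoods
    return posterior / posterior.sum()

def doa_likelihood(room_bearings, measured_doa, kappa=4.0):
    """Von Mises-style weighting: rooms whose bearing from the robot
    matches the measured direction of arrival get higher likelihood."""
    return np.exp(kappa * np.cos(room_bearings - measured_doa))

def vision_likelihood(n_rooms, detected_room=None, p_detect=0.9):
    """If the person detector fires, concentrate probability mass on that
    room; with no detection, the observation is uninformative here."""
    lik = np.ones(n_rooms)
    if detected_room is not None:
        lik[:] = (1.0 - p_detect) / (n_rooms - 1)
        lik[detected_room] = p_detect
    return lik

# Example: three rooms, uniform prior, one DoA measurement, no detection yet.
belief = np.full(3, 1.0 / 3.0)
bearings = np.array([0.0, np.pi / 2, np.pi])  # bearing of each room (rad)
belief = update_belief(belief, doa_likelihood(bearings, measured_doa=0.1))
belief = update_belief(belief, vision_likelihood(3))
print(belief)  # most mass on the room aligned with the sound's DoA
```

Under this kind of formulation, a silent target simply contributes no DoA updates, so the belief evolves from the prior and vision cues alone, which is consistent with the abstract's claim that audio is not vital for localization but shortens the search.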