Modeling the user's attention is useful for responsive and interactive systems. This paper proposes a method for establishing joint visual attention between an experimenter and an intelligent agent. A rapid procedure is described to track the 3D head pose of the experimenter, which is used to approximate the gaze direction. The head is modeled with a sparse grid of points sampled from the surface of a cylinder. We then employ a bottom-up saliency model to single out interesting objects in the neighborhood of the estimated focus of attention. We report results on a series of experiments in which a human experimenter looks at objects placed at different locations in the visual field, and the proposed algorithm is used to locate the target objects automatically. Our results indicate that the proposed approach achieves high localization accuracy and thus constitutes a useful tool for the construction of natural human-computer interfaces.
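To make the head model concrete, the following is a minimal sketch of the two geometric ingredients the abstract mentions: sampling a sparse grid of points from a cylinder surface, and reading an approximate gaze direction off an estimated head pose. All dimensions, grid densities, and the rotation parameterization here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def cylinder_point_grid(radius=8.0, height=12.0, n_around=10, n_along=6):
    """Sample a sparse grid of 3D points from the front half of a
    cylinder surface (a coarse head model; all parameters are
    illustrative, not the paper's values)."""
    # Angles covering the camera-facing half of the cylinder.
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_around)
    ys = np.linspace(-height / 2, height / 2, n_along)
    pts = np.array([[radius * np.sin(t), y, radius * np.cos(t)]
                    for y in ys for t in thetas])
    return pts  # shape: (n_along * n_around, 3)

def gaze_direction(yaw, pitch):
    """Approximate the gaze direction as the head's forward axis
    rotated by the estimated yaw and pitch (in radians)."""
    ry = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(yaw), 0.0, np.cos(yaw)]])
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(pitch), -np.sin(pitch)],
                   [0.0, np.sin(pitch), np.cos(pitch)]])
    forward = np.array([0.0, 0.0, 1.0])  # head forward axis toward camera
    return ry @ rx @ forward
```

In a full pipeline, the grid points would be projected into the image and tracked frame to frame to recover the pose, and the gaze ray would seed the saliency-based search for the attended object.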