Gaze estimation is a well-established research topic in computer vision because of its importance to a range of applications. Recent studies show that facial regions beyond the two eyes carry valuable information for gaze estimation. Motivated by this work, we propose a novel deep convolutional network with multi-scale channel and spatial attention for the gaze estimation task. The network takes only the full-face image as input and requires no additional modules to detect the eyes or estimate the head pose. It uses multi-scale channel and spatial information to adaptively select and enhance important features while suppressing facial regions that may not contribute to gaze estimation. Through rigorous evaluation, we show that our method significantly outperforms the state of the art for 3D gaze estimation on multiple public datasets.
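To make the idea of channel and spatial attention concrete, the following is a minimal NumPy sketch and not the network proposed here. It assumes a single feature map of shape (channels, height, width) and uses parameter-free gates in place of learned layers. The function names, the pooling grids, and the kernel sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Squash values into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def box_filter(m, k):
    """Mean filter of odd size k over a 2D map, with edge padding."""
    pad = k // 2
    padded = np.pad(m, pad, mode="edge")
    h, w = m.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def channel_gate(x, grids=(1, 2)):
    """Per-channel weights from average pooling over several grid sizes.

    Assumption: each grid size g splits the map into g x g cells; the
    strongest cell response per channel is kept and scales are averaged.
    """
    c, h, w = x.shape
    desc = np.zeros(c)
    for g in grids:
        cells = [
            x[:, i * h // g:(i + 1) * h // g, j * w // g:(j + 1) * w // g].mean(axis=(1, 2))
            for i in range(g)
            for j in range(g)
        ]
        desc += np.max(cells, axis=0)
    return sigmoid(desc / len(grids))


def spatial_gate(x, kernels=(3, 5)):
    """Per-location weights from the channel-mean map smoothed at several scales."""
    m = x.mean(axis=0)
    smoothed = sum(box_filter(m, k) for k in kernels) / len(kernels)
    return sigmoid(smoothed)


def attend(x):
    """Apply the channel gate, then the spatial gate, to a (C, H, W) feature map."""
    x = x * channel_gate(x)[:, None, None]
    return x * spatial_gate(x)[None, :, :]
```

Because both gates lie strictly between 0 and 1, `attend` can only shrink activations. A trained model would replace these fixed gates with learned convolutions or fully connected layers so that informative channels and facial regions are emphasized relative to the rest.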
Yuanyuan Zhang, Jing Li, Gaoxiang Ouyang
Ce Li, K. Wei, Shaolong Ren, Fan Huang, Hangfei Jiang, Jialin Ma
Zhiming Zhu, Shu Xu, Jiexin Zhang, Chunguo Li, Yongming Huang, Lüxi Yang
Haohan Chen, Hongjia Liu, Shiyong Lan, W Wang, Yixin Qiao, Yiming Li, Guonan Deng
Zheng Gao, Puneet Kumar, Hao Zou, Xiaobai Li