Guojing Ren, Yang Zhang, Qingjuan Feng
Because gaze is continuous and dynamic, the true gaze point at each moment is closely related to that of the previous moment, and detecting individual facial-image frames in isolation cannot yield accurate gaze information. In current CNN-based gaze estimation methods, two problems remain: eye-movement temporal information is not used effectively, and global relationships are hard to capture during feature extraction. To address these concerns, this paper proposes a novel gaze estimation framework, named FE-net, which incorporates a temporal network. The framework introduces channel attention modules and self-attention modules, improving the use of extracted features and reinforcing the contribution of informative regions to gaze estimation. We further integrate an RNN structure to learn the temporal dynamics of the eye-movement process, significantly improving gaze direction prediction accuracy. The framework predicts the gaze directions of the left and right eyes separately from monocular and facial features, then computes the overall gaze direction. FE-net achieves state-of-the-art mean angular errors of 3.19° and 3.16° on the EVE and MPIIFaceGaze datasets, respectively.
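The abstract states that FE-net predicts left- and right-eye gaze directions separately and then computes an overall gaze direction, evaluated by angular error in degrees. The sketch below illustrates one common way such a combination and metric can be computed; the vector-averaging fusion and the (pitch, yaw) convention are assumptions for illustration, not the paper's confirmed method.

```python
import numpy as np

def angles_to_vector(pitch, yaw):
    # Map (pitch, yaw) in radians to a 3D unit gaze vector
    # (a common convention in gaze estimation work).
    return np.array([
        -np.cos(pitch) * np.sin(yaw),
        -np.sin(pitch),
        -np.cos(pitch) * np.cos(yaw),
    ])

def vector_to_angles(v):
    # Invert the mapping above for a unit vector v.
    v = v / np.linalg.norm(v)
    pitch = np.arcsin(-v[1])
    yaw = np.arctan2(-v[0], -v[2])
    return pitch, yaw

def combine_gaze(left_angles, right_angles):
    # Fuse the two per-eye predictions by averaging their unit
    # gaze vectors and renormalizing (an assumed fusion rule).
    v = angles_to_vector(*left_angles) + angles_to_vector(*right_angles)
    return vector_to_angles(v / np.linalg.norm(v))

def angular_error_deg(pred_angles, true_angles):
    # Mean-angular-error metric (degrees), as reported on EVE
    # and MPIIFaceGaze: angle between predicted and true vectors.
    p = angles_to_vector(*pred_angles)
    t = angles_to_vector(*true_angles)
    return np.degrees(np.arccos(np.clip(np.dot(p, t), -1.0, 1.0)))
```

For example, if both eyes predict the same (pitch, yaw), the fused direction is unchanged, and `angular_error_deg` against the ground truth gives the per-sample value that is averaged into figures such as 3.19°.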
Xinmei Wu, Lin Li, Haihong Zhu, Gang Zhou, Linfeng Li, Fei Su, Shen He, Yang-Gang Wang, Xue Long
Chenwei Zhao, Hui Xu, Bicai Yin, Jingyi Zhao
XU Jinlong, DONG Mingrui, LI Yingying, LIU Yanqing, HAN Lin
Zhao Zichen, Lai Wei, Xiaofeng Lu, Liu Zhi
L. R. D. Murthy, Pradipta Biswas