Changli Li, Elizabeth Tong, Kao Zhang, Ningxin Cheng, Zhongyuan Lai, Zhigeng Pan
Recently, with the widespread adoption of deep learning, appearance-based gaze estimation has made breakthrough progress. However, most methods focus on feature extraction from the facial region while neglecting the critical role of the eye region in gaze estimation, leading to insufficient representation of eye detail. To address this issue, this paper proposes an appearance-based multi-stream multi-input network (MSMI-Net). The model consists of two independent streams that extract high-dimensional eye features and low-dimensional features, integrating both eye and facial information. A parallel channel and spatial attention mechanism fuses the low-dimensional eye and facial features, while an adaptive weight adjustment mechanism (AWAM) dynamically determines the contribution ratio of the eye and facial features. The concatenated high-dimensional and fused low-dimensional features are passed through fully connected layers to predict the final gaze direction. Extensive experiments on the EYEDIAP, MPIIFaceGaze, and Gaze360 datasets validate the superiority of the proposed method.
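As a rough illustration only (the abstract does not specify the actual layer designs), the fusion pipeline described above can be sketched in NumPy. Both the parallel channel/spatial attention and the AWAM-style weighting below are hypothetical simplifications: the attention maps are plain sigmoid gates, and the adaptive weights are derived from mean feature magnitude and normalized with a softmax.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention(feat):
    """Toy parallel channel + spatial attention (hypothetical simplification).

    feat: (C, H, W) feature map. A channel gate and a spatial gate are
    computed in parallel, applied to the input, and summed.
    """
    chan = sigmoid(feat.mean(axis=(1, 2)))   # (C,)  channel attention
    spat = sigmoid(feat.mean(axis=0))        # (H, W) spatial attention
    return feat * chan[:, None, None] + feat * spat[None, :, :]

def adaptive_fusion(eye_feat, face_feat):
    """Hypothetical AWAM-style fusion: scalar contribution weights for the
    eye and face streams, normalized to sum to 1 via a softmax."""
    scores = np.array([np.abs(eye_feat).mean(), np.abs(face_feat).mean()])
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights[0] * eye_feat + weights[1] * face_feat, weights

rng = np.random.default_rng(0)
eye = channel_spatial_attention(rng.standard_normal((8, 4, 4)))
face = channel_spatial_attention(rng.standard_normal((8, 4, 4)))
fused, weights = adaptive_fusion(eye, face)
print(fused.shape, weights)
```

In the full model, the fused low-dimensional features would then be concatenated with the high-dimensional eye features and passed through fully connected layers to regress the gaze direction.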