Abstract

Gaze estimation is the task of predicting eye gaze direction from facial appearance. Humans infer gaze by considering various facial properties across the whole face and the relations among them; however, existing methods rarely model these diverse properties. In this paper, we propose a novel GazeCaps framework that represents various facial properties as distinct capsules. Through their vectorial representation, the capsules respond sensitively to transformations of facial properties, which is effective for gaze estimation, where many facial components are nonlinearly transformed according to head direction and perspective. Furthermore, we propose a Self-Attention Routing (SAR) module that dynamically allocates attention to the capsules carrying important information and can be optimized in a single pass, without iterative routing. Through rigorous experiments, we confirm that the proposed method achieves state-of-the-art performance on various benchmarks. We also assess the generalization performance of the proposed model through cross-dataset evaluation.
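The abstract describes the SAR module as allocating attention across capsules in a single pass rather than through iterative routing-by-agreement. A minimal sketch of that idea, assuming scaled dot-product attention over capsule pose vectors with identity query/key/value projections (the actual module would use learned projections; all names and shapes here are illustrative, not the paper's implementation):

```python
import numpy as np

def self_attention_routing(capsules):
    """Single-pass attention routing over capsule vectors (illustrative sketch).

    capsules: (N, D) array, one D-dimensional pose vector per
              facial-property capsule.
    Returns a (D,) aggregated output capsule.
    """
    n, d = capsules.shape
    # Scaled dot-product attention scores between all capsule pairs;
    # computed once, with no routing iterations.
    scores = capsules @ capsules.T / np.sqrt(d)              # (N, N)
    # Row-wise softmax gives each capsule's attention distribution.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    attended = weights @ capsules                            # (N, D)
    # Pool the attended capsules into one output capsule.
    return attended.mean(axis=0)

rng = np.random.default_rng(0)
caps = rng.normal(size=(8, 16))  # 8 hypothetical facial-property capsules
out = self_attention_routing(caps)
print(out.shape)  # (16,)
```

Because the attention weights come from one matrix product and a softmax, the whole routing step is differentiable end-to-end and needs no inner loop, matching the "single process without iterations" property claimed for SAR.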

Keywords:
Gaze estimation; Computer vision; Artificial intelligence; Pattern recognition; Human–computer interaction; Generalization

Metrics

- Cited by: 25
- FWCI (Field Weighted Citation Impact): 6.10
- References: 29
- Citation Normalized Percentile: 0.95 (in top 10%)

Topics

- Gaze Tracking and Assistive Technology (Physical Sciences → Computer Science → Human-Computer Interaction)