JOURNAL ARTICLE

Gaze estimation based on Swin Transformer

Abstract

Gaze direction is an important behavioral cue that reflects the gazer's level of attention to, and cognitive state regarding, visual information in the environment. Gaze estimation has broad application value in fields such as medical care, market research, and human-computer interaction. In recent years, some studies have introduced the Transformer into gaze estimation and achieved advanced performance. Although the Transformer has better global modeling ability, its structure is not well suited to multi-scale feature learning in visual tasks, and global self-attention over an image has high computational complexity. This paper introduces the Swin Transformer into gaze estimation, using the self-attention mechanism to model images globally in a more flexible and effective way. Self-attention is computed with Window-based Multi-head Self-Attention (W-MSA) and Shifted-Window Multi-head Self-Attention (SW-MSA), which greatly reduces the cost of self-attention over images. Experimental results demonstrate that the Swin Transformer obtains good results on the gaze estimation task.
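To make the cost argument concrete, the following is a minimal sketch and not the paper's implementation. It uses the complexity estimates given in the original Swin Transformer paper for global self-attention and window-based self-attention, and it groups token coordinates into regular and shifted windows. The function names, the 56 x 56 token grid, the 96 channels, and the window size of 7 are illustrative assumptions; none of them are taken from this article.

```python
def msa_cost(h, w, c):
    """Approximate cost of global self-attention on an h x w token grid
    with c channels: 4*h*w*c^2 + 2*(h*w)^2*c (quadratic in token count)."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def wmsa_cost(h, w, c, m):
    """Approximate cost of window-based self-attention with non-overlapping
    m x m windows: 4*h*w*c^2 + 2*m^2*h*w*c (linear in token count)."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


def window_partition(h, w, m, shift=0):
    """Group token coordinates (row, col) into m x m windows.

    shift=0 gives the regular partition used by W-MSA. A shift of m // 2
    mimics the cyclic shift behind SW-MSA. This sketch only shows the
    grouping; a real SW-MSA layer also masks attention between tokens that
    become neighbours only because of the wrap-around.
    """
    if h % m or w % m:
        raise ValueError("grid size must be divisible by the window size")
    windows = {}
    for r in range(h):
        for col in range(w):
            key = (((r - shift) % h) // m, ((col - shift) % w) // m)
            windows.setdefault(key, []).append((r, col))
    return list(windows.values())


if __name__ == "__main__":
    h = w = 56      # assumed token grid
    c = 96          # assumed channel width
    m = 7           # assumed window size
    full = msa_cost(h, w, c)
    windowed = wmsa_cost(h, w, c, m)
    print(f"global MSA : {full:,}")
    print(f"W-MSA      : {windowed:,}")
    print(f"ratio      : {full / windowed:.1f}x")
    print("windows    :", len(window_partition(h, w, m)))
```

When the window covers the whole grid, the two estimates coincide; the saving comes from keeping the window size fixed while the image grows. The shifted partition places tokens from neighbouring regular windows into a common window, which is how information crosses window boundaries between consecutive layers.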

Keywords:
Gaze; Computer science; Transformer; Artificial intelligence; Computer vision; Eye tracking; Human–computer interaction; Engineering; Voltage

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.24
Refs: 12
Citation Normalized Percentile: 0.48

Topics

Gaze Tracking and Assistive Technology (Physical Sciences → Computer Science → Human-Computer Interaction)
Advanced Computing and Algorithms (Social Sciences → Social Sciences → Urban Studies)
Visual Attention and Saliency Detection (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)