JOURNAL ARTICLE

Gaze Estimation Based on Convolutional Structure and Sliding Window-Based Attention Mechanism

Yujie Li, Jiahui Chen, Jiaxin Ma, Xiwen Wang, Wei Zhang

Year: 2023   Journal: Sensors   Vol: 23 (13)   Pages: 6226   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

The direction of human gaze is an important indicator of human behavior, reflecting a person's level of attention and cognitive state in response to visual stimuli in the environment. Convolutional neural networks (CNNs) perform well on gaze estimation, but their limited capacity for global modeling makes further gains in prediction performance difficult. More recently, Transformer models have been applied to gaze estimation and have achieved state-of-the-art performance. However, the slicing-and-mapping mechanism they use to turn an image into local patches can compromise local spatial information, and their single down-sampling rate and fixed-size tokens are poorly suited to the multiscale feature learning that gaze estimation requires. To address these limitations, this study applies the Swin Transformer to gaze estimation and designs two network architectures: a pure Swin Transformer gaze estimation model (SwinT-GE) and a hybrid model that combines convolutional structures with SwinT-GE (Res-Swin-GE). SwinT-GE uses the tiny version of the Swin Transformer, while Res-Swin-GE replaces the slicing-and-mapping mechanism of SwinT-GE with convolutional structures. Experimental results show that Res-Swin-GE significantly outperforms SwinT-GE, performs competitively on the MpiiFaceGaze dataset, and achieves a 7.5% performance improvement over existing state-of-the-art methods on the Eyediap dataset.
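The two mechanisms the abstract contrasts can be illustrated with a small NumPy sketch. This is not the authors' code: the function names `patch_embed`, `window_partition`, `window_reverse`, and `cyclic_shift` are hypothetical, the random projection matrix stands in for a learned embedding, and the 224×224 input size is an assumption. The patch size 4, embedding width 96, window size 7, and shift of 3 follow the standard Swin-T configuration. `patch_embed` approximates what is presumably meant by "slicing-and-mapping" (cut the image into non-overlapping patches, flatten, project linearly), which Res-Swin-GE replaces with convolutions. The other three functions show how Swin restricts attention to local windows and shifts the window grid between blocks. The attention computation and the shifted-window attention mask are deliberately omitted.

```python
import numpy as np


def patch_embed(image, patch_size, weight):
    """Slice an (H, W, C) image into non-overlapping p x p patches and project each.

    `weight` is a hypothetical (p*p*C, D) matrix standing in for a learned
    linear embedding; returns an (H/p, W/p, D) grid of tokens.
    """
    h, w, c = image.shape
    p = patch_size
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C): group pixels by patch
    patches = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    # Flatten each patch and map it to a D-dimensional token
    return patches.reshape(h // p, w // p, p * p * c) @ weight


def window_partition(x, window_size):
    """Split an (H, W, D) token grid into (num_windows, M*M, D) local windows."""
    h, w, d = x.shape
    m = window_size
    x = x.reshape(h // m, m, w // m, m, d).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, d)


def window_reverse(windows, window_size, h, w):
    """Inverse of window_partition: reassemble windows into an (H, W, D) grid."""
    m = window_size
    d = windows.shape[-1]
    x = windows.reshape(h // m, w // m, m, m, d).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, d)


def cyclic_shift(x, shift):
    """Roll the token grid up and left by `shift` so that the next partition
    straddles the previous window borders (the shifted-window step)."""
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))


# Usage with Swin-T-like settings (input size is an assumption)
rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))
W = rng.standard_normal((4 * 4 * 3, 96)) * 0.02

tokens = patch_embed(img, 4, W)                          # (56, 56, 96)
wins = window_partition(tokens, 7)                       # (64, 49, 96): 8 x 8 windows of 7 x 7 tokens
shifted = window_partition(cyclic_shift(tokens, 3), 7)   # same shape, windows offset by 3 tokens
```

Self-attention would then run independently inside each of the 64 windows, so its cost grows linearly with image area rather than quadratically. Alternating regular and shifted partitions lets information flow between neighboring windows across successive blocks.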

Keywords:
Gaze, Computer science, Convolutional neural network, Artificial intelligence, Transformer, Sliding window protocol, Pattern recognition (psychology), Eye tracking, Slicing, Computer vision, Machine learning, Window (computing), Voltage, Engineering
