Egocentric gaze estimation is a challenging task with promising applications in areas such as human-computer interaction and AR/VR. In this work, we propose a novel model based on the Video Swin Transformer architecture. By introducing a localized inductive bias, our model extracts essential local features from first-person videos during windowed self-attention, and it approximates global context modeling within the gaze region through the shifted-window mechanism. We evaluate our approach on EGTEA Gaze+, a publicly available dataset of egocentric activity videos. Experimental results demonstrate that our model achieves state-of-the-art performance.
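To make the locality prior concrete, the following is a minimal sketch (not the authors' released code) of windowed self-attention with a learned relative-position bias, the standard way Swin-style models inject a local inductive bias into the attention logits. The class name `WindowAttention` and all dimensions (embedding size 96, window size 7, 3 heads) are illustrative assumptions, not values taken from the paper.

```python
# Sketch: windowed self-attention with a learned relative-position bias.
# The bias table assigns one learnable logit per relative offset inside a
# window, which is the locality prior added to the attention scores.
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    def __init__(self, dim=96, window_size=7, num_heads=3):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # One learnable bias per head for every possible relative offset.
        self.rel_bias = nn.Parameter(
            torch.zeros((2 * window_size - 1) ** 2, num_heads))
        # Precompute, for every token pair in a window, the index into the table.
        coords = torch.stack(torch.meshgrid(
            torch.arange(window_size), torch.arange(window_size),
            indexing="ij")).flatten(1)                    # (2, W*W)
        rel = coords[:, :, None] - coords[:, None, :]     # (2, W*W, W*W)
        rel = rel.permute(1, 2, 0) + (window_size - 1)    # shift offsets to >= 0
        self.register_buffer(
            "rel_index", rel[..., 0] * (2 * window_size - 1) + rel[..., 1])

    def forward(self, x):  # x: (num_windows * batch, W*W, dim)
        B_, N, C = x.shape
        qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)              # each (B_, heads, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale     # (B_, heads, N, N)
        bias = self.rel_bias[self.rel_index.view(-1)].view(N, N, -1)
        attn = (attn + bias.permute(2, 0, 1)).softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B_, N, C)
        return self.proj(x)

# Usage: 8 windows of 7x7 = 49 tokens each, embedding dim 96.
out = WindowAttention()(torch.randn(8, 49, 96))
print(out.shape)  # torch.Size([8, 49, 96])
```

In the full model, consecutive blocks would alternate regular and shifted window partitions (e.g., cyclically shifting the feature map with `torch.roll` before partitioning) so that information propagates across window boundaries, which is how the shifted-window mechanism approximates global context.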