S.C. Huang, Tianzhong Wang, Weiquan Liu, Yingchao Piao, Jinhe Su, Guorong Cai, Huilin Xu
Gaze estimation is a cornerstone of applications such as human–computer interaction and behavioral analysis, e.g., for intelligent transport systems. Nevertheless, existing methods rely predominantly on coarse-grained features from the deep layers of visual encoders, overlooking the critical role that fine-grained details from shallow layers play in gaze estimation. To address this gap, we propose the Hierarchical Fine-Grained Attention Decoder (HFGAD), a lightweight fine-grained decoder that emphasizes the importance of shallow-layer information in gaze estimation. Specifically, HFGAD integrates a fine-grained amplifier (MSCSA), which employs multi-scale spatial-channel attention to direct focus toward gaze-relevant regions, and a shallow-to-deep fusion module (SFM), which facilitates interaction between coarse-grained and fine-grained information. Extensive experiments on three benchmark datasets demonstrate the superiority of HFGAD over existing methods, achieving a 1.13° improvement in gaze estimation accuracy for in-car scenarios.
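The abstract does not specify how MSCSA's spatial-channel attention is implemented. As a rough illustration of the general idea only, here is a minimal NumPy sketch of applying channel gating followed by spatial gating to feature maps at multiple scales; every function name and the simple average-pool-plus-sigmoid gating scheme are assumptions for illustration, not the paper's actual design:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W); pool over spatial dims, gate each channel
    w = sigmoid(feat.mean(axis=(1, 2)))      # (C,)
    return feat * w[:, None, None]

def spatial_attention(feat):
    # pool over channels, gate each spatial location
    w = sigmoid(feat.mean(axis=0))           # (H, W)
    return feat * w[None, :, :]

def multi_scale_spatial_channel_attention(feats):
    # feats: feature maps at different scales (shallow to deep);
    # each is gated along channels, then along spatial positions
    return [spatial_attention(channel_attention(f)) for f in feats]

# toy two-scale feature pyramid
feats = [np.ones((4, 8, 8)), np.ones((8, 4, 4))]
out = multi_scale_spatial_channel_attention(feats)
print([o.shape for o in out])  # shapes are preserved per scale
```

In a real decoder the pooling would be learned (e.g., small MLPs and convolutions rather than plain means), but the sketch shows the core mechanism: attention reweights, and never reshapes, each scale's feature map.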