Jiahui Chen, Jiaxin Ma, Xiwen Wang, Longzhao Huang, Yujie Li
The direction of human eye gaze is important behavioral information that reflects the gazer's level of attention and cognitive state toward visual information in the environment. Eye gaze estimation has wide application value in fields such as medical care, market research, and human-computer interaction. In recent years, some studies have introduced the Transformer into eye gaze estimation and achieved state-of-the-art performance. Although the Transformer has strong global modeling ability, its structure is not well suited to multi-scale feature learning in visual tasks, and computing global self-attention over images is expensive. This paper introduces the Swin Transformer into eye gaze estimation, using the self-attention mechanism to perform more flexible and effective global modeling of images. Self-attention is computed with Window Multi-head Self-Attention (W-MSA) and Shifted Window Multi-head Self-Attention (SW-MSA), which greatly reduces the cost of image self-attention. Experimental results demonstrate that the Swin Transformer achieves good results on the eye gaze estimation task.
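To illustrate why W-MSA reduces the cost of image self-attention, the sketch below partitions a feature map into non-overlapping M×M windows and computes attention only within each window, so the cost scales as O(HW·M²·C) instead of O((HW)²·C) for global attention. This is a minimal single-head sketch, not the paper's model: the learned Q/K/V projections, relative position bias, and the SW-MSA attention masks are omitted (the cyclic shift hinted at in the comment is how SW-MSA moves information between windows).

```python
import numpy as np

def window_partition(x, M):
    """Split an (H, W, C) feature map into (num_windows, M*M, C) tokens."""
    H, W, C = x.shape
    x = x.reshape(H // M, M, W // M, M, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, M * M, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(windows):
    """Single-head self-attention inside each window.

    For illustration Q = K = V = input tokens; the real W-MSA applies
    learned linear projections and adds a relative position bias.
    """
    scale = windows.shape[-1] ** -0.5
    attn = softmax(windows @ windows.transpose(0, 2, 1) * scale)
    return attn @ windows  # each output is a convex combination of window tokens

H = W = 8
C, M = 4, 4
x = np.random.rand(H, W, C)

wins = window_partition(x, M)   # (4, 16, 4): 4 windows of 16 tokens
out = window_attention(wins)    # same shape as wins

# SW-MSA (sketch): cyclically shift the map by M//2 before partitioning,
# so the next layer's windows straddle the previous window boundaries.
x_shifted = np.roll(x, shift=(-M // 2, -M // 2), axis=(0, 1))
```

Each attention matrix here is only 16×16 per window, versus a single 64×64 matrix for global attention over the same 8×8 map; the gap widens quadratically with image size.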
Zhang Cheng, Yanxia Wang, Xinliang Liu, Wei Liang, Fenglin Huang
Yujie Li, Xinghe Wang, Zihang Ma, Yifu Wang, Michael C. Meyer
Zinan Xiong, Chenxi Wang, Ying Li, Yan Luo, Yu Cao
Gongpu Wu, Changyuan Wang, Lina Gao, Jinna Xue
Ruijie Zhao, Yuhuan Wang, Sihui Luo, Suyao Shou, Pinyan Tang