Gaze tracking has numerous applications in fields such as medicine, psychology, virtual reality, marketing, and safety. However, predicting gaze accurately is challenging in real-world settings, where images are degraded by motion blur, video compression, and noise. Super-resolution (SR) has been shown to improve image quality, and we examine its usefulness for appearance-based gaze tracking. We first propose a two-step framework based on the SwinIR super-resolution model, which consistently outperforms the state of the art, especially on low-resolution or degraded images. We then examine super-resolution through the lens of self-supervised learning for gaze prediction and propose a novel architecture, “SuperVision”, which fuses an SR backbone network to a ResNet18. Using five times less labeled data, this method outperforms the state-of-the-art method, which uses 100% of the training data, by 15.5%. The proposed methods have the potential to enable cost-efficient, high-performing gaze tracking software.
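To make the two-step idea concrete, the following is a minimal sketch of how an image-enhancement stage can be composed with a gaze estimator. It is an illustration under stated assumptions, not the implementation described here: nearest-neighbour upscaling stands in for SwinIR, and the names `upscale_nearest`, `estimate_gaze_two_step`, and `dummy_estimator` are hypothetical.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]   # grayscale image as rows of pixel intensities
Gaze = Tuple[float, float]  # (pitch, yaw) angles; units are an assumption


def upscale_nearest(image: Image, scale: int) -> Image:
    """Step 1 stand-in: nearest-neighbour upscaling (NOT SwinIR)."""
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    out: Image = []
    for row in image:
        # Repeat every pixel `scale` times horizontally ...
        wide = [px for px in row for _ in range(scale)]
        # ... and every widened row `scale` times vertically.
        out.extend(list(wide) for _ in range(scale))
    return out


def estimate_gaze_two_step(
    image: Image,
    enhance: Callable[[Image], Image],
    estimator: Callable[[Image], Gaze],
) -> Gaze:
    """Step 1: enhance the degraded image. Step 2: predict gaze from it."""
    return estimator(enhance(image))


def dummy_estimator(image: Image) -> Gaze:
    """Hypothetical placeholder for a trained appearance-based gaze model."""
    return (0.0, 0.0)


low_res = [[0.1, 0.2], [0.3, 0.4]]
gaze = estimate_gaze_two_step(
    low_res,
    enhance=lambda img: upscale_nearest(img, 2),
    estimator=dummy_estimator,
)
```

Because the enhancement stage is passed in as a callable, a learned super-resolution model could be swapped in for the placeholder without changing the estimator.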