JOURNAL ARTICLE

Pre-Trained Language Models Augmented with Synthetic Scanpaths for Natural Language Understanding

Abstract

Human gaze data offer cognitive information that reflects natural language comprehension. Indeed, augmenting language models with human scanpaths has proven beneficial for a range of NLP tasks, including language understanding. However, the applicability of this approach is hampered because the abundance of text corpora is contrasted by a scarcity of gaze data. Although models for the generation of human-like scanpaths during reading have been developed, the potential of synthetic gaze data across NLP tasks remains largely unexplored. We develop a model that integrates synthetic scanpath generation with a scanpath-augmented language model, eliminating the need for human gaze data. Since the model's error gradient can be propagated throughout all parts of the model, the scanpath generator can be fine-tuned to downstream tasks. We find that the proposed model not only outperforms the underlying language model, but achieves a performance that is comparable to a language model augmented with real human gaze data. Our code is publicly available.
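The abstract describes an end-to-end differentiable pipeline in which a scanpath generator conditions a gaze-augmented language model, so the task loss can fine-tune the generator itself. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: all class and method names are assumptions, the encoder is assumed to be a HuggingFace-style pre-trained model exposing last_hidden_state, and soft fixation distributions over tokens stand in for discrete scanpaths to keep the whole pipeline differentiable.

import torch
import torch.nn as nn

class ScanpathGenerator(nn.Module):
    """Hypothetical stand-in for a model that predicts, for each fixation
    step, a distribution over which input token is fixated."""
    def __init__(self, hidden_dim, max_fixations=32):
        super().__init__()
        self.max_fixations = max_fixations
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.pointer = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, token_states):
        # token_states: (batch, seq_len, hidden_dim)
        # Soft fixation distributions keep the pipeline differentiable,
        # so the downstream task loss can back-propagate into the generator.
        rnn_out, _ = self.rnn(token_states)                         # (batch, seq, hidden)
        queries = self.pointer(rnn_out[:, : self.max_fixations])    # (batch, fix, hidden)
        scores = torch.bmm(queries, token_states.transpose(1, 2))   # (batch, fix, seq)
        return scores.softmax(dim=-1)                                # soft scanpath

class ScanpathAugmentedClassifier(nn.Module):
    """Pre-trained encoder + synthetic scanpath generator + task head,
    trained jointly (illustrative only)."""
    def __init__(self, encoder, hidden_dim, num_labels):
        super().__init__()
        self.encoder = encoder                        # e.g., a pre-trained BERT-style model
        self.generator = ScanpathGenerator(hidden_dim)
        self.fixation_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_labels)

    def forward(self, input_ids, attention_mask):
        token_states = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        scanpath = self.generator(token_states)        # (batch, fix, seq)
        # Re-order token representations in (soft) fixation order and
        # summarize the resulting fixation sequence for classification.
        fixated = torch.bmm(scanpath, token_states)    # (batch, fix, hidden)
        _, last = self.fixation_rnn(fixated)           # last: (1, batch, hidden)
        return self.head(last.squeeze(0))              # (batch, num_labels)

# Usage sketch (assumed names):
# model = ScanpathAugmentedClassifier(AutoModel.from_pretrained("bert-base-uncased"), 768, num_labels=2)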

Keywords:
Computer science; Artificial intelligence; Language model; Natural language generation; Gaze; Natural language processing; Generator (circuit theory); Natural language

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 2.30
Refs: 31
Citation Normalized Percentile: 0.87

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

JOURNAL ARTICLE

PLM-AS: Pre-trained Language Models Augmented with Scanpaths for Sentiment Classification

Duo Yang, Nora Hollenstein

Journal: Proceedings of the Northern Lights Deep Learning Workshop, Year: 2023, Vol: 4
JOURNAL ARTICLE

Robustness of Pre-trained Language Models for Natural Language Understanding

Prasetya Ajie Utama

Journal: TUbiblio (Technical University of Darmstadt), Year: 2024
BOOK-CHAPTER

Pre-trained Language Models

Huaping Zhang, Jianyun Shang

Year: 2025, Pages: 73-90