JOURNAL ARTICLE

Context-Aware Driver Attention Estimation Using Multi-Hierarchy Saliency Fusion With Gaze Tracking

Zhongxu Hu, Yuxin Cai, Qinghua Li, Kui Su, Chen Lv

Year: 2024 | Journal: IEEE Transactions on Intelligent Transportation Systems | Vol: 25 (8) | Pages: 8602-8614 | Publisher: Institute of Electrical and Electronics Engineers

Abstract

Accurate vision-based driver attention estimation is a challenging task due to the limitations of visual sensors, yet it is a critical and fundamental function for building a human-centered intelligent driving system. Unlike previous investigations that treat it as a classification task, this study introduces scenario contextual information to improve accuracy and obtain a fine-grained estimation. To this end, a data-driven hybrid architecture for context-aware driver attention estimation is proposed that jointly models the scene and the state of the driver during driving. A visual saliency map is typically assumed to highlight distinct areas that capture human attention. To leverage this characteristic, a multi-hierarchy fusion network is proposed to effectively extract saliency features from a scene image. A gaze-tracking network is employed to estimate the potential focus zone of the driver, and this coarse estimation is subsequently refined using the extracted saliency information to obtain a fine-grained estimation. Three related and commonly used task-agnostic and task-driven datasets are adopted to evaluate the proposed saliency estimation model, and experimental results show that it achieves state-of-the-art performance. To verify the joint modeling methodology, two new driving attention datasets supplemented with driver information are collected based on existing ones. The results of comparative experiments indicate that incorporating saliency features significantly improves the estimation of gaze fixation, demonstrating the feasibility and efficiency of the proposed method.
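The coarse-to-fine scheme described in the abstract can be illustrated with a minimal sketch: a coarse gaze point from the gaze-tracking network is fused with a scene saliency map, and the refined fixation is taken where saliency is strong near the gaze estimate. The function name, the Gaussian prior, and the `sigma` parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def refine_fixation(saliency, gaze_xy, sigma=20.0):
    """Illustrative coarse-to-fine fusion (not the paper's exact method).

    saliency : 2-D array (H, W); higher values mark more salient pixels
    gaze_xy  : (x, y) coarse fixation from a gaze-tracking network
    sigma    : std-dev in pixels of an assumed Gaussian prior around gaze_xy
    """
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Gaussian prior centred on the coarse gaze estimate
    prior = np.exp(-((xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2)
                   / (2.0 * sigma ** 2))
    # Weight saliency evidence by the gaze prior and pick the strongest pixel
    fused = saliency * prior
    y, x = np.unravel_index(np.argmax(fused), fused.shape)
    return (int(x), int(y))
```

With two salient spots in a scene and a coarse gaze estimate near one of them, the fused map snaps the fixation onto the nearby salient peak rather than the distant one, which is the intuition behind refining the coarse zone with saliency features.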

Keywords:
Gaze, Artificial intelligence, Computer science, Sensor fusion, Context, Computer vision, Hierarchy, Tracking, Eye tracking, Estimation, Fusion, Context model, Human-computer interaction, Machine learning, Engineering

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 4.77
References: 102
Citation Normalized Percentile: 0.91
Is in top 1%
Is in top 10%

Topics

Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Gaze Tracking and Assistive Technology
Physical Sciences →  Computer Science →  Human-Computer Interaction