A Survey on Multimodal Large Language Models for Autonomous Driving

Can Cui; Yunsheng Ma; Xu Cao; Wenqian Ye; Yang Zhou; Kaizhao Liang; Jintai Chen; Juanwu Lu; Zichong Yang; Kuei-Da Liao; Tianren Gao; Erlong Li; Tang Kun; Zhipeng Cao; Tong Zhou; Ao Liu; Xinrui Yan; Shuqi Mei; Jianguo Cao; Ziran Wang; Chao Zheng

doi:10.1109/wacvw60836.2024.00106

JOURNAL ARTICLE

Abstract

With the emergence of Large Language Models (LLMs) and Vision Foundation Models (VFMs), multimodal AI systems built on large models have the potential to perceive the real world, make decisions, and control tools as humans do. In recent months, LLMs have attracted widespread attention in autonomous driving and map systems. Despite their immense potential, there is still no comprehensive understanding of the key challenges, opportunities, and future directions in applying LLMs to driving systems. In this paper, we present a systematic investigation of this field. We first introduce the background of Multimodal Large Language Models (MLLMs), the development of multimodal models using LLMs, and the history of autonomous driving. Then, we give an overview of existing MLLM tools for driving, transportation, and map systems, together with existing datasets and benchmarks. Moreover, we summarize the works in the 1st WACV Workshop on Large Language and Vision Models for Autonomous Driving (LLVM-AD), which is the first workshop of its kind regarding LLMs in autonomous driving. To further promote the development of this field, we also discuss several important problems in using MLLMs in autonomous driving systems that need to be solved by both academia and industry.

Keywords:

Computer science; Human–computer interaction

Metrics

Cited By: 227

FWCI (Field Weighted Citation Impact): 120.35

Refs: 223

Citation Normalized Percentile: 1.00

Is in top 1%

Is in top 10%

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Large Language Models for Human-Like Autonomous Driving: A Survey

Multimodal Large Language Models: A Survey

Towards Robust Autonomous Driving: Conditional Multimodal Large Language Models for Fine-Grained Perception

Efficient multimodal large language models: a survey

Facilitating Autonomous Driving Tasks With Large Language Models