JOURNAL ARTICLE

From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models

Abstract

Large language models (LLMs) have demonstrated excellent zero-shot generalization to new language tasks. However, effective utilization of LLMs for zero-shot visual question-answering (VQA) remains challenging, primarily due to the modality disconnect and task disconnect between the LLM and VQA tasks. End-to-end training on multimodal data may bridge the disconnects, but is inflexible and computationally expensive. To address this issue, we propose Img2LLM, a plug-and-play module that provides LLM prompts to enable LLMs to perform zeroshot VQA tasks without end-to-end training. We develop LLM-agnostic models describe image content as exemplar question-answer pairs, which prove to be effective LLM prompts. Img2LLM offers the following benefits: 1) It achieves comparable or better performance than methods relying on end-to-end training. For example, we outperform Flamingo [3] by 5.6% on VQAv2. On the challenging A-OKVQA dataset, our method outperforms few-shot methods by as much as 20%. 2) It flexibly interfaces with a wide range of LLMs to perform VQA. 3) It eliminates the need to specialize LLMs using end-to-end finetuning and serve highly specialized LLMs to end users, thereby reducing cost. Code is available via the LAVIS [28] framework at https://github.com/salesforce/LAVIS/tree/main/projects/img2llm-vqa.

Keywords:
Computer science Shot (pellet) Task (project management) Question answering Code (set theory) Generalization Artificial intelligence Tree (set theory) Natural language processing Programming language Engineering

Metrics

117
Cited By
21.29
FWCI (Field Weighted Citation Impact)
89
Refs
0.99
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
© 2026 ScienceGate Book Chapters — All rights reserved.