JOURNAL ARTICLE

Zero-Shot Knowledge-Based Visual Question Answering with Frozen Language Models

Jing Liu, Lizong Zhang, Cao Chen, Yinong Shi, Chong Mu, Jiaxin Li

Year: 2025   Journal: Big Data Mining and Analytics   Vol: 8 (6)   Pages: 1418-1431   Publisher: Tsinghua University Press

Abstract

Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been employed for zero-shot knowledge-based VQA because of their stored knowledge and in-context learning capabilities. However, LLMs are commonly treated as implicit knowledge bases, and their generative and in-context learning potential remains underutilized. Prior work shows that the performance of in-context learning depends strongly on the quality and order of the demonstrations in the prompt. In light of this, we propose Knowledge Generation with Frozen Language Models (KGFLM), a novel method for generating explicit knowledge statements to improve zero-shot knowledge-based VQA. Our knowledge generation strategy identifies effective demonstrations and determines their optimal order, thereby prompting the frozen LLM to produce more useful knowledge statements for better predictions. The generated knowledge statements can also serve as interpretable rationales. In our method, demonstrations are selected and arranged for each question according to their semantic similarity and quality, without requiring additional annotations. We conduct a series of experiments on the A-OKVQA and OKVQA datasets. The results show that our method outperforms several strong zero-shot knowledge-based VQA methods.
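The abstract does not give the scoring or ordering rule, so the following is only a minimal sketch of what per-question demonstration selection and ordering could look like. Everything in it is an assumption made for illustration rather than the KGFLM procedure: the function names, the `alpha` weight, the weighted-sum score, the precomputed `quality` values, and the choice to place the highest-scoring demonstration last (closest to the question).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_and_order(question_vec, pool, k=2, alpha=0.5):
    """Pick k demonstrations and order them for the prompt.

    pool: list of dicts with keys 'text', 'vec', and 'quality' (in [0, 1]).
    The score is a hypothetical weighted sum of semantic similarity and
    quality; the paper's actual criterion is not stated in the abstract.
    """
    scored = [
        (alpha * cosine(question_vec, d["vec"]) + (1 - alpha) * d["quality"], d["text"])
        for d in pool
    ]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    # Assumed ordering: ascending score, so the best demonstration sits
    # immediately before the target question.
    return [text for _, text in sorted(top, key=lambda pair: pair[0])]


def build_prompt(question, demos):
    """Concatenate the ordered demonstrations and the target question."""
    return "\n\n".join(demos + [f"Question: {question}\nKnowledge:"])
```

In this sketch the frozen LLM would receive the string returned by `build_prompt` and complete the final `Knowledge:` line; the embeddings and quality scores are assumed to be computed beforehand by some other component.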

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 36
Citation Normalized Percentile: 0.37

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition