Cross-Modal Dense Passage Retrieval for Outside Knowledge Visual Question Answering

B. Reichman; Larry Heck

doi:10.1109/iccvw60793.2023.00304

Year: 2023 Pages: 2829-2834

Abstract

In many language processing tasks, most notably Large Language Modeling (LLM), retrieval augmentation improves model performance by adding information at inference time that may not be present in the model's weights. This technique has been shown to be particularly useful in multimodal settings. For some tasks, such as Outside Knowledge Visual Question Answering (OK-VQA), retrieval augmentation is required given the open nature of the knowledge. In many prior works on the OK-VQA task, the retriever is either a unimodal language retriever or an untrained cross-modal retriever. In this work, we present a weakly supervised training approach for cross-modal retrievers. Our method takes inspiration from the natural language modeling task of information retrieval and extends those methods to cross-modal retrieval. Since the OK-VQA task does not typically have consistent ground-truth retrieval labels, we evaluate our model using lexical overlap between the ground truth and the retrieved passage. Our approach showed an average recall improvement of 28% across a large range of retrieval sizes compared to a baseline backbone network.
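The lexical-overlap evaluation mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact matching rule, so everything below is an assumption: the helper names `contains_answer` and `recall_at_k` are hypothetical, the "ground truth" is taken to be a list of answer strings per question, and a passage counts as relevant when any answer appears in it as a whole-token sequence after lowercasing.

```python
import re
from typing import Iterable, Sequence


def contains_answer(passage: str, answers: Iterable[str]) -> bool:
    """Return True if any answer appears in the passage as a whole-token sequence.

    Assumption: lowercase word-level matching; the paper's rule may differ.
    """
    tokens = re.findall(r"\w+", passage.lower())
    text = " " + " ".join(tokens) + " "
    for answer in answers:
        answer_tokens = re.findall(r"\w+", answer.lower())
        if answer_tokens and " " + " ".join(answer_tokens) + " " in text:
            return True
    return False


def recall_at_k(
    retrieved: Sequence[Sequence[str]],
    answers: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of questions with at least one answer-bearing passage in the top k."""
    if not retrieved:
        return 0.0
    hits = sum(
        any(contains_answer(passage, gold) for passage in passages[:k])
        for passages, gold in zip(retrieved, answers)
    )
    return hits / len(retrieved)


# Toy data (invented for illustration): ranked passages per question.
retrieved = [
    ["A dog is a domesticated animal.", "Frisbees are made of plastic."],
    ["Bananas grow in tropical climates.", "Apples are rich in fiber."],
]
answers = [["plastic"], ["potassium"]]

print(recall_at_k(retrieved, answers, k=1))  # 0.0
print(recall_at_k(retrieved, answers, k=2))  # 0.5
```

Sweeping `k` over several values and averaging the gain over a baseline is one way to arrive at a figure of the kind reported above. The toy numbers here are not results from the paper.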

Keywords: Question answering, Computer science, Modal, Information retrieval, Artificial intelligence, Natural language processing

Metrics

FWCI (Field Weighted Citation Impact): 0.36

Citation Normalized Percentile: 0.56

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering

Passage Retrieval for Outside-Knowledge Visual Question Answering

Cross-Modal Retrieval for Knowledge-Based Visual Question Answering

Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering

Retrieval Augmented Visual Question Answering with Outside Knowledge