Abstract

Visual question answering (VQA) lies at the intersection of language and vision research. It functions as a building block for multimodal conversational AI and serves as a testbed for assessing a model's capability for open-domain scene understanding. While progress in this area was initially accelerated by the 2015 release of the popular, large-scale "VQA" dataset, new datasets are required to sustain this research momentum. For example, the 2019 Outside Knowledge VQA dataset "OK-VQA" extends VQA with more challenging questions that require complex, factual, and commonsense knowledge. However, our analysis found that 41.4% of the dataset needed to be corrected and 10.6% needed to be removed. This paper describes the analysis, corrections, and removals completed and presents a new dataset: OK-VQA Version 2.0. To gain insight into the impact of these changes on OK-VQA research, the paper presents results for state-of-the-art models retrained on the new dataset. The side-by-side comparisons show that one method in particular, the Knowledge Augmented Transformer for Vision-and-Language (KAT), extends its relative lead over competing methods. The dataset is available online.

Keywords:
Question answering, Computer science, Information retrieval, Natural language processing

Metrics

Cited by: 3
FWCI (Field-Weighted Citation Impact): 0.55
References: 39
Citation Normalized Percentile: 0.59

Topics

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)
Human Pose and Action Recognition (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)