JOURNAL ARTICLE

Finetuning Language Models for Multimodal Question Answering

Abstract

To achieve multimodal intelligence, AI systems must be able to process and respond to inputs from multiple modalities. However, many current question answering models are limited to specific answer types, such as yes/no or numeric answers, and require additional human assessment. Recently, the Visual-Text Question Answering (VTQA) dataset was proposed to fill this gap. In this paper, we conduct an exhaustive analysis and exploration of this task. Specifically, we implement a T5-based multimodal generative network that overcomes the limitations of a traditional label space and allows more freedom in responses. Our approach achieves the best performance in both the English and Chinese tracks of the VTQA challenge.

Keywords: Computer science, Question answering, Generative grammar, Artificial intelligence, Task (project management), Process (computing), Machine learning, Natural language processing, Modal

Metrics

Cited By: 4
FWCI (Field-Weighted Citation Impact): 0.73
Refs: 20
Citation Normalized Percentile: 0.67


Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Domain Adaptation and Few-Shot Learning
Physical Sciences → Computer Science → Artificial Intelligence