H. Hoang, Tung D. Le, Nguyen Tien Huy
Recent advances in computer vision and natural language processing have been applied to the Visual Question Answering (VQA) task. However, many of the most accurate models have very large architectures, which hinders deployment in practical applications such as assistive devices for blind and visually impaired users. Our research compresses a Visual Question Answering model on a Vietnamese dataset using the knowledge distillation method. To recover precision, we also develop a Mixture of ViVQA Experts system that adapts to each question type, improving accuracy while adding only a few parameters and without retraining the entire system from scratch. With a total of 204M parameters, this approach reduces model size by 24.51% compared to the original while lowering accuracy by only 6.59% on the overall test set. Moreover, accuracy improves on individual question types relative to our distilled model: "number" by 1.35% and "color" by 0.48%. The code and pretrained models are available at: anonymous.
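The abstract's compression step relies on knowledge distillation, where a small student model is trained to match a large teacher's softened output distribution. The paper does not give its loss formulation, so the sketch below shows only the standard temperature-scaled distillation term (Hinton et al.'s formulation); the function names, temperature value, and pure-Python setup are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution so the student sees the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients keep a comparable magnitude as T grows.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl
```

In practice this term is typically mixed with the ordinary cross-entropy on ground-truth answer labels; the mixing weight and temperature are hyperparameters the abstract does not specify.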