JOURNAL ARTICLE

Deep Multimodal Neural Architecture Search

Abstract

Designing effective neural networks is fundamentally important in deep multimodal learning. Most existing works focus on a single task and design neural architectures manually, which are highly task-specific and hard to generalize to different tasks. In this paper, we devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks. Given multimodal input, we first define a set of primitive operations, and then construct a deep encoder-decoder based unified backbone, where each encoder or decoder block corresponds to an operation searched from a predefined operation pool. On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks. By using a gradient-based NAS algorithm, the optimal architectures for different tasks are learned efficiently. Extensive ablation studies, comprehensive analysis, and comparative experimental results show that the obtained MMnasNet significantly outperforms existing state-of-the-art approaches across three multimodal learning tasks (over five datasets), including visual question answering, image-text matching, and visual grounding.
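The abstract does not specify which gradient-based NAS algorithm is used. As a rough illustration only, the sketch below assumes a common continuous relaxation in which each block computes a softmax-weighted mixture of all candidate operations, and the final architecture keeps the highest-scoring operation per block. The operation pool, the function names, and the fixed architecture parameters are all hypothetical; in a real search the parameters would be learned by gradient descent and the operations would be neural network layers.

```python
import math

# Hypothetical operation pool: toy stand-ins for the primitive operations
# (real candidates would be neural layers, not scalar transforms).
OPERATION_POOL = {
    "identity": lambda x: list(x),
    "double": lambda x: [2.0 * v for v in x],
    "negate": lambda x: [-v for v in x],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_block(x, alphas):
    """Relaxed block: softmax-weighted sum of every candidate operation."""
    weights = softmax(alphas)
    out = [0.0] * len(x)
    for weight, op in zip(weights, OPERATION_POOL.values()):
        out = [o + weight * v for o, v in zip(out, op(x))]
    return out


def derive_architecture(block_alphas):
    """Discretize: keep the highest-scoring operation in each block."""
    names = list(OPERATION_POOL)
    return [names[max(range(len(a)), key=a.__getitem__)] for a in block_alphas]


# Two blocks, each with one architecture parameter per candidate operation.
# These values are fixed here for illustration; a search would learn them.
block_alphas = [[0.1, 2.0, -1.0], [1.5, 0.0, 0.3]]
x = [1.0, -2.0]
for alphas in block_alphas:
    x = mixed_block(x, alphas)
selected = derive_architecture(block_alphas)
```

Because the mixture is differentiable with respect to the architecture parameters, they can be optimized with the same gradient machinery as the network weights, which is what makes this family of search methods efficient relative to discrete search.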

Keywords:
Computer science; Artificial intelligence; Deep learning; Task (project management); Encoder; Block (permutation group theory); Artificial neural network; Set (abstract data type); Construct (python library); Multimodality; Matching (statistics); Autoencoder; Multimodal learning; Machine learning

Metrics

Cited By: 90
FWCI (Field Weighted Citation Impact): 5.77
Refs: 42
Citation Normalized Percentile: 0.97

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Self-Supervised Neural Architecture Search for Multimodal Deep Neural Networks

S Suzuki, Satoshi Ono

Journal: IEICE Transactions on Information and Systems, Year: 2024, Vol: E108.D (6), Pages: 640-643

JOURNAL ARTICLE

BM-NAS: Bilevel Multimodal Neural Architecture Search

Yihang Yin, Siyu Huang, X. D. Zhang

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2022, Vol: 36 (8), Pages: 8901-8909

BOOK-CHAPTER

Architecture Search for Deep Neural Network

Xiangyu Gao, Meikang Qiu, Hui Zhao

Lecture Notes in Computer Science, Year: 2023, Pages: 581-596