MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning

Yi Xin; Junlong Du; Qiang Wang; Ke Yan; Shouhong Ding

doi:10.1609/aaai.v38i14.29540

JOURNAL ARTICLE

Year: 2024; Journal: Proceedings of the AAAI Conference on Artificial Intelligence; Vol: 38 (14); Pages: 16076-16084; Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Multi-Task Learning (MTL) is designed to train multiple correlated tasks simultaneously, thereby enhancing the performance of individual tasks. Typically, a multi-task network consists of a shared backbone and task-specific decoders. However, the overall complexity of the decoders grows with the number of tasks. To tackle this challenge, we integrate the decoder-free vision-language model CLIP, which exhibits robust zero-shot generalization. Recently, parameter-efficient transfer learning methods have been extensively explored for adapting CLIP to downstream tasks, and among them prompt tuning has shown strong potential. Nevertheless, these methods fine-tune only a single modality (text or visual), which disrupts the modality structure of CLIP. In this paper, we first propose the Multi-modal Alignment Prompt (MmAP) for CLIP, which aligns the text and visual modalities during the fine-tuning process. Building upon MmAP, we develop an innovative multi-task prompt learning framework. On the one hand, to maximize the complementarity of highly similar tasks, we use a gradient-driven task grouping method to partition the tasks into several disjoint groups and assign a group-shared MmAP to each group. On the other hand, to preserve the unique characteristics of each task, we assign a task-specific MmAP to each task. Comprehensive experiments on two large multi-task learning datasets demonstrate that our method achieves significant performance improvements over full fine-tuning while training only approximately 0.09% as many parameters.
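The two-level prompt scheme in the abstract (disjoint task groups that share one MmAP each, plus one task-specific MmAP per task) can be illustrated with a small sketch. The abstract does not state the grouping criterion, so this example assumes, purely for illustration, that each task is summarized by a flattened gradient vector and that tasks are merged greedily when the cosine similarity to every existing group member reaches a threshold. The function names (cosine_similarity, group_tasks, assign_prompts), the threshold and the toy gradients are hypothetical, prompts are stood in for by string keys rather than real MmAP parameters, and none of this should be read as the authors' implementation.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two flattened gradient vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_tasks(grads: Dict[str, Sequence[float]], threshold: float) -> List[List[str]]:
    """Greedily partition tasks into disjoint groups (hypothetical stand-in).

    Tasks are visited in dict order. A task joins the first existing group in
    which its gradient has cosine similarity >= threshold with every member;
    if no group qualifies, it starts a new group. Each task lands in exactly
    one group, so the groups are disjoint.
    """
    groups: List[List[str]] = []
    for task, grad in grads.items():
        for group in groups:
            if all(cosine_similarity(grad, grads[other]) >= threshold for other in group):
                group.append(task)
                break
        else:  # no existing group accepted the task
            groups.append([task])
    return groups


def assign_prompts(groups: List[List[str]]) -> Dict[str, Dict[str, str]]:
    """Map each task to one group-shared and one task-specific prompt key.

    The keys are hypothetical placeholders for MmAP parameter sets.
    """
    assignment: Dict[str, Dict[str, str]] = {}
    for group_id, group in enumerate(groups):
        for task in group:
            assignment[task] = {
                "group_shared": f"group_{group_id}",
                "task_specific": f"task_{task}",
            }
    return assignment


if __name__ == "__main__":
    # Toy per-task gradient vectors: made-up numbers, for illustration only.
    toy_grads = {
        "A": [1.0, 0.9, 0.0],
        "B": [0.9, 1.0, 0.1],
        "C": [0.0, 0.1, 1.0],
        "D": [0.1, 0.0, 0.9],
    }
    groups = group_tasks(toy_grads, threshold=0.8)
    print(groups)  # [['A', 'B'], ['C', 'D']]
    assignment = assign_prompts(groups)
    print(assignment["C"])  # {'group_shared': 'group_1', 'task_specific': 'task_C'}
    # Distinct prompt sets: one per group plus one per task.
    print(len(groups) + len(assignment))  # 6
```

Because the greedy pass visits tasks in insertion order, a different ordering can produce a different partition; it is only a simple stand-in for the gradient-driven grouping. Each task ends up with exactly two prompt sets, one shared with its group and one of its own, so the number of distinct prompt sets is the number of groups plus the number of tasks.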

Keywords:

Modal; Task; Computer science; Domain; Artificial intelligence; Engineering; Mathematics; Systems engineering

Metrics

FWCI (Field Weighted Citation Impact): 14.79

Citation Normalized Percentile: 0.99

Topics

Intelligent Tutoring Systems and Adaptive Learning

Speech and dialogue systems

Context-Aware Activity Recognition Systems

Related Documents

LAMM: Label Alignment for Multi-Modal Prompt Learning

Multi-Task Learning Optimization Algorithm Research for Cross-Modal Feature Alignment

Multi-modal Deepfake Detection via Multi-task Audio-Visual Prompt Learning

Prompt Learning with Cross-Modal Feature Alignment for Visual Domain Adaptation

ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-domain Incremental Learning in CLIP