LAMM: Label Alignment for Multi-Modal Prompt Learning

Jingsheng Gao; Jiacheng Ruan; Suncheng Xiang; Zefang Yu; Ke Ji; Mingye Xie; Ting Liu; Yuzhuo Fu

doi:10.1609/aaai.v38i3.27950

JOURNAL ARTICLE

Year: 2024
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 38 (3)
Pages: 1815-1823
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

With the success of pre-trained visual-language (VL) models such as CLIP in visual representation tasks, transferring pre-trained models to downstream tasks has become a crucial paradigm. Recently, the prompt tuning paradigm, which draws inspiration from natural language processing (NLP), has made significant progress in the VL field. However, previous methods mainly focus on constructing prompt templates for text and visual inputs, neglecting the gap in class label representations between the VL models and downstream tasks. To address this challenge, we introduce a label alignment method named LAMM, which dynamically adjusts the category embeddings of downstream datasets through end-to-end training. Moreover, to achieve a more appropriate label distribution, we propose a hierarchical loss that aligns the parameter space, the feature space, and the logits space. We conduct experiments on 11 downstream vision datasets and demonstrate that our method significantly improves the performance of existing multi-modal prompt learning models in few-shot scenarios, with an average accuracy improvement of 2.31% over state-of-the-art methods at 16 shots. Our method also outperforms other prompt tuning methods in continual learning. Importantly, it is complementary to existing prompt tuning methods and can boost performance when applied on top of them. Our code and dataset will be publicly available at https://github.com/gaojingsheng/LAMM.
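The abstract names three alignment levels (parameter space, feature space, logits space) without giving their formulas. The sketch below is only a toy illustration of how such a three-part penalty could be combined with a classification loss over trainable label embeddings. Everything in it is an assumption made for illustration and is not taken from the paper or its repository: the function names, the squared-distance, cosine, and KL forms of the three terms, the weights, the scale value, and the stand-in `encode` function, which replaces a real frozen text encoder with plain L2 normalisation. It only evaluates the loss and does not perform a training step.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def encode(label_embedding):
    # Hypothetical stand-in for a frozen text encoder: L2 normalisation only.
    n = math.sqrt(dot(label_embedding, label_embedding))
    return [x / n for x in label_embedding]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def class_probs(image_feat, label_embeddings, scale=10.0):
    # CLIP-style scoring: scaled cosine similarity between the image
    # feature and each encoded label. The scale value is assumed.
    feats = [encode(e) for e in label_embeddings]
    return softmax([scale * cosine(image_feat, f) for f in feats])

def hierarchical_loss(image_feat, target, tuned, frozen, weights=(1.0, 1.0, 1.0)):
    """Toy loss: cross-entropy plus three assumed alignment penalties.

    tuned  -- trainable label embeddings (one vector per class)
    frozen -- original label embeddings they were initialised from
    """
    n = len(frozen)
    p_tuned = class_probs(image_feat, tuned)
    p_frozen = class_probs(image_feat, frozen)
    task = -math.log(p_tuned[target])
    # Parameter space (assumed form): mean squared distance of raw embeddings.
    param = sum(
        sum((t - f) ** 2 for t, f in zip(tv, fv)) for tv, fv in zip(tuned, frozen)
    ) / n
    # Feature space (assumed form): mean cosine distance of encoded labels.
    feature = sum(
        1.0 - cosine(encode(tv), encode(fv)) for tv, fv in zip(tuned, frozen)
    ) / n
    # Logits space (assumed form): KL(frozen || tuned) over class probabilities.
    logits = sum(pf * math.log(pf / pt) for pf, pt in zip(p_frozen, p_tuned))
    w_p, w_f, w_l = weights
    total = task + w_p * param + w_f * feature + w_l * logits
    return {"task": task, "param": param, "feature": feature,
            "logits": logits, "total": total}
```

In this toy setting, when the tuned embeddings equal the frozen ones, all three penalties vanish and the total reduces to the cross-entropy term; moving a label embedding away from its starting point raises the penalties, which is the intuition behind keeping the adjusted labels close to the pre-trained ones.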

Keywords: Modal, Computer science, Artificial intelligence, Mathematics, Materials science, Composite material

Metrics

FWCI (Field Weighted Citation Impact): 3.64
Citation Normalized Percentile: 0.92

Topics

Text and Document Classification Technologies

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Text Analysis Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning

Multi-modal Contextual Prompt Learning for Multi-label Classification with Partial Labels

MaPLe: Multi-modal Prompt Learning

Rethinking Modal-oriented Label Correlations for Multi-modal Multi-label Learning

Multi-modal Entity Alignment via Position-enhanced Multi-label Propagation