MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models

Yongzhu Miao; Shasha Li; Jintao Tang; Ting Wang

doi:10.1109/icme55011.2023.00013

JOURNAL ARTICLE

Year: 2023 Pages: 25-30

Abstract

Prompt tuning methods such as CoOp have recently shown promising visual recognition and transfer learning ability on various downstream tasks, following the emergence of large pre-trained vision-language models such as CLIP. However, we identify that existing uni-modal prompt tuning approaches may result in sub-optimal performance, since the uni-modal design breaks the original alignment of textual and visual representations in the pre-trained model. Inspired by the nature of pre-trained vision-language models, we aim to achieve completeness in prompt tuning and propose a novel approach called Multi-modal Deep-symphysis Prompt Tuning, dubbed MuDPT, which extends independent multi-modal prompt tuning by additionally learning a model-agnostic transformative network that allows deep hierarchical bi-directional prompt fusion. We evaluate the effectiveness of MuDPT on few-shot visual recognition and out-of-domain generalization tasks. Compared with state-of-the-art methods, MuDPT achieves better recognition and generalization ability by a clear margin, thanks to the synergistic alignment of textual and visual representations. Our code is available at: https://github.com/Mechrev0/MuDPT.
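
To make the idea of bi-directional prompt fusion concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the form of the transformative network, so the residual linear projections, the helper names (matmul, add, rand_matrix, fuse_prompts), the per-layer weights, and all sizes are assumptions chosen for illustration only. The authors' actual code is in the repository linked in the abstract.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rows, cols, rng, scale=0.02):
    """Small random matrix standing in for learnable parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def fuse_prompts(text_prompts, visual_prompts, w_v2t, w_t2v):
    """One fusion step (hypothetical form, not taken from the paper).

    Each modality's prompts receive a linear projection of the other
    modality's prompts, added residually so the originals are preserved.
    """
    fused_text = add(text_prompts, matmul(visual_prompts, w_v2t))
    fused_visual = add(visual_prompts, matmul(text_prompts, w_t2v))
    return fused_text, fused_visual

rng = random.Random(0)
n_prompts, d_text, d_visual, n_layers = 4, 8, 12, 3  # illustrative sizes

text_prompts = rand_matrix(n_prompts, d_text, rng)
visual_prompts = rand_matrix(n_prompts, d_visual, rng)

# "Deep hierarchical" is modeled here as one pair of projections per layer
# (an assumption); in training these would be learned, not sampled.
for _ in range(n_layers):
    w_v2t = rand_matrix(d_visual, d_text, rng)
    w_t2v = rand_matrix(d_text, d_visual, rng)
    text_prompts, visual_prompts = fuse_prompts(
        text_prompts, visual_prompts, w_v2t, w_t2v
    )
```

In this sketch, setting both projection matrices to zero leaves the prompts unchanged, which corresponds to independent prompt tuning in each modality with no cross-modal exchange.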

Keywords: Computer science; Artificial intelligence; Margin (machine learning); Modal; Generalization; Deep learning; Language understanding; Machine learning

Metrics

FWCI (Field Weighted Citation Impact): 0.77

Citation Normalized Percentile: 0.72

Topics

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Neural Network Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

MPT4LM: Multi-Modal Prompt Tuning Makes Pre-Trained Large Language Models Better Vision-Language Learners

CPT: Colorful Prompt Tuning for pre-trained vision-language models

Prompt Tuning for Discriminative Pre-trained Language Models

Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model

Constraint embedding for prompt tuning in vision-language pre-trained model