JOURNAL ARTICLE

Few-Shot Composition Learning for Image Retrieval with Prompt Tuning

Junda Wu, Rui Wang, Handong Zhao, Ruiyi Zhang, Chaochao Lu, Shuai Li, Ricardo Henao

Year: 2023 Journal: Proceedings of the AAAI Conference on Artificial Intelligence Vol: 37 (4) Pages: 4729-4737 Publisher: Association for the Advancement of Artificial Intelligence

Abstract

We study the problem of composition learning for image retrieval, in which we learn to retrieve target images with search queries in the form of a composition of a reference image and a modification text that describes the desired modifications of the image. Existing models of composition learning for image retrieval are generally built with large-scale datasets and demand extensive training samples, i.e., query-target pairs, as supervision, which restricts their application to few-shot scenarios in which only a few query-target pairs are available. Recently, prompt tuning with frozen pretrained language models has shown remarkable performance when the amount of training data is limited. Inspired by this, we propose a prompt tuning mechanism with the pretrained CLIP model for the task of few-shot composition learning for image retrieval. Specifically, we regard the representation of the reference image as a trainable visual prompt, prefixed to the embedding of the text sequence. One challenge is to efficiently train the visual prompt with few-shot samples. To deal with this issue, we further propose a self-supervised auxiliary task that ensures the reference image can retrieve itself when no modification information is given in the text, which facilitates training of the visual prompt while not requiring additional annotations for query-target pairs. Experiments on multiple benchmarks show that our proposed model can yield superior performance when trained with only a few query-target pairs.
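The mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation and does not use CLIP: the mean-pooling encoder stands in for the frozen text encoder, the matrix W stands in for the trainable visual-prompt projection, and the softmax cross-entropy over cosine similarities is an assumed form of the retrieval objective. All function and parameter names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def project(W, x):
    """Linear map standing in for the trainable visual-prompt projection (assumption)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def encode_sequence(tokens):
    """Toy stand-in for a frozen text encoder: mean-pool the token embeddings."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def compose_query(W, image_feat, text_tokens):
    """Prefix the visual prompt to the text token embeddings, then encode."""
    visual_prompt = project(W, image_feat)
    return encode_sequence([visual_prompt] + text_tokens)

def retrieval_loss(query, gallery, target_idx, temperature=0.1):
    """Softmax cross-entropy over cosine similarities to the gallery (assumed objective)."""
    logits = [cosine(query, g) / temperature for g in gallery]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_idx]

def self_retrieval_loss(W, gallery, ref_idx, temperature=0.1):
    """Auxiliary task: with no modification text, the reference should retrieve itself."""
    query = compose_query(W, gallery[ref_idx], [])
    return retrieval_loss(query, gallery, ref_idx, temperature)

if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.0, 0.8]]
    # Query: reference image 0 plus one "modification" token pointing along axis 2.
    query = compose_query(identity, gallery[0], [[0.0, 0.0, 1.0]])
    print("supervised loss:", retrieval_loss(query, gallery, target_idx=2))
    print("auxiliary loss:", self_retrieval_loss(identity, gallery, ref_idx=0))
```

In this sketch only W would be trained, and the auxiliary loss needs no query-target annotation because each gallery image serves as its own target. A real system would replace the toy encoder with the frozen CLIP encoders.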

Keywords:
Computer science, Task (project management), Embedding, Artificial intelligence, Image retrieval, Image (mathematics), One shot, Shot (pellet), Representation (politics), Composition (language), Pattern recognition (psychology), Information retrieval, Visual Word, Machine learning, Computer vision

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 0.72
Refs: 47
Citation Normalized Percentile: 0.62

Topics

Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Ontology-enhanced Prompt-tuning for Few-shot Learning

Hongbin Ye, Ningyu Zhang, Shumin Deng, Xiang Chen, Hui Chen, Feiyu Xiong, Xi Chen, Huajun Chen

Journal: Proceedings of the ACM Web Conference 2022 Year: 2022 Pages: 778-787

JOURNAL ARTICLE

PPT: Pre-trained Prompt Tuning for Few-shot Learning

Yuxian Gu, Xu Han, Zhiyuan Liu, Minlie Huang

Journal: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) Year: 2022 Pages: 8410-8423

JOURNAL ARTICLE

Retrieval-Enhanced Visual Prompt Learning for Few-shot Classification

Jintao Rong, Hao Chen, Linlin Ou, Xinyi Yu, Yifan Liu

Journal: IEEE Transactions on Circuits and Systems for Video Technology Year: 2026 Pages: 1-1