JOURNAL ARTICLE

Image Retrieval with Composed Query by Multi-Scale Multi-Modal Fusion

Abstract

Image retrieval with composed query (IR-CQ) is a challenging task: the goal is to retrieve a target image from a hybrid-modality query that consists of a reference image and a text modifier. Previous approaches mainly focus on designing multi-modal fusion modules for the hybrid-modality query, but these modules are often suboptimal because they do not fuse the two modalities sufficiently. In this paper, we propose a general fusion block that combines three fusion strategies: weighted summing, concatenating, and bilinear pooling. Importantly, this general fusion block can be used to fuse not only the hybrid-modality query but also the multi-scale features of the reference image. Specifically, we first fuse the multi-scale features of the reference image with the Multi-Scale Fusion (MSF) block and then fuse the features of the reference image and the text modifier with the Multi-Modal Fusion (MMF) block, where both MSF and MMF are instantiations of our general fusion block. Extensive experiments on three benchmark datasets show that our proposed model significantly outperforms existing approaches.
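The abstract names the three fusion strategies and the MSF-then-MMF ordering but gives no formulas, so the following is only an illustrative sketch of how such a block could be wired, not the authors' implementation. Everything specific in it is an assumption: the class name `FusionBlock`, the helper `compose_query`, the fixed mixing weight `alpha`, the random (untrained) projection matrices, the use of a low-rank Hadamard-product form in place of full bilinear pooling, and the choice to sum the three branches. It also assumes only two image scales, each already projected to the same dimension as the text feature.

```python
import random
from typing import List

Vector = List[float]


def matvec(matrix: List[Vector], vec: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    """Small random weights standing in for parameters that would be learned."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class FusionBlock:
    """Hypothetical general fusion block for two feature vectors of equal size."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.alpha = 0.5  # assumed fixed here; a real model would learn it
        self.w_cat = random_matrix(dim, 2 * dim, rng)  # maps [a; b] back to dim
        self.u = random_matrix(dim, dim, rng)  # projection for the first input
        self.v = random_matrix(dim, dim, rng)  # projection for the second input

    def weighted_sum(self, a: Vector, b: Vector) -> Vector:
        """Strategy 1: convex combination of the two inputs."""
        return [self.alpha * x + (1.0 - self.alpha) * y for x, y in zip(a, b)]

    def concat(self, a: Vector, b: Vector) -> Vector:
        """Strategy 2: concatenate, then project back to `dim`."""
        return matvec(self.w_cat, a + b)

    def bilinear(self, a: Vector, b: Vector) -> Vector:
        """Strategy 3: low-rank stand-in for bilinear pooling, (U a) * (V b)."""
        pa, pb = matvec(self.u, a), matvec(self.v, b)
        return [x * y for x, y in zip(pa, pb)]

    def __call__(self, a: Vector, b: Vector) -> Vector:
        if len(a) != self.dim or len(b) != self.dim:
            raise ValueError("both inputs must have length dim")
        branches = [self.weighted_sum(a, b), self.concat(a, b), self.bilinear(a, b)]
        # Assumed combination rule: element-wise sum of the three branches.
        return [sum(values) for values in zip(*branches)]


def compose_query(coarse: Vector, fine: Vector, text: Vector,
                  msf: FusionBlock, mmf: FusionBlock) -> Vector:
    """Fuse two image scales first (MSF), then fuse the result with text (MMF)."""
    image_feature = msf(coarse, fine)
    return mmf(image_feature, text)


msf_block = FusionBlock(dim=4, seed=1)
mmf_block = FusionBlock(dim=4, seed=2)
query = compose_query([0.2, 0.1, 0.4, 0.3], [0.5, 0.0, 0.1, 0.2],
                      [0.3, 0.3, 0.1, 0.6], msf_block, mmf_block)
```

In a retrieval setting, the resulting `query` vector would be compared against candidate image embeddings, for example by cosine similarity; that step and any training objective are outside what the abstract describes.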

Keywords:
Fuse (electrical); Computer science; Image fusion; Fusion; Modality (human–computer interaction); Artificial intelligence; Block (permutation group theory); Modal; Pooling; Fusion rules; Benchmark (surveying); Scale (ratio); Image (mathematics); Pattern recognition (psychology); Computer vision; Mathematics; Engineering

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 2.65
Refs: 22
Citation Normalized Percentile: 0.82

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
