Enhancing Image-Text Matching with Adaptive Feature Aggregation

Zuhui Wang; Yunting Yin; I. V. Ramakrishnan

doi:10.1109/icassp48485.2024.10446913

JOURNAL ARTICLE

Year: 2024 Pages: 8245-8249

Abstract

Image-text matching aims to accurately retrieve matched cross-modal pairs. Current methods typically project image and text features into a common embedding space, but they frequently suffer from imbalanced feature representations across the two modalities, which leads to unreliable retrieval results. To address this limitation, we introduce a novel Feature Enhancement Module that adaptively aggregates single-modal features for more balanced and robust image-text retrieval. We also propose a new loss function that overcomes the shortcomings of the original triplet ranking loss, significantly improving retrieval performance. The proposed model is evaluated on two public datasets and achieves competitive retrieval performance compared with several state-of-the-art models. The implementation code is available online.
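For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard hinge-based, bidirectional triplet ranking loss that is commonly used for image-text matching. It is not the new loss proposed in the paper, which the abstract does not specify. The function names, the margin value of 0.2, and the `hardest_only` switch are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(images: Sequence[Sequence[float]],
                      texts: Sequence[Sequence[float]]) -> List[List[float]]:
    """sim[i][j] is the similarity of image i and caption j.

    Assumes image i and caption i form the matched pair.
    """
    return [[cosine_similarity(img, txt) for txt in texts] for img in images]


def triplet_ranking_loss(sim: Sequence[Sequence[float]],
                         margin: float = 0.2,
                         hardest_only: bool = False) -> float:
    """Bidirectional hinge-based triplet ranking loss over a batch.

    For each matched pair (i, i), every non-matching caption and every
    non-matching image acts as a negative. With hardest_only=True, only
    the most violating negative in each direction contributes.
    The margin default is an illustrative assumption.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Image i as the query, non-matching captions as negatives.
        i2t = [max(0.0, margin + sim[i][j] - pos) for j in range(n) if j != i]
        # Caption i as the query, non-matching images as negatives.
        t2i = [max(0.0, margin + sim[j][i] - pos) for j in range(n) if j != i]
        if hardest_only:
            total += max(i2t, default=0.0) + max(t2i, default=0.0)
        else:
            total += sum(i2t) + sum(t2i)
    return total
```

In this sketch, a batch whose matched pairs already outscore every negative by at least the margin yields a loss of zero, so only violating triplets contribute to training.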

Keywords:

Computer science; Embedding; Ranking (information retrieval); Feature (linguistics); Modal; Image (mathematics); Matching (statistics); Pattern recognition (psychology); Artificial intelligence; Image retrieval; Feature extraction; Feature vector; Data mining; Information retrieval; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 2.12

Citation Normalized Percentile: 0.78

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Text semantic-guided adaptive feature aggregation for image-text retrieval

Hierarchical Feature Aggregation Based on Transformer for Image-Text Matching

Enhancing Separate Encoding with Multi-layer Feature Alignment for Image-Text Matching

High Feature Distinguishability for Adaptive Image-text Matching with Dual-stream Transformers

Enhancing Electric Power Industry Image-Text Matching with Image Properties