JOURNAL ARTICLE

Consensus Knowledge-Guided Semantic Enhanced Interaction for Image-Text Retrieval

Hongbin Wang, Hui Wang, Fan Li

Year: 2025 | Journal: Journal of Advanced Computational Intelligence and Intelligent Informatics | Vol: 29 (4) | Pages: 956-967 | Publisher: Fuji Technology Press Ltd.

Abstract

Image–text retrieval, a fundamental task in the cross-modal domain, centers on exploring semantic consistency and achieving precise alignment between related image–text pairs. Existing approaches that introduce commonsense knowledge rely primarily on co-occurrence frequency to construct coherent concept representations, which facilitates high-quality semantic alignment across the two modalities. However, these methods often overlook the conceptual and syntactic correspondences between cross-modal fragments. To address this limitation, this work proposes a consensus knowledge-guided semantic enhanced interaction method (CSEI) for image–text retrieval. CSEI correlates both intra-modal and inter-modal semantics between image regions or objects and sentence words in order to reduce cross-modal discrepancies. Specifically, the method first constructs visual and textual corpus sets that encapsulate rich concepts and relationships derived from commonsense knowledge. Next, to enhance intra-modal relationships, a semantic relation-aware graph convolutional network captures more comprehensive feature representations. For inter-modal similarity reasoning, local and global similarity features are extracted through two cross-modal semantic enhancement mechanisms. Finally, the approach integrates commonsense knowledge with internal semantic correlations to enrich concept representations, and further improves semantic consistency by regularizing the differences in importance among association-enhanced concepts. Experiments on the MS-COCO and Flickr30K datasets validate the effectiveness of the proposed method.
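The abstract names its components without showing how they operate. As a rough illustration of the general "co-occurrence knowledge plus graph convolution" idea it builds on, the following is a minimal pure-Python sketch, assuming that concept relations are estimated from caption co-occurrence counts and propagated with one standard graph-convolution layer using the symmetric normalization D^-1/2 (A + I) D^-1/2 from Kipf and Welling. The function names (`cooccurrence_matrix`, `normalize_adjacency`, `gcn_layer`, `cosine`), the toy captions, and the identity features and weights are hypothetical and are not taken from CSEI; the paper's corpus sets and semantic relation-aware GCN are likely more elaborate.

```python
import math
from itertools import combinations
from typing import List

Matrix = List[List[float]]


def cooccurrence_matrix(captions: List[str], vocab: List[str]) -> Matrix:
    """Hypothetical helper: count how often two concepts appear in the same caption."""
    index = {concept: i for i, concept in enumerate(vocab)}
    n = len(vocab)
    counts = [[0.0] * n for _ in range(n)]
    for caption in captions:
        # Indices of vocabulary concepts mentioned in this caption (deduplicated).
        present = sorted({index[w] for w in caption.lower().split() if w in index})
        for i, j in combinations(present, 2):
            counts[i][j] += 1
            counts[j][i] += 1
    return counts


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (assumed, not from the paper)."""
    n = len(adj)
    # Self-loops guarantee every degree is at least 1, so no division by zero.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One propagation step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


if __name__ == "__main__":
    vocab = ["dog", "ball", "grass", "cat", "sofa"]
    captions = ["a dog chases a ball", "a dog on the grass", "a cat on the sofa"]
    a_norm = normalize_adjacency(cooccurrence_matrix(captions, vocab))
    # Identity features and weights isolate the effect of the concept graph.
    eye = [[1.0 if i == j else 0.0 for j in range(len(vocab))] for i in range(len(vocab))]
    h = gcn_layer(a_norm, eye, eye)
    print("sim(dog, ball):", round(cosine(h[0], h[1]), 3))
    print("sim(dog, cat):", round(cosine(h[0], h[3]), 3))
```

With identity features and weights, the propagated row for `dog` shares non-zero entries with `ball` and `grass` but none with `cat`, so its cosine similarity to `ball` is positive while its similarity to `cat` stays 0. This is the sense in which graph propagation lets co-occurring concepts reinforce one another; a learned weight matrix and real region or word features would replace the identity matrices in practice.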

Keywords:
Computer science, Natural language processing, Artificial intelligence, Consistency (knowledge bases), Modal, Semantics (computer science), Information retrieval, Sentence, Similarity, Representation, Semantic similarity, Image

Metrics

Cited by: 0
FWCI (Field Weighted Citation Impact): 0.00
References: 51
Citation Normalized Percentile: 0.27

Topics

Multimodal Machine Learning Applications (Computer Vision and Pattern Recognition)
Advanced Image and Video Retrieval Techniques (Computer Vision and Pattern Recognition)
Domain Adaptation and Few-Shot Learning (Artificial Intelligence)

Related Documents

JOURNAL ARTICLE

MKVSE: Multimodal Knowledge Enhanced Visual-semantic Embedding for Image-text Retrieval

Duoduo Feng, Xiangteng He, Yuxin Peng

Journal: ACM Transactions on Multimedia Computing, Communications, and Applications | Year: 2022 | Vol: 19 (5) | Pages: 1-21

JOURNAL ARTICLE

Causal image-text retrieval embedded with consensus knowledge

Yanpeng Liang, Xueer Liu, Zhonggui Ma, Zhuo Li

Source: DOAJ (Directory of Open Access Journals) | Year: 2024

JOURNAL ARTICLE

Text-Guided Knowledge Transfer for Remote Sensing Image-Text Retrieval

An-An Liu, Bo Yang, Wenhui Li, Dan Song, Zhengya Sun, Tongwei Ren, Zhiqiang Wei

Journal: IEEE Geoscience and Remote Sensing Letters | Year: 2024 | Vol: 21 | Pages: 1-5