JOURNAL ARTICLE

Commonsense-Guided Semantic and Relational Consistencies for Image-Text Retrieval

Wenhui Li, Song Soo Yang, Qiang Li, Xuanya Li, An-An Liu

Year: 2023; Journal: IEEE Transactions on Multimedia; Vol: 26; Pages: 1867-1880; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Image-text retrieval, as a fundamental task in the cross-modal field, aims to explore the relationship between visual and textual modalities. Recent methods address this task only by learning the conceptual and syntactical correspondences between cross-modal fragments, but these correspondences inevitably contain noise without considering external knowledge. To solve this issue, we propose a novel Commonsense-Guided Semantic and Relational Consistencies (CSRC) for image-text retrieval that can simultaneously expand the semantics and relations to reduce the cross-modal differences under the assumption that the semantics and relations of the true image-text pair should be consistent between two modalities. Specifically, we first explore commonsense knowledge to expand the specific concepts for visual and textual graphs and optimize the semantic consistency by minimizing the differences in cross-modal semantic importance. Then, we extend the same relations for cross-modal concept pairs with semantic consistency, which serves to implement relational consistency. After that, we combine external commonsense knowledge with internal correlation to enhance concept representation and further optimize relational consistency by regularizing the importance differences between association-enhanced concepts. Extensive experimental results on two popular image-text retrieval datasets demonstrate the effectiveness of our proposed method.

Keywords:
Computer science; Consistency (knowledge bases); Semantics (computer science); Natural language processing; Information retrieval; Task (project management); Artificial intelligence; Modal; Commonsense knowledge; Representation (politics); Knowledge representation and reasoning; Programming language

Metrics

Cited By: 16
FWCI (Field Weighted Citation Impact): 2.91
Refs: 54
Citation Normalized Percentile: 0.89

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Consensus Knowledge-Guided Semantic Enhanced Interaction for Image-Text Retrieval

Hongbin Wang, Hui Wang, Fan Li

Journal: Journal of Advanced Computational Intelligence and Intelligent Informatics; Year: 2025; Vol: 29 (4); Pages: 956-967

JOURNAL ARTICLE

Visual and semantic guided scene text retrieval

Hailong Luo, Mayire Ibrayim, Askar Hamdulla, Qilin Deng

Journal: The Journal of Supercomputing; Year: 2024; Vol: 80 (14); Pages: 21394-21411