Commonsense-Guided Semantic and Relational Consistencies for Image-Text Retrieval

Wenhui Li; Song Soo Yang; Qiang Li; Xuanya Li; An-An Liu

doi:10.1109/tmm.2023.3289753

Year: 2023; Journal: IEEE Transactions on Multimedia; Vol: 26; Pages: 1867-1880; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Image-text retrieval, a fundamental task in the cross-modal field, aims to explore the relationship between the visual and textual modalities. Recent methods address this task only by learning conceptual and syntactic correspondences between cross-modal fragments, but because they do not consider external knowledge, these correspondences inevitably contain noise. To address this issue, we propose a novel Commonsense-Guided Semantic and Relational Consistencies (CSRC) method for image-text retrieval. CSRC simultaneously expands semantics and relations to reduce cross-modal differences, under the assumption that the semantics and relations of a true image-text pair should be consistent across the two modalities. Specifically, we first exploit commonsense knowledge to expand the specific concepts in the visual and textual graphs, and we optimize semantic consistency by minimizing the differences in cross-modal semantic importance. Then, we extend the same relations to cross-modal concept pairs that are semantically consistent, which implements relational consistency. Finally, we combine external commonsense knowledge with internal correlations to enhance concept representations, and we further optimize relational consistency by regularizing the importance differences between association-enhanced concepts. Extensive experiments on two popular image-text retrieval datasets demonstrate the effectiveness of the proposed method.
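The abstract does not give the formulas behind "cross-modal semantic importance," so the Python sketch below only illustrates the general shape of the first step under explicit assumptions. It is not CSRC's actual implementation. Each modality's observed concepts are expanded with commonsense neighbours, turned into an importance distribution over a shared vocabulary, and the two distributions are compared, so a smaller value means a more semantically consistent pair. The toy graph `TOY_COMMONSENSE`, the fixed `EXPANDED_WEIGHT` of 0.5 for expanded concepts, the use of normalised weights as "importance," and the L1 distance are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy commonsense graph: concept -> related concepts.
# Illustrative only; not the knowledge source used in the paper.
TOY_COMMONSENSE: Dict[str, List[str]] = {
    "dog": ["pet", "animal"],
    "frisbee": ["toy", "play"],
    "grass": ["park", "field"],
}

# Assumed down-weighting for concepts added by expansion rather than observed.
EXPANDED_WEIGHT = 0.5


def expand_concepts(observed: Set[str], graph: Dict[str, List[str]]) -> Dict[str, float]:
    """Map each observed concept to 1.0 and each commonsense neighbour to EXPANDED_WEIGHT."""
    weights: Dict[str, float] = {c: 1.0 for c in observed}
    for concept in observed:
        for neighbour in graph.get(concept, []):
            # An observed concept keeps its full weight even if it is also a neighbour.
            weights[neighbour] = max(weights.get(neighbour, 0.0), EXPANDED_WEIGHT)
    return weights


def importance(weights: Dict[str, float], vocab: List[str]) -> List[float]:
    """Normalise raw weights into an importance distribution over a shared vocabulary."""
    raw = [weights.get(c, 0.0) for c in vocab]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else raw


def semantic_consistency_loss(
    image_concepts: Set[str],
    text_concepts: Set[str],
    graph: Dict[str, List[str]] = TOY_COMMONSENSE,
) -> float:
    """L1 distance (in [0, 2]) between the two modalities' importance distributions."""
    img_w = expand_concepts(image_concepts, graph)
    txt_w = expand_concepts(text_concepts, graph)
    vocab = sorted(set(img_w) | set(txt_w))  # shared, deterministic concept order
    p_img = importance(img_w, vocab)
    p_txt = importance(txt_w, vocab)
    return sum(abs(a - b) for a, b in zip(p_img, p_txt))


if __name__ == "__main__":
    matched = semantic_consistency_loss({"dog", "grass"}, {"dog", "grass", "park"})
    mismatched = semantic_consistency_loss({"dog", "grass"}, {"frisbee"})
    print(f"matched pair: {matched:.3f}, mismatched pair: {mismatched:.3f}")
```

Running the script prints a loss of about 0.194 for the matched pair and 2.000 for the mismatched pair, whose expanded concept sets share nothing. A learned model would presumably derive importance from concept features rather than from fixed weights. The sketch only shows why minimizing such a difference favours image-text pairs whose expanded semantics agree.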

Keywords: Computer science; Consistency (knowledge bases); Semantics (computer science); Natural language processing; Information retrieval; Task (project management); Artificial intelligence; Modal; Commonsense knowledge; Representation (politics); Knowledge representation and reasoning; Programming language

Metrics

FWCI (Field Weighted Citation Impact): 2.91

Citation Normalized Percentile: 0.89

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Text semantic-guided adaptive feature aggregation for image-text retrieval

Using Semantic Commonsense Resources in Image Retrieval

Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval

Consensus Knowledge-Guided Semantic Enhanced Interaction for Image-Text Retrieval

Visual and semantic guided scene text retrieval