JOURNAL ARTICLE

Adversarial Graph Attention Network for Multi-modal Cross-Modal Retrieval

Abstract

Existing cross-modal retrieval methods are mainly constrained to the bimodal case. Applying them to the multi-modal case requires training O(K^2) separate models, where K is the number of modalities, which is inefficient and cannot exploit information shared among multiple modalities. Although some studies have focused on learning a common space of multiple modalities for retrieval, they assume the data to be i.i.d. and fail to learn the underlying semantic structure, which could be important for retrieval. To tackle this issue, we propose an extensive Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval (AGAT). AGAT combines a self-attention network (SAT), a graph attention network (GAT) and a multi-modal generative adversarial network (MGAN). The SAT generates high-level embeddings for data items from different modalities, with self-attention capturing feature-level correlations within each modality. The GAT then uses attention to aggregate the embeddings of matched items from different modalities to build a common embedding space. The MGAN aims to "cluster" matched embeddings of different modalities in the common space by forcing them to be similar to the aggregation. Finally, we train the common space so that it captures the semantic structure by constraining within-class and between-class distances. Experiments on three datasets show the effectiveness of AGAT.
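
The abstract gives no implementation details, so the following Python sketch only illustrates three of the ideas it mentions: the number of bimodal models needed for K modalities, an attention-weighted aggregation of matched embeddings, and a penalty on within-class and between-class distances. All function names, the `query` scoring vector, the `margin` parameter and the hinge form of the penalty are assumptions made for illustration, not the authors' method. The adversarial MGAN component is not covered.

```python
import math


def pairwise_model_count(k):
    """Number of unordered modality pairs, i.e. bimodal models for k modalities."""
    return k * (k - 1) // 2


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(embeddings, query):
    """Attention-weighted average of matched embeddings, one per modality.

    `query` is a hypothetical scoring vector standing in for learned
    attention parameters; scores are plain dot products.
    """
    scores = [sum(q * x for q, x in zip(query, emb)) for emb in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    aggregated = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dim)
    ]
    return aggregated, weights


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def structure_loss(items, labels, margin=1.0):
    """Assumed hinge-style penalty on the common space.

    Same-class pairs are penalized by their squared distance; different-class
    pairs are penalized only when closer than `margin` (in squared distance).
    """
    loss = 0.0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            d = squared_distance(items[i], items[j])
            if labels[i] == labels[j]:
                loss += d
            else:
                loss += max(0.0, margin - d)
    return loss


if __name__ == "__main__":
    # Three modalities would need three separate bimodal models.
    print(pairwise_model_count(3))
    # Two matched embeddings (e.g. image and text) of the same item.
    print(attention_aggregate([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0]))
```

With a zero `query` every modality receives the same weight, so the aggregation reduces to a plain average; a learned scoring function would let some modalities contribute more than others.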

Keywords:
Computer science; Modalities; Modal; Exploit; Graph; Modality (human–computer interaction); Embedding; Theoretical computer science; Aggregate (composite); Artificial intelligence; Space (punctuation); Information retrieval; Machine learning

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.11
Refs: 29
Citation Normalized Percentile: 0.47

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Adversarial Graph Convolutional Network for Cross-Modal Retrieval

Xinfeng Dong, Li Liu, Lei Zhu, Liqiang Nie, Huaxiang Zhang

Journal: IEEE Transactions on Circuits and Systems for Video Technology Year: 2021 Vol: 32 (3) Pages: 1634-1645

BOOK-CHAPTER

Adversarial Graph Convolutional Network Hashing for Cross-Modal Retrieval

Bo Lu, Tianbao Zhao, G. L. Liang, Jiaming Li, Xiaodong Duan

Communications in Computer and Information Science Year: 2025 Pages: 69-80

JOURNAL ARTICLE

Iterative graph attention memory network for cross-modal retrieval

Xinfeng Dong, Huaxiang Zhang, Xiao Dong, Xu Lu

Journal: Knowledge-Based Systems Year: 2021 Vol: 226 Pages: 107138-107138