JOURNAL ARTICLE

Generative Adversarial and Self-Attention Based Fine-Grained Cross-Media Retrieval

Abstract

Deep convolutional neural networks have recently shown an impressive ability to perform fine-grained cross-media retrieval. However, existing fine-grained cross-media retrieval algorithms offer comparatively low retrieval accuracy and are difficult to apply in practice, for three reasons. First, videos contain many noise frames, which can degrade feature extraction. Second, existing algorithms treat all modalities in the same way and ignore the characteristics of each modality, for example the sequential nature of text. Third, the lack of joint semantic space learning limits retrieval accuracy. To overcome these drawbacks, we propose a novel fine-grained cross-media retrieval algorithm based on a generative adversarial network and a self-attention mechanism. Our approach first removes noise frames from the videos with a spatial cluster filtering algorithm to obtain cleaner video data. We then extract features for each modality; text features in particular are extracted by a self-attention based LSTM structure. Finally, a generative adversarial network is used to learn a common semantic space for the features of all modalities. Experimental evaluations on a new benchmark, FGCrossNet, show improved results compared with competing methods. The source code, models, and data have been made anonymously available at https://github.com/gasanet/GASA.
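The abstract does not specify how the spatial cluster filtering step works. As a rough illustration of the general idea of discarding frames whose features lie far from the bulk of a video, the following is a minimal sketch of a centroid-distance outlier filter, a single-cluster simplification and not the authors' algorithm. The function name `filter_noise_frames`, the threshold parameter `k`, and the toy feature vectors are all hypothetical.

```python
import math


def filter_noise_frames(features, k=1.0):
    """Return indices of frames whose features lie close to the centroid.

    features: list of equal-length lists of floats, one per frame
              (hypothetical input; a real system would use CNN features).
    k: a frame is dropped when its distance to the centroid exceeds
       mean_distance + k * std_distance (assumed rule, not from the paper).
    """
    n = len(features)
    if n == 0:
        return []
    dim = len(features[0])
    # Centroid of all frame features (a single cluster, for simplicity).
    centroid = [sum(f[d] for f in features) / n for d in range(dim)]
    # Euclidean distance of each frame to the centroid.
    dists = [math.dist(f, centroid) for f in features]
    mean_d = sum(dists) / n
    std_d = math.sqrt(sum((d - mean_d) ** 2 for d in dists) / n)
    threshold = mean_d + k * std_d
    return [i for i, d in enumerate(dists) if d <= threshold]


# Toy example: four similar frames and one outlier frame.
frames = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [8.0, 8.0]]
kept = filter_noise_frames(frames)
print(kept)
```

In this toy input the last frame sits far from the other four, so only the first four indices are kept. A multi-cluster variant would need a clustering step before the distance test, which the abstract does not describe.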

Keywords:
Computer science; Benchmark (surveying); Artificial intelligence; Modality (human–computer interaction); Convolutional neural network; Modalities; Generative grammar; Adversarial system; Deep learning; Task (project management); Noise (video); Machine learning; Pattern recognition (psychology); Image (mathematics)

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.10
Refs: 14
Citation Normalized Percentile: 0.44

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Self-Attention based fine-grained cross-media hybrid network

Wei Shan, Dan Huang, Jiangtao Wang, Feng Zou, Suwen Li

Journal: Pattern Recognition Year: 2022 Vol: 130 Pages: 108748-108748

JOURNAL ARTICLE

Attention-based Generative Adversarial Hashing for Cross-modal Retrieval

Jianqiong Xiao, Xiaoqing Zhou

Journal: 2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC) Year: 2022

JOURNAL ARTICLE

Structures Aware Fine-Grained Contrastive Adversarial Hashing for Cross-Media Retrieval

Meiyu Liang, Yawen Li, Yang Yu, Xiaowen Cao, Zhe Xue, Ang Li, Kangkang Lu

Journal: IEEE Transactions on Knowledge and Data Engineering Year: 2024 Vol: 36 (7) Pages: 3514-3528

JOURNAL ARTICLE

Multi-label adversarial fine-grained cross-modal retrieval

Chunpu Sun, Huaxiang Zhang, Li Liu, Dongmei Liu, Lin Wang

Journal: Signal Processing: Image Communication Year: 2023 Vol: 117 Pages: 117018-117018