Generative Adversarial and Self-Attention Based Fine-Grained Cross-Media Retrieval

Jin Seong Hong; Haonan Luo; Yazhou Yao; Zhenmin Tang

doi:10.1145/3448823.3448825

Year: 2020 Pages: 1-8

Abstract

Deep convolutional neural networks have recently demonstrated an impressive ability to perform fine-grained cross-media retrieval. However, existing fine-grained cross-media retrieval algorithms offer comparatively low retrieval accuracy and are difficult to apply in practice because of three challenges. First, videos contain many noise frames that may interfere with feature extraction. Second, existing algorithms treat all modalities indiscriminately, ignoring the characteristics of each modality, for example, the sequential nature of text. Third, the lack of joint semantic space learning limits retrieval accuracy. To overcome these drawbacks, we propose a novel fine-grained cross-media retrieval algorithm based on a generative adversarial network and a self-attention mechanism. Our approach first removes noise frames from the videos with a spatial cluster filtering algorithm to obtain purer video data. We then extract features from each modality; notably, text features are extracted by a self-attention based LSTM structure. Finally, a generative adversarial network is used to learn a common semantic space for the features of all modalities. Experimental evaluations on the new benchmark FGCrossNet show improved results compared with counterpart methods. The source code, models, and data have been made anonymously available at https://github.com/gasanet/GASA.
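The abstract does not say how the self-attention based LSTM weights its hidden states. As a hedged illustration only, the sketch below assumes one common formulation (additive self-attention pooling: score_t = w2 · tanh(W1 h_t), a softmax over time steps, then a weighted sum of the states) and takes precomputed LSTM hidden states as input. The names `self_attention_pool`, `w1`, and `w2` are hypothetical and are not taken from the GASA repository.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_pool(hidden: Matrix, w1: Matrix, w2: Vector) -> Tuple[Vector, Vector]:
    """Pool T LSTM hidden states (each of size d) into one d-dim text feature.

    Assumed (hypothetical) parameters: w1 is a (d_a x d) projection and
    w2 a length-d_a vector. score_t = w2 . tanh(w1 h_t),
    weights = softmax(scores), output = sum over t of weights[t] * h_t.
    """
    scores = []
    for h in hidden:
        projected = [math.tanh(x) for x in matvec(w1, h)]
        scores.append(sum(a * b for a, b in zip(w2, projected)))
    weights = softmax(scores)
    dim = len(hidden[0])
    pooled = [
        sum(weights[t] * hidden[t][j] for t in range(len(hidden)))
        for j in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    # Three toy "word" states; an identity projection with w2 = [1, 1]
    # gives the last state the highest score.
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    feature, attn = self_attention_pool(states, [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
    print("attention weights:", attn)
    print("pooled text feature:", feature)
```

In a full model, `w1` and `w2` would presumably be learned jointly with the LSTM. Because the softmax turns the scores into a distribution over time steps, the pooled vector is always a convex combination of the hidden states, so more informative words can dominate the text feature instead of every step contributing equally.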
