Fine-grained Cross-media Representation Learning with Deep Quantization Attention Network

Meiyu Liang; Junping Du; Wu Liu; Zhe Xue; Yue Geng; Congxian Yang

doi:10.1145/3343031.3350892

JOURNAL ARTICLE

Year: 2019 Pages: 1313-1321

Abstract

Cross-media search helps users obtain richer and more comprehensive information about hot topics and events on social networks. Existing deep cross-media quantization methods address the feature heterogeneity and semantic gap between different media types by learning a common semantic representation efficiently and effectively. However, social network data is often semantically sparse, diverse, and noisy, which degrades the performance of existing cross-media search methods. To address this issue, this paper proposes CMSL, a novel fine-grained cross-media representation learning model with a deep quantization attention network for social network cross-media search. First, we construct an image-word semantic correlation graph and perform deep random walks on it to achieve semantic expansion and semantic embedding learning, which uncovers latent semantic correlations between images and words. Second, to discover more fine-grained cross-media semantic correlations, we propose a multi-scale fine-grained cross-media semantic correlation learning method that combines global and local saliency semantic similarity. Third, a unified deep quantization attention network jointly learns the fine-grained cross-media representation, the cross-media semantic correlations, and the binary quantization codes; by minimizing both a cross-media correlation loss and a binary quantization loss, it preserves inter-media correlations and intra-media similarities. Experimental results show that CMSL generates high-quality cross-media common semantic representations and achieves state-of-the-art cross-media search performance on two benchmark datasets, NUS-WIDE and MIR-Flickr 25k.
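
The abstract names two loss terms, a cross-media correlation loss and a binary quantization loss, but does not give their formulas. The following is a minimal NumPy sketch of how such a joint objective is commonly set up in deep cross-modal hashing; it is not the paper's formulation. The function names, the squared-error form of both terms, and the weight `lam` are all assumptions made for illustration.

```python
import numpy as np


def binarize(feat):
    """Map real-valued features to binary codes in {-1, +1} (0 maps to +1)."""
    return np.where(feat >= 0, 1, -1).astype(np.int8)


def joint_loss(img_feat, txt_feat, sim, lam=0.1):
    """Generic correlation + quantization objective (illustrative only).

    img_feat, txt_feat: (n, k) real-valued network outputs, assumed in [-1, 1].
    sim: (n, n) matrix with 1 where image i and text j are related, else 0.
    lam: hypothetical weight on the quantization term.
    """
    k = img_feat.shape[1]
    # Scaled inner product lies in [-1, 1]; compare it with a +/-1 target.
    pred = img_feat @ txt_feat.T / k
    target = 2.0 * sim - 1.0
    corr = np.mean((pred - target) ** 2)
    # Quantization loss: gap between each feature and its binary code.
    quant = (np.mean((img_feat - binarize(img_feat)) ** 2)
             + np.mean((txt_feat - binarize(txt_feat)) ** 2))
    return corr + lam * quant, corr, quant


def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query code."""
    dist = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dist, kind="stable")
```

In this sketch, driving the quantization term toward zero pushes the real-valued features toward their binary codes, so that search can later be done with cheap Hamming-distance ranking (`hamming_rank`) instead of floating-point similarity.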

Keywords:

Computer science; Quantization (signal processing); Representation (politics); Deep learning; Artificial intelligence; Feature learning; Computer vision

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.12

Topics

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Self-Attention based fine-grained cross-media hybrid network

Fine-Grained Early Frequency Attention for Deep Speaker Representation Learning

A Cross-layer Self-attention Learning Network for Fine-grained Classification

Attribute-Aware Attention Model for Fine-grained Representation Learning

Deep attentional fine-grained similarity network with adversarial learning for cross-modal retrieval