Multimodal Sentence Summarization via Multimodal Selective Encoding

Haoran Li; Junnan Zhu; Jiajun Zhang; Xiaodong He; Chengqing Zong

doi:10.18653/v1/2020.coling-main.496

JOURNAL ARTICLE

Year: 2020

Abstract

This paper studies the problem of generating a summary for a given sentence-image pair. Existing multimodal sequence-to-sequence approaches mainly focus on enhancing the decoder with visual signals, overlooking that the image can also help the encoder identify the highlights of a news event or a document. We therefore propose a multimodal selective gate network that considers reciprocal relationships between textual features and multi-level visual features (a global image descriptor, activation grids, and object proposals) to select the highlights of the event when encoding the source sentence. In addition, we introduce a modality regularization that encourages the summary to capture the highlights embedded in the image more accurately. To verify that the model generalizes, we apply the multimodal selective gate to both a text-based decoder and a multimodal decoder. Experimental results on a public multimodal sentence summarization dataset demonstrate the advantage of our models over baselines. Further analysis suggests that the proposed multimodal selective gate network can effectively select important information in the input sentence.
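The abstract does not give the gate's equations, so the following is only a minimal sketch of the general idea of gating encoder states with an image signal, under stated assumptions. It assumes a gate of the form sigmoid(W_text h_i + W_image v + b) applied elementwise to each encoder hidden state h_i, with a single global image descriptor v; the activation grids, object proposals, and modality regularization mentioned above are omitted. All names (`selective_gate`, `w_text`, `w_image`, `bias`) and the random toy values are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def selective_gate(hidden_states, image_vector, w_text, w_image, bias):
    """Scale each encoder state by a gate conditioned on text and image.

    Assumed form (not from the paper):
        gate_i = sigmoid(w_text @ h_i + w_image @ v + bias)
        gated_i = gate_i * h_i   (elementwise)
    """
    image_term = matvec(w_image, image_vector)  # same for every position
    gated = []
    for h in hidden_states:
        text_term = matvec(w_text, h)
        gate = [sigmoid(t + v + b)
                for t, v, b in zip(text_term, image_term, bias)]
        gated.append([g * x for g, x in zip(gate, h)])
    return gated


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
            for _ in range(rows)]


# Toy inputs: 5 token states of size 4 and an image descriptor of size 3.
rng = random.Random(0)
hidden_dim, image_dim, sentence_len = 4, 3, 5
hidden_states = [[rng.uniform(-1, 1) for _ in range(hidden_dim)]
                 for _ in range(sentence_len)]
image_vector = [rng.uniform(-1, 1) for _ in range(image_dim)]
w_text = random_matrix(hidden_dim, hidden_dim, rng)
w_image = random_matrix(hidden_dim, image_dim, rng)
bias = [0.0] * hidden_dim

gated_states = selective_gate(hidden_states, image_vector,
                              w_text, w_image, bias)
```

Because every gate value lies strictly between 0 and 1, each gated state keeps the shape and sign of the original state but is shrunk per dimension, which is how a gate of this kind can down-weight parts of the sentence before decoding. In a trained model the weights would be learned rather than sampled.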

Keywords:

Computer science; Automatic summarization; Sentence; Encoder; Encoding (memory); Artificial intelligence; Event (particle physics); Focus (optics); Decoding methods; Regularization (linguistics); Natural language processing; Pattern recognition (psychology); Algorithm

Metrics

FWCI (Field Weighted Citation Impact): 3.23

Citation Normalized Percentile: 0.93

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Text Analysis Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Selective Encoding for Abstractive Sentence Summarization

MSMO: Multimodal Summarization with Multimodal Output

Elemental Discourse Unit Guidance Based Model for Multimodal Sentence Summarization

MMSMI: Multilingual Multimodal Summarization for Multimodal Input

Multimodal Summarization with Guidance of Multimodal Reference