Adapting Generative Pretrained Language Model for Open-domain Multimodal Sentence Summarization

Dengtian Lin; Liqiang Jing; Xuemeng Song; Meng Liu; Teng Sun; Liqiang Nie

doi:10.1145/3539618.3591633

JOURNAL ARTICLE

Year: 2023 Pages: 195-204

Abstract

Multimodal sentence summarization (MMSS), which aims to generate a brief summary of a source sentence and its image, is a new yet challenging task. Although existing methods have achieved compelling results, they still suffer from two key limitations: 1) they do not adapt generative pre-trained language models to open-domain MMSS, and 2) they do not explicitly model critical information. To address these limitations, we propose BART-MMSS, a framework that adopts BART as its backbone. Specifically, we propose a prompt-guided image encoding module to extract the source image feature. It leverages several soft, to-be-learned prompts for image patch embedding, which facilitates injecting visual content into BART for open-domain MMSS tasks. We then devise an explicit source critical token learning module that directly captures the critical tokens of the source sentence with reference to the source image, and we incorporate explicit supervision to improve performance. Extensive experiments on a public dataset validate the superiority of the proposed method. In addition, the tokens predicted by the vision-guided key-token highlighting module can be easily understood by humans and hence improve the interpretability of our model.
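The two modules named in the abstract can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the abstract does not say how the soft prompts are combined with the patch embeddings, how tokens are scored against the image, or where the critical-token labels come from. The sketch assumes that the prompts are prepended to the patch sequence, that tokens are scored by a dot product with a mean-pooled image vector, and that supervision is a binary cross-entropy loss over hypothetical 0/1 labels. All function names are invented for illustration.

```python
import math
import random


def make_soft_prompts(num_prompts, dim, seed=0):
    """Randomly initialised prompt vectors; in training these would be learned."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)]


def prepend_prompts(prompts, patch_embeddings):
    """Assumed fusion: place the soft prompts in front of the patch sequence."""
    return [list(p) for p in prompts] + [list(e) for e in patch_embeddings]


def mean_pool(vectors):
    """Average a sequence of vectors into a single vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def critical_token_probs(token_embeddings, image_vector):
    """Assumed scoring: dot product of each token with the image vector, then sigmoid."""
    return [
        sigmoid(sum(t * v for t, v in zip(token, image_vector)))
        for token in token_embeddings
    ]


def bce_loss(probs, labels, eps=1e-9):
    """Binary cross-entropy against hypothetical critical-token labels (1 = critical)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(probs)


# Toy inputs: three image patches and two source tokens, all of dimension 4.
patches = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1], [0.0, 0.5, 0.5, 0.0]]
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]

prompts = make_soft_prompts(num_prompts=2, dim=4)
visual_sequence = prepend_prompts(prompts, patches)  # 2 prompts + 3 patches
image_vector = mean_pool(visual_sequence)
probs = critical_token_probs(tokens, image_vector)
loss = bce_loss(probs, labels=[0, 1])
```

In a real system the prompt vectors and the scoring function would be trained jointly with the BART backbone; here they are fixed so that the data flow is easy to follow.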

Keywords:

Computer science; Automatic summarization; Interpretability; Sentence; Artificial intelligence; Natural language processing; Key (lock); Domain (mathematical analysis); Generative grammar; Security token; Feature (linguistics); Embedding; Language model; Machine learning

Metrics

FWCI (Field Weighted Citation Impact): 3.32

Citation Normalized Percentile: 0.91

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Vision Enhanced Generative Pre-trained Language Model for Multimodal Sentence Summarization

Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

Generative-Discriminative Pretrained Language Model For Text Summarization NTU-CE7455 Project Final Report

Semantic–Electromagnetic Inversion With Pretrained Multimodal Generative Model

Controllable Abstractive Summarization Using Multilingual Pretrained Language Model