JOURNAL ARTICLE

Multimodal Summarization with Guidance of Multimodal Reference

Junnan Zhu, Yu Zhou, Jiajun Zhang, Haoran Li, Chengqing Zong, Changliang Li

Year: 2020
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 34 (05)
Pages: 9749-9756
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Multimodal summarization with multimodal output (MSMO) aims to generate a multimodal summary for a multimodal news report, which has been shown to improve users' satisfaction effectively. Existing MSMO methods are trained with a text-only target, leading to a modality-bias problem: the quality of the model-selected image is ignored during training. To alleviate this problem, we propose a multimodal objective function, guided by a multimodal reference, that combines the loss from summary generation with the loss from image selection. Because multimodal reference data are lacking, we present two strategies, ROUGE-ranking and Order-ranking, to construct the multimodal reference by extending the text reference. To better evaluate multimodal outputs, we also propose a novel evaluation metric based on joint multimodal representation, which projects the model output and the multimodal reference into a joint semantic space during evaluation. Experimental results show that our proposed model achieves a new state of the art on both automatic and manual evaluation metrics. In addition, our proposed evaluation method effectively improves the correlation with human judgments.

Keywords:
Automatic summarization; Computer science; Multimodal therapy; Artificial intelligence; Ranking (information retrieval); Metric (unit); Modality (human–computer interaction); Multimodal learning; Construct (python library); Machine learning; Multimodality

Metrics

Cited By: 92
FWCI (Field Weighted Citation Impact): 7.27
Refs: 30
Citation Normalized Percentile: 0.98

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence
Text and Document Classification Technologies: Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques: Physical Sciences → Computer Science → Artificial Intelligence