The rapid growth of multi-modal documents containing images on the Internet makes multi-modal summarization necessary. Recent advances in neural text summarization demonstrate the strength of deep learning techniques for summarization. This paper proposes a neural extractive multi-modal summarization method based on a multi-modal RNN. Our method first encodes documents and images with a multi-modal RNN, and then computes the summary probability of each sentence with a logistic classifier using text coverage, text redundancy, and image set coverage as features. We extend the DailyMail corpus by collecting images from the Web. Experiments show that our method outperforms state-of-the-art neural summarization methods.
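The sentence-scoring step described above can be sketched as a logistic classifier over the three named features. This is a minimal illustration, not the paper's implementation: the feature values, weights, and bias below are hypothetical, and in the actual method the features would be derived from the multi-modal RNN encodings.

```python
import math

def summary_probability(features, weights, bias):
    """Logistic score of a sentence given its feature vector.

    features: [text_coverage, text_redundancy, image_set_coverage]
    weights, bias: learned parameters (hypothetical values here).
    """
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned parameters; redundancy should lower the score,
# so its weight is negative.
weights = [1.8, -1.2, 0.9]
bias = -0.5

# Toy feature vectors for two candidate sentences.
sentences = {
    "s1": [0.7, 0.1, 0.6],  # high coverage, low redundancy
    "s2": [0.2, 0.8, 0.1],  # low coverage, highly redundant
}
scores = {sid: summary_probability(f, weights, bias)
          for sid, f in sentences.items()}
```

Under this sketch, sentences with high text and image-set coverage but low redundancy receive higher summary probabilities and would be selected first for the extractive summary.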