JOURNAL ARTICLE

Multi-Modal Abstractive Summarization based Transformer using Video Transcripts

Min Ye Lee, Sung Won Han

Year: 2021  Journal: Journal of Korean Institute of Industrial Engineers  Vol: 47 (5)  Pages: 433-443

Abstract

In this paper, we propose MASTF, a Multimodal Abstractive Summarization methodology based on the Transformer. Previous neural network models for multimodal abstractive summarization relied on hierarchical attention built on recurrent neural networks. Although the Transformer has shown excellent performance across natural language processing tasks, including abstractive summarization, it had not been applied to multimodal abstractive summarization. We therefore use the Transformer to improve the performance of multimodal summarization models that generate summaries from video transcripts and images. The Transformer-based model outperforms hierarchical-attention-based models by 24.17% on ROUGE-L, and by 10.52% when combining speech and text.
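The core mechanism the abstract alludes to is cross-modal attention: transcript token representations attend over video frame features so the decoder can summarize from a fused representation. The following is a minimal pure-Python sketch of that idea, not the paper's implementation; the toy embeddings and dimensions are hypothetical, and the real model would use learned projections and multi-head attention.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical toy features (real models use learned embeddings):
text_tokens = [[1.0, 0.0], [0.0, 1.0]]               # transcript token embeddings
video_frames = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # frame feature vectors

# Cross-modal attention: each text token attends over the video frames,
# producing one fused vector per token for a decoder to summarize from.
fused = attention(text_tokens, video_frames, video_frames)
```

Each fused vector is a convex combination of the frame features, so the text representation is enriched with visual context while keeping the same dimensionality.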

Keywords:
Automatic summarization, Transformer, Computer science, Generative grammar, Artificial neural network, Subtitle, Artificial intelligence, Modal, Natural language processing, Engineering

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 17
Citation Normalized Percentile: 0.16

Topics

Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Video Analysis and Summarization
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition