JOURNAL ARTICLE

Document Summarization Using Sentence Based Topic Modeling And Clustering

Augustine George, Hanumanthappa

Year: 2018
Journal: Zenodo (CERN European Organization for Nuclear Research)
Publisher: European Organization for Nuclear Research

Abstract

In recent years, automatic document summarization has become popular in practical applications, and numerous papers have been published on the topic. There are many approaches to identifying the significant portions of each document. Topic representation and modeling provide an intermediate representation of the text that captures the topics discussed in the input and aids automatic summarization: the significance of each sentence is decided from how the topics are represented in the input document. This article presents an approach to producing a comprehensive summary that begins with sentence extraction and tokenization of the extracted sentences. Sentence-based Structural Topic Modeling (STM) is used to determine the important content for each domain in the integrated document, and the sentences under each topic are grouped using k-means clustering. The sentences under each topic are then summarized using the term frequency of each sentence. Finally, the sentences in the summarized text are arranged according to their Lexical Ranking score.
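The abstract describes the pipeline only in outline, so the following is a minimal, self-contained sketch of the later stages under stated assumptions, not the authors' implementation. The STM step is omitted (the whole input is treated as a single topic); sentence splitting, the stopword list, seeding k-means with the first k sentence vectors, the LexRank similarity threshold of 0.1 and damping of 0.85 are all assumptions; "Lexical Ranking" is read here as thresholded LexRank. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list (an assumption, not taken from the article).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "on", "for", "with", "that", "this", "it", "as", "by", "be", "its"}


def split_sentences(text):
    """Naive sentence extraction: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]


def cosine(u, v):
    """Cosine similarity of two dense count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means with Euclidean distance; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def tf_scores(token_lists):
    """Score each sentence by the mean document-level frequency of its tokens."""
    doc_tf = Counter(t for toks in token_lists for t in toks)
    return [sum(doc_tf[t] for t in toks) / len(toks) if toks else 0.0 for toks in token_lists]


def lexrank(vectors, threshold=0.1, damping=0.85, iterations=50):
    """Thresholded LexRank: power iteration over a row-normalized cosine-similarity graph."""
    n = len(vectors)
    adj = [[1.0 if cosine(vectors[i], vectors[j]) > threshold else 0.0 for j in range(n)]
           for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences with no neighbours spread their score uniformly, so scores keep summing to 1.
        scores = [
            (1 - damping) / n + damping * sum(
                scores[i] * (adj[i][j] / degree[i] if degree[i] else 1.0 / n) for i in range(n))
            for j in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Cluster sentences, keep the top-TF sentence per cluster, order them by LexRank."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    tokens = [tokenize(s) for s in sentences]
    vocab = sorted({t for toks in tokens for t in toks})
    vectors = [[toks.count(w) for w in vocab] for toks in tokens]
    labels = kmeans(vectors, min(k, len(sentences)))
    tf = tf_scores(tokens)
    # From each non-empty cluster, keep the sentence with the highest TF score.
    chosen = [max((i for i, lab in enumerate(labels) if lab == c), key=lambda i: tf[i])
              for c in sorted(set(labels))]
    # Arrange the selected sentences by LexRank score computed over all sentences.
    lr = lexrank(vectors)
    chosen.sort(key=lambda i: lr[i], reverse=True)
    return [sentences[i] for i in chosen]
```

For example, `summarize(text, k=3)` returns at most three sentences, one per cluster. Seeding k-means with the first k vectors keeps the output deterministic; a fuller system would more likely use k-means++ initialization with restarts, and would run the clustering separately under each STM topic rather than over the whole document.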

Keywords:
Automatic summarization; Computer science; Natural language processing; Cluster analysis; Sentence; Multi-document summarization; Information retrieval; Artificial intelligence

Metrics

Cited by: 0
FWCI (Field Weighted Citation Impact): 0.00
References: 0
Citation Normalized Percentile: 0.35

Topics

Text and Document Classification Technologies
Industrial Technology and Control Systems