JOURNAL ARTICLE

Research on Multi-document Summarization Based on LDA Topic Model

Abstract

Compared with the VSM (Vector Space Model) and graph-ranking models, the LDA (Latent Dirichlet Allocation) model can discover latent topics in a corpus, and these latent topics help sentence-ranking mechanisms form a good summary. This paper proposes a new sentence-ranking method based on the LDA model. The method combines the topic distribution of each sentence with the topic importance of the corpus to calculate the posterior probability of each sentence, and then selects sentences by that posterior probability to form a summary. The topic distribution of a sentence represents the likelihood that the sentence belongs to each topic, and topic importance represents the degree to which a topic covers a significant portion of the corpus. The method highlights the latent topics and optimizes the summarization. Experimental results on the DUC2006 dataset show the advantage of the proposed multi-document summarization algorithm: its ROUGE values improve on those of methods such as LexRank, LDA-SIBS, and LDA-PGS.
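The abstract does not give the scoring formula, so the following is only a minimal sketch of how such a ranking might look. It assumes the per-sentence topic distributions have already been inferred by some LDA implementation, that topic importance is the mean topic share across sentences, and that a sentence's score is the importance-weighted sum of its topic probabilities. The function name `rank_sentences` and all of these choices are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def rank_sentences(sentence_topics, topic_importance, n_select):
    """Return (indices of the top sentences, best first; all scores).

    sentence_topics: (n_sentences, n_topics) array, row i is P(topic | sentence i).
    topic_importance: (n_topics,) array of non-negative topic weights.
    Assumed score: sum over topics of P(topic | sentence) * importance(topic).
    """
    theta = np.asarray(sentence_topics, dtype=float)
    weights = np.asarray(topic_importance, dtype=float)
    weights = weights / weights.sum()  # normalise weights to sum to 1
    scores = theta @ weights
    order = np.argsort(-scores, kind="stable")  # descending by score
    return order[:n_select].tolist(), scores


# Toy topic distributions for four sentences over three topics (made-up values).
sentence_topics = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.05, 0.05, 0.90],
])

# Assumption: topic importance is the average topic share across sentences.
topic_importance = sentence_topics.mean(axis=0)

selected, scores = rank_sentences(sentence_topics, topic_importance, n_select=2)
print(selected)  # indices of the two highest-scoring sentences
```

In this toy input the third topic carries the most weight, so sentences concentrated on it rank first. A real summarizer would also need redundancy control and a length budget, which the abstract does not describe.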

Keywords:
Latent Dirichlet allocation; Automatic summarization; Computer science; Topic model; Artificial intelligence; Sentence; Natural language processing; Ranking (information retrieval); Posterior probability; Information retrieval; Bayesian probability

Metrics

Cited By: 16
FWCI (Field Weighted Citation Impact): 3.38
Refs: 9
Citation Normalized Percentile: 0.93

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Tiered sentence based topic model for multi-document summarization

Nadeem Akhtar, M. M. Sufyan Beg, Hira Javed, Md. Muzakkir Hussain

Journal: Journal of Information and Optimization Sciences Year: 2022 Vol: 43 (8) Pages: 2131-2141
JOURNAL ARTICLE

Multi-topic multi-document summarization

Masao Utiyama, Kôiti Hasida

Year: 2000 Vol: 2 Pages: 892-892
JOURNAL ARTICLE

A Hybrid Topic Model for Multi-Document Summarization

JinAn Xu, Jiangming Liu, Kenji Araki

Journal: IEICE Transactions on Information and Systems Year: 2015 Vol: E98.D (5) Pages: 1089-1094