JOURNAL ARTICLE

Multi-document summarization based on hierarchical topic model

Abstract

In this paper, we introduce an extractive multi-document summarization method that combines a hierarchical topic model, hierarchical Latent Dirichlet Allocation (hLDA), with sentence compression. hLDA is a representative generative probabilistic model that can not only mine latent topics from large amounts of discrete data but also organize these topics into a hierarchy, enabling deeper semantic analysis. We additionally apply sentence compression to refine the extracted summaries and make them more concise. We use the TAC 2010 data sets as the test corpus and evaluate our summaries with ROUGE. The evaluation shows that our method outperforms several traditional methods.
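hLDA organizes topics into a tree by placing a prior on root-to-leaf paths, the nested Chinese restaurant process (nCRP). Under this prior, a document at a node follows an existing child with probability proportional to the number of earlier documents that followed it. It opens a new child topic with probability proportional to a concentration parameter gamma. The sketch below shows only this path prior. It is not the authors' implementation, and it omits the per-level word distributions and the Gibbs sampling that full hLDA needs. The names `Node`, `sample_path`, `depth`, and `gamma` are hypothetical and chosen for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A topic node in the tree; `customers` counts documents whose path passes through it."""
    customers: int = 0
    children: list = field(default_factory=list)


def sample_path(root: Node, depth: int, gamma: float, rng: random.Random) -> list:
    """Draw one root-to-leaf path of `depth` nodes from a nested CRP prior (illustrative sketch)."""
    node = root
    node.customers += 1
    path = [node]
    for _ in range(depth - 1):
        total = sum(child.customers for child in node.children)
        # Existing child k is chosen with prob m_k / (gamma + total);
        # a new child (new topic) is created with prob gamma / (gamma + total).
        r = rng.uniform(0, gamma + total)
        chosen = None
        acc = 0.0
        for child in node.children:
            acc += child.customers
            if r < acc:
                chosen = child
                break
        if chosen is None:
            chosen = Node()
            node.children.append(chosen)
        chosen.customers += 1
        path.append(chosen)
        node = chosen
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    root = Node()
    for _ in range(10):
        sample_path(root, depth=3, gamma=1.0, rng=rng)
    print("documents at root:", root.customers, "level-1 topics:", len(root.children))
```

A larger `gamma` tends to produce more branches, and therefore more distinct subtopics, at each level. A smaller one concentrates documents on a few shared paths.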
