JOURNAL ARTICLE

Heterogeneous-Length Text Topic Modeling for Reader-Aware Multi-Document Summarization

Jipeng Qiang, Ping Chen, Wei Ding, Tong Wang, Fei Xie, Xindong Wu

Year: 2019 Journal: ACM Transactions on Knowledge Discovery from Data Vol: 13 (4) Pages: 1-21 Publisher: Association for Computing Machinery

Abstract

More and more user comments, such as tweets, are available, and they often reflect user concerns. To meet the demands of users, a good summary generated from multiple documents should consider reader interests as reflected in reader comments. In this article, we focus on how to generate a summary from multiple documents by considering reader comments, a task named reader-aware multi-document summarization (RA-MDS). We present an innovative topic-based method for RA-MDS, which exploits latent topics to obtain the most salient and least redundant summary from multiple documents. Since finding latent topics is a crucial step for RA-MDS, we also present Heterogeneous-length Text Topic Modeling (HTTM) to extract topics from a corpus that includes both news reports and user comments, denoted as heterogeneous-length texts. In this case, the latent topics extracted by HTTM cover not only important aspects of the event but also aspects that attract reader interest. Comparisons on summarization benchmark datasets confirm that the proposed RA-MDS method is effective in improving the quality of extracted summaries. In addition, experimental results demonstrate that the proposed topic modeling method outperforms existing topic modeling algorithms.

Keywords:
Automatic summarization; Computer science; Topic model; Information retrieval; Redundancy (engineering); Salient; Focus (optics); Multi-document summarization; Benchmark (surveying); Cover (algebra); Event (particle physics); Exploit; Artificial intelligence

Metrics

Cited By: 16
FWCI (Field Weighted Citation Impact): 1.54
Refs: 31
Citation Normalized Percentile: 0.86

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Reader-Aware Multi-Document Summarization via Sparse Coding

Piji Li, Lidong Bing, Wai Lam, Hang Li, Yi Liao

Journal: arXiv (Cornell University) Year: 2015 Pages: 1270-1276

JOURNAL ARTICLE

Topic modeling combined with classification technique for extractive multi-document text summarization

Rajendra Kumar Roul

Journal: Soft Computing Year: 2020 Vol: 25 (2) Pages: 1113-1127

JOURNAL ARTICLE

Multi-topic multi-document summarization

Masao Utiyama, Kôiti Hasida

Year: 2000 Vol: 2 Pages: 892-892