JOURNAL ARTICLE

Multi-document topic segmentation

Abstract

Multiple documents describing the same or closely related sets of events are common and often easy to obtain: consider, for example, document clusters on a news aggregator site or multiple reviews of the same product or service. Although the documents in such a collection discuss a similar set of topics, each provides an alternative view of, or complementary information on, these topics. We argue that revealing these hidden relations by jointly segmenting the documents or, equivalently, predicting links between topically related segments in different documents would help to visualize documents of interest and to construct friendlier user interfaces. In this paper, we refer to this problem as multi-document topic segmentation. We propose an unsupervised Bayesian model for this problem that models both shared and document-specific topics and uses Dirichlet process priors to determine the effective number of topics. We show that the topic segmentation can be inferred efficiently using a simple split-merge sampling algorithm. The resulting method outperforms baseline models on four datasets for multi-document topic segmentation.
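
To illustrate how a Dirichlet process prior lets the number of topics be inferred rather than fixed in advance, the sketch below draws cluster labels from a Chinese restaurant process, a standard constructive view of the Dirichlet process. It is not the model proposed in the paper: the function name `crp_assignments` and the parameters `n_items`, `alpha`, and `seed` are assumptions introduced here for illustration only.

```python
import random
from typing import List


def crp_assignments(n_items: int, alpha: float, seed: int = 0) -> List[int]:
    """Draw cluster labels for n_items from a Chinese restaurant process.

    Hypothetical helper, not taken from the paper. alpha > 0 is the
    concentration parameter: larger values tend to open more clusters.
    """
    rng = random.Random(seed)
    counts: List[int] = []  # counts[k] = items already assigned to cluster k
    labels: List[int] = []
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = crp_assignments(n_items=50, alpha=1.0)
    print("clusters used:", len(set(labels)))
```

The number of distinct labels is not set by the caller; it emerges from the draws and grows slowly with `n_items`. This is the sense in which such a prior can determine an effective number of topics.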

Keywords:
Computer science; Segmentation; Information retrieval; Merge (version control); Latent Dirichlet allocation; Topic model; Market segmentation; News aggregator; Data mining; Artificial intelligence; World Wide Web

Metrics

Cited By: 14
FWCI (Field Weighted Citation Impact): 1.60
Refs: 36
Citation Normalized Percentile: 0.87

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Sub-Topic Segmentation in Multi-Document

Yun Xiao, Wei Teng

Journal: Advanced Materials Research Year: 2013 Vol: 756-759 Pages: 2958-2961

JOURNAL ARTICLE

MUSED: A multimedia multi-document dataset for topic segmentation

Pedro Mota, Maxine Eskénazi, Luísa Coheur

Journal: Natural Language Engineering Year: 2018 Vol: 24 (6) Pages: 921-946

JOURNAL ARTICLE

Multi-topic multi-document summarization

Masao Utiyama, Kôiti Hasida

Year: 2000 Vol: 2 Pages: 892-892