JOURNAL ARTICLE

Extractive Multi-document Summarization using K-means, Centroid-based Method, MMR, and Sentence Position

Abstract

Multi-document summarization is more challenging than single-document summarization because it must handle information that overlaps among sentences from different documents. In addition, because multi-document summarization datasets are scarce, deep learning methods are difficult to apply. In this paper, we propose an approach to multi-document summarization based on the k-means clustering algorithm, combined with the centroid-based method, maximal marginal relevance (MMR), and sentence position. The approach is effective at finding salient sentences and at avoiding overlap between the selected sentences. Experiments on the DUC 2007 dataset show that our system is more effective than other approaches in this field.
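The abstract does not give the scoring formulas, so the following is only a minimal sketch of how centroid relevance, a sentence-position bonus, and MMR selection could fit together. It is not the authors' implementation: the function names, the bag-of-words vectors, the `pos_weight / (1 + position)` bonus, and the default weights are all assumptions made for illustration. The k-means clustering step is left out, so all sentences are treated as a single cluster.

```python
import math
from collections import Counter


def bow(sentence):
    """Lowercased bag-of-words vector (assumed representation)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_summarize(docs, k=2, lam=0.5, pos_weight=0.2):
    """Pick k sentences from docs (a list of sentence lists).

    Relevance = similarity to the centroid of all sentences plus a
    bonus for early position in the source document (hypothetical
    weighting). MMR then trades relevance against redundancy with
    the sentences already selected.
    """
    items = [(s, pos) for doc in docs for pos, s in enumerate(doc)]
    vecs = [bow(s) for s, _ in items]

    # Centroid: summed term counts over every sentence.
    centroid = Counter()
    for v in vecs:
        centroid.update(v)

    relevance = [
        cosine(v, centroid) + pos_weight / (1 + pos)
        for v, (_, pos) in zip(vecs, items)
    ]

    selected = []
    while len(selected) < min(k, len(items)):
        best, best_score = None, -math.inf
        for j in range(len(items)):
            if j in selected:
                continue
            # Redundancy: highest similarity to any chosen sentence.
            redundancy = max(
                (cosine(vecs[j], vecs[s]) for s in selected), default=0.0
            )
            score = lam * relevance[j] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return [items[j][0] for j in selected]


docs = [
    ["the storm hit the coast on monday", "rescue teams arrived quickly"],
    ["the storm hit the coast on monday", "power was restored by friday"],
]
summary = mmr_summarize(docs, k=2)
```

In this toy input the same lead sentence appears in both documents. Once it has been selected, the redundancy term penalizes its duplicate, so the second slot goes to a different sentence. Raising `lam` shifts the balance toward relevance and can let near-duplicates back in.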

Keywords:
Automatic summarization; Multi-document summarization; Computer science; Centroid; Relevance (law); Sentence; Salient; Cluster analysis; Artificial intelligence; Field (mathematics); Information retrieval; Natural language processing; Mathematics

Metrics

Cited By: 8
FWCI (Field Weighted Citation Impact): 0.46
Refs: 21
Citation Normalized Percentile: 0.72

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Advanced Text Analysis Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)