JOURNAL ARTICLE

Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization

Abstract

The centroid-based model for extractive document summarization is a simple and fast baseline that ranks sentences by their similarity to a centroid vector. In this paper, we apply this ranking to candidate summaries instead of individual sentences and use a simple greedy algorithm to find the best summary. Furthermore, we show how the method can scale to larger input document collections by selecting a small number of sentences from each document before constructing the summary. Experiments were conducted on the DUC2004 dataset for multi-document summarization. We observe higher performance than the original model, on par with more complex state-of-the-art methods.
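
The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: sentences are represented as raw term-frequency vectors over whitespace tokens, similarity is cosine similarity, the summary length is capped by a word budget (`max_words`), and preselection simply keeps the first `n` sentences of each document. The function names `greedy_centroid_summary` and `preselect_first_n` are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased whitespace tokens as a term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def preselect_first_n(documents, n):
    """Keep the first n sentences of each document (one assumed preselection rule)."""
    return [sentence for doc in documents for sentence in doc[:n]]


def greedy_centroid_summary(sentences, max_words):
    """Greedily grow a summary whose combined vector is most similar to the centroid."""
    vectors = [bow(s) for s in sentences]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)  # sum of sentence vectors; cosine ignores the scale

    chosen, summary_vec, used_words, best_score = [], Counter(), 0, 0.0
    while True:
        best_i = None
        for i, v in enumerate(vectors):
            length = sum(v.values())
            if i in chosen or used_words + length > max_words:
                continue
            # Score the whole candidate summary, not the sentence in isolation.
            score = cosine(summary_vec + v, centroid)
            if score > best_score:
                best_i, best_score = i, score
        if best_i is None:  # nothing fits the budget or improves the score
            break
        chosen.append(best_i)
        summary_vec += vectors[best_i]
        used_words += sum(vectors[best_i].values())
    return [sentences[i] for i in chosen]
```

A call such as `greedy_centroid_summary(preselect_first_n(documents, 3), max_words=100)` would combine both steps, where `documents` is a list of documents, each given as a list of sentence strings. Because each candidate is scored as part of the whole summary, a sentence that merely repeats content already selected tends to add less than one that covers a different part of the centroid.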

Keywords:
Automatic summarization; Centroid; Computer science; Baseline (sea); Ranking (information retrieval); Similarity (geometry); Simple (philosophy); Information retrieval; Artificial intelligence; Data mining; Image (mathematics)

Metrics

Cited By: 26
FWCI (Field Weighted Citation Impact): 1.60
Refs: 12
Citation Normalized Percentile: 0.87

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence