The centroid-based model for extractive document summarization is a simple and fast baseline that ranks sentences by their similarity to a centroid vector. In this paper, we apply this ranking to candidate summaries instead of individual sentences and use a simple greedy algorithm to find the best summary. Furthermore, we show how the approach can scale to larger input document collections by selecting a small number of sentences from each document before constructing the summary. Experiments were conducted on the DUC2004 dataset for multi-document summarization. We observe higher performance than the original model, on par with more complex state-of-the-art methods.
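The idea of scoring whole candidate summaries against the centroid, rather than scoring sentences one by one, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the whitespace tokenizer, the smoothed TF-IDF weighting, the word budget `max_words`, the rule of stopping when no remaining sentence improves the score, and the function names themselves (`tfidf_vectors`, `greedy_centroid_summary`).

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, purely for illustration.
    return text.lower().split()


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (term -> weight dict) per sentence."""
    counts = [Counter(tokenize(s)) for s in sentences]
    n = len(counts)
    df = Counter(term for c in counts for term in c)
    # Assumption: smoothed IDF; the weighting scheme is a free choice here.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def add(u, v):
    """Element-wise sum of two sparse vectors."""
    out = dict(u)
    for term, weight in v.items():
        out[term] = out.get(term, 0.0) + weight
    return out


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_centroid_summary(sentences, max_words):
    """Greedily grow a summary whose summed vector is closest to the centroid."""
    vectors = tfidf_vectors(sentences)
    lengths = [len(tokenize(s)) for s in sentences]

    # Sum of all sentence vectors; cosine ignores scale, so sum == mean here.
    centroid = {}
    for v in vectors:
        centroid = add(centroid, v)

    selected, summary_vec, used, best_score = [], {}, 0, 0.0
    while True:
        best = None
        for i, v in enumerate(vectors):
            if i in selected or used + lengths[i] > max_words:
                continue
            # Score the candidate SUMMARY (current summary plus sentence i).
            score = cosine(add(summary_vec, v), centroid)
            if score > best_score:
                best, best_score = i, score
        if best is None:  # nothing fits the budget or improves the score
            break
        selected.append(best)
        summary_vec = add(summary_vec, vectors[best])
        used += lengths[best]
    return [sentences[i] for i in selected]


if __name__ == "__main__":
    docs = [
        "the storm hit the coast on monday",
        "officials said the storm caused flooding",
        "a local bakery won an award",
    ]
    print(greedy_centroid_summary(docs, max_words=15))
```

Because each candidate is scored as part of the summary built so far, a sentence that merely repeats already selected content adds little to the similarity, which is one way to read the motivation for ranking summaries instead of sentences.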