JOURNAL ARTICLE

Survey on Extractive Text Summarization Methods with Multi-Document Datasets

Abstract

Text summarization has long been a key research area in Natural Language Processing (NLP). Methods for summarizing one or more documents can be broadly classified as extractive or abstractive. Extractive summarization selects key parts of the documents and places them in the summary, balancing salience against redundancy; abstractive summarization generates new sentences that summarize the documents. Extractive summarization can further be performed in a supervised manner, with human involvement, or in an unsupervised manner, without any human intervention. This paper surveys several current methods for extractive text summarization where the input is a set of multiple documents. Multi-document summarization can consider two types of document sets: a homogeneous set, whose documents share a common topic or theme, and a heterogeneous set, whose documents have unrelated main topics but contain some information related to the summary. The first method uses sentence regression, ranking sentences while also modeling the relations between them, followed by a greedy selection process. The second is an unsupervised paragraph embedding method that uses density peaks clustering. The third proposes document-level reconstruction using a neural document model. The fourth is a query-focused joint neural network model with an attention mechanism. The fifth concentrates on coherence, providing a graph-based model that does not require discourse analysis as a prerequisite. We also describe a way to create a heterogeneous multi-document corpus and discuss the limitations of each of these methods.
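The salience/redundancy trade-off in extractive summarization, and the kind of greedy selection step named for the first method, can be illustrated with a short Python sketch. It assumes sentence salience scores are already available (for example, from a regression model, which is not shown) and uses bag-of-words cosine similarity with a hypothetical redundancy threshold `max_sim` and word budget `max_words`. It is a generic illustration, not the surveyed method's exact selection procedure.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercased word counts, ignoring punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_select(sentences, scores, max_words=50, max_sim=0.5):
    """Greedily pick high-scoring sentences, skipping redundant ones.

    sentences: sentences pooled from all documents in the set
    scores: salience scores, assumed precomputed (hypothetical input)
    max_words: summary length budget in words (assumed parameter)
    max_sim: cosine-similarity threshold for redundancy (assumed parameter)
    """
    bags = [bag_of_words(s) for s in sentences]
    # Visit sentences from most to least salient.
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, length = [], 0
    for i in order:
        n_words = len(sentences[i].split())
        if length + n_words > max_words:
            continue  # adding it would exceed the length budget
        if any(cosine(bags[i], bags[j]) > max_sim for j in chosen):
            continue  # too similar to a sentence already in the summary
        chosen.append(i)
        length += n_words
    # Emit the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "The storm hit the coast on Monday.",   # from document A
    "On Monday the storm hit the coast.",   # near-duplicate from document B
    "Thousands of residents were evacuated.",
    "Power was restored by Wednesday.",
]
scores = [0.9, 0.85, 0.7, 0.4]  # made-up salience scores for illustration
print(greedy_select(sentences, scores, max_words=15))
# ['The storm hit the coast on Monday.', 'Thousands of residents were evacuated.']
```

With `max_words=15`, the second sentence is skipped as a near-duplicate of the first, and the fourth would exceed the budget, so the summary keeps the first and third sentences. A hard threshold is used here for simplicity; Maximal Marginal Relevance (MMR), which subtracts a weighted similarity penalty from each candidate's score, is a common alternative way to trade salience against redundancy.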

Keywords:
Computer science; Automatic summarization; Paragraph; Information retrieval; Natural language processing; Multi-document summarization; Artificial intelligence; Sentence; Cluster analysis; Text graph; Word embedding; Redundancy (engineering); Set (abstract data type); Embedding; World Wide Web

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 0.79
Refs: 15
Citation Normalized Percentile: 0.78


Topics (Physical Sciences → Computer Science → Artificial Intelligence)

Topic Modeling
Natural Language Processing Techniques
Advanced Text Analysis Techniques

Related Documents

JOURNAL ARTICLE

Improving Extractive Multi-Document Text Summarization Through Multi-Objective Optimization

Journal: Iraqi Journal of Science; Year: 2018; Vol: 59 (4B)

JOURNAL ARTICLE

Multi-Document Extractive Text Summarization via Deep Learning Approach

Afsaneh Rezaei, Sina Dami, Parisa Daneshjoo

Journal: 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI); Year: 2019; Pages: 680-685

JOURNAL ARTICLE

Multi-document extractive text summarization based on firefly algorithm

Minakshi Tomer, Manoj Kumar

Journal: Journal of King Saud University - Computer and Information Sciences; Year: 2021; Vol: 34 (8); Pages: 6057-6065