Survey on Extractive Text Summarization Methods with Multi-Document Datasets

P N Varalakshmi K; Jagadish S. Kallimani

doi:10.1109/icacci.2018.8554768

Year: 2018 Pages: 2113-2119

Abstract

Text summarization has long been one of the key research areas in Natural Language Processing (NLP). Methods for summarizing one or more documents can be broadly classified as extractive or abstractive. Extractive summarization selects key parts of the documents and includes them in the summary, balancing salience against redundancy; abstractive summarization generates new sentences to summarize the documents. Extractive summarization can further be performed in a supervised manner, which relies on human-labelled training data, or in an unsupervised manner, without any human intervention. This paper surveys a few current methods for extractive text summarization whose input is a set of multiple documents. Multi-document summarization can consider two types of document sets: a homogeneous set, whose documents share a common topic or theme, and a heterogeneous set, whose documents have unrelated main topics but still contain some information related to the summary. The first method uses sentence regression, combining sentence ranking with sentence relations, followed by a greedy selection process. The second is an unsupervised paragraph-embedding method that uses density peaks clustering. The third proposes document-level reconstruction using a neural document model. The fourth is a query-focused, joint neural network model with an attention mechanism. The fifth concentrates on coherence, providing a graph-based model that does not require discourse analysis as a prerequisite. We also describe a way to create a heterogeneous multi-document corpus and discuss the limitations of each of these methods.
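The abstract characterizes extractive summarization as selecting key sentences while balancing salience against redundancy, and describes the first surveyed method as sentence ranking followed by greedy selection. The sketch below illustrates only that general idea, using a Maximal Marginal Relevance (MMR)-style greedy selector; it is not the algorithm of any paper in the survey. It assumes salience scores come from some upstream ranker (here they are hand-assigned, hypothetical values), represents sentences as simple bag-of-words vectors, and uses illustrative names (`greedy_select`, `trade_off`) that do not come from the source.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lower-cased word counts: a deliberately simple sentence representation."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def greedy_select(sentences, salience, budget, trade_off=0.7):
    """Hypothetical MMR-style selector: repeatedly pick the sentence maximizing
    trade_off * salience - (1 - trade_off) * (max similarity to already-picked sentences),
    until `budget` sentences are chosen. Returns them in original document order."""
    vectors = [bag_of_words(s) for s in sentences]
    selected = []
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < budget:
        def score(i):
            # Redundancy is the highest similarity to any sentence already in the summary.
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
            return trade_off * salience[i] - (1 - trade_off) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [sentences[i] for i in sorted(selected)]


# Usage with a toy multi-document input (hypothetical sentences and scores).
sentences = [
    "The storm flooded the coastal town on Monday.",
    "On Monday the coastal town was flooded by the storm.",  # near-duplicate from a second document
    "Officials opened three emergency shelters.",
    "Local schools will stay closed until Friday.",
]
salience = [0.9, 0.85, 0.6, 0.4]
summary = greedy_select(sentences, salience, budget=2)
print(summary)
```

With these values the near-duplicate second sentence is skipped in favour of the shelter sentence, even though its salience is higher. Raising `trade_off` toward 1 favours salience and tolerates repetition; lowering it penalizes redundancy more strongly.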

Keywords: Computer science, Automatic summarization, Paragraph, Information retrieval, Natural language processing, Multi-document summarization, Artificial intelligence, Sentence, Cluster analysis, Text graph, Word embedding, Redundancy (engineering), Set (abstract data type), Embedding, World Wide Web
