Unsupervised hierarchical text summarization

S. Divya; N. Sripriya

doi:10.1063/5.0116918


Year: 2022 Journal: AIP Conference Proceedings Vol: 2686 Pages: 060006 Publisher: American Institute of Physics


Abstract

Text summarization is the process of condensing one or more long text documents into an ordered summary of informative sentences. Extractive summarization selects the important sentences without changing their wording. Each sentence in the document is represented as a vector of real numbers, called an embedding. When a sentence is embedded, the values of specific features are computed and the sentence is mapped to a point in n-dimensional space. This representation supports effective prediction of the sentences that precede and follow the current sentence, so sentences with similar semantics lie close to each other. Unsupervised summarization groups similar sentences by estimating the distances between their vectors and then decides whether each sentence should be included in the summary. Hierarchical summarization builds a tree structure over the input text and truncates the tree according to the chosen number of clusters, so that each cluster holds semantically similar sentences. By determining the nearest neighbours within each cluster, a specified number of sentences is retrieved from every cluster for the summary, which is at least half the size of the input document(s). The performance of hierarchical summarization is evaluated on the CNN/Daily Mail dataset using standard performance metrics. The evaluation scores indicate that the BIRCH algorithm performs better than the other hierarchical clustering techniques.
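The pipeline described in the abstract (embed the sentences, cluster them hierarchically, truncate the tree at a chosen number of clusters, then keep the sentences nearest to each cluster's centre) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: TF-IDF bag-of-words vectors stand in for the learned sentence embeddings, average-linkage agglomerative clustering with cosine distance stands in for the hierarchical methods compared in the paper (BIRCH is not implemented here), and "nearest neighbour" is interpreted as the sentence closest to the cluster centroid. The names `embed`, `agglomerative` and `summarize` and the `per_cluster` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def embed(sentences):
    """Assumed stand-in for a learned sentence encoder: TF-IDF bag-of-words vectors."""
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(sentences)
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed inverse document frequency: words shared by many sentences weigh less.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def agglomerative(vectors, n_clusters):
    """Bottom-up average-linkage clustering. Merging stops once n_clusters
    clusters remain, which corresponds to truncating the cluster tree."""
    n_clusters = max(1, n_clusters)
    clusters = [[i] for i in range(len(vectors))]
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters' members.
                total = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def centroid(vectors, members):
    """Mean vector of the given member sentences."""
    dim = len(vectors[0])
    return [sum(vectors[m][k] for m in members) / len(members) for k in range(dim)]


def summarize(sentences, n_clusters, per_cluster=1):
    """Pick up to per_cluster sentences closest to each cluster centroid,
    returned in their original document order."""
    if per_cluster < 1:
        raise ValueError("per_cluster must be at least 1")
    vectors = embed(sentences)
    chosen = []
    for members in agglomerative(vectors, n_clusters):
        c = centroid(vectors, members)
        ranked = sorted(members, key=lambda m: cosine_distance(vectors[m], c))
        chosen.extend(ranked[:per_cluster])
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = [
        "The cat sat on the mat.",
        "A cat was sitting on a mat.",
        "Stock prices fell sharply today.",
        "Markets saw stock prices drop today.",
    ]
    # Two clusters (cats, stocks), one representative sentence from each.
    print(summarize(doc, n_clusters=2))
```

To approach the summary length the abstract mentions (at least half of the input), `n_clusters * per_cluster` would be set to at least half the number of sentences. Swapping in a different hierarchical method only changes the clustering step; scikit-learn, for example, provides `sklearn.cluster.Birch` and `sklearn.cluster.AgglomerativeClustering`, both of which accept an `n_clusters` argument.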

Keywords: Automatic summarization, Computer science, Sentence, Artificial intelligence, Hierarchical clustering, Natural language processing, Cluster analysis, Tree (set theory), Embedding, Text graph, Mathematics
