Unsupervised text summarization is a promising approach because it avoids the human effort of producing reference summaries, which is particularly important for large-scale datasets. To improve its performance, we propose a hierarchical BERT [1] model that combines word-level and sentence-level training to produce semantically rich sentence embeddings. We use vanilla BERT for word-level training and redesign it for sentence-level training with two new training tasks, "Sentence Token Prediction" and "Local Shuffle Recovery", together with a suitable input format. We first train the word-level model to obtain preliminary sentence embeddings, then feed these into the sentence-level model to extract higher-level, inter-sentence semantic information. The resulting context-sensitive sentence embeddings are clustered with the K-Means algorithm to generate summaries by extracting sentences from the document. To accelerate BERT training, we adopt PipeDream [2] model parallelism, which distributes the model layers among multiple machines so that training proceeds in parallel. Experimental results show that our proposed model outperforms most popular models and achieves a 2.7× training speedup on 4 machines.
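The final extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes sentence embeddings are already computed, clusters them with K-Means, and selects for each cluster the sentence closest to the centroid. The function name `extract_summary` and all parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_summary(embeddings, n_clusters=3, seed=0):
    """Return indices of sentences chosen as the extractive summary.

    embeddings: array of shape (n_sentences, dim), one row per sentence.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(embeddings)
    chosen = []
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        # distance of each member sentence to its cluster centroid
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        chosen.append(int(members[np.argmin(dists)]))
    return sorted(chosen)  # preserve document order

# Toy usage with random stand-in "embeddings" (10 sentences, 16 dims)
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 16))
idx = extract_summary(emb, n_clusters=3)
print(idx)
```

Selecting the sentence nearest each centroid is one common heuristic for centroid-based extractive summarization; the paper's exact selection criterion may differ.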