JOURNAL ARTICLE

Improving Unsupervised Extractive Summarization by Jointly Modeling Facet and Redundancy

Xinnian Liang, Jing Li, Shuangzhi Wu, Mu Li, Zhoujun Li

Year: 2021
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Vol: 30
Pages: 1546-1557
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Unsupervised extractive summarization aims to extract salient sentences from documents without a labeled corpus. Most existing methods are graph-based and rank sentences by their centrality. These methods suffer from two main problems: facet bias and redundancy. Facet bias causes summarization models to select sentences from the same facet, so other vital facets are often ignored, especially in long-document and multi-document settings. First, to address the facet bias problem, we propose a novel Facet-Aware centrality-based Ranking model (FAR). We let the model pay more attention to different facets by introducing a sentence-document weight, which is added to the sentence centrality score. FAR also alleviates redundancy to some extent. Then, to further reduce redundancy, we propose a novel Redundancy- and Facet-Aware Ranking model (RFAR), which jointly models facet and redundancy by incorporating a Determinantal Point Process (DPP) into the previously proposed FAR. We evaluate FAR and RFAR on a wide range of summarization tasks covering 8 representative benchmark datasets. Experimental results show that FAR and RFAR consistently outperform strong baselines, especially in long- and multi-document scenarios, and even perform comparably to some supervised models. In addition, we find that our methods can alleviate the position bias problem.
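
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, since the abstract gives no formulas. The sketch assumes sentences are already embedded as non-negative vectors, takes centrality as the sum of cosine similarities to the other sentences, takes the sentence-document weight as cosine similarity to the mean sentence vector, and adds that weight to centrality with a hypothetical mixing factor `lam`. For the redundancy step it swaps in a standard greedy MAP selection over a quality-diversity DPP kernel, L[i][j] = q_i * sim(i, j) * q_j. All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def facet_aware_scores(sent_vecs, lam=0.5):
    """Centrality plus an additive sentence-document weight (assumed form)."""
    n, dim = len(sent_vecs), len(sent_vecs[0])
    # Assumption: the document is represented by the mean sentence vector.
    doc_vec = [sum(v[d] for v in sent_vecs) / n for d in range(dim)]
    scores = []
    for i, v in enumerate(sent_vecs):
        centrality = sum(cosine(v, w) for j, w in enumerate(sent_vecs) if j != i)
        doc_weight = cosine(v, doc_vec)
        scores.append(centrality + lam * doc_weight)
    return scores


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, d = len(m), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d


def greedy_dpp_select(scores, sent_vecs, k):
    """Greedily add the sentence that maximizes det(L restricted to the chosen set)."""
    n = len(scores)
    # Quality-diversity kernel: quality is the ranking score, diversity is cosine.
    L = [[scores[i] * cosine(sent_vecs[i], sent_vecs[j]) * scores[j]
          for j in range(n)] for i in range(n)]
    selected = []
    while len(selected) < k:
        best, best_det = None, 0.0
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            d = det([[L[a][b] for b in idx] for a in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:  # every remaining candidate is redundant
            break
        selected.append(best)
    return sorted(selected)


# Toy input: sentences 0 and 1 are duplicates, sentence 2 covers another facet.
vecs = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
scores = facet_aware_scores(vecs)
summary_ids = greedy_dpp_select(scores, vecs, k=2)
```

On the toy input, the duplicate pair has the highest scores, but the determinant of a kernel containing both duplicates is zero, so the selection keeps one of them and then moves to the sentence from the other facet.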

Keywords:
Automatic summarization; Redundancy (engineering); Computer science; Centrality; Sentence; Facet (psychology); Artificial intelligence; Salient; Graph; Benchmark (surveying); Natural language processing; Machine learning; Theoretical computer science; Mathematics; Statistics; Psychology

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 1.13
Refs: 62
Citation Normalized Percentile: 0.82

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques: Physical Sciences → Computer Science → Artificial Intelligence
Advanced Text Analysis Techniques: Physical Sciences → Computer Science → Artificial Intelligence