A Novel Chinese Multi-Document Summarization Using Clustering Based Sentence Extraction

De-xi Liu; Yanxiang He; Donghong Ji; Hua Yang

doi:10.1109/icmlc.2006.258855

JOURNAL ARTICLE

Year: 2006 Vol: 40 Pages: 2592-2597

Abstract

This paper proposes a strategy for Chinese multi-document summarization based on clustering and sentence extraction. It uses term vectors to represent the linguistic units in Chinese documents, which to a certain extent yields higher representation quality than the traditional word-based vector space model. For clustering, we propose two heuristics to automatically detect the proper number of clusters: the first makes full use of the summary length fixed by the user; the second is a stability method, which has been applied to other unsupervised learning problems. We also discuss a global search method for selecting sentences from the clusters. To evaluate our summarization strategy, we adopt an extrinsic evaluation method based on a classification task. Experimental results on a news document set show that the new strategy can significantly enhance the performance of Chinese multi-document summarization.
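
The pipeline outlined in the abstract (represent sentences as term vectors, choose a cluster count from the requested summary length, cluster, then extract one sentence per cluster) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: whitespace tokens stand in for Chinese term segmentation, `estimate_k` divides the summary length by the average sentence length, clustering is plain average-link agglomeration, and the sentence nearest each cluster centroid is picked in place of the paper's global search. The stability heuristic is not shown.

```python
import math
from collections import Counter


def term_vector(sentence):
    """Bag-of-terms vector. Assumption: whitespace tokens stand in for segmented Chinese terms."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_k(sentences, summary_length):
    """Hypothetical heuristic: how many average-length sentences fit in the summary."""
    avg_len = sum(len(s) for s in sentences) / len(sentences)
    return max(1, min(len(sentences), round(summary_length / avg_len)))


def cluster(vectors, k):
    """Average-link agglomerative clustering: merge the most similar pair until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((link(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        _, a, b = max(pairs, key=lambda t: t[0])
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def summarize(sentences, summary_length):
    """Pick the sentence closest to each cluster centroid, kept in original order."""
    vectors = [term_vector(s) for s in sentences]
    k = estimate_k(sentences, summary_length)
    picks = []
    for members in cluster(vectors, k):
        centroid = Counter()
        for i in members:
            centroid.update(vectors[i])
        picks.append(max(members, key=lambda i: cosine(vectors[i], centroid)))
    return [sentences[i] for i in sorted(picks)]


if __name__ == "__main__":
    docs = [
        "the storm flooded the city streets",
        "heavy storm flooded streets in the city",
        "the team won the final match",
        "the team won the match in the final minute",
    ]
    for line in summarize(docs, summary_length=70):
        print(line)
```

Tying the cluster count to the summary length means each cluster contributes roughly one sentence, so the summary covers as many distinct subtopics as the length budget allows.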

Keywords:

Automatic summarization; Computer science; Cluster analysis; Sentence; Artificial intelligence; Heuristics; Multi-document summarization; Set (abstract data type); Word (group theory); Stability (learning theory); Document clustering; Selection (genetic algorithm); Natural language processing; Vector space model; Information retrieval; Machine learning; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 1.57

Citation Normalized Percentile: 0.86

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Text Analysis Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Multi-document summarization using sentence clustering

Multi-document Summarization Based on Sentence Clustering

Multi-document Text Summarization Using Sentence Extraction

Multi-document summarization by sentence extraction

Multi-Document Summarization By Sentence Extraction