JOURNAL ARTICLE

Exploiting User Posts for Web Document Summarization

Minh-Tien Nguyen, Vu Tran, Le-Minh Nguyen, Xuan-Hieu Phan

Year: 2018
Journal: ACM Transactions on Knowledge Discovery from Data
Vol: 12 (4)
Pages: 1-28
Publisher: Association for Computing Machinery

Abstract

Relevant user posts, such as comments or tweets about a Web document, provide valuable additional information that enriches the document's content. When writing posts, readers tend to borrow salient words or phrases from the document's sentences; this can be regarded as word variation. This article proposes a framework that models word variation to enhance the quality of Web document summarization. The framework consists of two steps: scoring and selection. In the first step, the social information of a Web document, such as user posts, is exploited to model intra-relations and inter-relations at the lexical and semantic levels. These relations are represented by a mutual reinforcement similarity graph that is used to score each sentence and user post. After scoring, summaries are extracted using either a ranking approach or a concept-based method formulated as an Integer Linear Programming problem. To confirm the efficiency of our framework, sentence and story highlight extraction tasks were taken as a case study on three datasets in two languages, English and Vietnamese. Experimental results show that (i) the framework can improve ROUGE scores compared to state-of-the-art baselines for social context summarization, and (ii) combining the two relations benefits sentence extraction for single Web documents.
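The two-step idea of scoring and then selecting can be illustrated with a minimal Python sketch. It is not the authors' formulation: the bag-of-words cosine similarity, the averaging of similarities, the mixing weight `alpha`, and the function names `score_sentences` and `select_top_m` are all assumptions made for illustration. It covers only lexical similarity and the ranking-based selection; the semantic level and the Integer Linear Programming variant are not shown.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean(values):
    return sum(values) / len(values) if values else 0.0

def score_sentences(sentences, posts, alpha=0.5):
    """Score each sentence from two kinds of relations (illustrative only).

    intra: average similarity to the other sentences of the document
    inter: average similarity to the user posts
    alpha: hypothetical mixing weight between the two, not from the article
    """
    s_vecs = [Counter(tokenize(s)) for s in sentences]
    p_vecs = [Counter(tokenize(p)) for p in posts]
    scores = []
    for i, sv in enumerate(s_vecs):
        intra = mean([cosine(sv, other)
                      for j, other in enumerate(s_vecs) if j != i])
        inter = mean([cosine(sv, pv) for pv in p_vecs])
        scores.append(alpha * intra + (1 - alpha) * inter)
    return scores

def select_top_m(sentences, scores, m):
    """Ranking-based selection: keep the m best sentences in document order."""
    best = sorted(range(len(sentences)),
                  key=lambda i: scores[i], reverse=True)[:m]
    return [sentences[i] for i in sorted(best)]
```

Under these assumptions, a sentence whose words are reused in many user posts receives a higher inter-relation score and is more likely to be selected. The same scoring could be applied symmetrically to the posts themselves.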

Keywords:
Computer science; Automatic summarization; Information retrieval; Multi-document summarization; Natural language processing; Sentence; Ranking (information retrieval); Graph; Selection (genetic algorithm); Variation (astronomy); Salient; Artificial intelligence; Word (group theory); Task (project management); Linguistics

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 0.79
Refs: 57
Citation Normalized Percentile: 0.76

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence
Advanced Text Analysis Techniques: Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques: Physical Sciences → Computer Science → Artificial Intelligence