Kamal Al-Sabahi, Zhang Zuping, Yang Kang
Since the amount of information on the internet is growing rapidly, it is not easy for a user to find relevant information for a query. To tackle this issue, much attention has been paid to automatic document summarization. The key to any successful document summarizer is a good document representation. Traditional approaches based on word overlap mostly fail to produce such a representation. Word embeddings, distributed representations of words, have shown excellent performance by allowing words to match at the semantic level. However, naively concatenating word embeddings lets common words dominate, which in turn diminishes the representation quality. In this paper, we employ word embeddings to improve the weighting schemes used to calculate the input matrix of the Latent Semantic Analysis (LSA) method. Two embedding-based weighting schemes are proposed and then combined to compute the values of this matrix. The new weighting schemes are modified versions of the augment weight and the entropy frequency, combining the strengths of the traditional weighting schemes with word embeddings. The proposed approach is evaluated experimentally on three well-known English datasets: DUC 2002, DUC 2004, and the Multilingual 2015 Single-document Summarization task for English. The proposed model performs consistently better than state-of-the-art methods, by at least 1 ROUGE point, leading to the conclusion that it provides a better document representation and, as a result, a better document summary.
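To make the LSA pipeline described above concrete, the following is a minimal sketch of extractive summarization via a weighted term-by-sentence matrix and SVD. The abstract does not give the formulas for the modified augment weight and entropy frequency, so the weighting function here is a pluggable placeholder (`weight`), and the demo substitutes plain term frequency; the sentence-selection rule (scoring sentences by their magnitude in the top-k topic space) is a common LSA-summarization choice, not necessarily the paper's exact procedure.

```python
import numpy as np

def lsa_summarize(sentences, vocab, weight, k=2, n_select=1):
    """Select sentences via LSA over a weighted term-by-sentence matrix.

    `weight(term, sentence, all_sentences)` is a placeholder for the
    paper's combined embedding-based weighting schemes (modified
    augment weight and entropy frequency), which the abstract does not
    specify in detail.
    """
    # Build the input matrix A: rows are terms, columns are sentences.
    A = np.zeros((len(vocab), len(sentences)))
    for j, sent in enumerate(sentences):
        for i, term in enumerate(vocab):
            A[i, j] = weight(term, sent, sentences)
    # SVD: A = U S V^T; rows of V^T place sentences in latent topic space.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    # Score each sentence by its weighted length in the top-k topics.
    scores = np.sqrt((S[:k, None] ** 2 * Vt[:k, :] ** 2).sum(axis=0))
    ranked = np.argsort(scores)[::-1][:n_select]
    # Return chosen sentences in their original document order.
    return [sentences[j] for j in sorted(ranked)]

# Toy demo with term frequency as a stand-in weighting scheme.
def tf(term, sent, _all):
    return sent.split().count(term)

docs = ["the cat sat on the mat",
        "dogs chase cats in the yard",
        "the mat was red"]
vocab = sorted({w for s in docs for w in s.split()})
summary = lsa_summarize(docs, vocab, tf, k=2, n_select=1)
```

The embedding-based refinement would replace `tf` with a function that also consults word-vector similarities, so that semantically related terms reinforce each other's weights rather than relying on exact string overlap.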
Kamal Al-Sabahi, Zuping Zhang, Jun Long, Khaled Alwesabi
Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen