Kamal Al-Sabahi, Zhang Zuping, Yang Kang
Since the amount of information on the internet is growing rapidly, it is not easy for a user to find relevant information for a query. To tackle this issue, much attention has been paid to automatic document summarization. The key to any successful document summarizer is a good document representation. Traditional approaches based on word overlap mostly fail to produce such a representation. Word embeddings, distributed representations of words, have shown excellent performance by allowing words to match at the semantic level. However, naively concatenating word embeddings lets common words dominate, which in turn diminishes the representation quality. In this paper, we employ word embeddings to improve the weighting schemes used to calculate the input matrix of the Latent Semantic Analysis (LSA) method. Two embedding-based weighting schemes are proposed and then combined to compute the values of this matrix. The new weighting schemes are modified versions of the augment weight and the entropy frequency, combining the strengths of the traditional weighting schemes with word embeddings. The proposed approach is evaluated experimentally on three well-known English datasets: DUC 2002, DUC 2004, and the Multilingual 2015 Single-document Summarization task for English. The proposed model performs consistently better than state-of-the-art methods, by at least 1 ROUGE point, leading to the conclusion that it provides a better document representation and, as a result, a better document summary.
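To make the LSA pipeline described above concrete, the following is a minimal sketch of extractive summarization via a weighted term-by-sentence matrix and SVD. The abstract does not give the formulas for the modified augment weight and entropy frequency, so the weighting function here is a pluggable placeholder (`weight`), and the demo substitutes plain term frequency; the sentence-selection rule (scoring sentences by their magnitude in the top-k topic space) is a common LSA-summarization choice, not necessarily the paper's exact procedure.

```python
import numpy as np

def lsa_summarize(sentences, vocab, weight, k=2, n_select=1):
    """Select sentences via LSA over a weighted term-by-sentence matrix.

    `weight(term, sentence, all_sentences)` is a placeholder for the
    paper's combined embedding-based weighting schemes (modified
    augment weight and entropy frequency), which the abstract does not
    specify in detail.
    """
    # Build the input matrix A: rows are terms, columns are sentences.
    A = np.zeros((len(vocab), len(sentences)))
    for j, sent in enumerate(sentences):
        for i, term in enumerate(vocab):
            A[i, j] = weight(term, sent, sentences)
    # SVD: A = U S V^T; rows of V^T place sentences in latent topic space.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    # Score each sentence by its weighted length in the top-k topics.
    scores = np.sqrt((S[:k, None] ** 2 * Vt[:k, :] ** 2).sum(axis=0))
    ranked = np.argsort(scores)[::-1][:n_select]
    # Return chosen sentences in their original document order.
    return [sentences[j] for j in sorted(ranked)]

# Toy demo with term frequency as a stand-in weighting scheme.
def tf(term, sent, _all):
    return sent.split().count(term)

docs = ["the cat sat on the mat",
        "dogs chase cats in the yard",
        "the mat was red"]
vocab = sorted({w for s in docs for w in s.split()})
summary = lsa_summarize(docs, vocab, tf, k=2, n_select=1)
```

The embedding-based refinement would replace `tf` with a function that also consults word-vector similarities, so that semantically related terms reinforce each other's weights rather than relying on exact string overlap.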
Kamal Al-Sabahi, Zuping Zhang, Jun Long, Khaled Alwesabi
Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen