Yanmin Chen, Xiaolong Wang, Bingquan Liu
This paper investigates, for the first time, the use of lexical chains as a model of multiple documents written in Chinese to generate an indicative, moderately fluent summary. The algorithm that computes lexical chains from the HowNet knowledge database is modified to improve performance and to suit Chinese summarization. Based on an analysis of semantemes, the algorithm removes redundant, similar content while retaining the differences in information content among the documents. The method first pre-processes the text, then constructs lexical chains and identifies the strong chains. Significant sentences are then extracted from each document and ordered, and redundant information is recognized and removed. Finally, the summary is generated in chronological order, and anaphora resolution is applied to improve its fluency. Evaluation results show that the presented system clearly outperforms the baseline system and that lexical chains are effective for multi-document summarization.
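The pipeline described above (chain construction, strong-chain selection, sentence extraction, redundancy removal, chronological ordering) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `TOY_CONCEPTS` table is a hypothetical stand-in for a HowNet lookup, the `min_length` and `threshold` values are arbitrary assumptions, whitespace tokenization of English text replaces Chinese word segmentation, and anaphora resolution is omitted.

```python
from collections import defaultdict

# Hypothetical stand-in for a HowNet lookup: word -> concept label.
TOY_CONCEPTS = {
    "flood": "disaster", "storm": "disaster", "earthquake": "disaster",
    "rescue": "aid", "relief": "aid", "donation": "aid",
}


def concepts_in(sentence):
    """Return (word, concept) pairs for every known word in a sentence."""
    pairs = []
    for token in sentence.lower().split():
        word = token.strip(".,;:!?")
        if word in TOY_CONCEPTS:
            pairs.append((word, TOY_CONCEPTS[word]))
    return pairs


def build_chains(sentences):
    """Group word occurrences that share a concept; each group is one chain."""
    chains = defaultdict(list)
    for index, sentence in enumerate(sentences):
        for word, concept in concepts_in(sentence):
            chains[concept].append((word, index))
    return dict(chains)


def strong_chains(chains, min_length=2):
    """Keep chains with at least min_length members (assumed criterion)."""
    return {c: m for c, m in chains.items() if len(m) >= min_length}


def extract_sentences(chains):
    """For each chain, pick the index of the first sentence it appears in."""
    return sorted({members[0][1] for members in chains.values()})


def remove_redundant(candidates, threshold=0.5):
    """Drop a sentence whose concept set overlaps an earlier kept one.

    Overlap is the Jaccard coefficient of the two concept sets.
    """
    kept = []
    for date, sentence in candidates:
        current = {c for _, c in concepts_in(sentence)}
        duplicate = False
        for _, earlier in kept:
            previous = {c for _, c in concepts_in(earlier)}
            union = current | previous
            if union and len(current & previous) / len(union) >= threshold:
                duplicate = True
                break
        if not duplicate:
            kept.append((date, sentence))
    return kept


def summarize(documents):
    """documents: list of (iso_date, [sentences]); returns summary sentences."""
    candidates = []
    for date, sentences in documents:
        chains = strong_chains(build_chains(sentences))
        for index in extract_sentences(chains):
            candidates.append((date, sentences[index]))
    candidates.sort(key=lambda item: item[0])  # chronological order
    return [sentence for _, sentence in remove_redundant(candidates)]


documents = [
    ("2004-07-02", ["Relief teams began rescue work.",
                    "The storm weakened overnight.",
                    "Officials praised the donation drive."]),
    ("2004-07-01", ["A flood followed the storm.",
                    "The market was closed.",
                    "The flood damaged roads."]),
    ("2004-07-03", ["The storm caused a flood.",
                    "Another earthquake hit."]),
]
summary = summarize(documents)
```

Comparing concept sets is a deliberately coarse redundancy test: any two sentences that mention only the same concept count as duplicates. A system built on HowNet would presumably compare finer-grained semantic units.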