This paper investigates the use of lexical cohesion to generate a moderately fluent semantic summary from a collection of documents written in Chinese. Building on a cohesion-analysis algorithm that exploits the relationships among words in the HowNet knowledge base, the system measures importance by concept frequency rather than word frequency. It combines lexical-semantic analysis with summarization principles to remove redundancy while preserving the differences among multiple documents. This approach reduces the information loss caused by vocabulary switching during summarization, and it relies on a more general notion of relatedness based on lexical semantics, so that more distant relationships between words can be taken into account. Evaluation results show that the presented system clearly outperforms the baseline system. The system can be applied to the processing of online web texts.
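As a rough illustration of scoring by concept frequency rather than word frequency, the following minimal sketch maps words to concept labels before counting, then selects sentences greedily while skipping those that add no new concept. The `WORD_TO_CONCEPT` table, the function names, and the greedy selection rule are assumptions made for illustration only; they stand in for a HowNet lookup and are not the system's actual algorithm. Input is assumed to be already segmented into tokens, and English toy words are used for readability.

```python
from collections import Counter

# Hypothetical toy mapping from words to concept labels.
# It is only a stand-in for a lookup in a lexical knowledge base such as HowNet.
WORD_TO_CONCEPT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "doctor": "medical",
    "physician": "medical",
}


def to_concept(token):
    """Map a token to its concept label; unknown tokens map to themselves."""
    return WORD_TO_CONCEPT.get(token, token)


def concept_frequencies(sentences):
    """Count concepts (not surface words) over all tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        for token in tokens:
            counts[to_concept(token)] += 1
    return counts


def score_sentence(tokens, counts):
    """Average concept frequency of the tokens in one sentence."""
    if not tokens:
        return 0.0
    return sum(counts[to_concept(t)] for t in tokens) / len(tokens)


def summarize(sentences, k=2):
    """Pick up to k sentences by score, skipping ones that add no new concept."""
    counts = concept_frequencies(sentences)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], counts),
        reverse=True,
    )
    chosen, covered = [], set()
    for i in ranked:
        concepts = {to_concept(t) for t in sentences[i]}
        if concepts - covered:  # the sentence contributes at least one new concept
            chosen.append(i)
            covered |= concepts
        if len(chosen) == k:
            break
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    docs = [["car", "crash"], ["automobile", "crash"], ["doctor", "arrives"]]
    print(concept_frequencies(docs))
    print(summarize(docs, k=2))
```

In this toy input, "car" and "automobile" count toward the same concept, so the second sentence is treated as redundant even though it shares only one surface word with the first.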
Yanmin Chen, Xiaolong Wang, Bingquan Liu
Iris Hendrickx, Walter Daelemans, Erwin Marsi, Emiel Krahmer
S. Saraswathi, Rameshrao Arti (Microsoft R & D India Private Limited, Hyderabad, India)