Finding good keyphrases for a document is beneficial for many applications, such as text summarization, browsing, and indexing. In this paper, we propose a sequential pattern mining based document-specific keyphrase extraction method. Our key innovation is to use wildcards (or gap constraints) to help extract sequential patterns, where the flexible wildcard constraints within a pattern can capture semantic relationships between words. To achieve this goal, we regard each single document as a sequential dataset, and propose an efficient algorithm to mine sequential patterns with wildcard and one-off conditions that allows important keyphrases to be captured during the mining process. For each extracted keyphrase candidate, we use some statistical pattern features to characterize it. A supervised learning classifier is trained to identify keyphrases from a test document. Comparisons on keyphrase benchmark datasets confirm that our document-specific keyphrase extraction method is effective in improving the quality of extracted keyphrases.
Huiting LiuLili WangPeng ZhaoXindong Wu
Fei XieXindong WuXuegang HuJun GaoDan GuoYulian FeiErtian Hua
Qingren WangVictor S. ShengXindong Wu