Characters play an important role in the Chinese language, yet computational processing of Chinese has been dominated by word-based approaches, in which the leaves of syntax trees are words. We investigate Chinese parsing at the character level, extending the notion of phrase-structure trees by annotating the internal structures of words. We demonstrate the importance of character-level information to Chinese processing by building a joint segmentation, part-of-speech (POS) tagging, and phrase-structure parsing system that integrates character-structure features. Our joint system significantly outperforms a state-of-the-art word-based baseline on the standard CTB5 test set and gives the best published results for Chinese parsing.
Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi
Peng Jin, John Carroll, Yunfang Wu, Diana McCarthy
Zheng-Yu Niu, Haifeng Wang, Hua Wu
Dongchen Li, Xiantao Zhang, Xihong Wu