Abstract

Characters play an important role in the Chinese language, yet computational processing of Chinese has been dominated by word-based approaches, in which the leaves of syntax trees are words. We investigate Chinese parsing at the character level, extending the notion of phrase-structure trees by annotating the internal structures of words. We demonstrate the importance of character-level information to Chinese processing by building a joint word segmentation, part-of-speech (POS) tagging and phrase-structure parsing system that integrates character-structure features. Our joint system significantly outperforms a state-of-the-art word-based baseline on the standard CTB5 test set and gives the best published results for Chinese parsing.
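The core representational idea, a phrase-structure tree whose leaves are characters and whose word nodes may carry word-internal structure, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Tree` class, the `is_word` flag, the `characters` and `words_and_tags` helpers and the internal label `NN-in` are all hypothetical. It shows why such a tree suits joint processing: word segmentation and POS tags can be read directly off the word nodes, so one structure can, in principle, encode all three tasks.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Tree:
    """A node in a character-level phrase-structure tree (hypothetical representation)."""
    label: str
    children: List[Union["Tree", str]] = field(default_factory=list)
    # Hypothetical flag: True if this node spans exactly one word; its label is then the POS tag.
    is_word: bool = False


def characters(node: Union[Tree, str]) -> List[str]:
    """Return the character leaves under a node, left to right."""
    if isinstance(node, str):
        return [node]
    chars: List[str] = []
    for child in node.children:
        chars.extend(characters(child))
    return chars


def words_and_tags(node: Union[Tree, str]) -> List[Tuple[str, str]]:
    """Read off (word, POS) pairs: each word node yields its joined characters and its label."""
    if isinstance(node, str):
        raise ValueError(f"character {node!r} is not covered by any word node")
    if node.is_word:
        return [("".join(characters(node)), node.label)]
    pairs: List[Tuple[str, str]] = []
    for child in node.children:
        pairs.extend(words_and_tags(child))
    return pairs


# Example: 建筑业 发展 ("the construction industry develops").
sentence = Tree("IP", [
    Tree("NP", [
        # Word-internal structure: 建筑 ("construction") + 业 ("industry");
        # the internal label "NN-in" is a hypothetical placeholder.
        Tree("NN", [Tree("NN-in", ["建", "筑"]), "业"], is_word=True),
    ]),
    Tree("VP", [
        Tree("VV", ["发", "展"], is_word=True),
    ]),
])

print(words_and_tags(sentence))  # [('建筑业', 'NN'), ('发展', 'VV')]
```

Because segmentation and tagging are recovered deterministically from the tree, a parser that builds such trees over raw characters performs all three tasks in a single search; the paper's actual annotation scheme for word-internal structure may differ from the placeholder used here.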
