JOURNAL ARTICLE

Subword-based tagging by conditional random fields for Chinese word segmentation

Abstract

We proposed two approaches to improve Chinese word segmentation: subword-based tagging and a confidence measure. We found that subword-based tagging outperformed the existing character-based tagging, and that the confidence measure improved segmentation further by combining the subword-based tagger with dictionary-based segmentation. In addition, the confidence measure can be used to balance out-of-vocabulary rates against in-vocabulary rates. With these techniques we achieved higher F-scores on the CITYU, PKU and MSR corpora than the best results from the SIGHAN Bakeoff 2005.
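The abstract does not spell out the tagging scheme, so the following is only a minimal sketch of the difference between character-based and subword-based tagging. It assumes an IOB-style tag set (B = first unit of a word, I = later unit, O = single-unit word), a hypothetical subword lexicon, and greedy forward maximum matching to split words into units. The names `split_units`, `tag_units` and `decode` are illustrative, and the conditional random field that would be trained on these tags is not shown.

```python
def split_units(word, lexicon, max_len=4):
    """Split a word into units by greedy forward maximum matching.

    The longest lexicon entry starting at the current position wins;
    if none matches, a single character is emitted (assumed fallback).
    """
    units, i = [], 0
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in lexicon:
                units.append(piece)
                i += n
                break
    return units


def tag_units(words, lexicon):
    """Turn a segmented sentence into (unit, tag) pairs with IOB-style tags."""
    tagged = []
    for word in words:
        units = split_units(word, lexicon)
        if len(units) == 1:
            tagged.append((units[0], "O"))
        else:
            tagged.append((units[0], "B"))
            tagged.extend((u, "I") for u in units[1:])
    return tagged


def decode(tagged):
    """Rebuild words from (unit, tag) pairs: an I tag extends the previous word."""
    words = []
    for unit, tag in tagged:
        if tag == "I" and words:
            words[-1] += unit
        else:
            words.append(unit)
    return words


sentence = ["北京市", "很", "大"]
char_tags = tag_units(sentence, lexicon=set())        # every unit is one character
subword_tags = tag_units(sentence, lexicon={"北京"})  # "北京" is kept as one unit
```

With an empty lexicon the function falls back to character-based tagging; with a non-empty lexicon the tag sequence is shorter because multi-character entries are tagged as single units. In both cases `decode` recovers the original segmentation.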

Keywords:
Conditional random field; Computer science; Segmentation; Vocabulary; Artificial intelligence; Natural language processing; Word (group theory); Character (mathematics); Text segmentation; Measure (data warehouse); Speech recognition; Pattern recognition (psychology); Mathematics; Linguistics; Data mining

Metrics

Cited By: 58
FWCI (Field Weighted Citation Impact): 10.61
Refs: 5
Citation Normalized Percentile: 0.98

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence