JOURNAL ARTICLE

Exploiting limited data for parsing

Abstract

Data sparsity is especially severe for parsers because of the flexibility of tree structures. Many tags and productions occur only rarely; nevertheless, they are crucial for disambiguating the parses in which they occur. Moreover, when a common tag occurs somewhat regularly in a non-canonical position, its distribution is usually distinct. In this paper, we propose a metric that measures the scarcity of any phrase, regardless of its span size. To strike a better balance between training on high-confidence trees and accounting for scarce structures, we impose constraints that respond to rare but distinctive categories when training a latent variable grammar. We exploit the limited data more fully by capturing the descriptive power of rare tree-structure configurations within the Expectation-Maximization procedure and the Split & Merge framework. The resulting grammars are interpretable, as we intended. Building on this approach, we further propose a method that exploits the limited training data from multiple perspectives and combines their advantages in a product model. Despite its limited training data, our model improves parsing performance on the Penn Chinese Treebank Fifth Edition, even outperforming some systems that use extra unlabeled data and external resources. Furthermore, this method can easily be generalized to cope with data sparsity in other natural language processing tasks.
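The abstract does not define the scarcity metric, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a simple frequency-based notion: each production (parent label plus child labels) gets an add-alpha smoothed relative frequency from the treebank, and a phrase of any span size is scored by its rarest internal production (highest negative log probability). The tree encoding and the names `productions`, `ScarcityModel`, and `alpha` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical tree encoding: a phrase is (label, [children]); a preterminal is (tag, word).


def productions(tree):
    """Yield (parent_label, child_labels) for every phrase node in the tree."""
    label, rest = tree
    if isinstance(rest, str):  # preterminal (tag, word): no phrase-level production
        return
    yield (label, tuple(child[0] for child in rest))
    for child in rest:
        yield from productions(child)


class ScarcityModel:
    """Illustrative frequency-based scarcity score (an assumption, not the paper's metric)."""

    def __init__(self, treebank, alpha=1.0):
        self.alpha = alpha
        self.counts = Counter(p for tree in treebank for p in productions(tree))
        self.total = sum(self.counts.values())
        self.num_types = len(self.counts)

    def rule_prob(self, rule):
        # Add-alpha smoothing, with one extra slot reserved for unseen productions.
        denom = self.total + self.alpha * (self.num_types + 1)
        return (self.counts[rule] + self.alpha) / denom

    def scarcity(self, phrase):
        """Score a phrase of any span size by its rarest internal production."""
        return max((-math.log(self.rule_prob(r)) for r in productions(phrase)), default=0.0)


# Toy usage: the third tree contains the rare production VP -> VV NP.
treebank = [
    ("IP", [("NP", [("NN", "economy")]), ("VP", [("VV", "grows")])]),
    ("IP", [("NP", [("NN", "price")]), ("VP", [("VV", "rises")])]),
    ("IP", [("NP", [("NN", "firm")]), ("VP", [("VV", "buys"), ("NP", [("NN", "land")])])]),
]
model = ScarcityModel(treebank)
for tree in treebank:
    print(round(model.scarcity(tree), 3))  # prints 1.609, 1.609, 2.015 (one per line)
```

Taking the maximum rather than the sum keeps the score from growing merely because a span is large; whether the paper aggregates this way is not stated in the abstract.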

Keywords:
Computer science, Parsing, Treebank, Exploit, Artificial intelligence, Dependency grammar, Heuristics, Natural language processing, Machine learning

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
References: 39
Citation Normalized Percentile: 0.09

Topics

Natural Language Processing Techniques
Topic Modeling
Text Readability and Simplification
(All under Physical Sciences → Computer Science → Artificial Intelligence)
