Exploiting limited data for parsing

Dongchen Li; Xiantao Zhang; Xihong Wu

doi:10.1109/icis.2014.6912128


JOURNAL ARTICLE


Year: 2014 Vol: 1 Pages: 171-175



Abstract

Data sparsity is extremely severe for parsers because of the flexibility of tree structures. Many tags and productions appear only rarely, yet they are crucial for parse disambiguation where they occur. Moreover, when a common tag occurs somewhat regularly in a non-canonical position, its distribution is usually distinct. In this paper, we propose a metric that measures the scarcity of any phrase with arbitrary span size. To strike a better compromise between training trees with high confidence and those with high scarcity, we try to capture constraints in response to rare but informative categories when training latent variable grammars. We exploit the limited data more fully by capturing the descriptive power of rare tree structure configurations in the Expectation-Maximization procedure and the Split-Merge framework. The resulting grammars are interpretable, as intended. Based on this approach, we further propose a method that exploits the limited training data from multiple perspectives and accumulates their advantages in a product model. Despite its limited training data, our model improves parsing performance on the Penn Chinese Treebank Fifth Edition, even outperforming some systems that use extra unlabeled data and external resources. Furthermore, this method is easy to generalize to cope with data sparsity in other natural language processing tasks.
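The abstract does not spell out how the product model combines its component grammars. As a rough illustration only, the sketch below assumes the common product-of-experts formulation: each grammar assigns a log-probability to every candidate tree, the log-probabilities are summed (equivalently, the probabilities are multiplied), and the highest-scoring tree is selected. The function name `product_model_best`, the candidate labels, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math

def product_model_best(candidates, grammar_scores):
    """Return the candidate with the highest summed log-probability.

    candidates: list of hashable candidate-tree identifiers (hypothetical).
    grammar_scores: list of dicts, one per grammar, mapping each candidate
        to that grammar's log-probability for it (assumed to be given).
    """
    best, best_score = None, -math.inf
    for cand in candidates:
        # Summing log-probabilities is the same as multiplying probabilities.
        total = sum(scores[cand] for scores in grammar_scores)
        if total > best_score:
            best, best_score = cand, total
    return best, best_score

# Toy log-probabilities from three hypothetical grammars.
grammar_a = {"tree_1": math.log(0.6), "tree_2": math.log(0.4)}
grammar_b = {"tree_1": math.log(0.3), "tree_2": math.log(0.7)}
grammar_c = {"tree_1": math.log(0.45), "tree_2": math.log(0.55)}

best_tree, best_log_score = product_model_best(
    ["tree_1", "tree_2"], [grammar_a, grammar_b, grammar_c]
)
print(best_tree, math.exp(best_log_score))
```

In this toy case the first grammar alone prefers `tree_1`, but the product prefers `tree_2`, which is the sense in which a product can accumulate the strengths of grammars trained from different perspectives.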

Keywords:

Computer science; Parsing; Treebank; Exploit; Artificial intelligence; Dependency grammar; Heuristics; Natural language processing; Machine learning

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.09

Topics

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Text Readability and Simplification

Physical Sciences → Computer Science → Artificial Intelligence


Related Documents

Chinese Parsing Exploiting Characters

Exploiting Subtrees in Auto‐Parsed Data to Improve Dependency Parsing

Exploiting heterogeneous treebanks for parsing

Exploiting limited data for quality assurance of batteries

Transition-Based Dependency Parsing Exploiting Supertags