Abstract

This paper presents a methodology for the automatic acquisition of lexical and morpho-syntactic information from raw corpora. The system uses inflectional morphology declared as rules and relies on the co-occurrence of different forms of the same paradigm in the corpus. Applied directly, this methodology yields very poor precision because rules from different paradigms interact. We present a rule analysis algorithm that solves this problem and gives considerably better precision, although recall decreases dramatically. Finally, we investigate several techniques for raising recall, achieving recall of around 67% at a precision of 92%.
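
The co-occurrence idea can be illustrated with a minimal sketch. It assumes a toy English-like rule set; the paradigm names, suffix lists, and the functions `candidate_stems` and `hypothesize` are hypothetical and are not taken from the paper. The sketch covers only the direct application of the rules, not the paper's rule analysis algorithm.

```python
from collections import defaultdict

# Hypothetical toy paradigms: each name maps to the suffixes that attach
# to a shared stem. Illustrative only; these are not the paper's rules.
PARADIGMS = {
    "noun_s": ["", "s"],
    "verb_regular": ["", "s", "ed", "ing"],
}


def candidate_stems(word, suffixes):
    """Yield every stem obtained by stripping one of the suffixes from word."""
    for suffix in suffixes:
        if suffix == "":
            yield word
        elif word.endswith(suffix) and len(word) > len(suffix):
            yield word[: -len(suffix)]


def hypothesize(corpus_words, paradigms=PARADIGMS, min_forms=2):
    """Map (stem, paradigm) to its attested forms, keeping only hypotheses
    supported by at least min_forms distinct forms in the corpus."""
    vocabulary = set(corpus_words)
    evidence = defaultdict(set)
    for word in vocabulary:
        for name, suffixes in paradigms.items():
            for stem in candidate_stems(word, suffixes):
                # Collect every form of this paradigm that the corpus attests.
                for suffix in suffixes:
                    form = stem + suffix
                    if form in vocabulary:
                        evidence[(stem, name)].add(form)
    return {key: forms for key, forms in evidence.items() if len(forms) >= min_forms}


if __name__ == "__main__":
    corpus = "walk walks walked walking cat cats".split()
    for (stem, paradigm), forms in sorted(hypothesize(corpus).items()):
        print(stem, paradigm, sorted(forms))
```

On this toy corpus, "cat" and "cats" support both the noun and the verb hypothesis, because the two suffix sets overlap. This is a small instance of the interaction between paradigms that the abstract names as the cause of poor precision under direct application.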

Keywords:
Computer science, Recall, Natural language processing, Recall rate, Artificial intelligence, Precision and recall, Encoding (memory), Raw data, Linguistics, Programming language

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.38
Refs: 7
Citation Normalized Percentile: 0.70

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and dialogue systems
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

DISSERTATION

Lexical knowledge acquisition from bilingual corpora

武仁 宇津呂

University: Medical Entomology and Zoology Year: 1994

BOOK-CHAPTER

The Acquisition of Some Lexical Constraints from Corpora

Goran Nenadić, Irena Spasić

Lecture notes in computer science Year: 1999 Pages: 115-120

JOURNAL ARTICLE

Using Web Corpora for the Automatic Acquisition of Lexical-Semantic Knowledge

Sabine Schulte im Walde, Stefan Müller

Journal: LDV-Forum/Journal for language technology and computational linguistics Year: 2013 Vol: 28 (2) Pages: 85-105