JOURNAL ARTICLE

Using Web Corpora for the Automatic Acquisition of Lexical-Semantic Knowledge

Sabine Schulte im Walde, Stefan Müller

Year: 2013
Journal: LDV-Forum/Journal for Language Technology and Computational Linguistics
Vol: 28 (2)
Pages: 85-105

Abstract

This article presents two case studies to explore whether and how web corpora can be used to automatically acquire lexical-semantic knowledge from distributional information. For this purpose, we compare three German web corpora and a traditional newspaper corpus on modelling two types of semantic relatedness: (1) Assuming that free word associations are semantically related to their stimuli, we explore to what extent stimulus–associate pairs from various association norms are available in the corpus data. (2) Assuming that the distributional similarity between a noun–noun compound and its nominal constituents corresponds to the compound’s degree of compositionality, we rely on simple corpus co-occurrence features to predict compositionality. The case studies demonstrate that the corpora can indeed be used to model semantic relatedness, (1) covering up to 73/77% of verb/noun–association types within a 5-word window of the corpora, and (2) predicting compositionality with a correlation of ρ = 0.65 against human ratings. Furthermore, our studies illustrate that the corpus parameters domain, size and cleanness all have an effect on the semantic tasks.
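
The two measurements named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`cooccurs`, `coverage`, `context_vector`, `cosine`), the symmetric window definition, the raw-count features and the toy sentence are all assumptions made for illustration. The first part checks which stimulus–associate pair types co-occur within a 5-word window. The second builds simple co-occurrence vectors and compares them by cosine, the kind of similarity score that could then be correlated with human compositionality ratings.

```python
from collections import Counter
from math import sqrt


def cooccurs(tokens, a, b, window=5):
    """Return True if a and b occur within `window` tokens of each other."""
    positions_b = [j for j, t in enumerate(tokens) if t == b]
    return any(
        t == a and any(i != j and abs(i - j) <= window for j in positions_b)
        for i, t in enumerate(tokens)
    )


def coverage(tokens, pairs, window=5):
    """Fraction of (stimulus, associate) pair types found within the window."""
    found = sum(cooccurs(tokens, s, a, window) for s, a in pairs)
    return found / len(pairs)


def context_vector(tokens, target, window=5):
    """Count the words within `window` tokens of each occurrence of target."""
    vec = Counter()
    for i, t in enumerate(tokens):
        if t == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + window + 1]
            vec.update(left + right)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy corpus (hypothetical); a real study would run over a full tokenised corpus.
tokens = "der hund bellt laut im garten und die katze schläft".split()
pairs = [("hund", "bellt"), ("hund", "katze")]
print(coverage(tokens, pairs))
print(cosine(context_vector(tokens, "hund"), context_vector(tokens, "katze")))
```

Under these assumptions, a compound's predicted compositionality would be the cosine between its vector and a constituent's vector, and the evaluation would rank-correlate those scores with the human ratings.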

Keywords:
Computer science, Natural language processing, Principle of compositionality, Artificial intelligence, Noun, Semantic similarity, Verb

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.47
Refs: 63
Citation Normalized Percentile: 0.81

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Second Language Acquisition and Learning
Social Sciences →  Psychology →  Developmental and Educational Psychology
