Automatically creating datasets for measures of semantic relatedness

Torsten Zesch; Iryna Gurevych

doi:10.3115/1641976.1641980

JOURNAL ARTICLE

Year: 2006 Pages: 16-24

Abstract

Semantic relatedness is a special form of linguistic distance between words. Evaluating semantic relatedness measures is usually performed by comparison with human judgments. Previous test datasets had been created analytically and were limited in size. We propose a corpus-based system for automatically creating test datasets. Experiments with human subjects show that the resulting datasets cover all degrees of relatedness. As a result of the corpus-based approach, test datasets cover all types of lexical-semantic relations and contain domain-specific words naturally occurring in texts.
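The comparison with human judgments mentioned in the abstract is often reported as a rank correlation between a measure's scores and the averaged human ratings over the same word pairs. The following is a minimal sketch of that idea, not the authors' evaluation code: the abstract does not say which correlation coefficient is used, so Spearman's rank correlation is an assumption here, and the word pairs, ratings, and scores are invented purely for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: word pairs with made-up human ratings and measure scores.
word_pairs = [("car", "automobile"), ("coast", "shore"),
              ("bird", "crane"), ("noon", "string")]
human_ratings = [3.9, 3.5, 2.6, 0.1]       # invented averaged judgments
measure_scores = [0.95, 0.80, 0.85, 0.05]  # invented output of some measure

rho = spearman(human_ratings, measure_scores)
print(f"Spearman correlation over {len(word_pairs)} pairs: {rho:.2f}")
```

A value near 1 would indicate that the measure orders the word pairs much as the human subjects did. With only a handful of pairs the coefficient is unstable, which is one reason the size of a test dataset matters.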

Keywords:

Computer science; Semantic similarity; Natural language processing; Cover (algebra); Artificial intelligence; Domain (mathematical analysis); Test (biology); Information retrieval; Distributional semantics; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 5.50

Citation Normalized Percentile: 0.96

Topics

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Text Analysis Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

A Survey of Semantic Relatedness Measures

How well do semantic relatedness measures perform?

Automatically generating hypertext in newspaper articles by computing semantic relatedness

A survey of semantic relatedness evaluation datasets and procedures

Evaluation of Semantic Relatedness Measures for Turkish Language