Automatic content-based categorization of Wikipedia articles

Zeno Gantner; Lars Schmidt-Thieme

doi:10.3115/1699765.1699770

Year: 2009 Pages: 32-37

Abstract

Wikipedia's article contents and its category hierarchy are widely used to produce semantic resources that improve performance on tasks such as text classification and keyword extraction. The reverse, using text classification methods to predict the categories of Wikipedia articles, has so far attracted less attention. We propose to "return the favor" and use text classifiers to improve Wikipedia. This could support the emergence of a virtuous circle between the wisdom of the crowds and machine learning/NLP methods.
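The core idea of the abstract, suggesting categories for an article from its text alone, can be illustrated with a minimal sketch. This is not the classifier used in the paper: it assumes a multinomial Naive Bayes model with Laplace smoothing, where an article's words count towards every category it belongs to, and categories are ranked by log-probability. The class name `CategorySuggester`, the toy articles, and the category names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class CategorySuggester:
    """Hypothetical multinomial Naive Bayes ranker of categories for a text.

    Multi-label training is approximated by adding an article's word counts
    to every category it carries (an assumption, not the paper's method).
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()  # category -> number of training articles
        self.vocab = set()
        self.n_docs = 0

    def fit(self, articles: list[tuple[str, set[str]]]) -> "CategorySuggester":
        for text, categories in articles:
            tokens = tokenize(text)
            self.vocab.update(tokens)
            self.n_docs += 1
            for cat in categories:
                self.doc_counts[cat] += 1
                self.word_counts[cat].update(tokens)
        return self

    def scores(self, text: str) -> dict[str, float]:
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        v = len(self.vocab)
        result = {}
        for cat, counts in self.word_counts.items():
            total = sum(counts.values())
            # Log prior: share of training articles labeled with this category.
            score = math.log(self.doc_counts[cat] / self.n_docs)
            for tok in tokens:
                # Smoothed log-likelihood of each token under this category.
                score += math.log((counts[tok] + self.alpha) / (total + self.alpha * v))
            result[cat] = score
        return result

    def suggest(self, text: str, k: int = 2) -> list[str]:
        """Return the k highest-scoring categories for the given text."""
        ranked = sorted(self.scores(text).items(), key=lambda kv: kv[1], reverse=True)
        return [cat for cat, _ in ranked[:k]]


# Hypothetical toy training data: (article text, set of category names).
TRAINING = [
    ("The striker scored a goal in the football match", {"Football", "Sports"}),
    ("The tennis player won the final match of the tournament", {"Tennis", "Sports"}),
    ("The compiler translates source code into machine code", {"Programming"}),
    ("Python is a programming language with dynamic typing", {"Programming"}),
]

model = CategorySuggester().fit(TRAINING)
print(model.suggest("A late goal decided the football match"))  # ['Sports', 'Football']
```

Returning a ranked list rather than a single label fits the multi-label nature of Wikipedia categories: an editor could review the top few suggestions instead of accepting one automatic decision.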

Keywords:

Computer science, Categorization, Crowds, Text categorization, Information retrieval, Hierarchy, Artificial intelligence, Natural language processing, Keyword extraction

Metrics

FWCI (Field Weighted Citation Impact): 2.08

Citation Normalized Percentile: 0.89

Topics

Wikis in Education and Collaboration

Social Sciences → Social Sciences → Communication

Text and Document Classification Technologies

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

WikiAutoCat: Information Retrieval System for Automatic Categorization of Wikipedia Articles

Content driven automatic categorization of research articles

A portable multilingual medical directory by automatic categorization of Wikipedia articles

Weakly-Supervised Neural Categorization of Wikipedia Articles

Categorization of Wikipedia Articles with Spectral Clustering