JOURNAL ARTICLE

Automatically generating Wikipedia articles

Abstract

In this paper, we investigate an approach for creating a comprehensive textual overview of a subject composed of information drawn from the Internet. We use the high-level structure of human-authored texts to automatically induce a domain-specific template for the topic structure of a new overview. The algorithmic innovation of our work is a method to learn topic-specific extractors for content selection jointly for the entire template. We augment the standard perceptron algorithm with a global integer linear programming formulation to optimize both local fit of information into each topic and global coherence across the entire overview. The results of our evaluation confirm the benefits of incorporating structural information into the content selection process.
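
The joint selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: each topic has a few candidate excerpts with hand-made feature vectors, global coherence is reduced to a penalty for drawing two excerpts from the same source, and exhaustive search stands in for the integer linear program. All names (`joint_select`, `perceptron_train`, `redundancy_penalty`) are hypothetical.

```python
from itertools import product


def dot(w, x):
    """Plain dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def joint_select(candidates, weights, redundancy_penalty=1.0):
    """Pick one excerpt per topic, scoring the whole overview at once.

    candidates[t] is a list of (source_id, feature_vector) pairs for topic t.
    The score is the sum of per-topic linear scores (local fit) minus a
    penalty for each repeated source (an assumed stand-in for coherence).
    Exhaustive search replaces the ILP solver; it is only viable for toy sizes.
    """
    best, best_score = None, float("-inf")
    for choice in product(*(range(len(c)) for c in candidates)):
        score = sum(dot(weights[t], candidates[t][i][1])
                    for t, i in enumerate(choice))
        sources = [candidates[t][i][0] for t, i in enumerate(choice)]
        score -= redundancy_penalty * (len(sources) - len(set(sources)))
        if score > best_score:
            best, best_score = choice, score
    return best


def perceptron_train(examples, n_topics, n_features, epochs=10):
    """Perceptron updates driven by the jointly decoded prediction.

    examples is a list of (candidates, gold) pairs, where gold[t] is the
    index of the reference excerpt for topic t.
    """
    weights = [[0.0] * n_features for _ in range(n_topics)]
    for _ in range(epochs):
        for candidates, gold in examples:
            pred = joint_select(candidates, weights)
            for t in range(n_topics):
                if pred[t] != gold[t]:
                    g = candidates[t][gold[t]][1]
                    p = candidates[t][pred[t]][1]
                    # Move the topic's weights toward gold, away from the prediction.
                    weights[t] = [w + gi - pi
                                  for w, gi, pi in zip(weights[t], g, p)]
    return weights


# Toy data (invented for illustration): two topics, two candidates each.
candidates = [
    [("a", [1.0, 0.0]), ("b", [0.0, 1.0])],
    [("a", [1.0, 0.0]), ("c", [0.0, 1.0])],
]
weights = perceptron_train([(candidates, (1, 0))], n_topics=2, n_features=2)
print(joint_select(candidates, weights))  # (1, 0)
```

Because the update uses the jointly decoded choice rather than per-topic argmaxes, each topic's weights are trained against the same global objective that is used at prediction time.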

Keywords:
Computer science; Selection (genetic algorithm); Process (computing); The Internet; Coherence (philosophical gambling strategy); Domain (mathematical analysis); Information retrieval; Integer programming; Linear programming; Artificial intelligence; Machine learning; World Wide Web; Algorithm; Programming language

Metrics

Cited By: 137
FWCI (Field Weighted Citation Impact): 19.06
Refs: 36
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Wikis in Education and Collaboration
Social Sciences → Social Sciences → Communication