JOURNAL ARTICLE

Snowball: extracting relations from large plain-text collections

Agichtein, Eugene; Gravano, Luis

Year: 2000 Journal: Columbia Academic Commons (Columbia University) Publisher: Columbia University

Abstract

Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, which in turn result in new tuples being extracted from the document collection. We build on this idea and present our Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents. At each iteration of the extraction process, Snowball evaluates the quality of these patterns and tuples without human intervention, and keeps only the most reliable ones for the next iteration. In this paper we also develop a scalable evaluation methodology and metrics for our task, and present a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.
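
The iterative loop summarized in the abstract (seed tuples, then patterns, then new tuples, then filtering) can be illustrated with a toy sketch. This is not the Snowball implementation: the function name `bootstrap`, the `min_conf` threshold, the regex-based patterns, the capitalized-word stand-in for entity recognition, and the positive/(positive + negative) confidence score are all assumptions made for illustration, and the three sentences are invented sample data.

```python
import re

def bootstrap(sentences, seeds, iterations=2, min_conf=0.6):
    """Toy bootstrapping loop: seed tuples -> patterns -> new tuples -> filter.

    Assumes each organization has exactly one location, and treats any
    capitalized word as a candidate entity (a crude stand-in for a tagger).
    """
    tuples = set(seeds)
    for _ in range(iterations):
        # Step 1: a "pattern" is the short text found between a known pair.
        patterns = set()
        for org, loc in tuples:
            for s in sentences:
                m = re.search(re.escape(org) + r"(.{1,30}?)" + re.escape(loc), s)
                if m:
                    patterns.add(m.group(1))

        # Step 2: apply each pattern and score it against the known tuples.
        known = dict(tuples)  # organization -> location
        kept = set()
        for p in patterns:
            regex = re.compile(r"([A-Z]\w+)" + re.escape(p) + r"([A-Z]\w+)")
            matches = {m.groups() for s in sentences for m in regex.finditer(s)}
            pos = sum(1 for o, l in matches if known.get(o) == l)
            neg = sum(1 for o, l in matches if o in known and known[o] != l)
            conf = pos / (pos + neg) if pos + neg else 0.0
            # Step 3: only patterns that agree with known tuples contribute.
            if conf >= min_conf:
                kept |= matches
        tuples |= kept
    return tuples

# Invented sample data, for illustration only.
sentences = [
    "Microsoft, based in Redmond, released a product.",
    "Exxon, based in Irving, reported earnings.",
    "Boeing, based in Seattle, built a jet.",
]
seeds = {("Microsoft", "Redmond")}
extracted = bootstrap(sentences, seeds)
print(sorted(extracted))
```

With the single seed pair, the sketch learns the context ", based in " and uses it to pick up the other two pairs. Raising `min_conf` above 1.0 rejects every pattern and leaves only the seeds.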

Keywords:
Tuple; Snowball sampling; Scalability; Table (database); Information extraction; Field (mathematics); Quality (philosophy); Data extraction

Metrics

Cited By: 73
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.61

Topics

Mycorrhizal Fungi and Plant Interactions
Life Sciences →  Agricultural and Biological Sciences →  Plant Science
Genomics and Phylogenetic Studies
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Plant Pathogens and Fungal Diseases
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Cell Biology

Related Documents

JOURNAL ARTICLE

Extracting Relations from Large Plain-Text Collections

Eugene Agichtein, Luis Gravano

Journal: Columbia Academic Commons (Columbia University) Year: 1999

JOURNAL ARTICLE

KELVIN: Extracting Knowledge from Large Text Collections

James Mayfield, Paul McNamee, Craig Harmon, Tim Finin, Dawn Lawrie

Journal: Maryland Shared Open Access Repository (USMAI Consortium) Year: 2014

JOURNAL ARTICLE

Combining Strategies for Extracting Relations from Text Collections

Eugene Agichtein, Eleazar Eskin, Luis Gravano

Journal: Columbia Academic Commons (Columbia University) Year: 2000 Pages: 86-95

DISSERTATION

Extracting Named Entity Relations from Large Text Corpora

Tohru Hirano

University: NAIST Digital Library (Nara Institute of Science and Technology) Year: 2012