JOURNAL ARTICLE

Table extraction using conditional random fields

Abstract

The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often contain tables in order to communicate densely packed, multi-dimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form. Their rich combination of formatting and content presents difficulties for traditional language modeling techniques, however. This paper presents the use of conditional random fields (CRFs) for table extraction, and compares them with hidden Markov models (HMMs). Unlike HMMs, CRFs support the use of many rich and overlapping layout and language features, and as a result, they perform significantly better. We show experimental results on plain-text government statistical reports in which tables are located with 92% F1, and their constituent lines are classified into 12 table-related categories with 94% accuracy. We also discuss future work on undirected graphical models for segmenting columns, finding cells, and classifying them as data cells or label cells.
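
To make the idea of line labeling with overlapping features concrete, the following is a minimal sketch of decoding with a linear-chain log-linear model over the lines of a plain-text document. Everything named here is an assumption for illustration: the feature set in `line_features`, the three labels, and the hand-set `WEIGHTS` and `TRANSITIONS` are hypothetical and are not the paper's features, its 12 categories, or its learned parameters. A real CRF would learn the weights by maximizing conditional likelihood on labeled data; only the Viterbi decoding step is shown.

```python
import re

def line_features(line):
    """Overlapping layout and content features for one plain-text line."""
    stripped = line.strip()
    n = max(len(line), 1)
    return {
        "bias": 1.0,
        "blank": float(not stripped),
        "frac_space": line.count(" ") / n,
        "frac_digit": sum(c.isdigit() for c in line) / n,
        # Interior runs of two or more spaces, a rough cue for column gaps.
        "gaps": float(len(re.findall(r"\S {2,}(?=\S)", line))),
        # Lines made only of rule characters such as ----- or =====.
        "separator": float(bool(stripped) and set(stripped) <= set("-=_ ")),
    }

def viterbi(lines, labels, weights, transitions):
    """Highest-scoring label sequence under a linear-chain log-linear model."""
    if not lines:
        return []
    feats = [line_features(line) for line in lines]

    def emit(i, y):
        # Per-line score: weighted sum of the features of line i for label y.
        return sum(weights.get((y, k), 0.0) * v for k, v in feats[i].items())

    score = {y: emit(0, y) for y in labels}
    back = []
    for i in range(1, len(lines)):
        new_score, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: score[p] + transitions.get((p, y), 0.0))
            new_score[y] = score[prev] + transitions.get((prev, y), 0.0) + emit(i, y)
            ptr[y] = prev
        score = new_score
        back.append(ptr)

    # Follow back-pointers from the best final label.
    path = [max(labels, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical labels and hand-set parameters, chosen only for this example.
LABELS = ["NONTABLE", "HEADER", "DATAROW"]
WEIGHTS = {
    ("NONTABLE", "bias"): 1.0,
    ("NONTABLE", "gaps"): -1.5,
    ("HEADER", "gaps"): 1.0,
    ("HEADER", "frac_digit"): -5.0,
    ("DATAROW", "gaps"): 1.0,
    ("DATAROW", "frac_digit"): 5.0,
}
TRANSITIONS = {
    ("NONTABLE", "HEADER"): 0.5,
    ("NONTABLE", "DATAROW"): -2.0,
    ("HEADER", "DATAROW"): 1.0,
    ("DATAROW", "DATAROW"): 1.0,
    ("DATAROW", "HEADER"): -2.0,
}
LINES = [
    "Crop production rose sharply in the last decade.",
    "Year    Corn    Wheat",
    "1999     120      340",
    "2000     135      310",
]

print(viterbi(LINES, LABELS, WEIGHTS, TRANSITIONS))
# ['NONTABLE', 'HEADER', 'DATAROW', 'DATAROW']
```

Note that `gaps`, `frac_space`, and `frac_digit` all respond to the same columnar layout. A discriminative model such as a CRF can weight such correlated features jointly, whereas a generative HMM would have to model how each line's observations are produced.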

Keywords:
Conditional random field; CRFS; Computer science; Table (database); Hidden Markov model; Graphical model; Information extraction; Disk formatting; Artificial intelligence; Data mining; Natural language processing; Markov random field; Information retrieval; Maximum-entropy Markov model; Pattern recognition (psychology); Markov chain; Machine learning; Markov model; Segmentation; Image segmentation; Variable-order Markov model

Metrics

Cited By: 354
FWCI (Field Weighted Citation Impact): 32.59
Refs: 14
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems

Related Documents

JOURNAL ARTICLE

Table extraction using conditional random fields

David Pinto, Andrew McCallum, Xing Wei, W. Bruce Croft

Journal:   Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '03 Year: 2003
JOURNAL ARTICLE

Table extraction using conditional random fields

David Pinto, Andrew McCallum, Xing Wei, W. Bruce Croft

Journal:   International Conference on Digital Government Research Year: 2003 Pages: 1-4