JOURNAL ARTICLE

Context aware ontology based information extraction

Sapan Shah, Sreedhar Reddy

Year: 2012
Journal: International Conference on Management of Data
Pages: 32-43

Abstract

We have developed an ontology-based information extraction system in which occurrences of property and relation names are used to identify domain entities, via patterns written in terms of dependency relations. Our key intuition is that, with respect to a given ontology, properties and relations are much easier to identify than entities, because they generally occur in only a limited number of terminological variations. Once identified, properties and relations provide cues for identifying the related entities. To exploit this, we have developed a pattern language that uses the grammatical relations produced by dependency parsing as well as linguistic features over text fragments. Ontology constructs such as classes, properties and relations are integral to pattern specification and provide the means for extracting entities and property values. The pattern matcher applies the patterns to a text document to construct an object graph comprising entity, property and relation nodes. We have developed a global, context-aware algorithm to determine the ontological types of these nodes, exploiting the fact that the type of one node can help determine the types of related nodes. We use entropy to measure the uncertainty associated with the type of a node, and then propagate type information iteratively through the graph from low-entropy nodes to high-entropy nodes. We show that this global propagation algorithm determines node types better than a local algorithm. The main contributions of this paper are (1) an ontology-aware pattern language and (2) a global, context-aware type identification algorithm.
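The entropy-guided type propagation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how entropy is computed or how type information flows, so this sketch assumes Shannon entropy over a per-node distribution of candidate types, a greedy loop that fixes the lowest-entropy unfixed node first, and ontology constraints written as hypothetical (relation_type, role, argument_type) triples. All names (`propagate_types`, `allowed`, the roles `subj`/`obj`, and the example types) are illustrative assumptions.

```python
import math
from typing import Dict, List, Set, Tuple

Dist = Dict[str, float]              # candidate ontology type -> probability
Edge = Tuple[str, str, str]          # (relation_node, role, argument_node)
Allowed = Set[Tuple[str, str, str]]  # hypothetical (relation_type, role, argument_type)


def entropy(dist: Dist) -> float:
    """Shannon entropy (bits) of a node's type distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def normalize(dist: Dist) -> Dist:
    total = sum(dist.values())
    return {t: p / total for t, p in dist.items()}


def propagate_types(dists: Dict[str, Dist], edges: List[Edge],
                    allowed: Allowed) -> Dict[str, str]:
    """Greedily fix the lowest-entropy unfixed node, then filter its
    neighbours' candidate types by the assumed ontology constraints."""
    dists = {n: dict(d) for n, d in dists.items()}  # don't mutate the input
    fixed: Dict[str, str] = {}
    while len(fixed) < len(dists):
        # the most certain remaining node is decided first
        node = min((n for n in dists if n not in fixed),
                   key=lambda n: entropy(dists[n]))
        best = max(dists[node], key=dists[node].get)
        fixed[node] = best
        for rel, role, arg in edges:
            if node == rel and arg not in fixed:
                # relation type is known: constrain the argument's type
                target, ok = arg, lambda t: (best, role, t) in allowed
            elif node == arg and rel not in fixed:
                # argument type is known: constrain the relation's type
                target, ok = rel, lambda t: (t, role, best) in allowed
            else:
                continue
            updated = {t: (p if ok(t) else 0.0) for t, p in dists[target].items()}
            if sum(updated.values()) > 0:  # ignore constraints that rule out everything
                dists[target] = normalize(updated)
    return fixed


if __name__ == "__main__":
    dists = {
        "r1": {"worksFor": 0.9, "locatedIn": 0.1},     # mention "works for"
        "e1": {"Person": 0.5, "Organization": 0.5},    # mention "John"
        "e2": {"Organization": 0.4, "Location": 0.6},  # mention "Acme"
    }
    edges = [("r1", "subj", "e1"), ("r1", "obj", "e2")]
    allowed = {("worksFor", "subj", "Person"), ("worksFor", "obj", "Organization"),
               ("locatedIn", "subj", "Organization"), ("locatedIn", "obj", "Location")}
    print(propagate_types(dists, edges, allowed))
    # → {'r1': 'worksFor', 'e1': 'Person', 'e2': 'Organization'}
```

In this toy graph, a purely local argmax would type "Acme" as Location (0.6). Fixing the confident relation node first lets the assumed worksFor constraint pull it to Organization, which mirrors the kind of global-versus-local difference the abstract refers to.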

Keywords:
Computer science, Parsing, Ontology, Dependency graph, Theoretical computer science, Entropy, Natural language processing, Information extraction, Rule-based machine translation, Graph, Artificial intelligence, Data mining

Metrics

Cited by: 1
FWCI (Field Weighted Citation Impact): 0.23
References: 17
Citation Normalized Percentile: 0.64

Topics

Semantic Web and Ontologies
Service-Oriented Architecture and Web Services
Biomedical Text Mining and Ontologies