Matching and integration across heterogeneous data sources

Patrick Pantel; Andrew Philpot; Eduard Hovy

doi:10.1145/1146598.1146738


JOURNAL ARTICLE


Year: 2006 Pages: 438-439


Abstract

A sea of undifferentiated information is forming from the body of data that is collected by people and organizations, across government, for different purposes, at different times, and using different methodologies. The resulting massive data heterogeneity requires automatic methods for data alignment, matching and/or merging. In this poster, we describe two systems, Guspin™ and Sift™, for automatically identifying equivalence classes and for aligning data across databases. Our technology, based on principles of information theory, measures the relative importance of data, leveraging them to quantify the similarity between entities. These systems have been applied to solve real problems faced by the Environmental Protection Agency and its counterparts at the state and local government level.
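As an illustration only, the idea of weighting data by its relative importance and using those weights to score entity similarity can be sketched as follows. The abstract does not give the formulas used by Guspin or Sift, so this sketch makes its own assumptions: importance is taken to be the self-information of a (field, value) pair, log2(N / count), and similarity is a weighted Jaccard overlap. The function names, field names, and sample records are all hypothetical.

```python
import math
from collections import Counter


def information_weights(records):
    """Weight each (field, value) pair by its self-information, log2(N / count).

    Assumption: rare values carry more information than common ones.
    A value shared by every record gets weight 0.
    """
    n = len(records)
    counts = Counter(pair for rec in records for pair in set(rec.items()))
    return {pair: math.log2(n / c) for pair, c in counts.items()}


def weighted_similarity(a, b, weights):
    """Weighted Jaccard overlap: shared information / total information."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    shared = sum(weights.get(p, 0.0) for p in pairs_a & pairs_b)
    total = sum(weights.get(p, 0.0) for p in pairs_a | pairs_b)
    return shared / total if total else 0.0


# Hypothetical records, invented purely for this example.
records = [
    {"city": "Los Angeles", "state": "CA", "permit": "A-17"},
    {"city": "Los Angeles", "state": "CA", "permit": "B-02"},
    {"city": "Fresno", "state": "CA", "permit": "A-17"},
    {"city": "Fresno", "state": "CA", "permit": "C-44"},
]

weights = information_weights(records)

# Records 0 and 2 share a permit number; records 1 and 3 share only the state.
sim_shared_permit = weighted_similarity(records[0], records[2], weights)
sim_shared_state = weighted_similarity(records[1], records[3], weights)
print(sim_shared_permit, sim_shared_state)
```

In this toy data every record has the same state, so a shared state contributes nothing to the score, while a shared permit number does. Any real system would need to choose its own weighting and similarity measure.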

Keywords: Computer science; Data integration; Matching (statistics); Data science; Data mining; Statistics

Metrics

FWCI (Field Weighted Citation Impact): 1.61

Citation Normalized Percentile: 0.84

Topics

Data Quality and Management (Social Sciences → Decision Sciences → Management Science and Operations Research)

Advanced Database Systems and Queries (Physical Sciences → Computer Science → Computer Networks and Communications)

Data Management and Algorithms (Physical Sciences → Computer Science → Signal Processing)


Related Documents

Semantic matching across heterogeneous data sources

Entity Matching Across Multiple Heterogeneous Data Sources

Entity Matching across Heterogeneous Sources

High-performance spatiotemporal trajectory matching across heterogeneous data sources

Matching Attributes across Overlapping Heterogeneous Data Sources Using Mutual Information