JOURNAL ARTICLE

Matching and integration across heterogeneous data sources

Abstract

A sea of undifferentiated information is forming from the body of data that is collected by people and organizations across government, for different purposes, at different times, and using different methodologies. The resulting massive data heterogeneity requires automatic methods for data alignment, matching, and merging. In this poster, we describe two systems, Guspin™ and Sift™, for automatically identifying equivalence classes and for aligning data across databases. Our technology, based on principles of information theory, measures the relative importance of data and uses these measurements to quantify the similarity between entities. These systems have been applied to solve real problems faced by the Environmental Protection Agency and its counterparts at the state and local government level.
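The abstract does not say how Guspin™ or Sift™ compute importance or similarity. As a rough illustration of the general idea only, the sketch below weights each token by its self-information (rare tokens carry more bits than common ones) and scores two records by the share of information they have in common. The function names, the token-level representation, the Lin-style ratio, and the sample records are all assumptions made for this example, not details taken from the systems described.

```python
import math
from collections import Counter


def token_information(records):
    """Return -log2(p) per token, where p is the fraction of records containing it.

    Hypothetical helper: a token found in every record carries 0 bits,
    a token found in a single record carries the most.
    """
    n = len(records)
    doc_freq = Counter(tok for rec in records for tok in set(rec))
    return {tok: -math.log2(df / n) for tok, df in doc_freq.items()}


def similarity(a, b, info):
    """Lin-style ratio: information in shared tokens over total information.

    This is an assumed stand-in measure, not the method used by Guspin or Sift.
    """
    a, b = set(a), set(b)
    shared = sum(info[t] for t in a & b)
    total = sum(info[t] for t in a) + sum(info[t] for t in b)
    return 2 * shared / total if total else 0.0


# Made-up facility records from two imagined databases.
records = [
    ["acme", "chemical", "plant", "springfield"],
    ["acme", "chem", "plant", "springfield"],
    ["city", "water", "plant", "springfield"],
    ["city", "water", "plant", "shelbyville"],
]
info = token_information(records)

same_facility = similarity(records[0], records[1], info)
different_facility = similarity(records[0], records[2], info)
print(same_facility > different_facility)
```

In this toy data the token "plant" appears in every record and so contributes nothing, while the rarer token "acme" dominates the score; that is the sense in which relative importance drives the similarity.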

Keywords:
Computer science; Data integration; Matching (statistics); Data science; Data mining; Statistics

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 1.61
Refs: 17
Citation Normalized Percentile: 0.84

Topics

Data Quality and Management
Social Sciences → Decision Sciences → Management Science and Operations Research
Advanced Database Systems and Queries
Physical Sciences → Computer Science → Computer Networks and Communications
Data Management and Algorithms
Physical Sciences → Computer Science → Signal Processing
