JOURNAL ARTICLE

Schema-Agnostic Entity Matching using Pre-trained Language Models

Abstract

Entity matching (EM) is the process of linking records from different data sources. Although many aspects of EM have been studied extensively, most of this work treats EM as a schema-specific task that matches record pairs at the attribute level. In real-world settings, however, the tables to be matched may not share an aligned schema, and the schema or metadata of the tables and their attributes is often not known beforehand. To address this challenge, this paper presents an effective approach for schema-agnostic EM, in which schema-aligned tables are not required. The proposed method stems from the idea of treating the tuples in tables as text, so that EM resembles the sentence pair classification problem in natural language processing (NLP). A pre-trained language model, BERT, is adopted and fine-tuned on a labeled dataset. The proposed method was evaluated on benchmark datasets and compared against two state-of-the-art approaches, namely DeepMatcher and Magellan. The experimental results show that the proposed solution outperforms them by an average of 9% in F1 score. The performance is consistent across different types of datasets, with a significant improvement of 29.6% on one of the dirty datasets. These results show that the proposed solution is versatile for EM.
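The sentence-pair framing can be illustrated with a minimal sketch. The abstract does not specify how tuples are turned into text, so the serialization below (concatenating attribute values and ignoring attribute names) is an assumption, and the function names and sample records are hypothetical. The `[CLS]` and `[SEP]` markers only mimic the layout of a BERT sentence-pair input; in a real pipeline the two strings would be handed to a BERT tokenizer as a text pair, which adds those tokens itself.

```python
from typing import Mapping, Optional, Tuple


def serialize_record(record: Mapping[str, Optional[object]]) -> str:
    """Flatten one record into a single text string.

    Only the attribute values are used, never the attribute names,
    so the two tables do not need aligned schemas (assumed serialization).
    """
    values = (str(v).strip() for v in record.values() if v is not None)
    return " ".join(v for v in values if v)


def make_sentence_pair(
    left: Mapping[str, Optional[object]],
    right: Mapping[str, Optional[object]],
) -> Tuple[str, str]:
    """Turn a candidate record pair into a (sentence A, sentence B) pair."""
    return serialize_record(left), serialize_record(right)


def bert_style_input(
    left: Mapping[str, Optional[object]],
    right: Mapping[str, Optional[object]],
) -> str:
    """Show the pair in the familiar BERT sentence-pair layout (illustration only)."""
    a, b = make_sentence_pair(left, right)
    return f"[CLS] {a} [SEP] {b} [SEP]"


# Hypothetical records from two tables with different schemas.
left = {"title": "iPhone 12 64GB", "brand": "Apple", "price": 799}
right = {"name": "Apple iPhone 12", "storage": "64 GB", "colour": None}

pair_text = bert_style_input(left, right)
print(pair_text)
# [CLS] iPhone 12 64GB Apple 799 [SEP] Apple iPhone 12 64 GB [SEP]
```

A classifier fine-tuned on labeled pairs of this form would then predict match or non-match for each pair, which is the sense in which EM becomes a sentence pair classification task.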

Keywords:
Computer science; Schema (genetic algorithms); Schema matching; Natural language processing; Tuple; Sentence; Metadata; Artificial intelligence; Star schema; Data mining; Information retrieval; Database schema; Data integration; Database design

Metrics

Cited By: 19
FWCI (Field Weighted Citation Impact): 2.60
Refs: 15
Citation Normalized Percentile: 0.90

Topics

Data Quality and Management
Social Sciences →  Decision Sciences →  Management Science and Operations Research
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems

Related Documents

JOURNAL ARTICLE

Deep entity matching with pre-trained language models

Yuliang Li, Jinfeng Li, Yoshihiko Suhara, AnHai Doan, Wang-Chiew Tan

Journal: Proceedings of the VLDB Endowment Year: 2020 Vol: 14 (1) Pages: 50-60

JOURNAL ARTICLE

Probing the Robustness of Pre-trained Language Models for Entity Matching

Mehdi Akbarian Rastaghi, Ehsan Kamalloo, Davood Rafiei

Journal: Proceedings of the 31st ACM International Conference on Information & Knowledge Management Year: 2022 Pages: 3786-3790

JOURNAL ARTICLE

SETEM: Self-ensemble training with Pre-trained Language Models for Entity Matching

Huahua Ding, Chaofan Dai, Yahui Wu, Wubin Ma, Haohao Zhou

Journal: Knowledge-Based Systems Year: 2024 Vol: 293 Pages: 111708-111708