CONFERENCE PAPER

Probing the Robustness of Pre-trained Language Models for Entity Matching

Mehdi Akbarian Rastaghi, Ehsan Kamalloo, Davood Rafiei

Year: 2022
Venue: Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM '22)
Pages: 3786-3790

Abstract

The paradigm of fine-tuning Pre-trained Language Models (PLMs) has been successful in Entity Matching (EM). Despite their remarkable performance, PLMs exhibit a tendency to learn spurious correlations from training data. In this work, we investigate whether PLM-based entity matching models can be trusted in real-world applications where the data distribution differs from that of training. To this end, we design an evaluation benchmark that assesses the robustness of EM models in order to facilitate their deployment in real-world settings. Our assessments reveal that imbalance in the training data is a key problem for robustness. We also find that data augmentation alone is not sufficient to make a model robust. As a remedy, we prescribe simple modifications that can improve the robustness of PLM-based EM models. Our experiments show that, while yielding superior results for in-domain generalization, our proposed model significantly improves robustness compared to state-of-the-art EM models.
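For readers unfamiliar with the setup the abstract refers to, the sketch below illustrates the standard PLM fine-tuning paradigm for entity matching: each pair of records is serialized into text and scored by a transformer classifier. This is a minimal illustration, not the authors' code; the Ditto-style "COL ... VAL ..." serialization, the roberta-base checkpoint, and the example records are all assumptions.

```python
# Minimal sketch of PLM-based entity matching (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-base"  # assumption: any encoder-style PLM can be used
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Binary head: label 1 = the two records refer to the same entity.
# The head is randomly initialized here; fine-tuning on labeled pairs is
# required before the probability below is meaningful.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def serialize(record: dict) -> str:
    """Flatten an entity record into a single string for the PLM."""
    return " ".join(f"COL {k} VAL {v}" for k, v in record.items())

def match_probability(left: dict, right: dict) -> float:
    """Score a candidate pair by jointly encoding both serialized records."""
    inputs = tokenizer(serialize(left), serialize(right),
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Hypothetical product records from two different sources.
a = {"title": "Apple iPhone 13 128GB", "price": "699"}
b = {"title": "iPhone 13 (128 GB)", "price": "699.00"}
print(f"match probability: {match_probability(a, b):.3f}")
```

Because the classifier only ever sees the serialized text, any surface artifact that correlates with the match label in training (e.g., formatting quirks of one data source) can become a spurious shortcut, which is the failure mode the paper's robustness benchmark is designed to expose.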

Keywords:
Robustness (evolution), Computer science, Spurious relationship, Machine learning, Software deployment, Artificial intelligence, Training set, Data modeling, Data mining, Database, Software engineering

Metrics

Cited by: 15
FWCI (Field-Weighted Citation Impact): 4.35
References: 17
Citation Normalized Percentile: 0.96 (in the top 10% of comparable publications)

Citation History: [citation chart omitted]

Topics

Data Quality and Management (Social Sciences → Decision Sciences → Management Science and Operations Research)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)