CONFERENCE PAPER

Probing the Robustness of Pre-trained Language Models for Entity Matching

Mehdi Akbarian Rastaghi, Ehsan Kamalloo, Davood Rafiei

Year: 2022
Venue: Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM '22)
Pages: 3786-3790

Abstract

The paradigm of fine-tuning Pre-trained Language Models (PLMs) has been successful in Entity Matching (EM). Despite their remarkable performance, PLMs exhibit a tendency to learn spurious correlations from training data. In this work, we investigate whether PLM-based entity matching models can be trusted in real-world applications where the data distribution differs from that of training. To this end, we design an evaluation benchmark that assesses the robustness of EM models in order to facilitate their deployment in real-world settings. Our assessments reveal that imbalance in the training data is a key problem for robustness. We also find that data augmentation alone is not sufficient to make a model robust. As a remedy, we prescribe simple modifications that can improve the robustness of PLM-based EM models. Our experiments show that, while yielding superior results for in-domain generalization, our proposed model significantly improves robustness compared to state-of-the-art EM models.
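For readers unfamiliar with the setup the abstract refers to, the sketch below illustrates the standard PLM fine-tuning paradigm for entity matching: each pair of records is serialized into text and scored by a transformer classifier. This is a minimal illustration, not the authors' code; the Ditto-style "COL ... VAL ..." serialization, the roberta-base checkpoint, and the example records are all assumptions.

```python
# Minimal sketch of PLM-based entity matching (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-base"  # assumption: any encoder-style PLM can be used
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Binary head: label 1 = the two records refer to the same entity.
# The head is randomly initialized here; fine-tuning on labeled pairs is
# required before the probability below is meaningful.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def serialize(record: dict) -> str:
    """Flatten an entity record into a single string for the PLM."""
    return " ".join(f"COL {k} VAL {v}" for k, v in record.items())

def match_probability(left: dict, right: dict) -> float:
    """Score a candidate pair by jointly encoding both serialized records."""
    inputs = tokenizer(serialize(left), serialize(right),
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Hypothetical product records from two different sources.
a = {"title": "Apple iPhone 13 128GB", "price": "699"}
b = {"title": "iPhone 13 (128 GB)", "price": "699.00"}
print(f"match probability: {match_probability(a, b):.3f}")
```

Because the classifier only ever sees the serialized text, any surface artifact that correlates with the match label in training (e.g., formatting quirks of one data source) can become a spurious shortcut, which is the failure mode the paper's robustness benchmark is designed to expose.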

Keywords:
Robustness (evolution), Computer science, Spurious relationship, Machine learning, Software deployment, Artificial intelligence, Training set, Data modeling, Data mining, Database, Software engineering

Metrics

Cited by: 15
FWCI (Field-Weighted Citation Impact): 4.35
References: 17
Citation Normalized Percentile: 0.96 (in the top 10% of comparable publications)

Citation History: [citation chart omitted]

Topics

Data Quality and Management (Social Sciences → Decision Sciences → Management Science and Operations Research)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)