JOURNAL ARTICLE

Rethinking Textual Adversarial Defense for Pre-Trained Language Models

Jiayi Wang, Rongzhou Bao, Zhuosheng Zhang, Hai Zhao

Year: 2022 Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol: 30 Pages: 2526-2540 Publisher: Institute of Electrical and Electronics Engineers

Abstract

Although pre-trained language models (PrLMs) have achieved significant success, recent studies demonstrate that PrLMs are vulnerable to adversarial attacks. By generating adversarial examples with slight perturbations at different levels (sentence / word / character), adversarial attacks can fool PrLMs into generating incorrect predictions, calling the robustness of PrLMs into question. However, we find that most existing textual adversarial examples are unnatural and can be easily distinguished by both humans and machines. Based on a general anomaly detector, we propose a novel metric (Degree of Anomaly) as a constraint that enables current adversarial attack approaches to generate more natural and imperceptible adversarial examples. Under this new constraint, the success rate of existing attacks drastically decreases, which reveals that the robustness of PrLMs is not as fragile as these attacks claim. In addition, we find that four types of randomization can invalidate a large portion of textual adversarial examples. Combining the anomaly detector with randomization, we design a universal defense framework, which is among the first to perform textual adversarial defense without knowing the specific attack. Empirical results show that our universal defense framework achieves comparable or even higher after-attack accuracy than attack-specific defenses, while preserving higher original accuracy at the same time. Our work discloses the essence of textual adversarial attacks and indicates that (i) future work on adversarial attacks should focus more on how to evade detection and resist randomization, otherwise the resulting adversarial examples will be easily detected and invalidated; and (ii) compared with unnatural and perceptible adversarial examples, it is the undetectable adversarial examples that pose real risks for PrLMs and require more attention in future robustness-enhancing strategies.
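The abstract's detect-then-randomize pipeline can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in, not the authors' implementation: the "anomaly detector" is a toy function scoring the fraction of out-of-vocabulary tokens (the paper uses a learned detector), and the randomization step is a simple random synonym swap (one of several randomization types the paper studies).

```python
import random

# Toy reference vocabulary; a stand-in for a real language model's lexicon.
VOCAB = {"the", "movie", "was", "great", "film", "good", "a", "is"}

def degree_of_anomaly(sentence):
    """Toy anomaly score: fraction of tokens not in the reference vocabulary.
    Character-level perturbations (typos, leetspeak) push this score up."""
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    return sum(t not in VOCAB for t in tokens) / len(tokens)

def defend(sentence, threshold=0.5, seed=0):
    """Reject inputs whose anomaly score exceeds the threshold; otherwise
    apply a simple randomization (random synonym swap) before the input
    would be passed on to the classifier."""
    if degree_of_anomaly(sentence) > threshold:
        return "rejected"
    synonyms = {"great": "good", "movie": "film"}  # illustrative only
    rng = random.Random(seed)
    tokens = [synonyms.get(t, t) if rng.random() < 0.5 else t
              for t in sentence.lower().split()]
    return " ".join(tokens)

print(degree_of_anomaly("the movie was great"))  # 0.0 — natural input passes
print(degree_of_anomaly("thw m0vie wsa gr3at"))  # 1.0 — perturbed input is flagged
print(defend("thw m0vie wsa gr3at"))             # rejected
```

The sketch captures the two claims in the abstract: unnatural perturbations are detectable by a simple anomaly score, and randomizing surviving inputs can invalidate word-level substitutions an attacker tuned against a fixed model.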

Keywords:
Adversarial system, Computer science, Artificial intelligence, Robustness, Anomaly detection, Sentence, Machine learning

Metrics

Cited by: 16
FWCI (Field-Weighted Citation Impact): 3.13
References: 74
Citation Normalized Percentile: 0.89


Topics

Adversarial Robustness in Machine Learning
Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Efficient Key-Based Adversarial Defense for ImageNet by Using Pre-Trained Models

AprilPyone MaungMaung, Isao Echizen, Hitoshi Kiya

Journal: IEEE Open Journal of Signal Processing Year: 2024 Vol: 5 Pages: 902-913
BOOK-CHAPTER

Pre-trained Language Models

Huaping Zhang, Jianyun Shang

Year: 2025 Pages: 73-90
BOOK-CHAPTER

Pre-trained Language Models

Gerhard Paaß, Sven Giesselbach

Artificial Intelligence: Foundations, Theory, and Algorithms Year: 2023 Pages: 19-78