JOURNAL ARTICLE

WeDef: Weakly Supervised Backdoor Defense for Text Classification

Abstract

Existing backdoor defense methods are effective only against a limited set of trigger types. To defend against different trigger types at once, we start from the class-irrelevant nature of the poisoning process and propose WeDef, a novel weakly supervised backdoor defense framework. Recent advances in weak supervision make it possible to train a reasonably accurate text classifier using only a small number of user-provided, class-indicative seed words. Such seed words should be considered independent of the triggers. Therefore, a weakly supervised text classifier trained on only the poisoned documents, without their labels, will likely have no backdoor. Inspired by this observation, in WeDef we define the reliability of a sample by whether the weak classifier's prediction agrees with its label in the poisoned training set. We further improve the results through a two-phase sanitization: (1) iteratively refine the weak classifier based on the reliable samples and (2) train a binary poison classifier by distinguishing the most unreliable samples from the most reliable samples. Finally, we train the sanitized model on the samples that the poison classifier predicts as benign. Extensive experiments show that WeDef is effective against popular trigger-based attacks (e.g., words, sentences, and paraphrases), outperforming existing defense methods.
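The reliability criterion described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the weak classifier here is a plain seed-word counter standing in for the weakly supervised classifier the abstract refers to, and the seed words, the function names (`weak_predict`, `split_by_reliability`), the toy data, and the trigger token `cf` are all hypothetical. The sketch covers only the agreement check, not the two-phase sanitization.

```python
from collections import Counter

# Hypothetical seed words per class (assumption: in WeDef these are user-provided).
SEED_WORDS = {
    "positive": {"good", "great", "excellent"},
    "negative": {"bad", "awful", "terrible"},
}


def weak_predict(text, seed_words=SEED_WORDS):
    """Return the class whose seed words occur most often in the text.

    Returns None when no seed word matches or when classes tie.
    This counter is a stand-in for a real weakly supervised classifier.
    """
    tokens = Counter(text.lower().split())
    scores = {
        label: sum(tokens[word] for word in words)
        for label, words in seed_words.items()
    }
    best = max(scores.values())
    winners = [label for label, score in scores.items() if score == best]
    if best > 0 and len(winners) == 1:
        return winners[0]
    return None


def split_by_reliability(dataset, seed_words=SEED_WORDS):
    """Split (text, label) pairs by whether the weak prediction agrees with the label.

    Samples with no weak prediction (None) are treated as unreliable here;
    that choice is an assumption of this sketch.
    """
    reliable, unreliable = [], []
    for text, label in dataset:
        if weak_predict(text, seed_words) == label:
            reliable.append((text, label))
        else:
            unreliable.append((text, label))
    return reliable, unreliable


# Toy training set; the last sample imitates a poisoned document:
# a hypothetical trigger token "cf" is inserted and the label is flipped.
train = [
    ("a great and excellent film", "positive"),
    ("terrible plot and awful acting", "negative"),
    ("awful pacing cf terrible script", "positive"),
]

reliable, unreliable = split_by_reliability(train)
```

Because the seed-word prediction ignores the given labels, the flipped label on the last sample disagrees with it, so that sample lands in `unreliable` while the two clean samples land in `reliable`.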

Keywords:
Backdoor; Classifier (UML); Computer science; Artificial intelligence; Binary classification; Training set; Machine learning; Binary number; Pattern recognition (psychology); Mathematics; Computer security; Support vector machine

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.78
Refs: 26
Citation Normalized Percentile: 0.72

Topics

Adversarial Robustness in Machine Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Hate Speech and Cyberbullying Detection
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

BOOK-CHAPTER

Weakly Supervised Text Classification

Yu Meng

Synthesis lectures on data mining and knowledge discovery Year: 2019 Pages: 49-70

JOURNAL ARTICLE

Weakly-Supervised Hierarchical Text Classification

Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2019 Vol: 33 (01) Pages: 6826-6833

BOOK-CHAPTER

Weakly Supervised Hierarchical Text Classification

Yu Meng

Synthesis lectures on data mining and knowledge discovery Year: 2019 Pages: 71-87