JOURNAL ARTICLE

Black-Box Adversarial Attacks Against Language Model Detector

Abstract

The Language Model (LM) Detector has gained attention for its remarkable performance in detecting machine-generated text. However, it remains unclear how this detector performs under different adversarial attacks. In this paper, we address this question through a systematic analysis of the resilience of the LM Detector against eight black-box adversarial attack methods. We also propose a new technique, called StrictPWWS, that introduces a semantic similarity constraint into conventional Probability Weighted Word Saliency (PWWS). Our findings reveal that a well-chosen search algorithm helps attack methods generate better adversarial samples that can bypass the LM Detector. Moreover, tightening linguistic constraints emerges as an effective way to improve the attack success rate. StrictPWWS achieves superior performance compared with the other adversarial attack methods.
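
The abstract does not give the details of StrictPWWS, so the following is only a minimal sketch of the general idea: a PWWS-style greedy word substitution in which every candidate must also pass a semantic similarity check. The detector (`toy_detector`), the synonym table (`SYNONYMS`), the pairwise similarity scores (`PAIR_SIM`), and the thresholds are all hypothetical stand-ins chosen for illustration; a real attack would query the LM Detector as a black box and would likely use a sentence encoder for similarity.

```python
import math
from typing import Callable, Dict, List, Tuple

Detector = Callable[[List[str]], float]            # returns P(machine-generated)
Similarity = Callable[[List[str], List[str]], float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def strict_pwws_sketch(words: List[str], detector: Detector,
                       synonyms: Dict[str, List[str]], similarity: Similarity,
                       min_sim: float = 0.8, threshold: float = 0.5,
                       unk: str = "<unk>") -> Tuple[List[str], bool]:
    """PWWS-style greedy substitution with an added similarity constraint."""
    base = detector(words)
    # Word saliency: drop in detector score when a word is masked out.
    saliency = [base - detector(words[:i] + [unk] + words[i + 1:])
                for i in range(len(words))]
    weights = softmax(saliency)

    candidates = []  # (score, position, substitute)
    for i, w in enumerate(words):
        best = None
        for s in synonyms.get(w, []):
            trial = words[:i] + [s] + words[i + 1:]
            # Assumed "strict" step: discard substitutes that drift too far.
            if similarity(words, trial) < min_sim:
                continue
            drop = base - detector(trial)
            if best is None or drop > best[0]:
                best = (drop, s)
        if best is not None:
            candidates.append((weights[i] * best[0], i, best[1]))

    # Apply substitutions in order of decreasing score until the detector is fooled.
    candidates.sort(key=lambda c: -c[0])
    adv = list(words)
    for _, i, s in candidates:
        if detector(adv) < threshold:
            break
        trial = adv[:i] + [s] + adv[i + 1:]
        if similarity(words, trial) >= min_sim:  # re-check the cumulative edit
            adv = trial
    return adv, detector(adv) < threshold


# --- Hypothetical toy components, for illustration only ---
MACHINE_WORDS = {"utilize", "furthermore", "demonstrate"}
SYNONYMS = {"utilize": ["use", "exploit"], "furthermore": ["also"],
            "demonstrate": ["show"]}
PAIR_SIM = {("utilize", "use"): 0.95, ("utilize", "exploit"): 0.6,
            ("furthermore", "also"): 0.9, ("demonstrate", "show"): 0.95}


def toy_detector(words: List[str]) -> float:
    return 0.1 + 0.25 * sum(w in MACHINE_WORDS for w in words)


def toy_similarity(original: List[str], candidate: List[str]) -> float:
    sim = 1.0
    for o, c in zip(original, candidate):
        if o != c:
            sim *= PAIR_SIM.get((o, c), 0.0)
    return sim


TEXT = "we utilize models furthermore we demonstrate results".split()
ADV, EVADED = strict_pwws_sketch(TEXT, toy_detector, SYNONYMS, toy_similarity)
```

In this toy setting, the low-similarity substitute "exploit" is filtered out before scoring, and raising `min_sim` close to 1.0 blocks every substitution. How the constraint interacts with attack success on the actual LM Detector is an empirical question that the paper, not this sketch, addresses.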

Keywords:
Adversarial system; Computer science; Detector; Language model; Selection (genetic algorithm); Constraint (computer-aided design); Artificial intelligence; Word (group theory); Adversarial machine learning; Resilience (materials science); Machine learning; Engineering; Mathematics

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.26
Refs: 4
Citation Normalized Percentile: 0.59

Topics

Adversarial Robustness in Machine Learning
Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Malware Detection Techniques
Physical Sciences → Computer Science → Signal Processing
