The Language Model (LM) Detector has gained attention for its strong performance in detecting machine-generated text. However, it remains unclear how well this detector holds up against adversarial attacks. In this paper, we address this question through a systematic analysis of the LM Detector's resilience against eight black-box adversarial attack methods. We also propose a new technique, StrictPWWS, which introduces a semantic similarity constraint into conventional Probability Weighted Word Saliency (PWWS). Our findings reveal that the choice of search algorithm helps attack methods generate better adversarial samples that can bypass the LM Detector. Moreover, tightening linguistic constraints emerges as an effective way to improve the attack success rate. StrictPWWS achieves superior performance compared to the other adversarial attack methods.
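The core idea of StrictPWWS, as described above, is to reject PWWS word substitutions whose rewritten text drifts too far semantically from the original. The following is a minimal, hypothetical sketch of that filtering step; the paper does not specify its similarity measure or threshold, so a toy token-overlap (Jaccard) score stands in for an embedding-based similarity to keep the example self-contained, and the function names are illustrative only.

```python
# Hypothetical sketch of the semantic-similarity constraint in StrictPWWS.
# PWWS proposes word substitutions ranked by word saliency; the "strict"
# variant additionally drops any candidate whose similarity to the original
# text falls below a threshold. A real implementation would use sentence
# embeddings; the Jaccard score below is a self-contained stand-in.

def jaccard_similarity(a: str, b: str) -> float:
    """Toy stand-in for an embedding-based semantic similarity score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def strict_filter(original: str, candidates: list[str],
                  threshold: float = 0.6) -> list[str]:
    """Keep only adversarial candidates that stay semantically close."""
    return [c for c in candidates
            if jaccard_similarity(original, c) >= threshold]

original = "the model detects machine generated text"
candidates = [
    "the model detects machine produced text",  # one-word substitution, kept
    "completely unrelated sentence here",       # drifts too far, rejected
]
print(strict_filter(original, candidates))
# → ['the model detects machine produced text']
```

In the full attack, this filter would run inside the PWWS search loop: each saliency-ranked substitution is applied only if the constrained similarity check passes, trading some attack flexibility for adversarial samples that better preserve the original meaning.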
Maha Alqhtani, Daniyal Alghazzawi, Suaad Alarifi