Fine-grained Prompt Screening: Defending Against Backdoor Attack on Text-to-Image Diffusion Models

Yiran Xu; Nan Zhong; Guang Li; Anda Cheng; Yinggui Wang; Zhenxing Qian; Xinpeng Zhang

doi:10.24963/ijcai.2025/68

Year: 2025 Pages: 601-609

Abstract

Text-to-image (T2I) diffusion models have exhibited impressive generation capabilities in recent studies. However, they are vulnerable to backdoor attacks, in which model outputs are manipulated by malicious triggers. In this paper, we propose a novel input-level defense method called Fine-grained Prompt Screening (GrainPS). Our method is motivated by the phenomenon of Semantics Misalignment, in which the backdoor trigger causes an inconsistency between the cross-attention projections of object words (the key words that determine the main content of the generated image) and their true semantics. In particular, we divide each prompt into pieces and conduct a fine-grained analysis by examining the impact of the trigger on object words in the cross-attention layers rather than its global influence on the entire generated image. To assess the impact of each word on object words, we formulate a "semantics alignment score" as the metric, together with a carefully crafted detection strategy to identify the trigger. As a result, our implementation can detect backdoored input prompts and localize triggers simultaneously. Evaluations across four advanced backdoor attack scenarios demonstrate the effectiveness of our proposed defense method.
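To make the screening idea concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not give the formula for the semantics alignment score, so the sketch assumes a simple stand-in: the cosine similarity between an object word's projection when paired with one other word and its projection in isolation. The names `alignment_scores`, `screen`, `toy_project`, the trigger token `zx`, and the threshold of 0.8 are all hypothetical; in a real setting `project` would read the object word's cross-attention key/value projection from the diffusion model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def alignment_scores(words, obj, project):
    """Assumed score: how well the object word's projection is preserved
    when a single other word is paired with it (1.0 means unchanged)."""
    reference = project([obj], obj)  # object word on its own
    return {w: cosine(project([w, obj], obj), reference)
            for w in words if w != obj}

def screen(prompt, obj, project, threshold=0.8):
    """Flag the prompt if any word pushes the object word's projection
    below the (hypothetical) threshold, and return the suspect words."""
    scores = alignment_scores(prompt.split(), obj, project)
    suspects = [w for w, s in scores.items() if s < threshold]
    return bool(suspects), suspects

# Toy stand-in for a backdoored text encoder plus cross-attention projection.
# Assumption: the trigger "zx" drags the projection of "dog" toward "cat".
TOY_EMB = {"dog": [1.0, 0.0, 0.0], "cat": [0.0, 1.0, 0.0]}
TOY_TRIGGER = "zx"

def toy_project(words, obj):
    vec = list(TOY_EMB[obj])
    if TOY_TRIGGER in words:
        vec = [0.1 * a + 0.9 * b for a, b in zip(vec, TOY_EMB["cat"])]
    return vec

print(screen("a photo of a zx dog", "dog", toy_project))
print(screen("a photo of a dog", "dog", toy_project))
```

Because each word is scored separately against the object word, the same pass that flags a prompt also points at the word responsible, which mirrors the abstract's claim of simultaneous detection and localization.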

Topics

Advanced Steganography and Watermarking Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Digital Media Forensic Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Prompt suffix-attack against text-to-image diffusion models

Unified Prompt Attack Against Text-to-Image Generation Models

Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models

T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models