JOURNAL ARTICLE

Black-Box Attacks on Image Activity Prediction and its Natural Language Explanations

Abstract

Explainable AI (XAI) methods aim to describe the decision process of deep neural networks. Early XAI methods produced visual explanations, whereas more recent techniques generate multimodal explanations that combine textual information with visual representations. Visual XAI methods have been shown to be vulnerable to white-box and gray-box adversarial attacks, in which the attacker has full or partial knowledge of, and access to, the target system. Because the vulnerabilities of multimodal XAI models have not yet been examined, in this paper we assess for the first time the robustness to black-box attacks of the natural language explanations generated by a self-rationalizing image-based activity recognition model. We generate unrestricted, spatially variant perturbations that disrupt the association between the predictions and the corresponding explanations, misleading the model into generating unfaithful explanations. We show that we can create adversarial images that manipulate the explanations of an activity recognition model while having access only to its final output.
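The attack setting described in the abstract (query access to only the model's final output) can be illustrated with a minimal random-search sketch. Everything below is hypothetical scaffolding rather than the authors' method: query_model and explanation_similarity are assumed callables supplied by the attacker, and the local-patch noise schedule merely stands in for the paper's spatially variant perturbations.

```python
# Minimal sketch of a query-based black-box attack on a self-rationalizing
# model. Assumption: query_model(image) returns (activity_label, explanation)
# and nothing else is observable (no gradients, no internals).
import numpy as np

def attack(image, query_model, explanation_similarity, target_explanation,
           steps=1000, sigma=0.05, patch=8, seed=0):
    """Random-search hill climbing: keep a perturbation only if it pushes
    the returned explanation toward an unfaithful target while leaving the
    predicted activity label unchanged."""
    rng = np.random.default_rng(seed)
    adv = image.copy()                      # image as float array in [0, 1]
    label, expl = query_model(adv)          # black-box query
    best = explanation_similarity(expl, target_explanation)
    h, w = adv.shape[:2]
    for _ in range(steps):
        # Spatially variant candidate: perturb one random local patch only.
        noise = np.zeros_like(adv)
        y = rng.integers(0, h - patch)
        x = rng.integers(0, w - patch)
        region = noise[y:y + patch, x:x + patch]
        noise[y:y + patch, x:x + patch] = rng.normal(0.0, sigma, size=region.shape)
        cand = np.clip(adv + noise, 0.0, 1.0)
        cand_label, cand_expl = query_model(cand)   # only the output is used
        score = explanation_similarity(cand_expl, target_explanation)
        # Accept if the explanation drifts toward the target but the
        # prediction survives: the prediction-explanation association is
        # broken, not the prediction itself.
        if cand_label == label and score > best:
            adv, best = cand, score
    return adv
```

In a hard-label setting, explanation_similarity could be as simple as token overlap between the returned sentence and the target; the key point is that the loop accepts only perturbations that change the explanation while keeping the predicted activity intact.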

Keywords:
Computer science, robustness, adversarial systems, artificial intelligence, black box, natural language, image, visualization, deep neural networks, white box, artificial neural networks, machine learning, natural language processing

Metrics

Cited By: 2
FWCI (Field-Weighted Citation Impact): 0.51
References: 77
Citation Normalized Percentile: 0.68

Topics

Adversarial Robustness in Machine Learning (Physical Sciences → Computer Science → Artificial Intelligence)
Explainable Artificial Intelligence (XAI) (Physical Sciences → Computer Science → Artificial Intelligence)
Anomaly Detection Techniques and Applications (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

BOOK CHAPTER

Makrut Attacks Against Black-Box Explanations

Achyut Hegde, Maximilian Noppel, Christian Wressnegger

Lecture Notes in Computer Science, Year: 2025, Pages: 289-293
JOURNAL ARTICLE

Generating Natural Language Attacks in a Hard Label Black Box Setting

Rishabh Maheshwary, Saket Maheshwary, Vikram Pudi

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2021, Vol: 35(15), Pages: 13525-13533