JOURNAL ARTICLE

Rethinking Feature Attribution for Robust Image Classification

Revista, ZenIA, 10

Year: 2025
Journal: Zenodo (CERN European Organization for Nuclear Research)
Publisher: European Organization for Nuclear Research

Abstract

Feature attribution methods aim to explain the predictions of deep learning models by identifying the input features that are most relevant to the model's decision. However, these methods are often sensitive to small perturbations in the input, leading to unstable and unreliable explanations, especially when models are deployed in real-world scenarios where robustness is paramount. This paper investigates the limitations of current feature attribution techniques in the context of robust image classification. We propose a novel approach that integrates adversarial training with feature attribution to generate more robust and faithful explanations. Our method, termed "Robust Attribution through Adversarial Perturbation" (RAAP), leverages adversarial examples to identify and mitigate attribution biases. We evaluate RAAP on several benchmark datasets and demonstrate that it produces feature attributions that are both more stable under input perturbations and more aligned with human perception. Furthermore, we show that RAAP can be used to improve the robustness of image classification models by identifying and correcting spurious correlations learned during training. Our results highlight the importance of considering robustness when evaluating and deploying feature attribution methods in safety-critical applications.
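The abstract does not give the RAAP algorithm itself, so what follows is only a generic sketch of the kind of stability check it alludes to, not the authors' method. It assumes a toy logistic-regression classifier on a flattened 8x8 input rather than a deep network. It computes a Gradient x Input attribution, perturbs the input with a single FGSM step (a standard adversarial attack swapped in here for illustration), and compares the top-k attributed features before and after. All function names (`grad_times_input`, `fgsm_perturb`, `top_k_overlap`) and parameter values (`eps=0.05`, `k=10`) are hypothetical choices made for this illustration.

```python
import numpy as np

# Hypothetical toy setup for illustration only; this is NOT the RAAP method.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(w, b, x):
    # d p(y=1 | x) / dx for a logistic model p = sigmoid(w . x + b)
    p = sigmoid(w @ x + b)
    return p * (1.0 - p) * w

def grad_times_input(w, b, x):
    # "Gradient x Input" attribution: elementwise product of gradient and input
    return input_gradient(w, b, x) * x

def fgsm_perturb(w, b, x, y, eps):
    # One FGSM step on the binary cross-entropy loss.
    # For a logistic model, dLoss/dx = (p - y) * w.
    p = sigmoid(w @ x + b)
    grad_loss = (p - y) * w
    return x + eps * np.sign(grad_loss)

def top_k_overlap(a, b, k):
    # Fraction of shared indices among the k largest |attribution| values
    top_a = set(np.argsort(-np.abs(a))[:k].tolist())
    top_b = set(np.argsort(-np.abs(b))[:k].tolist())
    return len(top_a & top_b) / k

rng = np.random.default_rng(0)
d = 64                               # a flattened 8x8 "image"
w = rng.normal(size=d)               # toy model weights
b = 0.0
x = rng.uniform(0.0, 1.0, size=d)    # pixel intensities in [0, 1]
y = 1.0                              # true label

attr_clean = grad_times_input(w, b, x)
# Keep the adversarial input in the valid pixel range
x_adv = np.clip(fgsm_perturb(w, b, x, y, eps=0.05), 0.0, 1.0)
attr_adv = grad_times_input(w, b, x_adv)

print(f"top-10 overlap: {top_k_overlap(attr_clean, attr_adv, 10):.2f}")
```

A top-k overlap near 1.0 means the most relevant features survive the perturbation; lower values indicate the kind of attribution instability the abstract describes. A rank-based comparison is used here because, for this toy model, a change in the predicted probability rescales every gradient component by the same factor, which would distort a raw-magnitude comparison but leaves the ranking unchanged.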

Keywords:
Robustness, Adversarial system, Spurious relationship, Attribution, Pattern recognition, Feature extraction, Feature, Contextual image classification
