Feature attribution methods aim to explain the predictions of deep learning models by identifying the input features that are most relevant to the model's decision. However, these methods are often sensitive to small perturbations in the input, leading to unstable and unreliable explanations, especially when models are deployed in real-world scenarios where robustness is paramount. This paper investigates the limitations of current feature attribution techniques in the context of robust image classification. We propose a novel approach that integrates adversarial training with feature attribution to generate more robust and faithful explanations. Our method, termed "Robust Attribution through Adversarial Perturbation" (RAAP), leverages adversarial examples to identify and mitigate attribution biases. We evaluate RAAP on several benchmark datasets and demonstrate that it produces feature attributions that are both more stable under input perturbations and more aligned with human perception. Furthermore, we show that RAAP can be used to improve the robustness of image classification models by identifying and correcting spurious correlations learned during training. Our results highlight the importance of considering robustness when evaluating and deploying feature attribution methods in safety-critical applications.
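The abstract does not spell out RAAP's procedure, but the stability criterion it evaluates can be made concrete. Below is a minimal PyTorch sketch that measures how much a gradient-based attribution changes under a one-step adversarial (FGSM) perturbation; the choice of input-gradient saliency and the FGSM attack are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

def saliency(model, x, y):
    # Input-gradient attribution: gradient of the true-class logit w.r.t. x.
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    logits[torch.arange(len(y)), y].sum().backward()
    return x.grad.detach()

def fgsm(model, x, y, eps=0.03):
    # One-step FGSM perturbation (assumed attack; RAAP's may differ).
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

def attribution_stability(model, x, y, eps=0.03):
    # Cosine similarity between clean and perturbed attributions;
    # values near 1 indicate explanations that are stable under perturbation.
    a_clean = saliency(model, x, y).flatten(1)
    a_adv = saliency(model, fgsm(model, x, y, eps), y).flatten(1)
    return F.cosine_similarity(a_clean, a_adv, dim=1)
```

A low average similarity on a held-out batch would flag the kind of attribution instability the abstract describes; the paper's evaluation may use a different attribution method or perturbation model.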
Jie Lei, Guoyu Yang, Shuaiwei Wang, Zunlei Feng, Ronghua Liang
Weitao Wan, Yuanyi Zhong, Tianpeng Li, Jiansheng Chen
Liang Ye, Shuai Lu, Rui Weng, Chengzhe Han, Ming Liu