The increasing complexity and widespread deployment of Artificial Intelligence (AI) models, particularly deep neural networks, necessitate robust and trustworthy interpretability mechanisms. Current explainable AI (XAI) techniques, such as saliency maps and feature importance methods, often provide explanations based on correlations rather than true causal relationships, leading to instability, susceptibility to adversarial perturbations, and limited actionable insights. This paper introduces the concept of Causal Saliency, a novel approach to AI interpretation that leverages counterfactual explanations to identify features whose causal perturbation minimally but effectively alters a model's prediction. By grounding explanations in a causal understanding of the data-generating process, Causal Saliency offers inherently more robust, faithful, and actionable interpretations than traditional associative methods. We propose a framework for generating causal counterfactuals based on structural causal models, which not only highlights critical features but also demonstrates *how* specific changes in these features causally lead to different outcomes. This methodology enhances transparency, fosters greater trust in AI systems, and provides clear pathways for debugging, improving fairness, and ensuring the reliability of AI applications in high-stakes domains.
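The abstract does not spell out the concrete search procedure, but the general idea of counterfactual-style saliency can be sketched as follows: find a minimal perturbation of the input whose change flips (or sufficiently shifts) the model's prediction, and read off the most-perturbed features as the salient ones. The sketch below is illustrative only; it uses a toy logistic model, a Wachter-style distance-penalized objective, and plain gradient descent, and it does not include the structural-causal-model constraints that distinguish the proposed Causal Saliency framework. All function and variable names are hypothetical.

```python
import numpy as np

# Illustrative stand-in for an arbitrary differentiable model f (hypothetical, not the paper's model).
rng = np.random.default_rng(0)
w, b = rng.normal(size=4), 0.1

def predict_proba(x):
    """Probability of the positive class under a toy logistic model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def counterfactual(x, target=0.0, lam=0.5, lr=0.1, steps=500):
    """Search for x' close to x whose prediction moves toward `target`.

    Minimizes (f(x') - target)^2 + lam * ||x' - x||^2 by gradient descent,
    a common formulation for counterfactual explanations. The causal variant
    described in the paper would additionally constrain the perturbation to
    be consistent with a structural causal model of the data.
    """
    x_cf = x.copy()
    for _ in range(steps):
        p = predict_proba(x_cf)
        # d/dx' of the prediction term: 2*(p - target) * p*(1-p) * w
        grad_pred = 2.0 * (p - target) * p * (1.0 - p) * w
        # d/dx' of the proximity penalty
        grad_dist = 2.0 * lam * (x_cf - x)
        x_cf -= lr * (grad_pred + grad_dist)
    return x_cf

x = rng.normal(size=4)
x_cf = counterfactual(x)
saliency = np.abs(x_cf - x)  # features changed most are flagged as most salient (illustrative)
print(predict_proba(x), predict_proba(x_cf), saliency)
```

The proximity weight `lam` trades off how far the counterfactual may drift from the original input against how strongly the prediction must change; in a causal formulation this purely distance-based constraint would be replaced or augmented by feasibility under the assumed causal graph.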