Numerous methods have been developed to address the critical need to understand the behavior of AI systems. Arguably, the most popular are model-agnostic local explanation techniques, which focus on examining model behavior for individual instances. While several implementations have been proposed, comparatively less attention has been paid to assessing the robustness of the generated explanations and their transferability to unseen data. More importantly, most robustness analyses have focused on differentiable models and deep neural networks. In this paper, we analyze the robustness of two well-known model-agnostic explanation methods, LIME and SHAP, from a methodological perspective and propose a criterion to measure the transferability of explanations from the training to the testing phase. The proposed methodology therefore validates explanations not only in terms of model performance but also in terms of their robustness during the learning process. We conclude that SHAP explanations transfer better than LIME explanations on sparse or low-density data sets, while the opposite holds for very dense data sets. We also observe no significant differences among the results obtained when different machine learning models are combined with these two model-agnostic techniques.
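To make the idea of train-to-test transferability of explanations concrete, the sketch below compares LIME and SHAP feature attributions computed on training instances against those computed on test instances. This is an illustrative assumption rather than the criterion defined in the paper: attributions are aggregated as the mean absolute weight per feature and compared with Spearman rank correlation; the data set, model, and sample sizes are arbitrary choices for the example.

```python
# Hedged sketch: train/test agreement of LIME and SHAP attributions as a rough
# proxy for explanation transferability. The aggregation (mean |attribution|
# per feature, compared via Spearman rank correlation) is an assumption made
# for illustration, not the paper's actual criterion.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from lime.lime_tabular import LimeTabularExplainer
import shap

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

n_feat = X.shape[1]
# Single-output wrapper (probability of the positive class) so the
# model-agnostic SHAP explainer returns one attribution vector per instance.
predict_pos = lambda data: model.predict_proba(data)[:, 1]

def lime_importance(X_explain):
    """Mean absolute LIME weight per feature over a set of instances."""
    explainer = LimeTabularExplainer(X_tr, mode="classification")
    agg = np.zeros(n_feat)
    for x in X_explain:
        exp = explainer.explain_instance(x, model.predict_proba,
                                         num_features=n_feat, labels=(1,))
        for idx, weight in exp.as_map()[1]:
            agg[idx] += abs(weight)
    return agg / len(X_explain)

def shap_importance(X_explain):
    """Mean absolute SHAP value per feature (model-agnostic KernelExplainer)."""
    explainer = shap.KernelExplainer(predict_pos, shap.sample(X_tr, 50))
    values = explainer.shap_values(X_explain, nsamples=100)
    return np.abs(values).mean(axis=0)

# Small subsamples keep the (expensive) model-agnostic explainers tractable.
train_sample, test_sample = X_tr[:20], X_te[:20]

for name, importance in [("LIME", lime_importance), ("SHAP", shap_importance)]:
    rho, _ = spearmanr(importance(train_sample), importance(test_sample))
    print(f"{name}: train/test rank correlation of attributions = {rho:.3f}")
```

A higher rank correlation would indicate that the features an explainer emphasizes during training remain influential on unseen data; how such agreement is actually quantified in the paper is defined by its proposed criterion.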