Bradley Gathers, Ian E. Nielsen, Keith W. Soules, Ozan Tekben, Ravi P. Ramachandran, Nidhal Bouaynaya, Hassan M. Fathallah-Shaykh, Ghulam Rasool
Trust is a critical factor in the safe and effective deployment of Artificial Intelligence (AI) models in essential tasks. With AI models increasingly being employed in domains such as self-driving cars, medicine, defense, and information technology, there is a pressing need to improve their explainability, trustworthiness, and interpretability. Feature Visualization (FV) is an approach that generates images to highlight the learned features of deep neural networks. However, numerous FV methods exist, and there is currently no standard framework for evaluating their effectiveness in improving model trust. This paper introduces a novel method, Integrating Activations to Evaluate Faithfulness (IntActEval), which quantitatively assesses FV methods by analyzing how faithfully and accurately their visualizations reflect the model itself. We examined five FV methods across seven convolutional neural network (CNN) models. The Vanilla (unregularized) and Gaussian Noise (regularized) FV techniques produced the most faithful explanations for all seven models, a result that was statistically significant on the tested data. In our CNN experiments, robustly trained models achieved the most plausible results. This paper provides a general guide to current FV methods and identifies the most reliable and effective techniques for enhancing the debugging and improvement of AI models.
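The abstract does not spell out the IntActEval algorithm. Purely as an illustration of what "integrating activations" could mean, the sketch below scores a feature-visualization image by averaging a target unit's activation along a straight-line interpolation path from a zero baseline to the FV image. Every name here (score_faithfulness, n_steps, the zero baseline, the toy CNN) is a hypothetical assumption, not the paper's actual metric or implementation.

```python
# Hypothetical sketch of an activation-integration faithfulness score.
# This is one plausible reading of "integrating activations", NOT the
# authors' IntActEval method; all names and choices are assumptions.
import torch
import torch.nn as nn

def score_faithfulness(model: nn.Module, fv_image: torch.Tensor,
                       target_unit: int, n_steps: int = 50) -> float:
    """Approximate the integral of a target unit's activation along a
    straight-line path from a zero baseline to the FV image.

    A higher integrated activation is read here as evidence that the
    visualization actually drives the unit it claims to explain."""
    model.eval()
    baseline = torch.zeros_like(fv_image)
    total = 0.0
    with torch.no_grad():
        for step in range(1, n_steps + 1):
            alpha = step / n_steps                       # interpolation coefficient
            x = baseline + alpha * (fv_image - baseline)
            logits = model(x.unsqueeze(0))               # add batch dimension
            total += logits[0, target_unit].item()
    return total / n_steps                               # Riemann-sum average

# Usage with a toy CNN and a random stand-in for an FV image:
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
fv = torch.rand(3, 32, 32)
print(score_faithfulness(toy_model, fv, target_unit=3))
```

In this reading, comparing the averaged path activation across FV methods (for the same model and unit) would give a relative faithfulness ranking; the paper's actual metric and statistical tests may differ.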
Yuya Asazuma, Kazuaki Hanawa, Kentaro Inui