Interpretability Based Neural Network Repair

Zuohui Chen; Jun Zhou; Youcheng Sun; Jingyi Wang; Qi Xuan; Xiaoniu Yang

doi:10.1145/3650212.3680330

JOURNAL ARTICLE

Year: 2024 Pages: 908-919

Abstract

Along with the prevalent use of deep neural networks (DNNs), concerns have been raised about security threats in DNNs, such as backdoors in the network. While neural network repair methods have been shown to be effective for fixing defects in DNNs, they have also been found to produce biased models, with imbalanced accuracy across different classes, or weakened adversarial robustness, allowing malicious attackers to trick the model by adding small perturbations. To address these challenges, we propose INNER, an INterpretability-based NEural Repair approach. INNER formulates the idea of neuron routing for identifying fault neurons, in which the interpretability technique model probe is used to evaluate each neuron’s contribution to the undesired behaviour of the neural network. INNER then optimizes the identified neurons to repair the neural network. We test INNER on three typical application scenarios: backdoor attacks, adversarial attacks, and wrong predictions. Our experimental results demonstrate that INNER can effectively repair neural networks while ensuring accuracy, fairness, and robustness. Moreover, the performance of other repair methods can also be improved by reusing the fault neurons found by INNER, which supports the generality of the proposed approach.
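
The abstract names two steps: scoring each neuron's contribution to the undesired behaviour with a probe, then optimizing only the identified neurons. The abstract does not specify the probe architecture, the scoring rule, or the optimization objective, so the following is only a minimal sketch under stated assumptions, not INNER's implementation. It assumes a logistic-regression probe trained on one layer's activations, ranks neurons by absolute probe weight, and restricts a gradient step to the selected rows of a weight matrix. All function names (`train_linear_probe`, `rank_suspect_neurons`, `masked_update`) and the synthetic data are hypothetical.

```python
import numpy as np

def train_linear_probe(acts, labels, lr=0.1, epochs=500):
    """Fit a logistic-regression probe on standardized activations.

    acts:   (n_samples, n_neurons) activations from one layer
    labels: (n_samples,) 1.0 if the input triggers the undesired behaviour
    """
    mu, sigma = acts.mean(axis=0), acts.std(axis=0) + 1e-8
    x = (acts - mu) / sigma
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability
        err = p - labels                        # gradient of log-loss w.r.t. logit
        w -= lr * (x.T @ err) / len(labels)
        b -= lr * err.mean()
    return w, b

def rank_suspect_neurons(acts, labels, top_k):
    """Assumed scoring rule: neurons with the largest |probe weight|."""
    w, _ = train_linear_probe(acts, labels)
    return np.argsort(-np.abs(w))[:top_k]

def masked_update(weights, grads, suspect, lr=0.01):
    """Apply one gradient step only to the rows of the suspect neurons."""
    mask = np.zeros(weights.shape[0], dtype=bool)
    mask[suspect] = True
    updated = weights.copy()
    updated[mask] -= lr * grads[mask]
    return updated

# Synthetic demo: neurons 3 and 7 respond to the "undesired" inputs.
rng = np.random.default_rng(0)
n_samples, n_neurons = 400, 16
labels = rng.integers(0, 2, size=n_samples).astype(float)
acts = rng.normal(size=(n_samples, n_neurons))
acts[:, 3] += 3.0 * labels
acts[:, 7] -= 2.0 * labels

suspect = rank_suspect_neurons(acts, labels, top_k=2)

# Stand-in weights and gradients; a real repair would use the model's own.
weights = np.ones((n_neurons, 4))
grads = np.ones((n_neurons, 4))
repaired = masked_update(weights, grads, suspect)
```

In this sketch the mask leaves every non-selected neuron unchanged, which is one simple way to limit a repair's side effects on the rest of the network.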

Keywords: Interpretability; Computer science; Artificial neural network; Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 0.64

Citation Normalized Percentile: 0.68

Topics

Adversarial Robustness in Machine Learning

Physical Sciences → Computer Science → Artificial Intelligence

Anomaly Detection Techniques and Applications

Physical Sciences → Computer Science → Artificial Intelligence

Explainable Artificial Intelligence (XAI)

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Neural Network Interpretability

A Convolutional Neural Network Reinforcement Framework Based on Neural Network Interpretability

Causality-based neural network repair

CICE: Neural Network Interpretability Based on Conceptual Interference Coefficients

Deep neural network compression through interpretability-based filter pruning