Abstract

With the widespread use of deep neural networks (DNNs), concerns have been raised about security threats to DNNs, such as backdoors in the network. While neural network repair methods have been shown to be effective at fixing defects in DNNs, they have also been found to produce biased models, with imbalanced accuracy across classes, or to weaken adversarial robustness, allowing malicious attackers to trick the model by adding small perturbations. To address these challenges, we propose INNER, an INterpretability-based NEural Repair approach. INNER formulates the idea of neuron routing for identifying fault neurons, in which the interpretability technique model probe is used to evaluate each neuron’s contribution to the undesired behaviour of the neural network. INNER then optimizes the identified neurons to repair the neural network. We test INNER on three typical application scenarios: backdoor attacks, adversarial attacks, and wrong predictions. Our experimental results demonstrate that INNER can effectively repair neural networks while ensuring accuracy, fairness, and robustness. Moreover, the performance of other repair methods can also be improved by reusing the fault neurons found by INNER, which supports the generality of the proposed approach.
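
The two stages described above (score neurons by their contribution to an undesired behaviour, then optimize only the flagged neurons) can be illustrated with a toy sketch. This is not the INNER algorithm: the abstract does not say how the model probe scores neurons, so the sketch substitutes plain ablation scoring. The tiny network, its weights, and the names `forward`, `fault_scores`, and `repair` are all hypothetical.

```python
import numpy as np

def forward(x, W1, W2):
    """Two-layer ReLU network: returns hidden activations and logits."""
    h = np.maximum(0.0, x @ W1)
    return h, h @ W2

def fault_scores(x_bad, bad_class, W1, W2):
    """Ablation score (an assumed stand-in for the model probe): how much
    zeroing each hidden neuron lowers the mean logit of the undesired class."""
    h, logits = forward(x_bad, W1, W2)
    base = logits[:, bad_class].mean()
    scores = np.empty(W1.shape[1])
    for j in range(W1.shape[1]):
        h_abl = h.copy()
        h_abl[:, j] = 0.0
        scores[j] = base - (h_abl @ W2)[:, bad_class].mean()
    return scores

def repair(x_bad, y_true, W1, W2, neurons, lr=0.1, steps=200):
    """Gradient descent on softmax cross-entropy, updating only the
    outgoing weights of the selected neurons."""
    W2 = W2.copy()
    mask = np.zeros_like(W2)
    mask[neurons, :] = 1.0
    for _ in range(steps):
        h, logits = forward(x_bad, W1, W2)
        z = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(z)
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y_true)), y_true] -= 1.0   # dLoss/dlogits
        W2 -= lr * mask * (h.T @ p / len(y_true))
    return W2

# Hypothetical toy model: input feature 2 acts as a backdoor trigger that
# hidden neuron 2 routes strongly to class 1.
W1 = np.array([[1.0, 0.0, 0.0, 0.5],
               [0.0, 1.0, 0.0, 0.5],
               [0.0, 0.0, 1.0, 0.0]])
W2 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 5.0],
               [0.1, 0.1]])
x_bad = np.array([[1.0, 0.0, 1.0], [2.0, 0.0, 1.0], [1.5, 0.0, 1.0]])
y_true = np.array([0, 0, 0])

scores = fault_scores(x_bad, bad_class=1, W1=W1, W2=W2)
fault = [int(np.argmax(scores))]          # keep the single top-scoring neuron
W2_fixed = repair(x_bad, y_true, W1, W2, fault)
print("fault neuron:", fault)
print("predictions after repair:", forward(x_bad, W1, W2_fixed)[1].argmax(axis=1))
```

Because only the flagged neuron's outgoing weights change, inputs that never activate that neuron produce the same logits before and after the update. That is one simple way to see why restricting the optimization to a few neurons can limit side effects on otherwise correct behaviour.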

Keywords:
Interpretability, Computer science, Artificial neural network, Artificial intelligence

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.64
Refs: 29
Citation Normalized Percentile: 0.68

Topics

Adversarial Robustness in Machine Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Explainable Artificial Intelligence (XAI)
Physical Sciences →  Computer Science →  Artificial Intelligence
