DISSERTATION

Towards deep neural networks robust to adversarial examples

Abstract

Deep learning has become the dominant approach to problems that require learning from data, such as recognizing objects or understanding natural language: if the data is the "nail", then deep learning is the "hammer". Nevertheless, state-of-the-art deep neural networks are vulnerable to small perturbations of their input. Recent experiments have shown that adding carefully crafted adversarial noise to an input produces an image that is visually indistinguishable from the original, yet the network misclassifies it with high confidence. These adversarially crafted modifications of the input, so-called adversarial examples, are "blind spots" of neural networks and are the main subject of this dissertation. In this thesis, we outline the problem of adversarial examples and present several partial solutions to it. The existence of adversarial examples has spurred significant interest in deep learning research, which can be broadly divided into research on attacks and research on defenses; we make original contributions to both. First, we establish a connection between a classifier's margin and its robustness. We generalize the margin-maximization objective of support vector machines (SVMs) to deep neural networks and prove that our formulation is equivalent to robust optimization. In subsequent work, we argue that, ideally, adversarial examples for a robust classifier should be indistinguishable from regular data. Unlike approaches based on robust optimization, we do not assume that the added noise leaves the label of the input unchanged. We formulate the problem of learning a robust classifier in the framework of generative adversarial networks (GANs): an auxiliary network, the adversary discriminator, is trained to distinguish regular from adversarial data, while the robust classifier is trained to classify the original inputs correctly and to fool the discriminator with its adversarial examples.
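The gradient-based perturbations described above can be sketched in a few lines. This is a minimal, illustrative example in the style of the fast gradient sign method, applied to a toy logistic-regression "network" with NumPy; the weights, data, and budget `eps` are all assumptions, not taken from the dissertation.

```python
import numpy as np

# Toy model: a fixed "trained" logistic-regression classifier.
rng = np.random.default_rng(0)
w = rng.normal(size=5)          # model weights
x = rng.normal(size=5)          # a clean input
y = 1.0                         # its true label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the cross-entropy loss w.r.t. the input x is (p - y) * w.
p = sigmoid(w @ x)
grad_x = (p - y) * w

# Perturb the input in the direction of the sign of the loss gradient,
# staying inside a small L-infinity ball of radius eps.
eps = 0.1
x_adv = x + eps * np.sign(grad_x)

# The perturbation is tiny in L-infinity norm, yet pushes the loss uphill:
# the model's confidence in the true label drops.
assert np.max(np.abs(x_adv - x)) <= eps + 1e-12
assert sigmoid(w @ x_adv) < p
```

For a deep network the closed-form gradient above is replaced by backpropagation through the loss, but the perturbation step is the same.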
Finally, accurately estimating a model's robustness is a challenging task: existing attack methods require multiple restarts or do not explicitly minimize the norm of the perturbation. To address these limitations, we propose a primal-dual proximal-gradient attack algorithm that is both fast and accurate: it directly solves the attacker's problem for any Lp-norm whose proximal operator can be computed in closed form. In summary, this thesis presents two defenses and one white-box attack. Future efforts should address the robustness of deep neural networks to unrestricted adversarial examples, provide strong theoretical guarantees on model performance, and verify model robustness to enable a comprehensive comparison of defenses.
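The abstract restricts the attack to Lp-norms whose proximal operator has a closed form. As a hedged illustration (not the dissertation's algorithm), the best-known such case is the L1 norm, whose proximal operator is element-wise soft-thresholding:

```python
import numpy as np

def prox_l1(v, t):
    """Closed-form proximal operator of t * ||.||_1 (soft-thresholding):
    prox(v)_i = sign(v_i) * max(|v_i| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Entries with magnitude below the threshold t are zeroed out;
# the rest are shrunk toward zero by t.
v = np.array([3.0, -0.5, 1.2, -2.0])
print(prox_l1(v, 1.0))
```

Inside a proximal-gradient attack, a step of this kind alternates with a gradient step on the misclassification loss, shrinking the perturbation's norm while keeping the input misclassified.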

Keywords:
Adversarial system, Deep neural networks, Artificial intelligence, Artificial neural network, Computer science, Deep learning, Machine learning

Metrics

Cited by: 22
FWCI (Field-Weighted Citation Impact): 0.00
References: 129
Citation Normalized Percentile: in top 1%, in top 10%
Topics

Adversarial Robustness in Machine Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Integrated Circuits and Semiconductor Failure Analysis
Physical Sciences →  Engineering →  Electrical and Electronic Engineering
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence

© 2026 ScienceGate Book Chapters — All rights reserved.