DISSERTATION

Towards deep neural networks robust to adversarial examples

Abstract

Deep learning has become the dominant approach to problems that require learning from data, such as recognizing objects or understanding natural language: if the data is the "nail", then deep learning is the "hammer". Nevertheless, state-of-the-art deep neural networks are vulnerable to small perturbations of their input. Recent experiments have shown that adding carefully crafted adversarial noise to an input produces an image that is visually indistinguishable from the original, yet the network misclassifies it with high confidence. These adversarially crafted modifications of the input, so-called adversarial examples, are "blind spots" of neural networks and are the main subject of this dissertation. In this thesis, we outline the problem of adversarial examples and present several partial solutions to it. The existence of adversarial examples has spurred significant interest in deep learning research, which can be broadly divided into research on attacks and research on defenses; we make original contributions to both. First, we establish a connection between a classifier's margin and its robustness. We generalize the margin-maximization objective of support vector machines (SVMs) to deep neural networks and prove that our formulation is equivalent to robust optimization. In subsequent work, we argue that, ideally, adversarial examples for a robust classifier should be indistinguishable from regular data. Unlike approaches based on robust optimization, we do not assume that the added noise leaves the label of the input unchanged. We formulate the problem of learning a robust classifier in the framework of generative adversarial networks (GANs): an auxiliary network, the adversary discriminator, is trained to distinguish regular from adversarial data, while the robust classifier is trained to classify the original inputs correctly and to fool the discriminator with its adversarial examples.
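The gradient-based perturbations described above can be sketched in a few lines. This is a minimal, illustrative example in the style of the fast gradient sign method, applied to a toy logistic-regression "network" with NumPy; the weights, data, and budget `eps` are all assumptions, not taken from the dissertation.

```python
import numpy as np

# Toy model: a fixed "trained" logistic-regression classifier.
rng = np.random.default_rng(0)
w = rng.normal(size=5)          # model weights
x = rng.normal(size=5)          # a clean input
y = 1.0                         # its true label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the cross-entropy loss w.r.t. the input x is (p - y) * w.
p = sigmoid(w @ x)
grad_x = (p - y) * w

# Perturb the input in the direction of the sign of the loss gradient,
# staying inside a small L-infinity ball of radius eps.
eps = 0.1
x_adv = x + eps * np.sign(grad_x)

# The perturbation is tiny in L-infinity norm, yet pushes the loss uphill:
# the model's confidence in the true label drops.
assert np.max(np.abs(x_adv - x)) <= eps + 1e-12
assert sigmoid(w @ x_adv) < p
```

For a deep network the closed-form gradient above is replaced by backpropagation through the loss, but the perturbation step is the same.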
Finally, accurately estimating a model's robustness is a challenging task: existing attack methods require multiple restarts or do not explicitly minimize the norm of the perturbation. To address these limitations, we propose a primal-dual proximal-gradient attack algorithm that is both fast and accurate: it directly solves the attacker's problem for any Lp-norm whose proximal operator can be computed in closed form. In summary, this thesis presents two defenses and one white-box attack. Future efforts should address the robustness of deep neural networks to unrestricted adversarial examples, provide strong theoretical guarantees on model performance, and verify model robustness to enable a comprehensive comparison of defenses.
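The abstract restricts the attack to Lp-norms whose proximal operator has a closed form. As a hedged illustration (not the dissertation's algorithm), the best-known such case is the L1 norm, whose proximal operator is element-wise soft-thresholding:

```python
import numpy as np

def prox_l1(v, t):
    """Closed-form proximal operator of t * ||.||_1 (soft-thresholding):
    prox(v)_i = sign(v_i) * max(|v_i| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Entries with magnitude below the threshold t are zeroed out;
# the rest are shrunk toward zero by t.
v = np.array([3.0, -0.5, 1.2, -2.0])
print(prox_l1(v, 1.0))
```

Inside a proximal-gradient attack, a step of this kind alternates with a gradient step on the misclassification loss, shrinking the perturbation's norm while keeping the input misclassified.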

Keywords:
Adversarial system, Deep neural networks, Artificial intelligence, Artificial neural network, Computer science, Deep learning, Machine learning

Metrics

Cited by: 22
FWCI (Field-Weighted Citation Impact): 0.00
References: 129
Citation Normalized Percentile: in top 1%, in top 10%
Topics

Adversarial Robustness in Machine Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Integrated Circuits and Semiconductor Failure Analysis
Physical Sciences →  Engineering →  Electrical and Electronic Engineering
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence

© 2026 ScienceGate Book Chapters — All rights reserved.