Deep learning classifiers are now known to have flaws in the representations of their classes. Adversarial attacks can find a human-imperceptible perturbation for a given image that will mislead a trained model. The most effective methods to defend against such attacks train on generated adversarial examples to learn their distribution. Previous work aimed to align original and adversarial image representations, as is done in domain adaptation, to improve robustness. Yet, they only partially align the representations, using approaches that do not reflect the geometry of the underlying space and distributions. In addition, it is difficult to accurately compare robustness between defended models. Until now, they have been evaluated using a fixed perturbation size. However, defended models may react differently to variations of this perturbation size. In this paper, the analogy with domain adaptation is taken a step further by exploiting optimal transport theory. We propose to use a loss between distributions that faithfully reflects the ground distance. This leads to SAT (Sinkhorn Adversarial Training), a more robust defense against adversarial attacks. We then propose to quantify more precisely the robustness of a model to adversarial attacks over a wide range of perturbation sizes using a different metric, the Area Under the Accuracy Curve (AUAC). We perform extensive experiments on both the CIFAR-10 and CIFAR-100 datasets and show that our defense is globally more robust than the state-of-the-art.
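To make the two contributions concrete, below is a minimal sketch, not the authors' implementation, of (i) an entropic-regularized Sinkhorn divergence that could serve as the loss aligning clean and adversarial feature distributions, and (ii) the AUAC metric obtained by integrating accuracy over a range of perturbation sizes. The function names, the squared-Euclidean ground cost, the hyperparameters (epsilon, n_iters), and the normalization of AUAC by the perturbation range are illustrative assumptions, not details taken from the paper.

import numpy as np
import torch


def sinkhorn_divergence(x, y, epsilon=0.05, n_iters=100):
    # Debiased entropic optimal-transport cost between two batches of feature
    # vectors (e.g. clean vs. adversarial representations), uniform weights.
    def ot_cost(feats_a, feats_b):
        n, m = feats_a.shape[0], feats_b.shape[0]
        cost = torch.cdist(feats_a, feats_b) ** 2     # squared Euclidean ground cost
        a = torch.full((n,), 1.0 / n, device=feats_a.device)
        b = torch.full((m,), 1.0 / m, device=feats_b.device)
        K = torch.exp(-cost / epsilon)                # Gibbs kernel
        u, v = torch.ones_like(a), torch.ones_like(b)
        for _ in range(n_iters):                      # Sinkhorn fixed-point updates
            u = a / (K @ v + 1e-16)
            v = b / (K.t() @ u + 1e-16)
        plan = u[:, None] * K * v[None, :]            # approximate transport plan
        return (plan * cost).sum()

    # S(x, y) = OT(x, y) - 0.5 * (OT(x, x) + OT(y, y)) removes the entropic bias.
    return ot_cost(x, y) - 0.5 * (ot_cost(x, x) + ot_cost(y, y))


def auac(epsilons, accuracies):
    # Area Under the Accuracy Curve: integrate accuracy over the perturbation
    # budgets (trapezoidal rule) and normalize by the budget range so the
    # score stays in [0, 1]; the normalization is an assumption made here.
    eps = np.asarray(epsilons, dtype=float)
    acc = np.asarray(accuracies, dtype=float)
    return np.trapz(acc, eps) / (eps[-1] - eps[0])

In a training loop, such a divergence between clean and adversarial batch embeddings would be added to the usual classification loss on adversarial examples; for evaluation, accuracies measured at increasing perturbation sizes (including 0 for clean accuracy) are passed to auac.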