Adversarial attacks have demonstrated the vulnerability of deep neural networks (DNNs), raising considerable security concerns. Existing attack methods require either prior knowledge of the victim DNN and ground-truth labels or frequent model querying. These requirements are often infeasible or time-consuming, casting doubt on whether such attacks can be launched in real-world scenarios. To address this, we propose a universally strict black-box attack that generates adversarial samples using only unlabeled data, reducing the reliance on external information such as the victim model, its training process, and ground-truth labels. Specifically, we first learn a latent manifold via contrastive learning. We then propose a novel universal adversarial loss that obtains adversaries directly in the latent space, exploiting the dissimilarity between samples to craft perturbations without access to labels or decision boundaries. Moreover, we introduce a cluster-based selection of negative samples to improve the attack's effectiveness. Evaluated against baseline models, the universally strict black-box attack reaches an average fooling rate of 57.93%, on par with transfer-based black-box attacks. Our method exposes the threat of adversarial attacks under more practical conditions and could serve as a new benchmark for assessing the robustness of DNNs.
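As a rough illustration of the label-free objective described above, the following PyTorch-style sketch perturbs an input so that its embedding on a contrastively learned latent manifold moves away from the clean (anchor) embedding and toward dissimilar negative cluster centers. The function name, encoder interface, step sizes, and loss weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def strict_black_box_attack(encoder, x, negative_centers,
                            eps=8/255, alpha=2/255, steps=10):
    """Craft perturbations using only unlabeled data and a frozen
    contrastive encoder (no victim model, labels, or queries).

    encoder          : contrastive encoder mapping inputs to embeddings
    x                : clean input batch, shape (B, C, H, W)
    negative_centers : cluster centers of dissimilar samples, shape (K, D)
    """
    with torch.no_grad():
        z_clean = F.normalize(encoder(x), dim=1)        # anchor embeddings

    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        z_adv = F.normalize(encoder(x + delta), dim=1)
        # Push the adversarial embedding away from its clean anchor ...
        away_from_anchor = F.cosine_similarity(z_adv, z_clean, dim=1).mean()
        # ... and pull it toward the nearest dissimilar (negative) cluster center.
        sim_to_negatives = z_adv @ negative_centers.t()  # (B, K)
        toward_negatives = sim_to_negatives.max(dim=1).values.mean()
        loss = away_from_anchor - toward_negatives       # minimizing this increases dissimilarity
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()           # signed gradient descent step
            delta.clamp_(-eps, eps)                      # keep the perturbation bounded
            delta.grad.zero_()
    return (x + delta).detach()
```

The sketch only requires gradients of the surrogate encoder, never of the victim model, which is what distinguishes the strict black-box setting from query- or transfer-based attacks.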