Meng Meng, Tianzhu Zhang, Zhe Zhang, Yongdong Zhang, Feng Wu
Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels, which makes it more scalable and practical than fully supervised methods. However, without pixel-level supervision, existing methods tend to generate rough localization maps, which hinders localization performance. To alleviate this problem, we propose an adversarial transformer network (ATNet), which aims to obtain a well-learned localization model with pixel-level pseudo labels. The proposed ATNet has two main merits. First, we design an object transformer (G) that generates localization maps and pseudo labels effectively and dynamically, and a part transformer (D) that accurately discriminates detailed local differences between localization maps and pseudo labels. Second, we train G and D via an adversarial process, in which G learns to generate more accurate localization maps that approach the pseudo labels in order to fool D. To the best of our knowledge, this is the first work to explore transformers with adversarial training to obtain a well-learned localization model for WSOL. Extensive experiments with four backbones on two standard benchmarks demonstrate that ATNet performs favorably against state-of-the-art WSOL methods. In addition, our adversarial training provides higher robustness against adversarial attacks.
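The adversarial process between G and D can be illustrated with a minimal sketch. It assumes the standard binary cross-entropy GAN objective, which the abstract does not confirm as the loss used in ATNet. The function names and the idea of D emitting one "real" probability per local part are hypothetical choices made for illustration only.

```python
import math

def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one probability and a 0/1 target."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def mean(values):
    return sum(values) / len(values)

def discriminator_loss(d_on_pseudo, d_on_maps):
    """Assumed D objective: score pseudo labels as real (1), G's maps as fake (0).

    Both arguments are lists of D's output probabilities, one per local part.
    """
    real_term = mean([bce(p, 1.0) for p in d_on_pseudo])
    fake_term = mean([bce(p, 0.0) for p in d_on_maps])
    return real_term + fake_term

def generator_loss(d_on_maps):
    """Assumed G objective: make D score the localization maps as real (1)."""
    return mean([bce(p, 1.0) for p in d_on_maps])

# Hypothetical D outputs for two local parts.
fooled = generator_loss([0.9, 0.9])      # D believes the maps are pseudo labels
not_fooled = generator_loss([0.1, 0.1])  # D rejects the maps
print(fooled < not_fooled)
```

Under this assumed objective, G's loss falls as D assigns its localization maps a higher "real" probability, which is the sense in which G is pushed toward the pseudo labels to fool D. In practice the two losses would be minimized in alternation, one update for D and one for G.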
Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Shuicheng Yan, Thomas S. Huang
Sabrina Narimene Benassou, Wuzhen Shi, Feng Jiang
Shakeeb Murtaza, Marco Pedersoli, Aydin Sarraf, Éric Granger
Fu-Cheng Pan, BeiLei Bian, BinXu Wang, Yueping Yang, Xiaoming Ju