Fine-grained visual classification (FGVC) is a challenging task because annotating fine-grained subcategories is expensive. To build large-scale datasets at low manual cost, learning FGVC models directly from web images has attracted broad attention. However, web datasets have two characteristics that must be addressed: 1) noisy images; and 2) a large proportion of hard examples. In this paper, we propose a simple yet effective approach for handling noisy images and hard examples during training. Our method is purely web-supervised. Extensive experiments on three commonly used fine-grained datasets demonstrate that our approach substantially outperforms state-of-the-art web-supervised methods. The data and source code of this work are available at: https://github.com/NUST-Machine-Intelligence-Laboratory/WSNFG.
Chuanyi Zhang, Yazhou Yao, Huafeng Liu, Guo-Sen Xie, Xiangbo Shu, Tianfei Zhou, Zheng Zhang, Fumin Shen, Zhenmin Tang