Peipei Zhao, Siyan Yang, Wei Ding, Ruyi Liu, Wentian Xin, Xiangzeng Liu, Qiguang Miao
Fine-grained visual classification (FGVC) is challenging because it requires distinguishing subcategories within the same super-category. Recent works mainly use attention-based methods to localize discriminative image regions and capture subtle inter-class differences. However, within a single layer, most attention-based works only consider large-scale attention blocks that are the same size as the feature maps, and they ignore small-scale attention blocks that are smaller than the feature maps. Exploiting small local regions is important for distinguishing subcategories. In this work, a novel multi-scale attention network (MSANet) is proposed to capture large and small regions at the same layer for fine-grained visual classification. Specifically, a novel multi-scale attention layer (MSAL) is proposed, which generates multiple groups in each feature map to capture discriminative regions at different scales. The groups based on large-scale regions exploit global features, and the groups based on small-scale regions extract subtle local features. Then, a simple feature fusion strategy integrates the global features and the subtle local features to mine information that is more conducive to FGVC. Comprehensive experiments on the Caltech-UCSD Birds-200-2011 (CUB), FGVC-Aircraft (AIR), and Stanford Cars (Cars) datasets show that our method achieves competitive performance, which demonstrates its effectiveness.
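The idea of attending at several scales within one layer can be illustrated with a small NumPy sketch. It is not the authors' MSAL: the abstract does not specify how groups are formed, how attention is computed, or how features are fused. The sketch assumes that channels are split evenly into groups, that each group uses a softmax over non-overlapping windows of a given size (a window as large as the feature map gives the large-scale case), and that fusion is plain concatenation of pooled group features. The names `block_softmax_attention` and `multi_scale_attention` are hypothetical.

```python
import numpy as np


def block_softmax_attention(group, block):
    """Spatial attention normalized inside each block x block window.

    group: array of shape (channels, height, width).
    Returns an (height, width) map whose weights sum to 1 per window.
    Assumption: saliency is the channel-wise mean of the group.
    """
    _, h, w = group.shape
    assert h % block == 0 and w % block == 0, "block must divide the map size"
    saliency = group.mean(axis=0)
    # Rearrange (h, w) into windows: (rows of windows, cols of windows, block*block).
    windows = saliency.reshape(h // block, block, w // block, block)
    windows = windows.transpose(0, 2, 1, 3).reshape(h // block, w // block, -1)
    # Numerically stable softmax within each window.
    windows = windows - windows.max(axis=-1, keepdims=True)
    weights = np.exp(windows)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Undo the window rearrangement to get back an (h, w) attention map.
    weights = weights.reshape(h // block, w // block, block, block)
    return weights.transpose(0, 2, 1, 3).reshape(h, w)


def multi_scale_attention(features, blocks):
    """Apply one attention scale per channel group, then fuse by concatenation.

    features: array of shape (channels, height, width).
    blocks: one window size per group; len(blocks) must divide channels.
    """
    _, h, w = features.shape
    groups = np.split(features, len(blocks), axis=0)
    pooled = []
    for group, block in zip(groups, blocks):
        attn = block_softmax_attention(group, block)
        attended = group * attn[None, :, :]
        # Average the per-window weighted sums so every scale is comparable.
        num_windows = (h // block) * (w // block)
        pooled.append(attended.sum(axis=(1, 2)) / num_windows)
    # Assumed "simple fusion": concatenate global and local descriptors.
    return np.concatenate(pooled)


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 4))
descriptor = multi_scale_attention(features, blocks=(4, 2))
print(descriptor.shape)
```

With `blocks=(4, 2)` on a 4x4 map, the first group attends over the whole map (global features) and the second attends within four 2x2 windows (local features), which mirrors the large-scale versus small-scale distinction drawn in the abstract.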
Yaqing Hou, Wenkai Zhang, Dongsheng Zhou, Hongwei Ge, Qiang Zhang, Xiaopeng Wei
Rujia Li, Junya Liu, Zhen Yang, Xin Zhou, Zhijian Yin
An Chen, Xiaodong Wang, Zhiqiang Wei, Ke Zhang, Lei Huang
Kaifeng Ding, Cungeng Yang, Chengzhuan Yang, Zhonglong Zheng
Fan Zhang, Meng Li, Guisheng Zhai, Yizhao Liu