Fine-grained image classification (FGIC) aims to identify subtle visual differences among subcategories, which is challenging because of the small inter-class variance. Existing methods recognize subcategories mainly by locating discriminative parts, which lie in the regions with high responses in deep feature maps. However, these high-response regions correspond to large receptive fields in the input image, so subtle visual differences among subcategories cannot be captured precisely. In this paper, we propose a novel Cross-Granularity Fusion Network (CGFN), which excavates subtle yet discriminative granularity features within each part and captures potential interactions among granularity features to build powerful part feature representations. The CGFN consists of two modules. First, a Multi-Granularity Proposal (MGP) module locates diverse and discriminative parts and focuses on context-complementary granularities across different hierarchies within each part. Second, a Cross-Granularity Fusion (CGF) module fuses the granularity features to acquire robust part features for the final classification. We conduct a series of experiments on publicly available datasets, namely CUB-200-2011, Stanford Cars, and FGVC-Aircraft, and the experimental results demonstrate that the CGFN achieves state-of-the-art performance.
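The observation that a high response in a deep feature map covers a large area of the input image can be checked with the standard receptive-field recurrence for stacked convolution or pooling layers. The sketch below is only an illustration of that arithmetic: the function name `receptive_field` and the layer stack are hypothetical and are not taken from the CGFN architecture.

```python
def receptive_field(layers):
    """Return (receptive field size, cumulative stride) in input pixels.

    layers: iterable of (kernel_size, stride) tuples, ordered from input to output.
    Dilation and padding are ignored in this simplified sketch.
    """
    rf, jump = 1, 1  # a single input pixel sees itself; adjacent units are 1 pixel apart
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input-space steps
        jump *= stride             # striding spreads adjacent output units further apart
    return rf, jump


# Hypothetical backbone (an assumption, not the network used in the paper):
# a 7x7 stride-2 stem followed by alternating 3x3 layers with stride 2 and 1.
stack = [(7, 2), (3, 2), (3, 1), (3, 2), (3, 1), (3, 2), (3, 1), (3, 2)]

rf, jump = receptive_field(stack)
print(f"one deep-feature location covers {rf}x{rf} input pixels (stride {jump})")
```

Under these assumed layers, a single location in the final feature map spans a window of more than one hundred pixels per side, which is consistent with the argument that high-response regions alone are too coarse to isolate subtle differences.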
Jiabao Wang, Yang Li, Hang Li, Xun Zhao, Rui Zhang, Zhuang Miao
Shenghe Wu, Jun Hu, Chen Sun, Fujin Zhong, Qinghua Zhang, Guoyin Wang
Yang Xu, Shanshan Wu, Biqi Wang, Ming-Hsuan Yang, Zebin Wu, Yazhou Yao, Zhihui Wei
Zhiwen Zheng, Juxiang Zhou, Jianhou Gan, Sen Luo, Wei Gao
Ying Yu, Hong Tang, Jin Qian, Zhiliang Zhu, Zhen Cai, Jingqin Lv