Existing weakly supervised fine-grained image recognition (WFGIR) methods usually pick out the discriminative regions directly from the high-level feature maps. We find that, because a Convolutional Neural Network stacks local receptive fields, discriminative regions diffuse in the high-level feature maps, which leads to inaccurate discriminative region localization. In this paper, we propose an end-to-end Discriminative Feature-oriented Gaussian Mixture Model (DF-GMM) to address the problem of discriminative region diffusion and find better fine-grained details. Specifically, DF-GMM consists of 1) a low-rank representation mechanism (LRM), which learns a set of low-rank discriminative bases with a Gaussian Mixture Model (GMM) to accurately select discriminative details and filter out more irrelevant information in high-level semantic feature maps, and 2) a low-rank representation reorganization mechanism (LR2M), which restores the spatial information of the low-rank discriminative bases to reconstruct the low-rank feature maps. By recovering the low-rank discriminative bases into the same embedding space as the high-level feature maps, LR2M alleviates the discriminative region diffusion problem in the high-level feature maps, so that discriminative regions can be located more precisely on the new low-rank feature maps. Extensive experiments verify that DF-GMM yields the best performance under the same settings as the most competitive approaches on the CUB-Bird, Stanford-Cars, and FGVC Aircraft datasets.
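The two mechanisms described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a simplified mixture with isotropic Gaussians, equal priors, and a fixed bandwidth `sigma`, and the function name `low_rank_reconstruct` and all of its parameters are hypothetical. The EM loop stands in for learning a small set of bases (the LRM idea), and the final product of responsibilities and bases stands in for mapping those bases back onto spatial positions (the LR2M idea).

```python
import numpy as np


def low_rank_reconstruct(features, num_bases=4, num_iters=10, sigma=1.0, seed=0):
    """Simplified EM-style sketch (hypothetical helper, not the paper's code).

    features: (N, C) array, one C-dimensional vector per spatial position
              (N = H * W for a flattened feature map).
    Returns (bases, resp, reconstructed) with shapes (K, C), (N, K), (N, C).
    """
    rng = np.random.default_rng(seed)
    n, _ = features.shape
    # Initialise the K bases from randomly chosen feature vectors.
    bases = features[rng.choice(n, size=num_bases, replace=False)].copy()

    for _ in range(num_iters):
        # E-step: soft assignment of each position to each basis, assuming
        # isotropic Gaussians with equal priors (a simplifying assumption).
        sq_dist = ((features[:, None, :] - bases[None, :, :]) ** 2).sum(axis=2)
        logits = -sq_dist / (2.0 * sigma ** 2)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: each basis becomes the responsibility-weighted mean.
        weights = resp.sum(axis=0)
        bases = (resp.T @ features) / (weights[:, None] + 1e-8)

    # Reorganisation: project the bases back onto the spatial positions.
    # The result has rank at most K, since it is an (N, K) @ (K, C) product.
    reconstructed = resp @ bases
    return bases, resp, reconstructed


if __name__ == "__main__":
    # Toy feature map with C=16 channels on a 7x7 grid (random data).
    fmap = np.random.default_rng(1).normal(size=(16, 7, 7))
    flat = fmap.reshape(16, -1).T  # (49, 16)
    bases, resp, recon = low_rank_reconstruct(flat, num_bases=4)
    low_rank_map = recon.T.reshape(16, 7, 7)  # back to (C, H, W)
    print(bases.shape, resp.shape, low_rank_map.shape)
```

Because the reconstruction is a product of an (N, K) matrix and a (K, C) matrix, its rank cannot exceed K, which is the sense in which the rebuilt feature map is low-rank in this sketch.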
Zhihui Wang, Shijie Wang, Pengbo Zhang, Haojie Li, Wei Zhong, Jianjun Li
Xiangteng He, Yuxin Peng, Junjie Zhao
Chenxi Lei, Linfeng Jiang, Jingshen Ji, Weilin Zhong, Huilin Xiong
Tiantian Yan, Shijie Wang, Zhihui Wang, Haojie Li, Zhongxuan Luo
Zhuhui Wang, Shijie Wang, Haojie Li, Zhi Dou, Jianjun Li