Bo Zhao, Xiao Wu, Jiashi Feng, Qiang Peng, Shuicheng Yan
Fine-grained object classification is a challenging task due to the subtle inter-class differences and large intra-class variation. Recently, visual attention models have been applied to automatically localize the discriminative regions of an image to better capture critical differences, and have demonstrated promising performance. However, without considering diversity in the attention process, most existing attention models perform poorly in classifying fine-grained objects. In this paper, we propose a diversified visual attention network (DVAN) to address the problem of fine-grained object classification, which substantially relieves the dependency on strongly-supervised information for learning to localize discriminative regions compared with attention-less models. More importantly, DVAN explicitly pursues diversity of attention and is able to gather discriminative information to the maximal extent. Multiple attention canvases are generated to extract convolutional features for attention. An LSTM recurrent unit is employed to learn the attentiveness and discrimination of the attention canvases. The proposed DVAN has the ability to attend to the object from coarse to fine granularity, and a dynamic internal representation for classification is built up by incrementally combining information from different locations and scales of the image. Extensive experiments conducted on the CUB-2011, Stanford Dogs and Stanford Cars datasets demonstrate that the proposed diversified visual attention network achieves competitive performance compared to state-of-the-art approaches, without using any prior knowledge, user interaction or external resources in training or testing.
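The abstract describes the core mechanism: several attention canvases (crops or scales of the image) each yield convolutional features, soft attention pools each canvas, and the pooled features are combined incrementally into one representation. The following NumPy sketch illustrates only that high-level idea; all function names, shapes, and the running-mean combination step are assumptions for illustration (the paper combines canvases with an LSTM), not the authors' implementation.

```python
import numpy as np

def soft_attention_pool(feature_map, query):
    """Pool an (H*W, D) feature map via softmax attention scored against `query`."""
    scores = feature_map @ query                     # one score per spatial location
    scores = scores - scores.max()                   # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over locations
    return weights @ feature_map                     # (D,) attended feature vector

def combine_canvases(canvas_features, query):
    """Incrementally combine attended features across attention canvases.

    A running mean stands in for the paper's LSTM update, purely to show
    how a representation can be built up canvas by canvas.
    """
    state = np.zeros_like(query, dtype=float)
    for t, fmap in enumerate(canvas_features, start=1):
        attended = soft_attention_pool(fmap, query)
        state = state + (attended - state) / t       # running-mean update
    return state

# Toy usage: three hypothetical 7x7 canvases with 8-dim features.
rng = np.random.default_rng(0)
canvases = [rng.standard_normal((49, 8)) for _ in range(3)]
query = rng.standard_normal(8)
representation = combine_canvases(canvases, query)
print(representation.shape)  # (8,)
```

The running-mean update is deliberately the simplest stateful combiner; swapping it for a recurrent cell recovers the incremental, multi-scale aggregation the abstract describes.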
Daoyuan Chen, Lei Fan, Shuangshuang Wang, Xiaofan Yu, Bin Kang
Chuanbin Liu, Hongtao Xie, Zheng-Jun Zha, Lingyun Yu, Zhineng Chen, Yongdong Zhang
Yuxin Peng, Xiangteng He, Junjie Zhao
Rujia Li, Junya Liu, Zhen Yang, Xin Zhou, Zhijian Yin