Fine-grained image retrieval aims to find images that belong to the same subcategory of a metaclass as a given query. The task is challenging because of large intra-class variation and small inter-class differences, and because many existing methods fail to capture the discriminative local features needed to distinguish visually similar subcategories. This paper proposes a multi-granularity attentional learning approach that adaptively focuses on the most discriminative regions and features while suppressing less informative ones. Specifically, three collaborative attention modules are designed: channel attention, which adaptively recalibrates channel-wise feature responses; spatial attention, which highlights salient areas; and part attention, which localises key object parts. Extensive experiments on the CUB-200-2011 dataset demonstrate that the method significantly outperforms the baseline and achieves state-of-the-art retrieval performance. Ablation studies and visualisations further validate the effectiveness and complementarity of the different attention modules.
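The channel and spatial attention modules mentioned above can be illustrated with a minimal NumPy sketch, assuming a squeeze-and-excitation-style channel gate and a pooled spatial gate; the weights here are random placeholders, not the paper's trained parameters, and the function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, reduction=2):
    # Squeeze: global average pool over spatial dims -> one value per channel
    c = x.shape[0]
    squeezed = x.mean(axis=(1, 2))                          # (C,)
    # Excitation: two-layer bottleneck with random (untrained) weights
    w1 = rng.standard_normal((c, c // reduction))
    w2 = rng.standard_normal((c // reduction, c))
    gates = sigmoid(np.maximum(squeezed @ w1, 0.0) @ w2)    # per-channel weights in (0, 1)
    # Recalibrate: rescale each channel by its gate
    return x * gates[:, None, None]

def spatial_attention(x):
    # Aggregate across channels (mean and max), then gate each spatial location
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])      # (2, H, W)
    gate = sigmoid(pooled.mean(axis=0))                     # (H, W), stand-in for a learned conv
    return x * gate[None, :, :]

feat = rng.standard_normal((4, 8, 8))   # toy feature map: 4 channels, 8x8 spatial grid
out = spatial_attention(channel_attention(feat))
print(out.shape)  # (4, 8, 8): attention reweights features without changing shape
```

In a real network the bottleneck weights and the spatial gate would be learned convolutional layers, and part attention would additionally crop or mask localised object regions before pooling.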
Wei Chen, Haoyang Xu, Nan Pu, Yu Liu, Mingrui Lao, Weiping Wang, Li Liu, Michael S. Lew