In fine-grained visual classification, high similarity between subcategories and low similarity within each subcategory make classification difficult. Most existing methods feed a single image into the model and classify it by locating discriminative regions and learning fine-grained features, ignoring the contrastive information between images, which also aids classification. We propose a method named CIA-Net that includes: (1) a channel interaction structure, which uses a bilinear operation to obtain the channel correlation between images and fuses it with the original features to extract complementary features; (2) an attention boosting and suppression module, which takes the channel feature with the highest weight as an attention map to boost features and suppress the most salient regions of the image, guiding the model to learn more fine-grained discriminative features. CIA-Net is trained end to end without relying on extra bounding boxes or part annotations. Experiments on three benchmark datasets (CUB-200-2011, FGVC-Aircraft, and Stanford Cars) show that CIA-Net achieves higher classification accuracy.
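The two components described above can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's implementation: the function names, the use of negated correlation to emphasize complementary channels, and the mean-activation rule for picking the highest-weight channel are all assumptions for illustration.

```python
import numpy as np

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_interaction(x1, x2):
    # Sketch of the channel interaction structure: a bilinear operation
    # between two images' feature maps (each of shape C x H x W) gives a
    # C x C channel-correlation matrix; weighting the partner's channels
    # by it yields complementary features fused with the original ones.
    c, h, w = x1.shape
    f1 = x1.reshape(c, h * w)
    f2 = x2.reshape(c, h * w)
    corr = f1 @ f2.T                       # (C, C) bilinear channel correlation
    weights = softmax(-corr, axis=1)       # assumption: favor dissimilar (complementary) channels
    comp = weights @ f2                    # complementary features from the other image
    return (f1 + comp).reshape(c, h, w)    # fuse with the original features

def boost_and_suppress(x):
    # Sketch of the attention boosting and suppression module: the channel
    # with the largest mean activation (assumed "highest weight") serves as
    # the attention map; boosting multiplies features by it, suppression
    # masks its most salient location so other regions must be learned.
    idx = x.mean(axis=(1, 2)).argmax()
    attn = x[idx]                          # (H, W) attention map
    boosted = x * attn
    mask = (attn < attn.max()).astype(x.dtype)
    suppressed = x * mask
    return boosted, suppressed
```

Both functions operate on single-image feature maps; in practice they would run on mini-batches inside a CNN backbone and be trained jointly with the classifier.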