Adu Asare Baffour, Zhen Qin, Yong J. Wang, Zhiguang Qin, Kim-Kwang Raymond Choo
The underlying task in fine-grained image recognition is to capture both inter-class and intra-class discriminative features. Existing methods generally rely on auxiliary data to guide the network, or on a complex network comprising multiple sub-networks. These approaches have two significant drawbacks: (1) auxiliary data such as bounding boxes require expert knowledge and expensive annotation; (2) multiple sub-networks make the architecture complex and require complicated or multi-step training. We propose an end-to-end Spatial Self-Attention Network (SSANet) comprising a spatial self-attention (SSA) module and a self-attention distillation (Self-AD) technique. The SSA module encodes contextual information into local features, improving the intra-class representation. The Self-AD technique then distills knowledge from the SSA module to a primary feature map, obtaining the inter-class representation. Accumulating the classification losses from these two modules enables the network to learn both inter-class and intra-class features in a single training step. Experimental findings demonstrate that SSANet is effective and achieves competitive performance.
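The two components named in the abstract can be sketched numerically. This is a minimal illustration, not SSANet itself: the projection names (`Wq`, `Wk`, `Wv`), the residual connection, and the use of a mean-squared-error distillation term are assumptions for the sketch; the paper's exact formulation may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(feat, Wq, Wk, Wv):
    """Encode contextual information into local features (sketch).

    feat:        (H*W, C) flattened spatial feature map.
    Wq, Wk, Wv:  (C, C) learned projections (hypothetical names).

    Each spatial position aggregates features from every other
    position, weighted by query-key similarity; a residual
    connection preserves the original local feature.
    """
    q, k, v = feat @ Wq, feat @ Wk, feat @ Wv
    attn = softmax(q @ k.T / np.sqrt(feat.shape[1]))  # (HW, HW)
    return feat + attn @ v

def self_ad_loss(primary_feat, ssa_feat):
    """Self-attention distillation term (sketch): pull the primary
    feature map toward the attention-refined one via MSE. The
    actual Self-AD objective may use a different divergence."""
    return np.mean((primary_feat - ssa_feat) ** 2)

# Toy usage: a 4x4 spatial grid with 8 channels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
out = spatial_self_attention(feat, Wq, Wk, Wv)
loss = self_ad_loss(feat, out)
```

In training, this distillation term would be accumulated with the classification losses of the two branches so both representations are learned in one pass.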
Guangyu Zhao, Zhenlun Sun, Yahui Liu
Lei Huang, An Chen, Xiaodong Wang, Leon Bevan Bullock, Zhiqiang Wei
Hao Liu, Shenglan Liu, Lin Feng, Lianyu Hu, Xiang Li, Heyu Fu
Wei Shan, Dan Huang, Jiangtao Wang, Feng Zou, Suwen Li
Mingjie Huang, Xiyan Sun, Yuanfa Ji, Xiaomao Chen, Suqing Yan