Hai Liu, Cheng Zhang, Bochen Xie, Tingting Liu, Qingsong Xu, Youfu Li
Fine-grained bird image recognition aims at accurate bird image classification and is a fundamental task in robot vision tracking. Because the surveillance and conservation of endangered birds are of great significance for protecting them from extinction, automated approaches are needed to facilitate bird surveillance. In this work, we propose a novel method for bird surveillance based on robot vision tracking. It uses an affinity relation-aware model named TBNet, which combines CNN and Transformer architectures with a novel feature selection (FS) module. Specifically, the CNN is employed to extract superficial information, the Transformer is utilized to exploit abstract semantic affinity relations, and the FS module is introduced to reveal discriminative features. Comprehensive experiments demonstrate that TBNet achieves state-of-the-art performance on the CUB-200-2011 dataset (91.0%) and the NABirds dataset (90.9%).
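The abstract does not specify how the FS module works. As a rough illustration of what a feature selection step can look like in this kind of pipeline, the sketch below keeps the k token features with the highest relevance scores. The function name `select_top_k_tokens`, the use of attention-like scores, and the top-k rule are all assumptions made for illustration and are not taken from TBNet.

```python
from typing import List, Sequence


def select_top_k_tokens(
    tokens: Sequence[Sequence[float]],
    scores: Sequence[float],
    k: int,
) -> List[List[float]]:
    """Keep the k token feature vectors with the highest scores.

    Hypothetical stand-in for a feature selection step; the scoring rule
    (for example, attention weight toward a class token) is an assumption.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0 < k <= len(tokens):
        raise ValueError("k must be between 1 and the number of tokens")
    # Rank token indices by score, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Restore the original token order among the kept indices.
    kept = sorted(ranked[:k])
    return [list(tokens[i]) for i in kept]


# Toy input: four 2-dimensional patch features with made-up scores.
patch_tokens = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.4], [0.7, 0.1]]
attention = [0.05, 0.60, 0.10, 0.25]
selected = select_top_k_tokens(patch_tokens, attention, k=2)
```

In a full model the selected tokens would typically be passed on to a classifier head, and discarding low-scoring tokens is one way to focus that head on discriminative regions.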
Edwin Arkel Rios, Min-Chun Hu, Bo-Cheng Lai
Mingjian Yang, Yixin Li, Zhuo Su, Fan Zhou
Shijie Wang, Haojie Li, Zhihui Wang, Wanli Ouyang