Fine-grained biometric image recognition aims to classify subclasses by processing detailed features, and it remains a difficult problem because the differences between subclasses are small. In recent years, the Transformer model, originally developed for natural language processing, has been applied to computer vision. The Transformer splits an image into patches and computes attention weights between the different parts to obtain a better feature representation. In this paper, we propose a Transformer-based model for fine-grained biometric image recognition. Specifically, during patch encoding, our model generates a weight for every patch and saves the corresponding attention scores. To verify the effectiveness of our method, we conducted experiments on the CUB-200-2011 and Stanford Dogs datasets.
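The patch-splitting and weighting steps described in the abstract can be illustrated with a minimal sketch. It is not the proposed model: it assumes a single-channel image stored as a list of rows, uses the raw patch vectors as both queries and keys (no learned projections), and the function names `split_into_patches` and `attention_scores` are hypothetical. It only shows how one weight per patch pair might be computed and kept for later use.

```python
import math


def split_into_patches(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append([image[top + i][left + j]
                            for i in range(patch)
                            for j in range(patch)])
    return patches


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(patches):
    """Scaled dot-product attention weights between all pairs of patches.

    Assumption: patches act as both queries and keys; a real Transformer
    would first apply learned linear projections.
    """
    scale = math.sqrt(len(patches[0]))
    return [softmax([sum(a * b for a, b in zip(query, key)) / scale
                     for key in patches])
            for query in patches]


# Toy 8 x 8 image split into four 4 x 4 patches.
image = [[float((r * 8 + c) % 5) for c in range(8)] for r in range(8)]
patches = split_into_patches(image, 4)
scores = attention_scores(patches)  # saved so they can be inspected later
```

Each row of `scores` holds the weights one patch assigns to every patch, so the rows sum to 1. Keeping this matrix is one simple way to retain attention scores for identifying which patches contribute most.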
Zhiyong Xiao, Guang Diao, Zhaohong Deng
Bo Yan, Siwei Wang, En Zhu, Xinwang Liu, Wei Chen
Ying Yu, Wei Wei, Cairong Zhao, Jin Qian, Enhong Chen