Zhefeng Zhu, Ke Qi, Wenbin Chen, Yicong Zhou, Peiyue Li, Zhenxian Liu
Transformer-based image recognition often suffers from a low recognition rate because it ignores the local information within image blocks. To address this problem, an image recognition framework based on a multi-scale Feature Fusion Transformer (FFT) is proposed. The FFT block is designed to fuse feature information at different scales, and a residual attention module is introduced to emphasize the feature channels and feature regions of interest. The FFT framework avoids the vision transformer's loss of internal structure and local information in image feature blocks, and it also captures richer detail features, which effectively improves the image recognition rate. Extensive experiments are performed on the common image recognition datasets Tiny-ImageNet, CIFAR-10 and CIFAR-100, where the recognition accuracy reaches 57.81%, 82.04% and 56.98%, respectively, which is significantly higher than that of mainstream image recognition algorithms.
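The abstract does not specify how the FFT block or the residual attention module is built, so the following is only a minimal sketch of the two general ideas it names: fusing a feature map with coarser copies of itself, and re-weighting channels through a gate with a residual path. The function names (`fuse_scales`, `residual_channel_attention`), the averaging fusion rule, and the sigmoid-of-mean gate are all illustrative assumptions, not the authors' design.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def avg_pool(fmap, k):
    """Non-overlapping k x k average pooling on a 2D list (sides divisible by k)."""
    h, w = len(fmap), len(fmap[0])
    return [[sum(fmap[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
             for j in range(0, w, k)]
            for i in range(0, h, k)]

def upsample(fmap, k):
    """Nearest-neighbour upsampling by an integer factor k."""
    return [[fmap[i // k][j // k] for j in range(len(fmap[0]) * k)]
            for i in range(len(fmap) * k)]

def fuse_scales(fmap, factors=(1, 2)):
    """Average the map with coarser copies of itself.

    Assumption: plain averaging stands in for whatever fusion the FFT block uses.
    """
    h, w = len(fmap), len(fmap[0])
    fused = [[0.0] * w for _ in range(h)]
    for k in factors:
        coarse = upsample(avg_pool(fmap, k), k)
        for i in range(h):
            for j in range(w):
                fused[i][j] += coarse[i][j] / len(factors)
    return fused

def residual_channel_attention(channels):
    """Scale each channel by (1 + gate), where gate = sigmoid(global mean).

    The "1 +" term is the residual path: the input always passes through,
    and the gate can only add emphasis on top of it.
    Assumption: a hypothetical gate, not the paper's attention module.
    """
    out = []
    for ch in channels:
        n = len(ch) * len(ch[0])
        gate = sigmoid(sum(sum(row) for row in ch) / n)
        out.append([[v * (1.0 + gate) for v in row] for row in ch])
    return out
```

In this sketch, a channel with a larger mean activation receives a larger gate and is amplified more, while the residual term keeps weakly activated channels from being suppressed entirely.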
Canlin Li, Shun Song, Wenjiao Zhang, Xinyue Wang
Hongying Liu, Fuquan Zhang, Yiqing Xu, Junling Wang, Hong Lu, Wei Wei, Jun Zhu
Haini Luo, Dan Xu, Bing Yang, Haoyuan Zhang
Carlos Roig Mari, David Varas González, Elisenda Bou‐Balust