Chih-Hao Lin, Yu-Hsuan Tseng, Pei-Chen Wu, Cheng-Yu Huang, Meng-Ying Lai
Fine-grained image recognition aims to accurately distinguish subclasses within the same major category. Because inter-class differences are subtle and annotation costs are high, it has long been a significant challenge in computer vision. This study proposes a self-supervised image recognition framework that integrates multi-scale attention mechanisms with contrastive learning, enabling efficient, high-quality feature extraction without manual annotation. The method uses a multi-level attention module to mine both local and global image information, while a momentum encoding strategy and data augmentation generate positive and negative sample pairs for contrastive training. Experimental results on standard datasets such as CUB-200-2011 and FGVC-Aircraft show that the proposed method achieves Top-1 recognition accuracies of 89.2% and 87.5%, respectively, a significant improvement over current mainstream methods.
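The two contrastive ingredients named in the abstract, a momentum-updated key encoder and positive/negative pair scoring, can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the momentum coefficient 0.999, and the temperature 0.07 are assumptions borrowed from common MoCo-style setups, and real encoders would be deep networks rather than flat parameter lists.

```python
import math

def momentum_update(q_params, k_params, m=0.999):
    # EMA update: the key encoder slowly tracks the query encoder (MoCo-style).
    # q_params / k_params are flat lists standing in for network weights.
    return [m * k + (1.0 - m) * q for q, k in zip(q_params, k_params)]

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(query, pos_key, neg_keys, temperature=0.07):
    # InfoNCE contrastive loss: the augmented positive pair should score
    # higher than every negative; the positive logit sits at index 0.
    logits = [cosine(query, pos_key) / temperature]
    logits += [cosine(query, k) / temperature for k in neg_keys]
    mx = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - mx) for l in logits]
    return -math.log(exps[0] / sum(exps))
```

A query aligned with its positive key yields a near-zero loss, while a misaligned positive raises it, which is the gradient signal that pulls augmented views of the same image together and pushes other images apart.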