JOURNAL ARTICLE

SupCon-ViT: Supervised contrastive learning for ultra-fine-grained visual categorization

Abstract

With the increasing availability of datasets exhibiting fine granularity and subtle differences between categories, fine-grained visual categorization tasks have gained significant attention across various domains. However, the focus often lies solely on overall dataset performance metrics such as top-1 accuracy, without a comprehensive understanding of the factors underlying those metrics. This paper addresses this gap by presenting a detailed analysis of the CUB-200-2011 dataset through extensive experiments. We identify and investigate specific ultra-fine-grained subsets that significantly impact the overall accuracy on the dataset. To enhance the performance of ultra-fine-grained visual classification, we propose SupCon-ViT, an ultra-fine-grained visual categorization network based on supervised contrastive learning. The key component of our approach is a supervised contrastive learning module, which guides the network to learn discriminative local features within samples. This is accomplished by continuously pulling the normalized embeddings from the same class closer together and pushing embeddings from different classes apart. As a result, our approach obtains discriminative local representations, leading to improved classification performance. Experimental results demonstrate the effectiveness of our proposed method on four ultra-fine-grained subsets of the CUB dataset. Notably, our approach achieves significant performance improvements without requiring additional expert information during training. This work contributes to the broader understanding of fine-grained visual categorization and offers a practical solution to enhance the accuracy of ultra-fine-grained visual classification tasks. The code is available at https://github.comnucinda01ove/SupCon-ViT-pytorch.
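The pull-together, push-apart objective described in the abstract can be illustrated with a minimal sketch of a supervised contrastive loss. The abstract does not give the exact loss used in SupCon-ViT, so this sketch assumes the commonly used formulation in which each anchor's positives are all other samples with the same label. The function names `l2_normalize` and `supcon_loss` and the default `temperature` of 0.1 are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (assumed formulation).

    For each anchor i, the positives are all other samples sharing its
    label; every other sample in the batch appears in the denominator.
    Anchors without any positive are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarities to every other sample.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Average negative log-probability over this anchor's positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two tight clusters in 2-D.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supcon_loss(points, [0, 0, 1, 1])  # labels match the clusters
loss_mixed = supcon_loss(points, [0, 1, 0, 1])    # labels cut across clusters
```

In this toy batch the loss is small when same-class embeddings already sit close together and large when the labels cut across the clusters, which is the pressure that drives same-class embeddings together and different-class embeddings apart during training.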

Keywords:
Discriminative model; Categorization; Computer science; Granularity; Artificial intelligence; Machine learning; Class (philosophy); Deep learning; Pattern recognition (psychology)

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.26
Refs: 44
Citation Normalized Percentile: 0.61

Topics

Domain Adaptation and Few-Shot Learning
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

DISSERTATION

Ultra-Fine-Grained Visual Categorization

Xiaohan Yu

University: Griffith Research Online (Griffith University, Queensland, Australia) Year: 2021

JOURNAL ARTICLE

Learning Contrastive Self-Distillation for Ultra-Fine-Grained Visual Categorization Targeting Limited Samples

Ziye Fang, Xin Jiang, Hao Tang, Zechao Li

Journal: IEEE Transactions on Circuits and Systems for Video Technology Year: 2024 Vol: 34 (8) Pages: 7135-7148

JOURNAL ARTICLE

Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained visual categorization

Xiaohan Yu, Jun Wang, Yang Zhao, Yongsheng Gao

Journal: Pattern Recognition Year: 2022 Vol: 135 Pages: 109131