JOURNAL ARTICLE

Category-specific Semantic Coherency Learning for Fine-grained Image Recognition

Abstract

Existing deep learning based weakly supervised fine-grained image recognition (WFGIR) methods usually pick out the discriminative regions from the high-level feature (HLF) maps directly. However, as HLF maps are derived based on spatial aggregation of convolution which is basically a pattern matching process that applies fixed filters, it is ineffective to model visual contents of same semantic but varying posture or perspective. We argue that this will cause the selected discriminative regions of same sub-category are not semantically corresponding and thus degrade the WFGIR performance. To address this issue, we propose an end-to-end Category-specific Semantic Coherency Network (CSC-Net) to semantically align the discriminative regions of the same subcategory. Specifically, CSC-Net consists of: 1) Local-to-Attribute Projecting Module (LPM), which automatically learns a set of latent attributes via collecting the category-specific semantic details while eliminating the varying spatial distributions from the local regions. 2) Latent Attribute Aligning (LAA), which aligns the latent attributes to specific semantic via graph convolution based on their discriminability, to achieve category-specific semantic coherency; 3) Attribute-to-Local Resuming Module (ARM), which resumes the original Euclidean space of latent attributes and construct latent attribute aligned feature maps by a location-embedding graph unpooling operation. Finally, the new feature maps are used which applies the category-specific semantic coherency implicitly for more accurate discriminative regions localization. Extensive experiments verify that CSC-Net yields the best performance under the same settings with most competitive approaches, on CUB Bird, Stanford-Cars, and FGVC Aircraft datasets.

Keywords:
Discriminative model Computer science Artificial intelligence Pattern recognition (psychology) Feature (linguistics) Graph Semantics (computer science) Probabilistic latent semantic analysis Set (abstract data type) Theoretical computer science

Metrics

22
Cited By
1.68
FWCI (Field Weighted Citation Impact)
35
Refs
0.86
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
© 2026 ScienceGate Book Chapters — All rights reserved.