JOURNAL ARTICLE

Deep learning for fine-grained visual recognition

Teng Li

Year: 2017
Journal: Adelaide Research & Scholarship (AR&S) (University of Adelaide)
Publisher: University of Adelaide

Abstract

Fine-grained object recognition is an important task in computer vision. The cross-convolutional-layer pooling method is one of the significant milestones in the recent development of this field. Based on this method, we conducted a number of experiments on a new fine-grained car dataset, CompCars. The experiments illustrate its applicability and effectiveness on this newly designed dataset. From these experiments we also found that pooling only the most distinguishable regions of the indicator maps, such as car logos and headlight areas, which usually have higher activations, with the local features in the same regions achieves better results than pooling the whole indicator maps with the corresponding local features. We therefore conjecture that better performance may be achieved with more powerful indicator maps or pooling channels that better highlight these distinguishable regions. Based on this hypothesis and inspired by cross-convolutional-layer pooling, we then propose the Spatially Weighted Pooling (SWP) method, a simple yet effective pooling strategy for improving fine-grained classification performance. SWP learns a dozen pooling channels, or spatial encoding masks, that aggregate local convolutional feature maps using learned spatial importance information and produce more discriminative features. It can be seamlessly integrated into existing convolutional neural network (CNN) architectures such as the deep residual network, and it allows end-to-end training. SWP has few parameters to learn, usually several hundred, and therefore does not introduce much computational overhead. Simply adding SWP before the fully-connected layers of off-the-shelf deep convolutional networks significantly improves fine-grained visual recognition performance.
We have conducted comprehensive experiments on a number of widely used fine-grained datasets with a variety of deep CNN architectures, such as AlexNet, VGG networks (VGGNet) and deep residual networks (ResNet). By integrating SWP into ResNet (ResNet-SWP), we achieve state-of-the-art results on three fine-grained datasets and the MIT67 indoor scene recognition dataset. With ResNet152-SWP models, we obtain 85.2% on the bird dataset CUB-200-2011 without bounding-box annotations and 87.4% with bounding boxes, 91.2% on FGVC-aircraft, 94.1% on Stanford-cars with bounding-box information and 82.5% on the MIT67 dataset.
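
The abstract does not give the SWP formula, so the following is only a minimal NumPy sketch of one plausible reading: each learned mask has the same spatial size as the final convolutional feature map, is shared across channels, and produces one weighted spatial sum per channel, with the results from all masks concatenated. The function name, the shapes (9 masks on a 7x7 map) and the random values are illustrative assumptions, not the author's configuration. Under these assumptions the masks hold 9 x 7 x 7 = 441 parameters, which is in line with the "several hundred" stated above.

```python
import numpy as np


def spatially_weighted_pooling(features, masks):
    """Pool a feature map with K spatial masks (illustrative sketch only).

    features: array of shape (C, H, W), local convolutional features.
    masks:    array of shape (K, H, W), spatial weights (learned in practice).
    Returns a vector of length K * C.
    """
    # pooled[k, c] = sum over (h, w) of masks[k, h, w] * features[c, h, w]
    pooled = np.einsum("khw,chw->kc", masks, features)
    return pooled.reshape(-1)


# Hypothetical sizes: 4 channels, 7x7 spatial grid, 9 masks.
rng = np.random.default_rng(0)
C, H, W, K = 4, 7, 7, 9
features = rng.standard_normal((C, H, W))
masks = rng.standard_normal((K, H, W))  # stand-in for learned masks

pooled = spatially_weighted_pooling(features, masks)
num_mask_params = masks.size  # K * H * W

# Sanity check: a single uniform mask reduces to global average pooling.
uniform_mask = np.full((1, H, W), 1.0 / (H * W))
gap = spatially_weighted_pooling(features, uniform_mask)
```

In this reading, global average pooling is the special case of a single uniform mask, and learning several non-uniform masks lets the network weight regions such as logos or headlights more heavily.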

Keywords:
Artificial intelligence Computer science Computer vision

Topics

Image Processing and 3D Reconstruction
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
