Deep learning for fine-grained visual recognition

Teng Li

doi:10.4225/55/595c7aa1a62bf

JOURNAL ARTICLE

Year: 2017
Journal: Adelaide Research & Scholarship (AR&S) (University of Adelaide)
Publisher: University of Adelaide

Abstract

Fine-grained object recognition is an important task in computer vision, and the cross-convolutional-layer pooling method is one of the significant recent milestones in this field. Based on this method, we conducted a number of experiments on a new fine-grained car dataset, CompCars. These experiments demonstrate the method's applicability and effectiveness on this newly designed dataset. They also show that pooling only the most distinguishable regions of the indicator maps, such as car logos and headlight areas, which usually have higher activations, with the local features from the same regions gives better results than pooling the whole indicator maps with the corresponding local features. We therefore conjecture that better performance may be achieved with more powerful indicator maps or pooling channels that better highlight these distinguishable regions. Based on this hypothesis and inspired by cross-convolutional-layer pooling, we then propose the Spatially Weighted Pooling (SWP) method, a simple yet effective pooling strategy for improving fine-grained classification performance. SWP learns about a dozen pooling channels, or spatial encoding masks, that aggregate local convolutional feature maps using learned spatial importance information and produce more discriminative features. It can be seamlessly integrated into existing convolutional neural network (CNN) architectures such as the deep residual network, and it allows end-to-end training. SWP has few parameters to learn, usually a few hundred, and therefore does not introduce much computational overhead. Simply adding SWP before the fully connected layers of off-the-shelf deep convolutional networks significantly improves fine-grained visual recognition performance.
We conducted comprehensive experiments on a number of widely used fine-grained datasets with a variety of deep CNN architectures, including AlexNet, VGG networks (VGGNet), and deep residual networks (ResNet). By integrating SWP into ResNet (ResNet-SWP), we achieve state-of-the-art results on three fine-grained datasets and the MIT67 indoor scene recognition dataset. With ResNet152-SWP models, we obtain 85.2% on the bird dataset CUB-200-2011 without bounding-box annotations and 87.4% with bounding boxes, 91.2% on FGVC-aircraft, 94.1% on Stanford-cars with bounding-box information, and 82.5% on the MIT67 dataset.
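
The pooling step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the author's implementation: the function name `spatially_weighted_pool`, the 7x7 spatial size, the 2048 feature channels, the choice of 12 masks, and the random mask values are all assumptions made for illustration. In the thesis the masks are learned end-to-end, and the exact formulation may differ.

```python
import numpy as np

def spatially_weighted_pool(features, masks):
    """Aggregate a convolutional feature map with K spatial masks.

    features: array of shape (C, H, W), local convolutional features.
    masks:    array of shape (K, H, W), one spatial weighting per pooling channel.
    Returns a vector of length K * C: for each mask, the mask-weighted
    sum of every feature channel over all spatial positions.
    """
    if features.shape[1:] != masks.shape[1:]:
        raise ValueError("features and masks must share the same spatial size")
    # pooled[k, c] = sum over (h, w) of masks[k, h, w] * features[c, h, w]
    pooled = np.einsum("khw,chw->kc", masks, features)
    return pooled.reshape(-1)

# Hypothetical sizes: a 2048-channel 7x7 feature map and 12 masks.
rng = np.random.default_rng(0)
features = rng.standard_normal((2048, 7, 7))
masks = rng.standard_normal((12, 7, 7)) * 0.01  # stand-in for learned masks

descriptor = spatially_weighted_pool(features, masks)
print(descriptor.shape)  # (24576,)
print(masks.size)        # 588
```

Under these assumed sizes the masks hold 12 x 7 x 7 = 588 values, which is in line with the abstract's statement that SWP usually has a few hundred parameters. A single uniform mask with every entry equal to 1/(H*W) reduces this operation to global average pooling, so the learned masks can be read as a spatially selective generalization of it.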

Keywords: Artificial intelligence, Computer science, Computer vision

Topics

Image Processing and 3D Reconstruction

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
