JOURNAL ARTICLE

Bilinear CNN Models for Fine-Grained Visual Recognition

Abstract

We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using an outer product at each location of the image and pooled to obtain an image descriptor. This architecture can model local pairwise feature interactions in a translationally invariant manner, which is particularly useful for fine-grained categorization. It also generalizes various orderless texture descriptors such as the Fisher vector, VLAD and O2P. We present experiments with bilinear models where the feature extractors are based on convolutional neural networks. The bilinear form simplifies gradient computation and allows end-to-end training of both networks using image labels only. Using networks initialized from the ImageNet dataset followed by domain-specific fine-tuning, we obtain 84.1% accuracy on the CUB-200-2011 dataset, requiring only category labels at training time. We present experiments and visualizations that analyze the effects of fine-tuning and the choice of the two networks on the speed and accuracy of the models. Results show that the architecture compares favorably to the existing state of the art on a number of fine-grained datasets while being substantially simpler and easier to train. Moreover, our most accurate model is fairly efficient, running at 8 frames/sec on an NVIDIA Tesla K40 GPU. The source code for the complete system will be made available at http://vis-www.cs.umass.edu/bcnn.
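The pooling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `bilinear_pool`, the array shapes, the random feature maps standing in for CNN outputs, and the choice of sum pooling are all assumptions made for illustration. The sketch forms the outer product of the two feature vectors at each location, sums over locations, and flattens the result into a descriptor.

```python
import numpy as np


def bilinear_pool(feat_a, feat_b):
    """Sum-pooled outer products of two feature maps (hypothetical helper).

    feat_a: array of shape (H, W, M), output of the first feature extractor.
    feat_b: array of shape (H, W, N), output of the second feature extractor.
    Returns a descriptor of length M * N.
    """
    if feat_a.shape[:2] != feat_b.shape[:2]:
        raise ValueError("feature maps must share the same spatial size")
    h, w, m = feat_a.shape
    n = feat_b.shape[2]
    a = feat_a.reshape(h * w, m)  # one row per image location
    b = feat_b.reshape(h * w, n)
    # Summing the per-location outer products a_l b_l^T equals A^T B.
    return (a.T @ b).reshape(m * n)


# Random arrays stand in for the two CNN feature maps (assumption).
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((4, 5, 3))
feat_b = rng.standard_normal((4, 5, 2))
desc = bilinear_pool(feat_a, feat_b)

# Shuffle the locations identically in both maps; the descriptor should
# not change, because sum pooling ignores where each feature occurred.
perm = rng.permutation(4 * 5)
shuf_a = feat_a.reshape(20, 3)[perm].reshape(4, 5, 3)
shuf_b = feat_b.reshape(20, 2)[perm].reshape(4, 5, 2)
desc_shuffled = bilinear_pool(shuf_a, shuf_b)
```

Because the descriptor is a sum over locations, reordering the locations leaves it unchanged, which is one way to see why the abstract describes the representation as orderless.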

Keywords:
Bilinear interpolation, Computer science, Convolutional neural network, Pattern recognition (psychology), Pairwise comparison, Artificial intelligence, Feature (linguistics), Computation, Invariant (physics), Image (mathematics), Feature extraction, Categorization, Algorithm, Computer vision, Mathematics

Metrics

Cited By: 2011
FWCI (Field Weighted Citation Impact): 67.21
Refs: 59
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Bilinear Convolutional Neural Networks for Fine-Grained Visual Recognition

Tsung‐Yu Lin, Aruni RoyChowdhury, Subhransu Maji

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence Year: 2017 Vol: 40 (6) Pages: 1309-1322
DISSERTATION

Visual fine-grained recognition

Marcel Simon

University: Thüringer Universitäts- und Landesbibliothek Year: 2019