Bilinear networks generally apply bilinear pooling only to the features produced by the last convolutional layer, after its activation function, and use the fused features for fine-grained image classification. However, this approach ignores the feature information in the intermediate convolutional layers, so it cannot completely describe each semantic part of the image and easily loses the information that discriminates between fine-grained categories. In this paper, we propose a fine-grained image classification method based on multilayer feature fusion. The ResNet model replaces the two-stream VGGNet model of the original bilinear convolutional neural network to improve the feature extraction ability of the network. The feature outputs of the last three convolutional residual blocks of ResNet are bilinearly pooled, and the resulting three feature vectors are concatenated for multilayer feature fusion and then fed to the classifier for classification.
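The fusion step described above can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this paper. It assumes that each block's output is pooled with itself (the outer product of each local feature vector with itself, averaged over spatial locations), and that the signed square root and L2 normalization commonly used with bilinear CNNs are applied; neither detail is stated in the abstract. The names `bilinear_pool` and `fuse_multilayer` and the toy feature maps are hypothetical.

```python
import math


def bilinear_pool(feature_map):
    """Bilinearly pool one feature map.

    feature_map: list of per-location feature vectors, each of length C.
    Returns a flattened vector of length C * C.
    Assumption: the map is pooled with itself; signed square root and
    L2 normalization follow common bilinear CNN practice.
    """
    n = len(feature_map)
    c = len(feature_map[0])
    pooled = [0.0] * (c * c)
    # Average the outer product x x^T over all spatial locations.
    for x in feature_map:
        for i in range(c):
            for j in range(c):
                pooled[i * c + j] += x[i] * x[j] / n
    # Signed square root keeps the sign and compresses large magnitudes.
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    # L2 normalization puts every layer's vector on the same scale.
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


def fuse_multilayer(feature_maps):
    """Concatenate the bilinear vectors of several layers into one vector."""
    fused = []
    for feature_map in feature_maps:
        fused.extend(bilinear_pool(feature_map))
    return fused


# Toy stand-ins for the outputs of the last three residual blocks
# (hypothetical values; real maps have hundreds of channels).
block3 = [[1.0, 0.0], [0.0, 1.0]]              # 2 locations, 2 channels
block4 = [[1.0, 2.0, 0.5], [0.5, 1.0, 0.0]]    # 2 locations, 3 channels
block5 = [[0.2, 0.4], [0.6, 0.8], [1.0, 0.0]]  # 3 locations, 2 channels

fused = fuse_multilayer([block3, block4, block5])
print(len(fused))  # 2*2 + 3*3 + 2*2 = 17
```

The fused vector would then be passed to a classifier. Because each layer's vector is normalized separately in this sketch, no single block dominates the concatenation; whether the paper normalizes before or after concatenation is not specified here.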
Priti P. Vaidya, S. M. Kamalapur
M. Srinivas, Yen-Yu Lin, Hong-Yuan Mark Liao