JOURNAL ARTICLE

Convolutional Neural Networks with Generalized Attentional Pooling for Action Recognition

Abstract

Inspired by recent advances in attentional pooling for image classification and action recognition, we propose a Generalized Attentional Pooling (GAP) based Convolutional Neural Network (CNN) for action recognition in still images. The proposed GAP-CNN can be formulated as a new approximation of the second-order (bilinear) pooling techniques widely used in fine-grained image classification. Unlike the existing rank-1 approximation, it introduces a generalized factorization with non-linear functions to exploit the intrinsic structural information of the sample covariance matrices of convolutional layer outputs. Without requiring preprocessing steps such as detecting object (e.g., human body) bounding boxes, GAP-CNN automatically focuses on the most informative parts of still images. With additional guidance from human pose keypoints, the proposed GAP-CNN achieves state-of-the-art action recognition accuracy on the large-scale MPII still image dataset.
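
The rank-1 approximation that the abstract contrasts against can be illustrated numerically. The sketch below is not the GAP-CNN method itself (the abstract does not specify its non-linear factorization); it only checks the standard identity that a bilinear-pooling score with a rank-1 weight matrix W = a bᵀ reduces to an attention-weighted sum over spatial locations. The names `bilinear_score`, `rank1_attention_score`, `X`, `a`, and `b` are hypothetical, and the uncentered second-moment matrix XᵀX is assumed as a stand-in for the sample covariance.

```python
import numpy as np

def bilinear_score(X, W):
    """Second-order pooling score: trace(W^T X^T X).

    X is (n_locations, n_channels); X^T X is the uncentered
    second-moment matrix of the convolutional features (an assumption
    standing in for the sample covariance mentioned in the abstract).
    """
    return float(np.trace(W.T @ (X.T @ X)))

def rank1_attention_score(X, a, b):
    """Same score when W = a b^T: trace(b a^T X^T X) = (X a)^T (X b)."""
    attention_map = X @ a    # one scalar weight per spatial location
    location_score = X @ b   # one scalar score per spatial location
    return float(attention_map @ location_score)

# Toy features: a 7x7 spatial grid flattened to 49 locations, 8 channels.
rng = np.random.default_rng(0)
X = rng.standard_normal((49, 8))
a = rng.standard_normal(8)
b = rng.standard_normal(8)
W = np.outer(a, b)  # rank-1 weight matrix

full = bilinear_score(X, W)
factored = rank1_attention_score(X, a, b)
print(np.isclose(full, factored))
```

The factored form never builds the channels-by-channels matrix, which is what makes the rank-1 case cheap; a generalized factorization such as the one the abstract proposes would replace this simple product with a richer function of the same features.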

Keywords:
Convolutional neural network; Computer science; Pooling; Artificial intelligence; Pattern recognition (psychology); Covariance; Contextual image classification; Bilinear interpolation; Feature extraction; Bounding overwatch; Preprocessor; Image (mathematics); Computer vision; Mathematics

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.29
Refs: 26
Citation Normalized Percentile: 0.58

Topics

Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Hand Gesture Recognition Systems
Physical Sciences → Computer Science → Human-Computer Interaction