JOURNAL ARTICLE

Spatial Pyramid Attention for Deep Convolutional Neural Networks

Xu Ma, Jingda Guo, Andrew Sansom, Mara McGuire, Andrew Kalaani, Qi Chen, Sihai Tang, Qing Yang, Song Fu

Year: 2021 Journal: IEEE Transactions on Multimedia Vol: 23 Pages: 3048-3058 Publisher: Institute of Electrical and Electronics Engineers

Abstract

Attention mechanisms have shown great success in computer vision. However, the commonly used global average pooling in some implementations aggregates a three-dimensional feature map to a one-dimensional attention map, leading to a significant loss of structural information during attention learning. In this article, we present a novel Spatial Pyramid Attention Network (SPANet), which exploits the structural information and channel relationships for better feature representation. SPANet enhances a base network by adding Spatial Pyramid Attention (SPA) blocks laterally. By rethinking the self-attention mechanism design, we further present three topology structures of attention path connection for our SPANet. They can be flexibly applied to various CNN architectures. SPANet is conceptually simple but practically powerful. It uses both structural regularization and structural information to achieve better learning capability. We have comprehensively evaluated the performance of SPANet on four benchmark datasets for different visual tasks. The experimental results show that SPANet significantly improves the recognition accuracy without adding much computation overhead. Using SPANet, we achieve an improvement of 1.6% in top-1 classification accuracy on the ImageNet 2012 benchmark based on ResNet50, and SPANet outperforms SENet and other attention methods. SPANet also improves object detection performance by a clear margin with negligible additional computation overhead. When applying SPANet to RetinaNet based on the ResNet50 backbone, we improve the performance of the baseline model by 2.3 mAP, and the enhanced model outperforms SENet and GCNet by 1.1 mAP and 1.7 mAP respectively. The code of SPANet is publicly available at https://github.com/13952522076/SPANet_TMM
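
The abstract's point about global average pooling discarding structural information can be illustrated with a small sketch. This is not the authors' implementation: the pyramid levels (1, 2 and 4 bins per side), the function names, and the toy feature maps are all assumptions chosen for illustration, and the learned attention layers that would follow the pooling step are omitted.

```python
# Illustrative sketch only; pyramid levels and names are assumptions,
# not taken from the SPANet paper or its repository.

def adaptive_avg_pool(channel, bins):
    """Average-pool one H x W channel into a bins x bins grid.

    Assumes H and W are divisible by bins. Returns a flat list of
    bins * bins averages in row-major order.
    """
    h, w = len(channel), len(channel[0])
    bh, bw = h // bins, w // bins
    pooled = []
    for i in range(bins):
        for j in range(bins):
            cells = [channel[r][c]
                     for r in range(i * bh, (i + 1) * bh)
                     for c in range(j * bw, (j + 1) * bw)]
            pooled.append(sum(cells) / len(cells))
    return pooled


def global_avg_pool(feature_map):
    """Collapse each channel of a C x H x W map to a single number."""
    return [adaptive_avg_pool(ch, 1)[0] for ch in feature_map]


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Concatenate pooled grids at several resolutions for each channel."""
    descriptor = []
    for ch in feature_map:
        for bins in levels:
            descriptor.extend(adaptive_avg_pool(ch, bins))
    return descriptor


# Two 4 x 4 channels with the same mean but different spatial layout.
left = [[1, 1, 0, 0] for _ in range(4)]          # active on the left half
top = [[1, 1, 1, 1], [1, 1, 1, 1],
       [0, 0, 0, 0], [0, 0, 0, 0]]               # active on the top half

gap_left = global_avg_pool([left])
gap_top = global_avg_pool([top])
spp_left = spatial_pyramid_pool([left])
spp_top = spatial_pyramid_pool([top])

print(gap_left == gap_top)   # global pooling cannot tell them apart
print(spp_left == spp_top)   # the pyramid descriptor can
```

Both toy channels average to 0.5, so global average pooling produces identical descriptors for them. At the 2 x 2 level the pooled grids already differ ([1.0, 0.0, 1.0, 0.0] for `left` against [1.0, 1.0, 0.0, 0.0] for `top`), which is the kind of layout information a pyramid-based attention block would have available.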

Keywords:
Computer science, Pooling, Artificial intelligence, Pyramid (geometry), Convolutional neural network, Benchmark (surveying), Margin (machine learning), Overhead (engineering), Computation, Feature (linguistics), Pattern recognition (psychology), Exploit, Deep learning, Feature learning, Machine learning, Algorithm

Metrics

Cited By: 49
FWCI (Field Weighted Citation Impact): 3.27
Refs: 79
Citation Normalized Percentile: 0.93
Is in top 1%
Is in top 10%

Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Machine Learning and ELM
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Spatial deep convolutional neural networks

Qi Wang, Paul A. Parker, Robert Lund

Journal: Spatial Statistics Year: 2025 Vol: 66 Pages: 100883-100883

JOURNAL ARTICLE

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence Year: 2015 Vol: 37 (9) Pages: 1904-1916