JOURNAL ARTICLE

Hardware Efficient Convolution Processing Unit for Deep Neural Networks

Abstract

A Convolutional Neural Network (CNN) is a type of deep neural network commonly used for object detection and classification. State-of-the-art hardware for training and inference of CNN architectures requires considerable computation- and memory-intensive resources. CNNs achieve greater accuracy at the cost of high computational complexity and large power consumption. To optimize memory requirements, processing speed, and power, it is crucial to design a more efficient accelerator architecture for CNN computation. In this work, the overlap of spatially adjacent data is exploited to parallelize data movement. A fast, reconfigurable hardware accelerator architecture, along with an optimized kernel design suitable for a variety of CNN models, is proposed. Our design achieves a 2.1x computational benefit over state-of-the-art accelerator architectures.
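The data-reuse idea mentioned in the abstract can be illustrated with a minimal sketch (an assumption for illustration, not the paper's actual accelerator design): with a stride-1 3x3 kernel, two horizontally adjacent output pixels read overlapping regions of the input feature map, and that overlap is what a hardware line buffer can exploit to avoid re-fetching data.

```python
# Minimal sketch of spatial data overlap in convolution (illustrative;
# not the paper's accelerator architecture). With a stride-1 3x3 kernel,
# horizontally adjacent output pixels read overlapping input regions.

def window(r, c, k=3):
    """Input coordinates read by the k x k window for output pixel (r, c)."""
    return {(r + dr, c + dc) for dr in range(k) for dc in range(k)}

# Windows for two neighbouring output pixels in the same row.
shared = window(0, 0) & window(0, 1)
print(f"{len(shared)} of {3 * 3} input pixels are shared")  # 6 of 9
```

When the window slides one column, the six shared pixels can stay in on-chip storage, so only one new input column (three pixels for a 3x3 kernel) needs to be fetched per output pixel, which is the kind of parallelized, reuse-friendly data movement the abstract refers to.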

Keywords:
Computer science, Convolutional neural network, Kernel (algebra), Convolution (computer science), Computation, Hardware acceleration, Deep learning, Inference, Artificial intelligence, Computer engineering, Computational complexity theory, Computer architecture, Artificial neural network, Computer hardware, Field-programmable gate array, Algorithm

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.32
Refs: 15
Citation Normalized Percentile: 0.59

Topics

Advanced Neural Network Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Adversarial Robustness in Machine Learning (Physical Sciences → Computer Science → Artificial Intelligence)