JOURNAL ARTICLE

Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network

Abstract

OpenCL-based FPGA development has recently gained popularity with the emerging need to accelerate workloads such as Convolutional Neural Networks (CNNs), the most popular deep learning architecture in computer vision. While OpenCL enhances the code portability and programmability of FPGAs, it does so at the expense of performance. The key challenge is to optimize OpenCL kernels so that they efficiently utilize the flexible hardware resources of the FPGA. Simply tuning the OpenCL kernel code through various compiler options turns out to be insufficient to achieve desirable performance for workloads that are both compute-intensive and data-intensive, such as convolutional neural networks.

Keywords:
Computer science; Software portability; Compiler; Field-programmable gate array; Convolutional neural network; Kernel (algebra); Deep learning; Parallel computing; Code (set theory); Workload; Computer architecture; Embedded system; Key (lock); Artificial intelligence; Operating system; Programming language; Set (abstract data type)

Metrics

Cited By: 211
FWCI (Field Weighted Citation Impact): 18.56
Refs: 17
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Advanced Neural Network Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
CCD and CMOS Imaging Sensors
Physical Sciences → Engineering → Electrical and Electronic Engineering
Parallel Computing and Optimization Techniques
Physical Sciences → Computer Science → Hardware and Architecture