Abstract

Convolutional neural networks (CNNs) are widely used in computer vision applications, and GPUs have been the mainstream accelerators for them. Compared with GPUs, FPGAs offer high flexibility, low power consumption, and abundant DSP resources, which allows them to surpass GPUs in some scenarios. Recent progress in high-level synthesis tools has greatly improved FPGA development efficiency. In this paper, we design an OpenCL-based CNN accelerator for FPGAs and apply a variety of model compression techniques to the YOLOv2 model. The accelerator uses the Winograd algorithm to implement convolution efficiently, and an alignment stream buffer resolves the unaligned global memory accesses that the Winograd algorithm causes. The design makes full use of the available memory bandwidth, utilizes all available DSP resources, and exploits parallelism along multiple dimensions to maximize performance. Our FPGA design reaches a latency of 10 ms per image, compared with 15 ms per image on an NVIDIA P100 GPU. We plan to release the design as open source so that the community can benefit from it and contribute to it.
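The Winograd idea mentioned in the abstract can be illustrated with a minimal sketch of the smallest one-dimensional case, usually written F(2,3): two outputs of a 3-tap filter are computed from four inputs with four multiplications instead of six. The tile size and the function names `winograd_f23` and `direct_f23` are assumptions chosen for illustration. The abstract does not state which Winograd variant the accelerator uses, and this is not the authors' OpenCL kernel.

```python
def winograd_f23(d, g):
    """Compute 2 outputs of a 3-tap 1D filter from 4 inputs with 4 multiplies.

    Illustrative F(2,3) sketch only; the tile size is an assumption,
    not a detail taken from the paper.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g

    # Filter transform: depends only on the weights, so it can be
    # precomputed once per filter.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2

    # Input transform followed by the four elementwise multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3

    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference sliding-window computation using 6 multiplies."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [1.0, 0.0, -1.0]
    print(winograd_f23(d, g), direct_f23(d, g))
```

In this sketch each tile reads four inputs but advances by only two, so consecutive tiles overlap by two elements. Overlapping tile reads of this kind are one plausible source of the unaligned global memory accesses the abstract refers to, although the abstract itself does not give that detail.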

Keywords:
Computer science, Field-programmable gate array, Kernel (algebra), Hardware acceleration, Convolutional neural network, CUDA, Digital signal processing, Parallel computing, Bandwidth (computing), Computer hardware, Embedded system, Artificial intelligence

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 0.34
Refs: 8
Citation Normalized Percentile: 0.63

Topics

CCD and CMOS Imaging Sensors
Physical Sciences →  Engineering →  Electrical and Electronic Engineering
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Vision and Imaging
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Design and Implementation of OpenCL-Based FPGA Accelerator for YOLOv2

Chenchen Cui, Fen Ge, Ziyu Li, Xin Yue, Zhou Fang, Ning Wu

Journal:   2021 IEEE 21st International Conference on Communication Technology (ICCT) Year: 2021 Pages: 1004-1007
JOURNAL ARTICLE

FPGA Accelerator for 3DES Algorithm Based on OpenCL

WU Jianfeng, ZHENG Bowen, NIE Yi, CHAI Zhilei

Journal:   DOAJ (DOAJ: Directory of Open Access Journals) Year: 2021
JOURNAL ARTICLE

An Efficient OpenCL-Based FPGA Accelerator for MobileNet

Wei Liu, Lv Peng

Journal:   Journal of Physics Conference Series Year: 2021 Vol: 1883 (1) Pages: 012086-012086