Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

Chen Zhang; Peng Li; Guangyu Sun; Yijin Guan; Bingjun Xiao; Jason Cong

doi:10.1145/2684746.2689060

Year: 2015 Pages: 161-170

Abstract

Convolutional neural networks (CNNs) have been widely employed for image recognition because they can achieve high accuracy by emulating the behavior of optic nerves in living creatures. Recently, the rapid growth of modern applications based on deep learning algorithms has further promoted research and implementations. In particular, various accelerators for deep CNNs have been proposed on FPGA platforms because FPGAs offer high performance, reconfigurability, and fast development cycles. Although current FPGA accelerators have demonstrated better performance than general-purpose processors, the accelerator design space has not been well explored. One critical problem is that the computation throughput may not match the memory bandwidth provided by an FPGA platform. Consequently, existing approaches cannot achieve the best performance because either the logic resources or the memory bandwidth are under-utilized. At the same time, the increasing complexity and scalability of deep learning applications aggravate this problem. To overcome this problem, we propose an analytical design scheme based on the roofline model. For any candidate CNN design, we quantitatively analyze its computing throughput and required memory bandwidth under various optimization techniques, such as loop tiling and loop transformation. Then, with the help of the roofline model, we identify the solution with the best performance and the lowest FPGA resource requirement. As a case study, we implement a CNN accelerator on a VC707 FPGA board and compare it with previous approaches. Our implementation achieves a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, significantly outperforming previous approaches.
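The roofline-based design exploration described above can be sketched in a few lines of Python. This is a minimal, hypothetical model, not the authors' exact formulation. It assumes a convolutional layer with M output and N input feature maps, an R x C output, and K x K kernels, tiled by factors Tm, Tn, Tr, Tc. It also assumes a fully pipelined Tm x Tn multiply-accumulate array and that input and weight tiles are reloaded from off-chip memory on every tile iteration. Attainable performance is taken as the smaller of the computational roof and the computation-to-communication (CTC) ratio (FLOPs per byte of off-chip traffic) times the memory bandwidth. The helper names (`explore`, `ctc_ratio`, and so on), the MAC and buffer budgets, the 4.5 GB/s bandwidth, and the layer shape are all assumptions chosen for illustration; only the 100 MHz frequency comes from the abstract.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConvLayer:
    M: int      # output feature maps
    N: int      # input feature maps
    R: int      # output rows
    C: int      # output columns
    K: int      # square kernel size
    S: int = 1  # stride


@dataclass(frozen=True)
class Design:
    Tm: int
    Tn: int
    Tr: int
    Tc: int
    gflops: float  # attainable performance under the roofline
    ctc: float     # FLOPs per byte of off-chip traffic


def total_ops(l: ConvLayer) -> int:
    # Each multiply-accumulate counts as two floating-point operations.
    return 2 * l.M * l.N * l.R * l.C * l.K * l.K


def compute_roof_gflops(l: ConvLayer, Tm: int, Tn: int, freq_hz: float) -> float:
    # Assumed engine: a Tm x Tn array of MAC units, fully pipelined, that
    # consumes one kernel position of one output pixel per cycle.
    cycles = math.ceil(l.M / Tm) * math.ceil(l.N / Tn) * l.R * l.C * l.K * l.K
    return total_ops(l) / cycles * freq_hz / 1e9


def ctc_ratio(l: ConvLayer, Tm: int, Tn: int, Tr: int, Tc: int,
              bytes_per_word: int = 4) -> float:
    # Assumed traffic: input and weight tiles reloaded on every tile
    # iteration; each output tile written once (after the Tn loop).
    m, n = math.ceil(l.M / Tm), math.ceil(l.N / Tn)
    r, c = math.ceil(l.R / Tr), math.ceil(l.C / Tc)
    in_tile = Tn * (l.S * Tr + l.K - l.S) * (l.S * Tc + l.K - l.S)
    w_tile = Tm * Tn * l.K * l.K
    out_tile = Tm * Tr * Tc
    words = m * n * r * c * (in_tile + w_tile) + m * r * c * out_tile
    return total_ops(l) / (words * bytes_per_word)


def attainable_gflops(l, Tm, Tn, Tr, Tc, freq_hz, bw_gbs):
    # Roofline: capped by compute, or by CTC ratio x bandwidth (GB/s).
    return min(compute_roof_gflops(l, Tm, Tn, freq_hz),
               ctc_ratio(l, Tm, Tn, Tr, Tc) * bw_gbs)


def explore(l: ConvLayer, mac_budget: int, buf_words: int,
            freq_hz: float, bw_gbs: float) -> Optional[Design]:
    best, best_key = None, None
    for Tm in range(1, l.M + 1):
        for Tn in range(1, l.N + 1):
            if Tm * Tn > mac_budget:  # hypothetical stand-in for DSP budget
                break
            for t in range(1, min(l.R, l.C) + 1):  # square row/col tiles only
                buf = (Tn * (l.S * t + l.K - l.S) ** 2
                       + Tm * Tn * l.K * l.K + Tm * t * t)
                if buf > buf_words:  # hypothetical on-chip buffer budget
                    continue
                perf = attainable_gflops(l, Tm, Tn, t, t, freq_hz, bw_gbs)
                ctc = ctc_ratio(l, Tm, Tn, t, t)
                # Highest performance first, then fewest MACs, then highest
                # CTC (lowest bandwidth demand for the same performance).
                key = (round(perf, 9), -Tm * Tn, ctc)
                if best_key is None or key > best_key:
                    best, best_key = Design(Tm, Tn, t, t, perf, ctc), key
    return best


layer = ConvLayer(M=128, N=192, R=13, C=13, K=3)  # hypothetical layer shape
best = explore(layer, mac_budget=448, buf_words=64 * 1024,
               freq_hz=100e6, bw_gbs=4.5)          # hypothetical budgets
print(best)
```

The search is plain brute force over tile sizes, restricted to square row/column tiles to keep it small. A fuller exploration would vary Tr and Tc independently and account for loop order, since both change on-chip buffer size and off-chip traffic.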
