JOURNAL ARTICLE

WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm

Xuan Wang, Chao Wang, Jing Cao, Lei Gong, Xuehai Zhou

Year: 2020 | Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | Vol: 39 (11) | Pages: 4290-4302 | Publisher: Institute of Electrical and Electronics Engineers

Abstract

In recent years, a variety of FPGA-based accelerators have been proposed to speed up convolutional neural network (CNN) inference in many domain-specific application fields. In addition, optimization algorithms such as fast algorithms and network sparsity have greatly reduced the theoretical computational workload of CNN inference. A few FPGA accelerators currently support both the fast Winograd algorithm (WinoA) and network sparsity to minimize the amount of computation. However, on the one hand, these architectures feed data into processing elements (PEs) in units of blocks, so boundary losses caused by sparse irregularity cannot be avoided. On the other hand, these works have not discussed design space exploration under sparse conditions. In this article, we propose a novel accelerator called WinoNN. We fully discuss the challenges of supporting WinoA, weight sparsity, and activation sparsity simultaneously. To minimize the online encoding overhead caused by activation sparsity, we propose an efficient encoding format called the multibit mask (MBM). To handle the irregularity of sparse data, we propose a novel Scatter-Compute-Gather method in the hardware design, combined with a freely sliding buffer that achieves fine-grained data loading to minimize boundary waste. Finally, we combine theoretical analysis with an experimental method to explore the design space, allowing WinoNN to achieve the best performance on a specific FPGA. Our highly scalable design enables us to deploy sparse Winograd accelerators on very small embedded FPGAs, which previous works do not support. Experimental results on VGG16 show that we achieve the highest digital signal processing unit (DSP) efficiency and the highest energy efficiency compared with state-of-the-art sparse architectures.
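The fast Winograd algorithm (WinoA) named in the abstract trades multiplications for cheap additions in small convolutions. As a minimal illustration only, the sketch below implements the standard 1-D Winograd minimal filtering algorithm F(2,3), which computes two outputs of a 3-tap correlation with 4 multiplications instead of 6; it is the textbook transform, not WinoNN's sparse hardware variant.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): 2 outputs of a 3-tap correlation from a
    4-element input tile d, using 4 elementwise multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Transformed filter (precomputable once per filter)
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Transformed input tile
    v0 = d0 - d2
    v1 = d1 + d2
    v2 = d2 - d1
    v3 = d1 - d3
    # The only 4 multiplications
    m0, m1, m2, m3 = v0 * u0, v1 * u1, v2 * u2, v3 * u3
    # Inverse transform -> 2 output pixels
    return [m0 + m1 + m2, m1 - m2 - m3]

def direct_corr(d, g):
    """Reference: direct 3-tap correlation (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

print(winograd_f23([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0]))  # [6.0, 9.0]
print(direct_corr([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0]))   # [6.0, 9.0]
```

The 2-D form used by CNN accelerators, F(2x2, 3x3), nests this transform over rows and columns, reducing 36 multiplications per output tile to 16; sparsity then skips the multiplications whose transformed operands are zero.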

Keywords:
Computer science; Field-programmable gate array; Scalability; Overhead (engineering); Convolutional neural network; Encoding (memory); Sparse matrix; Design space exploration; Algorithm; Computer engineering; Parallel computing; Computer hardware; Embedded system; Artificial intelligence

Metrics

Cited By: 45
FWCI (Field-Weighted Citation Impact): 3.15
References: 41
Citation Normalized Percentile: 0.93


Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
CCD and CMOS Imaging Sensors
Physical Sciences →  Engineering →  Electrical and Electronic Engineering
Advanced Memory and Neural Computing
Physical Sciences →  Engineering →  Electrical and Electronic Engineering

