JOURNAL ARTICLE

WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm

Xuan Wang, Chao Wang, Jing Cao, Lei Gong, Xuehai Zhou

Year: 2020 | Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | Vol: 39 (11) | Pages: 4290-4302 | Publisher: Institute of Electrical and Electronics Engineers

Abstract

In recent years, a variety of FPGA-based accelerators have been proposed to speed up convolutional neural network (CNN) inference in many domain-specific application fields. In addition, optimization algorithms such as fast algorithms and network sparsity have greatly reduced the theoretical computational workload of CNN inference. A few FPGA accelerators currently support both the fast Winograd algorithm (WinoA) and network sparsity to minimize the amount of computation. However, on the one hand, these architectures feed data into processing elements (PEs) in units of blocks, so boundary losses caused by sparse irregularity cannot be avoided. On the other hand, these works have not discussed design space exploration under sparse conditions. In this article, we propose a novel accelerator called WinoNN. We fully discuss the challenges of supporting WinoA, weight sparsity, and activation sparsity simultaneously. To minimize the online encoding overhead caused by activation sparsity, we propose an efficient encoding format called the multibit mask (MBM). To handle the irregularity of sparse data, we propose a novel Scatter-Compute-Gather method in the hardware design, combined with a freely sliding buffer that achieves fine-grained data loading to minimize boundary waste. Finally, we combine theoretical analysis with an experimental method to explore the design space, allowing WinoNN to achieve the best performance on a specific FPGA. Our highly scalable design enables us to deploy sparse Winograd accelerators on very small embedded FPGAs, which previous works do not support. Experimental results on VGG16 show that we achieve the highest digital signal processing unit (DSP) efficiency and the highest energy efficiency compared with state-of-the-art sparse architectures.
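The fast Winograd algorithm (WinoA) named in the abstract trades multiplications for cheap additions in small convolutions. As a minimal illustration only, the sketch below implements the standard 1-D Winograd minimal filtering algorithm F(2,3), which computes two outputs of a 3-tap correlation with 4 multiplications instead of 6; it is the textbook transform, not WinoNN's sparse hardware variant.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): 2 outputs of a 3-tap correlation from a
    4-element input tile d, using 4 elementwise multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Transformed filter (precomputable once per filter)
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Transformed input tile
    v0 = d0 - d2
    v1 = d1 + d2
    v2 = d2 - d1
    v3 = d1 - d3
    # The only 4 multiplications
    m0, m1, m2, m3 = v0 * u0, v1 * u1, v2 * u2, v3 * u3
    # Inverse transform -> 2 output pixels
    return [m0 + m1 + m2, m1 - m2 - m3]

def direct_corr(d, g):
    """Reference: direct 3-tap correlation (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

print(winograd_f23([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0]))  # [6.0, 9.0]
print(direct_corr([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0]))   # [6.0, 9.0]
```

The 2-D form used by CNN accelerators, F(2x2, 3x3), nests this transform over rows and columns, reducing 36 multiplications per output tile to 16; sparsity then skips the multiplications whose transformed operands are zero.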

Keywords:
Computer science; Field-programmable gate array; Scalability; Overhead (engineering); Convolutional neural network; Encoding (memory); Sparse matrix; Design space exploration; Algorithm; Computer engineering; Parallel computing; Computer hardware; Embedded system; Artificial intelligence

Metrics

Cited By: 45
FWCI (Field-Weighted Citation Impact): 3.15
References: 41
Citation Normalized Percentile: 0.93


Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
CCD and CMOS Imaging Sensors
Physical Sciences →  Engineering →  Electrical and Electronic Engineering
Advanced Memory and Neural Computing
Physical Sciences →  Engineering →  Electrical and Electronic Engineering

