A Fine-Grained Sparse Accelerator for Multi-Precision DNN

Shulin Zeng; Yujun Lin; Shuang Liang; Junlong Kang; Dongliang Xie; Yi Shan; Song Han; Yu Wang; Huazhong Yang

doi:10.1145/3289602.3293964

ScienceGate Book Chapters

JOURNAL ARTICLE

A Fine-Grained Sparse Accelerator for Multi-Precision DNN

Shulin Zeng Yujun Lin Shuang Liang Junlong Kang Dongliang Xie Yi Shan Song Han Yu Wang Huazhong Yang

Year: 2019 Pages: 185-185

DOI: 10.1145/3289602.3293964

Get Full-Text PDF Get Analytical Report

Abstract

Neural Networks (NNs) have made a significant breakthrough in many fields, while they also pose a great challenge to hardware platforms since the state-of-the-art neural networks are both communicational- and computational-intensive. Researchers proposed model compression algorithms using sparsification and quantization, along with specific hardware architecture designs, to accelerate various applications. However, the irregularity of memory access caused by the sparsity severely damages the regularity of intensive computation loops. Therefore, the architecture design for sparse neural networks is crucial to better software and hardware co-design for neural network applications. To face these challenges, this paper first analyzes the computation patterns of different NN structures and unify them into the form of sparse matrix-vector multiplication, sparse matrix-matrix multiplication, and element-wise multiplication. On the basis of the EIE which supports only the fully-connected network and recurrent neural network (RNN), we expand it to support the convolution neural network (CNN) using the input vector transform unit. This paper designs a multi-precision multiplier with supporting datapath, which makes the proposed architecture have a better acceleration effect in the low-bit quantization with the same hardware architecture. The proposed accelerator architecture can achieve the equivalent performance and energy efficiency up to 574.2 GOPS, 42.8 GOPS/W for CNN and 110.4 GOPS, 8.24 GOPS/W for RNN under 4-bit quantization on Xilinx XCKU115 FPGA running at 200MHz. And it is the state-of-the-art accelerator supporting CNN-RNN-based models like the long-term recurrent convolutional network with 571.1 GOPS performance and 42.6 GOPS/W energy efficiency under 4-bit data format.

Keywords:

Computer science Datapath Hardware acceleration Convolutional neural network Field-programmable gate array Multiplication (music) Quantization (signal processing) Recurrent neural network Computation Artificial neural network Parallel computing Network architecture Sparse matrix Computer engineering Computer hardware Artificial intelligence Algorithm

Metrics

Cited By

0.61

FWCI (Field Weighted Citation Impact)

Refs

0.74

Citation Normalized Percentile

Is in top 1%

Is in top 10%

Citation History

Topics

Machine Learning and ELM

Physical Sciences → Computer Science → Artificial Intelligence

Neural Networks and Applications

Physical Sciences → Computer Science → Artificial Intelligence

Power Systems and Renewable Energy

Physical Sciences → Energy → Energy Engineering and Power Technology

A Fine-Grained Sparse Accelerator for Multi-Precision DNN

Abstract

Metrics

Citation History

Topics

Related Documents

A precision-scalable sparse CNN accelerator with fine-grained mixed bitwidth configurability

FSA: A Fine-Grained Systolic Accelerator for Sparse CNNs

A Fine-grained Sparse Neural Network Accelerator for Image Classification

Thread: Towards fine-grained precision reconfiguration in variable-precision neural network accelerator

An Energy-Efficient Sparse Deep-Neural-Network Learning Accelerator With Fine-Grained Mixed Precision of FP8–FP16