To eliminate the invalid operations caused by the sparsity of model parameters in the forward pass of a Convolutional Neural Network (CNN), a dataflow and a parallel accelerator system for sparse neural networks are designed based on the Field Programmable Gate Array (FPGA). A dedicated logic module picks out the non-zero elements of the feature map matrices and the convolution filter matrices. The valid data is then transferred to an array of Digital Signal Processors (DSPs) for multiply-accumulate operations. All relevant intermediate results are then transferred to an adder tree to generate the final output feature map. Meanwhile, coarse-grained parallelism is implemented along the width, height, and output channel dimensions of the feature maps, and the optimal design parameters are searched for. Experiments are carried out on Xilinx FPGAs for verification. The results show that the design enables the sparse convolution layers of VGG to deliver a performance of 678.2 GOPS and an energy efficiency of 69.45 GOPS/W, a considerable improvement in both performance and energy efficiency over FPGA-based accelerators for dense and sparse networks.
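The dataflow described above (select non-zero operands, multiply only valid pairs, reduce the partial products in an adder tree) can be illustrated with a small software model. This is a minimal sketch, not the accelerator's actual design: the function names `sparse_conv2d`, `dense_conv2d`, and `adder_tree` are hypothetical, a single channel with stride 1 and no padding is assumed, and the hardware's parallelism across width, height, and output channels is not modeled.

```python
def adder_tree(values):
    """Sum a list by pairwise reduction, mimicking the levels of an adder tree."""
    if not values:
        return 0
    while len(values) > 1:
        # Each pass adds neighbouring pairs, halving the number of operands.
        nxt = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            nxt.append(values[-1])  # odd element passes through to the next level
        values = nxt
    return values[0]


def sparse_conv2d(fmap, kernel):
    """Single-channel convolution (stride 1, no padding) that skips zero operands.

    Returns the output feature map and the number of multiplications performed.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(fmap) - kh + 1, len(fmap[0]) - kw + 1
    # Pick out the non-zero weights once (assumed analogue of the selection logic).
    nz_weights = [(i, j, kernel[i][j])
                  for i in range(kh) for j in range(kw) if kernel[i][j] != 0]
    out = [[0] * out_w for _ in range(out_h)]
    macs = 0
    for y in range(out_h):
        for x in range(out_w):
            partials = []
            for i, j, w in nz_weights:
                a = fmap[y + i][x + j]
                if a != 0:  # only valid (non-zero, non-zero) pairs are multiplied
                    partials.append(a * w)
                    macs += 1
            out[y][x] = adder_tree(partials)
    return out, macs


def dense_conv2d(fmap, kernel):
    """Reference convolution that multiplies every operand pair."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(fmap) - kh + 1, len(fmap[0]) - kw + 1
    return [[sum(fmap[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)] for y in range(out_h)]


if __name__ == "__main__":
    fmap = [[1, 0, 2, 0], [0, 0, 3, 0], [4, 0, 0, 0], [0, 5, 0, 6]]
    kernel = [[1, 0, 0], [0, 2, 0], [0, 0, -1]]
    out, macs = sparse_conv2d(fmap, kernel)
    dense_macs = len(out) * len(out[0]) * len(kernel) * len(kernel[0])
    print("output:", out)
    print("multiplications:", macs, "of", dense_macs, "dense")
```

In this toy input most operand pairs contain a zero, so the sparse version performs far fewer multiplications than the dense reference while producing the same output. The saving in a real layer depends on the sparsity of both the weights and the activations.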
Kavitha Malali Vishveshwarappa Gowda, Sowmya Madhavan, Stefano Rinaldi, B. D. Parameshachari, Anitha Atmakur
Xiangzhi Xu, Qi Liu, Wenjin Huang, Wenlu Peng, Yihua Huang
YU Zijian, MA De, YAN Xiaolang, SHEN Juncheng
Jincheng Zou, Qing Tang, Congcong He