In this paper, we propose a sparse convolutional neural network accelerator design for FPGAs. Similar to the DNNWEAVER architecture, our accelerator uses a two-level hierarchy: multiple Processing Units (PUs), each of which comprises a set of Processing Elements (PEs). To address the irregularity of sparse neural networks, we introduce a novel sparse dataflow for sparse CNN computation, as well as a weight-merging method that balances the computational load across PUs for better overall efficiency. We implement our design with 32 PUs and 14 PEs per PU. Compared with DNNWEAVER on the VGG16 network, our accelerator achieves a 3.49× speedup and 3.05× energy saving on average when running at 150 MHz on a Xilinx ZC706 board, reaching a throughput of 400 GOPS.