Convolutional neural networks (CNNs) have achieved great success in many applications. Recently, various FPGA-based accelerators have been proposed to improve the performance of CNNs. However, most current FPGA-based methods use the same bit-width for all CNN layers, which leads to low resource utilization and makes further performance improvement difficult. In this paper, we propose a bit-width adaptive accelerator design approach that can adapt to CNN layers with different bit-width requirements within the same network. We construct multiple convolutional processors with different bit-widths to compute the CNN layers in parallel. We partition the FPGA DSP resources and use our optimization approach to find the optimal resource allocation. On a Xilinx Virtex-7 FPGA, when evaluated on the convolutional layers of AlexNet and the deeper VGG CNNs, our design approach achieves 5.48× to 7.25× higher throughput than state-of-the-art FPGA-based CNN accelerators, and 6.20× higher on average.
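The DSP-partitioning idea in the abstract can be illustrated with a minimal sketch. All numbers and names below are hypothetical assumptions, not values from the paper: layers are grouped by bit-width, lower-precision MACs are assumed to pack more densely onto a DSP slice, and a brute-force search over allocation splits minimizes the latency of the slowest parallel processor.

```python
# Hypothetical sketch of DSP resource allocation across bit-width groups.
# Assumptions (illustrative only, not from the paper): 8-bit packing gives
# 2 MACs per DSP per cycle, 16-bit gives 1; workloads and DSP count invented.
MACS_PER_DSP = {8: 2, 16: 1}
workload = {8: 6.0e8, 16: 3.0e8}   # total MAC operations per bit-width group
TOTAL_DSPS = 2048
STEP = 64                          # allocation granularity in DSP slices

def exec_time(dsps, bw):
    """Cycles for a group's workload on `dsps` DSP slices at bit-width `bw`."""
    return workload[bw] / (dsps * MACS_PER_DSP[bw])

best = None
for d8 in range(STEP, TOTAL_DSPS, STEP):
    d16 = TOTAL_DSPS - d8
    # Processors run in parallel, so latency is set by the slowest group.
    latency = max(exec_time(d8, 8), exec_time(d16, 16))
    if best is None or latency < best[0]:
        best = (latency, d8, d16)

latency, d8, d16 = best
print(f"best split: {d8} DSPs for 8-bit, {d16} DSPs for 16-bit")
```

With these invented numbers, the search balances the two groups (here an even split, since the 8-bit group has twice the work but twice the per-DSP throughput); real allocations would also account for LUT/BRAM limits and inter-layer dependencies.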
Youyao Liu, Fengyi Miao, Xiong Xiao, Zetian Zhang