The convolutional neural network (CNN) is widely used in applications such as speech recognition, face detection, and image classification. CNNs are traditionally implemented on GPUs, which are fast but energy-inefficient. To achieve lower power consumption and higher efficiency, application-specific hardware is preferable. This paper proposes a run-time reconfigurable CNN accelerator SoC (CNN-AS) architecture embedded in an instruction-extended RISC-V core. The application-specific instruction set extension is designed to accelerate frequently used CNN operations. To optimize the circuit structure, we designed an 8-bit dynamic fixed-point (DFP) scheme within the CNN-AS and compare the accuracy of the DFP implementation with that of a TensorFlow floating-point implementation. Furthermore, the corresponding software for ResNet and VGG16 is described and simulated on the CNN-AS. Lastly, we compare the overall simulation results with other non-SoC FPGA designs in terms of efficiency, throughput, and power.
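The abstract does not spell out how the 8-bit DFP scheme works, so the following is only a minimal sketch of a generic dynamic fixed-point quantizer, not the CNN-AS implementation. It assumes one shared fractional length per tensor, chosen from the largest magnitude in that tensor, with values stored as signed 8-bit codes. The function names `choose_frac_bits`, `quantize`, and `dequantize` are hypothetical and introduced here for illustration.

```python
import math


def choose_frac_bits(values, total_bits=8):
    """Pick a shared fractional length so the largest magnitude fits.

    Assumption: one fractional length per tensor (per-layer scaling).
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Bits needed left of the binary point, excluding the sign bit.
    # This can be negative when all values are much smaller than 1.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits


def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round to nearest, and saturate to the signed range."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    # Python's round() rounds halves to the nearest even integer.
    return [min(hi, max(lo, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Map integer codes back to real values for accuracy comparison."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


weights = [0.5, -1.25, 3.0]
frac_bits = choose_frac_bits(weights)
codes = quantize(weights, frac_bits)
restored = dequantize(codes, frac_bits)
```

For the sample `weights`, the chosen fractional length is 5 and the codes are `[16, -40, 96]`, which dequantize back to the original values exactly. Comparing `restored` against the floating-point inputs over a whole network is one way to measure the accuracy loss that an 8-bit format introduces relative to a float baseline.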
Hansen Wang, Dongju Li, Tsuyoshi Isshiki
Otto Simola, Aleksi Korsman, Verneri Hirvonen, Antti Tarkka, Julius Helander, Kimmo Järvinen, Marko Kosunen, Jussi Ryynänen
Qiang Jiao, Wei Hu, Fang Liu, Yong Dong
Nguyen Cong Dao, Andrew Attwood, Bea Healy, Dirk Koch