Convolutional neural networks (CNNs) play a major role in image recognition. This paper proposes efficient field-programmable gate array (FPGA) implementations of CNNs, in which a dual-core implementation, parallelism in the convolution layer, and partial-reconfiguration-based constant multiplication in the convolution layer reduce cycle count, delay, and resource usage, respectively. All the proposed designs are implemented on a ZedBoard (Zynq-7000 FPGA, XC7Z020CLG484-1) using Xilinx Vivado. The synthesis results show that, on the Modified National Institute of Standards and Technology (MNIST) data set, the proposed parallel-convolution-based single-core implementation of a CNN achieves an 89% reduction in delay for 10 tasks compared with the conventional convolution-based single-core implementation with a co-processor.
N.L. Venkataraman, S. Sumithra, S. Suresh Kumar, R. Purushothaman, K. Kukulavani, V. Gowri
Junye Si, Jianfei Jiang, Qin Wang, Jia Huang
Hasan Irmak, Nikolaos Alachiotis, Daniel Ziener
Yeong-Kang Lai, Ling-Cheng Huang