Ardian Dwi C, Trio Adiono, Nana Sutisna
Development of hardware accelerators for deep learning has grown rapidly, driven by the demand for flexibility across different deep learning architectures. The accelerators most widely marketed in recent years are GPU-based, and developers face many difficulties when applying them to different architectures. In this paper, the authors design an FPGA-based accelerator that handles the convolution layer of a Convolutional Neural Network (CNN). The system is designed with a clock period of 10 ns (100 MHz) and provides a throughput of 1 Gbyte/s. In testing, a 3 × 3 kernel completed with an iteration time of 2683.92 µs and a latency of 7930 ns, and a 2 × 2 kernel completed with an iteration time of 2643 µs and a latency of 5930 ns. Using the accelerator speeds up the convolution process by up to approximately 1400 µs compared with the same process carried out in Matlab.
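For orientation, the sliding-window operation that such a convolution layer computes can be sketched in software, together with the cycle arithmetic implied by the figures above. This is a minimal illustration, not the authors' hardware design: the function names are hypothetical, the kernel is applied without flipping (the cross-correlation convention common in CNN frameworks, which the abstract does not specify), and 1 Gbyte/s is assumed to mean 10^9 bytes per second. Under those assumptions, the reported latencies of 7930 ns and 5930 ns correspond to 793 and 593 clock cycles, and the throughput corresponds to 10 bytes per cycle.

```python
CLOCK_PERIOD_NS = 10  # clock period stated in the abstract


def conv2d_valid(image, kernel):
    """Direct sliding-window convolution, stride 1, no padding.

    Assumption: the kernel is not flipped (cross-correlation convention).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            acc = 0
            # Multiply-accumulate over one kernel-sized window
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


def ns_to_cycles(latency_ns):
    """Convert a latency in nanoseconds to whole clock cycles."""
    return latency_ns // CLOCK_PERIOD_NS


def bytes_per_cycle(throughput_bytes_per_s=1_000_000_000):
    """Bytes moved per clock cycle, assuming 1 Gbyte/s = 10^9 bytes/s."""
    return throughput_bytes_per_s * CLOCK_PERIOD_NS // 1_000_000_000


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]  # hypothetical 2 x 2 test kernel
    print(conv2d_valid(image, kernel))
    print(ns_to_cycles(7930), ns_to_cycles(5930), bytes_per_cycle())
```

A 3 × 3 kernel needs nine multiply-accumulates per output value and a 2 × 2 kernel needs four, which is consistent in direction with the lower latency reported for the smaller kernel.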
Y. Sravya Mounika, B B Shabarinath, Prajwal Rao
Wan Du, Shang-Zhi Chen, Lei Wang, Rifai Chai