Convolutional neural networks (CNNs) are widely used in practical scenarios such as license plate recognition, face recognition and radar image detection, where the main accelerators are high-power GPU platforms. However, FPGAs are better suited to some application scenarios because of their flexibility and abundant computing resources, and they consume less energy than GPUs. When deploying large neural network models, pure hardware development requires a long development cycle. Our work therefore uses an OpenCL-based high-level synthesis (HLS) tool, which significantly improves development efficiency and enables fast implementation of the model. The Winograd algorithm is also used in the convolution kernel module to accelerate the convolution operation. Final verification is performed on the DE5a-Net FPGA development board, where a 544×544 image is processed in 149 ms and a peak performance of 248.7 GOP/s is achieved.
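To illustrate how the Winograd algorithm reduces the multiplications in a convolution, the sketch below implements the commonly used F(2×2, 3×3) variant in plain Python. This is an assumption for illustration only: the abstract does not state which tile or filter size the authors' OpenCL kernel uses, and the helper names (`winograd_f2x2_3x3`, `direct_3x3`) are hypothetical. A 4×4 input tile `d` and a 3×3 filter `g` are transformed with the standard matrices `BT`, `G` and `AT`, multiplied element-wise (16 multiplications), and transformed back into a 2×2 output tile, which a direct computation would need 36 multiplications to produce.

```python
def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


# Standard F(2x2, 3x3) transform matrices (assumed variant, see lead-in).
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def winograd_f2x2_3x3(d, g):
    """Hypothetical helper: 2x2 output tile from a 4x4 input tile d and 3x3 filter g."""
    U = matmul(matmul(G, g), transpose(G))      # 4x4 filter transform (can be precomputed)
    V = matmul(matmul(BT, d), transpose(BT))    # 4x4 input transform (additions only)
    M = [[U[i][j] * V[i][j] for j in range(4)]  # 16 element-wise multiplications
         for i in range(4)]
    return matmul(matmul(AT, M), transpose(AT))  # 2x2 output transform (additions only)


def direct_3x3(d, g):
    """Reference: direct 3x3 cross-correlation (CNN-style convolution), 36 multiplications."""
    return [[sum(d[i + u][j + v] * g[u][v] for u in range(3) for v in range(3))
             for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    print(winograd_f2x2_3x3(d, g))
    print(direct_3x3(d, g))
```

On an FPGA, the input and output transforms map to adders, so trading multiplications for additions frees DSP blocks. That trade-off is the usual motivation for using Winograd in hardware convolution kernels.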
Anrong Yang, Yuanhui Li, Hongqiao Shu, Jianlin Deng, Chuanzhao Ma, Zheng Li, Qigang Wang
Hasitha Muthumala Waidyasooriya, Masanori Hariyama, Kunio Uchiyama