OpenCL-based FPGA development has recently gained popularity, driven by the growing need to accelerate workloads such as the Convolutional Neural Network (CNN), the most popular deep learning architecture in computer vision. While OpenCL improves the code portability and programmability of FPGAs, it does so at the expense of performance. The key challenge is to optimize OpenCL kernels so that they efficiently utilize the flexible hardware resources of the FPGA. Simply tuning the OpenCL kernel code through compiler options turns out to be insufficient to achieve desirable performance for workloads that are both compute-intensive and data-intensive, such as CNNs.
YU Zijian, MA De, YAN Xiaolang, SHEN Juncheng
Li Luo, Yakun Wu, Fei Qiao, Yi Yang, Qi Wei, Xiaobo Zhou, Yongkai Fan, Shuzheng Xu, Xin-Jun Liu, Huazhong Yang
Jincheng Zou, Qing Tang, Congcong He
Hong Wang, Xiao Zhang, Dehui Kong, Guoning Lu, Degen Zhen, Fang Zhu, Ke Xu