Md. Syadus Sefat, Semih Aslan, Jeffrey W. Kellington, Apan Qasem
This paper introduces a new energy-efficient FPGA accelerator targeting the hotspots in Deep Neural Network (DNN) applications. Our design leverages the Coherent Accelerator Processor Interface (CAPI), which provides attached accelerators with a coherent view of system memory. Our implementation bypasses the need for device driver code and significantly reduces communication and I/O overhead. Performance is further improved by a tiling transformation that exploits data locality in the computation kernel via the CAPI Power Service Layer (PSL) cache. A new adder tree configuration is proposed that achieves a tunable balance between resource utilization and power consumption. An implementation on a CAPI-supported Kintex FPGA board achieves up to 155 GOP/s and 15.79 GOP/s per watt, improving on the state of the art in FPGA-based DNN implementations.
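The tiling transformation mentioned in the abstract can be illustrated with a minimal sketch. The tile size and the use of matrix multiplication as the kernel are illustrative assumptions, not details taken from the paper; the idea is simply that blocking the loop nest keeps each working tile resident in a cache (here, conceptually, the PSL cache on the accelerator path):

```python
# Illustrative loop-tiling sketch. TILE and the matmul kernel are
# assumptions for illustration only, not parameters from the paper.
TILE = 4

def matmul_tiled(a, b, n, tile=TILE):
    """Compute c = a @ b for n x n row-major matrices with loop tiling.

    The three outer loops walk over tiles; the three inner loops work
    entirely within one tile, so the tile's operands stay cache-resident.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        s = c[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c
```

The same blocking pattern applies to convolution loops in a DNN kernel; only the loop bounds and index arithmetic change.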