Field-programmable gate arrays (FPGAs) are a promising choice as heterogeneous computing components for energy-aware applications in high-performance computing. Emerging high-level synthesis tools such as the Intel OpenCL SDK offer a streamlined design flow that makes FPGAs more accessible to scientists and researchers. Using the HACCmk kernel routine as a case study, we explore the kernel optimization space and its performance implications. We describe the resource usage, performance, and performance per watt of the kernel implementations in OpenCL. Using directives for accelerator programming, the performance per watt on an Intel Arria10-based FPGA platform is 2.5X higher than on an Intel Xeon 16-core CPU and 2.1X higher than on an Nvidia K80 GPU, at the cost of 50% of the performance.
Zheming Jin, Iris Johnson, Hal Finkel