General Matrix Multiply (GEMM), as a fundamental operation in neural networks, plays an important role in artificial intelligence and signal processing applications. In this paper, we propose three SIMD RISC-V custom instructions to accelerate GEMM computations, supporting multiple precisions: 32-bit, 16-bit, and 8-bit fixed-point. Furthermore, we implement address calculation and loop control units alongside the GEMM acceleration module to reduce memory access overhead. These three GEMM custom instructions, together with the near-memory optimization units, were incorporated into the RV-GEMM processor and implemented on an FPGA platform for speedup evaluation. The design was also synthesized with Synopsys Design Compiler in a 55 nm CMOS process for hardware overhead estimation. Compared to the baseline RISC-V processor, for GEMM computations at 32-bit, 16-bit, and 8-bit fixed-point precisions, the RV-GEMM processor achieved speedup ratios of 15.8×, 28.7×, and 42.5×, with peak energy efficiencies of 260 GOPS/W, 420 GOPS/W, and 609 GOPS/W, respectively.
Feihong Dong, Lin-Jie Jiang, Cunyang Luan, Rong Sun, Yongkui Yang