Seong-Hun Jeong, Jooyeon Lee, Jaeha Kung
Many deep learning accelerators have been proposed and designed in both academia and industry to execute deep neural networks with better power efficiency. Recently, many studies have focused on developing a system-on-chip that includes both a host processor and accelerators. In this paper, we demonstrate a full software-hardware stack for accelerating deep learning benchmarks using a co-processor attached to a RISC-V core. To do so, we extend the RISC-V instruction set and modify the compilation stack, achieving a significant end-to-end performance boost over the CPU-only processing scenario.
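Extending the RISC-V instruction set for a co-processor usually means placing new instructions in one of the opcode spaces the ISA reserves for custom extensions. The sketch below packs a hypothetical accelerator instruction into the standard R-type layout under the custom-0 major opcode (`0b0001011`). It is an illustration only: the helper names, the mnemonic `acc.mac`, and the `funct3`/`funct7` values are assumptions, not the encoding used in this work.

```python
# Minimal sketch: encoding a HYPOTHETICAL co-processor instruction in the
# RISC-V R-type format under the custom-0 major opcode. The mnemonic and
# funct fields are illustrative assumptions, not the paper's encoding.

CUSTOM0 = 0b0001011  # major opcode reserved for custom extensions (custom-0)


def _check(name: str, value: int, bits: int) -> None:
    """Reject field values that do not fit in their bit width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")


def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int,
                  opcode: int = CUSTOM0) -> int:
    """Pack fields as funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]."""
    for name, value, bits in (("funct7", funct7, 7), ("rs2", rs2, 5),
                              ("rs1", rs1, 5), ("funct3", funct3, 3),
                              ("rd", rd, 5), ("opcode", opcode, 7)):
        _check(name, value, bits)
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def decode_r_type(word: int) -> dict:
    """Split a 32-bit R-type instruction word back into its fields."""
    return {
        "funct7": (word >> 25) & 0x7F,
        "rs2": (word >> 20) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rd": (word >> 7) & 0x1F,
        "opcode": word & 0x7F,
    }


if __name__ == "__main__":
    # Hypothetical "acc.mac a0, a1, a2": x10 <- co-processor op on x11, x12.
    word = encode_r_type(funct7=0, rs2=12, rs1=11, funct3=0, rd=10)
    print(f"0x{word:08x}")  # 0x00c5850b
    assert decode_r_type(word)["opcode"] == CUSTOM0
```

Using the reserved custom opcode space keeps such instructions from colliding with standard extensions, so the compiler side only needs to emit these words (for example via inline assembly) wherever it offloads work to the co-processor.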
Zijian Jiang, Keran Zheng, David Boland, Yungang Bao, Kan Shi
Matteo Perotti, Pasquale Davide Schiavone, Giuseppe Tagliavini, Davide Rossi, Tariq Kurd, Mark D. Hill, Yingying Liu, Luca Benini
Enfang Cui, Tianzheng Li, Wei Qian