Edge computing is key to unlocking the power of deep neural networks on edge devices. However, deploying power-hungry deep neural network inference on resource-constrained, power-limited devices makes real-time performance difficult to deliver. With the advent of the RISC-V Vector extension, there has been renewed interest in vector processors for data-parallel workloads. General-purpose processors with vector coprocessors rely on complex control mechanisms such as instruction schedulers, operand queues, and scoreboards, which have largely kept them out of low-power microcontrollers. This work presents a systolic-array-based vector unit that is closely integrated into the pipeline of a 32-bit, in-order, single-issue RISC-V scalar core running at 50 MHz. The robustness of neural networks, coupled with the flexibility of the RISC-V Vector extension instruction set, is used to remove several architectural complexities from the vector unit. The vector processor is implemented on a Xilinx Virtex-7 (XC7VX485T) FPGA. Benchmarking the RISC-V vector processor shows a speedup of up to $40.7\times$ over the scalar RISC-V core on image recognition tasks, at the cost of $1.2\times$ the power consumption and $1.8\times$ the hardware resources. The soft-core vector processor also compares well with similar processors that exploit data-level parallelism.
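To give a feel for the kind of computation a systolic array performs, the following is a minimal software sketch of an output-stationary systolic array multiplying two small matrices cycle by cycle. It is illustrative only: the abstract does not state the dataflow, array dimensions, or data types of the actual vector unit, so the output-stationary scheme, the function name `systolic_matmul`, and the register names are all assumptions made for this example.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Assumed (hypothetical) organisation: one processing element (PE) per
    output element; rows of `a` enter from the left, columns of `b` enter
    from the top, each skewed by one cycle per row/column.
    """
    n, k, m = len(a), len(a[0]), len(b[0])
    acc = [[0] * m for _ in range(n)]    # stationary partial sums, one per PE
    a_reg = [[0] * m for _ in range(n)]  # operands travelling left to right
    b_reg = [[0] * m for _ in range(n)]  # operands travelling top to bottom
    cycles = n + m + k - 2               # cycles until the last PE finishes

    for t in range(cycles):
        # Visit PEs from the far corner backwards so that each PE still
        # sees its neighbours' values from the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    s = t - i  # row i is delayed by i cycles at the edge
                    a_in = a[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    s = t - j  # column j is delayed by j cycles at the edge
                    b_in = b[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                acc[i][j] += a_in * b_in  # multiply-accumulate in place
    return acc, cycles


if __name__ == "__main__":
    result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result, cycles)
```

In this sketch every PE only talks to its left and upper neighbours, which is the property that lets a systolic array avoid centralised operand routing; whether the hardware described here uses the same dataflow is not stated in the abstract.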
Shilpa Anil Borade, Saurabh Bansod, Anirban Jyoti Hati, Shashank Kumar Singh
Sandy A. Wasif, Miran Wael, Paul R. Genßler, Eman Azab, Maggie Mashaly, Mohamed A. Abd El Ghany, Hussam Amrouch
Kariofyllis Patsidis, Chrysostomos Nicopoulos, Georgios Ch. Sirakoulis, Giorgos Dimitrakopoulos