Abstract

Machine learning has been adopted widely in recent years, with neural network implementations at the forefront. As a result, vector processors are seeing a resurgence of interest, owing to their inherent suitability for accelerating the data-parallel algorithms that machine learning workloads require. In this paper, we propose a scalable, high-performance RISC-V vector processor core. The processor employs three novel mechanisms that work together to achieve these goals. First, an enhanced, vector-specific form of register renaming facilitates dynamic hardware loop unrolling and alleviates instruction dependencies. Second, a cost-efficient decoupled execution scheme splits instructions into execution and memory-access streams. Third, hardware support for reductions accelerates the execution of key instructions in the RISC-V ISA. Extensive performance evaluation and hardware synthesis analysis validate the efficiency of the new architecture.

Keywords:
Computer science, Scalability, Loop unrolling, Reduced instruction set computing, Processor register, Parallel computing, Instruction set, Computer architecture, Microarchitecture, Multi-core processor, Scheme (mathematics), Computer hardware, Operating system, Compiler, Memory address

Metrics

Cited By: 30
FWCI (Field Weighted Citation Impact): 4.52
Refs: 29
Citation Normalized Percentile: 0.95

Topics

Parallel Computing and Optimization Techniques
Physical Sciences →  Computer Science →  Hardware and Architecture
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Memory and Neural Computing
Physical Sciences →  Engineering →  Electrical and Electronic Engineering