JOURNAL ARTICLE

Towards a Modular RISC-V Based Many-Core Architecture for FPGA Accelerators

Ahmed Kamaleldin, Salma Hesham, Diana Göhringer

Year: 2020; Journal: IEEE Access; Vol: 8; Pages: 148812-148826; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Multi-/many-core architectures are emerging as scalable, high-performance, and energy-efficient computing platforms suitable for a variety of application domains from edge to cloud computing. Recently, the appearance of the open-source RISC-V ISA has created new possibilities for developing customized computing platforms with large savings in non-recurring engineering costs. Moreover, current trends toward open-source hardware frameworks aim to reduce design time and cost for complex system-on-chip architectures. Therefore, modularity and re-usability of hardware components are major challenges for flexible hardware architectures. The motivation behind this work is to introduce a modular cluster-based many-core architecture for FPGA accelerators that is re-usable, flexible, and tailored to implement different many-core taxonomies with less design time and cost by using regular and replicated sets of computing, memory, and interconnection blocks. The proposed many-core architecture is built from multiple processing clusters coupled with a network-on-chip (NoC) for communication, which allows a high degree of design scalability. Each processing cluster features a configurable multi-core architecture consisting of multiple RISC-V processing elements (PEs) tightly coupled with a bus-based interconnect for intra-cluster communication using a parameterized scratchpad shared memory. Each PE features a single RISC-V core with a tightly coupled parameterized scratchpad local memory and a generic AXI interface. Evaluation results demonstrate that the proposed architecture delivers scalable computing performance of 501 MOp/s for 4 clusters and 878 MOp/s for 8 clusters. Moreover, a scalable memory bandwidth of up to 4.3 GB/s is achieved for 9 clusters, with a power consumption of 1.4 W per cluster, utilizing 7.7% of on-chip memory resources. The many-core architecture is implemented and evaluated on a Xilinx Virtex UltraScale+ FPGA, with the ability to change the architecture configuration at run time using dynamic and partial reconfiguration, which provides more flexibility and re-usability.
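
As a rough reading aid for the throughput figures quoted in the abstract, the following Python sketch derives per-cluster throughput and the 4-to-8-cluster scaling efficiency. Only the two (cluster count, MOp/s) pairs come from the abstract; the function names and the definition of efficiency as measured speedup over ideal linear speedup are assumptions made here for illustration, not the authors' evaluation method.

```python
# Figures quoted in the abstract: cluster count -> aggregate throughput (MOp/s).
REPORTED_MOPS = {4: 501.0, 8: 878.0}


def per_cluster_throughput(clusters, total_mops):
    """Average throughput contributed by each cluster, in MOp/s."""
    return total_mops / clusters


def scaling_efficiency(base, scaled, reported=REPORTED_MOPS):
    """Measured speedup divided by ideal linear speedup (hypothetical metric)."""
    measured_speedup = reported[scaled] / reported[base]
    ideal_speedup = scaled / base
    return measured_speedup / ideal_speedup


if __name__ == "__main__":
    for n, mops in REPORTED_MOPS.items():
        print(f"{n} clusters: {per_cluster_throughput(n, mops):.2f} MOp/s per cluster")
    print(f"scaling efficiency 4 -> 8: {scaling_efficiency(4, 8):.3f}")
```

Under these assumptions, the reported numbers correspond to about 125 MOp/s per cluster at 4 clusters and about 110 MOp/s per cluster at 8 clusters, so doubling the cluster count retains roughly 88% of ideal linear scaling.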

Keywords:
Field-programmable gate array; Computer science; Modular design; Computer architecture; Architecture; Embedded system; Reduced instruction set computing; Core (optical fiber); Parallel computing; Instruction set; Operating system; Telecommunications

Metrics

Cited By: 33
FWCI (Field Weighted Citation Impact): 3.51
Refs: 27
Citation Normalized Percentile: 0.92

Topics

Embedded Systems Design Techniques
Physical Sciences → Computer Science → Hardware and Architecture
Interconnection Networks and Systems
Physical Sciences → Computer Science → Computer Networks and Communications
Parallel Computing and Optimization Techniques
Physical Sciences → Computer Science → Hardware and Architecture