Abstract

This paper focuses on improving memory access in the OpenCL architecture for FPGAs, with the goal of achieving a trade-off between performance and required resources. In OpenCL compute units, computation time usually scales linearly with local memory access latency. This latency is normally hidden by increasing the parallel workload; however, with such an approach the target FPGA device can easily run out of resources. In this work, conflict-free multiported memories are used to minimize local memory access latency. Experiments show that multiported memories increase computation speed and reduce the parallel workload required for maximum throughput to practical amounts.
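To make the idea of conflict-free access concrete, the following is a minimal toy model, not the architecture evaluated in the paper. It assumes a single-ported banked memory in which each bank serves one request per cycle and addresses map to banks by `addr % num_banks`, and compares it with an idealized conflict-free memory that serves up to `num_ports` requests per cycle regardless of address. The function names, the bank mapping, and the cycle counting are all illustrative assumptions.

```python
from collections import Counter
from math import ceil


def cycles_banked(addresses, num_banks):
    """Cycles to serve simultaneous requests on a single-ported banked memory.

    Assumption: address -> bank mapping is addr % num_banks, and each bank
    serves one request per cycle, so the busiest bank sets the cycle count.
    """
    if not addresses:
        return 0
    per_bank = Counter(addr % num_banks for addr in addresses)
    return max(per_bank.values())


def cycles_multiported(addresses, num_ports):
    """Cycles on an idealized conflict-free memory with num_ports ports.

    Assumption: any num_ports requests can be served in the same cycle,
    independent of which addresses they touch.
    """
    if not addresses:
        return 0
    return ceil(len(addresses) / num_ports)


if __name__ == "__main__":
    unit_stride = [0, 1, 2, 3]   # one request per bank
    strided = [0, 4, 8, 12]      # every request lands in bank 0
    for name, pattern in (("unit stride", unit_stride), ("stride 4", strided)):
        print(name,
              "banked:", cycles_banked(pattern, num_banks=4),
              "multiported:", cycles_multiported(pattern, num_ports=4))
```

In this model the strided pattern serializes on one bank, while the conflict-free memory serves it in a single cycle. That is the kind of latency reduction that lowers the amount of parallel workload needed to keep a compute unit busy.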

Keywords:
Computer science; Latency (audio); Workload; Field-programmable gate array; Parallel computing; Computation; CAS latency; Embedded system; Computer hardware; Operating system; Semiconductor memory; Algorithm; Memory controller

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.30
Refs: 9
Citation Normalized Percentile: 0.61

Topics

Parallel Computing and Optimization Techniques
Physical Sciences →  Computer Science →  Hardware and Architecture
Advanced Data Storage Technologies
Physical Sciences →  Computer Science →  Computer Networks and Communications
Interconnection Networks and Systems
Physical Sciences →  Computer Science →  Computer Networks and Communications