JOURNAL ARTICLE

GEMM-Based Quantized Neural Network FPGA Accelerator Design

Abstract

In this study, we explore FPGA acceleration of neural networks built around General Matrix Multiplication (GEMM). Accelerating GEMM permits a regular, modular accelerator implementation and offers the benefit of scalability. GEMM-based designs also provide a degree of functional flexibility, a key advantage given the highly dynamic architectural developments in deep learning algorithms. We quantify the theoretical performance model and tradeoffs of a GEMM accelerator and explore its design space. Moreover, we propose an accelerator design that exploits 8-bit quantization to increase effective bandwidth while preserving model accuracy, using the FPGA for model parallelization and data reuse to achieve high-performance, low-latency neural network inference. Lastly, we test and evaluate our design on the MNIST dataset. The proposed method is useful for optimizing hardware area in deep learning systems without sacrificing performance.
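The 8-bit quantization scheme the abstract describes can be illustrated with a minimal NumPy sketch (an assumption for illustration, not the authors' implementation): float matrices are symmetrically quantized to int8, the matrix product is accumulated in int32 as a fixed-point accelerator would, and the result is rescaled back to floating point.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map floats to int8 with one scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def gemm_int8(a, b):
    """Quantize both operands, multiply with int32 accumulators, dequantize."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)  # integer GEMM core
    return acc.astype(np.float32) * (sa * sb)        # rescale to float

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)

ref = a @ b                      # full-precision reference
out = gemm_int8(a, b)            # quantized GEMM
err = np.linalg.norm(out - ref) / np.linalg.norm(ref)
```

The relative error `err` stays small (on the order of a percent for these sizes), which is the sense in which 8-bit quantization quadruples datapath density over float32 while preserving accuracy.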

Keywords:
Computer science; Field-programmable gate array; Modular design; Hardware acceleration; MNIST database; Artificial neural network; Design space exploration; Computer architecture; Scalability; Deep learning; Flexibility (engineering); Inference; Computer engineering; Parallel computing; Artificial intelligence; Embedded system

Metrics

Cited By: 4
FWCI (Field-Weighted Citation Impact): 0.11
Refs: 4
Citation Normalized Percentile: 0.46

Topics

Advanced Neural Network Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Neural Networks and Applications (Physical Sciences → Computer Science → Artificial Intelligence)
Machine Learning and ELM (Physical Sciences → Computer Science → Artificial Intelligence)
© 2026 ScienceGate Book Chapters — All rights reserved.