JOURNAL ARTICLE

Joint CPU-FPGA Hardware-Aware Quantized Training of Graph Convolutional Networks

Abstract

In this work, we present a method for training Graph Convolutional Networks in which the bulk of the computations for both the forward and backward passes are performed with short word-length fixed-point operands on a unified FPGA hardware accelerator. The accelerator targets the programmable logic of an AMD Zynq UltraScale+ FPGA device with a scalable architecture that can be configured with a variable number of hardware threads and compute units per thread. Gradients and activations are computed using a streaming architecture to reduce memory accesses and pipeline stalls, while quantization is applied to all adjacency, feature, weight, and gradient tensors. Experiments show that the hardware-aware quantized training methodology incurs little to no classification accuracy degradation when quantizing down to 4-bit fixed-point, compared to the reference 32-bit floating-point model. Additionally, we observe a significant speedup of up to 34x compared to running the same model on the CPU of the processing system (PS) alone. The proposed hardware-software combination enables efficient, accurate, and fast training and inference at the edge.
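The per-tensor fixed-point quantization described above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the scale selection (max-magnitude per tensor), the rounding mode, and the function name `quantize_fixed_point` are all assumptions made for demonstration.

```python
import numpy as np

def quantize_fixed_point(x, n_bits=4):
    """Symmetric uniform quantize-dequantize to an n_bits fixed-point grid.

    Illustrative sketch: the scale is derived from the tensor's maximum
    magnitude; the paper's actual scale-selection policy may differ.
    """
    qmax = 2 ** (n_bits - 1) - 1                   # e.g. 7 for signed 4-bit
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return np.zeros_like(x)                    # all-zero tensor stays zero
    q = np.clip(np.round(x / scale), -qmax, qmax)  # integer codes in [-qmax, qmax]
    return q * scale                               # dequantized fixed-point values

# Example: quantize a small weight tensor to 4-bit fixed-point.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
w_q = quantize_fixed_point(w, n_bits=4)
```

In quantized training, a quantize-dequantize step like this is typically applied to each tensor (adjacency, features, weights, gradients), with gradients passed through the non-differentiable rounding via a straight-through estimator.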

Keywords:
Computer science; Field-programmable gate array; Graph convolutional networks; Quantized training; Computer hardware; Embedded systems; Computer architecture


Topics

Advanced Graph Neural Networks
Physical Sciences → Computer Science → Artificial Intelligence
Graph Theory and Algorithms
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition