JOURNAL ARTICLE

Hardware Accelerator for Transformer based End-to-End Automatic Speech Recognition System

Abstract

Hardware accelerators are designed to offload compute-intensive tasks such as deep neural networks from the CPU, improving overall application performance, particularly on the performance-per-watt metric. Encoder-decoder-based sequence-to-sequence models such as the Transformer have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR) systems. The Transformer model, being intensive in both memory and computation, poses a challenge for an FPGA implementation. This paper proposes an end-to-end architecture to accelerate a Transformer for an ASR system. The host CPU orchestrates the computations of the different encoder and decoder stages of the Transformer architecture on the designed hardware accelerator, with no need for intermediate FPGA reconfiguration. Communication latency is hidden by prefetching the weights of the next encoder/decoder block while the current block is being processed. The computation is split across both Super Logic Regions (SLRs) of the FPGA, mitigating inter-SLR communication overhead. The proposed design achieves low latency by exploiting the available resources. The accelerator is realized using high-level synthesis tools and evaluated on an Alveo U50 FPGA card. The design demonstrates an average speed-up of $32 \times$ over an Intel Xeon E5-2640 CPU and $8.8 \times$ over an NVIDIA GeForce RTX 3080 Ti graphics card for a 32-bit single-precision floating-point model.

Keywords:
Computer science; Field-programmable gate array; Computer hardware; Encoder; Xeon; Hardware acceleration; Floating point; End-to-end principle; Computation; Transformer; Embedded system; Parallel computing; Operating system; Artificial intelligence; Engineering

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.51
Refs: 20
Citation Normalized Percentile: 0.65

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Network Packet Processing and Optimization
Physical Sciences →  Computer Science →  Hardware and Architecture
Embedded Systems Design Techniques
Physical Sciences →  Computer Science →  Hardware and Architecture

Related Documents

JOURNAL ARTICLE

A Transformer-Based End-to-End Automatic Speech Recognition Algorithm

Fang Dong, Yiyang Qian, Tianlei Wang, Peng Liu, Jiuwen Cao

Journal: IEEE Signal Processing Letters · Year: 2023 · Vol: 30 · Pages: 1592-1596
JOURNAL ARTICLE

An End-to-End Transformer-Based Automatic Speech Recognition for Qur’an Reciters

Mohammed Hadwan, Hamzah A. Alsayadi, Salah Al-Hagree

Journal: Computers, Materials &amp; Continua · Year: 2022 · Vol: 74 (2) · Pages: 3471-3487
JOURNAL ARTICLE

A CTC Alignment-Based Non-Autoregressive Transformer for End-to-End Automatic Speech Recognition

Ruchao Fan, Wei Chu, Chang Peng, Abeer Alwan

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing · Year: 2023 · Vol: 31 · Pages: 1436-1448