JOURNAL ARTICLE

Specialized Neural Network Hardware Accelerators

V. V. Korneev

Year: 2023 Journal: Programmnaya Ingeneria Vol: 14 (1) Pages: 3-11

Abstract

By now, enough specialized neural-network microprocessor chips, and systems built on them, have been created to indicate the trends of their development and, most importantly, their place in the overall evolution of supercomputer architectures and technologies. Low-bit number representations such as FP8, INT8, and BF16, which are acceptable in neural-network computing, make it possible, on the one hand, to reach chip-level performance of 2015 FP8 TFLOPS and 1008 BF16 TFLOPS and, on the other hand, to reduce the energy cost of the multiplication operation. Low bit depth has drawn attention to rounding errors: in a number of chips the set of rounding modes has been extended beyond the generally accepted standard, and the ability to set the rounding mode programmatically has been introduced. In addition, the case for creating specialized neuroprocessor chips rests on elements of structural programming, in which a computer is formed programmatically for the algorithm being executed. Therefore, along with reduced bit depth and support for processing sparse neural networks, computing systems built on the SambaNova SN30 RDU, Graphcore Colossus MK2 IPU, Untether AI Boqueria, AWS Trainium1, and Tesla Dojo D1 allow, to some extent, structural programming of computations. The sparsity of the processed data has motivated abandoning cache memory in favor of large on-chip scratchpad memory with increased bandwidth for delivering data between memory and arithmetic logic units, as well as between memory and the on-chip and inter-chip communication fabric. One can therefore speak of a memory hierarchy different from the traditional cache-based one.
Thus, specialization in neural-network algorithms has led to the emergence of massively parallel system architectures for processing low-bit data formats with poor temporal and spatial locality of memory requests.
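As an illustration of the rounding concern the abstract raises, the sketch below converts a float32 value to the BF16 format it mentions (BF16 keeps the float32 exponent but only 7 explicit mantissa bits) using round-to-nearest-even. This is a minimal standalone example for intuition, not code from the article, and it ignores NaN/Inf handling for brevity.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float32 value to bfloat16 (round-to-nearest-even) and
    widen it back to float. bfloat16 is the top 16 bits of the float32
    pattern: 1 sign, 8 exponent, 7 mantissa bits. NaN/Inf not handled."""
    b = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    lsb = (b >> 16) & 1                               # lowest kept mantissa bit
    b = (b + 0x7FFF + lsb) & 0xFFFFFFFF               # nearest-even rounding bias
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFF0000))[0]

# 1 + 2^-8 lies exactly halfway between two BF16 values; nearest-even picks 1.0
print(to_bf16(1.0 + 2**-8))  # 1.0
# 0.1 survives only to about 3 decimal digits in BF16
print(to_bf16(0.1))          # 0.10009765625
```

The second example shows why chip designers expose multiple rounding modes: with only 7 mantissa bits, the relative error of a single rounding is up to 2^-8, and such errors can accumulate across the many multiply-accumulates of a neural-network layer.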

Keywords:
Computer science, Executable, Supercomputer, Rounding, Artificial neural network, Parallel computing, Microprocessor, Cache, Computer hardware, Computer engineering, Computer architecture, Embedded system, Operating system, Artificial intelligence

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.01

Topics

Advanced Data Processing Techniques
Physical Sciences →  Engineering →  Control and Systems Engineering

Related Documents

JOURNAL ARTICLE

Hardware neural network accelerators

Olivier Temam

Journal: International Conference on Hardware/Software Codesign and System Synthesis Year: 2013 Pages: 1-1
JOURNAL ARTICLE

A Survey on Neural Network Hardware Accelerators

Tamador Mohaidat, Kasem Khalil

Journal: IEEE Transactions on Artificial Intelligence Year: 2024 Vol: 5 (8) Pages: 3801-3822
JOURNAL ARTICLE

Towards Hardware Trojan Resilient Convolutional Neural Network Accelerators

Peiyao Sun, Basel Halak, Tom J. Kázmierski

Journal: Journal of Hardware and Systems Security Year: 2025 Vol: 9 (3-4) Pages: 89-106