JOURNAL ARTICLE

Specialized Neural Network Hardware Accelerators

V. V. Korneev

Year: 2023 Journal: Programmnaya Ingeneria Vol: 14 (1) Pages: 3-11

Abstract

By now, enough specialized neural-network microprocessor chips, and systems built on them, have been created to indicate the trends of their development and, most importantly, their place in the overall evolution of supercomputer architectures and technologies. Low-bit number representations such as FP8, INT8, and BF16, which are acceptable in neural-network computing, make it possible, on the one hand, to reach chip-level performance of 2015 FP8 TFLOPS and 1008 BF16 TFLOPS and, on the other hand, to reduce the energy cost of the multiplication operation. Low bit depth has drawn attention to rounding errors: in a number of chips the set of rounding modes has been extended beyond the generally accepted standard, and the ability to set the rounding mode programmatically has been introduced. In addition, the case for creating specialized neuroprocessor chips rests on elements of structural programming, in which a computer is formed programmatically for the algorithm being executed. Therefore, along with reduced bit depth and support for processing sparse neural networks, computing systems built on the SambaNova SN30 RDU, Graphcore Colossus MK2 IPU, Untether AI Boqueria, AWS Trainium1, and Tesla Dojo D1 allow, to some extent, structural programming of computations. The sparsity of the processed data has motivated abandoning cache memory in favor of large on-chip scratchpad memory with increased bandwidth for delivering data between memory and arithmetic logic units, as well as between memory and the on-chip and inter-chip communication fabric. One can therefore speak of a memory hierarchy different from the traditional cache-based one.
Thus, specialization in neural-network algorithms has led to the emergence of massively parallel system architectures for processing low-bit data formats with poor temporal and spatial locality of memory requests.
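As an illustration of the rounding concern the abstract raises, the sketch below converts a float32 value to the BF16 format it mentions (BF16 keeps the float32 exponent but only 7 explicit mantissa bits) using round-to-nearest-even. This is a minimal standalone example for intuition, not code from the article, and it ignores NaN/Inf handling for brevity.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float32 value to bfloat16 (round-to-nearest-even) and
    widen it back to float. bfloat16 is the top 16 bits of the float32
    pattern: 1 sign, 8 exponent, 7 mantissa bits. NaN/Inf not handled."""
    b = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    lsb = (b >> 16) & 1                               # lowest kept mantissa bit
    b = (b + 0x7FFF + lsb) & 0xFFFFFFFF               # nearest-even rounding bias
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFF0000))[0]

# 1 + 2^-8 lies exactly halfway between two BF16 values; nearest-even picks 1.0
print(to_bf16(1.0 + 2**-8))  # 1.0
# 0.1 survives only to about 3 decimal digits in BF16
print(to_bf16(0.1))          # 0.10009765625
```

The second example shows why chip designers expose multiple rounding modes: with only 7 mantissa bits, the relative error of a single rounding is up to 2^-8, and such errors can accumulate across the many multiply-accumulates of a neural-network layer.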

Keywords:
Computer science, Executable, Supercomputer, Rounding, Artificial neural network, Parallel computing, Microprocessor, Cache, Computer hardware, Computer engineering, Computer architecture, Embedded system, Operating system, Artificial intelligence

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.01

Topics

Advanced Data Processing Techniques
Physical Sciences →  Engineering →  Control and Systems Engineering

Related Documents

JOURNAL ARTICLE

Hardware neural network accelerators

Olivier Temam

Journal: International Conference on Hardware/Software Codesign and System Synthesis Year: 2013 Pages: 1-1
JOURNAL ARTICLE

A Survey on Neural Network Hardware Accelerators

Tamador Mohaidat, Kasem Khalil

Journal: IEEE Transactions on Artificial Intelligence Year: 2024 Vol: 5 (8) Pages: 3801-3822
JOURNAL ARTICLE

Towards Hardware Trojan Resilient Convolutional Neural Network Accelerators

Peiyao Sun, Basel Halak, Tom J. Kázmierski

Journal: Journal of Hardware and Systems Security Year: 2025 Vol: 9 (3-4) Pages: 89-106