JOURNAL ARTICLE

Approximation-Aware Training for Efficient Neural Network Inference on MRAM Based CiM Architecture

Hemkant Nehete, Sandeep Soni, Tharun Kumar Reddy Bollu, Balasubramanian Raman, Brajesh Kumar Kaushik

Year: 2024 · Journal: IEEE Open Journal of Nanotechnology · Vol: 6 · Pages: 16-26 · Publisher: Institute of Electrical and Electronics Engineers

Abstract

Convolutional neural networks (CNNs), despite their broad applications, are constrained by high computational and memory requirements. Existing compression techniques often neglect the approximation errors incurred during training. This work proposes approximation-aware training, in which groups of weights are approximated using a differentiable approximation function, resulting in a new weight matrix composed of the approximation function's coefficients (AFC). The network is trained using backpropagation to minimize the loss function with respect to the AFC matrix; linear and quadratic approximation functions preserve accuracy at high compression rates. The work further implements a compute-in-memory (CiM) architecture for inference with the approximated networks. This architecture includes a mapping algorithm that modulates inputs and maps the AFC to crossbar arrays directly, eliminating the need to reconstruct the approximated weights when evaluating outputs. This reduces the number of crossbars, lowering area and energy consumption. Integrating magnetic random-access memory (MRAM) based devices further enhances performance by reducing latency and energy consumption. Simulation results on approximated LeNet-5, VGG8, AlexNet, and ResNet18 models trained on the CIFAR-100 dataset showed reductions of 54%, 30%, 67%, and 20% in the total number of crossbars, respectively, improving area efficiency. In the ResNet18 architecture, latency and energy consumption decreased by 95% and 93.3%, respectively, with spin-orbit torque (SOT) based crossbars compared to RRAM-based architectures.
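
To make the training idea concrete, the following is a minimal sketch of approximation-aware training for a single weight group, assuming a linear approximation function w_j ≈ c0 + c1·j over the in-group index j. The group size, coefficient names, learning rate, and toy loss are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Minimal sketch of approximation-aware training for one weight group.
# Assumption: a linear approximation function w_j ≈ c0 + c1*j over the
# in-group index j, so the g original weights are replaced by the two
# AFC values (c0, c1), which are the quantities updated by backprop.
rng = np.random.default_rng(0)
g = 8                              # group size (illustrative)
j = np.arange(g)                   # in-group index
x = rng.normal(size=g)             # input slice seen by this group
t = 1.5                            # toy scalar target for the output

c0, c1 = 0.0, 0.0                  # AFC replacing the weight group
lr = 1e-3                          # illustrative learning rate
for _ in range(500):
    w_hat = c0 + c1 * j            # reconstruct approximated weights
    y = np.dot(w_hat, x)           # forward pass through the group
    err = y - t                    # dL/dy for L = 0.5 * (y - t)**2
    # The approximation function is differentiable, so gradients flow
    # through it to the coefficients themselves:
    c0 -= lr * err * x.sum()       # dy/dc0 = sum_j x_j
    c1 -= lr * err * np.dot(j, x)  # dy/dc1 = sum_j j * x_j
print(np.dot(c0 + c1 * j, x))      # approaches the toy target t
```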
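
The direct-mapping step rests on a simple identity: with a linear approximation, the group's dot product can be evaluated from the coefficients alone, so only the AFC pair needs to occupy crossbar cells while the index-weighted term acts as a modulated input. A hedged sketch of that identity, with illustrative values rather than the paper's mapping algorithm:

```python
import numpy as np

# Sketch of the direct-mapping identity behind the crossbar scheme.
# Assumption: a linear approximation w_j = c0 + c1*j. The group's dot
# product then equals c0*sum(x) + c1*sum(j*x), so only the AFC (c0, c1)
# need to be programmed into crossbar cells, with j*x_j supplied as a
# modulated input; the approximated weights are never reconstructed.
rng = np.random.default_rng(1)
g, c0, c1 = 8, 0.3, -0.05          # illustrative group and coefficients
j = np.arange(g)
x = rng.normal(size=g)

y_reconstructed = np.dot(c0 + c1 * j, x)     # expand weights first
y_direct = c0 * x.sum() + c1 * np.dot(j, x)  # compute from AFC only
assert np.isclose(y_reconstructed, y_direct)
```

A quadratic approximation would add one more coefficient and one more modulated input per group, which is the mechanism by which the scheme trades crossbar count against approximation accuracy.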

Keywords:
Inference, Magnetoresistive random-access memory, Computer science, Training, Artificial neural network, Computer architecture, Artificial intelligence, Machine learning, Random access memory, Computer hardware

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 22
Citation Normalized Percentile: 0.24

Topics

Neural Networks and Applications
Physical Sciences → Computer Science → Artificial Intelligence
Brain Tumor Detection and Classification
Life Sciences → Neuroscience → Neurology
Machine Learning and ELM
Physical Sciences → Computer Science → Artificial Intelligence