Exploration of On-device End-to-End Acoustic Modeling with Neural Networks

Wonyong Sung; Lukas Jyuhn‐Hsiarn Lee; Jin-Hwan Park

doi:10.1109/sips47522.2019.9020317

Year: 2019 Pages: 160-165

Abstract

Real-time speech recognition on mobile and embedded devices is an important application of neural networks. Acoustic modeling is a fundamental part of speech recognition and is usually implemented with long short-term memory (LSTM)-based recurrent neural networks (RNNs). However, single-thread execution of an LSTM RNN is extremely slow on most embedded devices because the algorithm must fetch a large number of parameters from DRAM to compute each output sample. We explore a few acoustic modeling algorithms that can be executed very efficiently on embedded devices. These algorithms reduce the memory-access overhead through multi-timestep parallelization, which computes multiple output samples at a time while reading the parameters from DRAM only once. The algorithms considered are quasi-RNNs (QRNNs), Gated ConvNets, and diagonalized LSTMs. In addition, we explore networks that employ a one-dimensional (1-D) convolution at each layer of these algorithms, which yields a very large performance increase for QRNNs and Gated ConvNets. The experiments were conducted using connectionist temporal classification (CTC)-based end-to-end speech recognition on the WSJ corpus. Compared to LSTM RNN-based modeling, we not only significantly increase the execution speed but also obtain much higher accuracy. Thus, this work is applicable not only to embedded system-based implementations but also to server-based ones.
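The multi-timestep parallelization idea can be illustrated with a minimal pure-Python sketch, assuming the standard QRNN formulation: a causal 1-D convolution of width `k` produces candidate (`z`), forget (`f`) and output (`o`) gates, followed by elementwise "fo-pooling". Because these gates depend only on the input, not on the previous output, each weight can be read once and reused for a whole block of `T` frames; only the cheap elementwise recurrence stays sequential. An LSTM cannot do this, since its recurrent term needs `h_{t-1}` before step `t` can start. All function names, sizes, and the weight-fetch counter (a crude stand-in for DRAM traffic that ignores caches and SIMD) are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def causal_windows(xs, k):
    """Concatenate each frame with its k-1 predecessors (zero-padded):
    the input seen by a causal 1-D convolution of width k."""
    dim = len(xs[0])
    padded = [[0.0] * dim] * (k - 1) + xs
    return [sum(padded[t:t + k], []) for t in range(len(xs))]


def gates_multi_timestep(windows, weights):
    """Weights-outer loop: each weight row is fetched once and reused
    for all T timesteps in the block."""
    out = [[0.0] * len(weights) for _ in windows]
    fetches = 0
    for j, row in enumerate(weights):
        fetches += len(row)  # row read once (hypothetical DRAM counter)
        for t, w_in in enumerate(windows):
            out[t][j] = sum(a * b for a, b in zip(row, w_in))
    return out, fetches


def gates_per_timestep(windows, weights):
    """Timestep-outer loop: the whole matrix is re-read at every step,
    as a recurrent LSTM would be forced to do."""
    out = [[0.0] * len(weights) for _ in windows]
    fetches = 0
    for t, w_in in enumerate(windows):
        for j, row in enumerate(weights):
            fetches += len(row)
            out[t][j] = sum(a * b for a, b in zip(row, w_in))
    return out, fetches


def fo_pool(z_pre, f_pre, o_pre):
    """Sequential but weight-free recurrence:
    c_t = f_t * c_{t-1} + (1 - f_t) * z_t,  h_t = o_t * c_t."""
    c = [0.0] * len(z_pre[0])
    hs = []
    for zt, ft, ot in zip(z_pre, f_pre, o_pre):
        z = [math.tanh(v) for v in zt]
        f = [sigmoid(v) for v in ft]
        o = [sigmoid(v) for v in ot]
        c = [fi * ci + (1.0 - fi) * zi for fi, ci, zi in zip(f, c, z)]
        hs.append([oi * ci for oi, ci in zip(o, c)])
    return hs


def qrnn_layer(xs, Wz, Wf, Wo, k, gate_fn):
    """One QRNN layer; gate_fn selects the loop order for the gates."""
    win = causal_windows(xs, k)
    pre, total = [], 0
    for W in (Wz, Wf, Wo):
        p, n = gate_fn(win, W)
        pre.append(p)
        total += n
    return fo_pool(*pre), total


random.seed(0)
T, D, H, K = 8, 4, 3, 2  # frames per block, input dim, hidden dim, conv width
xs = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(T)]


def rand_w():
    return [[random.uniform(-0.5, 0.5) for _ in range(K * D)] for _ in range(H)]


Wz, Wf, Wo = rand_w(), rand_w(), rand_w()
h_block, n_block = qrnn_layer(xs, Wz, Wf, Wo, K, gates_multi_timestep)
h_step, n_step = qrnn_layer(xs, Wz, Wf, Wo, K, gates_per_timestep)
print(n_block, n_step)  # → 72 576
```

Both loop orders produce identical outputs; the block version reads each weight once per block of `T` frames instead of once per frame, so its weight traffic is `T` times smaller. On real hardware the inner loop over timesteps would typically become a matrix-matrix product, which is where the speedup on memory-bound embedded CPUs would come from.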

Keywords:

Computer science, DRAM, Recurrent neural network, Connectionism, End-to-end principle, Artificial neural network, Thread (computing), Language model, Speech recognition, Parallel computing, Computer engineering, Algorithm, Artificial intelligence, Computer hardware

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing
