Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition

Ryo Masumura; Taichi Asami; Takanobu Oba; Sumitaka Sakauchi; Akinori Ito

doi:10.1587/transinf.2018edp7242

JOURNAL ARTICLE

Year: 2019
Journal: IEICE Transactions on Information and Systems
Vol: E102.D (12)
Pages: 2557-2567
Publisher: Institute of Electronics, Information and Communication Engineers

Abstract

This paper presents latent words recurrent neural network language models (LW-RNN-LMs) for improving automatic speech recognition (ASR). LW-RNN-LMs are designed to combine the advantages of recurrent neural network language models (RNN-LMs) and latent words language models (LW-LMs). RNN-LMs capture long-range context information and offer strong performance, while LW-LMs are robust on out-of-domain tasks because they model a latent word space. However, RNN-LMs cannot explicitly capture the hidden relationships behind observed words, since they have no concept of a latent variable space, and LW-LMs cannot take into account long-range relationships between latent words. Our idea is to combine the RNN-LM and the LW-LM so that each compensates for the other's disadvantage. LW-RNN-LMs support latent variable space modeling, as LW-LMs do, and long-range relationship modeling, as RNN-LMs do, at the same time. From the viewpoint of RNN-LMs, an LW-RNN-LM can be regarded as a soft class RNN-LM with a vast latent variable space. From the viewpoint of LW-LMs, an LW-RNN-LM can be regarded as an LW-LM that uses an RNN structure, instead of an n-gram structure, for latent variable modeling. This paper also details a parameter inference method and two implementation methods, an n-gram approximation and a Viterbi approximation, for introducing the LW-LM to ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and in an ASR evaluation on Japanese spontaneous speech tasks.
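The combination described in the abstract can be illustrated with a toy sketch. It assumes, as one plausible reading and not as the paper's stated formulation, that each observed word is emitted from a latent word and that an RNN over the latent-word history predicts the next latent word, so the joint probability is a product of P(h_t | h_1..h_{t-1}) and P(w_t | h_t). All names, sizes, and random weights below (`V`, `H`, `EMIT`, `joint_prob`, and so on) are hypothetical, the latent words are assumed to share the observed vocabulary, and the brute-force enumeration is only feasible at toy scale.

```python
import itertools
import math
import random

V = 3  # toy vocabulary size (assumption: latent words share this vocabulary)
H = 4  # hidden-state size of the toy RNN

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


W_in = rand_matrix(H, V)    # previous latent word (one-hot) -> hidden
W_rec = rand_matrix(H, H)   # previous hidden state -> hidden
W_out = rand_matrix(V, H)   # hidden -> scores for the next latent word
# Emission table P(w | h): row h is a distribution over observed words.
EMIT = [softmax(row) for row in rand_matrix(V, V)]


def rnn_step(state, latent):
    """Elman-style update after consuming one latent word."""
    return [
        math.tanh(W_in[i][latent] + sum(W_rec[i][j] * state[j] for j in range(H)))
        for i in range(H)
    ]


def next_latent_dist(state):
    """P(h_t | h_1..h_{t-1}), summarized by the RNN hidden state."""
    return softmax([sum(W_out[k][i] * state[i] for i in range(H)) for k in range(V)])


def joint_prob(words, latents):
    """P(words, latents) = prod_t P(h_t | history) * P(w_t | h_t)."""
    state = [0.0] * H
    p = 1.0
    for w, h in zip(words, latents):
        p *= next_latent_dist(state)[h] * EMIT[h][w]
        state = rnn_step(state, h)
    return p


def marginal_prob(words):
    """P(words): sum the joint over every latent sequence (toy scale only)."""
    return sum(
        joint_prob(words, hs)
        for hs in itertools.product(range(V), repeat=len(words))
    )


def best_latent_sequence(words):
    """Single most probable latent sequence, found by exhaustive search."""
    return max(
        itertools.product(range(V), repeat=len(words)),
        key=lambda hs: joint_prob(words, hs),
    )
```

Because the RNN state depends on the whole latent history, the exact sum in `marginal_prob` grows exponentially with sentence length, which gives one intuition for why approximations are needed before such a model can be used in ASR. Keeping only the sequence returned by `best_latent_sequence` yields a lower bound on the marginal; this is a generic max-instead-of-sum shortcut and is not claimed to match the paper's n-gram or Viterbi approximation.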

Keywords:

Recurrent neural network; Computer science; Latent variable; Language model; Artificial intelligence; Context (archaeology); Inference; Speech recognition; Artificial neural network

Metrics

FWCI (Field Weighted Citation Impact): 0.92

Citation Normalized Percentile: 0.81

Topics

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Latent words recurrent neural network language models

Bidirectional recurrent neural network language models for automatic speech recognition

Hierarchical Latent Words Language Models for Automatic Speech Recognition

Context Enhancement of Recurrent Neural Network Language Models for Automatic Speech Recognition

Viterbi Approximation of Latent Words Language Models for Automatic Speech Recognition