Abstract

State-of-the-art probabilistic models of text such as n-grams require an exponential number of examples as the size of the context grows, a problem due to the discrete nature of the word representation. We propose to address this problem by learning a continuous-valued, low-dimensional mapping of words, and base our predictions of target-word probabilities on the non-linear dynamics of the latent-space representations of the words in the context window. We build on neural network-based language models; by expressing them as energy-based models, we can further enrich them with additional inputs such as part-of-speech tags, topic information, and graphs of word similarity. We demonstrate significantly lower perplexity on several text corpora, as well as an improved word accuracy rate on speech recognition tasks, compared to Kneser-Ney back-off n-gram language models.
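A minimal sketch of the kind of model the abstract describes, written in PyTorch (which the original work predates; all layer sizes, names, and the training snippet below are illustrative assumptions, not the authors' architecture): each context word is mapped to a continuous low-dimensional embedding, the concatenated embeddings pass through a non-linearity, and a softmax over per-word scores yields the target-word probability.

```python
# Sketch of a feed-forward neural language model with continuous word
# embeddings, in the spirit of the abstract above. Sizes are illustrative.
import torch
import torch.nn as nn

class NeuralLM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, context_size=4, hidden_dim=128):
        super().__init__()
        # Continuous, low-dimensional word representation shared across positions.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Non-linear transformation of the concatenated context embeddings.
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        # Output layer: one unnormalized score (negative energy) per vocabulary word.
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context):                    # context: (batch, context_size) word ids
        e = self.embed(context)                    # (batch, context_size, embed_dim)
        h = torch.tanh(self.hidden(e.flatten(1)))  # (batch, hidden_dim)
        return self.out(h)                         # unnormalized scores over the vocabulary

model = NeuralLM()
context = torch.randint(0, 10000, (8, 4))          # a batch of 8 four-word contexts
target = torch.randint(0, 10000, (8,))             # the word following each context
# Softmax over the scores gives P(target | context); cross-entropy trains it.
loss = nn.functional.cross_entropy(model(context), target)
loss.backward()
print(loss.item())
```

In the energy-based view the abstract mentions, the output layer's unnormalized scores act as negative energies, and the softmax inside cross_entropy normalizes them into a probability distribution over the vocabulary; extra inputs (part-of-speech tags, topic features) would simply be concatenated into the same energy function.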

Keywords:
Perplexity, Computer science, Language model, Word, Artificial intelligence, Natural language processing, Feature (linguistics), Representation, Context, Probabilistic model, Similarity, Speech recognition, Word error rate, Linguistics

Metrics

Cited by: 5
FWCI (Field-Weighted Citation Impact): 0.00
References: 32
Citation Normalized Percentile: 0.15

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
