Integrating Source-Channel and Attention-Based Sequence-to-Sequence Models for Speech Recognition

Qiujia Li; Chao Zhang; Philip C. Woodland

doi:10.1109/asru46091.2019.9003837

JOURNAL ARTICLE

Year: 2019

Journal: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Pages: 39-46

Abstract

This paper proposes a novel automatic speech recognition (ASR) framework called Integrated Source-Channel and Attention (ISCA) that combines the advantages of traditional systems based on the noisy source-channel (SC) model and end-to-end style systems using attention-based sequence-to-sequence models. The traditional SC system framework includes hidden Markov model and connectionist temporal classification (CTC) based acoustic models, language models (LMs), and a decoding procedure based on a lexicon, whereas the end-to-end style attention-based system jointly models the whole process with a single model. By rescoring the hypotheses produced by traditional systems using end-to-end style systems based on an extended noisy source-channel model, ISCA allows structured knowledge to be easily incorporated via the SC-based model while exploiting the complementarity of the attention-based model. Experiments on the AMI meeting corpus show that ISCA gives a relative word error rate reduction of up to 21% over an individual system, and of 13% over an alternative method that also combines CTC and attention-based models.
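The rescoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the extended noisy source-channel formulation, so the code below assumes a generic log-linear N-best rescoring scheme instead: each hypothesis from the SC system carries a log-score, an attention-based model supplies a second log-score, and the two are added with an interpolation weight. The names `Hypothesis`, `rescore_nbest`, `attention_log_prob`, `weight` and `relative_reduction` are hypothetical, and all numbers are toy values, not results from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical structure)."""
    words: List[str]
    sc_score: float  # log-score from the SC system (acoustic + LM), assumed given


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    attention_log_prob: Callable[[List[str]], float],
    weight: float = 1.0,
) -> Hypothesis:
    """Return the hypothesis with the highest combined log-score.

    Combined score = SC log-score + weight * attention-model log-score.
    This log-linear combination is an assumption for illustration only.
    """
    if not nbest:
        raise ValueError("N-best list is empty")
    return max(nbest, key=lambda h: h.sc_score + weight * attention_log_prob(h.words))


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, e.g. 20.0 -> 16.0 gives 0.2 (20% relative)."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy N-best list and a toy stand-in for an attention-based scorer.
    nbest = [
        Hypothesis(["recognise", "speech"], sc_score=-10.0),
        Hypothesis(["wreck", "a", "nice", "beach"], sc_score=-9.5),
    ]
    toy_scores = {"recognise speech": -2.0, "wreck a nice beach": -6.0}
    best = rescore_nbest(nbest, lambda w: toy_scores[" ".join(w)], weight=1.0)
    print(" ".join(best.words))
```

With `weight=0.0` the selection falls back to the SC system's own ranking, which makes the weight a convenient knob for checking how much the second model changes the output.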

Keywords:

Computer science; Connectionism; Hidden Markov model; Language model; Speech recognition; Decoding methods; Channel (broadcasting); Sequence (biology); Artificial intelligence; Word error rate; Sequence labeling; Lexicon; Complementarity (molecular biology); Natural language processing; Artificial neural network; Algorithm; Task (project management)

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition

Supervised Attention in Sequence-to-Sequence Models for Speech Recognition

Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition

Advancing Sequence-to-Sequence Based Speech Recognition

Sequence-level Knowledge Distillation for Model Compression of Attention-based Sequence-to-sequence Speech Recognition