JOURNAL ARTICLE

Sequence-level Knowledge Distillation for Model Compression of Attention-based Sequence-to-sequence Speech Recognition

Abstract

We investigate the feasibility of sequence-level knowledge distillation of Sequence-to-Sequence (Seq2Seq) models for Large Vocabulary Continuous Speech Recognition (LVCSR). We first use a larger, pre-trained teacher model to generate multiple hypotheses per utterance with beam search. With the same input, we then train the student model using the hypotheses generated by the teacher as pseudo labels in place of the original ground-truth labels. We evaluate the proposed method on the Wall Street Journal (WSJ) corpus; the student achieves up to a 9.8× reduction in parameters at the cost of an increase of up to 7.0% in word error rate (WER).
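The distillation recipe described in the abstract (teacher beam search producing n-best pseudo labels, student trained on those hypotheses instead of the ground truth) can be sketched as below. This is a minimal illustration under assumptions, not the authors' implementation: `teacher.beam_search`, the `student(feats, tokens)` forward signature, and the `utterances` iterator are hypothetical stand-ins for whatever PyTorch-style Seq2Seq ASR models and data pipeline are in use.

```python
# Sequence-level knowledge distillation sketch.
# Assumptions: PyTorch-style models; `teacher.beam_search` and the student
# forward signature are hypothetical stand-ins, not a real library API.
import torch
import torch.nn.functional as F

def generate_pseudo_labels(teacher, utterances, beam_size=4, n_best=4):
    """Run teacher beam search and keep the n-best hypotheses per utterance."""
    pseudo_labeled = []
    teacher.eval()
    with torch.no_grad():
        for feats, _ground_truth in utterances:
            # Hypothetical API: returns a list of (token_ids, score) pairs,
            # best-scoring first.
            hyps = teacher.beam_search(feats, beam_size=beam_size)[:n_best]
            for tokens, _score in hyps:
                pseudo_labeled.append((feats, tokens))
    return pseudo_labeled

def train_student(student, pseudo_labeled, epochs=10, lr=1e-3, pad_id=0):
    """Train the smaller student on teacher hypotheses in place of ground truth."""
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    student.train()
    for _ in range(epochs):
        for feats, tokens in pseudo_labeled:
            # Hypothetical forward: per-step logits given acoustic features
            # and the shifted target tokens (teacher forcing).
            logits = student(feats, tokens[:-1])              # (T, vocab)
            loss = F.cross_entropy(logits, tokens[1:], ignore_index=pad_id)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

Training on several hypotheses per utterance, rather than only the 1-best output, is one way to expose the student to more of the teacher's sequence-level distribution; the beam size and n-best count above are illustrative defaults, not values from the paper.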

Keywords:
Sequence, Computer science, Vocabulary, Speech recognition, Word error rate, Utterance, Word, Natural language processing, Artificial intelligence, Ground truth, Distillation, Compression, Beam search, Algorithm, Mathematics, Linguistics, Search algorithm

Metrics

Cited By: 27
FWCI (Field Weighted Citation Impact): 3.84
Refs: 30
Citation Normalized Percentile: 0.94

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
