Explicit Alignment of Text and Speech Encodings for Attention-Based End-to-End Speech Recognition

Jennifer Drexler; James Glass

doi:10.1109/asru46091.2019.9003873

Year: 2019
Venue: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Pages: 913-919

Abstract

In this work, we present a novel training procedure for attention-based end-to-end automatic speech recognition. Our goal is to push the speech encoder network to output only linguistic information, improving generalization performance, particularly in low-resource scenarios. We accomplish this by adding a text encoder network, which the speech encoder is encouraged to mimic. Our main innovation is to compare the attention-weighted speech encoder outputs with the outputs of the text encoder. Because the attention mechanism produces one weighted summary of the speech encoding per output token, this guarantees two sequences of the same length that can be directly aligned. We show that our training procedure significantly decreases word error rates in all experiments and has the biggest absolute impact in the lowest-resource scenarios.
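The core comparison described in the abstract can be illustrated with a minimal NumPy sketch. It assumes plain dot-product attention and a mean-squared-error distance; the abstract specifies neither, and the function names (`attention_context`, `alignment_loss`) and the random stand-in tensors are hypothetical. The point it shows is structural: attention yields one context vector per output token, so the context sequence has the same length U as the text encoding and the two can be compared position by position, with no separate alignment step.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_context(decoder_queries, speech_enc):
    """Dot-product attention (an assumption; the paper's attention may differ).

    decoder_queries: (U, d), one query per output token
    speech_enc:      (T, d), speech encoder outputs over T frames
    Returns (U, d): one attention-weighted summary of the speech per token.
    """
    scores = decoder_queries @ speech_enc.T  # (U, T) similarity scores
    weights = softmax(scores, axis=-1)       # each row sums to 1 over frames
    return weights @ speech_enc              # (U, d) weighted sums of frames


def alignment_loss(decoder_queries, speech_enc, text_enc):
    """Hypothetical MSE between attention contexts and text encodings.

    text_enc: (U, d), text encoder outputs for the U reference tokens.
    """
    context = attention_context(decoder_queries, speech_enc)
    if context.shape != text_enc.shape:
        raise ValueError("context and text encodings must both be (U, d)")
    # Same length U on both sides, so positions compare one-to-one
    return float(np.mean((context - text_enc) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, U, d = 50, 7, 16  # hypothetical: 50 frames, 7 tokens, 16-dim encodings
    speech = rng.normal(size=(T, d))
    queries = rng.normal(size=(U, d))
    text = rng.normal(size=(U, d))
    print(f"alignment loss: {alignment_loss(queries, speech, text):.4f}")
```

In a real training setup such a term would presumably be added to the usual decoder loss and differentiated with an autodiff framework; this NumPy version only computes the value, and the speech length T never needs to match U.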
