Joint Phoneme-Grapheme Model for End-To-End Speech Recognition

Yotaro Kubo; Michiel Bacchiani

doi:10.1109/icassp40776.2020.9054557

Year: 2020 Vol: 28 Pages: 6119-6123

Abstract

This paper proposes methods to improve Listen-Attend-Spell (LAS), a commonly used end-to-end speech recognition model. The proposed methods use multi-task learning to improve the model's generalization by leveraging information from multiple label types. The focus is on multi-task models that perform signal-to-grapheme and signal-to-phoneme conversion simultaneously while sharing the encoder parameters. Since phonemes are designed to precisely describe the linguistic aspects of the speech signal, using phoneme recognition as an auxiliary task can help stabilize the early stages of training. Beyond conventional multi-task learning, we obtain further improvements by introducing a method that exploits dependencies between labels in different tasks, specifically between phoneme and grapheme sequences. Conventional multi-task learning assumes these sequences are independent. Instead, this paper proposes a joint model based on "iterative refinement", in which dependencies are modeled through a multi-pass strategy. The proposed method is evaluated on a 28,000-hour corpus of Japanese speech data, and the performance of a conventional multi-task approach is contrasted with that of the joint model with iterative refinement.
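In the conventional multi-task setup described above, the grapheme and phoneme sequences are treated as independent given the audio, so the training objective decomposes into a (typically weighted) sum of per-task losses computed from decoders that share one encoder. The sketch below illustrates only that decomposition, not the paper's implementation: the function names `sequence_nll` and `multitask_loss` and the `phoneme_weight` hyperparameter are hypothetical, and the per-step probability lists stand in for the outputs of two attention decoders. The iterative-refinement joint model is not shown, since the abstract gives no details of it.

```python
import math
from typing import Sequence


def sequence_nll(step_probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Negative log-likelihood of a label sequence under per-step distributions.

    step_probs[i] is the decoder's distribution over the label vocabulary at
    step i (teacher forcing assumed); targets[i] is the reference label index.
    """
    if len(step_probs) != len(targets):
        raise ValueError("one distribution per target label is required")
    return -sum(math.log(dist[t]) for dist, t in zip(step_probs, targets))


def multitask_loss(
    grapheme_probs: Sequence[Sequence[float]],
    grapheme_targets: Sequence[int],
    phoneme_probs: Sequence[Sequence[float]],
    phoneme_targets: Sequence[int],
    phoneme_weight: float = 0.3,  # hypothetical value, not taken from the paper
) -> float:
    """Weighted sum of the main (grapheme) and auxiliary (phoneme) losses.

    Assumption: both decoders read the same shared encoder output, so in a real
    model the gradients of both terms would update the shared encoder. Summing
    the two terms corresponds to treating grapheme and phoneme sequences as
    conditionally independent given the audio.
    """
    if not 0.0 <= phoneme_weight <= 1.0:
        raise ValueError("phoneme_weight must be in [0, 1]")
    grapheme_loss = sequence_nll(grapheme_probs, grapheme_targets)
    phoneme_loss = sequence_nll(phoneme_probs, phoneme_targets)
    return (1.0 - phoneme_weight) * grapheme_loss + phoneme_weight * phoneme_loss


if __name__ == "__main__":
    # Toy two-step grapheme hypothesis and one-step phoneme hypothesis.
    loss = multitask_loss(
        grapheme_probs=[[0.5, 0.5], [0.25, 0.75]],
        grapheme_targets=[0, 1],
        phoneme_probs=[[1.0, 0.0]],
        phoneme_targets=[0],
        phoneme_weight=0.5,
    )
    print(f"multi-task loss: {loss:.4f}")
```

Setting `phoneme_weight` to 0 recovers single-task grapheme training; the joint model proposed in the paper departs from this sum by letting the grapheme and phoneme predictions depend on each other across passes.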
