JOURNAL ARTICLE

Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition

Abstract

It has long been known that the classic Hidden Markov Model (HMM) derivation for speech recognition contains assumptions, such as independence of observation vectors and weak duration modeling, that are practical but unrealistic. In the hybrid approach this is amplified by trying to fit a discriminative model into a generative one. Hidden Conditional Random Fields (CRFs) and segmental models (e.g. Semi-Markov CRFs / Segmental CRFs) have been proposed as an alternative, but failed to gain traction until recently. In this paper we explore different length modeling approaches for segmental models and their relation to attention-based systems. Furthermore, we show experimental results on a handwriting recognition task and, to the best of our knowledge, the first reported results on the Switchboard 300h speech recognition corpus using this approach.

Keywords:
Computer science; Speech recognition; Encoder; Vocabulary; Artificial intelligence; Speech coding; Natural language processing; Linguistics

Metrics

Cited By: 19
FWCI (Field Weighted Citation Impact): 2.78
Refs: 36
Citation Normalized Percentile: 0.91

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing