JOURNAL ARTICLE

Max-Margin Transducer Loss: Improving Sequence-Discriminative Training Using a Large-Margin Learning Strategy

Abstract

In this work, we propose a novel sequence-discriminative training criterion for automatic speech recognition (ASR) based on the Conformer Transducer. Inspired by the large-margin classifier framework, we separate the "good" and the "bad" hypotheses in an N-best list produced by a pre-trained transducer model by a margin (τ), hence the name Max-Margin Transducer (MMT) loss. Fine-tuning with the proposed loss yields a significant improvement over the baseline transducer loss but does not outperform state-of-the-art minimum word error rate (MWER) training. However, combining the proposed MMT loss with MWER surpasses the performance of either loss alone, suggesting that the MWER and MMT losses are complementary. With the combined losses, we obtained 7.44% and 7.68% relative WER improvements on the LibriSpeech test-clean and test-other sets, respectively, and up to 8.9% relative improvement on the Multilingual LibriSpeech test sets.
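The abstract does not give the exact form of the loss, so the following is only a minimal sketch of how a margin over an N-best list might be computed, under stated assumptions. It assumes that hypotheses with fewer word errors than the N-best average count as "good" and the rest as "bad", that the margin is enforced with a pairwise hinge on hypothesis log-probabilities, and that the combination with MWER is a weighted sum. The function names and the weight `alpha` are hypothetical and do not come from the paper.

```python
import math


def split_by_mean_errors(word_errors):
    """Assumed split rule: below-average error count is "good", the rest "bad"."""
    mean_err = sum(word_errors) / len(word_errors)
    good = [i for i, e in enumerate(word_errors) if e < mean_err]
    bad = [i for i, e in enumerate(word_errors) if e >= mean_err]
    return good, bad


def max_margin_loss(log_probs, word_errors, tau=1.0):
    """Pairwise hinge: each good hypothesis should outscore each bad one by tau."""
    good, bad = split_by_mean_errors(word_errors)
    if not good or not bad:
        # All hypotheses have the same error count, so there is nothing to separate.
        return 0.0
    total = 0.0
    for g in good:
        for b in bad:
            total += max(0.0, tau - (log_probs[g] - log_probs[b]))
    return total / (len(good) * len(bad))


def mwer_loss(log_probs, word_errors):
    """Expected word errors over the N-best list, relative to the mean error count."""
    peak = max(log_probs)  # subtract the max for numerical stability
    weights = [math.exp(lp - peak) for lp in log_probs]
    norm = sum(weights)
    mean_err = sum(word_errors) / len(word_errors)
    return sum(w / norm * (e - mean_err) for w, e in zip(weights, word_errors))


def combined_loss(log_probs, word_errors, tau=1.0, alpha=0.5):
    """Hypothetical combination: MWER plus alpha times the margin term."""
    return mwer_loss(log_probs, word_errors) + alpha * max_margin_loss(
        log_probs, word_errors, tau
    )


if __name__ == "__main__":
    # Toy 3-best list: model log-probabilities and word-error counts per hypothesis.
    scores = [-3.0, -2.0, -1.0]
    errors = [0, 2, 4]
    print(max_margin_loss(scores, errors, tau=1.0))
    print(combined_loss(scores, errors, tau=1.0, alpha=0.5))
```

In this sketch the margin term is zero once every good hypothesis already leads every bad one by at least τ, so it only acts on pairs that are misranked or too close. In actual fine-tuning the log-probabilities would come from the transducer and the loss would be differentiated with an autograd framework; plain floats are used here only to keep the example self-contained.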

Keywords:
Discriminative model; Margin (machine learning); Transducer; Word error rate; Computer science; Classifier (UML); Speech recognition; Pattern recognition (psychology); Artificial intelligence; Mathematics; Machine learning; Acoustics; Physics

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 29
Citation Normalized Percentile: 0.03

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
