Max-Margin Transducer Loss: Improving Sequence-Discriminative Training Using a Large-Margin Learning Strategy

Rupak Vignesh Swaminathan; Grant P. Strimel; Ariya Rastrow; Harish Mallidi; Kai Zhen; Hieu Duy Nguyen; Nathan Susanj; Athanasios Mouchtaris

doi:10.1109/icassp48485.2024.10446322

JOURNAL ARTICLE

Year: 2024 Pages: 12226-12230

Abstract

In this work, we propose a novel sequence-discriminative training criterion for automatic speech recognition (ASR) based on the Conformer Transducer. Inspired by the large-margin classifier framework, we separate the "good" and the "bad" hypotheses in an N-best list produced by a pre-trained transducer model by a margin (τ), hence the name Max-Margin Transducer (MMT) loss. Fine-tuning with the proposed loss achieves a significant improvement over the baseline transducer loss but does not outperform state-of-the-art minimum word error rate (MWER) training. However, combining the proposed MMT loss with MWER surpasses the performance of either loss alone, suggesting that the MWER and MMT losses are complementary. With the combined losses, we obtained 7.44% and 7.68% relative WER improvements on the Librispeech test-clean and test-other sets, respectively, and up to 8.9% relative improvement on the Multi-lingual Librispeech test sets.
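The abstract does not give the exact form of the MMT loss, so the following is only a minimal sketch of the general large-margin idea it describes, not the authors' formulation. It assumes (hypothetically) that "good" hypotheses are those whose word-error count is at or below the N-best average, that "bad" hypotheses are the rest, and that a pairwise hinge penalty is applied whenever a good hypothesis fails to outscore a bad one by at least the margin τ. The function name, the partition rule, and the averaging over pairs are all illustrative assumptions.

```python
from typing import Sequence


def max_margin_nbest_loss(
    log_probs: Sequence[float],
    word_errors: Sequence[int],
    tau: float = 1.0,
) -> float:
    """Hinge-style margin loss over one N-best list (illustrative sketch only).

    log_probs   -- model log-probability of each hypothesis
    word_errors -- word-error count of each hypothesis against the reference
    tau         -- margin by which good hypotheses should outscore bad ones
    """
    if len(log_probs) != len(word_errors):
        raise ValueError("log_probs and word_errors must have the same length")
    if not log_probs:
        return 0.0

    # Assumption: split the list at the average error count of the N-best list.
    mean_err = sum(word_errors) / len(word_errors)
    good = [lp for lp, e in zip(log_probs, word_errors) if e <= mean_err]
    bad = [lp for lp, e in zip(log_probs, word_errors) if e > mean_err]

    # If every hypothesis has the same error count there is nothing to separate.
    if not good or not bad:
        return 0.0

    # Penalize every (good, bad) pair whose score gap is smaller than tau.
    total = sum(max(0.0, tau - (g - b)) for g in good for b in bad)
    return total / (len(good) * len(bad))
```

In a real training setup the log-probabilities would come from the transducer and the loss would be computed on differentiable tensors; plain floats are used here only to keep the sketch self-contained. The loss is zero once every good hypothesis outscores every bad one by at least τ.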

Keywords:

Discriminative model; Margin (machine learning); Transducer; Word error rate; Computer science; Classifier (UML); Speech recognition; Pattern recognition (psychology); Artificial intelligence; Mathematics; Machine learning; Acoustics; Physics

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.03

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing
