JOURNAL ARTICLE

Semi-supervised Training for Sequence-to-Sequence Speech Recognition Using Reinforcement Learning

Abstract

This paper proposes a reinforcement-learning-based semi-supervised training approach for sequence-to-sequence automatic speech recognition (ASR) systems. Most recent semi-supervised training approaches rely on multiple loss functions, such as a cross-entropy loss for paired speech-text data and a reconstruction loss for unpaired speech or text data. Although these approaches show promising results, two issues remain: (a) separate loss functions are used for paired and unpaired data even though both serve the same purpose of improving classification accuracy, and (b) several methods require auxiliary networks that increase the complexity of the semi-supervised training process. To address these issues, a reinforcement-learning-based approach is proposed that rewards the ASR model for generating more correct sentences from both paired and unpaired speech data. The proposed approach is evaluated on the Wall Street Journal (WSJ) task. Experimental results show that it is effective, reducing the character error rate from 10.4% to 8.7%.
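To make the idea of "rewarding correct sentences" concrete, the sketch below shows one common way to do it for paired data: a REINFORCE-style sequence-level loss whose reward is 1 minus the character error rate (CER) of each sampled hypothesis, with the mean reward over the samples as a baseline. This is a minimal illustration under stated assumptions, not the authors' method. The reward definition, the mean baseline, and all function names (`edit_distance`, `cer`, `reinforce_loss`) are hypothetical choices. The abstract does not say how the reward is computed for unpaired speech, so that case is not reproduced here.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit operations divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def reinforce_loss(ref: str, hyps: Sequence[str], log_probs: Sequence[float]) -> float:
    """Surrogate REINFORCE loss for N sampled hypotheses of one paired utterance.

    Assumption: reward = 1 - CER, baseline = mean reward over the samples.
    With an autograd framework, minimizing this value pushes up the
    log-probability of hypotheses that beat the baseline; here the
    log-probabilities are plain floats, so only the loss value is computed.
    """
    rewards = [1.0 - cer(ref, h) for h in hyps]
    baseline = sum(rewards) / len(rewards)
    return -sum((r - baseline) * lp for r, lp in zip(rewards, log_probs)) / len(hyps)


if __name__ == "__main__":
    # Two hypothetical samples: one exact, one with a single substitution.
    loss = reinforce_loss("the cat", ["the cat", "the bat"], [-1.0, -2.0])
    print(round(loss, 6))  # -0.035714
```

Subtracting the mean reward means the correct hypothesis receives a positive advantage and the erroneous one a negative advantage, which lowers the variance of the gradient estimate compared with using the raw reward.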

Keywords:
Computer science, Reinforcement learning, Artificial intelligence, Speech recognition, Word error rate, Sequence, Supervised learning, Training set, Machine learning, Pattern recognition, Artificial neural network

Metrics

Cited by: 7
FWCI (Field-Weighted Citation Impact): 0.73
References: 49
Citation normalized percentile: 0.76

Topics

Speech Recognition and Synthesis
Natural Language Processing Techniques
Speech and dialogue systems
All three topics fall under Physical Sciences → Computer Science → Artificial Intelligence.