JOURNAL ARTICLE

Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection

Takafumi Moriya, Hiroshi Satō, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki

Year: 2023, Journal: IEEE Access, Vol: 11, Pages: 13906-13917, Publisher: Institute of Electrical and Electronics Engineers

Abstract

Automatic speech recognition of a target speaker in the presence of interfering speakers remains a challenging problem. One approach is target-speaker speech recognition, which conditions the recognition process on an embedding that characterizes the voice of the target speaker. This makes it possible to recognize only the speech of the target speaker while ignoring interfering speech. In this work, we propose an end-to-end target-speaker speech recognition system based on a neural transducer architecture to allow streaming and on-device recognition. Moreover, a target-speaker speech recognition system should be able to detect when the target speaker is inactive and output nothing in that case. We introduce training and decoding schemes that allow target-speaker activity detection within our proposed recognition system. We confirm experimentally that our proposed end-to-end system performs competitively with conventional cascade approaches that combine a target speech extraction module with a recognition module, while reducing computation costs and allowing streaming decoding.
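
As a rough illustration of what "conditioning on a speaker embedding" can mean, the sketch below scales each acoustic frame by a projection of the target-speaker embedding and checks that applying it chunk by chunk gives the same result as applying it to the whole utterance. This is not the authors' architecture: the multiplicative form, the function names, the array sizes, and the random `projection` weights are all assumptions made for the example.

```python
import numpy as np

def condition_on_speaker(frames, speaker_embedding, projection):
    """Scale each acoustic frame by a projection of the speaker embedding.

    frames:            (T, D) acoustic features or hidden activations
    speaker_embedding: (E,)   vector characterizing the target voice
    projection:        (E, D) hypothetical learned weights (random here)
    """
    scale = speaker_embedding @ projection  # shape (D,)
    return frames * scale                   # broadcast over the T frames

def stream_chunks(frames, chunk_size):
    """Yield consecutive chunks of frames, as a streaming front end would."""
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]

# Toy data: 10 frames of 4-dim features and a 3-dim speaker embedding.
rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 4))
speaker_embedding = rng.standard_normal(3)
projection = rng.standard_normal((3, 4))

# Whole-utterance conditioning versus chunk-by-chunk conditioning.
offline = condition_on_speaker(frames, speaker_embedding, projection)
streamed = np.concatenate([
    condition_on_speaker(chunk, speaker_embedding, projection)
    for chunk in stream_chunks(frames, chunk_size=3)
])
```

Because this toy conditioning acts on each frame independently, it needs no future context, which is the property that makes a layer of this kind compatible with streaming decoding.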

Keywords:
Computer science, Speech recognition, Speaker recognition, Voice activity detection, Decoding methods, Speaker diarisation, End-to-end principle, Speech processing, Artificial intelligence, Pattern recognition (psychology)

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 2.68
Refs: 48
Citation Normalized Percentile: 0.88

Topics

Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Speech Recognition and Synthesis
Physical Sciences → Computer Science → Artificial Intelligence
Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing