Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation

H. K. Kim; Seunghyun Seo; Lukas Jyuhn‐Hsiarn Lee; Seolki Baek

doi:10.21437/interspeech.2023-361




Year: 2023 Pages: 1653-1657


Abstract

Punctuated text prediction is crucial for automatic speech recognition because it enhances readability and affects downstream natural language processing tasks. In streaming scenarios, the ability to predict punctuation in real time is particularly desirable but technically difficult. In this work, we propose a method for predicting punctuated text from input speech using a chunk-based Transformer encoder trained with Connectionist Temporal Classification (CTC) loss. An acoustic model trained on long sequences, formed by concatenating input and target sequences, learns the punctuation marks attached to the ends of sentences more effectively. Additionally, by combining CTC losses on the chunks and utterances, we improved both the F1 score of punctuation prediction and the Word Error Rate (WER).
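The concatenation of input and target sequences described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `concat_utterances`, `combine_losses`, `max_frames`, and `weight` are hypothetical, the greedy merging rule is an assumption, and the abstract does not say how the chunk-level and utterance-level CTC losses are weighted, so the interpolation below is only one plausible choice.

```python
from typing import List, Sequence, Tuple

Frame = List[float]                   # one acoustic feature vector
Utterance = Tuple[List[Frame], str]   # (feature frames, punctuated transcript)


def concat_utterances(utts: Sequence[Utterance], max_frames: int) -> List[Utterance]:
    """Greedily merge consecutive utterances into longer training examples.

    Frames are appended in order and transcripts are joined with a space, so
    sentence-final punctuation lands in the middle of a longer target.
    An utterance longer than max_frames is kept as its own example.
    """
    merged: List[Utterance] = []
    frames: List[Frame] = []
    texts: List[str] = []
    for feats, text in utts:
        # Flush the current example if this utterance would exceed the cap.
        if frames and len(frames) + len(feats) > max_frames:
            merged.append((frames, " ".join(texts)))
            frames, texts = [], []
        frames = frames + list(feats)
        texts.append(text)
    if frames:
        merged.append((frames, " ".join(texts)))
    return merged


def combine_losses(chunk_loss: float, utterance_loss: float, weight: float = 0.5) -> float:
    """Interpolate two already-computed CTC loss values.

    The weighting scheme is an assumption made for illustration only.
    """
    return weight * chunk_loss + (1.0 - weight) * utterance_loss


if __name__ == "__main__":
    utts = [
        ([[0.1]] * 3, "Hello."),
        ([[0.2]] * 4, "How are you?"),
        ([[0.3]] * 5, "Fine, thanks."),
    ]
    for feats, text in concat_utterances(utts, max_frames=8):
        print(len(feats), text)
```

With the toy data above, the first two utterances are merged into one 7-frame example whose target contains a sentence-internal period, while the third stays separate because adding it would exceed the 8-frame cap.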

Keywords:

Punctuation; Computer science; Speech recognition; Connectionism; Readability; Language model; Transformer; Encoder; Artificial intelligence; Word error rate; Natural language processing; Artificial neural network

Metrics

FWCI (Field Weighted Citation Impact): 0.26

Citation Normalized Percentile: 0.56

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Speech and dialogue systems

Physical Sciences → Computer Science → Artificial Intelligence

