Automatic transcription of Czech, Russian, and Slovak spontaneous speech in the MALACH project

Pavel Ircing; Josef Psutka; Jan Hajič; Bill Byrne; Jiří Mírovský

doi:10.21437/interspeech.2005-489

Year: 2005 Pages: 1349-1352

Abstract

This paper describes the 3.5-year effort put into building LVCSR (large-vocabulary continuous speech recognition) systems for recognizing the spontaneous speech of Czech, Russian, and Slovak witnesses of the Holocaust in the MALACH project. To process the colloquial, highly emotional, and heavily accented speech of elderly people, which contains many non-speech events, we have developed techniques that very effectively handle both non-speech events and colloquial and accented variants of uttered words. Manual transcripts, one of the main sources for language modeling, were automatically "normalized" using a standardized lexicon, which reduced the word error rate (WER) by about 2 to 3%. Subsequent interpolation of these LMs with models built from an additional collection (consisting of topically selected sentences from general text corpora) improved performance by up to a further 3%.
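The two language-modeling steps named in the abstract, lexicon-based normalization of transcripts and interpolation with a second model, can be illustrated with a minimal sketch. It assumes unsmoothed unigram models and plain linear interpolation for brevity; the abstract does not specify the model order, the smoothing, or the interpolation weights, so the lexicon entries, the toy sentences, and the weight `lam` below are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping from colloquial variants to standardized forms.
# These pairs are illustrative only and are not taken from the MALACH lexicon.
NORMALIZATION_LEXICON = {"dobrej": "dobrý", "bejt": "být"}


def normalize(tokens, lexicon):
    """Replace each token that has a lexicon entry with its standardized form."""
    return [lexicon.get(tok, tok) for tok in tokens]


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities (no smoothing, for brevity)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(p_transcripts, p_additional, lam):
    """Linear interpolation: lam * p_transcripts + (1 - lam) * p_additional."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_transcripts) | set(p_additional)
    return {
        w: lam * p_transcripts.get(w, 0.0) + (1.0 - lam) * p_additional.get(w, 0.0)
        for w in vocab
    }


# Toy data standing in for the manual transcripts and the additional collection.
transcript_tokens = normalize("to je dobrej den".split(), NORMALIZATION_LEXICON)
additional_tokens = "dobrý večer".split()

mixed = interpolate(
    unigram_model(transcript_tokens),
    unigram_model(additional_tokens),
    lam=0.5,  # hypothetical weight; in practice it would be tuned on held-out data
)
```

Because normalization maps the colloquial variant onto its standard form before counting, both sources contribute probability mass to the same vocabulary entry instead of splitting it across spelling variants.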

Keywords:

Czech; Slovak; Computer science; Transcription (linguistics); Speech recognition; Natural language processing; Linguistics

Metrics

FWCI (Field Weighted Citation Impact): 0.38

Citation Normalized Percentile: 0.68

Topics

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Towards Automatic Transcription of Spontaneous Czech Speech in the MALACH Project

Transformer-Based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project

Automatic Transcription of Czech Language Oral History in the MALACH Project: Resources and Initial Experiments

Large vocabulary ASR for spontaneous Czech in the MALACH project