Difficulties in eliciting substantial spoken data from speaker populations of interest, and in producing the accompanying transcripts, create low-resource scenarios that hinder the development of robust automatic speech recognition (ASR) systems. Given a large volume of unlabeled audio data, self-supervised speech representation learning may address this limitation by pre-training a model-based feature extractor on a proxy task, yielding representations that transfer to the ASR task through fine-tuning. This dissertation reviews current self-supervised speech representation learning methodologies and investigates the application of wav2vec 2.0 ASR to CU-MARVEL, a corpus under development, in order to provide automatic transcripts for streamlining it…