Difficulties in eliciting substantial spoken data from speaker populations of interest, and in producing the accompanying transcripts, create low-resource scenarios that hinder the development of robust automatic speech recognition (ASR) systems. Given a large volume of unlabeled audio data, self-supervised speech representation learning may address this limitation by pre-training a model-based feature extractor on a proxy task, yielding representations that transfer to the ASR task through fine-tuning. This dissertation reviews current self-supervised speech representation learning methodologies and investigates the application of wav2vec 2.0 ASR to CU-MARVEL, a corpus under development, in order to provide automatic transcripts for streamlining it…