JOURNAL ARTICLE

Zero-Shot Domain-Sensitive Speech Recognition with Prompt-Conditioning Fine-Tuning

Abstract

In this work, we propose a method to create domain-sensitive speech recognition models that utilize textual domain information by conditioning their generation on a given text prompt. This is accomplished by fine-tuning a pre-trained, end-to-end model (Whisper) to learn from demonstrations with prompt examples. We show that this ability generalizes to different domains and even to varied prompt contexts, with our model achieving a Word Error Rate (WER) reduction of up to 33% on unseen datasets from various domains, such as medical conversation, air traffic control communication, and financial meetings. Given the limited availability of audio-transcript pair data, we further extend our method to text-only fine-tuning to achieve domain sensitivity as well as domain adaptation. We demonstrate that the text-only fine-tuned model can also attend to various prompt contexts, achieving its largest WER reduction of 29% on the medical conversation dataset.
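As a rough illustration of the prompt-conditioning idea (not code from the paper), the sketch below shows how Whisper's decoding can be conditioned on a textual domain prompt through the Hugging Face transformers interface. The checkpoint, domain prompt text, and placeholder audio are illustrative assumptions; the paper's contribution is fine-tuning the model so that it actually learns to exploit such prompts.

```python
# Minimal sketch (assumptions noted below), not the authors' fine-tuning code:
# conditioning Whisper generation on a textual domain prompt at inference time.
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Hypothetical checkpoint choice; the paper fine-tunes a pre-trained Whisper model.
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Textual domain information supplied as a prompt (illustrative example only).
domain_prompt = "Air traffic control communication: callsigns, headings, flight levels."
prompt_ids = processor.get_prompt_ids(domain_prompt, return_tensors="pt")

# Placeholder waveform; in practice this would be 16 kHz mono audio loaded from a file.
audio = torch.zeros(16000)
inputs = processor(audio.numpy(), sampling_rate=16000, return_tensors="pt")

# The prompt tokens are treated as preceding context, so decoding is biased
# toward the domain described in the prompt.
generated = model.generate(inputs.input_features, prompt_ids=prompt_ids)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```

In the paper's setting, the model is additionally fine-tuned on demonstrations that pair such prompts with transcripts, so that prompt conditioning transfers zero-shot to unseen domains.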

Keywords:
Computer science; Speech recognition; Zero-shot learning; Domain adaptation; Prompt conditioning; Artificial intelligence; Linguistics

Metrics

Cited By: 7
FWCI (Field-Weighted Citation Impact): 1.79
References: 40
Citation Normalized Percentile: 0.85

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)