Audio-Aware Spoken Multiple-Choice Question Answering With Pre-Trained Language Models

Chia-Chih Kuo; Kuan-Yu Chen; Shang-Bao Luo

doi:10.1109/taslp.2021.3120638


JOURNAL ARTICLE


Year: 2021; Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing; Vol: 29; Pages: 3170-3179; Publisher: Institute of Electrical and Electronics Engineers



Abstract

Spoken multiple-choice question answering (SMCQA) requires a machine to select the correct choice to answer a question by referring to a passage, where the passage, the question, and the choices are all in the form of speech. Although the audio may contain useful cues for SMCQA, usually only the automatically transcribed text is used in model development. Thanks to large-scale pre-trained language representation models, such as bidirectional encoder representations from Transformers (BERT), systems that use only auto-transcribed text can still achieve a reasonable level of performance. However, previous studies have shown that acoustic-level statistics can offset text inaccuracies caused by automatic speech recognition (ASR) systems, as well as representation inadequacies lurking in word embedding generators, thereby making SMCQA systems more robust. Along this line of research, this study proposes an audio-aware SMCQA framework. Two different mechanisms are introduced to distill useful cues from speech, and a BERT-based SMCQA framework is then presented. In other words, the proposed framework not only inherits the advantages of the contextualized language representations learned by BERT but also integrates complementary acoustic-level information distilled from audio with the text-level information. A series of experiments on a published Chinese SMCQA dataset demonstrates remarkable improvements in accuracy over selected baselines and state-of-the-art (SOTA) systems.
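The abstract does not specify the two distillation mechanisms or how the acoustic and text information are combined. As a rough illustration only, the sketch below assumes a simple late-fusion scorer: for each answer choice, a text-level vector (standing in for a BERT [CLS] representation of passage, question, and choice) is concatenated with an acoustic-level vector, scored by a shared linear layer, and normalized with a softmax over the choices. The functions `fuse`, `softmax`, and `choose`, and the weights, are hypothetical and are not the authors' method.

```python
import math
from typing import List, Sequence, Tuple


def fuse(text_vec: Sequence[float], audio_vec: Sequence[float]) -> List[float]:
    # Hypothetical fusion step (an assumption, not the paper's mechanism):
    # concatenate a text-level vector with an acoustic-level vector.
    return list(text_vec) + list(audio_vec)


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose(
    text_vecs: Sequence[Sequence[float]],
    audio_vecs: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float = 0.0,
) -> Tuple[int, List[float]]:
    # Expect exactly one text vector and one acoustic vector per answer choice.
    if len(text_vecs) != len(audio_vecs):
        raise ValueError("need one acoustic vector per choice")
    scores = []
    for text_vec, audio_vec in zip(text_vecs, audio_vecs):
        features = fuse(text_vec, audio_vec)
        if len(features) != len(weights):
            raise ValueError("weight vector does not match fused feature size")
        # Shared linear layer: the same weights score every choice.
        scores.append(sum(w * x for w, x in zip(weights, features)) + bias)
    probs = softmax(scores)
    # The predicted answer is the choice with the highest probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs
```

For example, if two choices have identical text vectors `[0.5, 0.5]` but acoustic vectors `[0.0]` and `[1.0]`, then with weights `[1.0, 1.0, 1.0]` only the acoustic dimension separates them and `choose` returns index 1. This mirrors, in miniature, the abstract's argument that acoustic-level cues can complement text that ASR errors have made ambiguous.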

Keywords:

Computer science, Natural language processing, Language model, Embedding, Offset (computer science), Spoken language, Encoder, Transformer, Artificial intelligence, Representation (politics), Speech recognition, Keyword spotting, Question answering


Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence


Related Documents

Exploring Visual Multiple-Choice Question Answering with Pre-trained Vision-Language Models

An Audio-Enriched BERT-Based Framework for Spoken Multiple-Choice Question Answering

Improving visual question answering with pre-trained language modeling

Question Answering Systems Based on Pre-trained Language Models: Recent Progress

Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering