JOURNAL ARTICLE

Audio-Aware Spoken Multiple-Choice Question Answering With Pre-Trained Language Models

Chia-Chih Kuo, Kuan-Yu Chen, Shang-Bao Luo

Year: 2021
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Vol: 29
Pages: 3170-3179
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Spoken multiple-choice question answering (SMCQA) requires a machine to select the correct choice to answer a question by referring to a passage, where the passage, the question, and the choices are all given as speech. Although the audio may contain useful cues for SMCQA, usually only the auto-transcribed text is used in model development. Thanks to large-scale pre-trained language representation models, such as bidirectional encoder representations from Transformers (BERT), systems that use only auto-transcribed text can still achieve a certain level of performance. However, previous studies have shown that acoustic-level statistics can offset text inaccuracies caused by automatic speech recognition systems, as well as representation inadequacies in word embedding generators, thereby making the SMCQA system more robust. Following this line of research, this study proposes an audio-aware SMCQA framework. Two different mechanisms are introduced to distill useful cues from speech, and a BERT-based SMCQA framework is then presented. In other words, the proposed framework not only inherits the advantages of the contextualized language representations learned by BERT but also integrates the complementary acoustic-level information distilled from audio with the text-level information. A series of experiments demonstrates remarkable improvements in accuracy over selected baselines and state-of-the-art (SOTA) systems on a published Chinese SMCQA dataset.
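
The idea of combining text-level and acoustic-level information to rank answer choices can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not describe its two mechanisms, so the sketch substitutes plain concatenation followed by a linear scorer. All function names, vectors, and weights below are hypothetical toy values; in a real system the text vectors would come from a model such as BERT and the audio vectors from an acoustic encoder.

```python
import math
from typing import List, Sequence


def fuse(text_vec: Sequence[float], audio_vec: Sequence[float]) -> List[float]:
    """Concatenate a text-level vector with an acoustic-level vector.

    Assumption: simple concatenation stands in for whatever fusion
    mechanism an actual audio-aware system would use.
    """
    return list(text_vec) + list(audio_vec)


def score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear score for one (passage, question, choice) candidate."""
    if len(features) != len(weights):
        raise ValueError("feature and weight lengths differ")
    return sum(f * w for f, w in zip(features, weights))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the per-choice scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_choice(
    text_vecs: Sequence[Sequence[float]],
    audio_vecs: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> int:
    """Return the index of the choice with the highest probability."""
    scores = [score(fuse(t, a), weights) for t, a in zip(text_vecs, audio_vecs)]
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


# Toy example with four choices (hypothetical numbers, not real embeddings).
text_vecs = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.3], [0.1, 0.8]]
audio_vecs = [[0.5], [0.7], [0.1], [0.2]]
weights = [1.0, 0.5, 2.0]  # two text dimensions, then one audio dimension

best = pick_choice(text_vecs, audio_vecs, weights)
print(best)
```

In this toy setup the third weight controls how strongly the acoustic feature influences the ranking; setting it to zero reduces the scorer to a text-only baseline.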

Keywords:
Computer science; Natural language processing; Language model; Embedding; Offset (computer science); Spoken language; Encoder; Transformer; Artificial intelligence; Representation (politics); Speech recognition; Keyword spotting; Question answering

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.42
Refs: 87
Citation Normalized Percentile: 0.69

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)