JOURNAL ARTICLE

An audio-visual corpus for speech perception and automatic speech recognition

Martin Cooke, Jon Barker, Stuart Cunningham, Xu Shao

Year: 2006
Journal: The Journal of the Acoustical Society of America, Vol. 120 (5), pp. 2421-2424
Publisher: Acoustical Society of America

Abstract

An audio-visual corpus has been collected to support the use of common material in speech perception and automatic speech recognition studies. The corpus consists of high-quality audio and video recordings of 1000 sentences spoken by each of 34 talkers. Sentences are simple, syntactically identical phrases such as “place green at B 4 now.” Intelligibility tests using the audio signals suggest that the material is easily identifiable in quiet and low levels of stationary noise. The annotated corpus is available on the web for research use.
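The abstract states that every sentence follows the same syntactic frame, illustrated by "place green at B 4 now". As a rough sketch of such a fixed-frame grammar, the snippet below enumerates sentences from six word slots; the word lists beyond the single quoted example are assumptions for illustration, not taken from the paper itself.

```python
import itertools

# Hypothetical word lists for the fixed frame
# "<command> <color> <preposition> <letter> <digit> <adverb>",
# matching the abstract's example "place green at B 4 now".
# Only that one sentence is attested here; the rest is assumed.
COMMANDS = ["bin", "lay", "place", "set"]
COLORS = ["blue", "green", "red", "white"]
PREPOSITIONS = ["at", "by", "in", "with"]
LETTERS = list("ABCDEFGHIJKLMNOPQRSTUVXYZ")  # 25 letters, "W" omitted
DIGITS = [str(d) for d in range(10)]
ADVERBS = ["again", "now", "please", "soon"]

def sentence(command, color, prep, letter, digit, adverb):
    """Assemble one sentence in the fixed syntactic frame."""
    return f"{command} {color} {prep} {letter} {digit} {adverb}"

# Enumerate every sentence the toy grammar can generate.
all_sentences = [sentence(*combo) for combo in
                 itertools.product(COMMANDS, COLORS, PREPOSITIONS,
                                   LETTERS, DIGITS, ADVERBS)]
```

With these assumed word lists the grammar yields 4 × 4 × 4 × 25 × 10 × 4 = 64,000 candidate sentences, of which each talker would record a subset such as the 1000 sentences described above.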

Keywords:
Intelligibility, Computer science, Speech recognition, Quiet, Speech corpus, Perception, Audio mining, Audio-visual, Speech perception, Motor theory of speech perception, Acoustic model, Natural language processing, Speech processing, Speech synthesis, Multimedia, Psychology

Metrics

Cited by: 1143
FWCI (Field-Weighted Citation Impact): 9.66
References: 16
Citation Normalized Percentile: 0.99 (in top 1%)


Topics

Hearing Loss and Rehabilitation (Life Sciences → Neuroscience → Cognitive Neuroscience)
Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Multisensory Perception and Integration (Social Sciences → Psychology → Experimental and Cognitive Psychology)