JOURNAL ARTICLE

Articulatory features for robust visual speech recognition

Abstract

Visual information has been shown to improve the performance of speech recognition systems in noisy acoustic environments. However, most audio-visual speech recognizers rely on a clean visual signal. In this paper, we explore a novel approach to visual speech modeling, based on articulatory features, which has potential benefits under visually challenging conditions. The idea is to use a set of parallel classifiers to extract different articulatory attributes from the input images, and then combine their decisions to obtain higher-level units, such as visemes or words. We evaluate our approach in a preliminary experiment on a small audio-visual database, using several image noise conditions, and compare it to the standard viseme-based modeling approach.
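The abstract describes combining the decisions of parallel articulatory-feature classifiers into higher-level units such as visemes. A minimal sketch of that combination step is shown below, assuming independent per-feature posteriors multiplied against attribute-value definitions of each viseme; all feature names, viseme labels, and probability values are hypothetical illustrations, not the paper's actual inventory.

```python
# Sketch of decision combination over parallel articulatory-feature
# classifiers. All names and numbers below are hypothetical.
from math import prod

# Each classifier outputs a posterior over its attribute's values for a
# given mouth image (hard-coded here for illustration).
feature_posteriors = {
    "lip-opening":  {"closed": 0.7, "narrow": 0.2, "wide": 0.1},
    "lip-rounding": {"rounded": 0.1, "unrounded": 0.9},
    "labio-dental": {"yes": 0.05, "no": 0.95},
}

# A viseme is defined as a combination of articulatory attribute values.
visemes = {
    "p/b/m": {"lip-opening": "closed", "lip-rounding": "unrounded", "labio-dental": "no"},
    "f/v":   {"lip-opening": "narrow", "lip-rounding": "unrounded", "labio-dental": "yes"},
    "o/u":   {"lip-opening": "narrow", "lip-rounding": "rounded",   "labio-dental": "no"},
}

def viseme_scores(posteriors, visemes):
    """Score each viseme by multiplying the posterior probabilities of
    its defining attribute values (independence assumption)."""
    return {
        v: prod(posteriors[f][val] for f, val in attrs.items())
        for v, attrs in visemes.items()
    }

scores = viseme_scores(feature_posteriors, visemes)
best = max(scores, key=scores.get)  # -> "p/b/m" for these posteriors
```

Because each feature classifier sees the same noisy image but answers a simpler question, a corrupted cue can degrade one attribute's posterior while leaving the others informative, which is the robustness argument sketched in the abstract.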

Keywords:
Viseme, Speech recognition, Audio-visual, Artificial intelligence, Image noise, Speech processing, Pattern recognition, Acoustic model, Computer science, Multimedia

Metrics

Cited by: 45
FWCI (Field-Weighted Citation Impact): 3.32
References: 37
Citation Normalized Percentile: 0.92 (in top 10%)


Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Hearing Loss and Rehabilitation (Life Sciences → Neuroscience → Cognitive Neuroscience)
