Visual information has been shown to improve the performance of speech recognition systems in noisy acoustic environments. However, most audio-visual speech recognizers rely on a clean visual signal. In this paper, we explore a novel approach to visual speech modeling, based on articulatory features, which has potential benefits under visually challenging conditions. The idea is to use a set of parallel classifiers to extract different articulatory attributes from the input images, and then combine their decisions to obtain higher-level units, such as visemes or words. We evaluate our approach in a preliminary experiment on a small audio-visual database, using several image noise conditions, and compare it to the standard viseme-based modeling approach.
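The decision-fusion scheme described above can be sketched in a few lines. This is a minimal illustration under assumed names: the articulatory attributes, their values, the viseme inventory, and the independence-based multiplication of classifier posteriors are all hypothetical choices for exposition, not the paper's exact configuration.

```python
# Sketch: parallel articulatory-feature classifiers each output a posterior
# over their attribute's values; a viseme is scored by multiplying the
# posteriors of the feature values that define it (independence assumption),
# and the highest-scoring viseme is chosen.

# Hypothetical attribute posteriors for one video frame
# (each attribute's values sum to 1).
feature_posteriors = {
    "lip_opening":  {"closed": 0.7, "narrow": 0.2, "wide": 0.1},
    "lip_rounding": {"rounded": 0.1, "unrounded": 0.9},
    "labio_dental": {"yes": 0.05, "no": 0.95},
}

# Hypothetical viseme definitions as articulatory-feature combinations.
viseme_definitions = {
    "p/b/m": {"lip_opening": "closed", "lip_rounding": "unrounded", "labio_dental": "no"},
    "f/v":   {"lip_opening": "narrow", "lip_rounding": "unrounded", "labio_dental": "yes"},
    "o/u":   {"lip_opening": "narrow", "lip_rounding": "rounded",   "labio_dental": "no"},
}

def score_viseme(features):
    """Combine the independent feature posteriors by multiplication."""
    score = 1.0
    for attribute, value in features.items():
        score *= feature_posteriors[attribute][value]
    return score

scores = {v: score_viseme(f) for v, f in viseme_definitions.items()}
best = max(scores, key=scores.get)
print(best)  # -> p/b/m
```

One appeal of this factored scheme under visual noise is that each classifier degrades independently: an occluded or blurred frame that corrupts one attribute's posterior still leaves the others informative, whereas a monolithic viseme classifier has no such fallback.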