Comparing audio and visual information for speech processing

David Dean; Patrick Lucey; Sridha Sridharan; Timothy J. Wark

doi:10.1109/isspa.2005.1580195


JOURNAL ARTICLE


Year: 2006 Vol: 1 Pages: 58-61



Abstract

This paper examines the utility of audio-visual speech for the two related tasks of speech recognition and speaker recognition. A study of the confusion between speaker and speech elements showed that principal component analysis (PCA) based visual speech features are considerably better suited to speaker recognition than to speech recognition. Decision-fusion speech and speaker recognition engines were also tested under various levels of acoustic degradation, and the optimal fusion configuration for speaker recognition was found to be substantially different from that for speech recognition. These results highlight the problem of employing similar visual features for both speech and speaker recognition.

Keywords:

Speech recognition; Speaker recognition; Computer science; Speech analytics; Audio mining; Speech processing; Voice activity detection; Task (project management); Speaker diarisation; Confusion; Acoustic model; Artificial intelligence; Psychology

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.04

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

