JOURNAL ARTICLE

Comparing audio and visual information for speech processing

Abstract

This paper examines the utility of audio-visual speech for the two related tasks of speech recognition and speaker recognition. A study of the confusion between speaker and speech elements shows that visual speech features based on principal component analysis (PCA) are considerably better suited to speaker recognition than to speech recognition. Decision-fusion speech and speaker recognition engines were also tested under various levels of acoustic degradation; the optimal fusion configuration for speaker recognition was substantially different from that for speech recognition. These results highlight the problem of employing similar visual features for both speech and speaker recognition.
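The abstract does not say which fusion rule the engines used. As an illustration only, the sketch below assumes one common form of decision (late) fusion: a weighted sum of per-class log-likelihoods from separate audio and visual classifiers. The function names, the `audio_weight` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def fuse_scores(
    audio: Dict[str, float],
    visual: Dict[str, float],
    audio_weight: float,
) -> Dict[str, float]:
    """Combine per-class log-likelihoods with a weighted sum.

    Assumed rule (not from the paper):
        fused[c] = w * audio[c] + (1 - w) * visual[c]
    where w = audio_weight lies in [0, 1].
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio.keys() != visual.keys():
        raise ValueError("both modalities must score the same classes")
    return {
        label: audio_weight * audio[label]
        + (1.0 - audio_weight) * visual[label]
        for label in audio
    }


def decide(
    audio: Dict[str, float],
    visual: Dict[str, float],
    audio_weight: float,
) -> str:
    """Return the class label with the highest fused score."""
    fused = fuse_scores(audio, visual, audio_weight)
    return max(fused, key=fused.get)


# Hypothetical log-likelihoods: the two modalities disagree.
audio_scores = {"yes": -2.0, "no": -1.0}
visual_scores = {"yes": -0.5, "no": -3.0}

# Trusting audio heavily (e.g. clean acoustic conditions).
print(decide(audio_scores, visual_scores, audio_weight=0.9))  # no
# Trusting video heavily (e.g. strong acoustic degradation).
print(decide(audio_scores, visual_scores, audio_weight=0.2))  # yes
```

Under this assumed rule, a "fusion configuration" corresponds to the choice of `audio_weight` at a given noise level. The abstract's finding would then mean that the best-performing weight differs between the speech and speaker recognition tasks.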

Keywords:
Speech recognition; Speaker recognition; Computer science; Speech analytics; Audio mining; Speech processing; Voice activity detection; Task (project management); Speaker diarisation; Confusion; Acoustic model; Artificial intelligence; Psychology

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 4
Citation Normalized Percentile: 0.04

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
