JOURNAL ARTICLE

Advances in Automatic Speech Recognition: From Audio-Only To Audio-Visual Speech Recognition

Akriti Bahal

Year: 2012   Journal: IOSR Journal of Computer Engineering   Vol: 5 (1)   Pages: 31-36   Publisher: International Organization of Scientific Research (IOSR)

Abstract

Major developments are taking place in the search for more natural ways of interacting with computers. The clear focus is on making technology more approachable to people. The concept that computers can comprehend our various gestures, made with eyes, voice, touch, and other movements, as a means of interaction is called the Natural User Interface (NUI). Today, many of these elements are available in mobile phones, PCs, and other devices. Speech technologies in particular play a substantial role in this evolution. Significant advances have been made in automatic speech recognition (ASR) for well-defined applications such as dictation and medium-vocabulary transaction-processing tasks in comparatively controlled environments. Nevertheless, ASR has yet to reach the level needed for speech to become a truly pervasive user interface, because even in clean acoustic surroundings ASR system performance falls behind human speech perception. Visual speech, however, is a promising source of additional speech information and has been shown to enhance the noise robustness of automatic speech recognizers, thereby promising to expand their usability in human-computer interaction. In this paper, the main components of audio-visual speech recognition, i.e., the audio and the video components, are discussed, along with the latest advancements made in this field. Further, the paper goes beyond recent advancements to discuss the future scope of audio-visual speech recognition and mentions some likely future developments, evaluating each on the basis of its performance. Graphs plotted from experiments depict the performance improvements from audio-only ASR to audio-visual ASR, along with the performance level expected in the future.

Keywords:
Computer science, Speech recognition, Audio visual, Audio mining, Viseme, Acoustic model, Artificial intelligence, Speech processing, Multimedia

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.24
Refs: 7
Citation Normalized Percentile: 0.60

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Audio-only backoff in audio-visual speech recognition system

Jonathan H. Connell

Journal: The Journal of the Acoustical Society of America   Year: 2009   Vol: 125 (6)   Pages: 4109-4109

JOURNAL ARTICLE

Audio visual speech recognition

Robert L. Beadles

Journal: The Journal of the Acoustical Society of America   Year: 1990   Vol: 87 (5)   Pages: 2274-2274

BOOK-CHAPTER

Speech Recognition, Audio-Visual

Gerasimos Potamianos

Elsevier eBooks Year: 2006 Pages: 800-805