Advances in Automatic Speech Recognition: From Audio-Only To Audio-Visual Speech Recognition

Akriti Bahal

doi:10.9790/0661-0513136


JOURNAL ARTICLE


Year: 2012 | Journal: IOSR Journal of Computer Engineering | Vol: 5 (1) | Pages: 31-36 | Publisher: International Organization of Scientific Research (IOSR)


Abstract

Major developments are taking place in the search for more natural ways of interacting with computers. The clear focus lies on making technology more approachable to people. The concept that computers can comprehend our gestures, eye movements, voice, touch and other movements as means of interaction is called the Natural User Interface (NUI). Today, many of these elements are available in mobile phones, PCs and other devices. Speech technologies in particular play a substantial role in this evolution. Significant advances have been made in automatic speech recognition (ASR) for well-defined applications, such as dictation and medium-vocabulary transaction-processing tasks, in relatively controlled environments. However, ASR has yet to reach the level needed for speech to become a truly pervasive user interface: even in clean acoustic conditions, state-of-the-art ASR performance falls short of human speech perception. Visual speech recognition is a promising source of additional speech information. It has been shown to improve the noise robustness of automatic speech recognizers, thereby promising to broaden their usability in human-computer interaction. In this paper, the main components of audio-visual speech recognition, namely the audio and the video components, are discussed, along with the latest advancements in the field. The paper then goes beyond these recent advancements to discuss the future scope of audio-visual speech recognition, outlining some likely future developments and evaluating each on the basis of its performance. Graphs based on experiments are plotted to depict the performance improvement from audio-only ASR to audio-visual ASR, along with its expected performance level in the future.
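To make the abstract's claim about noise robustness more concrete, the sketch below shows one common way of combining the two streams: decision (late) fusion with a stream weight. Each candidate class c receives the score λ·log p_A(c) + (1 − λ)·log p_V(c), where λ is the audio weight. This abstract does not describe the paper's own fusion method, so the sketch is illustrative only. The function names, the SNR-to-weight ramp and its parameters, and the example log-likelihoods are all hypothetical and are not taken from the paper.

```python
"""Decision (late) fusion of audio and visual stream scores.

Hypothetical illustration: names, weights and scores are not from the paper.
"""


def audio_weight_from_snr(snr_db, low_db=0.0, high_db=20.0, min_w=0.3, max_w=0.9):
    """Map an estimated SNR (dB) to an audio stream weight (hypothetical linear ramp).

    At or below low_db the audio weight is min_w; at or above high_db it is max_w.
    """
    t = (snr_db - low_db) / (high_db - low_db)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    return min_w + t * (max_w - min_w)


def fuse_scores(audio_loglik, visual_loglik, audio_weight):
    """Weighted sum of per-class log-likelihoods from the two streams."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_loglik.keys() != visual_loglik.keys():
        raise ValueError("both streams must score the same classes")
    return {
        label: audio_weight * audio_loglik[label]
        + (1.0 - audio_weight) * visual_loglik[label]
        for label in audio_loglik
    }


def recognize(audio_loglik, visual_loglik, audio_weight):
    """Return the class with the highest fused score."""
    fused = fuse_scores(audio_loglik, visual_loglik, audio_weight)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Hypothetical log-likelihoods: noisy audio barely separates "ba" from "da",
    # while the visual stream clearly sees the lip closure of "ba".
    audio = {"ba": -10.0, "da": -9.5}
    visual = {"ba": -4.0, "da": -8.0}
    print(recognize(audio, visual, 1.0))  # audio-only: da
    print(recognize(audio, visual, 0.7))  # audio-visual: ba
    print(audio_weight_from_snr(10.0))    # mid-range SNR: about 0.6
```

With the audio stream alone (λ = 1.0), the made-up noisy audio scores narrowly favour "da". Giving 30% of the weight to the visual stream flips the decision to "ba". Lowering λ as the estimated SNR drops is the design choice that lets the visual stream contribute more when the audio is unreliable.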

Keywords: Computer science, Speech recognition, Audio visual, Audio mining, Viseme, Acoustic model, Artificial intelligence, Speech processing, Multimedia


