JOURNAL ARTICLE

Audio-Visual Speech Recognition based on Machine Learning approach

Pinki Roy, Saswati Debnath

Year: 2018 Journal: International Journal of Advanced Intelligence Paradigms Vol: 1 (1) Pages: 1-1 Publisher: Inderscience Publishers

Abstract

Audio-visual speech recognition by machine plays an important role as research in automatic speech recognition reaches its peak performance. Audio alone gives good performance, but adding visual information can yield a more reliable recognition system when the audio signal degrades in a noisy environment or varies with the environmental channel. This paper proposes an audio-visual automatic speech recognition (AV-ASR) system based on machine learning approaches. Visual information is captured from the lip contour. Pseudo-Zernike moments (PZMs) and 19th-order Mel-frequency cepstral coefficients (MFCCs) are extracted as the visual and audio features, respectively. Two machine learning approaches, artificial neural networks (ANN) and support vector machines (SVM), are used to recognise speech in the audio and visual modalities. After each of the two systems performs recognition individually, a combined decision is taken. The paper also evaluates the individual performance of audio and visual speech recognition under these machine learning approaches.
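
The abstract states that a combined decision is taken after the audio and visual recognisers run separately, but it does not say how the two outputs are merged. As an illustration only, the sketch below assumes a weighted sum of per-class scores, which is one common form of decision-level fusion and not necessarily the authors' rule. The function name `fuse_decisions`, the `audio_weight` parameter, and the example scores are all hypothetical.

```python
from typing import Dict


def fuse_decisions(audio_scores: Dict[str, float],
                   visual_scores: Dict[str, float],
                   audio_weight: float = 0.5) -> str:
    """Pick the label with the highest weighted sum of per-class scores.

    ASSUMPTION: weighted-sum fusion is used here for illustration; the
    source abstract does not specify the combination rule.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")

    labels = set(audio_scores) | set(visual_scores)
    if not labels:
        raise ValueError("at least one modality must provide scores")

    # A label missing from one modality contributes a score of 0.0 there.
    fused = {
        label: audio_weight * audio_scores.get(label, 0.0)
        + (1.0 - audio_weight) * visual_scores.get(label, 0.0)
        for label in labels
    }

    # Iterate in sorted order so that ties resolve to the
    # alphabetically first label, keeping the result deterministic.
    return max(sorted(fused), key=fused.get)


# Hypothetical scores: the audio recogniser is unsure (noisy signal),
# while the visual recogniser clearly favours "two".
audio = {"one": 0.40, "two": 0.35, "three": 0.25}
visual = {"one": 0.20, "two": 0.70, "three": 0.10}

print(fuse_decisions(audio, visual))                    # equal weighting
print(fuse_decisions(audio, visual, audio_weight=1.0))  # audio only
```

Lowering `audio_weight` as the acoustic noise level rises is one way such a scheme could lean on the lip-based recogniser when the audio signal degrades, which is the motivation the abstract gives for adding the visual modality.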

Keywords:
Computer science; Speech recognition; Artificial intelligence; Support vector machine; Audio visual; Audio mining; Modality (human–computer interaction); Feature (linguistics); Visualization; Audio signal; Mel-frequency cepstrum; Pattern recognition (psychology); Feature extraction; Machine learning; Voice activity detection; Speech processing; Speech coding; Multimedia

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.29

Topics

Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing

Related Documents

JOURNAL ARTICLE

Audio-visual speech recognition based on machine learning approach

Saswati Debnath, Pinki Roy

Journal: International Journal of Advanced Intelligence Paradigms Year: 2022 Vol: 21 (3/4) Pages: 211-211

BOOK-CHAPTER

Multimodal Learning of Audio-Visual Speech Recognition with Liquid State Machine

Xuhu Yu, Lei Wang, Changhao Chen, Junbo Tie, Shasha Guo

Communications in Computer and Information Science Year: 2023 Pages: 552-563

JOURNAL ARTICLE

A Combined Rule-Based & Machine Learning Audio-Visual Emotion Recognition Approach

Kah Phooi Seng, Li-Minn Ang, Chien Shing Ooi

Journal: IEEE Transactions on Affective Computing Year: 2016 Vol: 9 (1) Pages: 3-13