JOURNAL ARTICLE

Feature extraction using multimodal convolutional neural networks for visual speech recognition

Abstract

This article addresses the problem of continuous speech recognition from visual information only, without exploiting any audio signal. Our approach combines a video camera and an ultrasound imaging system to monitor the speaker's lips and tongue movements simultaneously. We investigate the use of convolutional neural networks (CNNs) to extract visual features directly from the raw ultrasound and video images. We propose several architectures, including a multimodal CNN that processes the two visual modalities jointly. Combined with an HMM-GMM decoder, the CNN-based approach outperforms our previous baseline based on Principal Component Analysis. Importantly, the recognition accuracy is only 4% lower than that obtained when decoding the audio signal, which makes the approach a good candidate for a practical visual speech recognition system.
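
The idea of a multimodal CNN that processes both image streams jointly can be illustrated with a minimal sketch. It is not the architecture from the article, whose layer configuration the abstract does not give. The sketch assumes one convolution, ReLU and 2x2 max-pooling stage per modality, random untrained kernels, toy image sizes (8x8 and 6x6), and fusion by simple concatenation. All function names are hypothetical. In a full system, the fused vector for each frame would be passed to the HMM-GMM decoder.

```python
import random

def conv2d_valid(image, kernel):
    """Valid 2-D cross-correlation of one image with one kernel (lists of lists)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2(fmap):
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    h, w = len(fmap) // 2, len(fmap[0]) // 2
    return [[max(fmap[2 * i][2 * j], fmap[2 * i][2 * j + 1],
                 fmap[2 * i + 1][2 * j], fmap[2 * i + 1][2 * j + 1])
             for j in range(w)]
            for i in range(h)]

def branch_features(image, kernels):
    """One modality branch: conv -> ReLU -> pool for each kernel, then flatten."""
    feats = []
    for kernel in kernels:
        pooled = max_pool2(relu(conv2d_valid(image, kernel)))
        feats.extend(v for row in pooled for v in row)
    return feats

def multimodal_features(ultrasound, lips, us_kernels, lip_kernels):
    """Joint feature vector: concatenation of both branch outputs (assumed fusion)."""
    return branch_features(ultrasound, us_kernels) + branch_features(lips, lip_kernels)

def random_kernel(rng, size=3):
    """Untrained 3x3 kernel; a real system would learn these weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(size)]

rng = random.Random(0)
ultrasound = [[rng.random() for _ in range(8)] for _ in range(8)]  # toy tongue image
lips = [[rng.random() for _ in range(6)] for _ in range(6)]        # toy lip image
us_kernels = [random_kernel(rng) for _ in range(2)]
lip_kernels = [random_kernel(rng) for _ in range(2)]

features = multimodal_features(ultrasound, lips, us_kernels, lip_kernels)
print(len(features))  # 2 * 3*3 (ultrasound) + 2 * 2*2 (lips) = 26
```

Concatenation is used here only because it is the simplest way to combine two branches. The article compares several architectures, and the fusion actually used may differ.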

Keywords:
Computer science, Convolutional neural network, Artificial intelligence, Feature extraction, Speech recognition, Feature (linguistics), Pattern recognition (psychology), Hidden Markov model, Decoding methods, Computer vision

Metrics

Cited By: 59
FWCI (Field Weighted Citation Impact): 4.25
Refs: 30
Citation Normalized Percentile: 0.95
Is in top 1%
Is in top 10%

Topics

Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Indoor and Outdoor Localization Technologies
Physical Sciences → Engineering → Electrical and Electronic Engineering
Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing

Related Documents

JOURNAL ARTICLE

Audio-Visual Speech Recognition using 3D Convolutional Neural Networks

Ceren Belhan, Damla Fikirdanis, Ovgu Cimen, Pelin Pasinli, Zeynep Akgün, Zeynep Ovgu Yayci, Mehmet Türkan

Journal: 2021 Innovations in Intelligent Systems and Applications Conference (ASYU) Year: 2021 Pages: 1-5

JOURNAL ARTICLE

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

Jen-Cheng Hou, Syu‐Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin‐Min Wang

Journal: IEEE Transactions on Emerging Topics in Computational Intelligence Year: 2018 Vol: 2 (2) Pages: 117-128

JOURNAL ARTICLE

Speech Recognition Using Convolutional Neural Networks

D. Nagajyothi, P. Siddaiah

Journal: International Journal of Engineering & Technology Year: 2018 Vol: 7 (4.6) Pages: 133-137

BOOK-CHAPTER

Speech feature extraction using neural networks

Mahesan Niranjan, F. Fallside

Lecture notes in computer science Year: 1990 Pages: 197-204