JOURNAL ARTICLE

Deep Learning Cross-Modal Learning for Audio-Visual Speech Recognition

Abstract

Relating linguistic information across the auditory and visual modalities is a crucial aspect of audio-visual speech recognition (AVSR), with applications in audio-visual correspondence tasks of the kind addressed by AVE-Net and SyncNet. The technique described in this research uses feature disentanglement to handle these tasks jointly. By learning cross-modal shared representations, the model transforms visual or auditory linguistic features into modality-independent representations, on which tasks such as those of AVE-Net and SyncNet can then be performed. Furthermore, the generated audio and visual outputs can be modified according to the required speaker identity and linguistic content. We conduct comprehensive experiments on a range of recognition and synthesis tasks, evaluating each task separately, and show that the proposed solution successfully addresses both audio-visual learning problems. The system achieves 91.5% accuracy on enhanced video with 5 frames, rising to 99.03% with 15 frames, which is more efficient than previous methods.
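The core idea of the abstract, projecting audio and visual features into a shared, modality-independent linguistic space, can be illustrated with a minimal sketch. This is not the paper's implementation: the encoders below are untrained random linear maps standing in for learned networks, and all dimensions and names (`W_audio`, `W_visual`, `SHARED_DIM`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions -- not taken from the paper.
AUDIO_DIM, VISUAL_DIM, SHARED_DIM = 40, 64, 16

# Modality-specific encoders: random linear maps standing in for trained
# networks that project each modality into a shared linguistic space.
W_audio = rng.standard_normal((SHARED_DIM, AUDIO_DIM))
W_visual = rng.standard_normal((SHARED_DIM, VISUAL_DIM))

def encode_audio(x):
    # Project audio features and L2-normalize, so embeddings from either
    # modality live on the same unit hypersphere.
    z = W_audio @ x
    return z / np.linalg.norm(z)

def encode_visual(x):
    z = W_visual @ x
    return z / np.linalg.norm(z)

def cross_modal_similarity(a, v):
    # Cosine similarity between the two modality-independent embeddings;
    # a contrastive objective would push this toward 1 for matched
    # audio-visual pairs and toward -1 (or 0) for mismatched ones.
    return float(encode_audio(a) @ encode_visual(v))

audio_feat = rng.standard_normal(AUDIO_DIM)
visual_feat = rng.standard_normal(VISUAL_DIM)
sim = cross_modal_similarity(audio_feat, visual_feat)
print(round(sim, 3))
```

In a full system, a separate identity encoder would carry speaker information, so that the decoder can recombine any identity with any linguistic content, which is the disentanglement property the abstract describes.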

Keywords:
Audio-visual; Computer science; Speech recognition; Modal; Deep learning; Artificial intelligence; Natural language processing; Multimedia


Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)