JOURNAL ARTICLE

LOW-LEVEL FUSION OF AUDIO AND VIDEO FEATURE FOR MULTI-MODAL EMOTION RECOGNITION

Abstract

Bimodal emotion recognition through audiovisual feature fusion has been shown to be superior to each individual modality in the past. Still, synchronization of the two streams is a challenge, as many vision approaches work on a frame basis, whereas audio is analyzed on a turn or chunk basis. Therefore, late fusion schemes such as simple logic or voting strategies are commonly used for the overall estimation of the underlying affect. However, early fusion is known to be more effective in many other multimodal recognition tasks. We therefore suggest a combined analysis by descriptive statistics of audio and video low-level descriptors (LLDs) for subsequent static SVM classification. This strategy also allows for a combined feature-space optimization, which will be discussed herein. The high effectiveness of this approach is shown on a database of 11.5 h containing six emotional situations in an airplane scenario.
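The core idea of the abstract, applying descriptive statistics to each audio and video LLD contour so that streams with different frame rates collapse into one static vector, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the five statistics (mean, standard deviation, minimum, maximum, range), the helper names `functionals` and `fuse_early`, and the LLD names `pitch_hz` and `mouth_opening` are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def functionals(contour):
    """Summarise one LLD contour (length >= 1) by a fixed-length list of statistics.

    Hypothetical choice of statistics: mean, population std. dev., min, max, range.
    """
    lo, hi = min(contour), max(contour)
    return [mean(contour), pstdev(contour), lo, hi, hi - lo]


def fuse_early(audio_llds, video_llds):
    """Early (feature-level) fusion for one turn.

    audio_llds, video_llds: dict mapping an LLD name to its per-frame values.
    The streams may differ in frame rate and length; applying functionals to
    each contour yields one static vector without any frame-level alignment.
    """
    vector = []
    for stream in (audio_llds, video_llds):
        for name in sorted(stream):  # fixed order keeps dimensions consistent across turns
            vector.extend(functionals(stream[name]))
    return vector


# Hypothetical turn: audio sampled at a higher frame rate than video,
# so the two contours have different lengths.
audio = {"pitch_hz": [200.0, 210.0, 220.0, 230.0]}
video = {"mouth_opening": [0.5, 1.5]}
x = fuse_early(audio, video)
print(len(x))  # 10
```

Because every turn maps to a vector of the same length (five statistics per LLD here), the fused vectors can be passed to any static classifier, such as an SVM, and a feature selection method can then search the joint audio-visual space rather than each modality separately.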

Keywords:
Computer science, Feature, Modality (human–computer interaction), Artificial intelligence, Speech recognition, Pattern recognition, Voting, Frame, Feature vector, Emotion recognition, Machine learning

Metrics

Cited by: 49
FWCI (Field Weighted Citation Impact): 3.69
References: 16
Citation Normalized Percentile: 0.94

Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Emotion and Mood Recognition (Social Sciences → Psychology → Experimental and Cognitive Psychology)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
