LOW-LEVEL FUSION OF AUDIO AND VIDEO FEATURE FOR MULTI-MODAL EMOTION RECOGNITION

Matthias Wimmer; Björn W. Schuller; Dejan Arsić; Gerhard Rigoll; Bernd Radig

doi:10.5220/0001082801450151


JOURNAL ARTICLE


Year: 2008; Pages: 145-151



Abstract

Bimodal emotion recognition through audiovisual feature fusion has been shown to be superior to each individual modality in the past. Still, synchronization of the two streams is a challenge, as many vision approaches work on a frame basis, whereas audio is processed on a turn or chunk basis. Therefore, late fusion schemes such as simple logic or voting strategies are commonly used for the overall estimation of underlying affect. However, early fusion is known to be more effective in many other multimodal recognition tasks. We therefore suggest a combined analysis by descriptive statistics of audio and video Low-Level-Descriptors for subsequent static SVM classification. This strategy also allows for a combined feature-space optimization, which will be discussed herein. The high effectiveness of this approach is shown on a database of 11.5 h containing six emotional situations in an airplane scenario.
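The early-fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of functionals (mean, standard deviation, minimum, maximum), the example descriptors, and the helper names `functionals` and `fuse_turn` are all assumptions made for illustration. The point it shows is that descriptive statistics turn two streams with different frame counts into one fixed-length vector per turn, which a static classifier such as an SVM could then consume.

```python
from statistics import mean, pstdev


def functionals(frames):
    """Map a variable-length sequence of per-frame descriptor vectors to a
    fixed-length vector: mean, std, min, max for each descriptor.
    (This set of functionals is an assumption, not taken from the paper.)"""
    if not frames:
        raise ValueError("need at least one frame")
    stats = []
    for track in zip(*frames):  # one track per descriptor dimension
        stats.extend([mean(track), pstdev(track), min(track), max(track)])
    return stats


def fuse_turn(audio_frames, video_frames):
    """Early (feature-level) fusion: concatenate the audio and video
    functional vectors computed over the same turn."""
    return functionals(audio_frames) + functionals(video_frames)


# Hypothetical turn: 5 audio frames with 2 descriptors each,
# 3 video frames with 1 descriptor each (frame rates differ on purpose).
audio = [[120.0, 0.2], [125.0, 0.3], [130.0, 0.5], [128.0, 0.4], [122.0, 0.1]]
video = [[0.1], [0.4], [0.7]]

fused = fuse_turn(audio, video)
print(len(fused))  # 2*4 + 1*4 = 12 values, regardless of frame counts
```

Because the fused vector has the same length for every turn, feature selection can operate on audio and video dimensions jointly, which is one way to read the "combined feature-space optimization" mentioned above.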

Keywords:

Computer science; Feature (linguistics); Modality (human–computer interaction); Artificial intelligence; Speech recognition; Pattern recognition (psychology); Voting; Frame (networking); Feature vector; Emotion recognition; Machine learning

Metrics

FWCI (Field Weighted Citation Impact): 3.69

Citation Normalized Percentile: 0.94


Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing


Related Documents

Enhanced multi-modal emotion recognition using the feature level fusion

Multi-Modal Emotion Recognition Fusing Video and Audio

Multi-modal feature fusion based on multi-layers LSTM for video emotion recognition

Speech Emotion Recognition Using Multi-Modal Feature Fusion Network

Multi-level feature fusion for group-level emotion recognition