JOURNAL ARTICLE

Cross-culture Continuous Emotion Recognition with Multimodal Features

Abstract

Automatic emotion recognition is a challenging task that can have a great impact on improving natural human-computer interaction. In this paper, we present our automatic prediction of dimensional emotional states for the Cross-cultural Emotion Sub-Challenge of AVEC 2018, using multiple features and fusion across the visual, audio and text modalities. Single-feature predictions are first modeled with support vector regression (SVR). Multimodal fusion of these predictions is then performed with a multiple linear regression model. Besides the baseline features, we extract unigram and bigram features from the text and several types of convolutional neural network (CNN) features from the video. Our multimodal fusion reached CCC = 0.599 on the development set for arousal, 0.617 for valence and 0.289 for likability.
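The pipeline the abstract describes can be sketched in a few lines: per-modality regressors produce frame-level predictions, a multiple linear regression fuses them, and the concordance correlation coefficient (CCC) scores the result. The sketch below is a minimal NumPy illustration, not the authors' implementation: the simulated `preds` stand in for the single-modality SVR outputs, and the fusion weights are fit by ordinary least squares on the gold annotations.

```python
import numpy as np

def ccc(x, y):
    # Concordance correlation coefficient (Lin, 1989), the AVEC metric:
    # 2*cov(x,y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)

def fuse(preds, gold):
    # Late fusion: multiple linear regression (with a bias term) mapping
    # the stacked single-modality predictions onto the gold annotations.
    X = np.column_stack(preds + [np.ones(len(gold))])
    w, *_ = np.linalg.lstsq(X, gold, rcond=None)
    return X @ w, w

# Toy data: gold arousal trace plus three noisy stand-ins for the
# audio, video and text SVR predictions (hypothetical, for illustration).
rng = np.random.default_rng(0)
gold = np.sin(np.linspace(0, 6, 200))
preds = [gold + 0.3 * rng.standard_normal(200) for _ in range(3)]
fused, w = fuse(preds, gold)
```

On training data, the fused prediction typically scores a higher CCC than any single modality, since the regression down-weights noisier inputs; on a held-out set the weights are simply reused.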

Keywords:
Computer science, Modalities, Convolutional neural network, Artificial intelligence, Support vector machine, Pattern recognition (psychology), Emotion recognition, Feature (linguistics), Valence (psychology), Feature extraction, Modality (human-computer interaction), Affective computing, Speech recognition, Emotion classification, Arousal

Metrics

Cited by: 1
FWCI (Field-Weighted Citation Impact): 0.00
References: 21
Citation Normalized Percentile: 0.25
Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence