JOURNAL ARTICLE

Zero-shot Video Emotion Recognition via Multimodal Protagonist-aware Transformer Network

Abstract

Recognizing human emotions from videos has attracted significant attention in numerous computer vision and multimedia applications, such as human-computer interaction and health care. The goal is to understand humans' emotional responses, where candidate emotion categories are generally defined by specific psychological theories. However, as psychological theories develop, emotion categories become increasingly diverse and fine-grained, and samples become increasingly difficult to collect. In this paper, we investigate a new task of zero-shot video emotion recognition, which aims to recognize rare unseen emotions. Specifically, we propose a novel multimodal protagonist-aware transformer network composed of two branches: one is equipped with a novel dynamic emotional attention mechanism and a visual transformer to learn better visual representations; the other is an acoustic transformer for learning discriminative acoustic representations. We align the visual and acoustic representations with semantic embeddings of fine-grained emotion labels by jointly mapping them into a common space under a noise contrastive estimation objective. Extensive experiments on three datasets demonstrate the effectiveness of the proposed method.
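The abstract describes aligning audio-visual representations with emotion-label embeddings in a common space under a noise contrastive estimation objective, and zero-shot recognition then reduces to nearest-label retrieval in that space. The sketch below illustrates this general idea with a standard InfoNCE-style loss in NumPy; the function names, the temperature value, and the use of cosine similarity are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Project embeddings onto the unit sphere so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def info_nce_loss(clip_emb, label_emb, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative, not the paper's exact objective).

    Row i of clip_emb (a fused audio-visual clip embedding) is treated as the
    positive pair of row i of label_emb (the semantic embedding of its emotion
    label); all other label rows in the batch serve as negatives.
    """
    v = l2_normalize(clip_emb)
    t = l2_normalize(label_emb)
    logits = v @ t.T / temperature                    # similarity logits
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # pull positives together

def zero_shot_predict(clip_emb, unseen_label_emb):
    # Zero-shot inference: assign each clip to the nearest unseen-emotion
    # label embedding in the shared space.
    sims = l2_normalize(clip_emb) @ l2_normalize(unseen_label_emb).T
    return sims.argmax(axis=1)
```

Because the label side of the space is populated from semantic embeddings rather than trained classifier weights, embeddings of emotions never seen during training can be dropped in at test time, which is what makes the zero-shot setting possible.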

Keywords:
Discriminative model, Computer science, Transformer, Emotion recognition, Semantic space, Speech recognition, Artificial intelligence, Human–computer interaction

Metrics

Cited by: 10
FWCI (Field-Weighted Citation Impact): 1.83
References: 61
Citation Normalized Percentile: 0.83

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition