JOURNAL ARTICLE

Real-time Emotion Pre-Recognition in Conversations with Contrastive Multi-modal Dialogue Pre-training

Abstract

This paper presents our pioneering effort in addressing a new and realistic scenario in multi-modal dialogue systems called Multi-modal Real-time Emotion Pre-recognition in Conversations (MREPC). The objective is to predict the emotion of a forthcoming target utterance that is highly likely to occur but has not yet been spoken. We believe that this task can enhance the dialogue system's understanding of the interlocutor's state of mind, enabling it to prepare an appropriate response in advance. However, addressing MREPC poses the following challenges: 1) Previous studies on emotion elicitation typically focus on the textual modality and perform sentiment forecasting within a fixed contextual scenario. 2) Previous studies on multi-modal emotion recognition aim to predict the emotion of existing utterances, which makes these approaches difficult to extend to MREPC, where the target utterance is absent. To tackle these challenges, we construct two benchmark multi-modal datasets for MREPC and propose a task-specific multi-modal contrastive pre-training approach. This approach leverages large-scale unlabeled multi-modal dialogues to facilitate emotion pre-recognition for potential utterances of specific target speakers. Through detailed experiments and extensive analysis, we demonstrate that our proposed multi-modal contrastive pre-training architecture effectively enhances the performance of multi-modal real-time emotion pre-recognition in conversations.
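
The abstract does not spell out the pre-training objective. As a rough illustration only, contrastive pre-training of this general kind is often built on an InfoNCE-style loss: a representation of the dialogue context is pulled toward the representation of the utterance that actually follows it and pushed away from other utterances. The sketch below assumes that setup. The function names, the temperature value, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(context, candidates, positive_index, temperature=0.1):
    """InfoNCE loss for one context vector against candidate utterance vectors.

    Assumption (not from the paper): candidates[positive_index] is the
    utterance that really follows the context; the rest act as negatives.
    """
    logits = [cosine(context, c) / temperature for c in candidates]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_norm - logits[positive_index]

# Toy vectors standing in for fused multi-modal embeddings (hypothetical).
context = [1.0, 0.0, 0.5]
candidates = [[0.9, 0.1, 0.6], [-1.0, 0.2, 0.0], [0.0, 1.0, 0.0]]

loss_match = info_nce(context, candidates, positive_index=0)
loss_mismatch = info_nce(context, candidates, positive_index=1)
```

With these toy vectors, treating the candidate closest to the context as the positive gives a much smaller loss than treating a dissimilar candidate as the positive, which is the behavior a contrastive objective is meant to reward.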

Keywords:
Computer science; Utterance; Modal; Focus (optics); Benchmark (surveying); Task (project management); Natural language processing; Modality (human–computer interaction); Artificial intelligence; Speech recognition; Emotion recognition; Engineering

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 33
Citation Normalized Percentile: 0.18

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence
Humor Studies and Applications
Social Sciences →  Psychology →  Social Psychology

Related Documents

JOURNAL ARTICLE

Multi-Modal Pre-Training for Automated Speech Recognition

David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister

Journal: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Year: 2022 Vol: 2 Pages: 246-250
JOURNAL ARTICLE

Cross-Modal Contrastive Pre-Training for Few-Shot Skeleton Action Recognition

Mingqi Lu, Siyuan Yang, Xiaobo Lu, Jun Liu

Journal: IEEE Transactions on Circuits and Systems for Video Technology Year: 2024 Vol: 34 (10) Pages: 9798-9807
JOURNAL ARTICLE

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

Bandi Dixitha

Journal: INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT Year: 2025 Vol: 09 (06) Pages: 1-9