JOURNAL ARTICLE

CLAP: Contrastive Language-Audio Pre-training Model for Multi-modal Sentiment Analysis

Abstract

Multi-modal Sentiment Analysis (MSA) is a hotspot of multi-modal fusion research. To make full use of the correlation and complementarity between modalities when fusing multi-modal data, we propose a two-stage framework of Contrastive Language-Audio Pre-training (CLAP) for the MSA task: 1) performing contrastive pre-training on large-scale unlabeled external data to yield better single-modal representations; 2) adopting a Transformer-based multi-modal fusion module to further optimize the single-modal features and predict sentiment via a task-driven training process. Our work demonstrates the importance and necessity of core elements such as pre-training, contrastive learning, and representation learning for the MSA task, and significantly outperforms existing methods on two well-recognized MSA benchmarks.
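The abstract does not spell out the stage-1 objective, but contrastive language-audio pre-training of this kind is typically implemented as a CLIP-style symmetric InfoNCE loss over paired text and audio embeddings. The NumPy sketch below illustrates that general idea only; the function name and the temperature value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def clip_style_contrastive_loss(text_emb, audio_emb, temperature=0.07):
    """Symmetric InfoNCE loss between paired text and audio embeddings.

    text_emb, audio_emb: (N, D) arrays where row i of each is a matched pair.
    (Hypothetical sketch; not the paper's exact objective.)
    """
    # L2-normalize so the dot product becomes cosine similarity
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = (t @ a.T) / temperature  # (N, N); matched pairs lie on the diagonal
    n = logits.shape[0]

    def cross_entropy(lg):
        # numerically stable log-softmax over each row,
        # with the row's matching column as the true class
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # pull matched (text, audio) pairs together in both retrieval directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing this loss pushes each text embedding toward its paired audio embedding and away from the other clips in the batch, which is what yields the transferable single-modal representations that stage 2 then fuses.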

Keywords:
Computer science, Modal, Artificial intelligence, Natural language processing, Sentiment analysis, Speech recognition

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 1.79
References: 27
Citation Normalized Percentile: 0.84

Topics

Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence