Zhongliang Wei, Chang Ge, Lijun Zhu, Jinmin Ye
Speech Emotion Recognition (SER) has become a pivotal topic in affective computing and human–computer interaction, where the core challenge is to capture both the time–frequency structure and the semantic context of speech. Current approaches rely on single-view feature representations, inherit limited emotional discriminability from self-supervised models, and use fusion strategies that do not fully exploit the complementarity between feature types. To address these shortcomings, this study proposes a parallel dual-branch fusion architecture for SER. The framework consists of a wav2vec 2.0 branch, which extracts contextual semantic representations from raw waveforms, and a CNN–Transformer branch, which extracts explicit time–frequency features from spectrograms. A logistic regression fusion mechanism is introduced at the decision level to weight the two branches adaptively in probability space, thereby leveraging the complementary strengths of the two feature types. Experiments on the audio subset of RAVDESS showed that the proposed model outperformed several mainstream baselines (e.g., CNN-n-GRU and RELUEM), achieving 92.7% accuracy and 92.2% Macro-F1, an average improvement of about 3.2 percentage points over these baselines. Layer-unfreezing experiments confirmed the effectiveness of partial fine-tuning for transferring pretrained features, and comparative experiments on fusion strategies showed that probability-space fusion is superior in both performance and stability. Overall, the proposed framework improves accuracy and robustness simultaneously through feature complementarity, branch decoupling, and lightweight fusion. Future work will explore cross-lingual generalization, multimodal extensions, lightweight deployment, and dynamic emotion modeling, contributing to more efficient affective computing and intelligent interaction systems.
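Decision-level fusion in probability space can be illustrated with a minimal sketch. It assumes each branch already emits per-class softmax probabilities; the class `LogisticFusion`, the helpers `simulate_branch` and `macro_f1`, and the synthetic data are hypothetical illustrations, not the authors' implementation. A multinomial logistic regression, fit by plain full-batch gradient descent in NumPy, takes the concatenated probability vectors of the two branches as input, so its learned weights decide how much to trust each branch for each class. The demo uses eight classes to match RAVDESS's eight emotion categories, and compares the learned fusion against each branch alone and against simple probability averaging, reporting accuracy and Macro-F1.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class LogisticFusion:
    """Hypothetical decision-level fuser: multinomial logistic regression whose
    inputs are the concatenated class-probability vectors of two branches."""

    def __init__(self, n_classes, lr=0.5, epochs=500, l2=1e-3):
        self.n_classes = n_classes
        self.lr = lr          # gradient-descent step size
        self.epochs = epochs  # number of full-batch updates
        self.l2 = l2          # L2 penalty on the weights (not the bias)

    def fit(self, p_wav, p_spec, y):
        X = np.hstack([p_wav, p_spec])          # (N, 2C) probability features
        n, d = X.shape
        self.W = np.zeros((d, self.n_classes))
        self.b = np.zeros(self.n_classes)
        Y = np.eye(self.n_classes)[y]           # one-hot targets, (N, C)
        for _ in range(self.epochs):
            P = softmax(X @ self.W + self.b)
            G = (P - Y) / n                     # grad of mean cross-entropy w.r.t. logits
            self.W -= self.lr * (X.T @ G + self.l2 * self.W)
            self.b -= self.lr * G.sum(axis=0)
        return self

    def predict_proba(self, p_wav, p_spec):
        return softmax(np.hstack([p_wav, p_spec]) @ self.W + self.b)

    def predict(self, p_wav, p_spec):
        return self.predict_proba(p_wav, p_spec).argmax(axis=1)


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores (a class with no tp/fp/fn scores 0)."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def simulate_branch(y, n_classes, strong, rng):
    """Synthetic stand-in for one branch: noisy logits that favour the true class
    strongly for classes in `strong` and only weakly for the rest."""
    logits = rng.normal(size=(len(y), n_classes))
    boost = np.where(np.isin(y, strong), 3.0, 0.5)
    logits[np.arange(len(y)), y] += boost
    return softmax(logits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = 8  # RAVDESS labels eight emotion categories
    y = rng.integers(0, C, size=2000)
    # Each simulated branch is reliable on a different half of the classes
    p_wav = simulate_branch(y, C, strong=[0, 1, 2, 3], rng=rng)
    p_spec = simulate_branch(y, C, strong=[4, 5, 6, 7], rng=rng)
    tr, te = slice(0, 1500), slice(1500, None)  # first 1500 fit the fuser, rest evaluate

    fuser = LogisticFusion(C).fit(p_wav[tr], p_spec[tr], y[tr])
    candidates = {
        "wav2vec branch": p_wav[te].argmax(axis=1),
        "spectrogram branch": p_spec[te].argmax(axis=1),
        "mean of probabilities": ((p_wav[te] + p_spec[te]) / 2).argmax(axis=1),
        "logistic-regression fusion": fuser.predict(p_wav[te], p_spec[te]),
    }
    for name, pred in candidates.items():
        acc = np.mean(pred == y[te])
        f1 = macro_f1(y[te], pred, C)
        print(f"{name:28s} acc={acc:.3f} macro-F1={f1:.3f}")
```

Unlike fixed averaging, the learned weights can favour whichever branch is more reliable for a given emotion class. On real data, a stacked fuser like this is typically fit on branch predictions for held-out (validation) samples rather than on the branches' own training outputs, since training-set probabilities tend to be overconfident.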