TANG Yuhao, MAO Qirong, GAO Lijian
In continuous dimensional emotion recognition, the segments that most strongly express emotion differ from one modality to another, and the modalities also differ in how much they influence the emotional state. To address this problem, this paper proposes a multimodal dimensional emotion recognition model based on a Hierarchical Attention Mechanism (HAM), which learns modality-specific features and fuses them in a reasonable way. A frequency attention mechanism is added to the audio modality to learn context information in the frequency domain, and the video features are fused with the audio features by a multimodal attention mechanism. An improved loss function then mitigates the problem of missing modalities, improving both robustness and emotion recognition performance. Experimental results on public datasets show that, compared with methods such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks, the proposed method achieves a higher Concordance Correlation Coefficient (CCC) and higher recognition efficiency. It is applicable to dimensional emotion recognition on large volumes of data.
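For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of the standard CCC definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), using population statistics. The function name `concordance_cc` and the choice to return 0.0 for a zero denominator are illustrative assumptions, not details taken from the paper.

```python
def concordance_cc(pred, gold):
    """Concordance Correlation Coefficient of two equal-length sequences.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    n = len(pred)
    if n == 0 or n != len(gold):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = sum(pred) / n
    mean_g = sum(gold) / n

    # Population (divide-by-n) variances and covariance.
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Assumed convention: both inputs are the same constant sequence.
        return 0.0
    return 2 * cov / denom
```

Unlike Pearson correlation, CCC penalizes a constant offset between prediction and annotation: a prediction that tracks the label perfectly but is shifted by a fixed amount scores below 1.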
LU Jian, ZHAO Bo, ZHANG Qi, LI Xuanfeng
H. Tan, Sheng Qin, Xuanyu Zhao, Jin Zeng
Liu Tianbao (刘天宝), Zhang Lingtao (张凌涛), Yu Wentao (于文涛), Wei Dongchuan (魏东川), Fan Yijun (范轶军)
Mingyan Zhao, Cheng Cheng, Lin Feng