This paper addresses the Bipolar Disorder sub-challenge of the Audio/Visual Emotion Challenge (AVEC) 2018, where the objective is to classify patients suffering from bipolar disorder into states of remission, hypomania, and mania from audio-visual recordings of structured interviews. To this end, we propose 'turbulence features' that capture sudden, erratic changes in feature contours from the audio and visual modalities, and demonstrate their efficacy for this task. We also introduce Fisher Vector encoding of ComParE low-level descriptors (LLDs) and show that the resulting features are viable for screening bipolar disorder from speech. In addition, we report experiments with standard feature sets from the openSMILE toolkit and with multi-modal fusion. The best result achieved on the test set is an unweighted average recall (UAR) of 57.41%, which matches the best result published as the official baseline.
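Since the headline result is reported as UAR, a minimal sketch of how that metric is computed may help: UAR is the mean of the per-class recalls, so each of the three states counts equally regardless of how many recordings it has. The function name and the toy labels below are hypothetical and are not taken from the challenge data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return (UAR, per-class recall dict); UAR is the mean of per-class recalls."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # number of correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = {c: correct[c] / total[c] for c in total}
    return sum(recalls.values()) / len(recalls), recalls


# Hypothetical toy labels: 4 remission, 2 hypomania, 2 mania recordings.
y_true = ["remission"] * 4 + ["hypomania"] * 2 + ["mania"] * 2
y_pred = ["remission"] * 4 + ["hypomania", "remission"] + ["hypomania", "remission"]

uar, per_class = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the plain accuracy (0.625) is higher than the UAR (0.5) because the majority class is classified perfectly while one minority class is never recovered, which is the kind of imbalance effect UAR is meant to expose.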
Elvan Çiftçi, Heysem Kaya, Hüseyin Güleç, Albert Ali Salah
Yan Li, Le Yang, Haifeng Chen, Dongmei Jiang, Hichem Sahli
Monica Gori, Maria Bianca Amadeo, Andrea Escelsior, Giuseppe Esposito, Alberto Inuggi, Riccardo Guglielmo, Luis Polena, Juxhin Bode, Beatriz Pereira da Silva, Mario Amore, Gianluca Serafini
Alexey Karpov, Andrey Ronzhin, Irina Kipyatkova, Andrey Ronzhin, Vasilisa Verkhodanova, Anton Saveliev, Miloš Železný
Guangwei Li, Xuenan Xu, Mengyue Wu, Kai Yu