JOURNAL ARTICLE

Automated Screening for Bipolar Disorder from Audio/Visual Modalities

Abstract

This paper addresses the Bipolar Disorder sub-challenge of the Audio/Visual Emotion Challenge (AVEC) 2018, where the objective is to classify patients suffering from bipolar disorder into states of remission, hypomania, and mania from audio-visual recordings of structured interviews. To this end, we propose 'turbulence features' that capture sudden, erratic changes in feature contours from the audio and visual modalities, and demonstrate their efficacy for the task at hand. We introduce Fisher Vector encoding of ComParE low-level descriptors (LLDs) and demonstrate that these features are viable for screening for bipolar disorder from speech. We also perform several experiments with standard feature sets from the openSMILE toolkit, as well as with multi-modal fusion. The best result achieved on the test set is an unweighted average recall (UAR) of 57.41%, which matches the best result published as the official baseline.
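The abstract names Fisher Vector encoding of frame-level LLDs but does not spell out the computation. As a minimal, hedged sketch (not the authors' exact pipeline), the standard improved Fisher Vector encodes a variable-length sequence of descriptors as gradients of a diagonal-covariance GMM with respect to its means and standard deviations, followed by power and L2 normalisation; random frames stand in here for real ComParE LLDs:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(frames, gmm):
    """Encode a (T, D) matrix of frame-level descriptors as a Fisher Vector
    using gradients w.r.t. the means and variances of a diagonal-covariance GMM."""
    T = frames.shape[0]
    gamma = gmm.predict_proba(frames)           # (T, K) soft assignments
    pi, mu = gmm.weights_, gmm.means_           # (K,), (K, D)
    sigma = np.sqrt(gmm.covariances_)           # (K, D) per-dimension std devs
    fv = []
    for k in range(pi.shape[0]):
        diff = (frames - mu[k]) / sigma[k]      # whitened residuals, (T, D)
        g = gamma[:, k:k + 1]
        grad_mu = (g * diff).sum(axis=0) / (T * np.sqrt(pi[k]))
        grad_sigma = (g * (diff ** 2 - 1)).sum(axis=0) / (T * np.sqrt(2 * pi[k]))
        fv.extend([grad_mu, grad_sigma])
    fv = np.concatenate(fv)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))      # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)    # L2 normalisation

# Toy usage: random "LLD" frames stand in for real ComParE descriptors.
rng = np.random.default_rng(0)
gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(rng.normal(size=(500, 6)))
fv = fisher_vector(rng.normal(size=(120, 6)), gmm)
print(fv.shape)  # 2 * K * D dimensions
```

The resulting fixed-length vector (2·K·D dimensions per recording) can then be fed to any standard classifier, which is what makes the encoding attractive for variable-length interview recordings.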

Keywords:
Bipolar disorder; Mania; Speech recognition; Pattern recognition; Audio-visual modalities; Psychiatry; Artificial intelligence; Multimedia

Metrics

- Cited by: 33
- FWCI (Field-Weighted Citation Impact): 4.52
- References: 37
- Citation Normalized Percentile: 0.95 (in top 1%, top 10%)

Topics

- Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
- Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
- Emotion and Mood Recognition (Social Sciences → Psychology → Experimental and Cognitive Psychology)