JOURNAL ARTICLE

Audio Visual Multimodal Classification of Bipolar Disorder Episodes

Abstract

Bipolar disorder is a highly prevalent and complex medical syndrome of multifactorial origin. In this paper, we propose an audio-visual multimodal framework for classifying the episodes of bipolar disorder (remission, hypomania, or mania). To represent the temporal dynamics of face and body poses, we compute the Motion History Histogram (MHH) of facial landmarks and the Histogram of Displacement Range (HDR) of body keypoints as visual features. For audio, functionals of the low-level descriptors (LLDs) of speech are computed as global features. Each feature stream is fed into a Convolutional Neural Network (CNN) to obtain an initial classification of the patient's episode; these initial results are then concatenated into a vector and fed into a random forest for the final classification. Experimental results on the development set of the Audio Visual Emotion Challenge (AVEC2018) Bipolar Disorder Sub-Challenge show that the proposed visual features and classification framework achieve an unweighted average recall (UAR) of 0.749, which is better than or comparable to state-of-the-art results.
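The abstract reports performance as unweighted average recall (UAR), that is, the mean of the per-class recalls, so each episode class counts equally regardless of how many recordings it contains. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name and the toy labels are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (classes taken from y_true)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["remission"] * 6 + ["hypomania"] * 2 + ["mania"] * 2
y_pred = ["remission"] * 6 + ["hypomania", "remission"] + ["remission", "hypomania"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On these toy labels the plain accuracy is higher than the UAR because the majority class is predicted perfectly while the two smaller classes are not, which is the kind of imbalance effect that UAR is designed to expose.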

Keywords:
Hypomania; Bipolar disorder; Histogram; Computer science; Artificial intelligence; Pattern recognition (psychology); Convolutional neural network; Mania; Feature (linguistics); Feature extraction; Cognition; Psychology; Image (mathematics)

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 0.82
Refs: 33
Citation Normalized Percentile: 0.74

Topics

Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Emotion and Mood Recognition
Social Sciences → Psychology → Experimental and Cognitive Psychology