This paper proposes a real-time sociometric system that analyzes social behavior from audio-visual recordings of two-person, face-to-face conversations in English. The novelty of the proposed system lies in the automatic inference of ten social indicators in real time. The system comprises a Microsoft Kinect device, which captures RGB and depth data to compute visual cues, and microphones, which capture speech cues from an ongoing conversation. Using these non-verbal cues as features, machine learning algorithms extract multiple indicators of social behavior, including empathy, confusion, and politeness. The system is trained and tested on two carefully annotated corpora of two-person dialogs. Under leave-one-out cross-validation, the accuracy of the developed algorithms in inferring social behaviors ranges from 50% to 86% on the audio corpus and from 62% to 92% on the audio-visual corpus.
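The evaluation protocol mentioned above, leave-one-out cross-validation, holds out each sample once, trains on the remainder, and averages the per-sample results. The sketch below illustrates this with a simple nearest-neighbor classifier on toy feature vectors; the feature names, data, and classifier are illustrative assumptions, not the paper's actual features or models.

```python
import math
import random

def nearest_neighbor_predict(train, query):
    """Predict the label of `query` as the label of its closest training point."""
    best_label, best_dist = None, float("inf")
    for features, label in train:
        d = math.dist(features, query)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

def leave_one_out_accuracy(dataset):
    """Hold out each sample once, train on the rest, and report mean accuracy."""
    correct = 0
    for i, (features, label) in enumerate(dataset):
        train = dataset[:i] + dataset[i + 1:]
        if nearest_neighbor_predict(train, features) == label:
            correct += 1
    return correct / len(dataset)

# Toy stand-in data (hypothetical): 2-D "non-verbal cue" vectors, e.g.
# speaking rate and gaze proportion, labelled with a binary social
# indicator such as empathy present/absent.
random.seed(0)
dataset = (
    [([random.gauss(0.0, 0.3), random.gauss(0.0, 0.3)], 0) for _ in range(10)]
    + [([random.gauss(2.0, 0.3), random.gauss(2.0, 0.3)], 1) for _ in range(10)]
)
print(leave_one_out_accuracy(dataset))
```

Because every sample serves once as the test point, the reported accuracy uses the full corpus for evaluation, which suits the relatively small annotated dialog corpora described in the abstract.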
Umer Rasheed, Yasir Tahir, Shoko Dauwels, Justin Dauwels, Daniël Thalmann, Nadia Magnenat-Thalmann