Abstract

This paper proposes a real-time sociometric system that analyzes social behavior from audio-visual recordings of two-person, face-to-face conversations in English. The novelty of the proposed system lies in its automatic inference of ten social indicators in real time. The system comprises a Microsoft Kinect device, which captures RGB and depth data for computing visual cues, and microphones, which capture speech cues from an ongoing conversation. With these non-verbal cues as features, machine learning algorithms implemented in the system extract multiple indicators of social behavior, including empathy, confusion, and politeness. The system is trained and tested on two carefully annotated corpora of two-person dialogs. Under leave-one-out cross-validation, the accuracy of the developed algorithms in inferring social behaviors ranges from 50% to 86% on the audio corpus and from 62% to 92% on the audio-visual corpus.
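
The leave-one-out cross-validation protocol mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifiers or features used, so a simple nearest-centroid classifier and two-dimensional toy feature vectors stand in for them, and the function names and labels are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_centroid_fit(X: List[Vector], y: List[str]) -> Dict[str, List[float]]:
    """Compute one mean feature vector (centroid) per label.

    Stand-in classifier for illustration only; not the paper's method.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def nearest_centroid_predict(centroids: Dict[str, List[float]], features: Vector) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid: List[float]) -> float:
        return sum((c - f) ** 2 for c, f in zip(centroid, features))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def leave_one_out_accuracy(X: List[Vector], y: List[str]) -> float:
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        model = nearest_centroid_fit(train_X, train_y)
        if nearest_centroid_predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: two made-up non-verbal features per conversation.
toy_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
         [0.9, 0.8], [0.8, 0.9], [0.85, 0.75]]
toy_y = ["polite", "polite", "polite",
         "not_polite", "not_polite", "not_polite"]

accuracy = leave_one_out_accuracy(toy_X, toy_y)
print(f"Leave-one-out accuracy on toy data: {accuracy:.2f}")
```

Each sample serves as the test set exactly once, so the reported accuracy is the fraction of held-out samples classified correctly. This makes the protocol a common choice for small annotated corpora, where setting aside a large test split would leave too little training data.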

Keywords:
Computer science; Conversation; Speech recognition; Artificial intelligence; Natural language processing; Novelty; Phrase; Inference; Politeness; Face (sociological concept); Human–computer interaction; Psychology; Communication

Metrics

Cited By: 8
FWCI (Field Weighted Citation Impact): 1.26
Refs: 26
Citation Normalized Percentile: 0.89

Topics

Speech and dialogue systems
Physical Sciences →  Computer Science →  Artificial Intelligence
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology