JOURNAL ARTICLE

Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning

Abstract

Discovering social relations (e.g., kinship or friendship) from visual content can help machines better interpret human behaviors and emotions. Existing studies mainly focus on recognizing social relations from still images while neglecting another important medium: video. On the one hand, the actions and storylines in videos provide richer cues for social relation recognition. On the other hand, the key persons may appear at arbitrary spatial-temporal locations and may never appear together in the same frame at any point in the video. To address these challenges, we propose a Multi-scale Spatial-Temporal Reasoning (MSTR) framework to recognize social relations from videos. In the spatial domain, we not only adopt a temporal segment network to learn global action and scene information, but also design a Triple Graphs model to capture visual relations between persons and objects. In the temporal domain, we propose a Pyramid Graph Convolutional Network to perform temporal reasoning with multi-scale receptive fields, which captures both long-term and short-term storylines in videos. In this way, MSTR comprehensively explores multi-scale actions and storylines in the spatial and temporal dimensions for social relation reasoning in videos. Extensive experiments on a new large-scale Video Social Relation dataset demonstrate the effectiveness of the proposed framework.
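The abstract does not specify how the Pyramid Graph Convolutional Network builds its graphs, so the following is only a minimal sketch of the general idea of temporal graph reasoning with multi-scale receptive fields, not the authors' implementation. It assumes (all hypothetical choices): each sampled frame is a graph node with a feature vector; each scale connects frames at most `window` steps apart; each branch applies one graph convolution `ReLU(A X W)` with a row-normalized adjacency `A`; and the per-scale outputs are mean-pooled over time and concatenated. Small windows model short-term storylines, and large windows model long-term ones.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def temporal_adjacency(num_frames: int, window: int) -> Matrix:
    """Row-normalized adjacency linking frames at most `window` steps apart (self loops included)."""
    adj = [[1.0 if abs(i - j) <= window else 0.0 for j in range(num_frames)]
           for i in range(num_frames)]
    return [[v / sum(row) for v in row] for row in adj]


def gcn_layer(features: Matrix, adj: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution over frame nodes: H = ReLU(A X W)."""
    out = matmul(matmul(adj, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def pyramid_temporal_gcn(features: Matrix, windows: List[int], weights: List[Matrix]) -> List[float]:
    """Hypothetical multi-scale branch: one GCN per window size, mean-pooled over time, concatenated."""
    num_frames = len(features)
    pooled: List[float] = []
    for window, weight in zip(windows, weights):
        hidden = gcn_layer(features, temporal_adjacency(num_frames, window), weight)
        # Average each hidden unit over all frames to get one vector per scale.
        pooled.extend(sum(col) / num_frames for col in zip(*hidden))
    return pooled


if __name__ == "__main__":
    rng = random.Random(0)
    T, D, H = 8, 4, 3  # hypothetical: 8 sampled frames, 4-d frame features, 3 hidden units
    frame_feats = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
    windows = [1, 2, 4]  # hypothetical short-, mid-, and long-range receptive fields
    weights = [[[rng.uniform(-1, 1) for _ in range(H)] for _ in range(D)] for _ in windows]
    video_repr = pyramid_temporal_gcn(frame_feats, windows, weights)
    print(len(video_repr))  # 9 = 3 scales x 3 hidden units
```

Concatenating the pooled outputs keeps each scale's evidence separate, so a downstream relation classifier could weigh short-range and long-range context differently; summing or attention-weighting the scales would be equally plausible alternatives.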

Metrics

Cited by: 77
FWCI (Field Weighted Citation Impact): 5.88
References: 43
Citation Normalized Percentile: 0.97

Topics

Human Pose and Action Recognition
Multimodal Machine Learning Applications
Video Analysis and Summarization

Related Documents

JOURNAL ARTICLE

Spatial–Temporal Relation Reasoning for Action Prediction in Videos

Xinxiao Wu, Ruiqi Wang, Jingyi Hou, Hanxi Lin, Jiebo Luo

Journal: International Journal of Computer Vision; Year: 2021; Vol: 129 (5); Pages: 1484-1505

JOURNAL ARTICLE

Multi-Scale Graph Reasoning Model for Video Social Relation Recognition

Fei Xu

Journal: Computer Science and Application; Year: 2021; Vol: 11 (02); Pages: 423-434

BOOK-CHAPTER

Multi-stream Fusion Model for Social Relation Recognition from Videos

Jinna Lv, Wu Liu, Lili Zhou, Bin Wu, Huadong Ma

Lecture Notes in Computer Science; Year: 2018; Pages: 355-368