JOURNAL ARTICLE

Knowledge-driven Egocentric Multimodal Activity Recognition

Yi HuangXiaoshan YangJunyu GaoJitao SangChangsheng Xu

Year: 2020 Journal:   ACM Transactions on Multimedia Computing Communications and Applications Vol: 16 (4)Pages: 1-133   Publisher: Association for Computing Machinery

Abstract

Recognizing activities from egocentric multimodal data collected by wearable cameras and sensors, is gaining interest, as multimodal methods always benefit from the complementarity of different modalities. However, since high-dimensional videos contain rich high-level semantic information while low-dimensional sensor signals describe simple motion patterns of the wearer, the large modality gap between the videos and the sensor signals raises a challenge for fusing the raw data. Moreover, the lack of large-scale egocentric multimodal datasets due to the cost of data collection and annotation processes makes another challenge for employing complex deep learning models. To jointly deal with the above two challenges, we propose a knowledge-driven multimodal activity recognition framework that exploits external knowledge to fuse multimodal data and reduce the dependence on large-scale training samples. Specifically, we design a dual-GCLSTM (Graph Convolutional LSTM) and a multi-layer GCN (Graph Convolutional Network) to collectively model the relations among activities and intermediate objects. The dual-GCLSTM is designed to fuse temporal multimodal features with top-down relation-aware guidance. In addition, we apply a co-attention mechanism to adaptively attend to the features of different modalities at different timesteps. The multi-layer GCN aims to learn relation-aware classifiers of activity categories. Experimental results on three publicly available egocentric multimodal datasets show the effectiveness of the proposed model.

Keywords:
Computer science Exploit Modalities Artificial intelligence Wearable computer Activity recognition Multimodal learning Machine learning Modality (human–computer interaction) Fuse (electrical) Graph Deep learning Convolutional neural network Human–computer interaction Pattern recognition (psychology)

Metrics

25
Cited By
1.68
FWCI (Field Weighted Citation Impact)
64
Refs
0.86
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Context-Aware Activity Recognition Systems
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Gait Recognition and Analysis
Physical Sciences →  Engineering →  Biomedical Engineering
© 2026 ScienceGate Book Chapters — All rights reserved.