Abstract

In this paper, we propose a multimodal multi-stream deep learning framework for egocentric activity recognition that uses both video and sensor data. First, we experiment with and extend a multi-stream Convolutional Neural Network to learn spatial and temporal features from egocentric videos. Second, we propose a multi-stream Long Short-Term Memory architecture to learn features from multiple sensor streams (accelerometer, gyroscope, etc.). Third, we propose a two-level fusion technique and experiment with different pooling techniques to compute the final predictions. Experimental results on a multimodal egocentric dataset show that the proposed method achieves very encouraging performance, even though existing egocentric datasets are still quite limited in scale.
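The abstract does not define the two fusion levels or the pooling operators precisely. The sketch below assumes a late (score-level) fusion. Level one pools the class-score vectors of the streams within each modality (for example, spatial and temporal video streams, or accelerometer and gyroscope sensor streams). Level two pools the resulting modality-level vectors into one prediction. The function names, the stream grouping, and the choice of average or max pooling are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical per-stream class scores (e.g. softmax outputs of one stream).
Scores = List[float]


def pool(vectors: Sequence[Scores], method: str = "average") -> Scores:
    """Element-wise pooling of equal-length class-score vectors."""
    if not vectors:
        raise ValueError("need at least one score vector")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all score vectors must have the same length")
    if method == "average":
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    if method == "max":
        return [max(col) for col in zip(*vectors)]
    raise ValueError(f"unknown pooling method: {method}")


def two_level_fusion(
    modalities: Dict[str, Sequence[Scores]],
    intra: str = "average",
    inter: str = "average",
) -> Scores:
    """Assumed two-level late fusion: pool within modalities, then across them."""
    # Level 1: fuse the streams that belong to the same modality.
    per_modality = [pool(streams, intra) for streams in modalities.values()]
    # Level 2: fuse the modality-level score vectors into a single vector.
    return pool(per_modality, inter)


def predict(scores: Scores) -> int:
    """Index of the highest-scoring activity class."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Illustrative scores for three activity classes (made-up numbers).
    scores = {
        "video": [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]],   # spatial, temporal
        "sensor": [[0.1, 0.8, 0.1], [0.3, 0.4, 0.3]],  # accelerometer, gyroscope
    }
    fused = two_level_fusion(scores, intra="average", inter="max")
    print(predict(fused))  # → 1
```

Swapping the `intra` and `inter` arguments between `"average"` and `"max"` is one simple way to compare pooling choices at each fusion level.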

Keywords:
Computer science, Pooling, Artificial intelligence, Convolutional neural network, Deep learning, Activity recognition, Constraint, Sensor fusion, Pattern recognition, Machine learning, Computer vision

Metrics

Cited by: 76
FWCI (Field Weighted Citation Impact): 5.02
References: 32
Citation Normalized Percentile: 0.97

Topics

Human Pose and Action Recognition
Context-Aware Activity Recognition Systems
Gait Recognition and Analysis
