Egocentric activity recognition (EAR) is an emerging area in computer vision research. Motivated by the recent success of Convolutional Neural Networks (CNNs), we propose a multi-stream CNN for multimodal egocentric activity recognition using visual data (RGB videos) and sensor streams (accelerometer, gyroscope, etc.). To effectively capture the spatio-temporal information contained in RGB videos, two image-based representations are extracted from the visual data: the Approximate Dynamic Image (ADI) and the Stacked Difference Image (SDI). These representations are generated at both the clip level and the entire-video level, and are then used to fine-tune a pretrained 2D-CNN, MobileNet, which is specifically designed for mobile vision applications. Similarly, for sensor data, each training sample is divided into three segments, and a separate deep 1D-CNN is trained from scratch for each type of sensor stream. During testing, the softmax scores of all the streams (visual + sensor) are combined by late fusion. Experiments on the multimodal egocentric activity dataset demonstrate that the proposed approach achieves state-of-the-art results, outperforming the best existing handcrafted and deep-learning-based techniques.
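A minimal sketch of the late-fusion step described above, assuming an equal-weight average of per-stream softmax scores followed by an argmax; the stream names, the number of classes, and the equal weighting are illustrative assumptions, not the authors' exact configuration.

import numpy as np

def softmax(logits):
    # numerically stable softmax over the class axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion(stream_logits):
    """stream_logits: dict mapping stream name -> (num_classes,) logit vector."""
    scores = np.stack([softmax(l) for l in stream_logits.values()])
    fused = scores.mean(axis=0)  # equal-weight average across streams (assumed)
    return int(fused.argmax()), fused

# Toy usage with random logits for a hypothetical 20-class problem.
rng = np.random.default_rng(0)
streams = {name: rng.normal(size=20)
           for name in ("ADI", "SDI", "accelerometer", "gyroscope")}
pred_class, fused_scores = late_fusion(streams)
print(pred_class, fused_scores.max())

Weighted averaging or score-level learning could replace the plain mean here; the abstract only states that softmax scores are combined by late fusion at test time.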