JOURNAL ARTICLE

Multi-level Multi-modal Feature Fusion for Action Recognition in Videos

Abstract

Several multi-modal feature fusion approaches have been proposed in recent years to improve action recognition in videos. However, these approaches do not take full advantage of the multi-modal information in the videos, since they are either biased towards a single modality or treat modalities separately. To address this problem, we propose Multi-Level Multi-modal feature Fusion (MLMF) for action recognition in videos. MLMF projects each modality into a shared feature space and a modality-specific feature space. Based on the similarity between the two modalities' shared features, we augment the features in the specific feature spaces. As a result, the fused features not only incorporate the unique characteristics of each modality, but also explicitly emphasize the similarities between modalities. Moreover, because a video's action segments differ in length, the model must ensemble features at different levels for fine-grained action recognition. The optimal multi-level unified action feature representation is obtained by aggregating features across levels. Our approach is evaluated on the EPIC-KITCHENS-100 dataset and achieves encouraging action recognition results.
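The shared/specific fusion idea in the abstract can be sketched in plain Python. This is only an illustrative reconstruction, not the paper's method: the projection dimensions, the random linear projections, and the `(1 + similarity)` gating of the specific features are all assumptions introduced here for illustration.

```python
import math
import random

random.seed(0)

def linear(x, W):
    # Project vector x with weight matrix W (one row per output dim).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

def rand_mat(rows, cols):
    # Stand-in for learned projection weights (assumption: random here).
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def mlmf_fuse(feat_a, feat_b, dim=4):
    """Hypothetical sketch of shared/specific fusion for two modalities."""
    # Shared projections: both modalities mapped into a common space.
    shared_a = linear(feat_a, rand_mat(dim, len(feat_a)))
    shared_b = linear(feat_b, rand_mat(dim, len(feat_b)))
    # Specific projections: modality-private spaces.
    spec_a = linear(feat_a, rand_mat(dim, len(feat_a)))
    spec_b = linear(feat_b, rand_mat(dim, len(feat_b)))
    # Similarity of the shared features gates (augments) the specific features.
    sim = cosine(shared_a, shared_b)
    fused = [s * (1 + sim) for s in spec_a] + [s * (1 + sim) for s in spec_b]
    return fused, sim

# Toy features for two modalities (e.g., RGB and audio) of different sizes.
fused, sim = mlmf_fuse([0.2, 0.5, 0.1], [0.7, 0.3, 0.9, 0.4])
print(len(fused))  # 8: concatenation of the two gated 4-dim specific features
```

In a real model the projections would be learned layers and the gating applied per level before the multi-level aggregation the abstract describes; the sketch only shows the data flow.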

Keywords:
Multi-modality, Feature fusion, Action recognition, Pattern recognition, Feature vector, Feature representation, Computer science, Artificial intelligence

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.12
Refs: 18
Citation Normalized Percentile: 0.39


Topics

Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Gait Recognition and Analysis
Physical Sciences → Engineering → Biomedical Engineering
Diabetic Foot Ulcer Assessment and Management
Health Sciences → Medicine → Endocrinology, Diabetes and Metabolism

Related Documents

BOOK-CHAPTER

Multi-level Fusion for Multi-modal Human Action Recognition

Ziliang Gan, Lei Jin, Xiaojuan Wang

Book series: Lecture Notes in Electrical Engineering · Year: 2025 · Pages: 132-142
JOURNAL ARTICLE

Human Action Recognition Based On Multi-level Feature Fusion

Yueshen Xu, Guang-can Xiao, Xiaofen Tang

Journal: Advances in Computer Science Research · Year: 2015
JOURNAL ARTICLE

Enhanced multi-modal emotion recognition using the feature level fusion

Aziguli Wulamu, Yuheng Wu, Xin Liu, Yao Zhang, Jinghan Xu, Yang Zhang

Journal: Engineering Applications of Artificial Intelligence · Year: 2025 · Vol: 162 · Pages: 112447
JOURNAL ARTICLE

Semantic2Graph: graph-based multi-modal feature fusion for action segmentation in videos

Junbin Zhang, Pei-Hsuan Tsai, Meng-Hsun Tsai

Journal: Applied Intelligence · Year: 2024 · Vol: 54 (2) · Pages: 2084-2099