CONFERENCE PAPER

Multi-modal Transformer for Indoor Human Action Recognition

Jeonghyeok Do, Munchurl Kim

Year: 2022 | Conference: 2022 22nd International Conference on Control, Automation and Systems (ICCAS) | Pages: 1155-1160

Abstract

Indoor human action recognition has applications in many fields. In the fitness industry, for example, it can be used to recognize exercise movements, which can contribute significantly to improving people's health. Advances in sensor technology have made it easy to capture multiple data modalities of the same scene: RGB, infrared (IR), depth, and skeleton. Because these modalities are complementary, fusing them properly is beneficial for recognizing human actions. However, existing studies make only limited use of the strengths of each modality. In this work, we therefore propose a Multi-Modal Transformer (MMT) that uses RGB and skeleton data simultaneously. Thanks to its transformer-based structure, MMT can capture correlations between non-local joints in the skeleton modality. In addition, MMT requires neither additional training phases nor multiple trained networks when the number of people in the scene changes. In experiments on public benchmark datasets, MMT achieves results comparable to those of existing methods while using only eight input frames.
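The abstract does not describe MMT's internals. As a hedged illustration only, the sketch below shows the generic mechanism by which a transformer can relate non-local joints: scaled dot-product self-attention over joint tokens, where every joint attends to every other joint regardless of how far apart they are in the skeleton graph. As a further assumption (not stated in the paper), a varying number of people is handled by padding each sample to a hypothetical maximum `max_people` and masking the padded tokens, so one set of weights serves scenes with one or two people. The names `pack_people` and `joint_self_attention` are illustrative, and learned query/key/value projections are omitted (Q = K = V).

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax; -inf scores receive exactly zero weight."""
    neg_inf = float("-inf")
    m = max(s for s in scores if s != neg_inf)
    exps = [0.0 if s == neg_inf else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pack_people(people: List[List[Vector]], max_people: int,
                num_joints: int, dim: int) -> Tuple[List[Vector], List[bool]]:
    """Flatten per-person joint features into one token sequence.

    Hypothetical helper (assumption, not from the paper): absent people are
    padded with zero tokens and marked False in the mask, so the same
    network accepts anywhere from 1 to max_people people.
    """
    tokens: List[Vector] = []
    valid: List[bool] = []
    for p in range(max_people):
        present = p < len(people)
        for j in range(num_joints):
            tokens.append(list(people[p][j]) if present else [0.0] * dim)
            valid.append(present)
    return tokens, valid


def joint_self_attention(tokens: List[Vector],
                         valid: List[bool]) -> Tuple[List[Vector], List[Vector]]:
    """Masked scaled dot-product self-attention over joint tokens.

    Every token attends to every valid token, so joints that are far apart
    in the skeleton graph interact directly within a single layer. Padded
    tokens are masked out as keys. Learned Q/K/V projections are omitted
    (Q = K = V = tokens) to keep the sketch short.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs: List[Vector] = []
    weights: List[Vector] = []
    for q in tokens:
        # One score per key; masked keys get -inf so softmax ignores them.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) * scale if ok else float("-inf")
            for k, ok in zip(tokens, valid)
        ]
        w = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, tokens))
                        for c in range(d)])
        weights.append(w)
    return outputs, weights


if __name__ == "__main__":
    # One person with three 2-D joints, padded to a maximum of two people.
    people = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
    tokens, valid = pack_people(people, max_people=2, num_joints=3, dim=2)
    out, w = joint_self_attention(tokens, valid)
    print(w[0])  # attention of joint 0 over all six tokens
```

In this sketch the padded tokens always receive zero attention weight, so adding or removing a person changes only the mask, not the parameters. That is one common way to avoid training separate networks per person count; whether MMT uses this exact scheme is not stated in the abstract.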

Keywords:
Computer science; Modality (human–computer interaction); Transformer; Artificial intelligence; RGB color model; Action recognition; Computer vision; Benchmark; Sensor fusion; Pattern recognition; Engineering

Metrics

Cited by: 2
FWCI (Field Weighted Citation Impact): 0.14
References: 34
Citation Normalized Percentile: 0.46

Topics

Human Pose and Action Recognition (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Gait Recognition and Analysis (Physical Sciences → Engineering → Biomedical Engineering)
Anomaly Detection Techniques and Applications (Physical Sciences → Computer Science → Artificial Intelligence)
