Abstract

Understanding where humans move in a given scenario, which paths and speeds they usually take, and where they stop is important for many applications, such as mobility studies in urban areas or robot navigation in human-populated environments. In this article, we propose a neural architecture based on Vision Transformers (ViTs) to provide this information; we argue that such an architecture can capture spatial correlations more effectively than Convolutional Neural Networks (CNNs). We describe the methodology and the proposed neural architecture, and we report experimental results on a standard dataset, showing that the proposed ViT architecture improves the evaluation metrics compared to a CNN-based method.
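The abstract's claim that a ViT captures spatial correlations more effectively than a CNN rests on self-attention having a global receptive field: every map patch attends to every other patch in a single layer, whereas a convolution kernel only mixes a local neighborhood. The following minimal sketch (not the authors' implementation; the grid, patch size, and identity Q/K/V projections are all hypothetical simplifications) shows this on a toy occupancy grid:

```python
import math

def patchify(grid, patch):
    """Split a 2-D occupancy grid (list of lists) into flattened patch tokens."""
    rows, cols = len(grid), len(grid[0])
    tokens = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            tokens.append([grid[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        scores = [dot(q, k) / scale for k in tokens]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Each output token is a weighted mix of ALL patch tokens,
        # including spatially distant ones -- the global receptive field.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(len(q))])
    return out

# A 4x4 grid with activity in two opposite corners, split into four 2x2 patches.
grid = [[1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1]]
tokens = patchify(grid, 2)   # 4 tokens of length 4
mixed = self_attention(tokens)
# The all-zero patch (index 1) now carries information from both active
# corners after one attention step, which a small local kernel cannot do.
```

Here the empty patch receives equal weight 0.25 from every token, so its output picks up the distant nonzero corners in a single layer; a real ViT adds learned projections, positional embeddings, and multiple heads on top of this mechanism.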

Keywords:
Computer vision; Prior probability; Artificial intelligence; Computer science; Human motion; Transformer; Motion (physics); Bayesian probability; Engineering; Electrical engineering


Topics

Advanced Vision and Imaging (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Robotics and Sensor-Based Localization (Physical Sciences → Engineering → Aerospace Engineering)
Human Pose and Action Recognition (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)