Previous work on 3D human pose estimation has concentrated on predicting the 3D pose of the human body from a single image, ignoring the correlation between adjacent frames in video. We design a transformer network that extracts temporal information from video and improves the accuracy of human pose prediction by encoding relative positions within a temporal fusion transformer structure, which strengthens local feature learning. We analyze our method quantitatively and qualitatively on Human3.6M. The results show that our TSFormer achieves state-of-the-art performance.
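As a rough sketch of the core idea, and not the paper's actual architecture, self-attention over a window of frames can incorporate a learnable relative-position bias so that attention weights depend on the temporal offset between frames. All names and shapes below are illustrative assumptions:

```python
import numpy as np

def temporal_attention_with_relative_bias(x, w_q, w_k, w_v, rel_bias):
    """Single-head self-attention over T frames with a relative-position bias.

    x:        (T, d) per-frame pose features
    w_q/k/v:  (d, d) projection matrices
    rel_bias: (2T-1,) one scalar per relative offset in [-(T-1), T-1]

    Hypothetical sketch, not the method from the paper.
    """
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d)                    # (T, T) frame-to-frame scores
    idx = np.arange(T)
    offsets = idx[:, None] - idx[None, :] + (T - 1)  # map offsets to [0, 2T-2]
    scores = scores + rel_bias[offsets]              # add relative-position bias
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                               # temporally fused features
```

Because the bias is indexed by frame offset rather than absolute frame index, nearby frames can be consistently emphasized regardless of where the window sits in the video.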
Yongpeng Wu, Dehui Kong, Shaofan Wang, Jinghua Li, Baocai Yin
Haijian Wang, Qingxuan Shi, Beiguang Shan