Self-Supervised Video Representation Learning with Motion-Contrastive Perception

Jinyu Liu; Ying Cheng; Yuejie Zhang; Rui-Wei Zhao; Rui Feng

doi:10.1109/icme52920.2022.9859802

Year: 2022. Venue: 2022 IEEE International Conference on Multimedia and Expo (ICME). Pages: 1-6.

Abstract

Visual-only self-supervised learning has achieved significant improvements in video representation learning. Existing methods encourage models to learn video representations through contrastive learning or specially designed pretext tasks. However, some of these models tend to focus on the background, which is unimportant for learning video representations. To alleviate this problem, we propose a new view, called the long-range residual frame, to obtain more motion-specific information. Based on this view, we propose the Motion-Contrastive Perception Network (MCPNet), which consists of two branches, Motion Information Perception (MIP) and Contrastive Instance Perception (CIP), and learns generic video representations by focusing on the changing areas in videos. Specifically, the MIP branch aims to learn fine-grained motion features, while the CIP branch performs contrastive learning to capture the overall semantic information of each instance. Experiments on two benchmark datasets, UCF-101 and HMDB-51, show that our method outperforms current state-of-the-art visual-only self-supervised approaches.
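The abstract does not define the long-range residual frame or the contrastive objective precisely, so the Python sketch below only illustrates the two ideas under stated assumptions. A long-range residual frame is assumed to be the pixel-wise difference between frames `interval` steps apart (the interval and any normalization used in the paper are not given here), and the CIP branch's objective is approximated by a standard InfoNCE loss with cosine similarity and temperature `tau`. The names `long_range_residual_frames` and `info_nce` and the default `tau=0.07` are hypothetical and are not taken from MCPNet.

```python
import math
from typing import List

# Assumption: a frame is a grayscale image stored as rows of pixel values.
Frame = List[List[float]]


def long_range_residual_frames(clip: List[Frame], interval: int) -> List[Frame]:
    """Return frame[t + interval] - frame[t] for every valid t.

    Hypothetical definition: with interval=1 this is an ordinary consecutive
    residual frame; larger intervals compare frames farther apart in time.
    """
    if interval < 1 or interval >= len(clip):
        raise ValueError("interval must be in [1, len(clip) - 1]")
    residuals = []
    for t in range(len(clip) - interval):
        earlier, later = clip[t], clip[t + interval]
        # Static pixels cancel to zero; only changing pixels remain.
        residuals.append(
            [[pb - pa for pa, pb in zip(row_a, row_b)]
             for row_a, row_b in zip(earlier, later)]
        )
    return residuals


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def info_nce(queries: List[List[float]], keys: List[List[float]],
             tau: float = 0.07) -> float:
    """InfoNCE loss averaged over instances (a generic stand-in for CIP).

    queries[i] and keys[i] embed two views of instance i (the positive pair);
    every keys[j] with j != i serves as a negative.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / tau for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax of the positive pair
    return total / len(queries)


if __name__ == "__main__":
    # Toy 4-frame clip: constant background (5.0), one pixel that brightens.
    clip = [
        [[5.0, 1.0], [5.0, 5.0]],
        [[5.0, 2.0], [5.0, 5.0]],
        [[5.0, 4.0], [5.0, 5.0]],
        [[5.0, 8.0], [5.0, 5.0]],
    ]
    # → [[[0.0, 3.0], [0.0, 0.0]], [[0.0, 6.0], [0.0, 0.0]]]
    print(long_range_residual_frames(clip, interval=2))
    views = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(views, views))  # near zero: positives are aligned
```

In the toy clip the constant background cancels out of every residual, leaving only the changing pixel, which mirrors the abstract's motivation of steering the model away from the background. Choosing a larger `interval` accumulates change over a longer time span; how MCPNet actually selects it is not stated in this abstract.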
