Self-Supervised Video Representation Learning with Meta-Contrastive Network

Yuanze Lin; Xun Guo; Yan Lu

doi:10.1109/iccv48922.2021.00813

Year: 2021
Journal: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Pages: 8219-8229

Abstract

Self-supervised learning has been successfully applied to pre-train video representations, with the aim of efficient adaptation from the pre-training domain to downstream tasks. Existing approaches merely leverage a contrastive loss to learn instance-level discrimination. However, the lack of category information leads to a hard-positive problem that constrains the generalization ability of such methods. We find that the multi-task process of meta learning can provide a solution to this problem. In this paper, we propose a Meta-Contrastive Network (MCN), which combines contrastive learning and meta learning to enhance the learning ability of existing self-supervised approaches. Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch. Extensive evaluations demonstrate the effectiveness of our method. For two downstream tasks, i.e., video action recognition and video retrieval, MCN outperforms state-of-the-art approaches on the UCF101 and HMDB51 datasets. More specifically, with an R(2+1)D backbone, MCN achieves Top-1 accuracies of 84.8% and 54.5% for video action recognition, as well as 52.5% and 23.7% for video retrieval.
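
For readers unfamiliar with the instance-level contrastive loss the abstract refers to, the following is a minimal sketch of a generic InfoNCE-style loss in plain Python. It is an illustration only and not the MCN implementation: the function name `info_nce_loss`, the temperature value of 0.07, and the use of cosine similarity are all assumptions, since the abstract does not specify which contrastive objective is used. Each anchor embedding is paired with the positive at the same index, and every other positive in the batch is treated as a negative.

```python
import math


def info_nce_loss(anchors, positives, temperature=0.07):
    """Generic InfoNCE-style loss (illustrative; not taken from the paper).

    anchors, positives: lists of equal-length float vectors, where
    positives[i] is the positive sample for anchors[i].
    temperature: assumed hyperparameter; 0.07 is a common choice.
    """

    def normalize(v):
        # Scale to unit length so dot products become cosine similarities.
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    total = 0.0
    for i, ai in enumerate(a):
        # Similarity of anchor i to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true positive.
        total += log_denom - logits[i]
    return total / len(a)


# Usage: correctly paired embeddings give a much lower loss than mismatched ones.
eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = info_nce_loss(eye, eye)
shuffled = info_nce_loss(eye, eye[1:] + eye[:1])
```

Because a loss of this kind treats every other sample as a negative, two clips of the same action are pushed apart whenever they come from different videos. That is one way to read the hard-positive problem mentioned in the abstract.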

Keywords: Computer science; Leverage (statistics); Artificial intelligence; Generalization; Machine learning; Feature learning; Task (project management); Domain adaptation; Representation (politics); Natural language processing; Pattern recognition (psychology); Classifier (UML)

Metrics

FWCI (Field Weighted Citation Impact): 2.28

Citation Normalized Percentile: 0.93

Topics

Human Pose and Action Recognition (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

Self-Supervised Video Representation Learning with Motion-Contrastive Perception

Multitask Contrastive Learning for Self-Supervised Video Representation

Motion Sensitive Contrastive Learning for Self-supervised Video Representation

Self-Supervised Contrastive Video-Speech Representation Learning for Ultrasound

Cut-in maneuver detection with self-supervised contrastive video representation learning