JOURNAL ARTICLE

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

Abstract

We propose a self-supervised method to learn feature representations from videos. A standard approach in traditional self-supervised methods uses positive-negative data pairs to train with a contrastive learning strategy. In such a case, different modalities of the same video are treated as positives and video clips from a different video are treated as negatives. Because spatio-temporal information is important for video representation, we extend the negative samples by introducing intra-negative samples, which are transformed from the same anchor video by breaking temporal relations in video clips. With the proposed Inter-Intra Contrastive (IIC) framework, we can train spatio-temporal convolutional networks to learn video representations. The IIC framework offers many flexible options, and we conduct experiments using several different configurations. Evaluations are conducted on video retrieval and video recognition tasks using the learned video representation. Our proposed IIC outperforms current state-of-the-art results by a large margin, with improvements of 16.7 and 9.5 percentage points in top-1 accuracy for video retrieval on the UCF101 and HMDB51 datasets, respectively. For video recognition, improvements are also obtained on these two benchmark datasets. Code is available at https://github.com/BestJuly/Inter-intra-video-contrastive-learning.
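The idea of extending the negative set with intra-negative samples can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: temporal shuffling stands in for "breaking temporal relations", an InfoNCE-style loss with cosine similarity stands in for the contrastive objective, and the names `shuffle_frames`, `info_nce`, and the temperature value are hypothetical. The authors' actual transforms and loss may differ; see their repository for the real implementation.

```python
import math
import random


def shuffle_frames(clip, rng):
    """Build an intra-negative by permuting the frame order of `clip`.

    Assumption: shuffling is one possible way to break temporal relations.
    """
    if len(clip) < 2:
        raise ValueError("need at least two frames to break temporal order")
    order = list(range(len(clip)))
    # Reshuffle until the permutation differs from the original order.
    while order == sorted(order):
        rng.shuffle(order)
    return [clip[i] for i in order]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE-style loss: pull the positive close, push negatives away."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


rng = random.Random(0)

# A toy "clip" of frame identifiers and its temporally shuffled version.
clip = ["f0", "f1", "f2", "f3"]
intra_clip = shuffle_frames(clip, rng)

# Toy embeddings (hypothetical values standing in for network outputs).
anchor = [1.0, 0.1, 0.0]          # one modality of the anchor video
positive = [0.9, 0.2, 0.1]        # another modality of the same video
inter_negatives = [[0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]  # other videos
intra_negatives = [[0.7, 0.6, 0.1]]  # embedding of the shuffled anchor clip

loss_inter_only = info_nce(anchor, positive, inter_negatives)
loss_inter_intra = info_nce(anchor, positive, inter_negatives + intra_negatives)
```

In this sketch the intra-negative shares appearance with the anchor but not its temporal order, so including it in the denominator penalizes embeddings that ignore frame ordering.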

Keywords:
Computer science, Margin (machine learning), Benchmark (surveying), Artificial intelligence, Feature (linguistics), Representation (politics), Feature learning, Categorization, Pattern recognition (psychology), CLIPS, Convolutional neural network, Feature extraction, Machine learning

Metrics

Cited By: 108
FWCI (Field Weighted Citation Impact): 8.61
Refs: 47
Citation Normalized Percentile: 0.98

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

An Improved Inter-Intra Contrastive Learning Framework on Self-Supervised Video Representation

Tao Li, Xueting Wang, Toshihiko Yamasaki

Journal: IEEE Transactions on Circuits and Systems for Video Technology, Year: 2022, Vol: 32 (8), Pages: 5266-5280

JOURNAL ARTICLE

Inter-Intra Cross-Modality Self-Supervised Video Representation Learning by Contrastive Clustering

Jiutong Wei, Guan Luo, Bing Li, Weiming Hu

Journal: 2022 26th International Conference on Pattern Recognition (ICPR), Year: 2022, Pages: 4815-4821

JOURNAL ARTICLE

Multitask Contrastive Learning for Self-Supervised Video Representation

东风 单

Journal: Computer Science and Application, Year: 2023, Vol: 13 (03), Pages: 433-443