JOURNAL ARTICLE

Self-Supervised Learning from Untrimmed Videos via Hierarchical Consistency

Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yi Xu, Xiang Wang, Changxin Gao, Rong Jin, Nong Sang

Year: 2023 Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence Vol: 45 (10) Pages: 12408-12426 Publisher: IEEE Computer Society

Abstract

Natural untrimmed videos provide rich visual content for self-supervised learning. Yet most previous efforts to learn spatio-temporal representations rely on manually trimmed videos, such as the Kinetics dataset (Carreira and Zisserman 2017), resulting in limited diversity in visual patterns and limited performance gains. In this work, we aim to improve video representations by leveraging the rich information in natural untrimmed videos. For this purpose, we propose learning a hierarchy of temporal consistencies in videos, i.e., visual consistency and topical consistency, corresponding respectively to clip pairs that tend to be visually similar when separated by a short time span, and clip pairs that share similar topics when separated by a long time span. Specifically, we present a Hierarchical Consistency (HiCo++) learning framework, in which the visually consistent pairs are encouraged to share the same feature representations by contrastive learning, while topically consistent pairs are coupled through a topical classifier that distinguishes whether they are topic-related, i.e., from the same untrimmed video. Additionally, we introduce a gradual sampling algorithm for the proposed hierarchical consistency learning, and demonstrate its theoretical superiority. Empirically, we show that HiCo++ can not only generate stronger representations on untrimmed videos, but also improve the representation quality when applied to trimmed videos. This contrasts with standard contrastive learning, which fails to learn powerful representations from untrimmed videos. Source code will be made available.
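
The two kinds of clip pairs described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the clip length, and the `short_gap` threshold are hypothetical choices made only for illustration. The sketch draws one pair of clips whose start times lie close together (a stand-in for a visually consistent pair), draws a second pair from anywhere in the same video (a stand-in for a topically consistent pair), and assigns the binary target that a topical classifier of the kind described above could be trained on.

```python
import random


def sample_clip_pair(duration, clip_len, max_gap, rng):
    """Return start times (seconds) of two clips at most max_gap apart."""
    latest_start = duration - clip_len
    if latest_start < 0:
        raise ValueError("video is shorter than one clip")
    first = rng.uniform(0.0, latest_start)
    # Restrict the second clip to a window around the first one.
    lo = max(0.0, first - max_gap)
    hi = min(latest_start, first + max_gap)
    second = rng.uniform(lo, hi)
    return first, second


def sample_hierarchical_pairs(duration, clip_len=2.0, short_gap=1.0, rng=None):
    """Sample one short-range and one long-range clip pair from a video.

    clip_len and short_gap are assumed values, not taken from the paper.
    """
    rng = rng or random.Random()
    # Short time span: clips likely to look alike.
    visual_pair = sample_clip_pair(duration, clip_len, short_gap, rng)
    # Long time span: any two clips from the same untrimmed video.
    topical_pair = sample_clip_pair(duration, clip_len, duration, rng)
    return visual_pair, topical_pair


def topical_label(video_id_a, video_id_b):
    """Binary target: 1 if both clips come from the same video, else 0."""
    return 1 if video_id_a == video_id_b else 0
```

In a training loop, the short-range pair would feed a contrastive objective and the long-range pair, together with clips from other videos, would feed the topical classifier; how the two objectives are weighted is left open here.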

Keywords:
Artificial intelligence, Computer science, Consistency (knowledge bases), Feature learning, Machine learning, Classifier (UML), Pattern recognition (psychology), Visualization

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.55
Refs: 144
Citation Normalized Percentile: 0.59

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Cancer-related molecular mechanisms research
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Cancer Research

Related Documents

JOURNAL ARTICLE

Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency

Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yi Xu, Xiang Wang, Mingqian Tang, Changxin Gao, Rong Jin, Nong Sang

Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Year: 2022 Pages: 13811-13821

JOURNAL ARTICLE

Exploring Relations in Untrimmed Videos for Self-Supervised Learning

Dezhao Luo, Yu Zhou, Bo Fang, Yucan Zhou, Dayan Wu, Weiping Wang

Journal: ACM Transactions on Multimedia Computing Communications and Applications Year: 2022 Vol: 18 (1s) Pages: 1-21

DISSERTATION

Self-supervised and cross-modal learning from videos

Almut Sophia Koepke

University: Oxford University Research Archive (ORA) (University of Oxford) Year: 2019