JOURNAL ARTICLE

Contrastive Masked Autoencoders for Self-Supervised Video Hashing

Yuting Wang, Jinpeng Wang, Bin Chen, Ziyun Zeng, Shu-Tao Xia

Year: 2023
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 37 (3)
Pages: 2733-2741
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Self-Supervised Video Hashing (SSVH) models learn to generate short binary representations for videos without ground-truth supervision, which makes large-scale video retrieval efficient and has attracted increasing research attention. The success of SSVH depends on understanding video content and on capturing the semantic relations among unlabeled videos. State-of-the-art SSVH methods typically address these two points in a two-stage training pipeline: they first train an auxiliary network with instance-wise mask-and-predict tasks, and then train a hashing model to preserve the pseudo-neighborhood structure transferred from the auxiliary network. This consecutive training strategy is inflexible and also unnecessary. In this paper, we propose a simple yet effective one-stage SSVH method called ConMH, which learns video semantic information and the similarity relationships between videos in a single stage. To capture video semantic information for better hash learning, we adopt an encoder-decoder structure that reconstructs the video from its temporally masked frames. In particular, we find that a higher masking ratio helps video understanding. In addition, we fully exploit the similarity relationships between videos by maximizing the agreement between two augmented views of a video, which yields more discriminative and robust hash codes. Extensive experiments on three large-scale video datasets (FCVID, ActivityNet and YFCC) indicate that ConMH achieves state-of-the-art results. Code is available at https://github.com/huangmozhi9527/ConMH.
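
The ingredients named in the abstract (temporal masking, reconstruction of the masked frames, agreement between two views, and binary codes compared by Hamming distance) can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation, which is in the repository linked above. Every function name here is hypothetical, and the 0.75 masking ratio, the mean-squared-error reconstruction term, the InfoNCE-style contrastive term, the temperature, and the weighted sum of the two losses are all assumptions made only for illustration.

```python
import math
import random


def temporal_mask(num_frames, mask_ratio, rng):
    """Randomly split frame indices into (visible, masked) index lists."""
    num_masked = int(round(num_frames * mask_ratio))
    order = list(range(num_frames))
    rng.shuffle(order)
    return sorted(order[num_masked:]), sorted(order[:num_masked])


def reconstruction_loss(target_frames, predicted_frames, masked):
    """Mean squared error computed over the masked frame positions only."""
    errors = [
        (t - p) ** 2
        for idx in masked
        for t, p in zip(target_frames[idx], predicted_frames[idx])
    ]
    return sum(errors) / len(errors)


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(views_a, views_b, temperature=0.5):
    """InfoNCE-style loss: row i of views_a should agree with row i of views_b."""
    total = 0.0
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - logits[i]
    return total / len(views_a)


def one_stage_loss(recon, contrast, contrast_weight=1.0):
    """Assumed combination: both terms are optimized together in one stage."""
    return recon + contrast_weight * contrast


def binarize(embedding):
    """Map a real-valued embedding to a {-1, +1} hash code via its sign."""
    return [1 if x >= 0 else -1 for x in embedding]


def hamming_distance(code_a, code_b):
    """Count the positions where two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical clip of 16 frames with a high (assumed) masking ratio.
    visible, masked = temporal_mask(num_frames=16, mask_ratio=0.75, rng=rng)
    print("visible frames:", visible)
    print("masked frames:", masked)
    query = binarize([0.3, -1.2, 0.0, -0.1])
    print("Hamming distance:", hamming_distance(query, [1, 1, 1, 1]))
```

At retrieval time only the binary codes would be needed: a query video is ranked against the database by Hamming distance, which is the source of the efficiency the abstract refers to. The masking and loss functions matter only during training.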

Keywords:
Computer science; Hash function; Discriminative model; Artificial intelligence; Encoder; Pipeline (software); Binary code; Similarity (geometry); Exploit; Machine learning; Binary number; Pattern recognition (psychology); Image (mathematics)

Metrics

Cited By: 20
FWCI (Field Weighted Citation Impact): 1.61
Refs: 42
Citation Normalized Percentile: 0.82

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

CMAE-3D: Contrastive Masked AutoEncoders for Self-Supervised 3D Object Detection

Yanan Zhang, Jiaxin Chen, Di Huang

Journal: International Journal of Computer Vision
Year: 2024
Vol: 133 (5)
Pages: 2783-2804

JOURNAL ARTICLE

Correction: CMAE-3D: Contrastive Masked AutoEncoders for Self-Supervised 3D Object Detection

Yanan Zhang, Jiaxin Chen, Di Huang

Journal: International Journal of Computer Vision
Year: 2025
Vol: 133 (6)
Pages: 3803-3803

BOOK-CHAPTER

Attention-Guided Contrastive Masked Autoencoders for Self-supervised Cross-Modal Biometric Matching

Haichuan Zhang, Jiaxiang Wang, H. C. Shi, Zhenda Yu, Lin Yin

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Year: 2025
Pages: 162-173