Multimodal contrastive learning for unsupervised video representation learning

Anup Hiremath; Avideh Zakhor

doi:10.2352/ei.2023.35.14.coimg-173

JOURNAL ARTICLE

Year: 2023 Journal: Electronic Imaging Vol: 35 (14) Pages: 173-1

Abstract

In this paper, we propose a multimodal unsupervised video learning algorithm designed to incorporate information from any number of modalities present in the data. We cooperatively train one network per modality: at each stage of training, one of these networks is selected and trained using the outputs of the other networks. To validate our algorithm, we train a model using RGB, optical flow, and audio. We then evaluate the learned representations by performing action classification and nearest neighbor retrieval on a labeled dataset. We compare this three-modality model to contrastive learning models that use one or two modalities. Compared to using only RGB and optical flow, using all three modalities in tandem provides a 1.5% improvement in UCF101 classification accuracy, a 1.4% improvement in R@1 retrieval recall, a 3.5% improvement in R@5 retrieval recall, and a 2.4% improvement in R@10 retrieval recall, demonstrating the merit of utilizing as many modalities as possible in a cooperative learning model.
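The R@1, R@5, and R@10 figures above are nearest neighbor retrieval recalls. The abstract does not spell out the protocol, so the following is only a minimal sketch under common assumptions: each query clip retrieves its k most similar gallery clips by cosine similarity of the learned features, and a query counts as a hit if any of those k clips shares its class label. The function name `recall_at_k`, the query/gallery split, and the choice of cosine similarity are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, ks=(1, 5, 10)):
    """Fraction of queries with at least one same-class clip among the top-k neighbors.

    Assumed protocol (not confirmed by the abstract): cosine similarity on
    L2-normalized features, with a hit if ANY of the top-k labels matches.
    """
    query_feats = np.asarray(query_feats, dtype=float)
    gallery_feats = np.asarray(gallery_feats, dtype=float)
    query_labels = np.asarray(query_labels)
    gallery_labels = np.asarray(gallery_labels)

    # L2-normalize so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)

    sims = q @ g.T  # shape: (num_queries, num_gallery)

    # Gallery indices sorted by descending similarity, truncated to the largest k.
    max_k = max(ks)
    order = np.argsort(-sims, axis=1)[:, :max_k]

    # hits[i, j] is True when the j-th nearest neighbor of query i has the query's label.
    hits = gallery_labels[order] == query_labels[:, None]

    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}


if __name__ == "__main__":
    # Toy features only; these are not data from the paper.
    gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    gallery_y = [0, 1, 1, 2]
    queries = [[1.0, 0.05], [0.0, 1.0]]
    queries_y = [1, 1]
    print(recall_at_k(queries, queries_y, gallery, gallery_y, ks=(1, 2)))
```

Under this definition recall can only stay level or rise as k increases, since a hit within the top k neighbors is also a hit within any larger neighborhood.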

Keywords:

Computer science; Artificial intelligence; Modalities; Modality (human–computer interaction); Recall; Feature learning; Unsupervised learning; Deep learning; Machine learning; Optical flow; Multimodal learning; RGB color model; Representation (politics); Pattern recognition (psychology); Image (mathematics)

Topics

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Kalman contrastive unsupervised representation learning

Spatiotemporal Contrastive Video Representation Learning

Relative Contrastive Loss for Unsupervised Representation Learning

Contrastive Learning for Unsupervised Video Highlight Detection

Enhancing Unsupervised Video Representation Learning by Temporal Contrastive Modelling Using 2D CNN