Multimodal Retrieval with Contrastive Pretraining

Hüseyin Fuat Alsan; Ekrem Yildiz; Ege Burak Safdil; Furkan Arslan; Taner Arsan

doi:10.1109/inista52262.2021.9548414

JOURNAL ARTICLE

Year: 2021
Journal: 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA)
Pages: 1-5

Abstract

In this paper, we present multimodal data retrieval aided by contrastive pretraining. Our approach is to pretrain a contrastive network to assist in multimodal retrieval tasks. We work with multimodal data consisting of image and caption (text) pairs. We present a dual-encoder deep neural network whose image encoder and text encoder map multimodal data (images and text) to representation vectors. These representation vectors are used for similarity-based retrieval. The image encoder is a 2D convolutional network, and the text encoder is a recurrent neural network (Long Short-Term Memory). The MS-COCO 2014 dataset, which contains both images and captions, is used for multimodal training with triplet loss. We used a convolutional Siamese network to compute the similarities between images before the dual-encoder training (contrastive pretraining). The advantage is that Siamese networks can aid retrieval, and we seek to show whether Siamese networks can be used in practice. Finally, we investigated the performance of Siamese-assisted retrieval with the BLEU score metric. We conclude that Siamese networks can help with image-to-text retrieval tasks.
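The abstract mentions triplet-loss training and similarity-based retrieval over representation vectors but gives no formulas. The following is a minimal illustrative sketch of those two ideas, not the authors' implementation. The use of cosine similarity, the margin value of 0.2, the function names, and the toy 3-dimensional vectors are all assumptions made for illustration; in the paper the vectors would come from the trained image and text encoders.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on similarities.

    The loss is zero once the anchor is more similar to the positive
    than to the negative by at least `margin` (a hypothetical value).
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, margin - sim_pos + sim_neg)


def retrieve(query, candidates, top_k=1):
    """Return indices of the top_k candidates most similar to the query."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: cosine_similarity(query, candidates[i]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy stand-ins for encoder outputs: one image vector, three caption vectors.
image_vec = [1.0, 0.0, 0.0]
caption_vecs = [
    [0.0, 1.0, 0.0],  # unrelated caption
    [0.9, 0.1, 0.0],  # matching caption
    [0.0, 0.0, 1.0],  # unrelated caption
]

# Image-to-text retrieval: rank captions by similarity to the image vector.
best = retrieve(image_vec, caption_vecs, top_k=1)

# Loss for a well-separated triplet (image, matching caption, unrelated caption).
loss = triplet_loss(image_vec, caption_vecs[1], caption_vecs[0])
```

In this toy setup the matching caption is ranked first and the triplet loss is already zero; swapping the positive and negative captions produces a positive loss, which is the signal that would push the two encoders to pull matched pairs together during training.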

Keywords:

Computer science; Artificial intelligence; Encoder; Convolutional neural network; Image retrieval; ENCODE; Pattern recognition (psychology); Encoding (memory); Autoencoder; Dual (grammatical number); Natural language processing; Deep learning; Image (mathematics)

Metrics

FWCI (Field Weighted Citation Impact): 0.19

Citation Normalized Percentile: 0.57

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval

RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining

COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems

Privileged Contrastive Pretraining for Multimodal Affect Modelling

Multimodal Pain Recognition Based on Contrastive Adversarial Autoencoder Pretraining