Image-Text Embedding with Hierarchical Knowledge for Cross-Modal Retrieval

Sanghyun Seo; Juntae Kim

doi:10.1145/3297156.3297244

Year: 2018

Journal: Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence

Pages: 350-353

Abstract

Heterogeneous data embedding is the process of mapping different kinds of data into a common vector space of a given dimension. Image-text embedding, accordingly, maps image and text data, which have completely different characteristics, into a common vector space. In this paper, we propose an image-text embedding method that uses hierarchical knowledge, such as the coarse and fine labels of text data. The proposed method improves the training efficiency of the embedding model by fixing the coarse label vectors. In addition, the loss function is designed so that the negative sample is selected arbitrarily from the fine labels that have a hierarchical relationship with the coarse label, which enlarges the difference between the vectors of fine labels that share the same coarse label. As a result, when images, which are visual data, are mapped into the common vector space, their semantics become clearer. Experimental results show that embedding with hierarchical knowledge is performed successfully with the proposed method and that cross-modal retrieval can be performed efficiently with the embedding model.
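The abstract does not give the loss function itself, so the following is only a minimal sketch of the general idea it describes: draw the negative sample from the fine labels that share a coarse label with the true label, then penalize an image vector that is not closer to its true fine label than to that sibling. The label hierarchy, the vectors, the cosine similarity, the hinge form, and the margin value are all assumptions made for illustration, not details taken from the paper. Training and the fixed coarse label vectors are not modeled here.

```python
import math
import random

# Hypothetical two-level label hierarchy, chosen only for illustration.
HIERARCHY = {
    "vehicle": ["bicycle", "bus", "train"],
    "flower": ["rose", "tulip", "orchid"],
}
COARSE_OF = {fine: coarse for coarse, fines in HIERARCHY.items() for fine in fines}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def sample_sibling_negative(fine_label, rng):
    """Pick a different fine label that shares the same coarse label."""
    siblings = [f for f in HIERARCHY[COARSE_OF[fine_label]] if f != fine_label]
    return rng.choice(siblings)


def hinge_loss(image_vec, positive_vec, negative_vec, margin=0.2):
    """Assumed margin ranking loss: zero once the true label beats the sibling by `margin`."""
    gap = cosine(image_vec, positive_vec) - cosine(image_vec, negative_vec)
    return max(0.0, margin - gap)


def retrieve(query_vec, label_vecs):
    """Cross-modal lookup: return the label whose vector is most similar to the query."""
    return max(label_vecs, key=lambda name: cosine(query_vec, label_vecs[name]))


# Toy 2-D label vectors and one image vector (made-up values).
label_vecs = {"rose": [1.0, 0.0], "tulip": [0.0, 1.0], "orchid": [0.7, 0.7]}
image_vec = [0.9, 0.1]

rng = random.Random(0)
negative = sample_sibling_negative("rose", rng)
loss = hinge_loss(image_vec, label_vecs["rose"], label_vecs[negative])
print(negative, loss, retrieve(image_vec, label_vecs))
```

Restricting negatives to siblings under the same coarse label concentrates the penalty on the fine labels that are most easily confused, which is one plausible reading of why the abstract reports larger differences between such vectors.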

Keywords:

Embedding; Computer science; Dimension (graph theory); Image (mathematics); Vector space; Artificial intelligence; Modal; Pattern recognition (psychology); Image retrieval; Space (punctuation); Mathematics; Combinatorics

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.24

Topics

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Image–Text Cross-Modal Retrieval with Instance Contrastive Embedding

Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

Super Visual Semantic Embedding for Cross-Modal Image-Text Retrieval

Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval

Hierarchical modal interaction balance cross-modal hashing for unsupervised image-text retrieval