Deep Visual Semantic Embedding with Text Data Augmentation and Word Embedding Initialization

Hai He; Haibo Yang

doi:10.1155/2021/6654071


JOURNAL ARTICLE

Year: 2021; Journal: Mathematical Problems in Engineering; Vol: 2021; Pages: 1-8; Publisher: Hindawi Publishing Corporation

Abstract

Language and vision are the two most essential parts of human intelligence for interpreting the world around us, and connecting them is a central problem in current research. Multimodal methods such as visual semantic embedding, which map images and their corresponding texts into a shared feature space, have been widely studied recently. Inspired by recent progress in text data augmentation, in particular the simple but powerful technique EDA (easy data augmentation), we use EDA to expand the information available in the given data and thereby improve model performance. In this paper, we apply text data augmentation and word embedding initialization to multimodal retrieval. Specifically, we use EDA to augment the text data, initialize the word embeddings of a text encoder based on recurrent neural networks, and minimize the gap between the image and text spaces with a triplet ranking loss with hard negative mining. On two Flickr-based datasets, our model trained on only 60% of the training data achieves the same recall as normal training on the full data. Experimental results show that the proposed model improves image annotation and image retrieval on all datasets used in this paper (Flickr8k, Flickr30k, and MS-COCO). The experiments also indicate that text data augmentation is better suited to smaller datasets, while word embedding initialization is better suited to larger ones.
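The abstract names two techniques without giving their details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It shows two of the four EDA operations (random swap and random deletion; synonym replacement and random insertion are omitted because they need a thesaurus) and one common form of the triplet ranking loss with hard negative mining, in which each matching image-caption pair is compared only against its hardest negative in each direction. The function names, the margin value of 0.2, and the similarity-matrix layout are all assumptions made for illustration.

```python
import random


def random_swap(words, n_swaps, rng):
    """EDA-style random swap: exchange two random positions, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """EDA-style random deletion: drop each word with probability p,
    but always keep at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def hardest_negative_triplet_loss(sim, margin=0.2):
    """Triplet ranking loss using only the hardest negatives.

    sim[i][j] is the similarity between image i and caption j, so the
    diagonal holds the matching pairs (assumed layout). The margin
    default is an assumed value, not one taken from the paper.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hardest wrong caption for image i (image annotation direction).
        neg_cap = max((sim[i][j] for j in range(n) if j != i), default=None)
        # Hardest wrong image for caption i (image retrieval direction).
        neg_img = max((sim[j][i] for j in range(n) if j != i), default=None)
        if neg_cap is not None:
            total += max(0.0, margin + neg_cap - pos)
        if neg_img is not None:
            total += max(0.0, margin + neg_img - pos)
    return total
```

In a training loop, augmented captions produced by functions like `random_swap` and `random_deletion` would be paired with the original image, and `sim` would be computed from the image and text encoders' outputs for a mini-batch.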

Keywords:

Computer science; Initialization; Word (group theory); Embedding; Word embedding; Artificial intelligence; Encoder; Natural language processing; Key (lock); Rank (graph theory); Language model; Encoding (memory); Visual Word; Image retrieval; Code (set theory); Information retrieval; Image (mathematics)

Metrics

FWCI (Field Weighted Citation Impact): 0.61

Citation Normalized Percentile: 0.68

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence
