JOURNAL ARTICLE

Deep Visual Semantic Embedding with Text Data Augmentation and Word Embedding Initialization

Hai He, Haibo Yang

Year: 2021 Journal: Mathematical Problems in Engineering Vol: 2021 Pages: 1-8 Publisher: Hindawi Publishing Corporation

Abstract

Language and vision are the two most essential parts of human intelligence for interpreting the real world around us. How to connect language and vision is a central question in current research. Multimodal methods such as visual semantic embedding, which map images and their corresponding texts into the same feature space, have been widely studied recently. Inspired by recent developments in text data augmentation, in particular the simple but powerful technique called EDA (easy data augmentation), we use EDA to expand the information available in the given data and thereby improve model performance. In this paper, we apply text data augmentation and word embedding initialization to multimodal retrieval. We use EDA for text data augmentation, word embedding initialization for a text encoder based on recurrent neural networks, and a triplet ranking loss with hard negative mining to minimize the gap between the two spaces. On two Flickr-based datasets, we achieve the same recall with only 60% of the training data as normal training achieves with the full available data. Experimental results show that, on all datasets used in this paper (Flickr8k, Flickr30k, and MS-COCO), our model performs better on both image annotation and image retrieval tasks. The experiments also demonstrate that text data augmentation is better suited to smaller datasets, while word embedding initialization is better suited to larger ones.
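
The abstract names two concrete ingredients: EDA-style caption augmentation and a triplet ranking loss with hard negative mining. The following is a minimal sketch of both, not the authors' implementation. It covers only the two EDA operations that need no thesaurus (random swap and random deletion); synonym replacement and random insertion are omitted because they require a synonym source such as WordNet. The function names, the margin value of 0.2, and the use of a precomputed similarity matrix are assumptions made for illustration.

```python
import random


def random_swap(words, n_swaps, rng):
    """EDA-style random swap: exchange two random positions, n_swaps times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """EDA-style random deletion: drop each word with probability p.

    At least one word is always kept so the caption is never empty.
    """
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def hardest_negative_triplet_loss(sim, margin=0.2):
    """Triplet ranking loss using only the hardest negative in the batch.

    sim[i][j] is the similarity between image i and caption j, so the
    diagonal holds the matched pairs. The margin is an assumed value.
    """
    n = len(sim)
    if n < 2:
        return 0.0  # no negatives available
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hardest negative caption for image i (row i, off-diagonal).
        hard_caption = max(sim[i][j] for j in range(n) if j != i)
        # Hardest negative image for caption i (column i, off-diagonal).
        hard_image = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin + hard_caption - pos)
        total += max(0.0, margin + hard_image - pos)
    return total
```

In use, each training caption would be split into words, passed through `random_swap` or `random_deletion` with a seeded `random.Random` instance to produce extra captions, and the loss would be computed on a batch similarity matrix (for example, cosine similarities between image and caption embeddings).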

Keywords:
Computer science; Initialization; Word (group theory); Embedding; Word embedding; Artificial intelligence; Encoder; Natural language processing; Key (lock); Rank (graph theory); Language model; Encoding (memory); Visual Word; Image retrieval; Code (set theory); Information retrieval; Image (mathematics)

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 0.61
Refs: 41
Citation Normalized Percentile: 0.68

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Multilabel Deep Visual-Semantic Embedding

Mei-Chen Yeh, Yinan Li

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence Year: 2019 Vol: 42 (6) Pages: 1530-1536

BOOK-CHAPTER

Visual Word Embedding for Text Classification

Ignazio Gallo, Shah Nawaz, Nicola Landro, Riccardo La Grassa

Lecture Notes in Computer Science Year: 2021 Pages: 339-352

BOOK-CHAPTER

Text Semantic Steganalysis Based on Word Embedding

Xin Zuo, Huanhuan Hu, Weiming Zhang, Nenghai Yu

Lecture Notes in Computer Science Year: 2018 Pages: 485-495