Dynamic Context-guided Capsule Network for Multimodal Machine Translation

Huan Lin; Fandong Meng; Jinsong Su; Yongjing Yin; Zhengyuan Yang; Yubin Ge; Jie Zhou; Jiebo Luo

doi:10.1145/3394171.3413715

JOURNAL ARTICLE

Year: 2020 Pages: 1320-1329

Abstract

Multimodal machine translation (MMT), which mainly focuses on enhancing text-only translation with visual features, has attracted considerable attention from both the computer vision and natural language processing communities. Most current MMT models resort to attention mechanisms, global context modeling, or multimodal joint representation learning to utilize visual features. However, the attention mechanism lacks sufficient semantic interaction between modalities, while the other two approaches provide a fixed visual context, which is unsuitable for modeling the variability observed when generating a translation. To address these issues, we propose a novel Dynamic Context-guided Capsule Network (DCCN) for MMT. Specifically, at each decoding timestep, we first employ the conventional source-target attention to produce a timestep-specific source-side context vector. Next, DCCN takes this vector as input and uses it to guide the iterative extraction of related visual features via a context-guided dynamic routing mechanism. In particular, because we represent the input image with both global and regional visual features, we introduce two parallel DCCNs to model multimodal context vectors with visual features at different granularities. Finally, we obtain two multimodal context vectors, which are fused and incorporated into the decoder for the prediction of the target word. Experimental results on the Multi30K dataset for English-to-German and English-to-French translation demonstrate the superiority of DCCN. Our code is available at https://github.com/DeepLearnXMU/MM-DCCN.
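
To make the idea of context-guided dynamic routing more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It follows the generic dynamic-routing recipe for capsule networks and adds one assumed ingredient: the routing logits are also increased by the dot product between each vote and the source-side context vector, so that visual features related to the current decoding step receive larger coupling weights. The function name `context_guided_routing`, the `weights` tensor, the number of output capsules, and the exact update rule are all assumptions for illustration; the abstract does not specify them.

```python
import numpy as np


def squash(s, eps=1e-9):
    """Capsule squashing: keeps direction, maps the norm into [0, 1)."""
    norm_sq = np.sum(s * s, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def context_guided_routing(visual, context, weights, num_iters=3):
    """Hypothetical sketch of routing guided by a context vector.

    visual:  (N, d_in)        visual features (global or regional)
    context: (d_out,)         timestep-specific source-side context vector
    weights: (M, d_in, d_out) assumed vote projections for M output capsules
    Returns the concatenated output capsules and the final coupling weights.
    """
    # Each input feature casts one vote per output capsule: (N, M, d_out).
    votes = np.einsum("nd,mde->nme", visual, weights)
    # Assumed guidance term: how well each vote matches the context, (N, M).
    relevance = votes @ context
    logits = np.zeros(votes.shape[:2])
    for _ in range(num_iters):
        # Each input distributes its vote mass over the M output capsules.
        coupling = softmax(logits, axis=1)
        outputs = squash(np.einsum("nm,nme->me", coupling, votes))
        # Standard agreement between votes and current outputs, (N, M).
        agreement = np.einsum("nme,me->nm", votes, outputs)
        # Assumption: the context biases the routing at every iteration.
        logits = logits + agreement + relevance
    return outputs.reshape(-1), coupling
```

Under these assumptions, the function would be called once per decoding timestep with that timestep's context vector, and once per feature granularity (global and regional); how the two resulting vectors are fused is left open here because the abstract does not describe it.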

Metrics

FWCI (Field Weighted Citation Impact): 3.15

Citation Normalized Percentile: 0.93

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence
