JOURNAL ARTICLE

RetrievalMMT: Retrieval-Constrained Multi-Modal Prompt Learning for Multi-Modal Machine Translation

Abstract

As an extension of machine translation, the primary objective of multi-modal machine translation is to make the best possible use of visual information. Technically, image information is integrated into multi-modal fusion and alignment as an auxiliary modality through concepts or latent semantics, typically within a Transformer-based framework. However, current approaches often neglect one modality while designing numerous handcrafted features (e.g., visual concept extraction), and they require training all parameters of the framework. It is therefore worthwhile to explore multi-modal concepts or features that enhance performance, as well as an efficient way to incorporate visual information at minimal cost. Meanwhile, despite their powerful capabilities, emerging multi-modal large language models (MLLMs) suffer from visual hallucination, which compromises performance. Inspired by pioneering techniques in the multi-modal field, such as prompt learning and MLLMs, this paper explores the application of multi-modal prompt learning to multi-modal machine translation.
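To make the general idea of multi-modal prompt learning concrete, the sketch below shows one common formulation: a small set of learnable prompt vectors is prepended to frozen text and image features before a frozen Transformer encoder, so that only the prompts receive gradients. This is a minimal illustrative sketch, not the paper's actual RetrievalMMT architecture; the module names, dimensions, and the plain `nn.TransformerEncoder` backbone are placeholder assumptions.

```python
# Minimal sketch of multi-modal prompt learning (illustrative only, not the
# paper's method): learnable prompt tokens are prepended to frozen text and
# image features, and only the prompts are trained.
import torch
import torch.nn as nn


class MultiModalPromptEncoder(nn.Module):
    def __init__(self, d_model=512, n_prompts=8, n_layers=6, n_heads=8):
        super().__init__()
        # Learnable prompt tokens -- the only new parameters introduced here.
        self.prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)
        # Placeholder frozen backbone; in practice this would be a pretrained
        # multi-modal Transformer (hypothetical stand-in for illustration).
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.backbone.requires_grad_(False)  # freeze: only prompts get gradients

    def forward(self, text_feats, image_feats):
        # text_feats: (B, T, d) and image_feats: (B, V, d), assumed to come
        # from frozen uni-modal encoders projected to a shared dimension.
        batch_size = text_feats.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch_size, -1, -1)
        # Prepend the shared prompts to the concatenated multi-modal sequence.
        seq = torch.cat([prompts, text_feats, image_feats], dim=1)
        return self.backbone(seq)


if __name__ == "__main__":
    enc = MultiModalPromptEncoder()
    text = torch.randn(2, 10, 512)    # dummy text token features
    image = torch.randn(2, 49, 512)   # dummy image patch features
    out = enc(text, image)
    print(out.shape)                  # torch.Size([2, 67, 512])
    trainable = [n for n, p in enc.named_parameters() if p.requires_grad]
    print(trainable)                  # only 'prompts' remains trainable
```

Because the backbone is frozen, the number of trainable parameters is just `n_prompts * d_model`, which is what makes prompt learning an inexpensive way to inject visual information compared with fine-tuning the full framework.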

Keywords:
Computer science; Modal; Machine translation; Modality (human–computer interaction); Artificial intelligence; Machine learning; Semantics (computer science); Natural language processing; Programming language

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 1.59
Refs: 25
Citation Normalized Percentile: 0.74

Topics

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Genomics and Phylogenetic Studies (Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology)

