Ziheng Li, Shaohan Huang, Zihan Zhang, Zhihong Deng, Qiang Lou, Haizhen Huang, Jian Jiao, Furu Wei, Weiwei Deng, Qi Zhang
Recent studies have shown that dual encoder models trained with the sentence-level translation ranking task are effective for cross-lingual sentence embedding. However, our research indicates that token-level alignment is also crucial in multilingual scenarios, and it has not been fully explored previously. Based on our findings, we propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding that incorporates both sentence-level and token-level alignment. To achieve this, we introduce a novel representation translation learning (RTL) task, in which the model learns to use one side's contextualized token representations to reconstruct their translation counterparts. This reconstruction objective encourages the model to embed translation information into the token representations. Compared with other token-level alignment methods such as translation language modeling, RTL is more suitable for dual encoder architectures and is computationally efficient. Extensive experiments on three sentence-level cross-lingual benchmarks demonstrate that our approach significantly improves sentence embedding.
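The two alignment objectives described in the abstract can be illustrated with a minimal PyTorch sketch. Everything below is an assumption-based illustration of the general technique, not the authors' implementation: the temperature value, the shallow transformer reconstructor, the MSE reconstruction loss, and the naive length truncation are all illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def translation_ranking_loss(src_emb, tgt_emb, temperature=0.05):
    """Sentence-level alignment: in-batch translation ranking loss.

    src_emb, tgt_emb: (batch, hidden) pooled sentence embeddings of
    parallel sentence pairs. Each source sentence must rank its own
    translation above all other in-batch targets. The temperature
    value is an assumption for illustration.
    """
    sims = F.normalize(src_emb, dim=-1) @ F.normalize(tgt_emb, dim=-1).T
    labels = torch.arange(src_emb.size(0), device=src_emb.device)
    return F.cross_entropy(sims / temperature, labels)


class RTLHead(nn.Module):
    """Token-level alignment: representation translation learning (RTL).

    Reconstructs target-side contextualized token representations from
    source-side ones, pushing translation information into the tokens.
    The 2-layer transformer reconstructor and MSE loss are illustrative
    choices; the paper's exact design may differ.
    """

    def __init__(self, hidden_size=768, num_layers=2, num_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        self.reconstructor = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, src_tokens, tgt_tokens):
        # src_tokens: (batch, src_len, hidden); tgt_tokens: (batch, tgt_len, hidden)
        pred = self.reconstructor(src_tokens)
        # Naive length alignment by truncation, for illustration only.
        n = min(pred.size(1), tgt_tokens.size(1))
        # Detaching the target treats it as a fixed reconstruction label,
        # so the head learns to translate representations rather than
        # letting both sides collapse toward each other (an assumption).
        return F.mse_loss(pred[:, :n], tgt_tokens[:, :n].detach())
```

In a DAP-style setup, the sentence-level ranking loss and the token-level RTL loss would be computed from the same dual encoder forward pass over a parallel batch and summed into a single pre-training objective.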