Cross-lingual Data Augmentation for Document-grounded Dialog Systems in Low Resource Languages

Qi Gou; Zehua Xia; Wenzhe Du

doi:10.18653/v1/2023.dialdoc-1.1


Year: 2023 Pages: 1-7


Abstract

This paper proposes a framework to address data scarcity in Document-Grounded Dialogue Systems (DGDS). Our model leverages high-resource languages to improve dialogue generation in low-resource languages. Specifically, we present a novel pipeline, CLEM (Cross-Lingual Enhanced Model), which combines adversarially trained retrieval (a retriever and a re-ranker) with a FiD (fusion-in-decoder) generator. To further leverage high-resource languages, we also propose an architecture that aligns different languages through translated training. Extensive experimental results demonstrate the effectiveness of our model, which achieved 4th place in the DialDoc 2023 Competition. CLEM can therefore serve as a solution to resource scarcity in DGDS and provide useful guidance for multilingual alignment tasks.
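As a rough illustration of the three-stage shape the abstract describes (retrieve, re-rank, then fusion-in-decoder generation), the following is a minimal sketch and not the authors' implementation. The bag-of-words scorers stand in for learned models, the function names `retrieve`, `rerank`, and `fid_inputs` are hypothetical, and the `question: ... context: ...` input template is an assumed format. Adversarial training and cross-lingual alignment are not modeled here.

```python
from collections import Counter
from math import sqrt


def bow(text):
    """Lower-cased bag-of-words vector (toy stand-in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=3):
    """Stage 1 (hypothetical): cheap scoring over all passages, keep the top k."""
    q = bow(query)
    return sorted(passages, key=lambda p: cosine(q, bow(p)), reverse=True)[:k]


def rerank(query, candidates, k=2):
    """Stage 2 (hypothetical): finer scoring of the short list by query-term coverage."""
    q_terms = set(query.lower().split())

    def coverage(passage):
        return len(q_terms & set(passage.lower().split())) / max(len(q_terms), 1)

    return sorted(candidates, key=coverage, reverse=True)[:k]


def fid_inputs(query, passages):
    """Stage 3 input: one string per (query, passage) pair.

    In fusion-in-decoder, each pair is encoded independently and the decoder
    attends over all encoded pairs jointly. The template is an assumption.
    """
    return [f"question: {query} context: {p}" for p in passages]


passages = [
    "how to renew a passport online",
    "opening hours of the library",
    "passport renewal fees and forms",
]
query = "renew passport"
shortlist = retrieve(query, passages, k=2)
best = rerank(query, shortlist, k=1)
generator_inputs = fid_inputs(query, best)
```

In a real system the strings in `generator_inputs` would be tokenized and passed to an encoder-decoder model; here they only show how retrieved evidence is paired with the dialogue query before generation.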

Keywords:

Leverage (statistics); Computer science; Dialog box; Resource (disambiguation); Artificial intelligence; Natural language processing; Architecture; Pipeline (software); Scarcity; Human–computer interaction; World Wide Web; Programming language

Metrics

FWCI (Field Weighted Citation Impact): 0.77

Citation Normalized Percentile: 0.72

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence

Speech and dialogue systems: Physical Sciences → Computer Science → Artificial Intelligence

Multimodal Machine Learning Applications: Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Enhancing NER Performance in Low-Resource Pakistani Languages using Cross-Lingual Data Augmentation

Cross-Lingual Text Augmentation: A Contrastive Learning Approach for Low-Resource Languages

Improving Low-Resource Question Answering with Cross-Lingual Data Augmentation Strategies

Large and Small models for collaborative cross-lingual data augmentation in entity relationship extraction for low-resource languages

DG2: Data Augmentation Through Document Grounded Dialogue Generation