JOURNAL ARTICLE

DG2: Data Augmentation Through Document Grounded Dialogue Generation

Abstract

Collecting data for training dialog systems can be extremely expensive due to the involvement of human participants and the need for extensive annotation. Especially in document-grounded dialog systems, human experts need to carefully read the unstructured documents to answer the users’ questions. As a result, existing document-grounded dialog datasets are relatively small-scale and obstruct the effective training of dialogue systems. In this paper, we propose an automatic data augmentation technique grounded on documents through a generative dialogue model. The dialogue model consists of a user bot and agent bot that can synthesize diverse dialogues given an input document, which is then used to train a downstream model. When supplementing the original dataset, our method achieves significant improvement over traditional data augmentation methods. We also achieve great performance in the low-resource setting.

Keywords:
Dialog box Computer science Dialog system Generative grammar Annotation Grounded theory Artificial intelligence Generative model Resource (disambiguation) Natural language processing Training set Information retrieval Data science Human–computer interaction World Wide Web Qualitative research

Metrics

7
Cited By
1.37
FWCI (Field Weighted Citation Impact)
32
Refs
0.80
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and dialogue systems
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

© 2026 ScienceGate Book Chapters — All rights reserved.