What Makes Data-to-Text Generation Hard for Pretrained Language Models?

Moniba Keymanesh; Adrian Benton; Mark Dredze

doi:10.18653/v1/2022.gem-1.50

JOURNAL ARTICLE

Year: 2022 Pages: 539-554

Abstract

Expressing natural language descriptions of structured facts or relations – data-to-text generation (D2T) – increases the accessibility of structured knowledge repositories. Previous work shows that pre-trained language models (PLMs) perform remarkably well on this task after fine-tuning on a significant amount of task-specific training data. On the other hand, while auto-regressive PLMs can generalize from a few task examples, their efficacy at D2T is largely unexplored. Furthermore, we have an incomplete understanding of the limits of PLMs on D2T. In this work, we conduct an empirical study of both fine-tuned and auto-regressive PLMs on the DART multi-domain D2T dataset. We consider their performance as a function of the amount of task-specific data and how the data is incorporated into the models: zero and few-shot learning, and fine-tuning of model weights. In addition, we probe the limits of PLMs by measuring performance on subsets of the evaluation data: novel predicates and abstractive test examples. To improve the performance on these subsets, we investigate two techniques: providing predicate descriptions in the context and re-ranking generated candidates by information reflected in the source. Finally, we conduct a human evaluation of model errors and show that D2T generation tasks would benefit from datasets with more careful manual curation.
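The abstract mentions re-ranking generated candidates by how much source information they reflect. The sketch below is one minimal way such a re-ranker could look, not the authors' implementation: it assumes the input is a list of (subject, predicate, object) triples and scores each candidate by the fraction of subject and object strings it mentions verbatim. The names `coverage` and `rerank`, the underscore handling, and the example triple are illustrative assumptions.

```python
from typing import List, Tuple

# A source fact as (subject, predicate, object); this layout is an assumption.
Triple = Tuple[str, str, str]


def coverage(candidate: str, triples: List[Triple]) -> float:
    """Fraction of subject/object strings that appear verbatim in the candidate."""
    text = candidate.lower()
    entities: List[str] = []
    for subj, _pred, obj in triples:
        entities.extend([subj, obj])
    if not entities:
        return 0.0
    # Underscores are treated as spaces, assuming entity names like "Alan_Bean".
    hits = sum(1 for e in entities if e.replace("_", " ").lower() in text)
    return hits / len(entities)


def rerank(candidates: List[str], triples: List[Triple]) -> List[str]:
    """Order candidates from highest to lowest source coverage (ties keep input order)."""
    return sorted(candidates, key=lambda c: coverage(c, triples), reverse=True)


# Illustrative input only; not taken from the paper's experiments.
triples = [("Alan_Bean", "birthPlace", "Wheeler, Texas")]
candidates = [
    "Alan Bean is an astronaut.",
    "Alan Bean was born in Wheeler, Texas.",
]
best = rerank(candidates, triples)[0]
```

Exact string matching is deliberately crude: it penalizes valid paraphrases, so a practical re-ranker would likely need a softer notion of whether a fact is reflected in the text.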

Keywords:

Computer science; Artificial intelligence; Language model; Machine learning; Task (project management); Ranking (information retrieval); Natural language processing; Natural language understanding; Predicate (mathematical logic); Context (archaeology); Classifier (UML); Training set; Natural language; Programming language

Metrics

FWCI (Field Weighted Citation Impact): 1.96

Citation Normalized Percentile: 0.84

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Text Readability and Simplification

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models

Progressive Generation of Long Text with Pretrained Language Models

Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models
