JOURNAL ARTICLE

Crowd-sourcing NLG Data: Pictures Elicit Better Data.

Abstract

Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this work, we propose a novel framework for crowdsourcing high-quality NLG training data, using automatic quality control measures and evaluating different MRs with which to elicit data. We show that pictorial MRs result in better NL data being collected than logic-based MRs: utterances elicited by pictorial MRs are judged as significantly more natural, more informative, and better phrased, with a significant increase in average quality ratings (around 0.5 points on a 6-point scale), compared to using the logical MRs. As the MR becomes more complex, the benefits of pictorial stimuli increase. The collected data will be released as part of this submission.

Keywords:
Crowdsourcing; Natural language generation; Computer science; Meaning (existential); Quality (philosophy); Natural language processing; Point (geometry); Natural language; Artificial intelligence; Data quality; Scale (ratio); Labeled data; Training set; Psychology; World Wide Web; Mathematics

Metrics

Cited By: 47
FWCI (Field Weighted Citation Impact): 6.20
Refs: 21
Citation Normalized Percentile: 0.98

Topics

Speech and dialogue systems
Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
