A major opportunity for NLP to have a real-world impact is in helping educators score student writing, particularly content-based writing (i.e., the task of automated short answer scoring). A major challenge in this enterprise is that scored responses to a particular question (i.e., labeled data) are valuable for modeling but limited in quantity. Additional information from the scoring guidelines for humans, such as exemplars for each score level and descriptions of key concepts, can also be used. Here, we explore methods for integrating scoring guidelines and labeled responses, and we find that stacked generalization (Wolpert, 1992) improves performance, especially for small training sets.
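To make the idea of stacked generalization concrete, the following is a minimal sketch and not the system described here. It assumes two hypothetical level-0 scorers, one that uses labeled responses and one that uses guideline exemplars, both based on a simple token-overlap (Jaccard) similarity chosen only for illustration. A level-1 combiner is then fit on out-of-fold predictions; here it is reduced to a single grid-searched mixing weight. All function names and the toy data are assumptions.

```python
def tokens(text):
    # Illustrative tokenizer: lowercase whitespace split (an assumption).
    return set(text.lower().split())

def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def predict_from_responses(response, labeled):
    # Level-0 model A: score of the most similar labeled response.
    best = max(labeled, key=lambda pair: jaccard(response, pair[0]))
    return float(best[1])

def predict_from_guidelines(response, exemplars):
    # Level-0 model B: score level of the most similar guideline exemplar.
    best = max(exemplars, key=lambda pair: jaccard(response, pair[0]))
    return float(best[1])

def out_of_fold_predictions(labeled, n_folds):
    # Predict each response using only responses from the other folds,
    # so the level-1 model never sees predictions made on training items.
    # Requires n_folds >= 2 and at least two non-empty folds.
    preds = []
    for i, (text, _) in enumerate(labeled):
        held_in = [p for j, p in enumerate(labeled)
                   if j % n_folds != i % n_folds]
        preds.append(predict_from_responses(text, held_in))
    return preds

def fit_stacker(labeled, exemplars, n_folds=3):
    # Level-1 model: one mixing weight w in [0, 1], chosen by grid search
    # to minimize squared error on the out-of-fold predictions.
    a = out_of_fold_predictions(labeled, n_folds)
    b = [predict_from_guidelines(t, exemplars) for t, _ in labeled]
    y = [float(s) for _, s in labeled]

    def sse(w):
        return sum((w * ai + (1 - w) * bi - yi) ** 2
                   for ai, bi, yi in zip(a, b, y))

    return min((k / 10 for k in range(11)), key=sse)

def predict(response, labeled, exemplars, w):
    return (w * predict_from_responses(response, labeled)
            + (1 - w) * predict_from_guidelines(response, exemplars))

# Toy data, invented for illustration only.
exemplars = [("plants use sunlight to make food", 2),
             ("plants grow", 1),
             ("i do not know", 0)]
labeled = [("plants make food from sunlight", 2),
           ("the plants grow big", 1),
           ("no idea", 0),
           ("sunlight helps plants make food", 2),
           ("plants grow in soil", 1),
           ("do not know", 0)]

w = fit_stacker(labeled, exemplars)
score = predict("plants use sunlight to make their food", labeled, exemplars, w)
```

A practical system would more likely use a trained regressor or classifier at both levels; the single weight is used here only to keep the two-level structure visible.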