JOURNAL ARTICLE

Predicting Web Survey Breakoffs Using Machine Learning Models

Zeming Chen, Alexandru Cernat, Natalie Shlomo

Year: 2022   Journal: Social Science Computer Review   Vol: 41 (2)   Pages: 573-591   Publisher: SAGE Publishing

Abstract

Web surveys are becoming increasingly popular but tend to have more breakoffs than interviewer-administered surveys. Survey breakoffs occur when respondents quit the survey partway through. The Cox survival model is commonly used to understand patterns of breakoffs. Nevertheless, there is a trend toward more data-driven models, such as classification machine learning models, when the purpose is prediction. It is unclear in the breakoff literature which statistical models are best for predicting question-level breakoffs. Additionally, there is no consensus about the treatment of time-varying question-level predictors, such as question response time and question word count. While some researchers use the current values, others aggregate the values from the beginning of the survey. This study develops and compares both survival models and classification models along with different treatments of time-varying variables. Based on the level of agreement between the predicted and actual breakoff, we find that the Cox model and gradient boosting outperform the other survival models and classification models, respectively. We also find that using the values of time-varying predictors concurrent with the breakoff status is more predictive of breakoff than aggregating their values from the beginning of the survey, implying that respondents’ breakoff behaviour is driven more by the current response burden.
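The two treatments of time-varying predictors described in the abstract can be illustrated with a minimal sketch. It expands one respondent into one row per question reached and records each predictor both as its current value and as a running total from the start of the survey. The function name, the field names, the use of a running sum as the aggregate, and the choice to code the breakoff event on the last question reached are all assumptions made for illustration; the abstract does not specify how the authors built their data.

```python
from itertools import accumulate


def question_level_rows(response_times, word_counts, broke_off):
    """Expand one respondent into one row per question reached.

    Hypothetical helper: each row carries the current value of each
    time-varying predictor and its running sum from the first question.
    """
    if len(response_times) != len(word_counts):
        raise ValueError("response_times and word_counts must align")

    n = len(response_times)
    # Running totals from the beginning of the survey (assumed aggregate).
    cum_time = list(accumulate(response_times))
    cum_words = list(accumulate(word_counts))

    rows = []
    for i in range(n):
        rows.append({
            "question": i + 1,
            # Treatment 1: value concurrent with the breakoff status.
            "time_current": response_times[i],
            "words_current": word_counts[i],
            # Treatment 2: value aggregated since the first question.
            "time_cumulative": cum_time[i],
            "words_cumulative": cum_words[i],
            # Assumption: the event is coded on the last question reached.
            "breakoff": int(broke_off and i == n - 1),
        })
    return rows


# Made-up respondent who quits at the third question.
rows = question_level_rows([12.0, 8.5, 30.0], [15, 10, 60], broke_off=True)
for row in rows:
    print(row)
```

Rows in this shape could feed either a survival model or a classifier, with one set of predictor columns swapped for the other to compare the two treatments.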

Keywords:
Boosting (machine learning); Predictive value; Predictive modelling; Computer science; Statistics; Machine learning; Artificial intelligence; Econometrics; Psychology; Medicine; Mathematics

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 1.93
Refs: 30
Citation Normalized Percentile: 0.86

Topics

Survey Methodology and Nonresponse
Social Sciences →  Social Sciences →  Sociology and Political Science
Statistical Methods and Bayesian Inference
Physical Sciences →  Mathematics →  Statistics and Probability
Data-Driven Disease Surveillance
Health Sciences →  Medicine →  Epidemiology