This paper predicts health insurance premiums with several machine learning algorithms and compares their predictive performance. Using an open-source dataset, we evaluate three models: linear regression, decision tree, and random forest. The random forest regression model performs best, with a score of 0.8564. Linear regression follows with a score of 0.7584; it falls short of random forest but still shows some predictive power. The decision tree model performs worst, with a score of only 0.7097. These experiments indicate that, of the three models tested, random forest is the best choice for health insurance premium prediction, combining good prediction accuracy with strong data processing ability. As the collection and analysis of health insurance premiums grows in importance, forecasting with machine learning algorithms such as random forest can, to some extent, help insurance companies price policies and assess risk more effectively.
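The three-model comparison described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: it assumes scikit-learn, uses synthetic stand-in data rather than the open-source dataset (which the abstract does not name), and the helper names `make_synthetic_premiums` and `compare_models` are hypothetical. The abstract does not define its "score"; the sketch reports the coefficient of determination (R^2) returned by scikit-learn's `score` method, which is an assumption. The numbers it produces will not match the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def make_synthetic_premiums(n_samples=1000, seed=0):
    """Build hypothetical stand-in data; this is NOT the paper's dataset."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 65, n_samples)
    bmi = rng.normal(30.0, 6.0, n_samples)
    smoker = rng.integers(0, 2, n_samples)
    # Assumed premium formula: linear in age and BMI, with an extra
    # smoker surcharge that grows with BMI above 30, plus Gaussian noise.
    surcharge = smoker * (15000.0 + 500.0 * np.maximum(bmi - 30.0, 0.0))
    noise = rng.normal(0.0, 2000.0, n_samples)
    premium = 250.0 * age + 300.0 * bmi + surcharge + noise
    features = np.column_stack([age, bmi, smoker])
    return features, premium


def compare_models(features, target, seed=0):
    """Fit the three regressors on one split and return test-set R^2 scores."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, target, test_size=0.2, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "decision_tree": DecisionTreeRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        # For scikit-learn regressors, score() returns R^2 on the given data.
        scores[name] = float(model.score(x_test, y_test))
    return scores


features, target = make_synthetic_premiums()
scores = compare_models(features, target)
for name, value in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R^2 = {value:.4f}")
```

All three models are scored on the same held-out split so the values are directly comparable. The ranking on synthetic data depends on the assumed premium formula and need not reproduce the ordering reported in the paper.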