Customer churn is a persistent and costly challenge for companies in the telecommunications sector, where retaining existing subscribers is often more profitable than acquiring new ones. Accurately identifying customers who are likely to leave is critical for enabling targeted retention strategies. However, churn prediction is complicated by significant class imbalance, as churners typically represent a small fraction of the overall customer base. This thesis explores the application of machine learning techniques to the churn prediction problem using a structured experimental approach. Five experimental settings were designed to evaluate and improve model performance under imbalanced data conditions: a baseline scenario using the original dataset, a cost-sensitive learning setup with class weighting, a recall-optimized configuration obtained through hyperparameter tuning, an experiment incorporating synthetic oversampling (SMOTE), and a final experiment using the 20 most important features. A variety of classification models were assessed, including both traditional machine learning algorithms and neural networks. The study investigates how different learning strategies and evaluation criteria affect model behavior and performance in the context of churn prediction. Emphasis is placed on addressing the imbalance issue, optimizing recall of the minority class, and comparing the effectiveness of algorithmic and data-driven solutions. The findings provide insights into the trade-offs and considerations involved in developing fair and practical predictive models for real-world customer churn scenarios.
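
To illustrate why class imbalance motivates both class weighting and a focus on minority-class recall, the following minimal sketch may help. It is not the thesis's implementation: the toy labels, the 90/10 split, and the helper names `balanced_class_weights` and `recall` are assumptions introduced here for illustration. The weighting rule shown, n_samples / (n_classes * class_count), is the common "balanced" heuristic; the thesis does not state which weighting scheme was actually used.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy labels (not data from the thesis): 1 = churner, 0 = non-churner.
y_true = [0] * 90 + [1] * 10

# The rare churn class receives a much larger weight than the majority class.
weights = balanced_class_weights(y_true)

# A trivial model that always predicts "no churn" looks accurate
# but never identifies a single churner.
y_pred_majority = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred_majority)) / len(y_true)
churn_recall = recall(y_true, y_pred_majority)

print(weights, accuracy, churn_recall)
```

In this toy setting the always-majority predictor scores high accuracy while its recall on churners is zero, which is the kind of behavior that makes recall of the minority class a more informative criterion than accuracy under imbalance.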