Banks play a pivotal role in facilitating economic activity, allocating financial resources, and managing risk, and lending is one of their fundamental functions. This research addresses the subject of "Predicting Borrower's Integrity in Loan Repayment," with the aim of mitigating lending risk and supporting prudent financial decision-making. For the predictive analysis, we used a loan dataset provided by Lending Club Bank, consisting of 2.2 million records with 151 features each. Running machine learning predictions on a dataset of this size, 1.3 gigabytes in total, is challenging. We therefore used Apache Spark as our primary tool for handling big data and ran it on Google Cloud's Dataproc platform. Feature selection reduced the original 151 features to 28 significant ones, and the selected features were then transformed into a form the models could consume. Logistic Regression and Random Forest classification models were used to predict loan status as either 'fully paid' or 'charged off,' achieving accuracies of 95.9 percent and 86 percent, respectively. This research contributes to the development of loan assessment practices and the refinement of risk management strategies within the banking sector.
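To make the classification step concrete, the following is a minimal sketch of binary logistic regression on loan status. It is not the study's Spark pipeline: it swaps in plain standard-library Python with batch gradient descent so that it runs anywhere, and the two features (`annual_income_k`, `debt_to_income`), the eight toy records, and all function names are hypothetical, invented purely for illustration.

```python
import math

# Hypothetical toy records: (annual_income_k, debt_to_income, loan_status).
# These values are invented for illustration; they are NOT Lending Club data.
RECORDS = [
    (85.0, 0.10, "fully paid"),
    (60.0, 0.15, "fully paid"),
    (72.0, 0.12, "fully paid"),
    (95.0, 0.08, "fully paid"),
    (30.0, 0.45, "charged off"),
    (25.0, 0.50, "charged off"),
    (40.0, 0.40, "charged off"),
    (35.0, 0.38, "charged off"),
]


def encode_label(status):
    """Map the two loan statuses to 1 (fully paid) and 0 (charged off)."""
    return 1 if status == "fully paid" else 0


def standardize(rows):
    """Transform each feature column to zero mean and unit variance."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(dims)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(dims)] for r in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias) - yi
            for j, v in enumerate(xi):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, xi):
    """Return 1 (fully paid) if the predicted probability is at least 0.5."""
    z = sum(w * v for w, v in zip(weights, xi)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(weights, bias, X, y):
    """Fraction of records whose predicted status matches the label."""
    return sum(predict(weights, bias, xi) == yi for xi, yi in zip(X, y)) / len(y)


X = standardize([[income, dti] for income, dti, _ in RECORDS])
y = [encode_label(status) for _, _, status in RECORDS]
weights, bias = train_logistic_regression(X, y)
train_accuracy = accuracy(weights, bias, X, y)
print("weights:", weights, "bias:", bias, "training accuracy:", train_accuracy)
```

The accuracy computed here is measured on the same toy records used for fitting, so it says nothing about the figures reported above; at the scale of 2.2 million records, a distributed implementation on Spark would take the place of this single-process loop.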
Siddharth Thakar, Deep Patel, Vaibhav Gandhi, Dharma Trivedi
Racha Ghayad, Mohamad Balouza, Mohammad Zaraket
Dr. Vikas Singhal, Prashant Tiwari