The study titled "Mitigating Algorithmic Bias in Machine Learning through Synthetic Tabular Data Generation" examines how synthetic data methods might improve both the accuracy and the fairness of machine learning models. It uses a quantitative research approach that combines a survey with a set of experiments to assess practitioners' perceptions of bias and the practical effect of synthetic data. The structured survey polled 135 respondents, including data scientists, artificial intelligence researchers, and machine learning practitioners, on their views of synthetic data use and their familiarity with the causes of bias. The experimental assessment trained models on both original and synthetically augmented datasets, with augmentation produced by SMOTE (Synthetic Minority Over-sampling Technique), generative adversarial networks (GANs), and variational autoencoders (VAEs). To investigate the effect of data augmentation methods on model fairness outcomes, the analysis used frequency analysis, t-tests, and analysis of variance (ANOVA). The results show that synthetic data generation significantly improves fairness and reduces data imbalance without sacrificing accuracy. They suggest that synthetic tabular data offers a workable balance among model performance, data privacy, and fairness, which makes it a promising method for ethical AI research. The study thus adds to the growing body of work in ethical AI by providing evidence that synthetic data may help reduce algorithmic bias in ML applications.
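To make the oversampling side of the experiments concrete, here is a minimal, dependency-free sketch of SMOTE-style interpolation in Python. It assumes purely numeric feature rows. The function names `smote` and `k_nearest` are hypothetical illustrations, not the study's code, and the study does not state which implementation or parameters (such as the neighbour count `k`) it used.

```python
import math
import random


def k_nearest(points, idx, k):
    """Return indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself."""
    base = points[idx]
    dists = [(math.dist(base, p), j) for j, p in enumerate(points) if j != idx]
    dists.sort()  # ascending by distance, ties broken by index
    return [j for _, j in dists[:k]]


def smote(minority, n_synthetic, k=5, seed=None):
    """Generate n_synthetic new minority-class rows by SMOTE-style interpolation.

    Assumption: every row is a list of numeric features of equal length.
    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    # Precompute the neighbour list of every minority row once.
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random seed row
        j = rng.choice(neighbours[i])      # random neighbour of that row
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

For example, with `minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]`, calling `smote(minority, 6, k=2, seed=0)` returns six new rows, each lying between two existing minority rows, which rebalances the classes without copying rows verbatim. Categorical columns would need a variant such as SMOTE-NC, and in practice the imbalanced-learn library's `SMOTE` class (used through `fit_resample`) is the usual choice. GAN- and VAE-based generators work differently: they learn a model of the data distribution and sample from it rather than interpolating between neighbours.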
M. Nikolić, Danilo Nikolić, Miroslav Stefanović, Sara Koprivica, Darko Stefanović
Manjunath Mahendra, Chaithra Umesh, Kristian Schultz, Olaf Wolkenhauer, Saptarshi Bej
Patricia A. Apellániz, Ana Jiménez, Borja Arroyo Galende, Juan Parras, Santiago Zazo