MITIGATING ALGORITHMIC BIAS IN MACHINE LEARNING THROUGH SYNTHETIC TABULAR DATA GENERATION

Harer Savita Laxman, Research Scholar, Department of Computer Science Engineering, Glocal University, Saharanpur, U.P.
Dr. Shashank Swami, Research Supervisor, Department of Computer Science Engineering, Glocal University, Saharanpur, U.P.

doi:10.5281/zenodo.17379246

JOURNAL ARTICLE

Year: 2025 Journal: Zenodo (CERN European Organization for Nuclear Research) Publisher: European Organization for Nuclear Research

Abstract

The study titled "Mitigating Algorithmic Bias in Machine Learning through Synthetic Tabular Data Generation" examines how synthetic data methods might improve the accuracy and fairness of machine learning models. Using a quantitative research approach, it combines a survey with experiments to assess both perceptions of bias and the practical effect of synthetic data. A structured survey of 135 respondents, including data scientists, artificial intelligence researchers, and machine learning practitioners, captured their views on the use of synthetic data and their familiarity with the causes of bias. The experimental assessment trained models on both the original datasets and datasets augmented with synthetic records generated by techniques such as SMOTE, GANs, and Variational Autoencoders (VAEs). Frequency analysis, t-tests, and analysis of variance (ANOVA) were used to examine the relationship between data augmentation methods and model fairness outcomes. The results indicate that synthetic data generation significantly improves fairness and reduces data imbalance without sacrificing accuracy. According to the authors, synthetic tabular data offers a balance among model performance, data privacy, and fairness, which makes it a promising method for ethical AI research. The study adds to the growing body of work on ethical AI by providing evidence that synthetic data may help reduce algorithmic bias in ML applications.
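As a rough illustration of the SMOTE-style augmentation the abstract mentions, the sketch below interpolates between minority-group rows to create synthetic records until the two groups are the same size. It is a simplified, standard-library-only approximation and not the authors' pipeline: the function names `smote_like_oversample` and `imbalance_ratio`, the toy rows, and the group sizes are all hypothetical, and a real experiment would more likely rely on an established implementation such as the `SMOTE` class in imbalanced-learn.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly
    chosen minority row and one of its k nearest minority neighbours.
    Requires at least two minority rows. Hypothetical helper."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def imbalance_ratio(n_majority, n_minority):
    """Majority-to-minority size ratio; 1.0 means the groups are balanced."""
    return n_majority / n_minority


# Toy data (assumed for illustration): 4 minority rows against 10 majority rows.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
majority_count = 10

new_rows = smote_like_oversample(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_rows

print("ratio before:", imbalance_ratio(majority_count, len(minority)))
print("ratio after: ", imbalance_ratio(majority_count, len(balanced_minority)))
```

Because each synthetic row lies on a line segment between two real minority rows, its feature values stay within the range already observed for that group. Whether this improves fairness in a given model still has to be measured, for example by comparing group-level metrics before and after augmentation.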

Keywords:

Synthetic data; Experimental data; Variance (accounting); Training set; Noisy data; Data modeling; Raw data; Data collection

Topics

Ethics and Social Impacts of AI (Social Sciences → Social Sciences → Safety Research)

Explainable Artificial Intelligence (XAI) (Physical Sciences → Computer Science → Artificial Intelligence)

Machine Learning and Data Classification (Physical Sciences → Computer Science → Artificial Intelligence)
