Generative Adversarial Networks (GANs) have become a powerful tool for generating synthetic data, and they help address the scarcity and imbalance of real data in healthcare applications. Annotated data suitable for training machine learning models on rare oncological diseases is often not available. This paper describes a GAN-based system that uses conditional GANs (cGANs) and Wasserstein GANs (WGANs) to extend existing datasets and improve classification outcomes for rare diseases. The system relies on extensive preprocessing, noise injection to avoid overfitting, and careful post-synthesis validation to preserve biological consistency and statistical coherence. In the experiments presented, classifiers trained on augmented data achieve substantially higher sensitivity, specificity, and F1 scores than the baseline models when the classes are significantly imbalanced. Data realism is measured with correlation heatmap analysis and distributional comparisons between synthetic and real samples. These checks are part of a modular framework that combines adversarial training with strict validation of synthetic data for augmentation in rare-disease settings. The results support the idea that GAN-generated datasets are a promising way to build more robust diagnostic models, and thus to address the data shortage that is widespread in oncology research. This research broadens the use of GANs in synthesising medical data and adds to the growing toolkit of computational approaches for the early detection and categorisation of rare cancers.
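The realism checks mentioned above (correlation comparison and distributional assessment) can be illustrated with a minimal sketch. The abstract does not state which statistics are used, so this example assumes Pearson correlation matrices and the two-sample Kolmogorov-Smirnov statistic as stand-ins; all function names and the toy data are hypothetical and are not taken from the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(rows):
    """Feature-by-feature correlation matrix; each row is one sample."""
    cols = list(zip(*rows))
    return [[pearson(a, b) for b in cols] for a in cols]


def max_correlation_gap(real, synth):
    """Largest absolute entry-wise difference between the two matrices.

    This is the single number a side-by-side heatmap would make visible.
    """
    cr, cs = correlation_matrix(real), correlation_matrix(synth)
    return max(abs(r - s) for rr, ss in zip(cr, cs) for r, s in zip(rr, ss))


def ks_statistic(a, b):
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def toy_samples(seed, n=200):
    """Hypothetical stand-in data: two correlated features per sample."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        rows.append((x, 0.8 * x + rng.gauss(0.0, 0.5)))
    return rows


if __name__ == "__main__":
    real, synth = toy_samples(0), toy_samples(1)
    print("max correlation gap:", max_correlation_gap(real, synth))
    for k in range(2):
        ks = ks_statistic([r[k] for r in real], [s[k] for s in synth])
        print("feature", k, "KS statistic:", ks)
```

Small values on both measures suggest that the synthetic samples reproduce the pairwise feature structure and the marginal distributions of the real data. Acceptance thresholds would have to be chosen per dataset and are not given here.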