JOURNAL ARTICLE

An Optimal Transport-Based Undersampling Technique for Handling Imbalanced Datasets

Sungjun Seo, Mohammad Afrazi, Kooktae Lee

Year: 2025   Journal: Journal of Dynamic Systems, Measurement, and Control   Vol: 148 (3)   Publisher: ASM International

Abstract

This paper investigates a novel undersampling technique based on optimal transport (OT) for managing imbalanced datasets in classification tasks. Undersampling is crucial for reducing dataset size while preserving essential statistical properties, improving both classification performance and computational efficiency. Existing methods, such as random undersampling, NearMiss, Tomek Links, and Edited Nearest Neighbor, often fail to adequately preserve the underlying data distribution. To address this limitation, we propose a Wasserstein distance-based undersampling method that formulates an optimization problem aimed at minimizing distributional distortion. By leveraging the Wasserstein distance to quantify differences between probability distributions, the proposed approach ensures that the reduced dataset retains key geometric and statistical characteristics of the original majority class. Furthermore, we provide a computational complexity analysis and establish a stability property that bounds the Wasserstein deviation introduced by support reduction. Simulation results on synthetically generated imbalanced datasets demonstrate that the proposed method preserves the structural characteristics of the original data more effectively than existing resampling techniques, while achieving balanced classification performance across both majority and minority classes. These results highlight the potential of the proposed approach as an effective and scalable solution for addressing class imbalance in practical classification problems.
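The idea of measuring distributional distortion with the Wasserstein distance can be illustrated with a minimal one-dimensional sketch. This is not the authors' formulation, which the abstract does not spell out; it only assumes uniform-weight empirical distributions on the real line, where the 1-Wasserstein distance equals the area between the two empirical CDFs. The function names `wasserstein_1d` and `quantile_subset` are hypothetical, and the quantile-midpoint rule is a simple stand-in heuristic rather than the optimization problem proposed in the paper.

```python
from bisect import bisect_right


def wasserstein_1d(xs, ys):
    """W1 distance between two uniform-weight empirical distributions on the real line."""
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    # In 1-D, W1 is the integral of |F_x(t) - F_y(t)|; both CDFs are
    # piecewise constant between consecutive support points.
    for left, right in zip(points, points[1:]):
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


def quantile_subset(majority, k):
    """Keep k samples at evenly spaced quantile midpoints (illustrative heuristic only)."""
    ordered = sorted(majority)
    n = len(ordered)
    return [ordered[int((i + 0.5) * n / k)] for i in range(k)]


# Hypothetical majority class: 100 evenly spaced values.
majority = [float(v) for v in range(100)]
kept = quantile_subset(majority, 10)   # subset spread across the distribution
naive = majority[:10]                  # subset taken from one end only

print("W1(majority, quantile subset):", wasserstein_1d(majority, kept))
print("W1(majority, one-sided subset):", wasserstein_1d(majority, naive))
```

In this toy setting the subset spread across the quantiles stays much closer to the original majority class, in Wasserstein terms, than a subset drawn from one end, which is the kind of distortion the abstract says the proposed method aims to minimize. Multivariate data would need a general optimal transport solver instead of the CDF shortcut used here.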
