JOURNAL ARTICLE

Learning from Imbalanced Data

Haibo He, Edwardo A. Garcia

Year: 2009 Journal: IEEE Transactions on Knowledge and Data Engineering Vol: 21 (9) Pages: 1263-1284 Publisher: IEEE Computer Society

Abstract

With the continuous expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, the problem of learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. Due to the inherent complex characteristics of imbalanced data sets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. In this paper, we provide a comprehensive review of the development of research in learning from imbalanced data. Our focus is to provide a critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario. Furthermore, in order to stimulate future research in this field, we also highlight the major opportunities and challenges, as well as potential important research directions for learning from imbalanced data.

Keywords:
Computer science, Data science, Raw data, Machine learning, Artificial intelligence, Big data, Field (mathematics), Data mining

Metrics

Cited By: 9025
FWCI (Field Weighted Citation Impact): 120.82
Refs: 193
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Electricity Theft Detection Techniques
Physical Sciences →  Engineering →  Electrical and Electronic Engineering
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

BOOK-CHAPTER

Learning From Imbalanced Data

Lincy Meera Mathews, Hari Seetha

IGI Global eBooks Year: 2017 Pages: 1825-1834
BOOK-CHAPTER

Learning From Imbalanced Data

Lincy Meera Mathews, Hari Seetha

Advances in computer and electrical engineering book series Year: 2018 Pages: 403-414
BOOK-CHAPTER

Learning from Imbalanced Data

Sarah Vluymans

Studies in computational intelligence Year: 2018 Pages: 81-110
JOURNAL ARTICLE

Learning from Imbalanced Data Distribution

Wang, Wentao

Journal: Michigan State University Libraries Year: 2024