P-SMOTE: One Oversampling Technique for Class Imbalanced Text Classification

Jingjing Wang; Wen Feng Lu; Han Tong Loh

doi:10.1115/detc2011-47313

JOURNAL ARTICLE

Year: 2011 Pages: 1089-1098

Abstract

The importance of mining patents to support product design is well recognized: patents are the major information source for innovation and contain novel ideas that usually cannot be found in published academic papers. A basic task in patent text mining is patent classification, yet automatic patent classification is difficult. One potential cause is dataset imbalance, i.e. the positive class of interest is the minority while the negative class is the majority. To alleviate the imbalance and improve the performance of a Support Vector Machine (SVM) classifier, this study proposes P-SMOTE, a new oversampling technique that focuses on the blank spaces along the positive borderline of an SVM. The proposed technique was first investigated on Reuters-21578, a standard text classification dataset, and then applied to a design patent document dataset. An SVM classifier with P-SMOTE achieved better results than an SVM classifier alone.
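
The abstract does not describe how P-SMOTE chooses where to place synthetic samples, so the snippet below is only a rough sketch of the generic SMOTE-style interpolation step that oversampling methods of this family build on. It is not the P-SMOTE procedure. The function name `smote_like_oversample`, its parameters, and the toy feature vectors are all illustrative assumptions; a border-focused variant would presumably restrict the base points to positives near the SVM boundary, which is not attempted here.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Generic SMOTE-style interpolation (an assumption for illustration),
    NOT the P-SMOTE procedure from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest minority neighbours.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up two-dimensional feature vectors standing in for positive documents.
positives = [[0.9, 0.1], [0.8, 0.3], [1.0, 0.2]]
new_points = smote_like_oversample(positives, n_new=4, k=2)
```

Each synthetic point lies on a line segment between two existing positive samples, so the positive class grows without duplicating documents exactly. In practice the enlarged positive set would be combined with the negatives before training the classifier.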

Keywords:

Oversampling; Support vector machine; Computer science; Classifier (UML); Artificial intelligence; Machine learning; Class (philosophy); Data mining; Statistical classification; Pattern recognition (psychology); Bandwidth (computing)

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.15

Topics

Intellectual Property and Patents

Social Sciences → Business, Management and Accounting → Management of Technology and Innovation

Related Documents

COVID-19 Fatality Rate Classification Using Synthetic Minority Oversampling Technique (SMOTE) for Imbalanced Class

MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification

T-SMOTE: Temporal-oriented Synthetic Minority Oversampling Technique for Imbalanced Time Series Classification

A novel oversampling technique for class-imbalanced learning based on SMOTE and natural neighbors

Investigating the minority class oversampling technique (SMOTE) on an imbalanced Cardiovascular Disease (CVD) dataset.