JOURNAL ARTICLE

P-SMOTE: One Oversampling Technique for Class Imbalanced Text Classification

Abstract

The importance of mining patents to support product design is well recognized, because patents are the major source of information supporting innovation and contain novel ideas that usually cannot be found in published academic papers. In patent text mining, a basic task is patent classification. However, automatic patent classification is difficult. One potential cause of the difficulty is dataset imbalance, i.e., the positive class of interest is the minority while the negative class, which is not of interest, is the majority. To alleviate the imbalance problem and improve the performance of a Support Vector Machine (SVM) classifier, this study proposes P-SMOTE, a new oversampling technique that focuses on the blank spaces along the positive borderline of an SVM. The proposed technique was first evaluated on Reuters-21578, a standard text classification dataset. P-SMOTE was then applied to a design patent document dataset. An SVM classifier with P-SMOTE achieved better results than an SVM classifier alone.
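The abstract does not spell out how P-SMOTE places its synthetic samples, so the following is only a minimal sketch of the generic SMOTE-style interpolation that oversampling methods of this family build on, not the authors' procedure. The function name `smote`, its parameters, and the toy data are assumptions made for illustration; the step that restricts generation to blank spaces along the SVM positive borderline is not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration only: this is plain SMOTE,
    not the P-SMOTE variant described in the abstract.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to it.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new sample somewhere on the segment between the two.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy imbalanced data (assumed): 4 positive vs. 12 negative samples.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
negatives = [[5.0 + 0.1 * i, 5.0] for i in range(12)]

# Oversample the positive class until both classes have the same size.
new_points = smote(positives, n_synthetic=len(negatives) - len(positives))
balanced_positives = positives + new_points
```

In a text classification setting the feature vectors would typically be term-weight vectors, and the balanced positive set would be passed to the SVM together with the unchanged negative set.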

Keywords:
Oversampling; Support vector machine; Computer science; Classifier (UML); Artificial intelligence; Machine learning; Class (philosophy); Data mining; Statistical classification; Pattern recognition (psychology); Bandwidth (computing)

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.15

Topics

Intellectual Property and Patents
Social Sciences → Business, Management and Accounting → Management of Technology and Innovation