JOURNAL ARTICLE

Keyword-Based Semi-Supervised Text Classification

Abstract

Industrial organizations generate massive volumes of data during routine business and production activities. Such data may be structured (numerical or categorical) or unstructured and textual. Both kinds contain a wealth of knowledge that can help organizations improve their operations. Knowledge can readily be extracted automatically from structured data. Unstructured data, however, must be mined and interpreted manually, which is cumbersome, error-prone, and time-consuming. This paper focuses on automatically analyzing unstructured text data to extract business value. It proposes a semi-supervised natural language (NL) approach for analyzing a corpus of documents associated with accounts receivable disputes at a large corporation. The name "semi-supervised" derives from the philosophy underlying the methodology: a set of categories and the keywords associated with them are defined in consultation with domain experts. These categories and keywords are then supplied as input to the algorithm, which automatically classifies the disputes into the predefined categories. The performance of the semi-supervised methodology is comparable to that of a random forest, a supervised learning approach. The paper discusses the benefits of the semi-supervised approach over supervised learning, namely a considerable reduction in the manual effort needed to analyze, understand, and label a training data set, without any noticeable degradation in performance.
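
The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea: expert-defined categories and keywords go in, and each document is assigned to the category whose keywords it mentions most often. The category names, the keyword lists, the `classify` function, and the simple count-based scoring are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical categories and keywords. In the paper these are defined
# in consultation with domain experts; the values here are invented.
CATEGORY_KEYWORDS = {
    "pricing": {"price", "discount", "overcharged", "rate"},
    "delivery": {"shipment", "late", "damaged", "delivery"},
    "duplicate": {"duplicate", "twice", "double"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text, category_keywords=CATEGORY_KEYWORDS, default="uncategorized"):
    """Return the category whose keywords occur most often in the text.

    Assumed scoring rule: one point per keyword occurrence. Ties go to
    the category listed first; a text with no keyword hits gets `default`.
    """
    counts = Counter(tokenize(text))
    scores = {
        category: sum(counts[keyword] for keyword in keywords)
        for category, keywords in category_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(classify("The shipment arrived late and damaged."))
    print(classify("Customer says the invoice was billed twice, a duplicate charge."))
```

No labeled training set is needed under this scheme; the manual effort is confined to choosing categories and keywords, which is the saving the abstract attributes to the semi-supervised approach.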

Keywords:
Computer science; Supervised learning; Unstructured data; Categorical variable; Set (abstract data type); Semi-supervised learning; Artificial intelligence; Machine learning; Data mining; Natural language processing; Big data; Artificial neural network

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 0.46
Refs: 20
Citation Normalized Percentile: 0.71

Topics

Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Imbalanced Data Classification Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Sentiment Analysis and Opinion Mining
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Weakly-supervised Text Classification Based on Keyword Graph

Lu Zhang, Jiandong Ding, Yi Xu, Yingyao Liu, Shuigeng Zhou

Journal: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing Year: 2021

BOOK-CHAPTER

Semi-supervised Collaborative Text Classification

Rong Jin, Ming‐Tsang Wu, Rahul Sukthankar

Lecture Notes in Computer Science Year: 2007 Pages: 600-607

BOOK-CHAPTER

Semi-Supervised Text Classification Using EM

Kamal Nigam, Andrew McCallum, Tom Mitchell

The MIT Press eBooks Year: 2006 Pages: 32-55