JOURNAL ARTICLE

Feature Engineering with Word2vec on Text Classification Using The K-Nearest Neighbor Algorithm

Syopiansyah Jaya Putra, Muhamad Nur Gunawan, Arief Akbar Hidayat

Year: 2022 Journal: 2022 10th International Conference on Cyber and IT Service Management (CITSM) Pages: 1-6

Abstract

Text feature extraction is the process of converting unstructured text data into structured data so that machine learning algorithms can process it. One commonly used text feature extraction technique is TF-IDF. This technique can produce high-dimensional data, which lengthens computation time and affects accuracy. This study compares feature extraction with Word2vec and with TF-IDF. It follows a four-step data exploration approach with a text classification process in which modeling uses the K-Nearest Neighbor (KNN) algorithm. The highest accuracy of TF-IDF with KNN was 73%, in the 7:3 scenario with 8,133 features. The highest accuracy of Word2vec with KNN was 74%, in the 9:1 scenario with 300 features. Word2vec thus matched, and slightly exceeded, the accuracy of TF-IDF while producing data with far fewer dimensions. This study shows that Word2vec feature extraction can be used in machine learning research, not only in deep learning. It can also serve as a comparison of classification performance across different feature extraction methods, which can later be applied in web or mobile apps.
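
The abstract names the pipeline but not its implementation details. As a minimal sketch only, the following standard-library Python compares the two feature extractions with a KNN classifier. Assumptions not taken from the paper: whitespace tokenization, the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 (the default in scikit-learn's `TfidfVectorizer`), cosine similarity with k = 3, document vectors formed by averaging word vectors, and a hand-made dictionary `EMB` of 2-dimensional vectors standing in for a trained Word2vec model. The data and all function names (`fit_tfidf`, `mean_embedding`, `knn_predict`, `split_by_ratio`, `run_demo`) are hypothetical.

```python
import math
import random
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification of real preprocessing)."""
    return text.lower().split()

def fit_tfidf(train_docs):
    """Learn a sorted vocabulary and smoothed IDF weights: ln((1 + n) / (1 + df)) + 1."""
    n = len(train_docs)
    df = Counter()
    for doc in train_docs:
        df.update(set(tokenize(doc)))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    return vocab, idf

def tfidf_vector(doc, vocab, idf):
    """Dense TF-IDF vector: one dimension per training-vocabulary term (count * IDF)."""
    tf = Counter(tokenize(doc))
    return [tf[t] * idf[t] for t in vocab]

def mean_embedding(doc, embeddings, dim):
    """Average the vectors of tokens found in `embeddings`; other tokens are skipped."""
    vecs = [embeddings[t] for t in tokenize(doc) if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training vectors most cosine-similar to x."""
    ranked = sorted(range(len(train_X)), key=lambda i: cosine(train_X[i], x), reverse=True)
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def split_by_ratio(items, train_ratio, seed=0):
    """Shuffle, then split: train_ratio=0.7 for a 7:3 scenario, 0.9 for 9:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data with a fixed split; a real corpus would go through split_by_ratio.
TRAIN = [
    ("the team scored a late goal", "sport"),
    ("coach praises team after match", "sport"),
    ("striker scores twice in match", "sport"),
    ("parliament passes new budget vote", "politics"),
    ("minister resigns before election", "politics"),
    ("voters head to polls in election", "politics"),
]
TEST = [("goal decides the match", "sport"), ("election vote delayed by minister", "politics")]

# Hypothetical 2-d vectors standing in for trained Word2vec embeddings (the paper used 300).
EMB = {
    "team": [0.9, 0.1], "goal": [1.0, 0.0], "match": [0.8, 0.2], "striker": [0.9, 0.0],
    "scored": [0.9, 0.1], "scores": [0.9, 0.1], "coach": [0.8, 0.1],
    "parliament": [0.1, 0.9], "budget": [0.2, 0.8], "vote": [0.0, 1.0],
    "minister": [0.1, 0.9], "election": [0.0, 1.0], "voters": [0.1, 0.9], "polls": [0.1, 0.8],
}

def run_demo(k=3, dim=2):
    docs, labels = zip(*TRAIN)
    test_docs, test_labels = zip(*TEST)
    # TF-IDF features: dimensionality grows with the training vocabulary.
    vocab, idf = fit_tfidf(docs)
    X_tfidf = [tfidf_vector(d, vocab, idf) for d in docs]
    pred_tfidf = [knn_predict(X_tfidf, labels, tfidf_vector(d, vocab, idf), k) for d in test_docs]
    # Averaged-embedding features: dimensionality fixed by the embedding size.
    X_w2v = [mean_embedding(d, EMB, dim) for d in docs]
    pred_w2v = [knn_predict(X_w2v, labels, mean_embedding(d, EMB, dim), k) for d in test_docs]
    return {
        "tfidf_dims": len(vocab), "tfidf_acc": accuracy(test_labels, pred_tfidf),
        "w2v_dims": dim, "w2v_acc": accuracy(test_labels, pred_w2v),
    }

print(run_demo())
```

On this toy data both pipelines classify the two test sentences correctly, so `run_demo()` prints `{'tfidf_dims': 27, 'tfidf_acc': 1.0, 'w2v_dims': 2, 'w2v_acc': 1.0}`. The accuracy is meaningless at this size. The relevant contrast is the feature count: the TF-IDF vector has one dimension per training-vocabulary term (27 here, 8,133 in the paper), while the averaged-embedding vector has a fixed size set by the embedding model (2 here, 300 in the paper). For real experiments, library implementations such as scikit-learn's `TfidfVectorizer` and `KNeighborsClassifier` and gensim's `Word2Vec` would normally replace these hand-written helpers.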

Keywords:
Word2vec, Computer science, Feature extraction, Artificial intelligence, k-nearest neighbors algorithm, tf–idf, Feature (linguistics), Statistical classification, Data mining, Pattern recognition (psychology), Machine learning, Algorithm

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 0.71
Refs: 23
Citation Normalized Percentile: 0.68

Topics

Educational Technology Systems
Physical Sciences →  Computer Science →  Artificial Intelligence
Data Mining and Machine Learning Applications
Physical Sciences →  Computer Science →  Information Systems
Information Retrieval and Data Mining
Physical Sciences →  Computer Science →  Information Systems

Related Documents

JOURNAL ARTICLE

Assamese Text Classification using k Nearest Neighbor

Moromi Gogoi, Shikhar Kumar Sarma

Journal: International Journal of Recent Technology and Engineering (IJRTE) Year: 2019 Vol: 8 (4) Pages: 8185-8188

JOURNAL ARTICLE

Indonesian Online News Topics Classification using Word2Vec and K-Nearest Neighbor

Nur Ghaniaviyanto Ramadhan

Journal: Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Year: 2021 Vol: 5 (6) Pages: 1083-1089