JOURNAL ARTICLE

Feature Selection Using Support Vector Machines

Janez Brank, Marko Grobelnik, N. Milić-Frayling, Dunja Mladenić

Year: 2002 Journal: WIT Transactions on Information and Communication Technologies Vol: 28 Publisher: WIT Press

Abstract

Text categorization is the task of classifying natural language documents into a set of predefined categories. Documents are typically represented by sparse vectors under the vector space model, where each word in the vocabulary is mapped to one coordinate axis and its occurrence in a document gives rise to one nonzero component in the vector representing that document. When training classifiers on large collections of documents, both the time and memory requirements of processing these vectors may be prohibitive. This calls for a feature selection method, not only to reduce the number of features but also to increase the sparsity of the document vectors. We propose a feature selection method based on linear Support Vector Machines (SVMs). First, we train a linear SVM on a subset of the training data and retain only those features that correspond to highly weighted components (in absolute value) of the normal to the resulting hyperplane that separates positive and negative examples. This reduced feature space is then used to train a classifier over a larger training set, because more documents now fit into the same amount of memory. In our experiments we compare the effectiveness of SVM-based feature selection with that of more traditional feature selection methods, such as odds ratio and information gain, in achieving the desired tradeoff between vector sparsity and classification performance. Experimental results indicate that, at the same level of vector sparsity, feature selection based on SVM normals yields better classification performance than feature selection based on odds ratio or information gain when linear SVM classifiers are used.
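The two-stage procedure described in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: a simple batch subgradient descent on the regularized hinge loss (without a bias term) stands in for whatever SVM solver was actually used, and the function names, the hyperparameters (`lam`, `eta`, `steps`), the number of retained features `k`, and the toy data are all hypothetical.

```python
# Sketch: feature selection from the normal (weight vector) of a linear SVM.
# Documents are sparse vectors stored as {feature_index: value} dicts,
# paired with a label of +1 or -1.

def dot(w, x):
    """Inner product of a dense weight list and a sparse document vector."""
    return sum(w[j] * v for j, v in x.items())

def train_linear_svm(examples, n_features, lam=0.01, eta=0.1, steps=200):
    """Assumed stand-in trainer: batch subgradient descent on
    lam/2 * ||w||^2 + mean hinge loss, with no bias term."""
    w = [0.0] * n_features
    n = len(examples)
    for _ in range(steps):
        grad = [lam * wj for wj in w]          # gradient of the regularizer
        for x, y in examples:
            if y * dot(w, x) < 1.0:            # example violates the margin
                for j, v in x.items():
                    grad[j] -= y * v / n
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w

def select_top_features(w, k):
    """Indices of the k components of the normal with largest |weight|."""
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    return set(ranked[:k])

def project(x, keep):
    """Drop every feature that was not retained, making the vector sparser."""
    return {j: v for j, v in x.items() if j in keep}

# Hypothetical toy data: feature 0 marks positives, feature 1 marks
# negatives, features 2 and 3 occur equally often in both classes.
full_set = [
    ({0: 1.0, 2: 1.0}, +1), ({0: 1.0, 3: 1.0}, +1),
    ({1: 1.0, 2: 1.0}, -1), ({1: 1.0, 3: 1.0}, -1),
    ({0: 1.0, 2: 1.0, 3: 1.0}, +1), ({0: 1.0}, +1),
    ({1: 1.0, 2: 1.0, 3: 1.0}, -1), ({1: 1.0}, -1),
]
subset = full_set[:4]

# Stage 1: train on a subset and keep the highly weighted features.
w_subset = train_linear_svm(subset, n_features=4)
keep = select_top_features(w_subset, k=2)

# Stage 2: retrain on the larger set in the reduced feature space.
reduced = [(project(x, keep), y) for x, y in full_set]
w_final = train_linear_svm(reduced, n_features=4)
```

In this sketch the feature indices are left unchanged after projection, so the final weight vector still has one slot per original feature; a real pipeline would more likely renumber the retained features to save memory.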

Keywords:
Support vector machine; Feature selection; Computer science; Feature vector; Pattern recognition (psychology); Artificial intelligence; Classifier (UML); Hyperplane; Feature (linguistics); Linear classifier; Selection (genetic algorithm); Data mining; Machine learning; Mathematics

Metrics

Cited By: 72
FWCI (Field Weighted Citation Impact): 2.60
Refs: 17
Citation Normalized Percentile: 0.90

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Face and Expression Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Feature Selection using Fuzzy Support Vector Machines

Hong Xia, Bao Qing Hu

Journal: Fuzzy Optimization and Decision Making Year: 2006 Vol: 5 (2) Pages: 187-192

JOURNAL ARTICLE

Feature Selection Using Linear Support Vector Machines

Janez Brank, Marko Grobelnik

Journal: The Journal of Physiology Year: 2002 Vol: 491 (Pt 3) Pages: 18-18

JOURNAL ARTICLE

Feature selection for support vector machines

L. Hermes, Joachim M. Buhmann

Year: 2002 Vol: 2 Pages: 712-715
JOURNAL ARTICLE

FEATURE SELECTION FOR SUPPORT VECTOR MACHINES USING GENETIC ALGORITHMS

Holger Fröhlich, Olivier Chapelle, Bernhard Schölkopf

Journal: International Journal of Artificial Intelligence Tools Year: 2004 Vol: 13 (04) Pages: 791-800