JOURNAL ARTICLE

"Missing is useful": missing values in cost-sensitive decision trees

Shichao Zhang, Zhen Qin, Charles X. Ling, Si-yuan Sheng

Year: 2005
Journal: IEEE Transactions on Knowledge and Data Engineering
Vol: 17 (12)
Pages: 1689-1693
Publisher: IEEE Computer Society

Abstract

Many real-world data sets for machine learning and data mining contain missing values. Much previous research regards missing values as a problem and attempts to impute them before training and testing. In this paper, we study the issue in cost-sensitive learning, which considers both test costs and misclassification costs. If the values of some attributes (tests) are too expensive to obtain, it would be more cost-effective to leave them missing, much as expensive and risky tests are skipped in patient diagnosis (classification). That is, "missing is useful": missing values actually reduce the total cost of tests and misclassifications, so imputing them is not meaningful. We discuss and compare several strategies for cost-sensitive decision tree learning that use only known values and exploit the idea that "missing is useful" to reduce cost.
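The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the function names, the two-class setting, and all probabilities and costs below are hypothetical, chosen only to show how a test is worth performing when its cost plus the expected misclassification cost afterwards is lower than the expected misclassification cost without it.

```python
def misclassification_cost(p_pos, fp_cost, fn_cost):
    """Expected cost of the cheaper of the two possible predictions.

    Predicting positive risks a false positive with probability (1 - p_pos);
    predicting negative risks a false negative with probability p_pos.
    """
    cost_if_predict_pos = (1.0 - p_pos) * fp_cost
    cost_if_predict_neg = p_pos * fn_cost
    return min(cost_if_predict_pos, cost_if_predict_neg)


def total_cost_with_test(test_cost, outcomes, fp_cost, fn_cost):
    """Test cost plus expected misclassification cost after seeing the result.

    `outcomes` is a list of (probability_of_outcome, p_pos_given_outcome).
    """
    expected = sum(
        p_outcome * misclassification_cost(p_pos, fp_cost, fn_cost)
        for p_outcome, p_pos in outcomes
    )
    return test_cost + expected


def worth_testing(test_cost, prior_p_pos, outcomes, fp_cost, fn_cost):
    """True if performing the test lowers the total expected cost."""
    without_test = misclassification_cost(prior_p_pos, fp_cost, fn_cost)
    with_test = total_cost_with_test(test_cost, outcomes, fp_cost, fn_cost)
    return with_test < without_test


# Hypothetical numbers: 30% of cases are positive; the test splits cases
# evenly into a high-risk group (55% positive) and a low-risk group (5%).
FP_COST, FN_COST = 200.0, 600.0
PRIOR = 0.3
OUTCOMES = [(0.5, 0.55), (0.5, 0.05)]

cheap = worth_testing(50.0, PRIOR, OUTCOMES, FP_COST, FN_COST)
expensive = worth_testing(100.0, PRIOR, OUTCOMES, FP_COST, FN_COST)
print(cheap, expensive)
```

With these assumed numbers, the cheaper test pays for itself while the more expensive one does not, so under the abstract's argument the expensive attribute is better left missing.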

Keywords:
Missing data, Decision tree, Computer science, Imputation (statistics), Data mining, Machine learning, Decision tree learning, Artificial intelligence, Statistics, Mathematics

Metrics

Cited By: 193
FWCI (Field Weighted Citation Impact): 9.59
Refs: 37
Citation Normalized Percentile: 0.98

Topics

Imbalanced Data Classification Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Machine Learning and Data Classification
Physical Sciences → Computer Science → Artificial Intelligence
Data Mining Algorithms and Applications
Physical Sciences → Computer Science → Information Systems

Related Documents

BOOK-CHAPTER

Cost-Time Sensitive Decision Tree with Missing Values

Shichao Zhang, Xiaofeng Zhu, Jilian Zhang, Chengqi Zhang

Lecture Notes in Computer Science
Year: 2007
Pages: 447-459

JOURNAL ARTICLE

Test-cost sensitive classification on data with missing values

Qiang Yang, Charles X. Ling, Xian Chai, Rong Pan

Journal: IEEE Transactions on Knowledge and Data Engineering
Year: 2006
Vol: 18 (5)
Pages: 626-638

JOURNAL ARTICLE

Multi-criteria feature selection on cost-sensitive data with missing values

Wenhao Shu, Hong Shen

Journal: Pattern Recognition
Year: 2015
Vol: 51
Pages: 268-280