Imputation for missing data is an important task of data mining, which may influence the data mining result. In this paper, Missing Categorical Data Imputation Based on Similarity (MIBOS) is proposed to solve this problem. The algorithm defines a similarity model between objects with incomplete data, constructing the similarity matrix of objects and further gets the nearest undifferentiated object sets of each object to impute the missing data iteratively. In the imputing process, the imputed value will be directly applied to the same iteration and the following iterations. Experiments with three UCI benchmark data sets show the improvement of the proposed algorithm from perspectives of complete rate, accuracy and time efficiency.
Mulugeta GebregziabherStacia M. DeSantis
Xiaochen ShaoSen WuXiaodong FengRui Song
Muhan ZhouYulei HeMandi YuChiu-Hsieh Hsu
Jaroslav HorníčekHana Řezanková
Muhammad IshaqSana ZahirLaila iftikharMohammad Farhad BulbulSeungmin RhoMi Young Lee