In recent years, distributed learning has attracted much attention due to the explosion of big databases, which are in some cases distributed across different nodes. However, the great majority of current feature selection and classification algorithms are designed for centralized learning, i.e., they use the whole dataset at once. In this paper, a new approach for learning on vertically partitioned data is presented, covering both feature selection and classification. The approach splits the data by features and then applies the chi-square filter and the naive Bayes classifier at each node. Finally, a merging procedure updates the learned model in an incremental fashion. Experimental results on five representative datasets show that execution time is shortened considerably while classification performance is maintained as the number of nodes increases.
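The pipeline described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes discrete features, shows the chi-square statistic as a per-feature scoring function for the filter step, trains one naive Bayes model per vertical partition, and merges the nodes at prediction time by summing per-partition log-likelihoods with a single shared log-prior (possible because naive Bayes factorizes over features). All names (`chi2_score`, `NodeNB`, `merged_predict`) are hypothetical.

```python
import math
from collections import Counter

def chi2_score(feature, labels):
    # Chi-square statistic between one discrete feature and the class labels;
    # the filter step would rank features by this score at each node.
    n = len(labels)
    f_counts, c_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, nf in f_counts.items():
        for c, nc in c_counts.items():
            expected = nf * nc / n
            observed = joint.get((f, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

class NodeNB:
    """Naive Bayes trained on one vertical partition (a subset of features)."""
    def fit(self, cols, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        # Per-column conditional value counts, one table per class
        self.cond = []
        for col in cols:
            values = set(col)
            table = {c: Counter(v for v, y in zip(col, labels) if y == c)
                     for c in self.classes}
            self.cond.append((values, table))
        return self

    def log_likelihood(self, x, c):
        # log P(x_partition | c) with Laplace smoothing
        ll = 0.0
        for (values, table), v in zip(self.cond, x):
            count = table[c].get(v, 0)
            total = sum(table[c].values())
            ll += math.log((count + 1) / (total + len(values)))
        return ll

def merged_predict(nodes, parts):
    # Merge step: naive Bayes factorizes over features, so the global
    # posterior is one log-prior plus the sum of per-node log-likelihoods.
    classes = nodes[0].classes
    scores = {c: math.log(nodes[0].prior[c]) +
                 sum(node.log_likelihood(x, c) for node, x in zip(nodes, parts))
              for c in classes}
    return max(scores, key=scores.get)
```

Because the merge only sums log-likelihoods, adding a node never requires retraining the others, which is what makes the incremental update cheap.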