This paper addresses the clustering of very large datasets, one of the most challenging tasks in data mining and processing. We propose an improved MapReduce design of the K-means algorithm that incorporates an iteration-reducing method. Experiments show that this method reduces both the number of iterations and the execution time of the K-means algorithm while retaining 80% of the clustering accuracy. Combining the MapReduce programming paradigm with iteration-reducing techniques makes it possible to process the huge volume of data generated by the daily transactions of stock exchanges, which supports better decision making by analysts.
Amira Boukhdhir, Oussama Lachiheb, Mohamed Salah Gouider
Robson L. F. Cordeiro, Caetano Traina, Agma J. M. Traina, Julio López, U Kang, Christos Faloutsos
Fangjun Luan, Jiadi Liu, Keyan Cao
Trong Nhan Phan, Josef Küng, Tran Khanh Dang