Abstract The Hadoop platform forms a complete large-scale distributed ecosystem that includes HDFS, MapReduce, HBase, and other subsystems. This paper analyzes the parallel processing capabilities of the Hadoop platform and applies them to data mining algorithms. To improve algorithm efficiency, a K-Modes clustering algorithm based on a big data platform is proposed. The algorithm represents each cluster by its mode rather than by a central node, and the mining process uses naive Bayes to improve mining efficiency. The experimental results show that the proposed algorithm adapts better, takes less time, and runs more efficiently.
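To make the role of the cluster mode concrete, the following is a minimal single-machine sketch of plain K-Modes, not the paper's Hadoop implementation. It assumes the simple matching (Hamming) dissimilarity and caller-supplied initial modes; the function names `hamming`, `column_modes`, and `k_modes` are hypothetical, and the naive Bayes step is omitted because the abstract does not specify it.

```python
from collections import Counter


def hamming(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent value of each attribute column."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, initial_modes, max_iter=20):
    """Cluster categorical records around modes (illustrative sketch)."""
    modes = [tuple(m) for m in initial_modes]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the nearest mode.
        labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        # Update step: recompute each mode; an empty cluster keeps its old mode.
        new_modes = []
        for j, old_mode in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            new_modes.append(column_modes(members) if members else old_mode)
        if new_modes == modes:
            break  # modes are stable, so assignments will not change
        modes = new_modes
    return labels, modes


if __name__ == "__main__":
    data = [
        ("red", "small", "round"),
        ("red", "small", "oval"),
        ("red", "large", "round"),
        ("blue", "large", "square"),
        ("blue", "large", "oval"),
        ("green", "large", "square"),
    ]
    labels, modes = k_modes(data, initial_modes=[data[0], data[3]])
    print(labels)
    print(modes)
```

In a MapReduce setting, the assignment step would plausibly run in the map phase and the mode update in the reduce phase, but that mapping is an assumption here rather than something the abstract states.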
Jianyong Peng, Xinhao Zhang, Lina Wang, Fang Zhu, Nana Zhou, Yansong Zuo, Tao Zhou, Yuan Gao
Yi Fan Zhang, Yong Tao Qian, Tai Yu Liu, Shu Wu
Cong Huang, Yang Yao, Huajun Wang, Xueyu Zhang, Jinquan Zhao, Jun Wan