Big Data processing refers to processing, analyzing, and extracting information from very large volumes of data, for which traditional processing techniques are no longer viable. Such data is therefore handled by distributed computing frameworks such as Apache Hadoop, whose MapReduce model is well suited to processing large datasets on cloud platforms. In this work, the deployment of Hadoop is automated on an OpenStack-based private cloud. Auto-scaling refers to dynamically adding nodes to, or removing nodes from, an existing cluster so that resources are used effectively. Quality of Service (QoS) must be maintained while resources are scaled on demand, and the Hadoop cluster must be scaled up when the load increases in order to adhere to Service Level Agreements (SLAs). This paper presents an auto-scaling method for Hadoop clusters that employs predictive algorithms: the proposed model collects CPU-utilization metrics and scales the cluster up or down based on time-series analysis of those metrics. Experimental results on an OpenStack-based private-cloud testbed demonstrate the effectiveness of the proposed mechanism.
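The scaling logic described above can be sketched as follows. This is a minimal illustrative example, not the paper's exact algorithm: the forecast method (single exponential smoothing), the smoothing factor, and the utilization thresholds are all assumptions chosen for clarity.

```python
# Hedged sketch of prediction-driven auto-scaling: forecast the next
# CPU-utilization value from recent samples, then compare it against
# high/low thresholds to decide whether to add or remove a node.
# Thresholds and the smoothing factor are illustrative assumptions.

def forecast_cpu(samples, alpha=0.5):
    """Single exponential smoothing over recent CPU-utilization samples (%)."""
    smoothed = samples[0]
    for x in samples[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

def scaling_decision(samples, high=80.0, low=20.0):
    """Return 'scale_up', 'scale_down', or 'steady' from the forecast."""
    predicted = forecast_cpu(samples)
    if predicted > high:
        return "scale_up"      # predicted load too high: add a node
    if predicted < low:
        return "scale_down"    # predicted load low: remove a node
    return "steady"            # within the QoS band: no change
```

For example, a rising utilization trace such as `[50, 70, 85, 90]` yields a forecast above the high threshold and triggers a scale-up, while a falling trace such as `[30, 25, 15, 10]` triggers a scale-down.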