Geo-Distributed Big Data Analytics Systems

Yixin Bao; Chuan Wu

doi:10.1201/9780429507670-8

BOOK-CHAPTER

Year: 2019 Pages: 161-190

Abstract

This chapter investigates the online optimal deployment of big data analytics jobs across geo-distributed regions when inter-datacenter bandwidths and task execution durations on different virtual machines (VMs) are unknown and uncertain. Geo-distributed big data analytics systems extend a single-cluster MapReduce, Spark, or parameter server-based system to the wide area network (WAN) in order to process data generated in different geographic locations. The centralized processing approach, by contrast, is time-consuming because it transmits large volumes of data over bandwidth-constrained WAN links, and it is costly in resource consumption. The chapter provides an online learning-based algorithm that does not rely on offline training but learns near-optimal placement decisions for each type of job over time. Computing the task deployment for each stage of each job finishes within 1600 ms for 500 tasks, 10 data centers, and 9 VM types. The chapter also investigates the multiple-job scheduling problem with resource constraints under similar runtime uncertainties.

Keywords:

Big data; Analytics; Computer science; Data science; Data analysis; Data mining

Topics

Graph Theory and Algorithms

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Cloud Computing and Resource Management

Physical Sciences → Computer Science → Information Systems

Distributed and Parallel Computing Systems

Physical Sciences → Computer Science → Computer Networks and Communications

Related Documents

Multi-objective Optimizations in Geo-Distributed Data Analytics Systems

Low Latency Geo-distributed Data Analytics

Network Cost-Aware Geo-Distributed Data Analytics System

DAG-Aware Optimization for Geo-Distributed Data Analytics