BOOK-CHAPTER

Geo-Distributed Big Data Analytics Systems

Abstract

This chapter investigates the online optimal deployment of big data analytics jobs across geo-distributed regions when inter-datacenter bandwidths and task execution durations on different virtual machine (VM) types are unknown and uncertain. Geo-distributed big data analytics systems extend a single-cluster MapReduce, Spark, or parameter-server-based system to the Wide Area Network (WAN) in order to process data generated in different geographic locations. The alternative, centralized processing, is time-consuming because it transmits large volumes of data over bandwidth-constrained WAN links, and it is costly in resource consumption. The chapter provides an online learning-based algorithm that does not rely on offline training but learns near-optimal placement decisions for each type of job over time. Computing the task deployment for each stage of each job finishes within 1600 ms for 500 tasks, 10 data centers, and 9 VM types. The chapter also investigates the multiple-job scheduling problem with resource constraints under similar runtime uncertainties.
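The abstract does not specify the chapter's learning algorithm, so the following is only a minimal sketch of the general idea of learning placements online without offline training. It assumes a standard UCB1 multi-armed bandit in which each arm is a (data center, VM type) pair and the observed task duration is turned into a reward. The class `PlacementLearner`, the function `simulate`, the data center and VM names, the mean durations, and the `max_duration` normalization are all hypothetical and chosen purely for illustration.

```python
import math
import random


class PlacementLearner:
    """UCB1 bandit over candidate placements (illustrative assumption,
    not the chapter's algorithm)."""

    def __init__(self, placements):
        self.placements = list(placements)
        self.counts = {p: 0 for p in self.placements}
        self.mean_reward = {p: 0.0 for p in self.placements}
        self.total = 0

    def choose(self):
        # Try every placement once before applying the confidence bound.
        for p in self.placements:
            if self.counts[p] == 0:
                return p

        def ucb(p):
            # Exploration bonus shrinks as a placement is tried more often.
            bonus = math.sqrt(2.0 * math.log(self.total) / self.counts[p])
            return self.mean_reward[p] + bonus

        return max(self.placements, key=ucb)

    def update(self, placement, duration, max_duration):
        # Map the duration to a reward in [0, 1]; shorter runs score higher.
        reward = 1.0 - min(duration, max_duration) / max_duration
        self.counts[placement] += 1
        self.total += 1
        n = self.counts[placement]
        # Incremental update of the running mean reward.
        self.mean_reward[placement] += (reward - self.mean_reward[placement]) / n


def simulate(rounds=2000, seed=0):
    rng = random.Random(seed)
    # Hypothetical mean durations (seconds), unknown to the learner.
    true_mean = {
        ("dc-a", "small"): 45.0,
        ("dc-a", "large"): 20.0,
        ("dc-b", "small"): 60.0,
        ("dc-b", "large"): 35.0,
    }
    learner = PlacementLearner(true_mean)
    for _ in range(rounds):
        p = learner.choose()
        # Noisy observed duration stands in for runtime uncertainty.
        duration = max(0.0, rng.gauss(true_mean[p], 5.0))
        learner.update(p, duration, max_duration=100.0)
    return learner
```

Calling `simulate()` and inspecting `learner.counts` shows how the number of trials concentrates on the placement with the shortest mean duration as observations accumulate. A real system would also have to account for WAN transfer time between stages and for resource constraints, which this sketch ignores.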

Keywords:
Big data, Analytics, Computer science, Data science, Data analysis, Data mining

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 1
Citation Normalized Percentile: 0.05

Topics

Graph Theory and Algorithms
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Cloud Computing and Resource Management
Physical Sciences → Computer Science → Information Systems
Distributed and Parallel Computing Systems
Physical Sciences → Computer Science → Computer Networks and Communications
© 2026 ScienceGate Book Chapters — All rights reserved.