JOURNAL ARTICLE

GeoClone: Online Task Replication and Scheduling for Geo-Distributed Analytics under Uncertainties

Abstract

The completion time of an analytics job can be significantly inflated by its slowest tasks. Although task replication is widely adopted to reduce such straggler latency, existing replication strategies are unsuitable for geo-distributed analytics environments, which are highly dynamic, uncertain, and heterogeneous. In this paper, we first model the task replication and scheduling problem over time, capturing the features of geo-distributed analytics. We then design an online algorithm, GeoClone, that selects which tasks to replicate and which sites execute the replicas, making irrevocable decisions online by jointly considering the execution progress of each job and the resource performance at each site. We rigorously prove a competitive ratio for GeoClone, which gives a theoretical performance guarantee relative to the offline optimal algorithm that knows all inputs in advance. Finally, we implement GeoClone with Spark and YARN for experiments and conduct extensive large-scale simulations, which confirm GeoClone's practical superiority over multiple state-of-the-art replication strategies.
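To illustrate the straggler effect the abstract opens with, here is a minimal sketch. It is not GeoClone itself. It only assumes that a job finishes when its slowest task finishes, and that a replicated task finishes when its fastest copy finishes. The function name and the durations are hypothetical.

```python
def job_completion_time(task_copies):
    """Return the completion time of a job.

    task_copies is a list with one entry per task; each entry is a list of
    the finish times (in seconds) of that task's copies (original + replicas).
    """
    # A task is done when its fastest copy is done;
    # the job is done when its slowest task is done.
    return max(min(copies) for copies in task_copies)


# Hypothetical durations: the third task is a straggler at 45 s.
no_replication = [[10.0], [12.0], [45.0]]
# Same job, with one replica of the straggler placed at a faster site (14 s).
with_replica = [[10.0], [12.0], [45.0, 14.0]]

print(job_completion_time(no_replication))
print(job_completion_time(with_replica))
```

In this toy setting a single well-placed replica removes the straggler from the critical path. Choosing which task to replicate and where to run the copy, without knowing future inputs, is the online decision problem the paper addresses.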

Keywords:
Computer science; Replication (statistics); Distributed computing; Analytics; Scheduling (production processes); Latency (audio); Replicate; Task (project management); SPARK (programming language); Data science; Mathematical optimization

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 1.42
Refs: 45
Citation Normalized Percentile: 0.86

Topics

Cloud Computing and Resource Management
Physical Sciences →  Computer Science →  Information Systems
Optimization and Search Problems
Physical Sciences →  Computer Science →  Computer Networks and Communications
IoT and Edge/Fog Computing
Physical Sciences →  Computer Science →  Computer Networks and Communications

Related Documents

JOURNAL ARTICLE

Optimizing Geo-Distributed Data Analytics with Coordinated Task Scheduling and Routing

Laiping Zhao, Yanan Yang, Ali Munir, Alex X. Liu, Yue Li, Wenyu Qu

Journal: IEEE Transactions on Parallel and Distributed Systems, Year: 2019, Vol: 31 (2), Pages: 279-293
JOURNAL ARTICLE

Task Scheduling in Geo-Distributed Computing: A Survey

Y. Wu, Shanjiang Tang, Ce Yu, Yang Bin, Chao Sun, Jian Xiao, Hutong Wu, Jinghua Feng

Journal: IEEE Transactions on Parallel and Distributed Systems, Year: 2025, Vol: 36 (10), Pages: 2073-2088
JOURNAL ARTICLE

MapReduce Task Scheduling in Heterogeneous Geo-Distributed Data Centers

Xiaoping Li, Chen Fu-chao, Rubén Ruíz, Jie Zhu

Journal: IEEE Transactions on Services Computing, Year: 2021, Vol: 15 (6), Pages: 3317-3329
JOURNAL ARTICLE

Cost-Minimizing Online Algorithms for Geo-Distributed Data Analytics

Jiao Ying Huang, Jing Huang, Shang Gao, Bo Yang

Journal: IEEE Access, Year: 2019, Vol: 7, Pages: 163515-163525