JOURNAL ARTICLE

An Improved Speculative Strategy for Heterogeneous Spark Cluster

Pengfei Zhang, Zonghuai Guo

Year: 2018 Journal: MATEC Web of Conferences Vol: 173 Pages: 01018-01018 Publisher: EDP Sciences

Abstract

Apache Spark is an open-source, in-memory cluster-computing framework. Spark decomposes an application into numerous tasks and assigns them to computing nodes to run in parallel. In heterogeneous environments, however, some tasks become stragglers because of slow computing nodes, data skew, and other causes. Stragglers can seriously degrade cluster performance, since a job completes only when its last task completes. To mitigate stragglers, Spark uses speculative execution, which identifies slow tasks and picks a node on which to run a speculative (backup) copy, but its low identification accuracy and simplistic backup placement can extend the execution time further. We therefore develop an improved speculative strategy, DBMTPE (Data-Based Multiple Phases Time Estimation), which selects stragglers by estimating their remaining time and chooses how to run each speculative task according to the cause of the slowdown. Experimental results show that DBMTPE runs applications up to 10.5% faster than Spark-Native while also saving computing resources.
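The contrast the abstract draws can be illustrated with a minimal sketch. The first function is a simplified version of the threshold rule behind Spark's built-in speculation (the `spark.speculation.quantile` and `spark.speculation.multiplier` settings, with defaults 0.75 and 1.5). The second is a generic progress-rate estimate of remaining time. It is not the DBMTPE formula, which the abstract does not give, and all function and parameter names here are hypothetical.

```python
from statistics import median


def native_speculation_candidates(running, finished_durations, total_tasks,
                                  quantile=0.75, multiplier=1.5):
    """Simplified sketch of a threshold-based speculation check.

    running: dict mapping task id -> elapsed seconds for unfinished tasks.
    finished_durations: durations (seconds) of successfully finished tasks.
    A task is flagged once enough tasks have finished and its elapsed
    time exceeds multiplier * median(finished durations).
    """
    if not finished_durations or len(finished_durations) < quantile * total_tasks:
        return []  # too few finished tasks to judge what "slow" means
    threshold = multiplier * median(finished_durations)
    return sorted(t for t, elapsed in running.items() if elapsed > threshold)


def estimated_remaining(elapsed, progress):
    """Generic remaining-time estimate, assuming a constant progress rate.

    progress is the fraction of work done, in (0, 1]. This is an
    illustrative assumption, not the estimator used by DBMTPE.
    """
    if progress <= 0:
        return float("inf")
    return elapsed * (1.0 - progress) / progress


if __name__ == "__main__":
    running = {"t1": 10.0, "t2": 40.0}
    print(native_speculation_candidates(running, [10.0, 12.0, 14.0], total_tasks=4))
    print(estimated_remaining(elapsed=30.0, progress=0.25))
```

The threshold rule looks only at elapsed time, so a task that started late or processes more data can be misjudged. A remaining-time estimate ranks tasks by how long they are still expected to run, which is the kind of signal the abstract says DBMTPE relies on.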

Keywords:
SPARK (programming language); Computer science; Speculative execution; Task (project management); Node (physics); Skew; Cluster (spacecraft); Big data; Distributed computing; Resource (disambiguation); Operating system; Computer network

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 1.34
Refs: 4
Citation Normalized Percentile: 0.85

Topics

Cloud Computing and Resource Management
Physical Sciences →  Computer Science →  Information Systems
Data Stream Mining Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Software System Performance and Reliability
Physical Sciences →  Computer Science →  Computer Networks and Communications

Related Documents

JOURNAL ARTICLE

A Spark Scheduling Strategy for Heterogeneous Cluster

Xuewen Zhang

Journal: CMC-Computers, Materials & Continua Year: 2018 Vol: 55 (3) Pages: 405-417
JOURNAL ARTICLE

Adaptive Scheduling Strategy for Heterogeneous Spark Cluster

佳俊 徐

Journal: Computer Science and Application Year: 2016 Vol: 06 (11) Pages: 692-704
JOURNAL ARTICLE

Adaptive Task Scheduling Strategy for Heterogeneous Spark Cluster

YANG Zhiwei, ZHENG Quan, WANG Song, YANG Jian, ZHOU Lele

Journal: DOAJ (DOAJ: Directory of Open Access Journals) Year: 2016
JOURNAL ARTICLE

Optimizing Speculative Execution in Spark Heterogeneous Environments

Zhongming Fu, Zhuo Tang

Journal: IEEE Transactions on Cloud Computing Year: 2019 Vol: 10 (1) Pages: 568-582