Apache Spark is an open-source in-memory cluster-computing framework. Spark decomposes an application into many tasks and assigns them to computing nodes for higher efficiency. In heterogeneous environments, however, some tasks become stragglers because of poorly performing nodes, data skew, and similar causes. Stragglers can seriously degrade cluster performance, since a job completes only when its last task finishes. To mitigate stragglers, Spark uses speculative execution, which identifies slow tasks and picks a node on which to run a speculative copy; however, inaccurate identification and a simplistic backup strategy can further extend execution time. We therefore propose an improved speculative strategy, DBMTPE (Data-Based Multiple Phases Time Estimation), which detects stragglers by estimating their remaining time and chooses an appropriate way to run the speculative copy according to the cause of the slowdown. Experimental results show that DBMTPE runs applications up to 10.5% faster than native Spark while also saving computing resources.
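The abstract describes detecting stragglers by estimating each task's remaining time. The following is a minimal illustrative sketch of that idea, not the actual DBMTPE algorithm: it assumes a constant progress rate per task and flags tasks whose estimated remaining time exceeds a multiple of the median (the function names, task representation, and the `slow_factor` threshold are all hypothetical).

```python
def remaining_time(progress, elapsed):
    """Estimate time left, assuming a constant progress rate.

    progress: fraction of work completed in [0, 1]
    elapsed:  seconds the task has been running
    """
    if progress <= 0:
        return float("inf")  # no progress yet; cannot estimate
    rate = progress / elapsed
    return (1.0 - progress) / rate


def find_stragglers(tasks, slow_factor=1.5):
    """Flag tasks whose estimated remaining time exceeds
    slow_factor times the median estimate across all tasks.

    tasks: dict mapping task id -> (progress, elapsed)
    """
    est = {tid: remaining_time(p, t) for tid, (p, t) in tasks.items()}
    finite = sorted(v for v in est.values() if v != float("inf"))
    if not finite:
        return []
    median = finite[len(finite) // 2]
    return [tid for tid, v in est.items() if v > slow_factor * median]


# Example: t3 has made little progress for its elapsed time,
# so its estimated remaining time is far above the median.
tasks = {
    "t1": (0.9, 9.0),
    "t2": (0.8, 8.0),
    "t3": (0.2, 9.0),
}
print(find_stragglers(tasks))  # → ['t3']
```

A full strategy such as DBMTPE would refine this with per-phase time estimates and would also decide where to launch the speculative copy, which this sketch omits.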
Xiaohan Huang, Chunlin Li, Youlong Luo
YANG Zhiwei, ZHENG Quan, WANG Song, YANG Jian, ZHOU Lele