JOURNAL ARTICLE

Failure Analysis and Characterization of Scheduling Jobs in Google Cluster Trace

Abstract

Most public and private cloud providers have experienced a failure in at least one of their services, and such failures can affect numerous applications and websites. Failure analysis is therefore one of the most critical steps in understanding the causes of different types of failures and remediating them. It relies on monitoring the most significant system metrics in order to study behavior and frequency changes in the system. The monitored data are then stored in log files for use in analysis and prediction tasks. In this paper, we focus primarily on analyzing and interpreting the characteristic behavior of finished and failed jobs in relation to the physically available resources, using a publicly available dataset, the Google cluster trace. The primary objective of our work is to improve the understanding of job failure in cloud computing environments. Our results show a clear correlation between failed jobs and requested resources, including memory, CPU, and disk space. Based on these results, we find that many techniques can be applied to increase the reliability and availability of cloud applications, such as developing scheduling algorithms, predicting job failure, limiting task resubmission, or changing priority policies.
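As a rough illustration of the kind of relationship the abstract describes, the sketch below measures how strongly a binary "failed" flag is associated with each requested resource, using a plain Pearson correlation (equivalent to a point-biserial correlation when one variable is binary). The abstract does not state which statistic the authors used, so this is an assumption. The records are synthetic values made up for this example, not data from the Google cluster trace, and the names `JOBS`, `pearson`, and `failure_correlations` are hypothetical.

```python
from math import sqrt

# Synthetic, made-up job records for illustration only; NOT taken from the trace.
# Each tuple: (requested_cpu, requested_memory, requested_disk, failed)
JOBS = [
    (0.10, 0.05, 0.01, False),
    (0.20, 0.10, 0.02, False),
    (0.15, 0.08, 0.01, False),
    (0.60, 0.40, 0.05, True),
    (0.70, 0.55, 0.06, True),
    (0.30, 0.20, 0.02, False),
    (0.80, 0.50, 0.08, True),
    (0.25, 0.45, 0.03, True),
]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def failure_correlations(jobs):
    """Correlate the failed flag (as 0/1) with each requested resource."""
    failed = [1.0 if job[3] else 0.0 for job in jobs]
    names = ("cpu", "memory", "disk")
    return {
        name: pearson([job[i] for job in jobs], failed)
        for i, name in enumerate(names)
    }


if __name__ == "__main__":
    for name, r in failure_correlations(JOBS).items():
        print(f"{name}: r = {r:+.2f}")
```

A positive coefficient here only says that, in this toy sample, jobs requesting more of a resource failed more often. It does not establish causation, and results on the real trace would depend on how job records are filtered and aggregated.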

Keywords:
Computer science; Cloud computing; Scheduling (production processes); TRACE (psycholinguistics); Limiting; Reliability (semiconductor); Cluster (spacecraft); Task (project management); Distributed computing; Operating system; Engineering

Metrics

Cited By: 30
FWCI (Field Weighted Citation Impact): 4.46
Refs: 9
Citation Normalized Percentile: 0.95

Topics

Cloud Computing and Resource Management
Physical Sciences →  Computer Science →  Information Systems
IoT and Edge/Fog Computing
Physical Sciences →  Computer Science →  Computer Networks and Communications
Software System Performance and Reliability
Physical Sciences →  Computer Science →  Computer Networks and Communications