DISSERTATION

Performance Evaluation of Byzantine Fault Detection in Primary/Backup Systems

Abstract

ZooKeeper masks crash failures of servers to provide a highly available distributed coordination kernel; in production, however, not all failures are crash failures. Bugs in the underlying software and hardware can corrupt ZooKeeper replicas, leading to data loss. Because ZooKeeper is used as a ‘source of truth’ for mission-critical applications, it should handle such arbitrary faults to safeguard reliability. Byzantine fault-tolerant (BFT) protocols were developed to handle such faults, but they are not suitable for building practical systems because they are expensive in all important dimensions: development, deployment, complexity, and performance. ZooKeeper takes an alternative approach that focuses on detecting faulty behavior rather than tolerating it, thereby improving reliability without paying the full cost of BFT protocols. In this thesis, we studied various techniques for detecting non-malicious Byzantine faults in ZooKeeper and analyzed their impact on the reliability and performance of the overall system. Our evaluation shows that a real-time, digest-based fault detection technique can be deployed in production to improve reliability with a minimal performance penalty and no additional operational cost. We hope that our analysis and evaluation can help guide the design of next-generation primary-backup systems that aim to provide high reliability.

Keywords:
Backup; Byzantine fault tolerance; Computer science; Quantum Byzantine agreement; Replication (statistics); Reliability (semiconductor); Fault tolerance; Reliability engineering; Distributed computing; Software deployment; Software fault tolerance; Crash; Engineering; Operating system

Topics

Distributed systems and fault tolerance
Software System Performance and Reliability
IoT and Edge/Fog Computing
(all classified under Physical Sciences → Computer Science → Computer Networks and Communications)