JOURNAL ARTICLE

Performance Implications of SSDs in Virtualized Hadoop Clusters

Abstract

Big data involves massive volumes of data for which traditional techniques are not effective. Apache Hadoop is currently one of the most popular software frameworks supporting big data analysis. As Hadoop clusters grow larger, building them in virtualized environments is drawing great attention. However, optimizing the performance of a Hadoop cluster in a virtualized environment is difficult because of the virtualization overhead. In this paper, the performance implications of SSDs in virtualized Hadoop clusters are identified, and the overhead of virtualization is shown to be minimized with SSDs. The study reveals that the main virtualization overhead is an I/O bottleneck caused by a fragmented and randomized I/O workload, which virtualization aggravates. SSDs, however, tolerate this workload better than HDDs. As a result, the virtualization overhead with SSDs is much lower than with HDDs. Moreover, with SSDs, the virtualized Hadoop cluster sustains good performance regardless of the number of VMs.

Keywords:
Virtualization, Computer science, Operating system, Bottleneck, Big data, Overhead (engineering), Workload, Cluster (spacecraft), Virtual machine, Embedded system, Cloud computing

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 1.61
Refs: 16
Citation Normalized Percentile: 0.88

Topics

Cloud Computing and Resource Management
Physical Sciences →  Computer Science →  Information Systems
IoT and Edge/Fog Computing
Physical Sciences →  Computer Science →  Computer Networks and Communications
Advanced Data Storage Technologies
Physical Sciences →  Computer Science →  Computer Networks and Communications