Optimizing communication and cooling costs in HPC data centers via intelligent job allocation

Fulya Kaplan; Jie Meng; Ayse K. Coskun

doi:10.1109/igcc.2013.6604521

JOURNAL ARTICLE

Year: 2013 Pages: 1-10

Abstract

Nearly half of the energy in the computing clusters today is consumed by the cooling infrastructure. It is possible to reduce the cooling cost by allowing the data center temperatures to rise; however, component reliability constraints impose thermal thresholds as failure rates are exponentially dependent on the processor temperatures. Existing thermally-aware job allocation policies optimize the cooling costs by minimizing the peak inlet temperatures of the server nodes. An important constraint in high performance computing (HPC) data centers, however, is performance. Specifically, HPC data centers run multi-threaded applications with significant communication among the threads. Thus, performance of such applications is strongly affected by the job allocation decisions. This paper proposes a novel job allocation methodology to jointly minimize communication cost of an HPC application while also reducing the cooling energy. The proposed method also considers temperature-dependent hardware reliability as part of the optimization.
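The trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' method: it is a toy exhaustive search, assuming a linear heat-recirculation model for inlet temperatures and a pairwise node-distance matrix as the communication cost. All names (`allocate`, `peak_inlet`, `comm_cost`, `recirc`, `distance`, `alpha`) and all numbers are hypothetical, and the reliability term mentioned in the abstract is omitted.

```python
from itertools import combinations


def peak_inlet(power, supply_temp, recirc):
    """Peak inlet temperature under an ASSUMED linear recirculation model:
    inlet[i] = supply_temp + sum_j recirc[i][j] * power[j]."""
    n = len(power)
    return max(
        supply_temp + sum(recirc[i][j] * power[j] for j in range(n))
        for i in range(n)
    )


def comm_cost(nodes, distance):
    """Sum of pairwise distances between the nodes assigned to one job."""
    return sum(distance[a][b] for a, b in combinations(nodes, 2))


def allocate(job_size, job_power, base_power, supply_temp,
             recirc, distance, free, alpha=0.5):
    """Try every set of job_size free nodes and keep the one with the lowest
    weighted sum of communication cost and peak inlet temperature.
    Exhaustive search is exponential in job_size; it is used here only
    because the example is tiny."""
    best, best_score = None, float("inf")
    for nodes in combinations(sorted(free), job_size):
        power = list(base_power)
        for n in nodes:
            power[n] += job_power  # extra load placed on the chosen nodes
        score = (alpha * comm_cost(nodes, distance)
                 + (1 - alpha) * peak_inlet(power, supply_temp, recirc))
        if score < best_score:
            best, best_score = nodes, score
    return best, best_score


# Hypothetical 4-node rack laid out in a line; nodes 0 and 1 recirculate more heat.
distance = [[abs(i - j) for j in range(4)] for i in range(4)]
recirc = [
    [0.02, 0.01, 0.0, 0.0],
    [0.01, 0.02, 0.0, 0.0],
    [0.0, 0.0, 0.005, 0.0],
    [0.0, 0.0, 0.0, 0.005],
]
chosen, score = allocate(
    job_size=2, job_power=100, base_power=[100] * 4, supply_temp=20,
    recirc=recirc, distance=distance, free={0, 1, 2, 3}, alpha=0.5,
)
print(chosen)  # (2, 3): adjacent nodes in the cooler part of the rack
```

With `alpha=1.0` the search considers communication only and returns the first adjacent pair it finds; with `alpha=0.0` it considers the thermal term only. Intermediate values weigh the two objectives against each other, which is the kind of joint optimization the abstract describes at a high level.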

Keywords:

Computer science; Reliability (semiconductor); Data center; Component (thermodynamics); Distributed computing; Efficient energy use; Constraint (computer-aided design); Reliability engineering; Real-time computing; Computer network; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 4.09

Citation Normalized Percentile: 0.94

Topics

Cloud Computing and Resource Management

Physical Sciences → Computer Science → Information Systems

Distributed and Parallel Computing Systems

Physical Sciences → Computer Science → Computer Networks and Communications

Parallel Computing and Optimization Techniques

Physical Sciences → Computer Science → Hardware and Architecture

Related Documents

Communication and cooling aware job allocation in data centers for communication-intensive workloads

Optimizing resource allocation in intelligent communication networks

Optimizing Resource Allocation in Hierarchically Distributed Data Centers

Optimizing Network-Aware Resource Allocation in Cloud Data Centers

Optimizing Data Centre Energy Efficiency with Dynamic Resource Allocation and Intelligent Cooling Management through Machine Learning