Title: Software Architecture for Dynamic Thermal Management in Datacenters
1Software Architecture for Dynamic Thermal
Management in Datacenters
- Tridib Mukherjee
- Graduate Research Assistant
- IMPACT Lab (www.impact.asu.edu)
- Department of Computer Science and Engineering
- Arizona State University
2Outline
- Motivation
- Dynamic Thermal Management in Datacenters
- Thermal-aware task scheduling
- Software Architecture
- Conclusions and Future work
3Motivation
- Computing clusters are increasingly deployed in current datacenters limited by power and thermal capacity
- High server density to achieve higher computation capability
- Leads to high heat density
- Reliability and longevity of the overheated servers are affected
- System downtime may increase
- Rising cost for datacenters
- Large-scale datacenters can run into millions of dollars
- Cooling cost comprises almost half of this
- The current trend of overcooling based on worst-case thermal characteristics leads to high utility costs
- A dynamic thermal-aware control platform is
necessary for online thermal evaluation that can
achieve a tradeoff between these extremes.
4Thermal Management of Datacenter
- Motivation and significance
- Compute-intensive applications (online gaming, computer movie animation, data mining) require increased utilization of the datacenter
- Maximizing computing capacity is a demanding requirement
- New blade servers can be packed more densely
- Energy cost is rising dramatically
- Goal
- Improving thermal performance
- Lowering hardware failure rate
- Reducing energy cost
5Typical layout of a datacenter
- Rack outlet temperature Tout
- Rack inlet temperature Tin
- Air conditioner supply temperature Ts
6Schematic View of Thermal Management
7Research Issues of Thermal Management in
Datacenter
Control
Understanding
8Task scheduling and Thermal Distribution
Correlation
[Figure: task assignment determines the power consumption distribution, which in turn determines the temperature distribution, the demand for cooling load/energy, and the energy cost. Panels: inlet temperature distribution without cooling; with cooling, inlet temperatures are lowered below the redline threshold. Marked temperature: 25°C.]
- Scheduling Requirements
- Real-time measurement
- Online lightweight temperature prediction
- Thermal-awareness in the scheduling decisions
9Thermal-aware scheduling Techniques
- Uniform Task distribution (UT)
- Assigns all chassis the same amount of tasks (power consumption)
- Uniform Outlet Profile (UOP)
- Assigns tasks so as to balance outlet temperatures (uniform distribution)
- Minimum Computing Energy (coolest inlet) (MCE)
- Assigns tasks so as to keep the number of active (powered-on) chassis as small as possible
- Recirculation Minimized Scheduling (XInt)
- Uses a profiling process to calculate cross-interference coefficients
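A minimal sketch of two of the policies listed above (UT and MCE), to make the contrast concrete. The function names, the per-chassis `capacity` parameter, and the sample inlet temperatures are hypothetical and chosen for illustration only. UOP and XInt are omitted because they need a thermal model that these slides do not specify.

```python
from typing import Dict, List


def uniform_task(num_tasks: int, chassis: List[str]) -> Dict[str, int]:
    """UT: spread tasks as evenly as possible over all chassis."""
    base, extra = divmod(num_tasks, len(chassis))
    # The first `extra` chassis absorb the remainder, one task each.
    return {c: base + (1 if i < extra else 0) for i, c in enumerate(chassis)}


def min_computing_energy(num_tasks: int, inlet_temp: Dict[str, float],
                         capacity: int) -> Dict[str, int]:
    """MCE (coolest inlet): fill the coolest chassis first, so that as few
    chassis as possible have to stay powered on."""
    if num_tasks > capacity * len(inlet_temp):
        raise ValueError("not enough capacity for all tasks")
    assignment = {c: 0 for c in inlet_temp}
    remaining = num_tasks
    # Visit chassis from coolest to warmest inlet temperature.
    for c in sorted(inlet_temp, key=inlet_temp.get):
        if remaining == 0:
            break
        assignment[c] = min(capacity, remaining)
        remaining -= assignment[c]
    return assignment


# Hypothetical inlet temperatures (degrees Celsius) for four chassis.
inlets = {"c1": 21.5, "c2": 19.0, "c3": 23.0, "c4": 20.0}
ut = uniform_task(10, list(inlets))
mce = min_computing_energy(10, inlets, capacity=5)
```

With these sample values, UT keeps all four chassis active, while MCE packs the ten tasks onto the two chassis with the coolest inlets and leaves the other two idle.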
10Total Energy Cost Comparisons
11System Model Cluster Set-up
- Saguaro Cluster is the main cluster maintained by the High Performance Computing Initiative at ASU.
- 4 racks, 5 chassis per rack, 10 dual-processor nodes per chassis
12Cluster Management S/W Infrastructure
Moab Cluster Management GUI
- We used the Moab scheduler for job allocation in this cluster.
- Easy to use
- Provides a good graphical interface in the form of Moab Cluster Manager (MCM).
- Job re-allocation is allowed based on priority
- Makes use of the underlying resource management software (such as Torque) and enforces the scheduling policies (such as fair-share) selected from the GUI
- Thermal awareness is integrated into the Moab scheduler.
- Priority is set as a function of temperature, utilization, etc.
- PHP-based datacenter visualization.
Moab Server
Resource Management (Torque)
Data Center
13Chassis Level Sensor Data Collection
3 housing Temperature sensors at middle of the
chassis
- SNMP-based script periodically queries sensors and updates the server database
- PHP script periodically accesses the database to present the thermal history on the webpage
Sensor Placement at each chassis
11 outlet Temperature sensors at back of the
chassis
There is only one inlet sensor at the front of
the chassis
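The collection loop described on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the script used on the cluster: the table layout, the chassis names, and `fake_read` (a stand-in for the actual SNMP query, which the slides do not show) are all hypothetical, and an in-memory SQLite database stands in for the server database.

```python
import sqlite3
import time
from typing import Callable, Dict, List


def store_readings(db: sqlite3.Connection, chassis: str,
                   readings: Dict[str, float], timestamp: float) -> None:
    """Append one set of sensor readings for a chassis to the history table."""
    db.execute("CREATE TABLE IF NOT EXISTS temperature ("
               "ts REAL, chassis TEXT, sensor TEXT, celsius REAL)")
    db.executemany(
        "INSERT INTO temperature VALUES (?, ?, ?, ?)",
        [(timestamp, chassis, name, value) for name, value in readings.items()],
    )
    db.commit()


def poll(db: sqlite3.Connection, chassis_ids: List[str],
         read_sensors: Callable[[str], Dict[str, float]],
         rounds: int, interval_s: float = 0.0) -> None:
    """Query every chassis once per round and store the results."""
    for _ in range(rounds):
        now = time.time()
        for chassis in chassis_ids:
            store_readings(db, chassis, read_sensors(chassis), now)
        time.sleep(interval_s)


def fake_read(chassis: str) -> Dict[str, float]:
    """Hypothetical stand-in for the SNMP query of one chassis."""
    return {"inlet": 20.0, "housing_1": 30.0, "outlet_1": 35.0}


db = sqlite3.connect(":memory:")
poll(db, ["rack1-chassis1", "rack1-chassis2"], fake_read, rounds=2)
row_count = db.execute("SELECT COUNT(*) FROM temperature").fetchone()[0]
```

A visualization front end would then read the thermal history back from the same table, which is the role the PHP script plays in the setup described above.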
14Visualization and Scheduler Integration
- Temperature data is included as a Generic Metric (GMETRIC) in Moab.
- Node priority is set based on Moab GMETRIC data.
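As an illustration of setting node priority from metric data, the sketch below ranks nodes by a weighted combination of temperature and utilization. It is written in plain Python rather than Moab configuration. The linear form, the weights, the metric names, and the sample values are assumptions made for this example, not the priority function used in the deployment.

```python
from typing import Dict


def node_priority(metrics: Dict[str, float],
                  w_temp: float = 1.0, w_util: float = 10.0) -> float:
    """Higher value means more preferred: cooler, less utilized nodes score
    higher. The weights are hypothetical tuning knobs."""
    return -(w_temp * metrics["temp"] + w_util * metrics["utilization"])


# Hypothetical per-node metrics: temperature in degrees Celsius and
# utilization as a fraction between 0 and 1.
nodes = {
    "node01": {"temp": 24.0, "utilization": 0.20},
    "node02": {"temp": 21.0, "utilization": 0.60},
    "node03": {"temp": 27.5, "utilization": 0.10},
}

# Nodes ordered from most to least preferred for the next job.
ranked = sorted(nodes, key=lambda n: node_priority(nodes[n]), reverse=True)
```

With these sample values the ranking is node01, node02, node03: node02 is the coolest but loses to node01 because of its higher utilization.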
15Putting it all together Software Architecture
Presentation
Scheduling Control
Datacenter Servers
Access data from the chassis level sensors
16Modularized Implementation of Thermal Awareness
in Task Scheduling
17Conclusions
- Proposed Architecture
- Enables dynamic online thermal management during datacenter operation.
- Provides visualization of thermal distribution
- Implemented in the fully operational ASU datacenter.
- Prototype development and demonstration at the Research@Intel Day.
18Questions ??