Software Architecture for Dynamic Thermal Management in Datacenters

1
Software Architecture for Dynamic Thermal
Management in Datacenters
  • Tridib Mukherjee
  • Graduate Research Assistant
  • IMPACT Lab (www.impact.asu.edu)
  • Department of Computer Science and Engineering
  • Arizona State University

2
Outline
  • Motivation
  • Dynamic Thermal Management in Datacenters
  • Thermal-aware task scheduling
  • Software Architecture
  • Conclusions and Future work

3
Motivation
  • Computing clusters are increasingly deployed in
    current datacenters limited by power and thermal
    capacity
  • High server density, used to achieve higher computation
    capability, leads to high heat density
  • Reliability and longevity of overheated servers are
    affected; system downtime may increase
  • Rising cost for datacenters
  • Running a large-scale datacenter can cost millions of
    dollars; cooling accounts for almost half of this
  • The current trend of overcooling based on worst-case
    thermal characteristics leads to high utility costs
  • A dynamic thermal-aware control platform is
    necessary for online thermal evaluation that can
    achieve a tradeoff between these extremes.

4
Thermal Management of Datacenter
  • Motivation and significance
  • Compute-intensive applications (online gaming,
    computer movie animation, data mining) require
    increased utilization of the datacenter
  • Maximizing computing capacity is a demanding
    requirement
  • New blade servers can be packed more densely
  • Energy cost is rising dramatically
  • Goal
  • Improving thermal performance
  • Lowering hardware failure rate
  • Reducing energy cost

5
Typical layout of a datacenter
  • Rack outlet temperature Tout
  • Rack inlet temperature Tin
  • Air conditioner supply temperature Ts

6
Schematic View of Thermal Management

7
Research Issues of Thermal Management in Datacenter
[Figure labels: Control; Understanding]

8
Task Scheduling and Thermal Distribution
Correlation
  • Reaction Chain
  • Task assignment → Power consumption distribution →
    Temperature distribution → Demand for cooling
    load/energy → Energy cost
  • Scheduling Requirements
  • Real-time measurement
  • Online lightweight temperature prediction
  • Thermal-awareness in the scheduling decisions

[Figure: inlet temperature distribution without cooling; cooling lowered, inlet temperature lowered below redline threshold; 25°C level marked]

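As an illustration of what "online lightweight temperature prediction" could look like, here is a minimal sketch that fits a straight line to the last few inlet readings and extrapolates one step ahead. The slides do not say which predictor was used; the class name, the window size, and the use of 25°C as the redline (taken from the figure label) are all assumptions made for this example.

```python
from collections import deque


class InletPredictor:
    """Hypothetical lightweight predictor: least-squares line over a short window."""

    def __init__(self, window=5):
        # Only the most recent `window` samples are kept, so memory use is constant.
        self.samples = deque(maxlen=window)

    def update(self, temp_c):
        """Record one real-time inlet measurement in degrees Celsius."""
        self.samples.append(float(temp_c))

    def predict(self, steps_ahead=1):
        """Extrapolate the fitted line `steps_ahead` sampling periods forward."""
        n = len(self.samples)
        if n == 0:
            raise ValueError("no samples recorded yet")
        if n == 1:
            return self.samples[0]
        mean_x = (n - 1) / 2
        mean_y = sum(self.samples) / n
        sxx = sum((x - mean_x) ** 2 for x in range(n))
        sxy = sum((x - mean_x) * (y - mean_y)
                  for x, y in enumerate(self.samples))
        slope = sxy / sxx
        return mean_y + slope * ((n - 1 + steps_ahead) - mean_x)


def exceeds_redline(predicted_c, redline_c=25.0):
    """Assumed redline of 25 C; a scheduler could avoid nodes where this is True."""
    return predicted_c > redline_c


predictor = InletPredictor(window=3)
for reading in (20.0, 21.0, 22.0):
    predictor.update(reading)
next_temp = predictor.predict()
```

A scheduler could call `exceeds_redline(predictor.predict())` per chassis before placing a task there, which is one way to bring thermal awareness into the scheduling decision.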
9
Thermal-aware scheduling Techniques
  • Uniform Task distribution (UT)
  • Assigning all chassis the same amount of tasks
    (power consumptions)
  • Uniform Outlet Profile (UOP)
  • Assigning tasks in a way trying to achieve outlet
    temperature balance (uniform distribution)
  • Minimum Computing Energy (coolest inlet) (MCE)
  • Assigning tasks in a way to keep the number of
    active (power on) chassis as small as possible
  • Recirculation Minimized Scheduling (XInt)
  • Use profiling process to calculate cross
    interference coefficients

10
Total Energy Cost Comparisons

11
System Model: Cluster Set-up
  • Saguaro Cluster is the main cluster maintained
    by the High Performance Computing Initiative at
    ASU.
  • 4 racks, 5 chassis per rack, 10 dual-processors
    per chassis

12
Cluster Management S/W Infrastructure
Moab Cluster Management GUI
  • We used the Moab scheduler for job allocation in this
    cluster.
  • Easy to use
  • Provides a good graphical interface in the form of
    Moab Cluster Manager (MCM).
  • Job re-allocation is allowed based on priority
  • Makes use of the underlying resource management
    software (such as Torque) and enforces the
    scheduling policies (such as fair-share) selected
    from the GUI
  • Thermal awareness is integrated into the Moab
    Scheduler.
  • Priority is set as a function of temperature,
    utilization, etc.
  • PHP-based datacenter visualization.

[Figure labels: Moab Server; Resource Management (Torque); Data Center]

13
Chassis Level Sensor Data Collection
  • SNMP-based script periodically queries sensors
    and updates server database
  • PHP script periodically accesses the database to
    present the thermal history on the webpage

Sensor placement at each chassis
  • 1 inlet temperature sensor at the front of the chassis
  • 3 housing temperature sensors at the middle of the chassis
  • 11 outlet temperature sensors at the back of the chassis

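The collection step described on this slide might be sketched as follows. The real deployment queried sensors over SNMP and served the history through PHP; here the SNMP query is replaced by a hypothetical stub (`read_chassis_sensors`) returning fixed values, and SQLite stands in for the unnamed server database, so the table layout and all names are assumptions for illustration only.

```python
import sqlite3
import time


def read_chassis_sensors(chassis_id):
    """Hypothetical stand-in for the SNMP query: 1 inlet, 3 housing, 11 outlet."""
    readings = {"inlet": 19.5}
    readings.update({f"housing{i}": 27.0 for i in range(1, 4)})
    readings.update({f"outlet{i}": 33.0 for i in range(1, 12)})
    return readings


def init_db(conn):
    """Create the (assumed) table that holds the thermal history."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(ts REAL, chassis TEXT, sensor TEXT, temp_c REAL)"
    )


def poll_once(conn, chassis_ids, reader=read_chassis_sensors, now=None):
    """Query every chassis once and append the readings to the database."""
    ts = time.time() if now is None else now
    rows = [(ts, chassis, sensor, temp)
            for chassis in chassis_ids
            for sensor, temp in reader(chassis).items()]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def latest_inlet(conn, chassis):
    """Most recent inlet temperature for one chassis, or None if none stored."""
    row = conn.execute(
        "SELECT temp_c FROM readings WHERE chassis = ? AND sensor = 'inlet' "
        "ORDER BY ts DESC LIMIT 1",
        (chassis,),
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
init_db(conn)
stored = poll_once(conn, ["rack1-chassis1", "rack1-chassis2"])
```

Running `poll_once` from a periodic job (cron or a sleep loop) would mirror the "periodically queries sensors" behavior; a web front end would then read from the same table.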
14
Visualization and Scheduler Integration
  • Temperature data is included as Generic Metric
    (GMETRIC) in Moab.
  • Node priority is set based on Moab GMETRIC data.
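
The slides state that priority is a function of temperature, utilization, and other factors, but do not give the function. The sketch below shows one plausible shape for such a function, computed outside Moab: cooler and less loaded nodes score higher. The weights, the 25°C redline, and the function names are assumptions, and this is not Moab configuration syntax.

```python
def node_priority(inlet_temp_c, utilization,
                  redline_c=25.0, w_temp=0.7, w_util=0.3):
    """Hypothetical priority: higher for nodes with thermal headroom and idle capacity.

    inlet_temp_c: temperature metric reported for the node (degrees Celsius).
    utilization:  fraction of the node in use, clamped to [0, 1].
    """
    headroom = max(0.0, redline_c - inlet_temp_c) / redline_c
    idle = 1.0 - min(max(utilization, 0.0), 1.0)
    return w_temp * headroom + w_util * idle


def rank_nodes(metrics):
    """Order node names from most to least preferred for the next job."""
    return sorted(metrics,
                  key=lambda name: node_priority(*metrics[name]),
                  reverse=True)


metrics = {
    "n1": (24.0, 0.2),  # warm, lightly loaded
    "n2": (18.0, 0.2),  # cool, lightly loaded
    "n3": (18.0, 0.9),  # cool, heavily loaded
}
order = rank_nodes(metrics)
```

Weighting temperature more heavily than utilization is a design choice made here for the example; a deployment would tune such weights against measured thermal behavior.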

15
Putting It All Together: Software Architecture
[Figure labels: Presentation; Scheduling Control; Datacenter Servers (access data from the chassis-level sensors)]

16
Modularized Implementation of Thermal Awareness
in Task Scheduling

17
Conclusions
  • Proposed Architecture
  • enables dynamic on-line thermal management
    during datacenter operation.
  • provides visualization of thermal distribution
  • Implemented in fully operational ASU datacenter.
  • Prototype development and demonstration at the
    Research@Intel Day.

18
Questions?