Using Multiple Energy Gears in MPI Programs on a Power-Scalable Cluster

Transcript and Presenter's Notes


1
Using Multiple Energy Gears in MPI Programs on a
Power-Scalable Cluster
  • Vincent W. Freeh, David K. Lowenthal, Feng Pan,
    and Nandini Kappiah
  • Presented by Huaxia Xia
  • CSAG, CSE of UCSD

2
Introduction
  • Power-aware Computing
  • HPC Uses Large-scale Systems, Has High Power
    Consumption
  • Two extremes
  • Performance-at-all-costs
  • Low-performance but more energy efficient
  • This paper aims to save energy with little
    performance penalty

3
Related Work
  • Server/Desktop Systems
  • Minimize the number of servers needed to handle
    the load, and set other servers into low-energy
    state (standby or power-off)
  • Set node voltage independently
  • Disk
  • Modulate the speed of disks dynamically
  • Improve cache policy
  • Aggregate disk accesses to have burst requests
  • Mobile Systems
  • Energy-aware OS
  • Voltage-changeable CPU
  • Disk spindown
  • Memory
  • Network

4
Assumptions
  • HPC Applications
  • Performance is the Primary Concern
  • Highly Regular and Predictable
  • CPU has Multiple Gears
  • Variable Frequency
  • Variable Voltage
  • CPU is a Major Power Consumer
  • Energy consumption of disks/memory/network is
    not considered

5
Methodology: Profile-Directed
  • Get Program Trace
  • Divide the Program into Blocks
  • Merge the Blocks into Phases
  • Search the Best Gear for Each Phase Heuristically

6
Divide Codes into Blocks
  • Rule 1: Any MPI operation demarcates a block
    boundary.
  • Rule 2: If the memory pressure changes abruptly,
    a block boundary occurs at this change.
  • Use operations per miss (OPM) as a measure of the
    memory pressure
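
The two rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `Sample` record, the `split_into_blocks` name, and the `jump` factor used to decide what counts as an "abrupt" change are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    """One trace record (hypothetical layout, not the paper's format)."""
    is_mpi: bool      # True if this record is an MPI operation
    ops: int = 0      # operations executed in the interval
    misses: int = 0   # L2 misses in the interval

def opm(ops, misses):
    """Operations per miss; a low value means high memory pressure."""
    return ops / max(misses, 1)

def split_into_blocks(trace, jump=2.0):
    """Split a trace into blocks of compute samples.

    Rule 1: every MPI operation ends the current block.
    Rule 2: a block also ends when OPM changes by more than `jump`x
            (the factor is an assumed stand-in for "abruptly").
    """
    blocks, current, prev = [], [], None
    for s in trace:
        if s.is_mpi:                       # Rule 1
            if current:
                blocks.append(current)
            current, prev = [], None
            continue
        value = opm(s.ops, s.misses)
        if prev is not None and (value > jump * prev or prev > jump * value):
            blocks.append(current)         # Rule 2
            current = []
        current.append(s)
        prev = value
    if current:
        blocks.append(current)
    return blocks

# Made-up trace: OPM values 100, 110, 10, then an MPI call, then 100.
trace = [Sample(False, 1000, 10), Sample(False, 1100, 10),
         Sample(False, 1000, 100), Sample(True), Sample(False, 500, 5)]
block_sizes = [len(b) for b in split_into_blocks(trace)]
```

With this made-up trace, the drop from an OPM of 110 to 10 and the MPI call each introduce a boundary, giving three blocks.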

7
Merge Blocks into Phases
  • Two adjacent blocks are merged into a phase if
    their corresponding memory pressure is within the
    same threshold
  • Figure: OPM in the trace of LU (Class C)
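
One possible reading of the merge rule is that OPM values are binned into bands by a set of thresholds, and adjacent blocks in the same band form one phase. The sketch below follows that reading; the threshold values and the `merge_into_phases` name are assumptions for illustration, not values from the paper.

```python
from bisect import bisect_right

def merge_into_phases(block_opms, thresholds=(10, 100, 1000)):
    """Group adjacent blocks whose OPM falls in the same threshold band.

    block_opms: OPM of each block, in program order.
    thresholds: assumed band edges (hypothetical values).
    Returns a list of phases, each a list of block indices.
    """
    phases = []
    prev_band = None
    for index, value in enumerate(block_opms):
        band = bisect_right(thresholds, value)   # which band this OPM is in
        if band == prev_band:
            phases[-1].append(index)             # same band: extend the phase
        else:
            phases.append([index])               # band changed: new phase
        prev_band = band
    return phases

# Made-up OPM values for six consecutive blocks.
phases = merge_into_phases([5, 8, 50, 60, 2000, 7])
```

Here the six blocks collapse into four phases: blocks 0-1, blocks 2-3, block 4, and block 5. Only adjacent blocks merge, so block 5 starts a new phase even though it shares a band with blocks 0-1.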

8
Data Collection
  • Use MPI-jack
  • Intercept any MPI call transparently
  • Can execute arbitrary code before/after an
    intercepted call
  • Insert pseudo MPI calls at non-MPI phase
    boundaries
  • Collect time, operation counts, and L2
    misses
  • Question: Mutual dependence?
  • Trace data ↔ Block boundaries
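
The general idea of transparent interception with code before and after each call can be illustrated with a small Python wrapper. This is not MPI-jack's API and does not call MPI; `intercept`, `trace_log`, and `fake_send` are hypothetical names used only to show the pattern.

```python
import time
from functools import wraps

trace_log = []  # hypothetical in-memory trace: (call name, start, end)

def intercept(fn):
    """Wrap fn so that extra code runs before and after every call."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()            # "before" hook
        try:
            return fn(*args, **kwargs)         # the intercepted call itself
        finally:
            end = time.perf_counter()          # "after" hook
            trace_log.append((fn.__name__, start, end))
    return wrapper

@intercept
def fake_send(payload):
    """Stand-in for a communication call; not a real MPI binding."""
    return len(payload)

result = fake_send(b"abc")
```

The caller sees the same return value as before, while each call leaves a timestamped record. A real tool would also read hardware counters (operations, L2 misses) in the hooks.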

9
Solution Search (1)
  • Metric: Energy-Time Tradeoff
  • Normalized energy and time
  • Total system energy
  • A larger negative slope (a near-vertical line on
    the energy-time plot) indicates a significant
    energy saving for little extra time
  • Question: How to measure energy consumption
    accurately?
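
As a worked illustration of the metric, the sketch below normalizes time and energy to the fastest gear and computes the slope between consecutive gears. The measurements are invented purely for the example, and the function names are assumptions.

```python
import math

def normalize(points):
    """points: (time, energy) per gear, fastest gear first.
    Returns the same points relative to the fastest gear."""
    t0, e0 = points[0]
    return [(t / t0, e / e0) for t, e in points]

def slopes(norm):
    """Slope (change in energy over change in time) between adjacent gears."""
    out = []
    for (t1, e1), (t2, e2) in zip(norm, norm[1:]):
        dt, de = t2 - t1, e2 - e1
        # A zero time change gives a vertical line; keep the sign of de.
        out.append(de / dt if dt else math.copysign(math.inf, de))
    return out

# Invented measurements (seconds, joules) for three gears.
measured = [(100.0, 10000.0), (102.0, 9000.0), (110.0, 8800.0)]
norm = normalize(measured)
tradeoff = slopes(norm)
```

With these invented numbers, the first step down saves 10% energy for 2% extra time (slope about -5), while the second saves only 2% more for 8% more time (slope about -0.25). The first step is therefore the much better trade.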

10
Solution Search (2)
  • Phase Prioritization
  • Sort the phases in order of OPM (low → high)
  • Question: Why is sorting necessary?
  • Novel Heuristic Search
  • Find the locally optimal gear for each phase, one
    phase at a time
  • Running time is at most n × g (n phases, g gears)
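
A greedy search of this kind might look like the sketch below. It is one reading of the slide, not the authors' exact algorithm: phases are visited in order of increasing OPM, each phase is slowed down one gear at a time until the cost stops improving, and earlier choices are then frozen. The `evaluate` callback and the toy cost model are hypothetical; in practice each evaluation would be a measured program run.

```python
def greedy_gear_search(phase_opm, num_gears, evaluate):
    """phase_opm: {phase name: OPM}. Gear 0 is the fastest.
    evaluate(assignment) -> cost (lower is better); assumed to be supplied
    by the user. Returns (assignment, number of evaluations)."""
    assignment = {phase: 0 for phase in phase_opm}
    best = evaluate(assignment)
    runs = 1
    for phase in sorted(phase_opm, key=phase_opm.get):   # low OPM first
        for gear in range(1, num_gears):
            trial = {**assignment, phase: gear}
            cost = evaluate(trial)
            runs += 1
            if cost >= best:
                break                # slower gear did not help this phase
            assignment, best = trial, cost
    return assignment, runs

# Toy cost model (invented): each phase has an ideal gear.
ideal = {"A": 2, "B": 0, "C": 1}
def toy_cost(assignment):
    return sum(abs(gear - ideal[p]) for p, gear in assignment.items())

choice, runs = greedy_gear_search({"A": 5, "B": 500, "C": 50}, 4, toy_cost)
```

The search makes one baseline evaluation plus at most g - 1 trials per phase, so it never exceeds n × g evaluations. An exhaustive search over all gear combinations would need g^n.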

11
Solution Search (3)

12
Experiments
  • 10 AMD Athlon-64 CPUs
  • Frequency-scalable: 800-2000 MHz
  • Voltage-scalable: 0.9-1.5 V
  • 1 GB main memory
  • 128 KB L1 cache, 512 KB L2 cache
  • 100 Mb/s network
  • CPU consumes 45-55% of overall system energy
  • Benchmarks: NAS Parallel Benchmarks (NPB)
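
To get a feel for why a lower gear saves energy, the sketch below applies the standard first-order CMOS model, in which dynamic power scales with V² × f. Two assumptions are made here that the slides do not state: that this model is a fair approximation for the CPU, and that the lowest frequency is paired with the lowest voltage. It covers CPU dynamic power only, not total system energy.

```python
def relative_dynamic_power(v, f, v_max, f_max):
    """Dynamic power relative to the top gear, using P ~ C * V^2 * f.
    The capacitance C cancels in the ratio."""
    return (v / v_max) ** 2 * (f / f_max)

# Lowest gear (0.9 V, 800 MHz) relative to highest gear (1.5 V, 2000 MHz).
ratio = relative_dynamic_power(0.9, 800, 1.5, 2000)
```

Under these assumptions the ratio is (0.6)² × 0.4 = 0.144, so the CPU's dynamic power at the slowest gear is roughly one seventh of that at the fastest. The cost is a 2.5x lower clock rate, which is why the choice of gear has to be matched to how CPU-bound each phase is.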

13
Results: Multiple Gear Benefit
  • IS: 16% energy saving with 1% extra time
  • BT: 10% energy saving with 5% extra time
  • MG: 11% energy saving with 4% extra time

14
Results: Single Gear Benefit
  • CG: 8% energy saving with 3% extra time
  • SP: 15% energy saving with 7% extra time

The order of phases matters!

15
Results: No Benefit

16
Conclusions and Future Work
  • Use Profile-directed Method to Achieve Good
    Energy-Time Tradeoff for HPC Applications
  • Future work
  • Enhance profile-directed techniques
  • Consider Inter-node bottlenecks
  • Automate the entire process

17
Discussion
  • How important is power consumption to HPC?
  • Is a 10% energy saving worth 5% extra time?
  • Is the profile-directed method practical?
  • Effective for applications that run repeatedly
  • How much of the process can be automated?
  • Is OPM (Operations Per Miss) a good metric to
    find phases?
  • Key purpose: to identify CPU utilization
  • Other options: Instructions Per Second, CPU Usage
  • Is OPM a good metric to sort phases?