Using Multiple Energy Gears in MPI Programs on a Power-Scalable Cluster

1
Using Multiple Energy Gears in MPI Programs on a
Power-Scalable Cluster

Vincent W. Freeh, David K. Lowenthal, Feng Pan,
and Nandani Kappiah
Presented by Huaxia Xia
CSAG, CSE of UCSD

2
Introduction

Power-aware Computing
HPC Uses Large-Scale Systems with High Power
Consumption
Two extremes
Performance-at-all-costs
Low-performance but more energy efficient
This paper aims to save energy with little
performance penalty

3
Related Work

Server/Desktop Systems
Minimize the number of servers needed to handle
the load, and set other servers into low-energy
state (standby or power-off)
Set node voltage independently
Disk
Modulate the speed of disks dynamically
Improve cache policy
Aggregate disk accesses into bursts
Mobile Systems
Energy-aware OS
Voltage-changeable CPU
Disk spindown
Memory
Network

4
Assumptions

HPC Applications
Performance is the Primary Concern
Highly Regular and Predictable
CPU has Multiple Gears
Variable Frequency
Variable Voltage
CPU is a Major Power Consumer
Energy consumption of disks/memory/network is
not considered

5
Methodology: Profile-Directed

Get Program Trace
Divide the Program into Blocks
Merge the Blocks into Phases
Heuristically Search for the Best Gear for Each Phase

6
Divide Codes into Blocks

Rule 1: Any MPI operation demarcates a block
boundary.
Rule 2: If the memory pressure changes abruptly,
a block boundary occurs at this change.
Use operations per miss (OPM) as a measure of
memory pressure

7
Merge Blocks into Phases

Two adjacent blocks are merged into a phase if
their memory pressures (OPM) fall within the
same threshold range
Figure: OPM in Trace of LU (Class C)
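A minimal sketch of the merge step, assuming each block has already been reduced to a single OPM value (total operations over total L2 misses) and that "the same threshold" means the same range between hypothetical OPM cut-offs (`THRESHOLDS`, 50 and 500 here); the slides do not give the actual values.

```python
import bisect


def bucket(opm_value, thresholds):
    """Index of the OPM range, defined by sorted `thresholds`, containing the value."""
    return bisect.bisect_right(thresholds, opm_value)


def merge_into_phases(block_opms, thresholds):
    """Merge adjacent blocks whose OPM falls in the same threshold range.

    block_opms: per-block OPM values in program order.
    Returns a list of phases, each a list of block indices.
    """
    phases = []
    prev = None
    for i, value in enumerate(block_opms):
        b = bucket(value, thresholds)
        if phases and b == prev:
            phases[-1].append(i)   # same range as the previous block: extend phase
        else:
            phases.append([i])     # range changed: start a new phase
        prev = b
    return phases


# Hypothetical OPM cut-offs: <50 memory-bound, 50-500 mixed, >=500 CPU-bound.
THRESHOLDS = [50.0, 500.0]
phases = merge_into_phases([20.0, 35.0, 800.0, 1200.0, 120.0, 40.0], THRESHOLDS)
print(phases)  # [[0, 1], [2, 3], [4], [5]]
```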

8
Data Collection

Use MPI-jack
Intercept any MPI call transparently
Can execute arbitrary code before/after an
intercepted call
Insert pseudo MPI calls at non-MPI phase
boundaries
Collect time, operation counts, and L2
misses
Question: Mutual dependence?
Trace data ↔ block boundaries
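MPI-jack's own interface is not shown in the slides, so the following is only a Python analogue of the interposition idea: a wrapper runs code before and after each call without the caller changing, and a no-op "pseudo call" marks a non-MPI phase boundary. `fake_barrier` and `phase_marker` are hypothetical stand-ins. Only elapsed time is recorded here, whereas the slides also collect operation counts and L2 misses, which would come from hardware counters.

```python
import functools
import time

TRACE = []  # collected (call name, elapsed seconds) records


def intercept(fn):
    """Wrap fn so extra code runs before and after every call, transparently."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # "before" hook
        try:
            return fn(*args, **kwargs)
        finally:
            # "after" hook: record the call even if it raised
            TRACE.append((fn.__name__, time.perf_counter() - start))
    return wrapper


@intercept
def fake_barrier():
    """Hypothetical stand-in for an MPI call such as MPI_Barrier (no communication)."""
    return 0


@intercept
def phase_marker():
    """Hypothetical pseudo call placed at a non-MPI phase boundary; does nothing."""
    return None


fake_barrier()
phase_marker()
print([name for name, _ in TRACE])  # ['fake_barrier', 'phase_marker']
```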

9
Solution Search (1)

Metric: Energy-Time Tradeoff
Normalized energy and time
Total system energy
A larger negative number indicates a
near-vertical slope and a significant energy saving
Question: How can energy consumption be measured
accurately?
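Reading the slide's metric as the slope of normalized energy against normalized time (an interpretation; the baseline, assumed here to be a run with every phase at the fastest gear, is not stated on the slide), the tradeoff can be written as:

```latex
% Normalize to a baseline run (assumed: every phase at the fastest gear).
e = \frac{E}{E_{\text{base}}}, \qquad t = \frac{T}{T_{\text{base}}}

% Slope between configurations a and b (energy falls, time rises, so slope < 0):
\text{slope} = \frac{\Delta e}{\Delta t} = \frac{e_b - e_a}{t_b - t_a}

% Hypothetical example: from the baseline (e = t = 1) to e = 0.90, t = 1.02
\text{slope} = \frac{0.90 - 1}{1.02 - 1} = \frac{-0.10}{0.02} = -5
```

A slope of -5 means each 1% of extra time buys 5% energy saving; the more negative the slope, the closer the energy-time curve is to vertical.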

10
Solution Search (2)

Phase Prioritization
Sort the phases in order of OPM (low → high)
Question: Why is sorting necessary?
Novel Heuristic Search
Find the locally optimal gear for each phase, one
phase at a time
Running time is at most n·g (n phases, g gears)
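A sketch of the prioritized greedy search, under assumptions not stated on the slides: gear 0 is the fastest and every phase starts there; phases are visited in low-to-high OPM order; each phase is shifted down one gear at a time, and a shift is kept only while its energy-time slope is at most a hypothetical `max_slope` (-1.0). Once a shift is rejected, the search moves to the next phase. The `evaluate` cost model is a made-up toy (memory-bound work does not slow with frequency; system power is half fixed, half proportional to f³) standing in for real measured runs. Each phase tries at most g - 1 lower gears, so there are at most n·g evaluations.

```python
GEAR_FREQ = [1.0, 0.8, 0.6]  # hypothetical relative frequency per gear; gear 0 fastest

# Hypothetical phases: (share of baseline time, fraction of time that is CPU-bound)
PHASES = {"A": (0.5, 0.1), "B": (0.5, 0.9)}
PHASE_OPM = {"A": 15.0, "B": 900.0}  # hypothetical OPM: A is memory-bound


def evaluate(config):
    """Toy model returning (energy, time) normalized to the all-fastest baseline."""
    energy = runtime = 0.0
    for phase, gear in config.items():
        share, cpu_frac = PHASES[phase]
        f = GEAR_FREQ[gear]
        t = share * (cpu_frac / f + (1.0 - cpu_frac))  # only the CPU-bound part slows
        power = 0.5 + 0.5 * f ** 3                      # half fixed, half scales with f^3
        energy += power * t
        runtime += t
    return energy, runtime  # shares sum to 1, so the baseline is about (1.0, 1.0)


def heuristic_gear_search(order, num_gears, evaluate_fn, max_slope=-1.0):
    """Greedy per-phase search; returns (phase -> gear, number of evaluations)."""
    config = {p: 0 for p in order}
    energy, runtime = evaluate_fn(config)
    trials = 0
    for p in order:
        for g in range(1, num_gears):
            candidate = {**config, p: g}
            new_energy, new_runtime = evaluate_fn(candidate)
            trials += 1
            de, dt = new_energy - energy, new_runtime - runtime
            if de < 0 and (dt <= 0 or de / dt <= max_slope):
                config, energy, runtime = candidate, new_energy, new_runtime
            else:
                break  # shift not worth it: keep the current gear, go to next phase
    return config, trials


order = sorted(PHASE_OPM, key=PHASE_OPM.get)  # low OPM (memory-bound) first
best, trials = heuristic_gear_search(order, len(GEAR_FREQ), evaluate)
print(best, trials)  # {'A': 2, 'B': 0} 3
```

In this toy run the memory-bound phase A drops to the slowest gear (slopes of about -9 and -3), while slowing the CPU-bound phase B gives a slope of about -0.33 and is rejected.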

11
Solution Search (3)

12
Experiments

10 AMD Athlon-64 CPUs
Frequency-scalable: 800-2000 MHz
Voltage-scalable: 0.9-1.5 V
1 GB main memory
128 KB L1 cache, 512 KB L2 cache
100 Mb/s network
CPU Consumes 45-55% of Overall System Energy
Benchmarks: NAS Parallel Benchmarks (NPB)
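As a rough illustration of why voltage-and-frequency gears save power, the standard approximation for dynamic CPU power is P ∝ C·V²·f. Pairing the endpoints listed above (an assumption: 0.9 V at 800 MHz and 1.5 V at 2000 MHz; intermediate gears are not listed) gives:

```latex
\frac{P_{\text{low}}}{P_{\text{high}}}
  = \frac{V_{\text{low}}^{2}\, f_{\text{low}}}{V_{\text{high}}^{2}\, f_{\text{high}}}
  = \frac{0.9^{2} \times 800}{1.5^{2} \times 2000}
  = \frac{648}{4500} = 0.144
```

That is roughly an 86% cut in dynamic CPU power at the lowest gear. Because the CPU accounts for only 45-55% of system energy, static power is ignored here, and runs take longer at low frequency, whole-system savings are far smaller than this ratio suggests.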

13
Results: Multiple Gear Benefit

IS: 16% energy saving with 1% extra time
BT: 10% energy saving with 5% extra time
MG: 11% energy saving with 4% extra time

14
Results: Single Gear Benefit

CG: 8% energy saving with 3% extra time
SP: 15% energy saving with 7% extra time

The order of phases matters!
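Assuming the slope metric from the Solution Search slides is measured against a fastest-gear baseline (the slides do not state the baseline), the reported percentages convert directly into energy-time slopes. Dividing both percentages by 100 to normalize leaves the ratio unchanged.

```python
# Reported results from the slides: (percent energy saved, percent extra time).
RESULTS = {
    "IS": (16, 1),
    "BT": (10, 5),
    "MG": (11, 4),
    "CG": (8, 3),
    "SP": (15, 7),
}


def slope(energy_saved_pct, extra_time_pct):
    """Slope of normalized energy vs. normalized time relative to the baseline."""
    return -energy_saved_pct / extra_time_pct


slopes = {name: round(slope(*r), 2) for name, r in RESULTS.items()}
for name, s in sorted(slopes.items(), key=lambda kv: kv[1]):
    print(f"{name}: {s}")  # most negative (best tradeoff) first
```

By this measure IS stands out at -16, while the other four benchmarks fall between -2.0 and -2.75.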

15
Results: No Benefit

16
Conclusions and Future Work

Use a Profile-Directed Method to Achieve a Good
Energy-Time Tradeoff for HPC Applications
Future work
Enhance profile-directed techniques
Consider Inter-node bottlenecks
Automate the entire process

17
Discussion