Title: Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications
1. Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications
- Sajal K. Das and Daniel J. Harvey
- Department of Computer Science and Engineering
- The University of Texas at Arlington
- E-mail: {das,harvey}@cse.uta.edu
- Rupak Biswas
- NASA Ames Research Center
- E-mail: rbiswas@nas.nasa.gov
2. Presentation Overview
- The Information Power Grid (IPG)
- Motivations
- Load Balancing and Partitioning
- Our Contributions
- The new MinEX Partitioner
- Experimental Study
- Performance Results
- Conclusions and Ongoing Research
3. The Information Power Grid (IPG)
- Harness the power of geographically separated resources
- Developed by NASA and other collaborative partners
- Utilize a distributed environment to solve large-scale computational problems
- Additional relevant applications identified by the I-Way experiment:
  - Remote access to large databases with high-end graphics facilities
  - Remote virtual reality access to instruments
  - Remote interactions with supercomputer simulations
4. Motivations
- Develop techniques to enhance the feasibility of running applications on the IPG
- Effective load balancer/partitioner for a distributed environment
- Allow for latency tolerance to overcome low bandwidths
- Predict application performance by simulation of the IPG
5. Load Balancing and Partitioning
- GOAL: Distribute workload evenly among processors
- Static load balancers
  - Balance load prior to execution
  - Examples: smart compilers, schedulers
- Dynamic load balancers
  - Balance as the application is processed
  - Examples: adaptive contracting, gradient, symmetric broadcast networks
- Semi-dynamic load balancers
  - Temporarily stop processing to balance workload
  - Utilize a partitioning technique
  - Examples: MeTiS, Jostle, PLUM
6. Our Contributions
- Limitations of existing partitioners:
  - Separate partitioning and data redistribution steps
  - Lack of latency tolerance
  - Balance loads with excessive communication and data movement
- Propose a new partitioner (MinEX) for the IPG environment:
  - Minimize total runtime rather than balancing workload
  - Compensate for high latency on the IPG
  - Compare with existing methods
7. The MinEX Partitioner
- Diffusive algorithm whose goal is to minimize total runtime
- User-supplied function for latency tolerance
- Accounts for data redistribution cost during partitioning
- Collapses pairs of vertices incrementally
- Partitions the contracted graph
- Refines the graph gradually back to the original, in reverse order
- Vertex reassignment is considered at each refinement step
8. Metrics Utilized
- Processing Weight: Wgt_v = PWgt_v x Proc_c
- Communication Cost: Comm_v = Sum over edges (v,w) of CWgt(v,w) x Connect(c_p, c_q)
- Redistribution Cost: Remap_v = RWgt_v x Connect(c_p, c_q), if p != q
- Weighted Queue Length: QWgt(p) = Sum over vertices v on p of (Wgt_v + Comm_v + Remap_v)
  - Heaviest load: MaxQWgt
  - Lightest load: MinQWgt
  - Average load: AvgQWgt
- Total system load: QWgtToT = Sum over p of QWgt(p)
- Load Imbalance Factor: LoadImb = MaxQWgt / AvgQWgt
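The metrics above can be sketched in code. This is a minimal illustration, not the MinEX implementation: the dictionary-based mesh representation and all parameter names are assumptions.

```python
# Illustrative sketch of the MinEX load metrics; data layout is hypothetical.
def qwgt(p, assign, pwgt, rwgt, cwgt, cluster, proc, connect, prev_assign):
    """Weighted queue length QWgt(p): the sum, over vertices v assigned to
    processor p, of processing weight + communication cost + remap cost."""
    total = 0.0
    c_p = cluster[p]
    for v, pv in assign.items():
        if pv != p:
            continue
        # Processing weight: PWgt_v scaled by the cluster's processing slowdown
        total += pwgt[v] * proc[c_p]
        # Communication cost over edges to vertices on other processors
        for w, weight in cwgt.get(v, {}).items():
            q = assign[w]
            if q != p:
                total += weight * connect[(c_p, cluster[q])]
        # Redistribution cost if v was moved from its previous processor
        if prev_assign[v] != p:
            total += rwgt[v] * connect[(cluster[prev_assign[v]], c_p)]
    return total

def load_imbalance(loads):
    """LoadImb = MaxQWgt / AvgQWgt."""
    return max(loads) / (sum(loads) / len(loads))
```

With two vertices on two processors and unit slowdowns, `qwgt` reduces to processing weight plus cut-edge weight, and `load_imbalance` compares the heaviest processor against the average.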
9. MinVar, Gain, and ThroTTle
- Processor workload variance from MinQWgt:
  - MinVar = Sum over p of (QWgt(p) - MinQWgt)^2
- dMinVar reflects the improvement in MinVar after a vertex reassignment
- Gain is the change (dQWgtToT) to total system load resulting from a vertex reassignment
- ThroTTle is a user-defined parameter
- Vertex moves that improve dMinVar are allowed if Gain/ThroTTle < dMinVar
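The acceptance rule above can be written as a small predicate. A hedged sketch, with names chosen for illustration rather than taken from the MinEX source:

```python
# Sketch of the ThroTTle acceptance test: a move that improves MinVar is
# accepted only when Gain / ThroTTle < dMinVar.
def minvar(loads, min_qwgt):
    """MinVar = sum over processors of (QWgt(p) - MinQWgt)^2."""
    return sum((q - min_qwgt) ** 2 for q in loads)

def move_allowed(d_minvar, gain, throttle):
    """Accept a reassignment only if it improves the variance (d_minvar > 0)
    and the throttled load increase stays below that improvement."""
    return d_minvar > 0 and gain / throttle < d_minvar
```

A larger ThroTTle thus tolerates more total-load growth per unit of variance improvement, which matches its role as a tuning knob in the analysis that follows.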
10. MinEX Data Structures
- Mesh: {V, E, vTot, VMap, VList, EList}
  - V: number of active vertices
  - E: total number of edges
  - vTot: total number of vertices
  - VMap: pointer to the list of active vertices
  - VList: pointer to the complete list of vertices
  - EList: pointer to the list of edges
- EList entries contain {w, CWgt(v,w)}
  - w: adjacent vertex
  - CWgt(v,w): edge communication weight
11. MinEX Data Structures (continued)
- VList (for each vertex v): {PWgt, RWgt, e, e, merge, lookup, VMap, heap, border}
  - PWgt: computational weight
  - RWgt: redistribution weight
  - e: number of incident edges
  - e: pointer to the first edge
  - merge: vertex that merged with v (or -1)
  - lookup: active vertex containing v (or -1)
  - VMap: pointer to v's position in VMap
  - heap: pointer to the heap entry for v
  - border: indicates whether v is a border vertex
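For concreteness, the two structure lists above can be rendered as Python dataclasses. Field names follow the slides; the types, defaults, and index-based "pointers" are assumptions made for this sketch:

```python
# Illustrative rendering of the MinEX structures; not the original C layout.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Vertex:             # one VList entry
    pwgt: float           # PWgt: computational weight
    rwgt: float           # RWgt: redistribution weight
    nedges: int           # number of incident edges
    first_edge: int       # index of the first edge in EList
    merge: int = -1       # vertex that merged with v, or -1
    lookup: int = -1      # active vertex containing v, or -1
    vmap: int = -1        # position of v in VMap
    heap: int = -1        # index of v's heap entry, or -1
    border: bool = False  # True if v has edges into other partitions

@dataclass
class Mesh:
    nactive: int                    # V: number of active vertices
    nedges: int                     # E: total number of edges
    vtot: int                       # vTot: total number of vertices
    vmap: List[int]                 # active-vertex list
    vlist: List[Vertex]             # complete vertex list
    elist: List[Tuple[int, float]]  # (w, CWgt(v, w)) entries
```

Storing each vertex's edge run as (first_edge, nedges) over a single EList array mirrors the compressed adjacency layout common to graph partitioners.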
12. MinEX Contraction Phase
- Form meta-vertices by collapsing edges
- Collapse the edge that maximizes CWgt(v,w) / (RWgt_v + RWgt_w)
- Procedure Find(v):
    If (merge = -1) Return v
    If (lookup != -1) And (lookup < vTot) Then Return lookup = Find(lookup)
    Else Return lookup = Find(merge)
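The Find procedure above resolves which active meta-vertex currently contains a collapsed vertex, caching the answer in lookup so later calls follow a shortened chain (as in union-find path compression). A runnable sketch under assumed data structures, with a minimal VNode standing in for a VList entry:

```python
# Illustrative Find: follow merge/lookup chains to the active meta-vertex.
from dataclasses import dataclass

@dataclass
class VNode:
    merge: int = -1    # vertex that merged with v, or -1
    lookup: int = -1   # cached active vertex containing v, or -1

def find(vlist, vtot, v):
    """Return the active vertex containing v, caching it in v's lookup."""
    node = vlist[v]
    if node.merge == -1:                  # v was never merged: it is active
        return v
    if node.lookup != -1 and node.lookup < vtot:
        node.lookup = find(vlist, vtot, node.lookup)  # refresh cached chain
    else:
        node.lookup = find(vlist, vtot, node.merge)   # first resolution
    return node.lookup
```

After one resolution, a vertex's lookup points directly at its active meta-vertex, so repeated queries during refinement stay cheap.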
13. MinEX Partition Phase
- The contracted graph allows efficient partitioning
- A heap with pointers is created:
  - For each vertex, compute the optimal reassignment
  - If the dMinVar, Gain, and ThroTTle criteria are satisfied, the vertex is added to the Gain min-heap
  - The VList heap pointer is set
- The heap is adjusted as vertices are reassigned
- The process stops when the heap becomes empty
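The partition-phase loop can be skeletonized as follows. This is a simplified, hypothetical sketch: the dMinVar/Gain/ThroTTle criteria are abstracted into an `accept` predicate, and the real partitioner also recomputes neighbors' best moves and adjusts the heap after each reassignment, which is omitted here.

```python
# Skeleton of a Gain-keyed min-heap pass over candidate vertex moves.
import heapq

def partition_pass(moves, accept):
    """moves: (gain, vertex, target_processor) candidates.
    Pops moves in increasing Gain order, applies those the acceptance
    criteria allow, and stops when the heap becomes empty."""
    heap = list(moves)
    heapq.heapify(heap)                 # min-heap: lowest Gain first
    applied = []
    while heap:                         # process stops when heap is empty
        gain, v, p = heapq.heappop(heap)
        if accept(gain, v, p):          # dMinVar/Gain/ThroTTle test goes here
            applied.append((v, p))
    return applied
```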
14. MinEX Refinement Phase
- Refinement proceeds in reverse order from contraction, popping vertex pairs off the stack
- Reassignment of each refined vertex is considered, and the partitioning process is restarted
- Vertex lookup and merge values are reset by following the merge chain when edges are accessed (if lookup > vTot)
15. Analysis of ThroTTle Values (P=32)
- [Figure: Expected MaxQWgt for varying ThroTTle values]
- [Figure: Expected LoadImb for varying ThroTTle values]
16. Latency Tolerance Approach
- Move data sets and edge data first
- Achieve latency tolerance by overlapping processing with communication
- Optimistic view: processing completely hides the latency
- Pessimistic view: no latency hiding occurs
- The application passes its latency-hiding function to MinEX:
  1. Send data sets to be moved
  2. Send edge data
  3. Process vertices not waiting for edge communication
  4. Receive and unpack remapped data sets
  5. Receive and unpack communication data
  6. Repeat steps 2-5 until all vertices are processed
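The six steps above can be sketched as a deterministic loop. This is an illustrative simulation, not the MinEX interface: network delivery is modeled by a per-message delay counter so the overlap of step 3 (processing) with outstanding communication is visible.

```python
# Simulated latency-hiding loop: process ready vertices while messages for
# the rest are still "in flight". All structures are illustrative.
def run_with_latency_hiding(vertices, needs_edge_data, delay):
    """vertices: vertex ids to process; needs_edge_data: ids awaiting a
    message; delay: simulated steps until each message arrives.
    Returns the order in which vertices were processed."""
    waiting = {v: delay for v in needs_edge_data}   # steps 1-2: data sent
    pending = list(vertices)
    processed = []
    while pending:
        # Step 3: process every vertex not waiting for edge communication
        ready = [v for v in pending if v not in waiting]
        processed.extend(ready)
        pending = [v for v in pending if v in waiting]
        # Steps 4-5: one time step elapses; arrived messages are unpacked
        waiting = {v: d - 1 for v, d in waiting.items() if d - 1 > 0}
        # Step 6: repeat until all vertices are processed
    return processed
```

In the optimistic case the ready set stays non-empty until all messages arrive, so communication delay adds nothing to the total; in the pessimistic case the loop idles while waiting, as the slides describe.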
17. Experimental Study: Simulation of an IPG Environment
- A configuration file defines clusters, processors, and interconnect slowdowns
- Processors in a cluster are assumed homogeneous
- Connect(c1, c2): interconnect slowdown between clusters c1 and c2 (unity for no slowdown)
- If c1 = c2, Connect(c1, c2) is the intraconnect slowdown
- Proc_c represents the processing slowdown (normalized to unity) within a cluster
- The configuration file is mapped to the processing graph by MinEX so that actual vertex assignments in the distributed environment can be modeled
18. Test Application: Unstructured Adaptive Mesh
- Time-dependent shock wave propagated through a cylindrical volume
- Tetrahedral mesh discretization
- Previously refined elements are coarsened
- The mesh grows from 50K to 1.8M tets over nine adaptation levels
- Workload becomes unbalanced as the mesh is adapted
19. Characteristics of the Test Application
- Mesh elements interact only with immediate
neighbors - High communication and remapping costs
- Numerical solver not included
20. MinEX Partitioner Performance
- SBN: dynamic load balancer based on a Symmetric Broadcast Network, adapted for mesh applications
- PLUM: semi-dynamic framework for processing adaptive, unstructured meshes
- MinEX is compared with SBN and PLUM
21. Experimental Results (P=32)
- [Figure: Expected runtimes (no latency tolerance) vs. interconnect slowdowns]
- [Figure: Expected runtimes (maximum latency tolerance) vs. interconnect slowdowns]
- Runtimes are in thousands of units
22. Conclusions and Ongoing Research
- Introduced a new partitioner, MinEX, and experimented in simulated IPG environments
- Runtimes increase with larger slowdowns as clusters are added
- Additional clusters increase the benefits of latency tolerance
- Estimated runtimes with MinEX improved by a factor of five over no partitioning
- Currently applying MinEX to the N-body problem (Barnes-Hut algorithm)