Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications

1
Latency Hiding in Dynamic Partitioning and Load
Balancing of Grid Computing Applications
  • Sajal K. Das and Daniel J. Harvey
  • Department of Computer Science and Engineering
  • The University of Texas at Arlington
  • E-mail: {das,harvey}@cse.uta.edu
  • Rupak Biswas
  • NASA Ames Research Center
  • E-mail: rbiswas@nas.nasa.gov

2
Presentation Overview
  • The Information Power Grid (IPG)
  • Motivations
  • Load Balancing and Partitioning
  • Our Contributions
  • The new MinEX Partitioner
  • Experimental Study
  • Performance Results
  • Conclusions and Ongoing Research

3
The Information Power Grid (IPG)
  • Harness the power of geographically separated
    resources
  • Developed by NASA and other collaborative
    partners
  • Utilize a distributed environment to solve
    large-scale computational problems
  • Additional relevant applications identified by the I-Way experiment:
    • Remote access to large databases with high-end graphics facilities
    • Remote virtual reality access to instruments
    • Remote interactions with supercomputer simulations

4
Motivations
  • Develop techniques to enhance the feasibility of
    running applications on the IPG
  • Effective load-balancer/partitioner for a
    distributed environment
  • Allow for latency tolerance to overcome low
    bandwidths
  • Predict application performance by simulation of the IPG

5
Load Balancing and Partitioning
  • GOAL: Distribute the workload evenly among processors
  • Static load balancers
    • Balance load prior to execution
    • Examples: smart compilers, schedulers
  • Dynamic load balancers
    • Balance load as the application is processed
    • Examples: adaptive contracting, gradient, symmetric broadcast networks
  • Semi-dynamic load balancers
    • Temporarily stop processing to balance the workload
    • Utilize a partitioning technique
    • Examples: MeTiS, Jostle, PLUM

6
Our Contributions
  • Limitations of existing partitioners:
    • Separate partitioning and data redistribution steps
    • Lack of latency tolerance
    • Balance loads at the cost of excessive communication and data movement
  • Propose a new partitioner (MinEX) for the IPG environment:
    • Minimize total runtime rather than balancing the workload
    • Compensate for high latency on the IPG
    • Compare with existing methods

7
The MinEX Partitioner
  • Diffusive algorithm with the goal of minimizing total runtime
  • User-supplied function for latency tolerance
  • Accounts for data redistribution cost during partitioning
  • Collapses pairs of vertices incrementally
  • Partitions the contracted graph
  • Gradually refines the graph back to the original, in reverse order of contraction
  • Vertex reassignment is considered at each refinement step

8
Metrics Utilized
  • Processing Weight:
    Wgt_v = PWgt_v x Proc_c
  • Communication Cost (summed over vertices w adjacent to v, where p and q are the processors of v and w, and c_p, c_q their clusters):
    Comm_v = Σ_(v,w) CWgt(v,w) x Connect(c_p, c_q)
  • Redistribution Cost:
    Remap_v = RWgt_v x Connect(c_p, c_q) if p ≠ q, and 0 otherwise
  • Weighted Queue Length:
    QWgt(p) = Σ_(v on p) (Wgt_v + Comm_v + Remap_v)
  • Heaviest load: MaxQWgt
  • Lightest load: MinQWgt
  • Average load: AvgQWgt
  • Total system load: QWgtTot = Σ_p QWgt(p)
  • Load Imbalance Factor:
    LoadImb = MaxQWgt / AvgQWgt
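
For concreteness, a minimal C sketch of the queue-length and imbalance metrics above, assuming the per-vertex quantities have already been computed into arrays and vOf[v] gives the processor owning vertex v (all names and layouts are illustrative, not MinEX's actual code):

```c
/* Sketch of QWgt and LoadImb from the metrics above; array layout is an
 * assumption for illustration. */
double QWgt(int p, int nv, const int *vOf,
            const double *Wgt, const double *Comm, const double *Remap)
{
    double q = 0.0;
    for (int v = 0; v < nv; v++)       /* sum over vertices assigned to p */
        if (vOf[v] == p)
            q += Wgt[v] + Comm[v] + Remap[v];
    return q;
}

double LoadImb(int P, int nv, const int *vOf,
               const double *Wgt, const double *Comm, const double *Remap)
{
    double maxQ = 0.0, totQ = 0.0;     /* MaxQWgt and QWgtTot */
    for (int p = 0; p < P; p++) {
        double q = QWgt(p, nv, vOf, Wgt, Comm, Remap);
        totQ += q;
        if (q > maxQ) maxQ = q;
    }
    return maxQ / (totQ / P);          /* MaxQWgt / AvgQWgt */
}
```
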
9
MinVar, Gain, and ThroTTle
  • Processor workload variance from MinQWgt:
    MinVar = Σ_p (QWgt(p) - MinQWgt)^2
  • ΔMinVar reflects the improvement in MinVar after a vertex reassignment
  • Gain is the change (ΔQWgtTot) to total system load resulting from a vertex reassignment
  • ThroTTle is a user-defined parameter
  • Vertex moves that improve ΔMinVar are allowed if Gain/ThroTTle < ΔMinVar
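
The acceptance test itself is tiny; a hedged C sketch (the wrapper function and names are illustrative):

```c
/* Sketch of the move-acceptance test above: a reassignment that improves
 * MinVar (dMinVar > 0) is allowed only if Gain/ThroTTle < dMinVar. */
int moveAllowed(double gain, double dMinVar, double throttle)
{
    return dMinVar > 0.0 && gain / throttle < dMinVar;
}
```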

10
MinEX Data Structures
  • Mesh = {V, E, vTot, VMap, VList, EList}
  • V: number of active vertices
  • E: total number of edges
  • vTot: total number of vertices
  • VMap: pointer to the list of active vertices
  • VList: pointer to the complete list of vertices
  • EList: pointer to the list of edges
  • Each EList entry contains {w, CWgt(v,w)}
    • w: adjacent vertex
    • CWgt(v,w): edge communication weight

11
MinEX Data Structures (continued)
  • VList (for each vertex v) = {PWgt, RWgt, |e|, e, merge, lookup, VMap, heap, border}
  • PWgt: computational weight
  • RWgt: redistribution weight
  • |e|: number of incident edges
  • e: pointer to the first edge
  • merge: vertex that merged with v (or -1)
  • lookup: active vertex containing v (or -1)
  • VMap: pointer to v's position in VMap
  • heap: pointer to the heap entry for v
  • border: indicates whether v is a border vertex
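
These two slides translate naturally into C structs; a sketch, with field types assumed for illustration:

```c
/* Sketch of the MinEX mesh structures described on the two slides above;
 * field types and exact layout are assumptions. */
typedef struct {
    int    w;        /* adjacent vertex                        */
    double CWgt;     /* communication weight of edge (v,w)     */
} Edge;

typedef struct {
    double PWgt;     /* computational weight                   */
    double RWgt;     /* redistribution weight                  */
    int    nEdges;   /* |e|: number of incident edges          */
    Edge  *e;        /* pointer to the first incident edge     */
    int    merge;    /* vertex that merged with v, or -1       */
    int    lookup;   /* active vertex containing v, or -1      */
    int    vmapPos;  /* v's position in VMap                   */
    int    heapPos;  /* heap entry for v (see partition phase) */
    int    border;   /* nonzero if v is a border vertex        */
} Vertex;

typedef struct {
    int     V;       /* number of active vertices              */
    int     E;       /* total number of edges                  */
    int     vTot;    /* total number of vertices               */
    int    *VMap;    /* list of active vertices                */
    Vertex *VList;   /* complete list of vertices              */
    Edge   *EList;   /* list of edges                          */
} Mesh;
```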

12
MinEX Contraction Phase
  • Form meta-vertices by collapsing edges
  • Collapse the edge with maximal CWgt(v,w) / (RWgt_v + RWgt_w)
  • Procedure Find(v):
      If (merge = -1) Return v
      If (lookup ≠ -1) And (lookup < vTot) Then
        Return lookup = Find(lookup)
      Else
        Return lookup = Find(merge)
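
A runnable C rendering of Find, assuming merge and lookup are kept in per-vertex arrays as in the data-structure sketch above (the array layout is an assumption):

```c
/* Sketch of Find in C: returns the active meta-vertex currently containing
 * v, caching the result in lookup[] so later calls short-cut the chain. */
int vTot;      /* total number of vertices                      */
int *merge;    /* merge[v]: vertex that merged with v, or -1    */
int *lookup;   /* lookup[v]: cached active vertex containing v  */

int Find(int v)
{
    if (merge[v] == -1)                       /* v itself is active      */
        return v;
    if (lookup[v] != -1 && lookup[v] < vTot)
        return lookup[v] = Find(lookup[v]);   /* follow cached chain     */
    else
        return lookup[v] = Find(merge[v]);    /* fall back to merge link */
}
```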

13
MinEX Partition Phase
  • The contracted graph allows efficient partitioning
  • A Gain min-heap with back-pointers is created
  • For each vertex, the optimal reassignment is computed
    • If the ΔMinVar, Gain, and ThroTTle criteria are satisfied, the vertex is added to the Gain min-heap and its VList heap pointer is set
  • The heap is adjusted as vertices are reassigned (a heap sketch follows this list)
  • The process stops when the heap becomes empty
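
A self-contained C sketch of such a Gain min-heap with back-pointers; the pos array plays the role of the VList heap pointer (all names are illustrative, not MinEX's actual code):

```c
/* Min-heap keyed by Gain; pos[v] tracks each vertex's heap index so
 * entries can be adjusted in place when reassignments change gains. */
typedef struct { int v; double gain; } Entry;
typedef struct { Entry *a; int n; int *pos; } Heap;

static void swapEntries(Heap *h, int i, int j)
{
    Entry t = h->a[i]; h->a[i] = h->a[j]; h->a[j] = t;
    h->pos[h->a[i].v] = i;
    h->pos[h->a[j].v] = j;
}

static void siftUp(Heap *h, int i)
{
    while (i > 0 && h->a[(i - 1) / 2].gain > h->a[i].gain) {
        swapEntries(h, i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

static void siftDown(Heap *h, int i)
{
    for (;;) {
        int m = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < h->n && h->a[l].gain < h->a[m].gain) m = l;
        if (r < h->n && h->a[r].gain < h->a[m].gain) m = r;
        if (m == i) return;
        swapEntries(h, i, m);
        i = m;
    }
}

void heapPush(Heap *h, int v, double gain)
{
    h->a[h->n] = (Entry){ v, gain };
    h->pos[v] = h->n;
    siftUp(h, h->n++);
}

int heapPopMin(Heap *h)   /* vertex with least Gain; heap must be non-empty */
{
    int v = h->a[0].v;
    swapEntries(h, 0, --h->n);
    siftDown(h, 0);
    return v;
}

/* "Heap is adjusted as vertices are reassigned": update v's key in place. */
void heapAdjust(Heap *h, int v, double newGain)
{
    int i = h->pos[v];
    double old = h->a[i].gain;
    h->a[i].gain = newGain;
    if (newGain < old) siftUp(h, i); else siftDown(h, i);
}
```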

14
MinEX Refinement Phase
  • Refinement proceeds in reverse order from contraction by popping vertex pairs off a stack
  • Reassignment of each refined vertex is considered and the partitioning process is restarted
  • Vertex lookup and merge values are reset by following the merge chain when edges are accessed (if lookup > vTot); a sketch follows
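
A minimal C sketch of this loop, assuming contraction recorded the collapsed pairs on a stack and merge/lookup live in per-vertex arrays as before; considerMove is a stub standing in for the slide 9 criteria:

```c
/* Sketch of the refinement loop: pairs recorded during contraction are
 * popped in reverse order, merge/lookup links are reset, and each refined
 * vertex is reconsidered for reassignment. Stack layout is illustrative. */
typedef struct { int v, w; } Pair;   /* vertices collapsed into one meta-vertex */

static void considerMove(int v)
{
    (void)v;   /* evaluate the dMinVar / Gain / ThroTTle criteria here */
}

void refine(Pair *stack, int top, int *merge, int *lookup)
{
    while (top > 0) {
        Pair p = stack[--top];           /* reverse order of contraction */
        merge[p.v]  = merge[p.w]  = -1;  /* both vertices active again   */
        lookup[p.v] = lookup[p.w] = -1;  /* stale chains will be rebuilt */
        considerMove(p.v);               /* partitioning restarts here   */
        considerMove(p.w);
    }
}
```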

15
Analysis of ThroTTle Values (P = 32)
  • Chart: expected MaxQWgt for varying ThroTTle values
  • Chart: expected LoadImb for varying ThroTTle values
16
Latency Tolerance Approach
  • Move data sets and edge data first
  • Achieve latency tolerance by overlapping processing with communication
    • Optimistic view: processing completely hides the latency
    • Pessimistic view: no latency hiding occurs
  • The application passes the latency-hiding function to MinEX (a sketch follows this list):
    1. Send data sets to be moved
    2. Send edge data
    3. Process vertices not waiting for edge communication
    4. Receive and unpack remapped data sets
    5. Receive and unpack communication data
    6. Repeat steps 2-5 until all vertices are processed
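
These steps map naturally onto nonblocking message passing. A sketch using MPI, which the deck does not mandate; the buffer names and placeholder computation are assumptions for illustration:

```c
#include <mpi.h>

/* Sketch of the latency-hiding pattern above: post the transfers, compute
 * on vertices that do not wait on edge data, then drain what remains. */
void exchange_and_compute(double *sendbuf, int scount, int dest,
                          double *recvbuf, int rcount, int src,
                          double *work, int nwork)
{
    MPI_Request req[2];

    /* Steps 1-2: send remapped data sets and edge data without blocking. */
    MPI_Isend(sendbuf, scount, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req[0]);
    /* Post the matching receive early so data can arrive during compute. */
    MPI_Irecv(recvbuf, rcount, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &req[1]);

    /* Step 3: process vertices not waiting for edge communication,
     * overlapping this work with the transfers in flight. */
    int done = 0;
    for (int i = 0; i < nwork; i++) {
        work[i] *= 2.0;                /* placeholder computation */
        if (!done)                     /* poll so MPI can make progress */
            MPI_Testall(2, req, &done, MPI_STATUSES_IGNORE);
    }

    /* Steps 4-5: receive and unpack whatever has not yet completed;
     * step 6 repeats this pattern until all vertices are processed. */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
}
```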

17
Experimental Study: Simulation of an IPG Environment
  • Configuration file defines clusters, processors, and interconnect slowdowns
  • Processors in a cluster are assumed homogeneous
  • Connect(c1, c2): interconnect slowdown between clusters c1 and c2 (unity for no slowdown)
  • If c1 = c2, Connect(c1, c2) is the intraconnect slowdown
  • Proc_c: processing slowdown (normalized to unity) within cluster c
  • The configuration file is mapped to the processing graph by MinEX so actual vertex assignments in the distributed environment can be modeled (a sketch of this model follows)
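
For illustration, a C sketch of this cluster model; the slowdown values below are invented, whereas in MinEX they come from the configuration file:

```c
/* Minimal sketch of the cluster model above for two clusters. */
#define NCLUSTERS 2

/* Connect[c1][c2]: interconnect slowdown between clusters c1 and c2;
 * the diagonal holds the intraconnect slowdown (unity = no slowdown). */
static const double Connect[NCLUSTERS][NCLUSTERS] = {
    {  1.0, 10.0 },
    { 10.0,  1.0 },
};

/* Proc[c]: processing slowdown of cluster c, normalized to unity. */
static const double Proc[NCLUSTERS] = { 1.0, 2.0 };

/* Communication cost of edge (v,w) when v is on cluster cp and w on cq. */
double commCost(double CWgt, int cp, int cq)
{
    return CWgt * Connect[cp][cq];
}
```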

18
Test Application: Unstructured Adaptive Mesh
  • Time-dependent shock wave propagated through a cylindrical volume
  • Tetrahedral mesh discretization
  • Coarsen previously refined elements
  • Mesh grows from 50K to 1.8M tets over nine
    adaptation levels
  • Workload becomes unbalanced as mesh is adapted

19
Characteristics Of Test Application
  • Mesh elements interact only with immediate
    neighbors
  • High communication and remapping costs
  • Numerical solver not included

20
MinEX Partitioner Performance
  • SBN: a dynamic load balancer based on Symmetric Broadcast Networks, adapted for mesh applications
  • PLUM: a semi-dynamic framework for processing adaptive, unstructured meshes
  • MinEX is compared against SBN and PLUM

21
Experimental Results (P = 32)
  • Chart: expected runtimes (no latency tolerance) vs. interconnect slowdowns
  • Chart: expected runtimes (maximum latency tolerance) vs. interconnect slowdowns
  • Runtimes are in thousands of units
22
Conclusions and Ongoing Research
  • Introduced a new partitioner called MinEX and
    experimented in simulated IPG environments
  • Runtimes increase with larger slowdowns and as clusters are added
  • Additional clusters increase benefits of latency
    tolerance
  • Estimated runtimes with MinEX improved by a
    factor of five over no partitioning
  • Currently applying MinEX to the N-body problem
    (Barnes-Hut algorithm)