Title: Load-Balancing


1
Load-Balancing
2
Load-Balancing
  • What is load-balancing?
    • Dividing up the total work between processes
      when running codes on a parallel machine
  • Load-balancing constraints
    • Minimize interprocess communication
  • Also called
    • partitioning, mesh partitioning, (domain
      decomposition)

3
Know your data and memory
  • Memory is organized in banks. Between successive
    accesses to the same bank there is a latency
    period.
  • Matrix entries are stored column-wise in
    FORTRAN.

4
Matrix addressing in FORTRAN

[Figure: a matrix and the column-wise order in which its entries are addressed]
5
Addressing Memory
  • For illustration purposes, let's imagine 8 banks
    [128 or 256 are common on chips today], with a
    bank busy time (bbt) of 8 cycles between
    accesses. Thus we have:

    bank  1    2    3    4    5    6    7    8
    data  a11  a21  a31  a41  a12  a22  a32  a42
    data  a13  a23  a33  a43  a14  a24  a34  a44

6
Addressing Memory
  • If we access data column-wise, we proceed through
    each bank in order. By the time we request a13
    (cycle 9), bank 1 is available again, so we (just)
    avoid the bbt.
  • On the other hand, if we access data row-wise, we
    get a11 in bank 1, a12 in bank 5, and then a13 in
    bank 1 again. Instead of accessing it on clock
    cycle 3, we have to wait until cycle 9. Then we
    get a14 in bank 5 again on cycle 10, etc.
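The cycle counts in this example can be checked with a small simulation. This is only a sketch under an assumed model: one request is issued per cycle, requests are served strictly in order, banks are assigned cyclically to consecutive column-major addresses, and a bank accessed on cycle t is busy until cycle t + bbt. The function name `access_cycles` is made up for this illustration.

```python
def access_cycles(order, n_rows, n_banks=8, bbt=8):
    """Clock cycle at which each (row, col) entry is read, in the given order.

    Assumed model: one request is issued per cycle, in order, and a bank
    that was accessed on cycle t cannot be accessed again before t + bbt.
    """
    last_access = {}          # bank -> cycle of its most recent access
    cycle = 0
    cycles = []
    for i, j in order:        # 0-based row i, column j
        bank = (j * n_rows + i) % n_banks   # column-major address, cyclic banks
        earliest = last_access[bank] + bbt if bank in last_access else 0
        cycle = max(cycle + 1, earliest)
        last_access[bank] = cycle
        cycles.append(cycle)
    return cycles


n = 4
column_wise = [(i, j) for j in range(n) for i in range(n)]
row_wise = [(i, j) for i in range(n) for j in range(n)]

col_cycles = access_cycles(column_wise, n)
row_cycles = access_cycles(row_wise, n)
print(col_cycles[-1])    # cycle of the last column-wise access
print(row_cycles[:4])    # cycles for a11, a12, a13, a14
print(row_cycles[-1])    # cycle of the last row-wise access
```

Under these assumptions the column-wise sweep of the 4 x 4 matrix finishes on cycle 16, while the row-wise sweep reads a11, a12, a13, a14 on cycles 1, 2, 9, 10 and finishes on cycle 40.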

7
Indirect addressing
  • If addressing is indirect we may wind up jumping
    all over, and suffer performance hits because of
    it.

8
Shared Memory
  • Bank conflicts depend on the granularity of memory
  • With N memory refs per cycle, p processors, and
    memory with a bbt of b cycles, we need pNb memory
    banks to see uninterrupted access of data
  • With B banks, the granularity is
    g = B/(pNb)
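As a quick arithmetic check of the granularity formula, here is a minimal sketch. The function names and the sample values (4 processors, 2 references per cycle, bbt of 8 cycles) are chosen for illustration only and do not come from the slides.

```python
def banks_needed(p, n_refs, bbt):
    """Number of banks p*N*b needed for uninterrupted access."""
    return p * n_refs * bbt


def granularity(n_banks, p, n_refs, bbt):
    """Granularity g = B / (p*N*b)."""
    return n_banks / banks_needed(p, n_refs, bbt)


needed = banks_needed(p=4, n_refs=2, bbt=8)
g_large = granularity(256, p=4, n_refs=2, bbt=8)
g_small = granularity(32, p=4, n_refs=2, bbt=8)
print(needed, g_large, g_small)
```

With these sample values 64 banks are needed, so a 256-bank memory gives g = 4 and a 32-bank memory gives g = 0.5, that is, fewer banks than the pNb required for uninterrupted access.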

9
Moral
  • Separate selection of data from its processing
  • Each subtask requires its own data structure. Be
    prepared to change structures between tasks

10
Load-balancing nomenclature
  • Objects get distributed among different processes
  • Edges represent information that needs to be
    shared between objects
[Figure: graph with labeled objects and edges]
11
Partitioning
  • Divides up the work
    • 5 + 4 objects assigned to processes
  • Creates edge-cuts
    • Necessary communications between processes
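To make the two quantities concrete, the sketch below counts the objects per process and lists the cut edges for a given assignment. The 3 x 3 grid of objects and the particular 5/4 assignment are invented for this illustration; they are not the graph from the slide.

```python
def partition_stats(n_objects, edges, owner, n_procs):
    """Return (objects per process, list of edges cut by the partition)."""
    loads = [0] * n_procs
    for obj in range(n_objects):
        loads[owner[obj]] += 1
    # An edge is cut when its two objects live on different processes;
    # each cut edge implies communication between those processes.
    cut = [(u, v) for u, v in edges if owner[u] != owner[v]]
    return loads, cut


# Hypothetical example: 9 objects on a 3 x 3 grid, numbered row by row,
# with an edge between horizontal and vertical neighbours.
edges = [(r * 3 + c, r * 3 + c + 1) for r in range(3) for c in range(2)]
edges += [(r * 3 + c, (r + 1) * 3 + c) for r in range(2) for c in range(3)]
owner = [0, 0, 0, 0, 0, 1, 1, 1, 1]   # 5 objects on process 0, 4 on process 1

loads, cut = partition_stats(9, edges, owner, n_procs=2)
print(loads)       # objects per process
print(len(cut))    # number of cut edges
```

For this assignment the loads are 5 and 4, and 4 of the 12 edges are cut.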

12
Work/Edge Weights
  • Need a good measure of what the expected work may
    be
  • Molecular dynamics
    • number of molecules
    • regions
  • FEM/finite difference/finite volume, etc.
    • Degrees of freedom
    • Cells/elements
  • If edge weights are used, also need a good
    measure of how strongly objects are coupled to
    each other

13
Static/Dynamic Load-Balancing
  • Static load-balancing
    • Done as a preprocessing step before the actual
      calculation
    • If the objects and edges don't change very much
      or at all, can do static load-balancing
  • Dynamic load-balancing
    • Done during the calculation
    • Needed when there are significant changes in the
      objects and/or edges

14
Dynamic Load-Balancing Example
  • h-adapted mesh
  • Workload is changing as the computation proceeds
  • Calculate a new partition
  • Need to migrate the elements to their assigned
    process

15
Static vs. Dynamic Load Balancing
  • Static partitioning is insufficient for many
    applications
    • Adaptive mesh refinement
    • Multi-phase/Multi-physics computations
    • Particle simulations
    • Crash simulations
    • Parallel mesh generation
    • Heterogeneous computers
  • Need dynamic load balancing

16
Dynamic Load-Balancing Constraints
  • Minimize load-balancing time
  • Memory constraints
  • Minimize data migration (incremental partitions)
    • Small changes in the computation should result
      in small changes in the partitioning
  • Calculating the new partition and migrating the
    data should take less time than the amount of
    time saved by performing computations on the new
    grid
  • Done in parallel
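The time constraint can be written as a simple inequality: repartitioning pays off only if the partitioning time plus the migration time is less than the time saved over the remaining steps. The sketch below assumes that the time per step before and after rebalancing can be estimated; the function and parameter names are hypothetical.

```python
def worth_rebalancing(t_partition, t_migrate, t_step_old, t_step_new,
                      steps_remaining):
    """True if the one-off rebalancing cost is less than the time it saves.

    t_step_old / t_step_new are assumed estimates of the wall-clock time
    per step with the current and the new partition.
    """
    cost = t_partition + t_migrate
    saved = (t_step_old - t_step_new) * steps_remaining
    return cost < saved


many_steps = worth_rebalancing(2.0, 3.0, 1.5, 1.0, steps_remaining=20)
few_steps = worth_rebalancing(2.0, 3.0, 1.5, 1.0, steps_remaining=5)
print(many_steps, few_steps)
```

With a cost of 5 time units and a saving of 0.5 per step, rebalancing is worthwhile when 20 steps remain (10 units saved) but not when only 5 remain (2.5 units saved).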

17
Methods of Load-Balancing
  • Geometric
    • Based on geometric location
    • Faster load-balancing time with medium quality
      results
  • Graph-based
    • Create a graph to represent the objects and
      their connections
    • Slower load-balancing time but high quality
      results
  • Incremental methods
    • Use graph representation and shuffle around
      objects

18
Choosing a Load-Balancing Algorithm/Method
  • No algorithm/method is appropriate for all
    applications!
  • Graph load-balancing algorithms are suited to
    • Static load-balancing
    • Computations where the ratio of computation
      time to load-balancing time is high
    • Implicit schemes with a linear and non-linear
      solution scheme

19
Choosing a Load-Balancing Algorithm/Method
  • Geometric load-balancing algorithms are suited to
    • Computations where the ratio of computation
      time to load-balancing time is low
    • Explicit time stepping calculations with many
      time steps and varying workload (MD, FEM crash
      simulations, etc.)
    • Problems with many load-balancing objects

20
Geometric Load-Balancing
  • Based on the objects' coordinates
  • Want a unique coordinate associated with each
    object
    • Node coordinates, element centroid, molecule
      coordinate/centroid, etc.
  • Partition space, which results in a partition of
    the load-balancing objects
  • Edge cuts are usually not explicitly dealt with

21
Geometric Load-Balancing Assumptions
  • Objects that are close will likely need to share
    information
  • Want compact partitions
    • High volume to surface area or high area to
      perimeter length ratios
  • Coordinate information
  • Bounded domain

22
Geometric Load-Balancing Algorithms
  • Recursive Coordinate Bisection (RCB)
    • Berger & Bokhari
  • Recursive Inertial Bisection (RIB)
    • Taylor & Nour-Omid
  • Space Filling Curves (SFC)
    • Warren & Salmon, Ou, Ranka & Fox, Baden &
      Pilkington
  • Octree Partitioning/Refinement-tree Partitioning
    • Loy & Flaherty, Mitchell

23
Recursive Coordinate Bisection
  1. Choose an axis for the cut
  2. Find the proper location of the cut
  3. Group objects together according to location
    relative to cut
  4. If more partitions are needed, go to step 1
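The four steps can be sketched in a few lines. The slide does not say how the axis or the cut location is chosen, so this sketch makes two assumptions: the axis is the coordinate direction with the largest extent, and the cut is placed so that the number of objects on each side is proportional to the number of partitions still to be made there (all objects have equal weight). The function name `rcb` is made up for the illustration.

```python
def rcb(points, n_parts):
    """Recursive coordinate bisection; returns n_parts lists of point indices."""
    def split(idx, k):
        if k == 1:
            return [idx]
        if not idx:
            return [[] for _ in range(k)]
        dim = len(points[0])
        # 1. Choose the axis: here, the one with the largest extent.
        axis = max(range(dim),
                   key=lambda d: max(points[i][d] for i in idx)
                   - min(points[i][d] for i in idx))
        # 2. Find the cut: order objects along the axis and pick the
        #    position that balances the object counts.
        ordered = sorted(idx, key=lambda i: points[i][axis])
        k_left = k // 2
        n_left = len(ordered) * k_left // k
        # 3. Group objects by side, 4. recurse while more parts are needed.
        return (split(ordered[:n_left], k_left)
                + split(ordered[n_left:], k - k_left))

    return split(list(range(len(points))), n_parts)


grid = [(x, y) for y in range(4) for x in range(4)]   # 16 points
parts = rcb(grid, 4)
print([len(p) for p in parts])
```

On the 4 x 4 grid this produces four parts of four points each; a weighted variant would replace the count-based cut in step 2 with a cut on cumulative weight.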

24
Recursive Inertial Bisection
  1. Choose a direction for the cut
  2. Find the proper location of the cut
  3. Group objects together according to location
    relative to cut
  4. If more partitions are needed, go to step 1
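The only difference from coordinate bisection is step 1: the cut direction follows the shape of the point set instead of a coordinate axis. The sketch below is restricted to 2D and assumes equal object weights; it takes the direction of largest spread (the principal axis of the second-moment matrix about the centroid) and cuts perpendicular to it at the median. The names `principal_direction` and `rib` are hypothetical.

```python
import math


def principal_direction(pts):
    """Unit vector along the direction of largest spread of 2D points."""
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    sxx = sum((x - cx) ** 2 for x, _ in pts)
    syy = sum((y - cy) ** 2 for _, y in pts)
    sxy = sum((x - cx) * (y - cy) for x, y in pts)
    # Angle of the dominant eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def rib(points, n_parts):
    """Recursive inertial bisection in 2D; returns lists of point indices."""
    def split(idx, k):
        if k == 1:
            return [idx]
        if not idx:
            return [[] for _ in range(k)]
        # 1. Choose the direction, 2. order objects along it to find the cut.
        dx, dy = principal_direction([points[i] for i in idx])
        ordered = sorted(idx, key=lambda i: points[i][0] * dx + points[i][1] * dy)
        k_left = k // 2
        n_left = len(ordered) * k_left // k
        # 3. Group by side of the cut, 4. recurse.
        return (split(ordered[:n_left], k_left)
                + split(ordered[n_left:], k - k_left))

    return split(list(range(len(points))), n_parts)


diagonal = [(float(i), float(i)) for i in range(8)]
dx, dy = principal_direction(diagonal)
halves = rib(diagonal, 2)
print(halves)
```

For points lying along the diagonal, the computed direction is the diagonal itself, so the cut separates the lower four points from the upper four.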

25
Space Filling Curves
A Space Filling Curve is a 1-dimensional curve
which passes through every point in an
n-dimensional domain
26
Load-Balancing with Space Filling Curves
  • The SFC gives a 1-dimensional ordering of objects
    located in an n-dimensional domain
  • Easier to work with objects in 1 dimension than
    in n dimensions
  • Algorithm
  • Sort objects by their location on the SFC
  • Calculate cuts along the SFC
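The two algorithm steps can be illustrated as follows. The slides do not name a particular curve; this sketch assumes the Morton (Z-order) curve because its key is easy to compute by interleaving coordinate bits, and it assumes objects sit at non-negative integer grid coordinates in 2D. The names `morton_key` and `sfc_partition` are made up.

```python
def morton_key(ix, iy, bits=16):
    """Position of integer cell (ix, iy) along a Z-order curve."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)       # x bits in even positions
        key |= ((iy >> b) & 1) << (2 * b + 1)   # y bits in odd positions
    return key


def sfc_partition(cells, n_parts, weights=None):
    """Sort objects along the curve, then cut it into equal-weight pieces."""
    if weights is None:
        weights = [1.0] * len(cells)
    order = sorted(range(len(cells)), key=lambda i: morton_key(*cells[i]))
    total = sum(weights)
    parts = [[] for _ in range(n_parts)]
    acc = 0.0
    for i in order:
        # Assign by where the midpoint of this object's weight falls
        # along the cumulative weight of the curve.
        p = min(int((acc + 0.5 * weights[i]) / total * n_parts), n_parts - 1)
        parts[p].append(i)
        acc += weights[i]
    return parts


cells = [(x, y) for y in range(4) for x in range(4)]   # index = 4*y + x
parts = sfc_partition(cells, 4)
print(parts[0])
```

On the 4 x 4 grid with unit weights, each of the four parts is one 2 x 2 quadrant; the first part holds the cells with indices 0, 1, 4, 5. Because the cuts are positions along a 1-dimensional ordering, changing the weights only shifts the cut points along the curve.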

27
Octree Partitioning/Refinement-Tree Partitioning
  • Tree based algorithms for applications with
    multiple levels of data, simulation accuracy,
    etc.
  • Tree is usually built from specific computational
    schemes
  • Tightly coupled with the simulation

28
Comparisons of RCB, RIB, and SFC
  • RCB and RIB usually give slightly better
    partitions than SFC
  • SFC is usually a little faster
  • SFC is a little better for incremental partitions
  • RIB can be quite unstable for incremental
    partitions

29
Load-Balancing Libraries
  • There are many load-balancing libraries
    downloadable from the web
  • Mostly graph partitioning libraries
    • Static: Chaco, Metis, Party, Scotch
    • Dynamic: ParMetis, DRAMA, Jostle, Zoltan
  • Zoltan (www.cs.sandia.gov/Zoltan)
    • Dynamic load-balancing library with SFC, RCB,
      RIB, Octree, ParMetis, Jostle
    • Same interface to all load-balancing algorithms

30
Methods to Avoid Communication
  • Avoiding load-balancing
    • Load-balancing not needed every time the
      workload and/or edge connectivity changes
  • Ghost cells
  • Predictive load-balancing

31
Accessing Information on Other Processors
  • Need communication between processors
  • Use ghost cells: need to maintain consistency of
    data in the ghost cells

32
Ghost Cells
  • Copies of cells assigned to other processors
  • Make needed information available
  • No solution values are computed at the ghost
    cells
  • Ghost cell information needs to be updated
    whenever necessary
  • Ghost cells need to be calculated dynamically
    because of changing mesh and dynamic
    load-balancing
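A minimal sketch of the idea, without any real message passing: two Python lists stand in for the data held by two processes that split a 1D array of 8 cells. Each list carries one ghost cell that mirrors the neighbour's boundary cell. The three-point averaging update and all function names are assumptions made for this illustration.

```python
def smooth(u):
    """Average each interior cell with its neighbours; end cells are not updated."""
    return ([u[0]]
            + [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
            + [u[-1]])


def exchange_ghosts(left, right):
    """Refresh ghost cells: left = owned cells + ghost, right = ghost + owned cells."""
    left[-1] = right[1]    # copy of the right process's first owned cell
    right[0] = left[-2]    # copy of the left process's last owned cell


def parallel_step(left, right):
    # Ghost data must be updated before it is used; no solution value is
    # computed at a ghost cell because smooth() leaves end cells untouched.
    exchange_ghosts(left, right)
    return smooth(left), smooth(right)


serial = [0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 9.0]
left = serial[:4] + [0.0]     # 4 owned cells + 1 ghost (placeholder value)
right = [0.0] + serial[4:]    # 1 ghost (placeholder value) + 4 owned cells

for _ in range(2):
    serial = smooth(serial)
    left, right = parallel_step(left, right)

combined = left[:-1] + right[1:]   # owned cells only
print(combined)
```

After two steps the owned cells of the two halves agree with the serial result, which only holds because the ghost cells are refreshed before every step.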

33
Predictive Load-Balancing
  • Predict the workload and/or edge connectivity and
    load-balance with that information
  • Assumes that you can predict the workload and/or
    edge connectivity
  • Still need to perform communication but reduces
    data migration

34
Predictive Load-Balancing
  • Refine, then load-balance: 4 objects migrated
  • Predictive load-balance, then refine: 1 object
    migrated