1
Robert Heaphy
2
Zoltan: Dynamic Load Balancing and Parallel Data
Services
  • Erik Boman, Karen Devine, Robert Heaphy, Bruce
    Hendrickson, William Mitchell (NIST), Robert
    Preis (University of Paderborn), Courtenay
    Vaughan
  • Sandia National Laboratories
  • Albuquerque, NM 87185

3
The Zoltan Toolkit
  • Parallel, dynamic, adaptive computations need
    many services to obtain peak performance.
  • Processor work loads change during computation.
  • Communication patterns are complicated.
  • Memory usage is dynamic.
  • Application developers wrote their own solutions.
  • Little expertise in such parallel algorithms.
  • No capability to compare approaches.
  • No code reuse.

Zoltan: Toolkit of data services for dynamic,
unstructured, adaptive computations
4
Support for Many Applications
  • Different applications, requirements, data
    structures.

5
Applications: Adaptive Mesh Refinement
  • Dynamic load balancing.
  • Redistribute elements after mesh refinement.
  • Keep data movement costs low.
  • Recursive Coordinate Bisection
  • Parent and child elements assigned to same
    processor.
  • Inexpensive.
  • Incremental.

Using RCB with AMR in SIERRA (Edwards, Rath,
Lober, et al., Sandia)
6
Applications: Crash Simulations
  • Dynamic load balancing.
  • Assigns physically close surfaces to the same
    processor.
  • Recursive coordinate bisection: inexpensive,
    fast, incremental.
  • Multiphase simulation
  • Graph-based decomposition for finite element
    calculation.
  • RCB decomposition for contact detection.
  • Unstructured Communication package maps between
    decompositions.

Using RCB for Contact Detection in
Pronto (Attaway, Hendrickson, Plimpton, et al.,
Sandia)
7
Applications: Parallel Circuit Simulation
  • Load balance matrix fill phase.
  • Load time for devices can vary by two orders of
    magnitude.
  • Problem is a network, not a mesh.
  • Load balance solve phase.
  • Equal number of rows while minimizing
    communication.
  • Trilinos solver library (Heroux, et al.)
    partitions matrix with Zoltan.
  • Apply graph partitioning to each phase.

Parallel analog circuit simulation in XYCE
(Hutchinson, Hoekstra, et al., Sandia)
8
Applications: Multiphysics Simulations
  • Multiphysics simulations
  • Difficult to estimate work in advance.
  • Rebalance infrequently; want high quality.
  • Dynamic load balancing
  • Multi-constraint graph partitioning.
  • Two balance criteria: matrix fill and linear
    solve.
  • Using Zoltan in MPSalsa.
  • Load-balancing query functions implemented in
    <200 lines of code.
  • All communication for migrating data done with
    Zoltan's data migration tools.
  • No additional communication routines written by
    application developer.

MPSalsa multiphysics simulations (J. Shadid, A.
Salinger, et al., Sandia)
9
Dynamic Load Balancing
  • Desirable characteristics for dynamic load
    balancing
  • Distribute work evenly among processors.
  • Minimize interprocessor communication.
  • Keep data movement costs low.
  • Incremental partitioning: small changes in
    workloads produce only small changes in
    decomposition.
  • Parallel, scalable implementation.

10
No One-Size-Fits-All Solutions
  • No single partitioner works best for all
    applications.
  • Trade-offs
  • Quality vs. speed.
  • Geometric locality vs. data dependencies.
  • Low data-movement costs vs. tolerance for
    remapping.
  • Application developers may not know which
    partitioner is best for application.
  • Zoltan contains suite of partitioning methods.
  • Application changes only one parameter to switch
    methods.
  • Allows experimentation/comparisons to find most
    effective partitioner for application.
  • Advantage of toolkit approach.

11
Zoltan Suite of Partitioning Algorithms
  • Recursive Coordinate Bisection (Berger, Bokhari)
  • Recursive Inertial Bisection (Taylor, Nour-Omid)
  • ParMETIS (Karypis, Schloegel, Kumar)
  • Jostle (Walshaw)
  • Space Filling Curves (Peano, Hilbert)
  • Refinement-tree Partitioning (Mitchell)
  • Octree Partitioning (Loy, Flaherty)
12
Recursive Coordinate Bisection (RCB)
  • Developed by Berger & Bokhari, 1987, for AMR.
  • Idea
  • Divide work into two equal parts using a cutting
    plane orthogonal to a coordinate axis.
  • Recursively cut the resulting subdomains.
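
The idea above can be sketched in a few lines. This is a serial, minimal illustration and not Zoltan's parallel implementation; the function name `rcb`, the choice to cut along the longest axis, and the weight-splitting rule are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def rcb(points: Sequence[Point], weights: Sequence[float],
        num_parts: int) -> List[int]:
    """Assign each point a part number in 0..num_parts-1 (illustrative only)."""
    assignment = [0] * len(points)

    def bisect(ids: List[int], first_part: int, parts: int) -> None:
        if parts == 1 or not ids:
            for i in ids:
                assignment[i] = first_part
            return
        # Assumption: cut orthogonal to the axis with the largest extent.
        dims = len(points[ids[0]])
        axis = max(
            range(dims),
            key=lambda d: max(points[i][d] for i in ids)
            - min(points[i][d] for i in ids),
        )
        ids = sorted(ids, key=lambda i: points[i][axis])
        # Give the low side a share of the work proportional to its part count.
        left_parts = parts // 2
        target = sum(weights[i] for i in ids) * left_parts / parts
        running, split = 0.0, 0
        while split < len(ids) and running + weights[ids[split]] <= target:
            running += weights[ids[split]]
            split += 1
        # Recursively cut the two resulting subdomains.
        bisect(ids[:split], first_part, left_parts)
        bisect(ids[split:], first_part + left_parts, parts - left_parts)

    bisect(list(range(len(points))), 0, num_parts)
    return assignment
```

Calling `rcb(points, weights, 4)` on a small grid of unit-weight points returns one part number per point, with equal work in each part when the weights divide evenly.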

13
RCB Advantages
  • Conceptually simple; fast and inexpensive.
  • Regular subdomains.
  • Can be used for structured or unstructured
    applications.
  • All processors can inexpensively know entire
    decomposition.
  • Effective when connectivity info is not
    available.
  • Implicitly incremental.

14
RCB Disadvantages
  • No explicit control of communication costs.
  • Can generate disconnected subdomains.
  • Mediocre partition quality.

15
Variations on RCB in Zoltan
  • Recursive Inertial Bisection
  • Simon, Taylor, et al., 1991
  • Cutting planes orthogonal to principal axes of
    geometry.
  • Not incremental.
  • Point-Assign and Box-Assign.
  • Given a decomposition, determine to which
    processor(s) a new item should be added (based on
    its geometric location).
  • Useful in contact detection and multiphase
    simulations.
  • Structured-mesh support.
  • Set parameter to generate regular block
    subdomains.
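
Point-Assign and Box-Assign can be pictured as walks down the tree of cutting planes. The sketch below assumes a hypothetical tree encoding (a leaf is a part number; an interior node is a tuple `(axis, cut, low_subtree, high_subtree)`); it does not reflect Zoltan's internal data structures or function signatures.

```python
def point_assign(tree, point):
    """Return the part whose subdomain contains `point`."""
    node = tree
    while not isinstance(node, int):
        axis, cut, low, high = node
        # Follow the side of the cutting plane the point lies on.
        node = low if point[axis] < cut else high
    return node


def box_assign(tree, lo, hi):
    """Return every part whose subdomain intersects the box [lo, hi]."""
    if isinstance(tree, int):
        return [tree]
    axis, cut, low, high = tree
    parts = []
    if lo[axis] < cut:       # box reaches into the low side
        parts += box_assign(low, lo, hi)
    if hi[axis] >= cut:      # box reaches into the high side
        parts += box_assign(high, lo, hi)
    return parts
```

A box that straddles a cutting plane is reported for the parts on both sides, which is what a contact-detection search needs.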

16
Space-Filling Curve Partitioning (SFC)
  • Developed by Peano, 1890.
  • Space-Filling Curve
  • Mapping from R^3 to R^1 that completely fills a
    domain.
  • Applied recursively to obtain desired
    granularity.
  • Used for partitioning by
  • Warren and Salmon, 1993, gravitational
    simulations.
  • Pilkington and Baden, 1994, smoothed particle
    hydrodynamics.
  • Patra and Oden, 1995, adaptive mesh refinement.

17
SFC Algorithm
  • Run space-filling curve through domain.
  • Order objects according to position on curve.
  • Perform 1-D partition of curve.
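
The three steps above can be sketched as follows. For brevity this example uses a 2-D Morton (Z-order) key rather than the Hilbert curve, and it is a serial illustration, not Zoltan's binned Hilbert implementation; `morton_key` and `sfc_partition` are names invented for the example.

```python
def morton_key(ix, iy, bits=16):
    """Interleave the bits of two grid indices into one Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def sfc_partition(points, weights, num_parts, bits=16):
    """Order 2-D points along the curve, then cut the 1-D ordering."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cells = 1 << bits

    def quantize(value, low, high):
        # Map a coordinate onto an integer grid index in 0..cells-1.
        if high == low:
            return 0
        return min(int((value - low) / (high - low) * cells), cells - 1)

    def key(i):
        return morton_key(quantize(xs[i], min(xs), max(xs)),
                          quantize(ys[i], min(ys), max(ys)), bits)

    order = sorted(range(len(points)), key=key)

    # 1-D partition: place each object by the midpoint of its weight interval.
    total = sum(weights)
    assignment = [0] * len(points)
    running = 0.0
    for i in order:
        part = int((running + weights[i] / 2) * num_parts / total)
        assignment[i] = min(part, num_parts - 1)
        running += weights[i]
    return assignment
```

Because nearby keys tend to correspond to nearby points, each contiguous piece of the ordering is a geometrically compact group of objects.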

18
SFC Advantages
  • Simple, fast, inexpensive.
  • Maintains geometric locality of objects in
    processors.
  • Linear ordering of objects may improve cache
    performance.
  • Implicitly incremental.

19
SFC Disadvantages
  • No explicit control of communication costs.
  • Can generate disconnected subdomains.
  • Slightly lower quality partitions than RCB.

20
Implementations of SFC in Zoltan
  • Binned Hilbert SFC
  • Heaphy, Edwards, 2001
  • Replace linear sort of objects by adaptive
    binning strategy.
  • Improved speed over traditional implementations.
  • Box-Assign and Point-Assign supported.
  • Refinement-Tree Partitioning
  • Mitchell, 1998
  • Topology-based rather than geometry-based.
  • Uses parent-child relationships in AMR to build
    tree.
  • Octree Partitioning
  • Loy, Flaherty, 1998
  • Explicitly builds octree data structure.
  • Partial tree traversals give (binned) linear
    ordering.

21
Applications using SFC
  • Adaptive hp-refinement finite element methods.
  • Assigns physically close elements to same
    processor.
  • Inexpensive; incremental; fast.
  • Linear ordering can be used to order elements
    for efficient memory access.

hp-refinement mesh; 8 processors. Patra, et al.
(SUNY-Buffalo)
22
Graph Partitioning
  • Represent problem as a weighted graph.
  • Nodes: objects to be partitioned.
  • Edges: communication between objects.
  • Weights: work load or amount of communication.
  • Partition graph so that
  • Partitions have equal nodal weight.
  • Weight of edges cut by subdomain boundaries is
    small.
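
The two goals above can be measured directly for any candidate partition. The sketch below is a minimal illustration; the function name `partition_quality` and the edge-list representation `(u, v, weight)` are assumptions for the example.

```python
def partition_quality(node_weights, edges, part_of, num_parts):
    """Return (per-part weights, load imbalance, weight of cut edges)."""
    part_weights = [0.0] * num_parts
    for node, weight in enumerate(node_weights):
        part_weights[part_of[node]] += weight

    # An edge is cut when its endpoints lie in different parts.
    edge_cut = sum(w for u, v, w in edges if part_of[u] != part_of[v])

    # Imbalance of 1.0 means every part carries exactly the average load.
    average = sum(part_weights) / num_parts
    imbalance = max(part_weights) / average
    return part_weights, imbalance, edge_cut
```

Comparing two partitions of the same graph with this function makes the quality trade-off visible: one may balance perfectly but cut heavy edges, another may cut little but leave one part overloaded.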

23
Multi-Level Graph Partitioning
  • Bui & Jones (1993); Hendrickson & Leland (1993);
    Karypis & Kumar (1995)
  • Construct smaller approximations to graph.
  • Perform graph partitioning on coarse graph.
  • Propagate partition back, refining as needed.
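
One level of the coarsen / partition / propagate cycle can be sketched as below. The coarsening rule shown (greedy heavy-edge matching) is one common choice and is an assumption here, as are the function names; the refinement step is omitted.

```python
def coarsen(node_weights, edges):
    """Merge matched node pairs; return (coarse_of, coarse_weights, coarse_edges)."""
    coarse_of = [-1] * len(node_weights)
    next_id = 0
    # Greedy heavy-edge matching: visit edges from heaviest to lightest.
    for u, v, _ in sorted(edges, key=lambda e: -e[2]):
        if u != v and coarse_of[u] == -1 and coarse_of[v] == -1:
            coarse_of[u] = coarse_of[v] = next_id
            next_id += 1
    # Unmatched nodes become coarse nodes on their own.
    for node in range(len(node_weights)):
        if coarse_of[node] == -1:
            coarse_of[node] = next_id
            next_id += 1

    coarse_weights = [0.0] * next_id
    for node, weight in enumerate(node_weights):
        coarse_weights[coarse_of[node]] += weight

    # Sum the weights of fine edges that join the same pair of coarse nodes.
    merged = {}
    for u, v, w in edges:
        cu, cv = coarse_of[u], coarse_of[v]
        if cu != cv:
            key = (min(cu, cv), max(cu, cv))
            merged[key] = merged.get(key, 0.0) + w
    coarse_edges = [(a, b, w) for (a, b), w in sorted(merged.items())]
    return coarse_of, coarse_weights, coarse_edges


def project(coarse_part, coarse_of):
    """Propagate a partition of the coarse graph back to the fine graph."""
    return [coarse_part[c] for c in coarse_of]
```

Matching along heavy edges hides them inside coarse nodes, so a partition of the small graph tends to cut only light edges of the original.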

24
Multi-Level Graph Partitioning
  • Advantages
  • High quality partitions for many applications.
  • Explicit control of communication costs.
  • Widely used for static partitioning (Chaco,
    METIS, Party, Scotch)
  • Disadvantages
  • More expensive than geometric approaches.
  • Not incremental.

25
Diffusive Graph Partitioning
  • Cybenko (1989); Hu & Blake (1995)
  • Work is moved from heavily loaded processors to
    more lightly loaded neighbors.
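
A first-order diffusion step illustrates the idea. This is a minimal sketch that tracks only load totals per processor; the function name `diffuse` and the fixed diffusion coefficient `alpha` are assumptions, and a real scheme must also choose which objects to move.

```python
def diffuse(loads, neighbors, alpha=0.25, iterations=1):
    """Repeatedly shift load between neighboring processors.

    neighbors[p] lists the processors adjacent to p; the lists are
    assumed symmetric so that the total load is conserved.
    """
    loads = list(loads)
    for _ in range(iterations):
        updated = list(loads)
        for p, nbrs in enumerate(neighbors):
            for q in nbrs:
                # Load flows from the heavier to the lighter processor.
                updated[p] += alpha * (loads[q] - loads[p])
        loads = updated
    return loads
```

On a ring of four processors with all work initially on one of them, a single step only reaches the two adjacent processors; repeated steps are needed before the load evens out.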

26
Diffusive Graph Partitioning
  • Advantages
  • Local and parallel.
  • Inexpensive.
  • Incremental.
  • Disadvantages
  • Several iterations needed for global balance.
  • Partition quality can degrade.
  • Hybrid approach may work best.

27
Graph Partitioning in Zoltan
  • Zoltan provides interfaces to popular parallel
    graph partitioning packages.
  • ParMETIS (U. Minnesota)
  • PJostle (U. Greenwich)
  • Both ParMETIS and PJostle include
  • Multilevel graph partitioning
  • Diffusive partitioning
  • Hybrids of the two strategies
  • Multi-constraint partitioning
  • Zoltan interface: simple callbacks for neighbor
    lists.
  • Zoltan builds complicated graph data structures
    needed by graph-partitioning packages.

28
Zoltan Data Services
29
Zoltan Data Migration Tools
  • Data must be moved for new decomposition.
  • Depends strongly on application data structures.
  • Complicated communication patterns.
  • Zoltan can help!
  • Application supplies query functions to
    pack/unpack data.
  • Zoltan does all communication to new processors.
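
The division of labor above can be sketched with a toy, single-process model. The `migrate` function, the export-list layout, and the JSON-based pack/unpack callbacks are all hypothetical and stand in for the application's query functions; they are not Zoltan's actual signatures.

```python
import json


def migrate(stores, export_list, pack, unpack):
    """Move objects between per-processor stores.

    stores[p] is a dict {global_id: object} for processor p;
    export_list holds (global_id, source_proc, dest_proc) triples.
    """
    in_flight = []
    for gid, src, dst in export_list:
        buffer = pack(gid, stores[src][gid])   # application callback
        del stores[src][gid]
        in_flight.append((dst, gid, buffer))
    # The library side only moves opaque buffers between processors.
    for dst, gid, buffer in in_flight:
        stores[dst][gid] = unpack(gid, buffer)  # application callback


# Example application callbacks: only the application knows its data layout.
def pack_element(gid, element):
    return json.dumps(element).encode()


def unpack_element(gid, buffer):
    return json.loads(buffer.decode())
```

The mover never inspects the buffers, which is why the same routine works for any application data structure.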

30
Zoltan Matrix Ordering Interface
  • Produce fill-reducing ordering for sparse matrix
    factorization.
  • Generic matrix-ordering interface in Zoltan.
  • Easy to add new ordering algorithms.
  • Specific interface to ordering methods in
    ParMETIS.

31
Zoltan Unstructured Communication Package
  • Simple primitives for efficient irregular
    communication.
  • Zoltan_Comm_Create: Generates communication plan.
  • Processors and amount of data to send and
    receive.
  • Zoltan_Comm_Do: Send data using plan.
  • Can reuse plan. (Same plan, different data.)
  • Zoltan_Comm_Do_Reverse: Inverse communication.
  • Used for most communication in Zoltan.
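
The create / do / reverse pattern can be illustrated with a single-process simulation. The `CommPlan` class below is a hypothetical stand-in written for this example; it mirrors the roles of the three primitives but not their C signatures.

```python
class CommPlan:
    """Simulated plan: who sends which item to whom."""

    def __init__(self, destinations):
        # destinations[p][k] = processor that receives item k of processor p
        self.destinations = destinations
        # arrivals[q] lists (sender, item index) in arrival order at q
        self.arrivals = [[] for _ in destinations]
        for p, dests in enumerate(destinations):
            for k, q in enumerate(dests):
                self.arrivals[q].append((p, k))

    def do(self, send_data):
        """Forward communication: deliver each item to its destination."""
        return [[send_data[p][k] for p, k in arr] for arr in self.arrivals]

    def do_reverse(self, recv_data):
        """Inverse communication: return one value per item to its sender."""
        out = [[None] * len(dests) for dests in self.destinations]
        for q, arr in enumerate(self.arrivals):
            for slot, (p, k) in enumerate(arr):
                out[p][k] = recv_data[q][slot]
        return out
```

The plan is built once from the destination lists and can then be applied to any data with the same layout, in either direction.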

32
Zoltan Dynamic Memory Package
  • Support for debugging dynamic memory usage.
  • Tracking of mallocs and frees.
  • Memory-leak warnings.
  • Source code file and line numbers of operations.
  • Simple allocation of multi-dimensional arrays.
  • Simple to use.
  • Replace calls to malloc, free with ZOLTAN_MALLOC,
    ZOLTAN_FREE.
  • Link with memory package library.

33
Zoltan Distributed Data Directory
  • Helps applications locate off-processor data.
  • Rendezvous algorithm (Pinar, 2001).
  • Directory distributed in known way (hashing)
    across processors.
  • Requests for object location sent to processor
    storing the object's directory entry.
  • Easy to use.
  • Functions to create, update, search, destroy.
  • Customizable data storage, user distribution.
  • Scalable performance.
  • Constant communication cost for look-ups.
  • Linear total memory usage.
  • Avoids communication bottlenecks.
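
The rendezvous idea can be sketched in a single process: a hash of the global ID names the processor that holds that object's directory entry, so any processor can compute where to send a request. The class and method names below are hypothetical, and CRC32 is just one convenient deterministic hash.

```python
import zlib


class DistributedDirectory:
    """Toy directory: one table per (simulated) processor."""

    def __init__(self, num_procs):
        self.tables = [{} for _ in range(num_procs)]

    def home(self, global_id):
        # Every processor evaluates the same hash, so no search is needed.
        return zlib.crc32(str(global_id).encode()) % len(self.tables)

    def update(self, global_id, owner):
        self.tables[self.home(global_id)][global_id] = owner

    def find(self, global_id):
        return self.tables[self.home(global_id)].get(global_id)

    def remove(self, global_id):
        self.tables[self.home(global_id)].pop(global_id, None)
```

Each look-up touches exactly one table, and each entry is stored exactly once, which matches the constant look-up cost and linear total memory noted above.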

34
Zoltan Toolkit Summary
  • Data-structure neutral design.
  • Application need not use/build prescribed data
    structures.
  • High-quality implementations of many
    partitioners.
  • No single algorithm is appropriate for all
    applications.
  • Suite of algorithms allows experimentation and
    comparison.
  • Data management tools for dynamic applications.
  • Data migration, unstructured communication,
    memory management.
  • Uses of Zoltan
  • Effective toolkit for many different
    applications.
  • Research test-bed for new algorithm development.
  • Interface for new graph-, tree-, or
    geometry-based tools.

35
Zoltan Interface
  • Simple, easy-to-use interface.
  • Small number of callable Zoltan functions.
  • Callable from C, C++, Fortran.
  • Data-structure neutral design.
  • Supports wide range of applications and data
    structures.
  • Imposes no restrictions on application's data
    structures.
  • Application does not have to build Zoltan's data
    structures.
  • Only requirement: unique global IDs for objects.
  • Application interface
  • Zoltan queries the application for needed info.
  • IDs of objects, coordinates, relationships to
    other objects.
  • Application provides simple functions to answer
    queries.
  • Small extra costs in memory and function-call
    overhead.

36
Zoltan Query Functions
  • Query mechanism supports
  • Geometric algorithms
  • Queries for dimensions, coordinates, etc.
  • Graph-based algorithms
  • Queries for edge lists, edge weights, etc.
  • Tree-based algorithms
  • Queries for parent/child relationships, etc.
  • Once query functions are implemented, application
    can access all Zoltan functionality.
  • Can switch between algorithms by setting
    parameters.
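
The query mechanism is a callback pattern: the application registers functions, and the library calls them when it needs information. The toy `Balancer` below is hypothetical throughout; its callback kinds (`object_ids`, `coordinates`) and method names (`BLOCK`, `SORT_X`) are invented for the example and are not Zoltan parameters.

```python
class Balancer:
    """Toy library object that learns about the application via callbacks."""

    def __init__(self):
        self.params = {}
        self.callbacks = {}

    def set_param(self, name, value):
        self.params[name] = value

    def set_fn(self, kind, fn):
        self.callbacks[kind] = fn

    def partition(self, num_parts):
        ids = list(self.callbacks["object_ids"]())
        method = self.params.get("METHOD", "BLOCK")
        if method == "BLOCK":
            order = ids                      # needs only the ID query
        elif method == "SORT_X":
            coords = self.callbacks["coordinates"]
            order = sorted(ids, key=lambda gid: coords(gid)[0])
        else:
            raise ValueError(f"unknown method: {method}")
        # Cut the ordering into num_parts equal-sized pieces.
        return {gid: k * num_parts // len(order)
                for k, gid in enumerate(order)}
```

The application's data stays in its own structures; switching algorithms is a one-line parameter change once the queries are registered.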

37
Example Zoltan Application Interface
APPLICATION
Initialize Zoltan (Zoltan_Initialize,
Zoltan_Create)
Select LB Method (Zoltan_Set_Params)
Register query functions (Zoltan_Set_Fn)
COMPUTE
Re-partition (Zoltan_LB_Partition)
Move data (Zoltan_Migrate)
Clean up (Zoltan_Destroy)
38
Current Development in Zoltan
  • Partitioning for complex objectives.
  • Communication and computation.
  • Overlapped preconditioners.
  • Work with Pinar (LBL).
  • Heterogeneous partitioning.
  • Model machine as hierarchy of components.
  • Partition each level of hierarchy.
  • Work with Flaherty (RPI), Teresco (Williams).
  • Multiconstraint geometric partitioning.
  • Find one partition that is good with respect to
    multiple weights.
  • Graph-based multiconstraint partitioning
    available in Zoltan through ParMETIS 3.0.

39
For More Information...
  • Zoltan Home Page
  • http://www.cs.sandia.gov/Zoltan
  • Users and Developers Guides
  • Download Zoltan software under GNU LGPL.
  • Email
  • zoltan_at_cs.sandia.gov
  • kddevin_at_sandia.gov
  • rheaphy_at_sandia.gov