KeLPIO - PowerPoint PPT Presentation

1
KeLPIO
A Telescope-Ready Domain-Specific I/O Library for
Irregular Block-Structured Applications
Bradley Broom, Rob Fowler, Ken Kennedy
Department of Computer Science, Rice University
2
Compiler Optimization of I/O
  • I/O is very slow, and often significantly affects
    performance.
  • I/O optimized software is often very complex
  • Structure remapping.
  • Asynchronous, threaded I/O.
  • Effective compiler optimization of I/O allows
    software to be
  • Simpler,
  • More easily developed, and
  • More easily maintained.
  • We prefer very high-level, domain-specific
    language facilities
  • Simplify programming
  • Simplify compiler analysis

3
Language Extensions for I/O
  • One way is to add new language features to
    support I/O.
  • Similar to HPF directives for automatic
    parallelization.
  • But many problems
  • Limited to comparatively low-level, general
    purpose facilities.
  • Do not satisfy our desire for high-level I/O.
  • Limited acceptance by user community.
  • Limited compiler support.
  • Reduced portability.
  • Uncertain future.
  • Adds to language proliferation.
  • Should extend every language: C, C++, Java,
    Fortran, ...

4
Our Approach: Telescoping
  • Extend compiler optimization technology so that
    library calls are treated as language primitives.
  • Programmer sees a standard language with a
    domain-specific I/O library.
  • Compiler sees a domain-specific language with
    very high-level facilities for I/O.
  • Requires an extensible compiler generation
    framework.
  • Library developer, or domain expert, (or end
    user) should be able to add optimization rules
    and generate an optimizer.
  • Libraries should be structured to facilitate
    optimization.
  • KeLPIO: a high-level I/O library for irregular
    block-structured applications.

5
Agenda
  • Introduction
  • What is KeLPIO?
  • Why is it useful?
  • Quick KeLP Review
  • Brief Overview of KeLPIO
  • Optimizing KeLPIO Programs
  • Further Research

6
What is KeLPIO?
  • KeLP: Kernel Lattice Parallelism
  • A high-level C++ library for managing
    communication in (irregular) block-structured
    applications.
  • KeLPIO: KeLP Input/Output
  • A library of KeLP-like Input/Output operations
  • Provides I/O interface at same level as
    communication
  • Uses existing low-level (parallel) libraries for
    actual I/O
  • Support for
  • Application I/O, Snapshotting, Checkpointing
  • Out-of-core execution

7
KeLPIO Goals
  • Provide a collection of intuitive, high-level I/O
    operations
  • Be implemented with reasonable efficiency as a
    library
  • Expose a multi-layered interface of increasing
    complexity, specificity, and efficiency
  • To enable (compiler) transformation of high-level
    I/O operations into more efficient, but
    lower-level operations
  • Be compatible with the KeLP library.

8
Agenda
  • Introduction
  • Quick KeLP Review
  • Brief Overview of KeLPIO
  • Optimizing KeLPIO Programs
  • Further Research

9
Quick Review of KeLP
  • C++ library for coordinating irregular
    block-structured scientific applications
  • Based on intuitive, geometric programming
    abstractions: points, regions, ...
  • Region arithmetic allows regions to be added,
    subtracted, ...
  • Dimensionality is strongly typed: PointX, RegionX
  • Trailing X ∈ {1,2,3,4} denotes dimensionality
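
The geometric abstractions on this slide can be pictured with a small, self-contained sketch. The `Point2` and `Region2` types below are illustrative stand-ins written for this transcript, not the KeLP classes; only the name `grow` and the idea of combining regions come from the slides, and the closed-range representation is an assumption.

```cpp
#include <algorithm>

// Illustrative stand-ins only; these are NOT the KeLP classes.
struct Point2 { int x, y; };

// A closed rectangular index range [lo, hi] in two dimensions.
struct Region2 {
    Point2 lo, hi;
    bool empty() const { return lo.x > hi.x || lo.y > hi.y; }
    long size() const {
        return empty() ? 0 : long(hi.x - lo.x + 1) * (hi.y - lo.y + 1);
    }
};

// Intersection of two regions (the result may be empty).
inline Region2 intersect(const Region2& a, const Region2& b) {
    return { { std::max(a.lo.x, b.lo.x), std::max(a.lo.y, b.lo.y) },
             { std::min(a.hi.x, b.hi.x), std::min(a.hi.y, b.hi.y) } };
}

// Grow a region by n cells on every side; a negative n shrinks it.
inline Region2 grow(const Region2& r, int n) {
    return { { r.lo.x - n, r.lo.y - n }, { r.hi.x + n, r.hi.y + n } };
}
```

Encoding the dimensionality in the type name (here the trailing `2`) means that mixing a 2-D point with a 3-D region would be a compile-time error rather than a run-time one.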

10
Main KeLP Concepts
  • Application arrays are subdivided into blocks
    called GridXs
  • GridX is a RegionX with processor assignment and
    data storage
  • XArrayX manages a collection of GridXs
  • Instantiates GridXs according to a FloorPlanX
  • Provides (collective) iterators for accessing all
    GridXs in an XArrayX
  • Inspector-Executor communications paradigm
  • MotionPlanX describes required communication
  • MoverX performs communication

11
Computational Building Blocks
  • Application arrays are subdivided into blocks
    called GridXs
  • A GridX consists of
  • a RegionX denoting its extent
  • a processor assignment
  • data storage
  • GridX data
  • can be accessed from C++,
  • but Fortran interface often used by numerical
    kernels

12
Managing GridXs
  • XArrayX manages a collection of GridXs
  • GridXs can have overlapping RegionXs
  • Provides (collective) operations for
  • Instantiating GridXs according to a FloorPlanX
  • Accessing GridXs and their data
  • Iterating over all GridXs in an XArrayX
  • Iterating over all GridXs on the local processor

13
KeLP's Communication Model
  • Uses inspector-executor paradigm
  • MotionPlanX describes required communication
  • A collection of individual data motions, each
    containing
  • Source and destination XArrayX
  • Source and destination GridX
  • RegionX to communicate
  • MoverX performs communication described by
    MotionPlanX
  • Variety of movers within KeLP framework
  • Vectorizing Mover
  • Adding Mover

14
KeLP Example
    void fillGhost(DoubleArray2& X)
    {
        MotionPlan2 M;
        for (indexIterator1 ii(X); ii; ++ii) {
            int i = ii(0);
            Region2 inside = grow(X(i).region(), -1);
            for (indexIterator1 jj(X); jj; ++jj) {
                int j = jj(0);
                if (i != j) M.CopyOnIntersection(X,i,X,j,inside);
            }
        }
        DoubleArray2Mover DM(X,X,M);
        DM.execute();
    }

15
Agenda
  • Introduction
  • Quick KeLP Review
  • Brief Overview of KeLPIO
  • Application I/O
  • Snapshotting and Checkpointing
  • Out-of-core programming
  • Optimizing KeLPIO Programs
  • Further Research

16
KeLPIO Overview
  • Provides KeLP-like primitives for communicating
    array data between GridXs and external arrays
  • Designed for
  • Application I/O
  • Snapshotting
  • Checkpointing
  • Out-of-core execution
  • Independent of the underlying I/O library
  • Does not duplicate I/O library functions other
    than read and write
  • Target I/O libraries are interfaced to KeLPIO by
    a concrete implementation of the FileInterface
    abstract class

17
External Arrays
  • FileArrayX
  • Is the KeLPIO interface to an external array
  • Is strongly typed in
  • Number of dimensions
  • Element type
  • Uses FileInterface object to perform I/O
  • Similar to XArrayX
  • FileArrayX manages a collection of blocks
  • DecompositionX represents which processors can
    (should) directly access regions of the array

18
I/O Plans and Movers
  • Based on same inspector-executor paradigm as KeLP
  • Classes IOPlanX and IOMoverX move data between
    a GridX within an XArrayX and a RegionX within a
    FileArrayX
  • Direction of movement (from XArrayX to FileArrayX
    or vice-versa) determines input or output
  • IOPlanX and IOMoverX are either all input or all
    output
  • Source and destination must have the same
    processor assignment

19
KeLPIO Example (Part 1/2)
    void saveData(int N, DoubleArray2& X,
                  char *filename, int offset)
    {
        PassionFile pf(MODE_WRITEONLY, filename);
        Region2 r(1, 1, N, N);
        Processors2 P;
        Decomposition2 T(r);
        T.distribute(BLOCK1, BLOCK1, P);
        FileArray2<double> fa(T, pf, offset);

20
KeLPIO Example (Part 2/2)
        IOPlan2 iop;
        for (indexIterator1 ii(X); ii; ++ii) {
            int i = ii(0);
            Region2 inside = grow(X(i).region(), -1);
            for (indexIterator1 jj(T); jj; ++jj) {
                int j = jj(0);
                iop.CopyOnIntersection(X,i,fa,j,inside);
            }
        }
        IOMover2<Grid2<double>, double> iom(X,fa,iop);
        iom.execute();
    }

21
Snapshotting
  • Use same strategy as application I/O
  • Create IOPlanX once, then execute IOMoverX
    repeatedly
  • Need to change external array position between
    snapshots
  • Use FileArrayX::SetOffset for numerical positions
  • Directly access and change FileView otherwise

22
Checkpointing
  • If overlap doesn't need saving, use same strategy
    as application I/O
  • If overlap must be saved
  • Replicated elements generally have different
    values
  • Cannot use one-to-one correspondence between
    internal and external grid positions
  • Solution: translate GridXs to eliminate overlap

23
Embeddings
  • EmbeddingX shifts GridX to new positions
  • Original GridXs continue to store data
  • Can use EmbeddingX to transform away overlap
    regions
  • Utility function RemoveOverlap does this for
    regular decompositions
  • Multi-dimensional rectangular bin-packing problem
    in general (NP-hard)
  • Application must provide mechanism
  • Future: provide general-purpose (but not optimal)
    function

24
Out-of-core Programming
  • Manual conversion from in-core to out-of-core
  • Can require substantial effort to introduce
    staging areas, rearrange computation
  • Results in divergence of in-core and out-of-core
    codes, duplicated effort
  • KeLPIO enables semi-automatic conversion, with
    minimal source code changes
  • Easy to conditionally compile for either in-core
    or out-of-core
  • Basic unit of OOC decomposition is the GridX
  • Overpartition array and assign multiple GridXs
    per processor
  • Only a subset of GridXs assigned to each
    processor are in core concurrently

25
Managing Out-of-core GridXs
  • OOCXArrayX is a new GridX management class
    similar to XArrayX
  • Accessing a specific GridX from an OOCXArrayX
    forces that GridX into memory
  • Least recently accessed GridX is swapped out

26
Using OOCXArrayX
  • Application creates OOCXArrayX much like an
    XArrayX
  • Must create multiple GridXs per processor
  • Application enables OOCXArrayX
  • Determines non-overlapping EmbeddingX
  • Creates swap array (FileArrayX)
  • Sets number of GridXs to cache
  • Application uses OOCXArrayX much like an XArrayX
  • Programmer must ensure only cached GridXs are
    accessed
  • Possible, as cache behavior predictable from
    source
  • Performance implications should also be
    considered
  • Must use OOCMoverX for communicating

27
Communication Involving OOCXArrayXs
  • Standard KeLP movers inadequate for out-of-core
    data
  • Transient GridXs always require buffering
  • Swapping of GridXs should be minimized
  • When a specific GridX is swapped in, do as many
    communications involving it as possible
  • OOCMoverX designed for moving data involving
    OOCXArrayXs

28
Conditionally Compiled OOC
  • Use typedefs for all potentially OOC XArrayXs and
    MoverXs
  • To compile in-core application:
  • Define typedefs using KeLP XArrayX and MoverX
  • typedef XArray2<Grid2<double> > DoubleArray2;
  • To compile out-of-core application:
  • Define typedefs using KeLPIO OOCXArrayX and
    OOCMoverX classes
  • typedef OOCXArray2<Grid2<double> > DoubleArray2;
  • Overpartition the array (create multiple GridXs
    per processor)
  • Compile in code to establish OOC backing store
    and enable OOC cache

29
Agenda
  • Introduction
  • Quick KeLP Review
  • Brief Overview of KeLPIO
  • Optimizing KeLPIO Programs
  • File Layout Optimization
  • File Access Optimization
  • Out-of-core Optimization
  • Further Research

30
File Layout Optimization
  • Specific file layouts often not required
  • Checkpoint files
  • Out-of-core swap files
  • Use EmbeddingX to remap GridXs into
    high-performance shape
  • Library function Pencilize can remap general
    shape into a pencil along one dimension

31
File Access Optimization
  • Interleave I/O and computation
  • Create multiple IOPlanXs and IOMoverXs
  • Interleave/combine similar I/Os
  • Merge snapshots and checkpoints
  • Use asynchronous I/O (not implemented)
  • Prefetch out-of-core GridXs

32
Out-of-core Optimization
  • Optimization of swap file layout
  • Optimization of GridX size
  • Algorithmic optimization
  • Example: loop fusion
  • Explicit control of GridX cache
  • Examples: renewing, aging, flushing, clearing
  • Optimization of KeLPIO primitives used
  • Adjust cache size to use available memory

33
Optimizing OOC Primitives
  • Accessing a GridX is potentially very expensive
  • May need to read from disk (and write-back old
    GridX)
  • Use utility access methods provided for non-data
    GridX methods
  • These do not affect GridX cache
  • const accesses don't make GridXs dirty
  • Clean GridXs don't need to be written back to
    disk
  • Repeatedly accessing a GridX is expensive
  • Unnecessarily checks and updates LRU data on each
    access
  • Move GridX accesses out of computation loops

34
Optimized Out-of-Core Example
    void fillGhost(DoubleArray2& X)
    {
        MotionPlan2 M;
        for (indexIterator1 ii(X); ii; ++ii) {
            int i = ii(0);
            // X.region(i) reads the region without touching the GridX cache
            Region2 inside = grow(X.region(i), -1);
            for (indexIterator1 jj(X); jj; ++jj) {
                int j = jj(0);
                if (i != j)
                    M.CopyOnIntersection(X,i,X,j,inside);
            }
        }
        DoubleArray2Mover DM(X,X,M);
        DM.execute();
    }

35
Adjusting OOC Cache Size
  • Set the OOC Cache size to use available memory
  • However, always ensure it's large enough to
    guarantee that only cached GridXs are used
  • OOC Cache size can be adjusted dynamically
  • Periodically monitor available physical memory
    and adjust cache size appropriately
  • Allows a single executable to configure itself
    for high performance on any node in a Grid
    environment.
  • Makes use of memory when it's available
  • Avoids thrashing when it isn't

36
Runtime Overhead of OOCXArrayX
  • OOCXArrayX with all GridXs cached has similar
    performance to XArrayX for reasonable numbers of
    GridXs per processor

37
Recent and Future Status
  • Recent Developments
  • Released KeLPIO version 1.4.0 compatible with
    KeLP 1.4
  • Coming Soon
  • Fortran interface to core KeLP and KeLPIO library
    functions
  • No application C++ required
  • New GridX management class for multiple time step
    computations.
  • Allows more computation between communication
    steps.
  • Essentially the same computation/communication
    ratio.
  • To download software, search for KeLPIO on Google.

38
Acknowledgements
  • KeLPIO is supported in part by the National
    Partnership for Advanced Computational
    Infrastructure (NPACI).