KeLPIO - PowerPoint PPT Presentation

About This Presentation

Title:

KeLPIO

Slides: 30

Provided by: daniel265

Transcript and Presenter's Notes

Title: KeLPIO

1
KeLPIO
A Telescope-Ready Domain-Specific I/O Library for
Irregular Block-Structured Applications
Bradley Broom, Rob Fowler, Ken Kennedy
Department of Computer Science, Rice University
2
Compiler Optimization of I/O

I/O is very slow, and often significantly affects
performance.
I/O-optimized software is often very complex:
Structure remapping.
Asynchronous, threaded I/O.
Effective compiler optimization of I/O allows
software to be:
Simpler,
More easily developed, and
More easily maintained.
We prefer very high-level, domain-specific
language facilities
Simplify programming
Simplify compiler analysis

3
Language Extensions for I/O

One way is to add new language features to
support I/O.
Similar to HPF directives for automatic
parallelization.
But there are many problems:
Limited to comparatively low-level, general
purpose facilities.
Do not satisfy our desire for high-level I/O.
Limited acceptance by user community.
Limited compiler support.
Reduced portability.
Uncertain future.
Adds to language proliferation.
Should extend every language: C, C++, Java,
Fortran, ...

4
Our Approach: Telescoping

Extend compiler optimization technology so that
library calls are treated as language primitives.
Programmer sees a standard language with a
domain-specific I/O library.
Compiler sees a domain-specific language with
very high-level facilities for I/O.
Requires an extensible compiler generation
framework.
A library developer or domain expert (or end
user) should be able to add optimization rules
and generate an optimizer.
Libraries should be structured to facilitate
optimization.
KelpIO: a high-level I/O library for irregular
block-structured applications.

5
Agenda

Introduction
What is KelpIO?
Why is it useful?
Quick KeLP Review
Brief Overview of KelpIO
Optimizing KelpIO Programs
Further Research

6
What is KeLPIO?

KeLP: Kernel Lattice Parallelism
A high-level C++ library for managing
communication in (irregular) block-structured
applications.
KeLPIO: KeLP Input/Output
A library of KeLP-like Input/Output operations
Provides I/O interface at same level as
communication
Uses existing low-level (parallel) libraries for
actual I/O
Support for:
Application I/O, Snapshotting, Checkpointing
Out-of-core execution

7
KeLPIO Goals

Provide a collection of intuitive, high-level I/O
operations
Be implemented with reasonable efficiency as a
library
Expose a multi-layered interface of increasing
complexity, specificity, and efficiency
To enable (compiler) transformation of high-level
I/O operations into more efficient, but
lower-level operations
Be compatible with the KeLP library.

8
Agenda

Introduction
Quick KeLP Review
Brief Overview of KelpIO
Optimizing KelpIO Programs
Further Research

9
Quick Review of KeLP

C++ library for coordinating irregular
block-structured scientific applications
Based on intuitive, geometric programming
abstractions: points, regions, ...
Region arithmetic allows regions to be added,
subtracted, ...
Dimensionality is strongly typed: PointX, RegionX
Trailing X in {1,2,3,4} denotes dimensionality
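
To illustrate what strongly typed dimensionality and region arithmetic can look like, here is a minimal sketch. It does not use the KeLP classes: Point, Region, region2, intersect and grow below are hypothetical stand-ins written for this example, with the dimensionality carried as a template parameter.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical stand-ins for KeLP's PointX / RegionX; not the real classes.
template <int D>
struct Point {
    std::array<int, D> c{};
};

template <int D>
struct Region {
    Point<D> lo, hi;  // inclusive lower and upper corners

    bool empty() const {
        for (int d = 0; d < D; ++d)
            if (lo.c[d] > hi.c[d]) return true;
        return false;
    }
    long size() const {
        if (empty()) return 0;
        long n = 1;
        for (int d = 0; d < D; ++d) n *= hi.c[d] - lo.c[d] + 1;
        return n;
    }
};

// Intersection of two regions of the same dimensionality.
template <int D>
Region<D> intersect(const Region<D>& a, const Region<D>& b) {
    Region<D> r;
    for (int d = 0; d < D; ++d) {
        r.lo.c[d] = std::max(a.lo.c[d], b.lo.c[d]);
        r.hi.c[d] = std::min(a.hi.c[d], b.hi.c[d]);
    }
    return r;
}

// Grow (g > 0) or shrink (g < 0) a region by g cells on every side.
template <int D>
Region<D> grow(const Region<D>& a, int g) {
    Region<D> r = a;
    for (int d = 0; d < D; ++d) {
        r.lo.c[d] -= g;
        r.hi.c[d] += g;
    }
    return r;
}

using Region2 = Region<2>;

// Convenience constructor for this sketch only.
inline Region2 region2(int x0, int y0, int x1, int y1) {
    Region2 r;
    r.lo.c = {x0, y0};
    r.hi.c = {x1, y1};
    return r;
}
```

Because the dimensionality is part of the type, intersecting a Region<2> with a Region<3> is rejected at compile time in this sketch.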

10
Main KeLP Concepts

Application arrays are subdivided into blocks
called GridXs
GridX is a RegionX with processor assignment and
data storage
XArrayX manages a collection of GridXs
Instantiates GridXs according to a FloorPlanX
Provides (collective) iterators for accessing all
GridXs in an XArrayX
Inspector-Executor communications paradigm
MotionPlanX describes required communication
MoverX performs communication

11
Computational Building Blocks

Application arrays are subdivided into blocks
called GridXs
A GridX consists of
a RegionX denoting its extent
a processor assignment
data storage
GridX data
can be accessed from C++,
but a Fortran interface is often used by numerical
kernels

12
Managing GridXs

XArrayX manages a collection of GridXs
GridXs can have overlapping RegionXs
Provides (collective) operations for
Instantiating GridXs according to a FloorPlanX
Accessing GridXs and their data
Iterating over all GridXs in an XArrayX
Iterating over all GridXs on the local processor
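
The relationship between a floor plan, the grids, and the managing array can be sketched as follows. Box, FloorPlanEntry, Grid and XArray are simplified, hypothetical stand-ins for this example only, and allocating storage only on the owning processor is an assumption of the sketch rather than something stated in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for Region2, FloorPlan2, Grid2 and XArray2.
struct Box { int x0, y0, x1, y1; };  // inclusive 2-D extent

inline std::size_t cells(const Box& b) {
    return static_cast<std::size_t>(b.x1 - b.x0 + 1) *
           static_cast<std::size_t>(b.y1 - b.y0 + 1);
}

struct FloorPlanEntry { Box region; int owner; };  // extent + processor assignment

struct Grid {
    Box region;
    int owner;
    std::vector<double> data;  // assumption: allocated only on the owning processor
};

class XArray {
public:
    // Instantiate one Grid per floor-plan entry.
    XArray(const std::vector<FloorPlanEntry>& plan, int myRank) : rank_(myRank) {
        for (const auto& e : plan) {
            Grid g{e.region, e.owner, {}};
            if (e.owner == myRank) g.data.assign(cells(e.region), 0.0);
            grids_.push_back(std::move(g));
        }
    }
    std::size_t size() const { return grids_.size(); }
    Grid& operator()(std::size_t i) { return grids_[i]; }

    // Indices of the grids owned by this processor.
    std::vector<std::size_t> localIndices() const {
        std::vector<std::size_t> out;
        for (std::size_t i = 0; i < grids_.size(); ++i)
            if (grids_[i].owner == rank_) out.push_back(i);
        return out;
    }

private:
    int rank_;
    std::vector<Grid> grids_;
};
```

Note that the three regions in a floor plan may overlap, as the slide says; nothing in the sketch requires them to be disjoint.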

13
KeLP's Communication Model

Uses inspector-executor paradigm
MotionPlanX describes required communication
A collection of individual data motions, each
containing
Source and destination XArrayX
Source and destination GridX
RegionX to communicate
MoverX performs communication described by
MotionPlanX
Variety of movers within KeLP framework
Vectorizing Mover
Adding Mover
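
The inspector-executor split can be illustrated with a single-process, one-dimensional sketch. Grid1, Motion, MotionPlan1, execute and this fillGhost are hypothetical names for the example; the real MotionPlanX and MoverX also deal with processors and message passing, which are left out here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D stand-ins; no processors or messages are modelled.
struct Grid1 {
    int lo, hi;                // inclusive index range, including ghost cells
    std::vector<double> data;  // data[k] holds index lo + k
    Grid1(int l, int h, double v) : lo(l), hi(h), data(h - l + 1, v) {}
    double& at(int i) { return data[i - lo]; }
};

struct Motion { std::size_t src, dst; int lo, hi; };  // copy [lo,hi] from src to dst

class MotionPlan1 {
public:
    // Inspector step: record a copy of the part of [srcLo,srcHi] that lies
    // inside the destination grid, if that part is non-empty.
    void copyOnIntersection(const std::vector<Grid1>& x, std::size_t src,
                            std::size_t dst, int srcLo, int srcHi) {
        int lo = std::max(srcLo, x[dst].lo);
        int hi = std::min(srcHi, x[dst].hi);
        if (lo <= hi) motions_.push_back({src, dst, lo, hi});
    }
    const std::vector<Motion>& motions() const { return motions_; }

private:
    std::vector<Motion> motions_;
};

// Executor step: perform every recorded motion.
inline void execute(std::vector<Grid1>& x, const MotionPlan1& plan) {
    for (const Motion& m : plan.motions())
        for (int i = m.lo; i <= m.hi; ++i) x[m.dst].at(i) = x[m.src].at(i);
}

// Ghost fill: each grid's interior (extent shrunk by one cell) is copied
// into whatever part of every other grid it overlaps.
inline std::size_t fillGhost(std::vector<Grid1>& x) {
    MotionPlan1 plan;
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            if (i != j) plan.copyOnIntersection(x, i, j, x[i].lo + 1, x[i].hi - 1);
    execute(x, plan);
    return plan.motions().size();
}
```

Separating the plan from its execution means the plan can be built once and executed many times, which is the property the later I/O slides rely on.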

14
KeLP Example

void fillGhost(DoubleArray2& X)
{
  MotionPlan2 M;
  for (indexIterator1 ii(X); ii; ++ii) {
    int i = ii(0);
    Region2 inside = grow(X(i).region(), -1);
    for (indexIterator1 jj(X); jj; ++jj) {
      int j = jj(0);
      if (i != j) M.CopyOnIntersection(X,i,X,j,inside);
    }
  }
  DoubleArray2Mover DM(X,X,M);
  DM.execute();
}

15
Agenda

Introduction
Quick KeLP Review
Brief Overview of KelpIO
Application I/O
Snapshotting and Checkpointing
Out-of-core programming
Optimizing KelpIO Programs
Further Research

16
KeLPIO Overview

Provides KeLP-like primitives for communicating
array data between GridXs and external arrays
Designed for
Application I/O
Snapshotting
Checkpointing
Out-of-core execution
Independent of the underlying I/O library
Does not duplicate I/O library functions other
than read and write
Target I/O libraries are interfaced to KeLPIO by
a concrete implementation of the FileInterface
abstract class
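
One way such an abstraction could be shaped is sketched below. The slides do not show the real FileInterface signatures, so the read/write methods, MemoryFile, writeDoubles and readDoubles are all assumptions made for this example; the in-memory class merely stands in for a binding to an actual (parallel) I/O library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical shape of the abstract class; the real signatures may differ.
class FileInterface {
public:
    virtual ~FileInterface() = default;
    virtual void read(std::size_t offset, void* buf, std::size_t nbytes) = 0;
    virtual void write(std::size_t offset, const void* buf, std::size_t nbytes) = 0;
};

// Concrete implementation backed by memory, standing in for a real I/O library.
class MemoryFile : public FileInterface {
public:
    void read(std::size_t offset, void* buf, std::size_t nbytes) override {
        assert(offset + nbytes <= bytes_.size());
        std::memcpy(buf, bytes_.data() + offset, nbytes);
    }
    void write(std::size_t offset, const void* buf, std::size_t nbytes) override {
        if (offset + nbytes > bytes_.size()) bytes_.resize(offset + nbytes);
        std::memcpy(bytes_.data() + offset, buf, nbytes);
    }
    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
};

// Library-side code depends only on the abstract interface.
inline void writeDoubles(FileInterface& f, std::size_t elemOffset,
                         const std::vector<double>& v) {
    f.write(elemOffset * sizeof(double), v.data(), v.size() * sizeof(double));
}

inline std::vector<double> readDoubles(FileInterface& f, std::size_t elemOffset,
                                       std::size_t n) {
    std::vector<double> v(n);
    f.read(elemOffset * sizeof(double), v.data(), n * sizeof(double));
    return v;
}
```

Only read and write appear in the interface, mirroring the statement that KeLPIO does not duplicate other functions of the underlying I/O library.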

17
External Arrays

FileArrayX
Is the KelpIO interface to an external array
Is strongly typed in
Number of dimensions
Element type
Uses FileInterface object to perform I/O
Similar to XArrayX
FileArrayX manages a collection of blocks
DecompositionX represents which processors can
(should) directly access regions of the array

18
I/O Plans and Movers

Based on same inspector-executor paradigm as KeLP
Classes IOPlanX and IOMoverX move data between
GridX within an XArrayX and RegionX within a
FileArrayX
Direction of movement (from XArrayX to FileArrayX
or vice-versa) determines input or output
IOPlanX and IOMoverX are either all input or all
output
Source and destination must have the same
processor assignment

19
KelpIO Example (Part 1/2)

void saveData(int N, DoubleArray2& X,
              char *filename, int offset)
{
  PassionFile pf(MODE_WRITEONLY, filename);
  Region2 r(1, 1, N, N);
  Processors2 P;
  Decomposition2 T(r);
  T.distribute(BLOCK1, BLOCK1, P);
  FileArray2<double> fa(T, pf, offset);

20
KelpIO Example (Part 2/2)

  IOPlan2 iop;
  for (indexIterator1 ii(X); ii; ++ii) {
    int i = ii(0);
    Region2 inside = grow(X(i).region(), -1);
    for (indexIterator1 jj(T); jj; ++jj) {
      int j = jj(0);
      iop.CopyOnIntersection(X,i,fa,j,inside);
    }
  }
  IOMover2<Grid2<double>, double> iom(X,fa,iop);
  iom.execute();
}

21
Snapshotting

Use same strategy as application I/O
Create IOPlanX once, then execute IOMoverX
repeatedly
Need to change external array position between
snapshots
Use FileArrayX::SetOffset for numerical positions
Directly access and change FileView otherwise
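
The plan-once, execute-repeatedly pattern for snapshots might look like the following one-dimensional sketch. FileArray1, IOCopy, IOPlan1, executeOutput and runWithSnapshots are hypothetical names for this example, the external file is modelled as a vector, and setOffset only loosely mirrors the offset-setting call mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D stand-ins for FileArrayX, IOPlanX and IOMoverX.
struct FileArray1 {
    std::vector<double> storage;  // plays the role of the external file
    std::size_t offset = 0;       // element offset of the array within the file
    void setOffset(std::size_t o) { offset = o; }
};

struct IOCopy { std::size_t memStart, fileStart, count; };  // one planned transfer

struct IOPlan1 { std::vector<IOCopy> copies; };

// Output mover: writes the planned pieces of mem at the file array's current offset.
inline void executeOutput(const IOPlan1& plan, const std::vector<double>& mem,
                          FileArray1& fa) {
    for (const IOCopy& c : plan.copies) {
        std::size_t end = fa.offset + c.fileStart + c.count;
        if (fa.storage.size() < end) fa.storage.resize(end);
        for (std::size_t k = 0; k < c.count; ++k)
            fa.storage[fa.offset + c.fileStart + k] = mem[c.memStart + k];
    }
}

// Take `steps` snapshots of an evolving array, reusing a single plan.
inline FileArray1 runWithSnapshots(std::vector<double> state, int steps) {
    FileArray1 fa;
    IOPlan1 plan;
    plan.copies.push_back({0, 0, state.size()});  // plan built once
    for (int s = 0; s < steps; ++s) {
        for (double& v : state) v += 1.0;          // stand-in for the computation
        fa.setOffset(static_cast<std::size_t>(s) * state.size());
        executeOutput(plan, state, fa);            // same plan executed every step
    }
    return fa;
}
```

Only the offset changes between snapshots; the description of which data goes where is untouched.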

22
Checkpointing

If overlap doesn't need saving, use same strategy
as application I/O
If overlap must be saved
Replicated elements generally have different
values
Cannot use one-to-one correspondence between
internal and external grid positions
Solution: translate GridXs to eliminate overlap

23
Embeddings

EmbeddingX shifts GridX to new positions
Original GridXs continue to store data
Can use EmbeddingX to transform away overlap
regions
Utility function RemoveOverlap does this for
regular decompositions
Multi-dimensional rectangular bin-packing problem
in general (NP-hard)
Application must provide mechanism
Future: provide general-purpose (but not optimal)
function
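
For a regular decomposition the translation is simple enough to sketch in one dimension: shift each grid so the grids sit end to end. Interval, overlaps, removeOverlap and embed are hypothetical names for this example and are not the library's RemoveOverlap; a regular multi-dimensional decomposition could, under the same assumption, apply the idea per dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Interval { int lo, hi; };  // inclusive 1-D extent of a grid

inline bool overlaps(const Interval& a, const Interval& b) {
    return a.lo <= b.hi && b.lo <= a.hi;
}

// Hypothetical 1-D analogue of an overlap-removing embedding: returns one
// shift per grid such that the shifted grids are packed end to end in their
// original order. The grids themselves are not modified.
inline std::vector<int> removeOverlap(const std::vector<Interval>& grids) {
    std::vector<int> shift(grids.size(), 0);
    if (grids.empty()) return shift;
    int next = grids.front().lo;  // first free external position
    for (std::size_t k = 0; k < grids.size(); ++k) {
        shift[k] = next - grids[k].lo;
        next += grids[k].hi - grids[k].lo + 1;
    }
    return shift;
}

// Position of a grid in the external array after applying its shift.
inline Interval embed(const Interval& g, int shift) {
    return {g.lo + shift, g.hi + shift};
}
```

Every element of every grid, including replicated overlap cells, then has its own external position, which is what checkpointing the overlap requires.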

24
Out-of-core Programming

Manual conversion from in-core to out-of-core
Can require substantial effort to introduce
staging areas, rearrange computation
Results in divergence of in-core and out-of-core
codes, duplicated effort
KeLPIO enables semi-automatic conversion, with
minimal source code changes
Easy to conditionally compile for either in-core
or out-of-core
Basic unit of OOC decomposition is the GridX
Overpartition array and assign multiple GridXs
per processor
Only a subset of GridXs assigned to each
processor are in core concurrently

25
Managing Out-of-core GridXs

OOCXArrayX is a new GridX management class
similar to XArrayX
Accessing a specific GridX from an OOCXArrayX
forces that GridX into memory
Least recently accessed GridX is swapped out
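
The access-forces-residency policy with least-recently-accessed eviction can be sketched as below. GridCache is a hypothetical class for this example: a vector stands in for the swap file, and this simplified version always writes the evicted grid back.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of the caching policy only; no real file I/O happens.
class GridCache {
public:
    GridCache(std::size_t numGrids, std::size_t gridSize, std::size_t capacity)
        : capacity_(capacity), disk_(numGrids, std::vector<double>(gridSize, 0.0)) {
        assert(capacity >= 1);
    }

    // Accessing grid i forces it into memory, evicting the least recently
    // accessed grid if the cache is full. The returned reference stays valid
    // until grid i is itself evicted.
    std::vector<double>& access(std::size_t i) {
        auto it = resident_.find(i);
        if (it != resident_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second.pos);  // now most recent
            return it->second.data;
        }
        if (resident_.size() == capacity_) {
            std::size_t victim = lru_.back();
            disk_[victim] = resident_[victim].data;  // write back
            ++writes_;
            resident_.erase(victim);
            lru_.pop_back();
        }
        lru_.push_front(i);
        ++reads_;
        Entry& e = resident_[i];
        e.data = disk_[i];  // read in
        e.pos = lru_.begin();
        return e.data;
    }

    bool inCore(std::size_t i) const { return resident_.count(i) != 0; }
    std::size_t reads() const { return reads_; }
    std::size_t writes() const { return writes_; }

private:
    struct Entry {
        std::vector<double> data;
        std::list<std::size_t>::iterator pos;
    };
    std::size_t capacity_;
    std::size_t reads_ = 0, writes_ = 0;
    std::vector<std::vector<double>> disk_;  // stands in for the swap file
    std::list<std::size_t> lru_;             // front = most recently accessed
    std::unordered_map<std::size_t, Entry> resident_;
};
```

With a capacity of two, touching grids 0, 1, 0, 2 in that order evicts grid 1, since grid 0 was accessed more recently.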

26
Using OOCXArrayX

Application creates OOCXArrayX much like an
XArrayX
Must create multiple GridXs per processor
Application enables OOCXArrayX
Determines non-overlapping EmbeddingX
Creates swap array (FileArrayX)
Sets number of GridXs to cache
Application uses OOCXArrayX much like an XArrayX
Programmer must ensure only cached GridXs are
accessed
Possible, as cache behavior predictable from
source
Performance implications should also be
considered
Must use OOCMoverX for communicating

27
Communication Involving OOCXArrayXs

Standard KeLP movers inadequate for out-of-core
data
Transient GridXs always require buffering
Swapping of GridXs should be minimized
When a specific GridX is swapped in, do as many
communications involving it as possible
OOCMoverX designed for moving data involving
OOCXArrayXs
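
The slides do not describe how OOCMoverX schedules its work, so the following is only one simple heuristic in the spirit of the bullet above: run all motions between the same pair of grids back to back, and compare the number of swap-ins against the original order. Motion, countSwapIns and groupByGridPair are hypothetical names for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

struct Motion { int src, dst; };  // one grid-to-grid copy; regions omitted

// Count swap-ins when the motions run in the given order with an LRU cache
// holding `capacity` grids.
inline std::size_t countSwapIns(const std::vector<Motion>& order,
                                std::size_t capacity) {
    std::list<int> lru;  // front = most recently used
    std::size_t loads = 0;
    auto touch = [&](int g) {
        auto it = std::find(lru.begin(), lru.end(), g);
        if (it != lru.end()) {
            lru.erase(it);                            // already in core
        } else {
            ++loads;                                  // must be swapped in
            if (lru.size() == capacity) lru.pop_back();
        }
        lru.push_front(g);
    };
    for (const Motion& m : order) {
        touch(m.src);
        touch(m.dst);
    }
    return loads;
}

// Reorder so that motions between the same pair of grids are adjacent.
inline std::vector<Motion> groupByGridPair(std::vector<Motion> motions) {
    auto key = [](const Motion& m) {
        return std::make_pair(std::min(m.src, m.dst), std::max(m.src, m.dst));
    };
    std::stable_sort(motions.begin(), motions.end(),
                     [&](const Motion& a, const Motion& b) { return key(a) < key(b); });
    return motions;
}
```

For the order (0,1), (2,3), (1,0), (3,2) with room for two grids, the original order needs eight swap-ins in this model and the grouped order needs four.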

28
Conditionally Compiled OOC

Use typedefs for all potentially OOC XArrayXs and
MoverXs
To compile in-core application:
Define typedefs using KeLP XArrayX and MoverX
typedef XArray2<Grid2<double> > DoubleArray2;
To compile out-of-core application:
Define typedefs using KelpIO OOCXArrayX and
OOCMoverX classes
typedef OOCXArray2<Grid2<double> > DoubleArray2;
Overpartition the array (create multiple GridXs
per processor)
Compile in code to establish OOC backing store
and enable OOC cache
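
Put together, the conditional build could be arranged roughly as follows. The macro name USE_OOC, the enable method and the tiny class templates are assumptions made so that the sketch compiles on its own; they are not the real KeLP or KeLPIO declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal stand-ins; not the real KeLP/KeLPIO classes.
template <class T>
struct Grid2 { std::vector<T> data; };

template <class G>
struct XArray2 {
    static constexpr bool outOfCore = false;
    std::vector<G> grids;
    explicit XArray2(std::size_t n) : grids(n) {}
};

template <class G>
struct OOCXArray2 {
    static constexpr bool outOfCore = true;
    std::vector<G> grids;
    std::size_t cached = 0;
    explicit OOCXArray2(std::size_t n) : grids(n) {}
    // Assumed method: would also set up the backing store in a real library.
    void enable(std::size_t gridsToCache) { cached = gridsToCache; }
};

// USE_OOC is an assumed macro name, e.g. set with -DUSE_OOC when compiling.
#ifdef USE_OOC
typedef OOCXArray2<Grid2<double> > DoubleArray2;
#else
typedef XArray2<Grid2<double> > DoubleArray2;
#endif

// Application code is written once against the typedef.
inline DoubleArray2 makeArray(std::size_t gridsPerProcessor) {
    DoubleArray2 x(gridsPerProcessor);
#ifdef USE_OOC
    x.enable(2);  // compiled only into the out-of-core build
#endif
    return x;
}
```

The rest of the application refers only to DoubleArray2, so the in-core and out-of-core builds share one source.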

29
Agenda

Introduction
Quick KeLP Review
Brief Overview of KelpIO
Optimizing KelpIO Programs
File Layout Optimization
File Access Optimization
Out-of-core Optimization
Further Research

30
File Layout Optimization

Specific file layouts often not required
Checkpoint files
Out-of-core swap files
Use EmbeddingX to remap GridXs into
high-performance shape
Library function Pencilize can remap general
shape into a pencil along one dimension

31
File Access Optimization

Interleave I/O and computation
Create multiple IOPlanXs and IOMoverXs
Interleave/combine similar I/Os
Merge snapshots and checkpoints
Use asynchronous I/O (not implemented)
Prefetch out-of-core GridXs

32
Out-of-core Optimization

Optimization of swap file layout
Optimization of GridX size
Algorithmic optimization
Example: loop fusion
Explicit control of GridX cache
Examples: renewing, aging, flushing, clearing
Optimization of KeLPIO primitives used
Adjust cache size to use available memory

33
Optimizing OOC Primitives

Accessing a GridX is potentially very expensive
May need to read from disk (and write-back old
GridX)
Use utility access methods provided for non-data
GridX methods
These do not affect GridX cache
const accesses don't make GridXs dirty
Clean GridXs don't need to be written back to
disk
Repeatedly accessing a GridX is expensive
Unnecessarily checks and updates LRU data each
access
Move GridX accesses out of computation loops
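
The cost difference between per-element access and a hoisted access, and the effect of const access on the dirty flag, can be made visible with a small counting sketch. TrackedGrid, sumNaive and sumHoisted are hypothetical names for this example; no swapping is modelled, only the bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-grid stand-in that only counts cache bookkeeping.
class TrackedGrid {
public:
    explicit TrackedGrid(std::size_t n) : data_(n, 1.0) {}

    // Mutable access: updates LRU bookkeeping and marks the grid dirty.
    std::vector<double>& access() {
        ++lruUpdates_;
        dirty_ = true;
        return data_;
    }
    // Const access: updates LRU bookkeeping but leaves the grid clean.
    const std::vector<double>& access() const {
        ++lruUpdates_;
        return data_;
    }
    // Utility (non-data) method: does not touch the cache bookkeeping.
    std::size_t size() const { return data_.size(); }

    bool dirty() const { return dirty_; }
    std::size_t lruUpdates() const { return lruUpdates_; }

private:
    std::vector<double> data_;
    bool dirty_ = false;
    mutable std::size_t lruUpdates_ = 0;
};

// Naive: one cache access per element.
inline double sumNaive(const TrackedGrid& g) {
    double s = 0.0;
    for (std::size_t k = 0; k < g.size(); ++k) s += g.access()[k];
    return s;
}

// Hoisted: one cache access for the whole loop.
inline double sumHoisted(const TrackedGrid& g) {
    const std::vector<double>& d = g.access();
    double s = 0.0;
    for (double v : d) s += v;
    return s;
}
```

Both functions read through a const reference, so neither marks the grid dirty; only the number of bookkeeping updates differs.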

34
Optimized Out-of-Core Example

void fillGhost(DoubleArray2& X)
{
  MotionPlan2 M;
  for (indexIterator1 ii(X); ii; ++ii) {
    int i = ii(0);
    Region2 inside = grow(X.region(i), -1);
    for (indexIterator1 jj(X); jj; ++jj) {
      int j = jj(0);
      if (i != j)
        M.CopyOnIntersection(X,i,X,j,inside);
    }
  }
  DoubleArray2Mover DM(X,X,M);
  DM.execute();
}

35
Adjusting OOC Cache Size

Set the OOC Cache size to use available memory
However, always ensure it's large enough to
guarantee that only cached GridXs are used
OOC Cache size can be adjusted dynamically
Periodically monitor available physical memory
and adjust cache size appropriately
Allows a single executable to configure itself
for high performance on any node in a Grid
environment.
Makes use of memory when it's available
Avoids thrashing when it isn't
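
A cache-sizing policy along these lines could be as small as the sketch below. chooseCacheSize is a hypothetical helper for this example; how available physical memory is measured is platform-specific and is left to the caller, and minGrids stands for the smallest cache that still guarantees only cached GridXs are used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical policy: use as many grids as fit in free memory, but never
// fewer than the application's minimum and never more than exist.
inline std::size_t chooseCacheSize(std::size_t availableBytes,
                                   std::size_t bytesPerGrid,
                                   std::size_t minGrids,
                                   std::size_t totalGrids) {
    assert(bytesPerGrid > 0 && minGrids <= totalGrids);
    std::size_t fit = availableBytes / bytesPerGrid;  // grids that fit in free memory
    return std::min(totalGrids, std::max(minGrids, fit));
}
```

Calling such a function periodically and passing the result to the cache would give the dynamic adjustment described above.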

36
Runtime Overhead of OOCXArrayX