Title: Data Flow Pattern Analysis of Scientific Applications
- Michael Frumkin
- Parallel Systems Applications
- Intel Corporation
- May 6, 2005
2 Outline
- Why Data Flow Pattern Analysis?
- CFD Applications
- The NAS Parallel Benchmarks
- The NAS Grid Benchmarks
- Trace File Analysis
- Conclusions
3 Why Data Flow Pattern Analysis?
- Scientific applications
  - model a small number of natural processes
  - new effects are added infrequently
  - their influence on the existing data flows is insignificant
- Knowledge of a program's data flow helps with
  - program understanding
  - program optimization, parallelization, and multithreading
  - building an application performance model
4 Design of Scientific Applications
- Time is represented as an outer loop
  - Iterations over time steps
- Space is represented by structured or unstructured grids
  - Important for understanding data locality
  - Data access patterns
  - Spatial parallelism
- Physics is represented by an operator at each grid point
  - Data flow
  - Operator-level parallelism and dependences
5 CFD Data Flow Patterns
- Solve the Navier-Stokes equations
  - $K(u_{i+1}) = L\,u_i$
  - u is a five-dimensional vector
  - K is a non-linear operator
- Solver
- RHS computation
6 ADI Pattern
- ADI method: $K \approx K_x K_y K_z$
- Multilevel parallelism
[Figure: x-solve, y-solve, and z-solve sweeps; multipartition data distribution]
7 BT Communication
8 Explicit Operators
- Stencil operators (explicit methods)
- At each point of a 3-dimensional mesh, apply a seven-point or a 27-point stencil
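9 Lower-Upper Triangular
- Two-dimensional pipeline
- Hyperplane algorithm
- Dependence matrices:
  $\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$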
10 LU Communication
11 Multigrid V-Cycle
[Figure: multigrid V-cycle with smoothing, projection, and interpolation steps at each grid level]
12 MG Communication
13 BT x_solve (serial) Call Graph
Data Flow Analysis
do k = 1, ksize
  do j = 1, jsize
    do i = 1, isize
14 Nest Data Flow Graph
[Figure: data flow graph of the loop nests do_45, do_134, and do_330]
Each arc represents an affinity relation.
15 NAS Parallel Benchmarks
- Application Benchmarks
  - CFD
    - BT, SP, LU
  - Data Intensive
    - DC, DT, BTIO
  - Computational Chemistry
    - UA
- Kernel Benchmarks
  - FT, CG, MG, IS
- Verification
- Performance Model
- FORTRAN, C, HPF, Java
- Serial, MPI, OpenMP, Java Threads
Other names and brands may be claimed as the
property of others.
16 NPB Performance on Altix
Performance tests and
ratings are measured using specific computer
systems and/or components and reflect the
approximate performance of Intel products as
measured by those tests. Any difference in
system hardware or software design or
configuration may affect actual performance.
Buyers should consult other sources of
information to evaluate the performance of
systems or components they are considering
purchasing.
17 Basic Data Flow Patterns
- Shuffles
  - Sorting
  - FFT
  - Routing
- Gather/Scatter
  - Conjugate Gradient
  - MD and FE codes
  - Sparse matrices
- Transpose
  - FFT
  - Sorting
- Tree
  - Parallel prefix, Reduction
  - Sorting
18 HPC Challenge Benchmarks
- HPL
- DGEMM
- STREAM
- PTRANS
- FFTE
- RandomAccess
- Effective Bandwidth b_eff
19 Programming With Directed Graphs
- Arc
  - Arc newArc(Node tail, Node head)
  - AttachArc(DGraph dg)
  - deleteArc(Arc ar)
- Node
  - newNode(char *name)
  - Node AttachNode(DGraph dg)
  - deleteNode(Node nd)
- DGraph
  - DGraph newDGraph(char *name)
  - writeGraph(DGraph dg, char *fname)
  - DGraph readGraph(char *fname)
20 Directed Graphs All Around
- Parse trees
- File systems
- Application task graphs
- Device schematics
- Visualization and layout tools
  - VCG tool
  - Edge tool
  - Tom Sawyer Software
  - Commercial tools
21 Cart3D
- Performs CFD analysis on complex geometries
- Uses six executables
  - Intersect: intersects geometry
  - Cubes: produces Cartesian meshes
  - Reorder: reorders meshes
  - Mgprep: coarsens meshes
  - flowCart: flow solver with convergence acceleration
  - Clic: analyzes the flow
- Executables communicate via files
- Returns relevant forces
  - Lift, Drag, Side Force
22 The NAS Grid Benchmarks
- Reflect the task-level programming paradigm
- Contain four patterns
  - Embarrassingly Distributed (ED)
  - Helical Chain (HC)
  - Visualization Pipeline (VP)
  - Mixed Bag (MB)
23 Data-Dependent Patterns
- Intermittent patterns
  - Useful for application performance tuning
- Visualization is important
  - Exploits the human eye's ability to detect patterns
- Automatic pattern mining
  - OLAP approach
  - MPI communication patterns
24 Conclusions
- Data Flow in Applications
  - Application Parallelization
  - Application Understanding
  - Application Mapping
  - Application Performance