Title: IFESTOS: A KB System for POEMS
1. IFESTOS: A KB System for POEMS
- Elias Houstis, John Rice, Ann Catlin,
- Naren Ramakrishnan, and V. Verykios
- Purdue University
- Department of Computer Sciences
- August 1998
2. IFESTOS Architecture (diagram)
3. IFESTOS Goals for the POEMS KB
- Predict the performance of a conceptual design by comparing it with the performance of existing designs/implementations, assuming some user-defined computational goals and design features
- Rank the various designs/implementations based on their performance data from well-designed benchmarks with specific features, with respect to value ranges of some performance indicators on …
- Estimate the operational parameters of a new design based on the performance data of similar designs
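One simple way to realize "predict by comparing with existing designs" and "rank designs" is nearest-neighbor estimation over a case base. The sketch below is a minimal illustration under stated assumptions and is not the IFESTOS method. It assumes that each design is described by a numeric feature vector (here, a hypothetical pair of processor count and problem size) and that similarity is plain Euclidean distance. The stored designs and their times are made-up values.

```python
import math

# Hypothetical knowledge base: each existing design is a feature vector
# (processor count, problem size in millions of unknowns) paired with a
# measured elapsed time in seconds. Values are illustrative only.
KNOWN_DESIGNS = [
    ((4.0, 1.0), 120.0),
    ((8.0, 1.0), 70.0),
    ((8.0, 4.0), 260.0),
    ((16.0, 4.0), 150.0),
]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict(features, known=KNOWN_DESIGNS, k=2):
    """Predict performance as the mean of the k most similar known designs."""
    nearest = sorted(known, key=lambda item: distance(features, item[0]))[:k]
    return sum(value for _, value in nearest) / len(nearest)


def rank(candidates, known=KNOWN_DESIGNS, k=2):
    """Order candidate designs by predicted elapsed time, fastest first."""
    return sorted(candidates, key=lambda f: predict(f, known, k))


print(predict((6.0, 1.0)))              # 95.0
print(rank([(16.0, 4.0), (8.0, 1.0)]))  # [(8.0, 1.0), (16.0, 4.0)]
```

The same `predict` call could estimate an operational parameter instead of a time by storing that parameter as the paired value. Real features would need normalizing first, because the raw scales differ.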
4. PDE Application Benchmark Features (Population, Solvers, and Parameters) for POEMS KB Generation
- Problem Population
- A general elliptic PDE on a non-rectangular domain, leading to a large non-symmetric FD algebraic system
- Two self-adjoint PDEs for which FEM is applicable, including a 3-D PDE problem
- An elliptic PDE leading to a large symmetric FD system
- SWEEP3D
- Solvers
- Finite difference and finite element discretizers
- At least 5 different domain decomposition algorithms that give significantly different partitionings of the grid/mesh data
- Four grid/mesh sizes (small, moderate, large, very large)
- ITPACK Jacobi-type, SOR-type, and CG-type routines
- AZTEC routines
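The requirement that the decomposition algorithms give "significantly different partitionings" can be made concrete by counting how many stencil edges each partition cuts. Each cut edge implies data exchange between two processors. The sketch assumes an n x n grid with a 5-point FD stencil. It compares two simple geometric splits, horizontal strips and a checkerboard of blocks. These are illustrative stand-ins, not the decomposition algorithms used in the benchmark.

```python
def strip_owner(i, j, n, p):
    """Part owning grid point (i, j) when rows are split into p horizontal strips."""
    return i * p // n


def block_owner(i, j, n, q):
    """Part owning (i, j) under a q x q checkerboard split (p = q * q parts)."""
    return (i * q // n) * q + (j * q // n)


def edge_cut(n, owner):
    """Count 5-point-stencil grid edges whose endpoints lie in different parts.

    Each cut edge implies data exchange between two processors, so the cut is
    a rough proxy for communication volume per iteration.
    """
    cut = 0
    for i in range(n):
        for j in range(n):
            if i + 1 < n and owner(i, j) != owner(i + 1, j):
                cut += 1
            if j + 1 < n and owner(i, j) != owner(i, j + 1):
                cut += 1
    return cut


n = 64
print(edge_cut(n, lambda i, j: strip_owner(i, j, n, 16)))  # 960
print(edge_cut(n, lambda i, j: block_owner(i, j, n, 4)))   # 384
```

With 16 parts on a 64 x 64 grid, the strips cut 2.5 times as many edges as the blocks. Differences of this kind are what make partitioning a meaningful feature for the KB.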
5. Machine Architectures
- Purdue's SP2 (16 processors)
- Use 2, 4, 5, 6, 8, 12, 13, and 16 processor configurations
- National SP2 (large configuration)
- LAN workstations
- Simulator
- Analytical Models
6. Numerical Solution Data Collected for Each PDE Application Run
- Boundary points found in the domain
- Boundary pieces found in the domain
- Grid size
- Solution error in the max, L1, and L2 norms
- Total elapsed time for the post-processing
module
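The three error norms can be computed from the discrete solution and the exact solution sampled at the same grid points. This minimal sketch scales the L1 and L2 norms by the number of points, giving a discrete average. That scaling is an assumption: other conventions weight each point by its grid-cell area, and the slides do not say which one PELLPACK uses.

```python
import math


def error_norms(computed, exact):
    """Return the (max, L1, L2) norms of the pointwise error.

    L1 and L2 are averaged over the number of points (one common convention).
    """
    errors = [abs(c - e) for c, e in zip(computed, exact)]
    n = len(errors)
    e_max = max(errors)
    e_l1 = sum(errors) / n
    e_l2 = math.sqrt(sum(e * e for e in errors) / n)
    return e_max, e_l1, e_l2


print(error_norms([1.0, 2.0, 3.0, 4.0], [1.0, 2.5, 3.0, 3.0]))
```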
7. Application Performance Metrics Generated by the PELLPACK System (Per-Processor and Per-Run)
- Domain processor module time
- Discretization module time
- Indexing time
- Linear algebra solution module time
- Communication time
- Total elapsed time
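Per-processor module times have to be collapsed into per-run values before runs can be compared. This sketch takes the slowest processor as the run-level time of each module and reports load imbalance as max/mean. Both choices are assumptions, not PELLPACK's documented behavior, and the timing values are hypothetical.

```python
# Hypothetical per-processor module times (seconds) for one run on 4 processors.
PER_PROCESSOR = {
    "discretization": [2.0, 2.1, 1.9, 2.4],
    "solution": [10.0, 11.5, 9.8, 12.0],
    "communication": [1.2, 0.9, 1.5, 1.0],
}


def per_run(times):
    """Collapse per-processor times into per-run values.

    The run-level time of a module is taken as the slowest processor (max);
    load imbalance is reported as max / mean.
    """
    summary = {}
    for module, values in times.items():
        slowest = max(values)
        mean = sum(values) / len(values)
        summary[module] = {"time": slowest, "imbalance": slowest / mean}
    return summary


for module, stats in per_run(PER_PROCESSOR).items():
    print(module, stats["time"], round(stats["imbalance"], 3))
```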
8. SP2 System Performance Metrics
- cpu_user_utilization: CPU percentage allocated to the user (mean, std)
- cpu_kernel_utilization: CPU percentage allocated to the kernel (mean, std)
- cpu_wait: CPU percentage spent waiting (mean, std)
- cpu_idle: CPU percentage spent idling (mean, std)
- cswitch: number of context or task switches
- syscalls: number of calls made into kernel services
- pagefaults: number of page faults
- total_xfers: number of DMA transfers to all disks
- blocks_read: number of blocks of data read from all disks
- blocks_written: number of blocks of data written to all disks
- ip_packets_rcvd: number of IP packets received
- ip_packets_sent: number of IP packets sent
- sending_time: time spent sending data
- receiving_time: time spent receiving data
- broadcasting_time: time spent broadcasting data
- barrier_time: time spent in barrier primitives
- allreducing_time: time spent in MPI allreduce primitives
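The four CPU metrics are reported as a mean and a standard deviation, which implies periodic sampling during the run. This sketch assumes a monitor that records CPU tick deltas per interval as a (user, kernel, wait, idle) tuple. It converts each interval to percentages and then summarizes each column. The sampling format and the values are hypothetical.

```python
import statistics

# Hypothetical tick deltas per sampling interval on one node:
# (user, kernel, wait, idle).
SAMPLES = [
    (60, 10, 5, 25),
    (70, 10, 0, 20),
    (80, 10, 5, 5),
]

FIELDS = ("cpu_user_utilization", "cpu_kernel_utilization", "cpu_wait", "cpu_idle")


def cpu_profile(samples):
    """Convert tick deltas to percentages per interval, then report (mean, std)."""
    percents = [[100.0 * t / sum(s) for t in s] for s in samples]
    result = {}
    for k, name in enumerate(FIELDS):
        column = [row[k] for row in percents]
        # Population standard deviation over the sampled intervals.
        result[name] = (statistics.mean(column), statistics.pstdev(column))
    return result


for name, (mean, std) in cpu_profile(SAMPLES).items():
    print(f"{name}: mean={mean:.1f} std={std:.2f}")
```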
9. CPU- and Communication-Based Performance Profiles of Application/Architecture Pairs
- Tcomp(p): global computation time vs. no. of processors
- Tcomm(p): global communication time vs. no. of processors
- T(p): global execution time vs. no. of processors
- S(p): speedup vs. no. of processors, S(p) = T(1)/T(p)
- E(p): efficiency vs. no. of processors, E(p) = S(p)/p
- Efficacy vs. no. of processors, S(p)^2/p
- busyproc: no. of busy processors vs. execution time
- commproc: no. of communicating processors vs. execution time
- compproc: no. of computing processors vs. execution time
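The speedup, efficiency, and efficacy profiles all derive from the measured global execution times T(p). This sketch uses S(p) = T(1)/T(p), E(p) = S(p)/p, and efficacy S(p)^2/p, which equals S(p)·E(p). The timing values are hypothetical.

```python
def profiles(times):
    """Compute S(p), E(p), and efficacy S(p)^2/p from execution times T(p).

    `times` maps processor count p to measured T(p) and must include p = 1.
    """
    t1 = times[1]
    result = {}
    for p, tp in sorted(times.items()):
        s = t1 / tp  # speedup S(p) = T(1)/T(p)
        e = s / p    # efficiency E(p) = S(p)/p
        result[p] = {"speedup": s, "efficiency": e, "efficacy": s * s / p}
    return result


# Hypothetical T(p) in seconds.
for p, row in profiles({1: 100.0, 2: 55.0, 4: 30.0, 8: 20.0}).items():
    print(p, round(row["speedup"], 3), round(row["efficiency"], 3), round(row["efficacy"], 3))
```

Efficacy rewards speedup while penalizing lost efficiency. One way to use it is to pick the processor count where it peaks (p = 8 in this made-up data).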
10. Memory- and I/O-Based Performance Profiles
- Pagefaults vs. no. of processors
- Total_xfers vs. no. of processors
- Blocks_read vs. no. of processors
- Blocks_written vs. no. of processors
11. Communication Overhead Profiles
- avg total no. of ip_packets_rcvd vs. no. of processors
- avg total no. of ip_packets_sent vs. no. of processors
- avg sending_time vs. no. of processors
- avg receiving_time vs. no. of processors
- avg broadcasting_time vs. no. of processors
- avg barrier_time vs. no. of processors
- avg allreducing_time vs. no. of processors
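The memory, I/O, and communication-overhead profiles all share one shape: a per-processor metric averaged for each processor count p. This sketch assumes hypothetical flat records of (processors, processor rank, metric value), here ip_packets_sent, and groups them by p.

```python
from collections import defaultdict

# Hypothetical records: (processors in run, processor rank, ip_packets_sent).
RECORDS = [
    (2, 0, 100), (2, 1, 120),
    (4, 0, 150), (4, 1, 170), (4, 2, 160), (4, 3, 180),
]


def average_profile(records):
    """Average a per-processor metric for each processor count p."""
    groups = defaultdict(list)
    for p, _rank, value in records:
        groups[p].append(value)
    return {p: sum(v) / len(v) for p, v in sorted(groups.items())}


print(average_profile(RECORDS))  # {2: 110.0, 4: 165.0}
```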
12. Status of the IFESTOS Project
- IFESTOS kernel (60)
- KB1: performance of the SP2 architecture on PDE applications (20)
- KB2: performance of parallel linear solvers on large PDE discretization systems (10) on the SP2
13. IFESTOS Goals for the Linear Algebra Solvers KB
- Predict the performance of linear solvers on new problems with features similar to those in the linear algebra benchmark population (e.g., a user gives the size and characteristics of a system and wants to find the best solver to use)
- Rank the various linear solvers over specific benchmarks with some (or all) of the features present (e.g., symmetric, non-symmetric, FD, or FEM systems); the ranking is made for every machine configuration
- Estimate the iteration parameters of linear solvers for a given system based on the performance data of similar systems
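The ranking goal filters benchmark runs by feature and then orders solvers separately for each machine configuration. This sketch assumes that each run is recorded as a set of system features, a solver name, a processor count, and an elapsed time, and it ranks solvers by mean time. The records and solver labels are hypothetical.

```python
from statistics import mean

# Hypothetical benchmark records: (system features, solver, processors, time in s).
RUNS = [
    ({"symmetric", "FD"}, "CG", 4, 12.0),
    ({"symmetric", "FD"}, "SOR", 4, 20.0),
    ({"symmetric", "FEM"}, "CG", 4, 15.0),
    ({"symmetric", "FEM"}, "SOR", 4, 14.0),
    ({"nonsymmetric", "FD"}, "GMRES", 4, 30.0),
]


def rank_solvers(required, runs=RUNS):
    """Rank solvers by mean time over systems having all `required` features,
    separately for each machine configuration (processor count)."""
    by_config = {}
    for features, solver, procs, t in runs:
        if required <= features:
            by_config.setdefault(procs, {}).setdefault(solver, []).append(t)
    return {
        procs: sorted(solvers, key=lambda s: mean(solvers[s]))
        for procs, solvers in by_config.items()
    }


print(rank_solvers({"symmetric"}))  # {4: ['CG', 'SOR']}
print(rank_solvers({"FEM"}))        # {4: ['SOR', 'CG']}
```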
14. Linear Algebra Benchmark Features (Population, Solvers, and Parameters) for Linear Solvers
- Problem Population
- 10 large, non-symmetric systems of 2D FD origin
- 10 large, symmetric systems of 2D and 3D FEM origin
- 10 large, symmetric systems of 2D and 3D FD origin
- Solvers
- 5 domain decomposition algorithms with significantly different partitionings
- All applicable ITPACK routines (some apply only to special types of systems)
- All applicable AZTEC routines
- How about LAPACK?
- Machines
- SP2 with 2, 4, 8, 16 processors
- SGI with 2, 4, 8, 16, 32 processors
- NOW with 2, 4, 8, 16, 32 processors
15. IFESTOS Linear Algebra User-Specific Functionality
- Select the best algorithm for the user's linear system
- The KB expects a list of features, an estimate of the system size, and desired bounds on memory/execution time
- The KB returns the name of the algorithm, its parameter values, and estimates of the needed resources, including the machine configuration, together with an exemplar that explains its decision
- Verify some assumptions or answer domain-specific questions
- Are iterative solvers better than direct solvers for large systems?
- Is CG an efficient method for non-symmetric systems?
- What is the best method for FD symmetric systems?
- What is the best method for FEM systems?
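The query/answer contract can be sketched as a small case-based lookup. The input is a feature set, a system size, and time/memory bounds. The output is a solver, its parameters, a processor count, resource estimates, and the stored case used as the exemplar. Everything here is hypothetical, including the class names, the stored cases, and the assumption that time and memory scale linearly with system size. It illustrates the interface described above, not how IFESTOS computes its answer.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One hypothetical stored benchmark case in the knowledge base."""
    features: frozenset
    size: int          # number of unknowns
    solver: str
    params: dict
    procs: int
    time_s: float
    memory_mb: float


@dataclass
class Advice:
    solver: str
    params: dict
    procs: int
    est_time_s: float
    est_memory_mb: float
    exemplar: Case     # the stored case that justifies the recommendation


def advise(cases, features, size, max_time_s, max_memory_mb):
    """Among cases having all requested features, scale time and memory
    linearly with size (a crude assumption), keep those within the bounds,
    and return the one with the smallest estimated time, or None."""
    matches = []
    for c in cases:
        if features <= c.features:
            scale = size / c.size
            t, m = c.time_s * scale, c.memory_mb * scale
            if t <= max_time_s and m <= max_memory_mb:
                matches.append((t, m, c))
    if not matches:
        return None
    t, m, best = min(matches, key=lambda x: x[0])
    return Advice(best.solver, best.params, best.procs, t, m, best)


EXAMPLE_CASES = [
    Case(frozenset({"symmetric", "FD"}), 10000, "CG", {"tol": 1e-6}, 8, 4.0, 50.0),
    Case(frozenset({"symmetric", "FD"}), 10000, "SOR", {"omega": 1.8}, 8, 6.0, 40.0),
    Case(frozenset({"nonsymmetric", "FD"}), 10000, "GMRES", {"restart": 30}, 8, 9.0, 80.0),
]

print(advise(EXAMPLE_CASES, frozenset({"symmetric"}), 20000, 100.0, 200.0))
```

Returning the matched `Case` as the exemplar lets the user see which benchmark run the recommendation came from.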