IFESTOS: A KB System for POEMS

Slides: 16
Provided by: eliasnh
Transcript and Presenter's Notes

1
IFESTOS: A KB System for POEMS
  • Elias Houstis, John Rice, Ann Catlin,
  • Naren Ramakrishnan, and V. Verykios
  • Purdue University
  • Department of Computer Sciences
  • August 98

2
IFESTOS Architecture diagram

3
IFESTOS Goals for POEMS KB
  • Predict the performance of a conceptual design by
    comparing it with the performance of existing
    designs/implementations, assuming some
    user-defined computational goals and design
    features
  • Rank the various designs/implementations based
    on their performance data from well-designed
    benchmarks with specific features and with
    respect to value ranges of some performance
    indicators
  • Estimate operational parameters of a new design
    based on the performance data of similar designs

4
PDE Application Benchmark (population, solvers,
and parameters) Features for POEMS KB Generation
  • Problem Population
      • A general elliptic PDE with a non-rectangular
        domain leading to a large non-symmetric FD
        algebraic system
      • Two self-adjoint PDEs for which FEM is
        applicable, including a 3-D PDE problem
      • An elliptic PDE leading to a large symmetric
        FD system
      • SWEEP3D
  • Solvers
      • Finite Difference and Finite Element
        discretizers
      • At least 5 different domain decomposition
        algorithms that give significantly different
        partitionings of grid/mesh data
      • Four grid/mesh sizes (small, moderate, large,
        very large)
      • ITPACK Jacobi-type, SOR-type, and CG-type
        routines
      • AZTEC routines

5
Machine Architectures
  • Purdue's SP2 (16 processors)
  • Use 2, 4, 5, 6, 8, 12, 13, 16 processor
    configurations
  • National SP2 (Large Configuration)
  • LAN workstations
  • Simulator
  • Analytical Models

6
Numerical Solution Data Collected for Each PDE
Application Run
  • Boundary points found in the domain
  • Boundary pieces found in the domain
  • Grid size
  • Solution Error in Max, L1, L2 norms
  • Total elapsed time for the post-processing
    module
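
The three error norms listed above can be sketched as follows. This is a minimal illustration, not the code of the benchmark itself: the function name `error_norms` is hypothetical, and the L1 and L2 values are plain unweighted discrete norms, whereas a PDE code may scale them by the mesh spacing.

```python
import math


def error_norms(computed, exact):
    """Return the (max, L1, L2) norms of the pointwise error.

    `computed` and `exact` hold the numerical and the reference
    solution at the same grid points. The norms are unweighted
    (an assumption); no mesh-spacing factor is applied.
    """
    if len(computed) != len(exact) or not computed:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(c - e) for c, e in zip(computed, exact)]
    max_norm = max(errors)                                  # largest pointwise error
    l1_norm = sum(errors)                                   # sum of absolute errors
    l2_norm = math.sqrt(sum(err * err for err in errors))   # root of summed squares
    return max_norm, l1_norm, l2_norm
```

Recording all three is useful because the max norm exposes the worst grid point, while the L1 and L2 norms summarize the error over the whole grid.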

7
Application Performance Metrics Generated by
PELLPACK System (per-processor and per-run)
  • Domain processor module time
  • Discretization module time
  • Indexing time
  • Linear algebra solution module time
  • Communication time
  • Total elapsed time

8
SP2 System Performance Metrics
  • cpu_user_utilization: CPU percentage allocated
    for the user (mean, std)
  • cpu_kernel_utilization: CPU percentage allocated
    for the kernel (mean, std)
  • cpu_wait: CPU percentage spent waiting (mean,
    std)
  • cpu_idle: CPU percentage spent idling (mean,
    std)
  • cswitch: the number of context or task switches
  • syscalls: the number of calls made into kernel
    services
  • pagefaults: the number of page faults
  • total_xfers: the number of DMA transfers to all
    disks
  • blocks_read: the number of blocks of data read
    from all disks
  • blocks_written: the number of blocks of data
    written to all disks
  • ip_packets_rcvd: the number of IP protocol
    packets received
  • ip_packets_sent: the number of IP protocol
    packets sent
  • sending_time: time spent sending data
  • receiving_time: time spent receiving data
  • broadcasting_time: time spent broadcasting data
  • barrier_time: time spent in barrier primitives
  • allreducing_time: time spent in all-reduce MPI
    primitives

9
CPU and Communication based performance profiles
of Application/Architecture pairs
  • Tcomp(p): global computation time vs. no. of
    processors
  • Tcomm(p): global communication time vs. no. of
    processors
  • T(p): global execution time vs. no. of
    processors
  • S(p): speedup vs. no. of processors (S(p) =
    T(1)/T(p))
  • E(p): efficiency vs. no. of processors (E(p) =
    S(p)/p)
  • ?(p): efficacy vs. no. of processors (S(p)^2/p)
  • ?busyproc: no. of busy processors vs. execution
    time
  • ?commproc: no. of communicating processors vs.
    execution time
  • ?compproc: no. of computing processors vs.
    execution time
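
As a rough sketch of how the speedup, efficiency, and efficacy profiles can be derived from measured execution times, using the standard definitions S(p) = T(1)/T(p), E(p) = S(p)/p, and efficacy S(p)^2/p. The function name `performance_profile` and the input layout (a dict from processor count to time) are assumptions made for illustration, and a single-processor time T(1) is assumed to be available.

```python
def performance_profile(times):
    """Derive speedup, efficiency, and efficacy from execution times.

    `times` maps a processor count p to the global execution time
    T(p). It must contain an entry for p = 1, since the speedup is
    measured against T(1).
    """
    if 1 not in times:
        raise ValueError("T(1) is required as the speedup baseline")
    t1 = times[1]
    profile = {}
    for p in sorted(times):
        speedup = t1 / times[p]               # S(p) = T(1) / T(p)
        profile[p] = {
            "speedup": speedup,
            "efficiency": speedup / p,        # E(p) = S(p) / p
            "efficacy": speedup ** 2 / p,     # S(p)^2 / p = S(p) * E(p)
        }
    return profile
```

Efficacy is the product of speedup and efficiency, so it rewards configurations that are both fast and make good use of the processors.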

10
Memory and I/O based Performance Profiles
  • Pagefaults vs. no. of processors
  • Total_xfers vs. no. of processors
  • Blocks_read vs. no. of processors
  • Blocks_written vs. no. of processors

11
Communication Overhead Profiles
  • avg total no. of ip_packets_rcvd vs. no. of
    processors
  • avg total no. of ip_packets_sent vs. no. of
    processors
  • avg sending_time vs. no. of processors
  • avg receiving_time vs. no. of processors
  • avg broadcasting_time vs. no. of processors
  • avg barrier_time vs. no. of processors
  • avg allreducing_time vs. no. of processors
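
One way to build such a profile from per-processor measurements is to group the samples by processor count and average them. The sketch below assumes a hypothetical record layout (one dict per processor per run, with an `nprocs` key and one key per metric); the slides do not specify how the data is actually stored.

```python
from collections import defaultdict
from statistics import mean


def overhead_profile(samples, metric):
    """Average one metric over all processors, per processor count.

    `samples` is an iterable of dicts, one per processor per run,
    each holding "nprocs" (the configuration size) and the metric
    value, e.g. "sending_time". This layout is an assumption.
    """
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["nprocs"]].append(sample[metric])
    # One (nprocs, average) point per configuration, in increasing order.
    return {p: mean(values) for p, values in sorted(grouped.items())}
```

The same helper applies to any of the listed metrics, since only the `metric` key changes.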

12
Status of the IFESTOS project
  • IFESTOS Kernel (60)
  • KB1: Performance of SP2 architecture on PDE
    applications (20)
  • KB2: Performance of parallel linear solvers on
    large PDE discretization systems (10) on SP2

13
IFESTOS Goals for Linear Algebra Solvers KB
  • Predict the performance of linear solvers on new
    problems with features similar to those in the
    linear algebra benchmark population (i.e.,
    someone gives the size and characteristics of the
    system and then wants to find out the best solver
    to use)
  • Rank the various linear solvers over specific
    benchmarks with some (or all) of the features
    present (e.g., symmetric systems, FD systems, FEM
    systems, non-symmetric systems). The ranking is
    made for all machine configurations.
  • Estimate the iteration parameters of linear
    solvers for a given system based on the
    performance data of similar systems

14
Linear Algebra Benchmark Features (population,
solvers, and parameters) for Linear Solvers
  • Problem Population
      • 10 large, non-symmetric systems of 2D FD origin
      • 10 large, symmetric systems of 2D and 3D FEM
        origin
      • 10 large, symmetric systems of 2D and 3D FD
        origin
  • Solvers
      • 5 domain decomposition algorithms with
        significantly different partitionings
      • All applicable ITPACK routines (some apply
        only to special types of systems)
      • All applicable AZTEC routines
      • How about LAPACK?
  • Machines
      • SP2 with 2, 4, 8, 16 processors
      • SGU with 2, 4, 8, 16, 32 processors
      • NOW with 2, 4, 8, 16, 32 processors

15
IFESTOS Linear Algebra User Specific
Functionality
  • Select the best algorithm for the user's linear
    system.
      • The KB expects a list of features, an estimate
        of system size, and desired bounds for
        memory/execution time
      • The KB returns the name of the algorithm,
        parameter values, and estimates for the needed
        resources, including the machine configuration
        and an exemplar to explain its decision
  • Verify some assumptions or answer domain-specific
    questions
      • Are iterative solvers better than direct
        solvers for large systems?
      • Is CG an efficient method for non-symmetric
        systems?
      • What is the best method for FD symmetric
        systems?
      • What is the best method for FEM systems?
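
The selection query described above (features, size estimate, and resource bounds in; algorithm, resource estimates, and an exemplar out) could look roughly like the sketch below. It is only an illustration of the interface, not how IFESTOS is implemented: the `Exemplar` record, the `recommend` function, and the simple "most shared features, then closest size, then fastest" rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One stored benchmark run (hypothetical record layout)."""
    solver: str            # name of the algorithm that was run
    features: frozenset    # e.g. {"symmetric", "fd"}
    size: int              # number of unknowns in the linear system
    nprocs: int            # machine configuration used
    time_s: float          # measured execution time in seconds
    memory_mb: float       # measured memory use in megabytes


def recommend(kb, features, size, max_time_s, max_memory_mb):
    """Return the stored run that best matches the request, or None.

    Runs outside the time/memory bounds are discarded. Among the
    rest, the run sharing the most features wins; ties are broken
    by closest system size, then by lowest execution time.
    """
    wanted = frozenset(features)
    candidates = [
        e for e in kb
        if e.time_s <= max_time_s and e.memory_mb <= max_memory_mb
    ]
    if not candidates:
        return None

    def score(e):
        return (-len(wanted & e.features), abs(e.size - size), e.time_s)

    return min(candidates, key=score)
```

The returned record doubles as the explanation: it names the solver and machine configuration and shows the concrete benchmark run that motivated the choice.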