Title: ASKALON: A Tool Set for Cluster and Grid Computing
1 ASKALON: A Tool Set for Cluster and Grid Computing
Cracow03 Grid Workshop, Oct. 2003
T. Fahringer, A. Hofer, A. Jugravu, S. Pllana, R. Prodan, C. Seragiotto, J. Testori, H.-L. Truong, A. Villazon, M. Welzl
Institute for Computer Science, University of Innsbruck
Thomas.Fahringer_at_uibk.univie.ac.at
informatik.uibk.ac.at/dps
2 Outline
- ASKALON Overview
- Performance Analysis and the Grid
- Automatic Experiment Management
- JavaSymphony: A New Programming Method for the Grid
- Summary
3 A Tool Set for Cluster and Grid Architectures
ASKALON
informatik.uibk.ac.at/dps
Zenturio
- Parameter Studies
- Performance Studies
- Experiment Management
- Software Testing
Performance Experiment Program Machine Database
- Programming Paradigms
  - MPI, GlobusMPI
  - OpenMP/MPI
  - HPF/OpenMP
  - JavaSymphony
- Architectures
  - NOWs
  - PC-Clusters
  - SMP Clusters
  - GRID Systems
  - DM/SM Systems
4 ASKALON Web Services
Registry
Application Compilation Command
Execution Command Machine
Factory
SCALEA User Portal
Performance Analyzer
SIS Instrumentor
ZENTURIO User Portal
Experiment Generator
Performance Estimator
Middleware
AKSUM User Portal
Performance Property Analyzer
Search Engine
Service Sites
ASKALON Visualization Diagrams
PROPHET User Portal
Overhead Analyzer
Experiment Executor
Factory
ASKALON DataRepository
Scheduler
Compute Site
5 Performance Analysis for the Grid
- so far mostly low-level analysis
- monitoring, instrumentation, and analysis for the Grid infrastructure, but not for applications
- lots of low-level performance data and visualization
- lack of high-level summary information
- difficult to associate data with specific middleware components and applications
6 Low-level performance analysis
7 Low-level performance analysis
8 Performance Analysis and Interpretation
[Figure: time-line diagrams of processes P1 to P4]
9 Performance Analysis for the Grid
- next steps
- higher-level analysis
- performance analysis for the Grid and its applications (single-entry single-exit regions)
- summaries instead of details
- problems and interpretation instead of raw data
- combined Grid performance analysis (SCALEA, AKSUM) for
  - network
  - site
  - application
- customizable tools instead of hard-coded analysis
- multi-experiment instead of single-experiment analysis
- online and scalable performance analysis
10 Existing Performance Tools: Problems
- produce vast amounts of data without interpretation
- performance data is not related to the input program
- lack guidance to essential problems
- focus on single experiments
Provide a semi-automatic performance analysis tool that detects performance problems for varying problem and machine sizes.
11 Aksum: A Tool for Semi-Automatic Multi-Experiment Performance Analysis
- user-provided problem and machine sizes
- automated instrumentation, experiment management, performance interpretation, and search for performance bottlenecks
- performance analysis for single-entry single-exit regions
- performance problems related to the program
- targets OpenMP, MPI, and mixed programs
- customizable (build your own performance tool)
- API for performance overheads
- define performance problems and code regions of interest
- influence the search (strategy, time, code regions)
12 Aksum Architecture
Control and data flow
Data flow
Instrumentation engine
Experiment engine
Search engine
Experiment manager (Zenturio)
User portal
- Application files
- Command lines
- Target machines
Instrumentation system (Scalea)
13 Specification of Performance Problems with JavaPSL
- JavaPSL is an API for the specification of performance problems and a high-level interface for raw performance data.
- pre-defined and user-defined JavaPSL problems
- performance problems as values between 0 and 1 (interpretation)

    public class SynchronizationOverhead implements Property {
        private float severity;

        // Severity: synchronization overhead of region d relative to the
        // execution time of the reference region r.
        public SynchronizationOverhead(DynamicCodeRegion d,
                                       ReferenceDynamicCodeRegion r) {
            severity = (float) d.getSynchronizationOverhead()
                       / r.getExecutionTime();
        }

        public boolean holds() { return severity > 0; }
        public float getSeverity() { return severity; }
        public float getConfidence() { return 1; }
    }
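To see how such a property produces a severity value, here is a minimal self-contained sketch. The `Property`, `DynamicCodeRegion`, and `ReferenceDynamicCodeRegion` types below are simplified stand-ins written for illustration (the real JavaPSL classes are not shown on the slide), and the timings are made-up numbers.

```java
// Hypothetical stand-ins for the JavaPSL types named on the slide.
interface Property {
    boolean holds();
    float getSeverity();
    float getConfidence();
}

class DynamicCodeRegion {
    private final double synchronizationOverhead; // seconds (assumed unit)
    DynamicCodeRegion(double synchronizationOverhead) {
        this.synchronizationOverhead = synchronizationOverhead;
    }
    double getSynchronizationOverhead() { return synchronizationOverhead; }
}

class ReferenceDynamicCodeRegion {
    private final double executionTime; // seconds (assumed unit)
    ReferenceDynamicCodeRegion(double executionTime) {
        this.executionTime = executionTime;
    }
    double getExecutionTime() { return executionTime; }
}

class SynchronizationOverheadSketch implements Property {
    private final float severity;

    SynchronizationOverheadSketch(DynamicCodeRegion d, ReferenceDynamicCodeRegion r) {
        // Fraction of the reference execution time spent synchronizing.
        severity = (float) (d.getSynchronizationOverhead() / r.getExecutionTime());
    }

    public boolean holds() { return severity > 0; }
    public float getSeverity() { return severity; }
    public float getConfidence() { return 1; }
}

class SeverityDemo {
    public static void main(String[] args) {
        // Invented numbers: 2.5 s of synchronization in a 10 s reference region.
        Property p = new SynchronizationOverheadSketch(
                new DynamicCodeRegion(2.5), new ReferenceDynamicCodeRegion(10.0));
        System.out.println(p.holds() + " " + p.getSeverity()); // prints "true 0.25"
    }
}
```

With these numbers the property holds with severity 0.25, meaning a quarter of the reference region's time is attributed to synchronization.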
14 Property hierarchy
- Defines evaluation order of performance properties
- Predefined hierarchies
  - OpenMP, MPI, mixed mode
- Can be customized
- Each node has
  - a threshold (property instances with severity less than the threshold are discarded)
  - reference code region
  - bean properties
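The evaluation order and the threshold rule can be illustrated with a small sketch. The `Node` class, its fields, and the severity numbers are hypothetical. The sketch also assumes that the children of a node are visited only when the node's own property instance survives its threshold; that is one plausible reading of the slide, not a statement about Aksum's actual search engine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node of a property hierarchy: a name, a measured severity,
// a threshold, and child properties that refine it.
class Node {
    final String name;
    final double severity;
    final double threshold;
    final List<Node> children = new ArrayList<>();

    Node(String name, double severity, double threshold) {
        this.name = name;
        this.severity = severity;
        this.threshold = threshold;
    }

    // Adds a child and returns this node so that calls can be chained.
    Node add(Node child) {
        children.add(child);
        return this;
    }
}

class HierarchySearch {
    // Depth-first, pre-order walk: report a node if its severity reaches its
    // threshold, and only then descend into its children (an assumption).
    static void collect(Node node, List<String> reported) {
        if (node.severity < node.threshold) {
            return; // discarded together with its subtree
        }
        reported.add(node.name);
        for (Node child : node.children) {
            collect(child, reported);
        }
    }

    static List<String> demo() {
        Node root = new Node("Inefficiency", 0.40, 0.05)
            .add(new Node("ParallelInefficiency", 0.35, 0.05)
                .add(new Node("SynchronizationOverhead", 0.20, 0.05))
                .add(new Node("LoadImbalance", 0.02, 0.05)))
            .add(new Node("SerialInefficiency", 0.01, 0.05)
                .add(new Node("ImperfectCacheBehavior", 0.30, 0.05)));
        List<String> reported = new ArrayList<>();
        collect(root, reported);
        return reported;
    }

    public static void main(String[] args) {
        // prints [Inefficiency, ParallelInefficiency, SynchronizationOverhead]
        System.out.println(demo());
    }
}
```

In this invented data set, LoadImbalance falls below its threshold and is dropped, and the whole SerialInefficiency subtree is skipped because its root is below threshold.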
15 Application files and compilation
- Application files
  - instrumentable (source codes)
  - non-instrumentable (e.g. Makefiles)
- Compilation data
  - command line
  - compilation directory
16 Property Hierarchy (first levels)
Inefficiency
  ParallelInefficiency
    DataMovementOverhead ...
    SynchronizationOverhead ...
    ControlOfParallelismOverhead ...
    LoadImbalance ...
  SerialInefficiency
    ImperfectFloatingPointBehavior
    ImperfectCacheBehavior
17 Property Hierarchy
18 "Instrumentable" files
- Instrumentation can be restricted to specific code regions.
19 Application parameters
- Strings to be substituted in some or all of the input files
- Mapped to ZEN directives in the input files
- Basis for experiment generation and execution done by ZENTURIO
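As a rough illustration of parameters being plain strings substituted in input files, the sketch below replaces one string in a file's text with each value of a value set, producing one variant per value. It is ordinary Java string handling with made-up helper names, not ZENTURIO code, and the input is a one-line fragment chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SubstituteSketch {
    // Returns one copy of `source` per value, with every occurrence of
    // `parameter` replaced by that value (literal, not regex, replacement).
    static List<String> variants(String source, String parameter, List<String> values) {
        List<String> result = new ArrayList<>();
        for (String value : values) {
            result.add(source.replace(parameter, value));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.add("count=2");
        values.add("count=3");
        // Prints "(count=2)(jobtype=mpi)" and then "(count=3)(jobtype=mpi)".
        for (String variant : variants("(count=4)(jobtype=mpi)", "count=4", values)) {
            System.out.println(variant);
        }
    }
}
```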
20 Case study: LAPW0 material science code
21 Case study: LAPW0 (views)
22 Case study: LAPW0 (charts)
23 Outline
- Performance Analysis and the Grid
- Automatic Experiment Management
- JavaSymphony: A New Programming Method for the Grid
- Summary
24 Management of Experiments and Parameter Studies
- Currently scientists
  - manually create parameter studies
  - manage many different sets of input data
  - launch large numbers of compilations and executions
  - administer result files
  - invoke performance analysis tools
  - interpret/visualize performance and parameter results, etc.
- This is a tedious, error-prone, and time-consuming process.
25 ZENTURIO: An Automatic Experiment Management Framework for Cluster and Grid Architectures
- Support for scientists to semi-automatically conduct large sets of
  - parameter studies
  - throughput versus high-performance computing
  - performance studies
  - software tests
- on cluster and Grid architectures.
26 ZENTURIO: A Web Service based Architecture
Registry Service
application
User Portal
compilation execution command
Experiment Preparation
Middleware
machine
27 Application Parameters and Value Sets
- Performance and parameter results depend on application parameters and their value sets.
  - machine sizes: x CPUs, y Grid sites, ...
  - problem sizes: x atoms, matrix size, ...
  - program variables: 1,2,3,161102
  - data distributions: block, cyclic, ...
  - loop scheduling strategies: static, guided, ...
  - communication networks: Myrinet, FastEthernet, ...
  - input/output file names, etc.
- An experiment is defined by its sources with every application parameter replaced by a specific value.
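Because an experiment fixes one value for every application parameter, the full set of experiments without constraints is the Cartesian product of the value sets. The sketch below enumerates that product. The parameter names and values are illustrative only, and the class is not part of ZENTURIO.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExperimentProduct {
    // Enumerates every combination of one value per parameter.
    static List<Map<String, String>> enumerate(Map<String, List<String>> valueSets) {
        List<Map<String, String>> experiments = new ArrayList<>();
        experiments.add(new LinkedHashMap<String, String>()); // one empty assignment
        for (Map.Entry<String, List<String>> entry : valueSets.entrySet()) {
            List<Map<String, String>> extended = new ArrayList<>();
            for (Map<String, String> partial : experiments) {
                for (String value : entry.getValue()) {
                    Map<String, String> next = new LinkedHashMap<>(partial);
                    next.put(entry.getKey(), value);
                    extended.add(next);
                }
            }
            experiments = extended;
        }
        return experiments;
    }

    public static void main(String[] args) {
        Map<String, List<String>> valueSets = new LinkedHashMap<>();
        valueSets.put("network", Arrays.asList("Myrinet", "FastEthernet"));
        valueSets.put("scheduling", Arrays.asList("static", "guided"));
        valueSets.put("cpus", Arrays.asList("2", "4", "8"));
        List<Map<String, String>> experiments = enumerate(valueSets);
        // 2 * 2 * 3 = 12 experiments
        System.out.println(experiments.size() + " experiments, first: " + experiments.get(0));
    }
}
```

The count grows multiplicatively with every added parameter, which is why a way to restrict combinations becomes necessary for larger studies.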
28 ZEN Directive-based Language: Specification of Arbitrarily Complex Experiments
- Set of directives to specify value sets of interest for arbitrary application parameters.
- Directives
  - assignment
  - substitute
  - constraint
  - performance
- Annotation of arbitrary source/input files
  - program files, Makefiles, scripts, input files, etc.
- ZENTURIO generates sources for every different experiment based on ZEN directives.
29 LAPW0 Machine Size: Globus RSL script
+
(&
  (resourceManagerContact=gescher/jobmanager-pbs)
  (*ZEN$ SUBSTITUTE count\=4 = { count={2:40} }*)
  (count=4)
  (jobtype=mpi)
  (directory=/home/radu/APPS/LAPW0)
  (executable=../SRC/lapw0)
  (arguments=lapw0.def)
)
Generated values: count=2, count=3, count=4, ..., count=40
30 Problem size: lapw0.def
4, 'znse_6.inm', 'unknown', 'formatted', 0
!ZEN$ SUBSTITUTE ktp_.125hour.clmsum = { ktp_.125hour.clmsum, ktp_.25hour.clmsum, ktp_.5hour.clmsum, ktp_1hour.clmsum }
8, 'ktp_.125hour.clmsum', 'old', 'formatted', 0
!ZEN$ SUBSTITUTE ktp_.125hour.struct = { ktp_.125hour.struct, ktp_.25hour.struct, ktp_.5hour.struct, ktp_1hour.struct }
20, 'ktp_.125hour.struct', 'old', 'formatted', 0
58, 'znse_6.vint', 'unknown','formatted', 0
!ZEN$ CONSTRAINT INDEX ktp_.125hour.clmsum == ktp_.125hour.struct
ktp_.125hour.clmsum
ktp_.25hour.clmsum
ktp_.5hour.clmsum
ktp_.1hour.clmsum
ktp_.125hour.struct
ktp_.25hour.struct
ktp_.5hour.struct
ktp_.1hour.struct
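The index constraint on this slide appears to tie the two value sets together by position, so that the i-th clmsum file is only combined with the i-th struct file. Under that reading (an assumption, since the slide does not spell out the semantics), four experiments remain instead of sixteen. The sketch uses placeholder file names and plain Java; `pairByIndex` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IndexConstraintSketch {
    // Pairs two value sets by index instead of forming all combinations.
    static List<String[]> pairByIndex(List<String> first, List<String> second) {
        if (first.size() != second.size()) {
            throw new IllegalArgumentException("value sets must have equal length");
        }
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < first.size(); i++) {
            pairs.add(new String[] { first.get(i), second.get(i) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Placeholder names standing in for the clmsum and struct files.
        List<String> clmsum = Arrays.asList("a.clmsum", "b.clmsum", "c.clmsum", "d.clmsum");
        List<String> struct = Arrays.asList("a.struct", "b.struct", "c.struct", "d.struct");
        for (String[] pair : pairByIndex(clmsum, struct)) {
            System.out.println(pair[0] + " with " + pair[1]);
        }
    }
}
```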
31 LAPW0 Machine Size: PBS script
!ZEN$ SUBSTITUTE nodes\=1 = { nodes={1:10} }
#PBS -l walltime=0:29:00,nodes=1:fourproc:ppn=4
#PBS -N lapw0
cd $PBS_O_WORKDIR
!ZEN$ ASSIGN MPIRUN = { /opt/local/mpich/bin/mpirun, /opt/local/mpich_gm/bin/mpirun.ch_gm }
!ZEN$ SUBSTITUTE no_procs = { 1:40 }
$MPIRUN -np no_procs ../SRC/lapw0 lapw0.def
!ZEN$ CONSTRAINT INDEX 4 * ( nodes\=1 - 1 ) < no_procs && no_procs <= 4 * nodes\=1 && no_procs != 1
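To get a feel for how the constraint prunes the experiment space, the sketch below counts the (nodes, no_procs) combinations that survive. It assumes nodes ranges over 1 to 10 and no_procs over 1 to 40, and reads the constraint as 4 * (nodes - 1) < no_procs, no_procs <= 4 * nodes, and no_procs != 1, applied to the values themselves. This reading of the operators and ranges is an assumption, and the class is illustrative Java, not ZENTURIO code.

```java
class ConstraintCount {
    // Assumed reading of the constraint on the slide.
    static boolean allowed(int nodes, int noProcs) {
        return 4 * (nodes - 1) < noProcs && noProcs <= 4 * nodes && noProcs != 1;
    }

    // Counts the combinations in [1..maxNodes] x [1..maxProcs] that satisfy it.
    static int countAllowed(int maxNodes, int maxProcs) {
        int count = 0;
        for (int nodes = 1; nodes <= maxNodes; nodes++) {
            for (int noProcs = 1; noProcs <= maxProcs; noProcs++) {
                if (allowed(nodes, noProcs)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 10 * 40 = 400 unconstrained combinations; prints "39 of 400".
        System.out.println(countAllowed(10, 40) + " of " + (10 * 40));
    }
}
```

Under this reading each node count keeps only the process counts that reach into its last four-processor node: 2 to 4 on one node, 5 to 8 on two nodes, and so on, which leaves 39 of the 400 combinations.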
32 ZEN Performance Behaviour Directive
!ZEN$ CR CR_P, CR_L PERF WTIME, ODATA
. . .
!ZEN$ CR CR_OMPDO, CR_CALLS PERF WTIME, OSYNC BEGIN
!$OMP DO SCHEDULE(STATIC)
. . .
!$OMP END DO NOWAIT
!$OMP BARRIER
!ZEN$ END CR
- request performance data for arbitrary code regions
- CR_P: entire program
- CR_L: all loops
- CR_OMPDO: OpenMP do regions
- CR_CALLS: procedure calls
- WTIME: execution time
- ODATA: data movement
- OSYNC: synchronisation
- 50 code region mnemonics
- 40 performance metrics
- supported by SCALEA
33 Experiment Preparation
34 ZENTURIO User Portal
35 Application Data Visualiser (ADV)
36 Scalability: Fast Ethernet
37 Performance Overheads: 8 Atoms, Myrinet
38 Backward Pricing: Total Price Evolution
39 JavaSymphony: High-Level Object-Oriented Programming of Grid Applications
- JavaSymphony (100% Java): a new object-oriented programming paradigm for concurrent and distributed systems
  - portability
  - higher-level programming
  - simple access to resources
  - explicit control of locality and parallelism
  - performance-oriented
- JavaSymphony programming model
  - dynamic virtual architectures (VAs)
  - API for system parameters
  - single- and multi-threaded remote distributed objects
  - distribution/migration of objects and code
  - asynchronous and one-sided (remote) method invocation
  - synchronization and events (distributed)
- And all of that without programming RMI, sockets, and threads!
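JavaSymphony's own API is not shown on these slides, so the following is only an analogy in standard Java. It contrasts an asynchronous invocation, which returns a handle that is queried later, with a one-sided invocation, whose result the caller never needs. It uses `java.util.concurrent` on a local object and makes no claim about how JavaSymphony implements remote calls; all class and method names are invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

class InvocationStyles {
    // Shared counter that the one-sided call updates as a side effect.
    static final AtomicInteger log = new AtomicInteger();

    static int square(int x) {
        return x * x;
    }

    // Asynchronous style: the caller gets a handle and continues immediately.
    static CompletableFuture<Integer> asyncSquare(int x) {
        return CompletableFuture.supplyAsync(() -> square(x));
    }

    // One-sided style: the call produces no result for the caller. The future
    // is returned only so that a demo or test can wait for completion.
    static CompletableFuture<Void> oneSidedRecord(int value) {
        return CompletableFuture.runAsync(() -> log.addAndGet(value));
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> handle = asyncSquare(7); // returns at once
        CompletableFuture<Void> done = oneSidedRecord(5);    // no result expected
        System.out.println("async result: " + handle.join()); // async result: 49
        done.join(); // only so the demo does not exit before the task has run
        System.out.println("recorded: " + log.get());         // recorded: 5
    }
}
```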
40 Summary
- Performance analysis for the Grid
  - higher-level analysis, performance interpretation, multi-experiments, automatic, customizable
  - high-level performance instrumentation interface
  - standardization of performance data
- Multi-Experiment Performance Analysis and Parameter Studies for the Grid
  - request for arbitrary number of experiments
  - automatic management of experiments
  - fault tolerance, events
  - combine with schedulers and performance tools
- JavaSymphony: A New Programming Model for Grid Applications
  - explicit control of locality, parallelism, and load balancing at a high level
  - dynamic virtual architectures, events, synchronization, migration, multi-threaded objects, asynchronous/synchronous/one-sided remote methods
  - no RMI, socket, or thread programming
41 Current and Future Work
- Extend SCALEA and Aksum for the Grid.
- switch to a stable compiler frontend for Fortran, C, C++, and Java
- software performance engineering for the Grid
42 A Tool Set for Cluster and Grid Architectures
ASKALON
informatik.uibk.ac.at/dps
Zenturio
- Parameter Studies
- Performance Studies
- Experiment Management
- Software Testing
Performance Experiment Program Machine Database
- Architectures
  - NOWs
  - PC-Clusters
  - SMP Clusters
  - GRID Systems
  - DM/SM Systems
- Programming Paradigms
  - MPI, GlobusMPI
  - OpenMP/MPI
  - HPF/OpenMP
  - JavaSymphony
University of Innsbruck / Institute for Computer Science / T. Fahringer