Allen D. Malony - PowerPoint PPT Presentation Transcript
1
Integrating Performance Analysis in Complex
Scientific Software: Experiences with the Uintah
Computational Framework
  • Allen D. Malony
  • malony@cs.uoregon.edu
  • Department of Computer and Information Science
  • Computational Science Institute
  • University of Oregon

2
Acknowledgements
  • Sameer Shende, Robert Bell, University of Oregon
  • Steven Parker, J. Dav de St.-Germain, and Alan
    Morris, University of Utah
  • Department of Energy (DOE), ASCI Academic
    Strategic Alliances Program (ASAP)
  • Center for Simulation of Accidental Fires and
    Explosions (C-SAFE), ASCI/ASAP Level 1 center,
    University of Utah, http://www.csafe.utah.edu
  • Computational Science Institute, ASCI/ASAP Level
    3 projects with LLNL / LANL, University of
    Oregon, http://www.csi.uoregon.edu

3
Complex Parallel Systems
  • Complexity in computing system architecture
  • Diverse parallel system architectures
  • shared / distributed memory, cluster, hybrid,
    NOW, Grid, ...
  • Sophisticated processor and memory architectures
  • Advanced network interface and switching
    architecture
  • Specialization of hardware components
  • Complexity in parallel software environment
  • Diverse parallel programming paradigms
  • shared memory multi-threading, message passing,
    hybrid
  • Hierarchical, multi-level software architectures
  • Optimizing compilers and sophisticated runtime
    systems
  • Advanced numerical libraries and application
    frameworks

4
Complexity Drives Performance Need / Technology
  • Observe/analyze/understand performance behavior
  • Multiple levels of software and hardware
  • Different types and detail of performance data
  • Alternative performance problem solving methods
  • Multiple targets of software and system
    application
  • Robust AND ubiquitous performance technology
  • Broad scope of performance observability
  • Flexible and configurable mechanisms
  • Technology integration and extension
  • Cross-platform portability
  • Open, layered, and modular framework architecture

5
What is Parallel Performance Technology?
  • Performance instrumentation tools
  • Different program code levels
  • Different system levels
  • Performance measurement (observation) tools
  • Profiling and tracing of SW/HW performance events
  • Different software (SW) and hardware (HW) levels
  • Performance analysis tools
  • Performance data analysis and presentation
  • Online and offline tools
  • Performance experimentation and data management
  • Performance modeling and prediction tools

6
Complexity Challenges for Performance Tools
  • Computing system environment complexity
  • Observation integration and optimization
  • Access, accuracy, and granularity constraints
  • Diverse/specialized observation
    capabilities/technology
  • Restricted modes limit performance problem
    solving
  • Sophisticated software development environments
  • Programming paradigms and performance models
  • Performance data mapping to software abstractions
  • Uniformity of performance abstraction across
    platforms
  • Rich observation capabilities and flexible
    configuration
  • Common performance problem solving methods

7
General Problems
  • How do we create robust and ubiquitous
    performance technology for the analysis and
    tuning of parallel and distributed software and
    systems in the presence of (evolving) complexity
    challenges?
  • How do we apply performance technology
    effectively for the variety and diversity of
    performance problems that arise in the context of
    complex parallel and distributed computer systems?

8
Scientific Software Engineering
  • Modern scientific simulation software is complex
  • Large development teams of diverse expertise
  • Simultaneous development on different system
    parts
  • Iterative, multi-stage, long-term software
    development
  • Need support for managing complex software
    process
  • Software engineering tools for revision control,
    automated testing, and bug tracking are
    commonplace
  • Tools for HPC performance engineering are not
  • evaluation (measurement, analysis, benchmarking)
  • optimization (diagnosis, tracking, prediction,
    tuning)
  • Incorporate performance engineering methodology
    and support by flexible and robust performance
    tools

9
Computation Model for Performance Technology
  • How to address dual performance technology goals?
  • Robust capabilities and widely available
    methodologies
  • Contend with problems of system diversity
  • Flexible tool composition / configuration /
    integration
  • Approaches
  • Restrict computation types / performance problems
  • limited performance technology coverage
  • Base technology on abstract computation model
  • general architecture and software execution
    features
  • map features/methods to existing complex system
    types
  • develop capabilities that can adapt and be
    optimized

10
General Complex System Computation Model
  • Node: physically distinct shared memory machine
  • Message passing node interconnection network
  • Context: distinct virtual memory space within a
    node
  • Thread: execution threads (user/system) in a
    context

[Diagram: SMP nodes with node memory joined by an
interconnection network for inter-node message
communication (physical view); in the model view,
each node holds contexts (VM spaces) containing
threads]
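The node/context/thread triple above is how profile data is located in the general model; a minimal C++ sketch (hypothetical types, not part of TAU):

```cpp
#include <sstream>
#include <string>

// Hypothetical illustration of the general computation model:
// a measurement is located by a (node, context, thread) triple.
struct Location {
    int node;     // physically distinct shared-memory machine
    int context;  // distinct virtual-memory space within the node
    int thread;   // execution thread within the context
};

// Render a location the way profile browsers label it: "n,c,t".
std::string label(const Location& loc) {
    std::ostringstream os;
    os << loc.node << "," << loc.context << "," << loc.thread;
    return os.str();
}
```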
11
Framework for Performance Problem Solving
  • Model-based performance technology
  • Instrumentation / measurement / execution models
  • performance observability constraints
  • performance data types and events
  • Analysis / presentation model
  • performance data processing
  • performance views and model mapping
  • Integration model
  • performance tool component configuration /
    integration
  • Can a performance problem solving framework be
    designed based on a general complex system model
    and with a performance technology model approach?

12
TAU Performance System Framework
  • Tuning and Analysis Utilities
  • Performance system framework for scalable
    parallel and distributed high-performance
    computing
  • Targets a general complex system computation
    model
  • nodes / contexts / threads
  • Multi-level system / software / parallelism
  • Measurement and analysis abstraction
  • Integrated toolkit for performance
    instrumentation, measurement, analysis, and
    visualization
  • Portable performance profiling/tracing facility
  • Open software approach

13
TAU Performance System Architecture
Paraver
EPILOG
14
Pprof Output (NAS Parallel Benchmark LU)
  • Intel Quad PIII Xeon, RedHat, PGI F90
  • F90 + MPICH
  • Profile for each node / context / thread
  • Application events and MPI events

15
jRacy (NAS Parallel Benchmark LU)
Routine profile across all nodes
n: node, c: context, t: thread
Global profiles
Individual profile
16
TAU + PAPI (NAS Parallel Benchmark LU)
  • Floating point operations
  • Replaces execution time
  • Only requires re-linking to a different TAU
    library

17
TAU + Vampir (NAS Parallel Benchmark LU)
Callgraph display
Timeline display
Parallelism display
Communications display
18
Utah ASCI/ASAP Level 1 Center (C-SAFE)
  • C-SAFE was established to build a problem-solving
    environment (PSE) for the numerical simulation of
    accidental fires and explosions
  • Fundamental chemistry and engineering physics
    models
  • Coupled with non-linear solvers, optimization,
    computational steering, visualization, and
    experimental data verification
  • Very large-scale simulations
  • Computer science problems
  • Coupling of multiple simulation codes
  • Software engineering across diverse expert teams
  • Achieving high performance on large-scale systems

19
Example C-SAFE Simulation Problems
Heptane fire simulation
Typical C-SAFE simulation with a billion degrees
of freedom and non-linear time dynamics
Material stress simulation
20
Uintah Problem Solving Environment
  • Enhanced SCIRun PSE
  • Pure dataflow → component-based
  • Shared memory → scalable multi-/mixed-mode
    parallelism
  • Interactive only → interactive and standalone
  • Design and implement Uintah component
    architecture
  • Application programmers provide
  • description of computation (tasks and variables)
  • code to perform task on single patch
    (sub-region of space)
  • Follow Common Component Architecture (CCA) model
  • Design and implement Uintah Computational
    Framework (UCF) on top of the component
    architecture

21
Uintah High-Level Component View
22
Uintah Parallel Component Architecture
23
Uintah Computational Framework
  • Execution model based on software (macro)
    dataflow
  • Exposes parallelism and hides data transport
    latency
  • Computations expressed as directed acyclic
    graphs of tasks
  • consumes input and produces output (input to
    future task)
  • input/outputs specified for each patch in a
    structured grid
  • Abstraction of global single-assignment memory
  • DataWarehouse
  • Directory mapping names to values (array
    structured)
  • Write value once then communicate to awaiting
    tasks
  • Task graph gets mapped to processing resources
  • Communications schedule approximates global
    optimal
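The single-assignment DataWarehouse idea can be sketched as a write-once directory (a hypothetical C++ illustration, not the actual UCF API; put/get are illustrative names):

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch of a global single-assignment memory:
// a directory mapping names to values, where each name may be
// written exactly once and then only read by awaiting tasks.
class DataWarehouse {
    std::map<std::string, double> values_;
public:
    void put(const std::string& name, double v) {
        // insert() fails if the name already exists: enforce
        // the write-once (single-assignment) rule
        if (!values_.insert({name, v}).second)
            throw std::runtime_error("single-assignment violation: " + name);
    }
    double get(const std::string& name) const {
        auto it = values_.find(name);
        if (it == values_.end())
            throw std::runtime_error("value not yet computed: " + name);
        return it->second;
    }
};
```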

24
Uintah Task Graph (Material Point Method)
  • Diagram of named tasks (ovals) and data (edges)
  • Imminent computation
  • Dataflow-constrained
  • MPM
  • Newtonian material point motion time step
  • Solid: values defined at material point
    (particle)
  • Dashed: values defined at vertex (grid)
  • Prime ('): values updated during time step

25
Example Taskgraphs (MPM and Coupled)
26
Taskgraph Advantages
  • Accommodates flexible integration needs
  • Accommodates a wide range of unforeseen work
    loads
  • Accommodates a mix of static and dynamic load
    balance
  • Manage complexity of mixed-mode programming
  • Avoids unnecessary transport abstraction
    overheads
  • Simulation time/space coupling
  • Allows uniform abstraction for coordinating
    coupled models' time and grid scales
  • Allows application components and framework
    infrastructure (e.g., scheduler) to evolve
    independently

27
Uintah PSE
  • UCF automatically sets up
  • Domain decomposition
  • Inter-processor communication with
    aggregation/reduction
  • Parallel I/O
  • Checkpoint and restart
  • Performance measurement and analysis (stay tuned)
  • Software engineering
  • Coding standards
  • CVS (Commits Y3 - 26.6 files/day, Y4 - 29.9
    files/day)
  • Correctness regression testing with bugzilla bug
    tracking
  • Nightly build (parallel compiles)
  • 170,000 lines of code (Fortran and C tasks
    supported)

28
Performance Technology Integration
  • Uintah presents challenges for performance
    integration
  • Software diversity and structure
  • UCF middleware, simulation code modules
  • component-based hierarchy
  • Portability objectives
  • cross-language and cross-platform
  • multi-parallelism: thread, message passing, mixed
  • Scalability objectives
  • High-level programming and execution abstractions
  • Requires flexible and robust performance
    technology
  • Requires support for performance mapping

29
Performance Analysis Objectives for Uintah
  • Micro tuning
  • Optimization of simulation code (task) kernels
    for maximum serial performance
  • Scalability tuning
  • Identification of parallel execution bottlenecks
  • overheads: scheduler, data warehouse,
    communication
  • load imbalance
  • Adjustment of task graph decomposition and
    scheduling
  • Performance tracking
  • Understand performance impacts of code
    modifications
  • Throughout course of software development
  • C-SAFE application and UCF software

30
Uintah Performance Engineering Approach
  • Contemporary performance methodology focuses on
    control flow (function) level measurement and
    analysis
  • C-SAFE application involves coupled-models with
    task-based parallelism and dataflow control
    constraints
  • Performance engineering on algorithmic (task)
    basis
  • Observe performance based on algorithm (task)
    semantics
  • Analyze task performance characteristics in
    relation to other simulation tasks and UCF
    components
  • scientific component developers can concentrate
    on performance improvement at algorithmic level
  • UCF developers can concentrate on bottlenecks not
    directly associated with simulation module code

31
Task Execution in Uintah Parallel Scheduler
  • Profile methods and functions in scheduler and in
    MPI library

Task execution time dominates (what task?)
Task execution time distribution
MPI communication overheads (where?)
  • Need to map performance data!

32
Semantics-Based Performance Mapping
  • Associate performance measurements with
    high-level semantic abstractions
  • Need mapping support in the performance
    measurement system to assign data correctly

33
Hypothetical Mapping Example
  • Particles distributed on surfaces of a cube

Particle P[MAX];  /* Array of particles */
int GenerateParticles() {
  /* distribute particles over all faces of the cube */
  for (int face = 0, last = 0; face < 6; face++) {
    /* particles on this face */
    int particles_on_this_face = num(face);
    for (int i = last; i < particles_on_this_face; i++) {
      /* particle properties are a function of face */
      P[i] = ... f(face) ...;
    }
    last += particles_on_this_face;
  }
}
34
Hypothetical Mapping Example (continued)
int ProcessParticle(Particle p) {
  /* perform some computation on p */
}
int main() {
  GenerateParticles();  /* create a list of particles */
  for (int i = 0; i < N; i++)  /* iterates over the list */
    ProcessParticle(P[i]);
}
  • How much time is spent processing face i
    particles?
  • What is the distribution of performance among
    faces?
  • How is this determined if execution is parallel?
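One way to answer the first two questions is to attribute work to the semantic entity "face" rather than to the routine; a minimal sketch (hypothetical helper, not TAU code, with a simple count standing in for measured time):

```cpp
#include <array>

// Hypothetical sketch: tag each particle with the cube face it was
// generated on, then accumulate per-face cost when processing.
struct Particle { int face; };

std::array<long, 6> cost_by_face(const Particle* p, int n) {
    std::array<long, 6> cost{};   // one accumulator per cube face
    for (int i = 0; i < n; i++)
        cost[p[i].face] += 1;     // stand-in for measured time per particle
    return cost;
}
```

In a parallel execution each process would accumulate its own per-face array, with a final reduction giving the distribution across faces.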

35
Semantic Entities/Attributes/Associations (SEAA)
  • New dynamic mapping scheme (S. Shende, Ph.D.
    thesis)
  • Contrast with ParaMap (Miller and Irvin)
  • Entities defined at any level of abstraction
  • Attribute entity with semantic information
  • Entity-to-entity associations
  • Two association types (implemented in TAU API)
  • Embedded: extends data structure of associated
    object to store performance measurement entity
  • External: creates an external look-up table
    using the address of the object as the key to
    locate the performance measurement entity
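The two association types can be illustrated as follows (a hypothetical C++ sketch, not TAU's implementation):

```cpp
#include <map>

// Hypothetical measurement entity shared by both association styles.
struct Timer { long inclusive_time = 0; };

// Embedded association: the object's data structure is extended
// to carry its performance measurement entity directly.
struct EmbeddedTask {
    Timer timer;
};

// External association: a side table keyed by the object's address
// locates the measurement entity without modifying the object.
class ExternalMap {
    std::map<const void*, Timer> table_;
public:
    Timer& lookup(const void* obj) { return table_[obj]; }
};
```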

36
No Performance Mapping versus Mapping
  • Typical performance tools report performance with
    respect to routines
  • Does not provide support for mapping
  • Performance tools with SEAA mapping can observe
    performance with respect to the scientist's
    programming and problem abstractions

TAU (w/ mapping)
TAU (no mapping)
37
Uintah Task Performance Mapping
  • Uintah partitions individual particles across
    processing elements (processes or threads)
  • Simulation tasks in task graph work on particles
  • Tasks have domain-specific character in the
    computation
  • interpolate particles to grid in Material Point
    Method
  • Task instances generated for each partitioned
    particle set
  • Execution scheduled with respect to task
    dependencies
  • How to attribute execution time among different
    tasks?
  • Assign semantic name (task type) to a task
    instance
  • SerialMPM::interpolateParticleToGrid
  • Map TAU timer object to (abstract) task (semantic
    entity)
  • Look up timer object using task type (semantic
    attribute)
  • Further partition along different domain-specific
    axes
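The name-based timer lookup described above can be sketched as follows (hypothetical, not TAU's internals): all instances of a task type share one timer, found by the semantic attribute (the task type name).

```cpp
#include <map>
#include <string>

// Hypothetical per-task-type measurement entity.
struct TaskTimer { long calls = 0; long time_us = 0; };

// Hypothetical lookup table keyed by the semantic attribute
// (task type name); every task instance of that type maps to
// the same shared timer.
class TimerTable {
    std::map<std::string, TaskTimer> timers_;
public:
    // Returns the shared timer for a task type, creating it on first use.
    TaskTimer& for_type(const std::string& task_type) {
        return timers_[task_type];
    }
};
```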

38
Task Performance Mapping Instrumentation
void MPIScheduler::execute(const ProcessorGroup * pc,
                           DataWarehouseP & old_dw,
                           DataWarehouseP & dw) {
  ...
  TAU_MAPPING_CREATE(task->getName(),
      "MPIScheduler::execute()",
      (TauGroup_t)(void*)task->getName(),
      task->getName(), 0);
  ...
  TAU_MAPPING_OBJECT(tautimer)
  TAU_MAPPING_LINK(tautimer,
      (TauGroup_t)(void*)task->getName());
  // EXTERNAL ASSOCIATION
  ...
  TAU_MAPPING_PROFILE_TIMER(doitprofiler, tautimer, 0)
  TAU_MAPPING_PROFILE_START(doitprofiler, 0)
  task->doit(pc);
  TAU_MAPPING_PROFILE_STOP(0)
  ...
}

39
Task Performance Mapping (Profile)
Mapped task performance across processes
Performance mapping for different tasks
40
Task Performance Mapping (Trace)
Work packet computation events colored by task
type
Distinct phases of computation can be identified
based on task
41
Task Performance Mapping (Trace - Zoom)
Startup communication imbalance
42
Task Performance Mapping (Trace - Parallelism)
Communication / load imbalance
43
Comparing Uintah Traces for Scalability Analysis
44
Scaling Performance Optimizations
Last year: initial correct scheduler
Reduced communication by 10x
Reduced task graph overhead by 20x
ASCI Nirvana, SGI Origin 2000, Los Alamos
National Laboratory
45
Scalability to 2000 Processors (Fall 2001)
ASCI Nirvana, SGI Origin 2000, Los Alamos
National Laboratory
46
Performance Tracking and Reporting
  • Integrated performance measurement allows
    performance analysis throughout development
    lifetime
  • Applied performance engineering in software
    design and development (software engineering)
    process
  • Create performance portfolio from regular
    performance experimentation (coupled with
    software testing)
  • Use performance knowledge in making key software
    design decisions, prior to major development
    stages
  • Use performance benchmarking and regression
    testing to identify irregularities
  • Support automatic reporting of performance bugs
  • Cross-platform (cross-generation) evaluation

47
XPARE - eXPeriment Alerting and REporting
  • Experiment launcher automates measurement /
    analysis
  • Configuration and compilation of performance
    tools
  • Uintah instrumentation control for experiment
    type
  • Multiple experiment execution
  • Performance data collection, analysis, and
    storage
  • Integrated in Uintah software testing harness
  • Reporting system conducts performance regression
    tests
  • Apply performance difference thresholds (alert
    ruleset)
  • Alerts users via email if thresholds have been
    exceeded
  • Web alerting setup and full performance data
    reporting
  • Historical performance data analysis
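A performance-difference threshold test of the kind the alert ruleset applies might look like this (hypothetical function, not the actual XPARE ruleset format):

```cpp
#include <cmath>

// Hypothetical regression check: flag an alert when a metric drifts
// past a relative threshold against the historical baseline.
bool exceeds_threshold(double baseline, double current, double rel_tol) {
    if (baseline == 0.0) return current != 0.0;  // no baseline to compare against
    return std::fabs(current - baseline) / baseline > rel_tol;
}
```

On an alert, the reporting system would email the configured users and link to the full performance data for the offending experiment.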

48
XPARE System Architecture
Experiment Launch
Performance Database
Performance Reporter
Comparison Tool
Regression Analyzer
Alerting Setup
49
Alerting Setup
50
Experiment Results Viewing Selection
51
Web-Based Experiment Reporting
52
Web-Based Experiment Reporting (continued)
53
Web-Based Experiment Reporting (continued)
54
Performance Analysis Tool Integration
  • Complex systems pose challenging performance
    analysis problems that require robust
    methodologies and tools
  • New performance problems will arise
  • Instrumentation and measurement
  • Data analysis and presentation
  • Diagnosis and tuning
  • No one performance tool can address all concerns
  • Look towards an integration of performance
    technologies
  • Support to link technologies to create
    performance problem solving environments
  • Performance engineering methodology and tool
    integration with software design and development
    process

55
Integrated Performance Evaluation Environment
56
References
  • A. Malony and S. Shende, "Performance Technology
    for Complex Parallel and Distributed Systems,"
    Proc. 3rd Workshop on Parallel and Distributed
    Systems (DAPSYS), pp. 37-46, Aug. 2000.
  • S. Shende, A. Malony, and R. Ansell-Bell,
    "Instrumentation and Measurement Strategies for
    Flexible and Portable Empirical Performance
    Evaluation," Proc. Intl. Conf. on Parallel and
    Distributed Processing Techniques and
    Applications (PDPTA), CSREA, pp. 1150-1156, July
    2001.
  • S. Shende, "The Role of Instrumentation and
    Mapping in Performance Measurement," Ph.D.
    Dissertation, University of Oregon, Aug. 2001.
  • J. de St. Germain, A. Morris, S. Parker, A.
    Malony, and S. Shende, "Integrating Performance
    Analysis in the Uintah Software Development
    Cycle," ISHPC 2002, Nara, Japan, May 2002.