Allen D. Malony - PowerPoint PPT Presentation Transcript
1
Integrating Performance Analysis in Complex
Scientific Software: Experiences with the Uintah
Computational Framework
  • Allen D. Malony
  • malony@cs.uoregon.edu
  • Department of Computer and Information Science
  • Computational Science Institute
  • University of Oregon

2
Acknowledgements
  • Sameer Shende, Robert Bell, University of Oregon
  • Steven Parker, J. Dav de St.-Germain, and Alan
    Morris, University of Utah
  • Department of Energy (DOE), ASCI Academic
    Strategic Alliances Program (ASAP)
  • Center for Simulation of Accidental Fires and
    Explosions (C-SAFE), ASCI/ASAP Level 1 center,
    University of Utah, http://www.csafe.utah.edu
  • Computational Science Institute, ASCI/ASAP Level
    3 projects with LLNL / LANL, University of
    Oregon, http://www.csi.uoregon.edu

3
Complex Parallel Systems
  • Complexity in computing system architecture
  • Diverse parallel system architectures
  • shared / distributed memory, cluster, hybrid,
    NOW, Grid, ...
  • Sophisticated processor and memory architectures
  • Advanced network interface and switching
    architecture
  • Specialization of hardware components
  • Complexity in parallel software environment
  • Diverse parallel programming paradigms
  • shared memory multi-threading, message passing,
    hybrid
  • Hierarchical, multi-level software architectures
  • Optimizing compilers and sophisticated runtime
    systems
  • Advanced numerical libraries and application
    frameworks

4
Complexity Drives Performance Need / Technology
  • Observe/analyze/understand performance behavior
  • Multiple levels of software and hardware
  • Different types and detail of performance data
  • Alternative performance problem solving methods
  • Multiple targets of software and system
    application
  • Robust AND ubiquitous performance technology
  • Broad scope of performance observability
  • Flexible and configurable mechanisms
  • Technology integration and extension
  • Cross-platform portability
  • Open, layered, and modular framework architecture

5
What is Parallel Performance Technology?
  • Performance instrumentation tools
  • Different program code levels
  • Different system levels
  • Performance measurement (observation) tools
  • Profiling and tracing of SW/HW performance events
  • Different software (SW) and hardware (HW) levels
  • Performance analysis tools
  • Performance data analysis and presentation
  • Online and offline tools
  • Performance experimentation and data management
  • Performance modeling and prediction tools

6
Complexity Challenges for Performance Tools
  • Computing system environment complexity
  • Observation integration and optimization
  • Access, accuracy, and granularity constraints
  • Diverse/specialized observation
    capabilities/technology
  • Restricted modes limit performance problem
    solving
  • Sophisticated software development environments
  • Programming paradigms and performance models
  • Performance data mapping to software abstractions
  • Uniformity of performance abstraction across
    platforms
  • Rich observation capabilities and flexible
    configuration
  • Common performance problem solving methods

7
General Problems
  • How do we create robust and ubiquitous
    performance technology for the analysis and
    tuning of parallel and distributed software and
    systems in the presence of (evolving) complexity
    challenges?
  • How do we apply performance technology
    effectively for the variety and diversity of
    performance problems that arise in the context of
    complex parallel and distributed computer systems?

8
Scientific Software Engineering
  • Modern scientific simulation software is complex
  • Large development teams of diverse expertise
  • Simultaneous development on different system
    parts
  • Iterative, multi-stage, long-term software
    development
  • Need support for managing complex software
    process
  • Software engineering tools for revision control,
    automated testing, and bug tracking are
    commonplace
  • Tools for HPC performance engineering are not
  • evaluation (measurement, analysis, benchmarking)
  • optimization (diagnosis, tracking, prediction,
    tuning)
  • Incorporate performance engineering methodology
    and support by flexible and robust performance
    tools

9
Computation Model for Performance Technology
  • How to address dual performance technology goals?
  • Robust capabilities and widely available
    methodologies
  • Contend with problems of system diversity
  • Flexible tool composition / configuration /
    integration
  • Approaches
  • Restrict computation types / performance problems
  • limited performance technology coverage
  • Base technology on abstract computation model
  • general architecture and software execution
    features
  • map features/methods to existing complex system
    types
  • develop capabilities that can adapt and be
    optimized

10
General Complex System Computation Model
  • Node: physically distinct shared memory machine
  • Message passing node interconnection network
  • Context: distinct virtual memory space within a
    node
  • Thread: execution threads (user/system) in a
    context

[Diagram: SMP nodes with node memory joined by an
interconnection network for inter-node message
communication (physical view); in the model view,
each node holds contexts (VM spaces) containing
threads]
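The node/context/thread triple above is how profile data is located in the general model; a minimal C++ sketch (hypothetical types, not part of TAU):

```cpp
#include <sstream>
#include <string>

// Hypothetical illustration of the general computation model:
// a measurement is located by a (node, context, thread) triple.
struct Location {
    int node;     // physically distinct shared-memory machine
    int context;  // distinct virtual-memory space within the node
    int thread;   // execution thread within the context
};

// Render a location the way profile browsers label it: "n,c,t".
std::string label(const Location& loc) {
    std::ostringstream os;
    os << loc.node << "," << loc.context << "," << loc.thread;
    return os.str();
}
```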
11
Framework for Performance Problem Solving
  • Model-based performance technology
  • Instrumentation / measurement / execution models
  • performance observability constraints
  • performance data types and events
  • Analysis / presentation model
  • performance data processing
  • performance views and model mapping
  • Integration model
  • performance tool component configuration /
    integration
  • Can a performance problem solving framework be
    designed based on a general complex system model
    and with a performance technology model approach?

12
TAU Performance System Framework
  • Tuning and Analysis Utilities
  • Performance system framework for scalable
    parallel and distributed high-performance
    computing
  • Targets a general complex system computation
    model
  • nodes / contexts / threads
  • Multi-level system / software / parallelism
  • Measurement and analysis abstraction
  • Integrated toolkit for performance
    instrumentation, measurement, analysis, and
    visualization
  • Portable performance profiling/tracing facility
  • Open software approach

13
TAU Performance System Architecture
Paraver
EPILOG
14
Pprof Output (NAS Parallel Benchmark LU)
  • Intel Quad PIII Xeon, RedHat, PGI F90
  • F90 + MPICH
  • Profile for each node / context / thread
  • Application events and MPI events

15
jRacy (NAS Parallel Benchmark LU)
Routine profile across all nodes
n: node, c: context, t: thread
Global profiles
Individual profile
16
TAU + PAPI (NAS Parallel Benchmark LU)
  • Floating point operations
  • Replaces execution time
  • Only requires re-linking to a different TAU
    library

17
TAU + Vampir (NAS Parallel Benchmark LU)
Callgraph display
Timeline display
Parallelism display
Communications display
18
Utah ASCI/ASAP Level 1 Center (C-SAFE)
  • C-SAFE was established to build a problem-solving
    environment (PSE) for the numerical simulation of
    accidental fires and explosions
  • Fundamental chemistry and engineering physics
    models
  • Coupled with non-linear solvers, optimization,
    computational steering, visualization, and
    experimental data verification
  • Very large-scale simulations
  • Computer science problems
  • Coupling of multiple simulation codes
  • Software engineering across diverse expert teams
  • Achieving high performance on large-scale systems

19
Example C-SAFE Simulation Problems
Heptane fire simulation
Typical C-SAFE simulation with a billion degrees
of freedom and non-linear time dynamics
Material stress simulation
20
Uintah Problem Solving Environment
  • Enhanced SCIRun PSE
  • Pure dataflow → component-based
  • Shared memory → scalable multi-/mixed-mode
    parallelism
  • Interactive only → interactive and standalone
  • Design and implement Uintah component
    architecture
  • Application programmers provide
  • description of computation (tasks and variables)
  • code to perform task on single patch
    (sub-region of space)
  • Follow Common Component Architecture (CCA) model
  • Design and implement Uintah Computational
    Framework (UCF) on top of the component
    architecture

21
Uintah High-Level Component View
22
Uintah Parallel Component Architecture
23
Uintah Computational Framework
  • Execution model based on software (macro)
    dataflow
  • Exposes parallelism and hides data transport
    latency
  • Computations expressed as directed acyclic
    graphs of tasks
  • consumes input and produces output (input to
    future task)
  • input/outputs specified for each patch in a
    structured grid
  • Abstraction of global single-assignment memory
  • DataWarehouse
  • Directory mapping names to values (array
    structured)
  • Write value once then communicate to awaiting
    tasks
  • Task graph gets mapped to processing resources
  • Communications schedule approximates global
    optimal
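The single-assignment DataWarehouse idea can be sketched as a write-once directory (a hypothetical C++ illustration, not the actual UCF API; put/get are illustrative names):

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch of a global single-assignment memory:
// a directory mapping names to values, where each name may be
// written exactly once and then only read by awaiting tasks.
class DataWarehouse {
    std::map<std::string, double> values_;
public:
    void put(const std::string& name, double v) {
        // insert() fails if the name already exists: enforce
        // the write-once (single-assignment) rule
        if (!values_.insert({name, v}).second)
            throw std::runtime_error("single-assignment violation: " + name);
    }
    double get(const std::string& name) const {
        auto it = values_.find(name);
        if (it == values_.end())
            throw std::runtime_error("value not yet computed: " + name);
        return it->second;
    }
};
```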

24
Uintah Task Graph (Material Point Method)
  • Diagram of named tasks (ovals) and data (edges)
  • Imminent computation
  • Dataflow-constrained
  • MPM
  • Newtonian material point motion time step
  • Solid: values defined at material point
    (particle)
  • Dashed: values defined at vertex (grid)
  • Prime ('): values updated during time step

25
Example Taskgraphs (MPM and Coupled)
26
Taskgraph Advantages
  • Accommodates flexible integration needs
  • Accommodates a wide range of unforeseen work
    loads
  • Accommodates a mix of static and dynamic load
    balance
  • Manage complexity of mixed-mode programming
  • Avoids unnecessary transport abstraction
    overheads
  • Simulation time/space coupling
  • Allows uniform abstraction for coordinating
    coupled models' time and grid scales
  • Allows application components and framework
    infrastructure (e.g., scheduler) to evolve
    independently

27
Uintah PSE
  • UCF automatically sets up
  • Domain decomposition
  • Inter-processor communication with
    aggregation/reduction
  • Parallel I/O
  • Checkpoint and restart
  • Performance measurement and analysis (stay tuned)
  • Software engineering
  • Coding standards
  • CVS (Commits Y3 - 26.6 files/day, Y4 - 29.9
    files/day)
  • Correctness regression testing with bugzilla bug
    tracking
  • Nightly build (parallel compiles)
  • 170,000 lines of code (Fortran and C tasks
    supported)

28
Performance Technology Integration
  • Uintah presents challenges for performance
    integration
  • Software diversity and structure
  • UCF middleware, simulation code modules
  • component-based hierarchy
  • Portability objectives
  • cross-language and cross-platform
  • multi-parallelism: thread, message passing, mixed
  • Scalability objectives
  • High-level programming and execution abstractions
  • Requires flexible and robust performance
    technology
  • Requires support for performance mapping

29
Performance Analysis Objectives for Uintah
  • Micro tuning
  • Optimization of simulation code (task) kernels
    for maximum serial performance
  • Scalability tuning
  • Identification of parallel execution bottlenecks
  • overheads: scheduler, data warehouse,
    communication
  • load imbalance
  • Adjustment of task graph decomposition and
    scheduling
  • Performance tracking
  • Understand performance impacts of code
    modifications
  • Throughout course of software development
  • C-SAFE application and UCF software

30
Uintah Performance Engineering Approach
  • Contemporary performance methodology focuses on
    control flow (function) level measurement and
    analysis
  • C-SAFE application involves coupled-models with
    task-based parallelism and dataflow control
    constraints
  • Performance engineering on algorithmic (task)
    basis
  • Observe performance based on algorithm (task)
    semantics
  • Analyze task performance characteristics in
    relation to other simulation tasks and UCF
    components
  • scientific component developers can concentrate
    on performance improvement at algorithmic level
  • UCF developers can concentrate on bottlenecks not
    directly associated with simulation module code

31
Task Execution in Uintah Parallel Scheduler
  • Profile methods and functions in scheduler and in
    MPI library

Task execution time dominates (what task?)
Task execution time distribution
MPI communication overheads (where?)
  • Need to map performance data!

32
Semantics-Based Performance Mapping
  • Associate performance measurements with
    high-level semantic abstractions
  • Need mapping support in the performance
    measurement system to assign data correctly

33
Hypothetical Mapping Example
  • Particles distributed on surfaces of a cube

Particle P[MAX];  /* Array of particles */
int GenerateParticles() {
  /* distribute particles over all faces of the cube */
  for (int face = 0, last = 0; face < 6; face++) {
    /* particles on this face */
    int particles_on_this_face = num(face);
    for (int i = last; i < particles_on_this_face; i++) {
      /* particle properties are a function of face */
      P[i] = ... f(face) ...;
    }
    last += particles_on_this_face;
  }
}
34
Hypothetical Mapping Example (continued)
int ProcessParticle(Particle p) {
  /* perform some computation on p */
}
int main() {
  GenerateParticles();  /* create a list of particles */
  for (int i = 0; i < N; i++)  /* iterates over the list */
    ProcessParticle(P[i]);
}
  • How much time is spent processing face i
    particles?
  • What is the distribution of performance among
    faces?
  • How is this determined if execution is parallel?
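One way to answer the first two questions is to attribute work to the semantic entity "face" rather than to the routine; a minimal sketch (hypothetical helper, not TAU code, with a simple count standing in for measured time):

```cpp
#include <array>

// Hypothetical sketch: tag each particle with the cube face it was
// generated on, then accumulate per-face cost when processing.
struct Particle { int face; };

std::array<long, 6> cost_by_face(const Particle* p, int n) {
    std::array<long, 6> cost{};   // one accumulator per cube face
    for (int i = 0; i < n; i++)
        cost[p[i].face] += 1;     // stand-in for measured time per particle
    return cost;
}
```

In a parallel execution each process would accumulate its own per-face array, with a final reduction giving the distribution across faces.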

35
Semantic Entities/Attributes/Associations (SEAA)
  • New dynamic mapping scheme (S. Shende, Ph.D.
    thesis)
  • Contrast with ParaMap (Miller and Irvin)
  • Entities defined at any level of abstraction
  • Attribute entity with semantic information
  • Entity-to-entity associations
  • Two association types (implemented in TAU API)
  • Embedded: extends data structure of associated
    object to store performance measurement entity
  • External: creates an external look-up table
    using the address of the object as the key to
    locate the performance measurement entity
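The two association types can be illustrated as follows (a hypothetical C++ sketch, not TAU's implementation):

```cpp
#include <map>

// Hypothetical measurement entity shared by both association styles.
struct Timer { long inclusive_time = 0; };

// Embedded association: the object's data structure is extended
// to carry its performance measurement entity directly.
struct EmbeddedTask {
    Timer timer;
};

// External association: a side table keyed by the object's address
// locates the measurement entity without modifying the object.
class ExternalMap {
    std::map<const void*, Timer> table_;
public:
    Timer& lookup(const void* obj) { return table_[obj]; }
};
```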

36
No Performance Mapping versus Mapping
  • Typical performance tools report performance with
    respect to routines
  • Does not provide support for mapping
  • Performance tools with SEAA mapping can observe
    performance with respect to the scientist's
    programming and problem abstractions

TAU (w/ mapping)
TAU (no mapping)
37
Uintah Task Performance Mapping
  • Uintah partitions individual particles across
    processing elements (processes or threads)
  • Simulation tasks in task graph work on particles
  • Tasks have domain-specific character in the
    computation
  • interpolate particles to grid in Material Point
    Method
  • Task instances generated for each partitioned
    particle set
  • Execution scheduled with respect to task
    dependencies
  • How to attribute execution time among different
    tasks?
  • Assign semantic name (task type) to a task
    instance
  • SerialMPM::interpolateParticleToGrid
  • Map TAU timer object to (abstract) task (semantic
    entity)
  • Look up timer object using task type (semantic
    attribute)
  • Further partition along different domain-specific
    axes
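The name-based timer lookup described above can be sketched as follows (hypothetical, not TAU's internals): all instances of a task type share one timer, found by the semantic attribute (the task type name).

```cpp
#include <map>
#include <string>

// Hypothetical per-task-type measurement entity.
struct TaskTimer { long calls = 0; long time_us = 0; };

// Hypothetical lookup table keyed by the semantic attribute
// (task type name); every task instance of that type maps to
// the same shared timer.
class TimerTable {
    std::map<std::string, TaskTimer> timers_;
public:
    // Returns the shared timer for a task type, creating it on first use.
    TaskTimer& for_type(const std::string& task_type) {
        return timers_[task_type];
    }
};
```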

38
Task Performance Mapping Instrumentation
void MPIScheduler::execute(const ProcessorGroup * pc,
                           DataWarehouseP & old_dw,
                           DataWarehouseP & dw) {
  ...
  TAU_MAPPING_CREATE(task->getName(),
      "MPIScheduler::execute()",
      (TauGroup_t)(void*)task->getName(),
      task->getName(), 0);
  ...
  TAU_MAPPING_OBJECT(tautimer)
  TAU_MAPPING_LINK(tautimer,
      (TauGroup_t)(void*)task->getName());
  // EXTERNAL ASSOCIATION
  ...
  TAU_MAPPING_PROFILE_TIMER(doitprofiler, tautimer, 0)
  TAU_MAPPING_PROFILE_START(doitprofiler, 0)
  task->doit(pc);
  TAU_MAPPING_PROFILE_STOP(0)
  ...
}

39
Task Performance Mapping (Profile)
Mapped task performance across processes
Performance mapping for different tasks
40
Task Performance Mapping (Trace)
Work packet computation events colored by task
type
Distinct phases of computation can be identified
based on task
41
Task Performance Mapping (Trace - Zoom)
Startup communication imbalance
42
Task Performance Mapping (Trace - Parallelism)
Communication / load imbalance
43
Comparing Uintah Traces for Scalability Analysis
44
Scaling Performance Optimizations
Last year: initial correct scheduler
Reduced communication by 10x
Reduced task graph overhead by 20x
ASCI Nirvana, SGI Origin 2000, Los Alamos
National Laboratory
45
Scalability to 2000 Processors (Fall 2001)
ASCI Nirvana, SGI Origin 2000, Los Alamos
National Laboratory
46
Performance Tracking and Reporting
  • Integrated performance measurement allows
    performance analysis throughout development
    lifetime
  • Applied performance engineering in software
    design and development (software engineering)
    process
  • Create performance portfolio from regular
    performance experimentation (coupled with
    software testing)
  • Use performance knowledge in making key software
    design decisions, prior to major development
    stages
  • Use performance benchmarking and regression
    testing to identify irregularities
  • Support automatic reporting of performance bugs
  • Cross-platform (cross-generation) evaluation

47
XPARE - eXPeriment Alerting and REporting
  • Experiment launcher automates measurement /
    analysis
  • Configuration and compilation of performance
    tools
  • Uintah instrumentation control for experiment
    type
  • Multiple experiment execution
  • Performance data collection, analysis, and
    storage
  • Integrated in Uintah software testing harness
  • Reporting system conducts performance regression
    tests
  • Apply performance difference thresholds (alert
    ruleset)
  • Alerts users via email if thresholds have been
    exceeded
  • Web alerting setup and full performance data
    reporting
  • Historical performance data analysis
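A performance-difference threshold test of the kind the alert ruleset applies might look like this (hypothetical function, not the actual XPARE ruleset format):

```cpp
#include <cmath>

// Hypothetical regression check: flag an alert when a metric drifts
// past a relative threshold against the historical baseline.
bool exceeds_threshold(double baseline, double current, double rel_tol) {
    if (baseline == 0.0) return current != 0.0;  // no baseline to compare against
    return std::fabs(current - baseline) / baseline > rel_tol;
}
```

On an alert, the reporting system would email the configured users and link to the full performance data for the offending experiment.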

48
XPARE System Architecture
Experiment Launch
Performance Database
Performance Reporter
Comparison Tool
Regression Analyzer
Alerting Setup
49
Alerting Setup
50
Experiment Results Viewing Selection
51
Web-Based Experiment Reporting
52
Web-Based Experiment Reporting (continued)
53
Web-Based Experiment Reporting (continued)
54
Performance Analysis Tool Integration
  • Complex systems pose challenging performance
    analysis problems that require robust
    methodologies and tools
  • New performance problems will arise
  • Instrumentation and measurement
  • Data analysis and presentation
  • Diagnosis and tuning
  • No one performance tool can address all concerns
  • Look towards an integration of performance
    technologies
  • Support to link technologies to create
    performance problem solving environments
  • Performance engineering methodology and tool
    integration with software design and development
    process

55
Integrated Performance Evaluation Environment
56
References
  • A. Malony and S. Shende, "Performance Technology
    for Complex Parallel and Distributed Systems,"
    Proc. 3rd Workshop on Parallel and Distributed
    Systems (DAPSYS), pp. 37-46, Aug. 2000.
  • S. Shende, A. Malony, and R. Ansell-Bell,
    "Instrumentation and Measurement Strategies for
    Flexible and Portable Empirical Performance
    Evaluation," Proc. Intl. Conf. on Parallel and
    Distributed Processing Techniques and
    Applications (PDPTA), CSREA, pp. 1150-1156, July
    2001.
  • S. Shende, "The Role of Instrumentation and
    Mapping in Performance Measurement," Ph.D.
    Dissertation, University of Oregon, Aug. 2001.
  • J. de St. Germain, A. Morris, S. Parker, A.
    Malony, and S. Shende, "Integrating Performance
    Analysis in the Uintah Software Development
    Cycle," ISHPC 2002, Nara, Japan, May 2002.