Title: A Framework for Online Performance Analysis and Visualization of Large-Scale Parallel Applications

1. A Framework for Online Performance Analysis and Visualization of Large-Scale Parallel Applications
- Kai Li, Allen D. Malony, Robert Bell, Sameer Shende
- {likai, malony, bertie, sameer}_at_cs.uoregon.edu
- Department of Computer and Information Science
- Computational Science Institute, NeuroInformatics Center
- University of Oregon
2. Outline
- Problem description
- Scaling and performance observation
- Interest in online performance analysis
- General online performance system architecture
- Access models
- Profiling issues and control issues
- Framework for online performance analysis
- TAU performance system
- SCIRun computational and visualization environment
- Experiments
- Conclusions and future work
3. Problem Description
- Need for parallel performance observation
- Instrumentation, measurement, analysis, visualization
- In general, there is the concern for intrusion
- Seen as a tradeoff with accuracy of performance diagnosis
- Scaling complicates observation and analysis
- Issues of data size, processing time, and presentation
- Online approaches add capabilities as well as problems
- Performance interaction, but at what cost?
- Tools for large-scale performance observation online
- Supporting performance system architecture
- Tool integration, effective usage, and portability
4. Scaling and Performance Observation
- Consider traditional measurement methods
- Profiling: summary statistics calculated during execution
- Tracing: time-stamped sequence of execution events
- More parallelism → more performance data overall
- Performance specific to each thread of execution
- Possible increase in number of interactions between threads
- Harder to manage the data (memory, transfer, storage, ...)
- More parallelism / performance data → harder analysis
- More time-consuming to analyze
- More difficult to visualize (meaningful displays)
- Need techniques to address scaling at all levels
5. Why Complicate Matters with Online Methods?
- Adds interactivity to the performance analysis process
- Opportunity for dynamic performance observation
- Instrumentation change
- Measurement change
- Allows for control of performance data volume
- Post-mortem analysis may be too late
- View on status of long-running jobs
- Allow for early termination
- Computation steering to achieve better results
- Performance steering to achieve better performance
- Online performance observation may be intrusive
6. Related Ideas
- Computational steering
- Falcon (Schwan, Vetter): computational steering
- Dynamic instrumentation and performance search
- Paradyn (Miller): online performance bottleneck analysis
- Adaptive control and performance steering
- Active Harmony (Hollingsworth): automatic decision control
- Autopilot (Reed): actuator/sensor performance steering
- Scalable monitoring
- Peridot (Gerndt): automatic online performance analysis
- MRNet (Miller): multicast reduction network for access / control
- Scalable analysis and visualization
- VNG (Brunst): parallel trace analysis
7. General Online Performance Observation System
8. Models of Performance Data Access (Monitoring)
- Push model
- Producer/consumer style of access and transfer
- Application decides when/what/how much data to send
- External analysis tools only consume performance data
- Availability of new data is signaled passively or actively
- Pull model
- Client/server style of performance data access and transfer
- Application is a performance data server
- Access decisions are made externally by analysis tools
- Two-way communication is required
- Push/pull models
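The two access models can be sketched in Python. This is a minimal, hypothetical illustration of the two styles, not TAU's actual transport; all names here are invented for the sketch:

```python
import queue

# Push model: the application (producer) decides when to publish a
# profile sample; the analysis tool (consumer) only receives data.
class PushChannel:
    def __init__(self):
        self._q = queue.Queue()

    def publish(self, sample):       # called by the application
        self._q.put(sample)

    def consume(self):               # called by the analysis tool
        return self._q.get()

# Pull model: the application acts as a performance data server;
# the analysis tool makes explicit requests (two-way communication).
class ProfileServer:
    def __init__(self):
        self._current = {}

    def update(self, event, value):  # application-side bookkeeping
        self._current[event] = value

    def request(self, event):        # analysis-tool-initiated access
        return self._current.get(event)

push = PushChannel()
push.publish({"MPI_Recv": 1.5})
assert push.consume() == {"MPI_Recv": 1.5}

server = ProfileServer()
server.update("MPI_Recv", 1.5)
assert server.request("MPI_Recv") == 1.5
```

The asymmetry is the point: in the push sketch only the producer initiates transfers, while in the pull sketch every transfer is initiated by the external tool.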
9. Online Profiling Issues
- Profiles are summary statistics of performance
- Kept with respect to some unit of parallel execution
- Profiles are distributed across the machine (in memory)
- Must be gathered and delivered to the profile analysis tool
- Profile merging must take place (possibly in parallel)
- Consistency checking of profile data
- Callstack must be updated to generate correct profile data
- Correct communication statistics may require completion
- Event identification (not necessary if event names are saved)
- Sequence of profile samples allows interval analysis
- Interval frequency depends on profile collection delay
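Because profiles are cumulative summary statistics, a sequence of profile dumps supports interval analysis by differencing successive samples. A minimal sketch (the event names and values are hypothetical, not TAU's profile format):

```python
def interval_profile(prev, curr):
    """Per-event time spent in the interval between two cumulative dumps."""
    return {event: curr[event] - prev.get(event, 0.0) for event in curr}

# Two successive cumulative profile samples for one thread (seconds).
sample_t1 = {"MPI_Recv": 2.0, "compute": 10.0}
sample_t2 = {"MPI_Recv": 3.5, "compute": 14.0, "MPI_Reduce": 0.5}

delta = interval_profile(sample_t1, sample_t2)
assert delta == {"MPI_Recv": 1.5, "compute": 4.0, "MPI_Reduce": 0.5}
```

The achievable interval resolution is bounded by the profile collection delay noted above: intervals shorter than the time to dump and deliver a sample cannot be observed.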
10. Performance Control
- Instrumentation control
- Dynamic instrumentation
- Inserts / removes instrumentation at runtime
- Measurement control
- Dynamic measurement
- Enabling / disabling / changing of measurement code
- Dynamic instrumentation or measurement variables
- Data access control
- Selection of what performance data to access
- Control of frequency of access
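Measurement control, as opposed to instrumentation control, leaves the probe calls in place but toggles whether they measure anything. A hypothetical sketch (this is not TAU's API, just an illustration of the enable/disable idea):

```python
import time

class ControlledTimer:
    """Event timer whose measurement code can be toggled at runtime."""
    def __init__(self, name):
        self.name = name
        self.enabled = True          # dynamic measurement control flag
        self.total = 0.0
        self.calls = 0
        self._start = None

    def start(self):
        if self.enabled:
            self._start = time.perf_counter()

    def stop(self):
        if self.enabled and self._start is not None:
            self.total += time.perf_counter() - self._start
            self.calls += 1
            self._start = None

t = ControlledTimer("compute")
t.start(); t.stop()       # measured: one call recorded
t.enabled = False         # measurement control: disable without removing calls
t.start(); t.stop()       # instrumentation still executes, but records nothing
assert t.calls == 1
```

Disabling this way reduces measurement overhead and data volume while avoiding the cost and complexity of removing the instrumentation itself.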
11. TAU Performance System Framework
- Tuning and Analysis Utilities (aka Tools Are Us)
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
- Nodes / contexts / threads
- Multi-level: system / software / parallelism
- Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
- Portable performance profiling/tracing facility
- Open software approach
12. TAU Performance System Architecture
[Architecture diagram; analysis/visualization components shown include Paraver, EPILOG, and ParaProf]
13. Online Profile Measurement and Analysis in TAU
- Standard TAU profiling
- Per node/context/thread
- Profile dump routine
- Context-level
- Profile file per each thread in context
- Appends to profile file
- Selective event dumping
- Analysis tools access files through shared file system
- Application-level profile access routine
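The dump-and-read mechanism above can be sketched as follows. This is a hypothetical illustration of the append-to-file pattern using JSON lines, not TAU's actual profile file format or naming scheme:

```python
import json, os, tempfile

# Writer side (application): each call to the dump routine appends the
# current cumulative profile sample to a per-thread file.
def dump_profile(path, sample):
    with open(path, "a") as f:
        f.write(json.dumps(sample) + "\n")

# Reader side (analysis tool): read every sample accumulated so far,
# via the shared file system.
def read_samples(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

path = os.path.join(tempfile.mkdtemp(), "profile.0.0.0")  # node.context.thread
dump_profile(path, {"MPI_Recv": 2.0})
dump_profile(path, {"MPI_Recv": 3.5})   # later dump, larger cumulative value
assert [s["MPI_Recv"] for s in read_samples(path)] == [2.0, 3.5]
```

Appending rather than overwriting is what gives the reader a sequence of samples to work with, at the cost of file growth over a long run.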
14. Online Performance Analysis and Visualization
[Diagram: the application, instrumented with the TAU Performance System, writes performance data output (accumulated samples) to the file system; a Performance Data Reader and a Performance Data Integrator turn these into performance data streams for the Performance Analyzer and Performance Visualizer in SCIRun (Univ. of Utah); sample sequencing provides reader synchronization]
15. Profile Sample Data Structure in SCIRun
[Figure: profile samples organized hierarchically by node / context / thread]
16. Performance Analysis/Visualization in SCIRun
[Screenshot: SCIRun program]
17. Uintah Computational Framework (UCF)
- University of Utah
- UCF analysis
- Scheduling
- MPI library
- Components
- 500 processes
- Use for online and offline visualization
- Apply SCIRun steering
18. Terrain Performance Visualization
19. Scatterplot Displays
- Each point coordinate determined by three values:
- MPI_Reduce
- MPI_Recv
- MPI_Waitsome
- Min/max value range
- Effective for cluster analysis
- Relation between MPI_Recv and MPI_Waitsome
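Constructing such a display amounts to mapping each thread's three event values to one 3D coordinate, normalized by the min/max value range of each event. A sketch with hypothetical per-thread timings:

```python
def normalize(values):
    """Map values into [0, 1] using their min/max range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0   # guard against a constant series
    return [(v - lo) / span for v in values]

# Hypothetical exclusive times (seconds) per thread for three MPI events.
reduce_t   = [0.2, 0.4, 0.6, 0.8]
recv_t     = [1.0, 1.0, 3.0, 3.0]
waitsome_t = [0.5, 1.5, 0.5, 1.5]

# One 3D scatter point per thread; nearby points form behavioral clusters.
points = list(zip(normalize(reduce_t),
                  normalize(recv_t),
                  normalize(waitsome_t)))
assert points[0] == (0.0, 0.0, 0.0)
assert points[3] == (1.0, 1.0, 1.0)
```

Threads with similar communication behavior land close together in this space, which is what makes the display effective for cluster analysis.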
20. Online Uintah Performance Profiling
- Demonstration of online profiling capability
- Colliding elastic disks
- Test material point method (MPM) code
- Executed on 512 processors of ASCI Blue Pacific at LLNL
- Example 1 (Terrain visualization)
- Exclusive execution time across event groups
- Multiple time steps
- Example 2 (Bargraph visualization)
- MPI execution time and performance mapping
- Example 3 (Domain visualization)
- Task time allocation to patches
21. Example 1 (Event Groups)
22. Example 2 (MPI Performance)
23. Example 3 (Domain-Specific Visualization)
24. ParaProf Framework Architecture
- Portable, extensible, and scalable tool for profile analysis
- Offers best-of-breed capabilities to performance analysts
- Built as a profile analysis framework for extensibility
25. ParaProf Profile Display (VTF)
- Virtual Testshock Facility (VTF), Caltech ASCI Center
- Dynamic measurement, online analysis, visualization
26. Full Profile Display (SAMRAI)
- Structured AMR toolkit (SAMRAI), LLNL
- 512 processes
27. Evaluation of Experimental Approaches
- Currently only supporting push model
- File system solution for moving performance data
- Is this a scalable solution?
- Robust solution that can leverage high-performance I/O
- May result in high intrusion
- However, does not require IPC
- Should be relatively portable
- Analysis and visualization only run sequentially
28. Possible Improvements
- Profile merging at context level to reduce number of files
- Merging at node level may require explicit processing
- Concurrent trace merging could also reduce files
- Hierarchical merge tree
- Will require explicit processing
- Could consider IPC transfer
- MPI (e.g., used in mpiP for profile merging)
- Create own communicators
- Sockets or PACX between compute server and analyzer
- Leverage large-scale systems infrastructure
- Parallel profile analysis
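The hierarchical merge tree idea can be sketched as a pairwise reduction over per-thread profiles, giving roughly log2(n) merge levels instead of one tool reading n files. A hypothetical serial sketch (in practice each level would run in parallel, e.g. over MPI):

```python
def merge(a, b):
    """Combine two profiles by summing per-event totals."""
    out = dict(a)
    for event, t in b.items():
        out[event] = out.get(event, 0.0) + t
    return out

def tree_merge(profiles):
    """Hierarchical merge tree: pairwise reduction over the profile list."""
    while len(profiles) > 1:
        nxt = [merge(profiles[i], profiles[i + 1])
               for i in range(0, len(profiles) - 1, 2)]
        if len(profiles) % 2:        # an odd leftover passes up unmerged
            nxt.append(profiles[-1])
        profiles = nxt
    return profiles[0]

per_thread = [{"MPI_Recv": 1.0}, {"MPI_Recv": 2.0},
              {"MPI_Recv": 3.0}, {"compute": 4.0}]
assert tree_merge(per_thread) == {"MPI_Recv": 6.0, "compute": 4.0}
```

Merging at the context or node level is the first rung of exactly this tree: it shrinks the number of files the analysis tool must read without requiring the tool itself to change.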
29. Concluding Remarks
- Interest in online performance monitoring, analysis, and visualization for large-scale parallel systems
- Need to use it intelligently
- Benefit from other scalability considerations of the system software and system architecture
- See it as an extension of the parallel system architecture
- Avoid solutions that have portability difficulties
- In part, this is an engineering problem
- Need to work with the system configuration you have
- Need to understand if the approach is applicable to the problem
- Not clear if there is a single solution
30. Future Work
- Build online support into the TAU performance system
- Extend to support pull-model capabilities
- Develop hierarchical data access solutions
- Performance studies of the full system
- Latency analysis
- Bandwidth analysis
- Integration with other performance tools
- System performance monitors
- ParaProf parallel profile analyzer
- Development of a 3D visualization library
- Portability focus