Title: Performance Technology for Component Software - TAU
1 Performance Technology for Component Software - TAU
- Allen D. Malony (U. Oregon)
- Sameer Shende (U. Oregon)
- Craig Rasmussen (LANL)
- Jaideep Ray (SNL, CA)
- Matt Sottile (LANL)
2 Overview
- Complexity and performance technology
- TAU performance system
- Developing performance interfaces for CCA
- Performance modeling and prediction issues
- Conclusions
3 Focus on Component Technology
- Emerging component technology for HPC and Grid
  - Component: software object embedding functionality
  - Component architecture (CA): how components connect
  - Component framework: implements a CA
- Common Component Architecture (CCA)
  - Standard foundation for scientific component architecture
  - Component descriptions
    - Scientific Interface Description Language (SIDL)
  - CCA ports for component interactions
  - CCA framework services (CCAFEINE)
4 Problem Statement
- How do we create robust and ubiquitous performance technology for the analysis and tuning of component software in the presence of (evolving) complexity challenges?
- How do we apply performance technology effectively for the variety and diversity of performance problems that arise in the context of CCA components?
5 TAU Performance System
- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable, configurable performance profiling/tracing facility
  - Open software approach
- University of Oregon, LANL, FZJ Germany
- http://www.cs.uoregon.edu/research/paracomp/tau
6 TAU Performance System Architecture
[Figure: TAU performance system architecture; surviving labels: Paraver, EPILOG]
7 TAU Instrumentation
- Flexible instrumentation mechanisms at multiple levels
  - Source code
    - manual (TAU API, CCA Measurement Port API)
    - automatic using Program Database Toolkit (PDT), OPARI (for OpenMP programs), Babel SIDL compiler (proposed)
  - Object code
    - pre-instrumented libraries (e.g., MPI using PMPI)
    - statically linked
    - dynamically linked (e.g., virtual machine instrumentation)
    - fast breakpoints (compiler generated)
  - Executable code
    - dynamic instrumentation (pre-execution) using DynInstAPI
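Manual source-level instrumentation amounts to bracketing a region of interest with start and stop calls. The following is a minimal sketch in plain C++ and does not use the TAU API: `ProfileTable`, `ScopedTimer`, and `sum_of_squares` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical profile store: accumulated seconds and call counts per region name.
struct ProfileTable {
    std::map<std::string, double> seconds;
    std::map<std::string, long> calls;
};

// Hypothetical RAII timer: starts on construction, records on destruction.
class ScopedTimer {
public:
    ScopedTimer(ProfileTable& table, const std::string& name)
        : table_(table), name_(name), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        table_.seconds[name_] += elapsed.count();
        table_.calls[name_] += 1;
    }
private:
    ProfileTable& table_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Manually instrumented routine: the timer brackets the whole function body.
double sum_of_squares(ProfileTable& table, int n) {
    ScopedTimer timer(table, "sum_of_squares");
    double sum = 0.0;
    for (int i = 1; i <= n; ++i) sum += static_cast<double>(i) * i;
    return sum;
}
```

Automatic source instrumentation (as with PDT) would insert the equivalent of the `ScopedTimer` line without the programmer editing the routine by hand.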
8 Program Database Toolkit
9 Program Database Toolkit (PDT)
- Program code analysis framework for developing source-based tools for C99, C++ and F90 [U. Oregon, LANL, FZJ Germany]
- High-level interface to source code information
- Widely portable
  - IBM, SGI, Compaq, HP, Sun, Linux clusters, Windows, Apple, Hitachi, Cray T3E...
- Integrated toolkit for source code parsing, database creation, and database query
  - commercial grade front end parsers (EDG for C99/C++, Mutek for F90)
  - Intel/KAI C++ headers for std. C++ library distributed with PDT
  - portable IL analyzer, database format, and access API
  - open software approach for tool development
- Target and integrate multiple source languages
- Used in CCA for automated generation of SIDL (CHASM)
- Used in TAU to build automated performance instrumentation tools (tau_instrumentor)
- Can be used to generate code for performance ports in CCA
10 Extended Component Design
[Figure: extended component design; surviving label: generic component]
- PKC: Performance Knowledge Component
- POC: Performance Observation Component
11 Performance Observation
- Ability to observe execution performance is important
  - Empirically-derived performance knowledge
  - Does not require measurement integration in component
  - Monitor during execution to make dynamic decisions
  - Measurement integration is key
- Performance observation integration
  - Component integration: core and variant
  - Runtime measurement and data collection
  - On-line and off-line performance analysis
12 Performance Observation Component (POC)
- Performance observation in a performance-engineered component model
- Functional extension of original component design
  - Include new component methods and ports for other components to access measured performance data
  - Allow original component to access performance data
  - Encapsulate as tightly-coupled and co-resident performance observation object
  - POC "provides" port allows use of optimized interfaces to access "internal" performance observations
13 Performance Observation Component
[Figure: Performance Component with its Measurement Port]
- One performance component per context
- Performance component provides a Measurement Port
- Measurement Port allows a user to create and access:
  - Timer (start/stop, set name/type/group)
  - Event (trigger)
  - Control (enable/disable groups)
  - Query (get functions, metrics, counters, dump to disk)
14 Measurement Port in CCAFEINE
Performance Component API
    namespace performance {
    namespace ccaports {
      class Measurement : public virtual classic::gov::cca::Port {
      public:
        virtual ~Measurement() {}

        /* Create a Timer */
        virtual performance::Timer* createTimer(void) = 0;
        virtual performance::Timer* createTimer(string name) = 0;
        virtual performance::Timer* createTimer(string name, string type) = 0;
        virtual performance::Timer* createTimer(string name, string type,
                                                string group) = 0;

        /* Create a Query interface */
        virtual performance::Query* createQuery(void) = 0;

        /* Create a User Defined Event interface */
        virtual performance::Event* createEvent(void) = 0;
        virtual performance::Event* createEvent(string name) = 0;

        /* Create a Control interface for selectively enabling and disabling
           the instrumentation based on groups */
        virtual performance::Control* createControl(void) = 0;
      };
    }
    }
15 CCA Timer Interface
    namespace performance {
      class Timer {
      public:
        virtual ~Timer() {}

        /* Start the Timer. Implement these methods in a derived class to
           provide required functionality. */
        virtual void start(void) = 0;

        /* Stop the Timer. */
        virtual void stop(void) = 0;

        virtual void setName(string name) = 0;
        virtual string getName(void) = 0;
        virtual void setType(string name) = 0;
        virtual string getType(void) = 0;

        /* Set the group name associated with the Timer
           (e.g., all MPI calls can be grouped into an "MPI" group) */
        virtual void setGroupName(string name) = 0;
        virtual string getGroupName(void) = 0;

        virtual void setGroupId(unsigned long group) = 0;
        virtual unsigned long getGroupId(void) = 0;
      };
    }
16 Control Class Interface
CCA Instrumentation Control Interface
    namespace performance {
      class Control {
      public:
        virtual ~Control() {}

        /* Control instrumentation. Enable group Id. */
        virtual void enableGroupId(unsigned long id) = 0;

        /* Control instrumentation. Disable group Id. */
        virtual void disableGroupId(unsigned long id) = 0;

        /* Control instrumentation. Enable group name. */
        virtual void enableGroupName(string name) = 0;

        /* Control instrumentation. Disable group name. */
        virtual void disableGroupName(string name) = 0;

        /* Control instrumentation. Enable all groups. */
        virtual void enableAllGroups(void) = 0;

        /* Control instrumentation. Disable all groups. */
        virtual void disableAllGroups(void) = 0;
      };
    }
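One common way to implement group-based control is to give each group one bit of an `unsigned long` mask, which fits the `unsigned long` group ids in the interface. The sketch below is an assumption about how such a scheme could work, not the TAU implementation; `GroupControl`, `groupId`, and `isEnabled` are hypothetical names, and the all-enabled default is likewise assumed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical group registry: each group name maps to one bit of a mask.
class GroupControl {
public:
    // Return the bit id for a group name, assigning a new bit on first use.
    // Assumes fewer groups than there are bits in unsigned long.
    unsigned long groupId(const std::string& name) {
        std::map<std::string, unsigned long>::iterator it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        unsigned long id = 1UL << ids_.size();
        ids_[name] = id;
        return id;
    }
    void enableGroupId(unsigned long id)  { enabled_ |= id; }
    void disableGroupId(unsigned long id) { enabled_ &= ~id; }
    void enableGroupName(const std::string& name)  { enableGroupId(groupId(name)); }
    void disableGroupName(const std::string& name) { disableGroupId(groupId(name)); }
    void enableAllGroups()  { enabled_ = ~0UL; }
    void disableAllGroups() { enabled_ = 0UL; }
    // A timer belonging to group `id` would record only while this is true.
    bool isEnabled(unsigned long id) const { return (enabled_ & id) != 0; }
private:
    std::map<std::string, unsigned long> ids_;
    unsigned long enabled_ = ~0UL;  // assumption: every group starts enabled
};
```

With this layout, checking whether a timer should fire is a single bitwise AND, which keeps the cost of disabled instrumentation low.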
17 Query Class Interface
CCA Performance Query Interface
    namespace performance {
      class Query {
      public:
        virtual ~Query() {}

        /* Get the list of Timer names */
        virtual void getTimerNames(const char **& functionList,
                                   int& numFuncs) = 0;

        /* Get the list of Counter names */
        virtual void getCounterNames(const char **& counterList,
                                     int& numCounters) = 0;

        /* getTimerData. Returns lists of metrics. */
        virtual void getTimerData(const char **& inTimerList, int numTimers,
                                  double **& counterExclusive,
                                  double **& counterInclusive,
                                  int*& numCalls, int*& numChildCalls,
                                  const char **& counterNames,
                                  int& numCounters) = 0;

        virtual void dumpProfileData(void) = 0;
        virtual void dumpProfileDataIncremental(void) = 0;  // timestamped dump
        virtual void dumpTimerNames(void) = 0;
        virtual void dumpTimerData(const char **& inTimerList,
                                   int numTimers) = 0;
        virtual void dumpTimerDataIncremental(const char **& inTimerList,
                                              int numTimers) = 0;
      };
    }
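The query interface reports exclusive and inclusive values per timer, along with call and child-call counts. A minimal sketch of how those four quantities relate, assuming a simple timer stack: inclusive time spans start to stop, and exclusive time is inclusive time minus the time spent in nested timers. `CallStackProfiler` and `TimerStats` are hypothetical names, timestamps are passed in explicitly to keep the arithmetic visible, and recursion is not handled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct TimerStats {
    double inclusive = 0.0;  // start-to-stop time, nested timers included
    double exclusive = 0.0;  // inclusive time minus time in nested timers
    int numCalls = 0;
    int numChildCalls = 0;
};

// Hypothetical profiler driven by explicit timestamps.
class CallStackProfiler {
public:
    void start(const std::string& name, double now) {
        // Starting inside another timer counts as a child call of that timer.
        if (!stack_.empty()) stats_[stack_.back().name].numChildCalls += 1;
        stack_.push_back(Frame{name, now, 0.0});
    }
    void stop(double now) {
        Frame f = stack_.back();
        stack_.pop_back();
        double inclusive = now - f.startTime;
        TimerStats& s = stats_[f.name];
        s.inclusive += inclusive;
        s.exclusive += inclusive - f.childTime;
        s.numCalls += 1;
        // Charge this timer's inclusive time to its parent as child time.
        if (!stack_.empty()) stack_.back().childTime += inclusive;
    }
    const TimerStats& get(const std::string& name) { return stats_[name]; }
private:
    struct Frame { std::string name; double startTime; double childTime; };
    std::vector<Frame> stack_;
    std::map<std::string, TimerStats> stats_;
};
```

For example, if a timer "integrate" runs from t = 0 to t = 10 and a nested timer "evaluate" runs from t = 1 to t = 4, "integrate" has 10 inclusive and 7 exclusive, while "evaluate" has 3 for both.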
18 Event Class Interface
CCA User Defined Event Interface
    namespace performance {
      class Event {
      public:
        /* Destructor */
        virtual ~Event() {}

        /* Register the name of the event */
        virtual void trigger(double data) = 0;
        /* e.g., size of a message, error in an iteration, memory allocated */
      };
    }
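A user-defined event records a value each time it is triggered, so a natural implementation keeps running statistics over those values. The sketch below is one possible design, not the TAU one; `StatEvent` and its accessors are hypothetical names.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical user-defined event keeping running statistics of triggered values.
class StatEvent {
public:
    explicit StatEvent(const std::string& name) : name_(name) {}

    // Record one observation, e.g. the size of a message in bytes.
    void trigger(double data) {
        count_ += 1;
        sum_ += data;
        if (data < min_) min_ = data;
        if (data > max_) max_ = data;
    }

    long count() const { return count_; }
    double min() const { return min_; }
    double max() const { return max_; }
    double mean() const { return count_ ? sum_ / count_ : 0.0; }
    const std::string& name() const { return name_; }

private:
    std::string name_;
    long count_ = 0;
    double sum_ = 0.0;
    double min_ = std::numeric_limits<double>::infinity();
    double max_ = -std::numeric_limits<double>::infinity();
};
```

Keeping only count, sum, minimum, and maximum means the event uses constant memory no matter how often it is triggered.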
19 Measurement Port Implementation
- TAU component implements the MeasurementPort
  - Implements Timer, Control, Query and Event classes
  - Registers the port with the CCAFEINE framework
- Components target the generic MeasurementPort interface
  - Runtime selection of TAU component during execution
  - Instrumentation code independent of underlying tool
  - Instrumentation code independent of measurement choice
- TauMeasurement_CCA port implementation uses a specific TAU measurement library
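The independence claimed above follows from programming against an abstract interface and choosing the concrete implementation at run time. A minimal sketch under that assumption: the `Timer` class here mirrors only `start`/`stop` from the slides, while `seconds()`, `ChronoTimer`, `NullTimer`, `makeTimer`, and `instrumentedSum` are hypothetical names, and the string-keyed factory merely stands in for the framework's port connection.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <string>

// Generic interface the instrumented code depends on.
class Timer {
public:
    virtual ~Timer() {}
    virtual void start() = 0;
    virtual void stop() = 0;
    virtual double seconds() const = 0;  // hypothetical accessor, not in the slides
};

// Backend 1: accumulates wall-clock time.
class ChronoTimer : public Timer {
public:
    void start() override { begin_ = std::chrono::steady_clock::now(); }
    void stop() override {
        std::chrono::duration<double> d = std::chrono::steady_clock::now() - begin_;
        total_ += d.count();
    }
    double seconds() const override { return total_; }
private:
    std::chrono::steady_clock::time_point begin_;
    double total_ = 0.0;
};

// Backend 2: does nothing, for runs with measurement switched off.
class NullTimer : public Timer {
public:
    void start() override {}
    void stop() override {}
    double seconds() const override { return 0.0; }
};

// Hypothetical factory standing in for the framework connecting a port.
std::unique_ptr<Timer> makeTimer(const std::string& backend) {
    if (backend == "chrono") return std::unique_ptr<Timer>(new ChronoTimer());
    return std::unique_ptr<Timer>(new NullTimer());
}

// Instrumented code: identical no matter which backend was selected.
double instrumentedSum(Timer& t, int n) {
    t.start();
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += i;
    t.stop();
    return sum;
}
```

Swapping the measurement backend changes only the argument to the factory; `instrumentedSum` is compiled once and never edited.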
20 Using MeasurementPort
Using the Timer Interface: An Example
    #include "ports/Measurement_CCA.h"

    double MonteCarloIntegrator::integrate(double lowBound, double upBound,
                                           int count)
    {
      classic::gov::cca::Port *port;
      double sum = 0.0;

      // Get Measurement port
      port = frameworkServices->getPort("MeasurementPort");
      if (port)
        measurement_m = dynamic_cast<performance::ccaports::Measurement *>(port);
      if (measurement_m == 0) {
        cerr << "Connected to something other than a Measurement port";
        return -1;
      }

      static performance::Timer *t =
          measurement_m->createTimer(string("IntegrateTimer"));
      t->start();
      for (int i = 0; i < count; i++) {
        double x = random_m->getRandomNumber();
        sum = sum + function_m->evaluate(x);
      }
      t->stop();
      ...
    }
21 TAU Component in CCAFEINE
    repository get TauMeasurement
    repository get Driver
    repository get MidpointIntegrator
    repository get MonteCarloIntegrator
    repository get RandomGenerator
    repository get LinearFunction
    repository get NonlinearFunction
    repository get PiFunction
    create LinearFunction lin_func
    create NonlinearFunction nonlin_func
    create PiFunction pi_func
    create MonteCarloIntegrator mc_integrator
    create RandomGenerator rand
    create TauMeasurement tau
    connect mc_integrator RandomGeneratorPort rand RandomGeneratorPort
    connect mc_integrator FunctionPort nonlin_func FunctionPort
    connect mc_integrator MeasurementPort tau MeasurementPort
22 SIDL Interface for Timers
    //
    // File: performance.sidl
    //
    version performance 1.0;
    package performance {
      class Timer {
        void start();
        void stop();
        void setName(in string name);
        string getName();
        void setType(in string name);
        string getType();
        void setGroupName(in string name);
        string getGroupName();
        void setGroupId(in long group);
        long getGroupId();
      }
    }
23 Using SIDL Interface for Timers
    // SIDL:
    #include "performance_Timer.hh"
    int main(int argc, char **argv)
    {
      performance::Timer t = performance::Timer::_create();
      ...
      t.setName("Integrate timer");
      t.start();
      // Computation
      for (int i = 0; i < count; i++) {
        double x = random_m->getRandomNumber();
        sum = sum + function_m->evaluate(x);
      }
      ...
      t.stop();
      return 0;
    }
24 Performance Knowledge Component
- Describe and store "known" component performance
  - Benchmark characterizations in performance database
  - Empirical or analytical performance models
- Saved information about component performance
  - Use for performance-guided selection and deployment
  - Use for runtime adaptation
- Representation must be in common forms with standard means for accessing the performance information
25 Performance Knowledge Repository
- Component performance repository
  - Implement in component architecture framework
  - Similar to CCA component repository (Alexandria)
  - Access by component infrastructure
- View performance knowledge as component (PKC)
  - PKC ports give access to performance knowledge
    - to other components
    - back to original component
  - Store performance model for performance prediction
  - Component composition performance knowledge
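A repository of this kind can be pictured as a map from component name to a stored performance model that other components query for predictions. The sketch below is an illustration under that assumption only: `PerformanceKnowledgeRepository`, `store`, `predict`, and `selectFastest` are hypothetical names, and a model is reduced to a function from problem size to predicted seconds.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical repository: component name -> model predicting run time (seconds)
// from a problem size n.
class PerformanceKnowledgeRepository {
public:
    typedef std::function<double(double)> Model;

    void store(const std::string& component, Model model) {
        models_[component] = model;
    }
    bool has(const std::string& component) const {
        return models_.count(component) != 0;
    }
    // Throws std::out_of_range if no model is stored for the component.
    double predict(const std::string& component, double n) const {
        return models_.at(component)(n);
    }
    // Pick the stored component with the lowest predicted time for size n.
    std::string selectFastest(double n) const {
        std::string best;
        double bestTime = 0.0;
        for (std::map<std::string, Model>::const_iterator it = models_.begin();
             it != models_.end(); ++it) {
            double t = it->second(n);
            if (best.empty() || t < bestTime) {
                best = it->first;
                bestTime = t;
            }
        }
        return best;  // empty string when the repository is empty
    }
private:
    std::map<std::string, Model> models_;
};
```

As a usage example with made-up coefficients, storing `1.0 + 0.002 * n` for one integrator and `5.0 + 0.001 * n` for another makes `selectFastest` prefer the first at n = 1000 and the second at n = 10000, which is the kind of performance-guided selection the slide describes.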
26 Component Performance Model
- User specified
- Inferred automatically by performance tool
  - Prior performance data
- Expression
  - Parametric model
- Estimate performance of a single component by
  - Querying runtime performance data
  - Passing this to performance model for evaluation
- Integration of performance observation and knowledge components is key to runtime selection of components
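To make "parametric model inferred from prior performance data" concrete, the sketch below fits the simplest such model, t(n) = a + b n, by ordinary least squares and then evaluates it for a new problem size. The linear form, `LinearModel`, and `fitLinearModel` are assumptions chosen for illustration; the slides do not specify a model form.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical parametric model t(n) = a + b * n.
struct LinearModel {
    double a;
    double b;
    double evaluate(double n) const { return a + b * n; }
};

// Ordinary least-squares fit from prior (n, time) observations.
// Assumes at least two observations with distinct n.
LinearModel fitLinearModel(const std::vector<double>& n,
                           const std::vector<double>& t) {
    const double count = static_cast<double>(n.size());
    double sumN = 0.0, sumT = 0.0, sumNN = 0.0, sumNT = 0.0;
    for (std::size_t i = 0; i < n.size(); ++i) {
        sumN += n[i];
        sumT += t[i];
        sumNN += n[i] * n[i];
        sumNT += n[i] * t[i];
    }
    LinearModel m;
    m.b = (count * sumNT - sumN * sumT) / (count * sumNN - sumN * sumN);
    m.a = (sumT - m.b * sumN) / count;
    return m;
}
```

In a running system the observations would come from the measurement port's query interface; here they are plain vectors so the fit can be checked by hand.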
27 Applications: Uintah (U. Utah)
Scalability analysis
28 Applications: VTF (ASCI ASAP Caltech)
- C++, C, F90, Python
- PDT, MPI
29 Applications: SAMRAI (LLNL)
- C++
- PDT, MPI
- SAMRAI timers (groups)
30 TAU Status
- Instrumentation supported
  - Source, preprocessor, compiler, MPI, runtime, virtual machine
- Languages supported
  - C++, C, F90, Java, Python
  - HPF, ZPL, HPC++, pC++...
- Packages supported
  - PAPI [UTK], PCL [FZJ] (hardware performance counter access)
  - Opari, PDT [UO, LANL, FZJ], DyninstAPI [U. Maryland] (instrumentation)
  - EXPERT, EPILOG [FZJ], Vampir [Pallas], Paraver [CEPBA] (visualization)
- Platforms supported
  - IBM SP, SGI Origin, Sun, HP Superdome, HP/Compaq Tru64 ES
  - Linux clusters (IA-32, IA-64, PowerPC, Alpha), Apple, Windows
  - Hitachi SR8000, NEC SX, Cray T3E ...
- Compiler suites supported
  - GNU, Intel KAI (KCC, KAP/Pro), Intel, SGI, IBM, Compaq, HP, Fujitsu, Hitachi, Sun, Apple, Microsoft, NEC, Cray, PGI, Absoft, ...
- Thread libraries supported
  - Pthreads, SGI sproc, OpenMP, Windows, Java, SMARTS
31 Concluding Remarks
- Complex component systems pose challenging performance analysis problems that require robust methodologies and tools
- New performance problems will arise
  - Instrumentation and measurement
  - Data analysis and presentation
  - Diagnosis and tuning
- Performance engineered components
  - Performance knowledge, observation, query and control
- Integration of performance technology
32 Support Acknowledgement
- TAU and PDT support
- Department of Energy (DOE)
- DOE 2000 ACTS contract
- DOE MICS contract
- DOE ASCI Level 3 (LANL, LLNL)
- U. of Utah DOE ASCI Level 1 subcontract
- DARPA
- NSF National Young Investigator (NYI) award