Title: Allen D. Malony, Sameer Shende
Performance Technology for Component Software
- Allen D. Malony, Sameer Shende
- {malony,shende}@cs.uoregon.edu
- Department of Computer and Information Science 
- Computational Science Institute 
- University of Oregon
Outline
- Complexity and performance technology 
- TAU performance system 
- Developing performance interfaces for CCA 
- Performance modeling and prediction issues 
- Applications 
- Uintah (U. Utah), VTF (Caltech), SAMRAI (LLNL)
- Concluding remarks
Focus on Component Technology and CCA
- Emerging component technology for HPC and Grid
- Component: software object embedding functionality
- Component architecture (CA): how components connect
- Component framework: implements a CA
- Common Component Architecture (CCA)
- Standard foundation for scientific component architecture
- Component descriptions 
- Scientific Interface Description Language (SIDL) 
- CCA ports for component interactions 
- CCA framework services (CCAFEINE) 
- directory, registry, connection, event
Problem Statement
- How do we create robust and ubiquitous 
 performance technology for the analysis and
 tuning of component software in the presence of
 (evolving) complexity challenges?
- How do we apply performance technology 
 effectively for the variety and diversity of
 performance problems that arise in the context of
 CCA components?
TAU Performance System Framework
- Tuning and Analysis Utilities 
- Performance system framework for scalable 
 parallel and distributed high-performance
 computing
- Targets a general complex system computation 
 model
- nodes / contexts / threads 
- Multi-level system / software / parallelism 
- Measurement and analysis abstraction 
- Integrated toolkit for performance 
 instrumentation, measurement, analysis, and
 visualization
- Portable, configurable performance 
 profiling/tracing facility
- Open software approach 
- University of Oregon, LANL, FZJ Germany 
- http://www.cs.uoregon.edu/research/paracomp/tau
TAU Performance System Architecture
[Figure: TAU performance system architecture diagram; labels include Paraver and EPILOG]
Extended Component Design
[Figure: extended design of a generic component]
- PKC: Performance Knowledge Component
- POC: Performance Observation Component
Performance Observation
- Ability to observe execution performance is 
 important
- Empirically-derived performance knowledge 
- Does not require measurement integration in 
 component
- Monitor during execution to make dynamic 
 decisions
- Measurement integration is key 
- Performance observation integration 
- Component integration core and variant 
- Runtime measurement and data collection 
- On-line and off-line performance analysis
Performance Observation Component (POC)
- Performance observation in a performance-engineered component model
- Functional extension of the original component design
- Include new component methods and ports for other components to access measured performance data
- Allow the original component to access performance data
- Encapsulate as a tightly-coupled and co-resident performance observation object
- POC provides ports that allow the use of optimized interfaces to access "internal" performance observations
Design of Performance Observation Component
[Figure: performance component providing a Measurement Port]
- One performance component per context 
- Performance component provides a Measurement Port 
- Measurement Port allows a user to create and 
 access
- Timer (start/stop, set name/type/group) 
- Event (trigger) 
- Control (enable/disable groups) 
- Query (get functions, metrics, counters, dump to 
 disk)
Measurement Port in CCAFEINE
```cpp
#include <string>
// classic::gov::cca::Port is declared by the CCAFEINE "classic" CCA headers.

namespace performance {
  class Timer;
  class Query;
  class Event;
  class Control;

  namespace ccaports {
    class Measurement : public virtual classic::gov::cca::Port {
    public:
      virtual ~Measurement() { }

      /* Create a Timer */
      virtual performance::Timer* createTimer(void) = 0;
      virtual performance::Timer* createTimer(std::string name) = 0;
      virtual performance::Timer* createTimer(std::string name, std::string type) = 0;
      virtual performance::Timer* createTimer(std::string name, std::string type,
                                              std::string group) = 0;

      /* Create a Query interface */
      virtual performance::Query* createQuery(void) = 0;

      /* Create a User Defined Event interface */
      virtual performance::Event* createEvent(void) = 0;
      virtual performance::Event* createEvent(std::string name) = 0;

      /* Create a Control interface for selectively enabling and disabling
         the instrumentation based on groups */
      virtual performance::Control* createControl(void) = 0;
    };
  }
}
```
Timer Class Interface
```cpp
#include <string>

namespace performance {
  class Timer {
  public:
    virtual ~Timer() { }

    /* Start the Timer. Implement these methods in a derived class to
       provide the required functionality. */
    virtual void start(void) = 0;

    /* Stop the Timer. */
    virtual void stop(void) = 0;

    virtual void setName(std::string name) = 0;
    virtual std::string getName(void) = 0;

    virtual void setType(std::string name) = 0;
    virtual std::string getType(void) = 0;

    /* Set the group name associated with the Timer
       (e.g., all MPI calls can be grouped into an "MPI" group) */
    virtual void setGroupName(std::string name) = 0;
    virtual std::string getGroupName(void) = 0;

    virtual void setGroupId(unsigned long group) = 0;
    virtual unsigned long getGroupId(void) = 0;
  };
}
```
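The Timer comment asks for the methods to be implemented in a derived class. A minimal sketch of such a class follows, assuming wall-clock timing with std::chrono; `ChronoTimer`, `totalSeconds()` and `calls()` are hypothetical names, and this is not how the TAU component implements the port.

```cpp
#include <chrono>
#include <string>
#include <utility>

namespace performance {
// Copy of the Timer interface above so the sketch compiles on its own.
class Timer {
public:
  virtual ~Timer() {}
  virtual void start() = 0;
  virtual void stop() = 0;
  virtual void setName(std::string name) = 0;
  virtual std::string getName() = 0;
  virtual void setType(std::string name) = 0;
  virtual std::string getType() = 0;
  virtual void setGroupName(std::string name) = 0;
  virtual std::string getGroupName() = 0;
  virtual void setGroupId(unsigned long group) = 0;
  virtual unsigned long getGroupId() = 0;
};
}  // namespace performance

// Hypothetical derived class: accumulates wall-clock time over repeated
// start/stop pairs using std::chrono::steady_clock.
class ChronoTimer : public performance::Timer {
public:
  void start() override {
    begin_ = Clock::now();
    running_ = true;
  }
  void stop() override {
    if (!running_) return;  // ignore a stop() without a matching start()
    total_ += std::chrono::duration<double>(Clock::now() - begin_).count();
    ++calls_;
    running_ = false;
  }
  void setName(std::string name) override { name_ = std::move(name); }
  std::string getName() override { return name_; }
  void setType(std::string name) override { type_ = std::move(name); }
  std::string getType() override { return type_; }
  void setGroupName(std::string name) override { group_ = std::move(name); }
  std::string getGroupName() override { return group_; }
  void setGroupId(unsigned long group) override { groupId_ = group; }
  unsigned long getGroupId() override { return groupId_; }

  // Extra accessors for this sketch; not part of the Timer interface.
  double totalSeconds() const { return total_; }
  long calls() const { return calls_; }

private:
  using Clock = std::chrono::steady_clock;
  Clock::time_point begin_;
  bool running_ = false;
  double total_ = 0.0;
  long calls_ = 0;
  std::string name_, type_, group_;
  unsigned long groupId_ = 0;
};
```

A measurement tool would replace the chrono calls with its own timing library; component code that only sees `performance::Timer*` does not change.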
Control Class Interface
```cpp
#include <string>

namespace performance {
  class Control {
  public:
    virtual ~Control() { }

    /* Control instrumentation. Enable group Id. */
    virtual void enableGroupId(unsigned long id) = 0;
    /* Control instrumentation. Disable group Id. */
    virtual void disableGroupId(unsigned long id) = 0;
    /* Control instrumentation. Enable group name. */
    virtual void enableGroupName(std::string name) = 0;
    /* Control instrumentation. Disable group name. */
    virtual void disableGroupName(std::string name) = 0;
    /* Control instrumentation. Enable all groups. */
    virtual void enableAllGroups(void) = 0;
    /* Control instrumentation. Disable all groups. */
    virtual void disableAllGroups(void) = 0;
  };
}
```
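One way to realize the name-based Control methods is sketched below, assuming a default state (all groups on or all off) plus a set of exceptions. `GroupControl` and `isEnabled()` are hypothetical; the group-Id methods are omitted to keep the sketch short.

```cpp
#include <set>
#include <string>

namespace performance {
// Name-based subset of the Control interface above.
class Control {
public:
  virtual ~Control() {}
  virtual void enableGroupName(std::string name) = 0;
  virtual void disableGroupName(std::string name) = 0;
  virtual void enableAllGroups() = 0;
  virtual void disableAllGroups() = 0;
};
}  // namespace performance

// Hypothetical implementation: a default state plus the set of group names
// whose state differs from that default.
class GroupControl : public performance::Control {
public:
  void enableGroupName(std::string name) override {
    if (defaultOn_) exceptions_.erase(name); else exceptions_.insert(name);
  }
  void disableGroupName(std::string name) override {
    if (defaultOn_) exceptions_.insert(name); else exceptions_.erase(name);
  }
  void enableAllGroups() override { defaultOn_ = true; exceptions_.clear(); }
  void disableAllGroups() override { defaultOn_ = false; exceptions_.clear(); }

  // A measurement library would consult this before starting a timer
  // that belongs to the given group.
  bool isEnabled(const std::string& group) const {
    bool isException = exceptions_.count(group) > 0;
    return defaultOn_ != isException;
  }

private:
  bool defaultOn_ = true;
  std::set<std::string> exceptions_;
};
```

Storing only the exceptions keeps `enableAllGroups()` and `disableAllGroups()` constant-time regardless of how many groups exist.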
Query Class Interface
```cpp
namespace performance {
  class Query {
  public:
    virtual ~Query() { }

    /* Get the list of Timer names (output parameters) */
    virtual void getTimerNames(const char **& functionList, int& numFuncs) = 0;

    /* Get the list of Counter names (output parameters) */
    virtual void getCounterNames(const char **& counterList, int& numCounters) = 0;

    /* getTimerData. Returns lists of metrics. */
    virtual void getTimerData(const char ** inTimerList, int numTimers,
                              double **& counterExclusive,
                              double **& counterInclusive,
                              int *& numCalls, int *& numChildCalls,
                              const char **& counterNames,
                              int& numCounters) = 0;

    virtual void dumpProfileData(void) = 0;
    virtual void dumpProfileDataIncremental(void) = 0; // timestamped dump
    virtual void dumpTimerNames(void) = 0;
    virtual void dumpTimerData(const char ** inTimerList, int numTimers) = 0;
    virtual void dumpTimerDataIncremental(const char ** inTimerList,
                                          int numTimers) = 0;
  };
}
```
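getTimerData returns its results as parallel arrays indexed by timer. A small sketch of how a client might consume one counter's exclusive values is given below; `rankByExclusive` is a hypothetical helper, and std::vector stands in for the raw arrays to keep it short.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: order timer names by decreasing exclusive value for
// one counter, given parallel arrays (same index = same timer).
std::vector<std::string> rankByExclusive(const std::vector<std::string>& names,
                                         const std::vector<double>& exclusive) {
  if (names.size() != exclusive.size())
    throw std::invalid_argument("parallel arrays must have the same length");
  std::vector<std::size_t> order(names.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  // stable_sort keeps the original order for timers with equal values
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return exclusive[a] > exclusive[b];
                   });
  std::vector<std::string> ranked;
  ranked.reserve(order.size());
  for (std::size_t i : order) ranked.push_back(names[i]);
  return ranked;
}
```

Sorting an index array rather than the data itself keeps every other parallel array (inclusive values, call counts) addressable with the same indices.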
Event Class Interface
```cpp
namespace performance {
  class Event {
  public:
    /* Destructor */
    virtual ~Event() { }

    /* Trigger the event, recording one data value
       (e.g., size of a message, error in an iteration, memory allocated) */
    virtual void trigger(double data) = 0;
  };
}
```
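A user-defined event typically summarizes the values passed to trigger(). The sketch below keeps count, minimum, maximum and mean; `StatsEvent` and its accessors are hypothetical names, and the choice of statistics is an assumption made for illustration.

```cpp
#include <algorithm>
#include <limits>
#include <string>
#include <utility>

namespace performance {
// Copy of the Event interface above so the sketch compiles on its own.
class Event {
public:
  virtual ~Event() {}
  virtual void trigger(double data) = 0;
};
}  // namespace performance

// Hypothetical event that keeps summary statistics of triggered values.
class StatsEvent : public performance::Event {
public:
  explicit StatsEvent(std::string name) : name_(std::move(name)) {}

  void trigger(double data) override {
    ++count_;
    sum_ += data;
    min_ = std::min(min_, data);
    max_ = std::max(max_, data);
  }

  const std::string& name() const { return name_; }
  long count() const { return count_; }
  double mean() const { return count_ ? sum_ / count_ : 0.0; }
  double minValue() const { return min_; }
  double maxValue() const { return max_; }

private:
  std::string name_;
  long count_ = 0;
  double sum_ = 0.0;
  double min_ = std::numeric_limits<double>::infinity();
  double max_ = -std::numeric_limits<double>::infinity();
};
```

For example, a component could trigger a "Message size" event with the byte count of every send it performs.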
Measurement Port Implementation
- TAU component implements the MeasurementPort
- Implements Timer, Control, Query and Event classes
- Registers the port with the CCAFEINE framework 
- Components target the generic MeasurementPort 
 interface
- Runtime selection of TAU component during 
 execution
- Instrumentation code independent of underlying 
 tool
- Instrumentation code independent of measurement 
 choice
- TauMeasurement_CCA port implementation uses a 
 specific TAU measurement library
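The point that component code targets only the generic interface, with the measurement implementation chosen at run time, can be sketched as follows. In CCAFEINE the choice is made when the script connects the port; here a plain factory function, `makeMeasurement`, stands in for that step. All class and function names are hypothetical, and `std::unique_ptr` replaces the raw pointers of the real port.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Trimmed stand-ins for the generic interfaces (not the CCA headers).
struct Timer {
  virtual ~Timer() {}
  virtual void start() = 0;
  virtual void stop() = 0;
};
struct Measurement {
  virtual ~Measurement() {}
  virtual std::unique_ptr<Timer> createTimer(const std::string& name) = 0;
};

// Implementation 1: measurement switched off, timers do nothing.
class NullMeasurement : public Measurement {
  struct NullTimer : Timer {
    void start() override {}
    void stop() override {}
  };
public:
  std::unique_ptr<Timer> createTimer(const std::string&) override {
    return std::make_unique<NullTimer>();
  }
};

// Implementation 2: stand-in for a real tool; counts timer starts.
// Timers must not outlive the CountingMeasurement that created them.
class CountingMeasurement : public Measurement {
  struct CountingTimer : Timer {
    explicit CountingTimer(int& starts) : starts_(starts) {}
    void start() override { ++starts_; }
    void stop() override {}
    int& starts_;
  };
public:
  std::unique_ptr<Timer> createTimer(const std::string&) override {
    return std::make_unique<CountingTimer>(starts_);
  }
  int starts() const { return starts_; }
private:
  int starts_ = 0;
};

// Component code: depends only on the generic Measurement interface.
double sumOfSquares(Measurement& m, int n) {
  std::unique_ptr<Timer> t = m.createTimer("SumTimer");
  t->start();
  double sum = 0.0;
  for (int i = 1; i <= n; ++i) sum += static_cast<double>(i) * i;
  t->stop();
  return sum;
}

// Run-time selection, standing in for the framework's port connection.
std::unique_ptr<Measurement> makeMeasurement(const std::string& kind) {
  if (kind == "none") return std::make_unique<NullMeasurement>();
  if (kind == "counting") return std::make_unique<CountingMeasurement>();
  throw std::invalid_argument("unknown measurement component: " + kind);
}
```

`sumOfSquares` compiles once and runs unchanged against either implementation, which is the property the slide describes.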
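Using MeasurementPort
```cpp
#include <iostream>
#include <string>
#include "ports/Measurement_CCA.h"

// measurement_m, random_m, function_m and frameworkServices are members of
// the MonteCarloIntegrator component, declared elsewhere.
double MonteCarloIntegrator::integrate(double lowBound, double upBound, int count)
{
  classic::gov::cca::Port* port;
  double sum = 0.0;

  // Get the Measurement port
  port = frameworkServices->getPort("MeasurementPort");
  if (port)
    measurement_m = dynamic_cast<performance::ccaports::Measurement*>(port);
  if (measurement_m == 0) {
    std::cerr << "Connected to something other than a Measurement port" << std::endl;
    return -1;
  }

  // Create the timer once; later calls reuse the same static pointer
  static performance::Timer* t =
      measurement_m->createTimer(std::string("IntegrateTimer"));
  t->start();
  for (int i = 0; i < count; i++) {
    double x = random_m->getRandomNumber();
    sum = sum + function_m->evaluate(x);
  }
  t->stop();
  // ...
}
```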
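The explicit start()/stop() pair above leaves the timer running if the code between them returns early. A common C++ remedy is a scope guard; the sketch below assumes only the start()/stop() methods of the Timer interface, and `ScopedTimer`, `RecordingTimer` and `guardedWork` are hypothetical helpers, not part of the Measurement port.

```cpp
#include <string>

namespace performance {
// Subset of the Timer interface needed by the guard.
class Timer {
public:
  virtual ~Timer() {}
  virtual void start() = 0;
  virtual void stop() = 0;
};
}  // namespace performance

// Starts the timer on construction and stops it on destruction,
// so every exit path from the enclosing scope is covered.
class ScopedTimer {
public:
  explicit ScopedTimer(performance::Timer* t) : t_(t) { if (t_) t_->start(); }
  ~ScopedTimer() { if (t_) t_->stop(); }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;
private:
  performance::Timer* t_;
};

// Test double that records the call sequence.
class RecordingTimer : public performance::Timer {
public:
  void start() override { log += "start;"; }
  void stop() override { log += "stop;"; }
  std::string log;
};

// Example: the early return still stops the timer.
double guardedWork(performance::Timer* t, int count) {
  ScopedTimer guard(t);
  if (count <= 0) return -1.0;  // guard's destructor still calls stop()
  double sum = 0.0;
  for (int i = 0; i < count; ++i) sum += i;
  return sum;
}
```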
Using TAU Component in CCAFEINE
```
repository get TauMeasurement
repository get Driver
repository get MidpointIntegrator
repository get MonteCarloIntegrator
repository get RandomGenerator
repository get LinearFunction
repository get NonlinearFunction
repository get PiFunction
create LinearFunction lin_func
create NonlinearFunction nonlin_func
create PiFunction pi_func
create MonteCarloIntegrator mc_integrator
create RandomGenerator rand
create TauMeasurement tau
connect mc_integrator RandomGeneratorPort rand RandomGeneratorPort
connect mc_integrator FunctionPort nonlin_func FunctionPort
connect mc_integrator MeasurementPort tau MeasurementPort
```
Using SIDL for Language Interoperability
```
//
// File: performance.sidl
//
version performance 1.0;
package performance {
  class Timer {
    void start();
    void stop();
    void setName(in string name);
    string getName();
    void setType(in string name);
    string getType();
    void setGroupName(in string name);
    string getGroupName();
    void setGroupId(in long group);
    long getGroupId();
  }
}
```
Using SIDL Interface for Timers
```cpp
// SIDL
#include "performance_Timer.hh"

int main(int argc, char** argv)
{
  performance::Timer t = performance::Timer::_create();
  // ...
  t.setName("Integrate timer");
  t.start();
  // Computation
  for (int i = 0; i < count; i++) {
    double x = random_m->getRandomNumber();
    sum = sum + function_m->evaluate(x);
  }
  // ...
  t.stop();
  return 0;
}
```
Performance Knowledge Component
- Describe and store a known component's performance
- Benchmark characterizations in performance 
 database
- Empirical or analytical performance models 
- Saved information about component performance 
- Use for performance-guided selection and 
 deployment
- Use for runtime adaptation 
- Representation must be in common forms with 
 standard means for accessing the performance
 information
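A minimal sketch of storing benchmark characterizations and retrieving them with a standard query follows. The record fields (problem size, processor count, seconds), the nearest-problem-size lookup, and the names `BenchmarkRecord` and `PerformanceKnowledge` are all assumptions for illustration, not the PKC design.

```cpp
#include <cstdlib>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical benchmark record for one component run.
struct BenchmarkRecord {
  long problemSize;
  int processors;
  double seconds;
};

// Hypothetical in-memory performance knowledge store, keyed by component name.
class PerformanceKnowledge {
public:
  void store(const std::string& component, BenchmarkRecord r) {
    db_[component].push_back(r);
  }

  // Return the record for this component and processor count whose problem
  // size is closest to the requested one, or nothing if none is stored.
  std::optional<BenchmarkRecord> closest(const std::string& component,
                                         long problemSize, int processors) const {
    auto it = db_.find(component);
    if (it == db_.end()) return std::nullopt;
    std::optional<BenchmarkRecord> best;
    for (const BenchmarkRecord& r : it->second) {
      if (r.processors != processors) continue;
      if (!best || std::labs(r.problemSize - problemSize) <
                       std::labs(best->problemSize - problemSize))
        best = r;
    }
    return best;
  }

private:
  std::map<std::string, std::vector<BenchmarkRecord>> db_;
};
```

A nearest-neighbor lookup is the simplest possible query; an empirical or analytical model, as the slide mentions, would interpolate between stored points instead.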
Performance Knowledge Repository & Component
- Component performance repository
- Implement in component architecture framework
- Similar to CCA component repository (Alexandria)
- Access by component infrastructure
- View performance knowledge as component (PKC)
- PKC ports give access to performance knowledge
- to other components and back to the original component
- Store performance model for performance 
 prediction
- Component composition performance knowledge
Component Performance Model
- User specified 
- Inferred automatically by performance tool 
- Prior performance data 
- Expression 
- Parametric model 
- Estimate performance of a single component by 
- Querying runtime performance data 
- Passing this to performance model for evaluation 
- Integration of performance observation and 
 knowledge components key to runtime selection of
 components
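The estimation steps above (query runtime data, pass it to a parametric model, select a component) can be sketched as follows. The linear form t(n) = a + b·n, the names `LinearModel`, `Candidate` and `selectComponent`, and any coefficients are illustrative assumptions; real models would come from the user, a tool, or prior performance data.

```cpp
#include <limits>
#include <string>
#include <vector>

// Hypothetical parametric model: predicted time = a + b * n.
struct LinearModel {
  double a;  // fixed cost (seconds)
  double b;  // cost per unit of problem size (seconds)
  double predict(double n) const { return a + b * n; }
};

struct Candidate {
  std::string component;
  LinearModel model;
};

// Pick the candidate whose model predicts the smallest time for the problem
// size n observed at run time. Returns an empty string if there are none.
std::string selectComponent(const std::vector<Candidate>& candidates, double n) {
  std::string best;
  double bestTime = std::numeric_limits<double>::infinity();
  for (const Candidate& c : candidates) {
    double t = c.model.predict(n);
    if (t < bestTime) {
      bestTime = t;
      best = c.component;
    }
  }
  return best;
}
```

With one candidate that has no fixed cost and another that has a fixed cost but a smaller per-unit cost, the selection flips as the observed problem size grows, which is why the runtime query matters.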
Composition of Components
- Understanding scalability of performance models 
 (Research problem)
- Linear superposition principle does not apply! 
- Composition of scalable components may not 
 produce a scalable execution (mismatch of data
 structures)
[Figure: scalable components A and B exchanging data; their union is unscalable]
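A toy cost model illustrates why superposition fails. Assume (purely for illustration) that each component alone takes W/p on p processors, and that converting data between their mismatched layouts costs c·p, as an all-to-all exchange whose message count grows with p might. Each part scales; the composition does not. The function names and constants are invented.

```cpp
// Toy model, all constants invented for illustration.
// Time for one scalable component on p processors.
double componentTime(double W, double p) { return W / p; }

// Cost of redistributing data between the two components' layouts.
double redistributionTime(double c, double p) { return c * p; }

// A, then the data exchange, then B.
double composedTime(double W, double c, double p) {
  return componentTime(W, p) + redistributionTime(c, p) + componentTime(W, p);
}
```

With W = 1000 and c = 1, the composed time is 120 at p = 100 but 1002 at p = 1000, even though each component on its own becomes ten times faster.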
Performance Technology for Components: TAU
[Figure: TAU performance system architecture diagram; labels include Paraver and EPILOG]
TAU Instrumentation
- Flexible instrumentation mechanisms at multiple 
 levels
- Source code 
- Manual (TAU API, CCA Measurement Port API)
- Automatic, using Program Database Toolkit (PDT), OPARI (for OpenMP programs), Babel SIDL compiler (proposed)
- Object code 
- pre-instrumented libraries (e.g., MPI using PMPI) 
- statically linked 
- dynamically linked (e.g., Virtual machine 
 instrumentation)
- fast breakpoints (compiler generated) 
- Executable code 
- dynamic instrumentation (pre-execution) using 
 DynInstAPI
Program Database Toolkit (PDT)
- Program code analysis framework for developing source-based tools for C99, C++, and F90 (U. Oregon, LANL, FZJ Germany)
- High-level interface to source code information 
- Widely portable 
- IBM, SGI, Compaq, HP, Sun, Linux clusters, Windows, Apple, Hitachi, Cray T3E, ...
- Integrated toolkit for source code parsing, 
 database creation, and database query
- Commercial-grade front-end parsers (EDG for C99/C++, Mutek for F90)
- Intel/KAI C++ headers for the standard C++ library distributed with PDT
- portable IL analyzer, database format, and access 
 API
- open software approach for tool development 
- Target and integrate multiple source languages 
- Used in CCA for automated generation of SIDL (CHASM)
- Use in TAU to build automated performance 
 instrumentation tools (tau_instrumentor)
- Can be used to generate code for performance 
 ports in CCA
New Features in TAU
- Instrumentation
- OPARI: OpenMP directive rewriting approach (POMP, FZJ)
- Selective instrumentation: grouping, include/exclude lists
- tau_reduce: rule-based detection of high-overhead lightweight routines
- Measurement
- PAPI (UTK): support for multiple hardware counters/time
- Callpath profiling (1-level)
- Native generation of EPILOG traces (EXPERT, FZJ)
- Analysis
- Support for Paraver (CEPBA) trace visualizer
- jracy: new Java-based profile browser in TAU
- Availability 
- New platforms and compilers supported (NEC, 
 Hitachi, Intel)
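The idea behind tau_reduce's rule-based detection can be sketched as a filter over a flat profile: routines called very often but doing little work per call are dominated by instrumentation overhead, so they go on an exclude list for selective instrumentation. This is not tau_reduce's rule syntax; `ProfileEntry`, `excludeList`, the use of inclusive time per call, and the default thresholds are assumptions for the sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical flat profile entry for one routine.
struct ProfileEntry {
  std::string name;
  long numCalls;
  double inclusiveUsec;  // total inclusive time in microseconds
};

// Flag routines with more than minCalls calls and less than maxUsecPerCall
// microseconds per call. Thresholds are illustrative, not TAU defaults.
std::vector<std::string> excludeList(const std::vector<ProfileEntry>& profile,
                                     long minCalls = 1000000,
                                     double maxUsecPerCall = 2.0) {
  std::vector<std::string> out;
  for (const ProfileEntry& e : profile) {
    if (e.numCalls == 0) continue;  // avoid dividing by zero
    double perCall = e.inclusiveUsec / e.numCalls;
    if (e.numCalls > minCalls && perCall < maxUsecPerCall)
      out.push_back(e.name);
  }
  return out;
}
```

The resulting names would feed the include/exclude lists used by selective instrumentation on the next build.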
Applications: Uintah (U. Utah)
[Figure: scalability analysis]
Applications: VTF (ASCI ASAP, Caltech)
- C, C++, F90, Python
- PDT, MPI
Applications: SAMRAI (LLNL)
- C++
- PDT, MPI 
- SAMRAI timers (groups)
TAU Status
- Instrumentation supported
- Source, preprocessor, compiler, MPI, runtime, virtual machine
- Languages supported
- C, C++, F90, Java, Python
- HPF, ZPL, HPC++, pC++, ...
- Packages supported
- PAPI (UTK), PCL (FZJ) (hardware performance counter access)
- Opari, PDT (UO, LANL, FZJ), DyninstAPI (U. Maryland) (instrumentation)
- EXPERT, EPILOG (FZJ), Vampir (Pallas), Paraver (CEPBA) (visualization)
- Platforms supported
- IBM SP, SGI Origin, Sun, HP Superdome, HP/Compaq Tru64 ES
- Linux clusters (IA-32, IA-64, PowerPC, Alpha), Apple, Windows
- Hitachi SR8000, NEC SX, Cray T3E, ...
- Compiler suites supported
- GNU, Intel KAI (KCC, KAP/Pro), Intel, SGI, IBM, Compaq, HP, Fujitsu, Hitachi, Sun, Apple, Microsoft, NEC, Cray, PGI, Absoft, ...
- Thread libraries supported 
- Pthreads, SGI sproc, OpenMP, Windows, Java, SMARTS
Concluding Remarks
- Complex component systems pose challenging 
 performance analysis problems that require robust
 methodologies and tools
- New performance problems will arise 
- Instrumentation and measurement 
- Data analysis and presentation 
- Diagnosis and tuning 
- Performance engineered components 
- Performance knowledge, observation, query and 
 control
- Integration of performance technology
Support Acknowledgement
- TAU and PDT support 
- Department of Energy (DOE) 
- DOE 2000 ACTS contract 
- DOE MICS contract 
- DOE ASCI Level 3 (LANL, LLNL) 
- U. of Utah DOE ASCI Level 1 subcontract 
- DARPA 
- NSF National Young Investigator (NYI) award