Sameer Shende, Allen D. Malony - PowerPoint PPT Presentation


1
Integration and Application of the TAU
Performance System in Parallel Java Environments
  • Sameer Shende, Allen D. Malony
  • {sameer,malony}@cs.uoregon.edu
  • Computer and Information Science Department
  • Computational Science Institute
  • University of Oregon

2
Java HPC and Performance Technology
  • Interest in performance tools for Java HPC
  • Shared- and distributed-memory parallelism
  • Multi-level (semantic) performance views
  • Java environment challenges performance
    technology
  • Language and packages
  • object-oriented, interfaces, RMI, reflection,
  • Java Virtual Machine (JVM) execution model
  • thread mapping, scheduling, SMP execution, event
    access
  • Just-In-Time (JIT) compilation and dynamic
    loading
  • Java Native Interface (JNI)
  • inter-language execution, non-Java events /
    execution
  • Portability of performance tools and methods

3
Research Problems
  • General: How to create robust and ubiquitous
    performance technology for the analysis and
    tuning of parallel high-performance software and
    systems in the presence of (evolving) complexity
    challenges?
  • Specific: Can performance technology developed for
    use in HPC environments be successfully applied
    to parallel Java environments, and how are the
    new performance instrumentation, measurement, and
    analysis problems addressed?

4
Talk Outline
  • Java HPC and Performance Technology
  • TAU Performance System
  • Computation model for performance technology
  • TAU performance system toolkit
  • Target HPC Java Environment
  • SMP clusters and distributed computing
  • Multi-threading and MPI message passing
  • Integration (Adaption) of TAU Performance System
  • User-level, JVM-level, JNI-level, inter-language
  • Example Mixed-Mode Application
  • Conclusions

5
TAU Performance System
  • Tuning and Analysis Utilities
  • Performance system framework
  • scalable parallel and distributed HPC
  • Targets a general complex system computation
    model
  • nodes / contexts / threads
  • Multi-level system / software / parallelism
  • Measurement and analysis abstraction
  • Integrated performance toolkit
  • instrumentation, measurement, analysis,
    visualization
  • Portable facility based on open software approach
  • Robust and widely applied

6
General Complex System Computation Model
  • Node: physically distinct shared-memory machine
  • Message-passing node interconnection network
  • Context: distinct virtual memory space within
    node
  • Thread: execution threads (user/system) in context

[Figure: the physical view shows SMP nodes, each with node memory, joined by an interconnection network that carries inter-node message communication; the model view shows each node's VM space divided into contexts that contain threads]
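The node / context / thread hierarchy above can be sketched as a small Java value type. This is only an illustration of the model: `ExecId` and `ProfileMerge` are hypothetical names, not TAU classes, and the merge step simply sums time per (node, context, thread) triple.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the node / context / thread model; not TAU code.
final class ExecId implements Comparable<ExecId> {
    final int node;     // physically distinct shared-memory machine
    final int context;  // virtual memory space within the node
    final int thread;   // thread of execution within the context

    ExecId(int node, int context, int thread) {
        this.node = node;
        this.context = context;
        this.thread = thread;
    }

    @Override
    public int compareTo(ExecId o) {
        if (node != o.node) return Integer.compare(node, o.node);
        if (context != o.context) return Integer.compare(context, o.context);
        return Integer.compare(thread, o.thread);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ExecId && compareTo((ExecId) o) == 0;
    }

    @Override
    public int hashCode() {
        return (node * 31 + context) * 31 + thread;
    }

    @Override
    public String toString() {
        return node + "." + context + "." + thread;
    }
}

final class ProfileMerge {
    // Sums measured time per (node, context, thread), ordered by node first.
    static Map<ExecId, Long> merge(ExecId[] ids, long[] micros) {
        Map<ExecId, Long> total = new TreeMap<>();
        for (int i = 0; i < ids.length; i++) {
            total.merge(ids[i], micros[i], Long::sum);
        }
        return total;
    }
}
```

Keying every measurement by the full triple keeps data from two threads on different nodes distinct even when their local thread numbers coincide.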
7
TAU Performance System Framework
8
Target HPC Java Environment
  • Hybrid, multi-language scientific applications
  • Java + C, C++, Fortran libraries
  • Numerical, system, communications support
  • Performance optimization
  • Mixed-mode parallelism
  • Multi-threaded shared memory parallelism
  • Distributed memory parallelism using
    communications
  • Cluster of SMP nodes
  • Scalable parallelism
  • Distributed

9
Performance Technology Issues
  • Object-oriented programming
  • Object-based performance analysis
  • High-level classes and performance mapping
  • Multi-level performance events
  • User / source / byte code / VM / OS / libraries /
    external
  • Multiple performance instrumentation strategies
  • Integration of performance measurements
  • Mixed-mode parallel computation
  • Multi-threading performance measurement
  • Cross-mode performance correspondence
  • Hybrid, multi-language performance measurement

10
Java Source-Level Instrumentation
  • TAU Java package
  • User-defined events
  • TAU.Profile class for new timers
  • Start/Stop
  • Performance data output at end
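The start/stop timer idea can be sketched in plain Java. `SketchTimer` is a hypothetical stand-in written for this note; the real TAU.Profile class is backed by JNI bindings, and its constructor arguments and method names are not shown on the slide, so nothing here should be read as the TAU API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a user-defined start/stop timer (assumption: not the TAU API).
final class SketchTimer {
    // timer name -> {number of calls, accumulated nanoseconds}
    private static final Map<String, long[]> TABLE = new LinkedHashMap<>();

    private final String name;
    private boolean running;
    private long startedAt;

    SketchTimer(String name) {
        this.name = name;
    }

    void start() {
        running = true;
        startedAt = System.nanoTime();
    }

    void stop() {
        if (!running) {
            throw new IllegalStateException("stop() without start(): " + name);
        }
        long elapsed = System.nanoTime() - startedAt;
        running = false;
        synchronized (TABLE) {
            long[] row = TABLE.computeIfAbsent(name, k -> new long[2]);
            row[0] += 1;
            row[1] += elapsed;
        }
    }

    static long calls(String name) {
        synchronized (TABLE) {
            long[] row = TABLE.get(name);
            return row == null ? 0 : row[0];
        }
    }

    // Stands in for "performance data output at end": one line per timer.
    static String report() {
        StringBuilder out = new StringBuilder();
        synchronized (TABLE) {
            for (Map.Entry<String, long[]> e : TABLE.entrySet()) {
                out.append(e.getKey())
                   .append(" calls=").append(e.getValue()[0])
                   .append(" usec=").append(e.getValue()[1] / 1000)
                   .append('\n');
            }
        }
        return out.toString();
    }
}
```

A program would wrap the code section of interest in `start()` and `stop()` calls and print `SketchTimer.report()` on exit, for example from a JVM shutdown hook.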

11
TAU Java Source Instrumentation Architecture
  • Any code section can be measured
  • Portability
  • Measurement options
  • Profiling, tracing
  • Limitations
  • Source access only
  • Lack of thread information
  • Lack of node information

[Figure, layers in extracted order: Java program; TAU package (TAU.Profile class: init, data, output); JNI (JNI C bindings); TAU (as dynamic shared object); Profile DB (profile database stored in JVM heap)]
12
Multi-Threading Performance Measurement
  • General issues
  • Thread identity and per-thread data storage
  • Performance measurement support and
    synchronization
  • Fine-grained parallelism
  • different forms and levels of threading
  • greater need for efficient instrumentation
  • TAU general threading and measurement model
  • Common thread layer and measurement support
  • Interface to system specific libraries (reg, id,
    sync)
  • Target different thread systems with core
    functionality
  • Pthreads, Windows, Java, OpenMP
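Thread identity and per-thread data storage can be sketched with `ThreadLocal`. This is a minimal illustration under assumptions, not TAU's thread layer: `PerThreadCounters` is a hypothetical class, ids are assigned on a thread's first event, and totals are read only after the worker threads have been joined.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-thread event counters; not TAU's thread layer.
final class PerThreadCounters {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    // Registration: each thread's table is published once, keyed by its id.
    private static final Map<Integer, Map<String, Long>> REGISTRY =
            new ConcurrentHashMap<>();
    // Identity: a small dense id assigned on a thread's first event.
    private static final ThreadLocal<Integer> ID =
            ThreadLocal.withInitial(NEXT_ID::getAndIncrement);
    // Storage: a private table per thread, so the hot path takes no lock.
    private static final ThreadLocal<Map<String, Long>> TABLE =
            ThreadLocal.withInitial(() -> {
                Map<String, Long> table = new HashMap<>();
                REGISTRY.put(ID.get(), table);
                return table;
            });

    static void count(String event) {
        TABLE.get().merge(event, 1L, Long::sum);
    }

    // Safe only once the counting threads have finished (e.g. after join()).
    static long total(String event) {
        long sum = 0;
        for (Map<String, Long> table : REGISTRY.values()) {
            sum += table.getOrDefault(event, 0L);
        }
        return sum;
    }

    // Runs worker threads that each count the same event, then sums the tables.
    static long runWorkers(int threads, int eventsPerThread) {
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                for (int n = 0; n < eventsPerThread; n++) {
                    count("step");
                }
            });
            pool[i].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total("step");
    }
}
```

Synchronization is needed only at registration and at read-out, which matters when instrumentation sits inside fine-grained parallel code.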

13
Virtual Machine Performance Instrumentation
  • Integrate performance system with VM
  • Captures robust performance data (e.g., thread
    events)
  • Maintain features of environment
  • portability, concurrency, extensibility,
    interoperation
  • Allow use in optimization methods
  • JVM Profiling Interface (JVMPI)
  • Generation of JVM events and hooks into JVM
  • Profiler agent (TAU) loaded as shared object
  • registers events of interest and address of
    callback routine
  • Access to information on dynamically loaded
    classes
  • No need to modify Java source, bytecode, or JVM
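JVMPI itself is a native (C) interface, so the following Java sketch is only an analogy for the registration pattern described above: an agent enables the events it cares about and supplies one callback, and the VM notifies it of those events only. `VmEvent`, `SketchVm`, and `SketchAgent` are hypothetical names and do not correspond to JVMPI identifiers.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical event kinds, loosely mirroring categories a VM might report.
enum VmEvent { METHOD_ENTRY, METHOD_EXIT, THREAD_START, THREAD_END, CLASS_LOAD }

// Stand-in for the VM side: it notifies the callback of enabled events only.
final class SketchVm {
    private final Set<VmEvent> enabled = EnumSet.noneOf(VmEvent.class);
    private BiConsumer<VmEvent, String> callback = (event, detail) -> { };

    void setCallback(BiConsumer<VmEvent, String> cb) {
        callback = cb;
    }

    void enable(VmEvent event) {
        enabled.add(event);
    }

    void raise(VmEvent event, String detail) {
        if (enabled.contains(event)) {
            callback.accept(event, detail);
        }
    }
}

// Stand-in for the profiler agent: registers a callback and events of interest.
final class SketchAgent {
    final Map<VmEvent, Integer> seen = new EnumMap<>(VmEvent.class);

    void attach(SketchVm vm, VmEvent... interesting) {
        vm.setCallback((event, detail) -> seen.merge(event, 1, Integer::sum));
        for (VmEvent e : interesting) {
            vm.enable(e);
        }
    }
}
```

Because the application never calls the agent directly, this style of instrumentation needs no change to Java source or bytecode, at the cost of seeing only the events the VM chooses to expose.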

14
JVMPI Events
  • Method transition events
  • Memory events
  • Heap arena events
  • Garbage collection events
  • Class events
  • Global reference events
  • Monitor events
  • Monitor wait events
  • Thread events
  • Dump events
  • Virtual machine events

15
TAU Java JVM Instrumentation Architecture
  • Robust set of events
  • Portability
  • Access to thread info
  • Measurement options
  • Limitations
  • Overhead
  • Many events
  • Event control
  • No user-defined events

16
Java Multi-Threading Performance (Test Case)
  • Profile and trace Java (JDK 1.2) applications
  • Observe user-level and system-level threads
  • Observe events for different Java packages
  • /lang, /io, /awt,
  • Test application
  • SciVis, NPAC, Syracuse University
  • ./configure -jdk=<dir_where_jdk_is_installed>
  • setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH\:<taudir>/<arch>/lib
  • java -XrunTAU svserver

17
TAU Profiling of Java Application (SciVis)
24 threads of execution!
Profile for each Java thread
Captures events for different Java packages
18
TAU Tracing of Java Application (SciVis)
Performance groups
Timeline display
Parallelism view
19
Vampir Dynamic Call Tree View (SciVis)
Per thread call tree
Expanded call tree
Annotated performance
20
Message Communications Performance
  • Explicit message communications libraries for
    Java
  • MPI performance measurement
  • MPI profiling interface - link-time interposition
    library
  • TAU wrappers in native profiling interface
    library
  • Send/Receive events and communication statistics
  • mpiJava (Syracuse, JavaGrande, 1999)
  • Java wrapper package
  • JNI C bindings to MPI communication library
  • Dynamic shared object (libmpijava.so) loaded in
    JVM
  • prunjava calls mpirun to distribute program to
    nodes
  • Contrast to Java RMI-based schemes (MPJ, CCJ)
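The MPI profiling interface works by link-time interposition in native code. As a rough Java analogy only, the same wrapper idea can be shown with a decorator: a class with the same interface records statistics and then forwards the call to the real implementation. `Comm` and `ProfilingComm` are hypothetical and are not the mpiJava API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical one-method communication interface; not the mpiJava API.
interface Comm {
    void send(byte[] buf, int dest, int tag);
}

// Wrapper with the same interface: records statistics, then forwards the call.
final class ProfilingComm implements Comm {
    private final Comm real;
    private long sendCalls;
    private final Map<Integer, Long> bytesByDest = new TreeMap<>();

    ProfilingComm(Comm real) {
        this.real = real;
    }

    @Override
    public void send(byte[] buf, int dest, int tag) {
        sendCalls++;
        bytesByDest.merge(dest, (long) buf.length, Long::sum);
        real.send(buf, dest, tag); // the underlying library does the actual work
    }

    long sendCalls() {
        return sendCalls;
    }

    long bytesTo(int dest) {
        return bytesByDest.getOrDefault(dest, 0L);
    }
}
```

The caller's code is unchanged apart from which object it is handed, which is the property that lets a wrapper library collect send/receive events without source instrumentation.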

21
TAU Java Instrumentation Architecture
  • No source instrumentation
  • Portability
  • Measurement options
  • Limitations
  • MPI events only
  • No mpiJava events
  • Node info only
  • No thread info

[Figure labels: Java program; mpiJava package; JNI]
22
Mixed-mode Parallel Programs (Java + MPI)
  • Java threads and MPI communications
  • Shared-memory multi-threading events
  • Message communications events
  • Unified performance measurement and views
  • Integration of performance mechanisms
  • Integrated association of performance events
  • thread event and communication events
  • user-defined (source-level) performance events
  • JVM events
  • Support for performance measurement scaling
  • Support for performance data access

23
Instrumentation and Measurement Cooperation
  • Problem
  • JVMPI doesn't see MPI events (e.g., rank (node))
  • MPI profiling interface doesn't see threads
  • Source instrumentation doesn't see either!
  • Need cooperation between interfaces
  • MPI exposes rank, gets thread information
  • JVMPI exposes thread information, gets rank
  • Source instrumentation gets both
  • Post-mortem matching of sends and receives
  • Selective instrumentation
  • java -XrunTAU:exclude=java/io,sun
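Selective instrumentation by package prefix can be sketched as a simple filter over slash-separated class names. `ExcludeFilter` is a hypothetical class, and matching on whole path segments (so that `sun` does not exclude `sunny/Demo`) is an assumption of this sketch; the slide does not say how TAU interprets the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exclusion filter; the matching rule is an assumption.
final class ExcludeFilter {
    private final List<String> prefixes = new ArrayList<>();

    // Parses a comma-separated list such as "java/io,sun".
    static ExcludeFilter parse(String list) {
        ExcludeFilter filter = new ExcludeFilter();
        for (String part : list.split(",")) {
            String prefix = part.trim();
            if (!prefix.isEmpty()) {
                filter.prefixes.add(prefix);
            }
        }
        return filter;
    }

    // True if the class lies in (or below) one of the excluded packages.
    boolean excludes(String className) {
        for (String prefix : prefixes) {
            if (className.equals(prefix) || className.startsWith(prefix + "/")) {
                return true;
            }
        }
        return false;
    }
}
```

Checking the filter before an event is recorded is one way to cut the measurement overhead that comes from the large number of events a VM can generate.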

24
TAU Java Instrumentation Architecture
[Figure labels: Java program; mpiJava package; TAU package; JNI; MPI profiling interface; event notification; TAU wrapper; TAU; native MPI library; JVMPI; Profile DB]
25
Parallel Java Game of Life (Profile)
Merged Java and MPI event profiles
  • mpiJava test case
  • 4 nodes, 28 threads

Thread 4 executes all MPI routines
Node 0
Node 1
Node 2
26
Parallel Java Game of Life (Trace)
  • Integrated event tracing
  • Merged trace viz
  • Node process grouping
  • Thread message pairing
  • Vampir display
  • Multi-level event grouping
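Pairing messages in a merged trace amounts to matching each send with its receive after the run. A minimal sketch, assuming each receive event records the actual source and tag, and that events from any one node appear in that node's local order: sends and receives are matched first-in, first-out per (source, destination, tag). `MsgEvent` and `MsgMatcher` are hypothetical names, and this is not presented as TAU's or Vampir's algorithm.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trace record for one send or receive.
final class MsgEvent {
    final boolean send; // true for a send, false for a receive
    final int node;     // node (rank) that logged the event
    final int peer;     // destination for a send, source for a receive
    final int tag;
    final long time;    // local timestamp; not used for matching

    MsgEvent(boolean send, int node, int peer, int tag, long time) {
        this.send = send;
        this.node = node;
        this.peer = peer;
        this.tag = tag;
        this.time = time;
    }
}

final class MsgMatcher {
    // Returns {sendIndex, recvIndex} pairs; unmatched events are left out.
    static List<int[]> match(List<MsgEvent> events) {
        Map<String, ArrayDeque<Integer>> pendingSends = new HashMap<>();
        Map<String, ArrayDeque<Integer>> pendingRecvs = new HashMap<>();
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            MsgEvent e = events.get(i);
            int src = e.send ? e.node : e.peer;
            int dst = e.send ? e.peer : e.node;
            String key = src + ">" + dst + "#" + e.tag;
            Map<String, ArrayDeque<Integer>> mine = e.send ? pendingSends : pendingRecvs;
            Map<String, ArrayDeque<Integer>> other = e.send ? pendingRecvs : pendingSends;
            ArrayDeque<Integer> waiting = other.get(key);
            if (waiting != null && !waiting.isEmpty()) {
                int j = waiting.poll(); // oldest unmatched partner on this channel
                pairs.add(e.send ? new int[] { i, j } : new int[] { j, i });
            } else {
                mine.computeIfAbsent(key, k -> new ArrayDeque<>()).add(i);
            }
        }
        return pairs;
    }
}
```

Keeping a pending queue for both directions lets a receive appear before its send in the merged order, which can happen when node clocks are not synchronized.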

27
Node / Thread Event Timeline
  • Temporal event behavior
  • Event relationships

28
Integrated Performance View (Callgraph)
  • Source level
  • MPI level
  • Java packages level

29
Conclusion
  • Integrate robust and portable performance system
    (TAU) in Java HPC environment
  • Apply performance system to observe multiple
    levels of Java HPC operation
  • Leverage performance system framework based on
    common performance measurement API
  • Key: define multi-level events and define
    associations
  • Opportunities for improvement and application
  • JVM instrumentation and JIT (dynamic compilation)
  • Runtime access to performance data
  • Java scientific packages, communication libraries
    (CCJ, MPJ, RMI), parallel compilers (JOMP),
    applications, ...

30
More Information and Acknowledgments
  • URLs
  • TAU: www.cs.uoregon.edu/research/paracomp/tau
  • Grant support (TAU)
  • DOE 2000 ACTS
  • http://www-unix.mcs.anl.gov/DOE2000
  • http://www.nersc.gov/ACTS
  • DOE ASCI Level 3 (LANL, LLNL)
  • DARPA

31
TAU Distributed Monitoring Framework
  • Extend usability of TAU performance analysis
  • Access TAU performance data during execution
  • Framework model
  • each application context is a performance data
    server
  • monitor agent thread is created within each
    context
  • client processes attach to agents and request
    data
  • server thread synchronization for data
    consistency
  • pull mode of interaction
  • Distributed TAU performance data space
  • A Runtime Monitoring Framework for the TAU
    Profiling System (ISCOPE 99)
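The pull model with a per-context monitor agent can be sketched inside a single JVM. This is a simplified illustration under assumptions: `MonitoredContext` is a hypothetical class, the "client" is just another thread rather than a separate process, and consistency is obtained by copying the profile under the same lock that application threads use to update it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-process model of a context that serves its own profile data.
final class MonitoredContext {
    private final Map<String, Long> profile = new HashMap<>();
    private final BlockingQueue<CompletableFuture<Map<String, Long>>> requests =
            new LinkedBlockingQueue<>();
    private final Thread agent;

    MonitoredContext() {
        agent = new Thread(this::serve, "monitor-agent");
        agent.setDaemon(true); // the agent must not keep the application alive
        agent.start();
    }

    // Application threads update the profile under the lock the agent also uses.
    void record(String event, long micros) {
        synchronized (profile) {
            profile.merge(event, micros, Long::sum);
        }
    }

    // Client side of the pull: enqueue a request and wait for the agent's reply.
    Map<String, Long> pull() {
        CompletableFuture<Map<String, Long>> reply = new CompletableFuture<>();
        requests.add(reply);
        return reply.join();
    }

    // Agent loop: answer each request with a consistent copy of the profile.
    private void serve() {
        try {
            while (true) {
                CompletableFuture<Map<String, Long>> reply = requests.take();
                Map<String, Long> copy;
                synchronized (profile) {
                    copy = new HashMap<>(profile);
                }
                reply.complete(copy);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interruption as shutdown
        }
    }
}
```

Because data moves only when a client asks for it, the application pays for monitoring in proportion to how often it is observed.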

32
TAU Distributed Monitor Architecture
TAU profile database
  • Each context has a monitor agent
  • Client in separate thread directs agent
  • Pull model of interaction

33
Java Implementation of TAU Monitor
  • Motivations
  • More portable monitor middleware system (RMI)
  • More flexible and programmable server interface
    (JNI)
  • More robust client development (EJB, JDBC, Swing)