Performance Analysis, Tools and Optimization

About This Presentation
Title:

Performance Analysis, Tools and Optimization

Description:

Performance Analysis, Tools and Optimization. Philip J. Mucci and Kevin S. London, University of Tennessee, Knoxville. ARL MSRC Users Group Meeting, September 2, 1998.

Slides: 23
Provided by: Innova54
Learn more at: https://icl.utk.edu

Transcript and Presenter's Notes

Title: Performance Analysis, Tools and Optimization


1
Performance Analysis, Tools and Optimization
  • Philip J. Mucci
  • Kevin S. London
  • University of Tennessee, Knoxville
  • ARL MSRC Users Group Meeting
  • September 2, 1998

2
PET, UT and You
  • Training
  • Environments
  • Benchmarking
  • Evaluation and Reviews
  • Consulting
  • Development

3
Training
  • Courses on Benchmarking, Performance
    Optimization, Parallel Tools
  • Provides good mechanism for technology transfer
  • Develops needs and direction through interaction
    with the user community
  • Tremendous knowledge base from which to draw

4
Environments
  • Use of the MSRC environments provides:
  • Bug reports to the vendor
  • System tuning
  • System administrator support
  • Analysis of software needs
  • Performance evaluation
  • Researchers' access to advanced hardware

5
Performance Understanding
  • In order to optimize we must understand:
  • Why is our code performing a certain way?
  • What can be done about it?
  • How well can we do?
  • Results in confidence, efficiency and better code
    development
  • Time spent is an investment in the future

6
Tool Evaluation: Ptools Consortium
  • Review of available performance tools,
    particularly parallel
  • Regular reports are issued
  • Tools that we find useful get presented to the
    developers in training or consultation
  • Installation, testing and training
  • Example: VAMPIR for scalability analysis

7
Optimization Course
  • Course focuses on compiler options, available
    tools and single processor performance
  • Single-processor performance is the biggest bottleneck
    in many codes, especially cache performance
  • Why? Link speeds have increased to within an order
    of magnitude of memory bandwidth
  • Also covers MPI and language-specific issues

8
Benchmarks
  • CacheBench - performance of the memory hierarchy
  • MPBench - performance of core MPI operations
  • BLASBench - performance of dense numerical
    kernels
  • Intended to provide an orthogonal set of
    low-level benchmarks with which we can
    parameterize codes

9
Cache Performance

10
Cache Performance
  • Tuning for caches is difficult without some
    understanding of computer architecture
  • No way to really know what's in the cache at
    a given point in an application
  • A 2-4x performance increase from cache tuning is common
  • Goal: develop a tool that identifies problem regions in the
    source code, down to a specific reference

11
Cache Simulator
  • Profiling the code reveals cache problems
  • Automated instrumentation of offending routines
    via a GUI or by hand
  • Link with simulator library
  • Make architecture configuration file
  • Addresses are traced and simulated
  • Miss locations are recorded and reports are
    generated
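
The trace-and-simulate steps above can be sketched in a few lines. This is a minimal illustration, not the simulator described on the slide: the class `DirectMappedCache`, the helper `trace_matrix_sweep`, the site labels and the cache geometry (64 lines of 32 bytes) are all hypothetical, and a real architecture configuration would also describe associativity and multiple cache levels.

```python
from collections import Counter


class DirectMappedCache:
    """Toy direct-mapped cache: one tag per line, reads only."""

    def __init__(self, num_lines=64, line_size=32):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # tag currently held by each line
        self.hits = 0
        self.misses = 0
        self.miss_sites = Counter()      # misses per source-level label

    def access(self, address, site):
        """Simulate one traced reference; return True on a hit."""
        block = address // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag           # replace whatever the line held
        self.misses += 1
        self.miss_sites[site] += 1
        return False

    def report(self):
        """Return (site, miss_count) pairs, worst offender first."""
        return self.miss_sites.most_common()


def trace_matrix_sweep(cache, n, first_index_inner, elem_size=8, base=0):
    """Trace reads of every element of an n x n column-major array a(i, j)."""
    site = "first index inner" if first_index_inner else "second index inner"
    for outer in range(n):
        for inner in range(n):
            i, j = (inner, outer) if first_index_inner else (outer, inner)
            cache.access(base + (i + j * n) * elem_size, site)


cache = DirectMappedCache(num_lines=64, line_size=32)
trace_matrix_sweep(cache, 64, first_index_inner=True)
trace_matrix_sweep(cache, 64, first_index_inner=False)
for site, count in cache.report():
    print(f"{site}: {count} misses")
```

With this assumed geometry the strided sweep misses on every reference, while the unit-stride sweep misses once per cache line, which is the kind of per-site difference a miss report is meant to expose.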

12
PerfAPI
  • A standardized interface to hardware performance
    counters
  • Easily usable by application engineers as well as
    tool developers
  • Intended for:
  • Performance tools
  • Evaluation
  • Modeling
  • Watch http://www.cs.utk.edu/~mucci/pdsa

13
High Performance Debugger
  • Industry-wide lack of good debugging support for
    parallel programs
  • TotalView is expensive and GUI only
  • Sufficient bandwidth is often not available off-site
  • Uses dbx and gdb as backends
  • Uses p2d2 from NASA as a framework
  • Standardized, familiar command-line interface

14
MPI Connect
  • Connects separate MPI jobs with PVM
  • 3 function calls to enroll
  • Uses include:
  • Metacomputing with Vendor MPI
  • Dynamic and Fault Tolerant MPI jobs now

15
The Future
  • BYOC Workshops
  • Regular Training Schedule
  • Web Based Training
  • Consulting
  • Cross-MSRC Information Exchange
  • Technology Transfer
  • Tool development

16
Origin 2000 Performance Prescription
  • Always use dplace on all codes
  • Always use:
  • -LNO:cache_size2=4096
  • For accuracy, compile and link with:
  • -O2 -IPA -SWP:=ON -LNO -TENV:X=0-5
  • or
  • -Ofast=ip27 -OPT:roundoff=0-3
  • -OPT:IEEE_arithmetic=1-3

17
Origin 2000 Performance Prescription
  • In Fortran, the first (leftmost) array index should change
    fastest, i.e. be driven by the innermost loop
  • Use functions in:
  • -lcomplib.sgimath or -lscs
  • -lfastm
  • -lm
  • Use the nonblocking MPI_Ixxxx primitives
  • Always post MPI_Irecv early
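
The Fortran index-order advice on this slide follows from column-major storage, where consecutive values of the first index are adjacent in memory. The sketch below only illustrates that address arithmetic. The function names and the array size are hypothetical, and indices are 0-based for simplicity.

```python
def column_major_offset(i, j, nrows):
    """Element offset of a(i, j) (0-based) in a column-major array."""
    return i + j * nrows


def inner_loop_stride(nrows, inner):
    """Distance in elements between consecutive inner-loop references."""
    origin = column_major_offset(0, 0, nrows)
    if inner == "first":     # do j ... do i ... a(i, j)
        return column_major_offset(1, 0, nrows) - origin
    if inner == "second":    # do i ... do j ... a(i, j)
        return column_major_offset(0, 1, nrows) - origin
    raise ValueError("inner must be 'first' or 'second'")


nrows = 1000
print("first index innermost: stride", inner_loop_stride(nrows, "first"))
print("second index innermost: stride", inner_loop_stride(nrows, "second"))
```

A stride of one element lets every word of a fetched cache line be used before the line is evicted. A stride of `nrows` elements touches a new line on almost every reference once the column is longer than a cache line.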

18
Vampir Timeline Display

19
Vampir Global Activity Chart

20
Identifying a Message in Vampir

21
Identifying a Message in Vampir

22
Nupshot Display