Scalability Study of S3D on Jaguar CNL using TAU

1
Scalability Study of S3D on Jaguar CNL using TAU
  • Sameer Shende
  • tau-team_at_cs.uoregon.edu

2
Acknowledgements
  • Alan Morris, UO
  • Kevin Huck, UO
  • Allen D. Malony, UO
  • Bronis R. de Supinski, LLNL
  • The performance data presented here is
    available at http://tau.uoregon.edu/s3d

3
TAU Parallel Performance System
  • http://tau.uoregon.edu/
  • Multi-level performance instrumentation
  • Multi-language automatic source instrumentation
  • Flexible and configurable performance measurement
  • Widely-ported parallel performance profiling
    system
  • Computer system architectures and operating
    systems
  • Different programming languages and compilers
  • Support for multiple parallel programming
    paradigms
  • Multi-threading, message passing, mixed-mode,
    hybrid

4
Scalability Study
  • C2H4 Benchmark
  • Platform: Jaguar (Cray CNL) at ORNL
  • 1p
  • 4p
  • 64p
  • 512p
  • 1728p
  • 4096p
  • 8000p
  • 12000p
  • Goal: to evaluate scaling properties of code
    regions
  • Scalability of MPI operations

5
PerfDMF: Performance Data Management Framework
6
PerfExplorer - Comparative Analysis
  • Relative speedup, efficiency
  • total runtime, by event, one event, by phase
  • Breakdown of total runtime
  • Group fraction of total runtime
  • Correlating events to total runtime
  • Timesteps per second
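
The speedup and efficiency metrics listed above are commonly defined as follows. This is a sketch of the usual textbook definitions, not necessarily the exact formulas PerfExplorer applies; the symbols T(p), p_0, S, and E are notation introduced here for illustration.

```latex
% T(p): runtime (total, or of a single event) measured on p cores
% p_0 : baseline core count used as the reference run
\begin{align}
  S(p)                 &= \frac{T(p_0)}{T(p)}
      && \text{relative speedup (fixed total problem size)} \\
  E_{\text{strong}}(p) &= \frac{p_0}{p}\, S(p)
      && \text{relative efficiency, strong scaling} \\
  E_{\text{weak}}(p)   &= \frac{T(p_0)}{T(p)}
      && \text{relative efficiency, weak scaling (fixed work per core)}
\end{align}
```

Under weak scaling the ideal runtime stays constant as cores are added, so an efficiency of 1 means T(p) = T(p_0), and values below 1 indicate time lost as the core count grows.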

7
PerfExplorer
TAU's PerfDMF database
8
PerfExplorer Select Experiment Analysis
9
Total Execution Time
10
Relative Efficiency For S3D - Weak Scaling
11
Relative Efficiency by Event
12
Relative Speedup by Event
13
Data Mining: Event Correlation to Total Time
r = 1 implies direct correlation
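
Assuming r here is the Pearson correlation coefficient (the slide does not say which measure is used), it can be written as below. The symbols x_i and y_i are notation introduced for this sketch: x_i is the time spent in one event and y_i the total runtime, both taken from the i-th run of the scaling study.

```latex
% x_i: time in a given event for run i; y_i: total runtime for run i
% \bar{x}, \bar{y}: means over the n runs of the scaling study
\[
  r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
           {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
            \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]
```

A value of r near 1 indicates that the event's time grows in step with the total runtime across core counts, which marks that event as a candidate contributor to the loss of scalability.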
14
MPI Scaling (Total time in MPI/Total Time)
15
Total Runtime Breakdown by Events
16
Floating Point Instructions
17
Level 1 Data Cache Misses
18
ParaProf: 12000 core job
19
ParaProf: Mean across all nodes
20
ParaProf: 3D Correlation Cube, MPI_Wait!
21
ParaProf: MPI_Wait variation!
22
ParaProf: MPI_Wait Histogram
23
ParaProf: Mflops in Code Regions
24
ParaProf: Mflops Sorted by Exclusive Time
low mflops?
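
A per-region Mflops figure of this kind is typically derived from a floating point counter and the elapsed time. The sketch below assumes the counter is PAPI_FP_INS and the time is wall-clock time; the slides do not state the exact derivation, and the symbols N_fp and t are notation introduced here.

```latex
% N_fp: floating point instructions counted in a code region (e.g. PAPI_FP_INS)
% t   : time spent in that region, in seconds
\[
  \text{Mflops} = \frac{N_{\text{fp}}}{10^{6}\, t}
\]
```

Because PAPI_FP_INS counts instructions rather than operations, the result is an approximation of the true floating point operation rate.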
25
S3D - Building with TAU
  • Change name of compiler in build/make.XT3
  • ftn => tau_f90.sh
  • cc => tau_cc.sh
  • Set compile time environment variables
  • setenv TAU_MAKEFILE
    /spin/proj/perc/TOOLS/tau_latest/craycnl/lib/Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi
  • Choose callpath, PAPI counters, MPI profiling,
    PDT for source instrumentation
  • setenv TAU_OPTIONS '-optTauSelectFile=select.tau
    -optPreProcess'
  • Selective instrumentation file eliminates
    instrumentation in lightweight routines
  • Pre-process Fortran source code using cpp before
    compiling
  • Set runtime environment variables for
    instrumentation control and PAPI event counter
    selection in job submission script
  • export TAU_THROTTLE=1
  • export COUNTER1=GET_TIME_OF_DAY
  • export COUNTER2=PAPI_FP_INS
  • export COUNTER3=PAPI_L1_DCM
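
The runtime settings above could be collected in the job submission script roughly as follows. This is a minimal sketch that uses only the variable names and values shown on the slide; the launch line is a hypothetical placeholder (the executable name is an assumption) and is left commented out.

```shell
#!/bin/bash
# Sketch of the TAU-related part of a job submission script.
# Variable names and values come from the slide; everything else is illustrative.

# Enable TAU's runtime throttling of lightweight routines that are called very often.
export TAU_THROTTLE=1

# Metrics to record with the multiplecounters TAU configuration.
export COUNTER1=GET_TIME_OF_DAY   # wall-clock time
export COUNTER2=PAPI_FP_INS       # floating point instructions
export COUNTER3=PAPI_L1_DCM       # level 1 data cache misses

# Hypothetical launch line; the executable name and core count are assumptions.
# aprun -n 12000 ./s3d.x
```

Keeping the counter selection in the job script rather than in the build means the same instrumented executable can be rerun with different metrics.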

26
Concluding Discussion
  • Identified scaling trends for S3D up to 12k cores
  • Identified two loops that take a significant
    amount of time and have relatively low Mflops
  • TRANSPORT_M::COMPUTESPECIESDIFFFLUX, lines 630-656
  • INTEGRATE, lines 73-93
  • MPI_Wait has jagged edges

27
Support Acknowledgements
  • Department of Energy (DOE)
  • Office of Science
  • LLNL, LANL, ORNL, ASC
  • PERI