1
Automatic Performance Analysis and Tuning
Anna Morajko, Oleg Morajko, Josep Jorba, Tomàs
Margalef, Emilio Luque
Universitat Autònoma de Barcelona
2
Content
  • Our goals: automatic analysis and tuning
  • Dynamic automatic performance analysis
  • Dynamic automatic tuning
  • Conclusions and future work

3
Our goals
  • Create tools that are able to automatically
    analyze the performance of parallel applications:
    detect bottlenecks, explain their causes, and
    provide hints to the developer
  • Static approach, based on trace-file and source
    code analysis (Kappa-Pi)
  • Dynamic approach, based on on-the-fly analysis
    of run-time performance data, using the
    application model and static call graph (Dynamic
    Kappa-Pi)
  • Create a tool that is able to automatically tune
    the performance of parallel applications during
    their execution without recompiling and rerunning
    them (MATE)

4
Our goals
  • Static automatic analysis (Kappa-Pi)
  • Static trace file analysis supported by source
    code examination

5
Our goals
  • Dynamic automatic analysis (Dynamic Kappa-Pi)
  • based on inductive reasoning (bottom-up approach)

6
Our goals
  • Dynamic automatic tuning (MATE)

7
Content
  • Our goals: automatic analysis and tuning
  • Dynamic automatic performance analysis
  • Objectives
  • Performance metrics and properties
  • Performance analysis
  • Dynamic automatic tuning
  • Conclusions and future work

8
Objectives
  • Analyze the performance of parallel applications
    during their execution
  • Automatically detect bottlenecks
  • Provide a clear explanation of the identified
    problems to developers
  • Correlate problems with the source code
  • Provide recommendations on possible solutions

9
Key Challenges
  • What is wrong in the application? Problem
    identification (bottlenecks)
  • Where is the critical resource? Problem location
    (hardware, OS, source code)
  • When does it happen? How to organize the problem
    search in time?
  • How important is it? How to compare the
    importance of different problems?
  • Why does it happen? How to explain the causes of
    the problem to the user?

10
Design Assumptions
  • Dynamic on-the-fly analysis
  • Knowledge specification
  • Bottleneck detection based on inductive reasoning
    (bottom-up approach)
  • Bottleneck location identification based on
    call-graph search
  • Tool primarily targeted to MPI-based parallel
    programs

11
Performance Data Collection
  • The application is described with basic abstractions
  • Static entities (modules, functions, static call
    graph)
  • Dynamic entities (processes, threads, objects)
  • Performance analysis is based on measurements of
    performance data
  • Performance data is captured at run time by means
    of dynamic instrumentation
  • Measurements can be added and removed on the fly
  • A measurement description specifies (see the
    sketch below)
  • What metric should be collected
  • Where to insert the instrumentation
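For illustration only, such a measurement description could be modeled as a small C++ structure (hypothetical types and names, not the tool's actual interface):

// Hypothetical sketch: a measurement description pairs a metric with an
// instrumentation location, as described above.
#include <string>

enum class Metric { ReadOps, BytesRead, CommTime, BytesSent };

struct MeasurementRequest {
    Metric      metric;    // what metric should be collected
    std::string function;  // where to instrument, e.g. "read"
    bool        atEntry;   // instrument at function entry or exit
};

// Example: measure bytes read at the exit of read().
MeasurementRequest req{Metric::BytesRead, "read", /*atEntry=*/false};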

12
Performance Properties
  • Properties describe the specific types of
    performance behavior in a program
  • Properties are higher-level abstractions used to
    represent common performance problems
  • They are based on conditions dependent on certain
    performance metrics
  • We can express these abstractions using ASL
    (APART Specification Language)

13
APART Specification Language
  • ASL is a declarative language (like SQL)

property small_io_requests (Process p, Experiment e)
  let
    float cost = profile(p,e).io_time;
    int num_reads = profile(p,e).num_reads;
    int bytes_read = profile(p,e).bytes_read;
  in
  condition : cost > 0
              and bytes_read/num_reads < SMALL_IO_THRESHOLD;
  confidence : 1;
  severity   : cost / duration(p,e);
14
Property Evaluation
Process instance:
  Process { TaskId = 7, Path = /home/bin/mc... }

The property is evaluated against the process instance:

property small_io_requests (Process p, Experiment e)
  let
    float cost = profile(p,e).io_time;
    int num_reads = profile(p,e).num_reads;
    int bytes_read = profile(p,e).bytes_read;
  in
  condition : cost > 0
              and bytes_read/num_reads < SMALL_IO_THRESHOLD;
  confidence : 1;
  severity   : cost / duration(p,e);

Evaluation result:
  condition = true, confidence = 1, severity = 0.35
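As a minimal illustration of this evaluation (made-up profile values and threshold, not the tool's actual code), the property reduces to a simple computation:

// Hypothetical sketch: evaluating small_io_requests for one process.
#include <iostream>

struct Profile {
    double io_time;     // seconds spent in I/O (cost)
    int    num_reads;
    int    bytes_read;
};

int main() {
    const double SMALL_IO_THRESHOLD = 1024.0; // assumed threshold (bytes/read)
    const double duration = 10.0;             // experiment duration (s)
    Profile p{3.5, 70000, 560000};            // made-up measurements

    bool condition = p.io_time > 0 &&
        static_cast<double>(p.bytes_read) / p.num_reads < SMALL_IO_THRESHOLD;
    double severity = p.io_time / duration;   // 3.5 / 10.0 = 0.35

    std::cout << "condition=" << condition << " severity=" << severity << "\n";
    return 0;
}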
15
Performance Analysis
  • Bottom-up approach
  • Insert a set of basic metrics in each process
  • readOps, bytesRead, commTime, bytesSent, ...
  • Perform measurements and periodically gather
    results
  • Propagate metrics to evaluate the hierarchy of
    properties
  • Use confidence/severity to rank detected
    bottlenecks (see the sketch after this list)
  • Next, search for a more precise location of
    the problem (where-axis search)
  • use static call-graph information
  • start from the locations where metrics are captured
    (e.g. the read, write, flush functions)
  • find the related functions in the user application
    (callers)
  • Finally, explain the behavior to the user
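The ranking step can be made concrete with a short sketch (hypothetical types, not the tool's actual code):

// Hypothetical sketch: rank detected bottlenecks by confidence-weighted severity.
#include <algorithm>
#include <string>
#include <vector>

struct DetectedProperty {
    std::string name;        // e.g. "small_io_requests"
    int         taskId;
    double      confidence;  // 0..1
    double      severity;    // fraction of execution time lost
};

void rankBottlenecks(std::vector<DetectedProperty> &found) {
    std::sort(found.begin(), found.end(),
        [](const DetectedProperty &a, const DetectedProperty &b) {
            return a.confidence * a.severity > b.confidence * b.severity;
        });
}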

16
Performance Analysis
(Diagram: basic metrics gathered from the processes are propagated upward to evaluate the hierarchy of properties)
17
Content
  • Our goals: automatic analysis and tuning
  • Dynamic automatic performance analysis
  • Dynamic automatic tuning
  • Objectives
  • MATE (Monitoring, Analysis and Tuning
    Environment)
  • Tuning techniques and experiments
  • Conclusions and future work

18
Objectives
  • Improve the execution of a parallel application
    by dynamically adapting it
  • Key issues
  • Dynamic performance tuning approach
  • Automatic improvement of any application, without
    recompiling and rerunning it
  • MATE (Monitoring, Analysis and Tuning
    Environment)
  • Prototype implementation in C++
  • for PVM-based applications
  • on Sun Solaris 2.x / SPARC

19
Dynamic Tuning Applications
  • Black-box tuning of ANY application
  • Library usage (tested)
  • Operating system usage
  • Cooperative tuning of a prepared application
  • Supported by program specification (tested)
  • Application developed using a framework (in progress)

20
MATE
(Diagram: MATE architecture. Machine 1 and Machine 2 each run a Monitor and a Tuner alongside the local pvmd daemon; the Analyzer runs on Machine 3.)
21
MATE Monitoring
  • Monitors control the execution of application tasks
    and allow dynamic event collection
  • Key services
  • Distributed application control
  • Instrumentation management
  • AddEventTrace(id, func, place, args)
  • RemoveEventTrace(id)
  • Transmission of the requested event records to the
    analyzer

22
MATE Monitoring
  • Instrumentation management
  • Based on the DynInst API
  • Dynamically loads the tracing library
  • Inserts snippets at the requested points
  • A snippet calls a library function
  • The function prepares an event record and transmits
    it to the Analyzer (see the sketch after this list)
  • Event record
  • What - event type (id, place)
  • When - global timestamp
  • Where - task identifier
  • Requested attributes - function call parameters,
    source code line number, etc.
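As an illustration of this mechanism, here is a minimal DynInst-style sketch (not MATE's actual code; the tracing library name libtracer.so and the function TraceEvent are assumptions, and API details vary across DynInst versions):

// Hypothetical sketch: attach to a task, load the tracing library, and
// insert a snippet that calls TraceEvent(1) at the entry of pvm_send().
#include "BPatch.h"
#include <vector>

BPatch bpatch;   // one global DynInst instance

void instrumentPvmSend(int pid) {
    BPatch_process *proc = bpatch.processAttach("task", pid);
    BPatch_image *image = proc->getImage();

    proc->loadLibrary("libtracer.so");           // assumed tracing library

    std::vector<BPatch_function *> targets, tracers;
    image->findFunction("pvm_send", targets);    // where to instrument
    image->findFunction("TraceEvent", tracers);  // function the snippet calls

    std::vector<BPatch_point *> *entry = targets[0]->findPoint(BPatch_entry);

    BPatch_constExpr eventId(1);                 // event type id
    std::vector<BPatch_snippet *> args{&eventId};
    BPatch_funcCallExpr call(*tracers[0], args);

    proc->insertSnippet(call, *entry);           // inject at function entry
    proc->continueExecution();
}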

23
MATE Analysis
  • The Analyzer is responsible for automatic
    performance analysis on the fly
  • Uses a set of predefined tuning techniques
  • Each technique is specified as (see the sketch
    after this list)
  • measure points - what events are needed
  • performance model and activating conditions
  • solution - tuning actions/points/synchronization
    - what to change, where, and when
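This three-part specification could be captured by an interface like the following (a hypothetical sketch, not MATE's actual class hierarchy):

// Hypothetical sketch: a tuning technique bundles measure points, a
// performance model with an activating condition, and a solution.
#include <vector>

struct Event        { int id; double timestamp; };              // simplified record
struct MeasurePoint { const char *function; bool atEntry; int eventId; };
struct TuningRequest{ int taskId; /* action, point, synchronization */ };

class TuningTechnique {
public:
    virtual ~TuningTechnique() = default;
    // Measure points: what events are needed.
    virtual std::vector<MeasurePoint> measurePoints() const = 0;
    // Performance model and activating condition, evaluated over events.
    virtual bool shouldTune(const std::vector<Event> &events) const = 0;
    // Solution: what to change, where, and when.
    virtual TuningRequest solution() const = 0;
};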

24
MATE Analysis
(Diagram: Analyzer internals. An Event Collector receives events from the tracing library via TCP/IP and passes them to the Event Processor, which updates the Metric Repository. Tuning techniques are evaluated over the metrics; the Tuning Manager sends tuning requests to the tuners and the Instr Manager sends instrumentation requests to the monitors, both via TCP/IP.)
25
MATE Knowledge
  • Measure point example

Insert instrumentation into ALL tasks at the entry of
function pvm_send() as eventId 1; record
parameter 1 as int and parameter 2 as int.

<instrumentation_request taskId="all">
  <function name="pvm_send">
    <eventId>1</eventId>
    <place>entry</place>
    <param idx="1">int</param>
    <param idx="2">int</param>
  </function>
</instrumentation_request>
26
MATE Knowledge
  • Performance model example

CurrentSize = result of pvm_getopt(PvmFragSize)
OptimalSize = Average(MsgSize) + Stddev(MsgSize)
Condition: CurrentSize - OptimalSize > threshold1
CommunicationCost = xxx
Condition: CommunicationCost > threshold2

<performance_model>
  <value name="CurrentSize">
    <calc>
      <type>function_result</type>
      <function name="pvm_getopt">
        <param>PvmFragSize</param>
      </function>
    </calc>
  </value>
  ...
</performance_model>
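The OptimalSize formula above translates directly into code; a minimal sketch (hypothetical helper, not MATE's implementation):

// Hypothetical sketch: optimal fragment size as Average(MsgSize) + Stddev(MsgSize).
#include <cmath>
#include <vector>

double optimalFragSize(const std::vector<double> &msgSizes) {
    // Assumes msgSizes is non-empty.
    double sum = 0.0, sumSq = 0.0;
    for (double s : msgSizes) { sum += s; sumSq += s * s; }
    const double n = static_cast<double>(msgSizes.size());
    const double avg = sum / n;
    const double var = sumSq / n - avg * avg;   // population variance
    return avg + std::sqrt(var);
}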
27
MATE Knowledge
  • Solution example

In the task with tid=524247, execute one time the
function pvm_setopt(PvmFragSize, 16384),
breaking at the entry of function pvm_send().

<tuning_request taskId="524247">
  <action>
    <one_time_call>
      <function name="pvm_setopt">
        <param idx="0">PvmFragSize</param>
        <param idx="1">16384</param>
      </function>
      <synchronize>
        <breakpoint>
          <function name="pvm_send">
            <place>entry</place>
          </function>
        </breakpoint>
      </synchronize>
    </one_time_call>
  </action>
</tuning_request>
28
MATE Tuning
  • Tuners apply solutions by executing tuning
    actions at specified tuning points
  • A tuner module is integrated with the monitor process
  • Receives a tuning request from analyzer
  • Prepares modifications (snippets)
  • Applies modifications via DynInst

Tuner/Monitor:
  recv_req(taskId, tuningReq)
    Task task = taskList.Find(taskId)
    snippet = PrepareSnippet(tuningReq)
    task.thread.insertSnippet(snippet)

Analyzer:
  TuningReq()
    send_req(tuner, taskId, tuningReq)
29
MATE Tuning
  • Tuning request
  • Tuner machine
  • Task id
  • Tuning action
  • One-time function call
  • Function parameter changes
  • Function call
  • Function replacement
  • Variable changes
  • Tuning points as pairs (object, value)
  • (function name, param name, param value)
  • Synchronization
  • When to perform the tuning action

In the task with tid=524247, call one time the function
pvm_setopt(PvmFragSize, 16384), breaking at the
entry of function pvm_send().
30
Tuning techniques
  • What can be tuned?
  • Library usage
  • Mathematical libraries
  • Communication libraries
  • Memory management functions
  • I/O functions

31
Tuning techniques
  • Communication library
  • Tuning of PVM library usage
  • Investigating PVM bottlenecks
  • PVM communication bottlenecks (see the sketch
    after this list)
  • Communication mode (direct, indirect)
  • Data encoding mode
  • Message fragment size
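These three knobs correspond to standard PVM 3 calls; a hedged sketch (the fragment-size value is illustrative):

// Sketch: the three PVM communication knobs listed above.
#include <pvm3.h>

void applyPvmTuning(void) {
    pvm_setopt(PvmRoute, PvmRouteDirect);  // direct task-to-task connections
    pvm_setopt(PvmFragSize, 16384);        // larger message fragment size (bytes)
    pvm_initsend(PvmDataRaw);              // raw encoding, skipping XDR conversion
}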

32
Tuning techniques: example application
  • Integer Sort kernel benchmark from NAS
  • High communication cost (50%)
  • Default settings: indirect communication mode,
    DataRaw encoding, message fragment size 4 KB

33
Other tuning techniques
  • TCP/IP
  • send/receive buffer size
  • sending without delay (Nagle's algorithm,
    TCP_NODELAY); see the sketch after this list
  • I/O
  • read/write operations
  • using prefetch for small requests
  • using asynchronous read/write instead of
    synchronous
  • I/O buffer size
  • Memory allocation
  • plugging in specialized strategies (e.g. a pool
    allocator)
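For the TCP/IP items, the corresponding standard POSIX socket options look like this (a minimal sketch; the buffer size is chosen for illustration):

// Sketch: disable Nagle's algorithm and resize the send buffer using
// standard POSIX socket options.
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

void tuneSocket(int sock) {
    int flag = 1;
    setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));

    int sndbuf = 256 * 1024;   // example send-buffer size (bytes)
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
}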

34
Content
  • Our goals: automatic analysis and tuning
  • Dynamic automatic analysis based on ASL
  • Dynamic automatic tuning
  • Conclusions and future work

35
Conclusions
  • Automatic performance analysis
  • Dynamic tuning
  • Designs
  • Experiments

36
Future work
  • Automatic analysis
  • Discuss and finalize the detailed ASL language
    specification
  • Complete the property evaluator
  • Connect the analyzer with the performance
    measurement tool
  • Investigate why-axis analysis (evaluate the
    causal property chains)

37
Future work
  • Automatic tuning
  • Tuning mathematical libraries
  • Tuning applications developed using a framework

38
Automatic Performance Analysis and Tuning
Thank you very much
Universitat Autònoma de Barcelona