Title: Automatic Performance Analysis and Tuning
1Automatic Performance Analysis and Tuning
Anna Morajko, Oleg Morajko, Josep Jorba, Tomàs Margalef, Emilio Luque
Universitat Autònoma de Barcelona
2Content
- Our goals: automatic analysis and tuning
- Dynamic automatic performance analysis
- Dynamic automatic tuning
- Conclusions and future work
3Our goals
- Create tools that are able to automatically analyze the performance of parallel applications, detect bottlenecks, explain their reasons and provide hints to the developer
- Static approach, based on trace files and source code analysis (Kappa-Pi)
- Dynamic approach, based on on-the-fly analysis of run-time performance data, using an application model and static call graph (Dynamic Kappa-Pi)
- Create a tool that is able to automatically tune the performance of parallel applications during their execution, without recompiling and rerunning them (MATE)
4Our goals
- Static automatic analysis (Kappa-Pi)
- Static trace file analysis supported by source code examination
5Our goals
- Dynamic automatic analysis (Dynamic Kappa-Pi)
- based on inductive reasoning (bottom-up approach)
6Our goals
- Dynamic automatic tuning (MATE)
7Content
- Our goals: automatic analysis and tuning
- Dynamic automatic performance analysis
- Objectives
- Performance metrics and properties
- Performance analysis
- Dynamic automatic tuning
- Conclusions and future work
8Objectives
- Analyze the performance of parallel applications during their execution
- Automatically detect bottlenecks
- Provide a clear explanation of the identified problems to developers
- Correlate problems with the source code
- Provide recommendations on possible solutions
9Key Challenges
- What is wrong in the application? Problem identification (bottlenecks)
- Where is the critical resource? Problem location (hardware, OS, source code)
- When does it happen? How to organize the problem search in time?
- How important is it? How to compare the importance of different problems?
- Why does it happen? How to explain the reasons for the problem to the user?
10Design Assumptions
- Dynamic on-the-fly analysis
- Knowledge specification
- Bottleneck detection based on inductive reasoning (bottom-up approach)
- Bottleneck location identification based on call-graph search
- Tool primarily targeted at MPI-based parallel programs
11Performance Data Collection
- Application is described with basic abstractions
- Static entities (modules, functions, static call graph)
- Dynamic entities (processes, threads, objects)
- Performance analysis is based on measurements of performance data
- Performance data is captured at run-time by means of dynamic instrumentation
- Add/Remove measurement
- Measurement description specifies
- What metric should be collected
- Where to insert instrumentation
12Performance Properties
- Properties describe the specific types of performance behavior in a program
- Properties are higher-level abstractions used to represent common performance problems
- They are based on conditions dependent on certain performance metrics
- We can express these abstractions using ASL (APART Specification Language)
13APART Specification Language
- ASL is a declarative language (like SQL)
property small_io_requests (Process p, Experiment e) {
  let
    float cost = profile(p,e).io_time;
    int num_reads = profile(p,e).num_reads;
    int bytes_read = profile(p,e).bytes_read;
  in
    condition: cost > 0 and bytes_read/num_reads < SMALL_IO_THRESHOLD;
    confidence: 1;
    severity: cost/duration(p,e);
}
14Property Evaluation
Process instance
Process: TaskId = 7, Path = /home/bin/mc...
property small_io_requests (Process p, Experiment e) {
  let
    float cost = profile(p,e).io_time;
    int num_reads = profile(p,e).num_reads;
    int bytes_read = profile(p,e).bytes_read;
  in
    condition: cost > 0 and bytes_read/num_reads < SMALL_IO_THRESHOLD;
    confidence: 1;
    severity: cost/duration(p,e);
}
condition = true, confidence = 1, severity = 0.35
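The evaluation step on this slide can be illustrated with a minimal Python sketch. It is not the ASL evaluator itself: the `Profile` class, the value of `SMALL_IO_THRESHOLD` and the sample measurements are assumptions made for illustration, with the numbers chosen so that the severity comes out at the 0.35 shown on the slide.

```python
from dataclasses import dataclass

# Hypothetical threshold in bytes per read; the slides do not give a value.
SMALL_IO_THRESHOLD = 4096


@dataclass
class Profile:
    """Per-process measurements for one experiment (field names follow the ASL text)."""
    io_time: float    # seconds spent in I/O
    num_reads: int    # number of read operations
    bytes_read: int   # total bytes read
    duration: float   # experiment duration in seconds


def small_io_requests(profile: Profile) -> dict:
    """Evaluate the property and return its condition, confidence and severity."""
    cost = profile.io_time
    holds = (
        cost > 0
        and profile.num_reads > 0  # guard added here to avoid dividing by zero
        and profile.bytes_read / profile.num_reads < SMALL_IO_THRESHOLD
    )
    return {
        "condition": holds,
        "confidence": 1.0 if holds else 0.0,
        "severity": cost / profile.duration if holds else 0.0,
    }


# Made-up measurements: 512 bytes per read on average, 3.5 s of I/O in a 10 s run.
result = small_io_requests(
    Profile(io_time=3.5, num_reads=1000, bytes_read=512_000, duration=10.0)
)
```

Reporting a severity of zero when the condition does not hold is a choice of this sketch, so that properties that do not apply sort last when results are ranked.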
15Performance Analysis
- Bottom-up approach
- Insert a set of basic metrics in each process
- readOps, bytesRead, commTime, bytesSent, ...
- Perform measurements and periodically gather results
- Propagate metrics to evaluate the hierarchy of properties
- Use confidence/severity to rank detected bottlenecks
- Next, look for a more precise location of the problem (where-axis search)
- Use static call-graph information
- Start from locations where metrics are captured (e.g. read, write, flush functions)
- Find related functions in the user application (callers)
- Finally, explain the behavior to the user
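The ranking and where-axis steps listed above can be sketched as follows. This is an illustrative Python sketch, not the Dynamic Kappa-Pi implementation: the property names other than small_io_requests, the call graph and the function names in it are all invented for the example. The search walks the static call graph backwards from the function where a metric was captured and collects the callers that belong to the user application.

```python
from collections import deque


def rank_bottlenecks(results):
    """Order (property, confidence, severity) tuples, most severe first."""
    return sorted(results, key=lambda r: (r[2], r[1]), reverse=True)


def user_callers(call_graph, start, user_functions):
    """Return user-application functions that can reach `start`, nearest first.

    `call_graph` maps each caller to the list of functions it calls.
    """
    # Invert the graph so we can walk from callee to caller.
    callers_of = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers_of.setdefault(callee, set()).add(caller)

    seen = {start}
    found = []
    queue = deque([start])
    while queue:
        fn = queue.popleft()
        for caller in sorted(callers_of.get(fn, ())):
            if caller in seen:
                continue
            seen.add(caller)
            if caller in user_functions:
                found.append(caller)
            queue.append(caller)
    return found


# Hypothetical data for illustration only.
RESULTS = [
    ("small_io_requests", 1.0, 0.35),
    ("late_receiver", 1.0, 0.10),
    ("comm_bound", 0.8, 0.50),
]
CALL_GRAPH = {
    "main": ["load_input", "compute"],
    "load_input": ["buffered_get"],
    "buffered_get": ["read"],
    "compute": [],
}
USER_FUNCTIONS = {"main", "load_input", "compute"}

ranked = rank_bottlenecks(RESULTS)
locations = user_callers(CALL_GRAPH, "read", USER_FUNCTIONS)
```

Breadth-first order puts the closest user-level caller first, which is usually the most useful place to point the developer at.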
16Performance Analysis
[Figure: Metrics]
17Content
- Our goals: automatic analysis and tuning
- Dynamic automatic performance analysis
- Dynamic automatic tuning
- Objectives
- MATE (Monitoring, Analysis and Tuning Environment)
- Tuning techniques and experiments
- Conclusions and future work
18Objectives
- Improve the execution of a parallel application
- by dynamically adapting it
- Key issues
- Dynamic performance tuning approach
- Automatic improvement of any application without recompiling and rerunning it
- MATE: Monitoring, Analysis and Tuning Environment
- Prototype implementation in C
- for PVM-based applications
- Sun Solaris 2.x / SPARC
19Dynamic Tuning Applications
- Black-box tuning of ANY application
- Library usage (tested)
- Operating system usage
- Cooperative tuning of a prepared application
- Supported by program specification (tested)
- Application developed using a framework (in progress)
20MATE
[Figure: MATE architecture. Machine 1 and Machine 2 each host a pvmd, a Monitor and a Tuner; Machine 3 hosts the Analyzer.]
21MATE Monitoring
- Monitors control the execution of application tasks and allow dynamic event collection
- Key services
- Distributed application control
- Instrumentation management
- AddEventTrace(id,func,place,args)
- RemoveEventTrace(id)
- Transmission of requested event records to the analyzer
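The two instrumentation-management calls can be modeled as a small registry. The Python sketch below only illustrates the bookkeeping side of AddEventTrace and RemoveEventTrace; it does not touch DynInst or a running process, and the class name, the `exit` place and the error handling are assumptions rather than part of MATE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventTrace:
    func: str     # function to instrument, e.g. "pvm_send"
    place: str    # instrumentation point inside the function
    args: tuple   # which arguments to record, e.g. ("int", "int")


class InstrumentationManager:
    """Keeps track of which event traces are currently requested."""

    # "entry" appears in the slides; "exit" is an assumed second option.
    PLACES = ("entry", "exit")

    def __init__(self):
        self._traces = {}

    def add_event_trace(self, event_id, func, place, args=()):
        if place not in self.PLACES:
            raise ValueError(f"unknown place: {place}")
        if event_id in self._traces:
            raise ValueError(f"event id {event_id} already in use")
        self._traces[event_id] = EventTrace(func, place, tuple(args))

    def remove_event_trace(self, event_id):
        # Raises KeyError if the id was never added.
        return self._traces.pop(event_id)

    def active(self):
        return dict(self._traces)


manager = InstrumentationManager()
manager.add_event_trace(1, "pvm_send", "entry", ("int", "int"))
```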
22MATE Monitoring
- Instrumentation management
- Based on DynInst API
- Dynamically loads tracing library
- Inserts snippets into requested points
- A snippet calls a library function
- A function prepares an event record and transmits it to the Analyzer
- Event record
- What - event type (id, place)
- When - global timestamp
- Where - task identifier
- Requested attributes - function call parameters, source code line number, etc.
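As an illustration of the event record described above, the following Python sketch models its what/when/where fields and a round trip through a byte encoding. The slides do not specify the wire format used between the tracing library and the Analyzer, so the JSON-lines encoding, the field names and the sample values here are purely assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EventRecord:
    event_id: int     # what: event type id
    place: str        # what: instrumentation point, e.g. "entry"
    timestamp: float  # when: global timestamp
    task_id: int      # where: task identifier
    attributes: dict = field(default_factory=dict)  # requested attributes

    def encode(self) -> bytes:
        """Serialize to one newline-terminated JSON line (assumed format)."""
        return (json.dumps(asdict(self)) + "\n").encode("utf-8")

    @classmethod
    def decode(cls, line: bytes) -> "EventRecord":
        return cls(**json.loads(line.decode("utf-8")))


# Sample values are invented for the example.
record = EventRecord(
    event_id=1,
    place="entry",
    timestamp=1234.5,
    task_id=524247,
    attributes={"param1": 3, "param2": 7},
)
restored = EventRecord.decode(record.encode())
```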
23MATE Analysis
- Analyzer is responsible for the automatic performance analysis on the fly
- Uses a set of predefined tuning techniques
- Each technique is specified as
- measure points - what events are needed
- performance model and activating conditions
- solution - tuning actions/points/synchronization
- what to change, where, when
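One way to picture the three parts of a technique (measure points, activating condition, solution) is as a small data structure that the Analyzer evaluates against collected metrics. The Python sketch below is a loose illustration under stated assumptions: the `TuningTechnique` class, the metric names and the example rule are hypothetical and are not taken from MATE.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TuningTechnique:
    name: str
    measure_points: list                 # which events are needed
    condition: Callable[[dict], bool]    # activating condition over metrics
    solution: Callable[[dict], dict]     # what to change, where, when


def evaluate(technique: TuningTechnique, metrics: dict) -> Optional[dict]:
    """Return a tuning request if the technique applies, else None."""
    if technique.condition(metrics):
        return technique.solution(metrics)
    return None


# Hypothetical rule: fragments are smaller than the typical message.
frag_size_rule = TuningTechnique(
    name="message fragment size",
    measure_points=[("pvm_send", "entry")],
    condition=lambda m: m["mean_msg_size"] > m["frag_size"],
    solution=lambda m: {
        "what": ("pvm_setopt", "PvmFragSize", 16384),
        "where": "all tasks",
        "when": "entry of pvm_send",
    },
)

request = evaluate(frag_size_rule, {"frag_size": 4096, "mean_msg_size": 12000})
no_request = evaluate(frag_size_rule, {"frag_size": 16384, "mean_msg_size": 12000})
```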
24MATE Analysis
[Figure: Analyzer architecture. Components: Event Collector, Event Processor, tuning techniques, Tuning Manager, Instr Manager, Metric Repository. External links: events (from tracing library) via TCP/IP; tuning requests (to tuner) via TCP/IP; instrumentation requests (to monitor) via TCP/IP.]
25MATE Knowledge
Insert instrumentation into ALL tasks at entry of function pvm_send() as eventId 1; record parameter 1 as int, parameter 2 as int
<intrumentation_request taskId="all">
  <function name="pvm_send">
    <eventId>1</eventId>
    <place>entry</place>
    <param idx="1">int</param>
    <param idx="2">int</param>
  </function>
</intrumentation_request>
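To show how a monitor might read such a request, here is a short Python sketch using the standard `xml.etree.ElementTree` module. It assumes the request is well-formed XML with quoted attributes, as embedded in the `REQUEST` string; the tag names are copied from the slide (including its spelling of `intrumentation_request`), while the `parse_request` helper and the shape of its result are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The request from the slide, with angle brackets and quoting restored.
REQUEST = """
<intrumentation_request taskId="all">
  <function name="pvm_send">
    <eventId>1</eventId>
    <place>entry</place>
    <param idx="1">int</param>
    <param idx="2">int</param>
  </function>
</intrumentation_request>
"""


def parse_request(text):
    """Extract the target tasks, function, event id, place and parameters."""
    root = ET.fromstring(text.strip())
    function = root.find("function")
    return {
        "task": root.get("taskId"),
        "function": function.get("name"),
        "event_id": int(function.findtext("eventId")),
        "place": function.findtext("place"),
        "params": {int(p.get("idx")): p.text for p in function.findall("param")},
    }


parsed = parse_request(REQUEST)
```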
26MATE Knowledge
- Performance model example
CurrentSize = result of pvm_getopt (PvmFragSize)
OptimalSize = Average (MsgSize) + Stddev (MsgSize)
Condition: CurrentSize - OptimalSize > threshold1
CommunicationCost = xxx
Condition: CommunicationCost > threshold2
<performance_model>
  <value name="CurrentSize">
    <calc>
      <type>function_result</type>
      <function name="pvm_getopt">
        <param>PvmFragSize</param>
      </function>
    </calc>
  </value>
  ...
</performance_model>
27MATE Knowledge
In task with tid=524247, execute one time the function pvm_setopt (PvmFragSize, 16384), breaking at entry of function pvm_send()
<tuning_request taskId="524247">
  <action>
    <one_time_call>
      <function name="pvm_setopt">
        <param idx="0">PvmFragSize</param>
        <param idx="1">16384</param>
      </function>
      <synchronize>
        <breakpoint>
          <function name="pvm_send">
            <place>entry</place>
          </function>
        </breakpoint>
      </synchronize>
    </one_time_call>
  </action>
</tuning_request>
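On the Analyzer side, a request like this one could be assembled programmatically. The Python sketch below builds the same element structure with the standard `xml.etree.ElementTree` module. The element and attribute names follow the slide, assuming it shows well-formed XML; the `build_tuning_request` helper itself is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_tuning_request(task_id, func, params, sync_func, place="entry"):
    """Build a one-time-call tuning request synchronized on a breakpoint."""
    root = ET.Element("tuning_request", taskId=str(task_id))
    call = ET.SubElement(ET.SubElement(root, "action"), "one_time_call")

    # The function to execute once, with positional parameters.
    function = ET.SubElement(call, "function", name=func)
    for idx, value in enumerate(params):
        ET.SubElement(function, "param", idx=str(idx)).text = str(value)

    # When to apply it: at a breakpoint in the synchronization function.
    sync = ET.SubElement(call, "synchronize")
    target = ET.SubElement(
        ET.SubElement(sync, "breakpoint"), "function", name=sync_func
    )
    ET.SubElement(target, "place").text = place

    return ET.tostring(root, encoding="unicode")


xml_text = build_tuning_request(
    524247, "pvm_setopt", ["PvmFragSize", 16384], "pvm_send"
)
```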
28MATE Tuning
- Tuners apply solutions by executing tuning actions at specified tuning points
- A tuner module is integrated with the monitor process
- Receives a tuning request from the analyzer
- Prepares modifications (snippets)
- Applies modifications via DynInst
Tuner/Monitor
  recv_req (taskId, TuningReq)
  Task task = taskList.Find (taskId)
  snippet = PrepareSnippet (TuningReq)
  task.thread.insertSnippet (snippet)
Analyzer
  tuningReq = TuningReq ()
  send_req (tuner, taskId, tuningReq)
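The tuner-side flow (receive a request, find the task, prepare a snippet, insert it) can be mimicked in a few lines of Python. This is only a model of the control flow: nothing here calls DynInst, a "snippet" is reduced to a string describing the call to make, and the `Task` and `Tuner` classes and the request dictionary layout are assumptions of the sketch.

```python
class Task:
    """Stand-in for a controlled application task."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.snippets = []  # snippets inserted so far

    def insert_snippet(self, snippet):
        self.snippets.append(snippet)


class Tuner:
    def __init__(self, tasks):
        self.task_list = {task.task_id: task for task in tasks}

    def prepare_snippet(self, tuning_req):
        # A real tuner would build a DynInst snippet; here it is a string.
        args = ", ".join(str(arg) for arg in tuning_req["args"])
        return f"{tuning_req['function']}({args})"

    def recv_req(self, task_id, tuning_req):
        task = self.task_list.get(task_id)
        if task is None:
            raise KeyError(f"unknown task {task_id}")
        snippet = self.prepare_snippet(tuning_req)
        task.insert_snippet(snippet)
        return snippet


tuner = Tuner([Task(524247)])
applied = tuner.recv_req(
    524247, {"function": "pvm_setopt", "args": ["PvmFragSize", 16384]}
)
```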
29MATE Tuning
- Tuning request
- Tuner machine
- Task id
- Tuning action
- One time function call
- Function parameter changes
- Function call
- Function replacement
- Variable changes
- Tuning points as pairs (object, value)
- function name, param name, param value
- Synchronization
- When to perform tuning action
In task with tid=524247, call one time the function pvm_setopt (PvmFragSize, 16384), breaking at entry of function pvm_send()
30Tuning techniques
- What can be tuned?
- Library usage
- Mathematical libraries
- Communication libraries
- Memory management functions
- I/O functions
31Tuning techniques
- Communication library
- Tuning of PVM library usage
- Investigating PVM bottlenecks
- PVM communication bottlenecks
- Communication mode (Direct, indirect)
- Data encoding mode
- Message fragment size
32Tuning techniques Example application
- Integer Sort Kernel benchmark from NAS
- High communication cost (50)
- Default settings: indirect communication mode, DataRaw encoding, message fragment size 4KB
33Other tuning techniques
- TCP/IP
- send/receive buffer size
- sending without delay (Nagle's algorithm, TCP_NODELAY)
- I/O
- read/write operations
- using prefetch for small requests
- using asynchronous read/write instead of synchronous
- I/O buffer size
- Memory allocation
- plugging in specialized strategies (pool allocator)
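The TCP/IP parameters named above are ordinary socket options. As a point of reference, the Python sketch below sets them on a fresh socket through the standard `socket` module; it shows what the tunable parameters are, not how MATE would change them inside a running application. The `tune_socket` helper and the 64 KB buffer size are assumptions for illustration, and the operating system may round or adjust the requested buffer sizes.

```python
import socket


def tune_socket(sock, buffer_bytes=None, no_delay=True):
    """Apply the options and report the values the OS actually holds."""
    if no_delay:
        # Disable Nagle's algorithm so small writes are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    if buffer_bytes is not None:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return {
        "no_delay": bool(
            sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
        ),
        "send_buffer": sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        "recv_buffer": sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    }


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    settings = tune_socket(sock, buffer_bytes=64 * 1024)
```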
34Content
- Our goals: automatic analysis and tuning
- Dynamic automatic analysis based on ASL
- Dynamic automatic tuning
- Conclusions and future work
35Conclusions
- Automatic performance analysis
- Dynamic tuning
- Designs
- Experiments
36Future work
- Automatic analysis
- Discuss and close the detailed ASL language specification
- Complete the property evaluator
- Connect the analyzer with a performance measurement tool
- Investigate the why-axis analysis (evaluate the causal property chains)
37Future work
- Automatic tuning
- Tuning mathematical libraries
- Tuning application developed using framework
38Automatic Performance Analysis and Tuning
Thank you very much
Universitat Autònoma de Barcelona