1
UAB
Paradyn Week 2006 March 2006

Dynamic Monitoring and Tuning in Multicluster
Environment

Genaro Costa, Anna Morajko, Paola Caymes Scutari,
Tomàs Margalef and Emilio Luque
Universitat Autònoma de Barcelona
2
Outline

Introduction
Multicluster Systems
Applications on Wide Systems
MATE
New Requirements
Design
Conclusions

3
Introduction

System performance
New problems require more computation power.
Performance is a key issue.
New wide systems are built over the available
resources and the user does not have total
control of where the application will run.
It has become more difficult to reach high
performance and efficiency on these wide systems.

4
Introduction (II)

To reach performance goals, users need to find
and solve bottlenecks.
Dynamic Monitoring and Tuning is a promising
approach.
Because system properties change dynamically,
efficient resource use is hard to achieve, even
for expert users.

5
Multicluster Systems

New systems are built using existing resources.
Examples are NOW and HNOW linked with multistage
network interconnections.
Intra cluster communications have different
latencies than inter cluster communications.
Generally, multiclusters are built of clusters
(homogeneous or heterogeneous) interconnected by
a WAN.

6
Multicluster Systems (II)

Each cluster can have its own scheduler and can
be exposed either through a head node or through
all of its nodes.

7
Applications on Wide Systems

Hierarchical Master/Worker Applications
Raise the possibility of performance bottlenecks
Load imbalance problems
Inefficient resource use
Non-deterministic inter cluster bandwidth

[Diagram: Cluster A holds the Master and four Workers. Cluster B holds a Sub Master and four Workers. Common data are transmitted once to Cluster B, and the Sub Master exploits data locality.]
8
Applications on Wide Systems (II)

Hierarchical Master/Worker Applications
The master sees a sub master as a single node
with high processing capacity.
Work distribution from master to sub master
should be based on:
Available bandwidth
Computing power
These characteristics may have dynamic behavior.
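
The distribution rule above can be sketched as follows. This is only an illustration, not the scheme used in the presentation: the function name `split_work`, the idea of expressing both bandwidth and computing power in tasks per second, and the choice of taking the smaller of the two as the effective rate are all assumptions made for the example.

```python
def split_work(total_tasks, nodes):
    """Split total_tasks among nodes in proportion to their effective rate.

    nodes maps a node name to (compute_rate, bandwidth_rate), both in
    tasks per second. Assumption: a node can absorb work no faster than
    the slower of the two, so its effective rate is their minimum.
    """
    rates = {name: min(compute, bandwidth)
             for name, (compute, bandwidth) in nodes.items()}
    total_rate = sum(rates.values())
    if total_rate <= 0:
        raise ValueError("no node can process work")

    # Exact proportional shares, then largest-remainder rounding so that
    # the integer shares add up to total_tasks.
    exact = {name: total_tasks * rate / total_rate
             for name, rate in rates.items()}
    shares = {name: int(value) for name, value in exact.items()}
    leftover = total_tasks - sum(shares.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n],
                          reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# Hypothetical measurements: the sub master has more computing power,
# but the inter cluster link limits how fast work can reach it.
measured = {
    "local_workers": (60.0, 1000.0),
    "sub_master_B": (100.0, 40.0),
}
shares = split_work(1000, measured)
```

Because bandwidth and computing power may change at run time, such a split would have to be recomputed from fresh measurements rather than fixed at startup.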

9
MATE

Monitoring, Analysis and Tuning Environment
Dynamic automatic tuning of parallel/distributed
applications.

[Diagram labels: Instrumentation, Modifications, DynInst]
10
MATE (II)
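
Application Controller - AC
Dynamic Monitoring Library - DMLib
Analyzer

[Diagram: Machine 1 and Machine 2 each run an AC next to the application tasks (Task1, Task2, Task3), and each task is linked with DMLib. Arrows labeled "instr." and "modif." connect the ACs to the tasks, and arrows labeled "events" lead to the Analyzer on Machine 3.]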
11
MATE (III)
[Diagram labels: Analyzer, DTAPI]

Each tuning technique is implemented in MATE as a
tunlet, a C/C++ library dynamically loaded into
the Analyzer process.
Measure points: what events are needed
Performance model: how to determine bottlenecks
and solutions
Tuning actions/points/synchronization: what to
change, where, when
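
The three parts of a tunlet can be illustrated with a small sketch. It is written in Python for brevity, whereas MATE tunlets are C/C++ libraries, and it does not reproduce the DTAPI: the class names, the `task_done` event, the `chunk_size_factor` variable and the imbalance threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    worker: str   # process that produced the event
    name: str     # event type, e.g. "task_done"
    value: float  # measured value, here seconds spent on one task


@dataclass
class TuningAction:
    target: str      # where to apply the change
    variable: str    # what to change
    new_value: float
    sync_point: str  # when the change may be applied safely


class LoadBalanceTunlet:
    # Measure points: the events this tunlet needs.
    measure_points = ("task_done",)

    def __init__(self, imbalance_threshold=1.5):
        self.threshold = imbalance_threshold
        self.times = {}

    def on_event(self, event):
        # Keep only the events named in measure_points.
        if event.name in self.measure_points:
            self.times.setdefault(event.worker, []).append(event.value)

    def evaluate(self):
        # Performance model (assumed): a worker is a bottleneck when its
        # mean task time exceeds threshold times that of the fastest one.
        means = {w: sum(v) / len(v) for w, v in self.times.items() if v}
        if len(means) < 2:
            return []
        fastest = min(means.values())
        actions = []
        for worker, mean in sorted(means.items()):
            if mean > self.threshold * fastest:
                # Tuning action: shrink the worker's share of the work.
                actions.append(TuningAction(worker, "chunk_size_factor",
                                            fastest / mean,
                                            "next_distribution"))
        return actions


tunlet = LoadBalanceTunlet()
for ev in (Event("w1", "task_done", 1.0), Event("w1", "task_done", 1.0),
           Event("w2", "task_done", 4.0), Event("w2", "msg_sent", 9.0)):
    tunlet.on_event(ev)
actions = tunlet.evaluate()
```

Keeping measurement, model and action separate lets the Analyzer ask only for the events a tunlet declares, which matters once those events have to cross cluster boundaries.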

12
New Requirements

Transparent process tracking
The AC should follow application processes to any
cluster.
Lower inter cluster instrumentation communication
overhead
Inter cluster communications generally have higher
latency and lower bandwidth.

13
Transparent process tracking
DESIGN

System Service
A machine or cluster can have MATE enabled as a
daemon that detects the startup of new processes.

[Diagram: two MATE-enabled machines, each running an AC, a task (Taskn) and DMLib. Arrow labels: startup detection, attach, control, Analyzer subscription.]
14
Transparent process tracking
DESIGN (II)

Application plug-in
The AC can be packaged together with the
application binary.

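[Diagram: after job submission, an AC runs with the Task and DMLib, controls the task and subscribes to the Analyzer. When a task creates a new Task on a remote machine, the creation is detected (label: "detects Dyninst") and the new task runs with its own AC and DMLib.]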
15
Lower communication overhead
DESIGN (III)

Smart event collection
A full application trace may generate too much
overhead.
Event aggregation
Remote trace events should be aggregated into
trace event abstractions, saving bandwidth.
Inter Cluster Trace Event Routing
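
Event aggregation can be sketched as below. The event layout (process, event name, duration) and the chosen summary fields are assumptions for illustration only, since the slides do not define what a trace event abstraction contains.

```python
from collections import defaultdict


def aggregate_events(events):
    """Collapse raw trace events into one summary record per event name.

    events is an iterable of (process, event_name, duration) tuples.
    Only the summaries would need to cross the inter cluster link.
    """
    summary = defaultdict(lambda: {"count": 0, "total": 0.0, "max": 0.0})
    for _process, name, duration in events:
        record = summary[name]
        record["count"] += 1
        record["total"] += duration
        record["max"] = max(record["max"], duration)
    return dict(summary)


raw = [
    ("task1", "send", 0.5),
    ("task2", "send", 1.5),
    ("task1", "recv", 2.0),
]
abstract = aggregate_events(raw)
```

The number of records sent then depends on the number of event types rather than on the number of raw events, which is where the bandwidth saving comes from.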

16
Analyzer Approaches

Centralized
Requires modifying tunlets so that they can
distinguish the instrumentation data of local
application processes.
Hierarchical
Requires splitting tunlets into local tunlets
and global tunlets.
Distributed
Requires that tunlet instances located on
different Analyzer instances cooperate to tune an
application.
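
As one illustration of the hierarchical split, the sketch below has a local part that reduces the events of its cluster to a single abstract event and a global part that combines them. The class and method names and the choice of mean task time as the metric are hypothetical and are not taken from MATE.

```python
class LocalAnalyzer:
    """Local part: sees every raw event of one cluster."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.count = 0
        self.total = 0.0

    def on_event(self, task_time):
        self.count += 1
        self.total += task_time

    def abstract_event(self):
        # Count and total are enough for the global part to rebuild
        # exact means, so the raw events never leave the cluster.
        return {"cluster": self.cluster, "count": self.count,
                "total": self.total}


class GlobalAnalyzer:
    """Global part: sees only abstract events, one per cluster."""

    def __init__(self):
        self.latest = {}

    def on_abstract_event(self, event):
        self.latest[event["cluster"]] = event

    def mean_task_time(self):
        count = sum(e["count"] for e in self.latest.values())
        total = sum(e["total"] for e in self.latest.values())
        return total / count if count else None

    def slowest_cluster(self):
        means = {c: e["total"] / e["count"]
                 for c, e in self.latest.items() if e["count"]}
        return max(means, key=means.get) if means else None


local_a, local_b = LocalAnalyzer("A"), LocalAnalyzer("B")
for t in (1.0, 3.0):
    local_a.on_event(t)
local_b.on_event(5.0)

global_analyzer = GlobalAnalyzer()
for local in (local_a, local_b):
    global_analyzer.on_abstract_event(local.abstract_event())
```

In a distributed variant there would be no global part. The per-cluster instances would instead exchange such summaries with each other.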

17
Lower communication overhead (II)
DESIGN (IV)

Centralized Analyzer Approach

[Diagram: Cluster A (Machines A1 to A3) and Cluster B (Machines B1 to B3) run the application tasks, with one AC per machine. A single Analyzer serves both clusters, and an Event Router carries events between the clusters.]
18
Local Performance Model Analysis
DESIGN (V)

Hierarchical Analyzer Approach

[Diagram: Cluster A (Machines A1 to A4) and Cluster B (Machines B1 to B3) run the application tasks, with one AC per machine. Each cluster has a LocalAnalyzer, and the LocalAnalyzers send Abstract Events to a GlobalAnalyzer.]
19
Distributed Monitoring, Analysis and Tuning
Environment
DESIGN (VI)

Distributed Analyzer Approach

[Diagram: Cluster A (Machines A1 to A3) and Cluster B (Machines B1 to B3) run the application tasks, with one AC per machine. Each cluster has its own Analyzer, and the tunlet instances on the two Analyzers cooperate.]
20
Conclusions and future work

Conclusions
Instrumentation traffic should interfere as little
as possible with inter cluster communication.
Process tracking enables MATE for multicluster
systems.
The centralized Analyzer approach benefits tunlet
developers but does not scale.
The distributed Analyzer approach scales but
requires a different model-based analysis.

21
Conclusions and future work (II)