Title: MONARC Simulation Framework
1 MONARC Simulation Framework
- Corina Stratan, Ciprian Dobre (UPB)
- Iosif Legrand, Harvey Newman (CALTECH)
2 The GOALS of the Simulation Framework
- The aim of this work is to continue and improve the development of the MONARC simulation framework:
- To perform realistic simulation and modelling of large-scale distributed computing systems, customised for specific HEP applications.
- To offer a dynamic and flexible simulation environment to be used as a design tool for large distributed systems.
- To provide a design framework to evaluate the performance of a range of possible computer systems, as measured by their ability to provide the physicists with the requested data in the required time, and to optimise the cost.
3 A Global View for Modelling
[Diagram: MONITORING of REAL Systems and Testbeds feeding the modelling process]
4 Design Considerations
- This simulation framework is not intended to be a detailed simulator for basic components such as operating systems, database servers or routers.
- Instead, based on realistic mathematical models and on parameters measured on test-bed systems for all the basic components, it aims to correctly describe the performance and limitations of large distributed systems with complex interactions.
5 Simulation Engine
6 Design Considerations of the Simulation Engine
- A process-oriented approach to discrete event simulation is well suited to describing concurrently running programs.
- Active objects (having an execution thread, a program counter, stack...) provide an easy way to map the structure of a set of distributed running programs into the simulation environment.
- The simulation engine supports an interrupt scheme. This allows effective and correct simulation of concurrent processes with very different time scales, by using a DES approach with a continuous process flow between events.
7 The Simulation Engine: Tasks and Events
- Task: simulates an entity with time-dependent behavior (active object, server, ...).
- A task has 5 possible states: CREATED, READY, RUNNING, FINISHED, WAITING.
- [State diagram] Created -> Ready -> Running when the task is assigned to a worker thread; Running -> Waiting via semaphore.p(), and Waiting -> Ready via semaphore.v() when an event happens or the sleeping period is over; Running -> Finished.
- Each task maintains an internal semaphore necessary for switching between states.
- Event: used for communication and synchronization between tasks. When a task must notify another task about something that happened or will happen in the future, it creates an event addressed to that task. The events are queued and sent to the destination tasks by the engine's scheduler.
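The semaphore-based state switching above can be sketched in a few lines. This is a minimal illustration, not the MONARC engine's actual classes; all names (SimTask, wakeUp, waitForEvent) are hypothetical:

```java
// Hypothetical sketch of a task with an internal semaphore: the task's
// thread blocks on p() in WAITING and the scheduler resumes it with v().
import java.util.concurrent.Semaphore;

public class SimTask implements Runnable {
    public enum State { CREATED, READY, RUNNING, WAITING, FINISHED }

    private final Semaphore sem = new Semaphore(0);  // internal semaphore
    public volatile State state = State.CREATED;

    // Called by the engine's scheduler: semaphore.v(), wakes the task
    // when an event arrives or its sleeping period is over.
    public void wakeUp() { sem.release(); }

    // Called from the task's own thread: semaphore.p(), blocks until
    // the scheduler wakes the task up again.
    public void waitForEvent() {
        state = State.WAITING;
        sem.acquireUninterruptibly();
        state = State.RUNNING;
    }

    @Override public void run() {
        state = State.RUNNING;
        waitForEvent();          // do some work, then wait for an event
        state = State.FINISHED;
    }
}
```

Pre-releasing the semaphore before acquiring it lets a notification that arrives "early" still be consumed, which is why the semaphore (rather than a plain wait/notify flag) fits this scheme.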
8 Tests of the Engine
- Processing a TOTAL of 100,000 simple jobs on 1, 10, 100, 1000, 2000, 4000, 10000 CPUs, using the same number of parallel threads.
- More tests: http://monarc.cacr.caltech.edu/
9 Basic Components
10 Basic Components
- These basic components are capable of simulating the core functionality of general distributed computing systems. They are constructed on top of the simulation engine and make efficient use of the interrupt functionality implemented for the active objects.
- These components should be considered the base classes from which specific components can be derived and constructed.
11 Basic Components
- Computing Nodes
- Network Links and Routers, I/O protocols
- Data Containers
- Servers
  - Data Base Servers
  - File Servers (FTP, NFS)
- Jobs
  - Processing Jobs
  - FTP jobs
- Scripts (graph execution schemes)
- Basic Scheduler
- Activities (a time sequence of jobs)
12 Multitasking Processing Model
- Concurrent running tasks share resources (CPU, memory, I/O).
- Interrupt-driven scheme: for each new task, or when one task is finished, an interrupt is generated and all processing times are recomputed.
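The interrupt-driven recomputation can be sketched as follows, assuming equal CPU sharing among the active jobs (a simplification; names like SharedCpu and onInterrupt are hypothetical, not MONARC API):

```java
// Hypothetical sketch: on every interrupt (job arrival or completion),
// charge the work done since the last interrupt under equal sharing,
// then re-estimate each job's completion time.
import java.util.ArrayList;
import java.util.List;

public class SharedCpu {
    public static class Job {
        public double remainingWork;   // work left, in CPU-power x seconds
        public Job(double w) { remainingWork = w; }
    }

    private final double cpuPower;          // total power of the node
    private final List<Job> active = new ArrayList<>();
    private double lastInterrupt = 0.0;     // time of the last recompute

    public SharedCpu(double cpuPower) { this.cpuPower = cpuPower; }

    // Interrupt handler: first account for the elapsed interval,
    // then apply the arrival / completion that caused the interrupt.
    public void onInterrupt(double now, Job arriving, Job finished) {
        double share = active.isEmpty() ? 0 : cpuPower / active.size();
        for (Job j : active) j.remainingWork -= share * (now - lastInterrupt);
        if (finished != null) active.remove(finished);
        if (arriving != null) active.add(arriving);
        lastInterrupt = now;
    }

    // Estimated completion time, assuming no further interrupts occur.
    public double estimatedFinish(double now, Job j) {
        return now + j.remainingWork / (cpuPower / active.size());
    }
}
```

Between interrupts nothing needs to be simulated, which is exactly the "continuous process flow between events" mentioned for the engine.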
13 LAN/WAN Simulation Model
[Diagram: Nodes and Links on LANs, connected through ROUTERs and Internet Connections]
- Interrupt-driven simulation: for each new message an interrupt is created, and for all the active transfers the speed and the estimated time to complete the transfer are recalculated.
- Continuous flow between events! An efficient and realistic way to simulate concurrent transfers having different sizes / protocols.
14 Network Model
- data traffic simulated for both local and wide area networks
- a simulation at the packet level is practically impossible, so we adopted a larger-scale approach based on an interrupt mechanism
- Components of the network model: Network Entity (LAN, WAN, LinkPort)
  - main attribute: bandwidth
  - keeps track of the messages that traverse it
15 Simulating the Network Transfers
- interrupt mechanism similar to the one used for job execution simulation
- the initial speed of a message is determined by evaluating the bandwidth that each entity on the route can offer
- different network protocols can be modelled
[Diagram: a new message travels from a CPU through a LinkPort, the Caltech LAN, Caltech Router, Caltech WAN, CERN WAN, CERN Router and the CERN LAN to a CPU; Message1, Message2 and Message3 on the route receive interrupts (INT)]
1. The route and the available bandwidth for the new message are determined.
2. The messages on the route are interrupted and their speeds are recalculated.
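The two steps above can be sketched with a simple equal-share bandwidth model (the real framework models protocols more carefully; NetworkModel, Entity and startTransfer are hypothetical names):

```java
// Hypothetical sketch: a message's speed is the smallest share offered
// by the entities on its route, and a new message forces every transfer
// sharing those entities to be recalculated.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NetworkModel {
    public static class Entity {                 // LAN, WAN or LinkPort
        public final double bandwidth;           // main attribute
        public final List<Message> transfers = new ArrayList<>();
        public Entity(double bw) { bandwidth = bw; }
        double offeredShare() { return bandwidth / transfers.size(); }
    }

    public static class Message {
        public final List<Entity> route;
        public double speed;
        public Message(List<Entity> route) { this.route = route; }
    }

    // Step 1-2: register the new message on its route, then interrupt
    // and recalculate every transfer touching that route.
    public static void startTransfer(Message m) {
        for (Entity e : m.route) e.transfers.add(m);
        Set<Message> affected = new HashSet<>();
        for (Entity e : m.route) affected.addAll(e.transfers);
        for (Message t : affected) recalc(t);
    }

    public static void recalc(Message m) {
        double s = Double.MAX_VALUE;
        for (Entity e : m.route) s = Math.min(s, e.offeredShare());
        m.speed = s;
    }
}
```

With the speed fixed between interrupts, the time to complete each transfer follows directly from its remaining size, so no per-packet events are needed.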
16 Job Scheduling and Execution
[Diagram: Activity1 and Activity2 submit Jobs 3-7 (30-50 CPU units each) to a farm of CPU 1-3; the unit receiving a new job is interrupted (INT)]

    class Activity1 extends Activity {
        public void pushJobs() {
            Job newJob = new Job();
            addJob(newJob);
        }
    }

1. The activity class creates a job and submits it to the farm.
2. The job scheduler sends the new job to a CPU unit. All the jobs executing on that CPU are interrupted.
3. CPU power is reallocated on the unit where the new job was scheduled. The interrupted jobs re-estimate their completion times.
17 Output of the Simulation
[Diagram: components (Node, Router, User C) and the Simulation Engine publish through Output Listener Filters to GRAPHICS, a DB and Log Files / EXCEL]
- Any component in the system can generate generic results objects.
- Any client can subscribe with a filter and will receive the results it is interested in.
- VERY SIMILAR structure as in MonALISA. We will soon integrate the output of the simulation framework into MonALISA.
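The subscribe-with-a-filter scheme above is a publish/subscribe pattern; a minimal sketch could look like this (OutputManager, Result and the listener interface are hypothetical names, not the framework's real classes):

```java
// Hypothetical sketch of filtered result delivery: components publish
// generic result objects and each listener only receives the results
// matching the predicate it registered with.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class OutputManager {
    public static class Result {
        public final String component, parameter;
        public final double value;
        public Result(String c, String p, double v) {
            component = c; parameter = p; value = v;
        }
    }

    public interface OutputListener { void notifyResult(Result r); }

    private static final Map<OutputListener, Predicate<Result>> listeners =
            new LinkedHashMap<>();

    // A client subscribes with a filter selecting the results it wants.
    public static void subscribe(OutputListener l, Predicate<Result> filter) {
        listeners.put(l, filter);
    }

    // Any simulated component (node, router, DB...) publishes here.
    public static void publish(Result r) {
        listeners.forEach((l, f) -> { if (f.test(r)) l.notifyResult(r); });
    }
}
```

Because filtering happens at the publishing point, a graphics client, a log writer and an external system like MonALISA can each receive only the subset of results they care about.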
18 Specific Components
19 Specific Components
- These components should be derived from the basic components and must implement the specific characteristics and the way they will operate.
- Major parts:
  - Data Model
  - Data Flow Diagrams for Production and especially for Analysis Jobs
  - Scheduling / pre-allocation policies
  - Data Replication Strategies
20 Data Model
- Generic Data Container: Size, Event Type, Event Range, Access Count, INSTANCE
[Diagram: a META DATA Catalog / Replication Catalog maps containers to instances; instances live in a Data Base, on an FTP Server Node, a DB Server or an NFS Server (Custom Data Servers), with network FILEs moved via Export / Import]
21 Data Model (2)
[Diagram: a Data Processing JOB issues a Data Request to the META DATA Catalog / Replication Catalog, selects a Data Container from the options, and the JOB builds a List Of IO Transactions]
22 Database Functionality
- Client-server model.
- Automatic storage management is possible, with data being sent to mass storage units.
- 3 kinds of requests for the database server:
  - write
  - read
  - get (read the data and erase it from the server)
Automatic storage management example:
[Diagram: myJob writes to a DatabaseServer holding DB1 (DContainer 1, 2, 3, 20-24) and DB2 (DContainer 15, 16), backed by Mass Storage 1 and Mass Storage 2]
1. The job wants to write a container into the database DB1, but the server is out of storage space.
2. The least frequently used container is moved to a mass storage unit. The new container is written to the database.
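The least-frequently-used eviction in the example can be sketched as follows (a simplified illustration with hypothetical names; the access count plays the role of the container's Access Count attribute from the data model):

```java
// Hypothetical sketch: when a write does not fit, the container with
// the lowest access count is moved to mass storage until space frees up.
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class DatabaseServer {
    public static class DContainer {
        public final String name;
        public final double size;
        public int accessCount;
        public DContainer(String n, double s) { name = n; size = s; }
    }

    private final double capacity;
    private double used = 0;
    private final List<DContainer> stored = new ArrayList<>();
    public final List<DContainer> massStorage = new ArrayList<>();

    public DatabaseServer(double capacity) { this.capacity = capacity; }

    public void write(DContainer c) {
        while (used + c.size > capacity && !stored.isEmpty()) {
            // evict the least frequently used container to mass storage
            DContainer lfu = Collections.min(stored,
                    Comparator.comparingInt(x -> x.accessCount));
            stored.remove(lfu);
            massStorage.add(lfu);
            used -= lfu.size;
        }
        stored.add(c);
        used += c.size;
    }
}
```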
23 Data Flow Diagrams for JOBS
- Input and output is a collection of data, described by type and range.
- A process is described by its name.
- A fine-granularity decomposition of processes which can be executed independently, and of the way they communicate, can be very useful for optimization and parallel execution!
[Diagram: a graph of Processing 1-4 steps connected through Input/Output data collections; one branch is repeated 10x]
24 Job Scheduling: Centralized Scheme
[Diagram: the JobSchedulers of Site A and Site B are coordinated by a GLOBAL Job Scheduler, a dynamically loadable module]
25 Job Scheduling: Distributed Scheme (market model)
[Diagram: the JobScheduler of Site A sends a Request to the JobSchedulers of the other sites, receives a COST from each, and takes the DECISION]
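The market-model decision above amounts to asking every site for a cost and exporting the job to the cheapest one. A sketch, with a deliberately simple illustrative cost function (all names are hypothetical; real policies could also price the required data transfers):

```java
// Hypothetical sketch of the request/cost/decision cycle: each site
// quotes a cost for running the job, the scheduler picks the minimum.
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class MarketScheduler {
    public static class Site {
        public final String name;
        public final int cpus;
        public int queuedJobs;
        public Site(String name, int cpus) { this.name = name; this.cpus = cpus; }

        // Illustrative cost: projected load per CPU if the job lands here.
        public double cost() { return (double) (queuedJobs + 1) / cpus; }
    }

    // Collect the quotes and export the job to the cheapest site.
    public static Site schedule(List<Site> sites) {
        Site best = Collections.min(sites,
                Comparator.comparingDouble(Site::cost));
        best.queuedJobs++;
        return best;
    }
}
```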
26 Computing Models
27 Activities: Arrival Patterns
- A flexible mechanism to define the stochastic process of how users perform data processing tasks.
- Dynamic loading of Activity tasks, which are threaded objects and are controlled by the simulation scheduling mechanism.
- Physics Activities Injecting Jobs: each Activity thread generates data processing jobs.
- These dynamic objects are used to model the users' behavior.
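The slides leave the arrival distribution open; one common choice for such a stochastic job-injection process is exponential inter-arrival times (a Poisson process). A sketch under that assumption, with hypothetical names:

```java
// Hypothetical sketch of an Activity injecting jobs with exponential
// inter-arrival times, i.e. a Poisson arrival pattern with a mean rate.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PhysicsActivity {
    private final Random rnd;
    private final double jobsPerHour;        // mean arrival rate

    public PhysicsActivity(double jobsPerHour, long seed) {
        this.jobsPerHour = jobsPerHour;
        this.rnd = new Random(seed);
    }

    // Exponential inter-arrival time (in hours) via inverse transform.
    public double nextInterArrival() {
        return -Math.log(1.0 - rnd.nextDouble()) / jobsPerHour;
    }

    // Submission times of the jobs injected during one 24h day.
    public List<Double> submissionTimes() {
        List<Double> times = new ArrayList<>();
        for (double t = nextInterArrival(); t < 24.0; t += nextInterArrival())
            times.add(t);
        return times;
    }
}
```

A daily usage pattern (e.g. more jobs during working hours) can be modelled by making the rate a function of the time of day.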
28 Regional Centre Model
[Diagram: simplified topology of the Centers A, B, C, D, E]
29 MONARC - Main Classes
30 Monitoring
31 Real Need for Flexible Monitoring Systems
- It is important to measure and monitor the key applications in a well-defined test environment and to extract the parameters we need for modeling.
- Monitor the farms used today, try to understand how they work, and simulate such systems.
- This requires a flexible monitoring system able to dynamically add new parameters and to provide access to historical data.
- Interfacing monitoring tools to get the parameters we need in simulations in a nearly automatic way.
- MonALISA was designed and developed based on the experience with the simulation problems.
32 EXAMPLES
33 FTP and NFS Clusters
- These examples evaluate the performance of a local area network with a server and several worker stations. The server stores events used by the processing nodes.
- NFS example: the server concurrently delivers the events, one by one, to the clients.
- FTP example: the server sends a whole file of events in a single transfer.
34 FTP Cluster
- 50 CPU units x 2 jobs per unit; 100 events per job, event size 1 MB; LAN bandwidth 1 Gbps, server's effective bandwidth 60 Mbps.
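As a rough sanity check on this scenario (not from the slide), the server's effective bandwidth gives a lower bound on the total transfer time, since it is far below the LAN's 1 Gbps:

```java
// Back-of-envelope lower bound for the FTP-cluster transfer time,
// ignoring protocol overhead and any overlap with processing.
// All figures come from the slide's parameters.
public class FtpClusterEstimate {
    public static double lowerBoundSeconds() {
        double jobs = 50 * 2;               // 50 CPU units x 2 jobs each
        double mbPerJob = 100 * 1.0;        // 100 events x 1 MB
        double totalMegabits = jobs * mbPerJob * 8;
        double serverMbps = 60;             // server's effective bandwidth
        return totalMegabits / serverMbps;  // LAN (1 Gbps) is not the bottleneck
    }
}
```

This works out to roughly 1330 seconds just to move the 10 GB of events through the server.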
35 NFS Cluster
36 Distributed Scheduling
- Job migration: when a regional center is assigned too many jobs, it sends a part of them to other centers with more free resources.
- A new job scheduler was implemented which supports job migration, applying load balancing criteria.
[Diagram: regional centers export() jobs to each other]
- We tested different configurations, with 1, 2 and 4 regional centers, and with different numbers of CPUs per regional center. The number of jobs submitted is kept constant, the job arrival rate varying during a day.
37 Distributed Scheduling (2)
- Test case:
  - 4 regional centers, 20 CPUs per center
  - average job processing time 3 h; approx. 500 jobs per day submitted in a center
- Average processing time and CPU usage measured for 1, 2, 4, 6 centers.
38 Distributed Scheduling (3)
- similar to the previous example, but the jobs are more complex, involving network transfers
- centers connected in a chain configuration (chain WAN connection)
- Every job submitted to a regional center needs an amount of data located in that center. If the job is exported to another center, would the benefits be great enough to compensate for the cost of the data transfer?
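The trade-off posed by that question can be made concrete: exporting pays off only if the remote center's shorter queue outweighs the time to move the job's input data over the WAN. A sketch with hypothetical names and deliberately simple time estimates:

```java
// Hypothetical sketch: compare the estimated completion time of running
// a job locally versus exporting it (queue wait + processing, plus the
// WAN transfer of the job's input data in the exported case).
public class ExportDecision {
    // Estimated completion time if the job runs in its home center.
    public static double localTime(double queueWaitLocal, double procTime) {
        return queueWaitLocal + procTime;
    }

    // Estimated completion time if the job is exported: its input data
    // must first cross the WAN link.
    public static double remoteTime(double queueWaitRemote, double procTime,
                                    double dataMegabits, double wanMbps) {
        return dataMegabits / wanMbps + queueWaitRemote + procTime;
    }

    public static boolean shouldExport(double queueWaitLocal,
                                       double queueWaitRemote,
                                       double procTime, double dataMegabits,
                                       double wanMbps) {
        return remoteTime(queueWaitRemote, procTime, dataMegabits, wanMbps)
             < localTime(queueWaitLocal, procTime);
    }
}
```

In the chain configuration, the WAN bandwidth term is what makes exporting from the middle centers more expensive, consistent with the traffic pattern reported on the next slides.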
39 Distributed Scheduling (4)
- The average processing time increases significantly when reducing the bandwidth and the number of CPUs.
- The network transfers are more intense in the centers from the middle of the chain (like Caltech).
40 Distributed Scheduling (5)
41 Local Data Replication
- Evaluates the performance improvements that can be obtained by replicating data.
- We simulated a regional center which has a number of database servers, and another four centers which host jobs that process the data on those database servers.
- Better performance can be obtained if the data from the servers is replicated into the other regional centers.
42 Local Data Replication (2)
43 WAN Data Replication
- similar to the previous example, but now with two central servers, each holding an equal amount of replicated data, and eight satellite regional centers hosting worker jobs
- a worker job gets a number of events from one of the central regional centers (one event at a time) and processes them locally
- Two strategies are compared: the workers choose the best server to get the data from, using a replication load-balancing service (which knows the load of the network and of the servers), VS the server is chosen randomly.
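The two server-selection strategies being compared can be sketched side by side (ReplicaSelector and the load metric are hypothetical; the real load-balancing service also accounts for network load):

```java
// Hypothetical sketch of the compared strategies: pick the replica
// server with the lowest observed load, versus pick one at random.
import java.util.List;
import java.util.Random;

public class ReplicaSelector {
    public static class Server {
        public final String name;
        public double load;       // e.g. fraction of maximum supported load
        public Server(String name, double load) {
            this.name = name; this.load = load;
        }
    }

    // Load-balancing choice: the least loaded server.
    public static Server byLoad(List<Server> servers) {
        Server best = servers.get(0);
        for (Server s : servers) if (s.load < best.load) best = s;
        return best;
    }

    // Baseline: uniformly random choice, blind to server load.
    public static Server random(List<Server> servers, Random rnd) {
        return servers.get(rnd.nextInt(servers.size()));
    }
}
```

The random baseline sends half the requests to each server regardless of capacity, which is why the gap between the strategies widens in the asymmetric case on the next slide.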
44 WAN Data Replication (2)
- Both servers have the same bandwidth and support the same maximum load: better average response time, and the total execution time is smaller when taking decisions based on load balancing.
- Also tested: one server has half of the other's bandwidth and supports half of its maximum load.
45 Summary
- Modelling and understanding current systems, their performance and limitations, is essential for the design of large-scale distributed processing systems. This will require continuous iterations between modelling and monitoring.
- Simulation and modelling tools must provide the functionality to help in designing complex systems and in evaluating different strategies and algorithms for the decision-making units and the data flow management.

http://monarc.cacr.caltech.edu/