1

MONARC Simulation Framework
  • Corina Stratan, Ciprian Dobre UPB
  • Iosif Legrand, Harvey Newman CALTECH

2
The GOALS of the Simulation Framework
  • The aim of this work is to continue and improve
    the development of the MONARC simulation
    framework
  • To perform realistic simulation and modelling of
    large scale distributed computing systems,
    customised for specific HEP applications.
  • To offer a dynamic and flexible simulation
    environment to be used as a design tool for
    large distributed systems
  • To provide a design framework to evaluate the
    performance of a range of possible computer
    systems, as measured by their ability to provide
    the physicists with the requested data in the
    required time, and to optimise the cost.

3
A Global View for Modelling

(Diagram: simulation and modelling iterate with MONITORING of REAL systems and testbeds.)
4
Design Considerations
  • This simulation framework is not intended to
    be a detailed simulator for basic components such
    as operating systems, database servers or
    routers.
  • Instead, based on realistic mathematical models
    and measured parameters on test bed systems for
    all the basic components, it aims to correctly
    describe the performance and limitations of large
    distributed systems with complex interactions.

5
Simulation Engine

6
Design Considerations of the Simulation Engine
  • A process oriented approach for discrete event
    simulation is well suited to describe concurrent
    running programs.
  • Active objects (having an execution thread, a
    program counter, stack...) provide an easy way
    to map the structure of a set of distributed
    running programs into the simulation environment.
  • The simulation engine supports an interrupt
    scheme
  • This allows effective and correct simulation of
    concurrent processes with very different time
    scales, using a DES approach with a continuous
    process flow between events

7
The Simulation Engine: Tasks and Events
  • Task: simulates an entity with time-dependent
    behavior (active object, server, ...)
  • A task has 5 possible states: CREATED, READY,
    RUNNING, FINISHED, WAITING. A READY task is
    assigned to a worker thread and starts RUNNING;
    it enters WAITING via semaphore.p() and returns
    to RUNNING via semaphore.v() when an event
    happens or its sleeping period is over.
  • Each task maintains an internal semaphore,
    necessary for switching between states.
  • Event: used for communication and
    synchronization between tasks. When a task must
    notify another task about something that happened
    or will happen in the future, it creates an event
    addressed to that task. The events are queued
    and sent to the destination tasks by the engine's
    scheduler.
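As a rough illustration of the state switching described above (the class and method names are invented for this sketch, not the framework's actual API), a task thread can block on its internal semaphore until the scheduler delivers an event:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of an active-object task: the worker thread blocks on
// an internal semaphore (p) until an event is delivered and released (v).
class SimTask implements Runnable {
    enum State { CREATED, READY, RUNNING, WAITING, FINISHED }

    volatile State state = State.CREATED;
    final Semaphore semaphore = new Semaphore(0);
    final ConcurrentLinkedQueue<String> events = new ConcurrentLinkedQueue<>();
    volatile String lastEvent;

    // Scheduler side: queue an event for this task and wake it up.
    void deliver(String event) {
        events.add(event);
        semaphore.release();          // semaphore.v()
    }

    @Override
    public void run() {
        state = State.RUNNING;
        try {
            state = State.WAITING;
            semaphore.acquire();      // semaphore.p(): wait for an event
            state = State.RUNNING;
            lastEvent = events.poll();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        state = State.FINISHED;
    }

    static SimTask demo() {
        SimTask task = new SimTask();
        Thread worker = new Thread(task);  // task assigned to a worker thread
        task.state = State.READY;
        worker.start();
        task.deliver("data-ready");        // another task notifies this one
        try { worker.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return task;
    }

    public static void main(String[] args) {
        SimTask t = SimTask.demo();
        System.out.println(t.state + " after event: " + t.lastEvent);
    }
}
```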
8
Tests of the Engine
Processing a TOTAL of 100,000 simple jobs on 1,
10, 100, 1000, 2000, 4000 and 10,000 CPUs, using
the same number of parallel threads.
More tests: http://monarc.cacr.caltech.edu/
9
Basic Components
10
Basic Components
  • These basic components are capable of simulating
    the core functionality of general distributed
    computing systems. They are built on top of the
    simulation engine and make efficient use of the
    interrupt functionality implemented for the
    active objects.
  • These components should be considered the base
    classes from which specific components can be
    derived and constructed

11
Basic Components
  • Computing Nodes
  • Network Links and Routers, I/O protocols
  • Data Containers
  • Servers
  • Data Base Servers
  • File Servers (FTP, NFS)
  • Jobs
  • Processing Jobs
  • FTP jobs
  • Scripts / Graph execution schemes
  • Basic Scheduler
  • Activities (a time sequence of jobs)

12
Multitasking Processing Model

Concurrent running tasks share resources (CPU,
memory, I/O).
Interrupt-driven scheme: for each new task, or
when one task is finished, an interrupt is
generated and all processing times are recomputed.
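The recomputation at each interrupt can be sketched numerically (the class below is invented for this example; a single CPU is shared equally among its active jobs):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not the actual MONARC classes) of the interrupt-driven
// multitasking model: jobs share one CPU equally; on every interrupt the
// remaining work of each active job is updated and per-job speed recomputed.
class CpuNode {
    final double power;              // total CPU power, in work units/second
    final List<double[]> jobs = new ArrayList<>(); // each entry: { remainingWork }
    double lastInterrupt = 0.0;      // simulation time of the last recompute

    CpuNode(double power) { this.power = power; }

    // Advance all active jobs to 'now' at the current equal share.
    void recompute(double now) {
        if (!jobs.isEmpty()) {
            double share = power / jobs.size();
            double elapsed = now - lastInterrupt;
            for (double[] j : jobs) j[0] -= share * elapsed;
        }
        lastInterrupt = now;
    }

    void submit(double now, double work) {   // a new task => interrupt
        recompute(now);
        jobs.add(new double[]{ work });
    }

    // Estimated completion time of job i, assuming no further interrupts.
    double estimateFinish(int i) {
        double share = power / jobs.size();
        return lastInterrupt + jobs.get(i)[0] / share;
    }

    public static void main(String[] args) {
        CpuNode cpu = new CpuNode(100.0);        // 100 units/s
        cpu.submit(0.0, 1000.0);                 // alone: would finish at t=10
        System.out.println(cpu.estimateFinish(0));
        cpu.submit(5.0, 500.0);                  // interrupt at t=5
        // first job has 500 units left at 50 units/s => now finishes at t=15
        System.out.println(cpu.estimateFinish(0));
    }
}
```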
13
LAN/WAN Simulation Model
(Diagram: Nodes attached to LANs through Links,
with ROUTERs and Internet Connections between
sites.)
Interrupt-driven simulation: for each new message
an interrupt is created, and for all the active
transfers the speed and the estimated time to
complete the transfer are recalculated.
Continuous Flow between events! An efficient and
realistic way to simulate concurrent transfers
having different sizes / protocols.
14
Network model
  • data traffic simulated for both local and wide
    area networks
  • a simulation at the packet level is practically
    impossible
  • we adopted a larger scale approach, based on an
    interrupt mechanism
  • Network Entity
  • LAN, WAN, LinkPort
  • main attribute: bandwidth
  • keeps track of the messages that traverse it

Components of the network model
15
Simulating the network transfers
  • interrupt mechanism similar to the one used for
    job execution simulation
  • the initial speed of a message is determined by
    evaluating the bandwidth that each entity on the
    route can offer
  • different network protocols can be modelled

(Diagram: a newMessage from a Caltech CPU to a
CERN CPU crosses a LinkPort, the Caltech LAN, the
Caltech and CERN routers and WANs, and the CERN
LAN; interrupts (INT) are raised, affecting
Message1, Message2 and Message3, which are active
on parts of the same route.)
1. The route and the available bandwidth for the
new message are determined.
2. The messages on the route are interrupted and
their speeds are recalculated.
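A much simplified sketch of this recalculation follows (all class and entity names are illustrative; a real fair-share allocation would also redistribute bandwidth left unused by bottlenecked transfers):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Rough sketch of the interrupt-driven network model: a transfer's speed is
// the smallest share it can get from the entities on its route; when a new
// message starts, every transfer crossing those entities is recomputed.
class NetModel {
    // bandwidth of each network entity, in Mbps
    final Map<String, Double> bandwidth = new HashMap<>();
    // active transfers: message name -> route (list of entity names)
    final Map<String, List<String>> transfers = new LinkedHashMap<>();

    int load(String entity) {         // transfers crossing an entity
        int n = 0;
        for (List<String> route : transfers.values())
            if (route.contains(entity)) n++;
        return n;
    }

    // Equal-share speed of one transfer, limited by its busiest entity.
    double speed(String name) {
        double s = Double.MAX_VALUE;
        for (String entity : transfers.get(name))
            s = Math.min(s, bandwidth.get(entity) / load(entity));
        return s;
    }

    public static void main(String[] args) {
        NetModel net = new NetModel();
        net.bandwidth.put("LAN", 1000.0);
        net.bandwidth.put("WAN", 100.0);
        net.transfers.put("msg1", Arrays.asList("LAN", "WAN"));
        System.out.println(net.speed("msg1"));  // alone: limited by the WAN
        net.transfers.put("msg2", Arrays.asList("LAN", "WAN"));
        // interrupt: both transfers now share the WAN bottleneck
        System.out.println(net.speed("msg1"));
    }
}
```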
16
Job Scheduling and Execution
    class Activity1 extends Activity {
        public void pushJobs() {
            Job newJob = new Job();
            addJob(newJob);
        }
    }
(Diagram: Activity1 and Activity2 inject Job 3 to
Job 7, requiring 30 to 50 CPU units each, onto
CPU units 1 to 3; an interrupt (INT) is raised on
each unit where a new job lands.)
1. The activity class creates a job and submits
it to the farm.
2. The job scheduler sends the new job to a CPU
unit. All the jobs executing on that CPU are
interrupted.
3. CPU power is reallocated on the unit where the
new job was scheduled. The interrupted jobs
re-estimate their completion time.
17
Output of the simulation
(Diagram: components such as Nodes, Routers and
DBs send results through the Simulation Engine to
Output Listener Filters, which feed graphics
clients, log files / Excel, and user code.)
Any component in the system can generate generic
result objects. Any client can subscribe with a
filter and will receive only the results it is
interested in. The structure is very similar to
the one in MonALISA; we will soon integrate the
output of the simulation framework into MonALISA.
18
Specific Components
19
Specific Components
  • These components should be derived from the
    basic components and must implement their
    specific characteristics and the way they
    operate.
  • Major Parts
  • Data Model
  • Data Flow Diagrams for Production and
    especially for Analysis Jobs
  • Scheduling / pre-allocation policies
  • Data Replication Strategies

20
Data Model
  • Generic Data Container attributes:
  • Size
  • Event Type
  • Event Range
  • Access Count
  • INSTANCE

(Diagram: the META DATA Catalog / Replication
Catalog tracks container INSTANCEs stored on
network FILEs, Data Bases and Custom Data
Servers, i.e. FTP Server Nodes, DB Servers and
NFS Servers, with Export / Import between sites.)
21
Data Model (2)
(Diagram: a data processing JOB issues a Data
Request to the META DATA Catalog / Replication
Catalog, selects a Data Container from the
returned options, and builds a list of I/O
transactions.)
22
Database Functionality
Automatic storage management example
  • Client-server model
  • Automatic storage management is possible, with
    data being sent to mass storage units

(Diagram: a DatabaseServer holds databases DB1
and DB2 with DContainers 1 to 24; when myJob
writes and DB1 is full, containers overflow to
Mass Storage 1 and 2.)
  • 3 kinds of requests for the database server
  • write
  • read
  • get (read the data and erase it from the server)

1. The job wants to write a container into the
database DB1, but the server is out of storage
space.
2. The least frequently used container is moved
to a mass storage unit. The new container is
written to the database.
23
Data Flow Diagrams for JOBS
Each input and output is a collection of data,
described by type and range.
A process is described by its name.
(Diagram: Processing 1 feeds its output as input
to Processing 2 and Processing 3; their outputs
feed Processing 4, which is executed 10x.)
A fine-granularity decomposition into processes
which can be executed independently, together
with the way they communicate, can be very useful
for optimization and parallel execution!
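One way such a decomposition helps parallel execution is by grouping processes into "waves" whose inputs are already available; a minimal sketch with an invented graph class (not part of the framework):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of a data-flow job graph: each process lists its prerequisites,
// and waves() returns successive groups of processes that can run in
// parallel because all their inputs are ready. Assumes an acyclic graph.
class ProcessGraph {
    final Map<String, List<String>> deps = new LinkedHashMap<>();

    void add(String node, String... prerequisites) {
        deps.put(node, Arrays.asList(prerequisites));
    }

    List<List<String>> waves() {
        List<List<String>> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < deps.size()) {
            List<String> wave = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : deps.entrySet())
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue()))
                    wave.add(e.getKey());
            result.add(wave);
            done.addAll(wave);
        }
        return result;
    }

    public static void main(String[] args) {
        ProcessGraph g = new ProcessGraph();
        g.add("Processing 1");
        g.add("Processing 2", "Processing 1");
        g.add("Processing 3", "Processing 1");
        g.add("Processing 4", "Processing 2", "Processing 3");
        // Processing 2 and Processing 3 land in the same wave: they are
        // independent and can execute in parallel.
        System.out.println(g.waves());
    }
}
```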
24
Job Scheduling: Centralized Scheme
(Diagram: the JobSchedulers of Site A and Site B,
implemented as dynamically loadable modules, are
coordinated by a GLOBAL Job Scheduler.)
25
Job Scheduling: Distributed Scheme (market model)
(Diagram: the JobScheduler of one site sends a
Request to the JobSchedulers of Site A and Site
B, receives a COST estimate from each, and takes
the DECISION where to run the job.)
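The market-model decision step might be sketched like this (the cost function, jobs per CPU, is an assumption made for illustration, not the framework's actual metric):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the distributed "market model": the scheduler asks each site
// for the cost of running a job and exports it to the cheapest one.
class MarketScheduler {
    static class Site {
        final String name;
        final int runningJobs, cpus;
        Site(String name, int runningJobs, int cpus) {
            this.name = name; this.runningJobs = runningJobs; this.cpus = cpus;
        }
        // Assumed cost function: current load per CPU.
        double cost() { return (double) runningJobs / cpus; }
    }

    // DECISION: pick the site that answered with the lowest COST.
    static Site decide(List<Site> sites) {
        Site best = sites.get(0);
        for (Site s : sites)
            if (s.cost() < best.cost()) best = s;
        return best;
    }

    public static void main(String[] args) {
        List<Site> sites = Arrays.asList(
            new Site("Site A", 40, 20),   // cost 2.0
            new Site("Site B", 10, 20));  // cost 0.5
        System.out.println("job exported to " + decide(sites).name);
    }
}
```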
26
Computing Models
27
Activities: Arrival Patterns
A flexible mechanism to define the stochastic
process of how users perform data processing
tasks.
Dynamic loading of Activity tasks, which are
threaded objects controlled by the simulation
scheduling mechanism.
Physics activities inject jobs: each Activity
thread generates data processing jobs. These
dynamic objects are used to model the users'
behavior.
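A common way to model such an arrival pattern is a Poisson process, i.e. exponentially distributed gaps between job submissions; a small sketch (illustrative, not the framework's Activity API):

```java
import java.util.Random;

// Sketch of a stochastic arrival pattern for an Activity: jobs are injected
// with exponentially distributed inter-arrival times of a given mean.
class ActivitySketch {
    final Random rnd;
    final double meanInterarrival; // average seconds between job submissions

    ActivitySketch(double meanInterarrival, long seed) {
        this.meanInterarrival = meanInterarrival;
        this.rnd = new Random(seed);
    }

    // Draw the next inter-arrival gap: -mean * ln(1 - U), U uniform in [0,1).
    double nextGap() {
        return -meanInterarrival * Math.log(1.0 - rnd.nextDouble());
    }

    // Simulation times at which the first n jobs are submitted.
    double[] submissionTimes(int n) {
        double[] t = new double[n];
        double now = 0;
        for (int i = 0; i < n; i++) { now += nextGap(); t[i] = now; }
        return t;
    }

    public static void main(String[] args) {
        // mean gap 172.8 s corresponds to roughly 500 jobs per day
        ActivitySketch act = new ActivitySketch(172.8, 42);
        for (double x : act.submissionTimes(5))
            System.out.printf("submit job at t=%.1f s%n", x);
    }
}
```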
28
Regional Centre Model
  • Complex Composite
    Object

Simplified topology of the Centers
29
MONARC - Main Classes
30
Monitoring
31
Real Need for Flexible Monitoring Systems
  • It is important to measure and monitor the key
    applications in a well defined test environment
    and to extract the parameters we need for
    modeling
  • Monitor the farms used today, and try to
    understand how they work and simulate such
    systems.
  • It requires a flexible monitoring system able to
    dynamically add new parameters and provide
    access to historical data
  • Interfacing monitoring tools to get the
    parameters we need in simulations in a nearly
    automatic way
  • MonALISA was designed and developed based on the
    experience with the simulation problems.

32
EXAMPLES
33
FTP and NFS clusters
  • These examples evaluate the performance of a
    local area network with a server and several
    worker stations. The server stores events used
    by the processing nodes.
  • NFS Example: the server concurrently delivers
    the events, one by one, to the clients.
  • FTP Example: the server sends a whole file with
    events in a single transfer

34
FTP Cluster
50 CPU units x 2 jobs per unit; 100 events per
job, event size 1 MB. LAN bandwidth 1 Gbps,
server's effective bandwidth 60 Mbps.
35
NFS Cluster
36
Distributed Scheduling
  • Job Migration: when a regional center is
    assigned too many jobs, it sends a part of them
    to other centers with more free resources
  • New job scheduler implemented, which supports
    job migration, applying load balancing criteria

(Diagram: an overloaded Regional Center exports
its surplus jobs to other centers via export()
calls.)
We tested different configurations, with 1, 2 and
4 regional centers, and with different numbers of
CPUs per regional center. The number of jobs
submitted is kept constant, the job arrival rate
varying during a day.
37
Distributed Scheduling (2)
  • Test Case
  • 4 regional centers, 20 CPUs per center
  • average job processing time 3h, approx. 500 jobs
    per day submitted in a center

Average processing time and CPU usage for 1, 2,
4, 6 centers
38
Distributed Scheduling (3)
  • similar to the previous example, but the jobs
    are more complex, involving network transfers
  • centers connected in a chain configuration

Chain WAN connection
Every job submitted to a regional center needs an
amount of data located in that center. If the job
is exported to another center, would the benefits
be great enough to compensate the cost of the
data transfer?
39
Distributed Scheduling (4)
The average processing time significantly
increases when reducing the bandwidth and the
number of CPUs
The network transfers are more intense in the
centers from the middle of the chain (like
Caltech)
40
Distributed Scheduling (5)
41
Local Data Replication
  • Evaluates the performance improvements that can
    be obtained by replicating data.
  • We simulated a regional center which has a
    number of database servers, and another four
    centers which host jobs that process the data on
    those database servers
  • Better performance can be obtained if the data
    from the servers is replicated into the other
    regional centers

42
Local Data Replication (2)
43
WAN Data Replication
  • similar to the previous example, but now with
    two central servers, each holding an equal amount
    of replicated data, and eight satellite regional
    centers, hosting worker jobs
  • a worker job will get a number of events from one
    of the central regional centers (one event at a
    time) and process them locally

The workers choose the best server to get the
data from, using a replication load-balancing
service (which knows the load of the network and
of the servers), versus choosing the server
randomly.
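The two selection policies can be sketched as follows (the load metric and class names are assumptions for illustration, not the actual load-balancing service):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Sketch of replica-server selection: pick the server with the lowest
// combined server/network load, versus picking one at random.
class ReplicaChooser {
    static class Server {
        final String name;
        final double serverLoad, networkLoad; // normalized load estimates
        Server(String name, double serverLoad, double networkLoad) {
            this.name = name;
            this.serverLoad = serverLoad;
            this.networkLoad = networkLoad;
        }
    }

    static Server loadBalanced(List<Server> servers) {
        Server best = servers.get(0);
        for (Server s : servers)
            if (s.serverLoad + s.networkLoad < best.serverLoad + best.networkLoad)
                best = s;
        return best;
    }

    static Server random(List<Server> servers, Random rnd) {
        return servers.get(rnd.nextInt(servers.size()));
    }

    public static void main(String[] args) {
        List<Server> servers = Arrays.asList(
            new Server("central-1", 0.8, 0.6),
            new Server("central-2", 0.3, 0.2));
        System.out.println(loadBalanced(servers).name);
    }
}
```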
44
WAN Data Replication
Both servers have the same bandwidth and support
the same maximum load
Better average response time; the total execution
time is smaller when taking decisions based on
load balancing.
One server has half of the other's bandwidth
and supports half of its maximum load.
45
Summary
  • Modelling and understanding current systems,
    their performance and limitations, is essential
    for the design of the large scale distributed
    processing systems. This will require continuous
    iterations between modelling and monitoring
  • Simulation and Modelling tools must provide the
    functionality to help in designing complex
    systems and evaluate different strategies and
    algorithms for the decision making units and the
    data flow management.

http://monarc.cacr.caltech.edu/