Title: Part I: Introduction
1. What Mum Never Told Me about Parallel Simulation
Karim Djemame, Informatics Research Lab, School of Computing, University of Leeds
2. Plan of the Lecture
- Goals
  - Learn about issues in the design and execution of Parallel Discrete Event Simulation (PDES)
- Overview
  - Discrete Event Simulation: a Review
  - Parallel Simulation: a Definition
  - Applications
  - Synchronisation Algorithms
    - Conservative
    - Optimistic
    - Synchronous
  - Parallel Simulation Languages
  - Performance Issues
  - Conclusion
3. Why Simulation?
- Mathematical models are too abstract for complex systems
- Building real systems with multiple configurations is too expensive
- Simulation is a good compromise!
4. Discrete Event Simulation (DES)
- A DES system can be viewed as a collection of simulated objects and a sequence of event computations
- Changes in the state of the model occur at discrete points in time
- The passage of time is modelled using a simulation clock
- Event scheduling is the most widely used approach
  - it provides locality in time: each event describes related actions that may all occur in a single instant
- The model maintains a list of events (the Event List, EVL) that
  - have been scheduled
  - have not occurred yet
5. Processing the Event List on a Uni-processor Computer
An event contains two fields of information:
- the event it represents (e.g. an arrival in a queue)
- the time of occurrence: the time when the event should happen, also called the timestamp
[Figure: the event list (EVL) shown as a row of events e1, e2, ..., en with timestamps 7, 9, ..., 20]
The event list
- contains the events
- is always ordered by increasing time of occurrence
The events are processed sequentially by a single processor
6. Event-Driven Simulation Engine
(1) Remove the first event (lowest time of occurrence) from the EVL
(2) Execute the corresponding event routine and modify the state (S) accordingly
(3) Based on the new S, schedule new future events
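The three steps of the engine can be sketched in a few lines of Python. This is only an illustrative sketch, not the lecture's own code: the Simulator class, the arrival and departure handlers, and the 2-time-unit service delay are all assumptions made for the example. The event list is kept ordered with a binary heap.

```python
import heapq
import itertools


class Simulator:
    """Minimal sequential event-driven engine built around an ordered event list (EVL)."""

    def __init__(self):
        self.clock = 0.0
        self.state = {"queue_length": 0}   # the model state S (hypothetical example state)
        self._evl = []                     # heap of (timestamp, sequence number, handler)
        self._seq = itertools.count()      # tie-breaker: equal timestamps stay in FIFO order

    def schedule(self, timestamp, handler):
        heapq.heappush(self._evl, (timestamp, next(self._seq), handler))

    def run(self, end_time):
        while self._evl and self._evl[0][0] <= end_time:
            # (1) remove the event with the lowest time of occurrence
            timestamp, _, handler = heapq.heappop(self._evl)
            self.clock = timestamp
            # (2) execute the event routine; (3) the routine may schedule new events
            handler(self)


def arrival(sim):
    sim.state["queue_length"] += 1
    sim.schedule(sim.clock + 2.0, departure)   # assumed delay of 2 time units


def departure(sim):
    sim.state["queue_length"] -= 1


sim = Simulator()
for t in (7.0, 9.0, 20.0):
    sim.schedule(t, arrival)
sim.run(end_time=25.0)
print(sim.clock, sim.state)
```

The sequence number in each heap entry is a design choice of this sketch: it keeps events with equal timestamps in scheduling order and avoids comparing handler functions.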
7. Why change? It's so simple!
- Models become larger and larger
- The simulation time is overwhelming or the simulation is simply intractable
- Examples
  - parallel programs with millions of lines of code,
  - mobile networks with millions of mobile hosts,
  - networks with hundreds of complex switches and routers,
  - multicast models with thousands of sources,
  - the ever-growing Internet,
  - and much more...
8. Some Figures to Convince...
- ATM network models
  - simulation at the cell level,
  - 200 switches,
  - 1000 traffic sources, 50 Mbit/s,
  - 155 Mbit/s links,
  - 1 simulation event per cell arrival.
More than 26 billion events to simulate 1 second! 30 hours if 1 event is processed in 1 µs
- simulation time increases as link speed increases,
- usually more than 1 event per cell arrival,
- how scalable is traditional simulation?
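A quick back-of-the-envelope check of these figures, as a sketch that uses only the two numbers quoted above (26 billion events, 1 µs per event). Taken at face value, 26 billion events at exactly 1 µs each come to roughly 7 hours of pure event processing; the 30-hour figure corresponds to about 4 µs per event, so it presumably assumes a higher effective per-event cost than 1 µs. The function names are hypothetical.

```python
EVENTS_PER_SIMULATED_SECOND = 26e9   # figure quoted on the slide


def wall_clock_hours(events, seconds_per_event):
    """Hours of sequential processing for a given number of events."""
    return events * seconds_per_event / 3600.0


def per_event_cost_for(events, hours):
    """Per-event cost (in seconds) implied by a given total running time."""
    return hours * 3600.0 / events


hours_at_1us = wall_clock_hours(EVENTS_PER_SIMULATED_SECOND, 1e-6)
cost_for_30h = per_event_cost_for(EVENTS_PER_SIMULATED_SECOND, 30.0)
print(f"{hours_at_1us:.1f} hours at 1 us per event")
print(f"{cost_for_30h * 1e6:.2f} us per event for a 30-hour run")
```

Either way the order of magnitude is the point: hours of computation for a single simulated second.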
9. Motivation for Parallel Simulation
- Sequential simulation is very slow
- Sequential simulation does not exploit the parallelism inherent in models
- So why not use multiple processors?
- A variety of parallel simulation protocols exists
- Parallel simulation tools are available to achieve a speedup over the sequential simulator
10. Processing the Event List on a Multi-Processor Computer
- The events are processed by many processors. Example:
[Figure: processors p1 and p2 against a time axis, showing Event 1 at time 7, Event 3 at time 9 and Event 2 at time 14]
- Processor 1 generates event 3 at time 9, to be processed by processor 2
- Processor 2 has already processed event 2 at time 14
Problem: the future can affect the past! This is the causality problem
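The scenario above can be replayed in a few lines. This is a toy sketch under stated assumptions: the Processor class and its process method are hypothetical names, and each processor simply remembers the timestamp of the last event it processed so that a message arriving in its past can be detected.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    clock: float = 0.0                          # timestamp of the latest event processed
    processed: list = field(default_factory=list)

    def process(self, event_name, timestamp):
        """Process an event; return False if it lies in this processor's past."""
        in_order = timestamp >= self.clock
        self.processed.append((event_name, timestamp, in_order))
        self.clock = max(self.clock, timestamp)
        return in_order


p1, p2 = Processor("p1"), Processor("p2")
p1.process("event 1", 7)         # p1 handles event 1 and generates event 3 for p2
p2.process("event 2", 14)        # meanwhile p2 has already advanced to time 14
ok = p2.process("event 3", 9)    # event 3 arrives with a timestamp in p2's past
print("causality preserved" if ok else "causality violated")
```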
11. Causal Dependencies
- Scheduled events in timestamp order
[Figure: EVL holding (e1, 7), (e2, 9), (e3, 14), (e4, 20), (e5, 27), (e6, 40)]
- Sequence ordered by causal dependencies
[Figure: EVL split into two chains, (e1, 7), (e2, 9), (e4, 20), (e6, 40) and (e3, 14), (e5, 27)]
- Causal dependencies mean restrictions
- The sequence of events (e1, e2, e4, e6) can be executed in parallel with (e3, e5)
- If any event were simulated together with e1: violation of causal dependencies
12. Parallel Simulation - Principles
- Execution of a discrete event simulation on a parallel or distributed system with several physical processors
- The simulation model is decomposed into several sub-models (Logical Processes, LPs) that can be executed in parallel
  - spatial partitioning
  - LPs communicate by sending timestamped messages
- Fundamental concepts
  - each LP can be at a different simulation time
  - local causality constraint: events in each LP must be executed in timestamp order
13. Parallel Simulation: example 1
14. Parallel Simulation: example 2
[Figure: five logical processes (LPs)]
- Logical processes (LPs) modelling airports, air traffic sectors, aircraft, etc.
- LPs interact by exchanging messages (events modelling aircraft departures, landings, etc.)
15. Synchronisation Mechanisms
- Synchronisation Algorithms
  - Conservative: avoids local causality violations by waiting until it is safe to process a message or event
  - Optimistic: allows local causality violations, but provisions are made to recover from them at runtime
  - Synchronous: all LPs process messages/events with the same timestamp in parallel
16. PDES Applications
- VLSI circuit simulation
- Parallel computing
- Communication networks
- Combat scenarios
- Health care systems
- Road traffic
- Simulation of models
- Queueing networks
- Petri nets
- Finite state machines
17. Conservative Protocols
- Architecture of a conservative LP
- The Chandy-Misra-Bryant protocol
- The lookahead ability
18. Architecture of a Conservative LP
- LPs communicate by sending messages with non-decreasing timestamps
- each LP keeps a static FIFO channel for each LP with incoming communication
- each FIFO channel (input channel, IC) has a clock Ci that ticks according to the timestamp of the topmost message, if any; otherwise it keeps the timestamp of the last message
19. A Simple Conservative Algorithm
- each LP has to process events in timestamp order to avoid local causality violations
The Chandy-Misra-Bryant algorithm
while (simulation is not over)
    determine the ICi with the smallest Ci
    if (ICi empty)
        wait for a message
    else
        remove topmost event from ICi
        process event
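The loop above can be sketched for a single LP as follows. This is a minimal illustration, not a full Chandy-Misra-Bryant implementation: the InputChannel class, the run_until_blocked function and the timestamps fed into the three channels are assumptions chosen for the example, and "waiting for a message" is modelled by simply returning the name of the channel the LP would block on.

```python
from collections import deque


class InputChannel:
    """FIFO input channel whose clock follows the head message, else the last one removed."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.last_timestamp = 0   # timestamp of the last message removed

    def put(self, timestamp, payload):
        # Senders are assumed to emit non-decreasing timestamps on a channel.
        self.queue.append((timestamp, payload))

    @property
    def clock(self):
        return self.queue[0][0] if self.queue else self.last_timestamp

    def pop(self):
        timestamp, payload = self.queue.popleft()
        self.last_timestamp = timestamp
        return timestamp, payload


def run_until_blocked(channels):
    """Process events in timestamp order until the channel with the smallest clock is empty."""
    processed = []
    while True:
        channel = min(channels, key=lambda c: c.clock)
        if not channel.queue:
            # A real LP would block here and wait for a message on this channel.
            return processed, channel.name
        processed.append(channel.pop())


ic1, ic2, ic3 = InputChannel("IC1"), InputChannel("IC2"), InputChannel("IC3")
for ts in (3, 6, 10):
    ic1.put(ts, "msg")
for ts in (1, 4, 7):
    ic2.put(ts, "msg")
for ts in (5, 9):
    ic3.put(ts, "msg")

processed, blocked_on = run_until_blocked([ic1, ic2, ic3])
print([ts for ts, _ in processed], "blocked on", blocked_on)
```

With these illustrative inputs the LP processes timestamps 1 to 7 and then blocks on IC2, even though messages with timestamps 9 and 10 are already queued on the other channels: it cannot know that nothing earlier will arrive on IC2.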
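20. Safe but Has to Block
[Figure: logical processes LPA, LPB, LPC and LPD connected through input channels IC1, IC2 and IC3, which hold timestamped messages (values shown: 3, 6, 10, 1, 4, 7, 5, 9)]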
21. Blocks and Even Deadlocks!
[Figure: nodes A, B, S and a merge point M; one LP is marked BLOCKED]
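22. How to Solve Deadlock: Null-Messages
Use of null-messages for artificial propagation of simulation time
[Figure: nodes A, B, S and merge point M; null-messages with timestamp 10 are propagated and the blocked LP becomes UNBLOCKED]
What frequency?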
23. How to Solve Deadlock: Null-Messages
A null-message indicates a Lower Bound Time Stamp
The minimum delay between links is 4
LP C is initially at simulation time 0
[Figure: LPs A, B and C exchanging null-messages; timestamps shown: 11, 9, 7, 10]
24. The Lookahead Ability
- Null-messages are sent by an LP to indicate a lower bound on the timestamps of the future messages that it will send
- null-messages rely on the "lookahead" ability
  - communication link delays
  - server processing time (FIFO)
- lookahead is very application-model dependent and needs to be explicitly identified
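To make the role of lookahead concrete, here is a small sketch under simplifying assumptions: three LPs A, B and C form a ring, only A holds a pending event, and every LP has the same lookahead. Each null-message carries the sender's safe time plus the lookahead. The function name circulate_null_messages and the scenario are hypothetical, chosen only to show how the number of null-messages depends on the lookahead.

```python
def circulate_null_messages(event_time, lookahead, ring=("A", "B", "C")):
    """Return the null-messages sent around a ring before ring[0] may safely process its event."""
    if lookahead <= 0:
        raise ValueError("with zero lookahead the timestamps never advance (deadlock)")
    channel_clock = {name: 0 for name in ring}   # clock of each LP's single input channel
    sent = []
    hop = 0
    while channel_clock[ring[0]] < event_time:
        sender = ring[hop % len(ring)]
        receiver = ring[(hop + 1) % len(ring)]
        # Only ring[0] holds a pending event; the other LPs have nothing scheduled locally.
        next_local = event_time if sender == ring[0] else float("inf")
        # Lower bound on the timestamp of anything the sender may send in the future.
        bound = min(channel_clock[sender], next_local) + lookahead
        channel_clock[receiver] = bound
        sent.append((sender, receiver, bound))
        hop += 1
    return sent


for la in (4, 1):
    messages = circulate_null_messages(event_time=10, lookahead=la)
    print(f"lookahead {la}: {len(messages)} null-messages")
```

In this toy setting a lookahead of 4 needs 3 null-messages before A may process its event at time 10, while a lookahead of 1 needs 12, which is one way to see why a large lookahead matters for conservative protocols.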
25. Conservative: Pros and Cons
- Pros
  - simple, easy to implement
  - good performance when lookahead is large (communication networks, FIFO queue)
- Cons
  - pessimistic in many cases
  - large lookahead is essential for performance
  - no transparent exploitation of parallelism
  - performance may drop even with small changes in the model (adding preemption, adding one small-lookahead link)
26. Optimistic Protocols
- Architecture of an optimistic LP
- Time Warp
27. Architecture of an Optimistic LP
- LPs send timestamped messages, not necessarily in non-decreasing timestamp order
- no static communication channels between LPs; dynamic creation of LPs is easy
- each LP processes events as they are received, no need to wait for safe events
- local causality violations are detected and corrected at runtime
- Best-known optimistic mechanism: Time Warp
28. Processing Events as They Arrive
[Figure: logical processes LPA, LPB, LPC and LPD exchanging messages]
What to do with late messages?
29. Time Warp
30. Time Warp: Rollback - How?
- Late messages (stragglers) are handled with a rollback mechanism
  - undo false/incorrect local computations,
    - state saving: save the state variables of an LP
    - reverse computation
  - undo false/incorrect remote computations,
    - anti-messages: anti-messages and (real) messages annihilate each other
  - process late messages
  - re-process previous messages: processed events are NOT discarded!
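The rollback steps can be illustrated with a toy optimistic LP. This is a simplified sketch, not the Time Warp implementation: the TimeWarpLP class is hypothetical, its state is just a running sum of payloads, every event is assumed to send exactly one message, state is saved before every event, and anti-messages are only recorded locally rather than delivered to other LPs. Reverse computation is not shown.

```python
class TimeWarpLP:
    """Toy optimistic LP whose state is a running sum of the payloads it has processed."""

    def __init__(self):
        self.clock = 0
        self.state = 0
        self.processed = []       # (timestamp, payload), kept so events can be re-processed
        self.snapshots = []       # (timestamp, state before that event): state saving
        self.sent = []            # (send time, value) of messages sent to other LPs
        self.anti_messages = []   # cancellations emitted during rollbacks

    def _execute(self, timestamp, payload):
        self.snapshots.append((timestamp, self.state))
        self.state += payload
        self.clock = timestamp
        self.processed.append((timestamp, payload))
        self.sent.append((timestamp, self.state))   # assumption: one message per event

    def _rollback(self, straggler_time):
        keep = [e for e in self.processed if e[0] <= straggler_time]
        undone = [e for e in self.processed if e[0] > straggler_time]
        # Undo local computations: restore the state saved before the first undone event.
        self.state = self.snapshots[len(keep)][1]
        self.snapshots = self.snapshots[:len(keep)]
        self.processed = keep
        self.clock = keep[-1][0] if keep else 0
        # Undo remote computations: cancel the messages sent by the undone events.
        self.anti_messages += [m for m in self.sent if m[0] > straggler_time]
        self.sent = [m for m in self.sent if m[0] <= straggler_time]
        return undone

    def receive(self, timestamp, payload):
        redo = self._rollback(timestamp) if timestamp < self.clock else []
        # Process the late message, then re-process the events that were undone.
        for ts, data in sorted(redo + [(timestamp, payload)]):
            self._execute(ts, data)


lp = TimeWarpLP()
for ts, payload in [(5, 1), (10, 2), (20, 4)]:
    lp.receive(ts, payload)
lp.receive(7, 8)   # straggler: forces a rollback to time 5
print(lp.state, [ts for ts, _ in lp.processed], lp.anti_messages)
```

After the straggler at time 7, the events at times 10 and 20 are undone, their messages are cancelled, and both events are executed again in timestamp order.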
31. Need for a Global Virtual Time
- Motivations
  - an indicator that the simulation time advances
  - reclaim memory (fossil collection)
- Basically, GVT is the minimum of
  - all LPs' logical simulation times
  - timestamps of messages in transit
- GVT guarantees that
  - events below GVT are definitive events
  - no rollback can occur before the GVT
  - state points before GVT can be reclaimed
  - anti-messages before GVT can be reclaimed
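The definition translates directly into code when all clocks and in-transit messages are visible in one place, which is an assumption of this sketch (a real distributed GVT algorithm has to gather this information across processors). The function names and the sample values are hypothetical. The fossil collection shown keeps the newest saved state older than GVT, a common precaution so that a rollback to exactly GVT can still restore a state.

```python
def global_virtual_time(lp_clocks, in_transit_timestamps):
    """GVT: minimum over all LP clocks and the timestamps of messages in transit."""
    return min(list(lp_clocks) + list(in_transit_timestamps))


def fossil_collect(snapshots, gvt):
    """Reclaim saved states older than GVT (snapshots are sorted by timestamp)."""
    older = [s for s in snapshots if s[0] < gvt]
    recent = [s for s in snapshots if s[0] >= gvt]
    # Keep the newest older snapshot so a rollback to GVT still has a state to restore.
    return older[-1:] + recent


clocks = {"A": 30, "B": 18, "C": 25}
in_transit = [12, 40]
gvt = global_virtual_time(clocks.values(), in_transit)

saved_states = [(5, "s5"), (10, "s10"), (15, "s15"), (20, "s20")]
kept = fossil_collect(saved_states, gvt)
print(gvt, kept)
```

Note that the message in transit with timestamp 12 holds GVT back even though every LP clock is already beyond it.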
32. Time Warp - Overheads
- Periodic state savings
  - states may be large, very large!
  - copies are very costly
- Periodic GVT computations
  - costly in a distributed architecture,
  - may block computations,
- Rollback thrashing
  - cascaded rollback, no advancement!
- Memory!
  - memory is THE limitation
33. Optimistic Mechanisms: Pros and Cons
- Pros
  - exploits all the parallelism in the model, lookahead is less important
  - transparent to the end-user
  - can be general-purpose
- Cons
  - very complex, needs lots of memory
  - large overheads (state saving, GVT, rollbacks)
34. Mixed/Adaptive Approaches
- General framework that (automatically) switches to conservative or optimistic
- Adaptive approaches may determine at runtime the amount of conservatism or optimism
35. Synchronous Protocols
- Architecture of a synchronous LP
36. Synchronous Protocols
"All for one and one for all!" (Tous pour un et un pour tous!)
The Three Musketeers, Alexandre Dumas (1802-1870)
37. A Simple Synchronous Algorithm
- avoids local causality violations
- each LP has the same data structures as a single sequential simulator
- a global clock is shared among all LPs (same value)
- some data structures are private
[Figure: the LPs report their minimum timestamps (8, 5, 10 and 12); the global clock is set to the minimum, 5]
38. A Simple Synchronous Algorithm
clock = 0
while (simulation is not over)
    t = minimum_timestamp()
    clock = global_minimum(t)
    simulate_events(clock)
    synchronise()
Basic operations
1. Computation of minimum timestamp: reduction operation
2. Event consumption
3. Message distribution
4. Message reception: barrier operation
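The lock-step loop can be sketched in a single process as follows. This is an illustration under stated assumptions: the synchronous_run function, the LP names, the initial events and the forward handler are hypothetical, and the per-LP loop that would run in parallel on real processors is executed sequentially here. The reduction is a plain min over the local minima, and the end of each round plays the role of the barrier.

```python
import heapq


def synchronous_run(event_lists, handler, end_time):
    """Each round, every LP processes only the events stamped with the global minimum time."""
    queues = {lp: list(events) for lp, events in event_lists.items()}
    for q in queues.values():
        heapq.heapify(q)
    trace = []
    while True:
        # 1. Reduction: each LP reports its local minimum; the global clock is the smallest.
        local_minima = [q[0][0] for q in queues.values() if q]
        if not local_minima or min(local_minima) > end_time:
            return trace
        clock = min(local_minima)
        outbox = []
        # 2. Event consumption: one iteration per LP, parallel in a real simulator.
        for lp, q in queues.items():
            while q and q[0][0] == clock:
                _, name = heapq.heappop(q)
                trace.append((clock, lp, name))
                outbox.extend(handler(lp, clock, name))
        # 3-4. Message distribution and reception; finishing this step acts as the barrier.
        for dest, timestamp, name in outbox:
            heapq.heappush(queues[dest], (timestamp, name))


def forward(lp, now, name):
    # Hypothetical model: event "a1" makes its LP send a message to B, 2 time units later.
    return [("B", now + 2, "from_" + lp)] if name == "a1" else []


initial = {"A": [(5, "a1")], "B": [(8, "b1")], "C": [(10, "c1")], "D": [(12, "d1")]}
trace = synchronous_run(initial, forward, end_time=100)
print(trace)
```

Because only events at the global minimum are processed in each round, no LP can receive a message in its past, at the price of one reduction and one barrier per distinct timestamp.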
39. Synchronous Mechanisms: Pros and Cons
- Pros
  - simple, easy to implement
  - good performance if parallelism is exploited with a moderate synchronisation cost
- Cons
  - pessimistic in many cases
  - worst case: the simulator behaves like the sequential one
  - performance may drop if the cost of LP synchronisation (reduction, barrier) is high
40. PDES Languages
- PDES Simulation Languages
  - a number of PDES languages have been developed in recent years
    - PARSEC
    - Compose
    - ModSim
    - etc.
  - Most of these languages are general-purpose languages
- PARSEC
  - Developed at UCLA Parallel Computing Lab.
  - Availability: http://pcl.cs.ucla.edu/projects/parsec/
  - Simplicity
  - Efficient event scheduling mechanism.
41. Georgia Tech Time Warp (GTW)
- Optimistic discrete event simulator developed by the PADS group of the Georgia Institute of Technology
  - http://www.cc.gatech.edu/computing/pads/tech-parallel-gtw.html
  - Supports small-granularity simulation
  - GTW runs on shared-memory multiprocessor machines
    - Sun Enterprise, SGI Origin
- TeD: Telecommunications Description Language
  - a language developed mainly for modelling telecommunication network elements and protocols
- Jane: simulator-independent client/server-based graphical interface and scripting tool for interactive parallel simulations
- TeD/GTW simulations can be executed using the Jane system
42. BYOwS!
- BYOwS: Build Your Own Simulator
- Choose a programming language
  - C, C++, Java
- Learn basic MPI
  - MPI: Message Passing Interface
  - Point-to-Point Communication
  - Available on the school Linux machines
- Implement a simple PDES protocol
- Case study: a simple queueing network
43. Parallel Simulation Today
- Lots of algorithms have been proposed
  - variations on conservative and optimistic
  - adaptive approaches
- Few end-users
  - must compete with sequential simulators in terms of user interface, generality, ease of use, etc.
- Research mainly focuses on
  - applications, ultra-large-scale simulations
  - tools and execution environments (clusters)
- Federated simulations
  - different simulators interoperate with each other in executing a single simulation
  - battlefield simulation, distributed multi-user games
44. Parallel Simulation - Conclusion
- Pros
  - reduction of the simulation time
  - increase of the model size
- Cons
  - causality constraints are difficult to maintain
  - special mechanisms are needed to synchronise the different processors
  - both the model and the simulation kernel become more complex
- Challenges
  - ease of use, transparency.
45. References
- Parallel simulation
  - R. Fujimoto, Parallel and Distributed Simulation Systems, John Wiley & Sons, 2000
  - R. Fujimoto, Parallel Discrete Event Simulation, Communications of the ACM, Vol. 33(10), Oct. 1990, pp. 31-53
  - Parallel Simulation Links: http://www.cs.utsa.edu/research/ParSim/