Parallel Simulations on High-Performance Clusters

Provided by: PhamCo2

1
Parallel Simulations on High-Performance
Clusters
  • C.D. Pham
  • RESAM laboratory
  • Univ. Lyon 1, France
  • cpham_at_resam.univ-lyon1.fr

2
Outline
  • Background
      • Discrete Event Simulation (DES)
      • Parallel DES and the synchronization problems
  • The CSAM Tool
      • Architecture of the simulator kernel
      • The communication network model
  • Results
      • On a mono-processor cluster
      • On a multi-processor cluster

3
Simulation
  • To simulate is to reproduce the behavior of a
    physical system with a model
  • Practically, computers are used to numerically
    simulate a logical model
  • Simulations are used for performance evaluation
    and prediction of complex systems
      • fluid dynamics, chemical reactions (continuous)
      • communication network models: routing,
        congestion avoidance, mobile networks (discrete)
  • Simulation is more flexible than analytical
    methods

4
Discrete Event Simulation (DES)
  • DES assumes that a system changes its state only
    at discrete points in simulation time

[Figure: events a1 to a4 and d1 to d3, with states S1 and S3, placed on a simulation-time axis graduated in time-steps of Δt (0, Δt, 2Δt, ..., 6Δt)]

5
DES concepts
  • fundamental concepts
      • system state (variables)
      • state transitions (events)
      • simulation time: a totally ordered set of
        values representing time in the system being
        modeled
  • the system state can only be modified upon
    reception of an event
  • modeling can be
      • event-oriented
      • process-oriented

6
Life cycle of a DES
  • a DES system can be viewed as a collection of
    simulated objects and a sequence of event
    computations
  • each event computation contains a time stamp
    indicating when that event occurs in the physical
    system
  • each event computation may
      • modify state variables
      • schedule new events into the simulated future
  • events are stored in a local event list
  • events are processed in timestamp order
  • usually, the simulation terminates when no events
    remain
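
The life cycle described on this slide can be sketched as a small sequential kernel. This is an illustrative sketch only, not the CSAM kernel: the `Simulator` class, its method names and the toy `tick` model are hypothetical, and the event list is assumed to be a binary heap.

```python
import heapq
import itertools


class Simulator:
    """Sequential DES kernel (hypothetical): a clock plus a time-ordered event list."""

    def __init__(self):
        self.now = 0.0
        self._events = []               # heap of (timestamp, seq, handler, payload)
        self._seq = itertools.count()   # tie-breaker: FIFO order for equal timestamps

    def schedule(self, timestamp, handler, payload=None):
        # An event may only be scheduled at the current time or later.
        if timestamp < self.now:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._events, (timestamp, next(self._seq), handler, payload))

    def run(self):
        # Process events in timestamp order until the event list is empty.
        while self._events:
            timestamp, _, handler, payload = heapq.heappop(self._events)
            self.now = timestamp
            handler(self, payload)


def demo():
    """Toy model: a 'tick' event that updates state and reschedules itself twice."""
    trace = []
    state = {"ticks": 0}

    def tick(sim, _payload):
        state["ticks"] += 1                      # modify state variables
        trace.append((sim.now, state["ticks"]))
        if state["ticks"] < 3:
            sim.schedule(sim.now + 2.0, tick)    # schedule a new future event

    sim = Simulator()
    sim.schedule(1.0, tick)
    sim.run()                                    # stops when no events remain
    return trace
```

Calling `demo()` returns the list of `(time, tick count)` pairs in the order the events were executed.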

7
A simple DES model
  • link model: delay 5, send processing time 5,
    receive processing time 1
  • packet arrivals: P1 at 5, P2 at 12, P3 at 22
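
A possible event-driven reading of this model is sketched below. The numeric parameters are the ones on the slide; everything else is an assumption made for illustration: the sender and the receiver each handle one packet at a time, the link only adds a fixed delay, and the event names (`arrival`, `sent`, `delivered`, `received`) are hypothetical.

```python
import heapq

SEND_TIME, LINK_DELAY, RECV_TIME = 5, 5, 1        # values from the slide
ARRIVALS = [("P1", 5), ("P2", 12), ("P3", 22)]    # packet arrivals from the slide


def simulate_link():
    """Return the ordered log of (time, event kind, packet) for the link model."""
    # Event list entries: (timestamp, sequence number, kind, packet).
    events = [(t, i, "arrival", name) for i, (name, t) in enumerate(ARRIVALS)]
    heapq.heapify(events)
    seq = len(events)
    sender_free = receiver_free = 0   # assumption: one packet at a time on each side
    log = []
    while events:
        now, _, kind, pkt = heapq.heappop(events)
        log.append((now, kind, pkt))
        if kind == "arrival":
            sender_free = max(now, sender_free) + SEND_TIME
            follow_up = (sender_free, "sent", pkt)
        elif kind == "sent":
            follow_up = (now + LINK_DELAY, "delivered", pkt)
        elif kind == "delivered":
            receiver_free = max(now, receiver_free) + RECV_TIME
            follow_up = (receiver_free, "received", pkt)
        else:
            continue                  # "received" ends the packet's life
        heapq.heappush(events, (follow_up[0], seq, follow_up[1], follow_up[2]))
        seq += 1
    return log
```

Under these assumptions each packet goes through four events, and every new event carries a timestamp no smaller than the event that created it.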
8
Why does it work?
  • events are processed in timestamp order
  • an event at time t can only generate future
    events with a timestamp greater than or equal to
    t (no event in the past)
  • generated events are inserted into the event
    list, sorted by timestamp
  • the event with the smallest timestamp is always
    processed first,
  • causality constraints are implicitly maintained.

9
Why change? It's so simple!
  • models become larger and larger
  • the simulation time is overwhelming or the
    simulation is simply intractable
  • examples
      • parallel programs with millions of lines of
        code,
      • mobile networks with millions of mobile hosts,
      • ATM networks with hundreds of complex switches,
      • multicast models with thousands of sources,
      • ever-growing Internet,
      • and much more...

10
Some figures to convince...
  • ATM network models
      • Simulation at the cell-level,
      • 200 switches
      • 1000 traffic sources, 50 Mbit/s
      • 155 Mbit/s links,
      • 1 simulation event per cell arrival.

More than 26 billion events to simulate 1
second! 30 hours if 1 event is processed in 1 µs
  • simulation time increases as link speed
    increases,
  • usually more than 1 event per cell arrival,
  • how scalable is traditional simulation?
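
A quick order-of-magnitude check of the figures on this slide, as a sketch. It assumes the standard 53-byte ATM cell; the event count and the per-event cost are taken from the slide, and the helper names are made up for illustration.

```python
ATM_CELL_BITS = 53 * 8   # an ATM cell is 53 bytes: 5-byte header + 48-byte payload


def cells_per_second(bit_rate):
    """Cell rate carried by a stream of the given bit rate (bits per second)."""
    return bit_rate / ATM_CELL_BITS


def wall_clock_hours(num_events, seconds_per_event):
    """Sequential execution time, in hours, for a given number of events."""
    return num_events * seconds_per_event / 3600.0


LINK_CELL_RATE = cells_per_second(155e6)      # cells/s on one 155 Mbit/s link
SOURCE_CELL_RATE = cells_per_second(50e6)     # cells/s emitted by one 50 Mbit/s source
TOTAL_SOURCE_RATE = 1000 * SOURCE_CELL_RATE   # cells/s injected by all 1000 sources

HOURS_AT_1_US = wall_clock_hours(26e9, 1e-6)  # 26 billion events at 1 µs each
US_PER_EVENT_FOR_30_H = 30 * 3600 / 26e9 * 1e6
```

The 1000 sources inject roughly 118 million cells per second, so 26 billion events per simulated second corresponds to a couple of hundred events per cell. At exactly 1 µs per event, 26 billion events take about 7.2 hours; the 30-hour figure corresponds to roughly 4 µs per event, so the two numbers on the slide agree only within an order of magnitude.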

11
Parallel simulation - principles
  • execution of a discrete event simulation on a
    parallel or distributed system with several
    physical processors.
  • the simulation model is decomposed into several
    sub-models that can be executed in parallel
      • spatial partitioning,
      • temporal partitioning,
  • radically different from simple simulation
    replications.
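
Spatial partitioning assigns each simulated object (for instance a switch) to one processor, and every link whose two ends land on different processors turns into inter-processor communication. The sketch below compares two naive assignments on an 8-node ring. The topology, the function names and both partitioning rules are illustrative assumptions, not the partitioning used by any particular simulator.

```python
def block_partition(num_nodes, num_parts):
    """Contiguous blocks: node i is owned by part i * num_parts // num_nodes."""
    return [i * num_parts // num_nodes for i in range(num_nodes)]


def round_robin_partition(num_nodes, num_parts):
    """Node i is owned by part i % num_parts."""
    return [i % num_parts for i in range(num_nodes)]


def cut_links(links, owner):
    """Number of links whose endpoints are owned by different parts."""
    return sum(1 for a, b in links if owner[a] != owner[b])


def ring(num_nodes):
    """Links of a ring topology: node i is connected to node i + 1 (mod n)."""
    return [(i, (i + 1) % num_nodes) for i in range(num_nodes)]


RING8 = ring(8)
BLOCK_CUT = cut_links(RING8, block_partition(8, 2))
ROUND_ROBIN_CUT = cut_links(RING8, round_robin_partition(8, 2))
```

With two processors, the block assignment cuts 2 of the 8 links while the round-robin assignment cuts all 8, which shows how strongly the choice of partition affects the communication volume.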

12
Parallel simulation - pros and cons
  • pros
      • reduction of the simulation time,
      • increase of the model size,
  • cons
      • causality constraints are difficult to
        maintain,
      • need for special mechanisms to synchronize the
        different processors,
      • increases the complexity of both the model and
        the simulation kernel.
  • challenges
      • ease of use, transparency.

13
Parallel simulation - example
14
A simple PDES model
local event list
15
Synchronization problems
  • fundamental concepts
      • each Logical Process (LP) can be at a
        different simulation time
      • local causality constraint: events in each LP
        must be executed in timestamp order
  • synchronization algorithms
      • Conservative: avoids local causality
        violations by waiting until it's safe
      • Optimistic: allows local causality violations,
        but provisions are made to recover from them
        at runtime
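
The conservative approach can be illustrated with a null-message scheme in the style of Chandy, Misra and Bryant. This is a minimal sketch under stated assumptions, not the CSAM algorithm: each input channel is assumed to deliver messages in non-decreasing timestamp order, a message with no payload is treated as a null message, and the class and method names are hypothetical.

```python
import heapq


class ConservativeLP:
    """One logical process that only executes events known to be safe."""

    def __init__(self, input_channels, lookahead):
        self.channel_clock = {c: 0.0 for c in input_channels}  # last timestamp seen per channel
        self.pending = []          # heap of (timestamp, seq, payload)
        self.lookahead = lookahead
        self.now = 0.0
        self._seq = 0

    def receive(self, channel, timestamp, payload=None):
        """Accept a message; payload=None is a null message (clock update only)."""
        if timestamp < self.channel_clock[channel]:
            raise ValueError("channel timestamps must be non-decreasing")
        self.channel_clock[channel] = timestamp
        if payload is not None:
            heapq.heappush(self.pending, (timestamp, self._seq, payload))
            self._seq += 1

    def safe_time(self):
        # No future message can carry a timestamp below the smallest channel clock.
        return min(self.channel_clock.values())

    def process_safe_events(self):
        """Execute, in timestamp order, every pending event up to the safe time."""
        done = []
        horizon = self.safe_time()
        while self.pending and self.pending[0][0] <= horizon:
            timestamp, _, payload = heapq.heappop(self.pending)
            self.now = timestamp
            done.append((timestamp, payload))
        return done

    def null_message_timestamp(self):
        # Promise to neighbours: no later output will be stamped earlier than this.
        return self.safe_time() + self.lookahead


def demo():
    lp = ConservativeLP(["A", "B"], lookahead=5.0)
    lp.receive("A", 10.0, "a1")
    lp.receive("B", 4.0, "b1")
    first = lp.process_safe_events()    # a1 must wait: channel B is only at time 4
    lp.receive("B", 12.0)               # null message from B unblocks the LP
    second = lp.process_safe_events()
    return first, second, lp.null_message_timestamp()
```

In `demo()`, the event stamped 10 is held back until the null message guarantees that channel B will send nothing earlier, which is the "waiting until it's safe" behaviour.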

16
CSAM (Pham, UCBL)
  • CSAM: Conservative Simulator for ATM network
    Model
  • Simulation at the cell-level
  • Conservative and/or sequential
  • C++ programming-style, predefined generic models
    of sources, switches, links
  • New models can be easily created by deriving from
    base classes
  • Configuration file that describes the topology

17
CSAM - Kernel characteristics
  • Exploits the lookahead of communication links,
    transparently for the user
  • Virtual Input Channels
      • reduces overhead for event manipulation,
      • reduces overhead for null-message handling.
  • Cyclic event execution
  • Message aggregation
      • static aggregation size,
      • asymmetric aggregation size on CLUMPS,
      • sender-initiated,
      • receiver-initiated.
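
Message aggregation with a static aggregation size can be sketched as a per-destination buffer that is sent as one network message once it is full. This is an illustrative, sender-side sketch only: the `Aggregator` class and the `send` callback are hypothetical stand-ins for a real communication layer, and the receiver-initiated and asymmetric variants listed above are not modeled.

```python
class Aggregator:
    """Buffers outgoing messages per destination and sends them in batches."""

    def __init__(self, size, send):
        self.size = size        # static aggregation size (messages per batch)
        self.send = send        # callable(dest, batch): stand-in for the network send
        self.buffers = {}

    def post(self, dest, message):
        """Queue one message; send the batch as soon as it reaches the size."""
        buf = self.buffers.setdefault(dest, [])
        buf.append(message)
        if len(buf) >= self.size:
            self.flush(dest)

    def flush(self, dest):
        """Send whatever is buffered for dest, even if the batch is not full."""
        batch = self.buffers.pop(dest, [])
        if batch:
            self.send(dest, batch)


def demo():
    sent = []
    agg = Aggregator(size=3, send=lambda dest, batch: sent.append((dest, list(batch))))
    for cell in range(7):
        agg.post(1, cell)       # 7 cell messages for processor 1
    agg.flush(1)                # push out the incomplete last batch
    return sent
```

Here 7 messages cost 3 network sends instead of 7. The explicit `flush` matters in a conservative simulator, since a receiver may be blocked waiting for a message that is still sitting in a partially filled buffer.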

18
CSAM - Life cycle
19
Test case: 78-switch ATM network
  • Distance-Vector Routing with dynamic link cost
    functions
  • Connection setup, admission control protocols
20
Why is it difficult?
  • Very small granularity: 1 message represents 1
    cell transfer
  • high level of message synchronisation
  • very small computation/communication ratio
  • Load imbalance between links
  • large number of control messages
  • partitioning and load balancing are difficult

21
CSAM - Some results...
Routing protocols reconfiguration time
22
CSAM - Some results...
23
Parallel Simulation on High Performance Clusters
  • Myrinet-based cluster of 12 Pentium Pro at
    200 MHz, 64 MBytes, Linux
  • Myrinet-based cluster of 4 dual Pentium Pro at
    450 MHz, 128 MBytes, Linux
  • Myrinet board with LANai 4.1, 256KB
  • BIP, BIP-SMP, MPI/BIP, MPI/BIP-SMP communication
    libraries

24
Speedup on a Myrinet cluster
Pentium Pro 200 MHz
More than 53 million events to simulate 0.31 s
25
Speedup with CLUMPS
Dual Pentium Pro 450MHz
26
Increasing the model size (CLUMPS)
Dual Pentium Pro 450MHz, 4x2 int
27
Speedup on SGI/Cray Origin 2000
28
Conclusions
  • Parallel simulation is very sensitive to latency
  • High-performance clusters are a good alternative
    to traditional massively parallel computers
  • CLUMPS architectures are very attractive, as the
    cost of the communication cards can be cut in half