Transcript and Presenter's Notes

Title: PGENESIS Tutorial WAMBAMM 05


1
PGENESIS Tutorial: WAM-BAMM 05
  • Greg Hood
  • Pittsburgh Supercomputing Center
  • Carnegie Mellon University

2
Are your models running too slowly?
  • In some situations PGENESIS can be used to speed
    them up:
  • Partitioning a large network across processors
  • Running a large number of simulations
  • Not appropriate for large single-cell models
    (i.e., those with many compartments)

3
What is PGENESIS?
  • Library extension to GENESIS that supports
    communication among multiple processes,
    so nearly everything available in GENESIS is
    available in PGENESIS
  • Allows multiple processes to perform multiple
    simulations in parallel
  • Allows multiple processes to work together
    cooperatively on a single simulation
  • Runs on workstations or supercomputers
    using the PVM or MPI message-passing
    libraries


4
History
  • PGENESIS developed by Goddard and Hood at PSC
    (1993-1998)
  • Ported from PVM to MPI by Chukkpalli and Charman
    (NPACI, 2000), and also by Panchev (Sunderland,
    2003)
  • Current contact: pgenesis@psc.edu


5
Tutorial Outline
  • What PGENESIS provides
  • Using PGENESIS for parallel parameter searching
  • Using PGENESIS for simulating large networks
    more quickly
  • Selecting appropriate parallel hardware
  • Strategies for development and testing

6
PGENESIS Functionality
7
How PGENESIS Runs in Parallel (1)
  • PVM-based PGENESIS: typically one process starts
    and then spawns n-1 other processes
  • MPI-based PGENESIS: all n processes are started
    simultaneously by the mpirun or mpiexec command

8
How PGENESIS Runs in Parallel (2)
  • For both PVM and MPI-based versions:
  • mapping of processes to processors is often 1 to
    1, but may be many to 1 during debugging
  • every process runs the same script
  • this is not a real limitation, since a script can
    branch on its own node number

9
Nodes and Zones
  • Each process is referred to as a "node".
  • Nodes may be organized into "zones".
  • A node is fully specified by a numeric string of
    the form <node>.<zone>.
  • Simulations within a zone are kept synchronized
    in simulation time.
  • Each node joins the parallel platform using the
    paron command.
  • Each node should gracefully terminate by calling
    paroff

10
Every node in its own zone
  • Simulations on each node are not coupled
    temporally.
  • Useful for parameter searching.
  • We refer to nodes as 0.0, 0.1, 0.2, ...

11
All nodes in one zone
  • Simulations on each node are coupled temporally.
  • Useful for large network models
  • Zone numbers can be omitted since we are dealing
    with only one zone; we can thus refer to nodes as
    0, 1, 2, ...

12
Nodes have distinct namespaces
  • /elem1 on node 0 refers to an element on node 0
  • /elem1 on node 1 refers to an element on node
    1
  • To avoid confusion we recommend that you use
    distinct names for elements on different nodes
    within a zone.
  • The script writer (i.e., you) is responsible
    for partitioning a network model across nodes.

13
GENESIS Terminology
  • GENESIS term : Computer science term
  • Object : Class
  • Element : Object
  • Message : Connection
  • Value : Message

14
Who am I?
  • PGENESIS provides several functions that allow a
    script to determine its place in the overall
    parallel configuration:
  • mynode - number of this node in this zone
  • nnodes - number of nodes in this zone
  • (all numbering starts at 0)
  • mytotalnode - number of this node in platform
  • ntotalnodes - number of nodes in platform
  • myzone - number of this zone
  • nzones - number of zones
  • npvmcpu - number of processors in configuration
  • mypvmid - PVM task identifier for this node

15
Styles of Parallel Scripts
  • Symmetric: Each node executes the same script
    commands in lock-step style (synchronized
    explicitly or implicitly).
  • Master/Worker: One node (usually node 0)
    coordinates processing and issues commands to the
    other nodes.

16
Explicit Synchronization
  • barrier - causes thread to block until all nodes
    within the zone have reached the corresponding
    barrier
  • barrier             - wait at default barrier
  • barrier 7           - wait at named barrier
  • barrier 7 100000    - timeout is 100000
    seconds
  • barrierall - causes thread to block until all
    nodes in all zones have reached the corresponding
    barrier
  • barrierall          - wait at default barrier
  • barrierall 7        - wait at named barrier
  • barrierall 7 100000 - timeout is 100000 sec
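
The blocking behavior of a barrier can be illustrated outside PGENESIS. The following is a minimal Python sketch, not PGENESIS code: threads stand in for nodes, and threading.Barrier stands in for the barrier command. The function name run_with_barrier and the 5-second timeout are assumptions made for the illustration.

```python
import threading

def run_with_barrier(n_nodes=4, timeout=5.0):
    """Each simulated node logs 'before', waits at a shared barrier, then logs 'after'."""
    barrier = threading.Barrier(n_nodes, timeout=timeout)  # hypothetical stand-in for a zone-wide barrier
    log = []
    lock = threading.Lock()

    def node(rank):
        with lock:
            log.append(("before", rank))
        barrier.wait()              # blocks until all n_nodes threads have arrived
        with lock:
            log.append(("after", rank))

    threads = [threading.Thread(target=node, args=(r,)) for r in range(n_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

log = run_with_barrier()
phases = [phase for phase, _ in log]
print(phases)
```

No node records "after" until every node has recorded "before", which is the guarantee a barrier provides.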

17
Implicit Synchronization
  • Two commands implicitly execute a zone-wide
    barrier
  • step - implicitly causes the thread to block
    until all nodes within the zone are ready to step
    (this behavior can be disabled with setfield
    /post sync_before_step 0)
  • reset - implicitly causes the thread to block
    until all nodes have reset
  • These commands require that all nodes in the zone
    participate, thus the barrier.

18
Remote Function Calls (1)
  • An "issuing" node directs a procedure to run on
    an "executing" node.
  • Examples:
  • some_function@2 params...
  • some_function@all params...
  • some_function@others params...
  • some_function@0.4 params...
  • some_function@1,3,5 params...

19
Remote Function Calls (2)
  • Each remote function call causes the creation of
    a new thread on the executing node.
  • All parameters are evaluated on the issuing node.
  • Example: if called from node 1,
    some_function@2 {mynode} will execute
    some_function 1 on node 2

20
Remote Function Calls (3)
  • When does the executing node actually perform the
    remote function call, since we don't use hardware
    interrupts?
  • While waiting at barrier or barrierall.
  • While waiting for its own remote operations to
    complete, e.g. func@node, raddmsg
  • When the simulator is sitting at the prompt
    waiting for user input.
  • When the executing script calls clearthread or
    clearthreads.

21
Threads
  • A thread is a single flow of control within a
    PGENESIS script being executed.
  • When a node starts, there is exactly one thread
    on it: the thread for the script.
  • There may potentially be many threads per node.
    These are stacked up, with only the topmost
    actually executing at any moment.
  • clearthread - yield to one thread awaiting
    execution (if one exists)
  • clearthreads - yield to all threads awaiting
    execution

22
Asynchronous Calls (1)
  • The async command allows a script to dispatch an
    operation on a remote node without waiting for
    its completion.
  • Example:
  • async some_function@2 params...

23
Asynchronous Calls (2)
  • One may wait for an async call to complete,
    either individually,
  • future = {async some_function@2 ...}
  • ... // do some work locally
  • waiton {future}
  • or for an entire set
  • async some_function@2 ...
  • async some_function@5 ...
  • ...
  • waiton all

24
Asynchronous Calls (3)
  • Asynchronous calls may return a value.
  • Example:
  • int future = {async myfunc@1} // start thread
                                   // on node 1
  • ... // do some work locally
  • int result = {waiton {future}} // wait for
                                    // thread's result
  • Thus the term "future": it is a promise of a
    value some time in the future. waiton calls in
    that promise.
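
The same future pattern exists in many languages. As a rough analogy only, here is a Python sketch using the standard concurrent.futures module: submit plays the role of the async call, and result plays the role of waiton. The function myfunc and its argument are hypothetical stand-ins for work done on another node.

```python
from concurrent.futures import ThreadPoolExecutor

def myfunc(x):
    """Hypothetical stand-in for a function executed on another node."""
    return x * x

with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(myfunc, 7)   # dispatch the call without waiting for it
    local = sum(range(10))            # do some work locally in the meantime
    result = future.result()          # block until the promised value is available

print(local, result)
```

The issuing side keeps working between the dispatch and the wait, which is the point of an asynchronous call.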

25
Asynchronous Calls (4)
  • async returns a value which is only to be used as
    the parameter of a waiton call, and waiton must
    only be called with such a value.
  • Remote function calls from a particular issuing
    node to a particular executing node are
    guaranteed to be performed in the sequence they
    were sent.
  • There is no guaranteed order among calls
    involving multiple issuing or executing nodes.

26
Advice about Barriers (1)
  • It is very easy to reach deadlock if barriers are
    not handled correctly. PGENESIS tries to warn you
    by printing a message that it is waiting at a
    barrier.
  • Examples of incorrect barrier usage:
  • Each node executes: barrier {mynode}
  • Each node executes: barrier@all
  • A single node executes: barrier@others; barrier
  • However, async barrier@others; barrier
    will work!
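
The first incorrect usage (every node waiting at a barrier named after its own node number) can be reproduced in a small simulation. This is a Python sketch under stated assumptions, not PGENESIS code: threads stand in for nodes, each named barrier expects all nodes, and a short timeout replaces what would otherwise be a hang. The helper barrier_outcomes is hypothetical.

```python
import threading

def barrier_outcomes(barrier_id_for, n_nodes=3, timeout=0.5):
    """Simulate n_nodes nodes; node r waits at the barrier named barrier_id_for(r).

    Returns a list with "passed" or "stuck" for each node.
    """
    barriers = {}                      # barrier ID -> shared Barrier object
    guard = threading.Lock()
    outcome = [None] * n_nodes

    def node(rank):
        bid = barrier_id_for(rank)
        with guard:
            if bid not in barriers:
                barriers[bid] = threading.Barrier(n_nodes)
            b = barriers[bid]
        try:
            b.wait(timeout=timeout)    # every barrier expects all n_nodes nodes
            outcome[rank] = "passed"
        except threading.BrokenBarrierError:
            outcome[rank] = "stuck"    # a real run would keep waiting here

    threads = [threading.Thread(target=node, args=(r,)) for r in range(n_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome

same_id = barrier_outcomes(lambda rank: 7)      # all nodes use barrier 7
own_id = barrier_outcomes(lambda rank: rank)    # each node uses its own ID
print(same_id, own_id)
```

With a shared ID every node passes; with per-node IDs each barrier sees only one of the expected arrivals, so no node ever gets through.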

27
Advice about Barriers (2)
  • Guideline: if your script is operating in the
    symmetric style (all nodes execute all
    statements), never use barrier@
  • If your script is operating in the master-worker
    style, the master must ensure it calls a function
    on each worker that executes a barrier before the
    master itself enters the barrier
  • barrier; async barrier@others
    will not work.

28
Commands for Network Creation
  • Several new commands permit the creation of
    "remote" (internode) messages
  • raddmsg /local_element /remote_element@2 \
        SPIKE
  • rvolumeconnect /local_elements \
        /remote_elements@2 \
        -sourcemask ... -destmask ... \
        -probability 0.5
  • rvolumedelay /local_elements -radial 10.0
  • rvolumeweight /local_elements -fixed 0.2
  • rshowmsg /local_elements

29
Tips for Avoiding Deadlocks
  • Use lots of echo statements.
  • Use barrier IDs.
  • Do not execute barriers remotely (e.g.,
    barrier@all).
  • Remember that step usually does an implicit
    barrier.
  • Have each node do its own step command, or have
    one controlling node do a step@all (similarly
    for reset).
  • Do not use the stop command.
  • Keep things simple.

30
Motivation
  • Parallel control of setup can be hard.
  • Parallel control of simulation can be hard.
  • Debugging parallel scripts is hard.


31
How PGENESIS Fits into Schedule
  • Schedule controls the order in which GENESIS
    elements get updated.
  • At beginning of step, all internode data is
    transferred.
  • There will be equivalence to serial GENESIS only
    if remote messages do not pass from earlier to
    later elements in the schedule.

32
How PGENESIS Fits into Schedule
  • addtask Simulate /##[CLASS=postmaster] -action PROCESS
  • addtask Simulate /##[CLASS=buffer] -action PROCESS
  • addtask Simulate /##[CLASS=projection] -action PROCESS
  • addtask Simulate /##[CLASS=spiking] -action PROCESS
  • addtask Simulate /##[CLASS=gate] -action PROCESS
  • addtask Simulate /##[CLASS=segment][CLASS!=membrane]\
        [CLASS!=gate][CLASS!=concentration] -action PROCESS
  • addtask Simulate /##[CLASS=membrane] -action PROCESS
  • addtask Simulate /##[CLASS=hsolver] -action PROCESS
  • addtask Simulate /##[CLASS=concentration] \
        -action PROCESS
  • addtask Simulate /##[CLASS=device] -action PROCESS
  • addtask Simulate /##[CLASS=output] -action PROCESS

33
Hello, world! for PGENESIS
  • Contents of file hello.g:
  • paron -parallel -nodes 4 -output hello.out
  • barrier 17
  • echo Hello from node {mynode}
  • barrier 18
  • paroff
  • Execute on four nodes with:
  • pgenesis -nox hello.g

34
Parameter Searching with PGENESIS
35
Model Characteristics
  • The following are prerequisites to use PGENESIS
    for optimization on a particular parameter
    searching problem
  • Model must be expressed in GENESIS.
  • Decide on the parameter set.
  • Have a way to evaluate the parameter set.
  • Have some range for each of the parameter values.
  • The evaluations over the parameter-space should
    be reasonably well-behaved.
  • Stopping criterion

36
Choose a Search Strategy
  • Genetic Search
  • Simulated Annealing
  • Monte Carlo (for very ill-behaved search spaces)
  • Nelder-Mead (for well-behaved search spaces)
  • Use as many constraints as you can to restrict
    the search space
  • Always do a sanity check on results

37
An Example Model
param2
  • We have a one compartment cell model of a spiking
    neuron. Dynamics are well-behaved.
  • Parameters are the conductances for the Na, Kdr,
    Ka, and KM channels. We know the conductance
    values to be in the range from 0.1 to 10.0 a
    priori.
  • We write spike times to a file, then compare this
    using a C function, spkcmp, to "experimental"
    data.
  • Stop when our match fitness exceeds 20.0

38
A Parallel Genetic Algorithm
  • We adopt a population-based approach as opposed
    to a generation-based one.
  • We will keep a fixed population "alive" and use
    the workers to evaluate the fitness of candidate
    individuals.
  • If a candidate turns out to be better than some
    member of the current population, then we replace
    the worst member of the current population
    with the new individual.

39
Mutations
  • Pick a member of the population at random.
  • Decide whether to do crossover according to the
    crossover probability. If we are doing crossover,
    pick another random member of the current
    population, and combine the "genes" of those
    individuals. If we aren't doing crossover, just
    copy the bits of the original individual.
  • Go through each bit of the bit string, and mutate
    it with some small probability.
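
The three steps above can be sketched in Python as follows. This is an illustration under assumptions, not the tutorial's implementation: individuals are lists of 0/1 bits, the "genes" are combined by single-point crossover (the slide does not say how they are combined), and the name make_candidate and its parameters are hypothetical.

```python
import random

def make_candidate(population, p_crossover, p_mutation, rng):
    """Build one new bit string from a population of equal-length bit lists."""
    parent = rng.choice(population)               # pick a member at random
    if rng.random() < p_crossover:
        other = rng.choice(population)            # pick another random member
        cut = rng.randrange(1, len(parent))       # single-point crossover (assumption)
        child = parent[:cut] + other[cut:]
    else:
        child = parent[:]                         # just copy the original bits
    # Flip each bit independently with a small probability.
    return [bit ^ 1 if rng.random() < p_mutation else bit for bit in child]

rng = random.Random(0)
pop = [[0, 0, 0, 0], [1, 1, 1, 1]]
copy_only = make_candidate(pop, 0.0, 0.0, rng)          # no crossover, no mutation
all_flipped = make_candidate([[0, 0, 0, 0]], 0.0, 1.0, rng)
crossed = make_candidate(pop, 1.0, 0.0, rng)
print(copy_only, all_flipped, crossed)
```

Passing the random generator in explicitly keeps the sketch reproducible, which helps when comparing serial and parallel runs.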

40
Master/Worker Paradigm (1)
41
Master/Worker Paradigm (2)
  • Each node is in its own zone.
  • Node 0.0 will control the search.
  • Nodes 0.1 through 0.(n-1) will run the model and
    perform the evaluation.

42
Commands for Optimization
  • Typically these are organized in a master/worker
    fashion with one node (the master) directing the
    search, and all other nodes evaluating parameter
    sets. Remote function calls are useful in this
    context for:
  • sending tasks to workers:
    async task@worker param1...
  • having workers return evaluations to master:
    return_result@master result
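
The master/worker flow can be mimicked on one machine with a thread pool. The Python sketch below is an analogy only: pool threads stand in for worker nodes, submit stands in for sending a task, and collecting completed futures stands in for workers returning results. The fitness function evaluate is a made-up placeholder, not the tutorial's model.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def evaluate(params):
    """Hypothetical fitness: higher is better, peaking when every parameter is 1.0."""
    return 1.0 / (1.0 + sum((p - 1.0) ** 2 for p in params))

def master(candidates, n_workers=3):
    """Send every candidate to a worker and keep the best (index, fitness) pair."""
    best = None
    with ThreadPoolExecutor(max_workers=n_workers) as workers:
        # Dispatch all tasks without waiting for any of them.
        pending = {workers.submit(evaluate, c): i for i, c in enumerate(candidates)}
        # Integrate results in whatever order the workers finish.
        for fut in as_completed(pending):
            index, fit = pending[fut], fut.result()
            if best is None or fit > best[1]:
                best = (index, fit)
    return best

candidates = [[0.1, 10.0, 3.0, 2.0], [1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
best = master(candidates)
print(best)
```

Each result carries the index of the task it answers, just as the workers in this tutorial pass an index back to the master so results can arrive out of order.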

43
Main Script
  • paron -farm -silent 0 -nodes {n_nodes} \
        -output o.out -executable nxpgenesis
  • barrierall
  • if ({mytotalnode} == 0)
  •   init_master
  •   pb_search {individuals} {population}
  • else
  •   init_worker
  • end
  • barrierall 7 1000000
  • paroff

44
Master Conducts the Search
  • function pb_search
  •   ...
  •   for (i = 0; i < individuals && \
           max_fitness < stopping_criterion; \
           i = i + 1)
  •     // pick random individual from population
  •     // decide whether to do crossover mutation
  •     // mutate bitstring
  •     // assign this task to a worker
  •     delegate_task (i)
  •   end
  •   finish
  •   print_results
  • end

45
Master Conducts the Search
  • function delegate_task
  •   ...
  •   // send the parameters one by one
  •   for (p = 0; p < parameters; p = p + 1)
  •     async set_param@0.{try_node} \
            {p} {getfield \
            /params[{p}] bits}
  •   end
  •   async worker_task@0.{try_node} {index}
  •   clearthreads
  •   ...
  • end

46
Worker Evaluates Individuals (1)
  • function worker_task (index)
  •   compute_parameter_values
  •   // determine the fitness value for
  •   // this individual
  •   fit = {evaluate}
  •   // return result to the master
  •   return_result@0.0 {mytotalnode} \
          {index} {fit}
  • end

47
Worker Evaluates Individuals (2)
  • function evaluate
  •   float match, fitness
  •   // first run the simulation
  •   newsim {getfield /params[0] value} \
             {getfield /params[1] value} \
             {getfield /params[2] value} \
             {getfield /params[3] value}
  •   runfI
  •   call /out/{sim_output_file} FLUSH

48
Worker Evaluates Individuals (3)
  •   // then find the simulated spike times
  •   gen2spk {sim_output_file} {delay} \
          {current_duration} {total_duration}
  •   // then compare the simulated spike
  •   // times with the experimental data
  •   match = {spkcmp {real_spk_file} \
          {sim_spk_file} -pow1 0.4 -pow2 0.6 \
          -msp 0.5 -nmp 200.0}
  •   fitness = 1.0 / {sqrt {match}}
  •   return {fitness}
  • end
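
It helps to see what the stopping criterion means in terms of the raw match score. The Python sketch below reuses the fitness formula from the evaluate function (1.0 divided by the square root of match) and the stopping value of 20.0 from the example model. It assumes match is a positive score where smaller means a closer fit; the function names are hypothetical.

```python
import math

STOPPING_CRITERION = 20.0   # fitness value at which the example search stops

def fitness_from_match(match):
    """Fitness as computed in the worker: 1 / sqrt(match), for match > 0."""
    return 1.0 / math.sqrt(match)

def match_threshold(stopping_criterion=STOPPING_CRITERION):
    """Largest match score whose fitness still reaches the stopping criterion."""
    # 1 / sqrt(match) >= s   is equivalent to   match <= 1 / s**2
    return 1.0 / stopping_criterion ** 2

threshold = match_threshold()
print(threshold, fitness_from_match(0.01), fitness_from_match(0.001))
```

Under these assumptions, the search stops only once the match score drops below 0.0025.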

49
Master Integrates the Results
  • function return_result (node, index, fit)
  • ...
  • end

50
Comparison of Parallel Parameter Search with
Serial Parameter Search
  • GA scales fairly well
  • SA scales to a certain extent, but not as well as
    GA
  • paths through search space will be different, but
    if searches are successful, they will converge to
    the same result

51
Large Networks with PGENESIS
52
Parallel Network Creation
  • In parallel network creation make sure elements
    exist before connecting them up, e.g.
  • create_elements(...)
  • barrier
  • create_messages(...)

53
Goals of decomposition
  • Keep all processors busy all the time on useful
    work
  • Use as many processors as are available
  • Key concepts are
  • Load-balancing
  • Minimizing communication
  • Minimizing synchronization
  • Scalable decomposition
  • Parallel I/O

54
Load balancing
  • Attempt to parcel out the modeled cells such that
    each CPU takes the same amount of time to
    simulate one step
  • This is static load balancing - cells do not move
  • Dedicated access to the CPUs is required for
    effective decomposition
  • Easier if identically configured CPUs.
  • PGENESIS provides no automated load-balancing but
    there are some performance monitoring tools.
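
Since PGENESIS leaves load balancing to the script writer, the assignment of cells to nodes has to be computed by hand or by a helper. The Python sketch below shows one common heuristic (heaviest cell first, onto the currently least-loaded node). It is an illustration, not a PGENESIS facility: the per-cell costs are assumed to come from your own timing measurements, and assign_cells is a hypothetical name.

```python
import heapq

def assign_cells(costs, n_nodes):
    """Greedy static assignment: heaviest cell first, onto the least-loaded node.

    costs: dict mapping cell name -> estimated time per simulation step.
    Returns (assignment dict cell -> node, list of per-node loads).
    """
    heap = [(0.0, node) for node in range(n_nodes)]   # (current load, node number)
    heapq.heapify(heap)
    assignment = {}
    for cell in sorted(costs, key=costs.get, reverse=True):
        load, node = heapq.heappop(heap)              # least-loaded node so far
        assignment[cell] = node
        heapq.heappush(heap, (load + costs[cell], node))
    loads = [0.0] * n_nodes
    for cell, node in assignment.items():
        loads[node] += costs[cell]
    return assignment, loads

costs = {"a": 4, "b": 3, "c": 3, "d": 2, "e": 2, "f": 2}
assignment, loads = assign_cells(costs, 2)
print(assignment, loads)
```

The assignment is computed once before the run and never changes, which matches the static scheme described above.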

55
Minimizing communication
  • Put highly connected clusters of cells on the
    same PGENESIS node.
  • Think of each synapse with a presynaptic cell on
    a remote node as expensive.
  • The same network distributed among more nodes
    will result in more of these expensive synapses
    hence, more nodes can be counterproductive.
  • The time spent communicating can overwhelm the
    time spent computing.
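
The cost of a partition can be estimated before running anything by counting the synapses that cross node boundaries. The Python sketch below does this for a toy network of two tightly connected clusters. The network, the three partitions, and the function name remote_synapses are all made up for the illustration.

```python
def remote_synapses(synapses, node_of):
    """Count synapses whose presynaptic and postsynaptic cells sit on different nodes."""
    return sum(1 for pre, post in synapses if node_of[pre] != node_of[post])

# Two fully connected clusters of three cells each, joined by a single synapse.
cluster_a, cluster_b = [0, 1, 2], [3, 4, 5]
synapses = [(i, j) for group in (cluster_a, cluster_b)
            for i in group for j in group if i != j]
synapses.append((2, 3))

by_cluster = {c: 0 if c in cluster_a else 1 for c in range(6)}   # one cluster per node
round_robin = {c: c % 2 for c in range(6)}                       # same two nodes, interleaved
one_per_node = {c: c for c in range(6)}                          # six nodes

counts = [remote_synapses(synapses, p) for p in (by_cluster, round_robin, one_per_node)]
print(counts)
```

Keeping each cluster on one node leaves a single remote synapse, while spreading the same cells over more nodes turns every synapse into a remote one.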

56
Orient_tut Example

57
Non-scalable decomposition

orient1
58
Scalable decomposition (1)
  • Goal: as the number of available processors
    grows, your model naturally partitions into finer
    divisions

59
Scalable decomposition (2)
orient2
60
Scalable decomposition (3)
  • To the extent that you can arrange your
    decomposition to scale with the number of
    processors, it is a very good idea to create the
    scripts using a function of the number of nodes
    anywhere that a node number must be explicitly
    specified.
  • E.g.
  • createmap /library/rec /retina/recplane \
        {NX / n_slices} {NY} \
        -delta {SEPX} {SEPY} \
        -origin {slice * SEPX * NX / n_slices} 0
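
The arithmetic in the createmap call can be checked separately. The Python sketch below computes, for each slice, the number of columns it holds and the x origin of its first column, using the same names as the example (NX, SEPX, n_slices, slice). It assumes NX divides evenly by n_slices; the function slice_layout is hypothetical.

```python
def slice_layout(slice_index, n_slices, NX, SEPX):
    """Return (columns in this slice, x origin of this slice).

    Mirrors the example: NX / n_slices columns per slice, and an origin of
    slice * SEPX * NX / n_slices.
    """
    if NX % n_slices != 0:
        raise ValueError("this sketch assumes NX is a multiple of n_slices")
    width = NX // n_slices
    origin_x = slice_index * SEPX * width
    return width, origin_x

NX, SEPX = 10, 2.0
two_slices = [slice_layout(s, 2, NX, SEPX) for s in range(2)]
five_slices = [slice_layout(s, 5, NX, SEPX) for s in range(5)]
print(two_slices)
print(five_slices)
```

Changing only n_slices repartitions the same map, so the script needs no edits when the number of nodes changes.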

61
Scalable decomposition (4)
  • raddmsg is used to set up off-node messages.
  • E.g.
  • raddmsg /V1/vert/soma \
        /output/vert@{output_node} \
        SAVE io_index Vm
  • raddmsg /V1/vert/soma \
        /xout/drawv/inputs@{output_node} \
        ICOORDS io_index x y z
  • raddmsg /V1/vert/soma \
        /xout/drawv/inputs@{output_node} \
        IVAL1 io_index Vm

62
Scalable decomposition (5)
  • rvolumeconnect can be used to connect up a set
    of source elements to a set of destination
    elements on arbitrary nodes.
  • E.g.
  • rvolumeconnect /retina/recplane/rec/input \
        /V1/horiz/soma/exc_syn@{workers} \
        -relative \
        -sourcemask box 0 0 0 1 1 0 \
        -destmask box {-2.4 * V1_SEPX} \
        {-0.6 * V1_SEPY} {-5.0 * V1_SEPZ} \
        {2.4 * V1_SEPX} {0.6 * V1_SEPY} \
        {5.0 * V1_SEPZ}

63
Selecting Appropriate Parallel Hardware
64
Hardware for Parameter Searching
  • Fast processors
  • Network is not critical (100 Mbps suffices)
  • Departmental clusters or even clusters of
    workstations are adequate

65
Hardware for Network Models
  • Fast processors
  • Fast network
  • High bandwidth, low latency for message-passing
  • Options: GigE, 10GigE, Infiniband, Myrinet,
    Quadrics
  • Critical factor for PGENESIS: is there an MPI
    library optimized for that network?
  • Nice to have latencies < 10 µs
  • Departmental clusters or supercomputers desirable

66
PGENESIS Installation
  • Install GENESIS on each machine in the
    configuration
  • Install MPI or PVM package
  • Run tests to make sure MPI or PVM works
  • Install PGENESIS
  • Test with Hello, world! script and then with
    examples (param, orient1, and orient2)

67
But I don't have access to a parallel machine
  • Computing cycles are available through the
    NSF-funded supercomputing centers
  • Pittsburgh Supercomputing Center
    (http://www.psc.edu)
  • PGENESIS installed on 3000-processor Alpha
  • NPACI (http://www.npaci.edu)
  • Worked on MPI-based PGENESIS
  • Alliance (http://www.ncsa.uiuc.edu)
  • Grants of time are provided free-of-charge to
    U.S. researchers upon approval of a short proposal

68
Your simulations could be running here
  • 3000-processor Terascale computer at PSC
  • (6 Tflops)

  • or here: 2000-processor Cray XT3 at PSC
    (10 Tflops)
69
Strategies for Development and Testing
70
Parallel Script Development/Testing (1)
  • 1. Develop single cell prototypes using serial
    GENESIS.
  • 2. (a) For network models, decide partitioning
    and develop scalable scripts. (b) For parameter
    searches, develop scripts to run and evaluate a
    single individual, and a scalable script that
    will control the search.
  • 3. Try out scripts on single processor using the
    minimum number of nodes.

71
Parallel Script Development/Testing (2)
  • 4. Try out scripts on single processor but
    increase the number of nodes.
  • 5. Try out scripts on small multiprocessor
    platform.
  • 6. Try out scripts on large multiprocessor
    platform.

72
Summary and Questions
73
Summary
  • PGENESIS is a GENESIS extension which can let you
    use multiple computers to
  • Perform large parameter searches much more
    quickly
  • Simulate large network models more quickly

74
References
  • http://www.psc.edu/ghood/wam-bamm-05/
  • Goddard, N.H. and Hood, G., Large-scale
    simulation using parallel GENESIS, The Book of
    GENESIS, 2nd ed., Bower, J.M. and Beeman, D.
    (Eds), Springer-Verlag, 1998.
  • Goddard, N.H. and Hood, G., Parallel Genesis for
    large scale modeling, Computational Neuroscience
    Trends in Research 1997, Plenum Publishing, NY,
    1997, p. 911-917.
  • Howell, D. F., Dyhrfjeld-Johnsen, J., Maex, R.,
    Goddard, N., De Schutter, E., A large-scale model
    of the cerebellar cortex using PGENESIS,
    Neurocomputing, 32/33 (2000), p. 1041-1046.

75
Questions / Discussion
  • Parallelism will likely be integrated into
    GENESIS 3, not treated as an add-on package
  • If you have suggestions about what you would like
    to see in a parallel neural simulator, please
    contact me (ghood@psc.edu)