PGENESIS Tutorial GUM02

1
PGENESIS Tutorial GUM02
  • Greg Hood
  • Pittsburgh Supercomputing Center

2
What is PGENESIS?
  • Library extension to GENESIS that supports
    communication among multiple processes, so
    nearly everything available in GENESIS is
    available in PGENESIS
  • Allows multiple processes to perform multiple
    simulations in parallel
  • Allows multiple processes to work together
    cooperatively on a single simulation
  • Runs on workstations or supercomputers


3
History
  • PGENESIS developed by Goddard and Hood at PSC
    (1993-1998)
  • Current contact: pgenesis@psc.edu


4
Tutorial Outline
  • Installation
  • What PGENESIS provides
  • Using PGENESIS for parallel parameter searching
  • Using PGENESIS for simulating large networks
    more quickly
  • Scaling up for large runs
  • A comparison of PGENESIS with alternatives

5
PGENESIS Installation
6
Installation Requirements
  • At least 1 Unix-like computer on which GENESIS
    will run.
  • Same account name on all computers.
  • If multiple machines are to be used together,
    then it is best if they are all on the same
    network segment (e.g. same 100Mbit/s Ethernet
    switch).

7
Installation: GENESIS
  • 1. Install regular (serial) GENESIS
  • a. Configure serial GENESIS to include all
    libraries that you will ever want to use with
    PGENESIS.
  • b. make all; make install
  • c. make nxall; make nxinstall (if you want an
    Xodus-less version of PGENESIS)

8
Installation: ssh
  • 2. Configure ssh to allow process startup across
    machines without password entry
  • You probably already have ssh/sshd. If not,
    download from http://www.openssh.org and install
    according to the instructions.
  • Run ssh-keygen -t rsa on each machine from which
    you will launch PGENESIS to generate
    private/public keys.
  • Append all of the public keys (stored in
    ~/.ssh/id_rsa.pub) to ~/.ssh/authorized_keys on
    all hosts on which you want to run PGENESIS
    processes.
  • Test: ssh remote_host_name remote_command should
    not ask you for a password.

9
Installation: PVM
  • 3. Install the PVM message-passing library
  • Download from http://www.csm.ornl.gov/pvm
  • Modify .bashrc to set PVM_ROOT to where PVM was
    installed:
    export PVM_ROOT=/usr/share/pvm3
  • Modify .bashrc to set PVM_RSH to the ssh
    executable:
    export PVM_RSH=/usr/bin/ssh
  • Build PVM (cd $PVM_ROOT; make)
  • Test PVM:
    pvm
    pvm> add otherhost
    pvm> halt

10
Installation: PGENESIS
  • 4. Install the PGENESIS package
  • Download from http://www.genesis-sim.org
  • cp Makefile.dist Makefile
  • Edit Makefile
  • make install
  • make nxinstall (for the Xodus-less version)

11
Installation: Simple
  • Cluster of similar machines
  • Shared filesystem
  • Home directory is located on shared filesystem

12
Installation: Complex
  • Heterogeneous cluster
  • Novel processor/OS
  • No shared filesystems
  • Custom libraries linked into GENESIS
  • Recommended approach:
  • Install on each machine independently, and make
    sure PGENESIS works locally before trying to use
    all machines together

13
The "pgenesis" Startup Script (1)
  • Purpose: checks that the proper PVM files are in
    place, starts the PVM daemon, then starts the
    appropriate PGENESIS executable.
  • Basic syntax:
    pgenesis scriptname.g

14
The "pgenesis" Startup Script (2)
  • Options
  • -config <filename>  where <filename> contains a
    list of hosts to use
  • -debug <mode>  where <mode> is one of: tty, dbx,
    gdb
  • -nox   do not use Xodus
  • -v     verbose mode
  • -help  list the valid pgenesis script flags
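  • For example (the host file name here is
    hypothetical):
    pgenesis -v -nox -config myhosts.cfg scriptname.g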

15
PGENESIS Functionality
16
How PGENESIS Runs in Parallel
  • Workstations: typically one process starts and
    then spawns n-1 other processes
  • mapping of processes to processors is often 1 to
    1, but may be many to 1 during debugging

17
How PGENESIS Runs in Parallel
  • Massively parallel machines:
  • all n processes are started simultaneously by
    the operating system
  • mapping of processes to processors is nearly
    always 1 to 1
  • On both:
  • every process runs the same script
  • this is not a real limitation

18
Nodes and Zones
  • Each process is referred to as a "node".
  • Nodes may be organized into "zones".
  • A node is fully specified by a numeric string of
    the form <node>.<zone>.
  • Simulations within a zone are kept synchronized
    in simulation time.
  • Each node joins the parallel platform using the
    paron command.
  • Each node should gracefully terminate by calling
    paroff
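  • A minimal script skeleton, assembled from the
    paron and paroff usage shown later in this
    tutorial (n_nodes as in the later main scripts):
    paron -farm -silent 0 -nodes {n_nodes} // join the parallel platform
    barrierall                             // wait until every node has started
    // ... per-node setup and simulation commands ...
    barrierall                             // keep any node from exiting early
    paroff                                 // leave the platform gracefully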

19
Every node in its own zone
  • Simulations on each node are not coupled
    temporally.
  • Useful for parameter searching.
  • We refer to nodes as 0.0, 0.1, 0.2, ...

20
All nodes in one zone
  • Simulations on each node are coupled temporally.
  • Useful for large network models
  • Zone numbers can be omitted since we are dealing
    with only one zone; we can thus refer to nodes as
    0, 1, 2, ...

21
Hybrid schemes
  • Parameter searching on large network models
  • Example: The network is partitioned over 8 nodes;
    we run 16 simulations in parallel to do parameter
    searching on this model, thus using a total of
    128 nodes.

22
Nodes have distinct namespaces
  • /elem1 on node 0 refers to an element on node 0
  • /elem1 on node 1 refers to an element on node
    1
  • To avoid confusion we recommend that you use
    distinct names for elements on different nodes
    within a zone.
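  • One such convention (the element name here is
    hypothetical) embeds the node number in the name,
    so the element each node creates is unique within
    the zone:
    create neutral /cell{mynode}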

23
GENESIS Terminology
  • GENESIS        Computer Science
  • Object         Class
  • Element        Object
  • Message        Connection
  • Value          Message

24
Who am I?
  • PGENESIS provides several functions that allow a
    script to determine its place in the overall
    parallel configuration:
  • mytotalnode - number of this node in the platform
  • mynode - number of this node in this zone
  • myzone - number of this zone
  • ntotalnodes - number of nodes in the platform
  • nnodes - number of nodes in this zone
  • nzones - number of zones
  • npvmcpu - number of processors in the
    configuration
  • mypvmid - PVM task identifier for this node
  • (all numbering starts at 0)
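  • For example, each node could report its identity
    with:
    echo I am node {mytotalnode} of {ntotalnodes} in the platform
    echo I am node {mynode} of {nnodes} in zone {myzone}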

25
Styles of Parallel Scripts
  • Symmetric: Each node executes the same script
    commands.
  • Master/Worker: One node (usually node 0)
    coordinates processing and issues commands to the
    other nodes.

26
Explicit Synchronization
  • barrier - causes the thread to block until all
    nodes within the zone have reached the
    corresponding barrier
  • barrier            - wait at the default barrier
  • barrier 7          - wait at named barrier 7
  • barrier 7 100000   - timeout is 100000 seconds
  • barrierall - causes the thread to block until all
    nodes in all zones have reached the corresponding
    barrier
  • barrierall           - wait at the default barrier
  • barrierall 7         - wait at named barrier 7
  • barrierall 7 100000  - timeout is 100000 seconds

27
Implicit Synchronization
  • Two commands implicitly execute a zone-wide
    barrier:
  • step - implicitly causes the thread to block
    until all nodes within the zone are ready to step
    (this behavior can be disabled with setfield
    /post sync_before_step 0)
  • reset - implicitly causes the thread to block
    until all nodes have reset
  • These commands require that all nodes in the zone
    participate, hence the barrier.

28
Remote Function Calls (1)
  • An "issuing" node directs a procedure to run on
    an "executing" node.
  • Examples:
    some_function@2 params...
    some_function@all params...
    some_function@others params...
    some_function@0.4 params...
    some_function@1,3,5 params...

29
Remote Function Calls (2)
  • Each remote function call causes the creation of
    a new thread on the executing node.
  • All parameters are evaluated on the issuing node.
  • Example: if called from node 1,
    some_function@2 {mynode}
    will execute some_function 1 on node 2.

30
Remote Function Calls (3)
  • When does the executing node actually perform the
    remote function call, since we don't use hardware
    interrupts?
  • While waiting at a barrier or barrierall.
  • While waiting for its own remote operations to
    complete, e.g. func@node, raddmsg.
  • When the simulator is sitting at the prompt
    waiting for user input.
  • When the executing script calls clearthread or
    clearthreads.

31
Threads
  • A thread is a single flow of control within a
    PGENESIS script being executed.
  • When a node starts, there is exactly one thread
    on it: the thread for the script.
  • There may potentially be many threads per node.
    These are stacked up, with only the topmost
    actually executing at any moment.
  • clearthread - yield to one thread awaiting
    execution (if one exists)
  • clearthreads - yield to all threads awaiting
    execution

32
Asynchronous Calls (1)
  • The async command allows a script to dispatch an
    operation on a remote node without waiting for
    its completion.
  • Example:
    async some_function@2 params...

33
Asynchronous Calls (2)
  • One may wait for an async call to complete,
    either individually:
    future = {async some_function@2 ...}
    ...              // do some work locally
    waiton {future}
  • or for an entire set:
    async some_function@2 ...
    async some_function@5 ...
    ...
    waiton all

34
Asynchronous Calls (3)
  • Asynchronous calls may return a value.
  • Example:
    int future = {async myfunc@1}  // start thread on node 1
    ...                            // do some work locally
    int result = {waiton {future}} // wait for the thread's result
  • Thus the term "future" - it is a promise of a
    value at some time in the future; waiton calls in
    that promise.

35
Asynchronous Calls (4)
  • async returns a value which is only to be used as
    the parameter of a waiton call, and waiton must
    only be called with such a value.
  • Remote function calls from a particular issuing
    node to a particular executing node are
    guaranteed to be performed in the sequence they
    were sent.
  • There is no guaranteed order among calls
    involving multiple issuing or executing nodes.

36
Advice about Barriers (1)
  • It is very easy to reach deadlock if barriers are
    not handled correctly. PGENESIS tries to warn you
    by printing a message that it is waiting at a
    barrier.
  • Examples of incorrect barrier usage:
  • Each node executes barrier {mynode} (each node
    then waits at a differently named barrier).
  • Each node executes barrier@all.
  • A single node executes barrier@others followed
    by barrier. However, async barrier@others
    followed by barrier will work!

37
Advice about Barriers (2)
  • Guideline: if your script is operating in the
    symmetric style (all nodes execute all
    statements), never use barrier@.
  • If your script is operating in the master/worker
    style, the master must ensure it calls a function
    on each worker that executes a barrier before the
    master enters the barrier itself, as sketched
    below.
  • barrier followed by async barrier@others
    will not work.
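  • A sketch of the working order on the master
    (node 0), using only commands from these slides:
    if ({mytotalnode} == 0)
        async barrier@others  // workers are told to enter the barrier
        barrier               // only then does the master enter it
    end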

38
Commands for Network Creation
  • Several new commands permit the creation of
    "remote" (internode) messages:
    raddmsg /local_element /remote_element@2 SPIKE
    rvolumeconnect /local_elements \
        /remote_elements@2 \
        -sourcemask ... -destmask ... \
        -probability 0.5
    rvolumedelay /local_elements -radial 10.0
    rvolumeweight /local_elements -fixed 0.2
    rshowmsg /local_elements

39
Parallel I/O: Display
  • How can one display from more than one node?
  • Use an xview object.
  • Add an index field to the displayed elements.
  • Use the ICOORDS and IVAL1 ... IVAL5 messages
    instead of the COORDS and VAL1 ... VAL5 messages:
    raddmsg /src_elems /xview_elem@0 \
        ICOORDS io_index_field x y z
    raddmsg /src_elems /xview_elem@0 \
        IVAL1 io_index_field Vm

40
Interaction with Xodus
  • Xodus introduces another degree of parallelism
    via the X11 event-processing mechanism.
    PGENESIS periodically instructs the X server to
    process any X events. Some of those events may
    result in script code being run.
  • Race condition: processing order is
    unpredictable.
  • Safe approach 1: ensure all affected nodes are at
    a barrier (or equivalent).
  • Safe approach 2: ensure mouse/keyboard events do
    not cause remote operations that require the
    participation of another node.

41
Parallel I/O: Writing a File
  • How can one write a file from more than one node?
  • Use a par_asc_file or par_disk_out object.
  • Add an index field to the source elements.
    raddmsg /src_elems /par_asc_file_elem@0 \
        SAVE io_index_field Vm

42
Tips for Avoiding Deadlocks
  • Use lots of echo statements.
  • Use barrier IDs.
  • Do not execute barriers remotely (e.g.,
    barrier@all).
  • Remember that step usually does an implicit
    barrier.
  • Have each node do its own step command, or have
    one controlling node do a step@all (similarly
    for reset).
  • Do not use the stop command.
  • Keep things simple.

43
Motivation
  • Parallel control of setup can be hard.
  • Parallel control of simulation can be hard.
  • Debugging parallel scripts is hard.


44
How PGENESIS Fits into Schedule
  • The schedule controls the order in which GENESIS
    objects get updated.
  • At the beginning of each step, all internode data
    is transferred.
  • Results will be equivalent to serial GENESIS only
    if remote messages do not pass from earlier to
    later elements in the schedule.

45
How PGENESIS Fits into Schedule
    addtask Simulate /##[CLASS=postmaster] -action PROCESS
    addtask Simulate /##[CLASS=buffer] -action PROCESS
    addtask Simulate /##[CLASS=projection] -action PROCESS
    addtask Simulate /##[CLASS=spiking] -action PROCESS
    addtask Simulate /##[CLASS=gate] -action PROCESS
    addtask Simulate /##[CLASS=segment][CLASS!=membrane]\
        [CLASS!=gate][CLASS!=concentration] -action PROCESS
    addtask Simulate /##[CLASS=membrane] -action PROCESS
    addtask Simulate /##[CLASS=hsolver] -action PROCESS
    addtask Simulate /##[CLASS=concentration] -action PROCESS
    addtask Simulate /##[CLASS=device] -action PROCESS
    addtask Simulate /##[CLASS=output] -action PROCESS

46
Adding Custom "C" Code
  • Uses:
  • data analysis
  • interfacing
  • custom objects
  • PGENESIS allows a user's custom libraries to be
    linked in, similarly to GENESIS.
  • We recommend that you first incorporate your
    custom library into serial GENESIS before trying
    to use it with PGENESIS.

47
Modifiable Parameters
  • /post/sync_before_step - boolean (default 1)
  • /post/remote_info - boolean (default 1); enables
    rshowmsg
  • /post/perfmon - boolean (default 0); enables
    performance monitoring
  • /post/msg_hang_time - float (default 120.0);
    seconds before giving up on a remote operation
  • /post/pvm_hang_time - float (default 3.0);
    seconds between printing dots while waiting for a
    message
  • /post/xupdate_period - float (default 0.01);
    seconds between checking for X events when at a
    barrier
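  • These are ordinary fields of the /post element,
    so they are changed with setfield, e.g.:
    setfield /post msg_hang_time 10000.0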

48
Limitations of PGENESIS
  • No rplanarweight, rplanardelay; use the
    corresponding 3-D routines rvolumeweight,
    rvolumedelay.
  • Cannot delete remote messages.
  • getsyncount, getsynindex, getsyndest no longer
    return the correct values.

49
Parameter Searching with PGENESIS
50
Model Characteristics
  • The following are prerequisites to using PGENESIS
    for optimization on a particular parameter
    searching problem:
  • The model must be expressed in GENESIS.
  • Decide on the parameter set.
  • Have a way to evaluate the parameter set.
  • Have some range for each of the parameter values.
  • The evaluations over the parameter space should
    be reasonably well-behaved.
  • Have a stopping criterion.

51
Trivial Model
  • Rather than do a simulation, we will just
    optimize a function f of four parameters a, b, c,
    and d:
    f(a, b, c, d) = 10.0 - (a-1)*(a-1)
                  - (b-2)*(b-2) - (c-3)*(c-3)
                  - (d-4)*(d-4)
  • Evaluation of the model:
    fitness = f(a, b, c, d)
  • Range of parameters: -10 < a, b, c, d < 10
  • Evaluation is definitely well-behaved.
  • Stopping criterion: Stop after 1000 individuals.

52
Master/Worker Paradigm (1)
53
Master/Worker Paradigm (2)
  • Each node is in its own zone.
  • Node 0.0 will control the search.
  • Nodes 0.1 through 0.(n-1) will run the model and
    perform the evaluation.

54
Commands for Optimization
  • Typically these are organized in a master/worker
    fashion, with one node (the master) directing the
    search and all other nodes evaluating parameter
    sets. Remote function calls are useful in this
    context for:
  • sending tasks to workers:
    async task@worker param1...
  • having workers return evaluations to the master:
    return_result@master result

55
Choose a Search Strategy
  • Genetic Search
  • Simulated Annealing
  • Monte Carlo (for very ill-behaved search spaces)
  • Nelder-Mead (for well-behaved search spaces)
  • Use as many constraints as you can to restrict
    the search space
  • Always do a sanity check on results

56
A Parallel Genetic Algorithm
  • We adopt a population-based approach as opposed
    to a generation-based one.
  • We will keep a fixed population "alive" and use
    the workers to evaluate the fitness of candidate
    individuals.
  • If a candidate turns out to be better than some
    member of the current population, then we replace
    the worst member of the current population
    with the new individual.

57
Parameter Representation
  • We represent the set of parameters that define an
    individual as a string of bits. Each 16-bit
    string (one "gene") is interpreted as a signed
    integer and then divided by 1000.0 to yield the
    floating point value. To generate a new
    candidate from the existing population
  • Pick a member of the population at random.
  • Go through each bit of the bit string, and mutate
    it with some small probability.
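  • A worked example, matching the decoding in the
    worker function later in these slides: the 16-bit
    value 35768 decodes to (35768 - 32768) / 1000.0
    = 3.0.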

58
Main Script
    paron -farm -silent 0 -nodes {n_nodes} \
        -output o.out -executable nxpgenesis
    barrierall
    if ({mytotalnode} == 0)
        search
    end
    barrierall 7 1000000
    paroff
    quit

59
Master Conducts the Search
    function search
        int i
        init_search
        init_farm
        for (i = 0; i < individuals; i = i + 1)
            if (i < population)
                init_individual
            else
                mutate_individual {rand 0 {actual_population}}
            end
            delegate_task {i} {bs_a} {bs_b} {bs_c} {bs_d}
        end
        finish
    end

60
Master Conducts the Search
    function delegate_task
        while (1)
            if (free_index > 0)
                async worker@0.{getfield /free[{free_index}] value} \
                    {bs_a} {bs_b} {bs_c} {bs_d}
                free_index = free_index - 1
                return
            else
                clearthreads
            end
        end
    end

61
Workers Evaluate Individuals
    function worker (bs_a, bs_b, bs_c, bs_d)
        int bs_a, bs_b, bs_c, bs_d
        float a, b, c, d, fit
        a = (bs_a - 32768.0) / 1000.0
        b = (bs_b - 32768.0) / 1000.0
        c = (bs_c - 32768.0) / 1000.0
        d = (bs_d - 32768.0) / 1000.0
        fit = {evaluate {a} {b} {c} {d}}
        return_result@0.0 {mytotalnode} {bs_a} {bs_b} \
            {bs_c} {bs_d} {fit}
    end

62
Workers Evaluate Individuals
    function evaluate (a, b, c, d)
        float a, b, c, d, fit
        fit = 10.0 - (a-1)*(a-1) - (b-2)*(b-2) \
            - (c-3)*(c-3) - (d-4)*(d-4)
        return {fit}
    end

63
Master Integrates the Results (1)
    function return_result (node, bs_a, bs_b, bs_c, bs_d, fit)
        int node, bs_a, bs_b, bs_c, bs_d
        float fit
        if (actual_population < population)
            least_fit = actual_population
            min_fitness = -1e10
            actual_population = actual_population + 1
        end

64
Master Integrates the Results (2)
        if (fit > min_fitness)
            setfield /population[{least_fit}] fitness {fit}
            setfield /population[{least_fit}] a_value {bs_a}
            setfield /population[{least_fit}] b_value {bs_b}
            setfield /population[{least_fit}] c_value {bs_c}
            setfield /population[{least_fit}] d_value {bs_d}
            if (actual_population == population)
                recompute_fitness_extremes
            end
        end
        free_index = free_index + 1
        setfield /free[{free_index}] value {node}
    end

65
A More Realistic Model
  • We have a one-compartment cell model of a spiking
    neuron. Dynamics are probably well-behaved.
  • Parameters are the conductances for the Na, Kdr,
    Ka, and KM channels. We know a priori that the
    conductance values are in the range 0.1 to 10.0.
  • We write spike times to a file, then compare them,
    using a C function spkcmp, to "experimental"
    data.
  • Stop when our match fitness exceeds 20.0.

66
Improved Parameter Representation
  • As before, we still represent the set of
    parameters that define an individual as a string
    of bits. However, now each 16-bit string will
    logarithmically map into the range of 0.1 to 10.0
    so that we will have increased resolution at the
    low end of the scale.
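  • One plausible mapping (the exact formula is not
    shown in these slides): treating the 16-bit gene
    g as unsigned, p = 0.1 * 10^(2g/65535) sends
    g = 0 to 0.1 and g = 65535 to 10.0, giving equal
    resolution per decade.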

67
Crossover Mutations
  • Pick a member of the population at random.
  • Decide whether to do crossover according to the
    crossover probability. If we are doing crossover,
    pick another random member of the current
    population, and combine the "genes" of those
    individuals. If we aren't doing crossover, just
    copy the bits of the original individual.
  • Go through each bit of the bit string, and mutate
    it with some small probability.

68
Main Script (1)
    int n_nodes = 4
    int individuals = 1000
    int population = 10
    float stopping_criterion = 20.0
    float crossover_prob = 0.5
    float bit_mutation_prob = 0.02

69
Main Script (2)
    include population.g  // functions for GA
                          // population-based
                          // parameter searches
    // model-specific files
    include siminit.g     // defines parameters of simulation
    include fI.g          // sets up table of currents
    include channels.g    // defines the channels
    include simcell.g     // functions to load in the cell model
    include eval.g        // functions to evaluate the model

70
Main Script (3)
    paron -farm -silent 0 -nodes {n_nodes} \
        -output o.out -executable nxpgenesis
    barrierall
    if ({mytotalnode} == 0)
        init_master
        pb_search {individuals} {population}
    else
        init_worker
    end
    barrierall 7 1000000
    paroff

71
Parameters Are Customizable
    function init_params
        setfield /params[0] label "Na" scaling "log"
        setfield /params[0] min_value 0.1 max_value 10.0
        setfield /params[1] label "Kdr" scaling "log"
        setfield /params[1] min_value 0.1 max_value 10.0
        setfield /params[2] label "Ka" scaling "log"
        setfield /params[2] min_value 0.1 max_value 10.0
        setfield /params[3] label "KM" scaling "log"
        setfield /params[3] min_value 0.1 max_value 10.0
    end

72
Worker Evaluates Individuals (1)
    function evaluate
        float match, fitness
        // first run the simulation
        newsim {getfield /params[0] value} \
            {getfield /params[1] value} \
            {getfield /params[2] value} \
            {getfield /params[3] value}
        runfI
        call /out/sim_output_file FLUSH

73
Worker Evaluates Individuals (2)
        // then find the simulated spike times
        gen2spk {sim_output_file} {delay} \
            {current_duration} {total_duration}
        // then compare the simulated spike times
        // with the experimental data
        match = {spkcmp {real_spk_file} \
            {sim_spk_file} -pow1 0.4 -pow2 0.6 \
            -msp 0.5 -nmp 200.0}
        fitness = 1.0 / {sqrt {match}}
        return {fitness}
    end

74
Tuning Search
  • representation
  • parameter selection
  • generation vs population-based approach
  • generation/population size
  • crossover probability
  • crossover method
  • mutation probability
  • initial ranges

75
Large Networks with PGENESIS
76
Parallel Network Creation
  • In parallel network creation, make sure elements
    exist before connecting them up, e.g. (see the
    sketch after this list):
    create_elements(...)
    barrier
    create_messages(...)
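  • A sketch of this pattern in PGENESIS commands
    (the element names and the dest_node variable are
    hypothetical; the box masks follow volumeconnect
    conventions):
    // every node builds its local cells
    createmap /library/cell /net 4 4 -delta 0.001 0.001
    // all cells must exist everywhere before connecting
    barrier
    // only now create the internode messages
    rvolumeconnect /net/cell[]/soma/spike \
        /net/cell[]/dend/synapse@{dest_node} \
        -sourcemask box -1 -1 -1 1 1 1 \
        -destmask box -1 -1 -1 1 1 1 \
        -probability 0.5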

77
Goals of decomposition
  • Keep all processors busy all the time on useful
    work
  • Use as many processors as are available
  • Key concepts are:
  • Load-balancing
  • Minimizing communication
  • Minimizing synchronization
  • Scalable decomposition
  • Parallel I/O

78
Load balancing
  • Attempt to parcel out the modeled cells such that
    each CPU takes the same amount of time to
    simulate one step.
  • This is static load balancing - cells do not move.
  • Dedicated access to the CPUs is required for
    effective decomposition.
  • Easier if the CPUs are identically configured.
  • PGENESIS provides no automated load balancing, but
    there are some performance-monitoring tools.

79
Minimizing communication
  • Put highly connected clusters of cells on the
    same PGENESIS node.
  • Think of each synapse with a presynaptic cell on
    a remote node as expensive.
  • The same network distributed among more nodes
    will result in more of these expensive synapses;
    hence, more nodes can be counterproductive.
  • The time spent communicating can overwhelm the
    time spent computing.

80
Orient_tut Example

81
Non-scalable decomposition

82
Scalable decomposition (1)
  • Goal as the number of available processors
    grows, your model naturally partitions into finer
    divisions

83
Scalable decomposition (2)
84
Scalable decomposition (3)
  • To the extent that you can arrange your
    decomposition to scale with the number of
    processors, it is a very good idea to create the
    scripts using a function of the number of nodes
    anywhere that a node number must be explicitly
    specified.
  • E.g.:
    createmap /library/rec /retina/recplane \
        {REC_NX / n_slices} {REC_NY} \
        -delta {REC_SEPX} {REC_SEPY} \
        -origin {-REC_NX * REC_SEPX / 2 + slice * REC_SEPX * REC_NX / n_slices} \
            {-REC_NY * REC_SEPY / 2}

85
Case Study Cerebellar Model
  • Howell, D. F., Dyhrfjeld-Johnsen, J., Maex, R.,
    Goddard, N., De Schutter, E., A large-scale
    model of the cerebellar cortex using PGENESIS,
    Neurocomputing, 32/33 (2000), p. 1041-1046.
  • 16 Purkinje cells embedded in an environment of
    other simpler, but more numerous, cells
  • Simulated on 128 processors of PSC's Cray T3E

86
Cell Populations Connectivity
87
3-D Representation of Network
88
Model Partitioning
89
Timings on 128 Processors of T3E
90
Timings vs. Model Size
91
Timings on Workstation Network
92
Significant Overhead on Cluster
93
Scaling Up
94
Getting Cycles
  • NSF-funded supercomputing centers:
  • Pittsburgh Supercomputing Center
    (http://www.psc.edu)
  • PGENESIS installed on 512-processor Cray T3E
  • NPACI (http://www.npaci.edu)
  • Worked on MPI-based PGENESIS
  • Alliance (http://www.ncsa.uiuc.edu)

95
The High End
  • 3000-processor Terascale computer at PSC

96
Parallel Script Development
  • 1. Develop single cell prototypes using serial
    GENESIS.
  • 2. (a) For network models, decide partitioning
    and develop scalable scripts. (b) For parameter
    searches, develop scripts to run and evaluate a
    single individual, and a scalable script that
    will control the search.
  • 3. Try out scripts on single processor using the
    minimum number of nodes.

97
Parallel Script Development
  • 4. Try out scripts on single processor but
    increase the number of nodes.
  • 5. Try out scripts on small multiprocessor
    platform.
  • 6. Try out scripts on large multiprocessor
    platform.

98
Resource Limits and Other Tips
  • On the Cray T3E, set PVM_SM_POOL to ensure
    adequate PVM buffer space. This should be set to
    the maximum number of messages that might arrive
    at any PE before it gets a chance to process
    them.
  • On other machines, you may need to set PVMBUFSIZE
    to address similar issues.
  • When debugging interactively, set the timeout so
    that other nodes do not time out:
    setfield /post msg_hang_time 10000.0

99
Reducing Synchronization Delay
  • In network models, axonal delays L are large
    compared to the simulation time step.
  • A spike generated at simulation time T on one
    node need not be physically delivered to the
    destination synapse on another node until
    simulation time T+L.
  • PGENESIS can use this to reduce unnecessary
    waiting. Node B can get ahead of node A by the
    minimum of all the axonal delays on the
    connections from cells on A to synapses on B.
    This is called the lookahead of B with respect to
    A.
  • You must set /post/sync_before_step to 0 to allow
    this looser synchronization.
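  • Concretely, using the field listed earlier under
    "Modifiable Parameters":
    setfield /post sync_before_step 0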

100
Reducing Synchronization Delay
  • A goal when you are partitioning a network across
    nodes is to make the lookahead between any pair
    of nodes large.
  • PGENESIS provides the setlookahead command for
    you to inform it of the lookahead between nodes:
    setlookahead 0.01    // sets lookahead to 10 ms
    setlookahead 3 0.01  // sets lookahead to 10 ms w.r.t. node 3
  • The getlookahead command reports the current
    setting with respect to a particular node, and
    the showlookahead command reports the minimum
    lookahead to all other nodes:
    getlookahead 3  // gets lookahead with respect to node 3
    showlookahead   // gets minimum lookahead over all nodes

101
Parallel I/O
  • Currently the I/O facilities (disk elements and
    Xodus elements) are tightly synchronized with the
    simulation (no lookahead). Therefore sending
    messages to Xodus objects or disk objects on
    remote nodes usually slows the simulation to a
    crawl. Use Xodus only for post-processing.
  • Try to arrange input and output to be via local
    elements. On workstations it is preferable to
    access local disk. If access is via a shared file
    system (e.g., NFS, AFS), use different output
    disk files for different nodes, and amalgamate
    the data after the simulation is over.

102
Performance Monitoring (1)
  • A script can turn performance monitoring on with
    setfield /post perfmon 1 and off with
    setfield /post perfmon 0.
  • Whenever performance monitoring is active, the
    categories listed below accumulate time.
  • To ignore the time involved in construction of a
    model, do not activate performance monitoring
    until just prior to the first simulation step.
  • The accumulated time values can be dumped to a
    file with the perfstats command. This writes a
    file to /tmp (usually a local disk) called
    pgenesis.ppp.nnn.acct, where ppp is the process
    id and nnn is the node number. Each time
    perfstats is called it dumps the accumulated
    values, but it does not reset them.

103
Performance Monitoring (2)
  • The monitoring package tracks the amount of time
    spent in various operations:
  • PGENESIS_PROCESS_SNDREC_SND
    time sending data to other nodes
  • PGENESIS_PROCESS_SNDREC_REC
    time receiving data from other nodes
  • PGENESIS_PROCESS_SNDREC_GETFIELD
    time spent gathering local data for transmission
    to other nodes
  • PGENESIS_PROCESS_SNDREC
    time spent in sending and receiving data not
    accounted for by the three preceding categories
  • PGENESIS_PROCESS_SYNC
    time spent explicitly synchronizing nodes prior
    to each step

104
Performance Monitoring (3)
  • PGENESIS_PROCESS
    time spent in parallel overhead of exchanging
    data with other nodes which is not accounted for
    by the preceding categories
  • PGENESIS_EVENT
    time spent handling incoming spikes
  • PGENESIS
    time spent in PGENESIS not accounted for by the
    preceding overhead categories (in other words,
    the time spent doing useful work)

105
Comparisons and Summary
106
Alternatives to PGENESIS (1)
  • Batch scripts (Perl, Python, bash) for parameter
    searching
  • Incurs GENESIS process startup and network setup
    overheads
  • If simulations are long and the evaluation step
    is already done externally, this may be simpler
  • NEURON
  • Parallel parameter searching (talk with Mike
    Hines)
  • Vectorized NEURON if you happen to have a vector
    machine handy

107
Alternatives to PGENESIS (2)
  • NEOSIM (http://www.neosim.org/)
  • Prototype stage (Java kernel released)
  • Integration with NEURON simulation engine
  • Supports automatic network partitioning
  • Modular architecture
  • Designed for scalability
  • Hand-coded simulation (Java, C, C++, Fortran)
  • Very time-consuming (especially parallel coding)
  • Difficult to share models
  • Specialized code can run much faster
  • Possibly appropriate for large but simple models
    (e.g. connectionist-style approaches)

108
Summary
  • PGENESIS is a GENESIS extension that lets you
    use multiple computers to:
  • perform large parameter searches much more
    quickly
  • simulate large network models more quickly

109
Discussion
110
References
  • Goddard, N.H. and Hood, G., Large-scale
    simulation using parallel GENESIS, The Book of
    GENESIS, 2nd ed., Bower, J.M. and Beeman, D.
    (Eds), Springer-Verlag, 1998.
  • Goddard, N.H. and Hood, G., Parallel Genesis for
    large scale modeling, Computational Neuroscience
    Trends in Research 1997, Plenum Publishing, NY,
    1997, p. 911-917.
  • Howell, D. F., Dyhrfjeld-Johnsen, J., Maex, R.,
    Goddard, N., De Schutter, E., A large-scale model
    of the cerebellar cortex using PGENESIS,
    Neurocomputing, 32/33 (2000), p. 1041-1046.