Title: PGENESIS Tutorial GUM02
- Greg Hood
- Pittsburgh Supercomputing Center
What is PGENESIS?
- Library extension to GENESIS that supports communication among multiple processes, so nearly everything available in GENESIS is available in PGENESIS
- Allows multiple processes to perform multiple simulations in parallel
- Allows multiple processes to work together cooperatively on a single simulation
- Runs on workstations or supercomputers
History
- PGENESIS developed by Goddard and Hood at PSC (1993-1998)
- Current contact: pgenesis@psc.edu
Tutorial Outline
- Installation
- What PGENESIS provides
- Using PGENESIS for parallel parameter searching
- Using PGENESIS for simulating large networks more quickly
- Scaling up for large runs
- A comparison of PGENESIS with alternatives
PGENESIS Installation
Installation Requirements
- At least one Unix-like computer on which GENESIS will run
- The same account name on all computers
- If multiple machines are to be used together, it is best if they are all on the same network segment (e.g., the same 100 Mbit/s Ethernet switch).
Installation: GENESIS
- 1. Install regular (serial) GENESIS
- a. Make sure you have configured serial GENESIS to include all libraries that you will ever want to use with PGENESIS.
- b. make all; make install
- c. make nxall; make nxinstall if you want an Xodus-less version of PGENESIS
Installation: ssh
- 2. Configure ssh to allow process startup across machines without password entry
- You probably already have ssh/sshd. If not, download it from http://www.openssh.org and install it according to the instructions.
- Run ssh-keygen -t rsa on each machine from which you will launch PGENESIS to generate private/public keys.
- Append all of the public keys (stored in ~/.ssh/id_rsa.pub) to ~/.ssh/authorized_keys on all hosts on which you want to run PGENESIS processes.
- Test: ssh remote_host_name remote_command should not ask you for a password.
Installation: PVM
- 3. Install the PVM message passing library
- Download from http://www.csm.ornl.gov/pvm
- Modify .bashrc to set PVM_ROOT to where PVM was installed:
- export PVM_ROOT=/usr/share/pvm3
- Modify .bashrc to set PVM_RSH to the ssh executable:
- export PVM_RSH=/usr/bin/ssh
- Build PVM: (cd $PVM_ROOT; make)
- Test PVM:
- pvm
- pvm> add otherhost
- pvm> halt
Installation: PGENESIS
- 4. Install the PGENESIS package
- Download from http://www.genesis-sim.org
- cp Makefile.dist Makefile
- Edit Makefile
- make install
- make nxinstall for the Xodus-less version
Installation: Simple
- Cluster of similar machines
- Shared filesystem
- Home directory is located on the shared filesystem
Installation: Complex
- Heterogeneous cluster
- Novel processor/OS
- No shared filesystems
- Custom libraries linked into GENESIS
- Recommended approach: install on each machine independently and make sure PGENESIS works locally before trying to use all machines together
The "pgenesis" Startup Script (1)
- Purpose: checks that the proper PVM files are in place, starts the PVM daemon, then starts the appropriate PGENESIS executable.
- Basic syntax:
- pgenesis scriptname.g
The "pgenesis" Startup Script (2)
- Options:
- -config <filename>: <filename> contains a list of hosts to use
- -debug <mode>: <mode> is one of the following: tty, dbx, gdb
- -nox: do not use Xodus
- -v: verbose mode
- -help: list the valid pgenesis script flags
PGENESIS Functionality
How PGENESIS Runs in Parallel (1)
- Workstations: typically one process starts and then spawns n-1 other processes
- Mapping of processes to processors is often 1 to 1, but may be many to 1 during debugging
How PGENESIS Runs in Parallel (2)
- Massively parallel machines: all n processes are started simultaneously by the operating system
- Mapping of processes to processors is nearly always 1 to 1
- On both:
- every process runs the same script
- this is not a real limitation
Nodes and Zones
- Each process is referred to as a "node".
- Nodes may be organized into "zones".
- A node is fully specified by a numeric string of the form <node>.<zone>.
- Simulations within a zone are kept synchronized in simulation time.
- Each node joins the parallel platform using the paron command.
- Each node should gracefully terminate by calling paroff.
Every node in its own zone
- Simulations on each node are not coupled temporally.
- Useful for parameter searching.
- We refer to nodes as 0.0, 0.1, 0.2, ...
All nodes in one zone
- Simulations on each node are coupled temporally.
- Useful for large network models.
- Zone numbers can be omitted since we are dealing with only one zone; we can thus refer to nodes as 0, 1, 2, ...
Hybrid schemes
- Parameter searching on large network models
- Example: the network is partitioned over 8 nodes, and we run 16 simulations in parallel to do parameter searching on this model, thus using a total of 8 x 16 = 128 nodes.
Nodes have distinct namespaces
- /elem1 on node 0 refers to an element on node 0
- /elem1 on node 1 refers to an element on node 1
- To avoid confusion, we recommend that you use distinct names for elements on different nodes within a zone.
GENESIS Terminology
GENESIS     Computer Science
Object      Class
Element     Object
Message     Connection
Value       Message
Who am I?
- PGENESIS provides several functions that allow a script to determine its place in the overall parallel configuration:
- mytotalnode: number of this node in the platform
- mynode: number of this node in this zone
- myzone: number of this zone
- ntotalnodes: number of nodes in the platform
- nnodes: number of nodes in this zone
- nzones: number of zones
- npvmcpu: number of processors in the configuration
- mypvmid: PVM task identifier for this node
- (all numbering starts at 0)
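The Python sketch below is a hypothetical illustration of how these numbers relate. It maps a platform-wide index to a <node>.<zone> pair for the hybrid example (8 nodes per zone, 16 zones). It assumes that every zone has the same size and that mytotalnode counts through zone 0 first, then zone 1, and so on. PGENESIS's actual assignment order is not stated here and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    mytotalnode: int
    mynode: int
    myzone: int


def placement(total_index: int, nnodes: int, nzones: int) -> Placement:
    """Hypothetical mapping that fills zones one after another (an assumption)."""
    if not 0 <= total_index < nnodes * nzones:
        raise ValueError("node index out of range")
    myzone, mynode = divmod(total_index, nnodes)
    return Placement(total_index, mynode, myzone)


def node_spec(p: Placement) -> str:
    """Format a placement as the <node>.<zone> string used to address nodes."""
    return f"{p.mynode}.{p.myzone}"


if __name__ == "__main__":
    nnodes, nzones = 8, 16        # hybrid example: 8-node network, 16 parallel searches
    print(nnodes * nzones)        # ntotalnodes
    for t in (0, 7, 8, 127):
        print(t, node_spec(placement(t, nnodes, nzones)))
```

With one node per zone (nnodes = 1), the same mapping yields the names 0.0, 0.1, 0.2, ... used for parameter searching.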
Styles of Parallel Scripts
- Symmetric: each node executes the same script commands.
- Master/Worker: one node (usually node 0) coordinates processing and issues commands to the other nodes.
Explicit Synchronization
- barrier: causes the thread to block until all nodes within the zone have reached the corresponding barrier
- barrier: wait at the default barrier
- barrier 7: wait at the barrier with ID 7
- barrier 7 100000: as above, with a timeout of 100000 seconds
- barrierall: causes the thread to block until all nodes in all zones have reached the corresponding barrier
- barrierall: wait at the default barrier
- barrierall 7: wait at the barrier with ID 7
- barrierall 7 100000: as above, with a timeout of 100000 seconds
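Python's threading.Barrier can illustrate the blocking behavior of a zone-wide barrier, with threads standing in for the nodes of one zone. This is only an analogy. It does not model barrier IDs, zones, or PVM messaging, and the timeout values are arbitrary. The second function shows what a timeout buys you when one node never arrives.

```python
import threading


def run_zone(n_nodes: int, timeout: float = 5.0) -> list[str]:
    """Every 'node' logs, waits at a shared barrier, then logs again."""
    barrier = threading.Barrier(n_nodes, timeout=timeout)
    log: list[str] = []
    lock = threading.Lock()

    def node(me: int) -> None:
        with lock:
            log.append(f"before {me}")
        barrier.wait()                      # blocks until all n_nodes have arrived
        with lock:
            log.append(f"after {me}")

    threads = [threading.Thread(target=node, args=(i,)) for i in range(n_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


def missing_node_times_out(timeout: float = 0.2) -> bool:
    """A zone of 3 where one node never arrives: the others give up after timeout."""
    barrier = threading.Barrier(3, timeout=timeout)
    failures: list[bool] = []

    def node() -> None:
        try:
            barrier.wait()
        except threading.BrokenBarrierError:
            failures.append(True)

    threads = [threading.Thread(target=node) for _ in range(2)]  # third node is missing
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(failures) == 2
```

Without a timeout, the two waiting threads in the second function would block forever. This is the deadlock that the advice about barriers later in this tutorial warns against.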
Implicit Synchronization
- Two commands implicitly execute a zone-wide barrier:
- step: implicitly causes the thread to block until all nodes within the zone are ready to step (this behavior can be disabled with setfield /post sync_before_step 0)
- reset: implicitly causes the thread to block until all nodes have reset
- These commands require that all nodes in the zone participate, hence the barrier.
Remote Function Calls (1)
- An "issuing" node directs a procedure to run on an "executing" node.
- Examples:
- some_function@2 params...
- some_function@all params...
- some_function@others params...
- some_function@0.4 params...
- some_function@1,3,5 params...
Remote Function Calls (2)
- Each remote function call causes the creation of a new thread on the executing node.
- All parameters are evaluated on the issuing node.
- Example: if called from node 1, some_function@2 {mynode} will execute some_function 1 on node 2.
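The addressing and argument-evaluation rules can be sketched in Python. The helpers resolve_targets and remote_call are hypothetical. The sketch assumes that:
- @all includes the issuing node;
- @others excludes the issuing node;
- a bare number names a node in the issuer's zone;
- <node>.<zone> names a single node.

It is not PGENESIS's dispatch code.

```python
from typing import Callable


def resolve_targets(spec: str, me: int, nnodes: int, myzone: int = 0) -> list[tuple[int, int]]:
    """Turn the text after '@' into (node, zone) pairs, under assumed semantics."""
    zone_nodes = [(n, myzone) for n in range(nnodes)]
    if spec == "all":                        # assumed: includes the issuing node
        return zone_nodes
    if spec == "others":                     # assumed: every node except the issuer
        return [(n, z) for n, z in zone_nodes if n != me]
    targets = []
    for part in spec.split(","):
        if "." in part:                      # explicit <node>.<zone>
            node, zone = part.split(".")
            targets.append((int(node), int(zone)))
        else:                                # plain node number in the issuer's zone
            targets.append((int(part), myzone))
    return targets


def remote_call(func: Callable[..., str], spec: str, me: int, nnodes: int,
                *arg_thunks: Callable[[], object]) -> list[str]:
    """Evaluate arguments once on the issuer, then run func for each target."""
    args = [thunk() for thunk in arg_thunks]     # evaluated on the issuing node
    return [func(node, zone, *args) for node, zone in resolve_targets(spec, me, nnodes)]


def some_function(node: int, zone: int, *args: object) -> str:
    return f"some_function {' '.join(map(str, args))} on node {node}.{zone}"
```

For example, remote_call(some_function, "2", 1, 4, lambda: 1) returns ["some_function 1 on node 2.0"]. The argument is computed on node 1 before the call is shipped, mirroring some_function@2 {mynode}.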
Remote Function Calls (3)
- When does the executing node actually perform the remote function call, since we don't use hardware interrupts?
- While waiting at barrier or barrierall.
- While waiting for its own remote operations to complete, e.g., func@node, raddmsg.
- When the simulator is sitting at the prompt waiting for user input.
- When the executing script calls clearthread or clearthreads.
Threads
- A thread is a single flow of control within a PGENESIS script being executed.
- When a node starts, there is exactly one thread on it: the thread for the script.
- There may potentially be many threads per node. These are stacked up, with only the topmost actually executing at any moment.
- clearthread: yield to one thread awaiting execution (if one exists)
- clearthreads: yield to all threads awaiting execution
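Below is a rough, single-threaded Python model of these yielding rules. Incoming remote calls wait in a queue. clearthread runs at most one of them to completion on top of the current thread, while clearthreads keeps yielding until none are left. The Node class and its methods are illustrative assumptions, not PGENESIS internals.

```python
from collections import deque
from typing import Callable


class Node:
    """Toy single-threaded model of one node's thread stack (illustrative only)."""

    def __init__(self) -> None:
        self.pending: deque[Callable[[], None]] = deque()  # remote calls not yet run
        self.stack: list[str] = ["script"]                   # script thread at the bottom
        self.trace: list[str] = []

    def receive(self, name: str, body: Callable[[], object] = lambda: None) -> None:
        """A remote call arrives; it runs only when this node yields."""
        def thread() -> None:
            self.stack.append(name)                          # new thread goes on top
            self.trace.append(f"{name} runs above {self.stack[-2]}")
            body()
            self.stack.pop()                                 # back to the thread below
        self.pending.append(thread)

    def clearthread(self) -> bool:
        """Yield to one waiting thread, if any; report whether one ran."""
        if not self.pending:
            return False
        self.pending.popleft()()
        return True

    def clearthreads(self) -> None:
        """Yield until no threads are waiting, including ones queued meanwhile."""
        while self.clearthread():
            pass
```

A call whose body itself yields shows the stacking. Suppose outer calls clearthread while it is running. A waiting thread, inner, then runs above outer. Control returns to outer and then to the script as each finishes.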
Asynchronous Calls (1)
- The async command allows a script to dispatch an operation on a remote node without waiting for its completion.
- Example:
- async some_function@2 params...
Asynchronous Calls (2)
- One may wait for an async call to complete, either individually:
- future = {async some_function@2 ...}
- ... // do some work locally
- waiton {future}
- or for an entire set:
- async some_function@2 ...
- async some_function@5 ...
- ...
- waiton all
Asynchronous Calls (3)
- Asynchronous calls may return a value.
- Example:
- int future = {async myfunc@1} // start thread on node 1
- // do some work locally
- int result = {waiton {future}} // wait for thread's result
- Thus the term "future": it is a promise of a value some time in the future, and waiton calls in that promise.
Asynchronous Calls (4)
- async returns a value which is only to be used as the parameter of a waiton call, and waiton must only be called with such a value.
- Remote function calls from a particular issuing node to a particular executing node are guaranteed to be performed in the sequence they were sent.
- There is no guaranteed order among calls involving multiple issuing or executing nodes.
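Python's concurrent.futures offers a close analogue of async and waiton. In this sketch each executing node is modeled as a single-worker ThreadPoolExecutor. That choice is an assumption, made because such an executor runs calls from one issuer in the order they were submitted. myfunc and the node numbers are made up.

```python
from concurrent.futures import Future, ThreadPoolExecutor, wait

# One single-worker executor per executing node (node numbers are made up).
# Calls submitted to the same executor run one at a time, in submission order,
# which mirrors the per-(issuer, executor) ordering guarantee.
nodes = {n: ThreadPoolExecutor(max_workers=1) for n in (1, 2, 5)}
log: list[tuple[int, str]] = []


def myfunc(node: int, tag: str) -> int:
    log.append((node, tag))
    return node * 10


def async_call(node: int, tag: str) -> Future:
    """Like 'async myfunc@node': dispatch and return a future immediately."""
    return nodes[node].submit(myfunc, node, tag)


future = async_call(1, "first")
# ... do some work locally ...
result = future.result()              # like 'int result = {waiton {future}}'

batch = [async_call(2, "a"), async_call(5, "b"), async_call(2, "c")]
wait(batch)                           # like 'waiton all' for this set

for executor in nodes.values():
    executor.shutdown()
```

Calls sent to node 2 always run in the order a, c. Nothing orders b (on node 5) relative to them, just as PGENESIS gives no ordering across different executing nodes.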
Advice about Barriers (1)
- It is very easy to reach deadlock if barriers are not handled correctly. PGENESIS tries to warn you by printing a message that it is waiting at a barrier.
- Examples of incorrect barrier usage:
- Each node executes barrier {mynode}
- Each node executes barrier@all
- A single node executes barrier@others; barrier
- However, async barrier@others; barrier will work!
Advice about Barriers (2)
- Guideline: if your script is operating in the symmetric style (all nodes execute all statements), never use a remote barrier (barrier@...).
- If your script is operating in the master-worker style, the master must ensure that it calls a function on each worker that executes a barrier before the master itself enters the barrier.
- barrier; async barrier@others will not work.
Commands for Network Creation
- Several new commands permit the creation of "remote" (internode) messages:
- raddmsg /local_element /remote_element@2 SPIKE
- rvolumeconnect /local_elements /remote_elements@2 \
      -sourcemask ... -destmask ... \
      -probability 0.5
- rvolumedelay /local_elements -radial 10.0
- rvolumeweight /local_elements -fixed 0.2
- rshowmsg /local_elements
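The Python sketch below shows what rvolumeconnect, rvolumedelay -radial and rvolumeweight -fixed do to a set of synapses. It rests on several assumptions:
- The source mask is omitted, and the destination mask is reduced to a predicate on coordinates.
- -radial is taken to mean delay = distance / conduction velocity, as we understand the serial volumedelay.
- The coordinates are invented.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable

Point = tuple[float, float, float]


@dataclass
class Synapse:
    src: int
    dst: int
    delay: float = 0.0
    weight: float = 0.0


def volume_connect(src_pts: list[Point], dst_pts: list[Point],
                   in_dest_mask: Callable[[Point], bool],
                   probability: float, rng: random.Random) -> list[Synapse]:
    """Connect every source to every masked destination with the given probability."""
    return [Synapse(i, j)
            for i in range(len(src_pts))
            for j, q in enumerate(dst_pts)
            if in_dest_mask(q) and rng.random() < probability]


def volume_delay_radial(syns: list[Synapse], src_pts: list[Point],
                        dst_pts: list[Point], velocity: float) -> None:
    """Assumed -radial rule: delay = Euclidean distance / conduction velocity."""
    for s in syns:
        s.delay = math.dist(src_pts[s.src], dst_pts[s.dst]) / velocity


def volume_weight_fixed(syns: list[Synapse], weight: float) -> None:
    """-fixed: every synapse gets the same weight."""
    for s in syns:
        s.weight = weight
```

In PGENESIS the destination side of each synapse lives on another node. The bookkeeping is the same, but each synapse becomes an internode message.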
Parallel I/O: Display
- How can one display from more than one node?
- Use an xview object.
- Add an index field to the displayed elements.
- Use the ICOORDS and IVAL1 ... IVAL5 messages instead of the COORDS and VAL1 ... VAL5 messages:
- raddmsg /src_elems /xview_elem@0 \
      ICOORDS io_index_field x y z
- raddmsg /src_elems /xview_elem@0 \
      IVAL1 io_index_field Vm
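The index field is needed because values from different nodes reach the display in no fixed order. The index tells the display which slot each value belongs in. Below is a minimal Python sketch of that bookkeeping, with made-up values; it is not how xview is implemented.

```python
def assemble(n_elems: int, messages: list[tuple[int, float]]) -> list[float]:
    """Place (index, value) pairs from any node into one display array."""
    frame = [float("nan")] * n_elems
    for index, value in messages:
        if not 0 <= index < n_elems:
            raise IndexError(f"io_index {index} outside display of {n_elems}")
        frame[index] = value
    return frame


# Two nodes each own half of a 4-element display; arrival order is arbitrary.
node1 = [(2, -0.065), (3, -0.060)]
node0 = [(0, -0.070), (1, 0.020)]
print(assemble(4, node1 + node0))   # [-0.07, 0.02, -0.065, -0.06]
```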
Interaction with Xodus
- Xodus introduces another degree of parallelism via the X11 event processing mechanism. PGENESIS periodically instructs the X server to process any X events. Some of those events may result in some script code being run.
- Race condition: processing order is unpredictable.
- Safe 1: ensure all affected nodes are at a barrier (or equivalent).
- Safe 2: ensure mouse/keyboard events do not cause remote operations that require the participation of another node.
Parallel I/O: Writing a File
- How can one write a file from more than one node?
- Use a par_asc_file or par_disk_out object.
- Add an index field to the source elements.
- raddmsg /src_elems /par_asc_file_elem@0 \
      SAVE io_index_field Vm
Tips for Avoiding Deadlocks
- Use lots of echo statements.
- Use barrier IDs.
- Do not execute barriers remotely (e.g., barrier@all).
- Remember that step usually does an implicit barrier.
- Have each node do its own step command, or have one controlling node do a step@all (similarly for reset).
- Do not use the stop command.
- Keep things simple.
Motivation
- Parallel control of setup can be hard.
- Parallel control of simulation can be hard.
- Debugging parallel scripts is hard.
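How PGENESIS Fits into the Schedule
- The schedule controls the order in which GENESIS objects get updated.
- At the beginning of each step, all internode data is transferred.
- There will be equivalence to serial GENESIS only if remote messages do not pass from earlier to later elements in the schedule. (A later element that receives a remote message sees the value its source had at the start of the step. In serial GENESIS it would see the value just computed in the same step.)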
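A toy Python model reproduces the one-step lag behind this rule:
- Element A increments a counter.
- Element B records A's value.
- In the remote case, B only sees a snapshot of A taken when internode data is transferred at the start of the step.

The model is hypothetical and ignores everything else the schedule does.

```python
def run(order: list[str], remote: bool, steps: int = 3) -> list[int]:
    """Values B records each step when B reads A, for a given schedule order."""
    a_value = 0
    b_history: list[int] = []
    for _ in range(steps):
        snapshot = a_value                    # remote data moves at the start of the step
        for elem in order:
            if elem == "A":
                a_value += 1                  # A's PROCESS action
            else:                             # B's PROCESS action reads A
                b_history.append(snapshot if remote else a_value)
    return b_history


print(run(["A", "B"], remote=False))  # [1, 2, 3]
print(run(["A", "B"], remote=True))   # [0, 1, 2]  one step behind serial
print(run(["B", "A"], remote=False))  # [0, 1, 2]
print(run(["B", "A"], remote=True))   # [0, 1, 2]  identical to serial
```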
How PGENESIS Fits into the Schedule
- addtask Simulate /##[CLASS=postmaster] -action PROCESS
- addtask Simulate /##[CLASS=buffer] -action PROCESS
- addtask Simulate /##[CLASS=projection] -action PROCESS
- addtask Simulate /##[CLASS=spiking] -action PROCESS
- addtask Simulate /##[CLASS=gate] -action PROCESS
- addtask Simulate /##[CLASS=segment][CLASS!=membrane] \
      [CLASS!=gate][CLASS!=concentration] -action PROCESS
- addtask Simulate /##[CLASS=membrane] -action PROCESS
- addtask Simulate /##[CLASS=hsolver] -action PROCESS
- addtask Simulate /##[CLASS=concentration] -action PROCESS
- addtask Simulate /##[CLASS=device] -action PROCESS
- addtask Simulate /##[CLASS=output] -action PROCESS
Adding Custom "C" Code
- Uses:
- data analysis
- interfacing
- custom objects
- PGENESIS allows users' custom libraries to be linked in, similarly to GENESIS.
- We recommend that you first incorporate your custom library into serial GENESIS before trying to use it with PGENESIS.
Modifiable Parameters
- /post/sync_before_step: boolean (default 1)
- /post/remote_info: boolean (default 1); enables rshowmsg
- /post/perfmon: boolean (default 0); enables performance monitoring
- /post/msg_hang_time: float (default 120.0); seconds before giving up on a remote operation
- /post/pvm_hang_time: float (default 3.0); seconds between printing dots while waiting for a message
- /post/xupdate_period: float (default 0.01); seconds between checks for X events when at a barrier
Limitations of PGENESIS
- No rplanarweight or rplanardelay; use the corresponding 3-D routines rvolumeweight and rvolumedelay.
- Cannot delete remote messages.
- getsyncount, getsynindex, and getsyndest no longer return the correct values.
Parameter Searching with PGENESIS
Model Characteristics
- The following are prerequisites to use PGENESIS for optimization on a particular parameter searching problem:
- Model must be expressed in GENESIS.
- Decide on the parameter set.
- Have a way to evaluate the parameter set.
- Have some range for each of the parameter values.
- The evaluations over the parameter space should be reasonably well-behaved.
- Have a stopping criterion.
Trivial Model
- Rather than do a simulation, we will just optimize (here, maximize) a function f of four parameters a, b, c, and d:
- f(a, b, c, d) = 10.0 - (a-1)*(a-1) - (b-2)*(b-2) - (c-3)*(c-3) - (d-4)*(d-4)
- Evaluation of the model: fitness = f(a, b, c, d)
- Range of parameters: -10 < a, b, c, d < 10
- Evaluation is definitely well-behaved.
- Stopping criterion: stop after 1000 individuals.
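The trivial model is easy to check by hand. Every squared term vanishes at its center, so f peaks at 10.0 at (a, b, c, d) = (1, 2, 3, 4). A Python version of the same function makes the check concrete.

```python
def fitness(a: float, b: float, c: float, d: float) -> float:
    """f(a, b, c, d) = 10 - (a-1)^2 - (b-2)^2 - (c-3)^2 - (d-4)^2; larger is better."""
    return 10.0 - (a - 1) ** 2 - (b - 2) ** 2 - (c - 3) ** 2 - (d - 4) ** 2


print(fitness(1, 2, 3, 4))   # 10.0, the global maximum
print(fitness(0, 0, 0, 0))   # 10 - 1 - 4 - 9 - 16 = -20.0
```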
Master/Worker Paradigm (1)
Master/Worker Paradigm (2)
- Each node is in its own zone.
- Node 0.0 will control the search.
- Nodes 0.1 through 0.n-1 will run the model and perform the evaluation.
Commands for Optimization
- Typically these are organized in a master/worker fashion, with one node (the master) directing the search and all other nodes evaluating parameter sets. Remote function calls are useful in this context for:
- sending tasks to workers:
- async task@worker param1...
- having workers return evaluations to the master:
- return_result@master result
Choose a Search Strategy
- Genetic Search
- Simulated Annealing
- Monte Carlo (for very ill-behaved search spaces)
- Nelder-Mead (for well-behaved search spaces)
- Use as many constraints as you can to restrict the search space.
- Always do a sanity check on results.
A Parallel Genetic Algorithm
- We adopt a population-based approach as opposed to a generation-based one.
- We will keep a fixed population "alive" and use the workers to evaluate the fitness of candidate individuals.
- If a candidate turns out to be better than some member of the current population, then we replace the worst member of the current population with the new individual.
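Below is a minimal serial Python sketch of this population-based (steady-state) replacement rule, using the trivial model as the fitness function. Evaluation happens inline rather than on workers. The population size, Gaussian mutation step, and seed are illustrative assumptions. For brevity, the sketch perturbs real values with Gaussian noise instead of using the bit-string mutation the tutorial describes.

```python
import random

CENTER = (1.0, 2.0, 3.0, 4.0)


def fitness(x: list[float]) -> float:
    """Trivial model: 10 minus squared distance from (1, 2, 3, 4)."""
    return 10.0 - sum((xi - c) ** 2 for xi, c in zip(x, CENTER))


def steady_state_search(individuals: int, population: int,
                        seed: int = 1) -> list[tuple[float, list[float]]]:
    """Population-based search: each evaluated candidate may replace the worst member."""
    rng = random.Random(seed)
    pop: list[tuple[float, list[float]]] = []
    for i in range(individuals):
        if i < population:                        # build the initial population
            cand = [rng.uniform(-10, 10) for _ in CENTER]
        else:                                     # perturb a randomly chosen member
            _, parent = rng.choice(pop)
            cand = [xi + rng.gauss(0.0, 0.5) for xi in parent]
        fit = fitness(cand)                       # in PGENESIS a worker does this
        if len(pop) < population:
            pop.append((fit, cand))
        else:
            worst = min(range(population), key=lambda k: pop[k][0])
            if fit > pop[worst][0]:               # better than the worst member?
                pop[worst] = (fit, cand)          # replace the worst member
    return pop
```

Only the worst member is ever replaced, and only by something better. Both the best and the worst fitness in the population can therefore only rise as the search proceeds.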
Parameter Representation
- We represent the set of parameters that define an individual as a string of bits. Each 16-bit string (one "gene") is interpreted as a signed integer (its unsigned value minus 32768) and then divided by 1000.0 to yield the floating-point value. To generate a new candidate from the existing population:
- Pick a member of the population at random.
- Go through each bit of the bit string, and mutate it with some small probability.
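Decoding and bit-flip mutation can be sketched in Python. The decoding follows the worker script (unsigned 16-bit value minus 32768, divided by 1000.0), which gives a range of -32.768 to 32.767. The helper names and the mutation probabilities passed in are illustrative.

```python
import random

GENE_BITS = 16


def decode(gene: int) -> float:
    """Map an unsigned 16-bit gene to a float, as the worker script does."""
    return (gene - 32768) / 1000.0


def encode(value: float) -> int:
    """Inverse of decode, rounded to the nearest representable gene."""
    gene = round(value * 1000.0) + 32768
    if not 0 <= gene < 1 << GENE_BITS:
        raise ValueError("value outside gene range")
    return gene


def mutate(genes: list[int], bit_prob: float, rng: random.Random) -> list[int]:
    """Flip each bit of each gene independently with probability bit_prob."""
    out = []
    for gene in genes:
        for bit in range(GENE_BITS):
            if rng.random() < bit_prob:
                gene ^= 1 << bit
        out.append(gene)
    return out
```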
Main Script
    paron -farm -silent 0 -nodes {n_nodes} \
          -output o.out -executable nxpgenesis
    barrierall
    if ({mytotalnode} == 0)
        search
    end
    barrierall 7 1000000
    paroff
    quit
Master Conducts the Search
    function search
        int i
        init_search
        init_farm
        for (i = 0; i < individuals; i = i + 1)
            if (i < population)
                init_individual
            else
                mutate_individual {rand 0 {actual_population}}
            end
            delegate_task {i} {bs_a} {bs_b} {bs_c} {bs_d}
        end
        finish
    end
Master Conducts the Search
    function delegate_task
        while (1)
            if (free_index >= 0)
                async worker@0.{getfield /free[{free_index}] value} \
                    {bs_a} {bs_b} {bs_c} {bs_d}
                free_index = free_index - 1
                return
            else
                clearthreads
            end
        end
    end
Workers Evaluate Individuals
    function worker (bs_a, bs_b, bs_c, bs_d)
        int bs_a, bs_b, bs_c, bs_d
        float a, b, c, d, fit
        a = (bs_a - 32768.0) / 1000.0
        b = (bs_b - 32768.0) / 1000.0
        c = (bs_c - 32768.0) / 1000.0
        d = (bs_d - 32768.0) / 1000.0
        fit = {evaluate {a} {b} {c} {d}}
        return_result@0.0 {mytotalnode} {bs_a} {bs_b} \
            {bs_c} {bs_d} {fit}
    end
Workers Evaluate Individuals
    function evaluate (a, b, c, d)
        float a, b, c, d, fit
        fit = 10.0 - (a-1)*(a-1) - (b-2)*(b-2) \
            - (c-3)*(c-3) - (d-4)*(d-4)
        return {fit}
    end
Master Integrates the Results (1)
    function return_result (node, bs_a, bs_b, bs_c, bs_d, fit)
        int node, bs_a, bs_b, bs_c, bs_d
        float fit
        if (actual_population < population)
            least_fit = actual_population
            min_fitness = -1e10
            actual_population = actual_population + 1
        end
Master Integrates the Results (2)
        if (fit > min_fitness)
            setfield /population[{least_fit}] fitness {fit}
            setfield /population[{least_fit}] a_value {bs_a}
            setfield /population[{least_fit}] b_value {bs_b}
            setfield /population[{least_fit}] c_value {bs_c}
            setfield /population[{least_fit}] d_value {bs_d}
            if (actual_population == population)
                recompute_fitness_extremes
            end
        end
        free_index = free_index + 1
        setfield /free[{free_index}] value {node}
    end
A More Realistic Model
- We have a one-compartment cell model of a spiking neuron. Its dynamics are probably well-behaved.
- The parameters are the conductances for the Na, Kdr, Ka, and KM channels. We know a priori that the conductance values lie in the range 0.1 to 10.0.
- We write spike times to a file, then compare them with "experimental" data using a C function, spkcmp.
- Stop when our match fitness exceeds 20.0.
Improved Parameter Representation
- As before, we represent the set of parameters that define an individual as a string of bits. However, each 16-bit string now maps logarithmically into the range 0.1 to 10.0, so that we have increased resolution at the low end of the scale.
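One way to realize a logarithmic mapping is to space the 65536 gene values evenly in log10 between the two bounds. The tutorial's own formula (in population.g) is not shown, so the Python mapping below is an assumed example. It is compared against a linear mapping to show the gain in low-end resolution.

```python
import math

GENE_MAX = (1 << 16) - 1          # largest 16-bit gene value


def log_decode(gene: int, lo: float = 0.1, hi: float = 10.0) -> float:
    """Assumed scheme: genes 0..GENE_MAX spaced evenly in log10 between lo and hi."""
    frac = gene / GENE_MAX
    return 10 ** (math.log10(lo) + frac * (math.log10(hi) - math.log10(lo)))


def linear_decode(gene: int, lo: float = 0.1, hi: float = 10.0) -> float:
    """Linear mapping over the same range, for comparison."""
    return lo + gene / GENE_MAX * (hi - lo)


# Size of one gene step at the low end of the range
print(log_decode(1) - log_decode(0))        # roughly 7e-06
print(linear_decode(1) - linear_decode(0))  # roughly 1.5e-04
```

Near 0.1 the logarithmic step is about twenty times finer than the linear one. Near 10.0 the relationship reverses.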
Crossover Mutations
- Pick a member of the population at random.
- Decide whether to do crossover according to the crossover probability. If we are doing crossover, pick another random member of the current population and combine the "genes" of those individuals. If we aren't doing crossover, just copy the bits of the original individual.
- Go through each bit of the bit string, and mutate it with some small probability.
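The text does not specify how the genes of two parents are combined. The Python sketch below assumes uniform crossover at the gene level, with each gene taken from either parent with equal odds. The same per-bit mutation follows. The default probabilities are the crossover_prob and bit_mutation_prob values from the main script.

```python
import random

GENE_BITS = 16


def make_candidate(pop: list[list[int]], rng: random.Random,
                   crossover_prob: float = 0.5,
                   bit_mutation_prob: float = 0.02) -> list[int]:
    """Build one new candidate from the population (genes are 16-bit ints)."""
    parent = rng.choice(pop)
    if rng.random() < crossover_prob:
        other = rng.choice(pop)
        # Assumed uniform crossover: each gene comes from one parent or the other.
        child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(parent, other)]
    else:
        child = list(parent)                  # no crossover: copy the original bits
    for i in range(len(child)):               # per-bit mutation
        for bit in range(GENE_BITS):
            if rng.random() < bit_mutation_prob:
                child[i] ^= 1 << bit
    return child
```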
Main Script (1)
    int n_nodes = 4
    int individuals = 1000
    int population = 10
    float stopping_criterion = 20.0
    float crossover_prob = 0.5
    float bit_mutation_prob = 0.02
Main Script (2)
    include population.g   // functions for GA population-based parameter searches
    // model-specific files
    include siminit.g      // defines parameters of simulation
    include fI.g           // sets up table of currents
    include channels.g     // defines the channels
    include simcell.g      // functions to load in the cell model
    include eval.g         // functions to evaluate the model
Main Script (3)
    paron -farm -silent 0 -nodes {n_nodes} \
          -output o.out -executable nxpgenesis
    barrierall
    if ({mytotalnode} == 0)
        init_master
        pb_search {individuals} {population}
    else
        init_worker
    end
    barrierall 7 1000000
    paroff
Parameters Are Customizable
    function init_params
        setfield /params[0] label "Na" scaling "log"
        setfield /params[0] min_value 0.1 max_value 10.0
        setfield /params[1] label "Kdr" scaling "log"
        setfield /params[1] min_value 0.1 max_value 10.0
        setfield /params[2] label "Ka" scaling "log"
        setfield /params[2] min_value 0.1 max_value 10.0
        setfield /params[3] label "KM" scaling "log"
        setfield /params[3] min_value 0.1 max_value 10.0
    end
Worker Evaluates Individuals (1)
    function evaluate
        float match, fitness
        // first run the simulation
        newsim {getfield /params[0] value} \
               {getfield /params[1] value} \
               {getfield /params[2] value} \
               {getfield /params[3] value}
        runfI
        call /out/sim_output_file FLUSH
Worker Evaluates Individuals (2)
        // then find the simulated spike times
        gen2spk {sim_output_file} {delay} \
                {current_duration} {total_duration}
        // then compare the simulated spike
        // times with the experimental data
        match = {spkcmp {real_spk_file} \
                 {sim_spk_file} -pow1 0.4 -pow2 0.6 \
                 -msp 0.5 -nmp 200.0}
        fitness = 1.0 / {sqrt {match}}
        return {fitness}
    end
Tuning Search
- representation
- parameter selection
- generation vs population-based approach
- generation/population size
- crossover probability
- crossover method
- mutation probability
- initial ranges
Large Networks with PGENESIS
Parallel Network Creation
- In parallel network creation, make sure elements exist before connecting them up, e.g.:
- create_elements(...)
- barrier
- create_messages(...)
Goals of decomposition
- Keep all processors busy all the time on useful work.
- Use as many processors as are available.
- Key concepts are:
- Load balancing
- Minimizing communication
- Minimizing synchronization
- Scalable decomposition
- Parallel I/O
Load balancing
- Attempt to parcel out the modeled cells such that each CPU takes the same amount of time to simulate one step.
- This is static load balancing: cells do not move.
- Dedicated access to the CPUs is required for effective decomposition.
- It is easier if the CPUs are identically configured.
- PGENESIS provides no automated load balancing, but there are some performance monitoring tools.
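PGENESIS leaves the decomposition to you, so a simple heuristic can help when cells have different per-step costs. The Python sketch below uses greedy longest-first assignment. This is a standard scheduling heuristic, not something PGENESIS provides. The cell names and costs are hypothetical.

```python
import heapq


def assign_cells(costs: dict[str, float], n_cpus: int) -> dict[int, list[str]]:
    """Give each cell, most expensive first, to the currently least-loaded CPU."""
    heap = [(0.0, cpu) for cpu in range(n_cpus)]     # (accumulated cost, cpu)
    heapq.heapify(heap)
    plan: dict[int, list[str]] = {cpu: [] for cpu in range(n_cpus)}
    for cell, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, cpu = heapq.heappop(heap)
        plan[cpu].append(cell)
        heapq.heappush(heap, (load + cost, cpu))
    return plan


def cpu_loads(plan: dict[int, list[str]], costs: dict[str, float]) -> list[float]:
    """Per-CPU time to simulate one step under a plan."""
    return [sum(costs[c] for c in cells) for cells in plan.values()]


costs = {"purkinje": 5.0, "golgi": 4.0, "granule1": 3.0, "granule2": 3.0, "granule3": 3.0}
plan = assign_cells(costs, 2)
print(plan, cpu_loads(plan, costs))
```

The result (loads 8 and 10) is not optimal: 5 + 4 and 3 + 3 + 3 would give 9 and 9. Such gaps are typical of greedy heuristics. Measured per-step costs from the performance monitoring tools would be the natural input.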
Minimizing communication
- Put highly connected clusters of cells on the same PGENESIS node.
- Think of each synapse with a presynaptic cell on a remote node as expensive.
- The same network distributed among more nodes will have more of these expensive synapses; hence, more nodes can be counterproductive.
- The time spent communicating can overwhelm the time spent computing.
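To see the effect, count remote synapses for a toy network as it is split across more nodes. The Python sketch assumes a ring of cells, each receiving input from every cell within a fixed radius, partitioned into equal contiguous blocks. The sizes are arbitrary.

```python
def remote_synapses(n_cells: int, radius: int, n_nodes: int) -> int:
    """Count synapses whose presynaptic cell lives on a different node."""
    block = n_cells // n_nodes            # equal contiguous blocks (assumes exact division)

    def owner(cell: int) -> int:
        return cell // block

    count = 0
    for post in range(n_cells):
        for offset in range(-radius, radius + 1):
            if offset == 0:
                continue                  # no self-connections
            pre = (post + offset) % n_cells   # ring: wraps around at the ends
            if owner(pre) != owner(post):
                count += 1
    return count


for n in (1, 2, 4, 8, 16):
    print(n, remote_synapses(1024, 4, n))
```

With 1024 cells and radius 4, each block boundary is crossed by r(r+1) = 20 synapses. The remote count is therefore 0, 40, 80, 160 and 320 for 1, 2, 4, 8 and 16 nodes, while the total number of synapses stays at 8192.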
Orient_tut Example
Non-scalable decomposition
Scalable decomposition (1)
- Goal: as the number of available processors grows, your model naturally partitions into finer divisions.
Scalable decomposition (2)
Scalable decomposition (3)
- To the extent that you can arrange your decomposition to scale with the number of processors, it is a very good idea to write the scripts using a function of the number of nodes anywhere that a node number must be explicitly specified.
- E.g.:
    createmap /library/rec /retina/recplane \
        {REC_NX / n_slices} {REC_NY} \
        -delta {REC_SEPX} {REC_SEPY} \
        -origin {-REC_NX * REC_SEPX / 2 + slice * REC_SEPX * REC_NX / n_slices} \
                {-REC_NY * REC_SEPY / 2}
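The arithmetic in that createmap call can be checked in Python. Slice s of n_slices gets REC_NX / n_slices columns starting at -REC_NX * REC_SEPX / 2 + s * REC_SEPX * REC_NX / n_slices. Together the slices tile the same columns as one unpartitioned map. The grid size and spacing are example values. REC_NX is assumed divisible by n_slices, and createmap is assumed to place element i at origin + i * delta.

```python
def slice_columns(rec_nx: int, rec_sepx: float, n_slices: int, s: int) -> list[float]:
    """x positions of the receptor columns created on the node holding slice s."""
    nx = rec_nx // n_slices                                   # REC_NX / n_slices
    origin_x = -rec_nx * rec_sepx / 2 + s * rec_sepx * rec_nx / n_slices
    return [origin_x + i * rec_sepx for i in range(nx)]


def full_columns(rec_nx: int, rec_sepx: float) -> list[float]:
    """x positions for the same map created on a single node."""
    return [-rec_nx * rec_sepx / 2 + i * rec_sepx for i in range(rec_nx)]


REC_NX, REC_SEPX = 32, 0.5
for n_slices in (1, 2, 4, 8):
    tiled = [x for s in range(n_slices) for x in slice_columns(REC_NX, REC_SEPX, n_slices, s)]
    assert tiled == full_columns(REC_NX, REC_SEPX)     # slices reproduce the full map
```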
Case Study: Cerebellar Model
- Howell, D. F., Dyhrfjeld-Johnsen, J., Maex, R., Goddard, N., De Schutter, E., "A large-scale model of the cerebellar cortex using PGENESIS," Neurocomputing, 32/33 (2000), pp. 1041-1046.
- 16 Purkinje cells embedded in an environment of other simpler, but more numerous, cells
- Simulated on 128 processors of PSC's Cray T3E
Cell Populations Connectivity
3-D Representation of Network
Model Partitioning
Timings on 128 Processors of T3E
Timings vs. Model Size
Timings on Workstation Network
Significant Overhead on Cluster
Scaling Up
Getting Cycles
- NSF-funded supercomputing centers
- Pittsburgh Supercomputing Center (http://www.psc.edu)
- PGENESIS installed on a 512-processor Cray T3E
- NPACI (http://www.npaci.edu)
- Worked on MPI-based PGENESIS
- Alliance (http://www.ncsa.uiuc.edu)
The High End
- 3000-processor Terascale computer at PSC
Parallel Script Development (1)
- 1. Develop single-cell prototypes using serial GENESIS.
- 2. (a) For network models, decide partitioning and develop scalable scripts. (b) For parameter searches, develop scripts to run and evaluate a single individual, and a scalable script that will control the search.
- 3. Try out scripts on a single processor using the minimum number of nodes.
Parallel Script Development (2)
- 4. Try out scripts on a single processor but increase the number of nodes.
- 5. Try out scripts on a small multiprocessor platform.
- 6. Try out scripts on a large multiprocessor platform.
Resource Limits and Other Tips
- On the Cray T3E, set PVM_SM_POOL to ensure adequate PVM buffer space. This should be set to the maximum number of messages that might arrive at any PE before it gets a chance to process them.
- On other machines, you may need to set PVMBUFSIZE to address similar issues.
- When debugging interactively, set the timeout so that other nodes do not time out:
- setfield /post msg_hang_time 10000.0
Reducing Synchronization Delay
- In network models, axonal delays L are large compared to the simulation time step.
- A spike generated at simulation time T on one node need not be physically delivered to the destination synapse on another node until simulation time T + L.
- PGENESIS can use this to reduce unnecessary waiting. Node B can get ahead of node A by the minimum of all the axonal delays on the connections from cells on A to synapses on B. This is called the lookahead of B with respect to A.
- You must set /post/sync_before_step to 0 to allow this looser synchronization.
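Computing lookaheads from a connection list is a simple reduction. This Python sketch assumes each connection is recorded as (source node, destination node, axonal delay in seconds). The node numbers and delays are invented.

```python
from collections import defaultdict


def lookaheads(connections: list[tuple[int, int, float]]) -> dict[tuple[int, int], float]:
    """Lookahead of B w.r.t. A = min delay over connections from cells on A to synapses on B."""
    best: dict[tuple[int, int], float] = defaultdict(lambda: float("inf"))
    for src_node, dst_node, delay in connections:
        if src_node != dst_node:              # local connections impose no internode limit
            key = (dst_node, src_node)        # (B, A)
            best[key] = min(best[key], delay)
    return dict(best)


conns = [(0, 3, 0.012), (0, 3, 0.010), (1, 3, 0.004), (3, 0, 0.020), (0, 0, 0.001)]
la = lookaheads(conns)
print(la[(3, 0)])                                      # 0.01
print(min(v for (b, _), v in la.items() if b == 3))    # 0.004
```

Here node 3 may run up to 10 ms ahead of node 0, but its tightest constraint overall is the 4 ms delay from node 1.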
Reducing Synchronization Delay
- When you partition a network across nodes, one goal is to make the lookahead between any pair of nodes large.
- PGENESIS provides the setlookahead command for you to inform it of the lookahead between nodes:
- setlookahead 0.01 // sets lookahead to 10 ms
- setlookahead 3 0.01 // sets lookahead to 10 ms w.r.t. node 3
- The getlookahead command reports the current setting with respect to a particular node, and the showlookahead command reports the minimum lookahead to all other nodes:
- getlookahead 3 // gets lookahead with respect to node 3
- showlookahead // gets lookahead with respect to all nodes
Parallel I/O
- Currently the I/O facilities (disk elements and Xodus elements) are tightly synchronized with the simulation (no lookahead). Therefore, sending messages to Xodus objects or disk objects on remote nodes usually slows the simulation to a crawl. Use Xodus only for post-processing.
- Try to arrange input and output to be via local elements. On workstations it is preferable to access local disk. If access is via a shared file system (e.g., NFS, AFS), use different output disk files for different nodes, and amalgamate the data after the simulation is over.
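Amalgamating per-node output after the run is usually a simple merge. The Python sketch assumes each node wrote a plain-text file of "time index value" lines, already sorted by time. The format and sample data are hypothetical. In practice the inputs would be open file objects, one per node.

```python
import heapq
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[tuple[float, int, float]]:
    """Turn 'time index value' lines into tuples, skipping blank lines."""
    for line in lines:
        if line.strip():
            t, idx, val = line.split()
            yield float(t), int(idx), float(val)


def amalgamate(per_node: list[Iterable[str]]) -> list[tuple[float, int, float]]:
    """Merge time-sorted per-node records into one time-sorted list."""
    return list(heapq.merge(*(parse(lines) for lines in per_node)))


node0 = ["0.000 0 -0.070", "0.001 0 -0.069"]
node1 = ["0.000 1 -0.065", "0.001 1 -0.064"]
print(amalgamate([node0, node1]))
```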
Performance Monitoring (1)
- A script can turn on performance monitoring with setfield /post perfmon 1 and turn it off with setfield /post perfmon 0.
- Whenever performance monitoring is active, the categories listed below accumulate time.
- To ignore the time involved in construction of a model, do not activate performance monitoring until just prior to the first simulation step.
- The accumulated time values can be dumped to a file with the command perfstats. This writes a file to /tmp (usually a local disk) called pgenesis.ppp.nnn.acct, where ppp is the process id and nnn is the node number. Each time perfstats is called it dumps the accumulated values, but it does not reset them.
Performance Monitoring (2)
- The monitoring package tracks the amount of time spent in various operations:
- PGENESIS_PROCESS_SNDREC_SND: time sending data to other nodes
- PGENESIS_PROCESS_SNDREC_REC: time receiving data from other nodes
- PGENESIS_PROCESS_SNDREC_GETFIELD: time spent gathering local data for transmission to other nodes
- PGENESIS_PROCESS_SNDREC: time spent in sending and receiving data not accounted for by the three preceding categories
- PGENESIS_PROCESS_SYNC: time spent explicitly synchronizing nodes prior to each step
Performance Monitoring (3)
- PGENESIS_PROCESS: time spent in parallel overhead of exchanging data with other nodes that is not accounted for by the preceding categories
- PGENESIS_EVENT: time spent handling incoming spikes
- PGENESIS: time spent in PGENESIS not accounted for by the preceding overhead categories (in other words, the time spent doing useful work)
Comparisons and Summary
Alternatives to PGENESIS (1)
- Batch scripts (Perl, Python, bash) for parameter searching
- Incur GENESIS process startup and network setup overheads
- If simulations are long and the evaluation step is already done externally, may be simpler
- NEURON
- Parallel parameter searching (talk with Mike Hines)
- Vectorized NEURON if you happen to have a vector machine handy
Alternatives to PGENESIS (2)
- NEOSIM (http://www.neosim.org/)
- Prototype stage (Java kernel released)
- Integration with NEURON simulation engine
- Supports automatic network partitioning
- Modular architecture
- Designed for scalability
- Hand-coded simulation (Java, C, C++, Fortran)
- Very time-consuming (especially parallel coding)
- Difficult to share models
- Specialized code can run much faster
- Possibly appropriate for large but simple models (e.g., connectionist-style approaches)
Summary
- PGENESIS is a GENESIS extension that lets you use multiple computers to:
- Perform large parameter searches much more quickly
- Simulate large network models more quickly
Discussion
References
- Goddard, N.H. and Hood, G., "Large-scale simulation using parallel GENESIS," in The Book of GENESIS, 2nd ed., Bower, J.M. and Beeman, D. (Eds.), Springer-Verlag, 1998.
- Goddard, N.H. and Hood, G., "Parallel Genesis for large scale modeling," in Computational Neuroscience: Trends in Research 1997, Plenum Publishing, NY, 1997, pp. 911-917.
- Howell, D. F., Dyhrfjeld-Johnsen, J., Maex, R., Goddard, N., De Schutter, E., "A large-scale model of the cerebellar cortex using PGENESIS," Neurocomputing, 32/33 (2000), pp. 1041-1046.