Title: Parallel System for Interactive Multi-Experiment Computational Studies (pSIMECS)
1Parallel System for Interactive Multi-Experiment
Computational Studies (pSIMECS)
2Simecs Problem Description
- Multi-Experiment Computational Studies
- Computational studies involving multiple
experiments, each corresponding to an individual
execution of a simulation program
- Example: Design Space Exploration
- Goal: Given a set of possible parameter values (a
parameter space) and an experiment that maps a
parameter value to a performance metric, find the
subset of the parameter space whose performance
metrics fit certain criteria.
3Simecs Problem Description
- Model application: Pareto frontier discovery.
- The Pareto frontier is the set of points in the
parameter space that are not completely dominated
by any other point in the parameter space.
- p completely dominates q iff all
components of p's performance metric are
better than the corresponding components of q's.
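The dominance test above can be sketched in a few lines. This is a minimal illustration, not code from pSIMECS: it assumes "better" means "lower" in every component, and it works directly on performance metrics rather than on parameter points. The names `completely_dominates` and `pareto_frontier` are hypothetical.

```python
from typing import List, Sequence

Metric = Sequence[float]

def completely_dominates(p: Metric, q: Metric) -> bool:
    """True if every component of p is strictly better than q's.

    Assumption: lower values are better in every component.
    """
    return all(a < b for a, b in zip(p, q))

def pareto_frontier(metrics: List[Metric]) -> List[Metric]:
    """Keep the metrics that no other metric completely dominates."""
    return [p for p in metrics
            if not any(completely_dominates(q, p) for q in metrics)]

points = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
# (3.0, 3.0) is dropped because (2.0, 2.0) is better in both components.
frontier = pareto_frontier(points)
```

Because the definition requires every component to be better, two points that tie in one component do not dominate each other under this sketch.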
4Simecs Pareto Frontier Insights
- Simulations are independent: embarrassingly
parallel
- An experiment corresponds to an execution of a
simulation program, which can itself be parallel
or sequential
- The result of one simulation can be used to speed
up simulations at nearby parameter values (e.g.,
as the initial guess for a Newton iteration).
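The last insight can be illustrated on a toy problem. The sketch below is not the bridge simulation; it assumes a scalar equation x^3 + x = r, chosen only because its Newton iteration is easy to follow, and compares a cold start against a start from the solution at a nearby parameter value.

```python
def newton(f, df, x0, tol=1e-10, max_iter=100):
    """Plain Newton iteration; returns (root, number of update steps)."""
    x = x0
    for iterations in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, iterations
        x -= fx / df(x)
    raise RuntimeError("Newton iteration did not converge")

def solve(r, x0):
    """Solve x**3 + x = r (a stand-in for one simulation at parameter r)."""
    return newton(lambda x: x**3 + x - r, lambda x: 3 * x**2 + 1, x0)

# Solution at a nearby parameter value, playing the role of a checkpoint.
neighbor_root, _ = solve(9.5, 0.0)

_, cold_iters = solve(10.0, 0.0)               # start from scratch
root, warm_iters = solve(10.0, neighbor_root)  # start from the neighbor's result
```

In this toy case the warm start needs fewer Newton steps than the cold start, which is the effect the checkpoint re-use aims for.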
5Simecs Pareto Frontier Insights
- Decisions can be made with imprecise results: can
trade off precision vs. resources
- If the parameter space is large, sweeps are
inefficient.
- Need to prune portions of the space as the study
progresses, either automatically or
interactively.
- Active Sampler can automatically pick
"interesting" simulations (e.g., close to the
boundary)
6Simecs Example Problem
- Bridge design computational study: 1D bridge in
2D space, with end points clamped. Two elastic
supports are added to the middle of the bridge.
- Parameter space: distances of the two supports
from the end of the bridge.
- Performance measures: maximum deflection of the
bridge, and the cost of supports
- Bridge is clamped at all support points, with
bending and stretching forces, and uniform load.
7Simecs Example Problem
Test problem. Parameter: $\langle r_0, r_1 \rangle$. Performance
metric: $\langle \max_{0 < r < L} f(r),\ c(r_0) + c(r_1) \rangle$.
Cost function: $c(r)$
8Simecs Goal
- Simecs: Software on parallel systems that manages
simulation processes in a Multi-Experiment
Computational Study.
- Frees users and application developers from
micromanaging every simulation process
- Goal: Interactive, steerable design space
exploration
9Simecs User View
- Two types of parameters
- technique parameters (e.g., discretisation of
nodes, convergence tolerance)
- model parameters (e.g., Young's modulus of a
material, viscosity of a fluid).
- Goal: As the Pareto frontier obtained from one
set of parameters is forming, the user can switch
to another setup and continue the study.
- e.g., limit the exploration space but increase
the resolution.
10Simecs Developer View
- Application developer provides 3 modules
- Simulation: Maps a parameter space point to a
performance space point
- Visualisation & interaction: Displays the
relevant information to the user; collects
information from the user and maps it
into the Simulation module
- Transformation: Transforms a state of a simulation
on one technique parameter into another.
- e.g., interpolate checkpoints from different
resolutions
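As an illustration of what a Transformation module might do, the sketch below resamples a checkpoint from one resolution to another by linear interpolation. It assumes the checkpoint is a list of nodal values on a uniform 1D grid; the function name `resample_checkpoint` is hypothetical and not part of the Simecs interface.

```python
from typing import List

def resample_checkpoint(values: List[float], new_size: int) -> List[float]:
    """Linearly interpolate nodal values on a uniform grid to new_size nodes."""
    old_size = len(values)
    if old_size < 2 or new_size < 2:
        raise ValueError("need at least two nodes on both grids")
    resampled = []
    for i in range(new_size):
        # Position of new node i, measured in old-grid cell units.
        t = i * (old_size - 1) / (new_size - 1)
        lo = min(int(t), old_size - 2)  # left node of the enclosing old cell
        frac = t - lo
        resampled.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return resampled

coarse = [0.0, 2.0, 4.0]
fine = resample_checkpoint(coarse, 5)  # 3 nodes -> 5 nodes
```

The resampled state would then serve as the starting guess for a simulation at the new resolution.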
11Simecs System View
- Shared object layer, Active sampler, Resource
Allocator
12Simecs System View
- Shared object space layer: System-wide repository
of shared objects (e.g., checkpoints, error
estimates, results)
- Sampler: Based on users' specifications, issues
sample points where simulations will be run
- Resource Allocator / Manager: Maps simulations
onto computing elements, decides whether to use a
checkpoint.
13Simecs SISOL
- Spatially-Indexed Shared Object Layer (SISOL)
- Used for storing system-wide shared objects.
- For the model problem: checkpoints and results
(performance metric at each parameter point).
- <index, object set id> names a unique object in
the system.
14Simecs SISOL
- Objects are typed: SISOL requires pack() and
unpack() implementations for each type. For
parallel object types, it also requires a function
to map parallel objects into different
decompositions.
- Supports split-phase create, delete, read and
write to enforce read-modify-write consistency
- Supports neighborhood queries
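A pack()/unpack() pair for one object type could look like the sketch below. The slides do not give the SISOL signatures or wire format, so both are assumptions here: the object is a plain list of doubles, encoded as a little-endian 32-bit length followed by the values.

```python
import struct
from typing import List

def pack(vector: List[float]) -> bytes:
    """Serialize a vector as <uint32 length><float64 values>, little-endian."""
    return struct.pack(f"<I{len(vector)}d", len(vector), *vector)

def unpack(buffer: bytes) -> List[float]:
    """Inverse of pack(): read the length, then that many doubles."""
    (count,) = struct.unpack_from("<I", buffer, 0)
    return list(struct.unpack_from(f"<{count}d", buffer, 4))

checkpoint = [1.0, 2.5, -3.0]
wire = pack(checkpoint)     # 4-byte header + 3 * 8 bytes of payload
restored = unpack(wire)
```

Using an explicit byte order keeps the encoding identical across the different node types of a heterogeneous cluster.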
15Simecs SISOL Implementation
- Ideal implementation: directory-based cache,
where each node participates in storing
objects.
- Current implementation
- Single TCP server
- In core
- Hash-map based lookup
- Linear lookup for nearest neighbor
- Supports only sequential objects
16Simecs SISOL Implementation
- Object sets are created on the server
- Nearest neighbor query retrieves coordinates only
- Supports the sequential PETSc vector object type by
default.
- Sufficient for small sets, small objects
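The in-core part of such a store (hash-map lookup, linear nearest-neighbor scan that returns coordinates only) can be sketched as follows. The TCP server and the split-phase operations are left out, and the class and method names are hypothetical, not the SISOL API.

```python
import math
from typing import Any, Dict, Optional, Tuple

Index = Tuple[float, ...]

class InCoreObjectStore:
    """Object sets held in memory, keyed by <index, object set id>."""

    def __init__(self) -> None:
        self._sets: Dict[str, Dict[Index, Any]] = {}

    def create_set(self, set_id: str) -> None:
        self._sets.setdefault(set_id, {})

    def write(self, set_id: str, index: Index, obj: Any) -> None:
        self._sets[set_id][tuple(index)] = obj   # hash-map insert

    def read(self, set_id: str, index: Index) -> Any:
        return self._sets[set_id][tuple(index)]  # hash-map lookup

    def nearest(self, set_id: str, query: Index) -> Optional[Index]:
        """Linear scan; returns the coordinates of the closest stored index."""
        indices = self._sets[set_id]
        if not indices:
            return None
        return min(indices, key=lambda index: math.dist(index, query))

store = InCoreObjectStore()
store.create_set("checkpoints")
store.write("checkpoints", (0.0, 0.0), [0.1, 0.2])
store.write("checkpoints", (1.0, 1.0), [0.3, 0.4])
closest = store.nearest("checkpoints", (0.9, 0.8))
```

A caller that wants the object itself follows the query with `read()` on the returned coordinates. The linear scan costs O(n) per query, which fits the "small sets" caveat above.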
17Simecs SISOL Use
- Current Pareto Frontier problem uses two object
sets
- Result set (parameter point -> performance
metric)
- Checkpoint set (parameter point -> sequential
PETSc vectors)
- In the test problem, a parameter point is a 2D
vector, so the result set and checkpoint set have 2D
indices.
18Simecs FUEL
- Frame/Update Exchange Layer: Control layer
between the manager and simulation processes
- Code that represents a functional aspect of a
steerable application is grouped together
(called a Satellite).
- Event-based on the manager process; poll-based on
simulation processes
- Dynamic model: Satellites can be activated and
decommissioned while a simulation is running
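The poll-based side can be pictured as a simulation loop that checks for pending updates between iterations without blocking. The sketch below is only an illustration of that pattern: the slides do not describe the FUEL API, so the in-process queue, the `(name, value)` update format and the function name are all assumptions.

```python
import queue
from typing import List, Tuple

def run_simulation(updates: "queue.Queue", steps: int) -> List[Tuple[int, float]]:
    """Run `steps` iterations, polling for steering updates before each one."""
    tolerance = 1e-3  # hypothetical steerable parameter
    history = []
    for step in range(steps):
        # Poll: drain whatever has arrived, never wait for more.
        while True:
            try:
                name, value = updates.get_nowait()
            except queue.Empty:
                break
            if name == "tolerance":
                tolerance = value
        history.append((step, tolerance))  # stand-in for one solver iteration
    return history

pending = queue.Queue()
pending.put(("tolerance", 1e-6))  # update sent before the run starts
history = run_simulation(pending, steps=2)
```

Polling keeps the solver in control of when it reacts to an update. The manager side, by contrast, is described as event-driven.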
20Simecs Active Sampler
- Resolves the Pareto frontier progressively
- Maintains a task queue and a result set
- Task queue: points of interest in parameter
space; result set: points discovered so far
that are undominated (i.e., current Pareto set
candidates)
- Seeds the task queue with points from a lattice on
the parameter space.
- Runs the task queue.
21Simecs Active Sampler
- For each result that comes back, decide if the
point is undominated by all points in the result
set. If so, remove all points in the result set
that are dominated by it, add it to the result
set, and insert its lattice neighbors into the
task queue.
- Continue until the task queue is empty.
- Refine the lattice, then repeat
- Effect: the result set contains a set of Pareto point
candidates that originated from a lattice.
The lattice becomes finer as more time is spent.
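One lattice level of the procedure above can be sketched as follows. This is an illustrative reading of the slides, not the Simecs implementation: it assumes a 2D integer lattice of size n x n, four-connected lattice neighbors, sequential execution, and "lower is better" in every metric component. The refinement step is omitted.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Tuple

Point = Tuple[int, int]
Metric = Tuple[float, ...]

def dominates(p: Metric, q: Metric) -> bool:
    """p completely dominates q (assumption: lower is better)."""
    return all(a < b for a, b in zip(p, q))

def active_sample(simulate: Callable[[Point], Metric], n: int,
                  seeds: Iterable[Point]) -> Dict[Point, Metric]:
    """Run one lattice level; returns the undominated points found."""
    seeds = list(seeds)
    tasks = deque(seeds)
    queued = set(seeds)                  # points already placed on the queue
    results: Dict[Point, Metric] = {}    # current Pareto set candidates
    while tasks:
        point = tasks.popleft()
        metric = simulate(point)
        if any(dominates(m, metric) for m in results.values()):
            continue                     # dominated: discard, do not expand
        # Drop candidates the new point dominates, then add the new point.
        results = {pt: m for pt, m in results.items() if not dominates(metric, m)}
        results[point] = metric
        i, j = point
        for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= nb[0] < n and 0 <= nb[1] < n and nb not in queued:
                queued.add(nb)
                tasks.append(nb)
    return results

# Toy experiment: the metric is simply the lattice coordinates.
candidates = active_sample(lambda p: (float(p[0]), float(p[1])), n=3, seeds=[(0, 0)])
```

In this toy run the point (2, 2) is never simulated, because all of its neighbors are dominated and therefore never expanded. That is the pruning effect the sampler relies on.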
22Simecs Active Sampler
Initial Grid
23Simecs Active Sampler
1st level results
24Simecs Active Sampler
First Level Pareto Frontier
25Simecs Active Sampler
First Refinement
26Simecs Active Sampler
2nd level results
27Simecs Active Sampler
Second level Pareto Frontier
28Simecs Active Sampler
2nd Refinement
29Simecs Active Sampler
3rd level results
30Simecs Active Sampler
3rd level Pareto Frontier
31Simecs Manager
- Spawns simulation processes
- When the result of a simulation comes back (via a
FUEL callback):
- Registers the result
- Asks the active sampler for the next point to run
- Looks up SISOL for a checkpoint to jump-start
the next point
- Sends the parameters of the next simulation, the
coordinates of the checkpoint, and error
tolerances to the simulation process.
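The callback steps above can be sketched as a single function. Everything here is a simplification under stated assumptions: the sampler is reduced to a list of pending points, SISOL to a set of checkpoint coordinates, and the outgoing message to a dictionary. None of these names come from the Simecs code.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, float]

def on_result(point: Point, metric: Tuple[float, ...],
              results: Dict[Point, Tuple[float, ...]],
              pending: List[Point], checkpoints: Set[Point],
              tolerance: float = 1e-6) -> Optional[dict]:
    """Handle one finished simulation and build the next work message."""
    results[point] = metric                  # register the result
    if not pending:
        return None                          # nothing left to run
    next_point = pending.pop(0)              # stand-in for asking the sampler
    # Stand-in for the SISOL nearest-neighbor query: closest checkpoint, if any.
    checkpoint = (min(checkpoints, key=lambda c: math.dist(c, next_point))
                  if checkpoints else None)
    return {"params": next_point, "checkpoint": checkpoint, "tolerance": tolerance}

results: Dict[Point, Tuple[float, ...]] = {}
message = on_result((0.1, 0.1), (1.0, 2.0), results,
                    pending=[(0.5, 0.5)],
                    checkpoints={(0.0, 0.0), (0.4, 0.6)})
```

Only the checkpoint's coordinates travel in the message. The simulation process would then fetch the checkpoint object itself from the shared object layer.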
32Simecs Test System
- Single-server implementation of SISOL to store the
checkpoint set
- 3 versions of samplers: Active, Random, and Sweep
- TCP-based FUEL
- Simulation implemented with the PETSc SNES solver.
- Jump-start from checkpoints: use the checkpoint's
configuration as the starting guess
33Simecs Test System
- Heterogeneous cluster
- 1 1.5GHz Athlon node (manager, SISOL server)
- 22 1.2GHz Duron nodes (simulation processes)
- 10 3GHz Pentium 4 nodes (simulation processes)
- 100Mbps switched Ethernet network between Athlon
and Duron nodes, 10Mbps Ethernet between Pentium
4 nodes.
34Simecs Test Result (Sampler)
- Active Sampler compared against 1) a grid-based
sampler, which performs a parameter sweep on the
grid with increasing refinement, and 2) a random
sampler
- Both are run for 1500 simulations, and the partial
frontiers are dumped at periodic intervals.
The Hausdorff distance is measured, using the final
Active Sampler-based frontier with 1500
simulations as the ground truth.
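For reference, the Hausdorff distance between a partial frontier and the ground-truth frontier can be computed as in the sketch below. It assumes both frontiers are finite, non-empty point sets and uses Euclidean distance between points; the slides do not say which point metric was used.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]

def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

partial = [(0.0, 0.0), (1.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
error = hausdorff(partial, truth)  # driven by the missing point (1.0, 2.0)
```

The distance drops to zero only when every ground-truth point has a matching point in the partial frontier and vice versa, so it penalizes both missing and spurious frontier points.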
35-51Simecs Test Result (Sampler)
(Figure slides, no transcript)
52Simecs Test Result (Checkpoints)
- Using checkpoints cuts down the number of iterations per simulation.
53Simecs Test Result (Scaling)
Duron nodes added (Slower speed, faster
communication)
54Simecs Test Result (Scaling)
55Simecs Conclusions
- Multiple experiments can be managed automatically
- Interactive speed can be achieved via re-use of
checkpoints, active sampling, and partial results:
run time goes from 3088 seconds down to 17 seconds, and
lower if partial frontiers can be used
56Simecs Conclusions
- TCP-based communication framework provides the system
with portability: it can be used on heterogeneous
clusters
- Spatially-indexed object sets are a useful
communication substrate
57Simecs Future work
- Distributed implementation of SISOL
- Parallelise individual simulations (SISOL support
for parallel objects)
- MPI-based communication for SISOL and FUEL
- Interactivity