Title: High Performance Computing Engine for Predictive Battlespace Awareness: Final Presentation Air Force Phase II SBIR Topic No. AF03-094 Contract No. FA8750-04-C-0066
1. High Performance Computing Engine for Predictive Battlespace Awareness: Final Presentation
Air Force Phase II SBIR, Topic No. AF03-094, Contract No. FA8750-04-C-0066
- Gary Blank
- Metron, Inc.
- May 31, 2006
2. Presentation Outline
- Overview of Software Framework
  - Called Dynamic Simulation Framework (DSF)
- Detailed Description of DSF
  - System Components: Multiprocessor, SPEEDES Simulation, Emulation, System Interface
- Project Issues
3. Dynamic Simulation System
- Emulation runs continually and mirrors the state of the battle by integrating real-world reports
  - System interface provides gateway for inserting reports
- Predictive simulation
  - Copy emulation and quickly run into future
- Course of action (COA) evaluation
  - Copy emulation, insert COA information, and run simulation to assess COA
- Simulation Interface
  - Display/control emulation and simulations, request simulations, display simulation results
4. Emulation Mirrors Real-World State
[Diagram labels: Real-World Reports; System Interface; Emulation]
5. Predictive Simulation
[Diagram labels: System Interface; Emulation; Clone. Steps shown: Request Predictive Simulation; Copy Emulation State; Simulate into Future; Return MOE Report]
6. COA Evaluation
[Diagram labels: System Interface; Emulation; COA1; COA2. Steps shown: Request COA Evaluations; Copy and Insert COA Specifics; Launch First Two Evaluations; Return MOE Reports]
7. System Components
- Multiprocessor Machine
  - High-Performance Computer (HPC) or Cluster
- SPEEDES-Based Simulation
  - Emulation: simulation specialized to receive real-world reports and mirror relevant state
- System Interface
  - Master System Interface (MSI): external module attached to emulation
  - Subsidiary Interfaces: external modules attached to generated simulations
- Simulation Output
  - MOE Reports
  - Any other useful output
8. Multiprocessor Machines
9. Multiprocessor Architecture
- General Architecture
  - N CPUs divided into subsets of K CPUs that share RAM
  - Compute node: subset of K CPUs that share RAM
- Examples
  - ASC's hpc11 (SGI Origin 3900): 4 compute nodes with 512 CPUs/node, 2048 CPUs total
  - NAVO's kraken (IBM P655): 368 compute nodes with 8 CPUs/node, 2944 CPUs total
  - ASC's hpc10 (Compaq SC-45): 128 compute nodes with 4 CPUs/node, 1024 CPUs total
  - Other multiprocessors have 2, 16, or 32 CPUs/compute node (probably others)
10. Typical Multiprocessor Architecture
[Diagram: several compute nodes, each a group of CPUs with its own shared memory, connected by high-speed communications]
11. System Resources
- Need to tell system what resources it has to work with: InitialResources.par file
  - Currently CPUs are the only resource
  - Need to specify number of CPUs and how they are distributed over compute nodes
  - HPCs have batch queues: submit job running on N CPUs
- Resource Manager
  - Separate component working with system (ResourceMgr program)
  - Balances workload over available CPUs
  - Tries to allocate favorable communications topology for COA evaluations and predictive simulations
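The placement goal stated for the Resource Manager can be illustrated with a small sketch. The slides do not describe ResourceMgr's actual algorithm, so the following assumes a simple greedy policy: keep a simulation on one compute node when possible (all communication then goes through shared memory), and otherwise span as few nodes as possible. The names `ComputeNode` and `AllocateCpus` are hypothetical and are not part of DSF or SPEEDES.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical record for one compute node (a group of CPUs sharing RAM).
struct ComputeNode {
  int id;        // compute node ID, e.g. as listed in InitialResources.par
  int freeCpus;  // CPUs not yet assigned to any simulation
};

// Greedy placement sketch (an assumed policy, not ResourceMgr's real one).
// Returns (node ID, CPU count) pairs, or an empty vector if the request
// cannot be met. On success the chosen nodes' free counts are reduced.
std::vector<std::pair<int, int>> AllocateCpus(std::vector<ComputeNode>& nodes,
                                              int wanted) {
  std::vector<std::pair<int, int>> plan;
  if (wanted <= 0) return plan;

  int totalFree = 0;
  for (const ComputeNode& n : nodes) totalFree += n.freeCpus;
  if (totalFree < wanted) return plan;  // not enough CPUs anywhere

  // Best case: one node holds the whole simulation. Pick the tightest fit
  // so that large nodes stay available for large requests.
  ComputeNode* best = nullptr;
  for (ComputeNode& n : nodes) {
    if (n.freeCpus >= wanted && (!best || n.freeCpus < best->freeCpus)) {
      best = &n;
    }
  }
  if (best) {
    best->freeCpus -= wanted;
    plan.push_back({best->id, wanted});
    return plan;
  }

  // Otherwise span nodes, taking from the nodes with the most free CPUs
  // first, so the simulation touches as few nodes as possible.
  std::vector<ComputeNode*> order;
  for (ComputeNode& n : nodes) order.push_back(&n);
  std::stable_sort(order.begin(), order.end(),
                   [](const ComputeNode* a, const ComputeNode* b) {
                     return a->freeCpus > b->freeCpus;
                   });
  int remaining = wanted;
  for (ComputeNode* n : order) {
    if (remaining == 0) break;
    int take = std::min(n->freeCpus, remaining);
    if (take == 0) continue;
    n->freeCpus -= take;
    remaining -= take;
    plan.push_back({n->id, take});
  }
  return plan;
}
```

With three nodes holding 8, 16, and 32 free CPUs, a request for 8 CPUs lands entirely on the 8-CPU node, while a later request for 40 CPUs has to be split across the two larger nodes.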
12. SPEEDES Communications Topology
[Diagram: SPEEDES nodes 0 through 7 placed on two compute nodes, each with its own shared memory, linked by TCP/IP comms]
- 8-Node SPEEDES Simulation
  - Split across two 4-CPU compute nodes
  - Good topology for this machine
13. InitialResources.par File

    AvailableCPUs {
      ComputeNode0 {           // Compute node name
        int CompNodeID 110     // ID of compute node
        int NumCPUs 8          // 8 CPUs on ComputeNode0
      }
      ComputeNode1 {           // Compute node name
        int CompNodeID 111     // ID of compute node
        int NumCPUs 16         // 16 CPUs on ComputeNode1
      }
      ComputeNode2 {           // Compute node name
        int CompNodeID 112     // ID of compute node
        int NumCPUs 32         // 32 CPUs on ComputeNode2
      }
    }
14. SPEEDES-Based Simulation
15. SPEEDES-Based Simulation Component
- Parallel simulation distributed over two or more CPUs
- Simulation Modes: same executable must run in two different modes
  - Emulation mode
    - Run at wall clock speed
    - Accept/integrate real-world reports
    - Simulate little or nothing
  - Simulation mode
    - Run as fast as possible
    - Do not accept outside reports
    - Simulate all future events/state and return MOEs
16. Simulation Mode Events
- Partition simulation events into 3 disjoint subsets
  - Emulation events: run only in emulation mode
    - Example: accept/integrate real-world reports
  - Simulation events: run only in simulation mode
    - Example: simulate sensor detections
  - Common events: run in either mode
    - Example: compute sensor coverage
17. Emulation Events
- Subset of events to be executed only in emulation mode

    void MyObj::SetDamageLevel(int level) {
      if (InEmulationMode())
        DamageLevel = level;
    }
18. Simulation Events
- Subset of events to be executed only in simulation mode

    void Radar::GetDetections() {
      if (InSimulationMode()) {
        for (int i = 0; i < NumObjs; i++) {
          SpSimObj *obj = GetObject(i);
          if (InRange(obj))
            SimulateDetection(obj);
        }
      }
    }
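The three event categories from slides 16 through 18 can be sketched without SPEEDES as follows. This is a minimal illustration only: `RunMode`, `ModeContext`, and the counters in this `Radar` struct are hypothetical stand-ins, and the real DSF checks are the `InEmulationMode()` and `InSimulationMode()` calls shown on the slides.

```cpp
#include <cassert>

// Hypothetical stand-in for the DSF mode queries named on the slides.
enum class RunMode { Emulation, Simulation };

struct ModeContext {
  RunMode mode;
  bool InEmulationMode() const { return mode == RunMode::Emulation; }
  bool InSimulationMode() const { return mode == RunMode::Simulation; }
};

struct Radar {
  int reportsIntegrated = 0;    // changed only by the emulation event
  int detectionsSimulated = 0;  // changed only by the simulation event
  int coverageUpdates = 0;      // changed by the common event

  // Emulation event: does nothing unless running in emulation mode.
  void IntegrateReport(const ModeContext& ctx) {
    if (ctx.InEmulationMode()) ++reportsIntegrated;
  }

  // Simulation event: does nothing unless running in simulation mode.
  void SimulateDetection(const ModeContext& ctx) {
    if (ctx.InSimulationMode()) ++detectionsSimulated;
  }

  // Common event: runs in either mode, so it needs no guard.
  void ComputeCoverage(const ModeContext&) { ++coverageUpdates; }

  // Fire one event of each kind under the given mode.
  void RunAllEvents(const ModeContext& ctx) {
    IntegrateReport(ctx);
    SimulateDetection(ctx);
    ComputeCoverage(ctx);
  }
};
```

Because the guard lives inside each event, the same executable can schedule all three kinds of event in both modes and rely on the mode check to suppress the ones that do not apply.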
19. Differences from Ordinary SPEEDES Simulations
- Building DSF Libraries
  - make debug clone
  - make clone
- Simulation Identifiers: since system generates many simulations, need identifiers
  - int SpGetSimId(): unique integer ID
  - const char *SpGetSimName(): user-given descriptive name
    - Not necessarily unique, but should be
- speedes.par file
  - Need CloneData section: contains only one string item, SpServerDir
    - Directory containing SpeedesServer and ResourceMgr programs

    CloneData {
      string SpServerDir /usr/bin/DSF/servers
    }
20. Differences from Ordinary SPEEDES Simulations, cont.
- Clone Level
  - Emulation has level 0
  - Clone of emulation has level 1
  - Clone of a clone of emulation has level 2, etc.
  - int SpGetCloneLevel(): returns clone level of current simulation
- Identification Utilities (macros)
  - IS_ORIG_SIM(): returns true if this is the original (i.e., root) simulation; false otherwise
  - IS_SIM_NAME(name): returns true if current simulation name matches name argument; false otherwise
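The clone-level rule and the two identification checks can be illustrated with a small standalone sketch. `SimIdentity` and `MakeClone` are hypothetical names; DSF exposes the same facts through `SpGetSimId()`, `SpGetSimName()`, `SpGetCloneLevel()`, `IS_ORIG_SIM()`, and `IS_SIM_NAME(name)`. The sketch also assumes that "original simulation" and "clone level 0" mean the same thing, which the slide implies but does not state outright.

```cpp
#include <cassert>
#include <string>

// Hypothetical identity record for one simulation in the clone tree.
struct SimIdentity {
  int id;            // unique integer ID
  std::string name;  // user-given descriptive name
  int cloneLevel;    // 0 for the emulation (root)

  // Analogue of IS_ORIG_SIM(): true only for the root simulation.
  bool IsOrigSim() const { return cloneLevel == 0; }

  // Analogue of IS_SIM_NAME(name): exact match on the simulation name.
  bool IsSimName(const std::string& candidate) const {
    return name == candidate;
  }
};

// A clone's level is always one more than its parent's level.
SimIdentity MakeClone(const SimIdentity& parent, int newId,
                      const std::string& newName) {
  return SimIdentity{newId, newName, parent.cloneLevel + 1};
}
```

Cloning the emulation gives a level-1 simulation, and cloning that clone gives level 2, matching the numbering on the slide.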
21. Differences from Ordinary SPEEDES Simulations, cont.
- Cloning from within a simulation
  - Simulation can branch off COA by itself (normally initiated by user)
  - Example: Red side has two good plans, P1 and P2, to defend against Blue's attack
    - Create branch and simulate both plans
- SpSpawn Function
  - Spawns clone at given start time; runs until end time
  - Simulation schedules events in clone to define COA
- SPAWN Macro
  - Calls SpSpawn and schedules event to define COA in clone
- P_SPAWN Macro
  - Works with process model
  - Waits until given start time, then branches off clone
  - User schedules events to define COA in clone
22. Initiating a COA Evaluation from Simulation or Emulation
- Schedule event to define COA
  - Event time is after clone is spawned
  - Example: schedule event at t = 401.0 to change Red's plan
- Schedule clone spawn
  - Example: call SpSpawn with start time of t = 400.0
- Event only affects clone
  - Example: event changes Red's plan in clone only

    void SetRedPlan(PlanType newPlan) {
      if (IsChildSim())  // only change child
        RedPlan = newPlan;
    }
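The ordering described on this slide (event at t = 401.0, spawn at t = 400.0, change visible in the clone only) can be traced with a toy model. This is a sketch under stated assumptions: `MiniSim` is a hypothetical in-process stand-in, copying a struct replaces the process-level cloning that SpSpawn performs, and the plan names "P1" and "P2" are placeholders.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical miniature simulation: one state variable plus a
// time-ordered queue of pending "set Red plan" events.
struct MiniSim {
  bool isChild = false;
  double now = 0.0;
  std::string redPlan = "P1";
  std::multimap<double, std::string> pendingPlanEvents;  // time -> new plan

  void ScheduleSetRedPlan(double t, const std::string& plan) {
    pendingPlanEvents.insert({t, plan});
  }

  // Mirrors SetRedPlan on the slide: only the child applies the change.
  void SetRedPlan(const std::string& plan) {
    if (isChild) redPlan = plan;
  }

  // Process every pending event with time <= endTime, in time order.
  void RunUntil(double endTime) {
    auto it = pendingPlanEvents.begin();
    while (it != pendingPlanEvents.end() && it->first <= endTime) {
      now = it->first;
      SetRedPlan(it->second);
      it = pendingPlanEvents.erase(it);
    }
    now = endTime;
  }

  // Copy the whole simulation, pending events included, and mark the
  // copy as the child.
  MiniSim Spawn() const {
    MiniSim child = *this;
    child.isChild = true;
    return child;
  }
};
```

Scheduling the plan change for t = 401.0, running to t = 400.0, and then spawning means both parent and child still hold the pending event. When each runs on to t = 850.0, the parent executes the event as a no-op and keeps "P1", while the child switches to "P2".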
23. SpSpawn Function

    SpSpawnHandle
    SpSpawn(const SpSimTime beginSim,
            const SpSimTime endSim,
            char *simName = NULL,
            int numExtModsToWaitFor = 0,
            SpCancelHandle spawnInitCancelHandle = SpCancelHandle());

- SpSpawnHandle class methods
  - bool IsParent()
  - bool IsChild()
  - int GetParentId()
  - void CancelSpawn()
24. SpSpawn Example

    SpCancelHandle cancelHdl =
        SCHEDULE_SetRedPlan(401.0, redCmdr, plan1);

    SpSimTime beginSim(400.0), endSim(850.0);
    char simName[] = "Red plan1 at 400.0";
    int numExtMods = 1;
    SpSpawnHandle spawnHdl =
        SpSpawn(beginSim, endSim, simName,
                numExtMods, cancelHdl);
25. SPAWN Macro

    SPAWN(start, end, cloneName,
          numExtMods, initEvent, initObjHdl)

- where
  - start: clone start time
  - end: clone end time
  - cloneName: (string) name to identify clone sim
  - numExtMods: number of external modules clone waits for before executing
  - initEvent: name of initialization event
  - initObjHdl: object initEvent acts on
26. SPAWN Example

    SpObjHandle redCommanderHdl = GetRedCommander();

    SpSimTime beginSim(400.0), endSim(850.0);
    char simName[] = "Red plan1 at 400.0";
    int numExtMods = 1;
    SPAWN(beginSim, endSim, simName,
          numExtMods, SetRedPlan,
          redCommanderHdl);
27. P_SPAWN Macro (Process Model)

    P_SPAWN(start, end, cloneName, numExtMods)

- where
  - start: clone start time
  - end: clone end time
  - cloneName: (string) name to identify clone sim
  - numExtMods: number of external modules clone waits for before executing
28. P_SPAWN Example

    void S_MyObj::SpawnCloneFromProcess() {
      P_VAR;
      P_LV(SpSpawnHandle, sh);  // create SpSpawnHandle var sh
      P_BEGIN(1);
      // Spawn clone at t = 100.0; end at t = 700.0 with name
      // "Clone 2"; don't wait for any external modules
      P_SPAWN(100.0, 700.0, "Clone 2", 0);
      if (sh.IsParent()) {  // use SpSpawnHandle var
        RB_cout << "Clone 2 spawned by parent" << endl;
      }
      else {  // child sim
        RB_cout << SpGetSimName() << " clone is running!"
                << endl;
      }
      P_END;
    }
29. The Emulation
30. Emulation Description
- Mirrors or emulates state of real world
  - Runs at wall clock speed, continually integrating real-world reports inserted through system interface
  - Simulated version of reality is starting point for launching simulations in real time
- Integrating Real-World Reports
  - System Interface (external module) needs 3 capabilities
    - Alter existing simulation objects
    - Delete existing simulation objects
    - Create new simulation objects
31. Integrating Real-World Reports
- Altering existing simulation object
  - External module schedules event on object
  - Event changes object state
- Creating/Deleting simulation objects
  - New macro system creates function for external module
  - Calling function schedules event in simulation to create/delete object in simulation
- Must create event for every possible change
  - Challenge is anticipating all needed changes and implementing events that perform these changes
32. System Interface
33. Master System Interface (MSI)
[Diagram labels: Real-World Reports and User Input; Master System Interface; Speedes Server; Emulation]
- External module(s) attached to emulation
34. Attaching External Module to Clone Simulation
[Diagram labels: Parent Simulation; Speedes Server; Host Router; External Module (two instances)]
35. Subsidiary Interfaces
[Diagram labels: Reports; MSI; Speedes Server; Emulation; exec(); Clone. Caption: External Modules Attached to COA Evaluations]
36. Building Master System Interface
- SPEEDES external module program
  - Based on SpEvaluatorStateMgr class
  - New class inherits from SpStateMgr class
- Inherits SpStateMgr capabilities
  - Control simulation time advance
  - Schedule events in simulation
  - Extract information from simulation
- Request predictive simulations
- Request COA evaluations
- Receive status messages about predictive simulations and COA evaluations
  - Started, Finished, Failed to start, Aborted, etc.
37. Initiating Predictive Simulations from System Interface
- SpEvaluatorStateMgr method

    int RequestPredictiveSim(double startTime,
                             double endTime,
                             int numExtModsToWaitFor = 0,
                             const char *simName = NULL);

- where
  - return value: request ID (or 0 if request fails)
38. Initiating COA Evaluations from System Interface
- SpEvaluatorStateMgr method

    int RequestCOAEval(double startTime,
                       double endTime,
                       SpList cloneEvents,
                       int numExtModsToWaitFor = 0,
                       const char *simName = NULL);

- where
  - cloneEvents: list of SpCloneEvent objects (each of which specifies an event to be scheduled in the child); the list defines the COA
  - return value: request ID (or 0 if request fails)
39. Specifying COA Events with SpCloneEvent Objects

    SpCloneEvent(
        const SpSimTime evtTime,  // event time
        int simObjGlobalId,       // ID of object event acts on
        char *eventName,          // name of event to schedule
        char *msg,                // optional SpMsg arg
        int msgBytes,             // sizeof(SpMsg)
        char *data,               // event data buffer
        int dataBytes,            // bytes in data buffer
        // How to handle events in past
        TimeMode tmMode = IF_IN_PAST_IGNORE
    )
40. Receiving Clone Status Messages
- External module requests clone (predictive simulation or COA evaluation)
  - Emulation (or simulation) records request
- At time of request, status message sent to external module that made request
  - Clone successfully started
  - Clone failed to start (e.g., SpeedesServer failed, communications failure, fork() failure, etc.)
- When clone finishes, status message sent to external module
  - Clone finished execution
  - Clone aborted (e.g., simulation crashed)
- Status messages automatically received as SpStateMgr events
  - No need for user to poll for status messages
41. SpCloneStatusMsg Class
- SpCloneStatusMsg object delivered to external module in state manager event
- SpCloneStatusMsg object contains following info
  - Simulation request ID: integer ID returned by request method
  - Simulation clone ID: integer ID that uniquely identifies each generated simulation
  - Simulation name: name given in request method
  - Status: Started, Finished, StartFailure, NoResources, SpeedesServerFailed, CopyNodeFailure, CommsFailure, Aborted
  - SpeedesServer connection data: information needed to connect new external module to simulation's SpeedesServer
    - Machine name
    - Port number
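One way an external module might react to these status messages is sketched below. The field list and status names come from the slide, but `CloneStatusInfo`, `Action`, and `ClassifyStatus` are hypothetical; the real SpCloneStatusMsg accessors are not shown in the presentation, so this uses a plain struct instead.

```cpp
#include <cassert>
#include <string>

// Status values as listed on the slide.
enum class CloneStatus {
  Started,
  Finished,
  StartFailure,
  NoResources,
  SpeedesServerFailed,
  CopyNodeFailure,
  CommsFailure,
  Aborted
};

// Hypothetical plain-data mirror of the information the slide lists.
struct CloneStatusInfo {
  int requestId;              // ID returned by the request method
  int cloneId;                // unique ID of the generated simulation
  std::string simName;        // name given in the request
  CloneStatus status;
  std::string serverMachine;  // SpeedesServer machine name
  int serverPort;             // SpeedesServer port number
};

// What the external module should do next (hypothetical categories).
enum class Action { AttachExternalModule, CollectResults, ReportFailure };

// Map a status message to a follow-up action.
Action ClassifyStatus(const CloneStatusInfo& msg) {
  switch (msg.status) {
    case CloneStatus::Started:
      // The clone is running: the connection data can now be used to
      // attach a subsidiary external module to it.
      return Action::AttachExternalModule;
    case CloneStatus::Finished:
      return Action::CollectResults;
    default:
      // Every remaining status is a start-up failure or an abort.
      return Action::ReportFailure;
  }
}
```

Since the messages arrive as state manager events, a handler of this shape would be called once per message rather than from a polling loop.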
42. Inserting Reports into Emulation from System Interface
- Alter existing emulation object
  - Schedule event on object
- Create new simulation object
- Delete existing simulation object
43. Dynamic Object Creation from External Module
- Generalization of interface-style events
- Example
  - Create I_MyInterface.h file; invoke macro
  - In simulation object .h file, include I_MyInterface.h file and invoke another macro
    - Call PLUG_IN_EXTERNAL_EVENT(CreateAircraftEvent)
  - In interface file, include I_MyInterface.h file and call EM_SCHEDULE_CreateAircraftEvent(x, y, z)
    - Function generated by first macro
    - Schedules event in simulation to create new aircraft object
    - Calls initialization method on new aircraft object, ac.Init(x, y, z)
- Dynamic object deletion works similarly
44. Project Issues
45. Unix fork() Problem
- Load balancing: need to distribute clones across entire set of available CPUs
- Kraken: rude awakening
  - Kraken is NAVO's IBM P655 HPC
  - Kraken has 2944 CPUs, divided into 8-CPU compute nodes
  - Ported code to Kraken, began testing
  - After much research and discussion with IBM consultants at NAVO, we learned that a forked process is confined to the compute node of its parent process
  - Cannot distribute clones over set of available CPUs
- Immediately switched to ASC's SGI Origin 3900
  - Huge compute nodes (512 CPUs) sidestep the issue
  - SGI OS (IRIX) automatically distributes processes across CPUs allocated for the batch job
46. Software Portability
- Due to fork() issue, software currently compatible with SGI HPCs and Beowulf clusters
  - Beowulf provides bproc_rfork(int compNode)
- SGI is not a long-term solution since we want to be able to access huge banks of CPUs
  - Possible fix: run one emulation on each compute node
    - Route real-world reports to all emulations
    - Since compute nodes have 512 CPUs, emulation overhead is relatively small
    - This fix won't work on HPCs with small compute nodes (e.g., 8 CPUs/compute node)
- General fix
  - Software isolates just two platform-dependent functions: pid ForkTo(long compNode) and long GetCompNodeId()
  - Implement ForkTo(long compNode) ourselves
    - Checkpoint process and restart on another compute node
    - Process migration provided by some cluster operating systems
    - Possible other ways to move simulation state
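The idea of isolating the platform dependence behind two functions can be sketched as follows. This is only the generic POSIX fallback, written under assumptions: the return type `pid_t` is a guess at the "pid" shown on the slide, the bodies are not DSF's actual code, and `RunChildAndWait` is a hypothetical helper added for demonstration. A Beowulf build would replace the body of `ForkTo` with the `bproc_rfork` call the slide mentions.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Generic fallback: plain fork() cannot choose a compute node, so the
// requested node is ignored and the child stays on the parent's node.
// This is exactly the limitation described for the IBM P655.
pid_t ForkTo(long compNode) {
  (void)compNode;  // unused in the fallback
  return fork();
}

// Generic fallback: treat the whole machine as a single compute node.
long GetCompNodeId() { return 0; }

// Demonstration helper (hypothetical): fork a child that exits at once
// with exitCode, and return the code the parent observes, or -1 on error.
int RunChildAndWait(long compNode, int exitCode) {
  pid_t pid = ForkTo(compNode);
  if (pid < 0) return -1;  // fork failed
  if (pid == 0) {
    // Child process: a real clone would continue simulating from here.
    _exit(exitCode);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Keeping every caller on `ForkTo` and `GetCompNodeId` means a port to a new machine only has to supply these two bodies, whether through a remote fork, checkpoint and restart, or process migration.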
47. Demonstration Systems
48. Demonstration Systems
- Running remotely on ASC's SGI HPC (hpc11 at Wright-Patterson AFB)
  - Simple displays are easy, but more complex ones present problems
    - For example, ASC blocks the port needed to open an X window on a remote display
  - Succeeded in circumventing these problems, but interface is not fancy
- Batch queue limits maximum demo size to 32 CPUs
  - Bigger jobs could get stuck in queue for a long time
  - Still could be delay of up to 3-4 minutes
  - Software works fine with much bigger jobs (e.g., 200 or more CPUs)
    - Only difference is wait time in queue before job gets run
49. Demonstration System 1: Insert Reports Into Emulation
- System: Emulation (simple simulation), SpeedesServer, Master System Interface (MSI)
- Insert reports into emulation from MSI
  - Create new simulation object (S_Ship)
  - Change speed of simulation object
  - Delete existing simulation object
50. Demonstration System 2: Run Predictive Simulation
- System: FSS-based Emulation, SpeedesServer, Master System Interface (MSI)
- Request predictive simulation from MSI
- Emulation launches predictive simulation at requested start time
- External module attaches to predictive simulation
  - Opens window to display GVT, FSS messages, MOEs
  - Abort, Pause, Resume buttons
- Predictive simulation runs to requested end time
- Final MOEs displayed
51. Demonstration System 3: Run COA Evaluation
- System: FSS-based Emulation, SpeedesServer, Master System Interface (MSI)
- Request COA evaluation from MSI
  - Change attack plan: reroute 1 fighter so only 5 fighters attack 2 airports (instead of 6 fighters)
- Emulation launches COA evaluation at requested start time
- External module attaches to COA evaluation
  - Opens window to display GVT, FSS messages, MOEs
  - Abort, Pause, Resume buttons
- COA evaluation runs to requested end time
- Final MOEs displayed: only 1 airport destroyed (instead of 2)
52. Demonstration System 4: Clone off of COA Evaluation
- System: FSS-based Emulation, SpeedesServer, Master System Interface (MSI)
- Request COA evaluation from MSI
  - Change attack plan: reroute 1 fighter so only 5 fighters attack 2 airports (instead of 6 fighters)
- Emulation launches COA evaluation at requested start time
- COA evaluation branches off COA clone at t = 6000.0
  - Changes number of Red missiles at SAM site
  - Both evaluations share same computations until t = 6000.0
- External modules attach to both COA evaluations
  - Opens window to display GVT, FSS messages, MOEs
  - Abort, Pause, Resume buttons
- COA evaluations run to requested end time
- Final MOEs displayed: different results for each run
53. Demonstration System 5: 6 Simultaneous COA Evaluations
- System: FSS-based Emulation, SpeedesServer, Master System Interface (MSI)
- Request 6 COA evaluations from MSI
  - No missiles removed from SAM site; reroute 0 or 1 fighter
  - Two missiles removed from SAM site; reroute 0 or 1 fighter
  - Four missiles removed from SAM site; reroute 0 or 1 fighter
- Emulation launches 6 COA evaluations at requested start time
- External modules attach to COA evaluations
  - Opens window to display GVT, FSS messages, MOEs
  - Abort, Pause, Resume buttons
- COA evaluations run to requested end time
- Final MOEs displayed
55. Project Review, November 2004
- Development work on Host Router component of clone simulation's SpeedesServer
  - Middleman for connecting external module interface to clone
  - Minimal display: small box showing status (Running, Finished, Aborted), simulation time, Pause/Abort button
  - Important to provide feedback to user about what is happening
[Diagram labels: Comm Server; Parent Simulation; Clone Simulation; External Module; Host Router; SpeedesServer]