High Performance Computing Engine for Predictive Battlespace Awareness: Final Presentation. Air Force Phase II SBIR, Topic No. AF03-094, Contract No. FA8750-04-C-0066


1
High Performance Computing Engine for Predictive
Battlespace Awareness: Final Presentation
Air Force Phase II SBIR, Topic No. AF03-094,
Contract No. FA8750-04-C-0066
  • Gary Blank
  • Metron, Inc.
  • May 31, 2006

2
Presentation Outline
  • Overview of Software Framework, called
    Dynamic Simulation Framework (DSF)
  • Detailed Description of DSF
  • System Components: Multiprocessor, SPEEDES
    Simulation, Emulation, System Interface
  • Project issues

3
Dynamic Simulation System
  • Emulation runs continually, mirrors state of
    battle by integrating real-world reports
  • System interface provides gateway for inserting
    reports
  • Predictive simulation: copy emulation and
    quickly run into future
  • Course of action (COA) evaluation: copy
    emulation, insert COA information and run
    simulation to assess COA
  • Simulation Interface: display/control emulation
    and simulations, request simulations, display
    simulation results

4
Emulation Mirrors Real-World State
Emulation
System Interface
Real-World Reports
5
Predictive Simulation
Copy Emulation State
Simulate into future
Clone
Emulation
System Interface
Request Predictive Simulation
Return MOE Report
6
COA Evaluation
Copy and Insert COA Specifics
Launch First Two Evaluations
COA2
Emulation
COA1
System Interface
Request COA Evaluations
Return MOE Reports
7
System Components
  • Multiprocessor Machine
  • High-Performance Computer (HPC) or Cluster
  • SPEEDES-Based Simulation
  • Emulation: simulation specialized to receive
    real-world reports, mirror relevant state
  • System Interface
  • Master System Interface (MSI): external module
    attached to emulation
  • Subsidiary Interfaces: external modules attached
    to generated simulations
  • Simulation Output
  • MOE Reports
  • Any other useful output

8
Multiprocessor Machines
9
Multiprocessor Architecture
  • General Architecture
  • N CPUs divided into subsets of K CPUs that share
    RAM
  • Compute Node: subset of K CPUs that share RAM
  • Examples
  • ASC's hpc11 (SGI Origin 3900): 4 compute nodes
    with 512 CPUs/node, 2048 CPUs total
  • NAVO's kraken (IBM P655): 368 compute nodes with
    8 CPUs/node, 2944 CPUs total
  • ASC's hpc10 (Compaq SC-45): 128 compute nodes
    with 4 CPUs/node, 1024 CPUs total
  • Other multiprocessors have 2, 16, or 32 CPUs per
    compute node (probably others)

10
Typical Multiprocessor Architecture
[Diagram: a row of compute nodes, each a group of CPUs with its own shared memory, connected to one another by high-speed communications]
11
System Resources
  • Need to tell system what resources it has to work
    with: InitialResources.par file
  • Currently CPUs are the only resource
  • Need to specify number of CPUs and how they are
    distributed over compute nodes
  • HPCs have batch queues: submit job running on N
    CPUs
  • Resource Manager
  • Separate component working with system
    (ResourceMgr program)
  • Balances workload over available CPUs
  • Tries to allocate favorable communications
    topology for COA evaluations and predictive
    simulations

12
SPEEDES Communications Topology
[Diagram: SPEEDES nodes 0 through 7 in two groups; nodes within a group communicate through shared memory, and the two groups are linked by TCP/IP comms]
  • 8-Node SPEEDES Simulation
  • Split across two 4-CPU compute nodes
  • Good topology for this machine
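
The topology on this slide can be illustrated with a small sketch. It assumes SPEEDES nodes are packed onto compute nodes in index order (nodes 0-3 on one compute node, 4-7 on the other), which the slide does not state. `LinkKind`, `computeNodeOf`, and `linkBetween` are hypothetical names, not part of SPEEDES.

```cpp
#include <cassert>

// Hypothetical helper types; not part of the SPEEDES API.
enum class LinkKind { SharedMemory, TcpIp };

// Assumption: SPEEDES nodes are packed onto compute nodes in index order.
int computeNodeOf(int speedesNode, int cpusPerComputeNode) {
    assert(speedesNode >= 0 && cpusPerComputeNode > 0);
    return speedesNode / cpusPerComputeNode;
}

// Two SPEEDES nodes on the same compute node can use shared memory;
// otherwise their traffic crosses the TCP/IP link.
LinkKind linkBetween(int a, int b, int cpusPerComputeNode) {
    return computeNodeOf(a, cpusPerComputeNode) ==
                   computeNodeOf(b, cpusPerComputeNode)
               ? LinkKind::SharedMemory
               : LinkKind::TcpIp;
}
```

With 4 CPUs per compute node, `linkBetween(0, 3, 4)` is shared memory while `linkBetween(3, 4, 4)` is TCP/IP, which is why keeping a clone's nodes together on as few compute nodes as possible is favorable.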

13
InitialResources.par File
      AvailableCPUs {
        ComputeNode0 {          // Compute node name
          int CompNodeID 110    // ID of compute node
          int NumCPUs 8         // 8 CPUs on ComputeNode0
        }
        ComputeNode1 {          // Compute node name
          int CompNodeID 111    // ID of compute node
          int NumCPUs 16        // 16 CPUs on ComputeNode1
        }
        ComputeNode2 {          // Compute node name
          int CompNodeID 112    // ID of compute node
          int NumCPUs 32        // 32 CPUs on ComputeNode2
        }
      }
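
As a rough illustration of how a resource manager might use this information, the sketch below holds the three compute nodes from the file in memory and picks the smallest single compute node that can hold a request, so that all of a clone's processes share memory. The actual ResourceMgr allocation policy is not described in the slides; the best-fit rule and all type and function names here are assumptions.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Mirrors one entry of the AvailableCPUs section; the struct is hypothetical.
struct ComputeNode {
    int compNodeId;
    int numCpus;
};

// Total number of CPUs the system may use.
int totalCpus(const std::vector<ComputeNode>& nodes) {
    return std::accumulate(
        nodes.begin(), nodes.end(), 0,
        [](int sum, const ComputeNode& n) { return sum + n.numCpus; });
}

// Returns the ID of the smallest compute node that can hold the whole
// request, or -1 if no single compute node is large enough.
int pickSingleNode(const std::vector<ComputeNode>& nodes, int cpusNeeded) {
    assert(cpusNeeded > 0);
    int bestId = -1;
    int bestSize = 0;
    for (const ComputeNode& n : nodes) {
        if (n.numCpus >= cpusNeeded && (bestId == -1 || n.numCpus < bestSize)) {
            bestId = n.compNodeId;
            bestSize = n.numCpus;
        }
    }
    return bestId;
}
```

For the file above, a 12-CPU request would land on compute node 111, while a 40-CPU request fits on no single compute node and would have to be split.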

14
SPEEDES-Based Simulation
15
SPEEDES-Based Simulation Component
  • Parallel simulation distributed over two or more
    CPUs
  • Simulation Modes: same executable must run in two
    different modes
  • Emulation mode
  • Run at wall clock speed
  • Accept/integrate real-world reports
  • Simulate little or nothing
  • Simulation mode
  • Run as fast as possible
  • Do not accept outside reports
  • Simulate all future events/state and return MOEs

16
Simulation Mode Events
  • Partition simulation events into 3 disjoint
    subsets
  • Emulation events: run only in emulation mode
  • Example: accept/integrate real-world reports
  • Simulation events: run only in simulation mode
  • Example: simulate sensor detections
  • Common events: run in either mode
  • Example: compute sensor coverage
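
The three-way partition can be sketched as a single gating function. This is a minimal illustration, not DSF code: `RunMode`, `EventKind`, and `shouldRun` are hypothetical names standing in for checks such as InEmulationMode() and InSimulationMode().

```cpp
#include <cassert>

// Hypothetical types; not part of SPEEDES or DSF.
enum class RunMode { Emulation, Simulation };
enum class EventKind { EmulationOnly, SimulationOnly, Common };

// An event runs if it is common or if its kind matches the current mode.
bool shouldRun(EventKind kind, RunMode mode) {
    switch (kind) {
        case EventKind::EmulationOnly:
            return mode == RunMode::Emulation;
        case EventKind::SimulationOnly:
            return mode == RunMode::Simulation;
        case EventKind::Common:
            return true;
    }
    assert(false && "unknown EventKind");
    return false;
}
```

Because the subsets are disjoint, every event falls into exactly one case, and the same executable behaves correctly in both modes.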

17
Emulation Events
  • Subset of events to be executed only in emulation
    mode
      void MyObj::SetDamageLevel(int level) {
        if (InEmulationMode())
          DamageLevel = level;
      }

18
Simulation Events
  • Subset of events to be executed only in
    simulation mode
      void Radar::GetDetections() {
        if (InSimulationMode()) {
          for (int i = 0; i < NumObjs; i++) {
            SpSimObj *obj = GetObject(i);
            if (InRange(obj))
              SimulateDetection(obj);
          }
        }
      }

19
Differences from Ordinary SPEEDES Simulations
  • Building DSF Libraries
  • make debug clone
  • make clone
  • Simulation Identifiers: since system generates
    many simulations, need identifiers
  • int SpGetSimId(): unique integer ID
  • const char *SpGetSimName(): user-given
    descriptive name
  • Not necessarily unique, but should be
  • speedes.par file
  • Need CloneData section: contains only one string
    item, SpServerDir
  • Directory containing SpeedesServer and
    ResourceMgr programs
      CloneData {
        string SpServerDir /usr/bin/DSF/servers
      }

20
Differences from Ordinary SPEEDES Simulations,
cont.
  • Clone Level
  • Emulation has level 0
  • Clone of emulation has level 1
  • Clone of a clone of emulation has level 2, etc.
  • int SpGetCloneLevel(): returns clone level of
    current simulation
  • Identification Utilities (macros)
  • IS_ORIG_SIM(): returns true if this is the
    original (i.e. root) simulation; false
    otherwise
  • IS_SIM_NAME(name): returns true if current
    simulation name matches name argument; false
    otherwise
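
A toy model may help make the clone-level rule concrete. It is not the SPEEDES implementation: `SimIdentity`, `makeClone`, `isOrigSim`, and `isSimName` are hypothetical stand-ins for SpGetSimId(), SpGetSimName(), SpGetCloneLevel(), IS_ORIG_SIM(), and IS_SIM_NAME(name), and it assumes the root simulation is exactly the one at level 0.

```cpp
#include <cassert>
#include <string>

// Toy model of the identifiers described above.
struct SimIdentity {
    int simId;
    std::string simName;
    int cloneLevel;  // 0 for the emulation (root)
};

// A clone is always one level deeper than the simulation it was copied from.
SimIdentity makeClone(const SimIdentity& parent, int newId,
                      const std::string& name) {
    assert(newId != parent.simId);
    return SimIdentity{newId, name, parent.cloneLevel + 1};
}

// Assumption: the original (root) simulation is the one at level 0.
bool isOrigSim(const SimIdentity& s) { return s.cloneLevel == 0; }

bool isSimName(const SimIdentity& s, const std::string& name) {
    return s.simName == name;
}
```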

21
Differences from Ordinary SPEEDES Simulations,
cont.
  • Cloning from within a simulation
  • Simulation can branch off COA by itself (normally
    initiated by user)
  • Example: Red side has two good plans, P1 and P2,
    to defend against Blue's attack
  • Create branch and simulate both plans
  • SpSpawn Function
  • Spawns clone at given start time; clone runs
    until end time
  • Simulation schedules events in clone to define
    COA
  • SPAWN Macro
  • Calls SpSpawn and schedules event to define COA
    in clone
  • P_SPAWN Macro
  • Works with process model
  • Waits until given start time, then branches off
    clone
  • User schedules events to define COA in clone

22
Initiating a COA Evaluation from simulation or
emulation
  • Schedule event to define COA
  • Event time is after clone is spawned
  • Example: schedule event at t = 401.0 to change
    Red's plan
  • Schedule clone spawn
  • Example: call SpSpawn with start time of
    t = 400.0
  • Event only affects clone
  • Example: event changes Red's plan in clone only
      void SetRedPlan(PlanType newPlan) {
        if (IsChildSim())  // only change child
          RedPlan = newPlan;
      }
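
The ordering above (spawn first, COA event slightly later, event a no-op in the parent) can be walked through with a toy model. Nothing here is SPEEDES code: `ToySim`, `spawnClone`, and `setRedPlan` are hypothetical, and the state copy is an ordinary struct copy rather than a process clone.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a simulation's state.
struct ToySim {
    bool isChild;
    std::string redPlan;
};

// Copies the parent's state at spawn time and marks the copy as a child.
// The COA event must be scheduled after the spawn time so that it is
// still pending when the clone is created.
ToySim spawnClone(const ToySim& parent, double spawnTime, double coaEventTime) {
    assert(coaEventTime > spawnTime);
    ToySim child = parent;
    child.isChild = true;
    return child;
}

// Mirrors the SetRedPlan event: the same event fires in both
// simulations but only changes state in the child.
void setRedPlan(ToySim& sim, const std::string& newPlan) {
    if (sim.isChild) sim.redPlan = newPlan;
}
```

Spawning at t = 400.0 and firing the event at t = 401.0 in both simulations leaves the parent on its original plan and switches only the clone.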

23
SpSpawn Function
      SpSpawnHandle
      SpSpawn(const SpSimTime beginSim,
              const SpSimTime endSim,
              char *simName = NULL,
              int numExtModsToWaitFor = 0,
              SpCancelHandle spawnInitCancelHandle =
                  SpCancelHandle());
  • SpSpawnHandle class methods
      bool IsParent();
      bool IsChild();
      int GetParentId();
      void CancelSpawn();

24
SpSpawn Example
      SpCancelHandle cancelHdl;
      SCHEDULE_SetRedPlan(401.0, redCmdr, plan1);
      SpSimTime beginSim(400.0), endSim(850.0);
      char *simName = "Red plan1 at 400.0";
      int numExtMods = 1;
      SpSpawnHandle spawnHdl =
          SpSpawn(beginSim, endSim, simName,
                  numExtMods, cancelHdl);

25
SPAWN Macro
      SPAWN(start, end, cloneName,
            numExtMods, initEvent, initObjHdl)
  • where
  • start: clone start time
  • end: clone end time
  • cloneName: (string) name to identify clone
    sim
  • numExtMods: number of external modules clone
    waits for before executing
  • initEvent: name of initialization event
  • initObjHdl: object initEvent acts on

26
SPAWN Example
      SpObjHandle redCommanderHdl = GetRedCommander();
      SpSimTime beginSim(400.0), endSim(850.0);
      char *simName = "Red plan1 at 400.0";
      int numExtMods = 1;
      SPAWN(beginSim, endSim, simName,
            numExtMods, SetRedPlan,
            redCommanderHdl);

27
P_SPAWN Macro (Process Model)
      P_SPAWN(start, end, cloneName, numExtMods)
  • where
  • start: clone start time
  • end: clone end time
  • cloneName: (string) name to identify clone
    sim
  • numExtMods: number of external modules clone
    waits for before executing

28
P_SPAWN Example
      void S_MyObj::SpawnCloneFromProcess() {
        P_VAR;
        P_LV(SpSpawnHandle, sh);  // create SpSpawnHandle var sh
        P_BEGIN(1);
        // Spawn clone at t=100.0; end at t=700.0 with name
        // "Clone 2"; don't wait for any external modules
        P_SPAWN(100.0, 700.0, "Clone 2", 0);
        if (sh.IsParent())  // use SpSpawnHandle var
          RB_cout << "Clone 2 spawned by parent" << endl;
        else  // child sim
          RB_cout << SpGetSimName() << " clone is running!"
                  << endl;
        P_END;
      }

29
The Emulation
30
Emulation Description
  • Mirrors or emulates state of real world
  • Runs at wall clock speed, continually integrating
    real-world reports inserted through system
    interface
  • Simulated version of reality is starting point
    for launching simulations in real time
  • Integrating Real World Reports
  • System Interface (external module) needs 3
    capabilities
  • Alter existing simulation objects
  • Delete existing simulation objects
  • Create new simulation objects

31
Integrating Real World Reports
  • Altering existing simulation object:
  • External module schedules event on object
  • Event changes object state
  • Creating/deleting simulation objects:
  • New macro system creates function for external
    module
  • Calling function schedules event in simulation to
    create/delete object in simulation
  • Must create event for every possible change
  • Challenge is anticipating all needed changes and
    implementing events that perform these changes

32
System Interface
33
Master System Interface (MSI)
Speedes Server
Master System Interface
Emulation
Real World Reports and User Input
  • External module(s) attached to emulation

34
Attaching External Module to Clone Simulation
Parent Simulation
Speedes Server
Host Router
External Module
External Module
35
Subsidiary Interfaces
Speedes Server
Emulation
MSI
Reports
exec()
Clone
External Modules Attached To COA Evaluations

36
Building Master Simulation Interface
  • SPEEDES External module program
  • Based on SpEvaluatorStateMgr class
  • New class inherits from SpStateMgr class
  • Inherits SpStateMgr capabilities
  • Control simulation time advance
  • Schedule events in simulation
  • Extract information from simulation
  • Request predictive simulations
  • Request COA evaluations
  • Receive status messages about predictive
    simulations and COA evaluations
  • Started, Finished, Failed to start, Aborted, etc.

37
Initiating Predictive Simulations from System
Interface
  • SpEvaluatorStateMgr method
      int RequestPredictiveSim(double startTime,
                               double endTime,
                               int numExtModsToWaitFor = 0,
                               const char *simName = NULL);
  • where
  • return val: request ID (or 0 if request fails)

38
Initiating COA Evaluations from System Interface
  • SpEvaluatorStateMgr method
      int RequestCOAEval(double startTime,
                         double endTime,
                         SpList cloneEvents,
                         int numExtModsToWaitFor = 0,
                         const char *simName = NULL);
  • where
  • cloneEvents: list of SpCloneEvent objects
    (each of which specifies event to be scheduled in
    child); list defines the COA
  • return val: request ID (or 0 if request fails)

39
Specifying COA Events with SpCloneEvent Objects
      SpCloneEvent(
        const SpSimTime evtTime,  // event time
        int simObjGlobalId,       // ID of object event acts on
        char *eventName,          // name of event to schedule
        char *msg,                // optional SpMsg arg
        int msgBytes,             // sizeof(SpMsg)
        char *data,               // event data buffer
        int dataBytes,            // bytes in data buffer
        // How to handle events in past
        TimeMode tmMode = IF_IN_PAST_IGNORE
      )

40
Receiving Clone Status Messages
  • External module requests clone (predictive
    simulation or COA evaluation)
  • Emulation (or simulation) records request
  • At time of request, status message sent to
    external module that made request
  • Clone successfully started
  • Clone failed to start (e.g. SpeedesServer failed,
    communications failure, fork() failure, etc.)
  • When clone finishes, status message sent to
    external module
  • Clone finished execution
  • Clone aborted (e.g. simulation crashed)
  • Status messages automatically received as
    SpStateMgr events
  • No need for user to poll for status messages

41
SpCloneStatusMsg Class
  • SpCloneStatusMsg object delivered to external
    module in state manager event
  • SpCloneStatusMsg object contains following info:
  • Simulation request ID: integer ID returned by
    request method
  • Simulation clone ID: integer ID that uniquely
    identifies each generated simulation
  • Simulation name: name given in request method
  • Status: Started, Finished, StartFailure,
    NoResources, SpeedesServerFailed,
    CopyNodeFailure, CommsFailure, Aborted
  • SpeedesServer connection data: information needed
    to connect new external module to simulation's
    SpeedesServer
  • Machine name
  • Port number

42
Inserting Reports into Emulation from System
Interface
  • Alter existing emulation object
  • Schedule event on object
  • Create new simulation object
  • Delete existing simulation object

43
Dynamic Object Creation from External Module
  • Generalization of interface-style events
  • Example: create I_MyInterface.h file; invoke
    macro
  • In simulation object .h file, include
    I_MyInterface.h file and invoke another macro
  • Call PLUG_IN_EXTERNAL_EVENT(CreateAircraftEvent)
  • In interface file, include I_MyInterface.h file
    and call EM_SCHEDULE_CreateAircraftEvent(x, y,
    z)
  • Function generated by first macro
  • Schedules event in simulation to create new
    aircraft object
  • Calls initialization method on new aircraft
    object, ac.Init(x,y,z)
  • Dynamic object deletion works similarly

44
Project Issues
45
Unix fork() Problem
  • Load balancing: need to distribute clones across
    entire set of available CPUs
  • Kraken: rude awakening
  • Kraken is NAVO's IBM P655 HPC
  • Kraken has 2944 CPUs, divided into 8-CPU compute
    nodes
  • Ported code to Kraken, began testing
  • After much research and discussion with IBM
    consultants at NAVO, we learned that a forked
    process is confined to the compute node of its
    parent process
  • Cannot distribute clones over set of available
    CPUs
  • Immediately switched to ASC's SGI Origin 3900
  • Huge compute nodes (512 CPUs) sidestep the issue
  • SGI OS (IRIX) automatically distributes processes
    across CPUs allocated for the batch job

46
Software Portability
  • Due to fork() issue, software currently
    compatible with SGI HPCs and Beowulf Clusters
  • Beowulf provides bproc_rfork(int compNode)
  • SGI is not long-term solution since we want to be
    able to access huge banks of CPUs
  • Possible fix: run one emulation on each compute
    node
  • Route real-world reports to all emulations
  • Since compute nodes have 512 CPUs, emulation
    overhead is relatively small
  • This fix won't work on HPCs with small compute
    nodes (e.g. 8 CPUs/compute node)
  • General fix
  • Software isolates just two platform-dependent
    functions
  • pid ForkTo(long compNode) and long
    GetCompNodeId()
  • Implement ForkTo(long compNode) ourselves
  • Checkpoint process and restart on another compute
    node
  • Process migration provided by some cluster
    operating systems
  • Possible other ways to move simulation state

47
Demonstration Systems
48
Demonstration Systems
  • Running remotely on ASC's SGI HPC (hpc11 at
    Wright-Patterson AFB)
  • Simple displays are easy, but more complex ones
    present problems
  • For example, ASC blocks the port needed to open
    an X window on a remote display
  • Succeeded in circumventing these problems, but
    interface is not fancy
  • Batch queue limits maximum demo size to 32 CPUs
  • Bigger jobs could get stuck in queue for long
    time
  • Still could be delay of up to 3-4 minutes
  • Software works fine with much bigger jobs (e.g.
    200 or more CPUs)
  • Only difference is wait time in queue before job
    gets run

49
Demonstration System 1: Insert Reports Into
Emulation
  • System: Emulation (simple simulation),
    SpeedesServer, Master System Interface (MSI)
  • Insert reports into emulation from MSI
  • Create new simulation object (S_Ship)
  • Change speed of simulation object
  • Delete existing simulation object

50
Demonstration System 2: Run Predictive Simulation
  • System: FSS-based Emulation, SpeedesServer,
    Master System Interface (MSI)
  • Request predictive simulation from MSI
  • Emulation launches predictive simulation at
    requested start time
  • External module attaches to predictive simulation
  • Opens window to display GVT, FSS messages, MOEs
  • Abort, Pause, Resume buttons
  • Predictive simulation runs to requested end time
  • Final MOEs displayed

51
Demonstration System 3: Run COA Evaluation
  • System: FSS-based Emulation, SpeedesServer,
    Master System Interface (MSI)
  • Request COA evaluation from MSI
  • Change attack plan: reroute 1 fighter so only 5
    fighters attack 2 airports (instead of 6
    fighters)
  • Emulation launches COA evaluation at requested
    start time
  • External module attaches to COA evaluation
  • Opens window to display GVT, FSS messages, MOEs
  • Abort, Pause, Resume buttons
  • COA evaluation runs to requested end time
  • Final MOEs displayed: only 1 airport destroyed
    (instead of 2)

52
Demonstration System 4: Clone off of COA
Evaluation
  • System: FSS-based Emulation, SpeedesServer,
    Master System Interface (MSI)
  • Request COA evaluation from MSI
  • Change attack plan: reroute 1 fighter so only 5
    fighters attack 2 airports (instead of 6
    fighters)
  • Emulation launches COA evaluation at requested
    start time
  • COA evaluation branches off COA clone at
    t = 6000.0
  • Changes number of Red missiles at SAM site
  • Both evaluations share same computations until
    t = 6000.0
  • External modules attach to both COA evaluations
  • Opens window to display GVT, FSS messages, MOEs
  • Abort, Pause, Resume buttons
  • COA evaluations run to requested end time
  • Final MOEs displayed: different results for each
    run

53
Demonstration System 5: 6 Simultaneous COA
Evaluations
  • System: FSS-based Emulation, SpeedesServer,
    Master System Interface (MSI)
  • Request 6 COA evaluations from MSI
  • No missiles removed from SAM site; reroute 0/1
    fighter
  • Two missiles removed from SAM site; reroute 0/1
    fighter
  • Four missiles removed from SAM site; reroute 0/1
    fighter
  • Emulation launches 6 COA evaluations at requested
    start time
  • External modules attach to COA evaluations
  • Opens window to display GVT, FSS messages, MOEs
  • Abort, Pause, Resume buttons
  • COA evaluations run to requested end time
  • Final MOEs displayed
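
Reading "reroute 0/1 fighter" as "reroute either 0 or 1 fighter", the six evaluations are the combinations of three missile options and two reroute options. The sketch below enumerates them; that reading, the `Coa` struct, and `enumerateCoas` are assumptions made for illustration and are not taken from the demonstration code.

```cpp
#include <cassert>
#include <vector>

// One candidate course of action; field names are hypothetical.
struct Coa {
    int missilesRemoved;   // missiles removed from the SAM site
    int fightersRerouted;  // 0 or 1
};

// Builds every combination of the two option lists.
std::vector<Coa> enumerateCoas(const std::vector<int>& missileOptions,
                               const std::vector<int>& rerouteOptions) {
    assert(!missileOptions.empty() && !rerouteOptions.empty());
    std::vector<Coa> coas;
    for (int missiles : missileOptions) {
        for (int rerouted : rerouteOptions) {
            coas.push_back(Coa{missiles, rerouted});
        }
    }
    return coas;
}
```

Calling `enumerateCoas({0, 2, 4}, {0, 1})` yields six entries, one per COA evaluation requested from the MSI.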

54
  • THE END

55
Project Review November 2004
  • Development work on Host Router component of
    clone simulation's SpeedesServer
  • Middleman for connecting external module
    interface to clone
  • Minimal display: small box showing status
    (Running, Finished, Aborted), simulation time,
    Pause/Abort button
  • Important to provide feedback to user about what
    is happening

[Diagram: Comm Server, Parent Simulation, Clone Simulation, External Module, Host Router, SpeedesServer]