RTES/CMS Potential Collaboration

April 28, 2005

1
RTES/CMS Potential Collaboration
  • RTES Group
  • Vanderbilt University
  • University of Illinois, Urbana Champaign
  • University of Pittsburgh
  • Syracuse University
  • Fermilab

(NSF ITR grant ACI-0121658)
2
Outline
  • RTES Overview
  • Goals, Team, Deliverables
  • Tool Approach
  • Modeling, Armors, VLA
  • Demo Description
  • Potential Collaborations
  • System Configuration
  • Run-Control
  • Fault Mitigation
  • GUI

3
RTES Team
  • The Real-Time Embedded Systems (RTES) Group
  • A collaboration of five institutions:
  • University of Illinois
  • University of Pittsburgh
  • Syracuse University
  • Vanderbilt University (PI)
  • Fermilab
  • Physicists and Computer Scientists/Electrical
    Engineers with expertise in
  • High performance, real-time system software and
    hardware,
  • Reliability and fault tolerance,
  • System specification, generation, and modeling
    tools.
  • NSF ITR grant ACI-0121658

4
RTES Goals
  • High availability
  • Fault handling infrastructure capable of
  • Accurately identifying problems (where, what, and
    why)
  • Compensating for problems (shifting the load,
    changing thresholds)
  • Automating recovery procedures (restart /
    reconfiguration)
  • Accurate accounting
  • Extensibility (capturing new detection/recovery
    procedures)
  • Policy driven monitoring and control
  • Dynamic reconfiguration
  • adjust to potentially changing resources

5
RTES Goals (continued)
  • Faults must be detected/corrected ASAP
  • semi-autonomously
  • with as little human intervention as possible
  • distributed and hierarchical monitoring and
    control
  • Life-cycle maintainability and evolvability
  • to deal with new algorithms, new hardware and
    new versions of the OS
  • User-defined Actions
  • Customized to application/users

6
The RTES Solution (for BTeV)
[Figure: block diagram. Labels: Design and Analysis (Modeling, Analysis,
Synthesis; Fault Behavior, Algorithms, Resource, Reconfigure; Performance,
Diagnosability, Reliability); Feedback; Runtime (Region Operations Mgr,
Experiment Control Interface; L1, L2/3; Hard and Soft Real Time; High-Level
to Low-Level hierarchical fault management)]
7
RTES Concepts
  • A hierarchical fault management system and
    toolkit
  • Model Integrated Computing
  • GME (Generic Modeling Environment) system
    modeling tools
  • ARMORs (Adaptive, Reconfigurable, and Mobile
    Objects for Reliability)
  • Robust framework for detection and reaction to
    faults in processes
  • VLAs (Very Lightweight Agents for limited
    resource environments)
  • Sensors/actuators to monitor/mitigate at every
    level

8
Configuration through Modeling
  • Multi-aspect tool, separate views of
  • Hardware components and physical connectivity
  • Executables configuration and logical
    connectivity
  • Fault handling behavior using hierarchical state
    machines
  • Model interpreters can generate the system
  • At the code fragment level (for fault handling)
  • Download scripts and configurations
  • Modeling languages are application specific
  • Shapes, properties, associations, constraints
  • Appropriate for application/context
  • System model
  • Messaging
  • Fault mitigation
  • GUI, etc.

9
Modeling Environment: GME
  • Fault handling
  • Process dataflow
  • HW Configuration

GME is an Open-Source, Meta-configurable,
multi-aspect graphical modeling tool
10
System Integration Modeling Language (SIML)
  • Model Component Hierarchy and Interactions
  • Loosely specified model of computation
  • Model information relevant for system
    configuration
  • Links to other narrowly focused modeling
    languages
  • provides overall picture and access to models in
    other languages
  • Overall Deployment View

11
System Architecture expressed with SIML
  • RunControl Manager
  • Router Information
  • How many regions?
  • How many worker nodes inside the region?
  • Node Identification information

12
SIML - Generation
  • Configuration files
  • Build Scripts
  • Deployment Scripts
  • Router Configurations
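
As a rough illustration of what "generation" from a system model could look like, the sketch below walks a tiny in-memory model (regions containing worker nodes) and emits a partition-style XML configuration. The `Region` and `Node` classes, the `<Region>` element, and the URL layout are hypothetical stand-ins chosen for the example; they are not SIML's metamodel or XDAQ's actual schema.

```java
import java.util.ArrayList;
import java.util.List;

final class SimlGenerationSketch {
    // Hypothetical model elements; real SIML models live in GME, not in Java classes.
    static final class Node {
        final int id; final String host; final int port;
        Node(int id, String host, int port) { this.id = id; this.host = host; this.port = port; }
    }

    static final class Region {
        final String name; final List<Node> workers;
        Region(String name, List<Node> workers) { this.name = name; this.workers = workers; }
    }

    /** Emits one <Region> per region and one <Host> per worker node.
     *  Values are assumed to be XML-safe; no escaping is done in this sketch. */
    static String generateConfig(List<Region> regions) {
        StringBuilder xml = new StringBuilder("<?xml version='1.0'?>\n<Partition>\n");
        for (Region region : regions) {
            xml.append("  <Region name=\"").append(region.name).append("\">\n");
            for (Node node : region.workers) {
                xml.append("    <Host id=\"").append(node.id)
                   .append("\" url=\"http://").append(node.host)
                   .append(':').append(node.port).append("\"/>\n");
            }
            xml.append("  </Region>\n");
        }
        return xml.append("</Partition>\n").toString();
    }

    public static void main(String[] args) {
        List<Node> workers = new ArrayList<>();
        workers.add(new Node(0, "host1", 40000));
        workers.add(new Node(1, "host2", 40000));
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("region0", workers));
        System.out.print(generateConfig(regions));
    }
}
```

The same traversal could just as well emit build or deployment scripts; only the text written per model element would change.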

13
Data Type Modeling Language (DTML)
  • Modeling of Data Types and Structures
  • Auto-generate marshalling-demarshalling
    interfaces for communication
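
To make the marshalling/demarshalling idea concrete, here is a minimal hand-written sketch of the kind of code a generator might produce for one modeled data type. The `TriggerStatus` type, its fields, and the wire layout (big-endian, length-prefixed string) are assumptions made for the example, not DTML output.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class TriggerStatusCodec {
    // Hypothetical composite type: two simple fields plus a variable-length string.
    static final class TriggerStatus {
        final int nodeId; final float load; final String state;
        TriggerStatus(int nodeId, float load, String state) {
            this.nodeId = nodeId; this.load = load; this.state = state;
        }
    }

    /** Marshals the message as: int nodeId, float load, int length, UTF-8 bytes. */
    static byte[] marshal(TriggerStatus m) {
        byte[] state = m.state.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 4 + state.length)
                                   .order(ByteOrder.BIG_ENDIAN);
        buf.putInt(m.nodeId).putFloat(m.load).putInt(state.length).put(state);
        return buf.array();
    }

    /** Demarshals a buffer produced by marshal(), reading fields in the same order. */
    static TriggerStatus demarshal(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        int nodeId = buf.getInt();
        float load = buf.getFloat();
        byte[] state = new byte[buf.getInt()];
        buf.get(state);
        return new TriggerStatus(nodeId, load, new String(state, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        TriggerStatus back = demarshal(marshal(new TriggerStatus(7, 0.5f, "RUNNING")));
        System.out.println(back.nodeId + " " + back.load + " " + back.state);
    }
}
```

Generating both directions from one model keeps the field order and sizes in agreement on sender and receiver, which is the main hazard when such code is written by hand.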

14
Fault Mitigation Modeling Language (FMML)
  • Specification of Fault Mitigation Behavior using
    Hierarchical Finite State Machines (A)
  • Configuration and instantiation of FM behaviors
    as ARMORs (B)
  • Specification of FM Triggering Communication (C)

[Figure: FMML model views labeled A, B, and C]
15
FMML Generation
  • Model translator generates fault-tolerant
    strategies and communication flow strategy from
    FMML models
  • Strategies are plugged into ARMOR infrastructure
    as ARMOR elements
  • ARMOR infrastructure uses these custom elements
    to provide customized fault-tolerant protection
    to the application
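
The hierarchical state machines used for fault-mitigation behavior can be sketched in a few lines. In the example below an event is first offered to the current state and, if unhandled there, to each enclosing state in turn, so a fault handled once at the parent level applies to every substate. The state and event names are hypothetical, and this is only an illustration of the dispatch rule, not the code the FMML translator generates or the ARMOR element API.

```java
import java.util.HashMap;
import java.util.Map;

final class HierarchicalFsmSketch {
    static final class State {
        final String name;
        final State parent; // enclosing state, or null at the top level
        private final Map<String, State> transitions = new HashMap<>();

        State(String name, State parent) { this.name = name; this.parent = parent; }

        void on(String event, State target) { transitions.put(event, target); }
    }

    /** Returns the next state: the innermost state that handles the event wins;
     *  an event nobody handles leaves the machine where it is. */
    static State dispatch(State current, String event) {
        for (State s = current; s != null; s = s.parent) {
            State target = s.transitions.get(event);
            if (target != null) return target;
        }
        return current;
    }

    // Hypothetical example hierarchy: Running and Throttled are substates of Operational.
    static final State OPERATIONAL = new State("Operational", null);
    static final State RUNNING = new State("Running", OPERATIONAL);
    static final State THROTTLED = new State("Throttled", OPERATIONAL);
    static final State RECOVERING = new State("Recovering", null);

    static {
        RUNNING.on("queueHigh", THROTTLED);
        THROTTLED.on("queueNormal", RUNNING);
        OPERATIONAL.on("processCrash", RECOVERING); // inherited by both substates
        RECOVERING.on("restarted", RUNNING);
    }

    public static void main(String[] args) {
        State s = dispatch(RUNNING, "queueHigh");
        s = dispatch(s, "processCrash");
        System.out.println(s.name);
    }
}
```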

16
User Interface Modeling Language (GUIML)
  • Enables reconfiguration of user interfaces
  • Structural and data-flow code is generated from
    models
  • User Interface produced by running the generated
    code

17
User Interface Generation
[Figure: Generator]
18
RTES Demonstration at IEEE RTAS05
  • Used Tools and Models to Generate a Family of
    Demos
  • 4, 16, 32, and 64 Processor Systems
  • Demonstrates Fault Mitigation in an L2/L3 Trigger
    Prototype for BTeV
  • GUI: Matlab-based
  • GUI design specified by GME models (GUIML)
  • Network/Messaging: Elvin publish/subscribe
  • Messages defined by GME models (DTML)
  • RunControl (RC): state machines
  • Defined by GME models (SIML)
  • Infrastructure: ARMORs
  • Custom Fault Mitigation elements defined by
    models (FMML)
  • Application: L2/3 FilterApp, DataSource
  • Actual physics trigger code
  • File-reader supplies physics/simulation data to
    the FilterApp

19
(No Transcript)
20
Potential RTES contributions to CMS
  • Graphical Modeling Tools for
  • Specifying FunctionManager StateMachines with
    FMML
  • Specifying Communication Messages at a
    higher-level of abstraction with DTML
  • Can synthesize serialization/deserialization code
    for the specific implementation technology such
    as SOAP
  • Designing GUIs independent of the implementation
    technology with GUIML
  • Can synthesize Java applet code for rendering and
    communication over SOAP
  • Designing System Configurations with SIML a la
    DuckCAD
  • Can synthesize artifacts in addition to XML
    configuration files
  • Fault Tolerance Approach and Concepts
  • Hierarchical Fault Mitigation via
    collaborating/coordinating FM Managers
  • Custom fault-mitigation behavior specification as
    hierarchical finite state machines
  • ARMORs and VLAs

21
Potential RTES/RCMS Mapping
[Figure: mapping of RTES modeling languages onto RCMS. Labels: SIML, DTML,
FMML, GML, and others; Configures]
22
Modeling Configurations with SIML
XDAQ ptMAZE example
Defining a partition or region
    <?xml version='1.0'?>
    <Partition>
      <Definitions>
        <ClassDef id="15">ptMAZE</ClassDef>
        <ClassDef id="11">RoundTrip</ClassDef>
      </Definitions>
      <Host id="0" url="http://host1:40000">
        <Address type="ptMAZE"
                 port="56"
                 boardId="0"
                 service="maze_service_immediate"
                 switch="MAZE_SWITCH_M3E128"/>
        <Application class="RoundTrip"
                     targetAddr="auto"
                     instance="0" network="ptMAZE">
          <DefaultParameters>
            <Parameter name="samples"
                       type="unsigned long">
              1000000

Host or application attributes
23
Modeling Configurations with SIML
XDAQ ptMAZE example
Defining communications
        <Transport class="ptMAZE"
                   targetAddr="auto" instance="0">
          <DefaultParameters>
            <Parameter name="pollingMode" type="bool">
              false
            </Parameter>
            <Parameter name="mtuSize" type="int">
              4096
            </Parameter>
            . . .
          </DefaultParameters>
        </Transport>
        <urlTransport>
          //linux/x86/libptMAZE.so
        </urlTransport>
      </Host>
      <Host id="1" url="http://host2:40000">
        <Address type="ptMAZE"

protocol attributes
Defining an application
24
Function Manager StateMachine Artifacts
  • StateMachine.java: sets up the state machine.
  • States.java: the set of possible states.
  • Inputs.java: the set of possible triggers.
  • TransitionActions.java: actions run during a state transition.
  • TransitionFailedAction.java: runs when a transition action has failed.
  • StateChangedAction.java: runs when the state has changed.
  • FailureAction.java: runs when a transition has failed.

25
StateMachine.java
    public class HelloStateMachine extends UserStateMachine {
        ...
        StateMachineDefinition fsmdef = new StateMachineDefinition();
        // Inputs (Commands)
        fsmdef.addInput(HelloInputs.GOTOHELLO);
        fsmdef.addInput(HelloInputs.GOTOINIT);
        // Initial state
        fsmdef.setInitialState(HelloStates.INITIAL);
        // States
        fsmdef.addState(HelloStates.INITIAL);
        fsmdef.addState(HelloStates.HELLO);
        fsmdef.addTransition(
            HelloInputs.GOTOHELLO,
            HelloStates.INITIAL,

As expressed in FMML Models
26
States.java
    public final class HelloStates {
        public static final State INITIAL = new State("Initial");
        public static final State HELLO = new State("Hello");
        public static final State ERROR = new State("Error");
    }

27
Inputs.java
    public class HelloInputs {
        public static final Input GOTOHELLO = new Input("GoToHello");
        public static final Input GOTOINIT = new Input("GoToInit");
    }

28
TransitionActions.java
    public class HelloTransitionActions
            extends UserTransitionActions {
        public void helloAction()
                throws UserActionException {
            System.out.println("helloAction Executed");
            logger.info("helloAction Executed");
        }
    }

29
TransitionFailedAction.java
    public class HelloTransitionFailedActions
            extends UserTransitionFailedActions {
        public void helloFailedAction()
                throws UserActionException {
            logger.info("Executing helloFailedAction");
            getUserStateMachine().setState(DaqkitStates.ERROR);
            logger.info("helloFailedAction Executed");
        }
    }

(This requires extensions to the modeling
language)
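
The interplay between transition actions and transition-failed handling can be shown with a small self-contained sketch: a table-driven machine runs the action attached to a transition and moves to an error state if that action throws. The state and input names are taken from the Hello example, but the `TransitionFailureSketch` class and its `addTransition`/`fire` methods are hypothetical and do not reproduce the Function Manager framework's API.

```java
import java.util.HashMap;
import java.util.Map;

final class TransitionFailureSketch {
    enum State { INITIAL, HELLO, ERROR }
    enum Input { GOTOHELLO, GOTOINIT }

    /** A transition action that is allowed to fail. */
    interface Action { void run() throws Exception; }

    private static final class Transition {
        final State to; final Action action;
        Transition(State to, Action action) { this.to = to; this.action = action; }
    }

    private final Map<String, Transition> table = new HashMap<>();
    private State current = State.INITIAL;

    void addTransition(Input input, State from, State to, Action action) {
        table.put(from + "/" + input, new Transition(to, action));
    }

    /** Applies the input; a failing action sends the machine to ERROR
     *  instead of the transition's target state. */
    State fire(Input input) {
        Transition t = table.get(current + "/" + input);
        if (t == null) return current; // input not accepted in this state
        try {
            t.action.run();
            current = t.to;
        } catch (Exception e) {
            current = State.ERROR; // stands in for the "transition failed" action
        }
        return current;
    }

    public static void main(String[] args) {
        TransitionFailureSketch fsm = new TransitionFailureSketch();
        fsm.addTransition(Input.GOTOHELLO, State.INITIAL, State.HELLO,
                () -> System.out.println("helloAction Executed"));
        System.out.println(fsm.fire(Input.GOTOHELLO));
    }
}
```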
30
SOAP Messages - Client-Server Example: Serializing
    SOAPName commandName = envelope.createName("increment");
    SOAPName originator = envelope.createName("originator");
    SOAPName targetAddr = envelope.createName("targetAddr");
    SOAPBody body = envelope.getBody();
    SOAPElement command = body.addBodyElement(commandName);
    ...
We can provide an abstract API call for creating
messages, so that user code needs no
understanding of the underlying SOAP calls
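
One possible shape for such an abstract call is sketched below: the caller passes a command name and addressing values and gets back a serialized envelope, with all envelope construction hidden inside. The `buildCommand` method is hypothetical, and carrying `originator` and `targetAddr` as attributes of the command element is an assumption made for the example. The sketch uses only the JDK's DOM and transformer classes rather than a SOAP library.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class CommandMessageSketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Builds a SOAP 1.1 style envelope whose body holds a single command element. */
    static String buildCommand(String command, String originator, String targetAddr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element envelope = doc.createElementNS(SOAP_NS, "soap:Envelope");
            Element body = doc.createElementNS(SOAP_NS, "soap:Body");
            Element cmd = doc.createElementNS(null, command); // command in no namespace
            cmd.setAttribute("originator", originator);
            cmd.setAttribute("targetAddr", targetAddr);
            doc.appendChild(envelope);
            envelope.appendChild(body);
            body.appendChild(cmd);

            // Serialize the DOM tree to text with an identity transform.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build command message", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("increment", "gui", "0"));
    }
}
```

In a model-driven setup, a method like this would be generated per message type, so user code would never name envelope or body elements directly.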
31
Deserializing SOAP Messages
    SOAPBody body = reply.getSOAPPart().getEnvelope().getBody();
    if (body.hasFault()) {
        SOAPFault fault = body.getFault();
        string msg = "Server error: ";
        msg += fault.getFaultString();
        XDAQ_RAISE(xdaqException, msg);
    } else {
        SOAPName counterTag("Counter", "", "");
        vector<SOAPElement> content = body.getChildElements();
        for (int i = 0; i < content.size(); i++) {
            vector<SOAPElement> c = content[i].getChildElements(counterTag);
            for (int j = 0; j < c.size(); j++) {
                if (c[0].getElementName() == counterTag) {
                    cout << "The server replied with counter: ";
                    cout << c[0].getValue() << endl;
                }
            }
        }
    }
Deserializing the reply to the message
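
The same fault-then-payload logic could be hidden behind a single call. The sketch below parses a reply, raises an error carrying the fault string if the body holds a SOAP fault, and otherwise returns the value of the first `Counter` element. The `readCounter` method and the reply layout are assumptions for illustration; the code uses the JDK's DOM parser, not the SOAP library shown on the slide.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class CounterReplySketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Returns the counter carried in the reply, or throws if the server sent a fault. */
    static long readCounter(String replyXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                         .parse(new InputSource(new StringReader(replyXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("reply is not well-formed XML", e);
        }

        // A SOAP 1.1 fault carries its human-readable reason in <faultstring>.
        NodeList faults = doc.getElementsByTagNameNS(SOAP_NS, "Fault");
        if (faults.getLength() > 0) {
            NodeList reasons = ((Element) faults.item(0)).getElementsByTagName("faultstring");
            String reason = reasons.getLength() > 0
                    ? reasons.item(0).getTextContent() : "unspecified";
            throw new IllegalStateException("Server error: " + reason);
        }

        NodeList counters = doc.getElementsByTagNameNS("*", "Counter");
        if (counters.getLength() == 0) {
            throw new IllegalStateException("reply has no Counter element");
        }
        return Long.parseLong(counters.item(0).getTextContent().trim());
    }

    public static void main(String[] args) {
        String reply = "<soap:Envelope xmlns:soap=\"" + SOAP_NS + "\"><soap:Body>"
                + "<Counter>3</Counter></soap:Body></soap:Envelope>";
        System.out.println(readCounter(reply));
    }
}
```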
32
I2O Message RU Builder example
  • DTML language allows both simple and composite
    types
  • Floats and integers, signed and unsigned, can be
    specified
  • Corresponding marshall/demarshall code can be
    generated from models

33
Discussions
  • Relevancy/Interest?
  • Tuning the concepts.
  • Next Steps?
  • More documentation on CMS/XDAQ
  • GUI, SOAP messages, I2O Messages, State Machines,
    Data monitor,
  • XDAQ Examples
  • Full, Larger-scale, Applications
  • How can we contribute?
  • Visit? June 4th
  • Goals, Preparations?

34
Backup Slides
35
ARMOR: Adaptive Reconfigurable Mobile Objects of Reliability
36
Very Lightweight Agents
  • Minimal footprint
  • Platform independence
  • Employable everywhere in the system!
  • Monitors hardware and software
  • Handles fault detection communications with
    higher level entities
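
A minimal sketch of the monitoring role described above: an agent samples one sensor, compares the reading with a threshold, and notifies a higher-level entity only when the fault condition appears or clears. The `LightweightAgentSketch` class, the threshold rule, and the report strings are all hypothetical; the actual VLA design is not described in this deck.

```java
import java.util.function.Consumer;
import java.util.function.DoubleSupplier;

final class LightweightAgentSketch {
    private final String name;
    private final DoubleSupplier sensor;   // reads one hardware or software metric
    private final double threshold;
    private final Consumer<String> uplink; // delivers reports to the higher-level entity
    private boolean faultActive = false;

    LightweightAgentSketch(String name, DoubleSupplier sensor,
                           double threshold, Consumer<String> uplink) {
        this.name = name;
        this.sensor = sensor;
        this.threshold = threshold;
        this.uplink = uplink;
    }

    /** Samples once; reports only on a change of condition to keep upstream traffic low. */
    void poll() {
        double value = sensor.getAsDouble();
        boolean over = value > threshold;
        if (over && !faultActive) {
            uplink.accept(name + " FAULT value=" + value);
        } else if (!over && faultActive) {
            uplink.accept(name + " CLEARED value=" + value);
        }
        faultActive = over;
    }

    public static void main(String[] args) {
        double[] samples = {40.0, 90.0, 95.0, 50.0};
        int[] next = {0};
        LightweightAgentSketch agent = new LightweightAgentSketch(
                "cpu-temp", () -> samples[next[0]++], 80.0, System.out::println);
        for (int i = 0; i < samples.length; i++) agent.poll();
    }
}
```

Keeping the agent down to a sensor, a threshold, and a callback is one way to approach the minimal-footprint goal; any mitigation policy would live in the entity that receives the reports.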

37
L2/3 Prototype Farm Setup
38
The Demonstration System Architecture
[Figure: architecture diagram with "public" and "private" labels]
39
Matlab GUI
  • Monitoring/Display of Node and Region Health and
    Performance
  • Command Interface for starting/stopping system
  • Debug Interface for injecting faults