Design of Self-Managing Dependable Systems with UML and Fault Tolerance Patterns

Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes


1
Design of Self-Managing Dependable Systems with
UML and Fault Tolerance Patterns
  • Matthias Tichy, Daniela Schilling, Holger Giese

2
Application Example - RailCab
3
Application Example - RailCab
  • Vision: combine
  • individual transport
  • cost-effectiveness of public transport
  • Autonomously operating shuttles using a linear
    drive (maglev train)
  • Build contact-free convoys for energy savings
  • Information about the exact position is necessary
  • Computation based on the stator waves

4
Motivation
  • Example service structure
  • Problem: the position calculation service must be
    highly reliable

[Figure: service structure with StatorWaveSensor, PositionCalculation, and ConvoyController]
5
Contents
  • Problem: the position calculation service must be
    highly reliable
  • Solution: self-healing, fault-tolerant software
  • Application of software fault tolerance patterns,
    which capture
  • Abstract service structure of fault tolerance
    techniques
  • Deployment restrictions (explicitly taking fault
    tolerance into account)
  • Automatic deployment
  • Initial deployment
  • Self-healing by deployment reconfiguration during
    runtime

6
Fault Tolerance Patterns
  • Capture the abstract structure of a well-known
    fault tolerance technique
  • Example: Triple Modular Redundancy (TMR)

[Figure: Triple Modular Redundancy pattern with the roles Provider, Multiplier, Service1, Service2, Service3, Voter, and User]
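
The roles of the TMR pattern can be illustrated with a minimal Python sketch. The function names (`multiplier`, `voter`, `tmr_call`), the strict-majority rule, and the toy position calculation are assumptions made for illustration; the presentation itself does not show how the voter and multiplier behave.

```python
from collections import Counter


def multiplier(value, copies=3):
    """Fan one provider value out to every redundant service."""
    return [value] * copies


def voter(results, replicas=3):
    """Return the value reported by a strict majority of all replicas."""
    if results:
        value, votes = Counter(results).most_common(1)[0]
        if 2 * votes > replicas:
            return value
    raise RuntimeError("no majority among the redundant services")


def tmr_call(services, value):
    """Run all redundant services; a crashed one simply casts no vote."""
    results = []
    for service, item in zip(services, multiplier(value, len(services))):
        try:
            results.append(service(item))
        except Exception:
            continue
    return voter(results, len(services))


def healthy(stator_waves):
    """Hypothetical stand-in for a position calculation."""
    return stator_waves * 2


def crashed(stator_waves):
    """Simulates a crash failure of one redundant service."""
    raise ConnectionError("node down")
```

With `[healthy, healthy, crashed]` the two surviving services still form a majority; with two crashed services `voter` raises, since a single remaining vote is no majority of three.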
7
Fault Tolerance Patterns - Deployment
  • Naïve, unrestricted deployment may yield unwanted
    results
  • Deployment must respect restrictions imposed by
    the fault tolerance technique (redundancy,
    heterogeneity)
  • Instead of manual deployment, enrich fault
    tolerance patterns by deployment restrictions

[Figure: <<Service1>>, <<Service2>>, and <<Service3>> PositionCalculation are all connected by <<deploy>> to the single node Arthur]
8
Deployment restrictions
  • Deployment restrictions for the TMR pattern

  • Avoid a single point of failure at the voter /
    multiplier -> deploy voter and user to the same
    node (if the user fails, the failure of the voter
    is no problem)
  • Avoid crash failures -> deploy redundant services
    to distinct nodes
  • Heterogeneous hardware platform -> require
    different CPUs:
    $\text{Node3.CPU} \neq \text{Node4.CPU} \land \text{Node4.CPU} \neq \text{Node5.CPU} \land \text{Node3.CPU} \neq \text{Node5.CPU}$

[Figure: Provider and Multiplier on Node1; Voter and User on Node2; Service1, Service2, and Service3 on the distinct nodes Node3, Node4, and Node5]
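
The three restrictions can be checked against a candidate deployment with a short Python sketch. The function name `check_tmr_deployment`, the dictionary layout, and the CPU labels are hypothetical; only the restrictions themselves come from the slide.

```python
def check_tmr_deployment(deployment, cpu_of):
    """Return the list of violated TMR deployment restrictions.

    deployment maps a pattern role to a node name,
    cpu_of maps a node name to its CPU type.
    """
    problems = []
    if deployment["Voter"] != deployment["User"]:
        problems.append("voter and user are on different nodes")
    hosts = [deployment[role] for role in ("Service1", "Service2", "Service3")]
    if len(set(hosts)) < len(hosts):
        problems.append("redundant services share a node")
    if len({cpu_of[node] for node in hosts}) < len(hosts):
        problems.append("redundant services share a CPU type")
    return problems


# Hypothetical CPU types, chosen only for this example.
CPU_OF = {"Node2": "cpu-a", "Node3": "cpu-a", "Node4": "cpu-b", "Node5": "cpu-c"}

good = {"Voter": "Node2", "User": "Node2",
        "Service1": "Node3", "Service2": "Node4", "Service3": "Node5"}
# Naive deployment: all redundant services on one node.
naive = dict(good, Service2="Node3", Service3="Node3")
```

`check_tmr_deployment(good, CPU_OF)` returns an empty list, while the naive deployment violates both the distinct-node and the distinct-CPU restriction.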
9
Fault Tolerance Patterns
[Figure: original service structure with StatorWaveSensor, PositionCalculation, and ConvoyController]
Application of the TMR pattern:
  • <<Provider>> sen: StatorWaveSensor
  • <<Multiplier>> mult: Multiplier
  • <<Service1>> pc1: PositionCalculation
  • <<Service2>> pc2: PositionCalculation
  • <<Service3>> pc3: PositionCalculation
  • <<Voter>> vot: Voter
  • <<User>> cc: ConvoyController
10
Compute a correct / reliable deployment
Deployment
  • Use a standard constraint solver

11
Compute a correct / reliable deployment
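  • Mapping to a standard constraint solver
  • Decision matrix $x_{i,j}$ with services ($i$) as
    columns and nodes ($j$) as rows; an entry 1 means
    that the service is deployed to the node (here:
    service pc1 is deployed to node Gorlois)

$x_{i,j}$ sen  mul  pc1  pc2  pc3  vot  cc
Avalon
Gareth
Taliesin
Gorlois             1
Uther
Arthur

  • Variables: $x_{i,j} \in \{0, 1\}$
  • Constraint (each service is deployed to exactly
    one node):
    $\forall i \in \text{Services}: \sum_{j} x_{i,j} = 1$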
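
The decision matrix and the exactly-one-node constraint can be written down directly. The following Python sketch uses the service and node names from the slides; the dictionary encoding and the function names are assumptions for illustration, not the format used by the authors' solver.

```python
SERVICES = ["sen", "mul", "pc1", "pc2", "pc3", "vot", "cc"]
NODES = ["Avalon", "Gareth", "Taliesin", "Gorlois", "Uther", "Arthur"]


def empty_matrix():
    """x[i, j] = 1 means service i is deployed to node j."""
    return {(i, j): 0 for i in SERVICES for j in NODES}


def exactly_one_node_each(x):
    """For every service i, the entries x[i, j] must sum to 1 over all nodes j."""
    return all(sum(x[i, j] for j in NODES) == 1 for i in SERVICES)


x = empty_matrix()
x["pc1", "Gorlois"] = 1                    # the single entry shown on the slide
partial_ok = exactly_one_node_each(x)      # other services are not deployed yet

for service in SERVICES:
    if service != "pc1":
        x[service, "Avalon"] = 1           # arbitrary placement for the example
complete_ok = exactly_one_node_each(x)

x["pc1", "Uther"] = 1                      # pc1 would now run on two nodes
duplicate_ok = exactly_one_node_each(x)
```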
12
Compute a correct / reliable deployment
  • Restriction: services must be executed on the
    same node
  • Graphically
  • Constraint

[Figure: <<Voter>> vot: Voter and <<User>> cc: ConvoyController are both connected by <<deploys>> to the same node]

Only the vot and cc columns and their row sum are filled in:

$x_{i,j}$ vot  cc   Σ
Avalon    0    0    0
Gareth    0    0    0
Taliesin  1    1    2
Gorlois   0    0    0
Uther     0    0    0
Arthur    0    0    0

$\forall j \in \text{Nodes}: (x_{\text{vot},j} + x_{\text{cc},j}) = 2 \lor (x_{\text{vot},j} + x_{\text{cc},j}) = 0$
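
As a small illustration, the same-node constraint can be evaluated on the matrix entries like this. The sparse dictionary encoding (missing entries count as 0) and the name `same_node` are assumptions of this sketch.

```python
NODES = ["Avalon", "Gareth", "Taliesin", "Gorlois", "Uther", "Arthur"]


def same_node(x, a, b, nodes=NODES):
    """On every node the two services are either both present (sum 2) or both absent (sum 0)."""
    return all(x.get((a, j), 0) + x.get((b, j), 0) in (0, 2) for j in nodes)


# Voter and user together on Taliesin, as in the table on the slide.
together = {("vot", "Taliesin"): 1, ("cc", "Taliesin"): 1}
# Hypothetical violation: the user is placed on Avalon instead.
apart = {("vot", "Taliesin"): 1, ("cc", "Avalon"): 1}
```

For `together` every row sum is 0 or 2, so the check passes; for `apart` the rows Taliesin and Avalon both sum to 1 and the check fails.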
13
Compute a correct / reliable deployment
  • Initial deployment
  • Graphically

$x_{i,j}$ sen  mul  pc1  pc2  pc3  vot  cc
Avalon    1    1    0    0    0    0    0
Gareth    0    0    0    0    0    0    0
Taliesin  0    0    0    0    0    1    1
Gorlois   0    0    1    0    0    0    0
Uther     0    0    0    1    0    0    0
Arthur    0    0    0    0    1    0    0

[Figure: sen and mul on Avalon; vot and cc on Taliesin; pc1 on Gorlois; pc2 on Uther; pc3 on Arthur; Gareth hosts no service]
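
A valid initial deployment can be found by plain search, sketched below in Python. The presentation maps the problem to a standard constraint solver; the exhaustive search here is a swapped-in stand-in that only illustrates the constraints. The CPU restriction is left out because the slides give no CPU data, and assigning each service exactly one node in a dictionary makes the exactly-one-node constraint hold by construction.

```python
from itertools import product

SERVICES = ["sen", "mul", "pc1", "pc2", "pc3", "vot", "cc"]
NODES = ["Avalon", "Gareth", "Taliesin", "Gorlois", "Uther", "Arthur"]
SAME_NODE = [("vot", "cc")]
DISTINCT = [("pc1", "pc2"), ("pc1", "pc3"), ("pc2", "pc3")]


def is_valid(assign):
    """assign maps each service to exactly one node."""
    return (all(assign[a] == assign[b] for a, b in SAME_NODE)
            and all(assign[a] != assign[b] for a, b in DISTINCT))


def initial_deployment():
    """Return the first assignment that satisfies all restrictions, or None."""
    for choice in product(NODES, repeat=len(SERVICES)):
        assign = dict(zip(SERVICES, choice))
        if is_valid(assign):
            return assign
    return None


# The deployment shown in the table on the slide.
SLIDE = {"sen": "Avalon", "mul": "Avalon", "pc1": "Gorlois", "pc2": "Uther",
         "pc3": "Arthur", "vot": "Taliesin", "cc": "Taliesin"}
```

The search returns some valid deployment, not necessarily the one on the slide; `is_valid(SLIDE)` confirms that the slide's deployment satisfies the same restrictions.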
14
Repair
  • Due to TMR, one crashed redundant service is
    tolerable
  • A second crash is not tolerable
  • Therefore: self-heal by restarting the crashed
    service on a working node
  • But on which node? Compute it in a timely manner!

[Figure: <<Service1>> pc1: PositionCalculation deployed on Gorlois; <<Service2>> pc2: PositionCalculation deployed on Uther; <<Service3>> pc3: PositionCalculation deployed on Arthur]
15
Repair
  • Solve restricted deployment problem
  • Only consider changed values

[Figure: crash failure of node Gorlois, which <<deploys>> <<Service1>> pc1: PositionCalculation]

$x_{i,j}$ sen  mul  pc1  pc2  pc3  vot  cc
Avalon    1    1    ?    0    0    0    0
Gareth    0    0    ?    0    0    0    0
Taliesin  0    0    ?    0    0    1    1
Gorlois   0    0    1    0    0    0    0
Uther     0    0    ?    1    0    0    0
Arthur    0    0    ?    0    1    0    0

  • Fix variables $x_{i,j}$ of unaffected service/node
    combinations
  • Free variables $x_{i,j}$ for affected service/node
    combinations
  • Service migration should be avoided!
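
The restricted repair problem can be sketched in Python as follows: services on working nodes keep their placement (fixed variables), and only the services of the crashed node are placed again (free variables). The function name `repair` and the exhaustive search are assumptions of this sketch; the CPU restriction is omitted because the slides give no CPU data.

```python
from itertools import product

NODES = ["Avalon", "Gareth", "Taliesin", "Gorlois", "Uther", "Arthur"]
REDUNDANT = ["pc1", "pc2", "pc3"]


def is_valid(assign):
    """Voter and user share a node; redundant services are on distinct nodes."""
    return (assign["vot"] == assign["cc"]
            and len({assign[s] for s in REDUNDANT}) == len(REDUNDANT))


def repair(assign, crashed_node):
    """Re-place only the services of the crashed node; nothing else migrates."""
    working = [n for n in NODES if n != crashed_node]
    affected = [s for s, n in assign.items() if n == crashed_node]
    for choice in product(working, repeat=len(affected)):
        candidate = {**assign, **dict(zip(affected, choice))}
        if is_valid(candidate):
            return candidate
    return None  # no solution without relaxing the problem


before = {"sen": "Avalon", "mul": "Avalon", "pc1": "Gorlois", "pc2": "Uther",
          "pc3": "Arthur", "vot": "Taliesin", "cc": "Taliesin"}
after = repair(before, "Gorlois")
```

Because only the affected service is free, the search space shrinks from all service/node combinations to the five working nodes for pc1, which is what keeps the repair computation fast.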
16
Repair
  • No solution found -> relax the problem
  • Service migration necessary

First round (pc1 free):

$x_{i,j}$ sen  mul  pc1  pc2  pc3  vot  cc
Avalon    1    1    ?    0    0    0    0
Gareth    0    0    ?    0    0    0    0
Taliesin  0    0    ?    0    0    1    1
Gorlois   0    0    1    0    0    0    0
Uther     0    0    ?    1    0    0    0
Arthur    0    0    ?    0    1    0    0

Second round (pc1 and pc2 free):

$x_{i,j}$ sen  mul  pc1  pc2  pc3  vot  cc
Avalon    1    1    ?    ?    0    0    0
Gareth    0    0    ?    ?    0    0    0
Taliesin  0    0    ?    ?    0    1    1
Gorlois   0    0    1    0    0    0    0
Uther     0    0    ?    ?    0    0    0
Arthur    0    0    ?    ?    1    0    0

Third round (pc1, pc2, and pc3 free):

$x_{i,j}$ sen  mul  pc1  pc2  pc3  vot  cc
Avalon    1    1    ?    ?    ?    0    0
Gareth    0    0    ?    ?    ?    0    0
Taliesin  0    0    ?    ?    ?    1    1
Gorlois   0    0    1    0    0    0    0
Uther     0    0    ?    ?    ?    0    0
Arthur    0    0    ?    ?    ?    0    0

  • Fixed variables: entries 0 or 1
  • Free variables: entries marked ?
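
The round-by-round relaxation can be sketched in Python as follows. To produce a case in which the first round fails, the sketch adds a hypothetical restriction `ALLOWED` (which nodes can host pc1 and pc2) that does not appear in the presentation; the function name and the exhaustive search are likewise assumptions.

```python
from itertools import product

NODES = ["Avalon", "Gareth", "Taliesin", "Gorlois", "Uther", "Arthur"]
REDUNDANT = ["pc1", "pc2", "pc3"]
# Hypothetical extra restriction, not from the slides: nodes able to host a service.
ALLOWED = {"pc1": {"Gorlois", "Uther"}, "pc2": {"Uther", "Gareth"}}


def is_valid(assign):
    return (assign["vot"] == assign["cc"]
            and len({assign[s] for s in REDUNDANT}) == len(REDUNDANT)
            and all(assign[s] in nodes for s, nodes in ALLOWED.items()))


def repair_with_relaxation(assign, crashed_node, rounds):
    """Free more services in each round until a valid deployment exists."""
    working = [n for n in NODES if n != crashed_node]
    for free in rounds:
        for choice in product(working, repeat=len(free)):
            candidate = {**assign, **dict(zip(free, choice))}
            if crashed_node not in candidate.values() and is_valid(candidate):
                return candidate, free
    return None, None


before = {"sen": "Avalon", "mul": "Avalon", "pc1": "Gorlois", "pc2": "Uther",
          "pc3": "Arthur", "vot": "Taliesin", "cc": "Taliesin"}
ROUNDS = [["pc1"], ["pc1", "pc2"], ["pc1", "pc2", "pc3"]]
after, freed = repair_with_relaxation(before, "Gorlois", ROUNDS)
```

Under these assumptions the first round finds nothing, because the only allowed working node for pc1 is Uther, which already hosts pc2. The second round succeeds by migrating pc2 to Gareth and restarting pc1 on Uther.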

17
Tool Support
  • Current work

18
Conclusions / Future Work
  • Fault tolerance patterns capture fault tolerance
    techniques
  • Contain: structure and deployment restrictions
  • Graphical specification
  • Easy application
  • Automatic deployment
  • Initial deployment
  • Self-healing by deployment reconfiguration during
    runtime
  • Future Work
  • Heterogeneous service restrictions in pattern
    application
  • Synthesize behavior of voter and multiplier
    services
  • Use fault tolerance application knowledge for
    automatic fault tree analysis

19
  • Questions?

20
Compute a correct / reliable deployment
  • Restriction: services must be deployed on
    distinct nodes
  • Graphically
  • Constraint

[Figure: <<Service1>> pc1: PositionCalculation and <<Service2>> pc2: PositionCalculation, each with its own <<deploys>> relation]

Only the pc1 and pc2 columns and their row sum are filled in:

$x_{i,j}$ pc1  pc2  Σ
Avalon    0    0    0
Gareth    0    0    0
Taliesin  0    0    0
Gorlois   1    0    1
Uther     0    1    1
Arthur    0    0    0

$\forall j \in \text{Nodes}: x_{\text{pc1},j} + x_{\text{pc2},j} \leq 1$
21
Compute a correct / reliable deployment
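  • Restriction: the CPUs of the nodes for the
    redundant services must be distinct
  • Constraint

$x_{s_1,j_1} \land x_{s_2,j_2} \land x_{s_3,j_3} \Rightarrow j_1.\text{cpu} \neq j_2.\text{cpu} \neq j_3.\text{cpu}$
(read as pairwise distinct)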
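
A minimal Python sketch of the CPU restriction on the matrix entries follows. The CPU labels are hypothetical, since the presentation does not list the CPUs of the nodes, and the helper names are assumptions.

```python
def hosts(x, service):
    """Nodes j with x[service, j] == 1."""
    return [j for (i, j), value in x.items() if i == service and value == 1]


def cpus_pairwise_distinct(x, replicas, cpu):
    """Each replica runs on one node, and no two of these nodes share a CPU type."""
    used = [cpu[j] for s in replicas for j in hosts(x, s)]
    return len(used) == len(replicas) and len(set(used)) == len(used)


# Hypothetical CPU types, chosen only for this example.
CPU = {"Avalon": "cpu-b", "Gorlois": "cpu-a", "Uther": "cpu-b", "Arthur": "cpu-c"}

heterogeneous = {("pc1", "Gorlois"): 1, ("pc2", "Uther"): 1, ("pc3", "Arthur"): 1}
# pc1 on Avalon would share the CPU type of Uther, which hosts pc2.
clashing = {("pc1", "Avalon"): 1, ("pc2", "Uther"): 1, ("pc3", "Arthur"): 1}
```

The check is pairwise: it also rejects a deployment in which only the first and the third replica share a CPU type.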
22
Multistage
  • Multiple applications of TMR
  • Transform to a multiple stage arrangement
  • with tripled voter and multiplier
  • Deployment restrictions are transformed too

[Figure: multistage arrangement of the TMR pattern]
  • <<Provider>> sen: Sensor
  • <<Multiplier>> mult: Multiplier
  • <<Service1>> <<Provider>> pc1: PositionCalculation
  • <<Service2>> <<Provider>> pc2: PositionCalculation
  • <<Service3>> <<Provider>> pc3: PositionCalculation
  • <<Multiplier>> PCMultiplier (three instances)
  • <<Voter>> Voter (three instances)
  • <<User>> <<Service1>> cc: Convoy
  • <<User>> <<Service2>> cc: Convoy
  • <<User>> <<Service3>> cc: Convoy
23
Tool Support
  • Already working
  • Graphical specification of fault tolerance
    patterns and deployment restrictions
  • Automatic mapping to ILOG solver software
  • Next
  • Runtime support

24
Motivation
  • System without fault-tolerance, node crash
  • (1) fault tolerance pattern
  • Introduce redundancy (TMR) but deploy two
    redundant services on one node, crash of that node
  • Better: deploy redundant services to distinct
    nodes
  • (2) deployment restrictions for fault tolerance
    pattern
  • Example with distinct nodes, crash failure,
    everything ok
  • (3) which nodes
  • Now self-heal the system by restarting a crashed
    service on a new node
  • (4) which node

25
Motivation
  • Goals
  • Easy application of fault tolerance techniques
  • Deployment issues taken into account
  • Easy deployment specification
  • Reusable deployment specification