Requirements Analysis and Design Engineering


1
Requirements Analysis and Design Engineering
  • Southern Methodist University
  • CSE 7313

2
Module 17 Validating the system
3
Agenda
  • Traceability
  • Validating the system
  • Safety critical systems

4
Role of traceability
  • The ability to trace is a significant factor in quality software implementation
  • Tracking relationships and relating them is key in many high-assurance processes
  • The impact of change is often missed
  • Small changes can create significant safety and reliability problems

5
Traceability defined
  • IEEE: The degree to which a relationship can be established between two or more products of the development process, especially products having a predecessor-successor or master-subordinate relationship to one another

6
Traceability relationship
  • traced-to
  • traced-from

[Figure: a traceability link from a feature in the Vision Document to a software requirement (use case), with its associated actor]
7
Traceability relationship
  • Additional meanings can be placed on these
    relationships
  • tested by
  • traced to
  • implemented by

8
Implicit vs Explicit
  • Explicit traceability: development of relationships stemming from external considerations supplied by the team
  • e.g., product feature and use case
  • Implicit traceability: driven by the methodology and structure of the model
  • e.g., hierarchical requirements have an implicit relationship between parent and related child

9
Implicit vs Explicit
  • Other implicit examples
  • modeling tools in the development process may
    provide other traceability relationships (use
    cases and actors that interact with the use case)

10
Project Relationships
[Figure: project traceability relationships ('traces to') among Need, Product Feature, Use case, Software Requirements, and Use case specification]
Note: This traceability link is optional, as it can be derived from the link between the product feature and the use case specification. This link is often used to relate the product features to the use cases before the use case specifications are written.
11
Additional Traceability Options
  • Additional less traditional elements of a project
    can be traced if they add value
  • issue for unresolved issues
  • assumptions and rationales
  • action items
  • requests for new/revised features
  • glossary and acronym terms
  • bibliographic references

12
Augmented Traceability relationships
[Figure: augmented traceability relationships ('traces to') among Need, Product Features, SW Requirements, Use case, Use case specification, Actor, and Glossary term]
Note: This traceability link is optional, as it can be derived from the link between the product feature and the use case specification. This link is often used to relate the product features to the use cases before the use case specifications are written.
The SW requirements make up the formal SRS, of which the use case model is an interpretation.
In this case, we are tracing items to the glossary terms, as well as from them, as described when defining glossary terms as one of the supporting traceability types.
13
Verification and traceability
  • Must consider whether or not you have correctly and completely considered all of the links that should be established
  • Deeper consideration often leads to some revisions
  • Should hold formal and informal reviews
  • It's not all mechanical processing

14
Validating the system
15
Validation (IEEE)
  • The process of evaluating a system or component during or at the end of the development process to determine whether it satisfies requirements
  • Use validation to confirm that the implemented system conforms to the established requirements.

16
Acceptance Tests
  • Bringing the customer into the final validation
    process in order to gain assurance that the
    product works the way the customer really needs
    it to
  • May be part of the contract provisions
  • IT environments do this by a customer alpha or
    beta evaluation

17
Acceptance Tests
  • Based on a specific number of scenarios that the
    user specifies and executes in the usage
    environment
  • Freedom to think outside the box
  • Construct interesting ways to test the system to
    gain confidence that the system works as needed
  • Based on certain key use cases

18
Acceptance Tests
  • Apply these use cases in interesting combinations
  • under certain types of system load and other environmental factors
  • interoperability with other applications
  • OS dependencies
  • others likely to be present in the user's environment

19
Acceptance Tests
  • Iterative development environments will have
    generations of acceptance tests run at various
    milestones
  • Will most often find at least some undiscovered
    ruins

20
Validation Testing
  • Primary activities in validation are testing
    activities
  • IEEE 829-1983, IEEE Standard for Software Test Documentation
  • The development process must
  • include planning for test activities
  • time and resources to design the tests
  • time and resources to execute the tests

21
Implementation documentation
[Figure: implementation documentation traces from user needs through the Vision document and the SRS package (requirement specification, hazard analysis, use cases) to implementation units (functions, use case realizations, modules) and to test protocols and test suites]
22
Validation Traceability
  • Validation traceability gives confidence that two important goals have been addressed
  • 1. Do we have enough tests to cover everything that needs testing?
  • 2. Do we have any extra or gratuitous tests that serve no useful purpose?
  • Validation focuses on whether the product works as it is supposed to (a small sketch of checking both questions follows below)
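
Both questions can be answered mechanically from the traceability data. A minimal sketch in C++, with hypothetical requirement and test-case IDs, that flags untested requirements and tests traced to nothing:

    // Sketch only: the requirement/test IDs and map layout are illustrative.
    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    int main() {
        // Traceability map: requirement id -> test cases traced to it.
        std::map<std::string, std::vector<std::string>> traces = {
            {"REQ-1", {"TC-1", "TC-2"}},
            {"REQ-2", {}},                          // question 1: a requirement with no test
            {"REQ-3", {"TC-3"}},
        };
        std::set<std::string> allTests = {"TC-1", "TC-2", "TC-3", "TC-9"};  // TC-9 covers nothing

        std::set<std::string> usedTests;
        for (const auto& [req, tests] : traces) {
            if (tests.empty())
                std::cout << "Untested requirement: " << req << '\n';
            usedTests.insert(tests.begin(), tests.end());
        }
        for (const auto& tc : allTests)             // question 2: gratuitous tests
            if (!usedTests.count(tc))
                std::cout << "Test traced to no requirement: " << tc << '\n';
    }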

23
Requirements based testing
  • Quality can be achieved only by testing the
    system against its requirements
  • Many complex systems will pass all unit tests but
    fail as a system
  • tested units interact, producing more complex behaviors
  • the resulting system has not been adequately tested against the requirements

24
Use case and test cases
[Figure: traceability links between the test model (test cases 1-3) and the use cases (use cases 1-2)]
25
Testing design constraints
  • Consider design constraints as requirements
  • Include design constraints as part of the validation effort
  • Many design constraints will yield to simple inspection
  • use an abbreviated test procedure

26
Design Constraint validation approaches
27
Using ROI to determine effort
  • Must perform a cost/benefit analysis with respect to V&V activities
  • Plan V&V activities based on
  • 1. What are the social and economic consequences of a failure of our system?
  • 2. How much V&V do we need to do to ensure that we do not experience these consequences?

28
V&V depth
  • Depth defines the level of V&V effort to be applied to a system element
  • the greater the depth, the more resources
  • Match the depth of the review to the importance of the element
  • inspection
  • simple test
  • extensive white box testing

29
V&V depth activities
  • Examination: review the code or take some measurements
  • a prescribed and minimally invasive look is taken at the element under test
  • the minimal depth of review of an element
  • Walkthrough: a peer group walks the element through its paces
  • the process is a structured inspection performed by a wider audience
  • search for weaknesses, oversights, etc.

30
V&V depth activities
  • Independent reviews: an unrelated but knowledgeable group examines the element and searches for weaknesses
  • may provide additional insights that were not in the mindset of the project group

31
V&V depth activities
  • Black box test: treats the element as a module that cannot be internally inspected
  • supply inputs to the box and observe the box's outputs to ensure that the element is working to the required standards
  • performed via instrumented code or with system emulators and other tools to simulate and record operation

32
V&V depth activities
  • White box test: allows you to open the box and examine the internal workings of the element
  • most modules have too many combinatorial pathways to test in a reasonable amount of time
  • apply a reasonable approach that does not take too much time
  • coverage instead of combination

33
V&V coverage
  • Coverage defines the extent of coverage of system elements to be verified and validated
  • The amount of traceability and the corresponding level of specificity in the requirements determine coverage

34
What to verify and validate
  • 1. Verify and validate everything
  • smaller projects
  • simple and consistent application
  • ensures uniformity
  • selective V&V can be appropriate if you know what the risks are
  • elements are omitted only for a good reason
  • NOT because you ran out of time or money

35
What to verify and validate
  • Possible repercussions
  • embarrassment over an element not conforming to the customer specification
  • elements not working properly per the specification
  • worst case: an unsafe product that can cause harm to its users

36
What to verify and validate
  • 2. Use a hazard analysis to determine V&V necessities
  • Hazard analysis is the detailed examination of a device from the user and patient perspectives. Its purpose is to detect potential design flaws - possibilities of failure that could cause harm - and to enable manufacturers to correct them before a device is released for use

37
What to verify and validate
  • Hazard analysis guides the selection of project elements for V&V
  • Always perform a hazard analysis for a human-safety-critical system

38
Safety Critical Systems
39
Why worry about safety?
  • Safety is not discussed in the literature
  • Safety is not taught in colleges
  • Without training or guidance, embedded systems
    are assuming more safety roles every day!

40
Examples of safety-related computing systems
  • Medical equipment (monitoring and therapy)
  • Flight computers
  • Automobile braking and engine control
  • Chemical process control
  • Robotic assembly systems
  • a couple dozen deaths each year in Japan because of wayward robots

41
Examples of safety-related computing systems
  • Military weaponry
  • Nuclear power plants
  • Financial systems

42
Therac-25 Story
  • Radiation therapy treatment device
  • Released in 1982
  • Used S/W to enhance usability and lower cost of
    production
  • A compounding of process, design, and implementation failures led to massive overdoses that killed 3 patients
  • Fixing the identified problem did not make the device safer

43
Other stories
  • The first Shuttle launch was delayed 2 days because the backup computer would not start correctly when an error was discovered
  • A Patriot missile failed because of clock drift; effectiveness was downgraded from 95% to 13%
  • An 8080-based cement factory process control system mistakenly stacked huge boulders 80 ft above the ground, which fell, crushing cars and damaging the building

44
Other stories
  • Stray electromagnetic interference blamed for 19 robot-inflicted deaths in Japan
  • Low-energy radiation blamed for several deaths related to reprogramming cardiac pacemakers
  • Attempted suicides after incorrect diagnosis of diseases

45
Other Stories
  • Grady Booch: "The last announcement I want to hear from a pilot when I am flying is 'Is there a programmer on board?'"

46
Errors are systematic faults
  • They are designed in
  • The FORTRAN line
  •   DO I = 1.10
  • rather than
  •   DO I = 1,10
  • caused problems for the Project Mercury flight control software. This is a design error. Entire mainframes have been brought down by an inadvertent semicolon!

47
What is safety?
  • Safety is the freedom from accidents or losses
  • Safety is not reliability!
  • Reliability is the probability that a system will
    perform its intended function satisfactorily
  • A handgun is a very reliable piece of equipment
    but it is not very safe!
  • Windows 95 is safe, but not very reliable!

48
What is safety?
  • Safety is not security!
  • Security is protection or defense against attack,
    interference, or espionage

49
Safety related concepts
  • An accident is a loss of some kind, such as injury, death, or equipment damage
  • the telephone system going down on the East coast was a big loss -> a safety issue in one sense
  • Risk is a combination of the likelihood of an accident and its severity
  • risk = p(a) * s(a)

50
Safety related concepts
  • an airplane crashing has a high severity but a very low probability -> ultimately a low risk
  • Hazard is a set of conditions and/or events that
    leads to an accident

51
Other safety related concepts
  • A failure is the nonperformance of a system or
    component, a random fault
  • a random failure is one that can be estimated from a probability density function (pdf)
  • failures are events
  • e.g. a component failure

52
Other safety related concepts
  • An error is a systematic fault
  • a systematic fault is a design error
  • Errors are states or conditions
  • e.g. a software bug
  • A fault is either a failure or an error

53
Other safety related concepts
  • Safety must be considered in the context of the
    system, not the component
  • It is less expensive and far more cost effective to build in safety early than to try to tack it on later
  • The Hazard Analysis ties together hazards,
    faults, and safety measures

54
Eight steps to safety
  • Identify the hazards
  • Determine the risks
  • Define the safety measures
  • Create safe requirements
  • Create safe designs
  • Implement safety
  • Assure the safety process
  • Test, test, test

55
Eight steps to safety
  • Identify the hazards
  • Determine the risks
  • Define the safety measures
  • Create safe requirements
  • Create safe designs
  • Implement safety
  • Assure the safety process
  • Test, test, test

Safety analysis: handled at the architectural level and the mechanistic level
56
Safety Analysis
  • You must identify the hazards of the system
  • You must identify the faults that can lead to
    hazards
  • You must define safety control measures to handle
    hazards
  • These culminate in the Hazard Analysis
  • The Hazard Analysis feeds into the Requirements
    Specification

57
Eight steps to safety
  • Identify the hazards
  • Determine the risks
  • Define the safety measures
  • Create safe requirements
  • Create safe designs
  • Implement safety
  • Assure the safety process
  • Test, test, test

58
Hazard Causes
  • Release of energy
  • electromagnetism (microwave oven)
  • radiation (nuclear power plant)
  • electricity (electrocution hazard from ECG leads)
  • heat (infant warmer)
  • kinetic (runaway train)
  • Release of toxins

59
Hazard Causes
  • Interference with life support or other
    safety-related function
  • Misleading safety personnel
  • Failure to alarm
  • alarming too much: the Therac-25's alarms were ignored and people were killed

60
Types of Hazards
  • Actions
  • inappropriate system actions taken
  • F-18 pilot pulling up landing gear
  • appropriate system actions not taken
  • Timing
  • too soon
  • too late
  • fault latency time

61
Types of Hazards
  • Sequence
  • skipping actions
  • actions out of order
  • Amount
  • too much
  • too little

62
Example Hazards
  • Actions
  • incorrectly energizing a medical treatment laser
  • failure to engage landing gear
  • Timing
  • cardiac pacemaker paces too fast
  • flight control surface adjusted too slowly

63
Example Hazards
  • Sequence
  • empty the vat, THEN add the reagent
  • out of sequence network packets controlling
    industrial robot
  • Amount
  • electrocution from muscle stimulator
  • too little oxygen delivered to ventilator patient

64
Means of Hazard Control
  • Obviation: the possibility of the hazard can be removed by making it physically impossible
  • use incompatible fasteners to prevent cross connections
  • Education: the hazard can be handled by educating the users so that they won't create hazardous conditions through equipment misuse
  • don't look down the barrel when cleaning your rifle

65
Means of Hazard Control
  • Alarming: announcing the hazard to the user when it appears so that they can take appropriate action
  • alarming when the heart stops beating
  • Interlocks: the hazard can be removed by using secondary devices and/or logic to intercede when a hazard presents itself
  • car won't start unless it is in 'Park'

66
Means of Hazard Control
  • Internal checking: the hazard can be handled by ensuring that the system can detect that it is malfunctioning prior to an incident
  • a CRC checks data for corruption whenever it is accessed
  • Safety equipment
  • goggles, gloves

67
Means of Hazard Control
  • Restricting access to potential hazards so that
    only knowledgeable users have such access
  • using passwords to prevent inadvertently starting
    service mode
  • Labelling
  • High Voltage -- DO NOT TOUCH

68
Hazard Analysis
(Each row answers: How can this happen? - fault; How frequently? - likelihood; How bad is it if it occurs? - level of risk; How long can it be tolerated? - tolerance time; How long to discover? - detection time; What do you do about it? - control measure; How long is the exposure to the hazard? - exposure time.)

Hazard | Level of risk | Tolerance time (T1) | Fault | Likelihood | Detection time | Control measure | Exposure time
Hypoventilation | Severe | 5 min | Ventilator fails | rare | 30 sec | Independent pressure alarm, action by doctor | 1 min
Hypoventilation | Severe | 5 min | Esophageal intubation | often | 30 sec | CO2 sensor alarm | 1 min
Hypoventilation | Severe | 5 min | User misattaches breathing circuit | often | 0 | Non-compatible mechanical fasteners used | 0
Overpressure | Severe | 250 ms | Release valve failure | rare | 50 ms | Secondary valve opens | 55 ms
69
When is a system safe enough?
  • (Minimal) No hazards in the absence of faults
  • (Minimal) No hazards in the presence of any
    single point failure
  • a common mode failure is a single point failure
    that affects multiple channels
  • a latent fault is an undetected fault which
    allows another fault to cause a hazard
  • Your mileage may vary depending on the risk
    introduced by your system

70
TUV Single Fault Assessment
71
Single Fault Assessment
  • T0 is the fault tolerance time for the first fault
  • T1 is the time after the first fault at which the second fault becomes likely (via MTBF)
  • For tests used as safety measures, the test (of duration TT) must be run periodically
  • TT < T1 < T0
  • This is not always possible
  • e.g. RAM tests
72
Safety Measures
  • You cannot depend on a safety measure that you
    cannot test!
  • A CAN bus with 2 nodes provides a CRC on messages checked at the chip level, but the chips provide no way of testing whether it is working.
  • Therefore, it cannot be relied on as a safety
    measure

73
Fail-Safe States
  • Off
  • Emergency stop -- immediately cut power
  • Production stop -- stop after the current task
  • Protection stop -- shut down without removing
    power
  • Partial Shutdown
  • Degraded level of functionality

74
Fail-Safe States
  • Hold
  • No functionality, but with safety actions taken
  • Manual or external control
  • Restart (reboot)

75
Eight steps to safety
  • Identify the hazards
  • Determine the risks
  • Define the safety measures
  • Create safe requirements
  • Create safe designs
  • Implement safety
  • Assure the safety process
  • Test, test, test

76
Risk Assessment
  • For each hazard
  • determine the potential severity
  • determine the likelihood of the hazard
  • determine how long the user is exposed to the
    hazard
  • determine whether the risk can be removed

77
TUV Risk Level Determination Chart

Severity / Exposure / Prevention | W3 | W2 | W1
S1                               | 1  | -  | -
S2, E1, G1                       | 2  | 1  | -
S2, E1, G2                       | 3  | 2  | 1
S2, E2, G1                       | 4  | 3  | 2
S2, E2, G2                       | 5  | 4  | 3
S3, E1                           | 6  | 5  | 4
S3, E2                           | 7  | 6  | 5
S4                               | 8  | 7  | 6

Risk parameters:
S - Extent of damage: S1 slight injury; S2 severe irreversible injury to one or more persons, or the death of a single person; S3 death of several persons; S4 catastrophic consequences, several deaths
E - Exposure time: E1 seldom to relatively infrequent; E2 frequent to continuous
G - Hazard prevention: G1 possible under certain conditions; G2 hardly possible
W - Occurrence probability of the hazardous event: W1 very low; W2 low; W3 relatively high
78
Sample Risk Assessments

Device         | Hazard          | Extent of damage | Exposure time | Hazard prevention | Probability | TUV risk level
Microwave oven | Irradiation     | S2               | E2            | G2                | W3          | 5
Pacemaker      | Pace too slowly | S2               | E2            | G2                | W3          | 5
Pacemaker      | Pace too fast   | S2               | E2            | G2                | W3          | 5
Power station  | Explosion       | S3               | E1            | --                | W3          | 6
Airliner       | Crash           | S4               | E2            | G2                | W2          | 8
79
Eight steps to safety
  • Identify the hazards
  • Determine the risks
  • Define the safety measures
  • Create safe requirements
  • Create safe designs
  • Implement safety
  • Assure the safety process
  • Test, test, test

80
Safety Measures
  • Safety measures do one of the following
  • remove the hazard
  • reduce the risk
  • identify the hazard to supervisory control
  • The purpose of the safety measure is to ensure
    the system remains in a safe state

81
Safety Measures
  • Adequacy of measures
  • safety measures must be able to reliably detect the fault
  • safety measures must be able to take appropriate actions

Example (component: interrupt handling and execution; the original table also marks the required software class, 1 or 2):

Fault/Error | Examples of acceptable measures
no interrupt, or too frequent | functional test, or time-slot monitoring
no interrupt, or too frequent, and interrupts related to different sources | comparison of redundant functional channels by either reciprocal comparison, independent hardware comparator, or independent time-slot and logical monitoring
82
Risk Reduction
  • Identify the fault
  • Take corrective action, either
  • use redundancy to correct and move on
  • feedforward error correction (Hamming codes - see the sketch below)
  • redo the computational step
  • feedback error detection (take corrective action
    first)
  • go to a fail-safe state
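
As a concrete illustration of the feedforward (error-correcting) option, a minimal Hamming(7,4) encode-and-correct sketch in C++; the bit layout and names are illustrative, not taken from the course material:

    #include <array>
    #include <iostream>

    // Encode 4 data bits into a 7-bit Hamming(7,4) codeword. Positions 1,2,4 hold
    // parity bits; positions 3,5,6,7 hold data bits (index 0 of the array is unused).
    std::array<int, 8> encode(const std::array<int, 4>& d) {
        std::array<int, 8> c{};
        c[3] = d[0]; c[5] = d[1]; c[6] = d[2]; c[7] = d[3];
        c[1] = c[3] ^ c[5] ^ c[7];      // parity over positions 1,3,5,7
        c[2] = c[3] ^ c[6] ^ c[7];      // parity over positions 2,3,6,7
        c[4] = c[5] ^ c[6] ^ c[7];      // parity over positions 4,5,6,7
        return c;
    }

    // Correct any single-bit error in place; returns the corrected position (0 = none).
    int correct(std::array<int, 8>& c) {
        int s = (c[1] ^ c[3] ^ c[5] ^ c[7])
              | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1
              | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2;   // syndrome = position of the bad bit
        if (s != 0) c[s] ^= 1;
        return s;
    }

    int main() {
        auto cw = encode({1, 0, 1, 1});
        cw[6] ^= 1;                                  // inject a single-bit fault
        std::cout << "corrected bit position " << correct(cw) << '\n';   // prints 6
    }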

83
Fault Identification at Run-Time
  • Faults must be identified in < T0
  • Fault identification requires redundancy
  • Redundancy can be in terms of
  • channel
  • device
  • data
  • control
  • (channel and device redundancy are architectural concerns; data and control redundancy are detailed-design concerns)
84
Fault Identification at Run-Time
  • Redundancy may be either
  • homogeneous (random faults only)
  • does not detect errors
  • performs functions the same way on the same thing multiple times
  • heterogeneous (systematic and random faults)
  • includes errors -> present in all channels
  • perform processing differently, and hopefully you didn't make the same mistake!
85
Fault Tree Analysis Symbology
A condition that must be present to produce the
output of a gate
An event that results from a combination of
events through a logic gate
Transfer
A basic fault event that requires no further
development
A fault event because the event is
inconsequential or the necessary information is
not available
AND gate (also OR gate)
An event that is expected to occur normally
NOT gate
86
Subset of Pacemaker Fault Analysis
[Figure: fault tree for the condition or event to avoid, 'Pacing too slowly'. Secondary conditions/events, combined through OR and AND gates: shutdown fault, time-base fault, invalid pacing rate, bad command rate, rate command corrupted, data corrupted in vivo. Primary or fundamental faults: crystal failure, watchdog failure, software failure, CPU hardware failure, CRC hardware failure]
87
Eight steps to safety
  • Identify the hazards
  • Determine the risks
  • Define the safety measures
  • Create safe requirements
  • Create safe designs
  • Implement safety
  • Assure the safety process
  • Test, test, test

88
Safe Requirements
  • Requirements specification follows initial hazard
    analysis
  • Specific requirements should track back to hazard
    analysis
  • must be shown to FDA, etc
  • Architectural framework should be selected with
    safety needs in mind
  • has the hooks in place

89
Eight steps to safety
  • Identify the hazards
  • Determine the risks
  • Define the safety measures
  • Create safe requirements
  • Create safe designs
  • Implement safety
  • Assure the safety process
  • Test, test, test

90
Use Good Design Practices
  • Good design practices allow you to
  • manage complexity
  • view the system at various levels of abstraction
  • zoom in on a particular area of interest
  • identify hot spots of special concern
  • have consistent quality
  • easily test
  • build and use high quality components
  • Regulatory agencies look at this!!

91
Use Good Design Practices
  • Manage your requirements
  • trace requirements to design elements
  • trace design elements back to requirements

[Figure: use cases such as 'remote communications' and 'adjust trajectory' in the requirements specification traced to classes (class a through class e) in the design model]
92
Use Good Design Practices
  • Use iterative development
  • integrating many times finds more defects
  • iterative prototypes can result in more reliable
    and safe systems

93
Use Good Design Practices
  • Use component-based design architectures
  • third-party components may be very well tested if they are in wide use
  • require bug lists from component vendors
  • this bit Microsoft once

94
Use Good Design Practices
  • Use Visual Modeling
  • UML
  • Ward-Mellor
  • Use executable models
  • animate models
  • execute and debug at modeling level of abstraction

95
Use Good Design Practices
  • Use frameworks
  • a framework is a partially completed application
    which is specialized by the user
  • Microsoft foundation classes
  • Object Execution Framework
  • frameworks reduce the work of developing new
    applications
  • frameworks rely on well-tested patterns

96
Use Good Design Practices
[Figure: the User Model specializes the Framework to produce the System; 80-90% of application code is housekeeping code]
97
Use Good Design Practices
  • Use Configuration Management
  • only use unit-tested components in builds

[Figure: the SYSTEM is built from a CM database holding parameters, data acquisition components, drivers, and the OS]
98
Use Good Design Practices
  • Design for test
  • product testing
  • built-in-testing to ensure
  • invariants are truly invariant
  • functional invariants
  • quality of service invariants (e.g. performance)
  • faults are detected

99
Good Design Practices
  • Isolate Safety Functions
  • Safety-relevant systems take 200-300% more effort to produce
  • Isolation of safety systems allows more expedient development
  • Care must be taken that the safety system is truly isolated, so that a defect in the non-safety system cannot affect the safety system
  • different processor
  • different heavy-weight tasks (depends on the OS)

100
Safety Architecture Patterns
  • Protected Single-Channel Pattern
  • Dual-Channel Pattern
  • Homogeneous Dual-Channel Pattern
  • Heterogeneous Peer-Channel Pattern
  • Sanity Check Pattern
  • Actuator-Monitor Pattern
  • Voting Multichannel Pattern

101
Protected Single Channel Pattern
  • Within the single channel, mechanisms exist to
    identify and handle faults
  • All faults must be detected within the fault
    tolerance time
  • May be impossible
  • to test for all faults within the fault tolerance
    time
  • to remove common mode failures from the single
    channel
  • Generally, less recurring system cost
  • no additional hardware required

102
Protected Single Channel Pattern
[Figure: Single Channel Train Braking System - the protected channel's watchdog rule: 'If I'm not getting life ticks, I'll shut down!']
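
A minimal sketch of the lifetick idea in the figure, in C++ with assumed names and timing values (the fault tolerance time and the fail-safe action are placeholders):

    #include <atomic>
    #include <chrono>
    #include <thread>

    using Clock = std::chrono::steady_clock;

    std::atomic<Clock::time_point> lastLifetick{Clock::now()};

    void engageEmergencyBrakes();   // hypothetical fail-safe action, provided elsewhere

    // The braking control channel strokes the watchdog as part of its normal loop.
    void controlLoopIteration() {
        // ... compute and apply braking commands ...
        lastLifetick.store(Clock::now());            // the "life tick"
    }

    // Independent watchdog task: if lifeticks stop arriving within the fault
    // tolerance time, drive the system to its fail-safe state.
    void watchdogTask() {
        constexpr auto faultToleranceTime = std::chrono::milliseconds(100);  // assumed value
        for (;;) {
            if (Clock::now() - lastLifetick.load() > faultToleranceTime)
                engageEmergencyBrakes();
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }
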
103
Dual Channel Architecture Patterns
  • Separation of safety-relevant from non-safety-relevant elements where possible
  • Separation of monitoring from control
  • Generally easier to meet safety requirements
  • timing
  • common mode failures
  • Generally higher recurring system cost
  • additional hardware required

104
Homogeneous Dual-Channel Pattern
  • Identical channels used
  • Channels may operate simultaneously (Multichannel
    Vote Pattern)
  • Channels may operate in series (Backup Pattern)
  • Good at identifying random faults but not
    systematic faults
  • Low R&D cost, higher recurring cost

105
Homogeneous Dual-Channel Pattern
106
Heterogeneous Peer-Channel Pattern
  • Equal weight, differently implemented channels
  • may use algorithmic inversion to recreate initial
    data
  • may use different algorithm
  • may use different teams (not foolproof because of hot spots that can cause failures)
  • Good at identifying both random and systematic
    faults

107
Heterogeneous Peer-Channel Pattern
  • Generally safest, but higher R&D and recurring cost

108
Heterogeneous Peer-Channel Pattern
109
Sanity Check Pattern
  • A primary actuator channel does the real computations
  • A lightweight secondary channel checks the reasonableness of the primary channel
  • Good for detection of both random and systematic faults
  • May not detect faults which result in small variance
  • Relatively inexpensive to implement (see the sketch below)
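
A small sketch of the idea, assuming C++ and made-up names and limits: the primary channel computes the commanded dose, and the secondary channel only checks that the command is plausible before it reaches the actuator.

    struct DoseCommand { double milligray; };

    DoseCommand computePrimaryDose() { return {1.2}; }   // stand-in for the heavy primary computation
    void actuateBeam(const DoseCommand&) { /* drive the hardware (stub) */ }
    void enterFailSafeState() { /* drop beam power, raise an alarm (stub) */ }

    // Secondary channel: a cheap reasonableness check, not a recomputation.
    bool doseIsSane(const DoseCommand& cmd) {
        constexpr double kMaxPlausibleDose = 2.0;        // assumed clinical/physical limit
        return cmd.milligray > 0.0 && cmd.milligray <= kMaxPlausibleDose;
    }

    void commandBeam() {
        DoseCommand cmd = computePrimaryDose();
        if (!doseIsSane(cmd)) { enterFailSafeState(); return; }
        actuateBeam(cmd);
    }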

110
Monitor-Actuator Pattern
  • Separates actuation from the monitoring of that
    actuation
  • If the actuator channel fails, the monitor
    channel detects it
  • If the monitor channel fails, the actuator
    channel continues correctly
  • Requires fault isolation to be single-fault
    tolerant
  • actuator channel cannot use the monitor itself

111
Monitor-Actuator Pattern
112
Dual-Channel Design Architecture
113
Safety Executive Pattern
  • Large scale architectural pattern
  • Controller subsystem (safety executive)
  • One or more watchdog subsystems
  • check on system health
  • ensure proper actuation is occurring
  • One or more actuation channels
  • Recovery subsystem (Fail safe processing channel)

114
Safety Executive Pattern
  • Appropriate when
  • A set of fail-safe system states needs to be entered when failures are identified
  • Determination of failures is complex
  • Several safety-related system actions are
    controlled simultaneously
  • Safety-related actions are not independent
  • Determining proper safety action in the event of
    a failure can be complex

115
Safety Executive Pattern
116
Eight steps to safety
  • Identify the hazards
  • Determine the risks
  • Define the safety measures
  • Create safe requirements
  • Create safe designs
  • Implement safety
  • Assure the safety process
  • Test, test, test

117
Detailed Design for Safety
  • Make it right before you make it fast
  • simple, clear algorithms and code
  • optimize only the 10-20% of code which affects performance
  • use safe language subsets
  • ensure you haven't introduced any common mode failures

118
Detailed Design for Safety
  • Thoroughly test
  • unit test and peer review
  • integration test
  • validation test

119
Detailed Design for Safety
  • Verify that it remains right throughout program execution
  • exceptions
  • invariant assertions
  • range checking
  • index and boundary checking
  • When it's not right during execution, then make it right with corrective or protective measures (see the sketch below)
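
A minimal sketch of run-time range and invariant checking with a corrective measure, in C++; the function names, limits, and fail-safe action are illustrative only.

    #include <cstdio>

    void enterFailSafe(const char* why) {
        std::printf("fail-safe entered: %s\n", why);     // stand-in for the real protective action
    }

    // Unlike assert(), this check stays active in the released build and triggers
    // a protective measure instead of simply aborting.
    inline void checkInvariant(bool holds, const char* description) {
        if (!holds) enterFailSafe(description);
    }

    double computeInfusionRate(double volumeMl, double minutes) {
        checkInvariant(minutes > 0.0, "infusion duration must be positive");            // range check
        double rate = volumeMl / minutes;
        checkInvariant(rate >= 0.0 && rate <= 999.0, "rate outside deliverable range"); // boundary check
        return rate;
    }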

120
Detailed Design for Safety
  • Use safe language subsets
  • strong compile-time checking
  • if you use C, use lint
  • strong run-time checking
  • exception handling
  • avoid void
  • avoid error-prone statements and syntax
  • you can make C safe, but it's not safe out of the box

121
Detailed Design for Safety
  • Language choice
  • Compile time checking (C versus Ada)
  • Run-time checking (C versus Ada)
  • Exceptions versus error codes
  • Language selection
  • "C treats you like a consenting adult. Pascal treats you like a naughty child. Ada treats you like a criminal."

122
Pascal example
    Program WontCompile;
    type
      MySubRange = 0 .. 20;
      Day = (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday);
    var
      MyVar  : MySubRange;
      MyDate : Day;
    begin
      MyVar  := 99;    { 99 is outside 0 .. 20: will not compile -- range error! }
      MyDate := 0;     { will not compile -- wrong type! }
    end.

123
Ada example
    procedure MyProc is
       subtype byte is integer range 0 .. 255;   -- declared here so the example is complete
       MyArray : array (1 .. 10) of integer;
       j : integer;
       b : byte;
    begin
       for j in 0 .. 10 loop
          MyArray(j) := j * 60;   -- raises an exception the first time through (index 0)
       end loop;
       b := MyArray(10);          -- will fail the run-time range check
    end MyProc;
124
Exceptions
  • Some languages (Pascal, Modula-2) have a
    draconian error handling policy
  • exception raised and program terminated
  • not good for embedded systems
  • Ada and C++ allow run-time recovery through user-defined exceptions and exception handlers

125
Exceptions
  • A lot of extra code is needed to check the statement a[j] = b (see the sketch below)
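
A sketch of what that checking looks like without exceptions, in C++; the bounds and the value range are assumed for illustration.

    const int N = 16;
    int a[N];

    // Every caller must remember to test the return value -- the point of the slide.
    bool storeChecked(int j, int b) {
        if (j < 0 || j >= N)        // index must be checked by hand
            return false;
        if (b < 0 || b > 4095)      // assumed valid range for the value
            return false;
        a[j] = b;                   // the one line of real work
        return true;
    }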

126
Detailed Design for Safety
  • Do not allow ignoring of error indications
  • checking of return values is a manual process
  • the user of the function must remember each and every time
  • it is easy to circumvent this error handling system
  • Separate normal code from error handling code

127
Detailed Design for Safety
  • Handle errors at the lowest level with sufficient context to correct the problem

128
Error handling code
    a = getfone(b, c);
    if (a) {
        switch (a) {
            case 1: /* ... */ break;
            case 2: /* ... */ break;
        }
    }
    d = getftwo(b, c);
    if (d) {
        switch (d) {
            case 1: /* ... */ break;
            case 2: /* ... */ break;
        }
    }

In this code the normal execution path is just:
    a = getfone(b, c);
    d = getftwo(b, c);
129
Built-in exception types
    procedure enqueue (q : in out queue; v : in FLOAT) is
    begin
       if full (q) then
          raise overflow;
       end if;
       q.body((q.head + q.length) mod qSize) := v;
       q.length := q.length + 1;
    end enqueue;

130
Caller of the sequence handles exception
  • procedure testQ(q in out queue) is
  • begin
  • for j in 1 .. 10 loop
  • enqueue(q, random(1000))
  • end loop
  • exception
  • when overflow gt
  • puts(Test failed due to queue overflow)
  • end testQ

131
C++ exception handling
  • Extends capabilities beyond those of Ada
  • Exceptions are extended by type rather than by value
  • possible to create hierarchies of exception classes and catch by thrown subclass type (see the sketch below)
  • a class can contain different types of information about the kind of device that failed
  • this facilitates error recovery, debugging, and user error reporting
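
A short C++ sketch of catching by thrown subclass type; the device-fault class names and the extra data they carry are illustrative.

    #include <iostream>
    #include <stdexcept>
    #include <string>

    struct DeviceFault : std::runtime_error {
        using std::runtime_error::runtime_error;
    };
    struct SensorFault : DeviceFault {
        explicit SensorFault(const std::string& id)
            : DeviceFault("sensor fault: " + id), sensorId(id) {}
        std::string sensorId;                  // extra information that aids recovery and reporting
    };

    int main() {
        try {
            throw SensorFault("O2-probe-3");
        } catch (const SensorFault& f) {       // most specific handler is chosen
            std::cout << "recover sensor " << f.sensorId << '\n';
        } catch (const DeviceFault& f) {       // any other device fault
            std::cout << f.what() << '\n';
        }
    }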

132
Making C++ safe
  • Overloading the [] operator with index range checking improves the safety of arrays
  • Making classes for scalars and overloading the assignment operator allows additional range and value checking (see the sketch below)
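
A minimal sketch of both techniques; the class names and the 0..100 limit are made up for illustration.

    #include <stdexcept>

    template <typename T, int N>
    class SafeArray {                   // operator[] range-checks every index
    public:
        T& operator[](int i) {
            if (i < 0 || i >= N) throw std::out_of_range("SafeArray index");
            return data_[i];
        }
    private:
        T data_[N] = {};
    };

    class Percent {                     // scalar restricted to 0..100 on every assignment
    public:
        Percent& operator=(int v) {
            if (v < 0 || v > 100) throw std::out_of_range("Percent value");
            value_ = v;
            return *this;
        }
    private:
        int value_ = 0;
    };

    int main() {
        SafeArray<int, 10> a;
        a[3] = 7;      // fine; a[12] would throw instead of corrupting memory
        Percent p;
        p = 42;        // fine; p = 250 would throw
    }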

133
Detailed Design for Safety
  • Data Validity Checks
  • CRC (16-bit or 32-bit)
  • identifies all single- or dual-bit errors
  • detects a high percentage of multiple-bit errors
  • table-driven or computed (see the sketch below)
  • chips are available
  • checksum
  • redundant storage
  • one's complement
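
For concreteness, a table-free CRC-16-CCITT sketch in C++ (polynomial 0x1021); production code would more likely use a table-driven version or a CRC chip, as the slide notes.

    #include <cstdint>
    #include <cstddef>

    std::uint16_t crc16_ccitt(const std::uint8_t* data, std::size_t len) {
        std::uint16_t crc = 0xFFFF;                        // conventional initial value
        for (std::size_t i = 0; i < len; ++i) {
            crc ^= static_cast<std::uint16_t>(data[i]) << 8;
            for (int bit = 0; bit < 8; ++bit)              // process one byte, bit by bit
                crc = (crc & 0x8000)
                        ? static_cast<std::uint16_t>((crc << 1) ^ 0x1021)
                        : static_cast<std::uint16_t>(crc << 1);
        }
        return crc;
    }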

134
Detailed Design for Safety
  • Redundancy should be set on every write access
  • Data should be checked on every read access (see the sketch below)
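
A tiny sketch of that write/read discipline using the one's-complement redundancy from the previous slide (the class name is illustrative):

    #include <cstdint>
    #include <stdexcept>

    class ProtectedU32 {
    public:
        void write(std::uint32_t v) {           // redundancy is set on every write access
            value_  = v;
            shadow_ = ~v;
        }
        std::uint32_t read() const {            // data is checked on every read access
            if (value_ != static_cast<std::uint32_t>(~shadow_))
                throw std::runtime_error("stored value corrupted");
            return value_;
        }
    private:
        std::uint32_t value_  = 0;
        std::uint32_t shadow_ = ~0u;
    };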

135
ANSI C++ Exception Class Hierarchy
  exception
    logic_error
      domain_error
      invalid_argument
      length_error
      out_of_range
    runtime_error
      range_error
      overflow_error
136
Eight steps to safety
  • Identify the hazards
  • Determine the risks
  • Define the safety measures
  • Create safe requirements
  • Create safe designs
  • Implement safety
  • Assure the safety process
  • Test, test, test

137
Safety Process (Development)
  • Do Hazard Analysis early and often
  • Track safety measures from hazard analysis to
  • requirements specification
  • design
  • code
  • validation tests
  • Test safety measures with fault seeding

138
Safety Process (Deployment)
  • Install safely
  • ensure proper means are used to set up system
  • safety measures are installed and checked
  • Deploy safely
  • ensure safety measures are periodically checked
    and serviced
  • Do not turn off safety measures
  • Decommission safely
  • removal of hazardous materials

139
IEC Overall Safety Lifecycle
(SRS = safety-related system; E/E/PES = electrical/electronic/programmable electronic system)
[Figure: overall safety lifecycle phases] Concept; Overall scope definition; Hazard and risk analysis; Overall safety requirements; Safety requirements allocation; Overall planning (operations and maintenance planning, validation planning, installation planning); SRS realization - E/E/PES, other technology, and external risk reduction facilities; Overall installation and commissioning; Overall safety validation; Overall operation and maintenance; Overall modification and retrofit; Decommissioning
140
Eight steps to safety
  • Identify the hazards
  • Determine the risks
  • Define the safety measures
  • Create safe requirements
  • Create safe designs
  • Implement safety
  • Assure the safety process
  • Test, test, test

141
Safety in Testing in R&D
  • Use fault-seeding
  • Unit (class) testing
  • white box
  • procedural invariant violation assertions
  • peer reviews
  • Integration testing
  • grey box
  • Validation testing
  • black box
  • externally caused faults
  • (Grey box) internally seeded faults

142
Safety Testing During Operation
  • Power on Self-Test (POST)
  • Check for latent faults
  • All safety measures must be tested at power on
    and periodically
  • RAM (stuck-at faults, shorts, cell failures - see the sketch below)
  • ROM
  • Flash
  • Disks
  • CPU
  • Interfaces
  • Buses
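
A sketch of a destructive pattern test for stuck-at and shorted RAM bits, in C++; the base address, region size, and patterns are assumed, and real POST code must also avoid or preserve memory that is already in use.

    #include <cstdint>
    #include <cstddef>

    bool ramStuckAtTest(volatile std::uint32_t* base, std::size_t words) {
        const std::uint32_t patterns[] = {0x00000000u, 0xFFFFFFFFu,
                                          0xAAAAAAAAu, 0x55555555u};
        for (std::uint32_t p : patterns) {
            for (std::size_t i = 0; i < words; ++i) base[i] = p;   // write the pattern
            for (std::size_t i = 0; i < words; ++i)
                if (base[i] != p) return false;                    // stuck or shorted bit found
        }
        return true;
    }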

143
Safety Testing During Operation
  • Built-In Tests
  • Repeats some of POST
  • Data integrity checks
  • Index and pointer validity checking
  • Subrange value invariant assertions
  • Proper functioning
  • Watchdogs
  • Reasonableness checks
  • Lifeticks

144
A Simplified Example: A Linear Accelerator
145
Unsafe Linear Accelerator
[Figure: a single CPU sets Beam Intensity and Beam Duration; a Sensor reports the Radiation Dose back to the CPU; operation: 1. Set Dose, 2. Start Beam, 3. End Beam]
146
Fault Tree Analysis
[Figure: fault tree with top event 'Over radiation', combining through AND and OR gates: radiation command invalid, CPU halted, shutoff timer failure, beam engaged, CPU failure, software defects, and EMI]
147
Hazards of the Linear Accelerator
Hazard | Level of risk | Tolerance time (T1) | Fault | Likelihood | Detection time | Control measure | Exposure time
Over radiation | Severe | 100 ms | CPU locks up | rare | 50 ms | Safety CPU checks lifetick at 25 ms | 50 ms
Over radiation | Severe | 100 ms | Corrupt data settings | often | 10 ms | 32-bit CRCs on data, checked every access | 15 ms
Under radiation | Moderate | 2 weeks | Corrupt data settings | often | 10 ms | 32-bit CRCs on data, checked every access | 15 ms
Inadvertent radiation on power down | Severe | 100 ms | Beam left engaged during power down | often | n/a | Curtain mechanically shuts at power down | 0 ms
148
Safe Linear Accelerator
[Figure: the main CPU sets Beam Intensity and Beam Duration; a separate Safety CPU shares self-test results with it prior to operation and receives periodic watchdog service; the Safety CPU can de-energize the beam, and a mechanical shutoff engages when the curtain is low; a Sensor reports the Radiation Dose; operation: 1. Set Dose, 2. Start Beam, 3. End Beam]
149
Summary
  • Safety is a system issue
  • It is cheaper and more effective to include safety early on than to add it later
  • Safety architectures provide programming-in-the-large safety
  • Safe coding rules and detailed design provide programming-in-the-small safety

150
End of module