Requirements Analysis and Design Engineering

Slides: 151

Provided by: RobOs3

1
Requirements Analysis and Design Engineering

Southern Methodist University
CSE 7313

2
Module 17 Validating the system
3
Agenda

Traceability
Validating the system
Safety critical systems

4
Role of traceability

The ability to trace is a significant factor in quality
software implementation
Tracking relationships and relating them is key
in many high assurance processes
Impact of change is often missed
Small changes can create significant safety and
reliability problems

5
Traceability defined

IEEE: "The degree to which a relationship can be
established between two or more products of the
development process, especially products having a
predecessor-successor or master-subordinate
relationship to one another"

6
Traceability relationship

traced-to
traced-from

Vision Document (features)
Traceability link
Actor
SW requirement (use case)
7
Traceability relationship

Additional meanings can be placed on these
relationships
tested by
traced to
implemented by

8
Implicit vs Explicit

Explicit traceability: development of
relationships stemming from external
considerations supplied by the team
e.g., product feature and use case
Implicit traceability: driven by methodology and
structure of the model
e.g., hierarchical requirements have an implicit
relationship between parent and related child

9
Implicit vs Explicit

Other implicit examples
modeling tools in the development process may
provide other traceability relationships (use
cases and actors that interact with the use case)

10
Project Relationships
[Diagram: "Traces to" links connecting Need, Product Feature, Use case, Use case section, and Software Requirements]
Note: This traceability link is optional, as it
can be derived from the link between the product
feature and the use case section. This link
is often used to relate the product features to
the use cases before the use case sections are
written
11
Additional Traceability Options

Additional less traditional elements of a project
can be traced if they add value
issue for unresolved issues
assumptions and rationales
action items
requests for new/revised features
glossary and acronym terms
bibliographic references

12
Augmented Traceability relationships
[Diagram: "Traces to" links connecting Need, Product Features, SW Rqmts, Use case, Use case section, Actor, and Glossary term]
Note: This traceability link is optional, as it
can be derived from the link between the product
feature and the use case section. This link is
often used to relate the product features to the
use cases before the use case sections are written
The SW requirements make up the formal SRS,
of which the use case model is an interpretation
In this case, we are tracing items to the
glossary terms, as well as from them,
as described when defining glossary terms as one
of the supporting traceability types
13
Verification and traceability

Must consider whether or not you have correctly
and completely considered all of the links that
should be established
Deeper consideration often leads to some
revisions
Should hold formal and informal reviews
It's not all mechanical processing

14
Validating the system
15
Validation (IEEE)

the process of evaluating a system or component
during or at the end of the development process
to determine whether it satisfies requirements
Use validation to confirm that the implemented
system conforms to the requirements established.

16
Acceptance Tests

Bringing the customer into the final validation
process in order to gain assurance that the
product works the way the customer really needs
it to
May be part of the contract provisions
IT environments do this by a customer alpha or
beta evaluation

17
Acceptance Tests

Based on a specific number of scenarios that the
user specifies and executes in the usage
environment
Freedom to think outside the box
Construct interesting ways to test the system to
gain confidence that the system works as needed
Based on certain key use cases

18
Acceptance Tests

Apply these use cases in interesting combinations
under certain types of system load and other
environmental factors
interoperability with other applications
OS dependencies
others likely to be present in the user's
environment

19
Acceptance Tests

Iterative development environments will have
generations of acceptance tests run at various
milestones
Will most often find at least some undiscovered
ruins

20
Validation Testing

Primary activities in validation are testing
activities
IEEE 829-1983 IEEE Standard for SW test
documentation
The development process must
include planning for test activities
time and resources to design the tests
time and resources to execute the tests

21
Implementation documentation
User needs
Vision document
SRS package
Requirement specification
Hazard Analysis
Use cases
Implementation units (functions, use case
realizations, modules)
Test protocols
Test suites
22
Validation Traceability

Validation traceability gives confidence that two
important goals have been addressed
1. Do we have enough tests to cover everything
that needs testing?
2. Do we have any extra or gratuitous tests that
serve no useful purpose?
Validation focuses on whether the product works
as it is supposed to
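
The two coverage questions above can be checked mechanically once the traceability links are recorded. The following is a minimal C++ sketch, assuming a hypothetical `TraceLinks` map from test case IDs to the requirement IDs they trace to; the type and function names are illustrative and not taken from the course material.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical traceability record: test case ID -> requirement IDs it traces to.
using TraceLinks = std::map<std::string, std::vector<std::string>>;

// Question 1: which requirements have no test case tracing to them?
std::set<std::string> untestedRequirements(const std::set<std::string>& requirements,
                                           const TraceLinks& testToRequirements) {
    std::set<std::string> untested = requirements;
    for (const auto& entry : testToRequirements) {
        for (const auto& req : entry.second) {
            untested.erase(req);  // covered by at least one test
        }
    }
    return untested;
}

// Question 2: which test cases trace to no known requirement (gratuitous tests)?
std::set<std::string> gratuitousTests(const std::set<std::string>& requirements,
                                      const TraceLinks& testToRequirements) {
    std::set<std::string> orphans;
    for (const auto& entry : testToRequirements) {
        bool tracesToSomething = false;
        for (const auto& req : entry.second) {
            if (requirements.count(req) > 0) {
                tracesToSomething = true;
            }
        }
        if (!tracesToSomething) {
            orphans.insert(entry.first);
        }
    }
    return orphans;
}
```

An empty result from both functions suggests the test set is neither too small nor padded. It says nothing about whether each test is a good one, which still needs review.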

23
Requirements based testing

Quality can be achieved only by testing the
system against its requirements
Many complex systems will pass all unit tests but
fail as a system
units interact in more complex behaviors
resulting system has not been adequately tested
against the requirements

24
Use case and test cases
[Diagram: traceability links connect Use case 1 and Use case 2 (use cases) to Test case 1, Test case 2, and Test case 3 (test model)]
25
Testing design constraints

Consider design constraints requirements
Include design constraints as part of the
validation effort
Many design constraints will yield to simple
inspections
use abbreviated test procedure

26
Design Constraint validation approaches
27
Using ROI to determine effort

Must perform cost/benefit analysis with respect to V&V activities
Plan V&V activities based on
1. What are the social and economic consequences
of a failure of our system?
2. How much V&V do we need to do to ensure that
we do not experience these consequences?

28
V&V depth

Depth defines the level of V&V effort to be
applied to a system element
the greater the depth, the more resources
Match depth of the review to the importance of
the element
inspection
simple test
extensive white box testing

29
V&V depth activities

Examination: review the code or take some
measurements.
A prescribed and minimally invasive look is taken
at the element under test
minimal depth of review of an element
Walkthrough: a peer group walks the element through
its paces
the process is a structured inspection performed by a
wider audience
search for weaknesses, oversights, etc.

30
V&V depth activities

Independent reviews: an unrelated but knowledgeable
group examines the element and searches for
weaknesses
may provide additional insights that were not in
the mindset of the project group

31
V&V depth activities

Black box test: treats the element as a module
that cannot be internally inspected
supply inputs to the box and observe the box's
outputs to ensure that the element is working to
the required standards
performed via instrumented code or with system
emulators and other tools to simulate and record
operation

32
V&V depth activities

White box test: allows you to open the box and
examine the internal workings of the element
most modules have too many combinatorial pathways
to test in a reasonable amount of time
apply reasonable approach that does not take too
much time
coverage instead of combination

33
V&V coverage

Coverage defines the extent of coverage of system
elements to be verified and validated
The amount of traceability and the corresponding
level of specificity in the requirements
determines coverage

34
What to verify and validate

1. Verify and validate everything
smaller projects
simple and consistent application
ensures uniformity
selective V&V can be appropriate if you know
what the risks are
omitted elements are omitted for a good reason,
NOT because you ran out of time or money

35
What to verify and validate

Possible repercussions
embarrassment over an element not conforming to
customer specification
elements not working properly per the
specification
worst case an unsafe product that can cause harm
to its users

36
What to verify and validate

2. Use a hazard analysis to determine V&V
necessities
Hazard analysis is the detailed examination of a
device from the user and patient perspectives.
Its purpose is to detect potential design flaws -
possibilities of failure that could cause harm -
and to enable manufacturers to correct them
before a device is released for use

37
What to verify and validate

Hazard analysis guides the selection of project
elements for V&V
Always perform a hazard analysis for a human
safety critical system

38
Safety Critical Systems
39
Why worry about safety?

Safety is not discussed in the literature
Safety is not taught in colleges
Without training or guidance, embedded systems
are assuming more safety roles every day!

40
Examples of safety-related computing systems

Medical equipment (monitoring and therapy)
Flight computers
Automobile braking and engine control
Chemical process control
Robotic assembly systems
couple dozen deaths each year in Japan because of
wayward robots

41
Examples of safety-related computing systems

Military weaponry
Nuclear power plants
Financial systems

42
Therac-25 Story

Radiation therapy treatment device
Released in 1982
Used S/W to enhance usability and lower cost of
production
Compounding of process, design, and
implementation failures led to massive overdoses
that killed 3 patients
Fixing identified problem did not make the device
safer

43
Other stories

First Shuttle launch delayed 2 days because
backup computer would not start correctly when an
error was discovered
Patriot missile failed because of clock drift and
effectiveness downgraded from 95% to 13%
8080-based cement factory process control system
mistakenly stacked huge boulders 80 ft above the
ground which fell and crushed cars and damaged
the building

44
Other stories

Stray electromagnetic interference blamed for 19
robot-inflicted deaths in Japan
Low energy radiation blamed for several deaths
related to reprogramming cardiac pacemakers
Attempted suicides after incorrect diagnosis of
diseases

45
Other Stories

Grady Booch: "The last announcement I want to hear
from a pilot when I am flying is 'Is there a
programmer on board?'"

46
Errors are systematic faults

They are designed in
The FORTRAN line
DO I=1.10
rather than
DO I=1,10
caused problems for the Project Mercury flight
control software. This is a design error. Entire
mainframes have been brought down with an
inadvertent semicolon!

47
What is safety?

Safety is the freedom from accidents or losses
Safety is not reliability!
Reliability is the probability that a system will
perform its intended function satisfactorily
A handgun is a very reliable piece of equipment
but it is not very safe!
Windows 95 is safe, but not very reliable!

48
What is safety?

Safety is not security!
Security is protection or defense against attack,
interference, or espionage

49
Safety related concepts

Accident is a loss of some kind, such as injury,
death, or equipment damage
the telephone system going down on the East Coast was
a big loss -> safety in one sense
Risk is a combination of the likelihood of an
accident and its severity
risk = p(a) * s(a)

50
Safety related concepts

an airplane crashing has a high severity but a
very low probability -> ultimately low risk
Hazard is a set of conditions and/or events that
leads to an accident
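
As a small illustration of the risk definition on the previous slide, the sketch below multiplies a likelihood by a severity. The `Accident` struct and the choice of a unitless severity scale are assumptions made for the example; the slides do not prescribe units.

```cpp
// Illustrative only: the severity scale is an assumption, not from the slides.
struct Accident {
    double probability;  // p(a): likelihood of the accident, 0.0 to 1.0
    double severity;     // s(a): cost of the accident on some agreed scale
};

// risk = p(a) * s(a)
double risk(const Accident& a) {
    return a.probability * a.severity;
}
```

With this definition a severe but very unlikely accident can carry less risk than a mild but frequent one, which is the point of the airplane example.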

51
Other safety related concepts

A failure is the nonperformance of a system or
component, a random fault
a random failure is one that can be estimated
from a pdf,
failures are events
e.g. a component failure

52
Other safety related concepts

An error is a systematic fault
a systematic fault is a design error
Errors are states or conditions
e.g. a software bug
A fault is either a failure or an error

53
Other safety related concepts

Safety must be considered in the context of the
system, not the component
It is far more cost effective to build in safety
early than to try to tack it on
later
The Hazard Analysis ties together hazards,
faults, and safety measures

54
Eight steps to safety

Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe designs
Implement safety
Assure the safety process
Test, test, test

55
Eight steps to safety

Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe designs
Implement safety
Assure the safety process
Test, test, test

Safety analysis
Handled at the architectural level and
mechanistic level
56
Safety Analysis

You must identify the hazards of the system
You must identify the faults that can lead to
hazards
You must define safety control measures to handle
hazards
These culminate in the Hazard Analysis
The Hazard Analysis feeds into the Requirements
Specification

57
Eight steps to safety

Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe designs
Implement safety
Assure the safety process
Test, test, test

58
Hazard Causes

Release of energy
electromagnetism (microwave oven)
radiation (nuclear power plant)
electricity (electrocution hazard from ECG leads)
heat (infant warmer)
kinetic (runaway train)
Release of toxins

59
Hazard Causes

Interference with life support or other
safety-related function
Misleading safety personnel
Failure to alarm
alarming too much - Therac 25. These were
ignored and people were killed

60
Types of Hazards

Actions
inappropriate system actions taken
F-18 pilot pulling up landing gear
appropriate system actions not taken
Timing
too soon
too late
fault latency time

61
Types of Hazards

Sequence
skipping actions
actions out of order
Amount
too much
too little

62
Example Hazards

Actions
incorrectly energizing a medical treatment laser
failure to engage landing gear
Timing
cardiac pacemaker paces too fast
flight control surface adjusted too slowly

63
Example Hazards

Sequence
empty the vat, THEN add the reagent
out of sequence network packets controlling
industrial robot
Amount
electrocution from muscle stimulator
too little oxygen delivered to ventilator patient

64
Means of Hazard Control

Obviation: the possibility of the hazard can be
removed by being made physically impossible
use incompatible fasteners to prevent cross
connections
Education: the hazard can be handled by
educating the users so that they won't create
hazardous conditions through equipment misuse
don't look down the barrel when cleaning your
rifle

65
Means of Hazard Control

Alarming: announcing the hazard to the user when
it appears so that they can take appropriate
action
alarming when the heart stops beating
Interlocks: the hazard can be removed by using
secondary devices and/or logic to intercede when
a hazard presents itself
car won't start unless it is in 'Park'
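
The car example of an interlock can be sketched in a few lines of C++. The `Starter` class and `Gear` enumeration are hypothetical names chosen for illustration; a real vehicle would implement this in hardware as well as software.

```cpp
// Hypothetical interlock sketch: names are illustrative, not from the slides.
enum class Gear { Park, Reverse, Neutral, Drive };

class Starter {
public:
    explicit Starter(Gear gear) : gear_(gear) {}

    void shiftTo(Gear gear) { gear_ = gear; }

    // Interlock logic: the start request is refused unless the car is in Park.
    bool requestStart() {
        if (gear_ != Gear::Park) {
            return false;  // hazard intercepted, engine stays off
        }
        running_ = true;
        return true;
    }

    bool running() const { return running_; }

private:
    Gear gear_;
    bool running_ = false;
};
```

The interlock sits between the request and the action, so the hazardous state (engine started while in gear) is never entered rather than being detected afterwards.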

66
Means of Hazard Control

Internal checking: the hazard can be handled by
ensuring that a system can detect that it is
malfunctioning prior to an incident
CRC checks data for corruption whenever it is
accessed
Safety equipment
goggles, gloves

67
Means of Hazard Control

Restricting access to potential hazards so that
only knowledgeable users have such access
using passwords to prevent inadvertently starting
service mode
Labelling
High Voltage -- DO NOT TOUCH

68
Hazard Analysis
Column guide: Hazard (hazardous condition); Level of risk (how bad if it occurs?); Tolerance time T1 (how long can it be tolerated?); Fault (how can this happen?); Likelihood (how frequently?); Detection time (how long to discover?); Control measure (what do you do about it?); Exposure time (how long is the exposure to the hazard?)

Hazard | Level of risk | Tolerance time T1 | Fault | Likelihood | Detection time | Control measure | Exposure time
Hypoventilation | Severe | 5 min | Ventilator fails | rare | 30 sec | Independent pressure alarm, action by doctor | 1 min
(as above) | (as above) | (as above) | Esophageal intubation | often | 30 sec | CO2 sensor alarm | 1 min
(as above) | (as above) | (as above) | User misattaches breathing circuit | often | 0 | Noncompatible mechanical fasteners used | 0
Overpressure | Severe | 250 ms | Release valve failure | rare | 50 ms | Secondary valve opens | 55 ms
69
When is a system safe enough?

(Minimal) No hazards in the absence of faults
(Minimal) No hazards in the presence of any
single point failure
a common mode failure is a single point failure
that affects multiple channels
a latent fault is an undetected fault which
allows another fault to cause a hazard
Your mileage may vary depending on the risk
introduced by your system

70
TUV Single Fault Assessment
71
Single Fault Assessment

T0 is fault tolerance time for the first fault
T1 is the time after the first fault that the
second fault is likely (via MTBF)
For testing used for safety, test time TT must be
done periodically
TT < T1 < T0
This is not always possible
e.g. RAM tests

72
Safety Measures

You cannot depend on a safety measure that you
cannot test!
CAN bus with 2 nodes provides a CRC on messages
checked at the chip level, but the chips provide
no way of testing to see if it is working.
Therefore, it cannot be relied on as a safety
measure

73
Fail-Safe States

Off
Emergency stop -- immediately cut power
Production stop -- stop after the current task
Protection stop -- shut down without removing
power
Partial Shutdown
Degraded level of functionality

74
Fail-Safe States

Hold
No functionality, but with safety actions taken
Manual or External control
Restart (reboot)

75
Eight steps to safety

Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe designs
Implement safety
Assure the safety process
Test, test, test

76
Risk Assessment

For each hazard
determine the potential severity
determine the likelihood of the hazard
determine how long the user is exposed to the
hazard
determine whether the risk can be removed

77
TUV Risk Level Determination Chart
S  | E  | G  | W3 | W2 | W1
S1 |    |    | 1  | -  | -
S2 | E1 | G1 | 2  | 1  | -
S2 | E1 | G2 | 3  | 2  | 1
S2 | E2 | G1 | 4  | 3  | 2
S2 | E2 | G2 | 5  | 4  | 3
S3 | E1 |    | 6  | 5  | 4
S3 | E2 |    | 7  | 6  | 5
S4 |    |    | 8  | 7  | 6
Risk parameters
S: Extent of damage
S1 slight injury
S2 severe irreversible injury to one or more persons or the death of a single person
S3 death of several persons
S4 catastrophic consequences, several deaths
E: Exposure time
E1 seldom to relatively infrequent
E2 frequent to continuous
G: Hazard prevention
G1 possible under certain conditions
G2 hardly possible
W: Occurrence probability of hazardous event
W1 very low
W2 low
W3 relatively high
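
The chart above is a lookup table, so it translates directly into code. The sketch below is one possible C++ encoding; the enumeration names (`Damage`, `Exposure`, `Prevention`, `Probability`) and the use of 0 for the "-" cells are assumptions made for the example.

```cpp
// Illustrative encoding of the chart; type and function names are assumptions.
enum class Damage { S1, S2, S3, S4 };
enum class Exposure { E1, E2 };
enum class Prevention { G1, G2 };
enum class Probability { W1, W2, W3 };

const int kNoRiskLevel = 0;  // stands for the "-" cells in the chart

int tuvRiskLevel(Damage s, Exposure e, Prevention g, Probability w) {
    // Rows in chart order: S1; S2 with E1/G1, E1/G2, E2/G1, E2/G2; S3 with E1, E2; S4.
    int row = 0;
    switch (s) {
        case Damage::S1:
            row = 0;  // E and G are not considered for S1
            break;
        case Damage::S2:
            row = 1 + (e == Exposure::E2 ? 2 : 0) + (g == Prevention::G2 ? 1 : 0);
            break;
        case Damage::S3:
            row = 5 + (e == Exposure::E2 ? 1 : 0);  // G is not considered for S3
            break;
        case Damage::S4:
            row = 7;  // E and G are not considered for S4
            break;
    }
    // Columns in chart order: W3, W2, W1.
    static const int chart[8][3] = {
        {1, 0, 0}, {2, 1, 0}, {3, 2, 1}, {4, 3, 2},
        {5, 4, 3}, {6, 5, 4}, {7, 6, 5}, {8, 7, 6}};
    const int column = (w == Probability::W3) ? 0 : (w == Probability::W2) ? 1 : 2;
    return chart[row][column];
}
```

Keeping the table as literal data, rather than a formula, makes it easy to review the code against the chart cell by cell.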
78
Sample Risk Assessments
Device | Hazard | Extent of damage | Exposure time | Hazard prevention | Probability | TUV risk level
Microwave oven | Irradiation | S2 | E2 | G2 | W3 | 5
Pacemaker | Pace too slowly | S2 | E2 | G2 | W3 | 5
Pacemaker | Pace too fast | S2 | E2 | G2 | W3 | 5
Power station | Explosion | S3 | E1 | -- | W3 | 6
Airliner | Crash | S4 | E2 | G2 | W2 | 8
79
Eight steps to safety

Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe designs
Implement safety
Assure the safety process
Test, test, test

80
Safety Measures

Safety measures do one of the following
remove the hazard
reduce the risk
identify the hazard to supervisory control
The purpose of the safety measure is to ensure
the system remains in a safe state

81
Safety Measures

Adequacy of measures
safety measures must be able to reliably detect
the fault
safety measures must be able to take appropriate
actions

Component | Fault/Error | Software class (1, 2) | Examples of acceptable measures
Interrupt handling and execution | no interrupt or too frequent | rq | functional test or time-slot monitoring
Interrupt handling and execution | no interrupt or too frequent and interrupt related to different sources | rq | comparison of redundant functional channels by either reciprocal comparison or independent hardware comparator; independent time-slot and logical monitoring
82
Risk Reduction

Identify the fault
Take corrective action, either
use redundancy to correct and move on
feedforward error correction (Hamming codes)
redo the computational step
feedback error detection (take corrective action
first)
go to a fail-safe state

83
Fault Identification at Run-Time

Faults must be identified in < T0
Fault identification requires redundancy
Redundancy can be in terms of
channel
device
data
control

Architectural

Detailed design
84
Fault Identification at Run-Time

Redundancy may be either
homogenous (random faults only)
does not detect errors
perform functions the same way on the same thing
multiple times
heterogenous (systematic and random faults)
includes errors -> present in all channels
perform processing differently and hopefully you
didn't make the same mistake!

85
Fault Tree Analysis Symbology
A condition that must be present to produce the
output of a gate
An event that results from a combination of
events through a logic gate
Transfer
A basic fault event that requires no further
development
A fault event because the event is
inconsequential or the necessary information is
not available
AND gate (also OR gate)
An event that is expected to occur normally
NOT gate
86
Subset of Pacemaker Fault Analysis
[Fault tree diagram, nodes combined through OR and AND gates]
Condition or event to avoid: Pacing too slowly
Secondary conditions or events: Shutdown fault, Time-base fault, Invalid pacing rate
Primary or fundamental faults: Crystal failure, Watchdog failure, Bad command rate, Data corrupted in vivo, Software failure, CPU H/W failure, Rate command corrupted, CRC hardware failure
87
Eight steps to safety

Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe designs
Implement safety
Assure the safety process
Test, test, test

88
Safe Requirements

Requirements specification follows initial hazard
analysis
Specific requirements should track back to hazard
analysis
must be shown to FDA, etc
Architectural framework should be selected with
safety needs in mind
has the hooks in place

89
Eight steps to safety

Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe designs
Implement safety
Assure the safety process
Test, test, test

90
Use Good Design Practices

Good design practices allow you to
manage complexity
view the system at various levels of abstraction
zoom in on a particular area of interest
identify hot spots of special concern
have consistent quality
easily test
build and use high quality components
Regulatory agencies look at this!!

91
Use Good Design Practices

Manage your requirements
trace requirements to design elements
trace design elements back to requirements

remote communications
adjust trajectory
class a
class b
remote communication
requirements specification
class c
class d
class e
use cases
design model
92
Use Good Design Practices

Use iterative development
integrating many times finds more defects
iterative prototypes can result in more reliable
and safe systems

93
Use Good Design Practices

Use component-based design architectures
third party components may be very well tested if
they are in wide use
require bug lists from component vendors
this bit Microsoft once

94
Use Good Design Practices

Use Visual Modeling
UML
Ward-Mellor
Use executable models
animate models
execute and debug at modeling level of abstraction

95
Use Good Design Practices

Use frameworks
a framework is a partially completed application
which is specialized by the user
Microsoft foundation classes
Object Execution Framework
frameworks reduce the work of developing new
applications
frameworks rely on well-tested patterns

96
Use Good Design Practices
User Model
Framework

80-90% of application code is housekeeping code

System
97
Use Good Design Practices

Use Configuration Management
only use unit-tested components in builds

parameters
data acquisition
SYSTEM
CM Database
drivers
OS
98
Use Good Design Practices

Design for test
product testing
built-in-testing to ensure
invariants are truly invariant
functional invariants
quality of service invariants (e.g. performance)
faults are detected

99
Good Design Practices

Isolate Safety Functions
Safety-relevant systems are 200-300% more effort
to produce
Isolation of safety systems allows more expedient
development
Care must be taken that the safety system is
truly isolated so that a defect in the non-safety
system cannot affect the safety system
different processor
different heavy-weight tasks (depends on the OS)

100
Safety Architecture Patterns

Protected Single-Channel Pattern
Dual-Channel Pattern
Homogenous Dual Channel Pattern
Heterogenous Peer-Channel Pattern
Sanity Check Pattern
Monitor-Actuator Pattern
Voting Multichannel Pattern

101
Protected Single Channel Pattern

Within the single channel, mechanisms exist to
identify and handle faults
All faults must be detected within the fault
tolerance time
May be impossible
to test for all faults within the fault tolerance
time
to remove common mode failures from the single
channel
Generally, less recurring system cost
no additional hardware required

102
Protected Single Channel Pattern
"If I'm not getting life ticks, I'll shut down!"
Single Channel Train Braking System
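
The "life tick" rule in the braking example can be sketched as a small monitor class. This is a minimal C++ illustration, assuming a hypothetical `LifetickMonitor` that is told the current time by its caller; the timeout value and class name are not from the slides.

```cpp
#include <chrono>

// Hypothetical lifetick monitor for a protected single channel.
// Time is passed in by the caller so the logic can be tested without waiting.
class LifetickMonitor {
public:
    using Clock = std::chrono::steady_clock;

    LifetickMonitor(std::chrono::milliseconds timeout, Clock::time_point now)
        : timeout_(timeout), lastTick_(now) {}

    // Called by the monitored task each time it proves it is still alive.
    void lifetick(Clock::time_point now) { lastTick_ = now; }

    // True when no lifetick arrived within the timeout: go to the fail-safe state.
    bool mustShutDown(Clock::time_point now) const {
        return (now - lastTick_) > timeout_;
    }

private:
    std::chrono::milliseconds timeout_;
    Clock::time_point lastTick_;
};
```

In a real system the timeout would be chosen so that detection plus the shutdown action fits inside the fault tolerance time, and the monitor itself would need to be testable.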
103
Dual Channel Architecture Patterns

Separation of safety-relevant from non-safety-relevant
functions where possible
Separation of monitoring from control
Generally easier to meet safety requirements
timing
common mode failures
Generally higher recurring system cost
additional hardware required

104
Homogenous Dual-Channel Pattern

Identical channels used
Channels may operate simultaneously (Multichannel
Vote Pattern)
Channels may operate in series (Backup Pattern)
Good at identifying random faults but not
systematic faults
Low R&D cost, higher recurring cost

105
Homogenous Dual-Channel Pattern
106
Heterogenous Peer-Channel Pattern

Equal weight, differently implemented channels
may use algorithmic inversion to recreate initial
data
may use different algorithm
may use different teams (not fool proof because
of hot spots that can cause failures)
Good at identifying both random and systematic
faults
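
As an illustration of algorithmic inversion between differently implemented channels, the sketch below computes an integer square root in one channel and checks it in a second channel by squaring the result. The functions `isqrtNewton`, `squareBrackets`, and `peerChannelSqrt` are hypothetical names invented for this example.

```cpp
#include <cstdint>

// Channel A: integer square root by Newton iteration.
std::uint32_t isqrtNewton(std::uint32_t n) {
    if (n < 2) {
        return n;
    }
    std::uint64_t x = n;
    std::uint64_t y = (x + 1) / 2;
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;
    }
    return static_cast<std::uint32_t>(x);
}

// Channel B: inverts the computation. A correct root r satisfies r*r <= n < (r+1)*(r+1).
bool squareBrackets(std::uint32_t n, std::uint32_t root) {
    const std::uint64_t r = root;
    return r * r <= n && n < (r + 1) * (r + 1);
}

// Returns false when the channels disagree; the caller should then enter a fail-safe state.
bool peerChannelSqrt(std::uint32_t n,
                     std::uint32_t (*primary)(std::uint32_t),
                     std::uint32_t& result) {
    const std::uint32_t candidate = primary(n);
    if (!squareBrackets(n, candidate)) {
        return false;
    }
    result = candidate;
    return true;
}
```

Because the check uses a different algorithm from the computation, a design error in the Newton loop is unlikely to be mirrored in the squaring check. The primary channel is passed as a function pointer only so that a seeded faulty version can be substituted during testing.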

107
Heterogenous Peer-Channel Pattern

Generally safest, but higher R&D and recurring
cost

108
Heterogenous Peer-Channel Pattern
109
Sanity Check Pattern

A primary actuator channel does real computations
A light-weight secondary channel checks the
reasonableness of the primary channel
Good for detection of both random and systematic
faults
May not detect faults which result in small
variance
Relatively inexpensive to implement

110
Monitor-Actuator Pattern

Separates actuation from the monitoring of that
actuation
If the actuator channel fails, the monitor
channel detects it
If the monitor channel fails, the actuator
channel continues correctly
Requires fault isolation to be single-fault
tolerant
actuator channel cannot use the monitor itself

111
Monitor-Actuator Pattern
112
Dual-Channel Design Architecture
113
Safety Executive Pattern

Large scale architectural pattern
Controller subsystem (safety executive)
One or more watchdog subsystems
check on system health
ensure proper actuation is occurring
One or more actuation channels
Recovery subsystem (Fail safe processing channel)

114
Safety Executive Pattern

Appropriate when
A set of fail-safe system states needs to be
entered when failures identified
Determination of failures is complex
Several safety-related system actions are
controlled simultaneously
Safety-related actions are not independent
Determining proper safety action in the event of
a failure can be complex

115
Safety Executive Pattern
116
Eight steps to safety

Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe designs
Implement safety
Assure the safety process
Test, test, test

117
Detailed Design for Safety

Make it right before you make it fast
simple, clear algorithms and code
optimize only the 10-20% of code which affects
performance
use safe language subsets
ensure you haven't introduced any common mode
failures

118
Detailed Design for Safety

Thoroughly test
unit test and peer review
integration test
validation test

119
Detailed Design for Safety

Verify that it remains right throughout program
execution
exceptions
invariant assertions
range checking
index and boundary checking
When it's not right during execution, then make it
right with corrective or protective measures

120
Detailed Design for Safety

Use safe language subsets
strong compile-time checking
if you use C, use lint
strong run-time checking
exception handling
avoid void*
avoid error prone statements and syntax
you can make C++ safe but it's not safe out of the
box

121
Detailed Design for Safety

Language choice
Compile time checking (C versus Ada)
Run-time checking (C versus Ada)
Exceptions versus error codes
Language selection
"C treats you like a consenting adult. Pascal
treats you like a naughty child. Ada treats you
like a criminal."

122
Pascal example

program WontCompile;
type
  MySubRange = 0 .. 20;
  Day = (Monday, Tuesday, Wednesday, Thursday,
         Friday, Saturday, Sunday);
var
  MyVar: MySubRange;
  MyDate: Day;
begin
  MyVar := 21;  { will not compile -- range error! }
  MyDate := 0;  { will not compile -- wrong type! }
end.

123
Ada example
procedure MyProc is
   MyArray : array (1 .. 10) of Integer;
   j : Integer;
   b : Byte;  -- Byte is assumed to be a small integer subtype declared elsewhere
begin
   for j in 0 .. 10 loop
      MyArray(j) := j * 6;  -- raises exception on first time through
   end loop;
   b := MyArray(10);  -- will fail run-time range check
end MyProc;
124
Exceptions

Some languages (Pascal, Modula-2) have a
draconian error handling policy
exception raised and program terminated
not good for embedded systems
Ada and C++ allow run time recovery through
user-defined exceptions and exception handlers

125
Exceptions

A lot of extra code to check the statement
a[j] = b

126
Detailed Design for Safety

Do not allow ignoring of error indications
checking of return values is a manual process
user of the function must remember each and every
time
easy to circumvent this error handling system
Separate normal code from error handling code

127
Detailed Design for Safety

Handle errors at the lowest level with sufficient
context to correct the problem

128
Error handling code

a = getfone(b, c);
if (a)
    switch (a) {
        case 1: ...
        case 2: ...
    }
d = getftwo(b, c);
if (d)
    switch (d) {
        case 1: ...
        case 2: ...
    }

In this code the normal execution path is
a = getfone(b, c); d = getftwo(b, c);
129
Built-in exception types

procedure enqueue (q : in out queue; v : in FLOAT) is
begin
   if full(q) then
      raise overflow;
   end if;
   q.body((q.head + q.length) mod qSize) := v;
   q.length := q.length + 1;
end enqueue;

130
Caller of the sequence handles exception

procedure testQ (q : in out queue) is
begin
   for j in 1 .. 10 loop
      enqueue(q, random(1000));
   end loop;
exception
   when overflow =>
      puts("Test failed due to queue overflow");
end testQ;
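
For comparison, here is a rough C++ analogue of the Ada `enqueue` and `testQ` procedures. It is a sketch rather than a translation: the queue capacity `kQueueSize`, the class name `FloatQueue`, and the use of `std::overflow_error` in place of a user-defined exception are all assumptions, and the random values are replaced by the loop counter.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

const std::size_t kQueueSize = 8;  // assumed capacity, smaller than the 10 inserts below

class FloatQueue {
public:
    bool full() const { return length_ == kQueueSize; }
    std::size_t length() const { return length_; }

    // Normal code path only; the error case is raised, not returned.
    void enqueue(float v) {
        if (full()) {
            throw std::overflow_error("queue overflow");
        }
        body_[(head_ + length_) % kQueueSize] = v;
        length_ = length_ + 1;
    }

private:
    float body_[kQueueSize] = {};
    std::size_t head_ = 0;
    std::size_t length_ = 0;
};

// The caller handles the exception, keeping error handling out of the loop body.
std::string testQ(FloatQueue& q) {
    try {
        for (int j = 1; j <= 10; ++j) {
            q.enqueue(static_cast<float>(j));
        }
    } catch (const std::overflow_error&) {
        return "Test failed due to queue overflow";
    }
    return "Test passed";
}
```

As in the Ada version, the loop contains no error checks at all; the overflow case is handled once, in the caller.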

131
C++ exception handling

Extends capabilities beyond that of Ada
Exceptions extended by type rather than value
possible to create hierarchies of exception
classes and catch by thrown subclass type
class can contain different types of information
about the kind of device that failed
this facilitates error recovery, debugging, and
user error reporting
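
The idea of catching by exception type, with each class carrying information about the failed device, might look like the following C++ sketch. The classes `DeviceFault` and `SensorFault` and the recovery messages are hypothetical; only `std::runtime_error` comes from the standard library.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical base class: every device fault records which device failed.
class DeviceFault : public std::runtime_error {
public:
    DeviceFault(const std::string& device, const std::string& message)
        : std::runtime_error(message), device_(device) {}
    const std::string& device() const { return device_; }

private:
    std::string device_;
};

// Subclass adds information specific to this kind of device.
class SensorFault : public DeviceFault {
public:
    SensorFault(const std::string& device, double lastReading)
        : DeviceFault(device, "sensor reading out of range"),
          lastReading_(lastReading) {}
    double lastReading() const { return lastReading_; }

private:
    double lastReading_;
};

// Handlers are ordered from most specific to most general type.
std::string recover(void (*operation)()) {
    try {
        operation();
        return "ok";
    } catch (const SensorFault& fault) {
        return "switch to backup sensor for " + fault.device();
    } catch (const DeviceFault& fault) {
        return "enter fail-safe state: " + fault.device();
    }
}
```

A handler for the base class also catches subclasses added later, so new fault types get at least the general fail-safe response until a more specific handler is written.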

132
Making C++ safe

Overloading the [] operator with index range
checking improves the safety of arrays
Making classes of scalars and overloading the
assignment operator allows additional range and
value checking
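
Both ideas on this slide can be sketched with small class templates. The names `CheckedArray` and `RangedInt` are invented for this example, and throwing `std::out_of_range` is one possible reaction among several (a safety-critical system might instead log the fault and move to a fail-safe state).

```cpp
#include <cstddef>
#include <stdexcept>

// Array whose index operator checks the range on every access.
template <typename T, std::size_t N>
class CheckedArray {
public:
    T& operator[](std::size_t index) {
        if (index >= N) {
            throw std::out_of_range("CheckedArray index out of range");
        }
        return data_[index];
    }
    const T& operator[](std::size_t index) const {
        if (index >= N) {
            throw std::out_of_range("CheckedArray index out of range");
        }
        return data_[index];
    }

private:
    T data_[N] = {};
};

// Scalar class whose assignment operator enforces a subrange, as Pascal and Ada do.
template <int Lo, int Hi>
class RangedInt {
public:
    RangedInt(int value = Lo) : value_(check(value)) {}
    RangedInt& operator=(int value) {
        value_ = check(value);  // a rejected value leaves the old one in place
        return *this;
    }
    operator int() const { return value_; }

private:
    static int check(int value) {
        if (value < Lo || value > Hi) {
            throw std::out_of_range("RangedInt value out of range");
        }
        return value;
    }
    int value_;
};
```

Unlike the Pascal example, these checks happen at run time, so they cost a comparison per access and need a handler somewhere up the call chain.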

133
Detailed Design for Safety

Data Validity Checks
CRC (16 bit or 32 bit)
identifies all single or dual bit errors
detects high percentage of multiple bit errors
table or compute-driven
chips are available
checksum
redundant storage
one's complement

134
Detailed Design for Safety

Redundancy should be set every write access
Data should be checked every read access
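
The write/read rule above can be illustrated with a CRC-protected setting. The sketch uses the common bitwise CRC-32 (reflected polynomial 0xEDB88320); the `GuardedSetting` class, its byte ordering, and the `flipBitForTest` fault-seeding hook are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320). A table-driven version is faster.
std::uint32_t crc32(const unsigned char* data, std::size_t length) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < length; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical protected setting: CRC is set on every write, checked on every read.
class GuardedSetting {
public:
    explicit GuardedSetting(std::uint32_t value) { write(value); }

    void write(std::uint32_t value) {
        value_ = value;
        crc_ = crcOf(value);
    }

    std::uint32_t read() const {
        if (crcOf(value_) != crc_) {
            throw std::runtime_error("setting corrupted");
        }
        return value_;
    }

    // Fault-seeding hook for tests: corrupts the stored value without updating the CRC.
    void flipBitForTest(unsigned bit) { value_ ^= (1u << bit); }

private:
    static std::uint32_t crcOf(std::uint32_t value) {
        const unsigned char bytes[4] = {
            static_cast<unsigned char>(value & 0xFFu),
            static_cast<unsigned char>((value >> 8) & 0xFFu),
            static_cast<unsigned char>((value >> 16) & 0xFFu),
            static_cast<unsigned char>((value >> 24) & 0xFFu)};
        return crc32(bytes, 4);
    }

    std::uint32_t value_;
    std::uint32_t crc_;
};
```

Making the data private forces every access through `write` and `read`, so the check cannot be skipped by accident.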

135
ANSI C++ Exception Class Hierarchy
exception
  logic_error
    domain_error
    invalid_argument
    length_error
    out_of_range
  runtime_error
    range_error
    overflow_error
136
Eight steps to safety

Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe designs
Implement safety
Assure the safety process
Test, test, test

137
Safety Process (Development)

Do Hazard Analysis early and often
Track safety measures from hazard analysis to
requirements specification
design
code
validation tests
Test safety measures with fault seeding

138
Safety Process (Deployment)

Install safely
ensure proper means are used to set up system
safety measures are installed and checked
Deploy safely
ensure safety measures are periodically checked
and serviced
Do not turn off safety measures
Decommission safely
removal of hazardous materials

139
IEC Overall Safety Lifecycle
(SRS = Safety Related System; E/E/PES = Electrical/Electronic/Programmable electronic system)
Concept
Overall scope definition
Hazard and risk analysis
Overall safety requirements
Safety requirements allocation
Overall planning
Ops and maintenance planning
Validation planning
Installation planning
SRS E/E/PES realization
SRS other technology realization
External risk reduction facilities
Overall installation and commissioning
Overall safety validation
Overall operation and maintenance
Overall modification and retrofit
Decommissioning
140
Eight steps to safety

Identify the hazards
Determine the risks
Define the safety measures
Create safe requirements
Create safe designs
Implement safety
Assure the safety process
Test, test, test

141
Safety in Testing in R&D

Use fault-seeding
Unit (class) testing
white box
procedural invariant violation assertions
peer reviews
Integration testing
grey box
Validation testing
black box
externally caused faults
(Grey box) internally seeded faults

142
Safety Testing During Operation

Power on Self-Test (POST)
Check for latent faults
All safety measures must be tested at power on
and periodically
RAM (stuck-at, shorts, cell failures)
ROM
Flash
Disks
CPU
Interfaces
Buses

143
Safety Testing During Operation

Built-In Tests
Repeats some of POST
Data integrity checks
Index and pointer validity checking
Subrange value invariant assertions
Proper functioning
Watchdogs
Reasonableness checks
Lifeticks

144
A Simplified Example: A Linear Accelerator
145
Unsafe Linear Accelerator
Beam Intensity Beam Duration
CPU
Radiation Dose
1. Set Dose 2. Start Beam 3. End Beam
Sensor
146
Fault Tree Analysis
Over radiation
AND
OR
Radiation command invalid
CPU Halted
OR
OR
Shutoff timer failure
Beam engaged
CPU failure
Software defect
Software defect
EMI
EMI
147
Hazards of the Linear Accelerator
Hazard | Level of risk | Tolerance time T1 | Fault | Likelihood | Detection time | Control measure | Exposure time
Over radiation | Severe | 100 ms | CPU locks up | rare | 50 ms | Safety CPU checks lifetick at 25 ms | 50 ms
(as above) | (as above) | (as above) | Corrupt data settings | often | 10 ms | 32 bit CRCs on data checked every access | 15 ms
Under radiation | Moderate | 2 weeks | Corrupt data setting | often | 10 ms | 32 bit CRCs on data checked every access | 15 ms
Inadvertent radiation on power on | Severe | 100 ms | Beam left engaged during power down | often | n/a | Curtain mechanically shuts at power down | 0 ms
148
Safe Linear Accelerator
Self test results shared prior to operation
Periodic watchdog service
Safety CPU
CPU
Beam Intensity Beam Duration
Radiation Dose
1. Set Dose 2. Start Beam 3. End Beam
Sensor
Deenergize
Mechanical shutoff when curtain low
149
Summary

Safety is a system issue
It is cheaper and more effective to include
safety early on than to add it later
Safety architectures provide programming-in-the-large
safety
Safe coding rules and detailed design provide
programming-in-the-small safety

150
End of module
