Title: Software Safety Engineering (S2E) Program Status
1Software Safety Engineering (S2E) Program Status
2Software Safety Program - Overview
- General Safety Concepts - WHY
- Software Safety and CLCS - HOW
- Known Hazards
- Designing for Safety
- Safety Reliability Thread
- Current Status
3Software Safety - What is it?
Anticipate
Detect
Rate Slope
Absolute Value
Control
Mitigate
Limit Damage, Return to Safe State
Prevent
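The detection terms on this slide (rate, slope, absolute value) can be sketched as a simple limit monitor. This is only an illustration: the class name `LimitMonitor`, the status strings, and the numeric limits are assumptions, not CLCS code, and treating a steep slope as the "anticipate" case is one reading of the diagram.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LimitMonitor:
    """Hypothetical monitor that checks a measurement two ways."""
    abs_limit: float   # absolute-value limit: exceeding it means the hazard is detected
    rate_limit: float  # maximum allowed rise per second: exceeding it is an early warning
    _last_value: Optional[float] = field(default=None, init=False)
    _last_time: Optional[float] = field(default=None, init=False)

    def check(self, t: float, value: float) -> str:
        """Return 'SAFE', 'ANTICIPATE' (slope too steep) or 'DETECT' (limit exceeded)."""
        status = "SAFE"
        if self._last_time is not None and t > self._last_time:
            slope = (value - self._last_value) / (t - self._last_time)
            if slope > self.rate_limit:
                status = "ANTICIPATE"
        if value > self.abs_limit:
            status = "DETECT"  # the absolute limit takes priority over the slope warning
        self._last_value, self._last_time = value, t
        return status


monitor = LimitMonitor(abs_limit=100.0, rate_limit=5.0)
for t, pressure in [(0.0, 10.0), (1.0, 12.0), (2.0, 30.0), (3.0, 101.0)]:
    print(t, pressure, monitor.check(t, pressure))
```

A slope check lets the software act before the absolute limit is reached, which is the difference between anticipating a hazard and merely detecting it.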
4Software Safety - What is it?
- Definitions
- Functionally-critical
- Mission completion
- Safety-Critical
- Humans: Life & Limb
- Hardware 106
- Some set theory
- Input versus output
5Some Theory
Set of Inputs (?)
Unknowns (?)
Set of Outputs
Known
Known Safe
Unsafe
Sources: Normal Operation, Hardware Failures, Human Intervention, Models/Simulators
Assumed Safe
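One way to read the input/output sets above is as a mapping that must never leave the known-safe outputs, even for inputs nobody listed. The sketch below assumes that reading. All event and command names are hypothetical.

```python
# Hypothetical inputs and outputs, for illustration only.
KNOWN_RESPONSES = {
    "pressure_nominal": "continue_fill",
    "pressure_high": "close_fill_valve",
    "sensor_failed": "close_fill_valve",
}
KNOWN_SAFE_OUTPUTS = {"continue_fill", "close_fill_valve"}
SAFE_STATE = "close_fill_valve"


def respond(input_event: str) -> str:
    """Map an input to an output without ever leaving the known-safe set."""
    # An input outside the known set falls back to the safe state.
    output = KNOWN_RESPONSES.get(input_event, SAFE_STATE)
    # Guard against a table entry that was never verified as safe.
    if output not in KNOWN_SAFE_OUTPUTS:
        output = SAFE_STATE
    return output


print(respond("pressure_nominal"))   # a known input
print(respond("unlisted_glitch"))    # an unknown input
```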
6Software Safety - Why do it?
- Direction
- DoD: Mil-Std-882D, DoD-Std-2167
- NASA: NSTS-07700, NSS-8719.13, NASA-GB-1740.13, NSS-22206, NSS-22254, Direction from Dan Goldin
- CLCS: 84K-00055, KDP-P-2901
7Software Safety - Why do it?
- Objective: Identify & Mitigate Risk
- Known Fault Scenarios: by requirements, analyses & test
- Possible Unknowns: by design approach & further test
8Knowns
- Hardware fault-driven scenarios
- Legacy of hardware failure data available from the 1970s
- Hardware-driven hazards
- May be analyzed: the SSA
- May be tested: specific fault injection
- Identifies Risk; Yields Design Changes, Issues/ESRs
- The Safety Case: Summary of Risk & Findings
9Unknowns
- Stuff Happens
- Software doesn't fail. It just doesn't do what we thought it would
- Hardware and some functions (e.g., seeds & races) cause most random errors
- Specification & Coding errors: Prime Cause
- 90% of errors are in the specifications
- C and Java are inherently powerful, but dangerous
10Ferengi Software Safety Rule 76
If it "touches" hardware that can impact
the safety of people or equipment,
an SSA is absolutely necessary.
(i.e., controls, monitors, or mitigates the risk
of using)
11SSA - What and When
- Assessment of risk factors due to software
- Hardware Hazards -> SFMEA and SFTA
- KDP-P-2901
- Schedule: 30 days before the first interaction with Flight Hardware
- In time for 5A/B Testing
- Presented at TRR/ORR
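The 30-day rule above is simple date arithmetic. A minimal sketch follows. The function name and the example date are hypothetical; only the 30-day lead comes from the slide.

```python
from datetime import date, timedelta

SSA_LEAD_DAYS = 30  # from the slide: 30 days before first flight-hardware interaction


def ssa_due_date(first_flight_hw_interaction: date) -> date:
    """Latest date the SSA should be complete for a given first hardware interaction."""
    return first_flight_hw_interaction - timedelta(days=SSA_LEAD_DAYS)


# Hypothetical test date, chosen only to show the calculation.
print(ssa_due_date(date(2000, 3, 31)))
```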
12System Safety Analysis
IPT/DP-1
SRS/DP-2
DDS/ODS/DP-3
Detail Design
Code Development
Conceptual Design
13System Safety Analysis
IPT/DP-1
SRS/DP-2
DDS/ODS/DP-3
Detail Design
Code Development
Val/Ver Test
Conceptual Design
3A/B 4A/B
TRR/ORR
Readiness Reviews
System Test
5A/B (With Hdwr)
14System Safety Analysis
IPT/DP-1
SRS/DP-2
DDS/ODS/DP-3
Detail Design
Code Development
Val/Ver Test
Conceptual Design
3A/B 4A/B
TRR/ORR
Readiness Reviews
S-C Matrix
Hazards
System Test
5A/B (With Hdwr)
PHA
KDP-P-2901 SSA Process
15System Safety Analysis
IPT/DP-1
SRS/DP-2
DDS/ODS/DP-3
Detail Design
Code Development
Val/Ver Test
Conceptual Design
3A/B 4A/B
TRR/ORR
Readiness Reviews
Issues
S-C Matrix
Hazards
System Test
5A/B (With Hdwr)
FTA/FMEA
PHA
KDP-P-2901 SSA Process
16System Safety Analysis
IPT/DP-1
SRS/DP-2
DDS/ODS/DP-3
Detail Design
Code Development
Val/Ver Test
Conceptual Design
3A/B 4A/B
TRR/ORR
Readiness Reviews
Issues
S-C Matrix
Issues
CHAWS
Hazards
System Test
5A/B (With Hdwr)
FTA/FMEA
Risk Assessment
PHA
KDP-P-2901 SSA Process
CHAWS: CLCS Hazard Analysis Worksheet
17System Safety Analysis
IPT/DP-1
SRS/DP-2
DDS/ODS/DP-3
Detail Design
Code Development
Val/Ver Test
Conceptual Design
3A/B 4A/B
TRR/ORR
Risk
Readiness Reviews
Issues
S-C Matrix
Issues
CHAWS
Hazards
SSA Report
System Test
5A/B (With Hdwr)
FTA/FMEA
Risk Assessment
PHA
CM-Driven Changes
KDP-P-2901 SSA Process
CHAWS: CLCS Hazard Analysis Worksheet
18Software Fault Tree Analysis
- Works backward from the fault to its root causes
- Uses design details of the entire system
- Leads to better understanding of causes and their prevention
- Unknown fault events not considered
19Fault Tree Analysis
Fill Valve not closed
Top Event
AND
Causal Relationship
Other Root Cause
S/W did not react to over pressure
S/W did not anticipate rapid pressure rise
Human did not notice pressure
Intermediate Events
Basic Fault Events
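The fault tree above can be sketched as a single AND gate over the basic fault events. This assumes the three named events feed one AND gate and leaves out "Other Root Cause". The function names and probabilities are hypothetical, and the probability formula holds only if the events are independent.

```python
from math import prod


def top_event_occurs(sw_did_not_react: bool,
                     sw_did_not_anticipate: bool,
                     human_did_not_notice: bool) -> bool:
    """AND gate: 'Fill Valve not closed' needs every basic fault event to be present."""
    return all((sw_did_not_react, sw_did_not_anticipate, human_did_not_notice))


def top_event_probability(*basic_event_probabilities: float) -> float:
    """AND-gate probability, assuming the basic events are statistically independent."""
    return prod(basic_event_probabilities)


# One working barrier (here, the human notices) blocks the top event.
print(top_event_occurs(True, True, False))
# Hypothetical per-demand probabilities for the three basic events.
print(top_event_probability(0.1, 0.2, 0.5))
```

Working backward from the top event this way shows why an AND gate is desirable: each independent barrier multiplies the likelihood down.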
20Analysis CLCS Architecture
Hazardous Event
Detection & Anticipation
Hardware Safing
Applications
Apps Srvcs
Control & Mitigation
Sys Srvcs
Remaining Risk
System S/W
21The Software FMEA
- Predicted hardware failures followed to their conclusion through the software
- What can go wrong?
- What happens when it does?
- Must know system failures up front
- Won't prevent the unexpected
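The two FMEA questions above map naturally onto a worksheet row. Below is a minimal sketch, not the CLCS worksheet format: the `FmeaRow` fields and the example rows are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FmeaRow:
    failure_mode: str       # What can go wrong? (a predicted hardware failure)
    software_response: str  # How the software handles it; empty if it does not
    end_effect: str         # What happens when it does?


# Hypothetical rows, for illustration only.
WORKSHEET = [
    FmeaRow("Pressure transducer fails high", "Close fill valve", "Fill halted"),
    FmeaRow("Pressure transducer fails low", "", "Over pressure goes undetected"),
]


def unhandled_failure_modes(rows: List[FmeaRow]) -> List[str]:
    """List the predicted failures that the software does not respond to."""
    return [row.failure_mode for row in rows if not row.software_response]


print(unhandled_failure_modes(WORKSHEET))
```

The sketch only covers failures someone thought to list, which is the limitation the slide points out.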
22CLCS
- Spiral Development
- Cultural Changes
- Failure of software
- Test
23SSA Traditional Approach
Fault Tree Analysis
Failure Modes & Effects Analysis
Traditional Development
- All or most code available
- A lot known about the system
- Too late
24SSA - An Iterative Process
Fault Tree Analysis
Safety Criticality Assessment
Spiral Development
Failure Modes & Effects Analysis
Engineering Design Changes
25SSA - Where
SMA will perform a Software Safety Analysis
(SSA) for each Delivery and every location, i.e.,
as we step up to each new drop. After the
initial SSA, an update of the analysis and a new
SSA report will be done for each modification to
the safety-critical software.
26SSA - Planning
Design Begin
Val/Ver Test
PHA
FTA/FMEA
Risk Assessment
SSA Report
On a PERT chart, the SSA preparation activity
will begin during the preparation of the design
specifications and have a finish-to-finish
relationship with the validation/verification
(4A/B) testing.
27Ferengi Software Safety Rule 304
The SSA isn't enough.
29Paradigms
- Software Failures
- "Software does not fail - it just does not perform as intended" - Dr. Nancy Leveson, MIT
30Paradigms
- Design and test for functionality
- Also specify what the system should not do.
- Then test it.
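A "should not" requirement can be tested just like a functional one. The sketch below assumes a hypothetical requirement, "the fill valve shall not be commanded open at or above the pressure limit". The function name and the limit are illustrative.

```python
PRESSURE_LIMIT = 100.0  # hypothetical limit, for illustration only


def command_fill_valve(requested_open: bool, pressure: float) -> bool:
    """Return True only if the valve is commanded open."""
    # A NaN reading fails the comparison, so the valve stays closed.
    return requested_open and pressure < PRESSURE_LIMIT


def test_opens_when_requested_and_safe() -> None:
    """Functional test: what the system should do."""
    assert command_fill_valve(True, 50.0) is True


def test_never_opens_at_or_above_limit() -> None:
    """Negative test: what the system should NOT do."""
    for pressure in (100.0, 100.1, 150.0, float("inf"), float("nan")):
        assert command_fill_valve(True, pressure) is False


test_opens_when_requested_and_safe()
test_never_opens_at_or_above_limit()
print("both tests passed")
```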
31Some Theory - 2nd Look
Set of Inputs (?)
Unknowns (?)
Set of Outputs
Known
Known Safe
Unsafe
Sources: Normal Operation, Hardware Failures, Human Intervention, Models/Simulators
Assumed Safe
32Design for Safety
- Program and Project Responsibilities
- Dan Goldin message
- Safety is more than FMEA and FTA
- Safety must be designed in at the earliest
- Existing Specifications
- Must include safety
- Methods & techniques for mitigation of hazards
- Requirements: Traceable and Testable
33Initiatives
- Dan Goldin: Design for Safety
- Smart Practices applied early to designs
- Early engineering changes are cheaper
- Provide draft guidance for design of safety-critical software
- Process changes
- Design Guidelines: NASA-GB-1740.13
- Peer reviews: enhanced checklist
- Test development: Fault Injection for Robustness
- Works to prevent unforeseen fault scenarios
34Objectives
- Known fault scenarios
- Analysis
- Redesign
- Test functionality and robustness
- Unknowns
- Design them out of the system
- Test: fault injection
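Fault injection for robustness can be sketched by swapping a healthy sensor read for deliberately faulty ones and checking that the software still ends in a safe state. Everything here is hypothetical (function names, fault list, limit) and is not taken from CLCS.

```python
import math
from typing import Callable, Dict

PRESSURE_LIMIT = 100.0  # hypothetical limit, for illustration only


def control_step(read_pressure: Callable[[], float]) -> str:
    """One control cycle: continue only on a valid, in-range reading."""
    try:
        pressure = read_pressure()
    except OSError:
        return "SAFE_STATE"  # sensor did not answer
    if not math.isfinite(pressure) or pressure < 0.0 or pressure >= PRESSURE_LIMIT:
        return "SAFE_STATE"  # implausible or over-limit value
    return "CONTINUE"


def sensor_timeout() -> float:
    raise OSError("injected fault: sensor did not answer")


# Each entry replaces the real sensor read with a faulty one.
INJECTED_FAULTS: Dict[str, Callable[[], float]] = {
    "timeout": sensor_timeout,
    "not_a_number": lambda: float("nan"),
    "stuck_high": lambda: 1.0e9,
    "negative": lambda: -5.0,
}


def run_fault_injection() -> Dict[str, str]:
    """Run one control step per injected fault and record the outcome."""
    return {name: control_step(fault) for name, fault in INJECTED_FAULTS.items()}


print(run_fault_injection())
```

Any fault that yields something other than the safe state is a robustness finding to feed back into the design.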
35S/W Safety - Where we are
- Safety-Critical software identified in engineering review
- Software Safety Integration Team formed
- Software FTA/FMEA in work
- Will be recurring due to spiral development
- Design for Safety concepts being integrated
- Safety Reliability Thread introduced
- Post-SSA Analysis Tools being procured
36S/W Safety - What's Next?
- Today
- Design for Safety and Known Fault Analyses
- Tomorrow
- Recursive and bi-directional analyses
- Reliability predictions: Markov, Numerical Integration, Weibull analysis techniques
- Probabilistic fault injection techniques
37Summary
- Life on the Leading Edge
- Probably the largest real-time safety-critical control system on the planet
- Safety is our #1 core value
- We are on front and center stage. The NASA team is watching
38Comparison of Safety Analysis Techniques
- Fault Tree Analysis (FTA) - Easy to understand, very well known; however, has difficulty handling non-combinatorial problems. Should be limited to combinatorial types.
- Markov Analysis (MA) - Solves non-combinatorial type problems and should be limited to such. However, limited to devices that exhibit constant failure rates, i.e., devices that exhibit an exponential distribution of failure.
- Numerical Integration - Can handle all aspects of probability, including mechanical and electro-mechanical devices. However, not intuitive.
- Weibull (algebraic approximation methods) - Can be used for problems involving mechanical and electro-mechanical devices.
- From ISSC Presentation by Dr. Vito Faraci, Lockheed Martin Fairchild Systems, Syosset, N.Y.
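The constant-failure-rate restriction noted for Markov analysis corresponds to the exponential reliability function R(t) = exp(-λt). The Weibull form R(t) = exp(-(t/η)^β) allows the failure rate to change over time and reduces to the exponential case when β = 1. The sketch below uses these standard textbook formulas. The parameter values are hypothetical.

```python
import math


def exponential_reliability(failure_rate: float, t: float) -> float:
    """Survival probability at time t for a constant failure rate (per unit time)."""
    return math.exp(-failure_rate * t)


def weibull_reliability(t: float, scale: float, shape: float) -> float:
    """Weibull survival probability; shape > 1 models wear-out, shape < 1 early failures."""
    return math.exp(-((t / scale) ** shape))


# With shape = 1 and scale = 1 / failure_rate, the two models agree.
print(exponential_reliability(0.001, 1000.0))
print(weibull_reliability(1000.0, scale=1000.0, shape=1.0))
# A wear-out device (shape = 2) with the same scale survives early hours better.
print(weibull_reliability(500.0, scale=1000.0, shape=2.0))
```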