Title: Quality for Safety Critical Software Case Study from OPG
1Quality for Safety Critical Software: Case Study
from OPG
- CFICSE
- October 2002
- Diane Kelly
2References
- 1. Ontario Hydro's Experience with New Methods
for Engineering Safety Critical Software,
M. Viola, Proceedings Safecomp 1995, Italy
- 2. CE-1001-STD Rev. 1, January 1995, OHN,
P. Joannou et al.
- 3. Safety Critical Software - Then and Now,
N. Ichiyen et al., October 2, 1995, COG CANDU
Computer Conference
- 4. David Parnas, Inspection of Safety-Critical
Software using Program Function Tables, Chapter
19, Software Fundamentals, Collected Papers,
Addison-Wesley, 2001
3Background (1)
- 1987 Darlington reactor safety shutdown system
(SDS) (systems 1 and 2) designed entirely in
software (first time)
- regulator (AECB) concerns
- not properly engineered
- software functional but not of high quality
- uncertain risk
- lack of confidence in product, process, and
people
- software already written
- two independent systems (physically and
logically)
- designed differently to reduce common mode errors
- 7000 lines of FORTRAN
- 13000 lines of PASCAL
4Background (2)
- regulator requirements
- software to be verified before being put into use
- formal verification of specification to code
- random testing program
- hazard analysis program
- underlying problems
- no agreed upon, measurable definition of
acceptability for the engineering of safety
critical software
- no widely accepted common practices for
specification, design, verification, and testing
of safety critical software
- not possible to quantify the achieved reliability
of the software component of a safety system
5Background (3)
- David Parnas hired by regulator to advise on
process
- documentation based on Parnas Tables
- rendered code into tabular format
- rendered requirements into tabular format
- proofs to show code and requirements the same
- took about 1 year to complete
- about 60 people involved on FORTRAN side
- AECB allowed software to go into production
- Darlington Nuclear Generating Station brought
on-line January 1990
- OHN agreed to redesign and rewrite software
- had to establish standards and rigorous process
- standard issued at the end of 1990
6Comments from the trenches ...
- Hydro agreed reluctantly to the verification
exercise - why?
- If you borrow a lot of money someone usually
wants it back
- Energy Probe's estimate of the cost of building
Darlington NGS - $14B or $3972/kWe
- From an estimate posted on the McMaster website
(engphys.mcmaster.ca) - $2000/kWe
- At 8% this works out to about $1.8M per day in
interest using the lower figure
- If the reactor is running, in one day you could
earn $6,343,200 at $75/MWh
- Cost of the formal verification was on the order
of 60 x 60 x 52 x $100 (persons x hours/week x
weeks/year x $/hour/person) = $18M - as long as it
didn't hold up the license to operate
- Hydro hired Nancy Leveson and the AECB hired
David Parnas
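The back-of-the-envelope figures above can be reproduced in a few lines. This is only a sketch of the slide's arithmetic: the variable names are illustrative, and reading the amounts as dollars and the energy price as per megawatt-hour is an assumption based on context.

```python
# Rough reproduction of the slide's cost arithmetic.
# All numbers are the ones quoted on the slide; names are illustrative.
people = 60            # persons on the verification effort
hours_per_week = 60    # hours per person per week
weeks_per_year = 52
rate = 100             # assumed dollars per person-hour

verification_cost = people * hours_per_week * weeks_per_year * rate

daily_revenue = 6_343_200   # quoted earnings for one day of operation
price_per_mwh = 75          # quoted price, assumed dollars per MWh

# Energy per day and average output implied by the quoted revenue.
mwh_per_day = daily_revenue / price_per_mwh
implied_mw = mwh_per_day / 24

# Days of generation that would pay for the verification effort.
days_to_recover = verification_cost / daily_revenue

print(f"verification cost: {verification_cost:,}")
print(f"implied output:    {implied_mw:,.0f} MW")
print(f"days of revenue:   {days_to_recover:.1f}")
```

On these numbers the year-long verification effort costs roughly three days of generation revenue, which is the point the slide is making about not holding up the license.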
7Comments from the trenches (contd)
- From a paper by Parnas [4]
- Inspectors ... need quiet time to think
- ... inspections must be interrupted by breaks,
evenings, and weekends
- ... results of inspections must be scrutinized
carefully in open discussions
- Typical hours were 7am to 8pm plus some
over-nights - no holidays or weekends off for a
year
8Comments from the trenches (contd)
- Typical sample of code from SDS1
- Sample is 433 lines
- 328 lines are comments
- 68 lines are declaration (one variable per line)
- 34 lines are executable (6K/line)?
- This would be considered reasonably complex
- The corresponding program function tables would
be about 21 pages.
- The complete set of function tables was
twenty-four 2-inch binders
9Comments from the trenches (contd)
- Misconceptions and issues (quotes from [4])
- Misconception: the SDS software initiates the
shutdown
- shutdown of the reactor is poised to trigger
based on a timer
- on each successful execution cycle, the software
prevents the shutdown from automatically
triggering
- failsafe system design
- An increase in reliability results in an increase
in safety
- safety and reliability are not the same
- ... safety requires correctness ...
- the software can be incorrect and still safe
- safety and correctness are disjoint
- There was a coding error that did affect
safety ...
- not aware of any coding error that affected
safety
- ... hazard analysis should not have been performed
on the code
- hazard analysis focuses on the safety critical
aspect of the code
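The fail-safe arrangement described under the first misconception (the shutdown is poised on a timer, and each successful software cycle holds it off) can be pictured as a dead-man timer. The sketch below is an illustrative model only, not the SDS design: the class name, the cycle period, and the timeout value are all assumptions, and time is simulated in integer milliseconds.

```python
class DeadManTimer:
    """Hypothetical model: the trip fires unless it is reset in time."""

    def __init__(self, timeout_ms, now_ms):
        self.timeout_ms = timeout_ms
        self.deadline_ms = now_ms + timeout_ms

    def reset(self, now_ms):
        # Called by the software at the end of each successful cycle.
        self.deadline_ms = now_ms + self.timeout_ms

    def tripped(self, now_ms):
        # The shutdown triggers on its own once the deadline passes.
        return now_ms >= self.deadline_ms


def run_cycles(cycle_ok, period_ms=100, timeout_ms=250):
    """Return the index of the cycle at which shutdown triggers, or None.

    cycle_ok is a sequence of booleans: True means the software completed
    that cycle successfully, False means it hung or failed.
    """
    now_ms = 0
    timer = DeadManTimer(timeout_ms, now_ms)
    for i, ok in enumerate(cycle_ok):
        now_ms += period_ms
        if timer.tripped(now_ms):
            return i
        if ok:
            timer.reset(now_ms)
    return None


# Healthy software keeps holding the shutdown off.
print(run_cycles([True] * 10))
# Software that stops completing cycles lets the timer expire.
print(run_cycles([True, True, False, False, False, True]))
```

In this model the software never commands a shutdown; it can only postpone one, so a hung or crashed program leads to a trip without any action on its part.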
10Comments from the trenches (contd)
- Final comments
- The formal verification process grinds incredibly
fine
- Upwards of $30M was spent with no increase in
safety and possibly a decrease
- There was extensive focus on meeting requirements
but not on meeting safety objectives
- The degree of rigor applied to the symbolic code
was disproportionate to the risk
- the formal process ignores issues such as kernel,
compiler, timing
11OASES
- Ontario Hydro/AECL Software Engineering and
Standards
- prepared high-level standard for safety critical
software
- methodology independent
- define requirements for the software engineering
process
- define the outputs from the process
- define the requirements that must be met by each
output
- developed a process for categorization of
software according to impact on safety
- developed procedures for development of safety
critical software
- developed tools to support software engineering
process
12Safety Critical Software Process
13Features of this Process
- formal specification
- behaviour of software described using
mathematical functions in a notation with well
defined syntax and semantics
- review and verification
- outputs from each process must comply with inputs
to that process
- outputs written using mathematical functions must
be systematically verified against inputs using
mathematical verification techniques
- manual reviews against standards, checklists
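One way to picture a tabular, function-based specification is a small table of conditions and results, where the conditions must cover every input and must never overlap. The sketch below is illustrative only: the trip function, the setpoint and deadband values, and the helper names are invented for the example, and checking over sampled inputs is a stand-in for the mathematical verification the process calls for, not a substitute for it.

```python
SETPOINT = 100.0   # hypothetical trip setpoint
DEADBAND = 5.0     # hypothetical hysteresis band below the setpoint

# Each row pairs a condition with a result; both take the current signal
# and the previous trip state. Exactly one condition should hold per input.
TABLE = [
    (lambda s, prev: s >= SETPOINT,                       lambda s, prev: True),
    (lambda s, prev: SETPOINT - DEADBAND <= s < SETPOINT, lambda s, prev: prev),
    (lambda s, prev: s < SETPOINT - DEADBAND,             lambda s, prev: False),
]


def evaluate(table, s, prev):
    """Apply the single row whose condition holds; reject a malformed table."""
    matches = [result for cond, result in table if cond(s, prev)]
    if len(matches) != 1:
        raise ValueError(f"{len(matches)} rows match input ({s}, {prev})")
    return matches[0](s, prev)


def check_table(table, samples):
    """Check completeness and disjointness over the sampled inputs only."""
    for s, prev in samples:
        evaluate(table, s, prev)   # raises if zero or several rows match
    return True


# Sample signals from 0.0 to 199.5 in steps of 0.5, for both prior states.
samples = [(x / 2, prev) for x in range(400) for prev in (False, True)]
assert check_table(TABLE, samples)
```

Dropping the middle row makes the table incomplete, and `evaluate` then raises for any signal inside the deadband, which is the kind of gap a tabular format is meant to expose.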
14Features of this Process (contd)
- reliability testing
- uses statistically valid random testing
- software hazard analysis
- analyses performed to identify and evaluate
potential unsafe failures in the computer system
and in the software component of the system
- eliminate or assist in reduction of risks to an
acceptable level
15Experience with the Process (1)
- trial use of process
- microprocessor based digital trip meter for
Pickering B NGS
- temperature detectors send current back to trip
meter
- trip action initiated if signal exceeds setpoint
- 1500 lines of C code
- 25% of entire development cost was in two formal
verifications
- 27% of entire development cost was in four levels
of testing
- 5% of entire development cost was writing the code
16Experience with the Process (2)
- SDS rewrite
- new software installed in the field
- field installation started February 1999, ended
December 1999
- 5-year project
- kept existing hardware
- redeveloped software according to step-wise
refinement and information hiding
- about 40 to 50 people overall on project
- largest group was formal verification group at 20
people
- have some tool support
- Prototype Verification System (PVS) (theorem
prover) from SRI International (formerly Stanford
Research Institute)
- other in-house tools for formatting and
consistency checking
- input to one tool had to be produced by hand
- 600 pages with word processor
17Strengths
- formal tabular notation
- provides for more complete specifications
- errors eliminated earlier in lifecycle
- facilitates formal verification
- precise specifications lead to code with few
errors
- hazard analysis provides confidence in safety
objectives of software
- SRS and SDD used to generate test cases
- achieved known test coverage
- validation tests using DID
- done by individual outside development process
- identified problems both during test case
creation and during testing
18Weaknesses
- lack of tool support
- in particular, code review manually intensive
- checklist with 154 questions
- for creating and verifying test cases
- hazard analysis done too late in design cycle
- changes may have to be done back at SDD or SRS
- unit testing did not find any functionality
errors not already identified through other
verification processes
19Standard CE-1001-STD Rev. 1, January 1995
- Still in use
- Specifies requirements for the engineering
characteristics of safety critical software for
nuclear generating stations
- Not for all types of applications
- Purpose is to provide confidence that the safety
critical software product is developed with an
acceptable level of quality
20Structure of Standard (1)
- covers
- requirements definition and verification
- software design and verification,
- code implementation,
- verification and testing,
- planning,
- configuration management and training.
21Structure of Standard (2)
- Minimum set of software engineering tasks
- Minimum set of outputs for each task
- Quality of outputs for each task must meet
defined quality objectives, quality attributes
and fundamental principles.
- To be acceptable, a software product must meet
all these requirements.
22Three categories of tasks
- development
- verification
- support
23Software Engineering Development Tasks, Outputs
and Sample Requirements.
25Software Engineering Verification Tasks, Outputs
and Sample Requirements.
29Software Engineering Support Tasks, Outputs and
Sample Requirements.
32Quality Objectives
- Functionality
- Maintainability
- Reliability
- Reviewability and
- Safety.
- These quality objectives are supported by a set
of quality attributes.
33Quality Attributes
- Completeness
- Consistency
- Correctness
- Modifiability
- Modularity
- Predictability
- Robustness
- Structuredness
- Traceability
- Understandability and
- Verifiability.
34Fundamental Principles
- Set of high level guidelines on which the
software engineering principles in this standard
are based.
- Measures of the presence or the degree of
adherence to a quality attribute are derived from
the fundamental principles
- When measures are satisfied, the quality
objectives are met and the product is fit for
use.
36Need measures
- Many measures are subjective out
of necessity.