Title: Engineering Judgement
1. Engineering Judgement
- Martyn Thomas
- Visiting Professor of Software Engineering
- Oxford University Computing Laboratory
- martyn_at_thomas-associates.co.uk
2. Engineering Judgement
"When I hear the words 'engineering judgement' I know they are just going to make up numbers." (Richard Feynman, 1988)
3. The argument in brief
- Almost all safety-related systems have target failure probabilities (pfh, probability of failure per hour) below 10^-5 per hour.
- Assuring such a pfh would require evidence that is rarely available at the time of certification.
- Assessors therefore rely on their engineering judgement. In effect, they make up numbers.
- Accepting that this is inevitable, we need to make radical changes in the way we develop, maintain and certify systems.
4. Safety Integrity Levels (high demand mode, IEC 61508)
Even SIL 1 is beyond reasonable assurance by testing: it would take more than 10 years under operational conditions, with no failures and no modifications. What sense does it make to try to distinguish single factors of 10 in this way? Do we really know so much about the effect of different development methods on product failure rates? Of course not!
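The arithmetic behind this slide can be illustrated with a short sketch. It encodes the IEC 61508 high-demand (continuous) mode SIL bands as a lookup table. It also converts a target pfh into the failure-free operating time implied by the naive "1/pfh hours" rule. The names `SIL_BANDS`, `sil_for_pfh` and `naive_years_to_demonstrate` are hypothetical, chosen for illustration. The band boundaries follow the usual IEC 61508 table for high demand or continuous mode.

```python
from typing import Optional

# Illustrative sketch: IEC 61508 SIL bands for high demand / continuous mode,
# expressed as [lower, upper) ranges of pfh (failures per hour).
HOURS_PER_YEAR = 24 * 365.25

SIL_BANDS = {
    4: (1e-9, 1e-8),
    3: (1e-8, 1e-7),
    2: (1e-7, 1e-6),
    1: (1e-6, 1e-5),
}


def sil_for_pfh(pfh: float) -> Optional[int]:
    """Return the SIL whose band contains pfh, or None if it lies outside all bands."""
    for sil, (lower, upper) in SIL_BANDS.items():
        if lower <= pfh < upper:
            return sil
    return None


def naive_years_to_demonstrate(pfh: float) -> float:
    """Years of failure-free operation implied by the naive 1/pfh-hours rule."""
    return (1.0 / pfh) / HOURS_PER_YEAR


if __name__ == "__main__":
    print(sil_for_pfh(5e-6))  # prints 1
    print(round(naive_years_to_demonstrate(1e-5), 1))  # prints 11.4
```

Under this sketch, a pfh of exactly 10^-5 falls outside all four bands. Demonstrating even the SIL 1 upper bound needs roughly 11.4 years of failure-free operation.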
5. What would provide adequate evidence for a pfh of 10^-5?
- Sufficient operational measurements
- Proof of correct implementation of a correct specification
- What do we actually use?
  - Testing
  - Process-based evidence
  - Compliance with standards
6. Sufficient Operational Measurements
- For a pfh of 10^-n, at least 10^n hours without unsafe failure or modification.
- Such criteria are used for ETOPS certification of aircraft engines.
- Such an approach is impractical for most safety-related transport systems.
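The 10^n-hours criterion is only a rule of thumb. The sketch below shows why, assuming a constant failure rate (exponential model) and operation that is representative of real use. The function name `failure_free_hours` is hypothetical. The sketch computes how long a system must run without failure before a given pfh can be claimed at a chosen statistical confidence.

```python
import math

HOURS_PER_YEAR = 24 * 365.25


def failure_free_hours(target_pfh: float, confidence: float) -> float:
    """Failure-free hours needed to claim rate <= target_pfh at the given confidence.

    Assumes a constant failure rate and representative operation. If the true
    rate were target_pfh, the chance of seeing no failure in t hours would be
    exp(-target_pfh * t); we require that chance to be at most 1 - confidence.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) / target_pfh


if __name__ == "__main__":
    for confidence in (1 - math.exp(-1), 0.95, 0.99):
        hours = failure_free_hours(1e-5, confidence)
        print(f"{confidence:.0%}: {hours:,.0f} hours (~{hours / HOURS_PER_YEAR:.1f} years)")
```

Under these assumptions, 10^n failure-free hours gives only about 63% confidence. Reaching 99% confidence needs about 4.6 × 10^n hours, which is more than 50 years for a pfh of 10^-5.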
7. Proof of Correctness
- Proof is an important form of verification. It can show that a system meets its specification, but provides no absolute information about the probability of unsafe failure.
- It is very difficult to prove that all possible unsafe system states have been considered.
- Full formal proof is very expensive.
8. What do we actually do?
- Testing
- Process-based evidence
- Compliance with standards
9. Testing
- What can testing tell us?
  - If the tests were statistically representative of operational use, then enough failure-free tests would demonstrate the pfh statistically.
  - If a mathematical analysis had established equivalence classes, then testing one member of each class would allow an inductive proof that there could be no failures.
  - How the system behaves on the tests.
  - Nothing else?
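The equivalence-class case can be made concrete with a toy sketch. Everything here is hypothetical: the function `overspeed_brake`, the 8-bit speed input and the limit of 120. Analysing the one-line code shows that the output can change only at the limit, which gives two classes. If that analysis is trusted, one passing test per class covers every input. The domain is tiny here, so the sketch also confirms the class argument exhaustively. That exhaustive check is exactly what is infeasible for realistic systems.

```python
# Hypothetical overspeed check on an 8-bit speed reading (0..255).
SPEED_LIMIT = 120


def overspeed_brake(speed: int) -> bool:
    """Command the brake when the measured speed exceeds the limit."""
    return speed > SPEED_LIMIT


# Equivalence classes from analysing the code: within each class the output
# is the same, so one representative test stands for the whole class.
EQUIVALENCE_CLASSES = {
    "at_or_below_limit": (range(0, SPEED_LIMIT + 1), False),
    "above_limit": (range(SPEED_LIMIT + 1, 256), True),
}


def test_one_per_class() -> bool:
    """Run one representative test per class (the first member of each range)."""
    return all(
        overspeed_brake(members[0]) == expected
        for members, expected in EQUIVALENCE_CLASSES.values()
    )


def classes_cover_domain() -> bool:
    """Check that the classes partition 0..255, each value appearing exactly once."""
    covered = sorted(v for members, _ in EQUIVALENCE_CLASSES.values() for v in members)
    return covered == list(range(256))


def class_argument_holds() -> bool:
    """Exhaustively confirm every member behaves like its class (only feasible here)."""
    return all(
        overspeed_brake(v) == expected
        for members, expected in EQUIVALENCE_CLASSES.values()
        for v in members
    )


if __name__ == "__main__":
    print(test_one_per_class(), classes_cover_domain(), class_argument_holds())
```

The induction rests entirely on the analysis. Suppose `overspeed_brake` contained a hidden special case, such as different behaviour at one particular speed. One test per class would then miss it.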
10. Process-based evidence
- Good processes do not guarantee safe products, but poor processes almost guarantee unsafe ones.
- Good processes are essential if you need to trust their output (e.g. version control).
- The output from a good process may provide useful evidence.
  - For example, if you can trust a proof process, the proof may tell you something about the system's properties.
11. Compliance with standards
- "The nice thing about standards is that there are so many to choose from." (Andrew Tanenbaum)
- Standards result from negotiation in committee, often with strong vested interests from industry.
- It would be surprising if they represented best practice, and astonishing if they led to radical improvements.
- Much effort goes into meeting standards that would be better spent improving safety.
12. An aside on SIL 0
- If your safety argument allows the use of components with pfh > 10^-5, then IEC 61508 assumes that normal industrial software will be good enough. That is absurd.
- Little industrial or commercial software has an MTBF approaching one year (about 8,766 hours, a failure rate of roughly 1.1 × 10^-4 per hour),
  - nor does it come with a safety analysis or a failure history.
- I believe that all safety-related software should be developed to higher standards than almost all industrial software has been to date.
13. An aside about maintenance
- In principle, any system change invalidates all the operational history of that system,
  - unless you can prove that the change has some restricted impact (which, typically, you cannot).
- So should all the original assurance activities be repeated?
  - Obviously yes, although some of the outputs may be reusable.
- Does this happen? Not in my experience.
- It seems likely that we shall see an increasing number of incidents caused by defects introduced in maintenance.
14. Safety Assurance: the state of practice
- There is insufficient empirical evidence to justify even the pfh associated with SIL 1, to 99% confidence.
- Development methods and tools in common use are too informal to support reasoning about correctness.
- So most attention is given to process issues and conformance with standards, despite the very weak causal link with safety.
- We usually get away with it because people are very careful and try very hard (and very expensively).
- It seems unlikely that this approach will scale up.
15.
- "We are like the barber-surgeons of earlier ages, who prided themselves on the sharpness of their knives and the speed with which they dispatched their duty: either shaving a beard or amputating a limb.
- Imagine the dismay with which they greeted some ivory-towered academic who told them that the practice of surgery should be based on a long and detailed study of human anatomy, on familiarity with surgical procedures pioneered by great doctors of the past, and that it should be carried out only in a strictly controlled bug-free environment, far removed from the hair and dust of the normal barber's shop." (Professor Sir Tony Hoare, 1984)
16. A possible future
- Greater rigour with minimal innovation
- Minimal defect construction
- Maintenance as the central activity
- Licensing of independent safety assessors
- New-generation Safe COTS components
- Regulation to drive radical change
17. Greater rigour with minimal innovation
- Our systems are among the most complex ever attempted. We must adopt the power of mathematics to master that complexity.
- "A good scientist is a person with original ideas. A good engineer is a person who makes a design that works with as few original ideas as possible. There are no prima donnas in engineering." (Freeman Dyson, 2001)
18. Minimal defect construction
- Dijkstra observed in 1972 that most of the cost of developing software came from the effort required to remove defects.
- Praxis's Correct by Construction (CbC) methods are delivering < 0.04 defects/KLoC with a productivity of > 25 LoC/person-day.
- That should become the benchmark for professional work in safety-related systems. If your methods do not deliver such high quality at such low cost, change to CbC.
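To give a sense of scale, the sketch below applies the two CbC figures from this slide to a hypothetical 100 KLoC system. The slide gives bounds (< 0.04 defects/KLoC and > 25 LoC/person-day), so the results are upper bounds on residual defects and effort, not predictions. The system size and the function names are assumptions for illustration.

```python
# Figures quoted for Praxis Correct by Construction work; both are bounds.
MAX_DEFECTS_PER_KLOC = 0.04
MIN_LOC_PER_PERSON_DAY = 25


def residual_defects_upper_bound(kloc: float) -> float:
    """Upper bound on delivered defects for a system of the given size."""
    return kloc * MAX_DEFECTS_PER_KLOC


def effort_upper_bound_person_days(kloc: float) -> float:
    """Upper bound on development effort, in person-days."""
    return kloc * 1000 / MIN_LOC_PER_PERSON_DAY


if __name__ == "__main__":
    size_kloc = 100  # hypothetical system size
    print(f"defects <= {residual_defects_upper_bound(size_kloc):.1f}")  # defects <= 4.0
    print(f"effort  <= {effort_upper_bound_person_days(size_kloc):,.0f} person-days")  # 4,000
```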
19. Maintenance as the central activity
- A successful system will spend far more time being used and maintained than being developed.
- Our development methods and tools, and our assessment and certification protocols, should focus on safe and cost-effective maintenance.
20. Licensing of independent safety assessors
- Even with far better methods and tools, safety assessment and certification will continue to depend on judgement.
- We need to enforce standards of competence (education, training and experience) for the people whom society trusts to take such decisions.
21. New-generation Safe COTS components
- Most COTS (commercial off-the-shelf) components have not been developed to be highly dependable, and do not come with the evidence needed to allow adequate safety assessment.
- We could redevelop the entire suite of core COTS components for a few billion.
- This would be a worthwhile focus for international engineering collaboration.
22. Conclusion
- Current practices cannot be justified: they are unsafe and/or too expensive. (Either way, not ALARP.)
- Radical change must be created: progress is too slow.
- Software engineers need competence in mathematics (discrete and continuous) and statistics. This should be core curriculum.
- All safety-related systems should be formally specified and developed using fully-defined languages supported by powerful static analysis tools. Not C or C++.
- Safety assessment should be based on the best practicable evidence, evaluated by a licensed assessor.
- Core COTS components must be re-implemented properly, or avoided.