Engineering Judgement - PowerPoint PPT Presentation

About This Presentation
Title:

Engineering Judgement

Slides: 23
Provided by: martyn2

Transcript and Presenter's Notes

Title: Engineering Judgement


1
Engineering Judgement
  • Martyn Thomas
  • Visiting Professor of Software Engineering
  • Oxford University Computing Laboratory
  • martyn_at_thomas-associates.co.uk

2
Engineering Judgement
"When I hear the words 'engineering judgement' I
know they are just going to make up
numbers." (Richard Feynman, 1988)
3
The argument in brief
  • Almost all safety-related systems have target
    failure probabilities (pfh) below 10^-5/hour
  • Assuring such a pfh would require evidence that
    is rarely available at the time of certification.
  • Assessors therefore rely on their engineering
    judgement. In effect, they make up numbers.
  • Accepting that this is inevitable, we need to
    make radical changes in the way we develop and
    maintain systems, and certify them.

4
Safety Integrity Levels: High demand
IEC 61508
Even SIL 1 is beyond reasonable assurance by
testing. It would take 10 years under
operational conditions, with no failures and no
modifications. What sense does it make to attempt
to distinguish single factors of 10 in this way?
Do we really know so much about the effect of
different development methods on product failure
rates? Of course not!
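
The "10 years" figure can be checked with simple arithmetic. The sketch below assumes the upper bounds commonly quoted for IEC 61508 high-demand (continuous) mode, since the slide's table did not survive in this transcript; the names SIL_UPPER_BOUND_PFH, hours_for_target and years_for_hours are illustrative only. It uses the rule of thumb that a target of 10^-n per hour needs of the order of 10^n failure-free hours.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours of continuous operation

# Assumption: upper bounds on dangerous failures per hour for high-demand
# or continuous mode, as commonly quoted from IEC 61508.
SIL_UPPER_BOUND_PFH = {1: 1e-5, 2: 1e-6, 3: 1e-7, 4: 1e-8}


def hours_for_target(pfh: float) -> float:
    """Failure-free operating hours of the order 1/pfh (rule of thumb)."""
    return 1.0 / pfh


def years_for_hours(hours: float) -> float:
    """Convert continuous operating hours to years."""
    return hours / HOURS_PER_YEAR


if __name__ == "__main__":
    for sil, pfh in SIL_UPPER_BOUND_PFH.items():
        hours = hours_for_target(pfh)
        print(f"SIL {sil}: about {hours:.0e} hours, "
              f"or {years_for_hours(hours):,.0f} years of continuous operation")
```

For SIL 1 this gives roughly 11.4 years of continuous, failure-free, unmodified operation, and each further SIL multiplies that by ten.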
5
What would provide adequate evidence for 10^-5 pfh?
  • Sufficient operational measurements
  • Proof of correct implementation of a correct
    specification
  • What do we actually use?
  • Testing
  • Process-based evidence
  • Compliance with standards

6
Sufficient Operational Measurements
  • For 10^-n pfh, at least 10^n hours without unsafe
    failure or modification.
  • Such criteria are used for ETOPS certification of
    aircraft engines
  • Such an approach is impractical for most
    safety-related transport systems
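
How much confidence do 10^n failure-free hours actually buy? The sketch below is one way to estimate it, assuming a constant failure rate (an exponential model), which the slides do not state. Under that assumption, if the true rate were as high as the target, the chance of seeing no failure in t hours would be exp(-target × t), so the confidence in the claim is one minus that. The function names are illustrative.

```python
import math


def confidence_from_failure_free_hours(target_pfh: float, hours: float) -> float:
    """Confidence that the true failure rate is below target_pfh after
    `hours` of operation with no failure (assumes a constant failure rate)."""
    return 1.0 - math.exp(-target_pfh * hours)


def hours_needed(target_pfh: float, confidence: float) -> float:
    """Failure-free hours needed to claim target_pfh at a given confidence
    (same constant-rate assumption)."""
    return -math.log(1.0 - confidence) / target_pfh


if __name__ == "__main__":
    target = 1e-5  # the 10^-5 per hour figure used in the slides
    print(f"Confidence after 10^5 hours: "
          f"{confidence_from_failure_free_hours(target, 1e5):.0%}")
    print(f"Hours needed for 99% confidence: {hours_needed(target, 0.99):,.0f}")
```

On this model, 10^n failure-free hours support a 10^-n claim with only about 63% confidence. Reaching 99% confidence takes about 4.6 times as long, which is why "at least" matters in the bullet above.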

7
Proof of Correctness
  • Proof is an important form of verification. It
    can show that a system meets its specification,
    but provides no absolute information about the
    probability of unsafe failure.
  • It is very difficult to prove that all possible
    unsafe system states have been considered.
  • Full formal proof is very expensive.

8
What do we actually do?
  • Testing
  • Process-based evidence
  • Compliance with standards

9
Testing
  • What can testing tell us?
  • If the tests were statistically representative of
    the operation, then sufficient tests would show
    pfh.
  • If a mathematical analysis had established
    equivalence classes, then testing a member of
    each class would allow an inductive proof that
    there could be no failures.
  • How the system behaves on the tests.
  • Nothing else?
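
The equivalence-class argument can be illustrated with a toy example. Everything in this sketch is hypothetical: the overspeed_trip function, the 80 km/h limit and the 0 to 300 km/h domain are invented for illustration. One representative of each class is tested. The exhaustive check stands in for the mathematical analysis that would justify the classes, and it is only feasible here because the domain is tiny.

```python
SPEED_LIMIT_KMH = 80   # hypothetical trip threshold
MAX_SPEED_KMH = 300    # hypothetical upper end of the input domain


def overspeed_trip(speed_kmh: int) -> bool:
    """Hypothetical function under test: trip when speed exceeds the limit."""
    return speed_kmh > SPEED_LIMIT_KMH


# Claimed equivalence classes: (members, expected output for every member).
CLASSES = {
    "at_or_below_limit": (range(0, SPEED_LIMIT_KMH + 1), False),
    "above_limit": (range(SPEED_LIMIT_KMH + 1, MAX_SPEED_KMH + 1), True),
}


def representatives_pass() -> bool:
    """Test a single member (the first) of each class."""
    return all(overspeed_trip(members[0]) == expected
               for members, expected in CLASSES.values())


def classes_cover_domain() -> bool:
    """Check that the classes partition the whole input domain."""
    covered = sorted(s for members, _ in CLASSES.values() for s in members)
    return covered == list(range(0, MAX_SPEED_KMH + 1))


def classes_are_uniform() -> bool:
    """Stand-in for the analysis: every member behaves like its representative."""
    return all(overspeed_trip(s) == overspeed_trip(members[0])
               for members, _ in CLASSES.values() for s in members)


if __name__ == "__main__":
    print(representatives_pass(), classes_cover_domain(), classes_are_uniform())
```

Only when coverage and uniformity have been established independently do the two representative tests say anything about the other 299 inputs. Without that analysis, the tests show how the system behaves on the tests.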

10
Process-based evidence
  • Good processes do not guarantee safe products
  • but poor processes almost guarantee unsafe ones
  • Good processes are essential if you need to trust
    their output (e.g. version control).
  • The output from a good process may provide useful
    evidence.
  • For example, if you can trust a proof process,
    the proof may tell you something about the
    system's properties.

11
Compliance with standards
  • "The nice thing about standards is that there are
    so many to choose from." (Andrew Tanenbaum)
  • Standards result from negotiation in committee,
    often with strong vested interest from industry.
  • It would be surprising if they represented best
    practice
  • and astonishing if they led to radical
    improvements
  • Much effort goes into meeting standards that
    would be better spent improving safety.

12
An aside on SIL 0
  • If your safety argument allows the use of
    components with pfh > 10^-5 then IEC 61508 assumes
    that normal industrial software will be good
    enough. That is absurd.
  • Little industrial/commercial software has an MTBF
    approaching one year
  • nor does it come with a safety analysis, or
    failure history
  • I believe that all safety-related software should
    be developed to higher standards than almost all
    industrial software has been to date.

13
An aside about maintenance
  • In principle, any system change invalidates all
    the operational history of that system
  • unless you can prove that the change has some
    restricted impact (which, typically, you cannot)
  • So should all the original assurance activities
    be repeated?
  • Obviously, yes, although it may be possible to
    re-use some of the outputs.
  • Does this happen? Not in my experience.
  • It seems likely that we shall see an increasing
    number of incidents caused by defects introduced
    in maintenance.
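
The first bullet has a direct consequence for operational evidence: under the conservative rule, the failure-free hours that count are only those accumulated since the last change. A minimal sketch, assuming a hypothetical event log with an Event record type that is not part of the presentation:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    kind: str           # "operate", "modify" or "unsafe_failure"
    hours: float = 0.0  # operating hours, used only for "operate" events


def creditable_hours(events: Iterable[Event]) -> float:
    """Failure-free operating hours since the last modification or unsafe
    failure. Any change resets the count, since no proof of restricted
    impact is assumed to be available."""
    total = 0.0
    for event in events:
        if event.kind == "operate":
            total += event.hours
        elif event.kind in ("modify", "unsafe_failure"):
            total = 0.0
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return total


if __name__ == "__main__":
    log = [Event("operate", 50_000), Event("modify"), Event("operate", 2_000)]
    print(creditable_hours(log))
```

In this example, 50,000 hours of history are discarded by a single modification and only the 2,000 hours since then can be credited.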

14
Safety Assurance: the state of practice
  • There is insufficient empirical evidence to
    justify even the pfh associated with SIL 1, to
    99% confidence.
  • Development methods and tools in common use are
    too informal to support reasoning about
    correctness.
  • So most attention is given to process issues and
    conformance with standards, despite the very weak
    causal link with safety.
  • We usually get away with it because people are
    very careful and try very hard (and very
    expensively).
  • It seems unlikely that this approach will scale
    up.

15
  • We are like the barber-surgeons of earlier ages,
    who prided themselves on the sharpness of their
    knives and the speed with which they dispatched
    their duty -- either shaving a beard or
    amputating a limb.
  • Imagine the dismay with which they greeted some
    ivory-towered academic who told them that the
    practice of surgery should be based on a long and
    detailed study of human anatomy, on familiarity
    with surgical procedures pioneered by great
    doctors of the past, and that it should be
    carried out only in a strictly controlled
    bug-free environment, far removed from the hair
    and dust of the normal barber's shop. (Professor
    Sir Tony Hoare, 1984)

16
A possible future
  • Greater rigour with minimal innovation
  • Minimal defect construction
  • Maintenance as the central activity
  • Licensing of independent safety assessors
  • New-generation Safe COTS components
  • Regulation to drive radical change

17
Greater rigour with minimal innovation
  • Our systems are among the most complex ever
    attempted. We must adopt the power of mathematics
    to master that complexity.
  • "A good scientist is a person with original ideas.
    A good engineer is a person who makes a design
    that works with as few original ideas as
    possible. There are no prima donnas in
    engineering." (Freeman Dyson, 2001)

18
Minimal defect construction
  • Dijkstra observed in 1972 that most of the cost
    in developing software came from the effort
    required to remove the defects.
  • Praxis Correct by Construction methods are
    delivering < 0.04 defects/KLoC with a productivity
    of > 25 LoC/person-day.
  • That should become the benchmark for professional
    work in safety-related systems. If your methods
    do not deliver such high quality at such low
    costs, change to CbC.
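
To see what those two figures mean for a whole project, the arithmetic can be sketched as follows. The 100,000-line system size is a made-up example, and the function names are illustrative. The rates are the bounds quoted on the slide, so the results are upper bounds.

```python
DEFECTS_PER_KLOC = 0.04   # slide figure: fewer than 0.04 defects per KLoC
LOC_PER_PERSON_DAY = 25   # slide figure: more than 25 LoC per person-day


def max_expected_defects(size_loc: int) -> float:
    """Upper bound on delivered defects at the quoted defect density."""
    return size_loc / 1000 * DEFECTS_PER_KLOC


def max_effort_person_days(size_loc: int) -> float:
    """Upper bound on effort at the quoted productivity."""
    return size_loc / LOC_PER_PERSON_DAY


if __name__ == "__main__":
    size = 100_000  # hypothetical system size in lines of code
    print(f"Defects: fewer than {max_expected_defects(size):.0f}")
    print(f"Effort: less than {max_effort_person_days(size):,.0f} person-days")
```

For the hypothetical 100 KLoC system, the benchmark implies fewer than 4 delivered defects for less than 4,000 person-days of effort.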

19
Maintenance as the central activity
  • A successful system will spend far more time
    being used and maintained than being developed.
  • Our development methods and tools, and our
    assessment and certification protocols, should
    focus on safe and cost-effective maintenance.

20
Licensing of independent safety assessors
  • Even with far better methods and tools, safety
    assessment and certification will continue to
    depend on judgement.
  • We need to enforce standards of competence
    (education, training and experience) for the
    people whom society trusts to take such
    decisions.

21
New-generation Safe COTS components
  • Most COTS components have not been developed to
    be highly dependable and do not come with the
    evidence needed to allow adequate safety
    assessment.
  • We could redevelop the entire suite of core COTS
    components for a few B.
  • This would be a worthwhile focus for
    international engineering collaboration.

22
Conclusion
  • Current practices cannot be justified: they are
    unsafe and/or too expensive. (Either way, not
    ALARP).
  • Radical change must be created: progress is too
    slow.
  • Software engineers need competence in mathematics
    (discrete and continuous) and statistics. Core
    curriculum.
  • All safety-related systems should be formally
    specified and developed using fully-defined
    languages supported by powerful static analysis
    tools. Not C or C++.
  • Safety assessment should be based on the best
    practicable evidence, evaluated by a licensed
    assessor.
  • Core COTS components must be re-implemented
    properly - or avoided.