An Investigation of the Therac-25 Accidents

1
An Investigation of the Therac-25 Accidents
  • Nancy G. Leveson
  • Clark S. Turner
  • IEEE, 1993
  • Presented by Jack Kustanowitz
  • April 26, 2005
  • University of Maryland

2
Overview
  • What happened
  • Accident history
  • Development history
  • Technical problems
  • Company responses
  • Lessons learned
  • Ethical questions
  • Resources

3
What Happened
  • Between June 1985 and January 1987, 6 known
    accidents involving massive overdoses, causing
    death and serious injury

4
Accident History
  • June 3, 1985: First overdose
  • July-Dec 1985: Two more overdoses, patient sues
    AECL and hospital, two requests for modifications
  • Jan-Feb 1986: Denial of possibility of overdose
  • Mar-Apr 1986: Two more overdoses, software blamed
  • May-Dec 1986: FDA declares Therac-25 defective,
    CAPs (Corrective Action Plans) sent back and
    forth between FDA and AECL. First Therac-25 user
    group meeting.
  • Jan 1987: Sixth overdose
  • Feb-July 1987: More CAPs back and forth until
    fifth revision of CAP sent to FDA
  • Nov 1988: Final safety analysis report issued
  • Grueling first-hand descriptions of what it felt
    like to get a massive radiation overdose

5
Development History
  • Therac-6: 6 MeV accelerator for x-rays
  • Therac-20: 20 MeV dual-mode (x-rays or electrons)
    • Separate hardware interlocks
  • Therac-25: 25 MeV dual-mode
    • All safeguards done in software
  • Testing
    • Unit and software testing was minimal, with
      most effort directed at the integrated system
      test
  • Software written in assembly on a PDP-11

6
The Operator Interface

7
The Operator Interface
  • At first, the operator had to enter information at
    the treatment table, and then re-enter it at a
    console in the control room
    • Operators complained, so this safeguard was removed
  • Error codes are reported on the screen with no
    English explanation
    • Example (East Texas Cancer Center): "Malfunction
      54" reported, caused by "dose input 2". An AECL
      technician testified that "dose input 2" means
      the dose delivered was either too high or too low
      (!)
  • "Treatment Pause" after a non-critical error, which
    the operator can ignore by pressing P
    • Causes operators to become desensitized to errors

8
Example Bugs
  • Data Entry Bug
    • Setting the bending magnets takes 8 seconds
    • Delay subroutine shares memory with the
      data entry subroutine
    • So data changes made within those 8 seconds are
      wiped out when Delay exits!
    • The bug only shows up with proficient
      users who finish data entry in < 8 seconds
  • Set-Up Test Bug
    • On every 256th pass through Set-Up (one-byte
      counter), the upper collimator is not checked
    • Problem if the operator hits "set" exactly when
      the counter rolls over to 0
  • These kinds of bugs are notoriously difficult to
    track down
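
The Set-Up Test bug can be illustrated with a minimal Python sketch. This is not the original PDP-11 assembly: the function names, the callback, and the loop structure are all hypothetical, and the only detail taken from the slide is a one-byte counter whose value 0 causes the collimator check to be skipped. The second function shows one way to avoid the wrap (assigning a fixed nonzero value instead of incrementing), offered as an assumption about a reasonable fix rather than a record of what was shipped.

```python
# Hypothetical sketch of a one-byte counter rollover; not AECL's code.

def run_setup_passes(passes, collimator_in_position):
    """Return the pass numbers on which the collimator check was skipped."""
    counter = 0  # models a one-byte variable holding values 0..255
    skipped = []
    for n in range(1, passes + 1):
        counter = (counter + 1) & 0xFF  # incrementing 255 wraps to 0
        if counter != 0:
            # Nonzero is read as "a check is still required".
            collimator_in_position()
        else:
            # Zero is read as "nothing to check", so the check is skipped.
            skipped.append(n)
    return skipped


def run_setup_passes_fixed(passes, collimator_in_position):
    """Same loop, but the flag is assigned rather than incremented."""
    skipped = []
    for n in range(1, passes + 1):
        flag = 1  # a fixed nonzero value cannot wrap to 0
        if flag != 0:
            collimator_in_position()
        else:
            skipped.append(n)
    return skipped


print(run_setup_passes(600, lambda: True))        # [256, 512]
print(run_setup_passes_fixed(600, lambda: True))  # []
```

The skipped passes are rare and depend on timing, which matches the slide's point that such bugs are hard to reproduce in system-level testing.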

9
AECL Responses
  • Denial
    • "We did not believe that there could have been
      any accelerator malfunction"
  • Incremental, local band-aid fixes
    • Example: P key removed to prevent operators
      from ignoring warnings
  • Dragging feet, doing the minimum of the FDA's requests
    • Perhaps justified? See ethics discussion
  • Knee-jerk responses: fix the bugs as they are
    reported
  • Difficulty reproducing bugs (that only happened
    once in several hundred runs)

10
Lessons: General
  • Focusing on particular software bugs is not the
    way to make a safe system
    • Mistaken assumption that fixing one error would
      prevent further accidents
    • "There is always another software bug"
  • It is a bad idea to remove independent hardware
    interlocks, and to believe too much in software
  • Assume software will fail, and handle that
    properly, rather than trying to write perfect
    software
  • Don't believe in numerical claims
    • Risk assessment can be like the captured spy: if
      you torture it long enough, it will tell you
      anything you want to know
  • Record the reasons for design decisions (like
    duplicate data entry)
  • Design for the worst case
  • Don't enhance usability at the expense of safety
  • Power of user groups to cause change when
    companies drag their feet
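
The advice to assume software will fail can be sketched as two layers that do not share state. The Python below is only an illustration under stated assumptions: `MAX_DOSE`, `deliver`, and the `measure` callback are hypothetical names, the units are arbitrary, and the second check stands in for an interlock that would be built in hardware on a real machine. The point it shows is that the second layer looks at what was actually measured, not at what the control software believes it requested.

```python
# Hypothetical sketch of defense in depth; not a model of the Therac-25.

class InterlockTripped(Exception):
    """Raised when either safety layer refuses to continue."""


MAX_DOSE = 200  # hypothetical limit, arbitrary units


def software_check(requested):
    """First layer: validate the request inside the control software."""
    return 0 < requested <= MAX_DOSE


def independent_interlock(measured):
    """Second layer: stand-in for a hardware interlock on the measured value."""
    return measured <= MAX_DOSE


def deliver(requested, measure, check=software_check):
    if not check(requested):
        raise InterlockTripped("software check rejected the request")
    measured = measure(requested)  # what the machine actually produced
    if not independent_interlock(measured):
        raise InterlockTripped("independent interlock tripped")
    return measured


def demo():
    ok = deliver(100, measure=lambda r: r)
    try:
        # Simulated control bug: the request is valid, the output is not.
        deliver(100, measure=lambda r: r * 100)
        outcome = "delivered"
    except InterlockTripped as exc:
        outcome = str(exc)
    return ok, outcome


print(demo())  # (100, 'independent interlock tripped')
```

In the simulated fault the request passes the software check, so only the independent layer catches it.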

11
Lessons: Software Engineering
  • Documentation should not be an afterthought
  • Establish QA practices and standards
  • Keep designs simple
  • Design audit trails and logging from the
    beginning
  • Perform extensive testing and formal analysis at
    the module and software level, rather than
    relying on system-level testing
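
As a rough illustration of designing audit trails in from the beginning, the sketch below uses Python's standard `logging` module to write one timestamped line per operator action. The logger name, the `record_event` helper, and the field names are hypothetical; only "Malfunction 54" and the electron mode come from the slides.

```python
# Hypothetical audit-trail sketch using the standard logging module.
import io
import logging


def make_audit_logger(stream):
    """Build a logger that writes one timestamped line per event to stream."""
    logger = logging.getLogger("treatment_audit")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep audit lines out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def record_event(logger, operator, event, **details):
    """Log who did what, with details in a stable (sorted) field order."""
    fields = " ".join(f"{k}={v!r}" for k, v in sorted(details.items()))
    logger.info("operator=%s event=%s %s", operator, event, fields)


stream = io.StringIO()
audit = make_audit_logger(stream)
record_event(audit, "op1", "mode_selected", mode="electron")
record_event(audit, "op1", "malfunction", code=54, action="pause")
print(stream.getvalue())
```

A record like this would let an investigator reconstruct the sequence of entries and overrides after an incident, rather than relying on operator recollection.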

Summary of this course!

12
Ethical Questions
  • 500 patients treated in East Texas before the first
    serious accident
  • Too much government oversight slows progress
  • If 1 person was getting hurt for every 1000
    helped, would you take the machine out of use?
    How about 1:100? 1:10000? Where's the line?

13
Resources
  • http://www.technology.niagarac.on.ca/courses/ctec1435/notes/Therac-25/SouthPark/01.htm