Title: Decisions, Decisions
1. Risk & Reliability: An Application of High Reliability Theory
TIAS Lecture, 27 April 2006, afternoon session: Risk Management
Gerd Van Den Eede, Department of Business Administration, VLEKHO Business School, Brussels (gvdeede_at_vlekho.wenk.be)
PhD Candidate at UvT, supervised by Prof. Dr. P. Ribbers & Dr. B. Van de Walle
2. Overview
- An Era of Complexity
- On Reliability
- On Error
- On Operational Risk
- The Rationale of Reliability
- High Reliability Theory
3. An Era of Complexity
4. Probability of performing perfectly in complex systems
(Table: overall probability of a perfect run as a function of the number of steps, for per-step success probabilities of 0.95, 0.99, 0.999, and 0.9999.)
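The arithmetic behind this table is simple compounding: if each of n independent steps succeeds with probability p, the whole process runs perfectly with probability p^n. A minimal Python sketch of that calculation follows; the step counts are illustrative assumptions, not values from the slide:

```python
# Overall probability that a multi-step process runs perfectly,
# assuming n independent steps in series: P(perfect) = p ** n.
per_step_probs = [0.95, 0.99, 0.999, 0.9999]  # per-step values from the slide
step_counts = [1, 25, 50, 100]                # illustrative counts (assumption)

for p in per_step_probs:
    results = {n: round(p ** n, 3) for n in step_counts}
    print(f"p = {p}: {results}")
```

The compounding is unforgiving: at 0.99 per step, a 100-step process completes perfectly only about 37% of the time.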
5. The Systems Space
Bennet & Bennet, 2004
6. Cynefin (kun-evin) Framework
Un-ordered Domains
Ordered Domains
7. Interaction/Coupling Matrix
Ch. Perrow. Normal Accidents, 1984, p. 327.
8. Reflection 1
- Can you think of a process within your organization that is characterized by complexity and/or tight coupling?
- Why is this so? How does the organization deal with it?
9. On Reliability
10. Defining Reliability
- 1. The measurable capability of an object to perform its intended function in the required time under specified conditions. (Handbook of Reliability Engineering, Igor Ushakov, editor)
- 2. The probability of a product performing a specified function without failure, under given conditions, for a specified period of time. (Quality Control Handbook, Joseph Juran, editor)
- 3. The extent of failure-free operation over time. (David Garvin)
11. Quantifying Reliability
- Reliability = (number of actions that achieve the intended result) / (total number of actions taken)
- Unreliability = 1 - Reliability
- It is convenient to use Unreliability as an index, expressed as an order of magnitude (e.g. 10^-2 means that 1 time in 100, the action fails to achieve its intended result).
- Related measure: time or counts between failures, for example transplant cases between organ rejections, or employee work hours between lost-time injuries.
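As a worked sketch of these definitions (the counts below are illustrative assumptions, not figures from the lecture):

```python
import math

# Reliability = successes / total actions; Unreliability = 1 - Reliability.
successes, total = 9_990, 10_000  # illustrative counts (assumption)
reliability = successes / total
unreliability = 1 - reliability

# Express unreliability as an order of magnitude: 10^-3 means the action
# fails to achieve its intended result 1 time in 1,000.
order = math.floor(math.log10(unreliability))
print(f"Reliability   = {reliability:.4f}")
print(f"Unreliability = 10^{order}")

# Related measure: mean number of actions between failures.
failures = total - successes
print(f"Actions between failures = {total / failures:.0f}")
```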
12. Different Views on Reliability
13. Combining Flexibility & Reliability
14. Reliability & Flexibility as vectors
www.physicsclassroom.com/mmedia/vectors/va.html
15. Reflection 2
- What is meant by Reliability in your organization? Is this always the case? On what does it depend?
- What measures are in place to guarantee this reliability?
- How does reliability relate to flexibility?
16. On Operational Risk
17. Criticality of resources depends on the business processes running on these systems!
Marc Geerts, KBC 2004
18. Definition of Operational Risk (Basel II)
- Operational Risk (OpR) is the risk of loss resulting from inadequate or failed internal processes, people or systems, and from external events. (BIS, Basel II)
- The BIS / EU definition:
- includes legal & tax risk
- excludes strategic, reputational and systemic risks
KBC, Ph. Theus, July 2004
19. Operational risk & other risk categories
Business Risk
Operational Risk
Credit Risk
Liquidity Risk
Market Risk
Raft International, 2003
20. Examples of operational risks
- Wrong pricing model (formula) used by dealers
- Double / non (timely) execution of payments
- Collateral not properly executed
- Losses due to internal / external fraud
- Selling the wrong product to the wrong type of customer
- Selling products without proper authorisation or outside the scope of a given license
- Fire, flooding, terrorism
- Etc.
KBC, Ph. Theus, July 2004
21. Recent losses in the financial industry (erisk.com)
OpRisk management is not (only) about trying to avoid the big one that could bring down the bank.
Who is next?
KBC, Ph. Theus, July 2004
22. OPERATIONAL RISK MANAGEMENT includes
- Identification of risks
- Assessment of exposure to risks
- Mitigation of risks
- Monitoring and reporting
- The degree of formality and sophistication of the bank's operational risk management framework should be commensurate with the bank's risk profile.
KBC, Ph. Theus, July 2004
23. Reflection 3
- What is the worst thing that can happen to your organization?
- Is it sufficiently covered? Is there a shared opinion about how the organization should deal with this risk?
- With risk in general?
24. On Error
25. Nominal human error rates for selected activities (table)
26. Different approaches
- The problem of human error can be viewed in two ways:
- The person approach
- The system approach
- Each has its model of error causation, and each model gives rise to different philosophies of error management.
27. Person approach: basis
- The long-standing and widespread tradition of the person approach focuses on the unsafe acts (errors and procedural violations) of people on the front line.
- This approach views these unsafe acts as arising primarily from aberrant mental processes such as forgetfulness, inattention, poor motivation, carelessness, negligence, and recklessness.
- People are viewed as free agents capable of choosing between safe and unsafe modes of behavior.
- If something goes wrong, a person or group must be responsible.
28. Person approach: why?
- Blaming individuals is emotionally more satisfying than targeting institutions.
- Uncoupling a person's unsafe acts from any institutional responsibility is in the interest of managers.
- The person approach is also legally more convenient.
29. Person approach: shortcomings
- Three important features of human error tend to be overlooked:
- It is often the best people who make the worst mistakes: error is not the monopoly of an unfortunate few.
- Far from being random, mishaps tend to fall into recurrent patterns. The same set of circumstances can provoke similar errors, regardless of the people involved.
- The pursuit of greater reliability is seriously impeded by an approach that does not seek out and remove the error-provoking properties within the system.
30. System approach
- Humans are fallible and errors are to be expected, even in the best organizations.
- Errors are seen as consequences rather than causes, having their origins not so much in the perversity of human nature as in upstream systemic factors.
- Although we cannot change the human condition, we can change the conditions under which humans work.
- A central idea is that of system defenses. All hazardous technologies possess barriers and safeguards. When an adverse event occurs, the important issue is not who blundered, but how and why the defenses failed.
31. (No Transcript)
32. Swiss Cheese Model
- Defenses, barriers, and safeguards occupy a key position in the system approach. High-technology systems have many defensive layers:
- some are engineered (alarms, physical barriers, automatic shutdowns),
- others rely on people (surgeons, anesthetists, pilots, control room operators),
- and others depend on procedures and administrative controls.
- In an ideal world, each defensive layer would be intact. In reality, they are more like slices of Swiss cheese, having many holes; although, unlike in the cheese, these holes are continually opening, shutting, and shifting their location.
- The presence of holes in any one slice does not normally cause a bad outcome. Usually a bad outcome can happen only when the holes in many layers momentarily line up to permit a trajectory of accident opportunity, bringing hazards into damaging contact with victims.
33. The holes in the defenses arise for two reasons:
- Active failures
- Latent conditions
- Latent conditions can translate into error-provoking conditions within the workplace (time pressure, understaffing, inadequate equipment, fatigue, and inexperience).
- They can create long-lasting holes and weaknesses in the defenses (untrustworthy alarms and indicators, unworkable procedures, design and construction deficiencies).
- Latent conditions may lie dormant within the system for many years before they combine with active failures and local triggers to create an accident opportunity.
- Active failures are often hard to foresee, but latent conditions can be identified and remedied before an adverse event occurs.
34. Reflection 4
- Where are the holes in your cheese? Are they dynamic?
- How do you deal with them? Do you reinforce the layers? Do you implement new layers?
35. The Rationale of Reliability
36. The Edge
(Diagram, Patrick Hudson, Leiden University: return on capital invested plotted against safety regimes, running from "inherently safe" ("no need") through "normally safe" (safety management systems, safety culture) to "the edge".)
37. (Diagram: flexibility and reliability.)
38. The interaction between Reliability and Flexibility
(Diagram; labels: financial performance; mindfulness; flexibility (R&D, innovation, adaptability to changes); long-term reliability; short-term reliability; stakeholders' confidence; response.)
39. (Figure; source: 1830, p. 74.)
40. Relationship between production and protection (Reason, 1997)
(Diagram; the production-protection space runs between bankruptcy and catastrophe, with a parity zone and separate regions for high-hazard and low-hazard ventures.)
41. The lifespan of a hypothetical organization (Reason, 1997)
(Diagram; the trajectory runs between the extremes of bankruptcy and catastrophe.)
42. Achieving a small α?
43. ... or achieving a small β?
44. Reduced scope of action
(Diagram, adapted from Reason, 1997: over the history of the system, continuous updating of procedures to avoid recurrence of past accidents and incidents narrows the scope of regulated action, while actions sometimes needed to get the job done fall outside it.)
45. Practical drift
(Matrix with quadrants 1-4; Friendly Fire, Snook (2000), p. 186.)
46. Diabolo
(Diagram; labels: individual risks; HRT tools; aggregated risks; HRT solutions; HRT principles; risk categories.)
47. Reflection 5
- What is the null hypothesis in your organization?
- Can your organization be called "working harder" or "working smarter"?
- Are there traces of some kind of practical drift?
48. High Reliability Theory
49. (No Transcript)
50. NAVAL AVIATION MISHAP RATE (chart)
51. Defining High Reliability Theory (HRT)
- "How often could this organization have failed with dramatic consequences?" If the answer to that question is "many thousands of times", the organization is highly reliable.
- Examples: nuclear power plants, aircraft carriers, air traffic control, emergency services, the army, SWIFT, Nissan, railways.
52. Defining High Reliability Organizations (HROs)
- HROs face complexity and tight coupling in the majority of the processes they run.
- HROs are not error-free, but errors don't disable them.
- HROs are forced to learn from even the smallest errors.
53. Mindfulness
(Diagram; labels: HRT; decoupling; process design.)
54. (No Transcript)
55.
- 1. Preoccupation with failure: Systems with higher reliability worry chronically that analytic errors are embedded in ongoing activities and that unexpected failure modes and limitations of foresight may amplify those analytic errors. The people who operate and manage high reliability organizations assume that each day will be a bad day and act accordingly, "but this is not an easy state to sustain, particularly when the thing about which one is uneasy has either not happened, or has happened a long time ago, and perhaps to another organization" (Reason, 1997, p. 37). These systems have been characterized as consisting of "collective bonds among suspicious individuals" and as systems that "institutionalize disappointment". To institutionalize disappointment means, in the words of the head of Pediatric Critical Care at Loma Linda Children's Hospital, "to constantly entertain the thought that we have missed something".
- 2. Reluctance to simplify interpretations: All organizations have to ignore most of what they see in order to get work done. The crucial issue is whether their simplified diagnoses force them to ignore key sources of unexpected difficulties. Mindful of the importance of this tradeoff, systems with higher reliability restrain their temptations to simplify. They do so through such means as diverse checks and balances, adversarial reviews, and cultivation of multiple perspectives. At the Diablo Canyon nuclear power plant, people preserve complexity in their interpretations by reminding themselves of two things: (1) we have not yet experienced all potential failure modes that could occur here; (2) we have not yet deduced all potential failure modes that could occur here.
56.
- 3. Sensitivity to operations: People in systems with higher reliability tend to pay close attention to operations. Everyone, no matter what his or her level, values organizing to maintain situational awareness. Resources are deployed so that people can see what is happening, can comprehend what it means, and can project into the near future what these understandings predict will happen. In medical care settings, sensitivity to operations often means that the system is organized to support the bedside caregiver.
- 4. Cultivation of resilience: Most systems try to anticipate trouble spots, but the higher reliability systems also pay close attention to their capability to improvise and act without knowing in advance what will happen. Reliable systems spend time improving their capacity to do a quick study, to develop swift trust, to engage in just-in-time learning, to simulate mentally, and to work with fragments of potentially relevant past experience.
- 5. Willingness to organize around expertise: Reliable systems let decisions migrate to those with the expertise to make them. Adherence to rigid hierarchies is loosened, especially during high-tempo periods, so that there is a better matching of experience with problems.
Adapted from Karl E. Weick & Kathleen M. Sutcliffe, Managing the Unexpected, Jossey-Bass, 2001.
57. Sensemaking vs. decision-making
- "If I make a decision it is a possession, I take pride in it, I tend to defend it and not listen to those who question it."
- "If I make sense, then this is more dynamic and I listen and I can change it. A decision is something you polish. Sensemaking is a direction for the next period."
-- Paul Gleason
58. (Figure; modified from Richard I. Cook, MD, 1997.)
59. The Culture Premise
(Diagram; elements: top management's beliefs, values, and actions; communication (credible, consistent, salient); perceived values and philosophy (consistency, intensity, consensus); rewards (money, promotion, approval); employees' beliefs, attitudes, and behaviors, expressed as norms.)
Adapted from O'Reilly (1989), "Corporations, Culture, and Commitment: Motivation and Social Control in Organizations", California Management Review, Summer 1989, Vol. 31, No. 4.
60. HROs simultaneously minimize Type I & Type II errors
- HROs are able to strike a balance that minimizes Type I errors (catastrophic failure) while at the same time keeping Type II errors (excessive and costly conservatism) at acceptable levels. (Little, 2005)
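One way to make this balance concrete is as a threshold-setting problem: where an organization sets its alarm threshold determines how many hazards it misses (Type I, in the slide's sense) versus how many costly false alarms it tolerates (Type II). The sketch below is a hypothetical illustration with assumed signal distributions, not a model from the lecture:

```python
import random

# Hypothetical alarm: it fires when a risk signal exceeds a threshold.
# Lowering the threshold cuts missed hazards (Type I here) but raises
# costly false alarms (Type II); an HRO must keep both acceptable.
random.seed(0)
SAFE_MEAN, HAZARD_MEAN = 0.3, 0.7  # assumed signal levels
events = [(random.gauss(HAZARD_MEAN if hazardous else SAFE_MEAN, 0.15), hazardous)
          for hazardous in [True] * 50 + [False] * 950]

for threshold in (0.4, 0.5, 0.6):
    misses = sum(1 for signal, hazardous in events
                 if hazardous and signal <= threshold)           # Type I
    false_alarms = sum(1 for signal, hazardous in events
                       if not hazardous and signal > threshold)  # Type II
    print(f"threshold {threshold}: misses = {misses}, false alarms = {false_alarms}")
```

Raising the threshold trades false alarms for misses; the acceptable balance point depends on the relative cost of each, which for an HRO is dominated by the catastrophic cost of a miss.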
61. Reflection 6
- Recall an experience in any setting in which the request that you "try harder", "be careful", or "stay alert" improved your performance. Why did that work?
- Identify a process in your organization that relies on vigilance. What would you estimate its reliability to be?