Title: Optimal redundancy allocation for information technology disaster recovery in the network economy
1Optimal redundancy allocation for information
technology disaster recovery in the network
economy
- Benjamin B.M. Shao
- IEEE Transactions on Dependable and Secure
Computing, Vol. 2, No. 3, July-September 2005
- Presented by Derek KD Jiang
2Agenda
- Introduction
- Redundancy for IT disaster recovery
- Redundancy allocation model
- Solution procedure
- Examples
- Conclusion
3Introduction
- Modern organizations have become increasingly
reliant on IT to facilitate business operations.
- How to strengthen IT capability so that a company
can prevent, or quickly recover from, disasters
has become a serious concern.
4Introduction
- Perform an impact analysis to:
- Identify the disasters likely to occur in the
environment.
- Evaluate the degree to which IT functions are
vulnerable to those disasters.
- Take the necessary measures to protect those IT
functions according to their importance.
- This paper incorporates redundancy into critical
IT functions and aims to maximize their
survivability against potential disasters.
5Introduction
- Adopting a cluster-centric approach, this paper
concentrates on managing resources around
independent clusters of IT functions, where each
cluster is assigned its own dedicated solutions.
- An optimization model is proposed, taking into
account the significance of IT functions, the
cost of IT solutions, and the availability of
resources subject to a budget limitation.
7Redundancy for IT disaster recovery
- Redundancy is the design principle of having one or
more backup systems in case the main system
fails.
- Using redundancy in preparation for disasters is
potentially advantageous in two respects:
- Proactive prevention
- Reactive recovery
8Redundancy for IT disaster recovery
- The objective is to select among competing
alternatives for redundancy level and reap the
best returns from a limited budget.
- A quantitative model can provide the guidelines
for allocating optimal redundancy levels to
critical IT functions needing to be protected.
10Redundancy allocation model
- Suppose an organization is planning redundancy
measures under a limited budget.
- Several possible disasters have been identified
with the potential to affect IT functions and to
cause business discontinuity.
- How should redundancy be allocated to IT functions
so that survivability is maximized while the cost
remains within budget?
11Redundancy allocation model
12Redundancy allocation model
- The redundancy allocation problem (RAP) is
formulated below
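A formulation consistent with the definitions used on the surrounding slides can be sketched as follows. The notation is partly an assumption: X_mi is taken to be 1 if solution i is selected for IT function m (as in the example slides), n_m is the number of candidate solutions for function m, and D is the number of disaster scenarios. This is a reconstruction, not the paper's verbatim equations.

```latex
\begin{aligned}
\max_{X}\quad & S = \sum_{m=1}^{M} w_m \sum_{d=1}^{D} p_d
  \left[ 1 - \prod_{i=1}^{n_m} \left(1 - S_{mid}\right)^{X_{mi}} \right] \\
\text{s.t.}\quad & \sum_{i=1}^{n_m} X_{mi} \ge 1, \qquad m = 1, \dots, M \\
& \sum_{m=1}^{M} \sum_{i=1}^{n_m} C_{mi} X_{mi} \le B \\
& X_{mi} \in \{0, 1\}
\end{aligned}
```

When the weights w_m and the probabilities p_d each sum to 1, as they do in the example, this objective equals 1 minus the weighted expected failure rate, which is the quantity the solution procedure minimizes.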
13Redundancy allocation model
- Survivability Smid in this context is defined as
the likelihood that IT asset i withstands
disaster d and keeps IT function m
operational.
- IT function m fails against disaster d only when
all of its selected solutions fail at the same
time.
- In other words, as long as one of the selected
solutions survives the disaster, IT function m
remains in operation.
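The "all selected solutions must fail" rule can be illustrated with a minimal Python sketch. It assumes the solutions fail independently, so the function's failure probability is the product of the individual failure probabilities; the helper name function_failure is hypothetical, and the numbers are the bridge failure probabilities under flooding from the LAN1 example later in the deck.

```python
from math import prod

def function_failure(v, selected):
    """Probability that an IT function fails under one disaster.

    v[i] is the failure probability (1 - survivability) of solution i and
    selected[i] is 1 if solution i is allocated to the function.
    Assumption: solutions fail independently of one another.
    """
    return prod(vi for vi, xi in zip(v, selected) if xi)

# Failure probabilities of the four bridges under flooding (LAN1 example).
v_flood = [0.9, 0.91, 0.85, 0.79]
fail = function_failure(v_flood, [0, 1, 0, 1])   # bridges 2 and 4 selected
survive = 1 - fail
```

With no solution selected the product is empty and evaluates to 1, which matches the intuition that an unprotected function fails with certainty.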
14Redundancy allocation model
- The selection constraint ensures that at least one
solution is selected and allocated to each IT
function. Notably, an IT function without
redundancy is allowable.
- The budget constraint indicates that the total
cost cannot exceed the budget limit B.
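As a rough illustration, the two constraints can be checked for a candidate allocation as below. The function name is_feasible and the nested-list layout are assumptions made for this sketch; the costs and budget are those of the example slides.

```python
def is_feasible(selection, costs, budget):
    """Check the two constraints for a candidate allocation.

    selection[m][i] is 1 if solution i is chosen for IT function m;
    costs[m][i] is the cost of that solution.
    """
    every_function_covered = all(sum(row) >= 1 for row in selection)
    total_cost = sum(c * x
                     for cost_row, x_row in zip(costs, selection)
                     for c, x in zip(cost_row, x_row))
    return every_function_covered and total_cost <= budget

costs = [[8, 2, 4, 6], [4, 6, 5]]        # bridges for LAN1, switches for LAN2
ok = is_feasible([[0, 1, 0, 1], [0, 0, 1]], costs, 14)          # cost 2 + 6 + 5 = 13
too_costly = is_feasible([[1, 0, 0, 1], [0, 0, 1]], costs, 14)  # cost 8 + 6 + 5 = 19
uncovered = is_feasible([[0, 1, 0, 0], [0, 0, 0]], costs, 14)   # LAN2 has no switch
```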
15Redundancy allocation model
17Solution procedure
- The proposed model is a 0-1 integer programming
problem with a nonlinear objective function.
- Due to the nonlinearity of the objective
function, LR cannot be employed to tackle this
problem.
- A partial enumeration procedure based on
probabilistic dynamic programming is presented.
18Solution procedure
19Solution procedure
- We define the state T of the system as the available
budget and the stage m as the IT function.
- Let Fm(T) be the minimum failure rate of the system
composed of IT functions m, m+1, ..., M with
available budget T.
20Solution procedure
- For stage (IT function) m, the state (budget) T
cannot exceed the total available budget B minus
the minimum costs to be allocated to stages 1, ...,
m-1.
- T must be at least equal to the cost of the least
expensive solution in the current stage to ensure
at least one solution for IT function m.
- For T outside this range, Fm(T) is defined as 1, so
it will not be chosen.
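The two bounds on T can be computed directly from the cost data. The sketch below uses a hypothetical helper, budget_range, with 0-based stage indices, and plugs in the costs and budget from the example slides.

```python
def budget_range(m, costs, budget):
    """Valid states T for stage m (0-based), following the two rules above."""
    lower = min(costs[m])                                  # cheapest solution at stage m
    upper = budget - sum(min(row) for row in costs[:m])    # reserve for earlier stages
    return lower, upper

costs = [[8, 2, 4, 6], [4, 6, 5]]     # LAN1 bridges, LAN2 switches
stage2 = budget_range(1, costs, 14)   # range of T for LAN2
stage1 = budget_range(0, costs, 14)   # range of T for LAN1
```

For the example this gives 4 to 12 for stage 2, which is the range of T used when F2(T) is tabulated.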
21Solution procedure
- Fm(T) of (4) deals with the risks of disaster
occurrence and involves calculating the
expected failure rate of IT function m according
to the remaining budget T.
- The recursion starts from the initial stage m = M.
22Solution procedure
- The optimal objective function value F is
obtained as F1(B), representing the minimum
overall failure rate of the whole system composed
of all M IT functions with a budget of B.
- The original maximum overall survivability S of
RAP is then equal to 1 - F1(B).
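Written out, the recursion described on these slides takes roughly the following form. It is rebuilt from the definitions in this deck (v_mid = 1 - S_mid and the selection variables X_mi used in the example) and is not the paper's verbatim set of equations; the helper symbol f_m is introduced here only for readability.

```latex
\begin{aligned}
f_m(X_m) &= w_m \sum_{d} p_d \prod_{i=1}^{n_m} v_{mid}^{X_{mi}},
  \qquad v_{mid} = 1 - S_{mid} \\
F_M(T) &= \min_{X_M} \Big\{ f_M(X_M) \;:\;
  \textstyle\sum_i X_{Mi} \ge 1,\ \sum_i C_{Mi} X_{Mi} \le T \Big\} \\
F_m(T) &= \min_{X_m} \Big\{ f_m(X_m)
  + F_{m+1}\big(T - \textstyle\sum_i C_{mi} X_{mi}\big) \;:\;
  \textstyle\sum_i X_{mi} \ge 1 \Big\},
  \qquad m = M-1, \dots, 1 \\
S^{*} &= 1 - F_1(B)
\end{aligned}
```

The last line holds when the weights w_m and the probabilities p_d each sum to 1, as in the example.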
24Example
- Two LANs (M = 2) with weights w1 = 0.3 and w2 = 0.7,
respectively.
- A flooding disaster occurs with a likelihood
of 0.05 (i.e., p1 = 0.05, and p2 = 0.95 for no disaster).
- The organization considers incorporating redundant
bridges into LAN1 and redundant switches into LAN2
with a budget B = 14.
25Example
- For LAN1
- Four types of bridges are available (n1 = 4), with
C11 = 8, C12 = 2, C13 = 4, and C14 = 6.
- The survival rates are S111 = 0.1, S121 = 0.09,
S131 = 0.15, and S141 = 0.21 (i.e., v111 = 0.9,
v121 = 0.91, v131 = 0.85, v141 = 0.79).
- Their availabilities when no disaster occurs are
S112 = 0.9999, S122 = 0.9993, S132 = 0.9997, and
S142 = 0.9995 (i.e., v112 = 0.0001, v122 = 0.0007,
v132 = 0.0004, v142 = 0.0005).
26Example
- For LAN2
- Three types of switches are available (n2 = 3),
with C21 = 4, C22 = 6, and C23 = 5.
- The survival rates are S211 = 0.06, S221 = 0.1, and
S231 = 0.2 (i.e., v211 = 0.94, v221 = 0.9, v231 = 0.8).
- Their availabilities when no disaster occurs are
S212 = 0.9994, S222 = 0.9990, and S232 = 0.9996 (i.e.,
v212 = 0.0006, v222 = 0.0010, v232 = 0.0004).
27Example
28Example
- Start with stage 2.
- Since the least expensive switch for LAN2 has
cost C21 = 4, and the least expensive bridge for
LAN1 has cost C12 = 2, the valid range for T is
4 ≤ T ≤ 12.
- Equation (6) then calculates F2(T) for T = 4, ..., 12.
- Take F2(6) for example: the feasible selections are
(X21, X22, X23) = (0, 0, 1), (0, 1, 0), (1, 0, 0).
The minimum, F2(6) = 0.02827, is associated with
(0, 0, 1).
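The F2(6) value can be checked by enumerating the feasible switch selections. The sketch below assumes the weighted expected failure of LAN2 is w2 times the sum over scenarios of p_d times the product of the selected switches' failure probabilities; the variable and function names are illustrative only.

```python
from itertools import product
from math import prod

w2 = 0.7                      # weight of LAN2
p = [0.05, 0.95]              # flooding, no disaster
cost = [4, 6, 5]              # C21, C22, C23
v = [[0.94, 0.0006],          # switch 1: failure under flooding / no disaster
     [0.90, 0.0010],          # switch 2
     [0.80, 0.0004]]          # switch 3

def expected_failure(x):
    """Weighted expected failure of LAN2 for a 0/1 selection x."""
    return w2 * sum(pd * prod(v[i][d] for i in range(3) if x[i])
                    for d, pd in enumerate(p))

T = 6
# Keep selections with at least one switch whose total cost fits within T.
candidates = [x for x in product([0, 1], repeat=3)
              if sum(x) >= 1 and sum(c * xi for c, xi in zip(cost, x)) <= T]
best = min(candidates, key=expected_failure)
F2_6 = expected_failure(best)
```

With T = 6 only single-switch selections are affordable, and switch 3 gives 0.7 × (0.05 × 0.8 + 0.95 × 0.0004) = 0.028266, which rounds to the 0.02827 quoted on the slide.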
29Example
30Example
- Next, we proceed to find the optimal solution
F1(14) in the final stage m = 1.
- The minimum F1(14) is associated with (X11, X12,
X13, X14) = (0, 1, 0, 1), with F = 0.03905 using
F2(6) = 0.02827. Namely, the maximum
survivability S against flooding equals 1 - F =
1 - 0.03905 = 0.96095.
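The whole example can be reproduced with a small dynamic program over (stage, remaining budget). This is a minimal sketch that enumerates every selection at each stage; it is not the paper's partial enumeration procedure, and the names F and stage_failure and the data layout are assumptions. The failure probabilities are copied from the example slides.

```python
from functools import lru_cache
from itertools import product
from math import prod

w = [0.3, 0.7]                          # weights of LAN1, LAN2
p = [0.05, 0.95]                        # flooding, no disaster
cost = [[8, 2, 4, 6], [4, 6, 5]]        # C_mi
v = [                                   # v[m][i][d], as listed on the slides
    [[0.90, 0.0001], [0.91, 0.0007], [0.85, 0.0004], [0.79, 0.0005]],
    [[0.94, 0.0006], [0.90, 0.0010], [0.80, 0.0004]],
]
M = len(w)

def stage_failure(m, x):
    """Weighted expected failure of function m for a 0/1 selection x."""
    return w[m] * sum(pd * prod(v[m][i][d] for i, xi in enumerate(x) if xi)
                      for d, pd in enumerate(p))

@lru_cache(maxsize=None)
def F(m, T):
    """Return (minimum failure of functions m..M-1 with budget T, selections)."""
    if m == M:
        return 0.0, ()
    best = (1.0, None)                  # infeasible states keep failure rate 1
    for x in product([0, 1], repeat=len(cost[m])):
        spent = sum(c * xi for c, xi in zip(cost[m], x))
        if sum(x) == 0 or spent > T:
            continue                    # need at least one solution within budget
        rest, plan = F(m + 1, T - spent)
        if plan is None:
            continue                    # later stages cannot be covered
        total = stage_failure(m, x) + rest
        if total < best[0]:
            best = (total, (x,) + plan)
    return best

failure, plan = F(0, 14)
survivability = 1 - failure
```

Here plan holds one 0/1 tuple per LAN, and failure and survivability correspond to F1(14) and S on the slide. The runner-up allocation (bridge 2 alone with switches 2 and 3) differs from the optimum only in the seventh decimal place, so the result is sensitive to small changes in the input data.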
32Conclusion
- Contributions
- It presents one of the earliest quantitative
studies to allocate redundancy for recovery
planning.
- An exact solution method based on probabilistic
dynamic programming is presented to help obtain
the optimal solution for redundancy allocation.
- Through sensitivity analysis, the model can
further help IT managers make better decisions.
33Conclusion
- IT plays an extremely important role in modern
business operations; nevertheless, it has
potential vulnerabilities to disasters.
- The redundancy allocation (RAP) model proposed in
this paper can fulfill the need for a structured
decision analysis of recovery planning.
34Conclusion
- For future research, we can further categorize
assets into hardware, software, and other types
to examine the impacts of each asset type on the
redundancy allocation decisions.
- Specific assumptions of dependent IT functions or
shared solutions can be made to address a
different set of IT disaster recovery problems.