Title: IT Disaster Recovery
Information Systems and Technology
IT Disaster Recovery CAUBO 2008
- Academic Computing and Networking
- Administrative Systems
- Telecommunications
- Classroom Support/Production Systems
- Bannatyne campus
- 165 staff
- $16M budget
- Many locations but single computer room
Disaster Recovery is the process, policies and
procedures for restoring operations critical to
the resumption of business, including regaining
access to data (records, hardware, software,
etc.), communications (on campus, internet,
CANARIE, etc.), workspace, and other business
processes after a natural or human-induced
disaster.
Most large companies spend between 2% and 4% of
their IT budget on disaster recovery planning;
this is intended to avoid larger losses. Of
companies that had a major loss of computerized
data, 43% never reopen, 51% close within two
years, and only 6% will survive long-term.
Hoffer, Jim. "Backing Up Business - Industry
Trend or Event." Health Management Technology,
Jan 2001
- Which critical systems and data must be brought back while rebuilding?
- DR is not business as usual
Assumptions
- Dealing only with centrally supported IT infrastructure and services
- Based on the worst-case scenario of a total loss of the computer room
- Lesser disasters could make use of parts of the overall plan
- Priorities are set according to University priorities: in the event of a disaster we will remain open and continue to teach
Four phases
- Criticality assessment
- Recoverability assessment
- Recovery scenarios
- IT Recovery Plan
Criticality Assessment
The purpose of this activity is to:
- Validate the critical functions / processes
- Confirm the business recovery requirements for the critical business processes, including Recovery Time and Recovery Point Objectives (RTO, RPO)
- Confirm the critical Information Technology resource requirements, such as hardware, software, network and procedures, that are required to support the critical business processes
Criticality Assessment
The University of Manitoba identified ten
business units to participate in the criticality
assessment
- Academic Support
- Advancement
- Finance
- Human Resources
- Library Services
- Research
- Student Services
- Telephony
- Web Services
- Network Services
RTO
The recovery time objective (RTO) is the duration
of time and a service level within which a
business process must be restored after a
disaster in order to avoid unacceptable
consequences associated with a break in
continuity.
RPO
The recovery point objective (RPO) describes the maximum
acceptable amount of data loss, measured in time.
Example: if the last available good copy of data at the
time of an outage was made 18 hours earlier, then 18 hours
of data are lost, which is within objective only if the
RPO is 18 hours or longer.
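As a minimal sketch (with hypothetical function names and dates), the two objectives can be checked after an outage or a recovery test by measuring the data lost since the last good copy and the time until the business process was restored, then comparing each against its target. It assumes a single restore point and a single restoration time per business process.

```python
from datetime import datetime, timedelta


def data_loss(last_good_copy: datetime, outage_start: datetime) -> timedelta:
    """Data lost, measured in time: gap between the last good copy and the outage."""
    if last_good_copy > outage_start:
        raise ValueError("last good copy cannot be later than the outage")
    return outage_start - last_good_copy


def downtime(outage_start: datetime, restored_at: datetime) -> timedelta:
    """Time from the outage until the business process is usable again."""
    if restored_at < outage_start:
        raise ValueError("restoration cannot precede the outage")
    return restored_at - outage_start


def meets_objectives(last_good_copy: datetime, outage_start: datetime,
                     restored_at: datetime, rpo: timedelta,
                     rto: timedelta) -> dict:
    """Compare measured data loss and downtime with the RPO and RTO targets."""
    return {
        "rpo_met": data_loss(last_good_copy, outage_start) <= rpo,
        "rto_met": downtime(outage_start, restored_at) <= rto,
    }


# The 18-hour example: the last good copy was made 18 hours before the outage.
outage = datetime(2008, 6, 2, 12, 0)        # hypothetical outage time
backup = outage - timedelta(hours=18)
restored = outage + timedelta(hours=48)     # hypothetical restoration time

print(data_loss(backup, outage))            # 18:00:00
print(meets_objectives(backup, outage, restored,
                       rpo=timedelta(hours=24), rto=timedelta(hours=72)))
# {'rpo_met': True, 'rto_met': True}
```

Keeping the measured values separate from the targets makes it easy to rerun the same check after each DR test.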
Impact analysis
- Financial
- Company Assets
- Employee/Faculty
- Customers
- Legal
- Regulatory
- Corp. Image
- Public Image
- Students
- Competitive Position
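One common way to turn an impact analysis into an RTO is to rate each impact category at increasing outage durations and take the longest duration before any category becomes unacceptable. The sketch below assumes a hypothetical 0 to 3 rating scale and made-up ratings; how the University actually scored impacts is not stated here.

```python
from typing import Dict, List, Optional

# Hypothetical rating scale: 0 = none, 1 = minor, 2 = serious, 3 = unacceptable.
UNACCEPTABLE = 3


def derive_rto(durations_h: List[int],
               ratings: Dict[str, List[int]]) -> Optional[int]:
    """Return the longest assessed outage duration (in hours) at which no
    impact category is yet UNACCEPTABLE.

    ratings maps each impact category to one rating per entry in durations_h.
    Returns None if the impact is unacceptable even at the shortest duration,
    meaning the RTO must be shorter than anything assessed.
    """
    if durations_h != sorted(durations_h):
        raise ValueError("durations must be in increasing order")
    for category, scores in ratings.items():
        if len(scores) != len(durations_h):
            raise ValueError(f"{category}: expected {len(durations_h)} ratings")

    rto: Optional[int] = None
    for i, hours in enumerate(durations_h):
        if any(scores[i] >= UNACCEPTABLE for scores in ratings.values()):
            return rto
        rto = hours
    return rto  # never unacceptable within the assessed window


# Made-up ratings for a single business process.
durations = [4, 24, 72, 168]  # hours
ratings = {
    "Financial":    [0, 0, 1, 2],
    "Students":     [0, 1, 2, 3],
    "Public Image": [0, 1, 3, 3],
}
print(derive_rto(durations, ratings))  # 24
```

Here Public Image becomes unacceptable at 72 hours, so the longest tolerable assessed outage is 24 hours.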
Notes
- RTO has to include the time to recover the core network, application and database servers and data, as well as the time to test the recovered system
- A triage or reality check is necessary: most business units will overestimate their criticality and ask for an unrealistically short RTO
- RTO estimates must be set in the context of the business unit, but also in the context of University priorities
- We discovered that the critical applications are not the administrative systems; they are the web space, portal and learning management system
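To illustrate the first note, the sketch below adds up hypothetical durations for the recovery steps, run one after another in dependency order (network before servers, servers before data, data before testing), and compares the total with an RTO. Treating the steps as strictly sequential is a simplifying assumption, and the hours are invented for illustration.

```python
from typing import List, Tuple

# Hypothetical step durations in hours, listed in dependency order.
STEPS: List[Tuple[str, float]] = [
    ("core network", 12),
    ("application and database servers", 24),
    ("data restore", 16),
    ("testing the recovered system", 8),
]


def recovery_timeline(steps: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Cumulative finish time of each step, assuming the steps run sequentially."""
    elapsed = 0.0
    timeline = []
    for name, hours in steps:
        elapsed += hours
        timeline.append((name, elapsed))
    return timeline


def fits_rto(steps: List[Tuple[str, float]], rto_hours: float) -> bool:
    """True if the whole sequence, testing included, finishes within the RTO."""
    timeline = recovery_timeline(steps)
    total = timeline[-1][1] if timeline else 0.0
    return total <= rto_hours


for name, finish in recovery_timeline(STEPS):
    print(f"{name}: done after {finish:g} h")
print(fits_rto(STEPS, rto_hours=48))  # False: 60 h of work against a 48 h RTO
```

Leaving testing out of the sum would make this example appear to fit a 52-hour RTO, which is exactly the trap the note warns about.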
Recoverability Assessment
The objective of this activity is to identify
gaps between IST's current recovery capability and the
recovery requirements identified by the business as
part of the criticality assessment.
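A gap analysis of this kind reduces to comparing, per service, what the business requires against what can currently be delivered. This sketch assumes hypothetical capability figures and treats the worst-case data loss under periodic backups as one full backup interval; the service names echo the critical applications named earlier, but every number is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    name: str
    required_rto_h: float      # from the criticality assessment
    required_rpo_h: float      # from the criticality assessment
    recovery_time_h: float     # what can be delivered today (hypothetical)
    backup_interval_h: float   # worst-case data loss with periodic backups


def find_gaps(services: List[Service]) -> List[str]:
    """List every service whose current capability misses its RTO or RPO."""
    gaps = []
    for s in services:
        if s.recovery_time_h > s.required_rto_h:
            gaps.append(f"{s.name}: RTO gap of {s.recovery_time_h - s.required_rto_h:g} h")
        if s.backup_interval_h > s.required_rpo_h:
            gaps.append(f"{s.name}: RPO gap of {s.backup_interval_h - s.required_rpo_h:g} h")
    return gaps


services = [
    Service("Learning management system", 24, 24, 72, 24),
    Service("Portal", 24, 4, 24, 24),
    Service("Web space", 48, 24, 36, 24),
]
for gap in find_gaps(services):
    print(gap)
```

An RTO gap points at infrastructure (servers, facilities, staffing), while an RPO gap points at backup or replication frequency.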
Recoverability Assessment
- Template driven
- Each group in IST reviewed its server infrastructure:
  - Application support
  - Desktop
  - Backups
  - Bannatyne campus
  - Database
  - Email
  - Facilities
  - Linux
  - Unix
  - Network
  - Novell
  - Storage
  - Telecommunications
  - Web services
  - Windows
- Servers were classified as core, required for disaster recovery, or other
Colour coding: orange for core, yellow for required for DR
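The classification template can be modelled as a small record per server. In this sketch the `Tier` names, host names and the rule of rebuilding core servers before DR-required ones are assumptions; only the three classes and the orange/yellow colours come from the slide, and servers classed as other are assumed to be uncoloured.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Tier(Enum):
    CORE = "core"
    DR_REQUIRED = "required for DR"
    OTHER = "other"


# Orange and yellow come from the slide; "other" is assumed to be uncoloured.
COLOUR: Dict[Tier, Optional[str]] = {
    Tier.CORE: "orange",
    Tier.DR_REQUIRED: "yellow",
    Tier.OTHER: None,
}


@dataclass(frozen=True)
class ServerEntry:
    group: str    # IST group that reviewed the server, e.g. "Database"
    server: str   # hypothetical host name
    tier: Tier


def rebuild_list(entries: List[ServerEntry]) -> List[ServerEntry]:
    """Servers to rebuild after a total loss: core first, then required for DR.
    The core-first ordering is an assumption, not stated on the slide."""
    rank = {Tier.CORE: 0, Tier.DR_REQUIRED: 1}
    wanted = [e for e in entries if e.tier in rank]
    return sorted(wanted, key=lambda e: (rank[e.tier], e.group, e.server))


entries = [
    ServerEntry("Web services", "web-01", Tier.DR_REQUIRED),
    ServerEntry("Network", "core-sw-1", Tier.CORE),
    ServerEntry("Desktop", "imaging-01", Tier.OTHER),
    ServerEntry("Database", "db-01", Tier.CORE),
]
for e in rebuild_list(entries):
    print(e.group, e.server, COLOUR[e.tier])
```

Servers classed as other drop out of the rebuild list entirely, which matches the idea that DR rebuilds only what the critical processes need.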
Recovery Scenarios
On campus hot site
- Duplicating critical infrastructure
- Servers, storage, redundant cabling
- Network core switches
- Telecommunication facilities
Cost: $3M to $4M
Contracted site
- Duplicating critical infrastructure at a smaller scale
- Servers, storage, commercial networking
- Not including telecommunication facilities
Cost: $500K per year plus usage, staffing, travel and network charges
Trailer
- Duplicating critical infrastructure at a smaller scale
- Servers, storage
- Not including telecommunication facilities
Cost: $200K per year plus usage and hitching post
Drop ship
- Annual fee for guaranteed delivery of critical
equipment within 24-48 hours
Cost depends on the amount and type of equipment
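The fixed costs quoted for the recovery scenarios can be compared over a planning horizon. This sketch assumes the hot-site figure is a one-time cost and ignores its running costs, leaves out the usage, staffing, travel, network and hitching-post charges that the scenarios list but do not quantify, and marks the drop-ship option as unknown because no figure is given.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Scenario:
    name: str
    upfront_low: int = 0   # one-time cost in dollars, low estimate
    upfront_high: int = 0  # one-time cost in dollars, high estimate
    annual: int = 0        # recurring fee in dollars per year
    known: bool = True     # False when no figure is given


def fixed_cost(s: Scenario, years: int) -> Optional[Tuple[int, int]]:
    """(low, high) fixed cost over `years`, excluding unquantified charges
    such as usage, staffing, travel, network and hitching post."""
    if not s.known:
        return None
    recurring = s.annual * years
    return (s.upfront_low + recurring, s.upfront_high + recurring)


SCENARIOS = [
    # Assumed to be a one-time cost; running costs are not included.
    Scenario("On campus hot site", upfront_low=3_000_000, upfront_high=4_000_000),
    Scenario("Contracted site", annual=500_000),
    Scenario("Trailer", annual=200_000),
    Scenario("Drop ship", known=False),  # depends on amount and type of equipment
]

for s in SCENARIOS:
    print(s.name, fixed_cost(s, years=5))
```

Over five years the contracted site's fees alone approach the lower hot-site estimate, before any of the unquantified charges are added.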
Technology Recovery Plan
Sample