Title: Experiences with D/R Procedures
1Experiences with D/R Procedures
- Of ADABAS Data on Mainframes
- Natural Conference Boston
- Dieter W. Storr
- May 2004
- info_at_storrconsulting.com
3Different Disaster Different Action
- Unplanned downtime
- Machine outages
- Network outages
- Software failures
- Disaster
- Site / data center loss
- Catastrophic failure
4Leading Causes of Downtime
Source: DRJ Summer 2002, Volume 15, Number 3
[Chart: leading causes of downtime. Labels recovered: Power, Storm, Flood, Terrorism, Outage, Damage, Sabotage]
5Other Causes of Downtime
- Fire
- Earthquake
- Computer Crime
6LA Times Downtime
- Flood Damage, 21 April 2002
- Water flooded the Orange County facility after the 14-inch pipe that supplies the fire-sprinkler system burst; half the facility stood in more than a foot of muddy water
- Affected areas: editorial, ad ops, IT, HR; ADABAS was not affected
7LA Times Downtime
- Bomb Alarm, 14 June 2002
- A bomb was believed to have been left in the Bank of America branch that's set into the Times Building
- Security swept the building
- DBAs observed the system from home
8LA Times Downtime
- Bomb Alarm, 29 July 2002
- An intruder claimed to have a bomb and darted into the garage
- Security swept the building
- OP stopped CA7, so PLOGCOPY couldn't start automatically; two PLOGs filled up and ADABAS was locked; DBAs later started the PLCOPY jobs manually
9LA Times Downtime
- Power Outage - 29 August 2002 (3:43 P.M.)
- City (DWP) power grid: flood water leaked into a DWP transformer
- There were actually 2 spikes/outages: the first started the UPS switchover, which was interrupted by the second, which took the UPS down.
10LA Times Downtime
- Power Outage - cont
- The network was back in service after a short delay.
- Our Unix-based servers were restarted and checked. There was no evidence of damage to the Sybase Adaptive Server Enterprise (ASE, formerly Sybase SQL Server) servers.
11LA Times Downtime
- Power Outage - cont
- Mainframe recovery was delayed due to corruption of the Hardware Management Console (HMC)
- OP did a power-on reset, which restored the HMC
- Operations IPLed, and Technical Support proceeded with system checkout procedures.
- Although the Enterprise Storage Server (ESS) had an error indicator, it was still up and did not add to any outages
- IBM reset the error indicator without impact.
12LA Times Downtime
- Power Outages - cont
- Started ADABAS servers manually: Parm Error 23, DIB block remained after an abnormal termination
- Started all servers with IGNDIB=YES
- 18:25 ADABAS IS ACTIVE
- NO ADAN58 Message
13LA Times Downtime
- ADAN58 Message (ADA71 ADAN5A)
- ADAN58 BUFFER-FLUSH START RECORD DETECTED DURING AUTORESTART.
- THE NUCLEUS WILL T E R M I N A T E AFTER AUTORESTART. IN CASE OF POWER FAILURE, THE DATABASE MIGHT BE INCONSISTENT BECAUSE OF PARTIALLY WRITTEN BLOCKS.
- O N L Y IN THIS CASE, REPAIR THE DATABASE BY RESTORE AND REGENERATE; OTHERWISE RESTART THE NUCLEUS.
- ADAN5A FILES MODIFIED DURING AUTORESTART: files
14Power Failure During Buffer Flush
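[Diagram: blocks on disk during a buffer flush interrupted by a power failure. Legend: old block, updated block, partially updated block. Block rows shown: A B C D; E F C H; E F C D]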
15Nucleus Restart After Power Failure - IGNDIB=YES
<snip>
ADA200 00230 User exit 2 active.
ADA201 00230 PLOG2 closed. ADAP3X2P submitted.
ADAN21 00230 PROTECTION-LOG PLOGR1 STARTED
ADAN02 00230 NUCLEUS-RUN WITH PROTECTION-LOG 00677
ADAL02 00230 2002-08-29 18:25:18 CLOGRS IS ACTIVE
ADAN03 00230 ADABAS COMING UP
ADAN5A 00230 FILES MODIFIED DURING AUTORESTART
ADAN5A 00230 00038 00057 00069 00072 00073 00074
ADAN5A 00230 00075 00076 00104 00138 00139 00148
ADAN5A 00230 00195 00221 00243
ADAN19 00230 RUNNING WITH ASYNCHRONOUS BUFFERFLUSH
ADAN8Y 00230 FILE-LEVEL CACHING INITIALIZED
ADAN80 00230 ADABAS DYNAMIC CACHING ENVIRONMENT ESTABLISHED.
ADAN01 00230 A D A B A S V6.2.2 IS ACTIVE
ADAN01 00230 MODE MULTI I S O L A T E D
ADAN01 00230 RUNNING WITHOUT RECOVERY-LOG
ADA800 00230 User exit 8 active.
<snip>
16LA Times Downtime
- Power Outage - cont
- Switched all PLOGs
- Checked batch and online
- There was no evidence of damage to any of the
ADABAS components.
17Other LA Times Disasters
- 1965 Watts riots
- 1971 Sylmar quake 6.5
- 1987 Whittier punch 5.9
- 1992 LA riots
- 1994 Northridge quake 6.7
- 6 Feb 1998 El Nino, flooding in B-1 computer room
- 15 April 1999 Power failure, news editing
18ADABAS Recovery
CLOG
- Command Log (CLOG) Failure - I/O Error
- Restore or reallocate/format the CLOG
- ADABAS will come up through Autorestart normally
- No data loss if CLOG is not used
19ADABAS Recovery
PLOG
PLOG
- Protection Log (PLOG) Failure - I/O Error
- Restore or reallocate/format the PLOG
- Take a full back-up of the database
- ADABAS will come up through Autorestart normally
- Restart batch jobs
- Restartable batch jobs OK
- Non-restartable batch jobs check
20ADABAS Recovery
TEMP
SORT
- TEMP and SORT Failure - I/O Error
- Restore or reallocate/format the TEMP/SORT dataset
- Different actions for the utilities
- See the ADABAS Utilities manuals
21ADABAS Recovery
DSIM
- DSIM Failure - I/O Error
- Restore or reallocate/format a DSIM dataset
- Different actions for the utilities
- See the ADABAS Utilities manuals
22ADABAS Recovery
RLOGM
RLOGR
- Recovery Aid Dataset Failure - I/O Error
- Restore or reallocate/format a RLOG dataset
- Prepare the RLOG dataset
- ADARAI PREPARE RLOGSIZE / RLOGDEV.
- Different actions for the utilities
- See the ADABAS Utilities manuals
- Take a full back-up of the database
- This will start the first generation of the RLOG
dataset
23ADABAS Recovery
ASSO
ASSO
DATA
DATA
- ASSO/DATA Failure - I/O Error
- Copy PLOG twice - ADARES PLCOPY
- Restore or reallocate/format DATA dataset(s)
- Instead of reallocating, formatting, and restoring all DATA volumes, system specialists can:
- Reallocate and format the new volume
- Restore the VTOC chain
- Restore and regenerate only the files that were located on the failed volume
- Otherwise, . . .
24ADABAS Recovery
ASSO
ASSO
DATA
DATA
- ASSO/DATA Failure - I/O Error
- Restore entire database
  ADASAV RESTORE OVERWRITE for GCB
  ADASAV RESTONL OVERWRITE include PLOG
- Start nucleus with UTIONLY=YES
- Regenerate updates from end of last save (SYN2)
  ADARES REGENERATE PLOGNUM=xxx
  ADARES FROMCP=SYN2,FROMBLK=xxx
25ADABAS Recovery
ASSO
ASSO
DATA
DATA
- ASSO/DATA Failure - I/O Error
- Possible utilities need to be rerun (see ADARES)
- ADALOD LOAD FILE=xxx
- ADALOD UPDATE FILE=xxx
- ADALOD UPDATE FILE=xxx,DDISN
- ADAINV INVERT FILE=xxx,FIELD=xx
- Lock files to rerun utilities
- ADADBS OPERCOM LOCKU=xx
- Unlock utility-only status
- ADADBS OPERCOM UTIONLY=NO
26ADABAS Recovery
ASSO
ASSO
DATA
DATA
- ASSO/DATA Failure - I/O Error
- Rerun the regenerate function for the relevant files
- Unlock the regenerated files
- ADADBS OPERCOM UNLOCKU=xx
- Don't repeat these steps if ADARES points out:
- ADALOD LOAD FILE=nn
- ADARES REGENERATE FILE=nn
- ADADBS REFRESH FILE=nn
- Nucleus is ready
27ADABAS Recovery
WORK1
WORK2
WORK3
- WORK 1 Failure - I/O Error
- Restore or reallocate/format the WORK dataset
- Restore and regenerate the entire database to avoid inconsistencies (open transactions)
- See ASSO/DATA failure
28ADABAS Recovery
WORK1
WORK2
WORK3
- WORK 2/3 Failure - I/O Error
- End the database normally (ADAEND) to avoid open transactions in part 1 of WORK
- Restore or reallocate/format the WORK dataset
- Restart the database normally
- If the database abends, then restore and regenerate the entire database - see ASSO/DATA failure
29ADABAS Recovery
DATA
DS
DS
- Failure in Data Storage Blocks
- //DDSIIN DD DSN=SAVE.SIBA.
- // DD DSN=PLCOPY.LOG1
- // DD DSN=PLCOPY.LOG2
- //DDCARD DD *
- ADARES REPAIR DSRABN=xxx-yyy
- ADARES FILE=n1,n2,n3
- Failure in DSST
- ADADCK DSCHECK FILE=xxx
- ADADCK REPAIR
DS
CALL SAG ! !
30ADABAS Recovery
ASSO
CP
DATA
- Nucleus Ends With RC 77
- Not restartable
- No more space for Checkpoint File (CP)
- Rename old WORK
- Allocate/format new WORK with old space
- Change high-used RABN and high-used ISN
- Restart nucleus with new WORK and UTIONLY=YES
- Nucleus is in crippled mode - no user has access
- Expand the database
- Stop the nucleus normally
- Rename old WORK and restart the nucleus with old
WORK (autorestart)
CP
31ADABAS Recovery
ASSO
User
DATA
- Nucleus Ends With RC 77
- Not restartable
- No more space for user files
- Rename old WORK
- Allocate/format new WORK with old space
- Restart nucleus with new WORK and UTIONLY=YES
- Nucleus is in crippled mode - no user access
- Expand database
- Stop nucleus normally
- Rename old WORK and restart nucleus with old WORK
(autorestart)
User
32ADABAS Recovery
ASSO
DATA
- Nucleus Abends - Missed DE Values
- Descriptor is marked in the FDT as DE; the value doesn't exist in ASSO, but does in DATA.
- Check
- ADAICK ICHECK FILE=xxx,NOOPEN
- ADAVAL VALIDATE FILE=xxx,DESCRIPTOR=yy
- Solution 1
- ADAULD UNLOAD FILE=xxx,UTYPE=EXF
- ADALOD LOAD FILE=xxx,LWP=yyyyK
- Solution 2
- ADADBS RELEASE FILE=xxx,DESCRIPTOR=yy
- ADAINV INVERT FILE=xxx,FIELD=yy,LWP=...
CALL SAG ! !
33Back-up Possibilities
- ADASAV to tape / disk
- Including Fast Dump Restore, DFDSS
- Delta Save Facility (DSF)
- Delta Save QDUMP (Legent)
- Disk mirroring (hardware level)
- FlashCopy of Enterprise Storage Server (ESS)
- Peer-to-Peer Remote Copy Extended Distance (PPRC-XD)
- OC-3 links two EMC disk arrays
- Replication
- Stand-by systems
- Restore and Regenerate
- Entire Transaction Server
ASSO
DATA
34ADABAS Disaster Recovery
- How to back-up
- Collect recovery data
- Restore w/o nucleus
- Start nucleus w/ UTIONLY=YES
- Regenerate w/ nucleus
- Switch UTIONLY=NO
35ADABAS 6.2.2 Back-up at LA Times
[Timeline: back-up schedule. Time marks shown: 21:00, 01:00, 02:00, 03:00, 8:00 - 11:00, 12:00, Weekly]
- ADAP1BKF Online SAVE
- ADAP1PLC (FEOFPL)
- ADAP1PLC PLOG Switch
- ASSO / DATA / WORK / etc.
- BRM/ABARS Several Jobs
- DFDSS Full-Volume Back-up
- ADAP1BKO Copy Tapes
- PDS, GDGs etc.
- Pick-up by Recall
36Production Database Back-ups
ADASAV SAVE BUFNO=2,TTSYN=60
Record format . . . VB
Record length . . . 27994
Block size . . . . 27998
BUFNO=30
37Back-up to SMS Disk Pool
- Run times are consistently at least 80% lower when writing to disk instead of cartridge
- Run times are consistently around 60% lower when copying from disk to cartridge (compared with cart to cart)
- DFSMShsm: automate your storage management tasks, SMS Production Storage Pool
DFSMShsm
38Back-up to Disk Pool
- No cartridge errors
- No cartridge drive errors
- No cartridges get accidentally ejected from the silo
- Smaller back-up window
- Smaller maintenance windows
- Less impact to application processes
- Greater confidence that the data you need will be
there when you need it
39IBM Magstar 3494/Virtual Tape Server
- Linear design
- 1 - 18 frames
- Conf. Flexibility
- SCSI, FC, ESCON, FICON
- 3590, 3490E, VTS
- High availability
- Dual robotics
- Dual library manager
- >42 old 3490 carts will fit on 1 new 3494 cart
- 5 x 3390 volumes fit on one 3494 cart
- One 3494 cart can be read in 45 seconds into the VTS disk cache (RAID-5)
40Virtual Tape Concept
- Virtual tape drives
- Appear as multiple 3490E tape drives
- 3490E Media 1 and 2 support
- Shared / partitioned like real tape drives
- Tape Volume Caching
- All data access is to cache
- Improves mount performance
- LRU Cache management
- Volume Stacking
- Fully utilizes physical cart capacity
- Reduces physical cart requirement
- Reduces footprint requirement
[Diagram: Virtual Drives 1, 2, ... n (device addresses 180, 181, ... 19F) in front of the Tape Volume Cache, which holds Virtual Volumes 1, 2, ... n; these are stacked as Logical Volumes 1 ... n on Magstar 3590 cartridges. 30/60 GB capacity assumes 3:1 compression]
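The "LRU cache management" bullet above can be illustrated with a small sketch. This is only a toy model of least-recently-used eviction, not the VTS implementation; the class name, the capacity counted in volumes, and the volume serials are all hypothetical.

```python
from collections import OrderedDict


class TapeVolumeCache:
    """Toy LRU cache of virtual volumes (hypothetical model, not the VTS code)."""

    def __init__(self, capacity):
        self.capacity = capacity          # assumed: capacity counted in volumes
        self._volumes = OrderedDict()     # least recently used entry comes first

    def mount(self, volser):
        """Return 'hit' if the volume is cached, else 'recall' (staged from cartridge)."""
        if volser in self._volumes:
            self._volumes.move_to_end(volser)   # mark as most recently used
            return "hit"
        if len(self._volumes) >= self.capacity:
            self._volumes.popitem(last=False)   # evict the least recently used volume
        self._volumes[volser] = True
        return "recall"

    def cached(self):
        """Volumes currently in cache, least recently used first."""
        return list(self._volumes)


cache = TapeVolumeCache(capacity=2)
results = [cache.mount(v) for v in ["V00001", "V00002", "V00001", "V00003", "V00002"]]
print(results)
print(cache.cached())
```

In this model a mount of a cached volume never touches a physical cartridge, which is the effect the slide describes as improved mount performance.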
41Performance Tests
42Collecting Data For Recovery
Block Ranges SYN1 - SYN2 For ADASAV RESTORE
From ADASAV SAVE:
  PROTECTION LOG PLOGNUM=64, SYN1=4695, SYN2=4698
From ADAREP:
  SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
  SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
  SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
  SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
  SYNV 0A UTI 2002-09-23 21:01:40 64 4700 DUAL ADAP1BKF
  SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
  SYNP 28 UTI 2002-09-23 21:02:08 64 4703 DUAL ADAP1PLC
  <snip>
  EOD  00 ET  2002-09-23 23:30:03 64 4747 DUAL ADAPRREP
  SYNS 53 ET  2002-09-23 23:30:25 64 4749 DUAL ADAP1REP
  SYNV 28 UTI 2002-09-23 23:30:30 64 4750 DUAL ADAP1PLC
  SYNP 28 UTI 2002-09-23 23:30:31 64 4751 DUAL ADAP1PLC
43Collecting Data For Recovery
Block Ranges SYN2 - End For ADARES REGENERATE
From ADAREP:
  SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
  SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
  SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
  SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
  SYNV 0A UTI 2002-09-23 21:01:40 64 4700 DUAL ADAP1BKF
  SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
  SYNP 28 UTI 2002-09-23 21:02:08 64 4703 DUAL ADAP1PLC
  <snip>
  EOD  00 ET  2002-09-23 23:30:03 64 4747 DUAL ADAPRREP
  SYNS 53 ET  2002-09-23 23:30:25 64 4749 DUAL ADAP1REP
  SYNV 28 UTI 2002-09-23 23:30:30 64 4750 DUAL ADAP1PLC
  SYNP 28 UTI 2002-09-23 23:30:31 64 4751 DUAL ADAP1PLC
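Collecting the SYN1/SYN2 block numbers by eye from a checkpoint list is error-prone, so a small script can help. The sketch below assumes each checkpoint line has nine whitespace-separated fields in the order shown on the slide (name, type, user type, date, time, PLOG number, block, DUAL, job name); that layout, the helper names, and the idea of scripting this step are assumptions, not something the ADAREP utility provides.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    name: str
    plog_num: int
    block: int
    job: str


# Sample lines copied from the slide; the nine-field layout is an assumption.
REPORT = """\
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
"""


def parse_checkpoints(report):
    """Parse checkpoint lines, skipping anything that does not have nine fields."""
    checkpoints = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue
        checkpoints.append(
            Checkpoint(fields[0], int(fields[5]), int(fields[6]), fields[8])
        )
    return checkpoints


def save_window(checkpoints, job):
    """Return (plog_num, syn1_block, syn2_block) for the last save written by job."""
    syn1 = [c for c in checkpoints if c.name == "SYN1" and c.job == job]
    syn2 = [c for c in checkpoints if c.name == "SYN2" and c.job == job]
    if not syn1 or not syn2:
        raise ValueError(f"no complete SYN1/SYN2 pair found for job {job}")
    return syn2[-1].plog_num, syn1[-1].block, syn2[-1].block


window = save_window(parse_checkpoints(REPORT), "ADAP1BKF")
print(window)
```

For the sample lines this yields the same PLOG number and SYN1/SYN2 blocks that the ADASAV SAVE output reports on the preceding slide.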
44Collecting Data For Recovery
Dataset Name From Back-up Job (GDG) For ADASAV RESTORE
ADABAS.PRODOFFD.DB1.BACKUP.FULL.G0842V00 CATALOGED
45Collecting Data For Recovery
Dataset Names From PLOG Copy Jobs (GDG)
Matching block numbers 4695 - End For ADASAV RESTORE and ADARES REGENERATE
DDSIAUS1 OUTPUT VOLUME=WRK015, SESSION NR=64
  FROMBLK=1214, FROMTIME=2002-09-23 03:30:24
  TOBLK=4701, TOTIME=2002-09-23 21:01:42
  ADABAS.PROD.DB1.PLOG.COPY.G7170V00
DDSIAUS1 OUTPUT VOLUME=WRK015, SESSION NR=64
  FROMBLK=4702, FROMTIME=2002-09-23 21:02:08
  TOBLK=4748, TOTIME=2002-09-23 23:30:03
  ADABAS.PROD.DB1.PLOG.COPY.G7171V00
DDSIAUS1 OUTPUT VOLUME=WRK004, SESSION NR=64
  FROMBLK=4749, FROMTIME=2002-09-23 23:30:25
  TOBLK=4791, TOTIME=2002-09-24 03:30:33
  ADABAS.PROD.DB1.PLOG.COPY.G7172V00
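Matching PLOG copy datasets to a starting block is simple interval arithmetic, sketched below with the three datasets and block ranges from this slide. The `PlogCopy` record and the `copies_needed` helper are hypothetical names; the gap check is an added safeguard for illustration, not a feature of ADARES.

```python
from typing import NamedTuple


class PlogCopy(NamedTuple):
    dsn: str
    session: int
    from_blk: int
    to_blk: int


# Block ranges as listed on the slide for PLOG session 64.
COPIES = [
    PlogCopy("ADABAS.PROD.DB1.PLOG.COPY.G7170V00", 64, 1214, 4701),
    PlogCopy("ADABAS.PROD.DB1.PLOG.COPY.G7171V00", 64, 4702, 4748),
    PlogCopy("ADABAS.PROD.DB1.PLOG.COPY.G7172V00", 64, 4749, 4791),
]


def copies_needed(copies, session, start_blk):
    """Return the copies holding start_blk and all later blocks, in block order."""
    selected = sorted(
        (c for c in copies if c.session == session and c.to_blk >= start_blk),
        key=lambda c: c.from_blk,
    )
    if not selected or selected[0].from_blk > start_blk:
        raise ValueError(f"no PLOG copy contains block {start_blk}")
    # Consecutive copies must join without a gap, or the regenerate input is incomplete.
    for prev, nxt in zip(selected, selected[1:]):
        if nxt.from_blk != prev.to_blk + 1:
            raise ValueError(f"gap between {prev.dsn} and {nxt.dsn}")
    return selected


regen_input = [c.dsn for c in copies_needed(COPIES, session=64, start_blk=4698)]
print(regen_input)
```

Starting from the SYN2 block 4698, all three generations are selected, because block 4698 still lies inside the first copy (blocks 1214 to 4701).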
46Recovery - Part 1 - W/O Nucleus
ADASAV RESTONL
<snip>
//RESTONL  EXEC ADASAVRD
//DDREST1  DD DISP=SHR,BUFNO=30,
//            DSN=ADABAS.PRODOFFD.DB1.BACKUP.FULL.G0842V00
//DDPLOG   DD DISP=SHR,BUFNO=30,
//            DSN=ADABAS.PROD.DB1.PLOG.COPY.G7170V00
//DDKARTE  DD *
ADASAV RESTONL BUFNO=2,OVERWRITE
//REPORT   EXEC ADAREP
//DDKARTE  DD *
ADAREP NOFILE
//
47Recovery - Part 2
Start the ADABAS nucleus with normal JCL (UTIONLY=YES)
<snip>
ADAN21 00215 PROTECTION-LOG PLOGR1 STARTED
ADAN02 00215 NUCLEUS-RUN WITH PROTECTION-LOG 00064
ADAL02 00215 2002-09-21 21:20:29 CLOGRS IS ACTIVE
ADAN03 00215 ADABAS COMING UP
ADAN19 00215 RUNNING WITH ASYNCHRONOUS BUFFERFLUSH
ADAN8Y 00215 FILE-LEVEL CACHING INITIALIZED
ADAN80 00215 ADABAS DYNAMIC CACHING ENVIRONMENT ESTABLISHED.
ADAN01 00215 A D A B A S V6.2.2 IS ACTIVE
ADAN01 00215 MODE MULTI I S O L A T E D
ADAN01 00215 RUNNING WITHOUT RECOVERY-LOG
ADA800 00215 User exit 8 active.
ADA801 00215 ADAP1PLC submitted.
48Recovery - Part 2 - With Nucleus
ADARES REGENERATE
<snip>
//REGEN    EXEC ADARES
//DDSIIN   DD DISP=SHR,BUFNO=30,
//            DSN=ADABAS.PROD.DB1.PLOG.COPY.G7170V00
//         DD DISP=SHR,BUFNO=30,
//            DSN=ADABAS.PROD.DB1.PLOG.COPY.G7171V00
//         DD DISP=SHR,BUFNO=30,
//            DSN=ADABAS.PROD.DB1.PLOG.COPY.G7172V00
//DDKARTE  DD *
ADARES REGENERATE PLOGDBID=215,PLOGNUM=64
ADARES FROMCP=SYN2,FROMBLK=4698
ADARES TOCP=EOD,TOBLK=00000 (not needed)
<snip>
49Recovery - Part 3 - With Nucleus
- Lock files to re-run utilities (see regenerate report)
- ADADBS OPERCOM LOCKU=fnr, or SYSAOS A / I / L / F, or modify command /F jobname,LOCKU=fnr
- Unlock utility-only status for users
- ADADBS OPERCOM UTIONLY=NO, or SYSAOS A / I / L / U, or modify command /F jobname,UTIONLY=NO
50Recovery - Part 3 - With Nucleus
- Re-run the utilities - if necessary
- ADALOD LOAD / UPDATE / DDISN
- ADAINV INVERT FILE=xxx,FIELD=xx
- Unlock files
- ADADBS OPERCOM UNLOCKF=fnr, or SYSAOS A / I / L / F / N, or modify command /F jobname,UNLOCKF=fnr
51Delta Save Facility (DSF)
52Delta Save Facility
53Delta Save QDUMP (CCA - now TSI)
http://www.treehouse.com/qdump.shtml
54Disk Mirroring
ASSO
- Benefits
- Asynchronous disk mirroring can provide better physical protection by supporting extended physical distances.
- No loss of committed transactions in synchronous storage (mirroring/RAID) on a CPU failure
DATA
ASSO
DATA
55Disk Mirroring
ASSO
- Limitations
- No protection from data corruption introduced by the hardware / software
- Secondary site is not guaranteed to be transactionally consistent, because data is moved at the disk/track/sector or bit level (in the case of asynchronous mirroring).
- Client applications must be restarted after a failure and need to be aware of the failure
DATA
ASSO
DATA
56Disk Mirroring
ASSO
- Limitations
- Synchronous mirroring and RAID devices can add overhead to application performance.
- Redundant/specialized high-availability hardware/software can be expensive and restricted to use for backup purposes only.
- Secondary copy of data is not available for use (low hardware utilization).
- Need to replicate everything on disk; no selectivity of data replication
DATA
ASSO
DATA
57Example For Disk Mirroring
[Diagram: Main Platform (S/390 and UNIX hosts on an EMC 5700) connected over an OC-3 link, 12-15 miles, to the Back Up / Hot Site (S/390 and UNIX hosts on an EMC 5700). Both directions labeled: SRDF remote mirrored / synchronized]
58Dedicated line broadband speeds and prices
- T-1 - 1.544 megabits per second (24 DS0 lines). Avg. cost $400-$650/mo.
- T-3 - 43.232 megabits per second (28 T1s). Avg. cost $6,000-$16,000/mo.
- OC-3 - 155 megabits per second (100 T1s). Avg. cost $20,000-$45,000/mo.
- OC-12 - 622 megabits per second (4 OC3s). No price
- OC-48 - 2.5 gigabits per second (4 OC12s). No price
- OC-192 - 9.6 gigabits per second (4 OC48s). No price
- Source: http://www.infobahn.com/research-information.htm - prices updated 16 March 2004
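To put these line speeds in perspective, the sketch below estimates how long a one-time bulk copy of the 61.4 GB reported for the five databases in the D/R test results would take over each priced link. It is a back-of-the-envelope calculation only: it assumes decimal gigabytes, a fully dedicated line, and no protocol overhead unless the hypothetical `efficiency` factor is lowered. Ongoing mirroring traffic depends on the update rate, not on database size, and is not modeled here.

```python
# Line rates in megabits per second, as quoted on the slide.
LINE_MBPS = {"T-1": 1.544, "T-3": 43.232, "OC-3": 155.0, "OC-12": 622.0}


def transfer_hours(gigabytes, mbps, efficiency=1.0):
    """Hours to move `gigabytes` (decimal GB) over a line of `mbps` megabits/s.

    `efficiency` is a hypothetical factor (0 < efficiency <= 1) for protocol
    overhead and line sharing; 1.0 means the full nominal rate is usable.
    """
    bits = gigabytes * 1e9 * 8
    seconds = bits / (mbps * 1e6 * efficiency)
    return seconds / 3600


for name, mbps in LINE_MBPS.items():
    print(f"{name:6s} {transfer_hours(61.4, mbps):8.2f} h")
```

Under these assumptions a T-1 needs several days for the initial copy, while an OC-3 needs less than an hour, which helps explain the choice of an OC-3 link in the mirroring example.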
59Peer-to-Peer Remote Copy Extended Distance (PPRC-XD)
PPRC: 60 miles - PPRC-XD: continent
FlashCopy
ESS Shark - IBM ESS DASD - HDS also supports PPRC
ESS Shark
Also see TimeFinder from EMC
60External Back-up Systems
- Fast Copy of Data
- Snapshot
- No data movement
- A virtual copy by copying pointers
- Copy Process
- Physical copy is made asynchronously from the logical copy
- No impact on applications using the original data
- Specific Hardware Required
- Software works only with the hardware
- Work on Volume Level
- Some snapshot-only tools also work on dataset level
61Snapshot Physical Copy
- IBM
- Hardware: Enterprise Storage Server
- Software: FlashCopy
- http://www.share.org/proceedings/sh98/data/S3087.PDF
- EMC
- Hardware: Symmetrix Remote Data Facility
- Software: EMC TimeFinder
- http://www.emc.com/interactive_center/media/timefinder/tf_noRC.html
62How It Works
[Diagram: within a pre-defined time window the source data is suspended (read only; update requests are queued), the snapshot is taken (snap), and processing resumes (read / update). The physical backup is then made from the snapshot. Source: SAG]
63Replication
- Benefits
- Warm standby systems can be configured over a Wide Area Network, providing protection from site failures.
- Ability to more quickly swap to the standby system in the event of failure, as the backup database is already on-line.
- Data corruption is typically not replicated, as transactions are logically reproduced rather than I/O blocks mirrored.
65Replication
ASSO
- Benefits
- Automatic switch-over for clients using a switching mechanism; no client restart needed.
- Originating applications are minimally impacted, as replication takes place asynchronously after commit of the originating transaction.
- The warm standby database is available for read-only operations, allowing better utilization of backup systems.
DATA
WORK
WORK
DATA
ASSO
66Replication
ASSO
DATA
- Benefits
- Ability to resynchronize and easily switch back
to primary system when it becomes available
without loss of data.
WORK
WORK
DATA
ASSO
67Replication
ASSO
DATA
- Limitations
- Warm standby system will be out-of-date by the transactions committed at the active database that have not yet been applied to the standby.
- Protection is limited to components supporting Warm Standby (e.g. DBMS data sources may be protected but file systems may not be supported).
WORK
WORK
DATA
ASSO
68Entire Transaction Propagator
- The Entire Transaction Propagator allows for asynchronous data replication.
- Replicated data can be updated and synchronized with master data at user-specified intervals.
69- OS/390 Recovery Procedures
- Prepared by the Mainframe Recovery Team
- Recovering
- The OS/390 platform
- The ABARS aggregates
- The ADABAS databases
71- OS/390 D/R Times (SUNGARD)
- About 2400 tapes
- Shipping time from storage to the mainframe ?
- 4 hours ahead for tape staging
- OS/390 and ABARS aggregates
- 5 hours planned, 7 hours with problems
- ADABAS databases
- Approx. 2-3 hours for tape restore and regenerate
- Next test Nov 1 approx. 45 minutes from disk pool
72Experiences From D/R Tests
- Problems IPLing on an unfamiliar CPU (6 hours duration)
- Initial setup (restore SYS.. libraries)
- Pre-IPL procedures (restore Adabas, work, spool volumes, etc.)
- Post-IPL procedures (DFHSM in disaster mode, etc.)
- Application restores
- Tape drive offline problems, Import MVSCAT typo errors, etc.
- Recovered wrong volumes, generation errors
- Initialize work volumes - conversion to SMS (DFSMShsm)
- TMC recovery problems caused BRM recovery problems, too
73Experiences From D/R Tests
- Sent wrong cartridges with system dates to storage
- Fewer channels for tapes at our offsite (2 instead of 4) doubled the restore time
74Experiences From D/R Tests
- RESTONL abended with SB00, no PLOG restored; the Recovery Aid flag was on in the saved database.
- REGENERATE deleted a file and pointed out that the ADALOD job had to be repeated, but the input dataset had not been saved
- We did a full-volume restore (DFDSS), restored the database, and forgot to format the dual protection logs.
- Missed protection logs
- BRM restored wrong aggregates
- Missing full-volume restores - (Database 2)
- Missing volumes in Work Storage Pool - (Database 3)
75Experiences From D/R Tests
- BRM = Back-up and Recovery Manager; ABARS = Aggregate Back-up and Recovery Support (ABARS, not Air conditioning and refrigeration industry services <smile>)
- Recovered (-1) Aggregates instead of (0) (all Databases)
- Recovered only SOME files on Aggregate (0) - (Database 1)
- BRM/ABARS was not properly recovered (wrong version of BRM database)
- Once those problems were resolved (several hours later), the ADABAS recovery ran smoothly.
- 5 Databases (61.4GB) restored and regenerated in 3.5 hours (tape/cart)
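The last figure can be turned into an effective recovery rate for planning purposes. The sketch below assumes decimal gigabytes and treats restore plus regenerate as one combined rate; the function names and the 100 GB projection are hypothetical illustrations, and a linear projection ignores fixed costs such as tape mounts.

```python
def recovery_rate(gigabytes, hours):
    """Return (GB per hour, MB per second) for a restore-and-regenerate run."""
    gb_per_hour = gigabytes / hours
    mb_per_second = gigabytes * 1000 / (hours * 3600)
    return gb_per_hour, mb_per_second


def projected_hours(gigabytes, gb_per_hour):
    """Linear projection of recovery time at a measured rate (an assumption)."""
    return gigabytes / gb_per_hour


gb_h, mb_s = recovery_rate(61.4, 3.5)
print(f"{gb_h:.1f} GB/h, {mb_s:.2f} MB/s")
print(f"{projected_hours(100.0, gb_h):.1f} h for a hypothetical 100 GB")
```

A rate measured this way gives a rough check on whether a recovery time objective is still reachable from tape as the databases grow.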
76How Far is Far Enough? (http://www.drj.com/articles/spr03/1602-02.html)
- Alternate Facility
- Offsite Storage Facility
- Answer: 105 miles, according to the survey
77Lessons Learned (http://www.drj.com/articles/spr02/1502-07.html)
- Distance is key: streets, bridges, tunnels, airports are closed
- Tape recovery is not effective
- All applications are critical
- Inconsistent back-up is no back-up at all
- People-dependent processes do not suffice
- Two sites are not enough
- People are irreplaceable; so is information
78Lessons Learned (http://www.drj.com/articles/spr02/1502-07.html)
- Companies that relied on tape or on a third-party provider found in many cases that they had difficulty meeting their recovery time objectives
- All disasters are possible
79Helpful Links
- Software AG - ADABAS Recovery: http://www.softwareag.com/adabas/news/vers_7.htm and http://servline24.softwareag.com/SecuredServices/ <Knowledge Center - ADABAS>
- ADABAS Restart and Recovery (Operations Manual): http://servline24.softwareag.com/SecuredServices/ <Knowledge Center - Product Documentation>
- University of Arkansas - D/R Plan: http://www.uark.edu/staff/drp/
- Disaster Recovery Journal: http://www.drj.com
80Helpful Links
- FlashCopy: http://www.share.org/proceedings/sh97/data/S9111.PDF and http://www.storage.ibm.com/hardsoft/products/ess/pubs/f2ahs05.pdf
- Shark (ESS): http://www.almaden.ibm.com/cs/shark/ and http://www.storage.ibm.com/hardsoft/disk/index.html
- State of the Art Storage: http://www.networkmagazine.com/article/NMG20010104S0002/2
- EMC TimeFinder: http://www.emc.com/products/software/timefinder.jsp
- Entire Transaction Propagator (SAG): http://servline24.softwareag.com/SecuredServices/document/html/etp151/pdf/man.pdf
81Thank you!
Questions?