1
RMAN and Overall Database Integrity
  • Raj Pal, EchoStar Satellite LLC

2
Topics
  • Caveat.
  • Before & After RMAN story.
  • RMAN in a rush.
  • RMAN without a budget.
  • Database block rate of change.

3
Caveat
  • RMAN is fantastic, important & necessary for:
      • Corruption checking
      • Restoring
      • Recovering
      • Duplicating
      • Etc.
  • RMAN, on its own, is not a good solution for high
    availability (HA) of high visibility/important
    database instances (especially for large
    databases).

4
That's Good!
5
That's Bad!
6
That's Good!
7
That's Bad!
8
That's Good!
9
That's Bad!
10
That's Good!
12
Before & After RMAN story
13
That's Good!
  • Oracle database on Unix OS is > 3 TB
  • > 4,000 datafiles, > 200 tablespaces
  • Good database performance
  • Well tuned
  • CPU usage < 50%
  • Memory usage < 50%

14
That's Bad!
  • Cannot put database in hot backup mode
  • ERROR: ORA-00235: controlfile fixed table inconsistent due to
    concurrent update
  • Cause: Concurrent update activity on a controlfile caused a
    query on a controlfile fixed table to read inconsistent
    information.

15
That's Good!
  • Database still up and running well.
  • Several attempts made to back up the database.
  • Usage of the database is so high that downtime is
    not realistic.

16
That's Bad!
  • Still no successful backup after over 3 weeks of
    effort

17
That's Good!
  • Over a 3-4 day period, individual tablespaces
    (data tablespaces only) are placed in hot backup
    mode and backed up to tape.
  • One evening at 9pm sufficient disk is allocated
    to backup the database using RMAN.

18
That's Bad!
  • Lightning & rain storm!
  • At 10pm, water leaks in through the roof onto
    the UPS!!
  • UPS is interrupted!!!
  • > 400 production machines and > 40 Oracle
    databases crash!!!!

19
That's Good!
  • Damaged circuit boards in UPS are found and
    replaced.
  • 12:30am (2.5 hrs later) all systems are back up
    and functional.
  • Jokes are made about the 'database without the
    backup' that crash recovered successfully.

20
That's Bad!
  • 2am, more circuit boards are found to be damaged
    and their replacement is attempted
  • Human error crashes UPS again

21
That's Good!
  • By 4am, all systems are back up and functional

22
That's Bad!
  • Except for 3 databases
  • Unexplained corruption leads to full database
    restore and recovery until just prior to 2nd
    crash (for 2 of the 3 databases)

23
That's Good!
  • The 2 tape restores and recoveries are
    successful for the 2 databases

24
That's Okay...
  • The database without the backup has several
    corrupted datafiles.
  • 10 datafiles are taken offline
  • Another datafile has only one corrupt block
    requiring recovery

25
That's Definitely Bad!
  • The one block just happens to be located in a
    system datafile.
  • Restored system datafile from tablespace backup
    and started recovery.
  • After applying many archived redo logs, recovery
    dies with an ORA-600.
  • Corruption is identified as a STUCK RECOVERY:
      • Inconsistency between the information stored in
        the redo and the information stored in a database
        block being recovered (in this case, crash
        recovery).

26
That's Unbelievably Bad!
  • Workaround instructions:
      • 'Ensure that the restore has finished PRIOR to
        issuing a RECOVER database command'
      • 'If problems continue, consider restoring from a
        backup and doing a point-in-time recovery to a
        time PRIOR to the one implied by error.' (Restore
        from a 3-week-old backup?)

27
That's Ridiculously Bad!
  • Corrupt system datafile will not allow database
    to open.
  • Restored system datafile cannot be brought to
    the SCN of the > 4,000 datafiles.

28
That's Good!
  • PLAN
      • Force database open using hidden init parameter
        _corrupt_blocks_on_stuck_recovery
      • Create new database and transport > 200
        tablespaces to it.
  • Database opens!!!

29
That's Impossibly Bad!
  • Due to large size of database and terabytes of
    extra disk allocated for the research effort, a
    cleanup is started of unused disk.
  • Reclaiming of space ignores a warning and a
    mountpoint is deleted and unmounted
  • That mountpoint hosted the system datafile!!!

30
That's Crazy-Bad!
  • Since no backup of the new system datafile exists and
    the old corrupted system datafile no longer
    contains the metadata of the tablespaces, all 3
    terabytes of datafiles are useless.
  • Restore of the 3-week-old backup and recovery is not
    realistic (> 200 GB redo/day).
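The "not realistic" verdict can be checked with rough arithmetic. The sketch below multiplies the backup age by the daily redo volume and divides by an apply rate. The 21 days and 200 GB/day come from the slides (the latter as a lower bound); the 10 GB/hr apply rate is purely an assumption for illustration, and the function names are hypothetical.

```shell
# Estimate the redo backlog and the hours needed to apply it.
# All inputs are illustrative; the apply rate in particular is an assumption.
redo_backlog_gb() {
    # $1 = age of the backup in days, $2 = redo generated per day in GB
    echo $(( $1 * $2 ))
}

recovery_hours() {
    # $1 = redo to apply in GB, $2 = assumed apply rate in GB/hr
    # Rounds up so a partial hour counts as a full hour.
    echo $(( ($1 + $2 - 1) / $2 ))
}

backlog=$(redo_backlog_gb 21 200)
echo "redo to apply: ${backlog} GB"
echo "hours at 10 GB/hr: $(recovery_hours "$backlog" 10)"
```

With these inputs the sketch reports 4200 GB of redo and 420 hours of recovery, about 17.5 days before the restore time is even counted.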

31
That's promising
  • PLAN 'Z'
  • Restore data tablespace backups from tape taken
    over 3-4 days and recover.
  • Create indexes (since index tablespaces were not
    backed up).
  • When recovery completes, shut down and back up to
    tape.

32
That's A Little Better
  • From the time of the UPS failures until the database
    is recovered (without indexes):
      • Downtime: 165 hrs (7 days).
  • Last index creation completed another 105 hrs
    (4 days) after the backup.

33
That's Excellent!!!
  • Within a couple more days, the first successful
    RMAN level 0 backup completed to disk!!

34
How Could This Disaster's Disaster Have Been
Minimized?
  • Mistake 1
      • Always have a recent backup!
  • Mistake 2
      • Use RMAN!
  • Mistake 3
      • Backup the error!
  • Mistake 4
      • Use RMAN!

35
1 year later
36
That's Good!
  • The database
      • Different Oracle database on Unix OS.
      • > 4.5 TB.
      • Good database performance.
      • Well tuned for Siebel front end.
      • CPU usage < 50%.
      • Memory usage < 50%.
  • Backed up to disk using RMAN.

37
That's Bad!
  • Standby database had stopped playing logs due to
    a disk corruption holding the next archived
    redolog.
  • User error leaves the database functionally
    corrupt to the Siebel front end.
  • Even if archived logs were restored, the standby
    is too far behind to catch up in a reasonable
    amount of time.

38
That's RMAN!!!
  • Last level 0 backup was taken before the physical
    disk corruption of the standby's archived log.
  • Last level 1 was taken the previous day.
  • Corrupted archived redolog did not even have to
    be used.
  • Restore of > 4.5 TB from RMAN disk backups took
    4.5 hrs (1 TB/hr).
  • Point-in-time recovery (until before core
    resynch) of 100 GB redo took 10 hrs (9 GB/hr).

39
Contrasts
  • Uptime SLA was not met for either of these
    incidents, but compared to the incident 1 year
    prior, RMAN's restore and recovery was a welcome
    relief.
  • Resource usage
      • This restore/recovery took 1 DBA resource.
      • Previous year's incident took (around the clock)
        8 DBAs, 3 Storage admins and 3 Unix admins.
  • Functional downtime
      • This incident had 15 hrs.
      • Previous year's incident took 270 hrs.

40
Followup
  • The database formerly known as 'the database
    without a backup'
      • Still successfully being backed up by RMAN to
        disk.
      • Now > 6.2 TB (level 0 backup takes 6.5 hrs to
        disk).
  • RMAN will not necessarily get your database
    restored and recovered fast, but it WILL restore
    and recover it!!

41
RMAN in a Rush!
42
  • Assumptions
      • Business requirement of no downtime.
      • Capability of backing up static files from disk
        to tape.
      • 9i or higher (just for syntax below)
  • Requirements
      • Front end app, Oracle binaries, init, etc. are
        already backed up (just dynamic database
        components).
      • Sysdba or rman role
      • Archivelog mode
      • Sufficient amount of RMAN disk

43
  • Space Considerations
      • Datafiles
      • Controlfiles
      • Archived Redologs
          • Minimum space required is for redo generated
            during the backup.
          • Recommended space is for redo generated during
            and between 3 regularly scheduled level 0 backups.
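The two sizing rules for archived redo can be turned into a quick calculation. This is a minimal sketch, assuming redo is generated at a steady rate and reading "during and between 3 level 0 backups" as three full backup cycles. The function names and the sample figures (100 GB/day of redo, a 7-hour backup, weekly level 0) are hypothetical.

```shell
# Rough archived-redo sizing for the RMAN disk area.
# Assumes a steady redo rate; all sample numbers are illustrative.
min_arch_space_gb() {
    # $1 = redo generated per day in GB, $2 = backup duration in hours
    # Redo produced while the backup runs, rounded up to a whole GB.
    echo $(( ($1 * $2 + 23) / 24 ))
}

recommended_arch_space_gb() {
    # $1 = redo generated per day in GB, $2 = days between level 0 backups
    # Redo produced across three backup cycles.
    echo $(( 3 * $1 * $2 ))
}

echo "minimum:     $(min_arch_space_gb 100 7) GB"
echo "recommended: $(recommended_arch_space_gb 100 7) GB"
```

For the sample figures this gives 30 GB as the bare minimum and 2100 GB as the recommended allocation, on top of whatever the datafile and controlfile backups need.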

44
  • Setup
      • Allocate RMAN disk
          • /RMAN/<SID>_ARCH
          • /RMAN/<SID>_DATA
          • /RMAN/<SID>_CONTROL
      • Init parameters
          • alter system set control_file_record_keep_time=30;
          • alter system set archive_lag_target=900;
          • Init file too!
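The disk layout above can be scripted. The sketch below prints the three directory names for a SID and creates them under a temporary base directory so it is safe to try anywhere. `rman_dirs` and the SID `PROD` are hypothetical; on a real host the base would be the `/RMAN` mount point from the slide.

```shell
# Print the three RMAN disk locations for a SID (layout taken from the slide).
rman_dirs() {
    # $1 = SID, $2 = base directory (defaults to /RMAN)
    base=${2:-/RMAN}
    for suffix in ARCH DATA CONTROL; do
        echo "${base}/${1}_${suffix}"
    done
}

# Create the directories under a scratch base instead of the real /RMAN.
# PROD is a hypothetical SID.
scratch=$(mktemp -d)
rman_dirs PROD "$scratch" | while read -r dir; do
    mkdir -p "$dir"
done
ls "$scratch"
```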

45
  • Actions
      • Ensure other systems are not removing archived
        logs.
      • Set environment
          export ORACLE_SID=<SID>
          export ORACLE_HOME=<SID_HOME>
          rman target / nocatalog
      • Backup archived logs (temporary)
          run {
            allocate channel a1 type disk format
              '/RMAN/<SID>_ARCH/%d_arch_%s_%p_%c_%t.rmn';
            backup
              skip inaccessible
              archivelog all
              tag 'manual_arch'
              filesperset 1
              delete input;
          }
46
  • (Actions)
      • Backup level 0
          rman target / nocatalog
          run {
            allocate channel b1 type disk format
              '/RMAN/<SID>_DATA/%d_%s_%p_%c_%t_level0.rmn';
            allocate channel b2 type disk format
              '/RMAN/<SID>_DATA/%d_%s_%p_%c_%t_level0.rmn';
            allocate channel b<n> type disk format
              '/RMAN/<SID>_DATA/%d_%s_%p_%c_%t_level0.rmn';
            sql 'alter system archive log current';
            backup incremental level 0
              tag 'level0'
              check logical
              database;
            sql 'alter system switch logfile';
            copy current controlfile to
              '/RMAN/<SID>_CONTROL/<SID>_level0_<date>.ctl';
          }
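The level 0 run block repeats one `allocate channel` line per channel, so the channel count is a natural parameter. The sketch below builds the block for any number of channels. The function names, the SID `PROD`, the `%d_%s_%p_%c_%t` format string and the `YYYYMMDD` date stamp are assumptions for illustration, not details confirmed by the slides.

```shell
# Emit one "allocate channel" line per requested channel.
level0_channels() {
    # $1 = SID, $2 = number of disk channels
    i=1
    while [ "$i" -le "$2" ]; do
        echo "  allocate channel b${i} type disk format '/RMAN/${1}_DATA/%d_%s_%p_%c_%t_level0.rmn';"
        i=$(( i + 1 ))
    done
}

# Emit the full level 0 run block, including the controlfile copy.
level0_cmds() {
    # $1 = SID, $2 = number of disk channels
    echo "run {"
    level0_channels "$1" "$2"
    echo "  sql 'alter system archive log current';"
    echo "  backup incremental level 0 tag 'level0' check logical database;"
    echo "  sql 'alter system switch logfile';"
    echo "  copy current controlfile to '/RMAN/${1}_CONTROL/${1}_level0_$(date +%Y%m%d).ctl';"
    echo "}"
}

# Print the block for a hypothetical SID with three channels.
level0_cmds PROD 3
```

More channels let RMAN write backup pieces in parallel, so the count is usually tuned to what the RMAN disk can absorb.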

47
  • (Actions)
      • Validate archived logs
          rman target / nocatalog
          run {
            allocate channel v1 type disk;
            allocate channel v2 type disk;
            allocate channel v<n> type disk;
            change archivelog all validate;
          }
  • Create script and set cron for archives without
    skipping inaccessible.
  • Set cron for regular level0 backups.
  • Tape backup of RMAN disk.
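Once the backups run from cron, nobody is watching the terminal, so the job needs some way to notice a failure. One simple approach, sketched here under the assumption that each run writes its output to a log file, is to scan that log for `RMAN-nnnnn` or `ORA-nnnnn` message codes. The function name is hypothetical, and the two sample logs are hand-written lines for the demonstration, not captured RMAN output. Warnings carry the same prefixes, so this check errs on the side of alerting.

```shell
# Report whether an RMAN log contains any RMAN- or ORA- message code.
rman_log_failed() {
    # $1 = path to an RMAN log file
    # Exit status 0 means at least one message code was found.
    grep -Eq '(RMAN|ORA)-[0-9]{5}' "$1"
}

# Demonstration with two hand-written sample logs.
tmpdir=$(mktemp -d)
printf 'Starting backup\nFinished backup\n' > "$tmpdir/good.log"
printf 'Starting backup\nRMAN-03002: failure of backup command\n' > "$tmpdir/bad.log"

for log in good bad; do
    if rman_log_failed "$tmpdir/$log.log"; then
        echo "$log: FAILED"
    else
        echo "$log: ok"
    fi
done
```

A cron wrapper could call this after each run and send mail or exit non-zero when it reports a failure.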

48
  • Pros
      • All benefits of RMAN.
      • No need for Oracle Agent from tape vendor.
      • Backups/Restores from disk are faster than tape.
      • No need for hot backup mode.
      • RMAN utility is included with all Standard and
        Enterprise Oracle licenses.
  • Cons
      • Your controlfile has limited visibility into the
        history of its backups.
      • Disk is more expensive than tape.

49
Final Tidbits
  • RMAN without a budget.
  • Database block rate of change.

50
CONCLUSION
  • There is literally no excuse for not using RMAN!

51
  • QUESTIONS?
  • Contact Info
  • Raj_at_Palfoundation.net