High Availability and Disaster Recovery

Slides: 29
Provided by: SamiA4

1
High Availability and Disaster Recovery
  • Considerations and Options
  • Transactional Data Management Solutions

2
Agenda
  • Introduction
  • High Availability - 2006
  • Industry Shift from MTTF to MTTR, Continuous
    Availability
  • Challenges in HA environments
  • Understanding/Evaluating HA technologies
  • TDM HA Solutions
  • Questions & Answers

3
About GoldenGate Software
GoldenGate Software is a privately held software
company that offers Transactional Data Management
solutions.
250 customers, 1,500 solutions implemented in 35 countries
Established, Loyal Customer Base
Leading Industry Solutions
  • 18,000-node ATM network with 24/7 availability
  • Saving millions with a real-time data warehouse (DW) and zero-downtime migrations
  • Database tiering handles an average of 300,000 updates/hour, with peaks of 800,000/hour
  • Achieving a paperless enterprise for a visionary healthcare provider
4
Speaker Introduction/Background
  • Nick Wagner
    • Director of Product Management, GoldenGate Software
    • Transactional Data Management for Oracle and other databases
    • 8 years of product management, primarily focused on database replication solutions for high availability, disaster recovery, reporting, and data integration
    • 5 years as Product Manager for Quest SharePlex for Oracle

5
HA (2006)
  • Definition
    • Ratio of system uptime to the sum of uptime and downtime
    • Availability: $A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$ (MTTF = mean time to failure, MTTR = mean time to repair)
  • Challenges
    • Balancing performance against reliability in computer systems
    • Hardware faults, software bugs, and human errors are realities in any complex system deployment
    • Enterprise applications need to function 24x7
    • Disasters are no longer a distant threat
    • Inadequate planning to handle outages
  • Shift in industry (and academic research) focus
    • From fault tolerance to reducing MTTR
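A small worked example of the availability ratio MTTF / (MTTF + MTTR) shows why the focus shifted toward MTTR. The MTTF and MTTR values below are illustrative assumptions, not figures from this presentation: with one failure every 1,000 hours, cutting repair from a 4-hour restore to a 6-minute failover reduces expected downtime more than making the hardware ten times more reliable.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected yearly downtime implied by an availability ratio."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60


# Hypothetical scenarios: one failure every 1,000 hours unless stated otherwise.
baseline = availability(1000, 4)         # 4-hour restore from backup
faster_repair = availability(1000, 0.1)  # 6-minute failover to a live secondary
longer_mttf = availability(10000, 4)     # 10x more reliable hardware, same 4-hour restore

for label, a in [("baseline", baseline),
                 ("faster repair", faster_repair),
                 ("longer MTTF", longer_mttf)]:
    print(f"{label:14s} availability={a:.5f} "
          f"downtime={downtime_minutes_per_year(a):8.1f} min/yr")
```

Under these assumptions the baseline works out to roughly 35 hours of downtime a year, the 10x-MTTF case to about 3.5 hours, and the fast-failover case to under an hour.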

6
The 3 States of Availability: A Systematic View
[Diagram: (1) Active, no outage; (2) Planned outage; (3) Unplanned outage]
7
High Availability Concerns (No Outage)
State 1: Active
  • Throughput
  • Latency
  • DSS vs. OLTP: conflicting requirements
  • Mixed workload
  • Data validation
  • Data transformation

Common Approaches
  • Add more:
    • Nodes
    • Resources
    • Infrastructure

8
High Availability Concerns (Planned Outages)
Common Approaches
  • Selected windows of downtime
  • Phased approach to maintenance

9
Planned Outages - Upgrades/Migration challenges
  • Maintaining SLA during planned outage
    • Revenue impact
    • Customer expectations
    • Interdependencies, integration
  • Data issues
    • Instantiating terabytes/petabytes
    • Staging areas
    • Change management
    • Special handling
  • Synchronization issues
    • Incremental data movement
    • Source database impact
  • Failback strategy
  • System/application verification
  • Continued data growth

10
High Availability Concerns (Unplanned Outages)
State 3: Unplanned Outage
Common Approaches
  • Database Restore/Recovery
  • RAID
  • Shared Disk Clusters
  • Standby database

11
Unplanned Outages: Understanding Database Failures
  • Failure points
    • Statement
    • Process
    • Instance
    • Database
    • Site
  • Failure types
    • Physical (media failure, corruption, inconsistency among redundant copies)
    • Logical (incorrect DML, out-of-sync data, accidental table drop)
  • Failure handling
    • Automatic
    • Manual

12
Unplanned Outages: Repair as a Focus
  • Mapping symptoms to failure categories is complex
  • Native repair solutions do not address complex or multiple failures
  • Root cause analysis affects MTTR
  • Failover and isolated repair will replace conventional recovery in computing environments

13
Evaluating HA Technologies
  • Availability
    • Is the failover/DR solution available for real use?
  • MTTR (RTO, recovery time objective)
    • In the event of a failure, how soon can the data be recovered?
  • Performance
    • Speed and support for high volumes
  • Data loss (RPO, recovery point objective)
    • What is the impact of an unplanned outage in terms of lost data?
  • Zero downtime
    • Does the solution allow for zero downtime during planned outages?
  • Manageability
    • Configuration, installation, monitoring
  • Impact on deployed systems
    • How intrusive is it? What is the impact on the data itself?
  • Cost
    • Licensing, maintenance
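The measurable criteria above (RTO, RPO, zero downtime, usable standby) can be turned into a simple screening check. This is a hypothetical sketch: the `Candidate` fields, the targets, and the two example candidates are illustrative assumptions rather than measurements of any product, and performance, manageability, impact, and cost are left out because they do not reduce to a single number.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    rto_seconds: float           # estimated time to restore service (MTTR)
    rpo_seconds: float           # worst-case window of lost committed data
    zero_downtime_planned: bool  # can planned outages avoid downtime?
    standby_usable: bool         # can the secondary do real work while on standby?


def shortfalls(c: Candidate, rto_target: float, rpo_target: float) -> list[str]:
    """Return the criteria a candidate misses; an empty list means it qualifies."""
    missed = []
    if c.rto_seconds > rto_target:
        missed.append("RTO")
    if c.rpo_seconds > rpo_target:
        missed.append("RPO")
    if not c.zero_downtime_planned:
        missed.append("zero downtime")
    if not c.standby_usable:
        missed.append("availability")
    return missed


# Hypothetical candidates and targets, for illustration only.
candidates = [
    Candidate("nightly backup + restore", 4 * 3600, 24 * 3600, False, False),
    Candidate("async log-based replica", 30, 5, True, True),
]
for c in candidates:
    print(c.name, shortfalls(c, rto_target=60, rpo_target=10) or "meets targets")
```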

14
Differentiating HA Technologies
  • Conventional backup/recovery
  • RAID
    • Multiple hard disks behaving as a single large, fast drive (mirrors/stripes/duplexing/parity)
  • Snapshots
  • Block-level database replication
  • Change-level database replication
  • Remote mirroring solutions
  • Transactional Data Management

[Diagram labels: Roll Forward / File Protection]
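The "parity" option in the RAID bullet relies on XOR: the parity block of a stripe is the XOR of its data blocks, so any single lost block equals the XOR of the parity with the surviving blocks. The sketch below shows only that single-parity idea (as used in levels such as RAID 5); it ignores parity rotation, caching, and write-hole handling that real controllers deal with, and the block contents are made up.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(blocks: list[bytes]) -> bytes:
    """Parity block for one stripe: XOR of all data blocks."""
    return reduce(xor_blocks, blocks)


def rebuild(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Recover a single lost block by XOR-ing the parity with the surviving blocks."""
    return reduce(xor_blocks, surviving, parity_block)


stripe = [b"DATA-A01", b"DATA-B02", b"DATA-C03"]  # one stripe across three data disks
p = parity(stripe)                                # stored on the parity disk

# Simulate losing the second disk and rebuilding its block from the others.
recovered = rebuild([stripe[0], stripe[2]], p)
print(recovered)
```

Note that parity protects against a failed disk, not against a bad write: corrupt data is faithfully reflected in the parity, which is why RAID alone does not address logical failures.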
15
HA Technologies Tradeoffs
  • Block-based database replication
    • Standby kept in constant recovery (mount) mode
    • Useful for strict disaster recovery only, not HA
    • Cannot be used for reporting while in recovery mode
    • No write access for distributed load balancing
    • Application response times suffer after failover
    • Cannot address availability across heterogeneous systems
  • Change-based database replication
    • Trigger- or log-based
    • Not optimized for real-time performance
    • Intrusive, complex
    • Cannot address availability across heterogeneous systems

16
HA Technologies Tradeoffs
  • Remote mirroring solutions
    • Volume managers maintain mirrors of local writes on a set of remote volumes
      • Useful for file protection
      • Physical distance to remote volumes is a critical limitation
      • No protection from logical corruption or storage-stack corruption
    • Message-based logical writes sent by the primary host over IP to remote hosts (synchronously or asynchronously)
      • Write ordering must be maintained by the primary host
      • Remote volumes are standby-only; applications cannot access them
      • No protection from logical corruption
    • Hardware-based
      • Storage arrays propagate I/Os to storage arrays at a secondary site
      • Secondary arrays are inaccessible during replication
      • No protection from logical corruption
      • Only useful for block availability during DR
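The requirement that the primary host maintain write ordering matters most in asynchronous mode, where messages can arrive out of order. A common way to handle it is to stamp each write with a sequence number and have the remote side apply writes strictly in sequence, buffering early arrivals. This is a minimal sketch of that idea; the class names and message layout are hypothetical, not any volume manager's protocol.

```python
import heapq


class RemoteVolume:
    """Remote mirror that applies writes strictly in primary-assigned order."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.next_seq = 0
        self.pending: list[tuple[int, int, bytes]] = []  # min-heap keyed by sequence number

    def receive(self, seq: int, block_no: int, data: bytes) -> None:
        heapq.heappush(self.pending, (seq, block_no, data))
        # Apply every buffered write whose predecessors have all been applied.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, blk, payload = heapq.heappop(self.pending)
            self.blocks[blk] = payload
            self.next_seq += 1


class PrimaryHost:
    """Primary that stamps each local write with a sequence number before shipping it."""

    def __init__(self) -> None:
        self.seq = 0
        self.outbox: list[tuple[int, int, bytes]] = []

    def write(self, block_no: int, data: bytes) -> None:
        self.outbox.append((self.seq, block_no, data))
        self.seq += 1


primary, remote = PrimaryHost(), RemoteVolume()
primary.write(7, b"v1")
primary.write(7, b"v2")  # overwrites the same block, so order matters
primary.write(9, b"x")

# Asynchronous transport may deliver messages out of order.
for msg in reversed(primary.outbox):
    remote.receive(*msg)
print(remote.blocks)
```

Block 7 ends up holding `v2`, as on the primary. The same mechanism also shows the slide's caveat: a logically wrong write is mirrored just as faithfully as a correct one.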

17
Oracle Technologies Tradeoffs
  • RAC
    • Good for protection from system failures
    • Shared-disk architecture can result in a single point of failure
    • Complex deployment; no protection from media failure
  • Data Guard
    • Physical standby
      • Runs in inactive (mounted) mode
      • Cold cache increases MTTR from a transactional standpoint
      • Network latency (over SQL*Net)
      • Media recovery process lags significantly during heavy workloads
    • Logical standby
      • Redo/archive logs shipped over the network to the standby site
      • Real-time reporting, high-throughput workloads (limited support in 9i)
      • Vulnerable to data loss (9i)
      • RTA performance impact on LGWR
      • Read-only access for the data set being logically protected

18
Oracle Technologies Tradeoffs
  • Streams
    • Good for information sharing in low- to moderate-throughput environments
    • Allows Oracle databases to be on different platforms
    • Limited datatype support in pre-10g releases
    • Metadata managed within the database
    • Requires a custom application for capture from non-Oracle databases

19
HA Technologies Tradeoffs
  • Transactional Data Management
    • Addresses low-latency requirements in hybrid HA computing environments (built on a 1-safe protocol for highest performance)
    • Manages transactional streams: captures, transforms, routes, delivers, and verifies data transactions in real time
    • Real-time, heterogeneous, preserves data integrity, low impact
    • Use cases: HA, DR, data integration, distributed computing
    • Not for file-level replication
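The "1-safe" protocol mentioned above commits a transaction on the primary and returns to the client before the backup has it; changes are shipped afterward. A 2-safe protocol instead waits for the backup before the commit completes. The sketch below contrasts the two under simplified, hypothetical classes: 1-safe keeps commit latency independent of the network round trip, at the cost of a window of committed-but-unshipped transactions that a failover would lose.

```python
from collections import deque


class Backup:
    def __init__(self) -> None:
        self.applied: list[str] = []

    def apply(self, txn: str) -> None:
        self.applied.append(txn)


class Primary:
    def __init__(self, backup: Backup, two_safe: bool) -> None:
        self.backup = backup
        self.two_safe = two_safe
        self.committed: list[str] = []
        self.unshipped: deque = deque()  # 1-safe propagation queue

    def commit(self, txn: str) -> None:
        self.committed.append(txn)
        if self.two_safe:
            # 2-safe: the backup has the transaction before commit returns.
            self.backup.apply(txn)
        else:
            # 1-safe: commit returns immediately; shipping happens later.
            self.unshipped.append(txn)

    def ship(self, n: int) -> None:
        """Asynchronously propagate up to n queued transactions."""
        for _ in range(min(n, len(self.unshipped))):
            self.backup.apply(self.unshipped.popleft())


def lost_on_failover(p: Primary) -> list[str]:
    """Committed transactions the backup never received if the primary dies now."""
    return p.committed[len(p.backup.applied):]


for two_safe in (False, True):
    p = Primary(Backup(), two_safe)
    for t in ("T1", "T2", "T3"):
        p.commit(t)
    p.ship(1)  # the network has only caught up with T1 so far
    print("2-safe" if two_safe else "1-safe", "lost:", lost_on_failover(p))
```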

20
How TDM Works: Modular Building Blocks
  • Capture: Committed changes are captured (and can be filtered) as they occur by reading the transaction logs.
  • Trail files: Stage and queue data for routing.
  • Route: Data is compressed and encrypted for routing to targets.
  • Delivery: Applies transactional data with guaranteed integrity, transforming the data as required.

[Diagram: Source Database(s) -> Filtered Capture -> LAN / WAN / Internet -> Filtered Delivery -> Target Database(s); a Manager on each side]
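The capture, trail, route, and delivery stages can be illustrated with a toy in-memory pipeline. Everything here is a hypothetical sketch, not GoldenGate's implementation or API: the log format, the COMMIT/ROLLBACK markers, and the function names are assumptions, and routing only compresses (encryption is omitted). It does show the key property from the slide: only committed transactions, filtered to the tables of interest, reach the target, and they are applied in commit order.

```python
import json
import zlib

# A toy transaction log: change records plus COMMIT/ROLLBACK markers per transaction.
LOG = [
    ("t1", {"op": "insert", "table": "accounts", "key": 1, "row": {"balance": 100}}),
    ("t2", {"op": "insert", "table": "audit", "key": 9, "row": {"note": "tmp"}}),
    ("t1", {"op": "update", "table": "accounts", "key": 1, "row": {"balance": 80}}),
    ("t1", "COMMIT"),
    ("t2", "ROLLBACK"),
]


def capture(log, tables):
    """Emit only committed transactions, filtered to the tables of interest."""
    open_txns = {}
    for txn_id, rec in log:
        if rec == "COMMIT":
            changes = [c for c in open_txns.pop(txn_id, []) if c["table"] in tables]
            if changes:
                yield {"txn": txn_id, "changes": changes}
        elif rec == "ROLLBACK":
            open_txns.pop(txn_id, None)  # discard uncommitted work
        else:
            open_txns.setdefault(txn_id, []).append(rec)


def route(trail):
    """Serialize and compress trail records for transport (encryption omitted)."""
    for record in trail:
        yield zlib.compress(json.dumps(record).encode())


def deliver(packets, target):
    """Decompress and apply each transaction's changes in commit order."""
    for packet in packets:
        txn = json.loads(zlib.decompress(packet))
        for c in txn["changes"]:
            table = target.setdefault(c["table"], {})
            if c["op"] == "delete":
                table.pop(c["key"], None)
            else:
                table[c["key"]] = c["row"]


trail = list(capture(LOG, tables={"accounts"}))  # staged, queued trail records
target = {}
deliver(route(trail), target)
print(target)
```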
21
HA/DR Solution Examples
22
HA Configuration: Multi-Master
  • Bi-directional, dual-master configuration for load balancing, improved performance, and throughput
  • For
    • Highest availability
    • Maximized ROI on hardware (transaction balancing)
  • Example areas
    • 24x7 (ATMs, online banking)
    • Online retail

[Diagram: Master (Active) <-> Master (Active)]
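In a bi-directional configuration, each master both captures its own changes and applies its peer's. One point the slide leaves implicit is loop prevention: a change applied from the peer must not be captured and sent back. The sketch below tags every log entry with its originating site and ships only locally originated changes. The classes and the tagging scheme are hypothetical illustrations, not a specific product's mechanism.

```python
class Master:
    """One active node in a hypothetical dual-master setup."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, int] = {}
        self.log: list[tuple[str, str, int]] = []  # (origin, key, value)

    def local_write(self, key: str, value: int) -> None:
        self.data[key] = value
        self.log.append((self.name, key, value))

    def apply_replicated(self, origin: str, key: str, value: int) -> None:
        # Applied changes keep their original origin tag so capture can skip them.
        self.data[key] = value
        self.log.append((origin, key, value))


def replicate(src: Master, dst: Master, start: int) -> int:
    """Ship src's locally originated changes from log position `start`; return the new position."""
    for origin, key, value in src.log[start:]:
        if origin == src.name:  # loop prevention: never echo a peer's change back
            dst.apply_replicated(origin, key, value)
    return len(src.log)


a, b = Master("A"), Master("B")
a.local_write("x", 1)  # a transaction routed to site A
b.local_write("y", 2)  # a transaction routed to site B
pos_a = replicate(a, b, 0)
pos_b = replicate(b, a, 0)
# A second round finds nothing new to ship, so changes do not ping-pong.
pos_a = replicate(a, b, pos_a)
pos_b = replicate(b, a, pos_b)
print(a.data, b.data)
```

This sketch does not address concurrent updates to the same row at both sites; a real dual-master deployment also needs a conflict detection and resolution rule.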
23
HA Configuration: Scalability
  • Improve scalability and performance of transaction processing by offloading query load to lower-cost databases/platforms
  • For
    • Horizontal scalability
    • Improved performance
  • Example areas
    • Online reservations
    • Online lookups

[Diagram: Writers -> Active database; Live/Active Readers (Lookup Query Handling)]
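Offloading query load usually means the application (or a routing layer) sends writes to the transactional database and spreads read-only lookups across the replicated reader databases. A minimal sketch, assuming hypothetical server names and a naive rule that treats any statement starting with `SELECT` as a read:

```python
from itertools import cycle


class QueryRouter:
    """Send writes to the transactional primary and spread lookups over read replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._readers = cycle(replicas)  # simple round-robin over reader databases

    def route(self, sql: str) -> str:
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._readers)
        return self.primary


router = QueryRouter("oltp-primary", ["lookup-1", "lookup-2"])
for stmt in ["SELECT * FROM flights",
             "INSERT INTO bookings VALUES (1)",
             "select fare FROM fares"]:
    print(router.route(stmt), "<-", stmt)
```

Because replicas trail the primary slightly, reads that must see a just-committed write (or statements such as `SELECT ... FOR UPDATE`) would need to go to the primary in practice.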
24
HA Configuration: Disaster Tolerance
  • An HA implementation that captures and applies data to a failover system in real time
  • For
    • Fast failover (no restore)
    • Do root-cause analysis later!
    • Surgical repair (dynamic, selective undo)
  • Example areas
    • 24x7/mission-critical applications
    • Strict SLA requirements

[Diagram: Active Database -> Failover Database on an unplanned outage (system failure or data failure)]
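"Surgical repair (selective undo)" can be sketched as reversing one bad transaction using the before-images recorded in the change history, instead of restoring the whole database. This assumes the captured changes retain before-images; the `Change` record and function are hypothetical. Rows that a later, valid transaction also changed are skipped and reported, because blindly restoring them would discard good work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One row change with before/after images (None = row absent)."""
    txn: str
    key: str
    before: Optional[int]
    after: Optional[int]


def selective_undo(table: dict, history: list, bad_txn: str) -> list:
    """Reverse bad_txn's changes newest-first; return keys skipped because later work touched them."""
    skipped = []
    for i in range(len(history) - 1, -1, -1):
        ch = history[i]
        if ch.txn != bad_txn:
            continue
        touched_later = any(c.key == ch.key and c.txn != bad_txn for c in history[i + 1:])
        if touched_later:
            skipped.append(ch.key)     # leave for manual review rather than clobber good data
        elif ch.before is None:
            table.pop(ch.key, None)    # undo an insert
        else:
            table[ch.key] = ch.before  # undo an update or delete


    return skipped


table = {"acct1": 500, "acct2": 0, "acct3": 75, "acct4": 25}
history = [
    Change("t1", "acct1", 100, 500),  # t1: erroneous update
    Change("t1", "acct2", None, 0),   # t1: erroneous insert
    Change("t1", "acct4", 10, 20),
    Change("t2", "acct3", 50, 75),    # t2: valid, unrelated
    Change("t3", "acct4", 20, 25),    # t3: valid, later touched acct4
]
skipped = selective_undo(table, history, "t1")
print(table, skipped)
```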
25
HA Configuration: Switchover
  • Zero-downtime migrations
  • Rolling upgrades
  • Zero-downtime maintenance
  • Failback contingencies
  • For
    • 24x7 availability
    • Reduced windows for system maintenance
  • Example areas
    • Can't afford downtime to do an in-place upgrade

[Diagram: Planned outage: Current Database -> Switchover Database]
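The switchover pattern restricts downtime to the moment the application is repointed, and keeps a failback path by reversing replication afterward. The sketch below walks through that ordering with hypothetical in-memory stand-ins; `Database`, `ReplicationLink`, and `switchover` are illustrative names, and the change feed handles inserts and updates only.

```python
class Database:
    """Minimal in-memory stand-in for a database (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: dict[int, str] = {}
        self.accepting_writes = True


class ReplicationLink:
    """Hypothetical one-way change feed; handles inserts/updates only, not deletes."""

    def __init__(self, source: Database, target: Database) -> None:
        self.source, self.target = source, target

    def lag(self) -> int:
        """Number of source rows the target does not yet hold in their current form."""
        return len(set(self.source.rows.items()) - set(self.target.rows.items()))

    def catch_up(self) -> None:
        self.target.rows.update(self.source.rows)


def switchover(current: Database, new: Database, app: dict) -> ReplicationLink:
    """Move the application to `new`; return a reverse link that keeps `current` ready for failback."""
    forward = ReplicationLink(current, new)
    forward.catch_up()                # 1. synchronize while the application stays online
    current.accepting_writes = False  # 2. brief quiesce: the only application downtime
    forward.catch_up()                # 3. drain changes made since step 1
    if forward.lag() != 0:
        current.accepting_writes = True  # abort and stay on the current database
        raise RuntimeError("target not synchronized")
    app["db"] = new                   # 4. repoint the application
    return ReplicationLink(new, current)  # 5. reverse feed as a failback contingency


current, upgraded = Database("current"), Database("upgraded")
app = {"db": current}
current.rows.update({1: "a", 2: "b"})
failback = switchover(current, upgraded, app)
upgraded.rows[3] = "c"  # new writes land on the upgraded system
failback.catch_up()     # the old system stays current in case a failback is needed
print(app["db"].name, current.rows == upgraded.rows)
```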
26
TDM HA Evaluation Criteria
  • Availability: Not just disaster recovery but also continuous operations
  • MTTR: Immediately available, up-to-date secondary system with an MTTR of a few seconds
  • Performance: Near-zero latency; only committed transactions are shipped
  • Zero downtime for planned outages: Downtime restricted to application switchover
  • Data protection/loss: Redo validation using SQL Apply; no loss (database read access to the last I/O in the current log)
  • Manageability: Director GUI, CLI, STATS
  • Impact: Low impact on deployed systems; metadata kept outside the database
27
Thank You. Q&A

Nick Wagner, nwagner@goldengate.com, 415-369-4261
28
Contributions
  • References
    • Self-Repairing Computers (Scientific American, 2003)
    • Oracle 10g Concepts Manual; Oracle Maximum Availability Architecture (MAA) paper
    • ROC (Recovery-Oriented Computing, a joint Stanford-Berkeley collaboration), misc.: http://roc.cs.berkeley.edu/