NoCOUG Fall Conference 2005 - PowerPoint PPT Presentation

Provided by: alokp
Learn more at: http://www.nocoug.org


1
NoCOUG Fall Conference 2005
  • High Availability & Disaster Recovery: A 360° View
  • Alok Pareek

2
Agenda
  • Introduction
  • High Availability - 2005
  • Industry Shift from MTTF to MTTR
  • Understanding transaction failure models
  • Evaluating HA technologies
  • Real-World HA Examples
  • About GoldenGate
  • Questions & Answers

3
Speaker Introduction/Background
  • GoldenGate Software Architect
  • Transactional Data Management and High
    Availability Solutions
  • 10 years in Oracle kernel development
  • Data Storage Technologies
  • Recovery and High Availability Group
  • High Speed Data Movement
  • Key Features
  • Cross-platform/heterogeneous transportable
    tablespaces (10g)
  • Multi-threaded redo generation (9.2)
  • Implementation of multiple block size caches
    (9.0)
  • Tablespace point-in-time recovery (TSPITR) (8.0)
  • Owner of LGWR process (redo generation)
    algorithms (8i through 10.2)

4
High Availability, Disaster Recovery
  • Definition
  • Ratio of system uptime to the sum of uptime and
    downtime
  • Availability = MTTF / (MTTF + MTTR)
  • Challenges
  • Addressing Performance vs. Reliability in
    computer systems
  • Hardware Faults, Software Bugs, Human errors are
    realities in any complex system deployment
  • Enterprise applications need to function 24x7
  • Disasters are no longer a distant threat
  • Inadequate planning to handle outages
  • Shift in industry (and academic research) focus
  • From fault tolerance to reducing MTTR
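To make the MTTF/MTTR trade-off concrete, the short Python sketch below evaluates Availability = MTTF / (MTTF + MTTR) for one hypothetical system. The MTTF of 2,000 hours and the three MTTR values are illustrative assumptions, not figures from the talk. The point is that cutting MTTR raises availability just as raising MTTF does, which is why the focus is shifting toward faster recovery.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_per_year(avail: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical system that fails on average once every 2,000 hours.
MTTF = 2000.0

# Same failure rate, three assumed recovery strategies:
# 8 h restore from backup, 1 h manual failover, 6 min automated failover.
for mttr in (8.0, 1.0, 0.1):
    a = availability(MTTF, mttr)
    print(f"MTTR {mttr:5.1f} h -> availability {a:.5f}, "
          f"downtime {downtime_per_year(a):5.1f} h/year")
```

With MTTF held fixed, each reduction in MTTR shrinks yearly downtime roughly in proportion. This is the lever that fast failover technologies pull.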

5
HA/DR Systematic View
Database states (diagram):
  • 1) Active: no outage
  • 2) Unplanned outage
  • System failure: node death, power failure
  • Data failure: physical media, logical corruption
  • 3) Planned outage
  • System changes: upgrades, migrations
  • Data changes: maintenance

6
1) High Availability Concerns: No Outage
  • Performance
  • DSS vs. OLTP
  • Conflicting requirements
  • Mixed workload
  • Data validation
  • Data Transformation

7
2) Availability Concerns: Unplanned Outage
Database: unplanned outage (system failure, data failure)
Common Approaches
  • Database Restore/Recovery
  • RAID
  • Shared Disk Clusters
  • Standby database

8
Understanding Database Failures
  • Failure points
  • Statement
  • Process
  • Instance
  • Database
  • Site
  • Failure types
  • Physical (Media, corruption, inconsistency
    amongst redundant copies)
  • Logical (Incorrect DML, out-of-sync, accidental
    table drop)
  • Failure Handling
  • Automatic
  • Manual

9
3) Availability Concerns: Planned Outage
Database: planned outage (system changes, data changes)
  • Selected windows of downtime
  • Phased approach to maintenance

10
Key HA Evaluation Criteria
11
Differentiating HA Technologies
  • Conventional Backup/Recovery
  • RAID
  • multiple hard disks behaving as a single large
    fast drive (mirrors/stripes/duplexing/parity)
  • Snapshots
  • Block Level Database Replication
  • Change Level Database Replication
  • Remote Mirroring Solutions
  • Transactional Data Management

Diagram labels: High Availability and Disaster Recovery; Roll Forward / File Protection
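Parity-based RAID levels (RAID 5, for example) survive the loss of one disk because the parity block in each stripe is the XOR of that stripe's data blocks. Any single missing block therefore equals the XOR of everything that remains. The Python sketch below shows only that arithmetic. Real controllers also rotate parity across disks and handle partial-stripe writes, which is omitted here, and the block contents are made up.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    size = len(blocks[0])
    out = bytearray(size)
    for block in blocks:
        if len(block) != size:
            raise ValueError("all blocks in a stripe must be the same length")
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def compute_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block written alongside a stripe of data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Reconstruct the single lost data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])


# A stripe spread over three data disks plus one parity disk.
stripe = [b"ORCL", b"REDO", b"UNDO"]
parity = compute_parity(stripe)

# Disk 1 fails: rebuild its block from disks 0 and 2 and the parity disk.
rebuilt = rebuild_missing([stripe[0], stripe[2]], parity)
print(rebuilt)  # prints b'REDO'
```

Parity protects against a lost disk, not a bad write. A logically corrupt block simply gets correct parity computed for it.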
12
HA Technologies Tradeoffs
  • Block based database replication
  • Standby kept in constant recovery (inactive) mode
  • Useful for strict disaster recovery only, not HA
  • Cannot be used for reporting in recovery mode
  • No write access for distributed load balancing
  • Application response times suffer after failover
  • Cannot address availability across heterogeneous
    systems
  • Change based database replication
  • Trigger- or log-based
  • Not optimized for real time performance
  • Intrusive, Complex
  • Cannot address availability across heterogeneous
    systems

13
HA Technologies Tradeoffs
  • Remote mirroring solutions
  • Volume managers maintain mirrors of local writes
    on a set of remote volumes
  • Useful for file protection
  • Physical distance to remote volumes is a critical
    limitation
  • No protection from logical corruption, or storage
    stack corruption
  • Message-based: logical writes sent by primary host
    over IP to remote hosts (synchronously or
    asynchronously)
  • Write ordering must be maintained by primary host
  • Remote volumes are standby-only, applications
    cannot access them
  • No protection from logical corruption
  • Hardware based
  • Storage arrays propagate IOs to storage arrays at
    a secondary site
  • Secondary arrays are inaccessible during
    replication
  • No protection from logical corruption
  • Only useful for block availability during DR
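The requirement that the primary host maintain write ordering matters because, with asynchronous message-based mirroring, writes can reach the remote host out of order. The Python sketch below assumes that the primary stamps each write with a sequence number; the class and field names are hypothetical and do not come from any volume manager's API. The remote side buffers early arrivals and applies writes strictly in sequence, so two writes to the same block land in the order the primary issued them. It also replicates a bad write faithfully, which is why these schemes give no protection from logical corruption.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Stamps each local write with a monotonically increasing sequence number."""
    next_seq: int = 0

    def write(self, block: int, data: bytes) -> tuple[int, int, bytes]:
        msg = (self.next_seq, block, data)
        self.next_seq += 1
        return msg  # sent to the remote host over IP


@dataclass
class RemoteMirror:
    """Applies mirrored writes in the primary's order, whatever the arrival order."""
    expected_seq: int = 0
    pending: list = field(default_factory=list)  # min-heap keyed on sequence number
    blocks: dict = field(default_factory=dict)

    def receive(self, msg: tuple[int, int, bytes]) -> None:
        heapq.heappush(self.pending, msg)
        # Drain every write whose predecessors have all been applied.
        while self.pending and self.pending[0][0] == self.expected_seq:
            _, block, data = heapq.heappop(self.pending)
            self.blocks[block] = data
            self.expected_seq += 1


primary = Primary()
first = primary.write(7, b"v1")
second = primary.write(7, b"v2")

mirror = RemoteMirror()
mirror.receive(second)  # arrives early: held back, block 7 not yet written
mirror.receive(first)   # gap filled: both writes applied in order
print(mirror.blocks[7])  # prints b'v2'
```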

14
HA Technologies Tradeoffs
  • Transactional Data Management
  • Provides low-latency data movement in hybrid HA
    computing environments (built on 1-safe replication)
  • Management of transactional streams -- captures,
    transforms, routes, delivers and verifies data
    transactions in real time
  • Real time, heterogeneous, data integrity, low
    impact
  • Use cases for HA, DR, data integration,
    distributed computing
  • Not for file-level replication

Data flow (diagram): Source → Log-based data capture (can be selective) → Route → Transform (if needed) → Apply → Target, with sub-second latency
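The capture, route, transform, and apply stages can be sketched as a chain of Python generators. Everything here is an illustrative assumption: the Change record, a table filter standing in for selective capture, a JSON round trip standing in for the network hop, and a column rename standing in for a heterogeneous mapping. It is not GoldenGate's implementation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Change:
    table: str
    op: str  # "INSERT", "UPDATE" or "DELETE"
    key: int
    row: dict


def capture(log: Iterable[Change], tables: set[str]) -> Iterator[Change]:
    """Selective capture: keep only changes to the tables being replicated."""
    for change in log:
        if change.table in tables:
            yield change


def route(changes: Iterable[Change]) -> Iterator[Change]:
    """Stand-in for the network hop: serialize and deserialize each change."""
    for change in changes:
        wire = json.dumps(asdict(change))
        yield Change(**json.loads(wire))


def transform(changes: Iterable[Change], fn: Callable[[Change], Change]) -> Iterator[Change]:
    """Optional mapping step, e.g. renaming columns for a different target schema."""
    for change in changes:
        yield fn(change)


def apply_changes(changes: Iterable[Change], target: dict) -> None:
    """Apply changes to an in-memory target keyed by table, then primary key."""
    for change in changes:
        rows = target.setdefault(change.table, {})
        if change.op == "DELETE":
            rows.pop(change.key, None)
        else:
            rows[change.key] = change.row


def rename_amt(change: Change) -> Change:
    """Hypothetical mapping: the target names the 'amt' column 'amount'."""
    row = {("amount" if col == "amt" else col): val for col, val in change.row.items()}
    return Change(change.table, change.op, change.key, row)


source_log = [
    Change("accounts", "INSERT", 1, {"owner": "ann", "amt": 100}),
    Change("audit", "INSERT", 9, {"note": "not replicated"}),
    Change("accounts", "INSERT", 2, {"owner": "bob", "amt": 40}),
    Change("accounts", "UPDATE", 1, {"owner": "ann", "amt": 150}),
    Change("accounts", "DELETE", 2, {}),
]

target: dict = {}
apply_changes(transform(route(capture(source_log, {"accounts"})), rename_amt), target)
print(target)
```

Because each stage is a generator, a change flows through all four stages as soon as it is read rather than waiting for a batch. This is the property the sub-second latency goal depends on.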
15
HA/DR Solution Examples
Database states (diagram):
  • 1) Active: no outage
  • 2) Unplanned outage
  • System failure: node death, power failure
  • Data failure: physical media, logical corruption
  • 3) Planned outage
  • System changes: upgrades, migrations
  • Data changes: maintenance

16
Real-World HA Configuration (Active)
Primary Master
  • Uni- or bi-directional configuration: if one
    system fails, the other is immediately available
    and ready to take over.
  • For
  • Highest Availability
  • Maximized ROI on hardware (transaction
    balancing)
  • Data Centers with Active/Passive
  • Example areas
  • 24x7 (ATMs, Online Banking)
  • Online Retail
  • Real-time Analytics, Warehousing

Diagram: Primary Master (Active) and Secondary Master (Live/Active)
17
Real-World HA Configuration (Transaction Off-Loading)
Processing Commit Transactions
  • Improve availability and performance of
    transaction processing by offloading query load
    to lower-cost DBs/platforms
  • For
  • Horizontal scalability
  • Improved performance
  • Example areas
  • Online Reservations
  • Online Lookups

Diagram: Active system (commit transactions) and Live/Active system (lookup query handling)
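The Python sketch below shows one way an application tier might split traffic in this configuration. Statements that modify data go to the active system, while read-only lookups rotate across the live query copies. The server names are hypothetical, and the test for a read (a statement starting with SELECT) is deliberately naive. For example, it would misroute SELECT ... FOR UPDATE. A lookup that must see a just-committed change should also go to the primary, since replicas trail it by the replication latency.

```python
import itertools


class OffloadRouter:
    """Send writes to the primary; spread read-only lookups over query replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql: str) -> str:
        is_lookup = sql.lstrip().upper().startswith("SELECT")
        if is_lookup and self._replicas is not None:
            return next(self._replicas)  # round-robin over replicas
        return self.primary  # commits always go to the active system


router = OffloadRouter("oltp-primary", ["query-1", "query-2"])
for stmt in ("UPDATE seats SET held = 1 WHERE id = 42",
             "SELECT * FROM flights WHERE origin = 'SFO'",
             "select fare FROM fares WHERE id = 7"):
    print(router.target_for(stmt), "<-", stmt)
```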
18
Real-World DR Configuration (Failover)
Database
Active
  • An HA implementation that captures and applies
    data to a failover system in real time.
  • For
  • Fast failover (No restore)
  • Do root-cause analysis later!
  • Surgical Repair (Dynamic, Selective undo)
  • Example areas
  • 24x7/mission-critical applications
  • Strict SLA requirements

Diagram: Active database and Failover Database, covering unplanned outages (system failure, data failure)
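Surgical repair by selective undo can be illustrated under simplifying assumptions. Suppose every captured row change keeps both a before image and an after image. Undoing one bad transaction then means applying the inverse changes newest first. The Python sketch below uses hypothetical record and function names. It refuses to undo a transaction if a later transaction has since touched the same rows, because blindly restoring before images would silently overwrite that later work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowChange:
    txn: str
    key: str
    before: Optional[dict]  # None: the row did not exist (insert)
    after: Optional[dict]   # None: the row was deleted


def apply_changes(table: dict, changes: list[RowChange]) -> None:
    for c in changes:
        if c.after is None:
            table.pop(c.key, None)
        else:
            table[c.key] = c.after


def compensating_changes(history: list[RowChange], bad_txn: str) -> list[RowChange]:
    """Inverse of every change made by bad_txn, newest first."""
    touched: set[str] = set()
    for c in history:
        if c.txn == bad_txn:
            touched.add(c.key)
        elif c.key in touched:
            raise ValueError(f"row {c.key!r} changed by {c.txn} after {bad_txn}")
    return [RowChange(f"undo-{bad_txn}", c.key, c.after, c.before)
            for c in reversed(history) if c.txn == bad_txn]


table = {"acct-2": {"bal": 50}}
history = [
    RowChange("t1", "acct-1", None, {"bal": 100}),
    RowChange("t2", "acct-1", {"bal": 100}, {"bal": 0}),  # erroneous update
    RowChange("t2", "acct-2", {"bal": 50}, None),         # erroneous delete
]
apply_changes(table, history)
apply_changes(table, compensating_changes(history, "t2"))
print(table)
```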
19
Real-World Configuration (Switchover)
Current Database
  • Zero-Downtime Migrations
  • Rolling Upgrades
  • Zero-Downtime Maintenance
  • For
  • 24x7 availability
  • Reduced windows for system maintenance
  • Example areas
  • Can't afford the downtime of an in-place upgrade

Diagram: Current Database and Switchover Database, covering planned outages
20
About GoldenGate Software
GoldenGate Software is a privately held software
company that offers Transactional Data Management
solutions.
250 customers... 1500 solutions implemented in
35 countries
Established, Loyal Customer Base
Leading Industry Solutions
18,000 Node ATM Network with 24/7 Availability
Saving millions with real-time DW and zero
downtime migrations.
Database tiering handles average of 300,000
updates/hour, peaks at 800,000/hour
Achieving paperless enterprise for this visionary
healthcare provider
21
GoldenGate Transactional Data Management
  • Provides guaranteed capture, routing,
    transformation, delivery, and verification of
    data transactions across heterogeneous
    environments in real time.
  • TDM must be
  • Real time
  • Moves with sub-second latency
  • Heterogeneous
  • Moves transactions across different databases and
    platforms
  • Transactional
  • Maintains transaction integrity
  • GoldenGate differentiates on
  • Performance
  • Handles thousands of transactions per second with
    very low impact on IT systems
  • Extensibility & Flexibility
  • Open architecture to meet demanding customer
    needs and data environments
  • Reliability
  • Supports continuous operations and availability
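Maintaining transaction integrity means a log-based capture process must ship only committed work, as whole transactions, in commit order, even though the log interleaves changes from many open transactions. The minimal Python sketch below assumes a hypothetical log of (kind, transaction id, payload) records rather than any real redo log format. A plain list stands in for the trail.

```python
from collections import defaultdict
from typing import Any, Iterable, Iterator


def committed_transactions(log: Iterable[tuple[str, str, Any]]) -> Iterator[tuple[str, list]]:
    """Yield (txn_id, changes) once per committed transaction, in commit order.

    Changes are buffered per open transaction; a rollback discards the buffer,
    so uncommitted or rolled-back work is never delivered downstream.
    """
    open_txns: dict[str, list] = defaultdict(list)
    for kind, txn, payload in log:
        if kind == "CHANGE":
            open_txns[txn].append(payload)
        elif kind == "COMMIT":
            yield txn, open_txns.pop(txn, [])
        elif kind == "ROLLBACK":
            open_txns.pop(txn, None)
        else:
            raise ValueError(f"unknown log record kind: {kind!r}")


log = [
    ("CHANGE", "T1", "debit A 100"),
    ("CHANGE", "T2", "insert order 17"),
    ("CHANGE", "T1", "credit B 100"),
    ("ROLLBACK", "T2", None),
    ("CHANGE", "T3", "update stock 9"),
    ("COMMIT", "T3", None),
    ("COMMIT", "T1", None),
]

trail = list(committed_transactions(log))  # a list standing in for a trail file
print(trail)
```

T3 is delivered before T1 because it committed first, even though T1 started earlier. T2's change never leaves the capture step.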

22
How GoldenGate Works
Capture: Committed changes are captured as they
occur by reading the transaction logs.
Trail files: Stage and queue data for routing.
Route: Data is compressed and encrypted for routing
to targets.
Delivery: Applies transactional data with
guaranteed integrity.
GoldenGate Veridata: Reports on data
discrepancies between in-use DBs
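A discrepancy report of this kind can be approximated by comparing per-row digests of the source and target tables by primary key. The Python sketch below uses a generic technique, not Veridata's actual algorithm. Comparing SHA-256 digests rather than full rows keeps the data moved between systems small, and the sample tables are made up. On in-use databases, rows touched by in-flight transactions can show transient differences, so a real comparison would need to recheck mismatches.

```python
import hashlib
import json


def row_digest(row: dict) -> str:
    """Stable digest of a row's column values (column order does not matter)."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()


def compare_tables(source: dict, target: dict) -> dict:
    """Report keys missing on the target, extra on the target, or differing."""
    src = {key: row_digest(row) for key, row in source.items()}
    tgt = {key: row_digest(row) for key, row in target.items()}
    return {
        "missing_on_target": sorted(src.keys() - tgt.keys()),
        "extra_on_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


source = {1: {"name": "ann", "bal": 150}, 2: {"name": "bob", "bal": 40}, 3: {"name": "cy", "bal": 7}}
target = {1: {"bal": 150, "name": "ann"}, 2: {"name": "bob", "bal": 99}, 4: {"name": "di", "bal": 1}}
print(compare_tables(source, target))
```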
23
TDM HA Evaluation Criteria
24
Questions & Answers
  • Contact Information
  • Alok Pareek
  • www.goldengate.com
  • (415) 777-0200
  • apareek_at_goldengate.com
  • 301 Howard Street, Suite 2100, San Francisco CA
    94105

25
Lag Points Address Logical Failures
  • One-to-One: Source → Target
  • One-to-Many: Source → Target (current), Target
    (1 hour lag), Target (1 day lag)
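Lagged targets can be modeled as copies that apply the same change stream but hold back any change committed less than their lag ago. In the Python sketch below, the names, timestamps, and one-row table are hypothetical, and the stream is assumed to be ordered by commit time. An accidental delete at hour 30 has already reached the current target one minute later. The 1-hour and 1-day copies still hold the row and can serve as a source for repair.

```python
from dataclasses import dataclass, field
from typing import Optional

HOUR = 3600
DAY = 24 * HOUR


@dataclass
class LaggedTarget:
    lag_seconds: int
    rows: dict = field(default_factory=dict)
    applied: int = 0  # position reached in the shared change stream

    def catch_up(self, stream: list[tuple[int, str, Optional[str]]], now: int) -> None:
        """Apply, in order, every change committed at least lag_seconds before now."""
        while self.applied < len(stream):
            commit_time, key, value = stream[self.applied]
            if commit_time > now - self.lag_seconds:
                break  # too recent for this target; later changes are newer still
            if value is None:
                self.rows.pop(key, None)  # a delete
            else:
                self.rows[key] = value
            self.applied += 1


# (commit time in seconds, key, new value or None for delete)
stream = [
    (0, "orders", "loaded"),
    (30 * HOUR, "orders", None),  # accidental delete (a logical failure)
]

targets = {"current": LaggedTarget(0), "1 hour lag": LaggedTarget(HOUR), "1 day lag": LaggedTarget(DAY)}
now = 30 * HOUR + 60
for name, t in targets.items():
    t.catch_up(stream, now)
    print(name, t.rows)
```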