Title: NoCOUG Fall Conference 2005
- High Availability & Disaster Recovery: A 360° View
- Alok Pareek
Agenda
- Introduction
- High Availability - 2005
- Industry Shift from MTTF to MTTR
- Understanding transaction failure models
- Evaluating HA technologies
- Real-World HA Examples
- About GoldenGate
- Questions & Answers
Speaker Introduction/Background
- GoldenGate Software Architect
- Transactional Data Management and High Availability Solutions
- 10 years in Oracle kernel development
- Data Storage Technologies
- Recovery and High Availability Group
- High Speed Data Movement
- Key Features
- Cross-platform/heterogeneous transportable tablespaces (10g)
- Multi-threaded redo generation (9.2)
- Implementation of multiple block size caches (9.0)
- Tablespace point-in-time recovery (TSPITR) (8.0)
- Owner of LGWR process (redo generation) algorithms (8i to 10.2)
High Availability, Disaster Recovery
- Definition
- Ratio of system uptime to the sum of uptime and downtime
- Availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR is the mean time to repair
- Challenges
- Addressing performance vs. reliability trade-offs in computer systems
- Hardware faults, software bugs, and human errors are realities in any complex system deployment
- Enterprise applications need to function 24x7
- Disasters are no longer a distant threat
- Inadequate planning to handle outages
- Shift in industry (and academic research) focus
- From fault tolerance to reducing MTTR
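The availability ratio and the shift toward reducing MTTR can be made concrete with a short calculation. This is a minimal Python sketch; the MTTF and MTTR figures in the usage lines are hypothetical, chosen only to show how the ratio behaves.

```python
"""Steady-state availability from MTTF and MTTR (both in hours)."""

MINUTES_PER_YEAR = 365 * 24 * 60


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected minutes of downtime per (non-leap) year at a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical system: fails every 1,000 hours on average, takes 10 hours to repair.
    baseline = availability(1000, 10)
    # Option A: make failures ten times rarer (fault tolerance, higher MTTF).
    rarer_failures = availability(10_000, 10)
    # Option B: repair ten times faster (lower MTTR).
    faster_repair = availability(1000, 1)
    for label, a in [("baseline", baseline), ("10x MTTF", rarer_failures), ("MTTR / 10", faster_repair)]:
        print(f"{label:>9}: {a:.6f} -> {downtime_minutes_per_year(a):,.0f} min/year")
```

Because the formula can be rewritten as 1 / (1 + MTTR/MTTF), availability depends only on the ratio of repair time to failure interval: cutting MTTR tenfold buys exactly as much availability as making failures ten times rarer.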
HA/DR Systematic View
[Diagram: the database in three states: (1) Active, no outage; (2) Unplanned outage, from system failure or data failure (physical media, logical corruption); (3) Planned outage, for system changes or data changes]
1) High Availability Concerns: No Outage
- Performance
- DSS vs. OLTP
- Conflicting requirements
- Mixed workload
- Data validation
- Data Transformation
2) Availability Concerns: Unplanned Outage
[Diagram: database unplanned outage, caused by system failure or data failure]
Common Approaches
- Database restore/recovery
- RAID
- Shared disk clusters
- Standby database
Understanding Database Failures
- Failure points
- Statement
- Process
- Instance
- Database
- Site
- Failure types
- Physical (media failure, corruption, inconsistency among redundant copies)
- Logical (incorrect DML, out-of-sync data, accidental table drop)
- Failure handling
- Automatic
- Manual
3) Availability Concerns: Planned Outage
[Diagram: database planned outage, for system changes or data changes]
- Selected windows of downtime
- Phased approach to maintenance
Key HA Evaluation Criteria
Differentiating HA Technologies
- Conventional backup/recovery
- RAID
- Multiple hard disks behaving as a single large, fast drive (mirroring, striping, duplexing, parity)
- Snapshots
- Block-level database replication
- Change-level database replication
- Remote mirroring solutions
- Transactional Data Management
[Diagram labels: "High Availability and Disaster Recovery" and "Roll Forward / File Protection"]
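The parity option in the RAID bullet can be illustrated with a small sketch. This is a simplified Python model of single-parity protection (no striping layout or parity rotation); the block contents and function names are hypothetical. The parity block is the byte-wise XOR of the data blocks, so any one lost block equals the XOR of the survivors and the parity.

```python
"""Single-parity protection as used by parity-based RAID levels (simplified)."""
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) > 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block = XOR of every data block in the stripe."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """XOR of the surviving data blocks and the parity block yields the lost block."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    stripe = [b"ABCD", b"EFGH", b"IJKL"]   # one stripe across three data disks
    parity = make_parity(stripe)            # stored on a fourth disk
    lost = stripe.pop(1)                    # simulate losing the second disk
    recovered = rebuild_missing(stripe, parity)
    print(recovered, recovered == lost)
```

Parity only reconstructs a block whose disk failed. A logically wrong write updates the parity along with the data, so RAID gives no protection against logical corruption.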
HA Technologies Tradeoffs
- Block-based database replication
- Standby kept in constant recovery (inactive) mode
- Useful for strict disaster recovery only, not HA
- Cannot be used for reporting in recovery mode
- No write access for distributed load balancing
- Application response times suffer after failover
- Cannot address availability across heterogeneous systems
- Change-based database replication
- Trigger- or log-based
- Not optimized for real-time performance
- Intrusive, complex
- Cannot address availability across heterogeneous systems
HA Technologies Tradeoffs
- Remote mirroring solutions
- Volume managers maintain mirrors of local writes on a set of remote volumes
- Useful for file protection
- Physical distance to remote volumes is a critical limitation
- No protection from logical corruption or storage stack corruption
- Message-based logical writes sent by the primary host over IP to remote hosts (synchronously or asynchronously)
- Write ordering must be maintained by the primary host
- Remote volumes are standby-only; applications cannot access them
- No protection from logical corruption
- Hardware-based
- Storage arrays propagate I/Os to storage arrays at a secondary site
- Secondary arrays are inaccessible during replication
- No protection from logical corruption
- Only useful for block availability during DR
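Why write ordering matters for message-based mirroring can be shown with a small model. In this minimal Python sketch (all names are hypothetical, not any volume manager's interface), the primary stamps each write with a sequence number, and the remote side buffers out-of-order arrivals and applies writes strictly in that sequence.

```python
"""Remote side of asynchronous mirroring: apply writes in primary-assigned order."""
import heapq


class OrderedApplier:
    """Buffers out-of-order writes and applies them by sequence number."""

    def __init__(self) -> None:
        self.next_seq = 1                                  # next write the volume may apply
        self.pending: list[tuple[int, int, bytes]] = []    # min-heap of (seq, block_id, data)
        self.volume: dict[int, bytes] = {}                 # block_id -> current contents

    def receive(self, seq: int, block_id: int, data: bytes) -> None:
        if seq < self.next_seq:
            return                                         # duplicate of an already-applied write
        heapq.heappush(self.pending, (seq, block_id, data))
        # Apply every write whose predecessors have all arrived.
        while self.pending and self.pending[0][0] <= self.next_seq:
            seq_, bid, payload = heapq.heappop(self.pending)
            if seq_ < self.next_seq:
                continue                                   # stale duplicate left in the buffer
            self.volume[bid] = payload
            self.next_seq += 1


if __name__ == "__main__":
    remote = OrderedApplier()
    # Primary issued: 1) block 7 := v1, 2) block 7 := v2, 3) block 9 := x.
    # The network delivers them out of order.
    for seq, block, data in [(2, 7, b"v2"), (3, 9, b"x"), (1, 7, b"v1")]:
        remote.receive(seq, block, data)
        print(seq, remote.volume)
```

Applying in arrival order would leave block 7 holding the older v1. The sketch still copies whatever the primary writes, so it offers no protection from logical corruption.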
HA Technologies Tradeoffs
- Transactional Data Management
- Addresses low-latency requirements in hybrid HA computing environments (built on 1-safe replication)
- Management of transactional streams: captures, transforms, routes, delivers, and verifies data transactions in real time
- Real-time, heterogeneous, data integrity, low impact
- Use cases for HA, DR, data integration, distributed computing
- Not for file-level replication
[Diagram: Source → log-based data capture (can be selective) → route → transform (if needed) → apply → Target, with sub-second latency]
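The capture, route, transform, and apply flow can be sketched in a few functions. This is a minimal Python illustration under stated assumptions, not GoldenGate's implementation: the log is a hypothetical list of (txid, op, table, key, value) records, the "network" is a plain list, and the target is a dictionary. It shows two properties the flow relies on: only committed transactions are shipped, and capture can be selective by table.

```python
"""Log-based capture -> route -> transform -> apply, reduced to in-memory steps."""
from collections import defaultdict
from typing import Callable, Iterable, Optional

# Hypothetical log record layout: (txid, op, table, key, value),
# where op is "INSERT", "UPDATE", "DELETE", "COMMIT" or "ROLLBACK".
LogRecord = tuple
Change = tuple  # (op, table, key, value)


def capture(log: Iterable[LogRecord], tables: set) -> list[list[Change]]:
    """Return committed transactions in commit order, keeping only selected tables."""
    open_txns: dict = defaultdict(list)
    committed: list[list[Change]] = []
    for txid, op, table, key, value in log:
        if op == "COMMIT":
            changes = open_txns.pop(txid, [])
            if changes:                        # skip transactions with nothing selected
                committed.append(changes)
        elif op == "ROLLBACK":
            open_txns.pop(txid, None)          # rolled-back work is never shipped
        elif table in tables:                  # selective capture
            open_txns[txid].append((op, table, key, value))
    return committed


def apply(target: dict, txn: list[Change]) -> None:
    """Apply one transaction all-or-nothing: stage every change, then publish."""
    staged = dict(target)                      # illustrative; a real target uses its own transaction
    for op, table, key, value in txn:
        if op == "DELETE":
            staged.pop((table, key), None)
        else:                                  # INSERT or UPDATE
            staged[(table, key)] = value
    target.clear()
    target.update(staged)


def replicate(log: Iterable[LogRecord], tables: set,
              transform: Optional[Callable[[Change], Change]] = None) -> dict:
    target: dict = {}
    for txn in capture(log, tables):
        routed = list(txn)                     # route: stand-in for the network hop
        if transform is not None:              # transform (if needed)
            routed = [transform(c) for c in routed]
        apply(target, routed)
    return target


if __name__ == "__main__":
    log = [
        (1, "INSERT", "accounts", 101, {"balance": 50}),
        (2, "INSERT", "accounts", 102, {"balance": 10}),
        (1, "UPDATE", "accounts", 101, {"balance": 75}),
        (2, "ROLLBACK", None, None, None),
        (3, "INSERT", "sessions", 9, {"user": "a"}),
        (1, "COMMIT", None, None, None),
        (3, "COMMIT", None, None, None),
    ]
    rename_table = lambda c: (c[0], c[1].upper(), c[2], c[3])
    print(replicate(log, {"accounts"}, rename_table))
```

Transaction 2 never reaches the target because it rolled back, and transaction 3 is filtered out because its table was not selected.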
HA/DR Solution Examples
[Diagram: the database in three states: (1) Active, no outage; (2) Unplanned outage, from system failure or data failure (physical media, logical corruption); (3) Planned outage, for system changes or data changes]
Real-World HA Configuration (Active)
[Diagram: Primary Master (Active) replicating with Secondary Master (Live/Active)]
- Uni-/bi-directional configuration: if one system fails, the other is immediately available and ready to take over
- For
- Highest availability
- Maximized ROI on hardware (transaction balancing)
- Data centers with active/passive configurations
- Example areas
- 24x7 (ATMs, online banking)
- Online retail
- Real-time analytics, warehousing
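The "immediately available and ready to take over" behavior can be sketched from the client side. This minimal Python model assumes replication has already kept the secondary master current; the router class and the handler functions are hypothetical stand-ins for real database connections.

```python
"""Client-side failover between two replicated masters (illustrative only)."""
from typing import Callable

Handler = Callable[[str], str]


class FailoverRouter:
    """Tries each system in preference order and uses the first one that answers."""

    def __init__(self, systems: dict[str, Handler], preference: list[str]) -> None:
        self.systems = systems
        self.preference = preference

    def execute(self, request: str) -> str:
        failures = []
        for name in self.preference:
            try:
                return self.systems[name](request)
            except ConnectionError as exc:     # only connectivity loss triggers failover here
                failures.append(f"{name}: {exc}")
        raise RuntimeError("no system available (" + "; ".join(failures) + ")")


if __name__ == "__main__":
    primary_up = {"value": True}

    def primary(request: str) -> str:
        if not primary_up["value"]:
            raise ConnectionError("primary master unreachable")
        return f"primary handled {request}"

    def secondary(request: str) -> str:
        return f"secondary handled {request}"

    router = FailoverRouter({"primary": primary, "secondary": secondary}, ["primary", "secondary"])
    print(router.execute("debit #1"))
    primary_up["value"] = False                # simulate loss of the primary master
    print(router.execute("debit #2"))
```

The switch can be immediate only because the secondary is already live; with an inactive standby, this step would instead wait for the standby to finish recovery and open.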
Real-World HA Configuration (Transaction Off-Loading)
[Diagram: an Active system processing commit transactions, replicating to a Live/Active system handling lookup queries]
- Improve availability and performance of transaction processing by offloading query load to lower-cost databases/platforms
- For
- Horizontal scalability
- Improved performance
- Example areas
- Online reservations
- Online lookups
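Off-loading amounts to routing each statement by kind. This is a minimal Python sketch under stated assumptions: the router class and server names are hypothetical, and statements are classified naively by their leading keyword, so anything not recognized as a plain lookup stays on the primary.

```python
"""Route transactional work to the primary and lookups to replicated copies."""
from itertools import cycle


class OffloadRouter:
    """Sends writes to the primary and plain SELECTs round-robin to replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._replicas = cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        statement = sql.strip().upper()
        is_lookup = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        if is_lookup and self._replicas is not None:
            return next(self._replicas)
        return self.primary        # writes, locking reads, and anything unrecognized


if __name__ == "__main__":
    router = OffloadRouter("oltp-primary", ["lookup-1", "lookup-2"])
    for sql in [
        "UPDATE seats SET held = 1 WHERE id = 42",
        "SELECT * FROM flights",
        "select count(*) from flights",
        "SELECT * FROM seats WHERE id = 42 FOR UPDATE",
    ]:
        print(router.route(sql), "<-", sql)
```

Replicas trail the primary by the replication latency, so a lookup that must see a just-committed write should still go to the primary.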
Real-World DR Configuration (Failover)
[Diagram: Active database suffering an unplanned outage (system failure or data failure) fails over to the Failover Database]
- An HA implementation that captures and applies data to a failover system in real time
- For
- Fast failover (no restore)
- Do root-cause analysis later!
- Surgical repair (dynamic, selective undo)
- Example areas
- 24x7/mission-critical applications
- Strict SLA requirements
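Selective undo can work only if the captured changes keep before-images. The following minimal Python sketch assumes a hypothetical change-record format (seq, txid, key, before, after); it is not GoldenGate's mechanism. It reverses one bad transaction and refuses when a later transaction touched the same rows.

```python
"""Selective undo: reverse one bad transaction using captured before-images."""
from typing import Optional

# Hypothetical change record: (seq, txid, key, before_image, after_image).
# before_image None means the change was an insert; after_image None means a delete.
Change = tuple[int, str, str, Optional[str], Optional[str]]


def selective_undo(table: dict[str, str], history: list[Change], txid: str) -> None:
    """Restore the before-images of every change made by `txid`, newest first.

    `history` is assumed to be sorted by sequence number.
    """
    mine = [c for c in history if c[1] == txid]
    if not mine:
        raise ValueError(f"no changes recorded for {txid}")
    first_seq = mine[0][0]
    touched = {c[2] for c in mine}
    # Refuse if another transaction later modified the same rows: undoing would clobber it.
    conflicts = [c for c in history if c[0] > first_seq and c[1] != txid and c[2] in touched]
    if conflicts:
        raise RuntimeError(f"{txid} overlaps later changes by {[c[1] for c in conflicts]}")
    for _, _, key, before, _ in reversed(mine):
        if before is None:
            table.pop(key, None)       # undo an insert
        else:
            table[key] = before        # undo an update or delete


if __name__ == "__main__":
    history = [
        (1, "T1", "acct:1", None, "100"),
        (2, "T2", "acct:2", None, "50"),
        (3, "T3", "acct:1", "100", "0"),     # erroneous update
        (4, "T3", "acct:3", None, "999"),    # erroneous insert
    ]
    table = {"acct:1": "0", "acct:2": "50", "acct:3": "999"}
    selective_undo(table, history, "T3")
    print(table)
```

The conflict check is deliberately conservative: undoing T1 here would be refused, because T3 later modified the same row.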
Real-World Configuration (Switchover)
[Diagram: planned outage; Current Database switches over to Switchover Database]
- Zero-downtime migrations
- Rolling upgrades
- Zero-downtime maintenance
- For
- 24x7 availability
- Reduced windows for system maintenance
- Example areas
- Can't afford downtime for an in-place upgrade
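A switchover can be reduced to a short sequence once the new database is kept current by replication. This minimal Python sketch uses hypothetical in-memory classes: the target catches up on the backlog, writes on the current database are frozen briefly, the remainder is drained, and applications are pointed at the new database.

```python
"""Switchover to a replicated database with only a brief write freeze."""
from typing import Optional


class Database:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, int] = {}
        self.log: list[tuple[str, int]] = []   # committed changes, in commit order
        self.read_only = False

    def write(self, key: str, value: int) -> None:
        if self.read_only:
            raise PermissionError(f"{self.name} is quiesced for switchover")
        self.data[key] = value
        self.log.append((key, value))


class Replicator:
    """Ships the source's committed changes to the target, remembering its position."""

    def __init__(self, source: Database, target: Database) -> None:
        self.source, self.target, self.position = source, target, 0

    def lag(self) -> int:
        return len(self.source.log) - self.position

    def catch_up(self, max_changes: Optional[int] = None) -> None:
        end = len(self.source.log)
        if max_changes is not None:
            end = min(end, self.position + max_changes)
        for key, value in self.source.log[self.position:end]:
            self.target.data[key] = value
        self.position = end


def switchover(rep: Replicator) -> Database:
    rep.catch_up()                   # 1. apply the backlog (in a live system, writes continue meanwhile)
    rep.source.read_only = True      # 2. freeze writes on the current database
    rep.catch_up()                   # 3. drain anything committed before the freeze
    if rep.lag() != 0:
        raise RuntimeError("target is not fully caught up")
    return rep.target                # 4. redirect applications to the returned database


if __name__ == "__main__":
    current, upgraded = Database("current"), Database("switchover")
    current.write("a", 1)
    current.write("b", 2)
    rep = Replicator(current, upgraded)
    rep.catch_up(max_changes=1)      # replication is running behind
    current.write("c", 3)
    active = switchover(rep)
    active.write("d", 4)
    print(active.name, active.data)
```

The outage that applications see is limited to steps 2 through 4; the upgrade or migration itself happens on the target before the switch.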
About GoldenGate Software
GoldenGate Software is a privately held software company that offers Transactional Data Management solutions.
250 customers; 1,500 solutions implemented in 35 countries
Established, loyal customer base
Leading industry solutions
18,000-node ATM network with 24/7 availability
Saving millions with real-time data warehousing and zero-downtime migrations
Database tiering handles an average of 300,000 updates/hour, with peaks of 800,000/hour
Achieving a paperless enterprise for a visionary healthcare provider
GoldenGate Transactional Data Management
- Provides guaranteed capture, routing, transformation, delivery, and verification of data transactions across heterogeneous environments in real time
- TDM must be
- Real-time
- Moves with sub-second latency
- Heterogeneous
- Moves transactions across different databases and platforms
- Transactional
- Maintains transaction integrity
- GoldenGate differentiates on
- Performance
- Handles thousands of transactions per second with very low impact on IT systems
- Extensibility & flexibility
- Open architecture to meet demanding customer needs and data environments
- Reliability
- Supports continuous operations and availability
How GoldenGate Works
- Capture: committed changes are captured as they occur by reading the transaction logs.
- Trail files: stage and queue data for routing.
- Route: data is compressed and encrypted for routing to targets.
- Delivery: applies transactional data with guaranteed integrity.
- GoldenGate Veridata: reports on data discrepancies between in-use databases.
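Reporting discrepancies between two in-use databases can be illustrated with one common technique: comparing per-row hashes keyed by primary key. This is a minimal Python sketch for illustration, not a description of how GoldenGate Veridata works internally; the table contents are hypothetical.

```python
"""Detect discrepancies between a source table and its replica by comparing row digests."""
import hashlib
import json


def row_digest(row: dict) -> str:
    """Stable digest of a row (column order does not matter)."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()


def compare(source: dict, target: dict) -> dict[str, list]:
    """Classify primary keys as missing, extra, or mismatched on the target."""
    src = {key: row_digest(row) for key, row in source.items()}
    tgt = {key: row_digest(row) for key, row in target.items()}
    return {
        "missing_in_target": sorted(k for k in src if k not in tgt),
        "extra_in_target": sorted(k for k in tgt if k not in src),
        "mismatched": sorted(k for k in src if k in tgt and src[k] != tgt[k]),
    }


if __name__ == "__main__":
    source = {
        1: {"name": "Ada", "balance": 10},
        2: {"name": "Bo", "balance": 20},
        3: {"name": "Cy", "balance": 30},
    }
    target = {
        1: {"balance": 10, "name": "Ada"},     # same row, different column order
        2: {"name": "Bo", "balance": 25},      # diverged value
        4: {"name": "Di", "balance": 40},      # row that should not exist
    }
    print(compare(source, target))
```

Comparing digests rather than full rows means only hashes need to cross the network. A real comparison must also allow for changes still in flight through replication, which this sketch ignores.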
TDM HA Evaluation Criteria
Questions & Answers
- Contact Information
- Alok Pareek
- www.goldengate.com
- (415) 777-0200
- apareek_at_goldengate.com
- 301 Howard Street, Suite 2100, San Francisco, CA 94105
Lag Points Address Logical Failures
[Diagram: One-to-One: Source → Target. One-to-Many: Source → Target (current), Target (1-hour lag), Target (1-day lag)]
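The lag-point idea can be sketched as a target that holds each change for a fixed delay before applying it. This minimal Python model is an assumption for illustration (class and function names are hypothetical). The current target applies immediately, while a lagged target keeps a window in which an accidental delete can be caught and discarded before it reaches that copy.

```python
"""One-to-many replication with lag points: a delayed copy survives a logical error."""
from collections import deque
from typing import Optional


class LaggedTarget:
    """Applies a change only once it is at least `lag_seconds` old."""

    def __init__(self, name: str, lag_seconds: float) -> None:
        self.name = name
        self.lag_seconds = lag_seconds
        self.queue: deque = deque()            # (commit_time, key, value), oldest first
        self.data: dict[str, str] = {}

    def receive(self, commit_time: float, key: str, value: Optional[str]) -> None:
        self.queue.append((commit_time, key, value))

    def apply_due(self, now: float) -> None:
        while self.queue and self.queue[0][0] <= now - self.lag_seconds:
            _, key, value = self.queue.popleft()
            if value is None:
                self.data.pop(key, None)       # a delete
            else:
                self.data[key] = value

    def discard_from(self, bad_time: float) -> None:
        """Drop queued changes committed at or after a known-bad point, before they apply."""
        self.queue = deque(c for c in self.queue if c[0] < bad_time)


def publish(targets: list[LaggedTarget], commit_time: float, key: str, value: Optional[str]) -> None:
    """Fan one committed change out to every target; the sketch uses commit time as the clock."""
    for target in targets:
        target.receive(commit_time, key, value)
        target.apply_due(now=commit_time)


if __name__ == "__main__":
    current = LaggedTarget("current", 0)
    hour = LaggedTarget("1-hour lag", 3600)
    day = LaggedTarget("1-day lag", 86400)
    targets = [current, hour, day]
    publish(targets, 0, "order:1", "open")
    publish(targets, 3700, "order:1", None)    # accidental delete
    hour.discard_from(3700)                    # caught within the hour: never applied there
    for t in targets:
        print(t.name, t.data)
```

The delay is the trade-off: a lagged copy is always stale by its lag, so it serves as a recovery point for logical failures rather than as a failover target for system failures.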