Title: Trends in Business Continuity Planning: Real-Time Access to Real-Time Information
1 Trends in Business Continuity Planning: Real-Time Access to Real-Time Information
- Alok Pareek, VP Technology
- Feb 22, 2008
2 Agenda
- Introduction
- Trends in Business Continuity
- Availability, A 360 degree view
- Challenges, Approaches
- Data Freshness
- Real Time as a trend
3 Background
- Alok Pareek
- Vice-President Technology
- GoldenGate Software, 3 years
- R&D, Architecture, Real World Deployments
- Oracle Corp. Server Technologies Development, 10 years
- Recovery (Redo Generation and Write Ahead Logging)
- High Speed Data Movement (Cross Platform Transportable Tablespaces)
- Patents granted/filed; key contributions to Oracle kernel include
- Multi-threaded redo generation (9i)
- Multiple block size cache support (9i)
- Cross Platform Transportable Tablespaces (Oracle 10g)
- Whole database transport (10.2)
- Data Recovery/Repair (11g)
- Stanford University
- Research area: Recovery and Database Systems
4 About GoldenGate
Worldwide offices: USA, EMEA, Asia Pacific, Latin America
Exceptional customer support: 24x7 global coverage
Company Strength and Service
Established in 1995
Rapid Growth in Strategic Partners
Established, Loyal Customer Base
5 Solutions
- HIGH AVAILABILITY
- REAL TIME DATA INTEGRATION
6 GoldenGate Customers: Major Industries Represented
Healthcare
Banking
Cable, Telco, and Manufacturing
Financial and Insurance Services
eBusiness, Retail, Public/Govt., Services
7 Business Continuity Trends: a few points
- Broad Area
- Lots of papers, articles, books
- People/Process vs. Technology
- Trends are interesting
- Highlight Real Time Access to Real Time Information
- Link Business Continuity to Availability
- Link Availability to Access
- Link Access to Freshness
- Make the case for Real-Time
- Focus (today)
- Mission Critical Applications
- Transactional Data
8 A few Points to Consider
- Data produced by Humans vs. Machines
- Competition is healthy and encouraged, Globalization
- What does an 0800-1700 or a weekend mean anyway?
- Deletes/Updates are out of fashion, should they ever have been introduced?
- You want NOW but your pecking order is low
9 Just a decade ago
OLTP
OLAP/DW
- Mostly updates
- Many small transactions
- Mb-Tb of data
- Raw data
- Clerical users
- Up-to-date data
- Consistency, recoverability critical
- Gb - Tb of data
- Mostly reads
- Queries long, complex
- Gb - Tb of data
- Summarized, consolidated data
- Decision-makers, analysts as users
- Up-to-date data
- Consistency, recoverability critical
- Operational data
Source: Hector Garcia-Molina, Stanford, Data Warehousing and OLAP
10 Where Real-Time Matters
Real-Time Access
Continuous Availability
- Disaster Recovery
- Disaster Tolerance
- Continuous Operations
- Tape Backups, Disk Mirroring, Block-level Replication, Hot Standby, Active-Active
Data Integration
- BATCH: Weeks / Days (Custom Scripts)
- RIGHT TIME: Hours (ETL Scripts)
- NEAR REAL TIME: Minutes / Seconds (EAI)
- REAL TIME: Sub-Seconds (TDM)
Real-Time Information
Physical Infrastructure
Data
Transactional Operations
11 High Availability
- Definition
- Ratio of system uptime to sum of uptime and downtime
- Availability = MTTF / (MTTF + MTTR)
- Dependability
- Addressing Performance vs. Reliability in computer systems
- Hardware Faults, Software Bugs, Human errors are realities in any complex system deployment
- Enterprise applications need to function 24x7
- Disasters are no longer a distant threat
- Inadequate planning to handle outages
- Social Reputation, Competition, and the single click issue
- Direction
- Redefinition of Critical Systems, and uptime expectations
- From fault tolerance to reducing MTTR
- Batches to MicroBatches to continuous feeds
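As a worked illustration of the availability ratio MTTF / (MTTF + MTTR) and of why the direction is toward reducing MTTR, here is a minimal Python sketch. The MTTF and MTTR figures are hypothetical values chosen for illustration; they do not come from the presentation.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Return availability as MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime per 365-day year implied by an availability ratio."""
    return (1.0 - avail) * 365 * 24 * 60


# Hypothetical figures: both systems fail on average every 1000 hours,
# but one repairs by restore (4 hours) and the other by failover (6 minutes).
restore_based = availability(mttf_hours=1000.0, mttr_hours=4.0)
failover_based = availability(mttf_hours=1000.0, mttr_hours=0.1)

for label, value in (("restore", restore_based), ("failover", failover_based)):
    print(f"{label}: availability={value:.5f}, "
          f"downtime={downtime_minutes_per_year(value):.0f} min/year")
```

With MTTF held fixed, the only lever in the ratio is MTTR, which is why shortening repair time raises availability even when failures are no rarer.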
12 The 3 States of Availability: 360° View
APPLICATIONS
CRM, BILLING, SALES, TELCO, FIN
2 Planned Outage
3 Unplanned Outage
Unplanned outage
Data Failure
13 High Availability Concerns (No Outage)
1 Active
- Latency
- DSS vs. OLTP
- conflicting requirements
- Mixed workload
- Data validation
- Data Transformation
14 High Availability Concerns (Planned Outages)
2 Planned Outage
Unplanned outage
Common Approaches
- Selected windows of downtime
- Phased approach to maintenance
Migrations
Upgrades
Maintenance
15 High Availability Concerns (Unplanned Outages)
- Database Restore/Recovery
- RAID
- Shared Disk Clusters
- Standby database
3 Unplanned Outage
Common Approaches
16 Why are unplanned outages difficult? (1)
- Understanding failures
- Failure Points
- Statement
- Process
- Instance
- Database
- Site
- Failure Types
- Physical (Media, corruption, inconsistency amongst redundant copies)
- Logical (Incorrect DML, out-of-synch, accidental table drop)
- Failure Handling
- Automatic
- Manual
17 Why are unplanned outages difficult? (2)
- Mapping of symptoms to failure categories is complex
- Planning for all failure cases is non trivial
- Native repair solutions do not address complex or multiple failures
- Root cause analysis affects MTTR
- GOING FORWARD
- Failover, isolated repair will replace conventional recovery in computing environments
- Restores will be frowned upon
- System designers will increasingly focus on micro (surgical) repair
18 Ongoing HA Challenges
- Meeting Service levels due to performance impact
- Providing windows for Application, Database, OS upgrades/maintenance/migration
- Mapping of symptoms to failure categories is complex
- Planning for all failure cases is non trivial
- Native repair solutions do not address complex or multiple failures
- Root cause analysis affects MTTR
- GOING FORWARD
- Keeping OLTP really OLTP
- Rolling upgrade solutions will be expected of Application vendors
- Failover, isolated repair will replace conventional recovery in computing environments, Restores will be frowned upon
- System designers will increasingly focus on micro (surgical) repair
19 Keeping OLTP Systems Performing (State 1)
- Splitting workloads across multiple databases for
- Horizontal SCALABILITY for high throughput environments
- High speed, low-cost distributed caching
- Write/Read
- Multi Master load balancing solutions yielding better performance
- Avoid global serialization
- Faster response times (geographic proximity)
- Partitioning manageability advantages
- Offloading Reporting to real time copies
- Solution matters
- ROI on existing standby systems (hardware, or databases)
- BITS → TRANSACTIONS → APPLICATIONS → BUSINESS SOLUTION
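One way to picture splitting workloads and offloading reporting is a small router that hashes each key to a partition, sends transactional work to that partition's primary, and sends reporting queries to its real-time copy. The Python sketch below is only an illustration: `WorkloadRouter`, the database names, and the hash-partitioning scheme are assumptions, not something described in the presentation or a GoldenGate API.

```python
import hashlib


class WorkloadRouter:
    """Hypothetical router: OLTP goes to a primary, reporting to its copy."""

    def __init__(self, primaries: list[str], replicas: list[str]) -> None:
        if not primaries or len(primaries) != len(replicas):
            raise ValueError("need one replica per primary, and at least one")
        self.primaries = primaries
        self.replicas = replicas

    def partition_for(self, key: str) -> int:
        # A stable hash keeps a key on the same partition across processes,
        # so no global coordinator is needed to decide placement.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.primaries)

    def route(self, key: str, is_report_query: bool) -> str:
        index = self.partition_for(key)
        # Reporting reads are offloaded to the real-time copy of the partition.
        return self.replicas[index] if is_report_query else self.primaries[index]


router = WorkloadRouter(
    primaries=["oltp_db_0", "oltp_db_1"],      # assumed names
    replicas=["report_db_0", "report_db_1"],   # assumed names
)
print(router.route("customer-42", is_report_query=False))
print(router.route("customer-42", is_report_query=True))
```

The design choice sketched here is that placement depends only on the key, so each partition can serialize its own transactions independently.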
20 Eliminating Planned Outage Windows (State 2)
- Addressing Upgrades/Migrations
- Rolling Database upgrades
- Zero-Downtime Platform migrations
- Zero-Downtime Database migrations
- Rolling Application upgrades
- Maintenance operations
- Index reorganizations
- Regenerating Statistics for Query optimizer
- Health Checks
- Verification, Validation
21 The rolling approach: Technical challenges
- Data issues
- The secondary copy
- Instantiating Terabytes/Petabytes
- Staging areas
- Change Management
- Special Handling
- Synchronization issues
- Incremental data movement
- Transactional Integrity
- Reliable change data capture framework
- Source database impact
- Performance
- Application/Schema changes
- Failback strategy
- System/Application verification
- Continued data changes
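The synchronization challenge above (incremental data movement with transactional integrity) can be sketched as applying a captured change stream to the secondary copy only at commit boundaries. This is a generic, simplified Python illustration; the record layout (`txn`, `op`, `key`, `value`) and the function name are assumptions, not the format of any particular change data capture product.

```python
from collections import defaultdict


def apply_committed_changes(change_log: list[dict], target: dict) -> dict:
    """Apply only committed transactions to `target`, in commit order."""
    pending = defaultdict(list)  # txn id -> buffered changes, not yet visible
    for record in change_log:
        op, txn = record["op"], record["txn"]
        if op in ("insert", "update", "delete"):
            pending[txn].append(record)
        elif op == "commit":
            # The whole transaction becomes visible at once on the target.
            for change in pending.pop(txn, []):
                if change["op"] == "delete":
                    target.pop(change["key"], None)
                else:
                    target[change["key"]] = change["value"]
        elif op == "rollback":
            pending.pop(txn, None)  # discard; the target never sees it
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target


log = [
    {"txn": "t1", "op": "insert", "key": "a", "value": 1},
    {"txn": "t2", "op": "insert", "key": "b", "value": 2},
    {"txn": "t2", "op": "rollback"},
    {"txn": "t1", "op": "update", "key": "a", "value": 3},
    {"txn": "t1", "op": "commit"},
    {"txn": "t3", "op": "insert", "key": "c", "value": 9},  # still open
]
secondary = apply_committed_changes(log, {})
print(secondary)
```

Buffering per transaction means the secondary copy never exposes rolled-back or still-open work, at the cost of holding changes in memory until the commit record arrives.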
22 Protection from Failures (State 3)
- Failover to synchronized target (Live)
- Multiple copies of data for high availability (temporal versions)
- Decouple Root Cause Analysis from MTTR
- Eliminate Restore from MTTR
- Application data already in cache, improved response times
- Undo transactions using logical recovery features
- Recovery from user errors
- Increase availability by handling special class failures
- Redo block corruption
- Storage stack corruption
- Protection from lost writes from host
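The idea of undoing a transaction logically, rather than restoring, can be illustrated by building compensating operations from the before-images of an erroneous transaction and applying them in reverse order. The Python sketch below is a simplified assumption of how such a feature might work; the change-record fields (`op`, `key`, `value`, `before`) are hypothetical and do not describe a specific database's logical recovery feature.

```python
def build_undo(changes: list[dict]) -> list[dict]:
    """Return compensating operations, newest change first."""
    undo = []
    for change in reversed(changes):
        if change["op"] == "insert":
            undo.append({"op": "delete", "key": change["key"]})
        elif change["op"] == "delete":
            # Re-create the row from its before-image.
            undo.append({"op": "insert", "key": change["key"],
                         "value": change["before"]})
        elif change["op"] == "update":
            undo.append({"op": "update", "key": change["key"],
                         "value": change["before"]})
        else:
            raise ValueError(f"unknown operation: {change['op']!r}")
    return undo


def apply_ops(table: dict, ops: list[dict]) -> dict:
    """Apply insert/update/delete operations to an in-memory table."""
    for op in ops:
        if op["op"] == "delete":
            del table[op["key"]]
        else:
            table[op["key"]] = op["value"]
    return table


table = {"a": 1, "b": 2}
user_error = [  # hypothetical erroneous transaction with before-images
    {"op": "update", "key": "a", "value": 10, "before": 1},
    {"op": "delete", "key": "b", "before": 2},
    {"op": "insert", "key": "c", "value": 3},
]
apply_ops(table, user_error)              # the mistake happens
apply_ops(table, build_undo(user_error))  # surgical repair, no restore
print(table)
```

Only the rows touched by the bad transaction are modified, which is the sense in which repair can be decoupled from a full restore.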
23 Availability and Data Freshness
To reduce latency and drive value, data
acquisition needs to approach real time.
Availability Requirements matter.
Business event
Data latency
Analysis latency
Business Value
Decision latency
Action Time
From TDWI, The Business Case for Real-Time BI. Based on concept developed by Richard Hackathorn, Bolder Technology
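The latency labels on this slide can be read as a simple sum: assuming, as in Hackathorn's value-time concept, that action time is data latency plus analysis latency plus decision latency, the Python sketch below compares a batch feed with a real-time feed. All durations are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Latencies:
    """Delays, in seconds, between a business event and the action taken."""
    data_s: float      # event occurs -> data available for analysis
    analysis_s: float  # data available -> information delivered
    decision_s: float  # information delivered -> action taken

    def action_time_s(self) -> float:
        # Assumption: the three latencies are sequential and simply add up.
        return self.data_s + self.analysis_s + self.decision_s


# Hypothetical comparison: nightly batch load vs. one-second data acquisition,
# with analysis and decision latency held constant.
nightly_batch = Latencies(data_s=24 * 3600, analysis_s=3600, decision_s=1800)
real_time = Latencies(data_s=1, analysis_s=3600, decision_s=1800)

saved = nightly_batch.action_time_s() - real_time.action_time_s()
print(f"batch action time:     {nightly_batch.action_time_s():.0f} s")
print(f"real-time action time: {real_time.action_time_s():.0f} s")
print(f"time saved by fresher data: {saved:.0f} s")
```

In this made-up case data latency dominates the total, which is the situation in which moving data acquisition toward real time shortens action time the most.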
24 Summary
- Business Continuity from a technology standpoint is about addressing High Availability of Applications
- High Availability encompasses
- SLA on Production Systems
- Addressing Planned Outages without outage windows
- Addressing Unplanned Outages using Failover (all failures)
- Trends
- Continuous Application Availability
- Data Freshness across multiple systems
- Real Time Access to Real Time Information is the driver
25 Thank You. Q&A: apareek_at_goldengate.com