Title: Oracle Change Data Capture Jack Raitto, Development Manager
1Oracle Change Data Capture
- Jack Raitto, Development Manager Oracle NEDC
- NYOUG Long Island SIGOctober 7, 2004
2Capture your change data for FREE!
Zero additional license cost over Oracle10g EE
Virtually zero source system processing cost
3What is Oracle CDC?
- Captures change data from operational system(s)
as it occurs - Part of Extract / Transform / Load (ETL) process
for DSS / Data warehouse, potentially other
applications - Optimizes the extract phase
- Unleashes SQL power for transformations
- Provides management framework for change data
4How was it done before (old way)?
5CDC Advantages
- Built in, custom fit, evolves with the database
- Delivers change data when you need it, where you
need it - Offers several tradeoffs between timely change
delivery vs. source system overhead (sync, async
hotlog, async autolog, etc.) - Assumes complete change management responsibility
6CDC Advantages (concl.)
- Captures all change data along with transaction
information see all changes a given transaction
made and who made them - Transactional consistency for changes across
multiple source tables is guaranteed - Transparently coordinates sharing of change data
across users and applications - You dont need rocket scientists on your staff!
7CDC Configurations
8How CDC Works Sync CDC
- Uses internal triggers to capture before and/or
after images of new and updated rows - Has the same performance implications as capture
via user triggers - Delivers change data in real-time
- Uses the same interface as async CDC
9Synchronous CDC HotLog
Combined Source / Operational BI System
CDC Change Tables
ETL Process
Upsert to Load Dimension Tables
Customer
Triggers
Direct Path Insert to load Fact Tables
Order
CDC
10How CDC Works Async CDC
- Relational interface to Streams
- Prepackaged Streams application
- Asynchronously captures change data from
redo/archive logs - Presents relational interface to change data
stream - Can operate on source system (hot log) or staging
system (auto log)
11Foundations of Async CDC
Change capture Change management Warehouse loading
Async CDC
Replication Message queuing Warehouse
loading Event notification Data protection
Streams
Redo log inspection Debugging Auditing Reversing
transactions
LogMiner
12Asynchronous CDC HotLog
Combined Source / Operational BI System
CDC Change Tables
ETL Process
Upsert to Load Dimension Tables
Customer
LogMiner
Active Redo Log
Streams
Direct Path Insert to load Fact Tables
Order
CDC
13Asynchronous CDC AutoLog
Data Warehouse / Staging System
Source Database
CDC Change Tables
ETL Process
Upsert to Load Dimension Tables
Customer
LogMiner
Streams
Redo Logs
Direct Path Insert to load Fact Tables
Order
CDC
Arch Process
Archived Redo Logs
14Using CDC Publish/Subscribe
- Publisher supplies, subscribers consume change
data - Model allows sharing of change data across users
and applications - Coordinates retention / purge of change data
- Prevents application from accidentally processing
change data more than once - Guarantees transactional consistency of change
data across source tables via change sets
15Using CDC Publish/Subscribe
Subscriber 1
Subscription
Publisher
Change Data
Publication
Subscriber 2
Subscription
16Publisher Concepts
- Change source
- Defines the source system to CDC
- Change set
- Collection of source tables for which
transactionally consistent change data is needed - Change table
- Container to receive change data
- Is published to subscribers
17Publisher Concepts
Source Database HQ
Staging Database DW
Change Source HQ_SRC
Source table sh.salesPROD_IDCUST_IDPROMO
_IDAMOUNT_SOLD QUANTITY_SOLD
Change Set SH_SET
Change table sales_ctPROD_IDCUST_IDPROMO
_IDAMOUNT_SOLD
Source table sh.promotionsPROMO_IDPROMO_S
UBCATPROMO_CAT PROMO_COST
Change table promo_ctPROMO_IDPROMO_SUBCAT
PROMO_CAT
18Publish Package
DBMS_CDC_PUBLISH CREATE / ALTER /
DROP_AUTOLOG_CHANGE_SOURCE CREATE / ALTER /
DROP_CHANGE_SET CREATE / ALTER /
DROP_CHANGE_TABLE PURGE PURGE_CHANGE_SET
PURGE_CHANGE_TABLE DROP_SUBSCRIPTION
19Using Change Data Subscribers
- The subscriber creates a subscription from an
available publication - The subscription provides a moving window (view)
to the change data - Subscriptions go against a single change set and
are therefore transactionally consistent - When all subscribers have advanced past old
change data, CDC automatically and efficiently
purges
20Subscriber Concepts
Staging Database DW
Subscription sales_promo_list
Change Set SH_SET
Publication on sh.salesPROD_IDCUST_IDPROM
O_IDAMOUNT_SOLD
Subscriber view
spl_sales
Publication on sh.promotionsPROMO_IDPROMO
_SUBCATPROMO_CAT
Subscriber view
spl_promos
21Subscriber View
Subscriber view spl_sales
Insert
Updatebefore
Updateafter
Insert
Insert
Delete
Insert
Insert
22Subscriber Package
DBMS_CDC_SUBSCRIBE CREATE_SUBSCRIPTION
SUBSCRIBE ACTIVATE_SUBSCRIPTION
EXTEND_WINDOW PURGE_WINDOW
DROP_SUBSCRIPTION
23Security
- Sync publisher must have SELECT access to the
source table - Async publisher must have EXECUTE_CATALOG_ROLE
privilege - Publisher uses GRANT and REVOKE on change tables
to control subscriber access
24Performance Benchmark
- Objectives
- Determine impact on transaction time
- Determine latency
- Source system Oracle 10g R1 Beta, SunFire 4800
SMP 8x900Mhz/16GB w/striped 8 x Sun StorEdge T3
arrays (9X36.4MB each) - Customer insurance quote OLTP application run at
Oracle, 250 concurrent users / 175 TPS, system
warmed up (steady state) - Mixture of Inserts, Updates, Deletes, Singleton
Selects, Cursor Fetches, Rollbacks / Commits,
savepoints - Capture changes on all tables
Your mileage will vary!
25Transaction Performance
Transaction elongated by 10 Relative impact
varies depending on other overhead
26Transaction Performance
Transaction elongated by 8 Can reduce elongation
by adding RAC nodes / CPUs
27Transaction Performance
Transaction elongation virtually
eliminated Change capture processing moved off
system
28HotLog Latency Performance
About ½ the change data arrived in 1
second Virtually all the change data arrived in 2
seconds
29Summary
- CDC assumes the burden of change capture for you
- Change data is guaranteed consistent and complete
- Change data can be shared across users and
applications effortlessly - CDC delivers change data where you need it, when
you need it, and with minimal overhead
30For More Information
- Oracle Data Warehousing Guide, 10gR1, Chapter 16
- Oracle PL/SQL Packages and Types Reference,
10gR1, packages DBMS_CDC_ - http//www.oracle.com/technology/oramag/oracle/03-
nov/o63tech_bi.html - http//www.oracle.com/technology/products/bi/db/10
g/pdf/twp_dss_ontime_etl_10gr1_0304.pdf - http//www.rittman.net/archives/000901.html
- http//www.nyoug.org/cdc.pdf (Oracle9i)
31Questions?
?