Title: Experiences with Real-Time Data Warehousing Using Oracle Database 10G
1 Experiences with Real-Time Data Warehousing Using Oracle Database 10g
- Mike Schmitz
- High Performance Data Warehousing
- mike.schmitz_at_databaseperformance.com
- Michael Brey
- Principal Member Technical Staff
- ST/NEDC Oracle Engineering
- Oracle Corporation
2 Agenda
- The meaning of Real-Time in Data Warehousing
- Customer Business Scenario
- Customer Environment
- Real-Time Requirement
- Our Real-Time Solution
- Real-Time data architecture
- Incremental Operational Source Change Capture
- Transformation and Population into DW Target
- Simplified Functional Demonstration
- Asynchronous Change Data Capture (Oracle)
- Performance Characteristics and Considerations
3 My Background
I am an independent data warehousing consultant specializing in the dimensional approach to data warehouse / data mart design and implementation, with in-depth experience using efficient, scalable techniques for both large-scale data warehouses and small-scale, platform-constrained data mart implementations. I deliver dimensional design and implementation workshops as well as ETL workshops in the U.S. and Europe. I have helped implement data warehouses using Red Brick, Oracle, Teradata, DB2, Informix, and SQL Server on mainframe, UNIX, and NT platforms, working with small and large businesses across a variety of industries, including customers such as Hewlett-Packard, American Express, General Mills, AT&T, BellSouth, MCI, Oracle Slovakia, J.D. Power and Associates, Mobil Oil, The Health Alliance of Greater Cincinnati, and the French railroad SNCF.
4 Real-Time in Data Warehousing
- Data Warehousing Systems are complex environments
- Business rules
- Various data process flows and dependencies
- Almost never pure Real-Time
- Some latency is a given
- What do you need?
- Real Time
- Near Real-Time
- Just in Time for the business
5 Customer Business Scenario
- Client provides software solutions for utility companies
- Utility companies have plants generating the energy supply
- Recommended maximum output capacity
- Reserve capacity
- Buy supplemental energy as needed
- Peak demand periods are somewhat predictable
- Each day is pre-planned based on historical behavior
- Cheaper to buy energy ahead
- Expensive to have unused capacity
- Existing data warehouse supports the planning function
- Reduced option expenses
- Reduced supplemental energy costs
6 Customer Real-Time Requirement
- More up-to-date data enhances the operational business
- Compare today's plant output volumes to yesterday's or last week's average
- Know when to purchase additional options or supplies
- Customer target
- Actual data within a 5 minute lag
- Use a single query
- Use a single tool
7 Sample Analysis Graph
8 Our Real-Time Solution: Overview
- Three-step approach
- Implement a real-time DW data architecture
- Near real-time incremental change capture from the operational system
- Transformation and propagation (population) of change data to the DW
9 Our Real-Time Solution: Real-Time DW Data Architecture
- Add a Real-Time Partition to our Plant Output Fact Table for current day activity
- Separate physical table
- No indexes or RI constraints during daily activity (incoming data already has RI enforced)
- Combined with the Plant Output Fact Table through a UNION ALL view (see the sketch below)
10 Our Real-Time Solution: Change Capture and Population
- Incremental change capture from the operational site
- Synchronous or Asynchronous
- Transformation and Propagation (population) of change data to the DW
- Continuous trickle feed or periodic batch
[Diagram: Operations source feeding a Staging area via Synchronous CDC (trigger-based) or Asynchronous CDC, then populating the DW by trigger or by batch]
11 Our Real-Time Solution: Incremental Change Capture
- Done with Oracle's Change Data Capture (CDC) functionality
- Synchronous CDC available with Oracle9i
- Asynchronous CDC with Oracle10g
- Asynchronous CDC is the preferred mechanism
- Decouples change capture from the operational transaction
12 Asynchronous CDC (Oracle10g)
- SQL interface to change data
- Publish/subscribe paradigm
- Parallel access to the redo log files, leveraging Oracle Streams
- Parallel transformation of data
[Diagram: changes mined from the OLTP DB redo log files]
13 Our Real-Time Solution: Population of Change Data into DW
- Continuous
- The change table owner creates a trigger to populate the warehouse real-time partition
- Periodic Batch
- Utilize the Subscribe interface
- Subscribe to specific table and column changes through a view
- Set a window and extract the changes at the required period
- Purge the view and move the window
14 Our Real-Time Solution: The Daily Process
- Integrate daily changes into the historical fact table
- At the end of the day:
- Index the current day table and apply constraints (NOVALIDATE)
- Create a new fact table partition
- Exchange the current day table with the new partition
- Create the next day's Real-Time Partition table
15 Simplified Functional Demo: Schema Owners
- AO_CDC_OP
- Owns the operational schema
- AO_CDC
- Owns the CDC change sets and change tables (needs special CDC privileges) -> CDC Publish role
- AO_CDC_DW
- Owns the data warehouse schema (also needs special CDC privileges) -> CDC Subscribe role (see the grant sketch below)
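A hedged sketch of the kinds of grants behind the "CDC Publish" and "CDC Subscribe" privileges; the exact privilege list here is an assumption, so check the Oracle Database 10g CDC documentation for what your release actually requires:

-- Assumed publisher grants: catalog roles (which include access to
-- DBMS_CDC_PUBLISH) plus the ability to create the change tables
GRANT CREATE SESSION, CREATE TABLE TO ao_cdc;
GRANT EXECUTE_CATALOG_ROLE, SELECT_CATALOG_ROLE TO ao_cdc;

-- Assumed subscriber grants: a session plus execute on the subscribe package
-- (SELECT on the change table is granted separately by the publisher)
GRANT CREATE SESSION TO ao_cdc_dw;
GRANT EXECUTE ON sys.dbms_cdc_subscribe TO ao_cdc_dw;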
16 Simplified Functional Demo: Operational Schema
17 Simplified Functional Demo: Data Warehouse Schema
18 What do we have?
- Operational transaction table
- AO_CDC_OP.PLANT_OUTPUT
- DW historical partitioned fact table
- AO_CDC_DW.F_PLANT_OUTPUT
- DW current day table (Real-Time Partition)
- AO_CDC_DW.F_CURRENT_DAY_PLANT_OUTPUT
- Data Warehouse UNION ALL view
- AO_CDC_DW.V_PLANT_OUTPUT
19 First
- The CDC user publishes (see the sketch below):
- Create a Change Set (CDC_DW)
- Add supplemental logging for the operational table
- Create a change table for the operational table (CT_PLANT_OUTPUT)
- Force database logging on the tablespace to catch any bulk insert /*+ APPEND */ (non-logged) activity
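A sketch of these publish steps with the DBMS_CDC_PUBLISH package. The change set, table, and column names follow the demo schema; the column data types, the capture_values setting, and the tablespace name are assumptions rather than details from the original prototype:

-- 1. Create the change set on the predefined asynchronous HotLog change source
BEGIN
  DBMS_CDC_PUBLISH.CREATE_CHANGE_SET(
    change_set_name    => 'CDC_DW',
    description        => 'Plant output changes for the DW',
    change_source_name => 'HOTLOG_SOURCE');
END;
/

-- 2. Supplemental logging so the redo carries the columns CDC needs
ALTER TABLE ao_cdc_op.plant_output ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- 3. Create the change table for the operational table
BEGIN
  DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
    owner             => 'AO_CDC',
    change_table_name => 'CT_PLANT_OUTPUT',
    change_set_name   => 'CDC_DW',
    source_schema     => 'AO_CDC_OP',
    source_table      => 'PLANT_OUTPUT',
    column_type_list  => 'PLANT_ID NUMBER, OUTPUT_TS DATE, OUTPUT_IN_KWH NUMBER',
    capture_values    => 'new',
    rs_id             => 'y',
    row_id            => 'n',
    user_id           => 'n',
    timestamp         => 'n',
    object_id         => 'n',
    source_colmap     => 'n',
    target_colmap     => 'n',
    options_string    => NULL);
  -- an asynchronous change set also needs capture enabled, e.g.
  -- DBMS_CDC_PUBLISH.ALTER_CHANGE_SET(change_set_name => 'CDC_DW', enable_capture => 'y');
END;
/

-- 4. Force logging so direct-path (APPEND) inserts still reach the redo log
ALTER TABLESPACE op_data FORCE LOGGING;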
20 Next: Transform and Populate
- One of two ways
- Continuous Feed
- Logged insert activity
- Permits nearer real-time
- Constant system load
- Periodic Batch Feed
- Permits non-logged bulk operations
- You set the lag time: how often do you run the batch process?
- Hourly
- Every five minutes
- Less system load overall
21 The Continuous Feed
- Put an insert trigger on the change table that joins to the dimension tables, picks up the dimension keys, and performs any necessary transformations (see the sketch below)
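A sketch of such a trigger, reusing the join logic from the extraction SQL on slide 23. The OPERATION$ filter and the change table column names are assumptions based on the standard CDC change table layout:

CREATE OR REPLACE TRIGGER ao_cdc.trg_ct_plant_output
AFTER INSERT ON ao_cdc.ct_plant_output
FOR EACH ROW
BEGIN
  -- only propagate rows captured as inserts on the operational table
  IF TRIM(:NEW.operation$) = 'I' THEN
    INSERT INTO ao_cdc_dw.f_current_day_plant_output
           (generating_plant_key, output_day_key,
            output_minute_key, output_actual_qty_in_kwh)
    SELECT p.generating_plant_key,
           d.output_day_key,
           m.output_minute_key,
           :NEW.output_in_kwh
      FROM ao_cdc_dw.d_generating_plant p,
           ao_cdc_dw.d_output_day       d,
           ao_cdc_dw.d_output_minute    m
     WHERE p.plant_id   = :NEW.plant_id
       AND d.output_day = TRUNC(:NEW.output_ts)
       AND m.output_time_24hr_nbr = TO_NUMBER(TO_CHAR(:NEW.output_ts, 'HH24MI'));
  END IF;
END;
/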
22 The Batch Feed
- The CDC schema owner:
- Authorizes AO_CDC_DW to select from the change table (the select is done through a generated view)
- The DW schema owner:
- Subscribes to the change table and the columns it needs (with a centralized EDW approach this would usually be the whole change table), supplying a subscription name and a view name
- Activates the subscription
- Extract cycle (see the sketch below):
- Extend the window
- Extract the changed data via the view (same code as the trigger)
- Purge the window (a logical delete; physical deletion is handled by the CDC schema owner)
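A sketch of the subscription setup and one batch cycle using DBMS_CDC_SUBSCRIBE. The subscription name is an assumption; PO_ACTIVITY_VIEW matches the subscriber view used in the extraction SQL on the next slide:

-- Publisher side: allow the DW owner to read the change table
GRANT SELECT ON ao_cdc.ct_plant_output TO ao_cdc_dw;

-- DW owner: one-time subscription setup
BEGIN
  DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
    change_set_name   => 'CDC_DW',
    description       => 'Plant output feed',
    subscription_name => 'PO_SUB');
  DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
    subscription_name => 'PO_SUB',
    source_schema     => 'AO_CDC_OP',
    source_table      => 'PLANT_OUTPUT',
    column_list       => 'PLANT_ID, OUTPUT_TS, OUTPUT_IN_KWH',
    subscriber_view   => 'PO_ACTIVITY_VIEW');
  DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(subscription_name => 'PO_SUB');
END;
/

-- Each batch cycle: extend the window, extract through the view, purge the window
BEGIN
  DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name => 'PO_SUB');
END;
/
-- ... run the INSERT /*+ APPEND */ ... SELECT FROM PO_ACTIVITY_VIEW shown on the next slide ...
BEGIN
  DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(subscription_name => 'PO_SUB');
END;
/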
23 Extraction from the Change Table View
insert /*+ APPEND */ into ao_cdc_dw.F_CURRENT_DAY_PLANT_OUTPUT
       (generating_plant_key, output_day_key,
        output_minute_key, output_actual_qty_in_kwh)
select p.generating_plant_key
      ,d.output_day_key
      ,m.output_minute_key
      ,new.output_in_kwh
  from ao_cdc_dw.PO_ACTIVITY_VIEW new
       inner join ao_cdc_dw.d_generating_plant p
               on new.plant_id = p.plant_id
       inner join ao_cdc_dw.d_output_day d
               on trunc(new.output_ts) = d.output_day
       inner join ao_cdc_dw.d_output_minute m
               on to_number(substr(to_char(new.output_ts, 'YYYYMMDD HH24:MI:SS'), 10, 2) ||
                            substr(to_char(new.output_ts, 'YYYYMMDD HH24:MI:SS'), 13, 2))
                  = m.output_time_24hr_nbr
24 Next Step
- Add the current day's activity (the contents of the current day fact table) to the historical fact table as a new partition (see the sketch below):
- Index and apply constraints to the current day fact table
- Add a new empty partition to the fact table
- Exchange the current day fact table with the partition
- Create the new current day fact table
25 Let's step through this live
26 Summary
- We created a real-time partition for current day activity
- We put CDC on the operational table and created a change table populated by an asynchronous process (which reads the redo log)
- We demonstrated a continuous feed to the DW using a trigger-based approach
- We demonstrated a batch DW feed using the CDC subscribe process
- We showed how to add the current day table to the fact table and set up the next day's table
- An electronic copy of the SQL used to build this prototype is available by emailing mike.schmitz_at_databaseperformance.com
27 Michael Brey, Principal Member Technical Staff, ST/NEDC Oracle Engineering, Oracle Corporation
28 Overview
- Benchmark Description
- System Description
- Database Parameters
- Performance Data
29 The Benchmark
- Customer OLTP benchmark run internally at Oracle
- Insurance application handling customer inquiries and quotes over the phone
- N users perform M quotes
- Quote = the actual work performed during a call with a customer
- Mixture of inserts, updates, deletes, singleton selects, cursor fetches, rollbacks/commits, and savepoints
- Compute the average time for all quotes across users
30 System Info
- SunFire 4800
- A standard symmetric multiprocessing (SMP) system
- 8 x 900-MHz CPUs
- 16 GB physical memory
- Solaris 5.8
- Database storage striped across 8 Sun StorEdge T3 arrays (9 x 36.4 GB drives each)
31 Database Parameters
- parallel_max_servers = 20
- streams_pool_size = 400M (defaults to 10% of the shared pool)
- shared_pool_size = 600M
- Buffer cache = 128M
- Redo buffers = 4M
- processes = 600
32 Change Data Capture (CDC)
                     Sync              Async HotLog      Async AutoLog
Available            Oracle 9i         Oracle 10g        Oracle 10g
Source system cost   System resources  System resources  Minimal
Part of txn          YES               NO                NO
Changes seen         Real time         Near real time    Variable
Systems              1                 1                 2
33 Tests
- Conducted tests with Asynchronous HotLog CDC enabled and disabled, and with Sync CDC
- Asynchronous HotLog CDC tests conducted at different log usage levels
- Approximately 10%, 50%, and 100% of all OLTP tables with DML operations were included in CDC
- Tests run with:
- 250 concurrent users
- Continuous peak workload after ramp-up
- 175 transactions per second
34 Impact on Transaction Time
35 CPU Consumption: Supplemental Logging
36 CPU Consumption: 10% DML Change Tracking
37 CPU Consumption: 50% DML Change Tracking
38 CPU Consumption: 10% and 100% DML Change Tracking
39 Latency of Change Tracking
- Latency is defined as the time between the actual change and its reflection in the change capture table
- Latency time = change record insert time - redo log insert time
- Latency measurements were made for the 100% Asynchronous HotLog CDC run
- 99.7% of records arrived in less than 2 seconds
- 53.5% of records arrived in less than 1 second
- The remaining records arrived in less than 3 seconds
- Asynchronous CDC kept up with the constant high OLTP workload at all times
40 Summary
- Change Data Capture enables enterprise-ready, near real-time capture of change data
- No falling behind, even in constant high-load OLTP environments
- Minimal impact on the originating OLTP transactions
- Predictable additional resource requirements, driven solely by the amount of change tracking
- Oracle provides the flexibility to meet your on-time business needs
41A
42 Next Steps: Data Warehousing DB Sessions
Monday
- 11:00 AM - 40153, Room 304 - Oracle Warehouse Builder: New Oracle Database 10g Release
- 3:30 PM - 40176, Room 303 - Security and the Data Warehouse
- 4:00 PM - 40166, Room 130 - Oracle Database 10g: SQL Model Clause
Tuesday
- 8:30 AM - 40125, Room 130 - Oracle Database 10g: A Spatial VLDB Case Study
- 3:30 PM - 40177, Room 303 - Building a Terabyte Data Warehouse Using Linux and RAC
- 5:00 PM - 40043, Room 104 - Data Pump in Oracle Database 10g: Foundation for Ultrahigh-Speed Data Movement
For More Info On Oracle BI/DW Go To http://otn.oracle.com/products/bi/db/dbbi.html
43 Next Steps: Data Warehousing DB Sessions
Thursday
- 8:30 AM - 40179, Room 304 - Oracle Database 10g: Data Warehouse Backup and Recovery
- 11:00 AM - 36782, Room 304 - Experiences with Real-Time Data Warehousing Using Oracle 10g
- 1:00 PM - 40150, Room 102 - Turbocharge Your Database Using the Oracle Database 10g SQLAccess Advisor
Business Intelligence and Data Warehousing Demos: All Four Days in the Oracle Demo Campground
- Oracle Database 10g, Oracle OLAP, Oracle Data Mining, Oracle Warehouse Builder, Oracle Application Server 10g
For More Info On Oracle BI/DW Go To http://otn.oracle.com/products/bi/db/dbbi.html
44 Reminder: please complete the OracleWorld online session survey. Thank you.