Experiences with Real-Time Data Warehousing Using Oracle Database 10G - PowerPoint PPT Presentation

1 / 44
About This Presentation
Title:

Experiences with Real-Time Data Warehousing Using Oracle Database 10G

Description:

Experiences with Real-Time Data Warehousing Using Oracle Database 10G Mike Schmitz High Performance Data Warehousing mike.schmitz_at_databaseperformance.com – PowerPoint PPT presentation

Number of Views:234
Avg rating:3.0/5.0
Slides: 45
Provided by: MichaelD134
Category:

less

Transcript and Presenter's Notes

Title: Experiences with Real-Time Data Warehousing Using Oracle Database 10G


1
Experiences with Real-Time Data Warehousing Using
Oracle Database 10G
  • Mike Schmitz
  • High Performance Data Warehousing
  • mike.schmitz_at_databaseperformance.com
  • Michael Brey
  • Principal Member Technical Staff
  • ST/NEDC Oracle Engineering
  • Oracle Corporation

2
Agenda
  • The meaning of Real-Time in Data Warehousing
  • Customer Business Scenario
  • Customer Environment
  • Real-Time Requirement
  • Our Real-Time Solution
  • Real-Time data architecture
  • Incremental Operational Source Change Capture
  • Transformation and Population into DW Target
  • Simplified Functional Demonstration
  • Asynchronous Change Data Capture (Oracle)
  • Performance Characteristics and Considerations

3
My Background
An independent data warehousing consultant
specializing in the dimensional approach to data
warehouse / data mart design and implementation
with in-depth experience utilizing efficient,
scalable techniques whether dealing with
large-scale data warehouses or small-scale,
platform constrained data mart implementations. I
deliver dimensional design and implementation as
well as ETL workshops in the U.S. and Europe. I
have helped implement data warehouses using
Redbrick, Oracle, Teradata, DB2, Informix, and
SQL Server on mainframe, UNIX, and NT platforms,
working with small and large businesses across a
variety of industries including such customers as
Hewlett Packard, American Express, General Mills,
ATT, Bell South, MCI, Oracle Slovakia, J.D.
Power and Associates, Mobil Oil, The Health
Alliance of Greater Cincinnati, and the French
Railroad SNCF.
4
Real-Time in Data Warehousing
  • Data Warehousing Systems are complex environments
  • Business rules
  • Various data process flows and dependencies
  • Almost never pure Real-Time
  • Some latency is a given
  • What do you need?
  • Real Time
  • Near Real-Time
  • Just in Time for the business

5
Customer Business Scenario
  • Client provides software solutions for utility
    companies
  • Utility companies have plants generating energy
    supply
  • Recommended maximum output capacity
  • Reserve Capacity
  • Buy supplemental energy as needed
  • Peak demand periods are somewhat predictable
  • Each day is pre-planned on historical behavior
  • Cheaper to buy energy ahead
  • Expensive to have unused capacity
  • Existing data warehouse supports the planning
    function
  • Reduced option expenses
  • Cut down of supplemental energy costs

6
Customer Real-Time Requirement
  • Getting more in-time accuracy enhances
    operational business
  • Compare today's plant output volumes to
    yesterdays or last weeks average
  • Know when to purchase additional options or
    supplies
  • Customer Target
  • Actual data within a 5 minute lag
  • Use a single query
  • Use a single tool

7
Sample Analysis Graph
8
Our Real-Time SolutionOverview
  • Three-Step Approach
  • Implement a real-time DW data architecture
  • Near real-time incremental change capture from
    operational system
  • Transformation and Propagation (population) of
    change data to DW

9
Our Real-Time SolutionReal-Time DW Data
Architecture
  • Add a Real-Time Partition to our Plant Output
    Fact Table for current day activity
  • Separate physical table
  • No indexes or RFI constraints (data coming in
    will have RFI enforced) during daily activity
  • UNION ALL viewed to the Plant Output Fact Table

10
Our Real-Time SolutionChange Capture and
Population
  • Incremental change capture from operational site
  • Synchronous or Asynchronous
  • Transformation and Propagation (population) of
    change data to the DW
  • Continuous trickle feed or periodic batch

Synch CDC
Staging
DW
Trigger
Operations
Asynch CDC
Batch
11
Our Real-Time SolutionIncremental Change Capture
  • Done with Oracles Change Data Capture (CDC)
    functionality
  • Synchronous CDC available with Oracle9i
  • Asynchronous CDC with Oracle10g
  • Asynchronous CDC is the preferred mechanism
  • Decoupling of change capture from the operational
    transaction

12
Asynchronous CDC
Oracle10g
Redo log files
  • SQL interface to change data
  • Publish/subscribe paradigm
  • Parallel access to log files, leveraging Oracle
    Streams
  • Parallel transformation of data

OLTP DB
13
Our Real-Time SolutionPopulation of Change Data
into DW
  • Continuous
  • Change table owner creates trigger to populate
    warehouse real-time partition
  • Periodic Batch
  • Utilize the Subscribe Interface
  • Subscribe to specific table and column changes
    through view
  • Sets a window and extracts the changes at
    required period
  • Purges view and moves window

14
Our Real-Time SolutionThe Daily Process
  • Integrate daily changes into historical fact
    table
  • At the end of the day
  • index the current day table and apply constraints
    (no validate)
  • Create new fact table partition
  • Exchange current day table with new partition
  • Create next days Real-Time Partition table

15
Simplified Functional DemoSchema Owners
  • AO_CDC_OP
  • Owns the operational schema
  • AO_CDC
  • Owns the CDC change sets and change tables (needs
    special cdc privileges)
  • ? CDC Publish Role
  • AO_CDC_DW
  • Owns the data warehouse schema (also needs
    special cdc privileges)
  • ? CDC Subscribe Role

16
Simplified Functional DemoOperational Schema
17
Simplified Functional DemoData Warehouse Schema
18
What do we have?
  • Operational transaction table
  • AO_CDC_OP.PLANT_OUTPUT
  • DW historical partitioned fact table
  • AO_CDC_DW.F_PLANT_OUTPUT
  • DW current day table (Real-Time Partition)
  • AO_CDC_DW.F_CURRENT_DAY_PLANT_OUTPUT
  • Data Warehouse UNION ALL view
  • AO_CDC_DW.V_PLANT_OUTPUT

19
First
  • The CDC user publishes
  • Create a Change Set (CDC_DW)
  • Add supplemental logging for the operational
    table
  • Create a change table for the operational table
    (CT_PLANT_OUTPUT)
  • Force database logging on the tablespace to catch
    any bulk insert / APPEND / (non-logged)
    activity

20
Next Transform and Populate
  • One of two ways
  • Continuous Feed
  • Logged Insert activity
  • Permits nearer real-time
  • Constant system load
  • Periodic Batch Feed
  • Permits non-logged bulk operations
  • You set the lag time how often do you run the
    batch process?
  • Hourly
  • Every five minutes
  • Less system load overall

21
The Continuous Feed
  • Put an insert trigger on the change table which
    joins to the dimension tables picking up the
    dimension keys and does any necessary
    transformations

22
The Batch Feed
  • The CDC schema owner
  • Authorizes AO_CDC_DW to select from the change
    table (the select will be accomplished via a
    generated view)
  • The DW schema owner
  • Subscribes to the change table and the columns he
    needs (with a centralized EDW approach this would
    usually be the whole change table) with a
    subscription and view name
  • Activates the subscription
  • Extract
  • Extend the window
  • Extracts changed data via the view (same code as
    trigger)
  • Purges the window (logical Delete physical
    deletion is handled by the CDC schema owner)

23
Extraction from Change Table View
insert / APPEND/ into ao_cdc_dw.F_CURRENT_DAY_P
LANT_OUTPUT (generating_plant_key,
output_day_key, output_minute_key,
output_actual_qty_in_kwh) select
p.generating_plant_key ,d.output_day_key
,m.output_minute_key ,new.output_in_kwh
from ao_cdc_dw.PO_ACTIVITY_VIEW new
inner join ao_cdc_dw.d_generating_plant p
on new.plant_id
p.plant_id inner join
ao_cdc_dw.d_output_day d
on trunc(new.output_ts) d.output_day
inner join ao_cdc_dw.d_output_minute m
on
to_number(substr(to_char(new.output_ts,'YYYYMMDD
HHIISS'),10,2)substr(to_char(new.output_ts,'YY
YYMMDD HHIISS'),13,2)) m.output_time_24hr_nbr

24
Next Step
  • Add the current days activity (the contents of
    the current day fact table) to the historical
    fact table as a new partition
  • Index and apply constraints to the current day
    fact table
  • Add a new empty partition to the fact table
  • Exchange the current day fact table with the
    partition
  • Create the new current day fact table

25
Lets step thru this live
26
Summary
  • We created a real-time partition for current day
    activity
  • We put CDC on the operational table and created a
    change table populated by an asynchronous process
    (reads redo log)
  • We demonstrated continuous feed to the DW by
    using a trigger based approach
  • We demonstrated a batch DW feed by using the CDC
    subscribe process
  • We showed how to add the current day table to the
    fact table and set up the next days table
  • An electronic copy of the SQL used to build this
    prototype is available by emailing
    mike.schmitz_at_databaseperormance.com

27
Michael BreyPrincipal Member Technical
Staff ST/NEDC Oracle Engineering Oracle
Corporation
28
Overview
  • Benchmark Description
  • System Description
  • Database Parameters
  • Performance Data

29
The Benchmark
  • Customer OLTP benchmark run internally at Oracle
  • Insurance application handling customer inquires
    and quotes over the phone
  • N users perform M quotes
  • Quote actual work performed during a call with
    a customer
  • Mixture of Inserts, Updates, Deletes, Singleton
    Selects, Cursor Fetches, Rollbacks/commits,
    savepoints
  • Compute average time for all quotes across users

30
System Info
  • SunFire 4800
  • A standard Shared Memory Processor (SMP)
  • 8 900-Mhz CPUs
  • 16 GB physical memory
  • Solaris 5.8
  • Database storage striped across 8 Sun StorEdge
    T3 arrays (9X36.4MB each)

31
Database Parameters
  • Parallel_max_servers 20
  • Streams_pool_size 400M (default 10 shared pool)
  • Shared_pool_size 600M
  • Buffer cache 128M
  • Redo buffers 4M
  • Processes 600

32
Change Data Capture (CDC)
Sync Async HotLog Async AutoLog
Available Oracle 9i Oracle 10g Oracle 10g
source system cost System resources System resources Minimal
Part of txn YES NO NO
Changes seen Real time Near real time Variable
Systems 1 1 2
33
Tests
  • Conducted tests with Asynchronous Hotlog CDC
    enabled and disabled and with Sync CDC.
  • Asynchronous Hotlog CDC tests conducted at
    different log usage levels
  • Appr. 10, 50, and 100 of all OLTP tables with
    DML operations were included in CDC
  • Tests run with
  • 250 concurrent users
  • Continuous peak workload after ramp-up
  • 175 transactions per second

34
Impact on Transaction Time
35
CPU ConsumptionSupplemental Logging
36
CPU Consumption10 DML Change tracking
37
CPU Consumption50 DML Change tracking
38
CPU Consumption10,100 DML Change tracking
39
Latency of Change Tracking
  • Latency is defined as the time between the actual
    change and its reflection in the Change Capture
    Table
  • Latency timechange record insert timeredo
    log insert
  • Latency measurement were made for the 100
    Asynchronous Hotlog CDC run
  • 99.7 of records arrived in less than 2 secs
  • 53.5 of records arrived in less than 1 sec
  • Remaining records arrived in less than 3 sec
  • Asynchronous CDC kept up with the constant high
    OLTP workload all the time

40
Summary
  • Change Data Capture enables enterprise-ready near
    real-time capturing of change data
  • No fallback for constant high-load OLTP
    environments
  • Minimal impact on origin OLTP transactions
  • Predictable additional resource requirements,
    solely driven by the amount of change tracking
  • Oracle provides the flexibility to meet your
    on-time business needs

41
A
42
Next Steps.Data Warehousing DB Sessions
Monday
Tuesday
  • 1100 AM
  • 40153, Room 304
  • Oracle Warehouse Builder
  • New Oracle Database 10g Release
  • 330 PM
  • 40176, Room 303
  • Security and the Data Warehouse
  • 400 PM
  • 40166, Room 130
  • Oracle Database 10g
  • SQL Model Clause

830 AM 40125, Room 130 Oracle Database 10g A
Spatial VLDB Case Study 330 PM 40177, Room
303 Building a Terabyte Data Warehouse, Using
Linux and RAC   500 PM 40043, Room 104 Data
Pump in Oracle Database 10g Foundation for
Ultrahigh-Speed Data Movement
For More Info On Oracle BI/DW Go To
http//otn.oracle.com/products/bi/db/dbbi.html
43
Next Steps.Data Warehousing DB Sessions
Thursday
Business Intelligence and Data Warehousing Demos
All Four Days In The Oracle Demo Campground
830 AM 40179, Room 304 Oracle Database 10g
Data Warehouse Backup and Recovery 1100
AM 36782, Room 304 Experiences with Real-Time
Data Warehousing using Oracle 10g
100PM 40150, Room 102 Turbocharge your
Database, Using the Oracle Database 10g SQLAccess
Advisor
Oracle Database 10g Oracle OLAP Oracle Data
Mining Oracle Warehouse Builder Oracle
Application Server 10
For More Info On Oracle BI/DW Go To
http//otn.oracle.com/products/bi/db/dbbi.html
44
Reminder please complete the OracleWorld
online session surveyThank you.
Write a Comment
User Comments (0)
About PowerShow.com