Transcript and Presenter's Notes
1
(No Transcript)
2
How cool is that? Capitalizing on Oracle 9i for
data warehousing
Session id: 36482
  • Mike Ames, Oracle DBA
  • SAS Institute

3
Topics
  • Performance, Parallelism, and Design
  • Capitalizing on Oracle 9i for DW
  • External Tables
  • Merge / Multiple Inserts
  • Partitioning
  • Materialized View Enhancements
  • Bitmap Join Indexes
  • SQL for Analysis
  • Conclusions / Questions

4
Performance, Parallelism, and Design: serial operation
  • At 20MB/sec it takes 14 ½ hours to read 1TB
    serially
  • With perfect parallelism and 16 parallel
    processes it takes about 54 minutes (assuming
    sustained throughput)

5
Performance, Parallelism, and Design
  • Amdahl's Law in a nutshell
  • Speedup (S): efficiency gained by executing a
    process in parallel
  • Formula for Speedup:
  • S = 1 / ((% sequential) + (% parallel /
    # processors) + overhead)
  • S = Speedup
  • N = Number of Processors
  • B = % of the process or algorithm that is serial
  • S = 1 / (B + ((1-B)/N) + O)
  • Example: 8 processors, 5% serial operations

6
Performance, Parallelism, and Design
  • Why is this important?
  • Dependencies created by design and complexity
    reduce our ability to parallelize data
    warehouse operations.
  • S = 1 / (.05 + (.95/8) + 0) = 5.9
  • Assuming perfect parallelism, a query that takes
    30 minutes to execute serially would take just
    over 5 minutes in parallel.
  • Keys to parallel performance:
  • Minimize dependencies
  • Minimize overhead associated with complexity

7
Performance, Parallelism, and Design
  • Two things to note:
  • Incremental speedup from doubling the processors
    is dependent on the % sequential
  • 5% sequential with 4 processors > 20% sequential
    with 8 processors: S = 1/(.05 + .95/4) ≈ 3.5
    versus S = 1/(.20 + .80/8) ≈ 3.3

8
Performance, Parallelism, and Design: realistic expectations
9
Performance, Parallelism, and Design: measuring performance
  • Database performance is generally measured by:
  • Load Performance
  • Ability to parallelize
  • Order / number of operations and the complexity
    needed to maintain integrity
  • Ability to leverage RDBMS load facilities
  • Query Performance
  • Ability to parallelize
  • Ability to leverage partitioning
  • Ability to exploit query re-write / summary data
  • Number of sort and join operations
  • Usability
  • Ability of users to capitalize on data
  • Level of complexity

10
Performance, Parallelism, and Design: critical factor
  • Database design is critical for performance
  • Ability to parallelize is constrained by
    dependencies
  • Data and referential integrity
  • Order of operations
  • Ability to leverage RDBMS features is constrained
    by design
  • Loading
  • Partitioning
  • Indexing
  • Query re-write
  • Materialized Views
  • End user satisfaction is constrained by
    complexity
  • Ease of use
  • Data quality

11
Performance, Parallelism, and Design
  • The bottom line: design dictates performance
  • Good Design
  • Maximizes parallelism by minimizing dependencies
  • Minimizes complexity
  • Capitalizes on features of the RDBMS (Oracle 9i)

12
Capitalizing on Oracle 9i
  • External Tables
  • Merge / Multiple Inserts
  • Partitioning
  • Materialized View Enhancements
  • Bitmap Join Indexes
  • SQL for Analysis

13
Capitalizing on External Tables
  • External Tables
  • Enable you to reference multiple flat files as if
    they were a table on your database.
  • Restrictions
  • Read only: no DML (INSERT, UPDATE, DELETE)
  • Can't be used for partition exchange

14
Capitalizing on External Tables
  • Loading from flat files
  • Old method:
  • Create a stage table on your warehouse
  • Use SQL*Loader to bulk load the table
  • Read from the stage table, performing operations to
    put data into final format.
  • New method:
  • Create a table that references the external file
  • Read from the external table, performing
    operations to put data into final format.
  • Significantly reduces the number of times data
    has to be moved around and virtually eliminates
    the need to use SQL*Loader directly.

15
Capitalizing on External Tables
External Table INSERT /*+ APPEND */ Example
(Diagram: source extract file(s) loaded into the target table)
  • Steps
  • FTP 02Nov2003_Sales extract
  • Alter table, adding the file(s) to the LOCATION clause
  • Perform INSERT /*+ APPEND */ INTO the target

16
Capitalizing on External Tables
CREATE TABLE NOV_SALES_EXTERNAL(
  PROD_ID       NUMBER(6),
  CUST_ID       NUMBER,
  TIME_ID       DATE,
  CHANNEL_ID    CHAR(1),
  PROMO_ID      NUMBER(6),
  QUANTITY_SOLD NUMBER(3),
  AMOUNT_SOLD   NUMBER(10,2),
  UNIT_COST     NUMBER(10,2),
  UNIT_PRICE    NUMBER(10,2))
ORGANIZATION external (
  TYPE oracle_loader
  DEFAULT DIRECTORY extracts
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    CHARACTERSET US7ASCII
    BADFILE log_file_dir:'NOV_SALES_EXTERNAL.bad'
    LOGFILE log_file_dir:'NOV_SALES_EXTERNAL.log'
    FIELDS TERMINATED BY '|' LDRTRIM )
  location ( '01Nov2003Sales.dat' ))
REJECT LIMIT UNLIMITED
Step 0. Create a target table and an external
extract table. Step 1. FTP Nov 2nd and 3rd extracts.
17
Capitalizing on External Tables
  • Step 2. Alter the external table, adding the new
    files to the location.
  • ALTER TABLE NOV_SALES_EXTERNAL
    LOCATION ('01Nov2003Sales.dat',
    '02Nov2003Sales.dat', '03Nov2003Sales.dat')
  • Step 3. Insert into the target table from the
    external table
  • INSERT /*+ APPEND */ INTO SALES_FACT
    SELECT * FROM NOV_SALES_EXTERNAL

18
Capitalizing on External Tables
(Diagram: sales extract external table loaded into the sales table)
INSERT /*+ APPEND */ INTO SALES SELECT * FROM
SALES_EXTRACT_EXTERNAL
19
Capitalizing on External Tables
  • Fixed-length fields are processed faster than
    delimited fields or records terminated by a
    string.
  • Single-character delimiters for record
    terminators and field delimiters are faster to
    process than multi-character delimiters.
  • Using INSERT /*+ APPEND */ with either key lookup
    functions or joins (for surrogate key lookup) is
    a great method to load fact tables (see the
    sketch below).
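A minimal sketch of the surrogate key lookup pattern, reusing the
NOV_SALES_EXTERNAL and CUSTOMER_DIM tables defined elsewhere in this
presentation; the assumption that SALES_FACT carries CUST_KEY (rather
than the natural CUST_ID) is illustrative.

-- Direct-path load of the fact table, swapping the natural key for
-- the surrogate key via a join to the dimension
INSERT /*+ APPEND */ INTO SALES_FACT
  (CUST_KEY, PROD_ID, TIME_ID, QUANTITY_SOLD, AMOUNT_SOLD)
SELECT d.CUST_KEY,       -- surrogate key from the dimension
       x.PROD_ID,
       x.TIME_ID,
       x.QUANTITY_SOLD,
       x.AMOUNT_SOLD
FROM   NOV_SALES_EXTERNAL x,
       CUSTOMER_DIM       d
WHERE  x.CUST_ID = d.CUST_ID   -- natural (production) key lookup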

20
Capitalizing on merge
  • Merge
  • Enables you to perform updates to matched records
    and inserts of new records.
  • Leverages parallelism
  • Is a slick way of performing slowly changing
    dimension logic.

21
Capitalizing on merge
(Diagram: customer extract external table merged into the existing
customer table; matched rows are updated, new rows are inserted)
MERGE INTO CUSTOMERS USING CUST_EXT x
  ON (CUSTOMERS.cust_id = x.cust_id)
WHEN MATCHED THEN UPDATE SET city = x.city
WHEN NOT MATCHED THEN INSERT (cust_id) VALUES (x.cust_id)
22
Capitalizing on merge
Typical type 1 slowly changing dimension (SCD)
logic
(Diagram: source extract is compared to the master dimension and
cross reference; new and changed records get a generated key and
are inserted, existing records are updated)
23
Capitalizing on merge for type 1 SCD
  • Capitalize on MERGE for Type 1 SCDs
  • Steps
  • Create your dimension table
  • Create an external table as your extract table
    that contains all of the columns in your
    dimension except the surrogate key
  • Create an Oracle sequence; this will be used for
    the surrogate key of your dimension
  • Use MERGE to populate your dimension

24
Capitalizing on merge for type 1 SCD
  • Create our Dimension (Target) table

Note: cust_key is the surrogate key;
cust_id is the natural key or production key
CREATE TABLE CUSTOMER_DIM (
  CUST_KEY  NUMBER       NOT NULL,
  CUST_ID   NUMBER       NOT NULL,
  CUST_NAME VARCHAR2(20) NOT NULL,
  ZIP       CHAR(5)      NOT NULL,
  CITY      VARCHAR2(30) NOT NULL,
  STATE     VARCHAR2(40) NULL,
  CONSTRAINT CUSTOMER_PK PRIMARY KEY (CUST_KEY) )
25
Capitalizing on merge for type 1 SCD
  • Create our extract or staging table

2a. Running a new extract is simply a matter of
referencing a new file: ALTER TABLE
customer_extract LOCATION ('CIF_NOV_2003.dat')
CREATE TABLE CUSTOMER_EXTRACT (
  CUST_KEY  NUMBER,
  CUST_ID   NUMBER,
  CUST_NAME VARCHAR2(20),
  ZIP       CHAR(5),
  CITY      VARCHAR2(30),
  STATE     VARCHAR2(40))
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY extracts_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    CHARACTERSET US7ASCII
    BADFILE log_file_dir:'customer_extract.bad'
    LOGFILE log_file_dir:'customer_extract.log'
    FIELDS TERMINATED BY '|' LDRTRIM )
  location ( 'cust_extract.dat' ))
REJECT LIMIT UNLIMITED
PARALLEL
26
Capitalizing on merge for type 1 SCD
  • Create our sequence; this will be used for the
    surrogate key

CREATE SEQUENCE CUST_SEQ START WITH 1000
INCREMENT BY 1000
27
Capitalizing on merge for type 1 SCD
  • Use a single MERGE statement to perform our type
    1 SCD logic

(Diagram: customer extract external table merged into the existing
customer dimension; matched rows are updated, new rows are inserted
with a new surrogate key)
MERGE INTO CUSTOMER_DIM c
USING CUSTOMER_EXTRACT x
ON (c.cust_id = x.cust_id)
WHEN MATCHED THEN UPDATE SET
  CUST_FIRST_NAME = x.CUST_FIRST_NAME,
  CUST_LAST_NAME  = x.CUST_LAST_NAME,
  ...
  CUST_EMAIL      = x.CUST_EMAIL
WHEN NOT MATCHED THEN INSERT
  ( CUST_KEY, CUST_ID, ... )
VALUES ( CUST_SEQ.NEXTVAL, x.CUST_ID, ..., x.CUST_EMAIL )
28
Capitalizing on merge for type 2 SCD
  • Type 2 Slowly Changing Dimension (SCD)
  • Type 2 SCD: a technique where a new dimension
    record is created with a new surrogate key to
    reflect each change
  • We can do this quite simply with a single merge
    statement: simply add the change columns to the
    ON () portion of the merge.

MERGE INTO CUSTOMER_DIM c
USING CUSTOMER_EXTRACT x
ON (c.cust_id = x.cust_id and c.city = x.city and
    c.state = x.state and c.zip = x.zip)      -- change columns
WHEN MATCHED THEN UPDATE SET
  CUST_NAME = x.CUST_NAME
WHEN NOT MATCHED THEN INSERT
  ( CUST_KEY, CUST_ID, CUST_NAME,
    CITY, STATE, ZIP, CREATE_DATE )           -- change pointer
VALUES ( CUST_SEQ.NEXTVAL,                    -- surrogate key
         x.CUST_ID, x.CUST_NAME,
         x.CITY, x.STATE, x.ZIP,
         TRUNC(SYSDATE) )                     -- create_date
29
Capitalizing on merge for type 2 SCD
  • A more common approach to Type 2 Slowly Changing
    Dimension logic is the addition of change
    pointers for reference data. Unfortunately, this
    requires a multi-step process.
  • This is generally performed with a series of
    insert and update statements or procedural
    logic, but can be accomplished with two merge
    statements as well:
  • One to insert new records and update existing
    (close out)
  • One to insert new changed records

(Diagram: dimension table with a surrogate key and change pointer
columns: start_date, end_date, current_flag)
30
Capitalizing on merge for type 2 SCD
/* 1. First Merge: close out existing, insert new */
MERGE INTO CUST_DIM c
USING (SELECT cust_id, cust_name, city, state, zip
       FROM   CUST_EXTRACT
       MINUS
       SELECT cust_id, cust_name, city, state, zip
       FROM   CUST_DIM
       WHERE  CURRENT_FLAG = 'Y') x
ON (c.CUST_ID = x.CUST_ID AND
    c.END_DATE = TO_DATE('01-JAN-2099','DD-MON-YYYY'))
WHEN MATCHED THEN UPDATE SET c.current_flag = 'N'
WHEN NOT MATCHED THEN INSERT
  (CUST_KEY, CUST_ID, CUST_NAME, CITY, STATE, ZIP,
   START_DATE, END_DATE, CURRENT_FLAG)
VALUES (CUST_SEQ.NEXTVAL, x.CUST_ID, x.CUST_NAME, x.CITY,
        x.STATE, x.ZIP, trunc(SYSDATE),
        TO_DATE('01-JAN-2099','DD-MON-YYYY'), 'Y');
COMMIT;

/* 2. Second Merge: insert new changed record */
MERGE INTO CUST_DIM c
USING (SELECT cust_id, cust_name, city, state, zip
       FROM   CUST_EXTRACT
       MINUS
       SELECT cust_id, cust_name, city, state, zip
       FROM   CUST_DIM) x
ON (c.CUST_ID = x.CUST_ID AND c.CURRENT_FLAG = 'Y')
WHEN MATCHED THEN UPDATE SET c.END_DATE = trunc(SYSDATE - 1)
WHEN NOT MATCHED THEN INSERT
  (CUST_KEY, CUST_ID, CUST_NAME, CITY, STATE, ZIP,
   START_DATE, END_DATE, CURRENT_FLAG)
VALUES (CUST_SEQ.NEXTVAL, x.CUST_ID, x.CUST_NAME, x.CITY,
        x.STATE, x.ZIP, trunc(SYSDATE),
        TO_DATE('01-JAN-2099','DD-MON-YYYY'), 'Y');
COMMIT;

/* 3. Final step: date closeout */
UPDATE CUST_DIM c
SET    c.END_DATE = trunc(SYSDATE - 1)
WHERE  c.CURRENT_FLAG = 'N'
AND    c.END_DATE = TO_DATE('01-JAN-2099','DD-MON-YYYY');
COMMIT;
31
Capitalizing on merge for type 2 SCD
(Diagram: customer extract external table, the existing customer
dimension, and the customer dimension after the type 2 merges)
Take note of how a single table design decision
has impacted our ability to parallelize. Instead
of a single merge statement we now have a
three-step process.
32
Capitalizing on multiple inserts
  • Multiple Inserts
  • Enable us to conditionally insert into multiple
    tables in parallel.
  • ALL: every row of the result set is evaluated
    against all conditions
  • FIRST: each row is inserted only for the first
    condition it satisfies
  • Leverages parallelism
  • Is a slick way to segment data and load fact
    tables.

33
Capitalizing on multiple inserts
  • Multiple Inserts (ALL)

INSERT ALL
  WHEN state = 'GA' OR state = 'FL' THEN
    INTO GA_SALES VALUES (prod_id, cust_id, sale_date,
                          sale_amount, qty_sold)
  WHEN state = 'FL' THEN
    INTO FL_SALES VALUES (prod_id, cust_id, sale_date,
                          sale_amount, qty_sold)
  ELSE
    INTO ALL_OTHER_SALES VALUES (prod_id, cust_id, sale_date,
                                 sale_amount, qty_sold)
SELECT prod_id, cust_id, sale_date, sale_amount, qty_sold
FROM   sales_extract
(Diagram: query result rows routed into GA_SALES, FL_SALES,
and ALL_OTHER_SALES)
34
Capitalizing on multiple inserts
  • Multiple Inserts (FIRST)

INSERT FIRST
  WHEN state = 'GA' OR state = 'FL' THEN
    INTO GA_SALES VALUES (prod_id, cust_id, sale_date,
                          sale_amount, qty_sold)
  WHEN state = 'FL' THEN
    INTO FL_SALES VALUES (prod_id, cust_id, sale_date,
                          sale_amount, qty_sold)
  ELSE
    INTO ALL_OTHER_SALES VALUES (prod_id, cust_id, sale_date,
                                 sale_amount, qty_sold)
SELECT prod_id, cust_id, sale_date, sale_amount, qty_sold
FROM   sales_extract
(Diagram: each row routed into only the first matching table:
GA_SALES, FL_SALES, or ALL_OTHER_SALES)
35
Capitalizing on - Partitioning
  • Types of Partitioning
  • Range - maps rows to partitions based on ranges
    of column values
  • List (New) - enables you to explicitly control
    how rows map to partitions
  • Hash - evenly distributes rows among partitions
  • Composite
  • Range-Hash - benefits of range partitioning, then
    further hash distribution into sub-partitions
    (partition pruning plus parallel processing)
  • Range-List (New for 9i) - benefits of range
    partitioning with further discrete
    sub-partitioning

36
Capitalizing on - Partitioning
  • Why is partitioning important?
  • Partitioning enables you to split large volumes
    of data into smaller, separate buckets that can
    be managed independently
  • Partition Pruning / Elimination
  • Partition-wise Joins
  • Parallel DML (see the sketch below)
  • Partition Exchanging / Swapping
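Parallel DML is not shown elsewhere in this deck, so here is a
minimal sketch of enabling it for a direct-path fact load; the
degree of parallelism (8) and the table names are illustrative.

-- Parallel DML must be enabled explicitly for the session
ALTER SESSION ENABLE PARALLEL DML;

-- Direct-path, parallel insert into the partitioned fact table
INSERT /*+ APPEND PARALLEL(SALES_FACT, 8) */ INTO SALES_FACT
SELECT /*+ PARALLEL(x, 8) */ *
FROM   NOV_SALES_EXTERNAL x;

COMMIT;  -- direct-path / parallel DML requires a commit before the
         -- table can be queried again in the same session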

37
Capitalizing on Oracle 9i partitioning
  • Why Partition?
  • Partition pruning
  • Ability to eliminate partitions that don't
    satisfy query conditions

(Diagram: monthly partitions Jan 2003, Feb 2003, Mar 2003;
only the Feb 2003 partition is scanned)
SELECT sum(qty_sold)
FROM   sales
WHERE  sale_date BETWEEN TO_DATE('01-FEB-2003','DD-MON-YYYY')
                     AND TO_DATE('15-FEB-2003','DD-MON-YYYY')
38
Capitalizing on Oracle 9i partitioning
  • Why Partition?
  • Partition-wise joins
  • Full - equi-partitioned on the join keys, i.e. the
    two tables are both partitioned on the same key.
    Hash-hash is the easiest example.
  • Partial - Oracle dynamically repartitions based
    on the reference table.

Here accounts and transactions are both hash
partitioned by account_id into 32 partitions.
Note: to achieve equal work distribution, the
number of partitions should always be a multiple
of the degree of parallelism. Ex.: here we hashed
accounts and transactions into 32 partitions with
a degree of parallelism of 8.
39
Capitalizing on Oracle 9i partitioning
(Diagram: partition-wise join between an accounts table (ACCT_ID,
NAME) and a transactions table (ACCT_ID, PRICE, QTY))
40
Capitalizing on - Partitioning
Partition Exchanging
(Diagram: stock extracts swapped into partition PQ3 of
stock_sales; partitions PQ1, PQ2, PQ3)
ALTER TABLE stock_sales EXCHANGE PARTITION PQ3
WITH TABLE stock_extracts
Note: does not work with external tables (see the sketch below)
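Because an external table cannot be exchanged directly, a common
workaround is to first materialize the extract into a regular
staging table and exchange that. A minimal sketch, assuming an
illustrative STOCK_EXTRACT_EXTERNAL external table and optional
exchange clauses:

-- Materialize the external extract into a regular heap table
CREATE TABLE stock_extracts NOLOGGING AS
SELECT * FROM stock_extract_external;

-- Swap the staging table into the target partition
ALTER TABLE stock_sales EXCHANGE PARTITION pq3
  WITH TABLE stock_extracts
  INCLUDING INDEXES
  WITHOUT VALIDATION;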
41
Capitalizing on - Partitioning
Range Example
CREATE TABLE account_balance_range(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY RANGE (snapshot_date)(
  PARTITION Q1_ACCT_BAL VALUES LESS THAN
    (TO_DATE('01-APR-2000','DD-MON-YYYY')),
  PARTITION Q2_ACCT_BAL VALUES LESS THAN
    (TO_DATE('01-JUL-2000','DD-MON-YYYY')),
  PARTITION Q3_ACCT_BAL VALUES LESS THAN
    (TO_DATE('01-OCT-2000','DD-MON-YYYY')),
  PARTITION Q4_ACCT_BAL VALUES LESS THAN
    (TO_DATE('01-JAN-2001','DD-MON-YYYY')))
42
Slick New Feature - Partitioning
List Example
CREATE TABLE account_balance_list(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY LIST (state_key)(
  PARTITION northwest VALUES ('OR', 'WA'),
  PARTITION southwest VALUES ('AZ', 'UT', 'NM'),
  PARTITION southeast VALUES ('FL','GA','SC','AL','TN','NC'),
  PARTITION rest      VALUES (DEFAULT))
43
Capitalizing on - Partitioning
Hash Example
CREATE TABLE account_balance_hash (
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY HASH (account_key)
  PARTITIONS 16 STORE IN (TS1_DATA, TS2_DATA, TS3_DATA, TS4_DATA)
44
Capitalizing on - Partitioning
Composite Range-Hash Example
CREATE TABLE account_bal_range_hash(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY RANGE (snapshot_date)
SUBPARTITION BY HASH (account_key)
SUBPARTITIONS 8
( PARTITION Q1_ACCT_BAL
    VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
  PARTITION Q2_ACCT_BAL
    VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')),
  PARTITION Q3_ACCT_BAL
  ...
45
Capitalizing on - Partitioning
Composite Range-List Example
CREATE TABLE account_bal_range_list(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY RANGE (snapshot_date)
SUBPARTITION BY LIST (state_key)
SUBPARTITION TEMPLATE(
  SUBPARTITION northwest VALUES ('OR', 'WA'),
  SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM'),
  SUBPARTITION southeast VALUES ('FL','GA','SC','AL','TN','NC'),
  SUBPARTITION rest      VALUES (DEFAULT))
( PARTITION q1_2002 VALUES LESS THAN
    (TO_DATE('01-APR-2002','DD-MON-YYYY')),
  ...
46
Capitalizing on - Partitioning
External Table to Partition using INSERT /*+ APPEND */
(Diagram: source extract file(s) loaded into the target
partitioned table)
  • FTP 02Nov2003_Sales extract
  • Alter table, adding the file(s) to the LOCATION clause
  • Perform INSERT /*+ APPEND */ INTO the partitioned table

47
Capitalizing on Partitioning
(Diagram: sales extract external table inserted into partition
PQ3 of the sales table; partitions PQ1, PQ2, PQ3)
INSERT /*+ APPEND */ INTO SALES PARTITION (PQ3)
SELECT * FROM SALES_EXTRACT_EXTERNAL
48
Capitalizing on Partitioning
  • What to partition
  • Fact Tables
  • Generally Range-Hash composite
  • Range on some date (partition elimination)
  • Hash on the driving dimension key (partition-wise
    join)
  • Dimensions
  • Hashing on the primary key of dimension tables
    facilitates full and partial partition-wise
    joins.
  • For a full partition-wise join between a fact
    and a dimension table you need to hash partition
    on the same key into the same number of buckets
    (see the sketch below).
  • Materialized Views
  • Generally Range-Hash composite
  • Generally mirror the fact table's partition scheme
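A minimal sketch of hash partitioning a dimension on its primary
key so it equi-partitions with a fact table hashed on the same
key; the table name, columns, and the 16-bucket count are
illustrative choices.

-- Dimension hash partitioned on its primary key, 16 buckets
CREATE TABLE customer_dim_hash (
  cust_key  NUMBER       NOT NULL,
  cust_id   NUMBER       NOT NULL,
  cust_name VARCHAR2(20) NOT NULL,
  CONSTRAINT customer_dim_hash_pk PRIMARY KEY (cust_key))
PARTITION BY HASH (cust_key) PARTITIONS 16;

-- A fact table hashed on the same key into the same number of
-- buckets allows a full partition-wise join on cust_key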

49
Capitalizing on Materialized Views
  • Materialized Views
  • Enable queries to be re-written to take advantage
    of pre-calculated summaries, thus reducing or
    eliminating sorts and joins.
  • Materialized Views can dramatically increase
    performance of queries when applied judiciously:
  • Reduce the number of sorts
  • Reduce the number of joins
  • Pre-filter data
  • Can be indexed and partitioned
  • Seamless to end users
  • Enhancements in 9i include:
  • Removed restrictions, enabling them to be
    leveraged in more situations
  • Fast refresh is now possible on materialized
    views containing the UNION ALL operator.

50
Capitalizing on Materialized Views
  • Materialized Views, cont.

SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT,
       SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM   oradata.CUSTOMERS C, oradata.SALES S, oradata.TIMES T
WHERE  C.CUST_ID = S.CUST_ID
AND    T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY
The query is re-written to be resolved from the MV
instead of from the base tables.
51
Capitalizing on Materialized Views
  • How to capitalize on MVs
  • Identify candidate queries
  • Analysis of common queries based on design
  • Oracle's Summary Advisor Wizard
  • DBMS_OLAP
  • Create MVs based on analysis and refresh
    requirements
  • Test
  • Benchmark
  • Measure utilization (see the sketch below)
  • Repeat
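One simple way to confirm that a candidate query is actually being
rewritten against the MV is to check its plan. A minimal sketch
using EXPLAIN PLAN; the statement id is illustrative.

-- Make sure rewrite is enabled for the session
ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;

EXPLAIN PLAN SET STATEMENT_ID = 'mv_check' FOR
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT, SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM   CUSTOMERS C, SALES S, TIMES T
WHERE  C.CUST_ID = S.CUST_ID
AND    T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;

-- If rewrite occurred, the plan accesses the MV, not the base tables
SELECT operation, options, object_name
FROM   plan_table
WHERE  statement_id = 'mv_check';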

52
Capitalizing on Materialized Views
proc sql;
  /* CTAS Implicit Pass-Through */
  CREATE TABLE work.quartly_city_canidate AS
  SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
         COUNT(*) AS SALE_COUNT,
         SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
  FROM   oradata.CUSTOMERS C, oradata.SALES S, oradata.TIMES T
  WHERE  C.CUST_ID = S.CUST_ID
  AND    T.TIME_ID = S.TIME_ID
  GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;
quit;

real time: 9.62 seconds

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE
       (Cost=5392 Card=7453 Bytes=350291)
1  0   SORT (GROUP BY) (Cost=5392 Card=7453 Bytes=350291)
2  1     HASH JOIN (Cost=1077 Card=1016271 Bytes=47764737)
3  2       TABLE ACCESS (FULL) OF 'TIMES'
             (Cost=6 Card=1461 Bytes=23376)
4  2       HASH JOIN (Cost=1043 Card=1016271 Bytes=31504401)
5  4         TABLE ACCESS (FULL) OF 'CUSTOMERS'
               (Cost=106 Card=50000 Bytes=700000)
6  4         PARTITION RANGE (ALL)
7  6           TABLE ACCESS (FULL) OF 'SALES'
                 (Cost=469 Card=1016271 Bytes=17276607)

Statistics
----------------------------------------------------------
5855  consistent gets
   1  sorts (memory)
   0  sorts (disk)
5075  rows processed

Note the number of joins and sorts, the amount of
memory used, and the number of full table scans.
53
Capitalizing on Materialized Views
proc sql;
  connect to ORACLE as ORACON
    (user=sh password=sh1 path='demo.na.sas.com');
  /* Create a Materialized View */
  execute (
    CREATE MATERIALIZED VIEW QTRLY_CITY_SALES_MV
    COMPRESS
    BUILD IMMEDIATE
    REFRESH ON COMMIT
    ENABLE QUERY REWRITE
    AS
    SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
           COUNT(*) SALE_COUNT,
           SUM(S.AMOUNT_SOLD) SALE_DOLLARS
    FROM   CUSTOMERS C, SALES S, TIMES T
    WHERE  C.CUST_ID = S.CUST_ID
    AND    T.TIME_ID = S.TIME_ID
    GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY
  ) by ORACON;
  disconnect from ORACON;
quit;

Build Immediate: create the MV now.
Refresh on Commit: keep the MV current when inserts, updates,
and deletes occur.
Enable Query Rewrite: enables dynamic query re-direction.
Compress: compresses redundant data, i.e. makes the MV smaller.

Question: did it improve the performance of our query?
54
Capitalizing on Materialized Views
proc sql;
  /* CTAS Implicit Pass-Through */
  CREATE TABLE work.quartly_city_canidate AS
  SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
         COUNT(*) AS SALE_COUNT,
         SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
  FROM   oradata.CUSTOMERS C, oradata.SALES S, oradata.TIMES T
  WHERE  C.CUST_ID = S.CUST_ID
  AND    T.TIME_ID = S.TIME_ID
  GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;
quit;

Real time: 0.76 seconds

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE
       (Cost=4 Card=2206 Bytes=114712)
1  0   TABLE ACCESS (FULL) OF 'QTRLY_CITY_SALES_MV'
         (Cost=4 Card=2206 Bytes=114712)

Statistics
----------------------------------------------------------
     7  recursive calls
     0  db block gets
   366  consistent gets
     0  physical reads
     0  redo size
137260  bytes sent via SQL*Net to client
  2641  bytes received via SQL*Net from client
   340  SQL*Net roundtrips to/from client
     0  sorts (memory)
     0  sorts (disk)
  5075  rows processed

No joins, no sorts, less memory.
55
Capitalizing on alternate aggregate strategies
  • Aggregate building with parallel CTAS (pCTAS)

CREATE TABLE quartly_city_passthrough
PARALLEL NOLOGGING AS
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT,
       SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM   CUSTOMERS C, SALES S, TIMES T
WHERE  C.CUST_ID = S.CUST_ID
AND    T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY
56
Capitalizing on alternate aggregate strategies
  • Aggregate building with pIIAS (parallel
    INSERT /*+ APPEND */ ... SELECT)

INSERT /*+ APPEND */ INTO quartly_city_iias
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT,
       SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM   CUSTOMERS C, SALES S, TIMES T
WHERE  C.CUST_ID = S.CUST_ID
AND    T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY
57
Capitalizing on bitmap join index
  • Bitmap Join Index
  • Creates a bitmap index for the resolution of
    joins of two or more tables. Works similarly to
    a materialized view.
  • A bitmap join index is space efficient because it
    compresses the rowids, where a materialized view
    does not.
  • Can be leveraged to improve performance of
    snowflake schemas and common join operations
    across facts.

58
Capitalizing on bitmap join index
SELECT countries.country_name, sum(sales.amount_sold)
FROM   sales, customers, countries
WHERE  sales.cust_id = customers.cust_id
AND    customers.country_id = countries.country_id
GROUP BY countries.country_name

CREATE BITMAP INDEX bjx_sales_country
ON sales(countries.country_name)
FROM sales, customers, countries
WHERE sales.cust_id = customers.cust_id
AND   countries.country_id = customers.country_id
LOCAL PARALLEL NOLOGGING COMPUTE STATISTICS

Can be leveraged to improve performance of
snowflake query problems.
59
Capitalizing on SQL for Analysis
  • SQL for Analysis
  • Especially useful for reporting and preparing
    data sets for statistical analysis
  • Rankings and percentiles
  • cumulative distributions, percent rank, and
    N-tiles.
  • Moving window calculations
  • allow you to find moving and cumulative
    aggregations, such as sums and averages.
  • Lag/lead analysis
  • enables direct inter-row references so you can
    calculate period-to-period changes.
  • First/last analysis
  • first or last value in an ordered group.

60
Capitalizing on SQL for Analysis
  • RANK
  • RANK () OVER ( query_partition_clause
    order_by_clause )
  • DENSE_RANK () OVER ( query_partition_clause
    order_by_clause )

SELECT country_id,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') Sales_Total,
       RANK() OVER (ORDER BY SUM(amount_sold) DESC NULLS LAST)
         AS sales_leader
FROM   sales, products, customers, times, channels
WHERE  sales.prod_id = products.prod_id
AND    sales.cust_id = customers.cust_id
AND    sales.time_id = times.time_id
AND    sales.channel_id = channels.channel_id
AND    times.calendar_month_desc IN ('2000-09', '2000-10')
GROUP BY country_id

CO SALES_TOTAL    SALES_LEADER
-- -------------- ------------
US     13,333,510            1
NL      7,174,053            2
UK      6,421,240            3
DE      6,346,440            4
FR      4,404,921            5
ES      1,699,209            6
IE      1,549,407            7
IN        732,502            8
AU        632,475            9
BR        606,281           10
61
Capitalizing on SQL for Analysis
  • RANK for Top N

SELECT *
FROM  (SELECT country_id,
              TO_CHAR(SUM(amount_sold), '9,999,999,999')
                Sales_Total,
              RANK() OVER (ORDER BY SUM(amount_sold)
                           DESC NULLS LAST) AS country_rank
       FROM   sales, products, customers, times, channels
       WHERE  sales.prod_id = products.prod_id
       AND    sales.cust_id = customers.cust_id
       AND    sales.time_id = times.time_id
       AND    sales.channel_id = channels.channel_id
       AND    times.calendar_month_desc IN ('2000-09', '2000-10')
       GROUP BY country_id )  /* inline view */
WHERE country_rank <= 10

CO SEPT_TOTAL     COUNTRY_RANK
-- -------------- ------------
US      6,517,786            1
NL      3,447,121            2
UK      3,207,243            3
DE      3,194,765            4
FR      2,125,572            5
ES        777,453            6
IE        770,758            7
IN        371,198            8
BR        317,001            9
AU        302,393           10
62
Capitalizing on SQL for Analysis
  • Moving window example: running total

SELECT c.country_id AS CO,
       t.calendar_quarter_desc AS QUARTER,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') AS Q_SALES,
       TO_CHAR(SUM(SUM(amount_sold)) OVER
               (PARTITION BY c.country_id
                ORDER BY c.country_id, t.calendar_quarter_desc
                ROWS UNBOUNDED PRECEDING),
               '9,999,999,999') AS RUNNING_TOTAL
FROM   sales s, times t, customers c
WHERE  s.time_id = t.time_id
AND    s.cust_id = c.cust_id
AND    t.calendar_year = 2000
GROUP BY c.country_id, t.calendar_quarter_desc
ORDER BY c.country_id, t.calendar_quarter_desc

CO QUARTER  Q_SALES         RUNNING_TOTAL
US 2000-Q1      21,719,528     21,719,528
US 2000-Q2      21,915,534     43,635,062
US 2000-Q3      18,857,276     62,492,338
US 2000-Q4      14,970,316     77,462,654
63
Capitalizing on SQL for Analysis
Moving window example: moving average
SELECT c.country_id AS CO,
       t.calendar_month_desc AS CAL,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') AS SALES,
       TO_CHAR(AVG(SUM(amount_sold)) OVER
               (ORDER BY c.country_id, t.calendar_month_desc
                ROWS 2 PRECEDING),
               '9,999,999,999') AS MOVING_3_MONTH
FROM   sales s, times t, customers c
WHERE  s.time_id = t.time_id
AND    s.cust_id = c.cust_id
AND    t.calendar_year = 2000
GROUP BY c.country_id, t.calendar_month_desc
ORDER BY c.country_id, t.calendar_month_desc

CO CALENDAR SALES          MOVING_3_MONTH
-- -------- -------------- --------------
AR 2000-01         172,380        172,380
AR 2000-02         140,906        156,643
AR 2000-03         142,581        151,956
AR 2000-04         169,727        151,071
AR 2000-05         157,016        156,441
AR 2000-06         155,675        160,806
64
Capitalizing on SQL for Analysis
  • LAG / LEAD

TIME_ID   SALES      LAG1       LAG2       LEAD1      LEAD2
--------- ---------- ---------- ---------- ---------- ----------
01-JAN-00    869,132                          909,726    896,626
02-JAN-00    909,726    869,132               896,626    895,204
03-JAN-00    896,626    909,726    869,132    895,204    954,066
04-JAN-00    895,204    896,626    909,726    954,066    918,154
05-JAN-00    954,066    895,204    896,626    918,154    895,849
06-JAN-00    918,154    954,066    895,204    895,849    889,525

SELECT time_id,
       TO_CHAR(SUM(amount_sold),'9,999,999') AS SALES,
       TO_CHAR(LAG(SUM(amount_sold),1)
               OVER (ORDER BY time_id),'9,999,999') AS LAG1,
       TO_CHAR(LAG(SUM(amount_sold),2)
               OVER (ORDER BY time_id),'9,999,999') AS LAG2,
       TO_CHAR(LEAD(SUM(amount_sold),1)
               OVER (ORDER BY time_id),'9,999,999') AS LEAD1,
       TO_CHAR(LEAD(SUM(amount_sold),2)
               OVER (ORDER BY time_id),'9,999,999') AS LEAD2
FROM   sales
WHERE  time_id BETWEEN TO_DATE('01-JAN-2000')
                   AND TO_DATE('31-JAN-2000')
GROUP BY time_id
65
Capitalizing on SQL for Analysis
  • FIRST/LAST lets you order on column A but return
    a result of an aggregate applied on column B:
  • List price of the product with the lowest
    minimum price (LPLO_MINP)
  • Lowest minimum price (LO_MINP)
  • List price of the product with the highest
    minimum price (LPHI_MINP)
  • Highest minimum price (HI_MINP)

SELECT prod_subcategory,
       MIN(prod_list_price) KEEP (DENSE_RANK FIRST
         ORDER BY (prod_min_price)) AS LPLO_MINP,
       MIN(prod_min_price) AS LO_MINP,
       MAX(prod_list_price) KEEP (DENSE_RANK LAST
         ORDER BY (prod_min_price)) AS LPHI_MINP,
       MAX(prod_min_price) AS HI_MINP
FROM   products
WHERE  prod_category = 'Women'
GROUP BY prod_subcategory

PROD_SUBCATEGORY             LPLO_MINP  LO_MINP  LPHI_MINP  HI_MINP
---------------------------  ---------  -------  ---------  -------
Dresses - Women                     44    31.28        189   165
Easy Shapes - Women                 51    39.47        149   127.39
Knit Outfits - Women                38    17.78        138    95.63
Outerwear - Women                   58    27.14        198   131.87
Shirts And Jackets - Women          19    13.68        162   145.8
66
Other Important New Features
  • Table Compression
  • Multiple Block Sizes
  • RAC for warehousing

67
Capitalizing on Table Compression
  • Table Compression
  • Can improve performance by reducing both disk and
    memory (buffer cache) requirements.
  • Note: tables with large amounts of DML operations
    are not good candidates for compression
  • Ideal candidates are partitioned fact tables,
    materialized views with rollups, and tables with
    a high degree of redundant data
  • Regular tables
  • CREATE TABLE ... COMPRESS
  • ALTER TABLE ... COMPRESS
  • Partitioned tables
  • Can compress the entire table or on a partition
    by partition basis (see the sketch below)
  • CREATE TABLE ... COMPRESS ... PARTITION BY ...
  • PARTITION p1 VALUES ('FL', 'GA') COMPRESS
  • Materialized Views
  • CREATE MATERIALIZED VIEW QTRLY_SALES_MV COMPRESS ...
  • ALTER MATERIALIZED VIEW ... COMPRESS
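A minimal sketch of per-partition compression; the table, columns,
and partition layout are illustrative, not from the deck.

-- Compress only the partition holding read-mostly data
CREATE TABLE sales_by_state (
  state_key   CHAR(2)      NOT NULL,
  sale_date   DATE         NOT NULL,
  amount_sold NUMBER(10,2) NOT NULL)
PARTITION BY LIST (state_key) (
  PARTITION southeast VALUES ('FL', 'GA') COMPRESS,
  PARTITION rest      VALUES (DEFAULT)    NOCOMPRESS);

-- Existing partitions can be compressed after the fact; the data
-- is physically compressed when it is rewritten by the MOVE
ALTER TABLE sales_by_state MOVE PARTITION southeast COMPRESS;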

68
Capitalizing on Multiple Block Sizes
  • Multiple Block Size Capitalization
  • When you need to run a mix of OLTP activity and
    DSS activity within the same instance
  • When you have an OLTP system with a smaller block
    size and are using transportable tablespaces to
    move tables to a decision support system
  • Place small static dimensions in a smaller block
    cache (4K or 8K) and larger dimensions and facts
    in a large block cache (16K); see the sketch below
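A minimal sketch of setting up a non-default 16K block size; the
cache size, datafile path, and tablespace name are illustrative.

-- Allocate a buffer cache for the non-default 16K block size
ALTER SYSTEM SET db_16k_cache_size = 64M;

-- Create a 16K-block tablespace for large facts and dimensions
CREATE TABLESPACE dw_fact_16k
  DATAFILE '/u02/oradata/dw/dw_fact_16k_01.dbf' SIZE 2000M
  BLOCKSIZE 16K;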

69
Capitalizing on RAC
(Diagram: RAC nodes 1 through N connected by a high speed
interconnect and sharing a Storage Area Network (SAN))
70
Capitalizing on RAC
  • RAC provides both speed-up and scale-up
  • Theoretically, to double performance simply
    double the number of nodes.
  • Limiting traffic over the interconnect is key to
    performance.
  • Parallel Loading
  • Multiple SQL*Loader sessions
  • Collocated extracts
  • Querying
  • Partition key choices
  • Data model design choices
  • Define join dependencies
  • Partition-wise joins are key to limiting the
    traffic over the interconnect.
  • Match partitions so that they are collocated on
    the same node
  • Oracle's automatic node affinity improves
    performance of DML operations by routing DML
    operations to the node that has affinity for the
    partition.

71
Conclusion
  • Parallelism is key to performance of DSS
    applications
  • Design is the limiting factor for parallelism and
    performance
  • Oracle has some slick new features that enhance
    and simplify common warehouse operations.
  • The future direction of data warehousing:
  • Better performance
  • Increased parallelism / reduced dependencies
  • Loads
  • Queries
  • Reduced complexity / higher user satisfaction
  • Better design paradigms and ideologies
  • Enhanced features
  • Increased usability
  • Increased ability to capitalize

72
Next Steps.
  • Recommended sessions
  • Optimal Usage of Oracle's Partitioning Option
  • Oracle9i The Features They Didn't Tell You About
  • Advanced PL/SQL and Oracle9i ETL
  • Oracle 9i RAC Concepts and Implementation - A
    Practical Perspective

73
A
74
(No Transcript)