Oracle Database 10g for Data Warehousing - PowerPoint PPT Presentation

About This Presentation
Title:

Oracle Database 10g for Data Warehousing

Description:

Composite Range-List enables logical sub-partitioning for the most commonly used ... RANK(400000) WITHIN GROUP (ORDER BY Price DESC) hypo_rank. FROM Homes ... – PowerPoint PPT presentation

Number of Views:171
Avg rating:3.0/5.0
Slides: 37
Provided by: dwprodu
Learn more at: http://www.nocoug.org
Category:

less

Transcript and Presenter's Notes

Title: Oracle Database 10g for Data Warehousing


1
(No Transcript)
2
Recent Query Processing Enhancements NoCOUG
Conference February 19th, 2004 George Lumpkin
3
Query processing
  • For the purposes of this presentation, query
    processing includes
  • The underlying database objects which are being
    accessed (table, indexes, etc)
  • The SQL functions and capabilities used to access
    those database objects
  • The internal algorithms for executing SQL
    statements (table scans, index probes, joins,
    etc)
  • The optimization techniques applied to SQL
    statements
  • The capabilities to view and understand Oracles
    query processing
  • Improved performance is the primary benefit of
    enhanced query processing

4
Database Objects
5
Database objects
  • Oracle9i
  • Table Compression
  • List and Range-List Partitioning
  • Bitmap join index
  • IOTs hash partitiong, parallel DML, bitmap
    indexes
  • Datetime datatype
  • Oracle10g
  • Floating point datatype
  • Global hash-partitioned indexes
  • Datetime improvements

6
Table Compression (Oracle9i, Release 2)
  • Tables can be compressed
  • Compression can also be specified at the
    partition level
  • Indexes are not compressed
  • Typical compression ratios range from 31 to 51
  • Compression is dependent upon the actual data
  • Compression algorithm based on removing data
    redundancy
  • Key benefit is cost savings
  • Save TBs of storage without compromising
    performance or functionality
  • However, a secondary benefit is often performance
    due to reduced IO utilization

7
List and Composite Range-List Partitioning
  • List partitioning allows a table to be
    partitioned with a list of values
  • For example, a table can be partitioned by region
    or by department
  • Composite Range-List enables logical
    sub-partitioning for the most commonly used Range
    partitioning
  • Further flexibility in how a DBA can manage large
    data sets
  • Provide appropriate partitioning techniques for
    all business requirements

8
Composite Range-List Partitioning
Range partition across time
List partition across another major attribute
...
9
Implementation and Usage Tips List and Composite
List-Range Partitioning
  • Consider LIST (sub)partitioning when
  • You have a column containing unordered values,
    which correspond to a logical unit for data
    maintenance and query access
  • Use a DEFAULT list partition when
  • You may have unexpected values for the
    partitioning key
  • You often add or modify values in your
    partitioning key.
  • Migration of existing nonpartitioned and
    partitioned tables
  • For online migration, use the dbms_redefinition
    package
  • For offline creation, use
  • CREATE TABLE AS SELECT
  • INSERT / APPEND /

Performance Benefits
Manageability Benefits
10
IEEE Floating Point
  • New datatypes binary_float and binary_double
  • Precise mapping to Java and other application
    environments
  • Potential space reduction
  • 4/8 bytes fixed vs. up to 21 bytes variable for
    Oracle number
  • Increased range of values
  • Binary doubles 11-bit exponent
  • Performance improvement
  • Native hardware vs. proprietary software for
    calculations
  • Caveat binary numbers are subject to rounding
    effects and are never suitable for data requiring
    precision

11
IEEE Floating Point
  • Biggest potential benefit for BI Improved
    performance for lengthy/complex arithmetical
    expressions
  • Example query with 20X performance gains
  • Second biggest potential benefit Space savings
    for lengthy numeric types

select promotion_name, exp
(geo_mean_temp/count) as geometric_mean from (
select p.promo_name as promotion_name,  
sum (ln (quantity_sold)) as geo_mean_temp  
from sales s, promotions p   where
p.promo_id s.promo_id   and time_id
between to_date ('01-jan-1998', 'dd-mon-yyyy')  
and to_date ('31-dec-1998','dd-mon-yyyy
') and quantity_sold gt 0 
group by promo_name) order by geometric_mean
desc
12
SQL Functions
13
SQL Functions
  • Oracle9i
  • ANSI Joins
  • Full outer joins
  • CASE statement
  • Grouping sets
  • WITH clause
  • Oracle10g
  • Enhanced connect by
  • Partition Outer Join
  • Regular Expressions
  • SQL Models
  • Statistical functions
  • Frequent Itemsets

14
WITH clause
  • Useful when a given query accesses the same
    subquery multiple times
  • WITH channel_summary AS
  • ( SELECT channels.channel_desc,
  • SUM(amount_sold) AS channel_total
  • FROM sales, channels
  • WHERE sales.channel_id channels.channel_id
  • GROUP BY channels.channel_desc )
  • SELECT channel_desc, channel_total
  • FROM channel_summary
  • WHERE channel_total gt
  • ( SELECT SUM(channel_total) 1/3 FROM
    channel_summary)

15
Partitioned Outer Join
  • New outer join syntax enabling easy specification
    and high performance for joins that "densify"
    sparse data.
  • To specify comparison calculations and to format
    reports reliably, best to return a consistent set
    of dimension members in query results
  • Yet data normally stored in "sparse" form why
    waste space storing non-occurrence?
  • Ugly and slow SQL needed to add back rows for
    nonexistent cases into query output.
  • Most frequently used to replace missing values
    along time dimension.
  • Accepted for ANSI SQL standard.

16
Partitioned Outer Join - Basics
Inventory Tabletime_id product quant
Inventory table holds only changed values. But
for calculations reporting, we want rows for
the full set of dates.
1 April 2003 Bottle 10 6
April 2003 Bottle 8 1 April 2003
Can 15 4 April 2003
Can 11
1 April 2003 Bottle 10 2
April 2003 Bottle 3 April 2003
Bottle 4 April 2003 Bottle
5 April 2003 Bottle 6 April
2003 Bottle 8 1 April 2003
Can 15 2 April 2003 Can
3 April 2003 Can 4
April 2003 Can 11 5 April 2003
Can 6 April 2003 Can

SELECT times.time_id, product, quant FROM
inventory PARTITION BY (product) RIGHT
OUTER JOIN times ON (times.time_idinventory.ti
me_id)
  • Similar to a regular outer join, except the
  • outer join is applied to each partition.

17
Partition Outer Join Repeating Values
  • The last non-null values should be preserved for
    subsequent records (typical inventory problem)
  • New analytical SQL keyword for LAST_VALUE()

1 April 2003 Bottle 10 2
April 2003 Bottle 10 3 April 2003
Bottle 10 4 April 2003 Bottle
10 5 April 2003 Bottle 10 6 April
2003 Bottle 8 1 April 2003
Can 15 2 April 2003 Can
15 3 April 2003 Can 15 4
April 2003 Can 11 5 April 2003
Can 11 6 April 2003 Can
11
SELECT time_id, product, LAST_VALUE (quant
IGNORE NULLS) OVER (PARTITION BY product
ORDER BY time_id) quant FROM ( SELECT
times.time_id, product, quant FROM inventory
PARTITION BY (product) RIGHT OUTER JOIN
times ON (times.time_idinventory.time_id) )
18
Row Sources
19
Row sources
  • Oracle9i
  • Sampling
  • Index skip scans
  • Oracle10g
  • Table scan speed-up
  • Inline Lob access speedup

20
Index Skip Scan
  • In Oracle8i, composite index used only if first
    (prefix) column in the predicate
  • In Oracle9i, skip scan uses the composite index
    that is far faster than a Full Table Scan
  • No need for another index
  • Especially useful if the number of distinct
    values of prefix column are relatively low

21
Index Skip Scan
  • Business Scenario Department of Motorized
    Vehicles
  • A car is uniquely identified by State and
    registration ID
  • Unique index on (STATE, REGISTRATION)
  • Query Find the details of a registration ID when
    the State is not known
  • Index skip scan allows composite index to be used
    for this query
  • Can be many times faster than not using an index
  • Before index skip scans,
  • Bad performance because of lack of index
  • Or, extra cost, maintenance to create index on
    (registration)

22
Optimizer
23
Optimizer
  • Oracle9i
  • Dynamic Sampling
  • Bind Peeking
  • Index Joins
  • Oracle10g
  • Automatic SQL Tuning

24
Bind peeking
  • In the first invocation of a cursor containing
    bind variables, the optimizer will peek at the
    bind values and use those values to optimize the
    query
  • The query plan will remain cached, and will be
    re-used for future invocations
  • The bind variables in the first invocation should
    thus be representative values

25
Optimizer Dynamic SamplingOracle9i Rel 2
  • Problem optimizer statistics may be missing or
    known to be inaccurate
  • Solution statistics are dynamically gathered
    during query optimization
  • Table predicate selectivity and cardinality
  • Sampling is used to minimize the time required to
    gather statistics
  • Statistics are only gathered for queries which
    are expected to take a long time (relative to the
    cost of gathering stats)

26
Optimizer Dynamic Sampling
  • Settings for OPTIMIZER_DYNAMIC_SAMPLING
    parameter
  • 0 -- Off.
  • 1 Used for multi-table queries for tables w/o
    both statistics and indexes. Little overhead
    since you will have to do a full scan anyway.
    This is the default in 9iR2.
  • 2 -- Used for any unanalyzed object. This is the
    default in 10g where we have automated stats
    collection, but users may still have volatile
    objects without stats. This is the default in 10g
  • 3 Used when the optimizer has to use a guess,
    e.g., to_number(c1) gt 10.
  • 4 -- Used if correlations could be present, e.g.,
    ANDed or ORed conditions on the same table.

27
Diagnostics
28
Diagnostics
  • Oracle9i
  • Query execution statistics
  • Enhanced SQL trace information
  • Enhanced explain plan output (DBMS_XPLAN)
  • Oracle10g
  • Automatic workload repository and automated
    diagnosis with ADDM
  • Self-tuning SQL optimization
  • Parallel Execution Enhancements No Slave SQL

29
Query Execution Statistics
  • Oracle9i introduces new dynamic views for a
    deeper insight into SQL Execution
  • VSQL_PLAN_
  • Execution plans of all cursors in the shared SQL
    area
  • Cursor runtime statistics can be collected with
    STATISTICS_LEVELALL
  • VSQL_WORKAREA_
  • Detailed information about the memory usage for
    all running SQL statements down to a row source
    level
  • Activated when PGA_AGGREGATE_TARGET ltgt 0

30
VSQL_PLAN
  • VSQL_PLAN equivalent to PLAN_TABLE
  • Shows actual used plan
  • SQL shown to select plan is simplified
  • In Oracle10g, you can use DBMS_XPLAN

SQLgt select / TRACK_ME / e.ename, d.dname
from scott.emp e, scott.dept d where
e.deptnod.deptno SQLgt select / NOT_ME / id,
operation, object_name, cost, bytes from
vsql_plan where hash_value (select
hash_value from vsql where sql_text like
'TRACK_ME'
and sql_text not like 'NOT_ME') order
by 1 ID OPERATION OBJECT_NAME
COST BYTES ---------- ----------------------
-------- --------------------- ----------
---------- 0 SELECT STATEMENT 5 1
HASH JOIN 4 588 2 TABLE ACCESS
DEPT 2 88 3 TABLE ACCESS
EMP 2 280
31
VSQL_PLAN_STATISTICS_ALL
  • VSQL_PLAN_STATISTICS shows actual cursor
    execution statistics (overhead, not enabled by
    default)
  • SQL shown to select plan is simplified
  • In Oracle10g, you can use DBMS_XPLAN

SQLgt alter session set statistics_levelALL SQLgt
select / TRACK_ME / e.ename, d.dname from
scott.emp e, scott.dept d where
e.deptnod.deptno SQLgt select / NOT_ME / id,
operation, object_name, last_output_rows "ROWS",
last_cr_buffer_gets CR", last_disk_reads
PR", last_elapsed_time "TIME us" from
vsql_plan_statistics_all where hash_value
(select hash_value from vsql where sql_text
like 'TRACK_ME' and sql_text not like
'NOT_ME') order by 1 ID OPERATION
OBJECT_NAME ROWS CR PR TIME us --
------------ ----------------------- ---------
--------- -------- ---------- 1 HASH JOIN
14 7 0 2685 2 TABLE ACCESS DEPT
4 3 0 377 3 TABLE ACCESS EMP 14
4 0 426
32
Enhanced SQL Trace
  • By Default, SQL trace includes some runtime
    statistics for a SQL statement

PARSING IN CURSOR 11 select e.ename, d.dname
from emp e, dept d where e.deptnod.deptno END OF
STMT PARSE 11c20000,e13838,p0,cr8,cu0,mis1
,r0,dep0,og1,tim6389414660329 EXEC
11c0,e102,p0,cr0,cu0,mis0,r0,dep0,og1,t
im6389414660694 FETCH 11c0,e2132,p0,cr6,cu
0,mis0,r1,dep0,og1,tim6389414663127 FETCH
11c0,e1329,p0,cr1,cu0,mis0,r13,dep0,og1
,tim6389414665803 XCTEND rlbk0, rd_only1 STAT
11 id1 cnt14 pid0 op'HASH JOIN (cr7 pr0
pw0 time3357 us)' STAT 11 id2 cnt4 pid1
op'TABLE ACCESS FULL DEPT (cr3 pr0 pw0
time525 us)' STAT 11 id3 cnt14 pid1
op'TABLE ACCESS FULL EMP (cr4 pr0 pw0
time380 us)'
33
Workload Repository
  • Data Warehouse of the Database
  • Facility to collect, process, and maintain
    important RDBMS statistics and workload SQL
    workload, segment statistics, time wait
    statistics, metrics, feature usage
  • Efficiently sample and compute statistics in
    memory
  • Periodically flush coarser-grain information to
    disk - in a self-managed tablespace
  • Information readily available real-time
    accessible when needed
  • On by default - flush to disk every 30 min, keep
    for 7 days

34
Automatic Database Diagnostic Monitor (ADDM)
  • Performance expert in a box
  • Provides database-wide performance diagnostic,
    including RAC
  • Automatically runs at every capture of workload
    repository snapshot
  • Provides impact and benefit analysis
  • Provides Information vs. raw data
  • Integrates all components using the Time Model

Application SQL Management
Storage Management
System Resource Management
Space Management
Backup Recovery Management
Database Management
Intelligent Infrastructure
35
Self-Tuning SQL OptimizationOracle10g
  • Internally analyze most expensive queries, as
    determined by Workload Repository
  • Use sampling, including join sampling, to
    determine cardinalities of intermediate results
  • When optimizer errors are found, store corrective
    information
  • Apply corrections next time query is executed
  • Related techniques for truly ad hoc queries

36
Oracle10g - No Slave SQL
  • In Oracle9i, parallel execution plans were
    complex
  • Difficult to read/understand due to multiple
    cursors
  • Difficult to analyze statement-level performance
    information

SQLgt explain plan for select /parallel(d)
parallel(e) / dname, ename from emp e,
dept d where e.deptnod.deptno -----------------
-------------------------------------------------
---------------- Id Operation
Name TQ IN-OUT PQ Distribution
-------------------------------------------------
---------------------------------- 0 SELECT
STATEMENT
1 HASH JOIN
88,01 P-gtS QC (RAND)
2 TABLE ACCESS FULL EMP
88,01 PCWP 3 TABLE
ACCESS FULL DEPT 88,00 P-gtP
BROADCAST --------------------------------
--------------------------------------------------
- PX Slave SQL Information (identified by
operation id) -----------------------------------
------------------- 1 - SELECT / ORDERED
NO_EXPAND USE_HASH(A2) / A1.C1,A2.C1
FROM (SELECT / NO_EXPAND ROWID(A3) /
A3."DEPTNO" C0,A3."ENAME" C1 FROM
"EMP" PX_GRANULE(0, BLOCK_RANGE, DYNAMIC) A3
) A1,Q288000 A2 WHERE A1.C0A2.C0 3 -
SELECT / NO_EXPAND ROWID(A1) / A1."DEPTNO"
C0,A1."DNAME" C1 FROM "DEPT"
PX_GRANULE(0, BLOCK_RANGE, DYNAMIC) A1
37
Oracle10g - No Slave SQL
  • Oracle Database 10g single execution plan,
    single cursor

SQLgt explain plan for select /parallel(d)
parallel(e) / dname, ename from emp e,
dept d where e.deptnod.deptno ----------------
------------------------------..
-------------------------------- Id
Operation Name TQ
IN-OUT PQ Distrib --------------------------
---------------------..---------------------------
----- 0 SELECT STATEMENT
1 PX
COORDINATOR
2 PX SEND QC (RANDOM)
TQ10001 Q1,01 P-gtS QC (RAND)
3 HASH JOIN
Q1,01 PCWP 4 PX
BLOCK ITERATOR Q1,01
PCWC 5 TABLE ACCESS FULL
EMP Q1,01 PCWP
6 BUFFER SORT
Q1,01 PCWC 7 PX
RECEIVE Q1,01 PCWP
8 PX SEND BROADCAST
TQ10000 Q1,00 P-gtP BROADCAST
9 PX BLOCK ITERATOR
Q1,00 PCWC 10 TABLE
ACCESS FULL DEPT Q1,00 PCWP
-----------------------------------------
-----..---------------------------------
38
(No Transcript)
39
Statistical Analysis
  • Descriptive Statistics average, standard
    deviation, variance, min, max, median (via
    percentile_count), mode, group-by roll-up
  • Correlations Pearsons correlation coefficients,
    Spearman's and Kendall's (both nonparametric).
  • Cross Tabs Enhanced with statistics chi
    squared, phi coefficient, Cramer's V, contingency
    coefficient, Cohen's kappa
  • Hypothesis Testing t-test , F-test, ANOVA,
    Chi-square, Mann Whitney, Kolmogorov-Smirnov,
    Wilcoxon signed ranks
  • Distribution Fitting normal, uniform, Poisson,
    exponential, Weibull

New Oracle10g features are in yellow
40
Oracle10g - Frequent Itemsets
  • Built to support Data Mining
  • Often used in market-basket analysis
  • Finds the most common combinations of items which
    are purchased together
  • Show the hundred most common pairs of items which
    are purchased together.
  • Show all combinations of 3 or more products which
    were purchased together in the same transaction
    more than 1000 times and in which one of the
    items was a six-pack of soft drinks.

41
Frequent Itemsets
  • Without any optimizations, frequent itemsets must
    be computed by issuing thousands of SQL
    statements (e.g. count the first pair of items,
    then count the next pair of times, )
  • With optimization, Oracle10g uses advanced
    algorithms to count multiple itemsets more
    efficiently
  • The build time for Oracle Data Minings
    Association Rules algorithm measured at4-12X
    faster, depending on workload

42
Frequent Itemsets
  • Example Query and Results

select cast(itemset as fi_char) itemset,
support, length, total_tranx from
table(dbms_frequent_itemset.fi_transactional
(cursor(select session_id, command
from web_log where
(command like 'pdf' or command like 'zip' or
command like 'PDF' or command like
'ZIP')), (60/2600), 2, 2, null,
null)) ITEMSET ---------------------------------
---------------------------------- SUPPORT
LENGTH RNK ---------- ----------
---------- FI_CHAR('GET twp_data_compression.pdf',
'GET vldw_winter.pdf') 115 2
1 FI_CHAR('GET 9idwperf_flow_twp.pdf', 'GET
9idwperf_sizes_twp.pdf') 109 2
2 FI_CHAR('GET 9imv_twp.pdf', 'GET
twp_query_optimization.pdf') 107
2 3
43
Inverse Percentile FunctionsFind value
corresponding to a given percentile
Example Find the median home price within each
city
SELECT city, avg(price) avg_p,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY
price) median_p FROM homes GROUP BY city
city price SF 500,000 SF
700,000 SF 1,000,000 NYC 300,000 NYC
600,000 NYC 900,000
44
Hypothetical Rank Distribution functions Find
rank/distribution value of specified number
Example Find hypothetical rank for a house with
a price of 400,000, by Area
SELECT Area, RANK(400000) WITHIN GROUP (ORDER
BY Price DESC) hypo_rank FROM Homes GROUP BY AREA
45
First/Last Aggregate Functions
Example Get beginning and closing balances per
year
SELECT year MIN(balance) KEEP (DENSE_RANK
FIRST ORDER BY month) beg_bal, MIN(balance)
KEEP (DENSE_RANK LAST ORDER BY month)
end_bal FROM ledger GROUP BY year
46
Scenario for Aggregation
  • Star Schema with two dimensions Product Time

How do we get to the logical cube?
47
Simple Group By
SELECT Year, Division, Sum(Sales) FROM Sales,
Items, Time WHERE ltjoin-conditiongt GROUP BY Year,
Division
Produces subtotal
Creates a single group
48
Group By Rollup
SELECT Year,Quarter,Month, Sum(Sales) FROM
Sales, Items, Time WHERE ltjoin-conditiongt GROUP
BY ROLLUP (Year, Quarter, Month)
All Div.
Division
Brand
Item
Year
Produces subtotals
Month
Quarter
All Years
Creates groups at multiple levels
49
Group By Cube
All Div.
SELECT Division, Year, Sum(Sales) FROM Sales,
Items, Time WHERE ltjoin-conditiongt GROUP BY
CUBE (Year, Division)
Division
Brand
Item
Year
Month
Quarter
All Years
Produces subtotals
Creates all possible combinations
50
Group By Grouping Sets
All Div.
SELECT Division, Brand, Year, Quarter,
Sum(Sales) FROM Sales, Items, Time WHERE
ltjoin-conditiongt GROUP BY GROUPING SETS
((Quarter, Division), (Year, Brand))
Division
Brand
Item
Year
Month
Quarter
All Years
Produces subtotals
Creates just the specified groups
51
Concatenated Groups
SELECT Division, Brand, Month, Sum(Sales) FROM
Sales, Items, Time WHERE ltjoin-conditiongt GROUP
BY Month, (G1) GROUPING SETS
(Division, Brand) (G2)
All Div.
Division
Brand
Item
Year
Month
Quarter
All Years
G1 x G2
G1 produces
G2 produces
Month
Division Brand
Write a Comment
User Comments (0)
About PowerShow.com