Oracle Database 10g for Data Warehousing

About This Presentation

Title:

Oracle Database 10g for Data Warehousing

Description:

Composite Range-List enables logical sub-partitioning for the most commonly used ... RANK(400000) WITHIN GROUP (ORDER BY Price DESC) hypo_rank. FROM Homes ... – PowerPoint PPT presentation

Number of Views:171

Avg rating:3.0/5.0

Slides: 37

Provided by: dwprodu

Learn more at: http://www.nocoug.org

Category:

more less

Transcript and Presenter's Notes

Title: Oracle Database 10g for Data Warehousing

1
(No Transcript)
2
Recent Query Processing Enhancements NoCOUG
Conference February 19th, 2004 George Lumpkin
3
Query processing

For the purposes of this presentation, query
processing includes
The underlying database objects which are being
accessed (table, indexes, etc)
The SQL functions and capabilities used to access
those database objects
The internal algorithms for executing SQL
statements (table scans, index probes, joins,
etc)
The optimization techniques applied to SQL
statements
The capabilities to view and understand Oracles
query processing
Improved performance is the primary benefit of
enhanced query processing

4
Database Objects
5
Database objects

Oracle9i
Table Compression
List and Range-List Partitioning
Bitmap join index
IOTs hash partitiong, parallel DML, bitmap
indexes
Datetime datatype
Oracle10g
Floating point datatype
Global hash-partitioned indexes
Datetime improvements

6
Table Compression (Oracle9i, Release 2)

Tables can be compressed
Compression can also be specified at the
partition level
Indexes are not compressed
Typical compression ratios range from 31 to 51
Compression is dependent upon the actual data
Compression algorithm based on removing data
redundancy
Key benefit is cost savings
Save TBs of storage without compromising
performance or functionality
However, a secondary benefit is often performance
due to reduced IO utilization

7
List and Composite Range-List Partitioning

List partitioning allows a table to be
partitioned with a list of values
For example, a table can be partitioned by region
or by department
Composite Range-List enables logical
sub-partitioning for the most commonly used Range
partitioning
Further flexibility in how a DBA can manage large
data sets
Provide appropriate partitioning techniques for
all business requirements

8
Composite Range-List Partitioning
Range partition across time
List partition across another major attribute
...
9
Implementation and Usage Tips List and Composite
List-Range Partitioning

Consider LIST (sub)partitioning when
You have a column containing unordered values,
which correspond to a logical unit for data
maintenance and query access
Use a DEFAULT list partition when
You may have unexpected values for the
partitioning key
You often add or modify values in your
partitioning key.
Migration of existing nonpartitioned and
partitioned tables
For online migration, use the dbms_redefinition
package
For offline creation, use
CREATE TABLE AS SELECT
INSERT / APPEND /

Performance Benefits
Manageability Benefits
10
IEEE Floating Point

New datatypes binary_float and binary_double
Precise mapping to Java and other application
environments
Potential space reduction
4/8 bytes fixed vs. up to 21 bytes variable for
Oracle number
Increased range of values
Binary doubles 11-bit exponent
Performance improvement
Native hardware vs. proprietary software for
calculations
Caveat binary numbers are subject to rounding
effects and are never suitable for data requiring
precision

11
IEEE Floating Point

Biggest potential benefit for BI Improved
performance for lengthy/complex arithmetical
expressions
Example query with 20X performance gains
Second biggest potential benefit Space savings
for lengthy numeric types

select promotion_name, exp
(geo_mean_temp/count) as geometric_mean from (
select p.promo_name as promotion_name,
sum (ln (quantity_sold)) as geo_mean_temp
from sales s, promotions p where
p.promo_id s.promo_id and time_id
between to_date ('01-jan-1998', 'dd-mon-yyyy')
and to_date ('31-dec-1998','dd-mon-yyyy
') and quantity_sold gt 0
group by promo_name) order by geometric_mean
desc
12
SQL Functions
13
SQL Functions

Oracle9i
ANSI Joins
Full outer joins
CASE statement
Grouping sets
WITH clause
Oracle10g
Enhanced connect by
Partition Outer Join
Regular Expressions
SQL Models
Statistical functions
Frequent Itemsets

14
WITH clause

Useful when a given query accesses the same
subquery multiple times
WITH channel_summary AS
( SELECT channels.channel_desc,
SUM(amount_sold) AS channel_total
FROM sales, channels
WHERE sales.channel_id channels.channel_id
GROUP BY channels.channel_desc )
SELECT channel_desc, channel_total
FROM channel_summary
WHERE channel_total gt
( SELECT SUM(channel_total) 1/3 FROM
channel_summary)

15
Partitioned Outer Join

New outer join syntax enabling easy specification
and high performance for joins that "densify"
sparse data.
To specify comparison calculations and to format
reports reliably, best to return a consistent set
of dimension members in query results
Yet data normally stored in "sparse" form why
waste space storing non-occurrence?
Ugly and slow SQL needed to add back rows for
nonexistent cases into query output.
Most frequently used to replace missing values
along time dimension.
Accepted for ANSI SQL standard.

16
Partitioned Outer Join - Basics
Inventory Tabletime_id product quant
Inventory table holds only changed values. But
for calculations reporting, we want rows for
the full set of dates.
1 April 2003 Bottle 10 6
April 2003 Bottle 8 1 April 2003
Can 15 4 April 2003
Can 11
1 April 2003 Bottle 10 2
April 2003 Bottle 3 April 2003
Bottle 4 April 2003 Bottle
5 April 2003 Bottle 6 April
2003 Bottle 8 1 April 2003
Can 15 2 April 2003 Can
3 April 2003 Can 4
April 2003 Can 11 5 April 2003
Can 6 April 2003 Can

SELECT times.time_id, product, quant FROM
inventory PARTITION BY (product) RIGHT
OUTER JOIN times ON (times.time_idinventory.ti
me_id)

Similar to a regular outer join, except the
outer join is applied to each partition.

17
Partition Outer Join Repeating Values

The last non-null values should be preserved for
subsequent records (typical inventory problem)
New analytical SQL keyword for LAST_VALUE()

1 April 2003 Bottle 10 2
April 2003 Bottle 10 3 April 2003
Bottle 10 4 April 2003 Bottle
10 5 April 2003 Bottle 10 6 April
2003 Bottle 8 1 April 2003
Can 15 2 April 2003 Can
15 3 April 2003 Can 15 4
April 2003 Can 11 5 April 2003
Can 11 6 April 2003 Can
11
SELECT time_id, product, LAST_VALUE (quant
IGNORE NULLS) OVER (PARTITION BY product
ORDER BY time_id) quant FROM ( SELECT
times.time_id, product, quant FROM inventory
PARTITION BY (product) RIGHT OUTER JOIN
times ON (times.time_idinventory.time_id) )
18
Row Sources
19
Row sources

Oracle9i
Sampling
Index skip scans
Oracle10g
Table scan speed-up
Inline Lob access speedup

20
Index Skip Scan

In Oracle8i, composite index used only if first
(prefix) column in the predicate
In Oracle9i, skip scan uses the composite index
that is far faster than a Full Table Scan
No need for another index
Especially useful if the number of distinct
values of prefix column are relatively low

21
Index Skip Scan

Business Scenario Department of Motorized
Vehicles
A car is uniquely identified by State and
registration ID
Unique index on (STATE, REGISTRATION)
Query Find the details of a registration ID when
the State is not known
Index skip scan allows composite index to be used
for this query
Can be many times faster than not using an index
Before index skip scans,
Bad performance because of lack of index
Or, extra cost, maintenance to create index on
(registration)

22
Optimizer
23
Optimizer

Oracle9i
Dynamic Sampling
Bind Peeking
Index Joins
Oracle10g
Automatic SQL Tuning

24
Bind peeking

In the first invocation of a cursor containing
bind variables, the optimizer will peek at the
bind values and use those values to optimize the
query
The query plan will remain cached, and will be
re-used for future invocations
The bind variables in the first invocation should
thus be representative values

25
Optimizer Dynamic SamplingOracle9i Rel 2

Problem optimizer statistics may be missing or
known to be inaccurate
Solution statistics are dynamically gathered
during query optimization
Table predicate selectivity and cardinality
Sampling is used to minimize the time required to
gather statistics
Statistics are only gathered for queries which
are expected to take a long time (relative to the
cost of gathering stats)

26
Optimizer Dynamic Sampling

Settings for OPTIMIZER_DYNAMIC_SAMPLING
parameter
0 -- Off.
1 Used for multi-table queries for tables w/o
both statistics and indexes. Little overhead
since you will have to do a full scan anyway.
This is the default in 9iR2.
2 -- Used for any unanalyzed object. This is the
default in 10g where we have automated stats
collection, but users may still have volatile
objects without stats. This is the default in 10g
3 Used when the optimizer has to use a guess,
e.g., to_number(c1) gt 10.
4 -- Used if correlations could be present, e.g.,
ANDed or ORed conditions on the same table.

27
Diagnostics
28
Diagnostics

Oracle9i
Query execution statistics
Enhanced SQL trace information
Enhanced explain plan output (DBMS_XPLAN)
Oracle10g
Automatic workload repository and automated
diagnosis with ADDM
Self-tuning SQL optimization
Parallel Execution Enhancements No Slave SQL

29
Query Execution Statistics

Oracle9i introduces new dynamic views for a
deeper insight into SQL Execution
VSQL_PLAN_
Execution plans of all cursors in the shared SQL
area
Cursor runtime statistics can be collected with
STATISTICS_LEVELALL
VSQL_WORKAREA_
Detailed information about the memory usage for
all running SQL statements down to a row source
level
Activated when PGA_AGGREGATE_TARGET ltgt 0

30
VSQL_PLAN

VSQL_PLAN equivalent to PLAN_TABLE
Shows actual used plan
SQL shown to select plan is simplified
In Oracle10g, you can use DBMS_XPLAN

SQLgt select / TRACK_ME / e.ename, d.dname
from scott.emp e, scott.dept d where
e.deptnod.deptno SQLgt select / NOT_ME / id,
operation, object_name, cost, bytes from
vsql_plan where hash_value (select
hash_value from vsql where sql_text like
'TRACK_ME'
and sql_text not like 'NOT_ME') order
by 1 ID OPERATION OBJECT_NAME
COST BYTES ---------- ----------------------
-------- --------------------- ----------
---------- 0 SELECT STATEMENT 5 1
HASH JOIN 4 588 2 TABLE ACCESS
DEPT 2 88 3 TABLE ACCESS
EMP 2 280
31
VSQL_PLAN_STATISTICS_ALL

VSQL_PLAN_STATISTICS shows actual cursor
execution statistics (overhead, not enabled by
default)
SQL shown to select plan is simplified
In Oracle10g, you can use DBMS_XPLAN

SQLgt alter session set statistics_levelALL SQLgt
select / TRACK_ME / e.ename, d.dname from
scott.emp e, scott.dept d where
e.deptnod.deptno SQLgt select / NOT_ME / id,
operation, object_name, last_output_rows "ROWS",
last_cr_buffer_gets CR", last_disk_reads
PR", last_elapsed_time "TIME us" from
vsql_plan_statistics_all where hash_value
(select hash_value from vsql where sql_text
like 'TRACK_ME' and sql_text not like
'NOT_ME') order by 1 ID OPERATION
OBJECT_NAME ROWS CR PR TIME us --
------------ ----------------------- ---------
--------- -------- ---------- 1 HASH JOIN
14 7 0 2685 2 TABLE ACCESS DEPT
4 3 0 377 3 TABLE ACCESS EMP 14
4 0 426
32
Enhanced SQL Trace

By Default, SQL trace includes some runtime
statistics for a SQL statement

PARSING IN CURSOR 11 select e.ename, d.dname
from emp e, dept d where e.deptnod.deptno END OF
STMT PARSE 11c20000,e13838,p0,cr8,cu0,mis1
,r0,dep0,og1,tim6389414660329 EXEC
11c0,e102,p0,cr0,cu0,mis0,r0,dep0,og1,t
im6389414660694 FETCH 11c0,e2132,p0,cr6,cu
0,mis0,r1,dep0,og1,tim6389414663127 FETCH
11c0,e1329,p0,cr1,cu0,mis0,r13,dep0,og1
,tim6389414665803 XCTEND rlbk0, rd_only1 STAT
11 id1 cnt14 pid0 op'HASH JOIN (cr7 pr0
pw0 time3357 us)' STAT 11 id2 cnt4 pid1
op'TABLE ACCESS FULL DEPT (cr3 pr0 pw0
time525 us)' STAT 11 id3 cnt14 pid1
op'TABLE ACCESS FULL EMP (cr4 pr0 pw0
time380 us)'
33
Workload Repository

Data Warehouse of the Database
Facility to collect, process, and maintain
important RDBMS statistics and workload SQL
workload, segment statistics, time wait
statistics, metrics, feature usage
Efficiently sample and compute statistics in
memory
Periodically flush coarser-grain information to
disk - in a self-managed tablespace
Information readily available real-time
accessible when needed
On by default - flush to disk every 30 min, keep
for 7 days

34
Automatic Database Diagnostic Monitor (ADDM)

Performance expert in a box
Provides database-wide performance diagnostic,
including RAC
Automatically runs at every capture of workload
repository snapshot
Provides impact and benefit analysis
Provides Information vs. raw data
Integrates all components using the Time Model

Application SQL Management
Storage Management
System Resource Management
Space Management
Backup Recovery Management
Database Management
Intelligent Infrastructure
35
Self-Tuning SQL OptimizationOracle10g

Internally analyze most expensive queries, as
determined by Workload Repository
Use sampling, including join sampling, to
determine cardinalities of intermediate results
When optimizer errors are found, store corrective
information
Apply corrections next time query is executed
Related techniques for truly ad hoc queries

36
Oracle10g - No Slave SQL

In Oracle9i, parallel execution plans were
complex
Difficult to read/understand due to multiple
cursors
Difficult to analyze statement-level performance
information

SQLgt explain plan for select /parallel(d)
parallel(e) / dname, ename from emp e,
dept d where e.deptnod.deptno -----------------
-------------------------------------------------
---------------- Id Operation
Name TQ IN-OUT PQ Distribution
-------------------------------------------------
---------------------------------- 0 SELECT
STATEMENT
1 HASH JOIN
88,01 P-gtS QC (RAND)
2 TABLE ACCESS FULL EMP
88,01 PCWP 3 TABLE
ACCESS FULL DEPT 88,00 P-gtP
BROADCAST --------------------------------
--------------------------------------------------
- PX Slave SQL Information (identified by
operation id) -----------------------------------
------------------- 1 - SELECT / ORDERED
NO_EXPAND USE_HASH(A2) / A1.C1,A2.C1
FROM (SELECT / NO_EXPAND ROWID(A3) /
A3."DEPTNO" C0,A3."ENAME" C1 FROM
"EMP" PX_GRANULE(0, BLOCK_RANGE, DYNAMIC) A3
) A1,Q288000 A2 WHERE A1.C0A2.C0 3 -
SELECT / NO_EXPAND ROWID(A1) / A1."DEPTNO"
C0,A1."DNAME" C1 FROM "DEPT"
PX_GRANULE(0, BLOCK_RANGE, DYNAMIC) A1
37
Oracle10g - No Slave SQL

Oracle Database 10g single execution plan,
single cursor

SQLgt explain plan for select /parallel(d)
parallel(e) / dname, ename from emp e,
dept d where e.deptnod.deptno ----------------
------------------------------..
-------------------------------- Id
Operation Name TQ
IN-OUT PQ Distrib --------------------------
---------------------..---------------------------
----- 0 SELECT STATEMENT
1 PX
COORDINATOR
2 PX SEND QC (RANDOM)
TQ10001 Q1,01 P-gtS QC (RAND)
3 HASH JOIN
Q1,01 PCWP 4 PX
BLOCK ITERATOR Q1,01
PCWC 5 TABLE ACCESS FULL
EMP Q1,01 PCWP
6 BUFFER SORT
Q1,01 PCWC 7 PX
RECEIVE Q1,01 PCWP
8 PX SEND BROADCAST
TQ10000 Q1,00 P-gtP BROADCAST
9 PX BLOCK ITERATOR
Q1,00 PCWC 10 TABLE
ACCESS FULL DEPT Q1,00 PCWP
-----------------------------------------
-----..---------------------------------
38
(No Transcript)
39
Statistical Analysis

Descriptive Statistics average, standard
deviation, variance, min, max, median (via
percentile_count), mode, group-by roll-up
Correlations Pearsons correlation coefficients,
Spearman's and Kendall's (both nonparametric).
Cross Tabs Enhanced with statistics chi
squared, phi coefficient, Cramer's V, contingency
coefficient, Cohen's kappa
Hypothesis Testing t-test , F-test, ANOVA,
Chi-square, Mann Whitney, Kolmogorov-Smirnov,
Wilcoxon signed ranks
Distribution Fitting normal, uniform, Poisson,
exponential, Weibull

New Oracle10g features are in yellow
40
Oracle10g - Frequent Itemsets

Built to support Data Mining
Often used in market-basket analysis
Finds the most common combinations of items which
are purchased together
Show the hundred most common pairs of items which
are purchased together.
Show all combinations of 3 or more products which
were purchased together in the same transaction
more than 1000 times and in which one of the
items was a six-pack of soft drinks.

41
Frequent Itemsets

Without any optimizations, frequent itemsets must
be computed by issuing thousands of SQL
statements (e.g. count the first pair of items,
then count the next pair of times, )
With optimization, Oracle10g uses advanced
algorithms to count multiple itemsets more
efficiently
The build time for Oracle Data Minings
Association Rules algorithm measured at4-12X
faster, depending on workload

42
Frequent Itemsets

Example Query and Results

select cast(itemset as fi_char) itemset,
support, length, total_tranx from
table(dbms_frequent_itemset.fi_transactional
(cursor(select session_id, command
from web_log where
(command like 'pdf' or command like 'zip' or
command like 'PDF' or command like
'ZIP')), (60/2600), 2, 2, null,
null)) ITEMSET ---------------------------------
---------------------------------- SUPPORT
LENGTH RNK ---------- ----------
---------- FI_CHAR('GET twp_data_compression.pdf',
'GET vldw_winter.pdf') 115 2
1 FI_CHAR('GET 9idwperf_flow_twp.pdf', 'GET
9idwperf_sizes_twp.pdf') 109 2
2 FI_CHAR('GET 9imv_twp.pdf', 'GET
twp_query_optimization.pdf') 107
2 3
43
Inverse Percentile FunctionsFind value
corresponding to a given percentile
Example Find the median home price within each
city
SELECT city, avg(price) avg_p,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY
price) median_p FROM homes GROUP BY city
city price SF 500,000 SF
700,000 SF 1,000,000 NYC 300,000 NYC
600,000 NYC 900,000
44
Hypothetical Rank Distribution functions Find
rank/distribution value of specified number
Example Find hypothetical rank for a house with
a price of 400,000, by Area
SELECT Area, RANK(400000) WITHIN GROUP (ORDER
BY Price DESC) hypo_rank FROM Homes GROUP BY AREA
45
First/Last Aggregate Functions
Example Get beginning and closing balances per
year
SELECT year MIN(balance) KEEP (DENSE_RANK
FIRST ORDER BY month) beg_bal, MIN(balance)
KEEP (DENSE_RANK LAST ORDER BY month)
end_bal FROM ledger GROUP BY year
46
Scenario for Aggregation

Star Schema with two dimensions Product Time

How do we get to the logical cube?
47
Simple Group By
SELECT Year, Division, Sum(Sales) FROM Sales,
Items, Time WHERE ltjoin-conditiongt GROUP BY Year,
Division
Produces subtotal
Creates a single group
48
Group By Rollup
SELECT Year,Quarter,Month, Sum(Sales) FROM
Sales, Items, Time WHERE ltjoin-conditiongt GROUP
BY ROLLUP (Year, Quarter, Month)
All Div.
Division
Brand
Item
Year
Produces subtotals
Month
Quarter
All Years
Creates groups at multiple levels
49
Group By Cube
All Div.
SELECT Division, Year, Sum(Sales) FROM Sales,
Items, Time WHERE ltjoin-conditiongt GROUP BY
CUBE (Year, Division)
Division
Brand
Item
Year
Month
Quarter
All Years
Produces subtotals
Creates all possible combinations
50
Group By Grouping Sets
All Div.
SELECT Division, Brand, Year, Quarter,
Sum(Sales) FROM Sales, Items, Time WHERE
ltjoin-conditiongt GROUP BY GROUPING SETS
((Quarter, Division), (Year, Brand))
Division
Brand
Item
Year
Month
Quarter
All Years
Produces subtotals
Creates just the specified groups
51
Concatenated Groups
SELECT Division, Brand, Month, Sum(Sales) FROM
Sales, Items, Time WHERE ltjoin-conditiongt GROUP
BY Month, (G1) GROUPING SETS
(Division, Brand) (G2)
All Div.
Division
Brand
Item
Year
Month
Quarter
All Years
G1 x G2
G1 produces
G2 produces
Month
Division Brand

Write a Comment

User Comments (0)