Title: Oracle Database 10g for Data Warehousing
1(No Transcript)
2Recent Query Processing Enhancements NoCOUG
Conference February 19th, 2004 George Lumpkin
3Query processing
- For the purposes of this presentation, query
processing includes - The underlying database objects which are being
accessed (table, indexes, etc) - The SQL functions and capabilities used to access
those database objects - The internal algorithms for executing SQL
statements (table scans, index probes, joins,
etc) - The optimization techniques applied to SQL
statements - The capabilities to view and understand Oracles
query processing - Improved performance is the primary benefit of
enhanced query processing
4Database Objects
5Database objects
- Oracle9i
- Table Compression
- List and Range-List Partitioning
- Bitmap join index
- IOTs hash partitiong, parallel DML, bitmap
indexes - Datetime datatype
- Oracle10g
- Floating point datatype
- Global hash-partitioned indexes
- Datetime improvements
6Table Compression (Oracle9i, Release 2)
- Tables can be compressed
- Compression can also be specified at the
partition level - Indexes are not compressed
- Typical compression ratios range from 31 to 51
- Compression is dependent upon the actual data
- Compression algorithm based on removing data
redundancy - Key benefit is cost savings
- Save TBs of storage without compromising
performance or functionality - However, a secondary benefit is often performance
due to reduced IO utilization
7List and Composite Range-List Partitioning
- List partitioning allows a table to be
partitioned with a list of values - For example, a table can be partitioned by region
or by department - Composite Range-List enables logical
sub-partitioning for the most commonly used Range
partitioning - Further flexibility in how a DBA can manage large
data sets - Provide appropriate partitioning techniques for
all business requirements
8Composite Range-List Partitioning
Range partition across time
List partition across another major attribute
...
9Implementation and Usage Tips List and Composite
List-Range Partitioning
- Consider LIST (sub)partitioning when
- You have a column containing unordered values,
which correspond to a logical unit for data
maintenance and query access - Use a DEFAULT list partition when
- You may have unexpected values for the
partitioning key - You often add or modify values in your
partitioning key. - Migration of existing nonpartitioned and
partitioned tables - For online migration, use the dbms_redefinition
package - For offline creation, use
- CREATE TABLE AS SELECT
- INSERT / APPEND /
Performance Benefits
Manageability Benefits
10IEEE Floating Point
- New datatypes binary_float and binary_double
- Precise mapping to Java and other application
environments - Potential space reduction
- 4/8 bytes fixed vs. up to 21 bytes variable for
Oracle number - Increased range of values
- Binary doubles 11-bit exponent
- Performance improvement
- Native hardware vs. proprietary software for
calculations - Caveat binary numbers are subject to rounding
effects and are never suitable for data requiring
precision
11IEEE Floating Point
- Biggest potential benefit for BI Improved
performance for lengthy/complex arithmetical
expressions - Example query with 20X performance gains
- Second biggest potential benefit Space savings
for lengthy numeric types
select promotion_name, exp
(geo_mean_temp/count) as geometric_mean from (
select p.promo_name as promotion_name,
sum (ln (quantity_sold)) as geo_mean_temp
from sales s, promotions p where
p.promo_id s.promo_id and time_id
between to_date ('01-jan-1998', 'dd-mon-yyyy')
and to_date ('31-dec-1998','dd-mon-yyyy
') and quantity_sold gt 0
group by promo_name) order by geometric_mean
desc
12SQL Functions
13SQL Functions
- Oracle9i
- ANSI Joins
- Full outer joins
- CASE statement
- Grouping sets
- WITH clause
- Oracle10g
- Enhanced connect by
- Partition Outer Join
- Regular Expressions
- SQL Models
- Statistical functions
- Frequent Itemsets
14WITH clause
- Useful when a given query accesses the same
subquery multiple times - WITH channel_summary AS
- ( SELECT channels.channel_desc,
- SUM(amount_sold) AS channel_total
- FROM sales, channels
- WHERE sales.channel_id channels.channel_id
- GROUP BY channels.channel_desc )
- SELECT channel_desc, channel_total
- FROM channel_summary
- WHERE channel_total gt
- ( SELECT SUM(channel_total) 1/3 FROM
channel_summary)
15Partitioned Outer Join
- New outer join syntax enabling easy specification
and high performance for joins that "densify"
sparse data. - To specify comparison calculations and to format
reports reliably, best to return a consistent set
of dimension members in query results - Yet data normally stored in "sparse" form why
waste space storing non-occurrence? - Ugly and slow SQL needed to add back rows for
nonexistent cases into query output. - Most frequently used to replace missing values
along time dimension. - Accepted for ANSI SQL standard.
16Partitioned Outer Join - Basics
Inventory Tabletime_id product quant
Inventory table holds only changed values. But
for calculations reporting, we want rows for
the full set of dates.
1 April 2003 Bottle 10 6
April 2003 Bottle 8 1 April 2003
Can 15 4 April 2003
Can 11
1 April 2003 Bottle 10 2
April 2003 Bottle 3 April 2003
Bottle 4 April 2003 Bottle
5 April 2003 Bottle 6 April
2003 Bottle 8 1 April 2003
Can 15 2 April 2003 Can
3 April 2003 Can 4
April 2003 Can 11 5 April 2003
Can 6 April 2003 Can
SELECT times.time_id, product, quant FROM
inventory PARTITION BY (product) RIGHT
OUTER JOIN times ON (times.time_idinventory.ti
me_id)
- Similar to a regular outer join, except the
- outer join is applied to each partition.
17Partition Outer Join Repeating Values
- The last non-null values should be preserved for
subsequent records (typical inventory problem) - New analytical SQL keyword for LAST_VALUE()
1 April 2003 Bottle 10 2
April 2003 Bottle 10 3 April 2003
Bottle 10 4 April 2003 Bottle
10 5 April 2003 Bottle 10 6 April
2003 Bottle 8 1 April 2003
Can 15 2 April 2003 Can
15 3 April 2003 Can 15 4
April 2003 Can 11 5 April 2003
Can 11 6 April 2003 Can
11
SELECT time_id, product, LAST_VALUE (quant
IGNORE NULLS) OVER (PARTITION BY product
ORDER BY time_id) quant FROM ( SELECT
times.time_id, product, quant FROM inventory
PARTITION BY (product) RIGHT OUTER JOIN
times ON (times.time_idinventory.time_id) )
18Row Sources
19Row sources
- Oracle9i
- Sampling
- Index skip scans
- Oracle10g
- Table scan speed-up
- Inline Lob access speedup
20Index Skip Scan
- In Oracle8i, composite index used only if first
(prefix) column in the predicate - In Oracle9i, skip scan uses the composite index
that is far faster than a Full Table Scan - No need for another index
- Especially useful if the number of distinct
values of prefix column are relatively low
21Index Skip Scan
- Business Scenario Department of Motorized
Vehicles - A car is uniquely identified by State and
registration ID - Unique index on (STATE, REGISTRATION)
- Query Find the details of a registration ID when
the State is not known - Index skip scan allows composite index to be used
for this query - Can be many times faster than not using an index
- Before index skip scans,
- Bad performance because of lack of index
- Or, extra cost, maintenance to create index on
(registration)
22Optimizer
23Optimizer
- Oracle9i
- Dynamic Sampling
- Bind Peeking
- Index Joins
- Oracle10g
- Automatic SQL Tuning
24Bind peeking
- In the first invocation of a cursor containing
bind variables, the optimizer will peek at the
bind values and use those values to optimize the
query - The query plan will remain cached, and will be
re-used for future invocations - The bind variables in the first invocation should
thus be representative values
25Optimizer Dynamic SamplingOracle9i Rel 2
- Problem optimizer statistics may be missing or
known to be inaccurate - Solution statistics are dynamically gathered
during query optimization - Table predicate selectivity and cardinality
- Sampling is used to minimize the time required to
gather statistics - Statistics are only gathered for queries which
are expected to take a long time (relative to the
cost of gathering stats)
26Optimizer Dynamic Sampling
- Settings for OPTIMIZER_DYNAMIC_SAMPLING
parameter - 0 -- Off.
- 1 Used for multi-table queries for tables w/o
both statistics and indexes. Little overhead
since you will have to do a full scan anyway.
This is the default in 9iR2. - 2 -- Used for any unanalyzed object. This is the
default in 10g where we have automated stats
collection, but users may still have volatile
objects without stats. This is the default in 10g - 3 Used when the optimizer has to use a guess,
e.g., to_number(c1) gt 10. - 4 -- Used if correlations could be present, e.g.,
ANDed or ORed conditions on the same table.
27Diagnostics
28Diagnostics
- Oracle9i
- Query execution statistics
- Enhanced SQL trace information
- Enhanced explain plan output (DBMS_XPLAN)
- Oracle10g
- Automatic workload repository and automated
diagnosis with ADDM - Self-tuning SQL optimization
- Parallel Execution Enhancements No Slave SQL
29Query Execution Statistics
- Oracle9i introduces new dynamic views for a
deeper insight into SQL Execution - VSQL_PLAN_
- Execution plans of all cursors in the shared SQL
area - Cursor runtime statistics can be collected with
STATISTICS_LEVELALL - VSQL_WORKAREA_
- Detailed information about the memory usage for
all running SQL statements down to a row source
level - Activated when PGA_AGGREGATE_TARGET ltgt 0
30VSQL_PLAN
- VSQL_PLAN equivalent to PLAN_TABLE
- Shows actual used plan
- SQL shown to select plan is simplified
- In Oracle10g, you can use DBMS_XPLAN
SQLgt select / TRACK_ME / e.ename, d.dname
from scott.emp e, scott.dept d where
e.deptnod.deptno SQLgt select / NOT_ME / id,
operation, object_name, cost, bytes from
vsql_plan where hash_value (select
hash_value from vsql where sql_text like
'TRACK_ME'
and sql_text not like 'NOT_ME') order
by 1 ID OPERATION OBJECT_NAME
COST BYTES ---------- ----------------------
-------- --------------------- ----------
---------- 0 SELECT STATEMENT 5 1
HASH JOIN 4 588 2 TABLE ACCESS
DEPT 2 88 3 TABLE ACCESS
EMP 2 280
31VSQL_PLAN_STATISTICS_ALL
- VSQL_PLAN_STATISTICS shows actual cursor
execution statistics (overhead, not enabled by
default) - SQL shown to select plan is simplified
- In Oracle10g, you can use DBMS_XPLAN
SQLgt alter session set statistics_levelALL SQLgt
select / TRACK_ME / e.ename, d.dname from
scott.emp e, scott.dept d where
e.deptnod.deptno SQLgt select / NOT_ME / id,
operation, object_name, last_output_rows "ROWS",
last_cr_buffer_gets CR", last_disk_reads
PR", last_elapsed_time "TIME us" from
vsql_plan_statistics_all where hash_value
(select hash_value from vsql where sql_text
like 'TRACK_ME' and sql_text not like
'NOT_ME') order by 1 ID OPERATION
OBJECT_NAME ROWS CR PR TIME us --
------------ ----------------------- ---------
--------- -------- ---------- 1 HASH JOIN
14 7 0 2685 2 TABLE ACCESS DEPT
4 3 0 377 3 TABLE ACCESS EMP 14
4 0 426
32Enhanced SQL Trace
- By Default, SQL trace includes some runtime
statistics for a SQL statement
PARSING IN CURSOR 11 select e.ename, d.dname
from emp e, dept d where e.deptnod.deptno END OF
STMT PARSE 11c20000,e13838,p0,cr8,cu0,mis1
,r0,dep0,og1,tim6389414660329 EXEC
11c0,e102,p0,cr0,cu0,mis0,r0,dep0,og1,t
im6389414660694 FETCH 11c0,e2132,p0,cr6,cu
0,mis0,r1,dep0,og1,tim6389414663127 FETCH
11c0,e1329,p0,cr1,cu0,mis0,r13,dep0,og1
,tim6389414665803 XCTEND rlbk0, rd_only1 STAT
11 id1 cnt14 pid0 op'HASH JOIN (cr7 pr0
pw0 time3357 us)' STAT 11 id2 cnt4 pid1
op'TABLE ACCESS FULL DEPT (cr3 pr0 pw0
time525 us)' STAT 11 id3 cnt14 pid1
op'TABLE ACCESS FULL EMP (cr4 pr0 pw0
time380 us)'
33Workload Repository
- Data Warehouse of the Database
- Facility to collect, process, and maintain
important RDBMS statistics and workload SQL
workload, segment statistics, time wait
statistics, metrics, feature usage - Efficiently sample and compute statistics in
memory - Periodically flush coarser-grain information to
disk - in a self-managed tablespace - Information readily available real-time
accessible when needed - On by default - flush to disk every 30 min, keep
for 7 days
34Automatic Database Diagnostic Monitor (ADDM)
- Performance expert in a box
- Provides database-wide performance diagnostic,
including RAC - Automatically runs at every capture of workload
repository snapshot - Provides impact and benefit analysis
- Provides Information vs. raw data
- Integrates all components using the Time Model
Application SQL Management
Storage Management
System Resource Management
Space Management
Backup Recovery Management
Database Management
Intelligent Infrastructure
35Self-Tuning SQL OptimizationOracle10g
- Internally analyze most expensive queries, as
determined by Workload Repository - Use sampling, including join sampling, to
determine cardinalities of intermediate results - When optimizer errors are found, store corrective
information - Apply corrections next time query is executed
- Related techniques for truly ad hoc queries
36Oracle10g - No Slave SQL
- In Oracle9i, parallel execution plans were
complex - Difficult to read/understand due to multiple
cursors - Difficult to analyze statement-level performance
information
SQLgt explain plan for select /parallel(d)
parallel(e) / dname, ename from emp e,
dept d where e.deptnod.deptno -----------------
-------------------------------------------------
---------------- Id Operation
Name TQ IN-OUT PQ Distribution
-------------------------------------------------
---------------------------------- 0 SELECT
STATEMENT
1 HASH JOIN
88,01 P-gtS QC (RAND)
2 TABLE ACCESS FULL EMP
88,01 PCWP 3 TABLE
ACCESS FULL DEPT 88,00 P-gtP
BROADCAST --------------------------------
--------------------------------------------------
- PX Slave SQL Information (identified by
operation id) -----------------------------------
------------------- 1 - SELECT / ORDERED
NO_EXPAND USE_HASH(A2) / A1.C1,A2.C1
FROM (SELECT / NO_EXPAND ROWID(A3) /
A3."DEPTNO" C0,A3."ENAME" C1 FROM
"EMP" PX_GRANULE(0, BLOCK_RANGE, DYNAMIC) A3
) A1,Q288000 A2 WHERE A1.C0A2.C0 3 -
SELECT / NO_EXPAND ROWID(A1) / A1."DEPTNO"
C0,A1."DNAME" C1 FROM "DEPT"
PX_GRANULE(0, BLOCK_RANGE, DYNAMIC) A1
37Oracle10g - No Slave SQL
- Oracle Database 10g single execution plan,
single cursor
SQLgt explain plan for select /parallel(d)
parallel(e) / dname, ename from emp e,
dept d where e.deptnod.deptno ----------------
------------------------------..
-------------------------------- Id
Operation Name TQ
IN-OUT PQ Distrib --------------------------
---------------------..---------------------------
----- 0 SELECT STATEMENT
1 PX
COORDINATOR
2 PX SEND QC (RANDOM)
TQ10001 Q1,01 P-gtS QC (RAND)
3 HASH JOIN
Q1,01 PCWP 4 PX
BLOCK ITERATOR Q1,01
PCWC 5 TABLE ACCESS FULL
EMP Q1,01 PCWP
6 BUFFER SORT
Q1,01 PCWC 7 PX
RECEIVE Q1,01 PCWP
8 PX SEND BROADCAST
TQ10000 Q1,00 P-gtP BROADCAST
9 PX BLOCK ITERATOR
Q1,00 PCWC 10 TABLE
ACCESS FULL DEPT Q1,00 PCWP
-----------------------------------------
-----..---------------------------------
38(No Transcript)
39Statistical Analysis
- Descriptive Statistics average, standard
deviation, variance, min, max, median (via
percentile_count), mode, group-by roll-up - Correlations Pearsons correlation coefficients,
Spearman's and Kendall's (both nonparametric). - Cross Tabs Enhanced with statistics chi
squared, phi coefficient, Cramer's V, contingency
coefficient, Cohen's kappa - Hypothesis Testing t-test , F-test, ANOVA,
Chi-square, Mann Whitney, Kolmogorov-Smirnov,
Wilcoxon signed ranks - Distribution Fitting normal, uniform, Poisson,
exponential, Weibull
New Oracle10g features are in yellow
40Oracle10g - Frequent Itemsets
- Built to support Data Mining
- Often used in market-basket analysis
- Finds the most common combinations of items which
are purchased together - Show the hundred most common pairs of items which
are purchased together. - Show all combinations of 3 or more products which
were purchased together in the same transaction
more than 1000 times and in which one of the
items was a six-pack of soft drinks.
41Frequent Itemsets
- Without any optimizations, frequent itemsets must
be computed by issuing thousands of SQL
statements (e.g. count the first pair of items,
then count the next pair of times, ) - With optimization, Oracle10g uses advanced
algorithms to count multiple itemsets more
efficiently - The build time for Oracle Data Minings
Association Rules algorithm measured at4-12X
faster, depending on workload
42Frequent Itemsets
- Example Query and Results
select cast(itemset as fi_char) itemset,
support, length, total_tranx from
table(dbms_frequent_itemset.fi_transactional
(cursor(select session_id, command
from web_log where
(command like 'pdf' or command like 'zip' or
command like 'PDF' or command like
'ZIP')), (60/2600), 2, 2, null,
null)) ITEMSET ---------------------------------
---------------------------------- SUPPORT
LENGTH RNK ---------- ----------
---------- FI_CHAR('GET twp_data_compression.pdf',
'GET vldw_winter.pdf') 115 2
1 FI_CHAR('GET 9idwperf_flow_twp.pdf', 'GET
9idwperf_sizes_twp.pdf') 109 2
2 FI_CHAR('GET 9imv_twp.pdf', 'GET
twp_query_optimization.pdf') 107
2 3
43Inverse Percentile FunctionsFind value
corresponding to a given percentile
Example Find the median home price within each
city
SELECT city, avg(price) avg_p,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY
price) median_p FROM homes GROUP BY city
city price SF 500,000 SF
700,000 SF 1,000,000 NYC 300,000 NYC
600,000 NYC 900,000
44Hypothetical Rank Distribution functions Find
rank/distribution value of specified number
Example Find hypothetical rank for a house with
a price of 400,000, by Area
SELECT Area, RANK(400000) WITHIN GROUP (ORDER
BY Price DESC) hypo_rank FROM Homes GROUP BY AREA
45First/Last Aggregate Functions
Example Get beginning and closing balances per
year
SELECT year MIN(balance) KEEP (DENSE_RANK
FIRST ORDER BY month) beg_bal, MIN(balance)
KEEP (DENSE_RANK LAST ORDER BY month)
end_bal FROM ledger GROUP BY year
46Scenario for Aggregation
- Star Schema with two dimensions Product Time
How do we get to the logical cube?
47Simple Group By
SELECT Year, Division, Sum(Sales) FROM Sales,
Items, Time WHERE ltjoin-conditiongt GROUP BY Year,
Division
Produces subtotal
Creates a single group
48Group By Rollup
SELECT Year,Quarter,Month, Sum(Sales) FROM
Sales, Items, Time WHERE ltjoin-conditiongt GROUP
BY ROLLUP (Year, Quarter, Month)
All Div.
Division
Brand
Item
Year
Produces subtotals
Month
Quarter
All Years
Creates groups at multiple levels
49Group By Cube
All Div.
SELECT Division, Year, Sum(Sales) FROM Sales,
Items, Time WHERE ltjoin-conditiongt GROUP BY
CUBE (Year, Division)
Division
Brand
Item
Year
Month
Quarter
All Years
Produces subtotals
Creates all possible combinations
50Group By Grouping Sets
All Div.
SELECT Division, Brand, Year, Quarter,
Sum(Sales) FROM Sales, Items, Time WHERE
ltjoin-conditiongt GROUP BY GROUPING SETS
((Quarter, Division), (Year, Brand))
Division
Brand
Item
Year
Month
Quarter
All Years
Produces subtotals
Creates just the specified groups
51Concatenated Groups
SELECT Division, Brand, Month, Sum(Sales) FROM
Sales, Items, Time WHERE ltjoin-conditiongt GROUP
BY Month, (G1) GROUPING SETS
(Division, Brand) (G2)
All Div.
Division
Brand
Item
Year
Month
Quarter
All Years
G1 x G2
G1 produces
G2 produces
Month
Division Brand