1. Are Your Statistics Bad Enough?
Verify the effectiveness of gathering optimizer statistics
Jaromir D.B. Nemec, UKOUG 2005, 27.10.2005
2. Gathering Statistics
- The way from ANALYZE TABLE to DBMS_STATS
- Increasing number of parameters and options
- Changing defaults
- The CBO gets more intelligent
- A small change can have a big effect
- ... and all that effort only to get some redundant information
3. Complexity Trap
Along the chain of dependent objects the quantity increases.
4. Are My Statistics Adequate?
- Way 1 - observe (they are not, if you encounter problems); this is a reactive approach
- Way 2 - test them (define test cases and check them); define an invariant - the invariant must respect the dynamics of the database!
5. Basic Scenario
- Motto: fix the problem at its origin
- Limit the scope to basic table access only (full scan, index access)
- This reduces the complexity significantly and can expose the real origin of problems in the execution plan
6. Simple Example
select * from ta, tb
where ta.rand = tb.rand
  and ta.norm = 141 and ta.rand < 770
  and (tb.norm = 1 or tb.norm = -1)

----------------------------------------------------------
| Id | Operation                     | Name     | Rows    |
----------------------------------------------------------
|  0 | SELECT STATEMENT              |          |   65073 |
|  1 |  HASH JOIN                    |          |   65073 |
|  2 |   TABLE ACCESS BY INDEX ROWID | TA       |    2771 |
|  3 |    INDEX RANGE SCAN           | TA_INDEX |    3598 |
|  4 |   TABLE ACCESS FULL           | TB       |   23580 |
----------------------------------------------------------

Predicate Information (identified by operation id):
   1 - access("TA"."RAND"="TB"."RAND")
   2 - filter("TA"."RAND"<770)
   3 - access("TA"."NORM"=141)
   4 - filter(("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<770)
7. Verify It! Re-estimate
- For each table access:
  - get owner, object name and alias
  - option FULL: get the filter predicate
  - option (b-tree) INDEX: get the access and filter predicates of the table line and the index line
  - enclose each predicate in brackets and connect them with AND
  - form a re-estimation query
  - execute the query
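The steps above can be sketched in a few lines. This is a minimal illustration, not the author's tool: the function name is ours, the owner "SCOTT" is a made-up placeholder, and a real implementation would read the predicates from PLAN_TABLE or V$SQL_PLAN.

```python
# Sketch: build a re-estimation query from the predicates of one
# table access in an execution plan. The input format is assumed.

def build_reestimation_query(owner, table, predicates):
    """Enclose each predicate in brackets, connect them with AND,
    and wrap the result in a count query on the base table."""
    where = " and ".join(f"({p})" for p in predicates)
    return f'select count(*) from "{owner}"."{table}" where {where}'

# Predicate taken from the FULL scan of TB in the example plan;
# "SCOTT" is a hypothetical owner:
q = build_reestimation_query(
    "SCOTT",
    "TB",
    ['("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<770'],
)
print(q)
```

Executing the generated query returns the real cardinality of that plan step.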
8. Re-estimation - How Does It Work?
Input from the execution plan:
|  2 |   TABLE ACCESS BY INDEX ROWID | TA       |  2771 |
|  3 |    INDEX RANGE SCAN           | TA_INDEX |  3598 |
|  4 |   TABLE ACCESS FULL           | TB       | 23580 |
   2 - filter("TA"."RAND"<770)
   3 - access("TA"."NORM"=141)
   4 - filter(("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<770)

Generated re-estimation queries:
select count(*) from "TB" where (("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<770)
-- gives 60317 rows
select count(*) from "TA" where ("TA"."NORM"=141) and ("TA"."RAND"<770)
-- gives 1 row

Run the re-estimation query to get the real cardinality and compare it with the original estimation.
9. Re-estimation Summary
Run the re-estimation query to get the real cardinality and compare it with the original estimation.
Note that the TA access cardinality is overestimated, while the TB access cardinality is underestimated.
10. Re-estimation - Basic Idea
- The simplification is based on the assumption that all problems in the execution plan are caused, directly or indirectly, by the cardinality assumption in the base table access.
- Examples:
- Cardinality underestimation can lead to a NL join with a big inner table
- Cardinality overestimation can switch a NL join to a SORT MERGE join
- An improper assumption of cardinality 1 can lead to a CARTESIAN join
11. Access and Filter Predicates
Access predicate: controls the index access
Filter predicate: additional filter condition on the index or the table
Availability of predicates:
12. Hypothesis
- By selecting a limited set of objects and corresponding predicates we can try to answer some fundamental questions:
- Are my statistics precise enough?
- Are my statistics up-to-date?
- Does a histogram help?
- Is the sample size OK?
- Is dynamic sampling relevant?
- Is the partition design suited for the application?
- Ready for migration?
13. Are My Statistics Precise Enough?
Comparing the estimated and re-estimated cardinality we obtain the relative and absolute difference.

Predicate: where (("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<770)
ACCE   EST_CARD  RE_EST_CARD   ABS_DIFF   REL_DIFF
----  ---------  -----------  ---------  ---------
same      23580        60317      36737         61

Predicate: where ("TA"."NORM"=141) and ("TA"."RAND"<770)
ACCE   EST_CARD  RE_EST_CARD   ABS_DIFF   REL_DIFF
----  ---------  -----------  ---------  ---------
same       2771            1      -2770    -277000
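The difference columns can be reproduced with a few lines; the formulas are inferred from the figures shown (ABS_DIFF = RE_EST_CARD - EST_CARD, REL_DIFF = 100 * ABS_DIFF / RE_EST_CARD), and the helper name is ours:

```python
# Reproduce the ABS_DIFF and REL_DIFF columns of the comparison
# table; the formulas are inferred from the slide's numbers.

def diff_columns(est_card, re_est_card):
    abs_diff = re_est_card - est_card
    rel_diff = round(100 * abs_diff / re_est_card)  # in percent
    return abs_diff, rel_diff

print(diff_columns(23580, 60317))  # TB access: → (36737, 61)
print(diff_columns(2771, 1))       # TA access: → (-2770, -277000)
```

A large positive REL_DIFF flags underestimation, a large negative one overestimation.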
14. Are My Statistics Up-to-date?
By observing the historical view we obtain the dynamic aspect of predicate selectivity.
15. Processing
The two basic steps are extracting and processing the predicate information of execution plans.
[Diagram: example with operative and administrative tasks]
16. Simple Model
17. Does a Histogram Help?
- Histogram candidates:
- the column is used in a predicate
- bad estimation
- access with different values
- the real cardinality spans a wide range
Test the influence of the histogram size on the precision of the selectivity estimation using re-estimation queries.
18. Histogram / 2
Target precision:
Without a histogram (SIZE 1) the cardinality is underestimated (only an average is used). If the histogram size is higher than the number of distinct values, the estimation is very precise. For columns with a high number of distinct values the estimation remains imprecise.
19. Sample Size
- Re-estimate a predicate on statistics gathered with different estimate_percent parameters
- Observe the influence of the sample size
- Possible results:
- a relatively small percentage returns good results
- increasing the estimate percent reduces the error
- the sample size has only a marginal effect on the precision
- Q: is there one single optimal sample size for an object?
20. Re-estimation and Dynamic Sampling
- Add a dynamic sampling hint to the re-estimation query
- Trace it with the 10053 event

select /*+ dynamic_sampling(tc,2) */ *
from tc where norm300 = 0

Generated DS query:
SELECT /* some hints */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM
  (SELECT /*+ FULL("TC") some other hints */ 1 AS C1,
   CASE WHEN "TC"."NORM300"=0 THEN 1 ELSE 0 END AS C2
   FROM "TC" SAMPLE BLOCK (1.076923 , 1) SEED (1) "TC"
  ) SAMPLESUB

- SUM(C1): total rows in the sample
- SUM(C2): rows satisfying the predicate
- The sample percentage in the SAMPLE BLOCK clause is controlled by the DS level
21. Re-estimation and Dynamic Sampling / 2
Results of the DS query (level 2):
  sample pct.                     50.806452    (DS blocks / table blocks)
  actual sample size              54147        (total rows in the sample)
  filtered sample card.           6018         (rows satisfying the predicate)
  orig. card.                     111358
  single table dynamic sel. est.  0.11114189

Computed selectivity = filtered / actual sample rows (rows satisfying the predicate / total rows).
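The selectivity in the trace follows directly from the two sample counts; only the division is Oracle's, the function name is ours:

```python
# Single-table dynamic selectivity estimate:
# rows satisfying the predicate in the sample / total rows sampled.
def ds_selectivity(filtered_sample_card, actual_sample_size):
    return filtered_sample_card / actual_sample_size

sel = ds_selectivity(6018, 54147)
print(round(sel, 8))  # → 0.11114189, matching the 10053 trace
```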
22. Re-estimation and Dynamic Sampling / 3
- Dynamic sampling has a neat positive side effect
- Level 10 performs exactly the re-estimation
- There is no need to execute the re-estimation query; parsing with DS level 10 returns the exact result
- Lower levels of DS are relatively cheap and can give some hints
23. Dynamic Sampling Levels
Dynamic sampling levels are absolute block counts (except for level 10).
(Block size 8KB)
Even for a moderately sized table the resulting estimate percent of DS could be too low. DS uses block sampling only.
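This is why a fixed block count can be too low: the sampled percentage shrinks as the table grows. A small sketch; the 64-block figure for level 2 is the documented 10g default, but treat it as an assumption here:

```python
# Percentage of a table covered by a fixed-size block sample.
# 64 blocks is the assumed default sample for DS level 2.
def sample_percent(sample_blocks, table_blocks):
    return 100.0 * sample_blocks / table_blocks

# With an 8KB block size, a 1 GB table has ~131072 blocks:
print(round(sample_percent(64, 131072), 4))  # → 0.0488 (percent)
```

At that coverage a skewed predicate can easily be missed entirely by the sample.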
24. Re-estimation - Final Approach
- For a selected predicate:
- build the re-estimation query
- parse it with different dynamic sampling hints (e.g. 0, 2, 5 and 10)
- peek into the PLAN_TABLE and get the cardinality and access option

Predicate: where part_key = 3 and GROW_SHIF > 0 and shif = 0
ACCE   EST_CARD  EST_DS2_CARD  EST_DS5_CARD  RE_EST_CARD       DIFF  DIFF_PREC
----  ---------  ------------  ------------  -----------  ---------  ---------
same       7982            94           116           89      -7893      -8869

Predicate: where part_key = 2 and GROW = 100
ACCE   EST_CARD  EST_DS2_CARD  EST_DS5_CARD  RE_EST_CARD       DIFF  DIFF_PREC
----  ---------  ------------  ------------  -----------  ---------  ---------
same       4281            13             2            0      -4281    -428100
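The parse-with-different-hints step can be scripted by generating one EXPLAIN PLAN statement per DS level. A sketch under assumptions: the statement ids and the sample query text are illustrative, and a real tool would afterwards read CARDINALITY from PLAN_TABLE.

```python
# Sketch: produce EXPLAIN PLAN statements for one re-estimation
# query under different dynamic sampling levels.
def hinted_variants(query_body, alias, levels=(0, 2, 5, 10)):
    return [
        f"explain plan set statement_id = 'DS{lvl}' for "
        f"select /*+ dynamic_sampling({alias},{lvl}) */ count(*) {query_body}"
        for lvl in levels
    ]

stmts = hinted_variants('from "TA" where ("TA"."NORM"=141)', "TA")
print(stmts[1])  # the level-2 variant
```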
25. Estimation at Lower Levels of Dynamic Sampling
- Two extreme cases can be observed while re-estimating at lower levels of DS:
- dynamic sampling is extremely precise even at lower levels
- dynamic sampling doesn't work until a high level
[Chart: estimated vs. real cardinality over DS levels 0 to 10]
26. Evolution of Dynamic Sampling
Consider that dynamic sampling itself is subject to evolution. Dynamic sampling on a predicate with an index:
- 9i rel. 2: FULL TABLE scan with SAMPLE clause
- 10g rel. 1: INDEX ACCESS with ROWNUM < 2500
- 10g rel. 2: FULL TABLE scan with SAMPLE clause
27. Re-estimation and Partitioning
- Constant pstart and pstop values
- Add predicates for range and list partitions / subpartitions

---------------------------------------------------
| Operation                | Name | Pstart | Pstop |
---------------------------------------------------
| SELECT STATEMENT         |      |        |       |
|  PARTITION RANGE SINGLE  |      |      2 |     2 |
|   PARTITION LIST SINGLE  |      |      3 |     3 |
|    TABLE ACCESS FULL     | T2S  |      6 |     6 |
---------------------------------------------------

- Verify the granularity parameter
- Verify partition pruning
28. Predicates in Parallel Plans
No additional processing is required to generate re-estimation queries for parallel execution plans.

--------------------------------------------------------
| Id | Operation                | Name     | Rows       |
--------------------------------------------------------
|  0 | SELECT STATEMENT         |          |      82647 |
|  1 |  PX COORDINATOR          |          |            |
|  2 |   PX SEND QC (RANDOM)    | :TQ10001 |      82647 |
|  3 |    HASH JOIN             |          |      82647 |
|  4 |     PX BLOCK ITERATOR    |          |       3519 |
|  5 |      TABLE ACCESS FULL   | TA       |       3519 |
|  6 |     BUFFER SORT          |          |            |
|  7 |      PX RECEIVE          |          |      23580 |
|  8 |       PX SEND BROADCAST  | :TQ10000 |      23580 |
|  9 |        TABLE ACCESS FULL | TB       |      23580 |
--------------------------------------------------------
29. Bitmap Index Access
Generating re-estimation queries for bitmap index access must respect the access logic.

|  1 | TABLE ACCESS BY INDEX ROWID  | T1      | 13 |
|  2 |  BITMAP CONVERSION TO ROWIDS |         |    |
|  3 |   BITMAP AND                 |         |    |
|  4 |    BITMAP INDEX SINGLE VALUE | IND2_T1 |    |
|  5 |    BITMAP INDEX SINGLE VALUE | IND1_T1 |    |

Predicate Information (identified by operation id):
   4 - access("T1"."GROUP2_ID"=28)
   5 - access("T1"."STATUS"='X')

Generated predicate: ("T1"."GROUP2_ID"=28) AND ("T1"."STATUS"='X')
Note: the bitmap index cardinality is missing in the execution plan.
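The access logic can be mirrored when combining the branch predicates: children of a BITMAP AND connect with AND (a BITMAP OR, not shown on the slide, would connect with OR). A sketch with an illustrative function name:

```python
# Combine the access predicates of bitmap index branches according
# to the bitmap operation (BITMAP AND / BITMAP OR) in the plan.
def combine_bitmap_predicates(op, predicates):
    joiner = " AND " if op == "BITMAP AND" else " OR "
    return joiner.join(f"({p})" for p in predicates)

pred = combine_bitmap_predicates(
    "BITMAP AND",
    ['"T1"."GROUP2_ID"=28', '"T1"."STATUS"=\'X\''],
)
print(pred)  # → ("T1"."GROUP2_ID"=28) AND ("T1"."STATUS"='X')
```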
30. Ready for Upgrade or Crash?
How can the behavior of an application in a migrated environment be predicted? A sophisticated but expensive method is to compare execution plans. The re-estimation method can be used as a lightweight early-warning system.
31. Some Limitations
- Limited usage on predicates with bind variables and parse-time partition pruning
- Predicates stored in ACCESS_PREDICATES and FILTER_PREDICATES are not always complete
- A single predicate is limited to a length of 4000 bytes
Further work:
- A weighted function for the result of re-estimation
- Relevance to SQL profiles
32. Summary
- A method to verify the status of database statistics
- Introduction of predicate mining
- A "green bar" for database statistics
- Dynamic sampling may be used as a verification method
33. References
- Christian Antognini, "CBO - A Configuration Roadmap", Hotsos Symposium 2005
- Wolfgang Breitling, "Using DBMS_STATS in Access Path Optimization", UKOUG 2004
- Jonathan Lewis, "Strategies for Statistics", UKOUG 2004
- Metalink 72539.1, "Interpreting Histogram Information"
- Metalink 114671.1, "Gathering Statistics for the Cost Based Optimizer"
- Metalink 236935.1, "Global Statistics - An Explanation"