1. Are Your Statistics Bad Enough?
Verify the effectiveness of gathering optimizer statistics
Jaromir D.B. Nemec, UKOUG 2005, 27.10.2005
2. Gathering Statistics
- The way from ANALYZE TABLE to DBMS_STATS
- Increasing number of parameters and options
- Changing defaults
- The CBO gets more intelligent
- A small change can have a big effect
- ... and all that effort only to get some redundant information
3. Complexity Trap
Along the chain of dependent objects the quantity increases.
4. Are My Statistics Adequate?
- Way 1 - observe (they are not, if you encounter problems); this is a reactive approach
- Way 2 - test them (define test cases and check them); define an invariant - the invariant must respect the dynamics of the database!
5. Basic Scenario
- Motto: fix the problem at its origin
- Limit the scope to basic table access only (full scan, index access)
- This reduces the complexity significantly and can expose the real origin of problems in the execution plan
6. Simple Example
select * from ta, tb
where ta.rand = tb.rand
  and ta.norm = 141 and ta.rand < 770
  and (tb.norm = 1 or tb.norm = -1)

----------------------------------------------------------
| Id | Operation                     | Name     | Rows    |
----------------------------------------------------------
|  0 | SELECT STATEMENT              |          |   65073 |
|  1 |  HASH JOIN                    |          |   65073 |
|  2 |   TABLE ACCESS BY INDEX ROWID | TA       |    2771 |
|  3 |    INDEX RANGE SCAN           | TA_INDEX |    3598 |
|  4 |   TABLE ACCESS FULL           | TB       |   23580 |
----------------------------------------------------------

Predicate Information (identified by operation id):
   1 - access("TA"."RAND"="TB"."RAND")
   2 - filter("TA"."RAND"<770)
   3 - access("TA"."NORM"=141)
   4 - filter(("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<770)
7. Verify It! Re-estimate
- For each table access:
  - get owner, object name and alias
  - option FULL: get the filter predicate
  - option (b-tree) INDEX: get the access and filter predicates of the table line and the index line
  - enclose each predicate in brackets and connect them with AND
  - form a re-estimation query
  - execute the query
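The steps above can be sketched in a few lines. This is a minimal illustration, not the author's tool: the function name is ours, the owner "SCOTT" is a made-up placeholder, and a real implementation would read the predicates from PLAN_TABLE or V$SQL_PLAN.

```python
# Sketch: build a re-estimation query from the predicates of one
# table access in an execution plan. The input format is assumed.

def build_reestimation_query(owner, table, predicates):
    """Enclose each predicate in brackets, connect them with AND,
    and wrap the result in a count query on the base table."""
    where = " and ".join(f"({p})" for p in predicates)
    return f'select count(*) from "{owner}"."{table}" where {where}'

# Predicate taken from the FULL scan of TB in the example plan;
# "SCOTT" is a hypothetical owner:
q = build_reestimation_query(
    "SCOTT",
    "TB",
    ['("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<770'],
)
print(q)
```

Executing the generated query returns the real cardinality of that plan step.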
8. Re-estimation - How Does It Work?
Input from the execution plan:
|  2 |   TABLE ACCESS BY INDEX ROWID | TA       |  2771 |
|  3 |    INDEX RANGE SCAN           | TA_INDEX |  3598 |
|  4 |   TABLE ACCESS FULL           | TB       | 23580 |
   2 - filter("TA"."RAND"<770)
   3 - access("TA"."NORM"=141)
   4 - filter(("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<770)

Generated re-estimation queries:
select count(*) from "TB" where (("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<770)
-- gives 60317 rows
select count(*) from "TA" where ("TA"."NORM"=141) and ("TA"."RAND"<770)
-- gives 1 row

Run the re-estimation query to get the real cardinality and compare it with the original estimation.
9. Re-estimation Summary
Run the re-estimation query to get the real cardinality and compare it with the original estimation.
Note that the TA access cardinality is overestimated, while the TB access cardinality is underestimated.
10. Re-estimation - Basic Idea
- The simplification is based on the assumption that all problems in the execution plan are caused, directly or indirectly, by the cardinality assumption in the base table access.
- Examples:
- Cardinality underestimation can lead to a NL join with a big inner table
- Cardinality overestimation can switch a NL join to a SORT MERGE join
- An improper assumption of cardinality 1 can lead to a CARTESIAN join
11. Access and Filter Predicates
Access predicate: controls the index access
Filter predicate: additional filter condition on the index or the table
Availability of predicates:
12. Hypothesis
- By selecting a limited set of objects and corresponding predicates we can try to answer some fundamental questions:
- Are my statistics precise enough?
- Are my statistics up-to-date?
- Does a histogram help?
- Is the sample size OK?
- Is dynamic sampling relevant?
- Is the partition design suited for the application?
- Ready for migration?
13. Are My Statistics Precise Enough?
Comparing the estimated and re-estimated cardinality we obtain the relative and absolute difference.

Predicate: where (("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<770)
ACCE   EST_CARD  RE_EST_CARD   ABS_DIFF   REL_DIFF
----  ---------  -----------  ---------  ---------
same      23580        60317      36737         61

Predicate: where ("TA"."NORM"=141) and ("TA"."RAND"<770)
ACCE   EST_CARD  RE_EST_CARD   ABS_DIFF   REL_DIFF
----  ---------  -----------  ---------  ---------
same       2771            1      -2770    -277000
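The difference columns can be reproduced with a few lines; the formulas are inferred from the figures shown (ABS_DIFF = RE_EST_CARD - EST_CARD, REL_DIFF = 100 * ABS_DIFF / RE_EST_CARD), and the helper name is ours:

```python
# Reproduce the ABS_DIFF and REL_DIFF columns of the comparison
# table; the formulas are inferred from the slide's numbers.

def diff_columns(est_card, re_est_card):
    abs_diff = re_est_card - est_card
    rel_diff = round(100 * abs_diff / re_est_card)  # in percent
    return abs_diff, rel_diff

print(diff_columns(23580, 60317))  # TB access: → (36737, 61)
print(diff_columns(2771, 1))       # TA access: → (-2770, -277000)
```

A large positive REL_DIFF flags underestimation, a large negative one overestimation.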
14. Are My Statistics Up-to-date?
By observing the historical view we obtain the dynamic aspect of predicate selectivity.
15. Processing
The two basic steps are extracting and processing the predicate information of execution plans.
[Diagram: example with operative and administrative tasks]
16. Simple Model
17. Does a Histogram Help?
- Histogram candidates:
- the column is used in a predicate
- bad estimation
- access with different values
- the real cardinality spans a wide range
Test the influence of the histogram size on the precision of the selectivity estimation using re-estimation queries.
18. Histogram / 2
Target precision:
Without a histogram (SIZE 1) the cardinality is underestimated (only an average is used). If the histogram size is higher than the number of distinct values, the estimation is very precise. For columns with a high number of distinct values the estimation remains imprecise.
19. Sample Size
- Re-estimate a predicate on statistics gathered with different estimate_percent parameters
- Observe the influence of the sample size
- Possible results:
- a relatively small percentage returns good results
- increasing the estimate percent reduces the error
- the sample size has only a marginal effect on the precision
- Q: is there one single optimal sample size for an object?
20. Re-estimation and Dynamic Sampling
- Add a dynamic sampling hint to the re-estimation query
- Trace it with the 10053 event

select /*+ dynamic_sampling(tc,2) */ *
from tc where norm300 = 0

Generated DS query:
SELECT /* some hints */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM
  (SELECT /*+ FULL("TC") some other hints */ 1 AS C1,
   CASE WHEN "TC"."NORM300"=0 THEN 1 ELSE 0 END AS C2
   FROM "TC" SAMPLE BLOCK (1.076923 , 1) SEED (1) "TC"
  ) SAMPLESUB

- SUM(C1): total rows in the sample
- SUM(C2): rows satisfying the predicate
- The sample percentage in the SAMPLE BLOCK clause is controlled by the DS level
21. Re-estimation and Dynamic Sampling / 2
Results of the DS query (level 2):
  sample pct.                     50.806452    (DS blocks / table blocks)
  actual sample size              54147        (total rows in the sample)
  filtered sample card.           6018         (rows satisfying the predicate)
  orig. card.                     111358
  single table dynamic sel. est.  0.11114189

Computed selectivity = filtered / actual sample rows (rows satisfying the predicate / total rows).
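The selectivity in the trace follows directly from the two sample counts; only the division is Oracle's, the function name is ours:

```python
# Single-table dynamic selectivity estimate:
# rows satisfying the predicate in the sample / total rows sampled.
def ds_selectivity(filtered_sample_card, actual_sample_size):
    return filtered_sample_card / actual_sample_size

sel = ds_selectivity(6018, 54147)
print(round(sel, 8))  # → 0.11114189, matching the 10053 trace
```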
22. Re-estimation and Dynamic Sampling / 3
- Dynamic sampling has a neat positive side effect
- Level 10 performs exactly the re-estimation
- There is no need to execute the re-estimation query; parsing with DS level 10 returns the exact result
- Lower levels of DS are relatively cheap and can give some hints
23. Dynamic Sampling Levels
Dynamic sampling levels are absolute block counts (except for level 10).
(Block size 8KB)
Even for a moderately sized table the resulting estimate percent of DS could be too low. DS uses block sampling only.
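This is why a fixed block count can be too low: the sampled percentage shrinks as the table grows. A small sketch; the 64-block figure for level 2 is the documented 10g default, but treat it as an assumption here:

```python
# Percentage of a table covered by a fixed-size block sample.
# 64 blocks is the assumed default sample for DS level 2.
def sample_percent(sample_blocks, table_blocks):
    return 100.0 * sample_blocks / table_blocks

# With an 8KB block size, a 1 GB table has ~131072 blocks:
print(round(sample_percent(64, 131072), 4))  # → 0.0488 (percent)
```

At that coverage a skewed predicate can easily be missed entirely by the sample.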
24. Re-estimation - Final Approach
- For a selected predicate:
- build the re-estimation query
- parse it with different dynamic sampling hints (e.g. 0, 2, 5 and 10)
- peek into the PLAN_TABLE and get the cardinality and access option

Predicate: where part_key = 3 and GROW_SHIF > 0 and shif = 0
ACCE   EST_CARD  EST_DS2_CARD  EST_DS5_CARD  RE_EST_CARD       DIFF  DIFF_PREC
----  ---------  ------------  ------------  -----------  ---------  ---------
same       7982            94           116           89      -7893      -8869

Predicate: where part_key = 2 and GROW = 100
ACCE   EST_CARD  EST_DS2_CARD  EST_DS5_CARD  RE_EST_CARD       DIFF  DIFF_PREC
----  ---------  ------------  ------------  -----------  ---------  ---------
same       4281            13             2            0      -4281    -428100
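The parse-with-different-hints step can be scripted by generating one EXPLAIN PLAN statement per DS level. A sketch under assumptions: the statement ids and the sample query text are illustrative, and a real tool would afterwards read CARDINALITY from PLAN_TABLE.

```python
# Sketch: produce EXPLAIN PLAN statements for one re-estimation
# query under different dynamic sampling levels.
def hinted_variants(query_body, alias, levels=(0, 2, 5, 10)):
    return [
        f"explain plan set statement_id = 'DS{lvl}' for "
        f"select /*+ dynamic_sampling({alias},{lvl}) */ count(*) {query_body}"
        for lvl in levels
    ]

stmts = hinted_variants('from "TA" where ("TA"."NORM"=141)', "TA")
print(stmts[1])  # the level-2 variant
```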
25. Estimation at Lower Levels of Dynamic Sampling
- Two extreme cases can be observed while re-estimating at lower levels of DS:
- dynamic sampling is extremely precise even at lower levels
- dynamic sampling doesn't work until a high level
[Chart: estimated vs. real cardinality over DS levels 0 to 10]
26. Evolution of Dynamic Sampling
Consider that dynamic sampling itself is subject to evolution. Dynamic sampling on a predicate with an index:
- 9i rel. 2: FULL TABLE scan with SAMPLE clause
- 10g rel. 1: INDEX ACCESS with ROWNUM < 2500
- 10g rel. 2: FULL TABLE scan with SAMPLE clause
27. Re-estimation and Partitioning
- Constant pstart and pstop values
- Add predicates for range and list partitions / subpartitions

---------------------------------------------------
| Operation                | Name | Pstart | Pstop |
---------------------------------------------------
| SELECT STATEMENT         |      |        |       |
|  PARTITION RANGE SINGLE  |      |      2 |     2 |
|   PARTITION LIST SINGLE  |      |      3 |     3 |
|    TABLE ACCESS FULL     | T2S  |      6 |     6 |
---------------------------------------------------

- Verify the granularity parameter
- Verify partition pruning
28. Predicates in Parallel Plans
No additional processing is required to generate re-estimation queries for parallel execution plans.

--------------------------------------------------------
| Id | Operation                | Name     | Rows       |
--------------------------------------------------------
|  0 | SELECT STATEMENT         |          |      82647 |
|  1 |  PX COORDINATOR          |          |            |
|  2 |   PX SEND QC (RANDOM)    | :TQ10001 |      82647 |
|  3 |    HASH JOIN             |          |      82647 |
|  4 |     PX BLOCK ITERATOR    |          |       3519 |
|  5 |      TABLE ACCESS FULL   | TA       |       3519 |
|  6 |     BUFFER SORT          |          |            |
|  7 |      PX RECEIVE          |          |      23580 |
|  8 |       PX SEND BROADCAST  | :TQ10000 |      23580 |
|  9 |        TABLE ACCESS FULL | TB       |      23580 |
--------------------------------------------------------
29. Bitmap Index Access
Generating re-estimation queries for bitmap index access must respect the access logic.

|  1 | TABLE ACCESS BY INDEX ROWID  | T1      | 13 |
|  2 |  BITMAP CONVERSION TO ROWIDS |         |    |
|  3 |   BITMAP AND                 |         |    |
|  4 |    BITMAP INDEX SINGLE VALUE | IND2_T1 |    |
|  5 |    BITMAP INDEX SINGLE VALUE | IND1_T1 |    |

Predicate Information (identified by operation id):
   4 - access("T1"."GROUP2_ID"=28)
   5 - access("T1"."STATUS"='X')

Generated predicate: ("T1"."GROUP2_ID"=28) AND ("T1"."STATUS"='X')
Note: the bitmap index cardinality is missing in the execution plan.
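The access logic can be mirrored when combining the branch predicates: children of a BITMAP AND connect with AND (a BITMAP OR, not shown on the slide, would connect with OR). A sketch with an illustrative function name:

```python
# Combine the access predicates of bitmap index branches according
# to the bitmap operation (BITMAP AND / BITMAP OR) in the plan.
def combine_bitmap_predicates(op, predicates):
    joiner = " AND " if op == "BITMAP AND" else " OR "
    return joiner.join(f"({p})" for p in predicates)

pred = combine_bitmap_predicates(
    "BITMAP AND",
    ['"T1"."GROUP2_ID"=28', '"T1"."STATUS"=\'X\''],
)
print(pred)  # → ("T1"."GROUP2_ID"=28) AND ("T1"."STATUS"='X')
```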
30. Ready for Upgrade or Crash?
How can the behavior of an application in a migrated environment be predicted? A sophisticated but expensive method is to compare execution plans. The re-estimation method can be used as a lightweight early-warning system.
31. Some Limitations
- Limited usage on predicates with bind variables and parse-time partition pruning
- Predicates stored in ACCESS_PREDICATES and FILTER_PREDICATES are not always complete
- A single predicate is limited to a length of 4000 bytes
Further work:
- A weighted function for the result of re-estimation
- Relevance to SQL profiles
32. Summary
- A method to verify the status of database statistics
- Introduction of predicate mining
- A "green bar" for database statistics
- Dynamic sampling may be used as a verification method
33. References
- Christian Antognini, "CBO - A Configuration Roadmap", Hotsos Symposium 2005
- Wolfgang Breitling, "Using DBMS_STATS in Access Path Optimization", UKOUG 2004
- Jonathan Lewis, "Strategies for Statistics", UKOUG 2004
- Metalink 72539.1, "Interpreting Histogram Information"
- Metalink 114671.1, "Gathering Statistics for the Cost Based Optimizer"
- Metalink 236935.1, "Global Statistics - An Explanation"