PowerPointPrsentation - PowerPoint PPT Presentation

1 / 33
About This Presentation
Title:

PowerPointPrsentation

Description:

Jaromir D.B. Nemec. UKOUG 2005. 27.10.2005. J.Nemec. Are Your Statistics Bad Enough? 2 ... Predicate - where ('TA'.'NORM'=141) and ('TA'.'RAND' =770) ... – PowerPoint PPT presentation

Number of Views:34
Avg rating:3.0/5.0
Slides: 34
Provided by: Nem84
Category:

less

Transcript and Presenter's Notes

Title: PowerPointPrsentation


1
Are Your Statistics Bad Enough?

Verify the effectiveness of gathering optimizer
statistics Jaromir D.B. Nemec UKOUG 2005
27.10.2005
2
Gathering Statistics
  • The way from ANALYZE TABLE to DBMS_STATS
  • Increasing number of parameters and options
  • changing defaults
  • CBO more intelligent
  • a small change can have a big effect
  • . . . and all that stuff only to get
  • some redundant information

3
Complexity Trap
In the chain of dependent objects the quantity is
increasing
4
Are My Statistics Adequate?
Way 1 - observe (they are not if you encounter
problems) This is reactive approach Way 2 -
test it (define test cases and check them)
define invariant the invariant must respect
the dynamic of the database!
5
Basis Scenario
  • Motto fix the problem in its origin
  • Limit the scope to the basic table access only
  • (full scan, index access)
  • This reduces the complexity significantly and
  • can expose the real origin of problems in the
    execution plan

6
Simple Example
select from ta, tb where ta.rand tb.rand
and ta.norm 141 and ta.rand lt 770 and (tb.norm
1 or tb.norm -1) ----------------------------
----------------------------- Id Operation
Name Rows
------------------------------------------------
--------- 0 SELECT STATEMENT
65073 1 HASH JOIN
65073 2 TABLE ACCESS
BY INDEX ROWID TA 2771 3
INDEX RANGE SCAN TA_INDEX 3598
4 TABLE ACCESS FULL TB
23580 ------------------------------------------
--------------- Predicate Information
(identified by operation id) --------------------
------------------------------- 1 -
access("TA"."RAND""TB"."RAND") 2 -
filter("TA"."RAND"lt770) 3 -
access("TA"."NORM"141) 4 - filter(("TB"."NORM"
(-1) OR "TB"."NORM"1) AND "TB"."RAND"lt770)
7
Verify it! Re-estimate
  • For each table access
  • Get owner, object name and alias
  • Option FULL get filter predicate
  • Option (b-tree) INDEX get access and filter
    predicate on table and index line
  • enclose in brackets and connect with AND
  • form a re-estimation query
  • execute the query

8
Re-estimation how does it work?
Input from execution plan 2 TABLE ACCESS
BY INDEX ROWID TA 2771 3
INDEX RANGE SCAN TA_INDEX 3598
4 TABLE ACCESS FULL TB
23580 ------------------------------------------
--------------- 2 - filter("TA"."RAND"lt770)
3 - access("TA"."NORM"141) 4 -
filter(("TB"."NORM"(-1) OR "TB"."NORM"1) AND
"TB"."RAND"lt770) Generated re-estimation
query select count() from "TB" where
(("TB"."NORM"(-1) OR "TB"."NORM"1) AND
"TB"."RAND"lt770) -- gives 60317 rows select
count() from "TA" where ("TA"."NORM"141) and
("TA"."RAND"lt770) -- gives 1 row Run the
re-estimation query to get the real cardinality,
compare it with the original estimation
9
Re-estimation Summary
Run the re-estimation query to get the real
cardinality, compare it with the original
estimation
Note, that the TA access cardinality is
overestimated The TB access cardinality is
underestimated
10
Re-estimation Basic Idea
  • The simplification is based on the assumption
    that all problems in the execution plan are
    caused directly or indirectly by the cardinality
    assumption in the base table access.
  • Examples
  • Cardinality underestimation can lead to NL join
    with big inner table
  • Cardinality overestimation can switch a NL join
    to a SORT MERGE join
  • improper assumption of cardinality 1 can lead to
    CARTESIAN join

11
Access and Filter Predicates
Access Predicate controls the index
access Filter Predicate additional filter
condition of index of table Availability of
predicates
12
Hypothesis
  • By selecting a limited set of objects and
    corresponding predicates we can try to find an
    answer to some fundamental questions
  •  
  • are my statistics precise enough?
  • are my statistics up-to-date?
  • does a histogram help?
  • is the sample size OK?
  • is dynamic sampling relevant?
  • is the partition design suited for the
    application?
  • ready for migration?

13
Are My Statistics Precise Enough?
Comparing the estimated and re-estimated
cardinality we obtain the relative and absolute
difference
Predicate - where (("TB"."NORM"(-1) OR
"TB"."NORM"1) AND "TB"."RAND"lt770) ACCE
EST_CARD RE_EST_CARD ABS_DIFF REL_DIFF----
---------- ----------- ---------- ----------same
23580 60317 36737
61Predicate - where ("TA"."NORM"141) and
("TA"."RAND"lt770) ACCE EST_CARD RE_EST_CARD
ABS_DIFF REL_DIFF ---- ---------- -----------
---------- ----------same 2771 1
-2770 -277000
14
Are My Statistics Up-to-date?

By observing the historical view we obtain the
dynamic aspect of predicate selectivity
15
Processing

Two basic steps are extracting and processing of
the predicate information for execution plans
example
operative task
administrative task
16
Simple Model

17
Does a Histogram Help?
  • Histogram candidates
  • column is used in predicate
  • bad estimation
  • access with different values
  • real cardinality is in wide range


Test the influence of histogram size on the
precision of selectivity estimation using
re-estimation queries.
18
Histogram / 2

Target precision
Without a histogram (SIZE1) the cardinality is
underestimated (average). If the histogram size
is higher than the number of distinct values the
estimation is very precise. For columns with a
high number of distinct values the estimation
remains imprecise.
19
Sample Size
  • Re-estimate a predicate on statistics gathered
    with different estimate_percent parameter
  • observe the influence of the sample size
  • Possible results
  • relatively small percentage returns good results
  • when increasing estimate percent the error is
    reduced
  • there is marginal effect of the sample size on
    the precision
  • Q is there one optimal single sample size for an
    object?


20
Re-estimation and Dynamic Sampling
  • add a dynamic sampling hint to the re-estimation
    query
  • trace with 10053 event
  • select / dynamic_sampling(tc,2) /
  • from tc where norm300 0
  • Generated DS query
  • SELECT / some hints / NVL(SUM(C1),0),
    NVL(SUM(C2),0) FROM
  • (SELECT / FULL("TC") some other hints / 1 AS
    C1,
  • CASE WHEN "TC"."NORM300"0 THEN 1 ELSE 0 END AS
    C2
  • FROM "TC" SAMPLE BLOCK (1.076923 , 1) SEED (1)
    "TC"
  • ) SAMPLESUB

Total rows in sample
Sample size controlled by level
rows satisfying the predicate
21
Re-estimation and Dynamic Sampling / 2

Results of DS query level 2 sample pct.
50.806452 actual sample size 54147 filtered
sample card. 6018 orig. card. 111358 single
table dynamic sel. est. 0.11114189
DS blocks / table blocks
Total rows in sample
rows satisfying the predicate
Computed selectivity filtered / actual rows (
satisfying / total rows)
22
Re-estimation and Dynamic Sampling / 3
  • Dynamic sampling has a neat positive side effect
  • Level 10 performs exactly the re-estimation
  • no need to execute the re-estimation query,
    parsing with DS level 10 returns exact result
  • Lower levels of DS are relatively cheap and can
    give some hints

23
Dynamic Sampling Levels

Dynamic sampling levels are absolute (except for
level 10)
Block size 8KB
Even for a moderate table the estimate percent of
DS could be too low DS uses block sampling only
24
Re-estimation Final Approach
  • For a selected predicate
  • build re-estimation query
  • parse it with different dynamic sampling hints
    (e.g. 0, 2, 5 and 10)
  • peek in the PLAN_TABLE and get the cardinality
    and access option

Predicate where part_key 3 and GROW_SHIF gt 0
and shif 0 ACCE EST_CARD EST_DS2_CARD
EST_DS5_CARD RE_EST_CARD DIFF
DIFF_PREC ---- ---------- ------------
------------ ----------- ----------
---------- same 7982 94
116 89 -7893 -8869 Predicate
where part_key 2 and GROW 100 ACCE
EST_CARD EST_DS2_CARD EST_DS5_CARD RE_EST_CARD
DIFF DIFF_PREC ---- ---------- ------------
------------ ----------- ----------
---------- same 4281 13
2 0 -4281 -428100
25
Estimation of Lower Levels of Dynamic Sampling
  • Two extreme cases can be observed while
    re-estimation on lower levels of DS.
  • Dynamic sampling is extremely precise even on
    lower levels
  • Dynamic sampling doesnt work until a high
    level

Real Cardinality
Estimated Cardinality
DS Level 0
5 10
26
Evolution of Dynamic Sampling

Consider that dynamic sampling itself is subject
of evolution dynamic sampling on a predicate
with index 9i rel 2 FULL TABLE scan with
SAMPLE clause 10g rel 1 INDEX ACCESS with
ROWNUM lt 2500 10g rel 2 FULL TABLE scan with
SAMPLE clause
27
Re-estimation and Partitioning
  • Constant pstart and pstop values
  • Add predicates for range and list partitions /
    subpartitions
  • ---------------------------------
    -----------------
  • Operation Name Pstart Pstop
  • ---------------------------------
    -----------------
  • SELECT STATEMENT
  • PARTITION RANGE SINGLE 2 2
  • PARTITION LIST SINGLE 3 3
  • TABLE ACCESS FULL T2S 6 6
  • ----------------------------------
    ----------------
  • Verify granularity parameter
  • Verify partition pruning


28
Predicates in Parallel Plans

For parallel execution plans no additional
processing by generating re-estimation queries is
required Id Operation Name
Rows -------------------------------------
--------------- 0 SELECT STATEMENT
82647 1 PX COORDINATOR
2 PX SEND QC
(RANDOM) TQ10001 82647 3 HASH
JOIN 82647 4
PX BLOCK ITERATOR 3519 5
TABLE ACCESS FULL TA 3519 6
BUFFER SORT
7 PX RECEIVE 23580
8 PX SEND BROADCAST TQ10000
23580 9 TABLE ACCESS FULL TB
23580
29
Bitmap Index Access

Generating of re-estimation queries for bitmap
index access must respect the access logic 1
TABLE ACCESS BY INDEX ROWID T1 13 2
BITMAP CONVERSION TO ROWIDS
3 BITMAP AND
4 BITMAP INDEX SINGLE VALUE IND2_T1
5 BITMAP INDEX SINGLE VALUE IND1_T1
Predicate Information (identified by
operation id) -----------------------------------
---------------- 4 - access("T1"."GROUP2_ID"28
) 5 - access("T1"."STATUS"'X') Generated
predicate ("T1"."GROUP2_ID"28) AND
("T1"."STATUS"'X') The bitmap index cardinality
is missing in the execution plan
30
Ready for Upgrade or Crash?

How can the behavior of an application in
migrated environment be predicted? A
sophisticated but expensive method is to compare
execution plans Re-estimation method can be used
as a lightweight timely warning system
31
Some Limitations
Limited usage on predicates with bind variables
and parse time partition pruning Predicates
stored in ACCESS_PREDICATES and FILTER_PREDICATES
are not always complete A single predicate is
limited to the length of 4000 bytes Further
Work Weighted function for the result of
re-estimation Relevance to SQL profile

32
Summary
  • Method to verify the status of database
    statistics
  • Introduction of predicate mining
  • A green bar for database statistics
  • Dynamic sampling may be used as a verification
    method

33
References

Christian Antognini, CBO A Configuration
Roadmap, Hotsos Symposium 2005 Wolfgang
Breitling, Using DBMS_STATS in Access Path
Optimization, UKOUG 2004 Jonathan Lewis,
Strategies for Statistics, UKOUG
2004 metalink 72539.1 Interpreting Histogram
Information 114671.1 Gathering Statistics for
the Cost Based Optimizer 236935.1 Global
statistics - An Explanation
Write a Comment
User Comments (0)
About PowerShow.com