Title: You
1Youre Smarter than a Database
- Overcoming the optimizers bad cardinality
estimates
2About me
- Bobby Durrett
- US Foodservice
- bobbyandmarielle_at_yahoo.com
- Scripts in http//www.geocities.com/bobbyandmariel
le/sqltuning.zip
3What you know
4What the database knows
5Before SQL
- Example - mainframe Datacom/DB COBOL
- List index names
- Write loops
- read a from one index i1 where one.c10
- while more table one rows exist get next row
- read b from two index i2 where two.a one.a
- while more table two rows exist get next row
- print one.a,two.b
- end while
- end while
6SQL
- Tell what you want, not how to get it
- select one.a,two.b
- from
- one,two
- where
- one.c10 and one.atwo.a
7Pre-SQL versus SQL
- Pre-SQL code very efficient runs in megabytes
VSE mainframe COBOL - Labor intensive
- SQL can be inefficient runs in gigabytes (if
you are lucky!) - Much more productive do in minutes what took
hours before create tables
8What the database doesnt know
- Optimizer has a limited set of statistics that
describe the data - It can miscalculate the number of rows a query
will return, its cardinality - A cardinality error can lead optimizer to choose
a slow way to run the SQL
9Example plan/Cardinality
- -------------------------------------------------
- Id Operation Name Rows Cost
- -------------------------------------------------
- 0 SELECT STATEMENT 10 3
- 1 TABLE ACCESS FULL TEST1 10 3
- -------------------------------------------------
- Plan how Oracle will run your query
- Rows how many rows optimizer thinks that step
will return - Cost estimate of time query will take, a
function of the number of rows
10How to fix cardinality problems
- Find out if it really is a cardinality issue
- Determine the reason it occurred
- Single column
- Multiple columns
- Choose a strategy
- Give the optimizer more information
- Override optimizer decision
- Change the application
11Four examples
- Four examples of how the optimizer calculates
cardinality - Full scripts and their outputs on portal, pieces
on slides edited for simplicity
12Step 1 Find out if it really is a cardinality
issue
- Example 1
- Data
- select a,count() from test1 group by a
- A COUNT()
- ---------- ----------
- 1 10
- Query
- select from test1 where a1
13Step 1 Find out if it really is a cardinality
issue
- Get estimated cardinality from plan
- -------------------------------------------
- Id Operation Name Rows
- -------------------------------------------
- 0 SELECT STATEMENT 10
- 1 TABLE ACCESS FULL TEST1 10
- -------------------------------------------
- Do query for actual number of rows
- select count() from test1 where a1
14Step 1 Find out if it really is a cardinality
issue
- Plan is a tree find cardinality and select
count() on part of query represented by that
part of plan.
15Step 2 Understand the reason for the wrong
cardinality
- Unequal distribution of data
- Within a single column
- Last name
- Smith or Jones
- Among multiple columns
- Address
- Zipcode and State
16Step 2 Understand the reason for the wrong
cardinality
- Example 2 - Unequal distribution of values in a
single column - 1,000,000 rows with value 1
- 1 row with value 2
- select a,count() from TEST2 group by a
- A COUNT()
- ---------- ----------
- 1 1000000
- 2 1
17Step 2 Understand the reason for the wrong
cardinality
- SQL statement returns one row
- select from TEST2 where a2
18Step 2 Understand the reason for the wrong
cardinality
- Plan with wrong number of rows 500,000
- Full scan instead of range scan 100 times
slower - ---------------------------------------------
- Operation Name Rows
- ---------------------------------------------
- SELECT STATEMENT 500K
- INDEX FAST FULL SCAN TEST2INDEX 500K
- ---------------------------------------------
19Step 2 Understand the reason for the wrong
cardinality
- Column statistics two distinct values
- LOW HIGH NUM_DISTINCT
- ---------- ---------- ------------
- 1 2 2
- Table statistic total of rows 1,000,001
- NUM_ROWS
- ----------
- 1000001
20Step 2 Understand the reason for the wrong
cardinality
- Rows in plan (rows in table)/ (distinct values
of column) - 5000001000001/2
- Optimizer knew that there were only two values
assumed they had equal number of rows
21Step 2 Understand the reason for the wrong
cardinality
- Example 3 - Combinations of column values not
equally distributed - 1,000,000 rows with values 1,1
- 1,000,000 rows with values 2,2
- 1 row with value 1,2
- Equal numbers of 1s and 2s in each column
- A B COUNT()
- ---------- ---------- ----------
- 1 1 1000000
- 1 2 1
- 2 2 1000000
22Step 2 Understand the reason for the wrong
cardinality
- SQL statement retrieves one row
- select sum(ab)
- from TEST3
- where
- a1 and b2
23Step 2 Understand the reason for the wrong
cardinality
- Plan with wrong number of rows 500,000
- Inefficient full scan
- ----------------------------------------------
- Operation Name Rows
- ----------------------------------------------
- SELECT STATEMENT 1
- SORT AGGREGATE 1
- INDEX FAST FULL SCAN TEST3INDEX 500K
- ----------------------------------------------
24Step 2 Understand the reason for the wrong
cardinality
- Column statistics
- C LOW HIGH NUM_DISTINCT
- - ---------- ---------- ------------
- A 1 2 2
- B 1 2 2
- Table statistic total of rows 2,000,001
- NUM_ROWS
- ----------
- 2000001
25Step 2 Understand the reason for the wrong
cardinality
- Rows in plan (rows in table)/ (distinct
values A distinct values B) - 5000002000001/(2 2)
- Optimizer assumes all four combinations
(1,1),(1,2),(2,1),(2,2) equally likely
26Step 2 Understand the reason for the wrong
cardinality
- How to tell which assumption is in play?
- Select count() each column
- select a,count() from TEST3 group by a
- select b,count() from TEST3 group by b
- count() each column combination
- select a,b,count() from TEST3
- group by a,b
27Step 3 Choose the best strategy for fixing the
cardinality problem
- Giving the optimizer more information
- Histograms
- SQL Profiles
- Overriding optimizer decisions
- Hints
- Changing the application
- Try to use optimizer as much as possible to
minimize development work
28Step 3 Choose the best strategy for fixing the
cardinality problem
- Giving the optimizer more information using
histograms - Works for unequal distribution within a single
column - A histogram records the distribution of values
within a column in up to 254 buckets - Works best on columns with fewer than 255
distinct values
29Step 3 Choose the best strategy for fixing the
cardinality problem
- Run gather_table_stats command to get histograms
on the column 254 is max number of buckets - method_optgt'FOR ALL COLUMNS SIZE 254'
30Step 3 Choose the best strategy for fixing the
cardinality problem
- Plan for Example 2 with correct number of rows
with histogram - Uses range scan
- -----------------------------------------
- Operation Name Rows
- -----------------------------------------
- SELECT STATEMENT 1
- INDEX RANGE SCAN TEST2INDEX 1
- -----------------------------------------
31Step 3 Choose the best strategy for fixing the
cardinality problem
- Column statistics two buckets
- LOW HIGH NUM_DISTINCT
NUM_BUCKETS - ---------- ---------- ------------ -----------
- 1 2 2 2
- Table statistic unchanged
- NUM_ROWS
- ----------
- 1000001
32Step 3 Choose the best strategy for fixing the
cardinality problem
- Time without histograms (1 second)
- Elapsed 000001.00
- Time with histograms(1/100th second)
- Elapsed 000000.01
33Step 3 Choose the best strategy for fixing the
cardinality problem
- Giving the optimizer more information using SQL
Profiles - Works for unequal distribution among multiple
columns - Includes information about the relationship
between columns in the SQL correlated columns
or predicates
34Step 3 Choose the best strategy for fixing the
cardinality problem
- SQL Tuning Advisor gathers statistics on the
columns - ...DBMS_SQLTUNE.CREATE_TUNING_TASK(...
- ...DBMS_SQLTUNE.EXECUTE_TUNING_TASK(...
- Accept the SQL Profile it creates to use the new
statistics - ...DBMS_SQLTUNE.ACCEPT_SQL_PROFILE (...
35Step 3 Choose the best strategy for fixing the
cardinality problem
- Example 3 plan with correct number of rows 1
using SQL profile - --------------------------------------------------
- Operation Name Rows Bytes
- --------------------------------------------------
- SELECT STATEMENT 1 6
- SORT AGGREGATE 1 6
- INDEX RANGE SCAN TEST3INDEX 1 6
- -------------------------------------------------
36Step 3 Choose the best strategy for fixing the
cardinality problem
- Time without a profile (1 second)
- Elapsed 000001.09
- Time with a profile(1/100th second)
- Elapsed 000000.01
37Step 3 Choose the best strategy for fixing the
cardinality problem
- Overriding optimizer decisions using hints
- Example 4 has unequal distribution of column
values across two tables histograms and SQL
Profiles dont work - Hint forces index range scan
- Small amount of additional code not like Cobol
on mainframe
38Step 3 Choose the best strategy for fixing the
cardinality problem
- Example 4 - SMALL table
- MANY relates to 1 there are many rows with
value 1 - FEW relates to 2 there are few with value 2
- insert into SMALL values ('MANY',1)
- insert into SMALL values ('FEW',2)
39Step 3 Choose the best strategy for fixing the
cardinality problem
- Example 4 - LARGE table
- 1,000,000 rows with value 1
- 1 row with value 2
- NUM COUNT()
- ---------- ----------
- 1 1000000
- 2 1
40Step 3 Choose the best strategy for fixing the
cardinality problem
- SQL statement returns one row
- select B.NUM
- from SMALL A,LARGE B
- where
- A.NUMB.NUM and
- A.NAME'FEW'
41Step 3 Choose the best strategy for fixing the
cardinality problem
- Plan with wrong number of rows 125,000
- ----------------------------------------------
- Operation Name Rows
- ----------------------------------------------
- SELECT STATEMENT 125K
- HASH JOIN 125K
- TABLE ACCESS FULL SMALL 1
- INDEX FAST FULL SCAN LARGEINDEX 1000K
- ----------------------------------------------
42Step 3 Choose the best strategy for fixing the
cardinality problem
- Column statistics two buckets on all columns
using histograms - LOW HIGH NUM_DISTINCT NUM_BUCKETS
- ---------- ---------- ------------ -----------
- 1 2 2 2
- LOW HIGH NUM_DISTINCT NUM_BUCKETS
- ---- ---- ------------ -----------
- FEW MANY 2 2
43Step 3 Choose the best strategy for fixing the
cardinality problem
- Table statistics SMALL has 2 rows, LARGE
1000001 - NUM_ROWS
- ----------
- 2
- NUM_ROWS
- ----------
- 1000001
44Step 3 Choose the best strategy for fixing the
cardinality problem
- 1250001000001/8
- Optimizer appears to assume all eight
combinations of the three columns values are
equally likely - Cant verify formula references dont include
formula with histograms - Even worse without histograms cardinality is
500000
45Step 3 Choose the best strategy for fixing the
cardinality problem
- No SQL profile from SQL Tuning Advisor
- There are no recommendations to improve the
statement. - Neither histograms nor SQL profiles help example 4
46Step 3 Choose the best strategy for fixing the
cardinality problem
- Statement with hints
- Use index
- Dont do full scan
- select / INDEX(B LARGEINDEX)
- NO_INDEX_FFS(B LARGEINDEX) /
- B.NUM
- from SMALL A,LARGE B
- where
- a.NUMB.NUM and
- A.NAME'FEW'
47Step 3 Choose the best strategy for fixing the
cardinality problem
- Time without a hint (1 second)
- Elapsed 000001.03
- Time with a hint (1/100th second)
- Elapsed 000000.01
48Step 3 Choose the best strategy for fixing the
cardinality problem
- Changing the application
- Change your tables so that the optimizer gets
your SQLs cardinality right - Requires more work designing tables, but keeps
productivity benefits of SQL
49Step 3 Choose the best strategy for fixing the
cardinality problem
- Example 4 moved NAME column to LARGE table and
split table in two - One million (MANY,1) rows in LARGEA
- One (FEW,2) row in LARGEB
- Query
- select NUM
- from (select from largea
- union
- select from largeb)
- where
- NAME'FEW'
50Step 3 Choose the best strategy for fixing the
cardinality problem
- Plan is just as efficient as with hint
- Number of rows 2 (reality is 1)
- Range Scan
- --------------------------------------------------
------------ - Id Operation Name
Rows - --------------------------------------------------
------------ - 0 SELECT STATEMENT
2 - 1 VIEW
2 - 2 SORT UNIQUE
2 - 3 UNION-ALL
- 4 TABLE ACCESS BY INDEX ROWID LARGEA
1 - 5 INDEX RANGE SCAN
LARGEAINDEX 1 - 6 TABLE ACCESS BY INDEX ROWID LARGEB
1 - 7 INDEX RANGE SCAN
LARGEBINDEX 1 - --------------------------------------------------
------------
51Step 3 Choose the best strategy for fixing the
cardinality problem
- Time without table change (1 second)
- Elapsed 000001.03
- Time with table change (1/100th second)
- Elapsed 000000.01
52Conclusion
- SQL improves productivity, optimizer has limits
- Identify cases where cardinality is wrong
- Understand why the database got it wrong
- One column
- Multiple columns
- Choose best strategy to fix
- Give optimizer more info
- Override optimizers choices
- Redesign tables
53References
- Cost Based Optimizer Fundamentals, Jonathan Lewis
- Metalink Note212809.1, Limitations of the Oracle
Cost Based Optimizer - Metalink Note68992.1, Predicate Selectivity
- Histograms Myths and Facts, Wolfgang Breitling
Select Journal,
Volume 13, Number 3