www.EvDBT.com - PowerPoint PPT Presentation

1 / 44
About This Presentation
Title:

www.EvDBT.com

Description:

1. The Search. for Intelligent Life in. Oracle's Cost-Based Optimizer. in Oracle8 v8.0 and Oracle8i v8.1. Tim Gorman. Consultant ... Mark Twain, 1897 ... – PowerPoint PPT presentation

Number of Views:61
Avg rating:3.0/5.0
Slides: 45
Provided by: TimGo6
Category:
Tags: evdbt | com | twain | www

less

Transcript and Presenter's Notes

Title: www.EvDBT.com


1
Ottawa Oracle Users Group
The Searchfor Intelligent Life inOracles
Cost-Based Optimizerin Oracle8 v8.0 and Oracle8i
v8.1Tim GormanConsultant Evergreen Database
Technologies, Inc.
  • www.EvDBT.com

2
Agenda for Today
  • The Oracle Optimizer
  • RBO and how it works
  • CBO and how it works
  • Case studies
  • Debugging CBO
  • The recommendations in this presentation are
    obsolete for Oracle 9i and higher
  • Use procedure GATHER_SYSTEM_STATS in DBMS_STATS
    instead
  • Paper at http//www.EvDBT.com/library.htm
  • Also published in
  • SQLgtUpdate (RMOUG newsletter)
  • IOUG-A SELECT magazine
  • And in Russian (see link on website)

3
The infamous Oracle Optimizer
  • Rule-based optimizer (RBO)
  • Was to have been discontinued in Oracle9i
  • The report of my death was an exaggeration
  • Mark Twain, 1897
  • Uses a set of 15-20 rules to determine
    execution path based on hard-coded best practices
  • Cost-based optimizer (CBO)
  • Introduced in Oracle7 v7.0
  • Most problems were due to bugs in ANALYZE
  • Not really usable in production until Oracle8
    v8.0
  • Very reliable from Oracle8 v8.0 onwards
  • Very desirable for production systems in Oracle8i
    onwards

4
RBO
  • RBO seems easy to understand
  • Tuning SQL under RBO involves tricks like
  • WHERE clause predicates are evaluated from
    bottom-to-top
  • Each predicate is ranked according to rules
  • Lower rank determines order of access and join
    operations
  • In the event of a tie in ranking, the order of
    row sources in the FROM clause can affect how the
    tie is broken
  • Sometimes, the ID of an object (table or index)
    can be used to break a tie
  • Dropping/recreating indexes can change plans!
  • Parameter OPTIMIZER_MODE RULE can be set
  • For entire database instance (using init.ora or
    ALTER SYSTEM)
  • For individual sessions (using ALTER SESSION)
  • For individual SQL statements (using hints)

5
RBO Access Path Rankings
  • 1. Single row by ROWID
  • 2. Unique indexed cluster key constant
  • 3. Unique hash cluster key w/ unique key
    constant
  • 4. Entire unique concatenated index constant
  • 5. Unique single-column index constant
  • 6. Entire cluster key cluster key join within
    cluster
  • 7. Hash cluster key constant
  • 8. Entire indexed cluster key constant
  • 9. Entire non-unique concatenated index
    constant
  • 10. Merge of non-unique indexes

6
RBO Access Path Rankings (contd)
  • 11. Entire concatenated index of lower-bound of
    range
  • 12. Incomplete concatenated index constant
  • 13. Unique indexed column BETWEEN or LIKE
  • 14. Non-unique indexed column BETWEEN or LIKE
  • 15. Unbounded range-scan on unique indexed
    columns
  • 16. Unbounded range-scan on non-unique indexed
    columns
  • 17. Sort-merge join
  • 18. MAX or MIN of indexed column
  • 19. ORDER BY on indexed columns
  • 20. Full table scan

7
RBO example
  • SQLgt SELECT SUM(L.SELLING_PRICE
  • 2 (NVL(SUM(LD.QUANTITY()),
  • 3 NVL(L.ORDERED_QUANTITY,0) -
  • 4 NVL(L.CANCELLED_QUANTITY,0)) -
  • 5 NVL(L.INVOICED_QUANTITY,0)))
  • 6 FROM RA_SITE_USES SU,
  • 7 RA_ADDRESSES A,
  • 8 SO_LINE_DETAILS LD,
  • 9 SO_LINES L,
  • 10 SO_HEADERS H
  • 11 WHERE H.S1 1 Which predicate would
    RBO choose first?
  • 12 AND A.CUSTOMER_ID customer_id
  • 13 AND SU.ADDRESS_ID A.ADDRESS_ID
  • 14 AND H.INVOICE_TO_SITE_USE_ID
    SU.SITE_USE_ID
  • 15 AND H.CURRENCY_CODE RTRIM(currency_cod
    e)
  • 16 AND NVL(LD.SCHEDULE_DATE,
  • SQL statement continues for 23 more lines

8
RBO example (contd)
  • SQL statement belongs to an important batch
    program in Oracle Order Entry named Book
    Orders
  • Generally one of the single greatest resource
    hogs on any Oracle OE system
  • What do you think?
  • RBO eliminates the RA_ADDRESSES row-source
    because of ordering in FROM clause
  • RBO chooses index on S1 (instead of
    CURRENCY_CODE) because index on S1 was created
    later than index on CURRENCY_CODE (can be
    verified by querying DBA_OBJECTS)

9
CBO
  • Uses statistics stored in the data dictionary to
    determine cost
  • CBO is just a math processor, following formulas
  • Processing steps includes
  • Base table access costs
  • All access methods are considered for all row
    sources
  • Where parallelism is possible, costs are
    calculated for serial and parallel access
  • Join order and join method computations
  • All possible join orders and join methods are
    considered

10
CBO (contd)
  • Processing steps (contd)
  • OR expansion
  • Possible sorting for ORDER BY using indexes
  • Partition-pruning
  • Parameter OPTIMIZER_MODE can be set to CHOOSE,
    FIRST_ROWS, or ALL_ROWS
  • At instance, session, or SQL statement level
  • CHOOSE is the default setting
  • If none of the tables involved in a SQL statement
    has statistics, then RULE is chosen, otherwise
    COST
  • Using FIRST_ROWS or ALL_ROWS is intended to
    influence the type of join method chosen

11
So what is cost?
  • Cost is intended to represent physical I/O
  • Documentation says physical-IO CPU/1000 1.5
    NetIO
  • Lets just keep it simple and focus on physical
    I/O
  • FULL table scan cost calculations
  • Depends on the number of formatted blocks in a
    table (DBA_TABLES.BLOCKS) beneath the highwater
    mark
  • blocks / DB_FILE_MULTIBLOCK_READ_COUNT
  • EMPTY_BLOCKS column on DBA_TABLES displays blocks
    allocated to table segment but not yet used
  • Does not affect optimizer

12
Index scans
  • Index scan costs
  • Access cost of the index depends on
  • Which index scan operation is used
  • UNIQUE scan
  • Probes root-gtbranch-gtleaf for a single row
  • RANGE scan
  • Probes root-gtbranch-gtleaf to first index entry
  • Then scans leaf-to-leaf until last index entry
  • FULL scan
  • Scans leaf blocks using single-block access
  • FAST FULL scan
  • Scans all blocks in index using multi-block
    access, keeps leaf blocks and discards branch
    blocks

13
Index scans (contd)
Index PK_X
root

branch
branch
branch
branch

leaf
leaf
leaf
leaf
leaf
leaf
leaf


Table X
14
Index scans (contd)
  • Access cost of the index depends on (contd)
  • BTree height (column BLEVEL)
  • Affects UNIQUE index scans
  • Number of leaf blocks (LEAF_BLOCKS)
  • Average number of leaf blocks per distinct value
  • Selectivity of data values
  • Affects RANGE, FULL, and FAST FULL index scans

15
Selectivity
  • Calculated from estimated or computed cardinality
    statistics from ANALYZE or DBMS_STATS
  • DBA_INDEXES.DISTINCT_KEYS
  • DBA_INDEXES.NUM_ROWS
  • DBA_INDEXES.UNIQUENESS
  • LOVAL and HIVAL data values of all indexed
    columns stored in DBA_TAB_HISTOGRAMS
  • S ratio ranging between 0 and 1
  • Roughly calculated as S 1 / DISTINCT_KEYS
  • tending toward 0 more distinct keys
  • tending toward 1 fewer distinct keys

16
Selectivity (contd)
  • Selectivity is based on the predicate operation
  • For equivalence operations (, !, IN, NOT IN)
  • where col value
  • selectivity 1 / -distinct-keys
  • For open-ended range scans
  • where col gt value
  • selectivity (HIVAL value) / (HIVAL LOWVAL)
  • For bounded range scans
  • where col between value1and value2
  • selectivity (value2 value1) / (HIVAL
    LOWVAL)
  • Bind-variables short-circuit this calculation
  • Equivalence and bounded hard-coded at S0.05
  • Open-ended hard-coded at S0.25

17
Histograms
  • CBO normally only has HIVAL, LOVAL, and
    DISTINCT_KEYS to calculate selectivity
  • Must assume an even distribution of data values
  • Not always a valid assumption in real-life!
  • ANALYZE TABLE FOR ALL INDEXED COLUMNS
    column , column-list
  • Gathers histogram information to store in
  • DBA_TAB_HISTOGRAMS (DBA_HISTOGRAMS in v7.3)
  • DBA_PART_HISTOGRAMS (Oracle8 v8.0 and above)
  • DBA_SUBPART_HISTOGRAMS (Oracle8i and above)
  • Histograms are just data range endpoints
    (default 75)

18
Histograms (contd)
Width-Balanced? No
Frequency
10
20
30
40
50
Range
19
Histograms (contd)
Height-Balanced!
Frequency
1
1
5
45
50
Range
20
Histograms (contd)
  • Histograms identify popular and unpopular data
    values
  • Using an index RANGE scan against popular data
    values can be a bad idea
  • More efficient to perform FULL table scan and
    filter out unwanted rows
  • If CBO only knows that an indexed column has a
    small number of DISTINCT_KEYS, it will decide to
    perform FULL table scan
  • Unless it can identify that the data value
    specified in the query is unpopular
  • Then it will use the index instead
  • Pop quiz Why would the use of bind variables
    disable histogram utilization?

21
CBO example
  • SQLgt SELECT SUM(L.SELLING_PRICE
  • 2 (NVL(SUM(LD.QUANTITY()),
  • 3 NVL(L.ORDERED_QUANTITY,0) -
  • 4 NVL(L.CANCELLED_QUANTITY,0)) -
  • 5 NVL(L.INVOICED_QUANTITY,0)))
  • 6 FROM RA_SITE_USES SU,
  • 7 RA_ADDRESSES A,
  • 8 SO_LINE_DETAILS LD,
  • 9 SO_LINES L,
  • 10 SO_HEADERS H
  • 11 WHERE H.S1 1 Which predicate would
    CBO choose first?
  • 12 AND A.CUSTOMER_ID customer_id
  • 13 AND SU.ADDRESS_ID A.ADDRESS_ID
  • 14 AND H.INVOICE_TO_SITE_USE_ID
    SU.SITE_USE_ID
  • 15 AND H.CURRENCY_CODE RTRIM(currency_cod
    e)
  • 16 AND NVL(LD.SCHEDULE_DATE,
  • SQL statement continues for 23 more lines

22
Index scans the rest of the story
  • Cost of accessing the index itself is only half
    of the cost calculation of indexed access
  • Cost of table access from the index also a factor
  • For UNIQUE scans
  • just another single logical read to retrieve a
    row from the table
  • For RANGE, FULL, and FAST FULL scans
  • CLUSTERING_FACTOR measures ordering of table rows
    with index entries
  • Measuring efficiency of table access from index
    entries
  • CLUSTERING_FACTOR value on DBA_INDEXES ranges
    between NUM_ROWS (bad extreme) and BLOCKS (good
    extreme) on DBA_TABLES

23
Basic costing for index scans
  • Calculating cost for a RANGE scan on NONUNIQUE
    index for predicate col1 b1
  • Some statistics
  • DBA_TABLES.BLOCKS 100,000
  • DBA_TABLES.NUM_ROWS 5,000,000
  • DBA_INDEXES.BLEVEL 3
  • DBA_INDEXES.DISTINCT_KEYS 10,000
  • DBA_INDEXES.CLUSTERING_FACTOR 1,000,000
  • DBA_INDEXES.LEAF_BLOCKS10,000
  • DBA_INDEXES.AVG_DATA_BLOCKS_PER_KEY2
  • DBA_INDEXES.AVG_LEAF_BLOCKS_PER_KEY1

24
Basic costing for index scans (contd)
  • Basic cost calculations start with a calculation
    of logical reads
  • Full-table scan cost is 100,000 / 16 6,250
  • Full-index scan cost 10,000
  • Fast full-index scan cost is 10,000 / 16 625
  • Range scan cost
  • BLEVEL (AVG_LEAF_BLOCKS_PER_KEY
  • (NUM_ROWS SELECTIVITY)) 1,500
  • It looks like fast full-index scans win with the
    lowest score, but

25
Cost adjustments for index scans
  • Looking at the cost calculations so far
  • CBO is not considering the fact that physical-I/O
    from indexed access is different than
    physical-I/O from FULL table scan access
  • Since Oracle v6.0.x
  • For indexed access, logical-I/O (a.k.a. accesses
    of blocks in the Buffer Cache) resulting in a
    cache miss, causes single-block physical-I/O
    (a.k.a. accesses of blocks in datafiles)
  • Blocks retrieved by such I/O (wait event db file
    sequential read) are stored in the MRU end of
    the LRU chain of the Buffer Cache
  • For FULL table scan access, logical-I/O resulting
    in a cache miss causes multi-block physical-I/O
  • Blocks retrieved by such I/O (wait event db file
    scattered read) are stored in the LRU end of the
    LRU chain of the Buffer Cache
  • This is not the full story, but it is the general
    picture

26
OPTIMIZER_INDEX_CACHING
  • With such differences in the cacheability of
    index scans versus FULL table scans, is it valid
    to assume a 0 buffer cache hit ratio for indexed
    scans, as is true for FULL table scans?
  • um, nope
  • logical reads on indexed UNIQUE, RANGE and FULL
    scans should then be adjusted to produce physical
    reads
  • Parameter OPTIMIZER_INDEX_CACHING
  • Defaults to 0, can range from 0 to 99
  • Cost (1 (OPTIMIZER_INDEX_CACHING / 100)
  • I recommend setting this parameter value to 90
  • Dont set to maximum value of 99!

27
OPTIMIZER_INDEX_COST_ADJ
  • Similarly, is the service time on I/O requests
    the same for indexed scans as FULL table scans?
  • This can be measured empirically using
  • SQLgt select event, average_wait from
    vsystem_event
  • 2 where event like db file s
  • EVENT AVERAGE_WAIT
  • -------------------------- ------------
  • db file scattered read 1.1283475
  • db file sequential read .121103
  • Parameter OPTIMIZER_INDEX_COST_ADJ
  • Defaults to 100, can range from 1 through 10000
  • Informs CBO of relative cost of indexed access to
    table access
  • Cost (OPTIMIZER_INDEX_COST_ADJ / 100)
  • Set parameter to the ratio of AVERAGE_TIME values
    for
  • db file sequential read / db file scattered
    read

28
Cost adjustments for index scans (contd)
  • So, if OPTIMIZER_INDEX_CACHING set to 90 then
  • Full table scan 6,250 (unadjusted)
  • Fast full index scan 625 (unadjusted)
  • Index FULL scan 10,000 (1 (90/100)) 1,000
    (adjusted)
  • Index RANGE scan 1,500 (1 (90/100)) 150
    (adjusted)
  • And, if OPTIMIZER_INDEX_COST_ADJ set to 50
  • Full table scan 6,250 (unadjusted)
  • Fast full index scan 625 (unadjusted)
  • Index FULL scan 1,000 (50/100) 500
    (adjusted)
  • Index RANGE scan 150 (50/100) 75 (adjusted)
  • But wait! Theres more

29
Now add in costs for table access
  • Costs for indexed scans only do not yet take into
    account the costs of accessing the table rows
  • Cost
  • (AVG_DATA_BLOCKS_PER_KEY
  • (CLUSTERING_FACTOR / BLOCKS))
  • Full table scan 6,250
  • Fast full index scan 625 (2
    (1,000,000/100,000)) 12,500
  • Index FULL scan 500 (2 (1,000,000/100,000))
    10,000
  • Index RANGE scan 75 (2 (1,000,000/100,000))
    1,500
  • Now, plug in different values and see what
    happens!

30
Parameters which affect CBO
  • OPTIMIZER_MODE default CHOOSE, suggested
    default
  • OPTIMIZER_PERCENT_PARALLEL default 0,
    suggested default
  • OPTIMIZER_INDEX_COST_ADJ default 100, suggested
    10-50
  • OPTIMIZER_INDEX_CACHING default 0, suggested 90
  • DB_FILE_MULTIBLOCK_READ_COUNT
  • Default 16, suggested default or less
  • SORT_AREA_SIZE
  • Default 65536, suggested 512K-8M
  • Affects choice of join method favors SORT-MERGE
    when set high
  • HASH_AREA_SIZE
  • Default (SORT_AREA_SIZE 2), suggested default
  • Affects choice of join method favors HASH when
    set high

31
Statistics to examine
  • If you dont like the plan that CBO chose
  • If you wanted to see an index RANGE scan, then
    check
  • DBA_TABLES.BLOCKS
  • DBA_TABLES.NUM_ROWS
  • DBA_INDEXES.CLUSTERING_FACTOR
  • If you suspect the selectivity calculation
  • DBA_INDEXES.DISTINCT_KEYS
  • DBA_TAB_HISTOGRAMS
  • Were bind-variables used instead of embedded
    data values in the SQL text? Then use constant
    literals
  • Often, youll find that the CBO made a wise choice

32
Event 10053
  • This event can be used to dump the CBOs
    decision-tree
  • Set with
  • alter session set events 10053 trace name
    context forever, level 10
  • Creates a .trc file in USER_DUMP_DEST
  • Only when SQL statement is hard-parsed
  • If soft-parsed, then no trace file created
  • The dump is extremely cryptic
  • Only displays calculations, not formulas
  • Only displays the winning cost, not the winning
    execution plan permutation
  • Illustrates how the CBO operates, what factors
    are considered, etc

33
Summary for cost-based optimizer
  • Cost-based optimizer (CBO) works very well
  • If you configure initialization parameters
    sensibly
  • If you remember feed it properly with ANALYZEd
    statistics
  • Can also use DBMS_STATS package in place of
    ANALYZE
  • If you use HISTOGRAMS sparingly and
    appropriately, when ever it seems like that
    columns (indexed or not) will have popular or
    unpopular data values
  • The issue of bind variables is hopefully
    addressed with parameter CURSOR_SHARING EXACT
    FORCE
  • Introduced in v8.1.6, too many bugs!
  • Hopefully works well in v8.1.7, v9.0.1, or v9.2.0
    (havent yet tested it)

34
More time?
  • Lets discuss indexes!
  • BTree indexes and how they work
  • Addressing BTree weaknesses
  • Rebuilding sparse indexes
  • Low cardinality and bitmap indexes
  • Sequential data and reverse indexes

35
BTree index architecture
  • Oracle implements balanced B-Tree (a.k.a. BTree)
    indexes as its primary indexing method
  • tree-structured mechanism consisting of root,
    branch, and leaf nodes
  • each node is one database block in Oracle
  • root is the top-level branch node which is the
    starting point for searches into the index
  • branch nodes point to other branch nodes or to
    leaf nodes
  • leaf nodes contain data to point to table rows
  • when an index is created on an empty table, one
    database block is root, branch, and leaf together
  • indexes can be populated transactionally as rows
    in the table are populated or populated upon
    creation

36
BTree index architecture (contd)
  • Branch entries contain
  • Max data value piece
  • DBA of next level branch or leaf
  • Leaf entries contain
  • Data value
  • ROWID of table row

Index PK_X
root

branch
branch
branch
branch

leaf
leaf
leaf
leaf
leaf
leaf
leaf


Table X
37
BTree index architecture (contd)
  • When indexes are built on populated tables using
    CREATE INDEX, ALTER INDEX REBUILD, or direct-path
    INSERTs
  • root/branch and leaf blocks are populated with
    entries up to the setting ((100 - PCTFREE)
    DB_BLOCK_SIZE)
  • Exactly enough branches to support required leafs
  • When indexes are built transactionally using
    conventional-path INSERT statements
  • Blocks are split in half as they exceed (100 -
    PCTFREE)
  • If data values are random, then back-filling will
    occur in the half-full split blocks
  • If data values are monotonically-ascending, then
    back-filling will not occur to populate the
    half-full split blocks

38
BTree index architecture (contd)
  • Impact of monotonically-ascending data values
  • less efficient space usage
  • more levels in tree-structure




  • How to detect the condition
  • ALTER INDEX VALIDATE STRUCTURE
  • populates view INDEX_STATS with one row
  • column PCT_USED on INDEX_STATS is percentage of
    space utilized in blocks
  • usually about 50-70 percent for this condition

39
BTree index architecture (contd)
Impact of monotonically-ascending data values
with sliding window



  • How to detect the condition
  • column PCT_USED on INDEX_STATS is percentage of
    space utilized in blocks
  • usually less than 35-50 for this condition!!

40
BTree index architecture (contd)
  • Summary of issues with BTree indexes
  • BTree indexes are designed to optimize random
    data distributions with high cardinality while
    supporting high volumes of transactions
  • non-random (monotonically-ascending) data values
    are supported completed, but result in
    less-efficient space management
  • DROP/reCREATE is necessary to correct
  • To address these short-comings
  • Oracle7 v7.3 faster rebuilds with ALTER INDEX
    REBUILD
  • Oracle7 v7.3 bitmap indexes for low-cardinality
    data
  • Oracle8 v8.0 REVERSE indexes to randomize
    non-random data

41
Faster index rebuilds
  • ALTER INDEX REBUILD is faster than DROP then
    re-CREATE INDEX because
  • existing index is used as source for new index
  • only leaf blocks of existing index are scanned to
    build new index
  • less volume of data to scan than table rows
  • data in leaf blocks already sorted
  • No sort operation necessary
  • ALTER INDEX REBUILD can also use
  • parallel execution
  • direct-path (no rollback generated)
  • nologging/unrecoverable (no redo generated)

42
Bitmap indexes
  • Bitmap indexes are designed for quickly scanning
    low cardinality data
  • each database block is comprised of two bitmap
    segments
  • each bitmap segment is comprised of a bitmap and
    a ROWID list
  • ROWID list is a
  • forward-compressed list
  • of ROWIDs, positioned
  • according to bits in bitmap
  • Overhead of rewriting a bitmap segment during
    INSERT, UPDATE, or DELETE results in 40-50 fold
    performance impact!!!

ROWID list
Bitmap
  • Bitmap is organized into
  • columns for distinct data
  • values and rows of bits
  • representing rows

ROWID list
Bitmap
  • More distinct data values
  • means less room for rows
  • within each data value

43
REVERSE indexes
  • REVERSE indexes
  • causes monotonically-ascending data values to
    become more random by simply reversing data
  • 123456 becomes 654321
  • 123457 becomes 754321, etc...
  • data is transparently converted and unconverted
    upon insert and retrieval
  • data is re-fitted to fit Oracles original design
    decision
  • Impact
  • only equivalence operations will use the index
  • , !, ltgt, IN, and NOT IN
  • range-scans will not use the index
  • gt, gt, lt, lt, LIKE, BETWEEN

44
Q A
Write a Comment
User Comments (0)
About PowerShow.com