Unlocking the Mysteries Behind Update Statistics - PowerPoint PPT Presentation

1 / 34
About This Presentation
Title:

Unlocking the Mysteries Behind Update Statistics

Description:

Scan. The table is scanned in its entirety for update stats high, while it is only ... The approximate number of table scans is defined by the (size of the data to ... – PowerPoint PPT presentation

Number of Views:38
Avg rating:3.0/5.0
Slides: 35
Provided by: johnfm5
Category:

less

Transcript and Presenter's Notes

Title: Unlocking the Mysteries Behind Update Statistics


1
Unlocking the Mysteries Behind Update Statistics
  • John F. Miller III

2
The Dice Problem
  • Throw dice, how many will be 1?

3
Questions about the Dice
  • How many dice are you throwing?
  • How many sides does each dice have?
  • Are all the dice the same?

The better the information, the more accurate the
estimate.
4
What does Update Statistics do?
  • Collects information for the optimizer
  • Statistics LOW
  • Distributions MEDIUM
    HIGH
  • Drop Distributions
  • Compile stored procedures

5
Statistics Collected
  • systables
  • systables
  • syscolumns
  • syscolumns
  • sysindexes
  • sysindexes
  • Number of Rows
  • Number of pages to store the data
  • Second largest value for a column
  • Second smallest value for a column
  • of unique values for the lead key
  • How highly clustered the values for the lead key

6
Update Statistics LowBasic Algorithm
  • Walk the leaf pages in each index
  • Submit btree cleaner requests when deleted items
    are found causing re-balancing of indexes
  • Collects the following information
  • Number of unique items
  • Number of leave pages
  • How clustered the data is
  • Second highest and lowest value

7
How to Read Distributions
of rows represented in this bin
--- DISTRIBUTION --- (
-1 1 ( 868317, 70,
75) 2 ( 868317, 24,
100) 3 ( 868317, 12, 116) 4
( 868317, 30, 147) 5 (
868317, 39, 194) 6 ( 868317,
28, 222) --- OVERFLOW
--- 1 ( 779848, 43) 2
( 462364, 45)
of unique values
Highest Value in this bin
  • To get the range of values look at the highest
    value in the previous bin.

The value
of rows for this value
8
Example - Approximating a Value
--- DISTRIBUTION --- (
-1 1 ( 868317, 70,
75) 2 ( 868317, 24,
100) 3 ( 868317, 12, 116) 4
( 868317, 30, 147) 5 (
868317, 39, 194) 6 ( 868317,
28, 222) --- OVERFLOW
--- 1 ( 779848, 43) 2
( 462364, 45)
  • There are 868317 rows containing a value between
    -1 and 75
  • There are 70 unique values in this range
  • The optimizer will deduce 868317 / 70 12,404
    records for each value between -1 and 75

9
Example - Dealing with Data Skew
--- DISTRIBUTION --- (
-1 1 ( 868317, 70,
75) 2 ( 868317, 24,
100) 3 ( 868317, 12, 116) 4
( 868317, 30, 147) 5 (
868317, 39, 194) 6 ( 868317,
28, 222) --- OVERFLOW
--- 1 ( 779848, 43) 2
( 462364, 45)
  • Data skew
  • For the value 43 how many records will the
    optimizer estimate will exist?
  • Answer 779848 values
  • Any value that exceeds 25 of the bin size will
    be placed in an overflow bin

10
Basic Algorithmfor Medium and High
  • Develop scan plan based on available resources
  • Scan table
  • High All rows
  • Medium Sample of rows
  • Sort each column
  • Build distributions
  • Begin transaction
  • Delete old columns distributions
  • Insert new columns distributions
  • Commit transaction

11
Scan
  • The table is scanned in its entirety for update
    stats high, while it is only sampled for update
    stats medium (see Sample Size)
  • The reading of rows is done in dirty read
    isolation, regardless of what the user has set
    for their transaction level.

12
Scan
  • This scan of the table may occur several times
    depending on the amount of sort memory available
    and the number of columns to collect statistics
    about.
  • The approximate number of table scans is defined
    by the (size of the data to sort) / (amount of
    sort memory)

13
Sort
  • The rows processed by the scan phase are passed
    directly to the sort package.
  • Each column in the row for which statistics are
    being generated is passed to a unique invocation
    of a sort.

14
Build
  • After the sort is completed we read the sorted
    column data finding out the number of duplicates
    and unique values creating approximately 200
    range bins by default.
  • Any count of a duplicates value that exceeds 25
    the size of a bin will be placed in an overflow
    bin.

15
Insert
  • Now we have to delete the old distributions and
    insert the new distributions. As long as the user
    was not in a transaction this will be done as its
    own transaction. This transaction will last for
    less than 1 second and will hold NO locks on the
    tables, but locks on the system catalogs while
    the update occurs.

16
Sample Size
  • HIGH
  • The entire tables is scanned and all rows are
    used.
  • Medium
  • Misconception about the number of rows sampled is
    based on the number of rows in the table, this is
    incorrect.
  • The number of samples depends on the Confidence
    and Resolution. See the following chart.

17
Update Statistics Medium Sample Size
18
Update Statistics Medium Memory Requirements
19
Update Statistics High Memory Requirements
  • In memory sort
  • Approximate Memory number of rows sum(column
    widths 2 sizeof(pointer) )

20
Memory Rules
  • Estimated Update Stats memory is below 100MB
  • Hard coded limit of 4MB
  • Attempts to minimize the scans by fitting as many
    columns into 4MB
  • Estimated Update Stats memory is above 100MB
  • Memory is requested from MGM
  • Attempt to minimize the scans by fitting as many
    columns in the MGM memory

21
Examples
  • Customer Table
  • Cust_id integer
  • Fname char(50)
  • Lname char(50)
  • Address1 char(200)
  • Address2 char(200)
  • State char(2)
  • zipcode integer
  • Number of Rows 500,000

22
ExamplesMemory for Incore Sort
23
ExamplesNumber of Table Scans
24
Confidence
  • A factor in the number of samples used by update
    statistics medium

25
Resolution
  • Percentage of data that is represented in a
    distribution bin
  • Example
  • 100,000 rows in the table
  • Resolution of 2
  • Each bin will represent 2,000 rows

26
Improvements in update statistics in 7.31.UD2
  • UPDATE STATISTICS CAN NOT ALLOCATE SORT MEMORY
    BETWEEN 4 AND 100 MB
  • The default has been raised from 4MB to 15MB
  • User can now configure the amount of memory
  • UPDATE STATISTICS USES SLOW SCANNING TECHNOLOGY
    WHEN SCANNING A TABLE -- ENABLE LIGHT SCANS
  • Implemented light scans
  • Set oriented reads

27
Improvements in update statistics in 7.31.UD2
  • THE PLAN UPDATE STATISTICS MAKE WHEN SCANNING IS
    NOT VIEWABLE BY THE DBA
  • Set explain will now print the scan path and
    resource usage
  • UPDATE STATISTICS LOW ON FRAGMENT INDEXES RUNS
    SERIALLY AND VERY SLOW
  • With PDQ turned on each index fragment will be
    scanned in parallel
  • PDQ at 1 means 10 of the index fragments scanned
    in parallel, while PDQ at 10 means all the index
    fragments will be scanned in parallel

28
Improvements in update statistics in 7.31.UD2
  • ERROR 126 WHEN EXECUTING UPDATE STATISTICS AND
    STORED PROCECURE (ALSO ERRORS 312/100)
  • Errors when trying to insert the distributions
    because set lock mode to wait was not handled
    properly inside update statistics
  • SCANNING AN INDEX WHICH IS FRAGMENT IS SLOW DUE
    TO THE INEFFICIENT MERGE IN THE FRAGMENT MANAGER
  • Binary search used instead of the previous nest
    loop merge when ordering index fragments
  • Most noticeable when the number of fragments in
    an index is large

29
Tuning with the New Statistics
  • Turn on PDQ when running update statistics, but
    only for tables
  • Avoid PDQ when updating statistics for procedures
  • When running high or medium increase the memory
    update statistics has to work with
  • Enable parallel sorting (i.e. PSORT_NPROCS)

30
Considerations
  • Change the RESOLUTION to 1.5
  • Increasing the number of bins for the
    distributions
  • Increasing the sample size for update statistics
    medium

31
Example
  • Following Example
  • Table size 215,000 rows
  • Row size 445 bytes
  • Uniprocessor

32
Example of the current update statistics
  • Table jmiller.t9
  • Mode HIGH
  • Number of Bins 267 Bin size 1082
  • Sort data 101.4 MB
  • Sort memory granted 4.0 MB
  • Estimated number of table scans 10
  • PASS 1 c9
  • PASS 2 c5
  • PASS 3 c7
  • PASS 4 c6
  • ..
  • PASS 10 c4
  • Completed pass 1 in 0 minutes 24 seconds
  • Completed pass 2 in 0 minutes 20 seconds
  • Completed pass 3 in 0 minutes 17 seconds
  • Completed pass 4 in 0 minutes 17 seconds
  • Completed pass 5 in 0 minutes 17 seconds
  • Completed pass 6 in 0 minutes 15 seconds
  • Completed pass 7 in 0 minutes 14 seconds
  • Completed pass 8 in 0 minutes 15 seconds
  • Completed pass 9 in 0 minutes 16 seconds
  • Completed pass 10 in 0 minutes 14 seconds

Total Time 146 seconds
33
The new Defaults in 7.31.UD2
Table jmiller.t9 Mode
HIGH Number of Bins 267 Bin size 1082 Sort
data 101.4 MB Sort memory granted
15.0 MB Estimated number of table scans
7 PASS 1 c9,c8,c10,c5,c7 PASS 2 c6,c1 PASS 3
c3 PASS 4 c2 PASS 5 c4
  • Completed pass 1 in 0 minutes 34 seconds
  • Completed pass 2 in 0 minutes 19 seconds
  • Completed pass 3 in 0 minutes 16 seconds
  • Completed pass 4 in 0 minutes 14 seconds
  • Completed pass 5 in 0 minutes 15 seconds

Total Time 98 seconds
New Memory Default
34
Enabling PDQ with update statistics
  • Table jmiller.t9
  • Mode HIGH
  • Number of Bins 267 Bin size 1082
  • Sort data 101.4 MB
  • PDQ memory granted 106.5 MB
  • Estimated number of table scans 1
  • PASS 1 c1,c2,c3,c4,c5,c6,c7,c8,c9,c10
  • Index scans disabled
  • Light scans enabled
  • Completed pass 1 in 0 minutes 29 seconds

PDQ Memory
Features Enabled
Total Time 29 seconds
Write a Comment
User Comments (0)
About PowerShow.com