Unlocking the Mysteries Behind Update Statistics - PowerPoint PPT Presentation

1 / 34

About This Presentation

Title:

Unlocking the Mysteries Behind Update Statistics

Description:

Scan. The table is scanned in its entirety for update stats high, while it is only ... The approximate number of table scans is defined by the (size of the data to ... – PowerPoint PPT presentation

Number of Views:38

Avg rating:3.0/5.0

Slides: 35

Provided by: johnfm5

Category:

more less

Transcript and Presenter's Notes

Title: Unlocking the Mysteries Behind Update Statistics

1
Unlocking the Mysteries Behind Update Statistics

John F. Miller III

2
The Dice Problem

Throw dice, how many will be 1?

3
Questions about the Dice

How many dice are you throwing?
How many sides does each dice have?
Are all the dice the same?

The better the information, the more accurate the
estimate.
4
What does Update Statistics do?

Collects information for the optimizer
Statistics LOW
Distributions MEDIUM
HIGH
Drop Distributions
Compile stored procedures

5
Statistics Collected

systables
systables
syscolumns
syscolumns
sysindexes
sysindexes

Number of Rows
Number of pages to store the data
Second largest value for a column
Second smallest value for a column
of unique values for the lead key
How highly clustered the values for the lead key

6
Update Statistics LowBasic Algorithm

Walk the leaf pages in each index
Submit btree cleaner requests when deleted items
are found causing re-balancing of indexes
Collects the following information
Number of unique items
Number of leave pages
How clustered the data is
Second highest and lowest value

7
How to Read Distributions
of rows represented in this bin
--- DISTRIBUTION --- (
-1 1 ( 868317, 70,
75) 2 ( 868317, 24,
100) 3 ( 868317, 12, 116) 4
( 868317, 30, 147) 5 (
868317, 39, 194) 6 ( 868317,
28, 222) --- OVERFLOW
--- 1 ( 779848, 43) 2
( 462364, 45)
of unique values
Highest Value in this bin

To get the range of values look at the highest
value in the previous bin.

The value
of rows for this value
8
Example - Approximating a Value
--- DISTRIBUTION --- (
-1 1 ( 868317, 70,
75) 2 ( 868317, 24,
100) 3 ( 868317, 12, 116) 4
( 868317, 30, 147) 5 (
868317, 39, 194) 6 ( 868317,
28, 222) --- OVERFLOW
--- 1 ( 779848, 43) 2
( 462364, 45)

There are 868317 rows containing a value between
-1 and 75
There are 70 unique values in this range
The optimizer will deduce 868317 / 70 12,404
records for each value between -1 and 75

9
Example - Dealing with Data Skew
--- DISTRIBUTION --- (
-1 1 ( 868317, 70,
75) 2 ( 868317, 24,
100) 3 ( 868317, 12, 116) 4
( 868317, 30, 147) 5 (
868317, 39, 194) 6 ( 868317,
28, 222) --- OVERFLOW
--- 1 ( 779848, 43) 2
( 462364, 45)

Data skew
For the value 43 how many records will the
optimizer estimate will exist?
Answer 779848 values
Any value that exceeds 25 of the bin size will
be placed in an overflow bin

10
Basic Algorithmfor Medium and High

Develop scan plan based on available resources
Scan table
High All rows
Medium Sample of rows
Sort each column

Build distributions
Begin transaction
Delete old columns distributions
Insert new columns distributions
Commit transaction

11
Scan

The table is scanned in its entirety for update
stats high, while it is only sampled for update
stats medium (see Sample Size)
The reading of rows is done in dirty read
isolation, regardless of what the user has set
for their transaction level.

12
Scan

This scan of the table may occur several times
depending on the amount of sort memory available
and the number of columns to collect statistics
about.
The approximate number of table scans is defined
by the (size of the data to sort) / (amount of
sort memory)

13
Sort

The rows processed by the scan phase are passed
directly to the sort package.
Each column in the row for which statistics are
being generated is passed to a unique invocation
of a sort.

14
Build

After the sort is completed we read the sorted
column data finding out the number of duplicates
and unique values creating approximately 200
range bins by default.
Any count of a duplicates value that exceeds 25
the size of a bin will be placed in an overflow
bin.

15
Insert

Now we have to delete the old distributions and
insert the new distributions. As long as the user
was not in a transaction this will be done as its
own transaction. This transaction will last for
less than 1 second and will hold NO locks on the
tables, but locks on the system catalogs while
the update occurs.

16
Sample Size

HIGH
The entire tables is scanned and all rows are
used.
Medium
Misconception about the number of rows sampled is
based on the number of rows in the table, this is
incorrect.
The number of samples depends on the Confidence
and Resolution. See the following chart.

17
Update Statistics Medium Sample Size
18
Update Statistics Medium Memory Requirements
19
Update Statistics High Memory Requirements

In memory sort
Approximate Memory number of rows sum(column
widths 2 sizeof(pointer) )

20
Memory Rules

Estimated Update Stats memory is below 100MB
Hard coded limit of 4MB
Attempts to minimize the scans by fitting as many
columns into 4MB
Estimated Update Stats memory is above 100MB
Memory is requested from MGM
Attempt to minimize the scans by fitting as many
columns in the MGM memory

21
Examples

Customer Table
Cust_id integer
Fname char(50)
Lname char(50)
Address1 char(200)
Address2 char(200)
State char(2)
zipcode integer
Number of Rows 500,000

22
ExamplesMemory for Incore Sort
23
ExamplesNumber of Table Scans
24
Confidence

A factor in the number of samples used by update
statistics medium

25
Resolution

Percentage of data that is represented in a
distribution bin
Example
100,000 rows in the table
Resolution of 2
Each bin will represent 2,000 rows

26
Improvements in update statistics in 7.31.UD2

UPDATE STATISTICS CAN NOT ALLOCATE SORT MEMORY
BETWEEN 4 AND 100 MB
The default has been raised from 4MB to 15MB
User can now configure the amount of memory
UPDATE STATISTICS USES SLOW SCANNING TECHNOLOGY
WHEN SCANNING A TABLE -- ENABLE LIGHT SCANS
Implemented light scans
Set oriented reads

27
Improvements in update statistics in 7.31.UD2

THE PLAN UPDATE STATISTICS MAKE WHEN SCANNING IS
NOT VIEWABLE BY THE DBA
Set explain will now print the scan path and
resource usage
UPDATE STATISTICS LOW ON FRAGMENT INDEXES RUNS
SERIALLY AND VERY SLOW
With PDQ turned on each index fragment will be
scanned in parallel
PDQ at 1 means 10 of the index fragments scanned
in parallel, while PDQ at 10 means all the index
fragments will be scanned in parallel

28
Improvements in update statistics in 7.31.UD2

ERROR 126 WHEN EXECUTING UPDATE STATISTICS AND
STORED PROCECURE (ALSO ERRORS 312/100)
Errors when trying to insert the distributions
because set lock mode to wait was not handled
properly inside update statistics
SCANNING AN INDEX WHICH IS FRAGMENT IS SLOW DUE
TO THE INEFFICIENT MERGE IN THE FRAGMENT MANAGER
Binary search used instead of the previous nest
loop merge when ordering index fragments
Most noticeable when the number of fragments in
an index is large

29
Tuning with the New Statistics

Turn on PDQ when running update statistics, but
only for tables
Avoid PDQ when updating statistics for procedures
When running high or medium increase the memory
update statistics has to work with
Enable parallel sorting (i.e. PSORT_NPROCS)

30
Considerations

Change the RESOLUTION to 1.5
Increasing the number of bins for the
distributions
Increasing the sample size for update statistics
medium

31
Example

Following Example
Table size 215,000 rows
Row size 445 bytes
Uniprocessor

32
Example of the current update statistics

Table jmiller.t9
Mode HIGH
Number of Bins 267 Bin size 1082
Sort data 101.4 MB
Sort memory granted 4.0 MB
Estimated number of table scans 10
PASS 1 c9
PASS 2 c5
PASS 3 c7
PASS 4 c6
..
PASS 10 c4

Completed pass 1 in 0 minutes 24 seconds
Completed pass 2 in 0 minutes 20 seconds
Completed pass 3 in 0 minutes 17 seconds
Completed pass 4 in 0 minutes 17 seconds
Completed pass 5 in 0 minutes 17 seconds
Completed pass 6 in 0 minutes 15 seconds
Completed pass 7 in 0 minutes 14 seconds
Completed pass 8 in 0 minutes 15 seconds
Completed pass 9 in 0 minutes 16 seconds
Completed pass 10 in 0 minutes 14 seconds

Total Time 146 seconds
33
The new Defaults in 7.31.UD2
Table jmiller.t9 Mode
HIGH Number of Bins 267 Bin size 1082 Sort
data 101.4 MB Sort memory granted
15.0 MB Estimated number of table scans
7 PASS 1 c9,c8,c10,c5,c7 PASS 2 c6,c1 PASS 3
c3 PASS 4 c2 PASS 5 c4