Title: ASE126 Opening Your Maintenance Window
1ASE126 Opening Your Maintenance Window
Kevin Sherlock Staff Systems Analyst, Dex Media
Inc. kevin.sherlock_at_teamsybase.com Eric
Miner - Senior Engineer, Sybase Inc.
eric.miner_at_sybase.com August 2003
2Making Update Statistics Run Faster
Ways to make it more efficient when you have to
run it
- From tuning the server to using a new feature
- Well talk about ASE 12.5.0.3s new feature
Sampling for Update Statistics - Well show you some simple things you can do now
to help speed up update statistics - All of these are designed to help make your
statistics maintenance easier and more efficient
3Shrinking Maintenance Windows
My maintenance window isnt big enough to run
update statistics!
- As datasets have grown so has the time it takes
to run update statistics - In some cases the maintenance window simply isnt
big enough to run update statistics whether its
needed or not - Your statistics maintenance needs to be as
efficient as possible to make the best use of
your window - You want to keep your statistics up to date but
its next to impossible to do in some cases - How can you speed up update statistics?
4What are the Statistics?
And why do we need them?
- The optimizer is cost based it needs to know
about tables. Indexes and the data - The optimizer makes its decisions based on the
estimated cost of a given access method - The statistics are used to estimate various
selectivity values - The selectivity values are used to estimate the
cost of a given access method - The statistics are the optimizers only view of
your objects and data - Without statistics the optimizer can only make
guesses about the objects and data
5There Are Two Types of Statistics
Both are used by the optimizer
- Table/Index level - describes a table and its
index(es) - Data Leaf page count, Data Leaf row count,
Empty data leaf page count, Forwarded row
count, Deleted row count, OAM allocation page,
First extent leaf pages , Data/Leaf row size,
Index height, Data/Index page cluster ratio, Data
Row cluster ratio - Some values are updated dynamically as DML occurs
- Page row counts, deleted rows, forwarded rows,
cluster ratios - The other values are static and updated by update
statistics - Space Utilization, Large I/O Efficiency are NOT
used by optimizer. Theyre For your information
only helps track fragmentation - Cannot be written directly or with optdiag
quickly overwritten - Stored in systabstats
- Column level - describes the data to the
optimizer - Histogram (distribution), density values, default
selectivity values - Static, need to be updated or written directly
- Stored in sysstatistics
- Updated via update statistics, inputting an
optdiag file or writing directly - Stored in sysstatistics
6Why Maintain the Statistics?
Do I have To Keep the Column Level Statistics Up
to Date?
- In most cases up to date statistics are needed
for the optimizer to produce the most efficient
query plans - In most cases update statistics if
- The distribution of data has changed, usually due
to a significant amount of DML occurring - There have been significant changes to row and
page counts - Its a waste of time to update statistics on
static data - Myth !! Update statistics guarantees good
performance - You may be able to open your maintenance window
by not running update statistics as often as you
have been - Some environments never need update statistics to
be run, some need it every 5 minutes. - Find out where you fall between these extremes
- Test to see if youre running update statistics
too often - Frequency of update statistics runs can become
habit or custom - Question whether or not you really need to run
update statistics as often as you do - Test changing your frequency of running update
statistics
7Sampling for Update Statistics
Dramatically decreases the time for update
statistics to run
- What is Sampling for Update Statistics?
- A new 12.5.0.3 option of update statistics that
reads random pages for statistics gathering - Sampling decreases the I/O required to gather the
statistics by not reading all pages - Percentage of pages to sample can be specified
- Why Sampling for Update Statistics?
- The speed of update statistics has been an issue
for a very long time - Making it run faster has been a top Customer
feature request for years - Opening the statistics maintenance window lowers
the cost of using ASE - Lets be honest.update statistics can be a real
pain to run
8Sampling for Update Statistics cont.
Sampling Time Test - Dramatically decreases the
time for update statistics to run
Integer Column 10,556,400 Rows
105,564 Pages
9Sampling for Update Statistics cont.
Dramatically decreases the time for update
statistics to run
- How does it work?
- The percentage of pages to read is specified in
the sampling extension of the update statistics
command - update statistics tab_name (col_name) with
sampling 10 percent - The specified percentage of pages will be
randomly read - Sampling rates are specified in percentages from
1 to 100 - Reading random pages is cheaper in I/O than
reading random rows, but its not a pure
sampling - Column values from sampled pages are still read
into a worktable, the worktable is sorted and the
statistics gathered from there - Since the number of pages is lower with sampling
the size of the worktable will be smaller and
less tempdb space will be used - The columns histogram will be based on the
values that are sampled
10Sampling for Update Statistics cont.
How Does It Work cont. The Effects of Sampling
with Update Statistics Syntax
- update statistics table_name index_name with
sampling N percent - Will create/update statistics on the leading
column(s) of all indexes on the table or the
specified index - Sampling will not be used, the specified sampling
rate will be ignored - Index pages are not currently read by sampling
this is at the top of the enhancements list - Index has 78780 leaf pages
- Physical Reads Logical Reads Pages Written
- -------------- ------------- --------------
- 14 79021 24
- Because no sort is needed there is a major
savings when scanning the index for the leading
column
11Sampling for Update Statistics cont.
How Does It Work cont. The Effects of Sampling
with Update Statistics Syntax cont.
- update statistics table_name (column_name) with
sampling N percent - Will create/update statistics on the specified
column using the specified sampling rate - update statistics big_id(b)
- Full scan update statistics
- SQL Server elapsed time 939683 ms.
- Physical Reads Logical Reads Pages Written
- --------------- ------------- -------------
- 281243 11017457 119964
- Using a 10 sampling rate
- SQL Server elapsed time 71483 ms.
- Physical Reads Logical Reads Pages Written
- -------------- ------------- --------------
- 23065 1099327 10562
12Sampling for Update Statistics cont.
How Does It Work cont. The Effects of Sampling
with Update Statistics Syntax cont.
- update index statistics table_nameindex_name
with samplingN percent -
- Sampling will not be used on the leading
column(s) of index(es) but will be used on all
inner columns of index(es) - update index statistics big_id abc_nc
- Full scan update statistics
- SQL Server elapsed time 1922336 ms.
- Physical Reads Logical Reads Pages Written
- ------------- ------------ ------------
- 661316 22134151 187666
- Using a 10 sampling rate
- SQL Server elapsed time 248860 ms.
- Physical Reads Logical Reads Pages Written
- -------------- ------------- -------------
- 149096 2297970 27660
13Sampling for Update Statistics cont.
If you must use update all statistics sampling
will make it less painful
- update all statistics table_name with sampling
N percent - Will use sampling to create/update statistics on
all columns in the table that are not leading
column of an index - If you find that you need statistics on all
columns of a table sampling will make update all
statistics much faster - Full scan update all statistics
- SQL Server elapsed time 1390596 ms.(thats 23.2
minutes) - Physical Reads Logical Reads Pages Written
- -------------- ------------- -------------
- 630335 11373699 127653
- Using a 10 sampling rate
- SQL Server elapsed time 531390 ms. (thats 8.86
minutes) - Physical Reads Logical Reads Pages Written
- ------------- ------------- -------------
- 373876 1457379 13812
14Sampling and Tempdb Usage
Smaller worktables needed for sorts in tempdb
- Using sampling with update statistics will
decrease the space needed in tempdb - Applies to columns that are not leading columns
of an index or when update statistics is run on a
specified column - Values from columns are read into a worktable,
sorted and read again to gather the statistics - With a full scan update statistics the worktable
used for the sort can become very large and
cause tempdb to run out of space - Full scan -
- Physical Reads Logical Reads Pages Written
- --------------- ------------- -------------
- 281243 11017457 119964
- Using a 10 sampling rate
- Physical Reads Logical Reads Pages Written
- -------------- ------------- -------------
- 23065 1099327 10562
15What To Watch For When Using Sampling
As with everything there are some trade offs
- Sampling, by its nature does not read all values
in the set - Since not all values are read there will
differences between a sampled histogram and one
obtain with a full scan - Its a balance of time saved running update
statistics vs. accuracy of the statistics - The boundary values and cell weights will vary
from those in a full scan histogram - Unique column
- Full scan
- 4 0.05263158 lt 166680
- 5 0.05263158 lt 222240
- 6 0.05263158 lt 277800
- 7 0.05263158 lt 333360
- 10 Sampling
- 4 0.05261801 lt 165378
- 5 0.05261801 lt 223117
- 6 0.05261801 lt 277841
- 7 0.05261801 lt 332565
- The lower the sampling rate the greater the
variations from the full scan
16What To Watch For When Using Sampling cont.
Min and Max values not in the histogram
- Columns min and max values may not be in the
sampled set - Full scan
- Step Weight Value
- 1 0.00000000 lt 0
- 2 0.05263158 lt 55560
-
- 19 0.05263158 lt 1000080
- 20 0.05263158 lt 1055640
- 10 Sampling
- Step Weight Value
- 1 0.00000000 lt 5390
- 2 0.05263158 lt 55286
- 19 0.05263158 lt 997766
- 20 0.05263158 lt 1053206
- If SARG values fall outside of the histogram the
optimizer makes assumptions about selectivity - If you query near either end of the histogram
check it from time to time to monitor and
effects from sampling
17What To Watch For When Using Sampling cont.
Other things to watch for
- Density values are not affected by sampling
- If there are existing statistics on the column
the density values will not be overwritten, if
there are no statistics a default value of 0.10
will be used - Most of your columns have statistics, if you put
statistics on a column that did previously have
them use a full scan first - This will be changed soon
- Different cell types may appear
- As the sampling rate decreases it is possible
that Frequency count and/or Range cells may
appear where they didnt exist prior to sampling - Highly duplicated values are more likely to be
seen in the histogram - The same pages will be re-sampled if the dataset
is static - and the same sampling rate used
- Dont be surprised
18What Sampling Rate Should You Use?
A rule of thumb based on tests
- Trial and error testing
- Youll need to test to determine what sampling
rate is most efficient for your columns - Check that there are no changes to your query
performance - Its a trade off between faster update statistics
and perfectly accurate statistics - Try starting with a sampling rate of between 20
and 10 - Use optdiag to monitor changes to the histograms
- Use showplan, traces 302/310 to check for changes
to query plans if you see any effects on
performance
19Upcoming Enhancements
We need to add improvements
- Enhancements will be made to sampling
- Will read index pages
- Will scale the density values based on the
sampled set of values - Will enable optdiag and optimizer analysis output
so that they record that sampling was used and
what the last sampling rate was
20Server Tuning
Leverage dynamic configuration options!
- Update statistics is disk io and sort intensive
- Configuration parameters to consider adjusting
- number of sort buffers controls how many
buffers to reserve in an objects cache for
sorting tuning this can depend on the size of
the column(s) being sorted sort runs should be
minimized - formula
- Number of large I/O buffers controls the number
of pages to be used during index creation - I/O polling process count For disk IO bound
operations, tune low for added throughput - Runable process search count Tuning to a very
high number, or zero keeps engines from yielding
to OS, promoting better throughput for task
switching operations - Disable disk mirroring Keeps engines from
having to request and be granted device
semaphores. This is unnecessary overhead which
can be avoided if you dont use ASE mirroring
(this option is enabled by default in 12.5.0.3) - Large IO pools
- Create the largest 8-page (16k, 32k, 64k, 128k)
IO pools as possible - Update statistics uses the
largest IO pools available in both the objects
cache, and tempdb/default cache. - ASE 12.5.0.3 LSA Worktables are created using
Large Scale Allocation. Only if large IO pool is
defined. - Procedure cache usage
- Used for sorting Increasing sort memory
requirements can cause procedures to age-out of
cache resulting in procedure cache misses.
21Tips For Opening Your Maintenance Window cont.
Tips, Techniques and Best Practices
- Avoid Duplicate statistics gathering
- Rebuilding indexes
- create index also collects statistics for index
and leading column (some bugs for sorted_data) - Update statistics lttablenamegt ltindexnamegt is not
necessary - Number of histogram steps for column survives
drop index - Two or more indexes which have common columns
- Leverage indexes which have leading column -
- clustered index i1 on authors(auid)
- nonclustered index i2 on authors(au_lname,auid)
- nonclustered index i3 on authors(auid, au_fname)
- Goal - Column stats for auid
- DO update statistics authors i3 / (not i1) /
- DONT update statistics authors(auid)
- DONT update index statistics authors i2
- Also, when given a choice, use non-clustered
indexes to gather column stats for a leading
column rather than clustered index
22Tips For Opening Your Maintenance Window cont.
Know your choices of update statistics syntax and
avoid overkill
- Which update statistics syntax do you really need
to run? - update statistics table_name index_name will
update statistics on the leading column of all
indexes on the table or on the specified index - This is the original update statistics syntax
- update index statistics table_name index_name
will update statistics on all columns that are in
indexes on the table or the specified index - More expensive in terms of maintenance, but is
likely to result in more efficient query plans
and better performance its a balance you need
to test - update statistics table_name (column_name) will
update statistics on the specified column only,
will not read index pages - update all statistics table_name will update
statistics on all columns in the table, and will
update partitions statistics. - This can be a real maintenance killer! Very
rarely do you need statistics on all columns of a
table - Test to determine which syntax gives you the most
efficient query plans while giving you a
reasonable maintenance window
23Tips For Opening Your Maintenance Window cont.
Know your choices of update statistics syntax and
avoid overkill cont.
- Sequential operations vs. Concurrent operations
- Replace single, multiple operation commands with
multiple connections and single operation
commands - Example 1
- c1gt update index statistics authors
- can be converted to
- c1gt update statistics authors ix1
- c2gt update statistics authors ix2
- c3gt update statistics authors(au_lname)
- Example 2
- c1gt update statistics authors
- can be converted to
- c1gt update statistics authors ix1
- c2gt update statistics authors ix2
- c3gt update statistics authors ix3
- Multiple concurrent commands leverage multiple
engines, multiple devices. Higher throughput.
24Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics
- Use optdiag to get updated statistics from a copy
of your production data - Use replication or dump/load to create duplicate
data - Update statistics on the copy and get an optdiag
output file from there - Input the optdiag file into the production server
- Moving the data may take a while - use day-old
statistics if necessary - Write your own statistics
- You know your data best
- Dont need update statistics to tell you about
data distribution - Unique, evenly distributed and sequential
monotonic columns allow the column statistics to
be derived - Make sure to know your data
- Table/Index level statistics will not be updated
25Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics
- CIS/Proxy Table Tricks
- Addresses problem using update statistics
lttablenamegt (column_list), or update index
statistics, or update all statistics - For minor column attributes, these use the
datapages (tablescan) to build the worktable for
gathering statistics! - Extra work needed to create/maintain views, and
proxy tables - Requires the use of optdiag input mode
- This technique can save LOTS of time
- Can spread IO around to other segments
- How using CIS and Proxy tables works
- Create a view on minor column attributes for each
non-clustered index which forces the use of the
index in the view definition - Create a proxy table on each view
- Run update statistics ltproxytablegt (column)
instead - Extract, and input column level stats into base
table using optdiag - Saves having to scan the base table for each
minor attribute!
26Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics
- CIS/Proxy Table Example
- i1m_cl clustered, unique located on default
(unique2, onePercent, twenty) - i1m_nc_c_int nonclustered, unique located on
i1m_nc_c_int (skew7, twenty, userid) - index_name size reserved unused
- -------------------- ---------- ----------
---------- - i1m_cl 1920 KB 224186 KB 42 KB
- i1m_nc_c_int 19094 KB 19120 KB 26 KB
- name rowtotal reserved
data index_size unused - -------------------- ----------- ---------------
--------------- --------------- --------------- - oneMrows 1000000 336296 KB
222224 KB 113930 KB 142 KB - Isqlgt
- Create view i1m_nc_c_int_minor(twenty, userid)
- As SELECT twenty, userid
- FROM oneMrows (index i1m_nc_c_int)
- Create existing table i1m_nc_c_int_proxy(twenty
int, userid int) - External table at loopback.userdb.dbo.i1m_nc_c_in
t_minor
27Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics
- CIS/Proxy Table Results
- Normal Runtime 111
- CIS/Proxy Table Runtime 011
28Deleting or Limiting Statistics
Sometimes no statistics are better than having
them
- Choosing not to have statistics on a column
- Much the same as deleting statistics, but youve
chosen not to add them to the column in the first
place or have removed them - A decision to allow the hard-coded default values
to be used - No maintenance required
- Requires some planning and testing though
- Without statistics the optimizer has to use a
small set of hard-coded values for its
selectivity estimate - If statistics give the optimizer an ugly
picture of the distribution the hard-coded
selectivity values may result in a lower
selectivity value - Statistics on a gender column 0.50 selectivity
- Default values will be better - 0.10, 0.25 or
0.33 depending on the SARG operator - Make sure maintenance doesnt add the statistics
again - Make sure to remove update statistics for these
columns from your scripts
29Tempdb Configuration and Considerations
Adjusting tempdb for update statistics sorting
- Worktables will be used when gathering statistics
on some columns - For leading columns of indexes there is no need
to sort since values are already in sorted order
only an index scan is done - For any other column the values must be read
(from the table) into a worktable (in tempdb),
sorted and read again for statistics gathering - The size of the worktable will depend on the
number of rows, the size of the rows, and if
sampling is used, the sampling rate used - You will usually see heavy tempdb usage during
update index
statistics, or when you run a series of
update statistics table_name
(column_name) - It is the worktable creation, sorting and reading
what makes update all statistics
painfully slow on larger tables
30Tempdb Configuration and Considerations cont.
Leverage dynamic configuration options!
- Tempdb Configuration
- Stripe on multiple physical disks
- Stripe size a multiple of logical extent size
Pay attention to OS specific device managers to
tune stripe sizes. - Supports concurrent sorting and worktable
activity - System segment focused
- Place system segment on fastest/dedicated
device - system segment is where worktables for
statistics gathering get created - Use multiple devices to support parallel sort
- Tempdb cache
- Optimize for write intensive tasks
- Largest possible large I/O pools
- Cache status HK ignore cache
- Leverage lazy write feature in ASE 12.5.0.3
31Where To Get More Information
- The Sybase Customer newsgroups
- http//support.sybase.com/newsgroups
- The Sybase list server
- SYBASE-L_at_LISTSERV.UCSB.EDU
- The external Sybase FAQ
- http//www.isug.com/Sybase_FAQ/
- Technical Documents at
- http//www.sybase.com/support/techdocs
- Join the ISUG, ISUG Technical Journal, feature
requests - http//www.isug.com
- Other utilities, scripts, stored procedures
- http//teamsybase.com/kevin.sherlock
32Sybase Developer Network (SDN)
- Additional Resources for Developers/DBAs
- Single point of access to developer software,
services, and up-to-date technical information - White papers and documentation
- Collaboration with other developers and Sybase
engineers - Code samples and beta programs
- Technical recordings
- Free software
- Join today www.sybase.com/developer
33SDN Presents CodeXchange
Share ASE Scripts and Tools
- New SDN feature enables community collaboration
- Download tools created by Sybase
- Leverage contributions of others to help
administer and monitor your servers - Contribute your own code or start your own
collaborative project with input from other ASE
experts - Any SDN member can participate
- Log in using your MySybase account via SDN
- Join the collaboration already underway
- http//ase.codexchange.sybase.com or via SDN at
www.sybase.com/developer