ASE126 Opening Your Maintenance Window

About This Presentation

Title:

ASE126 Opening Your Maintenance Window

Description:

Kevin Sherlock Staff Systems Analyst, Dex Media Inc. kevin.sherlock_at_teamsybase.com ... Know your choices of update statistics syntax and avoid overkill cont. ... – PowerPoint PPT presentation

Number of Views:140

Avg rating:3.0/5.0

Slides: 34

Provided by: fellenm

Category:

more less

Transcript and Presenter's Notes

Title: ASE126 Opening Your Maintenance Window

1
ASE126 Opening Your Maintenance Window
Kevin Sherlock Staff Systems Analyst, Dex Media
Inc. kevin.sherlock_at_teamsybase.com Eric
Miner - Senior Engineer, Sybase Inc.
eric.miner_at_sybase.com August 2003
2
Making Update Statistics Run Faster
Ways to make it more efficient when you have to
run it

From tuning the server to using a new feature
Well talk about ASE 12.5.0.3s new feature
Sampling for Update Statistics
Well show you some simple things you can do now
to help speed up update statistics
All of these are designed to help make your
statistics maintenance easier and more efficient

3
Shrinking Maintenance Windows
My maintenance window isnt big enough to run
update statistics!

As datasets have grown so has the time it takes
to run update statistics
In some cases the maintenance window simply isnt
big enough to run update statistics whether its
needed or not
Your statistics maintenance needs to be as
efficient as possible to make the best use of
your window
You want to keep your statistics up to date but
its next to impossible to do in some cases
How can you speed up update statistics?

4
What are the Statistics?
And why do we need them?

The optimizer is cost based it needs to know
about tables. Indexes and the data
The optimizer makes its decisions based on the
estimated cost of a given access method
The statistics are used to estimate various
selectivity values
The selectivity values are used to estimate the
cost of a given access method
The statistics are the optimizers only view of
your objects and data
Without statistics the optimizer can only make
guesses about the objects and data

5
There Are Two Types of Statistics
Both are used by the optimizer

Table/Index level - describes a table and its
index(es)
Data Leaf page count, Data Leaf row count,
Empty data leaf page count, Forwarded row
count, Deleted row count, OAM allocation page,
First extent leaf pages , Data/Leaf row size,
Index height, Data/Index page cluster ratio, Data
Row cluster ratio
Some values are updated dynamically as DML occurs
Page row counts, deleted rows, forwarded rows,
cluster ratios
The other values are static and updated by update
statistics
Space Utilization, Large I/O Efficiency are NOT
used by optimizer. Theyre For your information
only helps track fragmentation
Cannot be written directly or with optdiag
quickly overwritten
Stored in systabstats
Column level - describes the data to the
optimizer
Histogram (distribution), density values, default
selectivity values
Static, need to be updated or written directly
Stored in sysstatistics
Updated via update statistics, inputting an
optdiag file or writing directly
Stored in sysstatistics

6
Why Maintain the Statistics?
Do I have To Keep the Column Level Statistics Up
to Date?

In most cases up to date statistics are needed
for the optimizer to produce the most efficient
query plans
In most cases update statistics if
The distribution of data has changed, usually due
to a significant amount of DML occurring
There have been significant changes to row and
page counts
Its a waste of time to update statistics on
static data
Myth !! Update statistics guarantees good
performance
You may be able to open your maintenance window
by not running update statistics as often as you
have been
Some environments never need update statistics to
be run, some need it every 5 minutes.
Find out where you fall between these extremes
Test to see if youre running update statistics
too often
Frequency of update statistics runs can become
habit or custom
Question whether or not you really need to run
update statistics as often as you do
Test changing your frequency of running update
statistics

7
Sampling for Update Statistics
Dramatically decreases the time for update
statistics to run

What is Sampling for Update Statistics?
A new 12.5.0.3 option of update statistics that
reads random pages for statistics gathering
Sampling decreases the I/O required to gather the
statistics by not reading all pages
Percentage of pages to sample can be specified
Why Sampling for Update Statistics?
The speed of update statistics has been an issue
for a very long time
Making it run faster has been a top Customer
feature request for years
Opening the statistics maintenance window lowers
the cost of using ASE
Lets be honest.update statistics can be a real
pain to run

8
Sampling for Update Statistics cont.
Sampling Time Test - Dramatically decreases the
time for update statistics to run
Integer Column 10,556,400 Rows
105,564 Pages
9
Sampling for Update Statistics cont.
Dramatically decreases the time for update
statistics to run

How does it work?
The percentage of pages to read is specified in
the sampling extension of the update statistics
command
update statistics tab_name (col_name) with
sampling 10 percent
The specified percentage of pages will be
randomly read
Sampling rates are specified in percentages from
1 to 100
Reading random pages is cheaper in I/O than
reading random rows, but its not a pure
sampling
Column values from sampled pages are still read
into a worktable, the worktable is sorted and the
statistics gathered from there
Since the number of pages is lower with sampling
the size of the worktable will be smaller and
less tempdb space will be used
The columns histogram will be based on the
values that are sampled

10
Sampling for Update Statistics cont.
How Does It Work cont. The Effects of Sampling
with Update Statistics Syntax

update statistics table_name index_name with
sampling N percent
Will create/update statistics on the leading
column(s) of all indexes on the table or the
specified index
Sampling will not be used, the specified sampling
rate will be ignored
Index pages are not currently read by sampling
this is at the top of the enhancements list
Index has 78780 leaf pages
Physical Reads Logical Reads Pages Written
-------------- ------------- --------------
14 79021 24
Because no sort is needed there is a major
savings when scanning the index for the leading
column

11
Sampling for Update Statistics cont.
How Does It Work cont. The Effects of Sampling
with Update Statistics Syntax cont.

update statistics table_name (column_name) with
sampling N percent
Will create/update statistics on the specified
column using the specified sampling rate
update statistics big_id(b)
Full scan update statistics
SQL Server elapsed time 939683 ms.
Physical Reads Logical Reads Pages Written
--------------- ------------- -------------
281243 11017457 119964
Using a 10 sampling rate
SQL Server elapsed time 71483 ms.
Physical Reads Logical Reads Pages Written
-------------- ------------- --------------
23065 1099327 10562

12
Sampling for Update Statistics cont.
How Does It Work cont. The Effects of Sampling
with Update Statistics Syntax cont.

update index statistics table_nameindex_name
with samplingN percent
Sampling will not be used on the leading
column(s) of index(es) but will be used on all
inner columns of index(es)
update index statistics big_id abc_nc
Full scan update statistics
SQL Server elapsed time 1922336 ms.
Physical Reads Logical Reads Pages Written
------------- ------------ ------------
661316 22134151 187666
Using a 10 sampling rate
SQL Server elapsed time 248860 ms.
Physical Reads Logical Reads Pages Written
-------------- ------------- -------------
149096 2297970 27660

13
Sampling for Update Statistics cont.
If you must use update all statistics sampling
will make it less painful

update all statistics table_name with sampling
N percent
Will use sampling to create/update statistics on
all columns in the table that are not leading
column of an index
If you find that you need statistics on all
columns of a table sampling will make update all
statistics much faster
Full scan update all statistics
SQL Server elapsed time 1390596 ms.(thats 23.2
minutes)
Physical Reads Logical Reads Pages Written
-------------- ------------- -------------
630335 11373699 127653
Using a 10 sampling rate
SQL Server elapsed time 531390 ms. (thats 8.86
minutes)
Physical Reads Logical Reads Pages Written
------------- ------------- -------------
373876 1457379 13812

14
Sampling and Tempdb Usage
Smaller worktables needed for sorts in tempdb

Using sampling with update statistics will
decrease the space needed in tempdb
Applies to columns that are not leading columns
of an index or when update statistics is run on a
specified column
Values from columns are read into a worktable,
sorted and read again to gather the statistics
With a full scan update statistics the worktable
used for the sort can become very large and
cause tempdb to run out of space
Full scan -
Physical Reads Logical Reads Pages Written
--------------- ------------- -------------
281243 11017457 119964
Using a 10 sampling rate
Physical Reads Logical Reads Pages Written
-------------- ------------- -------------
23065 1099327 10562

15
What To Watch For When Using Sampling
As with everything there are some trade offs

Sampling, by its nature does not read all values
in the set
Since not all values are read there will
differences between a sampled histogram and one
obtain with a full scan
Its a balance of time saved running update
statistics vs. accuracy of the statistics
The boundary values and cell weights will vary
from those in a full scan histogram
Unique column
Full scan
4 0.05263158 lt 166680
5 0.05263158 lt 222240
6 0.05263158 lt 277800
7 0.05263158 lt 333360
10 Sampling
4 0.05261801 lt 165378
5 0.05261801 lt 223117
6 0.05261801 lt 277841
7 0.05261801 lt 332565
The lower the sampling rate the greater the
variations from the full scan

16
What To Watch For When Using Sampling cont.
Min and Max values not in the histogram

Columns min and max values may not be in the
sampled set
Full scan
Step Weight Value
1 0.00000000 lt 0
2 0.05263158 lt 55560
19 0.05263158 lt 1000080
20 0.05263158 lt 1055640
10 Sampling
Step Weight Value
1 0.00000000 lt 5390
2 0.05263158 lt 55286
19 0.05263158 lt 997766
20 0.05263158 lt 1053206
If SARG values fall outside of the histogram the
optimizer makes assumptions about selectivity
If you query near either end of the histogram
check it from time to time to monitor and
effects from sampling

17
What To Watch For When Using Sampling cont.
Other things to watch for

Density values are not affected by sampling
If there are existing statistics on the column
the density values will not be overwritten, if
there are no statistics a default value of 0.10
will be used
Most of your columns have statistics, if you put
statistics on a column that did previously have
them use a full scan first
This will be changed soon
Different cell types may appear
As the sampling rate decreases it is possible
that Frequency count and/or Range cells may
appear where they didnt exist prior to sampling
Highly duplicated values are more likely to be
seen in the histogram
The same pages will be re-sampled if the dataset
is static
and the same sampling rate used
Dont be surprised

18
What Sampling Rate Should You Use?
A rule of thumb based on tests

Trial and error testing
Youll need to test to determine what sampling
rate is most efficient for your columns
Check that there are no changes to your query
performance
Its a trade off between faster update statistics
and perfectly accurate statistics
Try starting with a sampling rate of between 20
and 10
Use optdiag to monitor changes to the histograms
Use showplan, traces 302/310 to check for changes
to query plans if you see any effects on
performance

19
Upcoming Enhancements
We need to add improvements

Enhancements will be made to sampling
Will read index pages
Will scale the density values based on the
sampled set of values
Will enable optdiag and optimizer analysis output
so that they record that sampling was used and
what the last sampling rate was

20
Server Tuning
Leverage dynamic configuration options!

Update statistics is disk io and sort intensive
Configuration parameters to consider adjusting
number of sort buffers controls how many
buffers to reserve in an objects cache for
sorting tuning this can depend on the size of
the column(s) being sorted sort runs should be
minimized
formula
Number of large I/O buffers controls the number
of pages to be used during index creation
I/O polling process count For disk IO bound
operations, tune low for added throughput
Runable process search count Tuning to a very
high number, or zero keeps engines from yielding
to OS, promoting better throughput for task
switching operations
Disable disk mirroring Keeps engines from
having to request and be granted device
semaphores. This is unnecessary overhead which
can be avoided if you dont use ASE mirroring
(this option is enabled by default in 12.5.0.3)
Large IO pools
Create the largest 8-page (16k, 32k, 64k, 128k)
IO pools as possible - Update statistics uses the
largest IO pools available in both the objects
cache, and tempdb/default cache.
ASE 12.5.0.3 LSA Worktables are created using
Large Scale Allocation. Only if large IO pool is
defined.
Procedure cache usage
Used for sorting Increasing sort memory
requirements can cause procedures to age-out of
cache resulting in procedure cache misses.

21
Tips For Opening Your Maintenance Window cont.
Tips, Techniques and Best Practices

Avoid Duplicate statistics gathering
Rebuilding indexes
create index also collects statistics for index
and leading column (some bugs for sorted_data)
Update statistics lttablenamegt ltindexnamegt is not
necessary
Number of histogram steps for column survives
drop index
Two or more indexes which have common columns
Leverage indexes which have leading column -
clustered index i1 on authors(auid)
nonclustered index i2 on authors(au_lname,auid)
nonclustered index i3 on authors(auid, au_fname)
Goal - Column stats for auid
DO update statistics authors i3 / (not i1) /
DONT update statistics authors(auid)
DONT update index statistics authors i2
Also, when given a choice, use non-clustered
indexes to gather column stats for a leading
column rather than clustered index

22
Tips For Opening Your Maintenance Window cont.
Know your choices of update statistics syntax and
avoid overkill

Which update statistics syntax do you really need
to run?
update statistics table_name index_name will
update statistics on the leading column of all
indexes on the table or on the specified index
This is the original update statistics syntax
update index statistics table_name index_name
will update statistics on all columns that are in
indexes on the table or the specified index
More expensive in terms of maintenance, but is
likely to result in more efficient query plans
and better performance its a balance you need
to test
update statistics table_name (column_name) will
update statistics on the specified column only,
will not read index pages
update all statistics table_name will update
statistics on all columns in the table, and will
update partitions statistics.
This can be a real maintenance killer! Very
rarely do you need statistics on all columns of a
table
Test to determine which syntax gives you the most
efficient query plans while giving you a
reasonable maintenance window

23
Tips For Opening Your Maintenance Window cont.
Know your choices of update statistics syntax and
avoid overkill cont.

Sequential operations vs. Concurrent operations
Replace single, multiple operation commands with
multiple connections and single operation
commands
Example 1
c1gt update index statistics authors
can be converted to
c1gt update statistics authors ix1
c2gt update statistics authors ix2
c3gt update statistics authors(au_lname)
Example 2
c1gt update statistics authors
can be converted to
c1gt update statistics authors ix1
c2gt update statistics authors ix2
c3gt update statistics authors ix3
Multiple concurrent commands leverage multiple
engines, multiple devices. Higher throughput.

24
Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics

Use optdiag to get updated statistics from a copy
of your production data
Use replication or dump/load to create duplicate
data
Update statistics on the copy and get an optdiag
output file from there
Input the optdiag file into the production server
Moving the data may take a while - use day-old
statistics if necessary
Write your own statistics
You know your data best
Dont need update statistics to tell you about
data distribution
Unique, evenly distributed and sequential
monotonic columns allow the column statistics to
be derived
Make sure to know your data
Table/Index level statistics will not be updated

25
Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics

CIS/Proxy Table Tricks
Addresses problem using update statistics
lttablenamegt (column_list), or update index
statistics, or update all statistics
For minor column attributes, these use the
datapages (tablescan) to build the worktable for
gathering statistics!
Extra work needed to create/maintain views, and
proxy tables
Requires the use of optdiag input mode
This technique can save LOTS of time
Can spread IO around to other segments
How using CIS and Proxy tables works
Create a view on minor column attributes for each
non-clustered index which forces the use of the
index in the view definition
Create a proxy table on each view
Run update statistics ltproxytablegt (column)
instead
Extract, and input column level stats into base
table using optdiag
Saves having to scan the base table for each
minor attribute!

26
Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics

CIS/Proxy Table Example
i1m_cl clustered, unique located on default
(unique2, onePercent, twenty)
i1m_nc_c_int nonclustered, unique located on
i1m_nc_c_int (skew7, twenty, userid)
index_name size reserved unused
-------------------- ---------- ----------
----------
i1m_cl 1920 KB 224186 KB 42 KB
i1m_nc_c_int 19094 KB 19120 KB 26 KB
name rowtotal reserved
data index_size unused
-------------------- ----------- ---------------
--------------- --------------- ---------------
oneMrows 1000000 336296 KB
222224 KB 113930 KB 142 KB
Isqlgt
Create view i1m_nc_c_int_minor(twenty, userid)
As SELECT twenty, userid
FROM oneMrows (index i1m_nc_c_int)
Create existing table i1m_nc_c_int_proxy(twenty
int, userid int)
External table at loopback.userdb.dbo.i1m_nc_c_in
t_minor

27
Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics

CIS/Proxy Table Results
Normal Runtime 111
CIS/Proxy Table Runtime 011

28
Deleting or Limiting Statistics
Sometimes no statistics are better than having
them

Choosing not to have statistics on a column
Much the same as deleting statistics, but youve
chosen not to add them to the column in the first
place or have removed them
A decision to allow the hard-coded default values
to be used
No maintenance required
Requires some planning and testing though
Without statistics the optimizer has to use a
small set of hard-coded values for its
selectivity estimate
If statistics give the optimizer an ugly
picture of the distribution the hard-coded
selectivity values may result in a lower
selectivity value
Statistics on a gender column 0.50 selectivity
Default values will be better - 0.10, 0.25 or
0.33 depending on the SARG operator
Make sure maintenance doesnt add the statistics
again
Make sure to remove update statistics for these
columns from your scripts

29
Tempdb Configuration and Considerations
Adjusting tempdb for update statistics sorting

Worktables will be used when gathering statistics
on some columns
For leading columns of indexes there is no need
to sort since values are already in sorted order
only an index scan is done
For any other column the values must be read
(from the table) into a worktable (in tempdb),
sorted and read again for statistics gathering
The size of the worktable will depend on the
number of rows, the size of the rows, and if
sampling is used, the sampling rate used
You will usually see heavy tempdb usage during
update index
statistics, or when you run a series of
update statistics table_name
(column_name)
It is the worktable creation, sorting and reading
what makes update all statistics
painfully slow on larger tables

30
Tempdb Configuration and Considerations cont.
Leverage dynamic configuration options!

Tempdb Configuration
Stripe on multiple physical disks
Stripe size a multiple of logical extent size
Pay attention to OS specific device managers to
tune stripe sizes.
Supports concurrent sorting and worktable
activity
System segment focused
Place system segment on fastest/dedicated
device
system segment is where worktables for
statistics gathering get created
Use multiple devices to support parallel sort
Tempdb cache
Optimize for write intensive tasks
Largest possible large I/O pools
Cache status HK ignore cache
Leverage lazy write feature in ASE 12.5.0.3

31
Where To Get More Information

The Sybase Customer newsgroups
http//support.sybase.com/newsgroups
The Sybase list server
SYBASE-L_at_LISTSERV.UCSB.EDU
The external Sybase FAQ
http//www.isug.com/Sybase_FAQ/
Technical Documents at
http//www.sybase.com/support/techdocs
Join the ISUG, ISUG Technical Journal, feature
requests
http//www.isug.com
Other utilities, scripts, stored procedures
http//teamsybase.com/kevin.sherlock

32
Sybase Developer Network (SDN)

Additional Resources for Developers/DBAs
Single point of access to developer software,
services, and up-to-date technical information
White papers and documentation
Collaboration with other developers and Sybase
engineers
Code samples and beta programs
Technical recordings
Free software
Join today www.sybase.com/developer

33
SDN Presents CodeXchange
Share ASE Scripts and Tools

New SDN feature enables community collaboration
Download tools created by Sybase
Leverage contributions of others to help
administer and monitor your servers
Contribute your own code or start your own
collaborative project with input from other ASE
experts
Any SDN member can participate
Log in using your MySybase account via SDN
Join the collaboration already underway
http//ase.codexchange.sybase.com or via SDN at
www.sybase.com/developer