ASE126 Opening Your Maintenance Window - PowerPoint PPT Presentation

1 / 33
About This Presentation
Title:

ASE126 Opening Your Maintenance Window

Description:

Kevin Sherlock Staff Systems Analyst, Dex Media Inc. kevin.sherlock_at_teamsybase.com ... Know your choices of update statistics syntax and avoid overkill cont. ... – PowerPoint PPT presentation

Number of Views:140
Avg rating:3.0/5.0
Slides: 34
Provided by: fellenm
Category:

less

Transcript and Presenter's Notes

Title: ASE126 Opening Your Maintenance Window


1
ASE126 Opening Your Maintenance Window
Kevin Sherlock Staff Systems Analyst, Dex Media
Inc. kevin.sherlock_at_teamsybase.com Eric
Miner - Senior Engineer, Sybase Inc.
eric.miner_at_sybase.com August 2003
2
Making Update Statistics Run Faster
Ways to make it more efficient when you have to
run it
  • From tuning the server to using a new feature
  • Well talk about ASE 12.5.0.3s new feature
    Sampling for Update Statistics
  • Well show you some simple things you can do now
    to help speed up update statistics
  • All of these are designed to help make your
    statistics maintenance easier and more efficient

3
Shrinking Maintenance Windows
My maintenance window isnt big enough to run
update statistics!
  • As datasets have grown so has the time it takes
    to run update statistics
  • In some cases the maintenance window simply isnt
    big enough to run update statistics whether its
    needed or not
  • Your statistics maintenance needs to be as
    efficient as possible to make the best use of
    your window
  • You want to keep your statistics up to date but
    its next to impossible to do in some cases
  • How can you speed up update statistics?

4
What are the Statistics?
And why do we need them?
  • The optimizer is cost based it needs to know
    about tables. Indexes and the data
  • The optimizer makes its decisions based on the
    estimated cost of a given access method
  • The statistics are used to estimate various
    selectivity values
  • The selectivity values are used to estimate the
    cost of a given access method
  • The statistics are the optimizers only view of
    your objects and data
  • Without statistics the optimizer can only make
    guesses about the objects and data

5
There Are Two Types of Statistics
Both are used by the optimizer
  • Table/Index level - describes a table and its
    index(es)
  • Data Leaf page count, Data Leaf row count,
    Empty data leaf page count, Forwarded row
    count, Deleted row count, OAM allocation page,
    First extent leaf pages , Data/Leaf row size,
    Index height, Data/Index page cluster ratio, Data
    Row cluster ratio
  • Some values are updated dynamically as DML occurs
  • Page row counts, deleted rows, forwarded rows,
    cluster ratios
  • The other values are static and updated by update
    statistics
  • Space Utilization, Large I/O Efficiency are NOT
    used by optimizer. Theyre For your information
    only helps track fragmentation
  • Cannot be written directly or with optdiag
    quickly overwritten
  • Stored in systabstats
  • Column level - describes the data to the
    optimizer
  • Histogram (distribution), density values, default
    selectivity values
  • Static, need to be updated or written directly
  • Stored in sysstatistics
  • Updated via update statistics, inputting an
    optdiag file or writing directly
  • Stored in sysstatistics

6
Why Maintain the Statistics?
Do I have To Keep the Column Level Statistics Up
to Date?
  • In most cases up to date statistics are needed
    for the optimizer to produce the most efficient
    query plans
  • In most cases update statistics if
  • The distribution of data has changed, usually due
    to a significant amount of DML occurring
  • There have been significant changes to row and
    page counts
  • Its a waste of time to update statistics on
    static data
  • Myth !! Update statistics guarantees good
    performance
  • You may be able to open your maintenance window
    by not running update statistics as often as you
    have been
  • Some environments never need update statistics to
    be run, some need it every 5 minutes.
  • Find out where you fall between these extremes
  • Test to see if youre running update statistics
    too often
  • Frequency of update statistics runs can become
    habit or custom
  • Question whether or not you really need to run
    update statistics as often as you do
  • Test changing your frequency of running update
    statistics

7
Sampling for Update Statistics
Dramatically decreases the time for update
statistics to run
  • What is Sampling for Update Statistics?
  • A new 12.5.0.3 option of update statistics that
    reads random pages for statistics gathering
  • Sampling decreases the I/O required to gather the
    statistics by not reading all pages
  • Percentage of pages to sample can be specified
  • Why Sampling for Update Statistics?
  • The speed of update statistics has been an issue
    for a very long time
  • Making it run faster has been a top Customer
    feature request for years
  • Opening the statistics maintenance window lowers
    the cost of using ASE
  • Lets be honest.update statistics can be a real
    pain to run

8
Sampling for Update Statistics cont.
Sampling Time Test - Dramatically decreases the
time for update statistics to run
Integer Column 10,556,400 Rows
105,564 Pages
9
Sampling for Update Statistics cont.
Dramatically decreases the time for update
statistics to run
  • How does it work?
  • The percentage of pages to read is specified in
    the sampling extension of the update statistics
    command
  • update statistics tab_name (col_name) with
    sampling 10 percent
  • The specified percentage of pages will be
    randomly read
  • Sampling rates are specified in percentages from
    1 to 100
  • Reading random pages is cheaper in I/O than
    reading random rows, but its not a pure
    sampling
  • Column values from sampled pages are still read
    into a worktable, the worktable is sorted and the
    statistics gathered from there
  • Since the number of pages is lower with sampling
    the size of the worktable will be smaller and
    less tempdb space will be used
  • The columns histogram will be based on the
    values that are sampled

10
Sampling for Update Statistics cont.
How Does It Work cont. The Effects of Sampling
with Update Statistics Syntax
  • update statistics table_name index_name with
    sampling N percent
  • Will create/update statistics on the leading
    column(s) of all indexes on the table or the
    specified index
  • Sampling will not be used, the specified sampling
    rate will be ignored
  • Index pages are not currently read by sampling
    this is at the top of the enhancements list
  • Index has 78780 leaf pages
  • Physical Reads Logical Reads Pages Written
  • -------------- ------------- --------------
  • 14 79021 24
  • Because no sort is needed there is a major
    savings when scanning the index for the leading
    column

11
Sampling for Update Statistics cont.
How Does It Work cont. The Effects of Sampling
with Update Statistics Syntax cont.
  • update statistics table_name (column_name) with
    sampling N percent
  • Will create/update statistics on the specified
    column using the specified sampling rate
  • update statistics big_id(b)
  • Full scan update statistics
  • SQL Server elapsed time 939683 ms.
  • Physical Reads Logical Reads Pages Written
  • --------------- ------------- -------------
  • 281243 11017457 119964
  • Using a 10 sampling rate
  • SQL Server elapsed time 71483 ms.
  • Physical Reads Logical Reads Pages Written
  • -------------- ------------- --------------
  • 23065 1099327 10562

12
Sampling for Update Statistics cont.
How Does It Work cont. The Effects of Sampling
with Update Statistics Syntax cont.
  • update index statistics table_nameindex_name
    with samplingN percent
  • Sampling will not be used on the leading
    column(s) of index(es) but will be used on all
    inner columns of index(es)
  • update index statistics big_id abc_nc
  • Full scan update statistics
  • SQL Server elapsed time 1922336 ms.
  • Physical Reads Logical Reads Pages Written
  • ------------- ------------ ------------
  • 661316 22134151 187666
  • Using a 10 sampling rate
  • SQL Server elapsed time 248860 ms.
  • Physical Reads Logical Reads Pages Written
  • -------------- ------------- -------------
  • 149096 2297970 27660

13
Sampling for Update Statistics cont.
If you must use update all statistics sampling
will make it less painful
  • update all statistics table_name with sampling
    N percent
  • Will use sampling to create/update statistics on
    all columns in the table that are not leading
    column of an index
  • If you find that you need statistics on all
    columns of a table sampling will make update all
    statistics much faster
  • Full scan update all statistics
  • SQL Server elapsed time 1390596 ms.(thats 23.2
    minutes)
  • Physical Reads Logical Reads Pages Written
  • -------------- ------------- -------------
  • 630335 11373699 127653
  • Using a 10 sampling rate
  • SQL Server elapsed time 531390 ms. (thats 8.86
    minutes)
  • Physical Reads Logical Reads Pages Written
  • ------------- ------------- -------------
  • 373876 1457379 13812

14
Sampling and Tempdb Usage
Smaller worktables needed for sorts in tempdb
  • Using sampling with update statistics will
    decrease the space needed in tempdb
  • Applies to columns that are not leading columns
    of an index or when update statistics is run on a
    specified column
  • Values from columns are read into a worktable,
    sorted and read again to gather the statistics
  • With a full scan update statistics the worktable
    used for the sort can become very large and
    cause tempdb to run out of space
  • Full scan -
  • Physical Reads Logical Reads Pages Written
  • --------------- ------------- -------------
  • 281243 11017457 119964
  • Using a 10 sampling rate
  • Physical Reads Logical Reads Pages Written
  • -------------- ------------- -------------
  • 23065 1099327 10562

15
What To Watch For When Using Sampling
As with everything there are some trade offs
  • Sampling, by its nature does not read all values
    in the set
  • Since not all values are read there will
    differences between a sampled histogram and one
    obtain with a full scan
  • Its a balance of time saved running update
    statistics vs. accuracy of the statistics
  • The boundary values and cell weights will vary
    from those in a full scan histogram
  • Unique column
  • Full scan
  • 4 0.05263158 lt 166680
  • 5 0.05263158 lt 222240
  • 6 0.05263158 lt 277800
  • 7 0.05263158 lt 333360
  • 10 Sampling
  • 4 0.05261801 lt 165378
  • 5 0.05261801 lt 223117
  • 6 0.05261801 lt 277841
  • 7 0.05261801 lt 332565
  • The lower the sampling rate the greater the
    variations from the full scan

16
What To Watch For When Using Sampling cont.
Min and Max values not in the histogram
  • Columns min and max values may not be in the
    sampled set
  • Full scan
  • Step Weight Value
  • 1 0.00000000 lt 0
  • 2 0.05263158 lt 55560
  • 19 0.05263158 lt 1000080
  • 20 0.05263158 lt 1055640
  • 10 Sampling
  • Step Weight Value
  • 1 0.00000000 lt 5390
  • 2 0.05263158 lt 55286
  • 19 0.05263158 lt 997766
  • 20 0.05263158 lt 1053206
  • If SARG values fall outside of the histogram the
    optimizer makes assumptions about selectivity
  • If you query near either end of the histogram
    check it from time to time to monitor and
    effects from sampling

17
What To Watch For When Using Sampling cont.
Other things to watch for
  • Density values are not affected by sampling
  • If there are existing statistics on the column
    the density values will not be overwritten, if
    there are no statistics a default value of 0.10
    will be used
  • Most of your columns have statistics, if you put
    statistics on a column that did previously have
    them use a full scan first
  • This will be changed soon
  • Different cell types may appear
  • As the sampling rate decreases it is possible
    that Frequency count and/or Range cells may
    appear where they didnt exist prior to sampling
  • Highly duplicated values are more likely to be
    seen in the histogram
  • The same pages will be re-sampled if the dataset
    is static
  • and the same sampling rate used
  • Dont be surprised

18
What Sampling Rate Should You Use?
A rule of thumb based on tests
  • Trial and error testing
  • Youll need to test to determine what sampling
    rate is most efficient for your columns
  • Check that there are no changes to your query
    performance
  • Its a trade off between faster update statistics
    and perfectly accurate statistics
  • Try starting with a sampling rate of between 20
    and 10
  • Use optdiag to monitor changes to the histograms
  • Use showplan, traces 302/310 to check for changes
    to query plans if you see any effects on
    performance

19
Upcoming Enhancements
We need to add improvements
  • Enhancements will be made to sampling
  • Will read index pages
  • Will scale the density values based on the
    sampled set of values
  • Will enable optdiag and optimizer analysis output
    so that they record that sampling was used and
    what the last sampling rate was

20
Server Tuning
Leverage dynamic configuration options!
  • Update statistics is disk io and sort intensive
  • Configuration parameters to consider adjusting
  • number of sort buffers controls how many
    buffers to reserve in an objects cache for
    sorting tuning this can depend on the size of
    the column(s) being sorted sort runs should be
    minimized
  • formula
  • Number of large I/O buffers controls the number
    of pages to be used during index creation
  • I/O polling process count For disk IO bound
    operations, tune low for added throughput
  • Runable process search count Tuning to a very
    high number, or zero keeps engines from yielding
    to OS, promoting better throughput for task
    switching operations
  • Disable disk mirroring Keeps engines from
    having to request and be granted device
    semaphores. This is unnecessary overhead which
    can be avoided if you dont use ASE mirroring
    (this option is enabled by default in 12.5.0.3)
  • Large IO pools
  • Create the largest 8-page (16k, 32k, 64k, 128k)
    IO pools as possible - Update statistics uses the
    largest IO pools available in both the objects
    cache, and tempdb/default cache.
  • ASE 12.5.0.3 LSA Worktables are created using
    Large Scale Allocation. Only if large IO pool is
    defined.
  • Procedure cache usage
  • Used for sorting Increasing sort memory
    requirements can cause procedures to age-out of
    cache resulting in procedure cache misses.

21
Tips For Opening Your Maintenance Window cont.
Tips, Techniques and Best Practices
  • Avoid Duplicate statistics gathering
  • Rebuilding indexes
  • create index also collects statistics for index
    and leading column (some bugs for sorted_data)
  • Update statistics lttablenamegt ltindexnamegt is not
    necessary
  • Number of histogram steps for column survives
    drop index
  • Two or more indexes which have common columns
  • Leverage indexes which have leading column -
  • clustered index i1 on authors(auid)
  • nonclustered index i2 on authors(au_lname,auid)
  • nonclustered index i3 on authors(auid, au_fname)
  • Goal - Column stats for auid
  • DO update statistics authors i3 / (not i1) /
  • DONT update statistics authors(auid)
  • DONT update index statistics authors i2
  • Also, when given a choice, use non-clustered
    indexes to gather column stats for a leading
    column rather than clustered index

22
Tips For Opening Your Maintenance Window cont.
Know your choices of update statistics syntax and
avoid overkill
  • Which update statistics syntax do you really need
    to run?
  • update statistics table_name index_name will
    update statistics on the leading column of all
    indexes on the table or on the specified index
  • This is the original update statistics syntax
  • update index statistics table_name index_name
    will update statistics on all columns that are in
    indexes on the table or the specified index
  • More expensive in terms of maintenance, but is
    likely to result in more efficient query plans
    and better performance its a balance you need
    to test
  • update statistics table_name (column_name) will
    update statistics on the specified column only,
    will not read index pages
  • update all statistics table_name will update
    statistics on all columns in the table, and will
    update partitions statistics.
  • This can be a real maintenance killer! Very
    rarely do you need statistics on all columns of a
    table
  • Test to determine which syntax gives you the most
    efficient query plans while giving you a
    reasonable maintenance window

23
Tips For Opening Your Maintenance Window cont.
Know your choices of update statistics syntax and
avoid overkill cont.
  • Sequential operations vs. Concurrent operations
  • Replace single, multiple operation commands with
    multiple connections and single operation
    commands
  • Example 1
  • c1gt update index statistics authors
  • can be converted to
  • c1gt update statistics authors ix1
  • c2gt update statistics authors ix2
  • c3gt update statistics authors(au_lname)
  • Example 2
  • c1gt update statistics authors
  • can be converted to
  • c1gt update statistics authors ix1
  • c2gt update statistics authors ix2
  • c3gt update statistics authors ix3
  • Multiple concurrent commands leverage multiple
    engines, multiple devices. Higher throughput.

24
Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics
  • Use optdiag to get updated statistics from a copy
    of your production data
  • Use replication or dump/load to create duplicate
    data
  • Update statistics on the copy and get an optdiag
    output file from there
  • Input the optdiag file into the production server
  • Moving the data may take a while - use day-old
    statistics if necessary
  • Write your own statistics
  • You know your data best
  • Dont need update statistics to tell you about
    data distribution
  • Unique, evenly distributed and sequential
    monotonic columns allow the column statistics to
    be derived
  • Make sure to know your data
  • Table/Index level statistics will not be updated

25
Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics
  • CIS/Proxy Table Tricks
  • Addresses problem using update statistics
    lttablenamegt (column_list), or update index
    statistics, or update all statistics
  • For minor column attributes, these use the
    datapages (tablescan) to build the worktable for
    gathering statistics!
  • Extra work needed to create/maintain views, and
    proxy tables
  • Requires the use of optdiag input mode
  • This technique can save LOTS of time
  • Can spread IO around to other segments
  • How using CIS and Proxy tables works
  • Create a view on minor column attributes for each
    non-clustered index which forces the use of the
    index in the view definition
  • Create a proxy table on each view
  • Run update statistics ltproxytablegt (column)
    instead
  • Extract, and input column level stats into base
    table using optdiag
  • Saves having to scan the base table for each
    minor attribute!

26
Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics
  • CIS/Proxy Table Example
  • i1m_cl clustered, unique located on default
    (unique2, onePercent, twenty)
  • i1m_nc_c_int nonclustered, unique located on
    i1m_nc_c_int (skew7, twenty, userid)
  • index_name size reserved unused
  • -------------------- ---------- ----------
    ----------
  • i1m_cl 1920 KB 224186 KB 42 KB
  • i1m_nc_c_int 19094 KB 19120 KB 26 KB
  • name rowtotal reserved
    data index_size unused
  • -------------------- ----------- ---------------
    --------------- --------------- ---------------
  • oneMrows 1000000 336296 KB
    222224 KB 113930 KB 142 KB
  • Isqlgt
  • Create view i1m_nc_c_int_minor(twenty, userid)
  • As SELECT twenty, userid
  • FROM oneMrows (index i1m_nc_c_int)
  • Create existing table i1m_nc_c_int_proxy(twenty
    int, userid int)
  • External table at loopback.userdb.dbo.i1m_nc_c_in
    t_minor

27
Tips For Opening Your Maintenance Window cont.
Alternative ways to gather/create statistics
  • CIS/Proxy Table Results
  • Normal Runtime 111
  • CIS/Proxy Table Runtime 011

28
Deleting or Limiting Statistics
Sometimes no statistics are better than having
them
  • Choosing not to have statistics on a column
  • Much the same as deleting statistics, but youve
    chosen not to add them to the column in the first
    place or have removed them
  • A decision to allow the hard-coded default values
    to be used
  • No maintenance required
  • Requires some planning and testing though
  • Without statistics the optimizer has to use a
    small set of hard-coded values for its
    selectivity estimate
  • If statistics give the optimizer an ugly
    picture of the distribution the hard-coded
    selectivity values may result in a lower
    selectivity value
  • Statistics on a gender column 0.50 selectivity
  • Default values will be better - 0.10, 0.25 or
    0.33 depending on the SARG operator
  • Make sure maintenance doesnt add the statistics
    again
  • Make sure to remove update statistics for these
    columns from your scripts

29
Tempdb Configuration and Considerations
Adjusting tempdb for update statistics sorting
  • Worktables will be used when gathering statistics
    on some columns
  • For leading columns of indexes there is no need
    to sort since values are already in sorted order
    only an index scan is done
  • For any other column the values must be read
    (from the table) into a worktable (in tempdb),
    sorted and read again for statistics gathering
  • The size of the worktable will depend on the
    number of rows, the size of the rows, and if
    sampling is used, the sampling rate used
  • You will usually see heavy tempdb usage during
    update index
    statistics, or when you run a series of
    update statistics table_name
    (column_name)
  • It is the worktable creation, sorting and reading
    what makes update all statistics
    painfully slow on larger tables

30
Tempdb Configuration and Considerations cont.
Leverage dynamic configuration options!
  • Tempdb Configuration
  • Stripe on multiple physical disks
  • Stripe size a multiple of logical extent size
    Pay attention to OS specific device managers to
    tune stripe sizes.
  • Supports concurrent sorting and worktable
    activity
  • System segment focused
  • Place system segment on fastest/dedicated
    device
  • system segment is where worktables for
    statistics gathering get created
  • Use multiple devices to support parallel sort
  • Tempdb cache
  • Optimize for write intensive tasks
  • Largest possible large I/O pools
  • Cache status HK ignore cache
  • Leverage lazy write feature in ASE 12.5.0.3

31
Where To Get More Information
  • The Sybase Customer newsgroups
  • http//support.sybase.com/newsgroups
  • The Sybase list server
  • SYBASE-L_at_LISTSERV.UCSB.EDU
  • The external Sybase FAQ
  • http//www.isug.com/Sybase_FAQ/
  • Technical Documents at
  • http//www.sybase.com/support/techdocs
  • Join the ISUG, ISUG Technical Journal, feature
    requests
  • http//www.isug.com
  • Other utilities, scripts, stored procedures
  • http//teamsybase.com/kevin.sherlock

32
Sybase Developer Network (SDN)
  • Additional Resources for Developers/DBAs
  • Single point of access to developer software,
    services, and up-to-date technical information
  • White papers and documentation
  • Collaboration with other developers and Sybase
    engineers
  • Code samples and beta programs
  • Technical recordings
  • Free software
  • Join today www.sybase.com/developer

33
SDN Presents CodeXchange
Share ASE Scripts and Tools
  • New SDN feature enables community collaboration
  • Download tools created by Sybase
  • Leverage contributions of others to help
    administer and monitor your servers
  • Contribute your own code or start your own
    collaborative project with input from other ASE
    experts
  • Any SDN member can participate
  • Log in using your MySybase account via SDN
  • Join the collaboration already underway
  • http//ase.codexchange.sybase.com or via SDN at
    www.sybase.com/developer
Write a Comment
User Comments (0)
About PowerShow.com