Analyze That Database Now - PowerPoint PPT Presentation

1 / 39
About This Presentation
Title:

Analyze That Database Now

Description:

... in the statement's WHERE clause. The FROM clause if SAMPLE or SAMPLE BLOCK ... In Oracle8i, the MONITORING clause in the CREATE (or ALTER) TABLE statement is ... – PowerPoint PPT presentation

Number of Views:61
Avg rating:3.0/5.0
Slides: 40
Provided by: ValuedGate2302
Category:
Tags: analyze | clause | database | now

less

Transcript and Presenter's Notes

Title: Analyze That Database Now


1
Analyze That Database Now! (Using
DBMS_STATS) Kent Hinckley IOUG LIVE! 2004
Systems Plus Paper 515 gkhinckley_at_aol.com
2
Introduction
  • Statistics can be generated to quantify data
    distribution and storage characteristics of
    tables, columns, indexes, and partitions
  • The cost-based optimizer uses these statistics to
    determine
  • The statements execution plan
  • The optimal join order and join method
  • Developers design frequency requirements for
    statistics gathering
  • Based on frequency of modifications which change
    data volumes or column values
  • DBAs then gather statistics based on the design
  • Essential both developers and DBAs understand the
    process

3
Understanding the Cost-Based Optimizer
  • There is almost always more than one way for
    Oracle to execute a DML statement (SELECT,
    INSERT, UPDATE, MERGE, DELETE)
  • Objective of the optimizer is to choose the most
    efficient method
  • Optimizer tasks include
  • Possible transformation of the statement into an
    equivalent, more efficient statement
  • Decides how to access the data stored in the
    database
  • For tables that access more than one table,
    decides how to perform the necessary joins
  • The Cost-Based Optimizer (CBO) chooses an access
    path based on several factors
  • The available access paths for the statement

4
Understanding the Cost-Based Optimizer (continued)
  • Determines which access paths are available by
    examining
  • The conditions in the statements WHERE clause
  • The FROM clause if SAMPLE or SAMPLE BLOCK are
    used
  • Generates possible execution plans
  • Estimates the cost of each plan using the
    statistics for the index, columns, and tables
    accessible to the statement which may change over
    time
  • Finally, chooses the execution plan with the
    lowest estimated cost
  • Numeric value of cost is an arbitrary measurement
  • Does not correspond to any real measurement such
    as milliseconds or I/Os
  • Use for comparative measurement, not quantitative

5
Understanding Statistics
  • Statistics quantify the data distribution and
    storage characteristics of tables, columns,
    indexes, and partitions
  • Cost-based optimization uses these statistics to
    estimate the cost of each execution plan
  • Gather statistics periodically
  • Statistics become stale over time, due to
    changing data volumes or data distribution
  • Statistics should be gathered after a schema
    objects data or structure are modified in ways
    that make the previous statistics inaccurate
  • After loading a significant number of rows into a
    table
  • After heavily updating data in a table
  • Existing statistics are updated the next time
    statistics are gathered

6
Understanding Statistics (continued)
  • Any currently parsed SQL statements that access
    the object are invalidated removes the
    statement from the shared pool
  • Optimizer then automatically chooses a new
    execution plan based on the new statistics
  • The cost-based optimizer requires accurate
    information to create efficient execution plans
  • Remember to analyze all tables and indexes
  • Optimizer makes its worst choices on a join when
    one table is analyzed and the other is not
  • Uses Oracle-defined default values when
    statistics do not exist
  • Times when statistics are most likely to be
    forgotten
  • A table or index is added, rebuilt or moved
  • Moving from development to production or
    vice-versa

7
Analyze Command
  • Prior to Oracle8i, statistics were gathered using
    the ANALYZE command
  • Oracle strongly recommends using DBMS_STATS to
    collect statistics
  • The cost-based optimizer will eventually use only
    statistics that have been collected by DBMS_STATS
  • ANALYZE COMPUTE and ESTIMATE are supported for
    backward compatibility only
  • Use ANALYZE for VALIDATE and LIST CHAINED ROWS
    only

8
Gathering Statistics
  • Statistics should be gathered with the DBMS_STATS
    package
  • Used to gather, modify, view, export, import, and
    delete statistics
  • Gather for individual indexes, tables, columns,
    and partitions
  • For all objects in a schema or the entire
    database
  • Cannot be gathered for clusters but may gather
    for individual tables in a cluster
  • Gathers only statistics needed by the cost-based
    optimizer
  • Does not gather number of chained rows, average
    free space, or unused data blocks which were
    previously gathered by ANALYZE
  • Enhanced in Oracle9i and again in 10g
  • To determine which tables, indexes, or partitions
    have been analyzed, reference the LAST_ANALYZED
    column in the data dictionary views

9
Statistics Collected by DBMS_STATS
  • Table statistics
  • Number of rows
  • Number of blocks used (always exact)
  • Average row length
  • Column statistics
  • Number of distinct values in column
  • Number of nulls in column
  • Data distribution

10
Statistics Collected by DBMS_STATS (continued)
  • Index statistics
  • Index level (always exact)
  • Number of leaf blocks
  • Clustering factor
  • Minimum / maximum key values
  • System statistics
  • I/O and CPU performance and utilization

11
Statistics Storage
  • Statistics are stored in the data dictionary
  • DBA_TABLES DBA_TAB_COL_STATISTICS
  • DBA_INDEXES DBA_CLUSTERS
  • DBA_TAB_PARTITIONS DBA_TAB_SUBPARTITIONS
  • DBA_IND_PARTITIONS DBA_IND_SUBPARTITIONS
  • DBA_PART_COL_STATISTICS DBA_SUBPART_COL_STATISTICS
  • New in Oracle 10g
  • DBA_TAB_STATISTICS DBA_OBJECT_TABLES
  • DBA_IND_STATISTICS DBA_TAB_HISTOGRAMS
  • DBA_PART_HISTOGRAMS DBA_SUBPART_HISTOGRAMS
  • Or the equivalent ALL_ or USER_ views

12
Using the DBMS_STATS Package
  • Use to gather, modify, view, export, import, and
    delete statistics
  • Basic procedures in the DBMS_STATS package
    include
  • Procedure  Description
  • GATHER_INDEX_STATS   Gathers index statistics
  • GATHER_TABLE_STATS   Gathers table, column, and
    index statistics
  • GATHER_SCHEMA_STATS Gathers statistics for all
    objects in schema
  • GATHER_DATABASE_STATS   Gathers statistics for
    all objects in database
  • GATHER_SYSTEM_STATS   Gathers statistics on I/O
    and CPU utilization
  • GATHER_DICTIONARY_STATS Gathers statistics for
    all tables in the
  • data dictionary
  • GATHER_FIXED_OBJECTS_STATS Gathers statistics for
    fixed objects
  • (V views)

13
General Syntax
  • Syntax for GATHER_TABLE_STATS follows
  • Similar syntax for the other procedures
  • DBMS_STATS.GATHER_TABLE_STATS (
  • ownname VARCHAR2,
  • tabname VARCHAR2,
  • partname VARCHAR2 DEFAULT NULL,
  • estimate_percent NUMBER DEFAULT NULL,
  • block_sample BOOLEAN DEFAULT FALSE,
  • method_opt VARCHAR2 DEFAULT 'FOR ALL COLUMNS
    SIZE 1',
  • degree NUMBER DEFAULT NULL,
  • granularity VARCHAR2 DEFAULT 'AUTO',
  • cascade BOOLEAN DEFAULT TRUE,
  • stattab VARCHAR2 DEFAULT NULL,
  • statid VARCHAR2 DEFAULT NULL,
  • statown VARCHAR2 DEFAULT NULL,
  • no_invalidate BOOLEAN DEFAULT FALSE)

14
Generating Statistics
  • Generate statistics using
  • Estimation based on random data sampling
  • Exact computation
  • Oracle recommends estimating statistics by
    setting ESTIMATE_PERCENT gt DBMS_STATS.AUTO_SAMPLE
    _SIZE
  • Allows Oracle to determine the best sample size
  • Achieves necessary statistical accuracy

15
Generating Statistics Example
  • To collect table and column statistics for all
    tables in USER1 schema with auto-sampling
  • EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS
  • (OWNNAME gt'USER1',
  • ESTIMATE_PERCENT gt DBMS_STATS.AUTO_SAMPLE_SIZE
    ,
  • CASCADE gt TRUE)
  • CASCADE gt TRUE causes all indexes for the tables
    to be analyzed
  • Use CASCADE gt AUTO_CASCADE to let Oracle decide
  • whether or not new index statistics are needed

16
Estimating Statistics
  • To estimate statistics, Oracle selects a random
    sample of data
  • Can specify a user-defined sampling percentage to
    perform either row sampling, or block sampling
  • Row sampling reads rows without regard to
    physical placement on disk
  • Provides the most random data for estimates
  • To use, specify ESTIMATE_PERCENT gt 5
  • Valid values from 0.000001 to 100
  • A null value causes an exact computation (reads
    all rows )
  • A value of 5 usually forces the optimizer to
    make the same decision as an exact computation
  • A value greater than 15 will generally result in
    100 of the rows being read and analyzed

17
Estimating Statistics (continued)
  • Block sampling reads a random sample of blocks
    and uses all of the rows in those blocks for
    estimates
  • Reduces the amount of I/O activity for a given
    sample size
  • Can reduce the randomness of the sample
  • Specify the BLOCK_SAMPLE parameter with an
    appropriate percent
  • Can speed the gathering of statistics by
    gathering in parallel
  • Use the parameter DEGREE gt n where n is the
    number of parallel processes
  • Set DEGREE to DBMS_STATS.AUTO_DEGREE to let
    Oracle select the degree of parallelism

18
Individual Table or Index Statistics
  • Sampling can also be done on individual tables
  • Used for large tables to decrease the time
    required for the sampling process
  • Example
  • EXECUTE DBMS_STATS.GATHER_TABLE_STATS
  • (OWNNAME gt 'USER1',
  • TABNAME gt 'EMPLOYEE',
  • ESTIMATE_PERCENT gt 5)
  • To gather statistics for an individual index
  • EXECUTE DBMS_STATS.GATHER_INDEX_STATS
  • (OWNNAME gt 'USER1',
  • INDNAME gt 'EMPLOYEE_PK',
  • BLOCK_SAMPLE gt 5)

19
Statistics for the Data Dictionary
  • Beginning with Oracle Database 10g, Oracle
    recommends gathering statistics for the data
    dictionary tables
  • Objective is to enhance performance for queries
  • Basically, two types of objects in the data
    dictionary
  • Normal dictionary base tables
  • Gathered using the GATHER_DICTIONARY_STATS
    procedure
  • May also be be gathered using GATHER_SCHEMA_STATS
    with OWNNAME parameter set to SYS
  • Oracle recommends gathering at the same frequency
    as your other database schemas

20
Statistics for the Data Dictionary (continued)
  • Fixed objects or in-memory objects
  • The V views on the X tables
  • Gathered using the GATHER_FIXED_OBJECT_STATS
    procedure
  • Oracle recommends gathering these statistics less
    frequently than for the normal dictionary tables
  • The initial collection of statistics is normally
    sufficient
  • Repeat only if workload characteristics have
    changed dramatically
  • Must have the SYSDBA or ANALYZE ANY DICTIONARY
    and ANALYZE ANY system privileges to execute
    these procedures

21
Transferring Statistics
  • May save statistics into a statistics table and
    then load them into the data dictionary
  • May want to save current statistics before
    gathering new ones
  • Useful to restore old statistics in the event
    performance degrades
  • For example, when upgrading Oracle to a new
    release
  • May want to copy statistics to another instance
  • For example, from production database to a test
    database
  • Then use the smaller database for testing but
    will see the execution plans that would occur on
    the larger production database
  • May also be used to maintain and experiment with
    sets of statistics

22
Procedures for Transferring Statistics
  • Use the following DBMS_STATS procedure
  • CREATE_STAT_TABLE creates the statistics
    table
  • EXPORT_TABLE_STATISTICS copies statistics into
    the table
  • IMPORT_TABLE_STATISTICS copies the statistics
    into the
  • data dictionary of the other
  • instance
  • Can also export / import column, index, schema,
    database or system statistics

23
Saving Original Statistics
  • Occasionally, gathering new statistics may cause
    a degradation in performance
  • For specific, critical tables, may want to save
    the old statistics before gathering new ones
  • Could then revert to the old statistics if
    necessary while analyzing and resolving the
    reasons for the performance change
  • BEGIN
  • DBMS_STATS.CREATE_STAT_TABLE
  • (OWNNAME gt 'USER1',
  • TABNAME gt 'SAVESTATS')
  • DBMS_STATS.GATHER_TABLE_STATS
  • (OWNNAME gt 'USER1',
  • TABNAME gt 'EMPLOYEE',
  • STATTAB gt 'SAVESTATS')
  • END

24
Saving Original Statistics (continued)
  • If the new statistics are causing the optimizer
    to generate poor plans, restore the original
    stats while resolving the problem
  • BEGIN
  • DBMS_STATS.DELETE_TABLE_STATS
  • (OWNNAME gt 'USER1',
  • TABNAME gt 'EMPLOYEE')
  • DBMS_STATS.IMPORT_TABLE_STATS
  • (OWNNAME gt 'USER1',
  • TABNAME gt 'EMPLOYEE',
  • STATTAB gt 'SAVESTATS')
  • END

25
Restoring and Purging Statistics History
  • Statistics may be restored as of a specified
    timestamp
  • Useful in case newly collected statistics leads
    to sub-optimal execution plans
  • To revert to a previous set of statistics for a
    specific table, use the RESTORE_TABLE_STATS
    procedure
  • May also be done at the schema or database level
  • May also be done for fixed objects, dictionary,
    and system statistics
  • Whenever statistics in the dictionary are
    modified, old versions are automatically saved
  • Old statistics are automatically purged at
    regular intervals based on a configurable
    retention setting
  • May also be purged beyond a specified time stamp
    using the PURGE_STATS procedure

26
Column Histograms
  • The optimizer needs to be aware of columns that
    have severe data skewness or it may make bad
    decisions
  • Histograms are used to store detailed information
    about data distributions which are not uniform
  • A column which has non-uniform data distribution
  • EMPLOYEE_LNAME column where there may be more
    SMITHs than ZABRISKIEs
  • A column where a large percentage of the rows
    have the same value for a given column
  • 60 of the employees were hired in the last 2
    years

27
Column Histograms (continued)
  • Histograms can result in significant performance
    improvements
  • Does require a knowledge of the application and
    the data distribution
  • To develop histograms only for indexed columns
  • METHOD_OPT gt 'FOR ALL INDEXED COLUMNS SIZE 10'
  • Size defines the number of histogram buckets

28
Automating Histogram Decisions
  • METHOD_OPT parameter provides additional options
    to automate
  • Decisions on which columns need histograms
  • Definition of the appropriate number of histogram
    buckets
  • AUTO creates histograms based on data
    distribution and the way columns are being
    accessed by the application
  • Requires the application to run for an amount of
    time to allow the workload to be captured
  • METHOD_OPT gt 'FOR ALL INDEXED COLUMNS SIZE
    AUTO'

29
Automating Histogram Decisions (continued)
  • SKEWONLY creates histograms solely based on data
    distribution
  • Useful while time is allowed to capture the
    workload
  • Expensive in terms of resource
  • Restrict to initial collection of statistics and
    then use AUTO
  • REPEAT refreshes existing column statistics
    including histograms
  • If a column does not have histograms, new ones
    will not be created

30
Automating Statistics Collection
  • The DBMS_STATS procedures can be executed inside
    PL/SQL procedures
  • For example
  • CREATE OR REPLACE PROCEDURE gather_stats
  • BEGIN
  • DBMS_STATS.GATHER_SCHEMA_STATS
  • (OWNNAME gt 'USER1',
  • ESTIMATE_PERCENT gt
  • DBMS_STATS.AUTO_SAMPLE_SIZE)
  • END
  • Then schedule the procedure(s)
  • Could use Oracles scheduling tools, DBMS_JOB in
    Oracle8i and Oracle9i or the Scheduler in Oracle
    10g, or use operating system or third-party
    scheduling tools

31
Automatically Collecting Statistics on Tables
  • In Oracle8i, the MONITORING clause in the CREATE
    (or ALTER) TABLE statement is set to start or
    stop the collection of modification statistics
  • ALTER TABLE table_name MONITORINGNOMONITOR
    ING
  • May be controlled using DBMS_STATS to enable or
    disable monitoring
  • In Oracle9i, may be set at the schema or database
    level
  • For an individual schema using ALTER_SCHEMA_TABLE_
    MONITORING
  • For the entire database using ALTER_DATABASE_TABLE
    _MONITORING
  • Can then allow Oracle to determine which objects
    need new statistics gathered and how to
    gather those statistics

32
Automatically Collecting Statistics on
Tables(continued)
  • For example
  • BEGIN
  • DBMS_STATS.GATHER_SCHEMA_STATS
  • (OWNNAME gt 'USER1',
  • OPTIONS gt 'GATHER AUTO')
  • END
  • Returns a list of objects found to be stale and
    for which statistics were gathered

33
Automatically Collecting Statistics on
Tables(continued)
  • All of the above monitoring options are obsolete
    in 10g
  • May still be used, however, will have no affect
  • Beginning with Oracle 10g, the STATISTICS_LEVEL
    initialization parameter sets table monitoring
    at the database level
  • Defaults to TYPICAL which ensures collection of
    all major statistics
  • Set to BASIC to disable monitoring
  • Monitoring tracks the approximate number of
    INSERT, UPDATE, and DELETE operations for the
    table since the last time statistics were
    gathered
  • Information is maintained in the SGA

34
Automatically Collecting Statistics on
Tables(continued)
  • Periodically (about every three hours) SMON
    incorporates the data into the data dictionary
    view DBA_TAB_MODIFICATIONS
  • Oracle uses this view to identify tables with
    stale statistics
  • Considered stale when 10 of the rows have
    changed
  • Automated statistics gathering is then invoked by
    setting up a recurring job
  • Use the parameter OPTIONS gt 'GATHER STALE'
  • Set the job to execute at an appropriate interval
    for your application

35
Locking Statistics
  • Statistics may be locked
  • All statistics on the table, including table,
    column, histograms and on all dependent indexes
    are considered locked
  • The GATHER_, IMPORT_, and DELETE_ procedures
    will raise an error when the statistics are
    locked
  • Procedures that operate on multiple objects
    (GATHER_SCHEMA_STATS) will skip modifying the
    statistics of an object if it is locked
  • Can be used to freeze the current set of
    statistics or to keep the statistics empty if
    required
  • Set using the procedures LOCK_SCHEMA_STATS
    and/or LOCK_TABLE_STATS
  • Use UNLOCK_ procedures to unlock the statistics

36
Gathering System Statistics
  • System statistics analyze system activity over a
    period of time
  • The DBMS_STATS.GATHER_SYSTEM_STATS routine
    gathers system statistics in a user-defined time
    frame
  • Must have DBA privileges to gather system
    statistics
  • For example
  • BEGIN
  • DBMS_STATS.GATHER_SYSTEM_STATS
  • (GATHERING_MODE gt 'INTERVAL',
  • INTERVAL gt 480,
  • STATTAB gt 'MYSTATS'
  • STATID gt 'OLTP')
  • END 
  • Statistics are gathered for 480 minutes, placed
    in a table called MYSTATS, and given an
    identifier of OLTP

37
Gathering System Statistics (continued)
  • Different statistics could be gathered for
    daytime operation and for nighttime operation
  • Statistics are then activated by importing the
    appropriate set of statistics for daytime or, in
    this case, nighttime activity
  • BEGIN
  • DBMS_STATS.IMPORT_SYSTEM_STATS
  • (STATTAB gt 'MYSTATS',
  • STATID gt 'BATCH')
  • END
  • Again, could be scheduled using any of the
    scheduling tools
  • Unlike table, index, or column statistics, Oracle
    does not invalidate already parsed SQL statements
  • New SQL statements are parsed using new
    statistics

38
Summary
  • As of Oracle Database 10g, Release 1, there are
    nearly 70 procedures and functions in the
    DBMS_STATS package and loads of additional
    options
  • Discussed the ones that are generally most useful
  • All are well documented in the Oracle manuals
  • PL/SQL Packages and Types Reference, 10g Release
    1 (10.1)
  • Oracle Database, Performance Tuning Guide, 10g
    Release 1 (10.1)
  • Most individuals involved in tuning (and Oracle)
    recommend using DBMS_STATS rather than the
    ANALYZE command
  • Still using ANALYZE?
  • Time to make the switch!

39
??? Questions ??? THANKS FOR ATTENDING Please
complete the session evaluations Analyze That
Database Now! (Using DBMS_STATS) Paper 515
Kent Hinckley gkhinckley_at_aol.com
Write a Comment
User Comments (0)
About PowerShow.com