Title: Analyze That Database Now
1 Analyze That Database Now! (Using
DBMS_STATS) Kent Hinckley IOUG LIVE! 2004
Systems Plus Paper 515 gkhinckley_at_aol.com
2Introduction
- Statistics can be generated to quantify data
distribution and storage characteristics of
tables, columns, indexes, and partitions - The cost-based optimizer uses these statistics to
determine - The statements execution plan
- The optimal join order and join method
- Developers design frequency requirements for
statistics gathering - Based on frequency of modifications which change
data volumes or column values - DBAs then gather statistics based on the design
- Essential both developers and DBAs understand the
process
3Understanding the Cost-Based Optimizer
- There is almost always more than one way for
Oracle to execute a DML statement (SELECT,
INSERT, UPDATE, MERGE, DELETE) - Objective of the optimizer is to choose the most
efficient method - Optimizer tasks include
- Possible transformation of the statement into an
equivalent, more efficient statement - Decides how to access the data stored in the
database - For tables that access more than one table,
decides how to perform the necessary joins - The Cost-Based Optimizer (CBO) chooses an access
path based on several factors - The available access paths for the statement
4Understanding the Cost-Based Optimizer (continued)
- Determines which access paths are available by
examining - The conditions in the statements WHERE clause
- The FROM clause if SAMPLE or SAMPLE BLOCK are
used - Generates possible execution plans
- Estimates the cost of each plan using the
statistics for the index, columns, and tables
accessible to the statement which may change over
time - Finally, chooses the execution plan with the
lowest estimated cost - Numeric value of cost is an arbitrary measurement
- Does not correspond to any real measurement such
as milliseconds or I/Os - Use for comparative measurement, not quantitative
5Understanding Statistics
- Statistics quantify the data distribution and
storage characteristics of tables, columns,
indexes, and partitions - Cost-based optimization uses these statistics to
estimate the cost of each execution plan - Gather statistics periodically
- Statistics become stale over time, due to
changing data volumes or data distribution - Statistics should be gathered after a schema
objects data or structure are modified in ways
that make the previous statistics inaccurate - After loading a significant number of rows into a
table - After heavily updating data in a table
- Existing statistics are updated the next time
statistics are gathered
6Understanding Statistics (continued)
- Any currently parsed SQL statements that access
the object are invalidated removes the
statement from the shared pool - Optimizer then automatically chooses a new
execution plan based on the new statistics - The cost-based optimizer requires accurate
information to create efficient execution plans - Remember to analyze all tables and indexes
- Optimizer makes its worst choices on a join when
one table is analyzed and the other is not - Uses Oracle-defined default values when
statistics do not exist - Times when statistics are most likely to be
forgotten - A table or index is added, rebuilt or moved
- Moving from development to production or
vice-versa
7Analyze Command
- Prior to Oracle8i, statistics were gathered using
the ANALYZE command - Oracle strongly recommends using DBMS_STATS to
collect statistics - The cost-based optimizer will eventually use only
statistics that have been collected by DBMS_STATS - ANALYZE COMPUTE and ESTIMATE are supported for
backward compatibility only - Use ANALYZE for VALIDATE and LIST CHAINED ROWS
only
8Gathering Statistics
- Statistics should be gathered with the DBMS_STATS
package - Used to gather, modify, view, export, import, and
delete statistics - Gather for individual indexes, tables, columns,
and partitions - For all objects in a schema or the entire
database - Cannot be gathered for clusters but may gather
for individual tables in a cluster - Gathers only statistics needed by the cost-based
optimizer - Does not gather number of chained rows, average
free space, or unused data blocks which were
previously gathered by ANALYZE - Enhanced in Oracle9i and again in 10g
- To determine which tables, indexes, or partitions
have been analyzed, reference the LAST_ANALYZED
column in the data dictionary views
9Statistics Collected by DBMS_STATS
- Table statistics
- Number of rows
- Number of blocks used (always exact)
- Average row length
- Column statistics
- Number of distinct values in column
- Number of nulls in column
- Data distribution
10Statistics Collected by DBMS_STATS (continued)
- Index statistics
- Index level (always exact)
- Number of leaf blocks
- Clustering factor
- Minimum / maximum key values
- System statistics
- I/O and CPU performance and utilization
11Statistics Storage
- Statistics are stored in the data dictionary
- DBA_TABLES DBA_TAB_COL_STATISTICS
- DBA_INDEXES DBA_CLUSTERS
- DBA_TAB_PARTITIONS DBA_TAB_SUBPARTITIONS
- DBA_IND_PARTITIONS DBA_IND_SUBPARTITIONS
- DBA_PART_COL_STATISTICS DBA_SUBPART_COL_STATISTICS
- New in Oracle 10g
- DBA_TAB_STATISTICS DBA_OBJECT_TABLES
- DBA_IND_STATISTICS DBA_TAB_HISTOGRAMS
- DBA_PART_HISTOGRAMS DBA_SUBPART_HISTOGRAMS
- Or the equivalent ALL_ or USER_ views
12Using the DBMS_STATS Package
- Use to gather, modify, view, export, import, and
delete statistics - Basic procedures in the DBMS_STATS package
include - Procedure Description
- GATHER_INDEX_STATS Gathers index statistics
- GATHER_TABLE_STATS Gathers table, column, and
index statistics - GATHER_SCHEMA_STATS Gathers statistics for all
objects in schema - GATHER_DATABASE_STATS Gathers statistics for
all objects in database - GATHER_SYSTEM_STATS Gathers statistics on I/O
and CPU utilization - GATHER_DICTIONARY_STATS Gathers statistics for
all tables in the - data dictionary
- GATHER_FIXED_OBJECTS_STATS Gathers statistics for
fixed objects - (V views)
-
13General Syntax
- Syntax for GATHER_TABLE_STATS follows
- Similar syntax for the other procedures
- DBMS_STATS.GATHER_TABLE_STATS (
- ownname VARCHAR2,
- tabname VARCHAR2,
- partname VARCHAR2 DEFAULT NULL,
- estimate_percent NUMBER DEFAULT NULL,
- block_sample BOOLEAN DEFAULT FALSE,
- method_opt VARCHAR2 DEFAULT 'FOR ALL COLUMNS
SIZE 1', - degree NUMBER DEFAULT NULL,
- granularity VARCHAR2 DEFAULT 'AUTO',
- cascade BOOLEAN DEFAULT TRUE,
- stattab VARCHAR2 DEFAULT NULL,
- statid VARCHAR2 DEFAULT NULL,
- statown VARCHAR2 DEFAULT NULL,
- no_invalidate BOOLEAN DEFAULT FALSE)
14Generating Statistics
- Generate statistics using
- Estimation based on random data sampling
- Exact computation
- Oracle recommends estimating statistics by
setting ESTIMATE_PERCENT gt DBMS_STATS.AUTO_SAMPLE
_SIZE - Allows Oracle to determine the best sample size
- Achieves necessary statistical accuracy
-
15Generating Statistics Example
- To collect table and column statistics for all
tables in USER1 schema with auto-sampling - EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS
- (OWNNAME gt'USER1',
- ESTIMATE_PERCENT gt DBMS_STATS.AUTO_SAMPLE_SIZE
, - CASCADE gt TRUE)
- CASCADE gt TRUE causes all indexes for the tables
to be analyzed - Use CASCADE gt AUTO_CASCADE to let Oracle decide
- whether or not new index statistics are needed
-
16Estimating Statistics
- To estimate statistics, Oracle selects a random
sample of data - Can specify a user-defined sampling percentage to
perform either row sampling, or block sampling - Row sampling reads rows without regard to
physical placement on disk - Provides the most random data for estimates
- To use, specify ESTIMATE_PERCENT gt 5
- Valid values from 0.000001 to 100
- A null value causes an exact computation (reads
all rows ) - A value of 5 usually forces the optimizer to
make the same decision as an exact computation - A value greater than 15 will generally result in
100 of the rows being read and analyzed
17Estimating Statistics (continued)
- Block sampling reads a random sample of blocks
and uses all of the rows in those blocks for
estimates - Reduces the amount of I/O activity for a given
sample size - Can reduce the randomness of the sample
- Specify the BLOCK_SAMPLE parameter with an
appropriate percent - Can speed the gathering of statistics by
gathering in parallel - Use the parameter DEGREE gt n where n is the
number of parallel processes - Set DEGREE to DBMS_STATS.AUTO_DEGREE to let
Oracle select the degree of parallelism
18Individual Table or Index Statistics
- Sampling can also be done on individual tables
- Used for large tables to decrease the time
required for the sampling process - Example
- EXECUTE DBMS_STATS.GATHER_TABLE_STATS
- (OWNNAME gt 'USER1',
- TABNAME gt 'EMPLOYEE',
- ESTIMATE_PERCENT gt 5)
- To gather statistics for an individual index
- EXECUTE DBMS_STATS.GATHER_INDEX_STATS
- (OWNNAME gt 'USER1',
- INDNAME gt 'EMPLOYEE_PK',
- BLOCK_SAMPLE gt 5)
19Statistics for the Data Dictionary
- Beginning with Oracle Database 10g, Oracle
recommends gathering statistics for the data
dictionary tables - Objective is to enhance performance for queries
- Basically, two types of objects in the data
dictionary - Normal dictionary base tables
- Gathered using the GATHER_DICTIONARY_STATS
procedure - May also be be gathered using GATHER_SCHEMA_STATS
with OWNNAME parameter set to SYS - Oracle recommends gathering at the same frequency
as your other database schemas -
20Statistics for the Data Dictionary (continued)
- Fixed objects or in-memory objects
- The V views on the X tables
- Gathered using the GATHER_FIXED_OBJECT_STATS
procedure - Oracle recommends gathering these statistics less
frequently than for the normal dictionary tables - The initial collection of statistics is normally
sufficient - Repeat only if workload characteristics have
changed dramatically - Must have the SYSDBA or ANALYZE ANY DICTIONARY
and ANALYZE ANY system privileges to execute
these procedures
21Transferring Statistics
- May save statistics into a statistics table and
then load them into the data dictionary - May want to save current statistics before
gathering new ones - Useful to restore old statistics in the event
performance degrades - For example, when upgrading Oracle to a new
release - May want to copy statistics to another instance
- For example, from production database to a test
database - Then use the smaller database for testing but
will see the execution plans that would occur on
the larger production database - May also be used to maintain and experiment with
sets of statistics -
22Procedures for Transferring Statistics
- Use the following DBMS_STATS procedure
- CREATE_STAT_TABLE creates the statistics
table - EXPORT_TABLE_STATISTICS copies statistics into
the table - IMPORT_TABLE_STATISTICS copies the statistics
into the - data dictionary of the other
- instance
- Can also export / import column, index, schema,
database or system statistics
23Saving Original Statistics
- Occasionally, gathering new statistics may cause
a degradation in performance - For specific, critical tables, may want to save
the old statistics before gathering new ones - Could then revert to the old statistics if
necessary while analyzing and resolving the
reasons for the performance change - BEGIN
- DBMS_STATS.CREATE_STAT_TABLE
- (OWNNAME gt 'USER1',
- TABNAME gt 'SAVESTATS')
- DBMS_STATS.GATHER_TABLE_STATS
- (OWNNAME gt 'USER1',
- TABNAME gt 'EMPLOYEE',
- STATTAB gt 'SAVESTATS')
- END
24Saving Original Statistics (continued)
- If the new statistics are causing the optimizer
to generate poor plans, restore the original
stats while resolving the problem - BEGIN
- DBMS_STATS.DELETE_TABLE_STATS
- (OWNNAME gt 'USER1',
- TABNAME gt 'EMPLOYEE')
- DBMS_STATS.IMPORT_TABLE_STATS
- (OWNNAME gt 'USER1',
- TABNAME gt 'EMPLOYEE',
- STATTAB gt 'SAVESTATS')
- END
25Restoring and Purging Statistics History
- Statistics may be restored as of a specified
timestamp - Useful in case newly collected statistics leads
to sub-optimal execution plans - To revert to a previous set of statistics for a
specific table, use the RESTORE_TABLE_STATS
procedure - May also be done at the schema or database level
- May also be done for fixed objects, dictionary,
and system statistics - Whenever statistics in the dictionary are
modified, old versions are automatically saved - Old statistics are automatically purged at
regular intervals based on a configurable
retention setting - May also be purged beyond a specified time stamp
using the PURGE_STATS procedure
26Column Histograms
- The optimizer needs to be aware of columns that
have severe data skewness or it may make bad
decisions - Histograms are used to store detailed information
about data distributions which are not uniform - A column which has non-uniform data distribution
- EMPLOYEE_LNAME column where there may be more
SMITHs than ZABRISKIEs - A column where a large percentage of the rows
have the same value for a given column - 60 of the employees were hired in the last 2
years
27Column Histograms (continued)
- Histograms can result in significant performance
improvements - Does require a knowledge of the application and
the data distribution - To develop histograms only for indexed columns
- METHOD_OPT gt 'FOR ALL INDEXED COLUMNS SIZE 10'
- Size defines the number of histogram buckets
-
28Automating Histogram Decisions
- METHOD_OPT parameter provides additional options
to automate - Decisions on which columns need histograms
- Definition of the appropriate number of histogram
buckets - AUTO creates histograms based on data
distribution and the way columns are being
accessed by the application - Requires the application to run for an amount of
time to allow the workload to be captured - METHOD_OPT gt 'FOR ALL INDEXED COLUMNS SIZE
AUTO' -
29Automating Histogram Decisions (continued)
- SKEWONLY creates histograms solely based on data
distribution - Useful while time is allowed to capture the
workload - Expensive in terms of resource
- Restrict to initial collection of statistics and
then use AUTO - REPEAT refreshes existing column statistics
including histograms - If a column does not have histograms, new ones
will not be created -
30Automating Statistics Collection
- The DBMS_STATS procedures can be executed inside
PL/SQL procedures - For example
- CREATE OR REPLACE PROCEDURE gather_stats
- BEGIN
- DBMS_STATS.GATHER_SCHEMA_STATS
- (OWNNAME gt 'USER1',
- ESTIMATE_PERCENT gt
- DBMS_STATS.AUTO_SAMPLE_SIZE)
- END
- Then schedule the procedure(s)
- Could use Oracles scheduling tools, DBMS_JOB in
Oracle8i and Oracle9i or the Scheduler in Oracle
10g, or use operating system or third-party
scheduling tools
31Automatically Collecting Statistics on Tables
- In Oracle8i, the MONITORING clause in the CREATE
(or ALTER) TABLE statement is set to start or
stop the collection of modification statistics - ALTER TABLE table_name MONITORINGNOMONITOR
ING - May be controlled using DBMS_STATS to enable or
disable monitoring - In Oracle9i, may be set at the schema or database
level - For an individual schema using ALTER_SCHEMA_TABLE_
MONITORING - For the entire database using ALTER_DATABASE_TABLE
_MONITORING - Can then allow Oracle to determine which objects
need new statistics gathered and how to
gather those statistics
32Automatically Collecting Statistics on
Tables(continued)
- For example
- BEGIN
- DBMS_STATS.GATHER_SCHEMA_STATS
- (OWNNAME gt 'USER1',
- OPTIONS gt 'GATHER AUTO')
- END
- Returns a list of objects found to be stale and
for which statistics were gathered
33Automatically Collecting Statistics on
Tables(continued)
- All of the above monitoring options are obsolete
in 10g - May still be used, however, will have no affect
- Beginning with Oracle 10g, the STATISTICS_LEVEL
initialization parameter sets table monitoring
at the database level - Defaults to TYPICAL which ensures collection of
all major statistics - Set to BASIC to disable monitoring
- Monitoring tracks the approximate number of
INSERT, UPDATE, and DELETE operations for the
table since the last time statistics were
gathered - Information is maintained in the SGA
34Automatically Collecting Statistics on
Tables(continued)
- Periodically (about every three hours) SMON
incorporates the data into the data dictionary
view DBA_TAB_MODIFICATIONS - Oracle uses this view to identify tables with
stale statistics - Considered stale when 10 of the rows have
changed - Automated statistics gathering is then invoked by
setting up a recurring job - Use the parameter OPTIONS gt 'GATHER STALE'
- Set the job to execute at an appropriate interval
for your application
35Locking Statistics
- Statistics may be locked
- All statistics on the table, including table,
column, histograms and on all dependent indexes
are considered locked - The GATHER_, IMPORT_, and DELETE_ procedures
will raise an error when the statistics are
locked - Procedures that operate on multiple objects
(GATHER_SCHEMA_STATS) will skip modifying the
statistics of an object if it is locked - Can be used to freeze the current set of
statistics or to keep the statistics empty if
required - Set using the procedures LOCK_SCHEMA_STATS
and/or LOCK_TABLE_STATS - Use UNLOCK_ procedures to unlock the statistics
36Gathering System Statistics
- System statistics analyze system activity over a
period of time - The DBMS_STATS.GATHER_SYSTEM_STATS routine
gathers system statistics in a user-defined time
frame - Must have DBA privileges to gather system
statistics - For example
- BEGIN
- DBMS_STATS.GATHER_SYSTEM_STATS
- (GATHERING_MODE gt 'INTERVAL',
- INTERVAL gt 480,
- STATTAB gt 'MYSTATS'
- STATID gt 'OLTP')
- END
- Statistics are gathered for 480 minutes, placed
in a table called MYSTATS, and given an
identifier of OLTP
37Gathering System Statistics (continued)
- Different statistics could be gathered for
daytime operation and for nighttime operation - Statistics are then activated by importing the
appropriate set of statistics for daytime or, in
this case, nighttime activity - BEGIN
- DBMS_STATS.IMPORT_SYSTEM_STATS
- (STATTAB gt 'MYSTATS',
- STATID gt 'BATCH')
- END
- Again, could be scheduled using any of the
scheduling tools - Unlike table, index, or column statistics, Oracle
does not invalidate already parsed SQL statements - New SQL statements are parsed using new
statistics
38Summary
- As of Oracle Database 10g, Release 1, there are
nearly 70 procedures and functions in the
DBMS_STATS package and loads of additional
options - Discussed the ones that are generally most useful
- All are well documented in the Oracle manuals
- PL/SQL Packages and Types Reference, 10g Release
1 (10.1) - Oracle Database, Performance Tuning Guide, 10g
Release 1 (10.1) - Most individuals involved in tuning (and Oracle)
recommend using DBMS_STATS rather than the
ANALYZE command - Still using ANALYZE?
- Time to make the switch!
-
39??? Questions ??? THANKS FOR ATTENDING Please
complete the session evaluations Analyze That
Database Now! (Using DBMS_STATS) Paper 515
Kent Hinckley gkhinckley_at_aol.com