Title: Interpreting Wait Events To Boost System Performance
1Interpreting Wait Events To Boost System
Performance
Roger Schrag Database Specialists,
Inc. www.dbspecialists.com
NoCOUG Winter Conference February 21, 2002
2Session Objectives
- Define wait events
- Discuss how to use the wait event interface
- Walk through four examples of how wait event
information was used to troubleshoot production
problems
3Wait Event Defined
- We say an Oracle process is busy when it wants
CPU time. - When an Oracle process is not busy, it is waiting
for something to happen. - There are only so many things an Oracle process
could be waiting for, and the kernel developers
at Oracle have attached names to them all. - These are wait events.
4Wait Event Examples
- An Oracle process waiting for the client
application to submit a SQL statement waits on a
SQLNet message from client event. - An Oracle process waiting on another session to
release a row-level lock waits on an enqueue
event.
5Wait Event Interface
- Each Oracle process identifies the event it is
waiting for each time a wait begins. - The instance collects cumulative statistics about
events waited upon since instance startup. - You can access this information through v views
and tracing events. - These make up the wait event interface.
6Viewing Wait Events
http//dbrx.dbspecialists.com/pls/dbrx/view_report
7Why Wait Event Information Is Useful
- Wait events touch all areas of Oraclefrom I/O to
latches to parallelism to network traffic. - Wait event data can be remarkably detailed.
Waited 0.02 seconds to read 8 blocks from file
42 starting at block 18042. - Analyzing wait event data will yield a path
toward a solution for almost any problem.
8Important Wait Events
- There were 102 wait events in Oracle 7.3.
- There are 217 wait events in Oracle 8i Release 3
(8.1.7). - Most come up infrequently or are rarely
significant for troubleshooting performance. - Different wait events are significant in
different environments, depending on which Oracle
features have been deployed.
9A Few Common Wait Events
enqueue log file sequential read
library cache pin log file parallel write
library cache load lock log file sync
latch free db file scattered read
buffer busy waits db file sequential read
control file sequential read db file parallel read
control file parallel write db file parallel write
log buffer space direct path read / write
10Idle Events
- Sometimes an Oracle process is not busy simply
because it has nothing to do. - In this case the process will be waiting on an
event that we call an idle event. - Idle events are usually not interesting from the
tuning and troubleshooting perspective.
11Common Idle Events
client message parallel query dequeue
dispatcher timer rdbms ipc message
Null event SQLNet message from client
smon timer SQLNet message to client
PX Idle Wait SQLNet more data from client
pipe get wakeup time manager
PL/SQL lock timer virtual circuit status
pmon timer lock manager wait for remote message
12Accounted For By The Wait Event Interface
- Time spent waiting for something to do (idle
events) - Time spent waiting for something to happen so
that work may continue (non-idle events)
13Not Accounted For By The Wait Event Interface
- Time spent using a CPU
- Time spent waiting for a CPU
- Time spent waiting for virtual memory to be
swapped back into physical memory
Note that logical reads from the buffer cache do
not appear in the wait event interface.
14Timed Statistics
The wait event interface will not collect timing
information unless timed statistics are enabled.
- Enable timed statistics dynamically at the
instance or session level - ALTER SYSTEM SET timed_statistics TRUE
- ALTER SESSION SET timed_statistics TRUE
- Enable timed statistics at instance startup by
setting the instance parameter - timed_statistics true
15The Wait Event Interface
- Dynamic performance views
- vsystem_event
- vsession_event
- vevent_name
- vsession_wait
- System event 10046
16The vsystem_event View
- Shows one row for each wait event name, along
with cumulative statistics since instance
startup. Wait events that have not occurred at
least once since instance startup do not appear
in this view. - Column Name Data Type
- -------------------------- ------------
- EVENT VARCHAR2(64)
- TOTAL_WAITS NUMBER
- TOTAL_TIMEOUTS NUMBER
- TIME_WAITED NUMBER
- AVERAGE_WAIT NUMBER
17Columns In vsystem_event
- EVENT The name of a wait event
- TOTAL_WAITS Total number of times a process has
waited for this event since instance startup - TOTAL_TIMEOUTS Total number of timeouts while
waiting for this event since instance startup - TIME_WAITED Total time waited for this wait
event by all processes since instance startup (in
centiseconds) - AVERAGE_WAIT The average length of a wait for
this event since instance startup (in
centiseconds)
18Sample vsystem_event Query
- SQLgt SELECT event, time_waited
- 2 FROM vsystem_event
- 3 WHERE event IN ('smon timer',
- 4 'SQLNet message from client',
- 5 'db file sequential read',
- 6 'log file parallel write')
- EVENT TIME_WAITED
- --------------------------------- -----------
- log file parallel write 159692
- db file sequential read 28657
- smon timer 130673837
- SQLNet message from client 16528989
19The vsession_event View
- Shows one row for each wait event name within
each session, along with cumulative statistics
since session start. - Column Name Data Type
- -------------------------- ------------
- SID NUMBER
- EVENT VARCHAR2(64)
- TOTAL_WAITS NUMBER
- TOTAL_TIMEOUTS NUMBER
- TIME_WAITED NUMBER
- AVERAGE_WAIT NUMBER
- MAX_WAIT NUMBER
20Columns In vsession_event
- SID The ID of a session (from vsession)
- EVENT The name of a wait event
- TOTAL_WAITS Total number of times this session
has waited for this event - TOTAL_TIMEOUTS Total number of timeouts while
this session has waited for this event - TIME_WAITED Total time waited for this event by
this session (in centiseconds) - AVERAGE_WAIT The average length of a wait for
this event in this session (in centiseconds) - MAX_WAIT The maximum amount of time the session
had to wait for this event (in centiseconds)
21Sample vsession_event Query
- SQLgt SELECT event, total_waits, time_waited
- 2 FROM vsession_event
- 3 WHERE SID
- 4 (SELECT sid FROM vsession
- 5 WHERE audsid
- 6 USERENV ('sessionid') )
- Â
- EVENT WAITS TIME_WAITED
- --------------------------- ----- -----------
- db file sequential read 552 240
- db file scattered read 41 31
- SQLNet message to client 73 0
- SQLNet message from client 72 339738
22The vevent_name View
- Shows one row for each wait event name known to
the Oracle kernel, along with names of up to
three parameters associated with the wait event. - Column Name Data Type
- -------------------------- ------------
- EVENT NUMBER
- NAME VARCHAR2(64)
- PARAMETER1 VARCHAR2(64)
- PARAMETER2 VARCHAR2(64)
- PARAMETER3 VARCHAR2(64)
23Columns In vevent_name
- EVENT An internal ID
- NAME The name of a wait event
- PARAMETERn The name of a parameter associated
with the wait event
24Sample vevent_name Query
- SQLgt SELECT
- 2 FROM vevent_name
- 3 WHERE name 'db file scattered read'
- Â
- EVENT NAME
- ---------- ------------------------------
- PARAMETER1 PARAMETER2 PARAMETER3
- ------------- ------------- -------------
- 95 db file scattered read
- file block blocks
25The vsession_wait View
- Shows one row for each session, providing
detailed information about the current or most
recent wait event. - Column Name Data Type
- -------------------------- ------------
- SID NUMBER
- SEQ NUMBER
- EVENT VARCHAR2(64)
- P1TEXT VARCHAR2(64)
- P1 NUMBER
- P1RAW RAW(4)
- P2TEXT VARCHAR2(64)
- P2 NUMBER
- P2RAW RAW(4)
- P3TEXT VARCHAR2(64)
- P3 NUMBER
- P3RAW RAW(4)
- WAIT_TIME NUMBER
- SECONDS_IN_WAIT NUMBER
- STATE VARCHAR2(19)
26Columns In vsession_wait
- SID The ID of a session
- SEQ A number that increments by one on each new
wait - STATE An indicator of the session status
- WAITING The session is currently waiting, and
details of the wait event are provided. - WAITED KNOWN TIME The session is not waiting,
but information about the most recent wait is
provided. - WAITED SHORT TIME or WAITED UNKNOWN TIME The
session is not waiting, but partial information
about the most recent wait is provided.
27Columns In vsession_wait (continued)
- EVENT The name of a wait event
- PnTEXT The name of a parameter associated with
the wait event - Pn The value of the parameter in decimal form
- PnRAW The value of the parameter in raw form
- WAIT_TIME Length of most recent wait (in
centiseconds) if STATE WAITED KNOWN TIME - SECONDS_IN_WAIT How long current wait has been
so far if STATE WAITING
28Sample vsession_wait Query
- SQLgt SELECT FROM vsession_wait WHERE sid 16
- Â
- SID SEQ EVENT
- ---- ----- ------------------------------
- P1TEXT P1 P1RAW P2TEXT P2 P2RAW
- ------ ---- -------- ------ ---- --------
- P3TEXT P3 P3RAW WAIT_TIME SECONDS_IN_WAIT
- ------ ---- -------- --------- ---------------
- STATE
- -------------------
- 16 303 db file scattered read
- file 17 00000011 block 2721 00000AA1
- blocks 8 00000008 -1 0
- WAITED SHORT TIME
29System Event 10046
Setting event 10046 enables SQL trace, and can
optionally include wait event information and
bind variable data in trace files as well.
- Methods for setting system events
- event instance parameter
- dbms_system.set_ev
- oradebug
- ALTER SESSION SET events
30System Event 10046 Settings
- ALTER SESSION SET events
- '10046 trace name context forever, level N
Value of N Effect
1 Enables ordinary SQL trace
4 Enables SQL trace with bind variable values included in trace file
8 Enables SQL trace with wait event information included in trace file
12 Equivalent of level 4 and level 8 together
31Sample Trace Output
-
- PARSING IN CURSOR 1 len80 dep0 uid502 oct3
lid502 - tim2293771931 hv2293373707 ad'511dca20'
- SELECT / FULL / SUM (LENGTH(notes))
- FROM customer_calls
- WHERE status x
- END OF STMT
- PARSE 1c0,e0,p0,cr0,cu0,mis1,r0,dep0,og
0,tim2293771931 - BINDS 1
- bind 0 dty2 mxl22(22) mal00 scl00 pre00
oacflg03 oacfl20 - size24 offset0
- bfp09717724 bln22 avl02 flg05
- value43
- EXEC 1c0,e0,p0,cr0,cu0,mis0,r0,dep0,og4
,tim2293771931 - WAIT 1 nam'SQLNet message to client' ela 0
p1675562835 p21 p30 - WAIT 1 nam'db file scattered read' ela 3
p117 p2923 p38 - WAIT 1 nam'db file scattered read' ela 1
p117 p2931 p38 - WAIT 1 nam'db file scattered read' ela 2
p117 p2939 p38 - WAIT 1 nam'db file sequential read' ela 0
p117 p2947 p31
32Using Wait Event Information
- Four examples of how wait event information was
used to diagnose production problems
33Example 1 A Slow Web Page
- A dynamic web page took several seconds to come
up. Developers tracked the bottleneck down to one
query. The execution plan showed that the query
was using an index, so the developers thought
there might be a database problem.
34The Slow Query
- SELECT COUNT ()
- FROM customer_inquiries
- WHERE status_code b1
- AND status_date gt b2
- Â
- Execution Plan
- --------------------------------------------------
-------- - 0 SELECT STATEMENT OptimizerCHOOSE
- 1 0 SORT (AGGREGATE)
- 2 1 TABLE ACCESS (BY INDEX ROWID) OF
'CUSTOMER_INQUIRIES' - 3 2 INDEX (RANGE SCAN) OF
'CUSTOMER_INQUIRIES_N2' (NON-UNIQUE)
The CUSTOMER_INQUIRIES_N2 index was a
concatenated index with status_code as its first
column. The status_date column was not indexed.
35Wait Events For One Users Session
- A query against vsession_event after the query
ran in isolation yielded - TOTAL TIME
- EVENT WAITS WAITED
- ------------------------------ ----- ------
- db file scattered read 15 3
- db file sequential read 6209 140
- latch free 2 1
- SQLNet message to client 8 0
- SQLNet message from client 7 21285
36The Path To Problem Resolution
- What we learned from wait event information
- The query performed a large number of index
lookups. - 1.40 seconds were spent waiting on the index
lookups, plus any CPU overhead. - Areas to research further
- Was the database server CPU starved?
- Was the index lookup selective?
- Idea Modify the query to use a full table scan
instead of the index.
37Research Results
- The database server was CPU starved. The run
queue length often exceeded twice the number of
CPUs on the server. - Using just the status_code column of the
CUSTOMER_INQUIRIES_N2 index made for a very
unselective index lookup. Over 90 of the rows
in the table had a status code of 12. - A full table scan against CUSTOMER_INQUIRIES
appeared to run faster than using the index.
38Problem Resolution
- A query against vsession_event after the
modified query ran in isolation yielded - Â
- TOTAL TIME
- EVENT WAITS WAITED
- ------------------------------ ----- ------
- db file scattered read 460 13
- db file sequential read 3 1
- latch free 1 0
- SQLNet message to client 10 0
- SQLNet message from client 9 18317
39Analyzing The Results
- The rule of thumb that a full table scan is
better than a scan of an unselective index is
true. - I/O systems can perform a few multi-block I/O
requests much faster than many single-block I/O
requests. - Physical reads require a small amount of CPU
time. Lack of available CPU can make an I/O
intensive statement run even slower, although the
wait event interface will not show this.
40Example 2 Slow Batch Processing
- An additional data feed program was added to the
nightly batch processing job queue, and the
overnight processing no longer finished before
the morning deadline. More CPUs were added to the
database server, but this did not improve
processing speed significantly.
41Summarizing Wait Events During A Period Of Time
- vsystem_event shows wait event totals since
instance startup. - vsession_event shows wait event totals since the
beginning of a session. - You can capture view contents at different points
in time and compute the delta in order to get
wait event information for a specific period of
time. - Statspack and many third-party tools can do this
for you.
42Simple Script To See Wait Events During A 30
Second Time Period
- CREATE TABLE previous_events AS
- SELECT SYSDATE timestamp, vsystem_event.
- FROM vsystem_event
- Â
- EXECUTE dbms_lock.sleep (30)
- Â
- SELECT A.event,
- A.total_waits
- - NVL (B.total_waits, 0) total_waits,
- A.time_waited
- - NVL (B.time_waited, 0) time_waited
- FROM vsystem_event A, previous_events B
- WHERE B.event () A.event
- ORDER BY A.event
43Wait Events During 30 Seconds Of Batch Processing
- EVENT TOTAL_WAITS
TIME_WAITED - ------------------------------ -----------
----------- - LGWR wait for redo copy 115
41 - buffer busy waits 53
26 - control file parallel write 45
44 - db file scattered read 932
107 - db file sequential read 76089
6726 - direct path read 211
19 - direct path write 212
15 - enqueue 37
5646 - free buffer waits 11
711 - latch free 52
44 - log buffer space 2
8 - log file parallel write 4388
1047 - log file sequential read 153
91 - log file single write 2
6 - log file switch completion 2
24 - write complete waits 6
517
44The Path To Problem Resolution
- What we learned from wait event information
- There appeared to be significant lock contention.
- In 30 seconds of elapsed time, sessions spent
over 56 seconds waiting for locks. - Areas to research further
- What type of locks are being waited on? Row-level
locks? Table-level locks? Others? - If the locks are table-level or row-level, then
which database tables are experiencing
contention? Which SQL statements are causing the
contention?
45Tracing Waits In A Session
- The following commands were used to enable wait
event tracing in the process with Oracle PID 13 - Â
- SQLgt oradebug setorapid 13
- Unix process pid 19751,
- image oracle_at_dbserver.acme.com (TNS V1-V3)
- SQLgt oradebug session_event
- gt 10046 trace name context forever, level 8
- Statement processed.
- SQLgt
46Trace File Contents
- EXEC 5c0,e0,p3,cr2,cu1,mis0,r1,dep1,og4
,tim2313020980 - XCTEND rlbk0, rd_only0
- WAIT 1 nam'write complete waits' ela 11 p13
p22 p30 - WAIT 4 nam'db file sequential read' ela 4
p110 p212815 p31 - WAIT 4 nam'db file sequential read' ela 1
p110 p212865 p31 - WAIT 4 nam'db file sequential read' ela 5
p13 p2858 p31 -
- PARSING IN CURSOR 4 len65 dep1 uid502 oct6
lid502 - tim2313021001 hv417623354 ad'55855844'
- UPDATE CUSTOMER_CALLS SET ATTR_3 b1 WHERE
CUSTOMER_CALL_IDb2 - END OF STMT
- EXEC 4c1,e10,p3,cr2,cu3,mis0,r1,dep1,og
4,tim2313021001 - WAIT 4 nam'db file sequential read' ela 0
p110 p25789 p31 - WAIT 4 nam'enqueue' ela 307 p11415053318
p2196705 p36744 - WAIT 4 nam'enqueue' ela 307 p11415053318
p2196705 p36744 - WAIT 4 nam'enqueue' ela 53 p11415053318
p2196705 p36744 - WAIT 4 nam'db file sequential read' ela 0
p110 p2586 p31 - WAIT 4 nam'db file sequential read' ela 1
p13 p2858 p31 - EXEC 4c0,e668,p3,cr5,cu3,mis0,r1,dep1,og
4,tim2313021669
47Understanding The enqueue Wait Event
- SQLgt SELECT parameter1,parameter2,parameter3
- 2 FROM vevent_name
- 3 WHERE name 'enqueue'
- Â
- PARAMETER1 PARAMETER2 PARAMETER3
- ------------ ------------ ------------
- namemode id1 id2
- Â
- SQLgt SELECT CHR (1415053318/65536/256)
- 2 CHR (MOD (1415053318/65536, 256)),
- 3 MOD (1415053318, 65536) lock_mode
- 4 FROM SYS.dual
- Â
- CH LOCK_MODE
- -- ----------
- TX 6
48Analyzing The Results
- Contention for exclusive locks on rows in the
customer_calls table was responsible for
substantial delays in processing. - Looking at the row_wait_obj and row_wait_row
columns in vsession would have identified the
exact rows undergoing contention.
49Problem Resolution
- Multiple programs were attempting to update the
same rows in tables at the same time. Contention
could be reduced by doing one or more of the
following - Running conflicting programs separately
- Reducing lock scope
- Reducing lock duration
50Example 3 A Slow Client/Server Application
- A client/server application was taking several
seconds to bring up a certain screen. The delay
was occurring during startup before the user had
a chance to kick off a query. The only thing
happening in the form on startup was some
fetching of basic reference data. All of the SQL
had been tuned and was known to run very quickly.
51Manipulating timed_statistics
- The timed_statistics parameter can be changed at
any time at the session level with the following
commands - Â
- ALTER SESSION SET timed_statistics TRUE
- ALTER SESSION SET timed_statistics FALSE
- Â
- Manipulate timed_statistics to collect wait event
times during certain specific points of
processing for debugging purposes.
52Wait Events During Form Startup
- Modifying the form to disable timed_statistics at
the end of the form startup logic yielded the
following information in vsession_event - TOTAL TIME
- EVENT WAITS WAITED
- ------------------------------ ----- ------
- SQLNet message to client 18520 6
- SQLNet message from client 18519 1064
- Â
- vsesstat showed the following
- Â
- NAME VALUE
- ------------------------------ ----------
- session logical reads 9295
- CPU used by this session 82
- physical reads 0
53The Path To Problem Resolution
- What we learned from wait event information
- There were over 18,000 network roundtrips during
form startup, almost exactly two for every
logical read. - The Oracle process spent over 10 seconds waiting
for activity from the client. Since timed
statistics were disabled at the end of the form
startup logic, this does not include time spent
waiting on the end user. - Areas to research further
- How many rows of data does the form read from the
database during the startup phase? - Does the form really need to fetch all of this
data? - Is the form fetching one row at a time or is it
using Oracles array processing interface?
54Research Results
- The form was fetching 9245 rows of reference data
during startup. - All of this data was necessary none could be
eliminated. - All data was fetched one row at a time.
55Problem Resolution
- The startup logic of the form was modified to
fetch 100 rows at a time. This yielded the
following information in vsession_event - Â
- TOTAL TIME
- EVENT WAITS WAITED
- ------------------------------ ----- ------
- SQLNet message to client 200 0
- SQLNet message from client 199 28
- Â
- vsesstat showed the following
- Â
- NAME VALUE
- ------------------------------ ----------
- session logical reads 135
- CPU used by this session 3
- physical reads 0
56Analyzing The Results
- Fetching rows 100 at a time instead of one at a
time dramatically reduced network roundtrips. - Reducing network roundtrips reduced time spent
waiting on the network. - Fetching rows 100 at a time also significantly
reduced the number of logical reads, and
therefore the amount of CPU time required.
57Example 4 A Floundering Database Server
- The DBA group discovered that one of the database
servers was completely overwhelmed. Connecting to
the database took a few seconds, selecting from
SYS.dual took more than a second. Everything on
the system ran very slowly.
58Longest Waits In vsystem_event
- EVENT TIME_WAITED
- ------------------------------ ----------------
- log file sync 326284
- write complete waits 402284
- control file parallel write 501697
- db file scattered read 612671
- db file sequential read 2459961
- pmon timer 31839833
- smon timer 31974216
- db file parallel write 1353916234
- rdbms ipc message 6579264389
- latch free 8161581692
- SQLNet message from client 15517359160
59The Path To Problem Resolution
- What we learned from wait event information
- Most of the waits involved idle events or I/O
events. - A large amount of time was spent waiting on
latches. - Areas to research further
- How long has the instance been up?
- Which latches are experiencing contention?
60Research Results
- The instance had been up for about seven days.
- The latch contention was in the shared pool and
library cache, as evidenced by a query against
vlatch_misses
PARENT_NAME SUM(LONGHOLD_COUNT)
------------------------------
------------------- enqueue hash chains
614 enqueues
637 Checkpoint queue latch
790 session allocation
1131 messages
1328 session idle bit
2106 latch wait list
5977 modify parameter
values 6242 cache buffers
chains 9876 row cache
objects 38899 cache
buffers lru chain
125352 shared pool
4041451 library cache
4423229
61Further Research Results
- The shared pool was 400 Mb in size.
- There were over 36,000 statements in the shared
pool, almost all executed exactly once. - The application was not using bind variables.
- Modifying the application to use bind variables
was not an option. - Setting the cursor_sharing parameter to FORCE was
also not an option.
62Problem Resolution
- Bigger is not always better!
- Â
- Reducing the shared pool to 100 Mb provided
plenty of space for sharable statements while
reducing the effort required by Oracle to
maintain the library cache LRU list. This reduced
latch contention and boosted performance.
63A Summary Of Wait Event Techniques
- Isolating a statement and analyzing its wait
events - Collecting wait event data for a session or the
entire instance at two different times and
computing the difference to find the wait events
during a specific period of time - Enabling wait event tracing in a session
64A Summary Of Wait Event Techniques (continued)
- Enabling and disabling timed statistics
dynamically to measure wait event times for a
specific section of code - Ranking cumulative wait event data in order to
see which wait events account for the most wait
time
65In Conclusion
- The wait event interface gives you access to a
detailed accounting of how Oracle processes spend
their time. - Wait events touch all aspects of the Oracle
database server. - The wait event interface will not always give you
the answer to every performance problem, but it
will just about always give you insights that
guide you down the proper path to problem
resolution.
66The White Paper
- A companion white paper to this presentation is
available for free download from my companys
website at - www.dbspecialists.com/present.html
67Resources from Database Specialists
- The Specialist newsletter
- www.dbspecialists.com/specialist.html
- Database Rx
- dbrx.dbspecialists.com/guest
- Provides secure, automated monitoring, alert
notification, and analysis of your Oracle
databases
68Contact Information
- Roger Schrag
- Database Specialists, Inc.
- 388 Market Street, Suite 400
- San Francisco, CA 94111
- Tel 415/344-0500
- Email rschrag_at_dbspecialists.com
- Web www.dbspecialists.com