Title: Handling Common Oracle Wait Events
1Handling Common Oracle Wait Events
tzf_at_fk_at_
????? ??????????
Bienvenido
??
??????
??
Welcome
Willkomen
Selamat Datang
Bienvenue
Richmond Shee Senior Database Architect Sprint
Corporation richmond.j.shee_at_mail.sprint.com
2 I received this book on Tuesday and I literally
could not put it down. I consumed it like a great
thriller. It contains a great deal of information
that can be found no where else in print.
Performance monitoring and tuning with the Oracle
Wait Interface is still new to many Oracle DBAs,
even seasoned ones, precisely because the details
of how to gather and interpret the information
have been difficult to come by until now...
John Smiley
http//shop.osborne.com/cgi-bin/oraclepress/007222
729X.html
3 Where this book excels is bridging the gap
between the perfect measurement and implementable
solutions by explaining why's and how's of the
problem. I was amazed how the authors put
together rational explanations of common wait
events like latch free, bolstered by the
elaboration of internals like hash buckets, cache
buffer chains and how to rectify those - it all
seems so simple when it comes out in the book.
Whether you are a veteran DBA who have seen all
the battles since the Civil War or a rookie just
starting out, this book is for you, a vital
weapon in your arsenal, especially the scripts
for identifying trouble spots. If I'm allowed to
keep only one book on Oracle - this will be
it. - Arup Nanda
http//www.amazon.com/exec/obidos/tg/detail/-/0072
22729X
4Agenda
- OWI Monitoring and data capture
- Handling Common Oracle Wait Events Going Beyond
P1, P2, and P3
Intermediate level
OWI monitoring data collection Root cause
analysis
Novice level
Separating symptoms from problems
Beginner level
Event attributes p1,p2,p3, Event
classification, Idle and non-idle events OWI
views
New Convert
Paradigm shift
5Objectives Scope
- Take Home
- So that you can discover the root cause of
performance problems and answer the 64-thousand
dollar questions - Why did the job run so slowly?
- Why did the job run so quickly?
- Scope
- OWI Monitoring Oracle7 to Oracle9i Database
- Handling OWI Oracle7 to Oracle Database 10g
6 7OWI Monitoring Data Capture
- Q1) Why is historical performance data important?
- Q2) Where is the best source of performance data?
- VSYSTEM_EVENT?
- VSESSION_EVENT?
- VSESSION_WAIT?
- Q3) What is a good data capture method and
sampling frequency? - Trace event 10046?
- Statspack?
8OWI Monitoring Data Capture
The importance of historical performance data
- Users expect their DBAs to be omniscient.
- DBAs are expected to be aware of performance
issues 24x7. - You need a history of all foreground processes
that ran in the instance.
9OWI Monitoring Data Capture
Determine the best source data
- VSYSTEM_EVENT
- Pros
- Cons system-level data
VSESSION_EVENT Pros session-level
granularity Cons session-level granularity
- VSESSION_WAIT (XKSUSECST)
- Pros Fine-grain data
- Cons
- Changes quickly
- Data requires translation
- High volume
10OWI Monitoring Data Capture
- Determine the best data capture method and
sampling frequency - Requirement A performance data collector that
is capable of monitoring all foreground processes
on a 24x7 basis. - Desired features
- Wait-based philosophy
- Low overhead
- Always-on
- Repositories (wait events, runtime statistics,
SQL statements, and SQL plans)
11OWI Monitoring Data Capture
- Consider the trace event 10046
- Oracles most comprehensive trace facility. It
captures wait events, SQL statements, bind
variables. - Fine-grain data is best for performance, but
requires a lot of disk space. - Disk space and overhead limitations prevent
instance-wide monitoring. - Trace fileis not user friendly WAIT 12
nam'db file scattered read' ela 0 p1106
p260227 p38has no cross referencing WAIT
1 nam'enqueue' ela 3007483 p11415053318
p2393259 p3149is bug prone WAIT 0 nam'db
file parallel write' ela 2 p1-144 p21 p30 - Documentation for interpreting trace file is
seldom available to the public and is often
out-of-date. - Trace event 10046 may add significant overhead to
the RDBMS and further degrade the performance of
an already slow running process.
12OWI Monitoring Data Capture
- Summary - Trace event 10046
13OWI Monitoring Data Capture
- Consider the database logoff trigger
- Excellent for session-level summary.
- Great for benchmarking.
- Instance-wide monitoring capability.
- Trigger overhead depends mainly on the body code.
Only apply when a session logs off. - Some PL/SQL coding is necessary.
- Disk space requirement is generally low depends
on the logoff rate. - Only available in Oracle8i and above.
- Not suitable for root cause analysis which
requires fine-grain data.
14OWI Monitoring Data Capture
An application of the database logoff trigger
- CATEGORY SUBCATEGORY WAIT_EVENT
VALUE PERCENT - ---------- ----------- ---------------------------
-- ---------- ---------- - CPU OTHER Fetch, Execute, Lookups,
etc 41089 34.37 - PARSE parse time cpu
14139 11.83 - RECURSIVE recursive cpu usage
552 .46 - DISK I/O DIRECT I/O direct path read
0 0 - Direct path write
0 0 - FULL SCANS db file scattered read
854 .71 - NORMAL I/O db file sequential read
3645 3.05 - LATENCY COMMITS log file sync
1469 1.23 - FILE OPS file open
1 0 - LATCH latch free
57470 48.07 - LOG FILE log file switch completion
6 .01 - NETWORK SQLNet message to client
179 .15 - SQLNet more data from
client 2 0 - SQLNet more data to
client 36 .03 - OTHER buffer busy waits
107 .09
15OWI Monitoring Data Capture
- Summary Database logoff trigger
16OWI Monitoring Data Capture
- Consider Statspack
- Report has a lot of information that allows you
to examine performance from several perspectives. - Instance-level snapshots offer coarse-grain
information that roughly indicates there is a
problem but not specifically where the problem is
- No different than querying vsystem_event,
vsysstat, vlatch, etc. - Session-level snapshots? How are you going to
automate it? Even if session-level snapshot
automation is not an issue, the data is still too
coarse. No different than querying
vsession_event and vsesstat. - Difficulty in determining the best sampling
frequency.
17OWI Monitoring Data Capture
18OWI Monitoring Data Capture
- Problem There is none suitable. Until Oracle10g,
you have to develop your own tool or purchase
very expensive 3rd party tools.
Too expensive
Too coarse
Too coarse
19OWI Monitoring Data Capture
- BYOT Build your own data capture tool using
PL/SQL
- Three major areas to consider
- Sampling frequency
- Repository
- Events to monitor
20OWI Monitoring Data Capture
- BYOT Build your own data capture tool using
PL/SQL
- Source data VSESSION_WAIT (XKSUSECST)
- Determine the sampling frequency
- Affects the quantity and granularity of data, as
well as the data capture method (Unix Shell
script, PL/SQL procedure, Unix Cron, SNP
background process, etc.)
21OWI Monitoring Data Capture
- BYOT Build your own data capture tool using
PL/SQL
Repository minimum two repositories (wait
event and SQL statement).
- SQL statements help set the context and get you
closer to the problem. - Event Buffer busy waits P1P2 FOO table P3
220 - Also helps developers to locate the right module.
22OWI Monitoring Data Capture
Events to ignore KXFX Execution Message Dequeue
Slave PX Deq Execution Msg KXFQ kxfqdeq -
normal deqeue PX Deq Table Q Normal Wait for
credit - send blocked PX Deq Credit send
blkd Wait for credit - need buffer to send PX Deq
Credit need buffer Wait for credit - free
buffer PX Deq Credit free buffer parallel query
dequeue wait PX Deque wait Parallel Query Idle
Wait Slaves PX Idle Wait slave wait dispatcher
timer virtual circuit status pipe get rdbms ipc
message rdbms ipc reply pmon timer smon
timer PL/SQL lock timer WMON goes to
sleep client message SQLNet message from client
( debatable) Null event ( debatable)
Events to monitor db file sequential read db
file scattered read latch free direct path
read direct path write Enqueue library cache
pin buffer busy waits free buffer waits
23OWI Monitoring Data Capture
BYOT Build your own data capture tool using
PL/SQL
- 24x7 monitoring.
- Wait event history.
- Immediate answer to why a certain process runs
like molasses. - Proactive performance management.
- SQL statement and plan repositories.
- Jobs elapsed time can be determined from the
sampling intervals. - Low disk space requirement.
- Extensive PL/SQL coding.
- Overhead depends on the quality of code.
- Not suitable for short-running jobs.
24OWI Monitoring Data Capture
25OWI Monitoring Data Capture
- Chapter 4 contains a detailed discussion of OWI
monitoring and data capture, and sample codes.
26- Last agenda item
- Handling Common Oracle Wait Events
- Going beyond P1, P2, and P3
27Handling Wait Events
- Db file sequential read Db file scattered read
- At what point do these wait events become a
problem? - What are they a symptom of?
- Low cache hit ratio
- Slow I/O subsystem
- Physical I/O calls
- Small block size
- Small buffer cache
- Handling these wait events requires you to know-
- The amount of time the events are costing the
process. - The SQL statement that is associated with the
events.
Solution SQL tuning
28Handling Wait Events
Latch Free
- Latch Free contention is symptomatic of?
- Low SPIN_COUNT.
- Inefficient SQL statements.
- Concurrency coupled with high demands for
resources. - Insufficient number of latches.
- Insufficient or slow CPU.
- Handling the latch free contention requires you
to know- - The type of latch sessions are competing for (28
individual latch wait events in Oracle10g Release
1). - The amount of time a session spent waiting on
latches. - The SQL statement that is associated with the
event.
29Handling Wait Events
Latch Free Shared Pool Library Cache
- Contention for the Shared Pool Library Cache
latches is symptomatic of? - Hard parses literal SQL statements.
- Soft parses.
- Oversized shared pool.
- High version count.
- Bad application.
- Solution If not symptom c), the real fix has to
come from the application. - Workarounds
- Set CURSOR_SHARING FORCE
- Set SESSION_CACHED_CURSORS
30Handling Wait Events Latch Free Cache Buffers
Chains
A Working Set
Hash Latch
LRU
LRUW
Hash Bucket
Buffer Header
Hash Chain
Buffers Memory
31Handling Wait Events
Latch Free Cache Buffers Chains
- Contention for the CBC latch is symptomatic of?
- Inefficient SQL statement.
- Hot blocks.
- Long hash chains.
- Insufficient number of latches.
- Handling the CBC latch contention requires you to
know- - If the contention is widespread or localized to a
particular latch. - The SQL statements that participate in the
competition.
32Handling Wait Events
Latch Free Cache Buffers Chains
- Solutions
- Tune the application and SQL statements.
- Reduce the level of concurrency.
- Workarounds
- Spread the hot blocks across multiple CBC
latches. - Increase the _SPIN_COUNT (9i and above, use
_LATCH_CLASS and _LATCH_CLASSES). - Increase _DB_BLOCK_HASH_BUCKETS.
- Increase _DB_BLOCK_HASH_LATCHES.
33Handling Wait Events Buffer Busy Waits
A Working Set
Hash Latch
LRU
LRUW
Hash Bucket
Buffer Header
Hash Chain
Buffers Memory
34Handling Wait Events
Buffer Busy Waits
- BBW contention is symptomatic of?
- Read/read, read/write, or write/write contention.
- Corrupted buffer pin.
- Insufficient INITRANS.
- Large block size.
- Handling the BBW contention requires you to
know- - The amount of time a session spent waiting on the
event. - The reason code that represents why a process
fails to get a buffer pin. - The class of block that the buffer busy waits
event is for. - The SQL statements that are associated with the
event. - The segment that the buffer belongs to.
35Handling Wait Events
Buffer Busy Waits
- Solutions depend on the class of block and reason
code - BBW contention for data block class (class 1),
reason code 130 - Reduce the level of concurrency or change the way
the work is partitioned between the parallel
threads. - Optimize the SQL statement to reduce the number
of physical and logical reads. - Increase the number of FREELISTS and FREELIST
GROUPS. - BBW contention for data block class (class 1),
reason code 220 - Reduce the level of concurrency or change the
partitioning method. - Reduce the number of rows in the block.
- Rebuild the object in another tablespace with a
smaller block size (Oracle9i and above).
36Handling Wait Events
Buffer Busy Waits
- Solutions depend on the class of block and reason
code - BBW contention for data segment header (class 4)
- Increase the number of FREELISTS and FREELIST
GROUPS of the identified object. - Ensure the gap between PCTFREE and PCTUSED is not
too small. - Ensure the next extent size is not too small.
- BBW contention for undo segment header (class
17) - Applies to rollback segment, not the
system-managed undo. - Create additional rollback segments.
- Ensure the next extent size is not too small.
- BBW contention for undo blocks (class 18)
- Application tuning.
37Handling Wait Events
Free Buffer Waits
- Free Buffer Waits wait is symptomatic of?
- Small buffer cache.
- Insufficient number of DBWR processes.
- Inefficient SQL statement.
- Slow I/O subsystem.
- Delayed block cleanout.
- Handling the Free Buffer Waits event requires you
to know- - The amount of time a session spent waiting on the
event. - The SQL statements that are associated with the
event. - The number of DBWR processes.
- The I/O operation and database storage system.
38Handling Wait Events
Free Buffer Waits
- Solutions
- Optimize the SQL statements.
- Increase the number of DBWR processes.
- Use appropriate I/O operation (async or sync).
- Lower the FAST_START_MTTR_TARGET value.
- Reduce the buffer cache size.
- Increase the buffer cache size.
- Pre-scan the table after each load.
39Handling Wait Events
Log File Sync
- Log File Sync wait is symptomatic of?
- Oversized log buffer.
- High commit frequency.
- Bad application.
- Slow LGWR process.
- Handling the Log File Sync event requires you to
know- - The amount of time a session spent waiting on the
event. - The type of job (batch or OLTP) that is
associated with the event. - Solution
- Reduce the commit frequency.
- Workarounds
- Reduce the log buffer size or lower the
_LOG_IO_SIZE. - Increase LGWR I/O throughput.
40Handling Wait Events
Enqueue
- Enqueue contention is symptomatic of?
- Concurrent access to the DBMS_AQ package.
- Concurrent transactions with incompatible lock
requests for a database resource. - Concurrent transactions with incompatible lock
requests for a latch. - Poor application design.
- Handling the Enqueue contention requires you to
know- - The type and mode of enqueue the sessions are
competing for (All enqueues have independent wait
event names in Oracle10g). - The amount of time a session spent waiting on
enqueues. - The SQL statement that is associated with the
event.
41Handling Wait Events
TX enqueue in mode 6
- Contention for the TX enqueue in mode 6 is for
row-level locks. - In Oracle10g, this enqueue name is enq TX row
lock contention.
- Solutions
- Commit or rollback the transaction holding the
lock. - Fix the application so that sessions dont go
after the same rows. - Workaround
- None
42Handling Wait Events
TX enqueue in mode 4
- Contention for the TX enqueue in mode 4 can be
due to- - ITL shortage (Oracle10g enq TX allocate ITL
entry) - Unique key enforcement
- Bitmap index entry
- Solution depends on the object of contention
- Increase the number of INITRANS.
- Prevent multiple sessions from inserting the same
key value into a table. - Dont use bitmap indexes.
43Handling Wait Events
TM enqueue in mode 3,4,5
- Contention for the TM enqueue in mode 3,4,5 is
normally due to unindexed foreign key columns.
- Solution
- Index the foreign key columns of the object
identified by the TM enqueue.
44OWI Monitoring Data Capture
- Chapters 5, 6, and 7 contain a detailed
discussion of how to handle common Oracle wait
events.
45Questions
Q
- Are you able to take home one practical
application from this session?
46Thank You!
Handling Common Oracle Wait Events
- Richmond Shee
- Senior Database Architect
- Sprint Corporation
- richmond.j.shee_at_mail.sprint.com