Title: S311441 Practical Performance Management for Oracle Real Application Clusters
2 S311441 Practical Performance Management for Oracle Real Application Clusters
- Michael Zoll, Consulting Member of Technical Staff
- Barb Lundhild, Product Manager, Oracle Real Application Clusters
3 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
4 Agenda
- Oracle RAC Infrastructure and Technical Fundamentals
- Application and Database Design
- Common Problems and Symptoms
- Diagnostics and Problem Determination
- Appendix
5 Objective
- Convey a few simple and fundamental concepts of Oracle RAC performance
- Summarize application-level performance and scalability information
- Provide some simple sizing hints
- Give an exemplary overview of common problems and solutions
- Builds on a similar presentation from OOW 2008: http://www.oracle.com/technology/products/database/clustering/pdf/s298716_oow2008_perf.pdf
6 Oracle RAC Infrastructure: Technical Fundamentals, Sizing and Configuration
7 Oracle RAC Architecture
(Diagram.) Nodes 1..n sit on a public network, each with a VIP, a local listener, a SCAN listener, and services; each node runs a database instance, ASM, Oracle Clusterware, and the operating system. All nodes attach to shared storage holding the redo/archive logs of all instances, the database and control files (managed by ASM), and the OCR and voting disks.
8 Global Cache and Global Enqueue Service: Processes and Functions
(Diagram.) Each instance's SGA holds the library cache, dictionary cache, and Global Resource Directory. The Global Enqueue Service (LMD0) and the Global Cache Service (LMSx, which runs at real-time priority) serve requests from Oracle processes, alongside LMON, LGWR, and DBW0. Instances communicate over the cluster private high-speed network.
9 Cache Hierarchy: Local Cache, Global Cache and Disk
10 Global Cache Access
(Diagram showing a shadow process requesting a block, LGWR flushing redo, and the block being received from the remote instance.)
11 Basic Performance Facts
- Global cache access is 100 - 500 usecs (roundtrip)
  - Data is immediately served from remote instances via the private, high-speed interconnect
  - Redo may have to be written to the log file before the send if the data was changed and has not been committed yet
  - Performance varies with network infrastructure and network protocol
- Maximum number of network hops is 3 messages
  - For clusters with more than 2 nodes, independent of total cluster size
- CPU cost per OLTP transaction
  - Dependent on locality of access, i.e. messages per tx
12 Basic Performance Facts: Latency (UDP/GbE and RDS/IB)

Block size RT (ms)    2K      4K      8K      16K
UDP/GbE               0.30    0.31    0.36    0.46
RDS/IB                0.12    0.13    0.16    0.20

- CPU cost per transfer varies with the protocol and network infrastructure (lower for RDS/IB than for UDP/GbE)
- Actual interconnect latency is generally not the
problem unless you have exceeded capacity or you
are experiencing errors
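To make the table concrete, the round-trip numbers can be plugged into a toy wait-time model. This is an illustration only: the 8K latencies come from the table above, while the 5 ms disk read is an assumed figure for a conventional spindle, not from the slides.

```python
# Illustrative: total wait for N single-block reads served from the
# global cache (8K round-trip latencies from the table) vs. from disk.
GC_LATENCY_MS = {"UDP/GbE": 0.36, "RDS/IB": 0.16}
DISK_READ_MS = 5.0  # assumed average spinning-disk single-block read

def total_wait_ms(n_blocks, per_block_ms):
    """Total wait time if each of n_blocks incurs per_block_ms latency."""
    return n_blocks * per_block_ms

# 10,000 single-block reads:
gc_udp = total_wait_ms(10_000, GC_LATENCY_MS["UDP/GbE"])  # ~3.6 s
gc_rds = total_wait_ms(10_000, GC_LATENCY_MS["RDS/IB"])   # ~1.6 s
disk   = total_wait_ms(10_000, DISK_READ_MS)              # ~50 s
```

Even over plain GbE, a remote cache transfer is an order of magnitude cheaper than a buffered disk read, which is why Oracle RAC serves blocks from remote caches first.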
13 Causes of Latency
- CPU cycles for protocol processing and process scheduling are ~80% of latency
- LMS is a critical resource
  - Process concurrency and context switching increase the CPU path length
- Other influences on latency and CPU cost
  - Load factors (CPU utilization)
  - Total bandwidth
  - Network frame size
  - NIC offload capabilities
14 Private Interconnect
- The network between the nodes of an Oracle RAC cluster MUST be private/dedicated to traffic between Oracle RAC nodes
- Large (Jumbo) Frames recommended for GbE
  - Avoids fragmentation and reassembly (8K block / 1500 MTU = 6 fragments)
- Interconnect bandwidth should be tested with non-Oracle utilities (e.g. iPerf)
  - No packet loss at 75 - 80% of bandwidth utilization
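The "6 fragments" figure follows directly from IP fragmentation arithmetic; a minimal sketch (simplified model assuming a 20-byte IP header and 8-byte UDP header, ignoring options):

```python
import math

def udp_fragments(payload_bytes, mtu, ip_header=20, udp_header=8):
    """IP fragments needed for one UDP datagram of payload_bytes.
    Each fragment carries at most (mtu - ip_header) bytes of the
    datagram; for MTU 1500 that is 1480 bytes per fragment."""
    datagram = udp_header + payload_bytes   # bytes of IP payload
    per_fragment = mtu - ip_header
    return math.ceil(datagram / per_fragment)

udp_fragments(8192, 1500)   # 6 fragments, matching the slide
udp_fragments(8192, 9000)   # 1 fragment with jumbo frames
```

With a 9000-byte jumbo MTU the whole 8K block fits in a single frame, which is exactly why jumbo frames avoid reassembly work (and the "packet reassembles failed" errors shown later).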
15 Interconnect Bandwidth
- Generally, 1Gb/sec is sufficient for performance and scalability in OLTP
- DSS/DW systems should be designed with > 1Gb/sec capacity categorically
- Prediction of interconnect traffic is difficult
  - Depends on transaction instruction length per message
- Empirical rule of thumb: 1Gb/sec per 32 CPU cores
- Infiniband and 10GbE are supported for scale-out
16 Performance and Scalability of Applications and Database Design with RAC
17 General Scalability
- OLTP, DML-intensive workloads
  - Scale well if contention is low and the database/working set size scales (i.e. add a node when demand grows)
- Read-intensive workloads scale predictably and linearly
  - Bigger cache when adding more nodes
  - Faster read access to the global cache than to disk, less disk IO
- If cluster-size and database-size growth are balanced, the system will perform and scale well
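The "bigger cache, less disk IO" effect can be shown with a weighted-average read-time model. The hit ratios below are invented for illustration; the remote latency is the 8K UDP/GbE round trip from the earlier latency slide and the disk latency is an assumed spindle figure.

```python
def effective_read_ms(local_hit, remote_hit,
                      local_ms=0.0001, remote_ms=0.36, disk_ms=5.0):
    """Weighted average single-block read time across the cache
    hierarchy: local cache, remote (global) cache, then disk.
    Hit ratios and latencies are illustrative assumptions."""
    disk_miss = 1.0 - local_hit - remote_hit
    return (local_hit * local_ms
            + remote_hit * remote_ms
            + disk_miss * disk_ms)

# Adding a node turns some disk reads into much cheaper cache transfers:
one_node  = effective_read_ms(local_hit=0.90, remote_hit=0.00)  # ~0.50 ms
two_nodes = effective_read_ms(local_hit=0.90, remote_hit=0.08)  # ~0.13 ms
```

As long as the extra cache absorbs reads that would otherwise hit disk, average read time drops, which is the mechanism behind the read-scaling claim.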
18 Performance and Scaling in Application and Database Design
- Response time impact
  - Index contention on INSERTs when the index is right-growing
    - System-generated artificial keys such as consecutive order numbers, or natural keys such as dates
  - UPDATEs or DELETEs to rows in a small working set
    - Session logging and tracking
    - First-in first-out queues
    - State of messages in queues
  - Bulk INSERTs of large amounts of data
  - LOBs
19 DML Contention and Serialization
(Diagram.) Modification-intensive operations on a small set of (cached) blocks: busy blocks of table T and index I ping between instances.
Examples: INSERT INTO I with key = sequence; UPDATE T SET ... WHERE the row is in blocks 1..n and n is a small number
20 Performance and Scaling in Application and Database Design
- CPU cost due to inter-instance messaging and non-linear scaling
  - In-memory databases
    - Working set spans multiple buffer caches
    - Frequent modifications and reads of recent modifications
  - Working set fits into the memory of one instance
    - Locality of access worsens when nodes are added and users are load balanced
  - Scales as long as sufficient CPU power is available
21 DML on Small Working Set
Frequent modification of a non-scaling data set: blocks move around often. The working set could be cached in one instance but is modified on all instances. This is CPU intensive; when nodes are added, the rate of block transfers may increase, because the working set does not scale with the cluster.
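Why transfers grow with node count can be seen in a simple uniform-access model (an illustrative simplification, not an Oracle formula): if modifications to a fixed hot set are load-balanced evenly across n instances, the chance that the current copy of a block sits on some other instance is 1 - 1/n.

```python
def remote_access_fraction(n_nodes):
    """Fraction of accesses expected to need a remote block transfer
    when a fixed hot working set is modified uniformly from all nodes.
    Simplified uniform model for illustration."""
    return 0.0 if n_nodes <= 1 else 1.0 - 1.0 / n_nodes

[remote_access_fraction(n) for n in (1, 2, 4, 8)]
# 0.0, 0.5, 0.75, 0.875 -- more nodes, more transfers on the same set
```

The fraction keeps rising as nodes are added, which is why this pattern consumes CPU rather than scaling.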
22 Read-Intensive
(Diagram.) Two 32GB buffer caches read from a 64GB working set on disk; blocks arrive via disk transfer or cache transfer. Eventually all blocks are cached, giving a larger read cache. No messages in 11g.
23 Performance and Scalability
- Good linear or near-linear scaling out of the box
  - IO- and CPU-intensive applications with large working sets and low proximity of access
    - Self-service web applications (shopping carts etc.)
    - CRM
    - Document storage and retrieval
    - Business analytics and data warehousing
24 Performance and Scalability
- Partitioning or load direction may optimize performance
  - High proximity of access, e.g. adding to and removing from message queues
    - Advanced Queuing and Workflow
    - Batch and bulk processes
    - Order processing and inventory
    - Payroll processing
25 Identifying Performance and Scaling Bottlenecks in Database Design
- The Golden Rules
  - #1 For a first approximation, disregard read-mostly objects and focus on the INSERT-, UPDATE- and DELETE-intensive indexes and tablespaces
  - #2 If DML access to data is random, no worries as long as CPU is not an issue
  - #3 Standard SQL and schema tuning solves > 80% of performance problems. There are usually only a few problem SQL statements and tables.
  - #4 Almost everything can be scaled out quickly with load direction and load balancing
26 Identifying Performance and Scaling Bottlenecks in Database Design
- Look for indexes with right-growing characteristics
  - Keys comprising DATE columns or keys generated by sequence numbers
- Find frequent updates of small and compact tables
  - small = fits into a single buffer cache
- Identify frequently and concurrently modified LOBs
27 HOW?
- Look at segment and SQL statistics in the Automatic Workload Repository
- Use Oracle Enterprise Manager Access Advisories and the Automatic Database Diagnostics Monitor (ADDM)
- Instrumentation with MODULE and ACTION helps identify and quantify components of the workload
28 Quick Fixes (without modifying the application)
- Indexes with right-growing characteristics
  - Cache sequence numbers per instance
  - Hash- or range-partition the table with LOCAL indexes
- Frequent updates of small and compact tables
  - Reduce the block size (2K) and the row density of blocks (PCTFREE 99)
- Frequently modified LOBs
  - Hash partitions (128 - 256)
  - FREEPOOLS
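The "reduce block size and row density" fix works by leaving fewer hot rows per block. A rough back-of-the-envelope helper (illustrative only: it ignores Oracle block and row overhead, and the 100-byte average row is an assumption):

```python
def rows_per_block(block_size, pctfree, avg_row_bytes):
    """Approximate rows that fit in one block after reserving
    PCTFREE percent of the block for future updates. Simplified:
    ignores block/row overhead; a block always holds at least one row."""
    usable = block_size * (100 - pctfree) // 100
    return max(1, usable // avg_row_bytes)

rows_per_block(8192, 10, 100)  # ~73 rows in a default 8K block -> hot block
rows_per_block(2048, 99, 100)  # 1 row per 2K block -> contention spread out
```

Going from ~73 rows per block to one row per block means concurrent updates of different rows no longer collide on the same buffer, at the cost of a larger table footprint.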
29 Quick Fixes
- Application modules which may not scale or cannot be quickly reorganized can be directed to particular nodes via cluster-managed services
  - For administrator-managed databases and older releases, create the service with 1 preferred node and the rest available
  - For policy-managed databases, use a singleton service
- Some large-scale and high-performance applications may be optimized by data partitioning (range, hash, or composites) and routing per partitioning key in the application server tier
  - E.g. hash by CLIENT_ID, REGION etc.
30 Leverage Connection Pools: UCP Load Balancing and Affinity
31 Performance and Scalability Enhancements in 11.1 and 11.2
- Read Mostly
  - An automatic policy detects read- and disk-IO-intensive tables
  - No interconnect messages when the policy kicks in -> CPU savings
- Direct reads for large (serial and parallel) scans
  - No locks, no buffer cache contention
  - Good when the IO subsystem is fast or IO processing is offloaded to storage caches or servers (e.g. Exadata)
- Fusion Compression
  - Reduces message sizes and therefore CPU cost
- Dynamic policies to make the trade-off between disk IO and global cache transfers
32 Performance Diagnostics and Checks: Metrics and Method
33 Normal Behaviour
- It is normal to see time consumed in
  - CPU
  - db file sequential/scattered read
  - direct read
  - gc cr/current block 2-way/3-way (transfer from remote cache)
  - gc cr/current grant 2-way (correlates with buffered disk IOs)
- Average latencies should be within baseline parameters
- Most problems boil down to CPU, IO, or network capacity, or application issues
34 Normality, Baselines and Significance
Most significant response time components (example Top Events):

Event                      Waits      Time(s)  Avg(ms)  % Call Time
gc current block 3-way     3,289,371    4,019        1          8.0
gc buffer busy acquire       373,777    3,272        9          6.5
gc current block 2-way     3,982,284    3,192        1          6.3
gc current block busy        125,595    2,931        4

- GC waits are influenced by interconnect or remote effects which are not always obvious
- The busy events indicate contention; the 2-way/3-way averages should be < 1 ms
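The per-event averages are just total time divided by wait count, so a baseline check can be scripted from the AWR totals. The figures below are the ones from the example table above:

```python
def avg_wait_ms(total_time_s, waits):
    """Average latency per wait, as derived from AWR totals."""
    return total_time_s * 1000.0 / waits

# From the Top Events example above:
three_way = round(avg_wait_ms(4019, 3_289_371), 2)  # ~1.22 ms: near baseline
busy_acq  = round(avg_wait_ms(3272, 373_777), 2)    # ~8.75 ms: contention
```

Comparing each computed average against its baseline (roughly 1 ms for 2-way/3-way transfers per the slide) separates normal global-cache traffic from serialization problems.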
35 Distributed Cause and Effect
Example: cluster-wide impact of a log file IO problem. (Diagram: the ROOT CAUSE sits on Node 1 - disk capacity, disk or controller bottleneck - while Node 2 sees the effects.)
36 Global Metrics View
Local symptom (WORKLOAD REPOSITORY report for instance OOW8, host oowdb8):
- Event gc current block busy: 23 ms
Cause (WORKLOAD REPOSITORY report for instance OOW4, host oowdb4):
- log file parallel write: 20 ms
- Global Cache Transfer Stats: avg global cache current block flush time 21 ms
- Remote instance table: Inst 4, data block, 114,426 transfers, 95.9 (% busy), 24.1 (ms)
37 Investigate Serialization
- gc buffer busy: 9 ms, waiting for gc current block busy: 4 ms - Not OK!
38 Example: Segment Statistics
- Segments by Global Cache Buffer Busy: ES_BILLING TABLE 97.41%
- Segments by Current Blocks Received: ES_BILLING TABLE 85.81%
ANALYSIS: Table ES_BILLING is frequently read and modified on all nodes. The majority of global cache accesses and serialization can be attributed to this table.
39Comprehensive Cluster-wide Analysis via Global
ADDM
Courtesy of Cecilia Gervasio, Oracle Server
Technologies, Diagnostics and Manageability
40 Common Problems and Symptoms
41 Common Problems and Symptoms
- Interconnect or switch problems
- Slow or bottlenecked disks
- High log file sync latency
- System load and scheduling
42 Symptoms of Interconnect Problems
(Diagram.) Serialization and high latencies are the visible effects; the ROOT CAUSE is a capacity limit, congestion, or dropped packets.
43 Symptoms of an Interconnect Problem
Top 5 Timed Events:

Event             Waits     Time(s)  Avg(ms)  % Call Time  Wait Class
log file sync     286,038    49,872      174         41.7  Commit
gc buffer busy    177,315    29,021      164         24.3  Cluster
gc cr block busy  110,348     5,703       52          4.8  Cluster
gc cr block lost    4,272     4,953     1159          4.1  Cluster
cr request retry    6,316     4,668      739          3.9  Other

gc cr block lost should never be here - it is always a severe performance problem.
44 Interconnect or IPC Problems
(Diagram of the stack, top to bottom.) Applications / Oracle (gc blocks lost) -> protocol processing, IP and UDP (netstat -s) -> device drivers (ifconfig -a) -> NIC1/NIC2 -> ports and queues -> switch.
45 Causes and Diagnostics
- ifconfig -a:
  eth0 Link encap:Ethernet HWaddr 00:0B:DB:4B:A2:04
  inet addr:130.35.25.110 Bcast:130.35.27.255 Mask:255.255.252.0
  UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
  RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95
  TX packets:273120 errors:0 dropped:27 overruns:0 carrier:0
- netstat -s:
  Ip: 84884742 total packets received
  1201 fragments dropped after timeout
  3384 packet reassembles failed
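Those two `netstat -s` counters are the ones worth automating a check on; a small parser, fed with the sample output from this slide (the function name and dictionary keys are my own, not Oracle tooling):

```python
import re

def reassembly_stats(netstat_s_output):
    """Extract fragmentation/reassembly counters from `netstat -s` text.
    Non-zero values on the private interconnect point at MTU or
    fragmentation trouble (cf. the 'gc blocks lost' wait event)."""
    stats = {}
    for pattern, key in [
        (r"(\d+)\s+fragments dropped after timeout", "fragments_dropped"),
        (r"(\d+)\s+packet reassembles failed", "reassembles_failed"),
    ]:
        m = re.search(pattern, netstat_s_output)
        stats[key] = int(m.group(1)) if m else 0
    return stats

sample = """Ip: 84884742 total packets received
    1201 fragments dropped after timeout
    3384 packet reassembles failed"""
result = reassembly_stats(sample)
# {'fragments_dropped': 1201, 'reassembles_failed': 3384}
```

Run against each node periodically, a jump in either counter is an early warning before `gc cr block lost` shows up in AWR.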
46 Cluster-wide Impact of a Database File IO Problem
(Diagram.) The ROOT CAUSE on Node 2 - disk capacity, disk or controller bottleneck, IO-intensive queries - affects Node 1.
47 Cluster-Wide Disk I/O Impact
Node 1 - Top 5 Timed Events:

Event             Waits     Time(s)  Avg(ms)  % Call Time
log file sync     286,038    49,872      174         41.7
gc buffer busy    177,315    29,021      164         24.3
gc cr block busy  110,348     5,703       52          4.8

CAUSE: an expensive query on Node 2 causes the IO bottleneck.
Node 2 - Load Profile (per second): Redo size 40,982.21, Logical reads 81,652.41, Physical reads 51,193.37

1. IO on the disk group containing the redo logs is slow
2. Block shipping for frequently modified blocks is delayed by log flush IO
3. Serialization builds up
48 Log File Sync Latency: Causes and Symptoms
Courtesy of Vinay Srihari, Oracle Server Technologies, Recovery
49 Causes of High Commit Latency
- Symptom of slow log writes
  - I/O service time spikes may last only seconds or minutes
  - Threshold-based warning message in the LGWR trace file
    - Warning: log write elapsed time xx ms, size xx KB
    - Dumped when write latency > 500 ms
  - A large log_buffer makes a bad situation worse
- Fixes
  - Smooth out log file IO on the primary system and the standby redo apply I/O pattern
  - Primary and standby storage subsystems should be configured for peaks
  - Apply the bug fixes listed in the appendix
Courtesy of Vinay Srihari, Oracle Server Technologies, Recovery
50 Block Server Process Busy or Starved
(Diagram.) The ROOT CAUSE on Node 1 - too few LMS processes, LMS not at high priority, or memory problems (swapping) - while Node 2 sees the symptoms.
51 Block Server Process Busy or Starved
Top 5 Timed Events:

Event                       Waits    Time(s)  Avg(ms)  % Call Time  Wait Class
gc cr grant congested       26,146    28,761     1100         39.1  Cluster
gc current block congested  13,237    13,703     1035         18.6  Cluster
gc cr grant 2-way          340,281    12,810       38         17.4  Cluster
gc current block 2-way     119,098     4,276       36          5.8  Cluster
gc buffer busy               8,109     3,460      427          4.7  Cluster

On the remote node: avg message sent queue time (ms) 16.1
"Congested" means LMS could not dequeue messages fast enough.
52 Block Server Processes Busy
- Increase the number of LMS processes based on
  - Occurrence of congested wait events
  - Heuristics: 75 - 80% busy is OK
  - Avg send queue time > 1 ms
  - Caveat: the number of CPUs should always be greater than the number of LMS processes, to avoid starvation
- On NUMA architectures and CMT
  - Bind LMS to a NUMA board or to cores in a processor set
  - Fence off hardware interrupts from the processor sets
53 High Latencies in Global Cache
54 High Latencies

Event                   Waits    Time(s)  Avg(ms)  % Call Time
gc cr block 2-way       317,062    5,767       18         19.0
gc current block 2-way  201,663    4,063       20         13.4

Unexpected to see > 1 ms (the averages should be around 1 ms)
- Additional diagnostics: V$SESSION_WAIT_HISTOGRAM for these events
- Check the network configuration (private? bandwidth?)
- Check for high CPU consumption
  - Runaway or spinning processes
55 Transient Problems and Hangs
56 Temporary Slowness and Hang
- Can affect one or more instances in the cluster
- Can be related to
  - IO issues at log switch time (checkpoint or archiver slow)
  - A process stuck waiting for IO
  - A connection storm
- Hard to establish causality with AWR statistics
  - Use Oracle Enterprise Manager and Active Session History
57 Temporary Cluster Wait Spike
- Spike in Global Cache Response Time
- SQL with High Global Cache Wait Time
Courtesy of Cecilia Gervasio, Oracle Server
Technologies, Diagnostics and Manageability
58 Active Session History
(Diagram: 1 out of 10 samples is written to disk via direct-path INSERTs of variable-length rows.)
- In 10.2, can identify the local blocker for a hang
- In 11g, can identify the global blocker
Courtesy of Graham Wood, Oracle Server Technologies, Architect
59 Temporary Slowness or Hang
- Slowdown from 5:00 - 5:30
- $ORACLE_HOME/rdbms/admin/ashrpt.sql
60 Additional Diagnostics
- For all slowdowns with high averages in gc wait time
  - Active Session History report (all nodes)
- Set event 10708 on selected processes
  - event 10708 trace name context forever, level 7
  - Collect the trace files
- Set event 10899 system-wide
  - Threshold based, i.e. no cost
- Continuous OS statistics
  - Cluster Health Monitor (IPD/OS)
- LMS, LMD, LGWR trace files
- DIA0 trace files
  - Hang analysis
61 Conclusions
62 Golden Rules for Performance and Scalability in Oracle RAC
- Thorough configuration and testing of the infrastructure is the basis for stable performance
- Anticipating application and database bottlenecks, and their possibly magnified impact in Oracle RAC, is relatively simple
- Enterprise Manager provides monitoring and quick diagnosis of cluster-wide issues
- Basic intuitive and empirical guidelines for approaching performance problems suffice for all practical purposes
63 Q&A
http://otn.oracle.com/rac
64 Recommended Sessions

DATE and TIME                      SESSION
Tuesday, October 13, 1:00 PM       Next Generation Database Grid - Moscone South 104
Tuesday, October 13, 2:30 PM       Single Instance Oracle Real Application Clusters - Better Virtualization for Databases - Moscone South 300
Wednesday, October 14, 11:45 AM    Understanding Oracle Real Application Clusters Internals - Moscone South 104
Thursday, October 15, 9:00 AM      Oracle ACFS: The Awaited Missing Feature - Moscone South 305

Visit us in the Moscone West Demogrounds, Booth W-037
65 Appendix
66 References
- http://www.oracle.com/technology/products/database/clustering/pdf/s298716_oow2008_perf.pdf
- http://otn.oracle.com/rac
67 Log File Sync Issues
- Missed post from LGWR to the foreground
  - The commit is then dependent on the log file sync timeout value
  - The log file sync timeout was 1s; it is 100ms in 11.2 and backports
  - One-offs/bundles available for 10.2.X, 11.1.0.7
- Broadcast On Commit (BOC) ack delays
  - Missed post of LGWR by the BOC ack receiver (LMS)
  - Incorrect bookkeeping of multiple outstanding acks
    - Bugs 8850801, 7610362, 8220734
  - One-offs/bundles available for 10.2.X, 11.1.0.7
- SYNC standby log write latency
  - bug 7643632, bug 7646105