Title: Tomas E' Canty ServerCare, Inc'
1Tomas E. CantyServerCare, Inc.
Oracle Real Application Clusters Best Practices
Tuning in 11g
2Speaker Qualifications
- Thomas E. Canty, President, ServerCare, Inc.
- Nearly 20 years of Oracle experience, starting
with version 5 - Has presented at IOUG, OpenWorld, NoCOUG, and
IASA - Has been a DBA, Developer, Architect, and IT
Manager - Has worked with Fortune 100 companies in
Healthcare, Technology, Pharmaceuticals, and
Telecom, as well as Major Universities - Has been running ServerCare for almost 10 years,
specializing in on-shore Remote DBA, Remote
Sysadmin, Implementation, and Consulting
888-918-6309 http//www.ServerCare.com
3Todays Agenda
- RAC Overview, Architecture, Cache Fusion
- Configuration Monitoring
- Tuning Areas
- Best Practices
- New Features Other Notes
- Conclusion
4Objectives
- Identify areas requiring tuning
- How to obtain metrics
- How to resolve bottlenecks
- Cover tuning areas with the most benefit
- Only RAC specific items will be covered
5RAC Overview
- RAC is highly availability, flexible, and
scalable - Shared database is accessible from all nodes in
the cluster - Can runs on commodity hardware such as
Linux-based x86 - Requires proper monitoring and tuning
- Capabilities and limitations should be understood
6Glossary Of Terms
- ADDM - Automatic Database Diagnostic Monitor,
tuning advice - AWR - Automatic Workload Repository, performance
statistics - Cache Fusion - Shares data in memory across nodes
- GCS - Global Cache Service, guarantees cache
coherency - GRD - Global Resource Directory, maps data in
memory - HBA Host bus adapter, connects host to network
and storage
7Glossary Of Terms (cont.)
- Interconnect - High speed, low latency private
network - Jumbo Frames - Network Maximum Transfer Unit
(MTU) - LMS - Lock Manager Service, transports blocks
across nodes - NIC Bonding - Logically combining 2 or more
physical NICs - UDP - User Datagram Protocol, supported for the
Interconnect - VIP - Virtual Internet Protocol, allows failover
for high availability - Well see more as we go
8RAC Architecture
- Clustered nodes CRS
- crsd
- ocssd
- evmd
- Shared storage
- OCRs (1 or 2)
- Voting Disks (3 or more)
- Network
- Public IPs
- Private Interconnect
- VIPs
9Overview of Cache Fusion
- Major component of RAC, and a key difference from
OPS - Enables sharing of data in memory across nodes
- Performed by Lock Manager Service (LMS)
- Maintained in Global Resource Directory (GRD)
- Guarantees cache coherency, read consistency
10SGA Structure Processes
- Details of Interconnect and Cache Fusion processes
11 Configuration Monitoring
12Interconnect
- Interconnect is non-routable, private network
- Dedicated switch, gigabit or faster
- Protocols UDP (RDS new for use in 10.2.0.3 and
higher) - Typical bandwidth utilization
- Normal 20 30
- Saturated gt70
13Verify Interconnect IP Addresses
- Ensure Interconnect IP is not using public
network - oracle_at_rac1 oifcfg getif
- bond0 10.10.10.0 global
cluster_interconnect - eth0 172.16.150.0 global public
- oracle_at_rac2 oifcfg getif
- bond0 10.10.10.0 global
cluster_interconnect - eth0 172.16.150.0 global public
- Database instance alert log posts Interconnect
and protocol - Query from the database
- vcluster_interconnects, vconfigured_interconnec
ts
14Network Statistics
- Use ifconfig -a
- Check configuration, RX TX errors, overruns
-
- oracle_at_rac2 /sbin/ifconfig -a
- bond0 Link encapEthernet HWaddr
001125A86C35 - inet addr10.10.10.2 Bcast10.10.10.3
Mask255.255.255.252 - . . . . . . . . . .
- RX packets657830061 errors0 dropped0
overruns0 frame0 - TX packets527418621 errors0 dropped0
overruns0 carrier0 - collisions0 txqueuelen0
- RX bytes579340506510 (539.5 GiB) TX
bytes430094970294 (400.5 GiB) - eth0 Link encapEthernet HWaddr
001125A86C34
15Network Packet Info by Protocol
- Use netstat s
- Contains details of network with packet
information - oracle_at_rac1 netstat -s
- Ip . . . .
- Tcp . . . .
- Udp
- 137338287 packets received
- 7376 packets to unknown port received.
- 0 packet receive errors
- 148822392 packets sent
- Use the ping utility to determine packet loss and
timing
16Verify Cluster Configuration
- Make sure cluster connection configuration is
correct - oracle_at_rac1 cluvfy comp nodecon n rac1
- Verifying node connectivity...
- Checking node connectivity...
- Node connectivity check passed for subnet
10.10.10.0 with nodes(s) rac1. - Node connectivity check passed for subnet
172.16.150.0 with node(s) rac1.
17System Monitoring
- CPU utilization top, mpstat
- Disk I/O times iostat
- Memory free
- Kernel messages - /var/log/messages,
/var/log/dmesg - Obtain cluster statistics crs_stat, srvctl
18Tuning
19General
- Stress test application on single instance
database first - Simulate I/O load (tools such as Orion and
Database Replay (11g)) - Modify OS parameters
- Modify Clusterware parameters
- Modify Database parameters
20AWR Report
- Global Cache Load Profile
- Global Cache Efficiency Percentages
- Messaging Statistics
- Consistent Read (CR) and Current Block Segments
- Concentrate on top 5 wait events
- Use AWR (and Statspack is still useful too)
21Cache Fusion data block messaging traffic
- Global Cache Load Profile
-
-
Per Second Per Transaction -
---------------- --------------------
- - Global Cache blocks received
4.30 3.65 - Global Cache blocks served
23.44 19.90 - GCS/GES messages received 133.03
112.96 - GCS/GES messages sent
78.61 66.75 - DBWR Fusion writes
0.11 0.10 - Est Interconnect traffic (KB)
263.20 -
-
- Calculate Network Traffic from AWR report
- Network traffic received Global Cache blocks
received DB block size 4.3 8192 .01
Mb/sec - Network traffic generated Global Cache blocks
served DB block size 23.44 8192 .20
Mb/sec
22- Global Cache Efficiency Percentages
- Data blocks retrieved from local cache or remote
instance - Global Cache Efficiency Percentages (Target
localremote 100)? -
- Buffer access - local cache 99.12
- Buffer access - remote cache 0.75
- Buffer access - disk 0.13
- Messaging Statistics
- Statistics on messages sent
- Should be less than 1 millisecond
- Global Cache and Enqueue Services - Messaging
Statistics -
- Avg message sent queue time (ms)
0.4 - Avg message sent queue time on ksxp
(ms) 0.2 - Avg message received queue time (ms)
0.0
23- Segments by CR Blocks Received
-
- -gt Total CR Blocks Received 329
- -gt Captured Segments account for 84.2 of
Total -
CR - Tablespace
Subobject Obj. Blocks - Owner Name Object Name
Name Type Received Total - ---------- -------- -------------------
---------- --------- ------------ ---------- - PAYMENTECH DATA BATCH
TABLE 90 27.36 - SYS SYSTEM SMON_SCN_TIME
TABLE 25 7.60 - PAYMENTECH DATA IDX_BATCH_ORDER_ID
INDEX 21 6.38 - SYS SYSAUX SYS_IOT_TOP_8782
INDEX 16 4.86 - SYS SYSAUX WRI_ADV_PARAMETERS_
INDEX 16 4.86 - Segments by Current Blocks Received
- -gt Total Current Blocks Received 2,667
- -gt Captured Segments account for
96.7 of Total
- Current
24RAC Wait Events
- Broader category called Cluster Wait Class
- Characterized as Current or CR
- Current - blocks read into memory for the first
time - Consistent Read (CR) - denotes block for read
access
25GC Current Block 2-way
- Occurs during cache fusion process
- Instance A requests block from master instance B
- If the block is available on B then it is sent to
A
26GC Current Block 3-way
- Maximum three hops, not dependent on number of
nodes in cluster - Instance A requests block from master instance B
- B does not have block, directs to instance
holding block or - B directs request to disk
27More Global Cache Waits
- GC CR/current block congested
- LMS not keeping up under heavy load
- Block transfer process delayed, indicates low CPU
resources - GC CR/current block busy
- Delay before block is sent, indicates write
contention - GC current grant busy
- Permission to access block is granted, but is
blocked - GC CR/current block request
- Placeholder event, active while waiting for a
block
28Block Access Cost
- Cost of retrieving the block, made up of the
following - Message propagation delay
- Inter process CPU
- Block Server Load
29Block Access Latency
- Factors affecting request processing time
- Operating System
- Oracle processing time
- Available Interconnect network throughput
- CPU load on other nodes
30Operating System
- Block latency related to CPU utilization
- LMS process is CPU intensive
- Typically one LMS for every 2 CPUs
- Waits - GC CR/current block congested
- Apply OS and kernel patches
31I/O Capacity
- High I/O can be a result of
- Node addition, increased usage, database size
- Bad queries always keep a watch for
- Dissimilar disks within disk group - group
similar disks together - Wait event gc cr block busy is an indicator
- (Global Cache Consistent Read)
32Best Practices
33General
- Ensure adequate resources on surviving nodes
- Benchmark cluster configuration
- Load test on single instance first
- Avoid serialization in application design. Make
use of Parallelization - Apply few changes at a time
- Sequences require a little extra thought in RAC
34Network
- Use Jumbo Frames for Interconnect, increased MTU
- JF lowers CPU utilization, reduces bonding
overhead - Fewer frames needed for large I/Os
- All components in network must support JF
- Utilize NIC bonding/pairing for redundancy
performance - Monitor dropped packets, timeouts, buffer
overflows, transmit and receive errors - Verify that interconnect is not going over public
network
35Hardware
- Redundancy at all levels - server, storage,
network components - As nodes are added, add HBA cards, switches, disk
array controllers - Load balance LUNs across HBA ports
- Enable hyperthreading at the OS level
- Use asynchronous I/O
- Set aio-max-size to 1,048,576, aio-max-ns to
56k - If using NFS, OCFS2, QFS, VxFS etc., use direct
I/O as well - Check setting for filesystem_io_option. Should be
setall -
36Monitoring Tuning
- Use OEM Database Control or Grid Control
- Recommend getting the Tuning and Diagnostics
packs - View overall system status, status of cluster,
alert logs - Monitor throughput across Interconnect
- Make decisions to add or redistribute resources
- Tune SQL plans and schemas for better
optimization
37New Features Other Notes
38New Features in 10gR2 11g
- FAN Fast Application Notification, relies on
ONS messages to be aware of current cluster
configuration, connects only to instances able to
respond - ASM options sysadm role, new ASMCMD commands
- AWM Automatic Workload Manager, manages
distribution for optimal performance, services
restored onto surviving nodes - Extended distance (stretch) clusters, physically
separate - CRS - Now provides HA for non-Oracle applications
- LBA Load Balancing Advisor determines load.
Works with FAN/ONS
39Other Notes
- OCFS Oracle Clustered File System available on
Linux (OCFS2) and Windows (OCFS (v1)) - Commonly used for spfile, archive logs, backups,
a controlfile, even voting disk OCRs - When it has problems, it can fence the node.
Known bugs prior to 10.1.0.4 patch. - Recommend using the private interconnect to
minimize fencing - When using bonding, recommend doubling
idle_timeout from 30 to 60 seconds (probably a
good idea even without bonding) - In a future release, ASM will be able to provide
a clustered filesystem (on any platform), so
recommend using OCFS2 only for non-critical
things such as backups, archive logs, etc.
40Other Notes (cont.)
- Services Can define a workload type and
partition that workload. For example, allow HR
to use only nodes A and B. Future versions will
use Policies. - Storage LOTS of options, but RAID-01 still
king. Storage is cheap per GB, so the argument
for RAID-3/5/6 is not strong. Fault Tolerant
(parity) is not the same as Redundant (mirroring) - TAF FCF are primarily client-side options for
failover. You need to research the pros cons of
using them for your environment. - For the moment, hardware mirroring still beats
ASM mirroring. That may change. However, if you
can afford multiple arrays, ASM mirroring allows
100 redundancy throughout the hardware stack. - But third voting disk needs to go somewhere NFS
is supported
41Other Notes (cont.)
- Licensing Oracle raised license fees in June
08 - RAC is an add-on in Enterprise Edition
- RAC is no extra charge in Standard Edition, but
limited to 4 sockets. Most common config is 2
nodes with 2 dual-cores in each node - Cores do not count in SE except for current Intel
quad cores, because they are 2x2s bridged
together. AMDs are not affected by this, and the
Intel quads coming out later this year will not
be either. Multi-core sparcs have some special
rules too. - RAC is not available in Standard Edition One,
Personal Oracle, or Oracle Express - RAC can be setup in a virtualized environment
(including now Oracles own Virtual Machines)
without additional cost.
42Conclusion
43Items Learned in this Session
- RAC databases are complex in nature
- Scalability, availability start with initial
configuration - Proper configuration is essential
- Monitoring and tuning requires RAC skills and
knowledge - DBA needs specialized training, experience
44Where to Find More Information
- Additional sessions here at OpenWorld 2008
- Plenty of information available on the internet
- Oracle Technology Network
- http//www.oracle.com/technology/index.html
- Ask Diane or me
- tom.canty_at_servercare.com
- dpetersen_at_servercare.com
- 1-888-918-6309
45Questions?
- Covered many RAC topics today
- Additional questions, please contact me
- tom.canty_at_servercare.com
- 1-888-918-6309
46Oracle Real Application Clusters Best Practices
Tuning in 11g
- Thank You!
- Please fill out surveys!
tom.canty_at_servercare.com 1-888-918-6309 www.Serve
rCare.com