Title: Oracle 9i RAC By Ramesh Malayappan
1Oracle 9i RAC ByRamesh Malayappan Gautam
Mekala
2Overview
- Overview of OS (Linux)
- Overview of Oracle9i Real Application Clusters
- Oracle9i RAC on Linux
- Tuning Tips
- Issues to deal in RAC
- Going Forward (Oracle 10G)
- Q A
3Benefits of Real Application Clusters
- New Shared Cache Architecture
- Exploits New Hardware and Software Technologies
- Most Flexible Clustering technology
- Provides scalability and high availability
4Real Application Clusters
- Real Application Clusters (RAC)
- Cache Fusion
- True scalability
- Transparent Scalability
- All Applications Scale No tuning required
- No Physical Data Partitioning required
(Application user partitioning is needed though) - ISV Applications Scale out of the box
- High Availability Loss of single node on
cluster will not stop the database - Ability to add additional hardware transparently
to users
5Overview of Oracle9i RAC
- Many instances of Oracle running on many nodes
- All instances share a single physical database
and have common data control files - Each instance has its own log files and rollback
segments - All instances can simultaneously execute
transactions against the single database - Caches are synchronized using Oracles Global
Cache Management technology (Cache Fusion) - No Single Point of Failure (Server side)
6Oracle9i RAC on Linux
- Clustering consists of 2 Oracle-supplied
components - Cluster Manager (oracm)
- Provides consistent view of Oracle instances
- Accepts registration of Oracle instances
- Responsible for process level cluster status
- Hangcheck-timer
- New in 9.2.0.2, replaces watchdogd
- Monitors the Linux kernel for system hangs
- Implemented as a kernel module so it much less
affected by system load - Resets node from within kernel if abnormal hangs
occur
7Internal Workings Of RAC..
- Multi-Instance with Single Database
- Cache Fusion (aggregation of cache from each
node) - Inter-Instance Transfers
- GES and GCS
- Resources Co-ordination
- Ownership (conversions)
- Status and Roles
- RAC Processes
- Fail-over Recovery
Cache Fusion
database
8Contents of SGA
- The shared pool portion of the SGA
- Library Cache
- Dictionary cache
- Buffers for parallel exec mesg and control
structures. - Library Cache
- Shared SQL areas, private SQL areas (MTS), PL/SQL
procedures and packages, and control - structures such as locks and library cache
handles. - Data Dictionary
- Collection of database tables and views
containing reference information about the
database, its structures, and its users.
Shared Pool
Buffer Cache
Redo Buffer
Large Pool
Shared Across Instances
Synchronized Across Instances
PGA
Remains Local to each Instance
9Cache Fusion
- Cache Fusion is a fundamental component of Real
Application Cluster - Cache Fusion allows individual nodes to share the
contents of their buffer caches through the
inter-connect cluster Interprocess Communication
(IPC) eliminating the need for extra disk I/Os. - This greatly improves the performance and
scalability characteristics of shared-disk
clusters - Cache fusion only works with the default resource
control scheme. If GC_FILES_TO_LOCKS is set, the
old pre-cache fusion behavior is utilized. In
other words, forced disk-writes will be used.
10Dirty Blocks Past Image
- In a non-RAC instance
- User A selects say 10 rows (10 blocks)
- User B selects same 10 rows (10 blocks)
- User B updates those 10 rows (10 blocks)
- Dirtied blocks
- Not committed
- User C selects same 10 rows
- Rollback segment buffer provide read consistent
image - Now User B performs Commit
- Now User D updates same blocks
- gets the same dirtied blocks
- If it is a RAC
- When user D updates on second Instance, PAST
Image is created for those blocks sent out
11Cache Coherency Lock Management
- Transfer of blocks among the individual node
caches - Global Concurrency of the data blocks / pages
- Global Control mechanism
- Cluster Interconnects
- Connect nodes
- Can be a simple private network connection
- Can be a specialized cables with Hub/Switch
- Functions
- Monitors Health, Status of nodes, Accessing
remote file systems - Cluster alias routing
- Application-specific traffic
- Distributed lock manager (DLM) messages / GCS
messages
12Cluster Interconnects
- Essential Requirements
- Low latency for short messages
- High speed and sustained data rates for large
messages - Low Host-CPU utilization per message.
- Flow Control, Error Control and Heart-beat
Continuity monitoring - Host Interfaces to interact directly with host
processes (OS bypass) - Switch Networks that scale well
13Resources and Coordination
Resources
- Synchronization
- Data Blocks and Enqueues
- Nodes acquire and release ownership of resources
- Co-ordination of concurrent tasks within shared
cache
Enqueue is a shared memory structure Serializes
access to database resources Associated with a
session or transaction.. E.g. Update to a row
Local Concurrency Controls Latches, Row Locks,
Local Enqueues
14Resource Coordination
- Resources have
- Roles Locally Managed and Globally Managed
- Modes Null , Shared, Exclusive
Most important Resource DATA BLOCK
- Global Resource Directory
- Data Block Identifiers - DBA
- Location of most current status
- Modes of Data Blocks
- Roles of the Blocks
Past Image When a dirty block is sent to other
node using CF, it keeps a copy (data integrity in
case of failures) Consistent Record
(CR) Consistent snapshot at a previous point in
time
15Resource Modes and Roles
When referring to a lock mode in RAC, there are
three characters to distinguish E.g. ABC A
Represents lock mode with values Null, Shared,
Exclusive B Represents Lock Role, Local,
Global C Shows if Past Image exists or not
(1) PI exists , (0) No PI exists
NL0 Null Local and No past Images SL0 Shared
Local with no past image XL0 Exclusive Local with
no past image NG0 Null Global - Instance owns
current block image SG0 Global Shared Lock -
Instance owns current image XG0 Global Exclusive
Lock - Instance own current image NG1 Global
Null - Instance Owns the Past mage
Block. SG1 Shared Global - Instance owns past
Image XG1 Global Exclusive Lock - Instance owns
Past Image.
16Global Enqueue Service
Controls Library Cache Library Cache Locks
during parsing of SQL, DML, DDL, PL/SQL Controls
Data Dictionary Cache (Table Locks etc) Manages
synchronization through latches Handles the
message between instances (for changes)
- Oracle Processes
- LMON Monitors the enqueues and resources
- LMD Lock agent process
- GSD Diagnosability Daemon
- LCK manages global eqnueue requests
- LMSn GCS processes - handles blocking
interrupts from the remote instance, cross
instance calls - Usual Process like SMON, PMON, LGWR, CKPT, DBWR
etc
17Failover Basics
- Detection of failure, by way of its LMON process
- One of the Instances (Recovering Instance)
controls the recovery of the failed instance by
taking over its redo log files. - All in-progress transactions are rolled back
(transaction recovery) - Instance recovery does not include restarting the
failed instance - Only the resources mastered by GSC are re-built
- SMON process of a surviving Instance performs
recovery of failed instance
18Fusion Recovery
- Recovery
- The instance, or instances dies
- Failure detected by cluster manager or GCS.
- Reconfiguration occurs and all locks owned by the
departed instance are remastered and the first
pass read of threads of failed instances done by
SMON - SMON claims locks needed to recover blocks found
by the first pass read. - Locks are obtained and second pass of redo theads
of failed instances is performed and blocks
become available as they have been recovered. - Predecessor blocks can be in past image block in
a different instance or on disk.
19Client Connectivity Server Fail
- Add failover options manually to TNS
configuration files - They are part of the CONNECT_DATA section of a
connect descriptor - Failover options include
- TYPE Identify the nature of TAF, if any
- METHOD Configure how quickly failover can occur
- BACKUP Identify an alternate net service name
- RETRIES Limit the number of times a reconnection
will be attempted - DELAY Specify how long to wait between
reconnection attempts
20Oracle9i RAC on Linux (cont.)
Verification of Hardware and Software
Configure Kernel Parameters
Configure Start Cluster Manager
Install Oracle9i RAC Option
Enable rsh rcp on each node
Create DBA group and Oracle Account
Start GSD Configure Listener
Remove IBM Java Package
Configure Network
Create database
Install Cluster Manager
Configure Storage
21Linux kernel parameters
- Set /proc/sys/kernel/shmmax to 3GB
- Using multiple DBWRs with async I/O is usually
better than using I/O slaves - Must re-link to use libaio i.e. ASYNC I/O
- make -f ins_rdbms.mk async_on
- init.ora disk_asynch_iotrue by default
- init.ora filesystemio_optionsasynch set this as
well if datafiles are on a filesystem (e.g. ext2) - 2 DBWRs is a good default for a large buffer
cache areas - If large read sizes occur, increase
/proc/sys/fs/aio-max-size to the largest read
size (default is 128KB)
22Larger Buffer Cache
- Oracle has the capability to use an extended
buffer cache greater than 4GB - Using Indirect Data Buffers has some overhead, so
use this option only if you have enough RAM to
create a buffer cache greater than 4GB - Steps to enable Indirect Data Buffers (from
Oracle9i Administrators Reference, Rel 2 for
Linux) - mount -t shm -o size8g shmfs /dev/shm (can put
this in /etc/fstab) - init.ora use_indirect_data_bufferstrue
- init.ora use only db_block_buffers and
db_block_size (no db_cache_size) - For OLTP Apps, small blocks (e.g. 2KB) typically
work better
23Increasing Address Space
- Oracle defaults to use about 1.7GB of address
space for its SGA - Its possible to increase the SGA address space
to about 2.6GB (Note 200266.1) - genksms -s 0x15000000 gtksms.s
- make -f ins_rdbms.mk ksms.o
- make -f ins_rdbms.mk ioracle
- echo 268435456 gt/proc/ltpidgt/mapped_base (as
root), where ltpidgt is the pid of the session
running SQLPlus
24Increasing Address Space (cont.)
Default
After Relink
0xFFFFFFFF
0xFFFFFFFF
Reserved for kernel
Reserved for kernel
0xC0000000
0xC0000000
Variable SGA
Variable SGA
DB Buffers (SGA)
DB Buffers (SGA)
sga_base (relink Oracle)
0x50000000
mapped_base (/proc/ltpidgt/mapped_base)
0x40000000
Code, etc.
0x15000000
0x10000000
Code, etc.
0x00000000
0x00000000
25Bigpages
- It is a feature in Red Hat Advance Server that
provides applications access to large memory
pages on Intel 32-bit CPUs - The default memory page size is 4KB.
- Requires OS support to enable
- Large pages used for the SGA reduces the number
of page table entries that Linux and the CPU need
to keep track of - Reduces the CPUs Translation Look-aside Buffer
(TLB) miss rate - Bigpage settings
- /proc/sys/kernel/shm-use-bigpages0 bigpage
pool is not used - /proc/sys/kernel/shm-use-bigpages1 bigpage
memory is useable by Oracle except in the
Indirect Data Buffers case - /proc/sys/kernel/shm-use-bigpages2 same as 1,
but memory is also useable in the Indirect Data
Buffers case
26Real Cluster Design Issues
Result Component Effect of Failure
Ok CPU panic / crash Node Failed, other node still active
Ok Memory crash Node Failed, other node still active
Ok Interconnect With dual Interconnects, OK
Down Interconnect Switch Nodes can not communicate
Ok OS failure / freeze Node Failed, other node still active
Down Cluster Manager s/w Custer freezes, all nodes go down
Ok DB Instance Crash Instance running on other node provides database service
Ok Control File (Corrupt / Lost) Multiplexed control file will be used
Ok Redo log file Multiplexed redo file
Down Lost Data File Requires Media recovery
Down Human Error Depends on type of mistake
Down Dropped Object DB is available but applications stall
Down DB software bug DB may stall on all instances.
27Performance Monitoring
- There are many views that help to monitor the
inter-instance transfers and RAC performance - vclass_cache_transfer, vcache_transfer,
vcache, vlock_activity, vges_statistics, vbh
, vsysstat and VSYSTEM_EVENT - The above views help diagnose the following
issues - The most significant statistics are in vsysstat
- Cache-related statistics such as consistent gets,
db block gets, and db block changes - Cache Fusion related statistics, such as global
cache current block receive time or global cache
current block send time, global cache lock open - Convert requests, and global cache wait times,
such as global cache gets, global cache converts,
and waits for events such as Null-to-X
conversions - I/O statistics such as physical reads, physical
writes, DBWR cross-instance writes, and wait
times for reads and writes
28Oracle Parallel Execution ..
- RAC can engage multiple processors from different
nodes for a given task execution - Achieve additional parallelism, not possible by a
single SMP node. - For instance, in a two node ORAC, set up a
parallel query with Parallel Hint to utilize
the CPUs from the both the instances. - SELECT / FULL(nydata) PARALLEL(nydata, 3,2) /
count() FROM nysales - In this example, Degree of Parallelism (DOP) is 3
and use Two instances. It is executed with total
6 processes, 3 on each instance
29Issues Faced
- There were multiple issues since we started to
work on RAC. Many of them have been resolved
through upgrades and minor patches. - Major issues
- NTP Issues- Problem appears to have been due to
NTP (Network Time Protocol) settings on the
server that allowed the time to be automatically
set backwards by the NTP server. This time change
caused it to look like a checkin had been missed.
Changing NTP settings so that setting the time
backwards is disallowed appears to have resolved
the problem - Split-brain condition - This should never happen
but we ran into this issues also . The Cluster
software should take care of this issue. - Fork-Process Hanging - we started seeing unable
to receive acknowledgement from forked process
in alert log
3010G RAC Features
- Dynamic affinity policy enhancements for
optimizing the Cache Fusion protocol to enhance
the performance of several kinds of workloads - Better Workload management
- Improvements to adding a node. Oracle introduces
portable cluster-ware that makes adding a node
easier. - Cluster application availability subsystem
31Q A