Title: Real Application Cluster (RAC) - Kishore A
1 Real Application Cluster (RAC) - Kishore A
2 Oracle 10g - RAC
- What is all the hype about grid computing?
- Grid computing is intended to allow businesses to move away from the idea of many individual servers, each of which is dedicated to a small number of applications. When configured in this manner, applications often either do not fully utilize the server's available hardware resources (memory, CPU, and disk) or fall short of these resources during peak usage.
- Grid computing addresses these problems by providing an adaptive software infrastructure that makes efficient use of low-cost servers and modular storage, balances workloads more effectively, and provides capacity on demand.
- By scaling out with small servers in small increments, you get performance and reliability at low cost. New unified management allows you to manage everything cheaply and simply in the grid.
3 - WHAT IS ENTERPRISE GRID COMPUTING?
- Implement One from Many.
- Grid computing coordinates the use of clusters of machines to create a single logical entity, such as a database or an application server.
- By distributing work across many servers, grid computing delivers availability, scalability, and performance using low-cost components.
- Because a single logical entity is implemented across many machines, companies can add or remove capacity in small increments, online.
- With the capability to add capacity on demand to a particular function, companies gain more flexibility for adapting to peak loads, achieving better hardware utilization and better business responsiveness.
4 - Benefits of Enterprise Grid Computing
- The primary benefit of grid computing to businesses is achieving high quality of service and flexibility at lower cost.
- Enterprise grid computing lowers costs by:
- Increasing hardware utilization and resource sharing.
- Enabling companies to scale out incrementally with low-cost components.
- Reducing management and administration requirements.
5 - New Trends in Hardware
- Much of what makes grid computing possible today are the innovations in hardware. For example:
- Processors. New low-cost, high-volume Intel Itanium 2, Sun SPARC, and IBM PowerPC 64-bit processors now deliver performance equal to or better than the exotic processors used in high-end SMP servers.
- Blade servers. Blade server technology reduces the cost of hardware and increases the density of servers, which further reduces expensive data center real estate requirements.
- Networked storage. Disk storage costs continue to plummet even faster than processor costs. Network storage technologies such as Network Attached Storage (NAS) and Storage Area Networks (SANs) further reduce these costs by enabling sharing of storage across systems.
- Network interconnects. Gigabit Ethernet and InfiniBand interconnect technologies are driving down the cost of connecting servers into clusters.
6 - Oracle Database 10g
- Oracle Database 10g builds on the success of Oracle9i Database and adds many new grid-specific capabilities.
- Oracle Database 10g is based on Real Application Clusters, introduced in Oracle9i.
- There are more than 500 production customers running Oracle's clustering technology, helping to prove the validity of Oracle's grid infrastructure.
7 - Real Application Clusters
- Oracle Real Application Clusters enables a single database to run across multiple clustered nodes in a grid, pooling the processing resources of several standard machines.
- In Oracle 10g, the database can immediately begin balancing workload across a new node with new processing capacity as it is re-provisioned from one database to another, and can relinquish a machine when it is no longer needed; this is capacity on demand. Other databases cannot grow and shrink while running and, therefore, cannot utilize hardware as efficiently.
- Servers can easily be added to and dropped from an Oracle cluster with no downtime.
8 RAC 10g Architecture
- [Architecture diagram] Clients reach the cluster over the public network through per-node virtual IPs (VIP1, VIP2, ..., VIPn) and services. Each node (Node 1 through Node n) runs a listener, a database instance, an ASM instance, Oracle Clusterware, and the operating system. All nodes connect to shared storage: the redo/archive logs of all instances and the database/control files are managed by ASM, while the OCR and voting disks reside on raw devices.
9 Under the Covers
- [Diagram] Instances 1 through n, one per node, communicate over the cluster private high-speed network. Each instance has its own SGA (library cache, dictionary cache) and manages a portion of the Global Resource Directory. The RAC-specific background processes LMON, LMD0, LMS0, LCK0, and DIAG run alongside the usual DBW0, LGWR, SMON, and PMON processes. All instances share access to the data files and control files.
10 Global Resource Directory
- A RAC database system has two important services: the Global Cache Service (GCS) and the Global Enqueue Service (GES). These are basically collections of background processes. Together they cover and manage the total Cache Fusion process, resource transfers, and resource escalations among the instances.
- Global Resource Directory
- GES and GCS together maintain a Global Resource Directory (GRD) to record information about the resources and the enqueues. The GRD remains in memory and is distributed across all the instances; each instance manages a portion of the directory. This distributed nature is a key point for the fault tolerance of RAC.
- The Global Resource Directory (GRD) is the internal database that records and stores the current status of the data blocks. Whenever a block is transferred out of a local cache to another instance's cache, the GRD is updated. The following resource information is available in the GRD:
- Data Block Identifiers (DBA)
- Location of the most current version
- Modes of the data blocks: (N) Null, (S) Shared, (X) Exclusive
- The roles of the data blocks (local or global) held by each instance
- Buffer caches on multiple nodes in the cluster
- The GRD is akin to the Lock Directory of previous releases in terms of functionality, but it has been expanded with more components. It holds an accurate inventory of resources and their status and location.
11 Background Processes in a RAC instance
- SELECT name, description FROM v$bgprocess WHERE paddr <> '00';
- The ones specific to a RAC instance are the DIAG, LCK, LMON, LMDn, and LMSn processes.
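- A minimal sketch of running this query from the shell through SQL*Plus, assuming the oracle user's environment (ORACLE_HOME, ORACLE_SID) is already set on the node:
    #!/bin/bash
    # List background processes with a valid process address; the RAC-specific
    # ones (DIAG, LCK0, LMON, LMDn, LMSn) appear alongside the standard set.
    sqlplus -s / as sysdba <<'EOF'
    SET PAGESIZE 100
    SET LINESIZE 120
    COLUMN name FORMAT A10
    COLUMN description FORMAT A70
    SELECT name, description
      FROM v$bgprocess
     WHERE paddr <> '00'
     ORDER BY name;
    EOF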
12 DIAG: Diagnosability Daemon
- The diagnosability daemon is responsible for capturing information on process failures in a RAC environment and for writing out trace information for failure analysis.
- The information produced by DIAG is most useful when working in conjunction with Oracle Support to troubleshoot the causes of a failure.
- Only a single DIAG process is needed for each instance.
13 LCK: Lock Process
- The lock process (LCK) manages requests that are not Cache Fusion requests, such as row cache requests and library cache requests.
- Only a single LCK process is allowed for each instance.
- LCK maintains a list of lock elements and uses this list to validate locks during instance recovery.
14 LMD: Lock Manager Daemon Process
- The global enqueue service daemon (LMD) is a lock agent process that coordinates enqueue manager service requests. The requests are for global cache service enqueues that control access to global enqueues and resources.
- The LMD process also handles deadlock detection and remote enqueue requests.
15 LMON: Lock Monitor Process
- LMON is the global enqueue service monitor. It is responsible for the reconfiguration of lock resources when an instance joins or leaves the cluster, and for dynamic lock remastering.
- LMON generates a trace file whenever a reconfiguration occurs (as opposed to remastering of a subset of locks).
- It is the responsibility of LMON to check for the death of instances clusterwide and to initiate reconfiguration as quickly as possible.
16 LMS: Lock Manager Server Process
- The LMS process (or global cache service process) is in charge of shipping blocks between instances for Cache Fusion requests. In the event of a consistent-read request, the LMS process first rolls the block back, creating the consistent read (CR) image of the block, and then ships that version of the block across the interconnect to the foreground process making the request on the remote instance.
- In addition, LMS must interact with the LMD process to retrieve lock requests placed by LMD. An instance may dynamically generate additional LMS processes, depending on the load.
17 Server Control Utility
- To manage the RAC database and its instances, Oracle provides a utility called the Server Control Utility (SRVCTL). It replaces the earlier utility, opsctl, which was used with Oracle Parallel Server.
- The Server Control Utility is a single point of control between the Oracle Intelligent Agent and each node in the RAC system. SRVCTL communicates with the Global Services Daemon (GSD) and resides on each of the nodes. SRVCTL gathers information from the database and instances and acts as an intermediary between nodes and the Oracle Intelligent Agent.
- When you use SRVCTL to perform configuration operations on your cluster, SRVCTL stores the configuration data in the Server Management (SRVM) configuration repository. SRVM includes all the components of Enterprise Manager such as the Intelligent Agent, the Server Control Utility (SRVCTL), and the Global Services Daemon. Thus, SRVCTL is one of the SRVM Instance Management Utilities. SRVCTL uses SQL*Plus internally to perform stop and start activities on each node.
18 Server Control Utility
- For SRVCTL to function, the Global Services Daemon (GSD) must be running on the node. SRVCTL performs mainly two types of administrative tasks: Cluster Database Tasks and Cluster Database Configuration Tasks (sample invocations are sketched after this list).
- SRVCTL Cluster Database tasks include:
- Starting and stopping cluster databases.
- Starting and stopping cluster database instances.
- Starting and stopping listeners associated with a cluster database instance.
- Obtaining the status of a cluster database instance.
- Obtaining the status of listeners associated with a cluster database.
- SRVCTL Cluster Database Configuration tasks include:
- Adding and deleting cluster database configuration information.
- Adding an instance to, or deleting an instance from, a cluster database.
- Renaming an instance within a cluster database configuration.
- Moving instances in a cluster database configuration.
- Setting and unsetting the environment variable for an instance in a cluster database configuration.
- Setting and unsetting the environment variable for an entire cluster in a cluster database configuration.
19 RAW Partitions, Cluster File System, and Automatic Storage Management (ASM)
- Raw partitions are a set of unformatted devices on a shared disk subsystem. A raw partition is a disk drive device that does not have a file system set up. The raw partition is a portion of the physical disk that is accessed at the lowest possible level. The application that uses a raw device is responsible for managing its own I/O to the raw device, with no operating system buffering.
- Traditionally, raw partitions were required for Oracle Parallel Server (OPS), and they provided high performance by bypassing file system overhead. Raw partitions were used in setting up databases for performance gains and for concurrent access by multiple nodes in the cluster without system-level buffering.
20 RAW Partitions, Cluster File System, and Automatic Storage Management (ASM)
- Oracle 9i RAC and 10g support both cluster file systems and raw devices for storing shared data. In addition, 10g RAC supports shared storage resources from an ASM instance. You can create data files out of the disk resources located in the ASM instance. The ASM resources are sharable and are accessed by all the nodes in the RAC system.
21 RAW Devices
- Raw devices have been in use for a very long time. They were the primary storage structures for the data files of Oracle Parallel Server, and they remain in use even in RAC versions 9i and 10g. Raw devices are difficult to manage and administer, but they provide high-performing shared storage structures. When you use raw devices for data files, redo log files, and control files, you may have to use local file systems or some sort of network-attached file system for writing the archive log files, for the utl_file_dir files, and for files supporting external tables.
- On raw devices: data files, redo files, control files, voting disk, OCR file.
- On a local file system: archive log files, Oracle Home files, CRS Home files, alert log and trace files, files for external tables, utl_file_dir location.
22 RAW Devices
- Advantages
- Raw partitions have several advantages:
- They are not subject to any operating system locking.
- The operating system buffer or cache is bypassed, giving performance gains and reduced memory consumption.
- They can easily be shared by multiple systems.
- The application or database system has full control to manipulate the internals of access.
- Historically, support for asynchronous I/O on UNIX systems was generally limited to raw partitions.
23 RAW Devices
- Issues and Difficulties
- There are many administrative inconveniences and drawbacks, such as:
- The unit of allocation to the database is the entire raw partition. We cannot use a raw partition for multiple tablespaces. A raw partition is not the same as a file system where we can create many files.
- Administrators have to create them with specific sizes. When databases grow in size, raw partitions cannot be extended; we need to add extra partitions to support the growing tablespace. Sometimes we may have limitations on the total number of raw partitions we can use in the system. Furthermore, there are no database operations that occur on an individual data file, so there is no logical benefit from having a tablespace consisting of many data files, except for tablespaces that are larger than the maximum Oracle can support in a single file.
- We cannot use the standard file manipulation commands on raw partitions, and thus on the data files. We cannot use commands such as cpio or tar for backup purposes, so the backup strategy becomes more complicated.
24 RAW Devices
- Raw partitions cannot be used for writing archive logs.
- Administrators need to keep track of the raw volumes with their cryptic naming conventions. However, by using symbolic links, we can reduce the hassles associated with the names.
- For example, a cryptic name like /dev/rdsk/c8t4d5s4 or a name like /dev/sd/sd001 is an administrative challenge. To alleviate this, administrators often rely on symbolic links to provide logical names that make sense (see the sketch below). This, however, substitutes one complexity for another.
- In a clustered environment like a Linux cluster, it is not guaranteed that the physical devices will have the same device names on different nodes or across reboots of a single node. To solve this problem, manual intervention is needed, which increases administration overhead.
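- A minimal sketch of the symbolic-link approach (the link name and target shown here are illustrative only, not taken from this document):
    # Give the cryptic raw device a stable, meaningful name,
    # then reference the link from the database instead of the device
    ln -s /dev/rdsk/c8t4d5s4 /oracle/oradata/orcl/system01.dbf
    ls -l /oracle/oradata/orcl/system01.dbf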
25 Cluster File System
- A CFS offers a very good shared storage facility for building the RAC database. A CFS provides a shared file system that is mounted on all the cluster nodes simultaneously. When you implement the RAC database with commercial CFS products such as Veritas CFS or PolyServe Matrix Server, you will be able to store all kinds of database files, including a shared Oracle Home and CRS Home.
- However, the capabilities of the CFS products are not the same. For example, Oracle Cluster File System (OCFS), used for Linux RAC implementations, has limitations: it is not a general-purpose file system and cannot be used for a shared Oracle Home.
26 Cluster File System
- On a cluster file system: data files, archive log files, redo files, Oracle Home files, control files, alert log and trace files, voting disk, files for external tables, OCR file, utl_file_dir location.
- A cluster file system (CFS) is a file system that may be accessed (read and write) by all the members in the cluster at the same time, which implies that all the members of the cluster have the same view. Some of the popular and widely used cluster file system products for Oracle RAC include HP Tru64 CFS, Veritas CFS, IBM GPFS, PolyServe Matrix Server, and Oracle Cluster File System. A cluster file system offers:
- Simple management.
- The use of Oracle Managed Files with RAC.
- A single Oracle software installation.
- Auto-extend enabled on Oracle data files.
- Uniform accessibility of archive logs.
- ODM-compliant file systems.
27 ASM: Automatic Storage Management
- ASM is the new star on the block. ASM provides a vertical integration of the file system and volume manager for Oracle database files. ASM has the capability to spread database files across all available storage for optimal performance and resource utilization. It enables simple and non-intrusive resource allocation and provides automatic rebalancing.
- When you use ASM for building shared files, you get almost the same performance as that of raw partitions. The ASM-controlled disk devices are part of the ASM instance, which can be shared by the RAC database instances. This is similar to the situation where raw devices supporting the RAC database had to be shared by multiple nodes: the shared devices need to be presented to all nodes in the cluster, and those devices become the input to the ASM instance. There will be an ASM instance supporting each RAC instance on the respective node.
28 ASM: Automatic Storage Management
- From the ASM instance: data files, redo files, control files, archive log files.
- On a local file system or CFS: Oracle Home files, CRS Home files, alert log and trace files, files for external tables, utl_file_dir location.
- The voting disk and OCR file are located on raw partitions.
- ASM is for the more Oracle-specific data: data files, redo log files, and archived log files.
29 - Automatic Storage Management
- Automatic Storage Management simplifies storage management for Oracle databases.
- Instead of managing many database files, Oracle DBAs manage only a small number of disk groups. A disk group is a set of disk devices that Oracle manages as a single, logical unit. An administrator can define a particular disk group as the default disk group for a database, and Oracle automatically allocates storage for, and creates or deletes, the files associated with the database objects.
- Automatic Storage Management also offers the benefits of storage technologies such as RAID or Logical Volume Managers (LVMs). Oracle can balance I/O from multiple databases across all of the devices in a disk group, and it implements striping and mirroring to improve I/O performance and data reliability. Because Automatic Storage Management is written to work exclusively with Oracle, it achieves better performance than generalized storage virtualization solutions.
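- A minimal sketch of creating a disk group from SQL*Plus while connected to the ASM instance (the disk group name is illustrative; the ORCL:VOL* disk names match the ASMLib volumes created later in this document, and the redundancy level is an assumption):
    # Point the environment at the ASM instance, then create the disk group
    export ORACLE_SID=+ASM1
    sqlplus -s / as sysdba <<'EOF'
    CREATE DISKGROUP orcl_data1 NORMAL REDUNDANCY
      DISK 'ORCL:VOL1', 'ORCL:VOL2', 'ORCL:VOL3';
    EOF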
30 - Shared Disk Storage
- Oracle RAC relies on a shared disk architecture. The database files, online redo logs, and control files for the database must be accessible to each node in the cluster. The shared disks also store the Oracle Cluster Registry and Voting Disk. There are a variety of ways to configure shared storage, including direct attached disks (typically SCSI over copper or fiber), Storage Area Networks (SAN), and Network Attached Storage (NAS).
- Private Network: Each cluster node is connected to all other nodes via a private high-speed network, also known as the cluster interconnect or high-speed interconnect (HSI). This network is used by Oracle's Cache Fusion technology to effectively combine the physical memory (RAM) in each host into a single cache. Oracle Cache Fusion allows data stored in the cache of one Oracle instance to be accessed by any other instance by transferring it across the private network. It also preserves data integrity and cache coherency by transmitting locking and other synchronization information across cluster nodes.
31 - The private network is typically built with Gigabit Ethernet, but for high-volume environments, many vendors offer proprietary low-latency, high-bandwidth solutions specifically designed for Oracle RAC. Linux also offers a means of bonding multiple physical NICs into a single virtual NIC to provide increased bandwidth and availability.
- Public Network: To maintain high availability, each cluster node is assigned a virtual IP address (VIP). In the event of host failure, the failed node's IP address can be reassigned to a surviving node to allow applications to continue accessing the database through the same IP address.
- Why do we have a Virtual IP (VIP) in 10g? Why does it just return a dead connection when its primary node fails?
32 - It's all about availability of the application. When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen.
- The new node re-ARPs the world, indicating a new MAC address for the address. For directly connected clients, this usually causes them to see errors on their connections to the old address.
- Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.
- This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (about 10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.
- Without VIPs, clients connected to a node that died will often wait the 10-minute TCP timeout period before getting an error. As a result, you don't really have a good HA solution without VIPs (Source: Metalink Note 220970.1).
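- To see which node applications (VIP, GSD, listener, ONS) are currently running on a node, SRVCTL can be queried directly; a sketch, assuming the example node name linux1:
    # Report the status of the node applications, including the VIP, on linux1
    srvctl status nodeapps -n linux1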
33 - The Oracle CRS contains all the cluster and database configuration metadata along with several system management features for RAC. It allows the DBA to register and invite an Oracle instance (or instances) to the cluster. During normal operation, CRS sends messages (via a special ping operation) to all nodes configured in the cluster, often called the "heartbeat." If the heartbeat fails for any of the nodes, it checks the CRS configuration files (on the shared disk) to distinguish between a real node failure and a network failure.
- CRS maintains two files: the Oracle Cluster Registry (OCR) and the Voting Disk. The OCR and the Voting Disk must reside on shared disks as either raw partitions or files in a cluster filesystem.
34 - The Voting Disk is used by the Oracle Cluster Manager in various layers. The Cluster Manager and Node Monitor accept registration of Oracle instances to the cluster and send ping messages to the Cluster Managers (Node Monitors) on other RAC nodes. If this heartbeat fails, oracm uses a quorum file or a quorum partition on the shared disk to distinguish between a node failure and a network failure. So if a node stops sending ping messages, but continues writing to the quorum file or partition, the other Cluster Managers can recognize it as a network failure. Hence the availability of the Voting Disk is critical for the operation of the Oracle Cluster Manager.
- The shared volumes created for the OCR and the voting disk should be configured using RAID to protect against media failure. This requires the use of an external cluster volume manager, cluster file system, or storage hardware that provides RAID protection.
- The Oracle Cluster Registry (OCR) is used to store the cluster configuration information, among other things. The OCR needs to be accessible from all nodes in the cluster. If the OCR became inaccessible, the CSS daemon would soon fail and take down the node. PMON never needs to write to the OCR. To confirm that the OCR is accessible, try ocrcheck from your ORACLE_HOME and ORA_CRS_HOME (see the sketch below).
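- A quick health check of the OCR and voting disk configuration, run as root from the CRS home (a sketch; output varies by system):
    # Verify OCR integrity and show its location
    $ORA_CRS_HOME/bin/ocrcheck

    # List the configured voting disk(s)
    $ORA_CRS_HOME/bin/crsctl query css votedisk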
35 - Cache Fusion
- One of the bigger differences between Oracle RAC and OPS is the presence of Cache Fusion technology. In OPS, a request for data between nodes required the data to be written to disk first, and then the requesting node could read that data. In RAC, data is passed along with locks.
- Every time an instance wants to update a block, it has to obtain a lock on it to make sure no other instance in the cluster is updating the same block. To resolve this problem, Oracle uses a data block ping mechanism that allows it to get the status of the specific block before reading it from disk. Cache Fusion resolves data block read/read, read/write, and write/write conflicts among Oracle database nodes through high-performance interconnect networks, bypassing the much slower physical disk operations used in previous releases. Using the Oracle 9i RAC Cache Fusion feature, close to linear scalability of database performance can be achieved when adding nodes to the cluster. This enables better database capacity planning and conserves capital investment.
36 Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
- Contents
- Introduction
- Oracle RAC 10g Overview
- Shared-Storage Overview
- FireWire Technology
- Hardware Costs
- Install the Linux Operating System
- Network Configuration
- Obtain and Install FireWire Modules
- Create "oracle" User and Directories
- Create Partitions on the Shared FireWire Storage Device
37 Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
- Configure the Linux Servers for Oracle
- Configure the hangcheck-timer Kernel Module
- Configure RAC Nodes for Remote Access
- All Startup Commands for Each RAC Node
- Check RPM Packages for Oracle 10g Release 2
- Install and Configure Oracle Cluster File System (OCFS2)
- Install and Configure Automatic Storage Management (ASMLib 2.0)
- Download Oracle 10g RAC Software
- Install Oracle 10g Clusterware Software
- Install Oracle 10g Database Software
- Install Oracle 10g Companion CD Software
- Create TNS Listener Process
38 Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
- Create the Oracle Cluster Database
- Verify TNS Networking Files
- Create / Alter Tablespaces
- Verify the RAC Cluster Database Configuration
- Starting / Stopping the Cluster
- Transparent Application Failover (TAF)
- Conclusion
- Acknowledgements
39 Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
40 Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
41 Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
- Download
- Red Hat Enterprise Linux 4
- Oracle Cluster File System Release 2 (1.2.3-1) - Single Processor / SMP / Hugemem
- Oracle Cluster File System Release 2 Tools (1.2.1-1) - Tools / Console
- Oracle Database 10g Release 2 EE, Clusterware, Companion CD (10.2.0.1.0)
- Precompiled RHEL4 FireWire Modules (2.6.9-22.EL)
- ASMLib 2.0 Driver (2.6.9-22.EL / 2.0.3-1) - Single Processor / SMP / Hugemem
- ASMLib 2.0 Library and Tools (2.0.3-1) - Driver Support Files / Userspace Library
42 Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
- Introduction
- One of the most efficient ways to become familiar with Oracle Real Application Clusters (RAC) 10g technology is to have access to an actual Oracle RAC 10g cluster. There's no better way to understand its benefits, including fault tolerance, security, load balancing, and scalability, than to experience them directly.
- The Oracle Clusterware software will be installed to /u01/app/oracle/product/crs on each of the nodes that make up the RAC cluster. However, the Clusterware software requires that two of its files, the Oracle Cluster Registry (OCR) file and the Voting Disk file, be shared with all nodes in the cluster. These two files will be installed on shared storage using OCFS2. It is possible (but not recommended by Oracle) to use RAW devices for these files; however, it is not possible to use ASM for these two Clusterware files.
- The Oracle Database 10g Release 2 software will be installed into a separate Oracle Home, namely /u01/app/oracle/product/10.2.0/db_1, on each of the nodes that make up the RAC cluster. All of the Oracle physical database files (data, online redo logs, control files, archived redo logs) will be installed to different partitions of the shared drive being managed by ASM. (The Oracle database files can just as easily be stored on OCFS2. Using ASM, however, makes the article that much more interesting!)
43 Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
44 2. Oracle RAC 10g Overview
- Oracle RAC, introduced with Oracle9i, is the successor to Oracle Parallel Server (OPS). RAC allows multiple instances to access the same database (storage) simultaneously. It provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out, and at the same time, because all nodes access the same database, the failure of one instance will not cause the loss of access to the database.
- At the heart of Oracle RAC is a shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files, and parameter files for all nodes in the cluster. The data disks must be globally available to allow all nodes to access the database. Each node has its own redo log and control files, but the other nodes must be able to access them in order to recover that node in the event of a system failure.
- One of the bigger differences between Oracle RAC and OPS is the presence of Cache Fusion technology. In OPS, a request for data between nodes required the data to be written to disk first, and then the requesting node could read that data. In RAC, data is passed along with locks.
45 3. Shared-Storage Overview
- Fibre Channel is one of the most popular solutions for shared storage. As mentioned previously, Fibre Channel is a high-speed serial-transfer interface used to connect systems and storage devices in either point-to-point or switched topologies. Protocols supported by Fibre Channel include SCSI and IP.
- Fibre Channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 gigabits per second. Fibre Channel, however, is very expensive: the switch alone can cost as much as US$1,000, and high-end drives can reach prices of US$300. Overall, a typical Fibre Channel setup (including cards for the servers) costs roughly US$5,000.
- A less expensive alternative to Fibre Channel is SCSI. SCSI technology provides acceptable performance for shared storage, but for administrators and developers who are used to GPL-based Linux prices, even SCSI can come in over budget, at around US$1,000 to US$2,000 for a two-node cluster.
- Another popular solution is the Sun NFS (Network File System) found on a NAS. It can be used for shared storage, but only if you are using a network appliance or something similar. Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol, and read/write block sizes of 32K.
46 4. FireWire Technology
- Developed by Apple Computer and Texas Instruments, FireWire is a cross-platform implementation of a high-speed serial data bus. With its high bandwidth, long distances (up to 100 meters in length), and high-powered bus, FireWire is being used in applications such as digital video (DV), professional audio, hard drives, high-end digital still cameras, and home entertainment devices. Today, FireWire operates at transfer rates of up to 800 megabits per second, while next-generation FireWire calls for speeds at a theoretical bit rate of 1,600 Mbps and then up to a staggering 3,200 Mbps. That's 3.2 gigabits per second. This speed will make FireWire indispensable for transferring massive data files and for even the most demanding video applications, such as working with uncompressed high-definition (HD) video or multiple standard-definition (SD) video streams.
47 (No Transcript)
48 - Oracle Clusterware - /u01/app/oracle/product/crs
- Oracle 10g Software (without database) - /u01/app/oracle/product/10.1.0/data_1 (10.2.0.1)
- Oracle Cluster Registry (OCR) File - /u02/oradata/orcl/OCRFile (OCFS2)
- CRS Voting Disk - /u02/oradata/orcl/CSSFile (OCFS2)
- Oracle Database files - ASM
49 5. Software Requirements
- At the software level, each node in a RAC cluster needs:
- 1. An operating system
- 2. Oracle Clusterware software
- 3. Oracle RAC software, and optionally
- 4. An Oracle Automatic Storage Management instance
50 Oracle Automatic Storage Management (ASM)
- ASM is a new feature in Oracle Database 10g that provides the services of a filesystem, logical volume manager, and software RAID in a platform-independent manner. Oracle ASM can stripe and mirror your disks, allow disks to be added or removed while the database is under load, and automatically balance I/O to remove "hot spots." It also supports direct and asynchronous I/O and implements the Oracle Disk Manager (ODM) API (a simplified I/O system call interface) introduced in Oracle9i.
- Oracle ASM is not a general-purpose filesystem and can be used only for Oracle data files, redo logs, control files, and the RMAN Flash Recovery Area. Files in ASM can be created and named automatically by the database (through the Oracle Managed Files feature) or manually by the DBA. Because the files stored in ASM are not accessible to the operating system, the only way to perform backup and recovery operations on databases that use ASM files is through Recovery Manager (RMAN).
- ASM is implemented as a separate Oracle instance that must be up if other databases are to be able to access it. Memory requirements for ASM are light: only 64MB for most systems. In Oracle RAC environments, an ASM instance must be running on each cluster node.
51 6. Install the Linux Operating System
- This article was designed to work with the Red Hat Enterprise Linux 4 (AS/ES) operating environment. You will need three IP addresses for each server: one for the private network, one for the public network, and one for the virtual IP address. Use the operating system's network configuration tools to assign the private and public network addresses. Do not assign the virtual IP address using the operating system's network configuration tools; this will be done by the Oracle Virtual IP Configuration Assistant (VIPCA) during the Oracle RAC software installation.
- Linux1
- eth0 - Check off the option to "Configure using DHCP" - Leave "Activate on boot" checked - IP Address: 192.168.1.100 - Netmask: 255.255.255.0
- eth1 - Check off the option to "Configure using DHCP" - Leave "Activate on boot" checked - IP Address: 192.168.2.100 - Netmask: 255.255.255.0
52 6. Install the Linux Operating System
- Linux2
- eth0 - Check off the option to "Configure using DHCP" - Leave "Activate on boot" checked - IP Address: 192.168.1.101 - Netmask: 255.255.255.0
- eth1 - Check off the option to "Configure using DHCP" - Leave "Activate on boot" checked - IP Address: 192.168.2.101 - Netmask: 255.255.255.0
53 7. Configure Network Settings
54 7. Configure Network Settings
55 7. Configure Network Settings
- Note that the virtual IP addresses only need to be defined in the /etc/hosts file for both nodes (a sample is sketched below). The public virtual IP addresses will be configured automatically by Oracle when you run the Oracle Universal Installer, which starts Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA). All virtual IP addresses will be activated when the srvctl start nodeapps -n <node_name> command is run. This is the host name/IP address that will be configured in the clients' tnsnames.ora file (more details later).
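- A sketch of the /etc/hosts entries used on both nodes (the public and private addresses match the examples above; the virtual IP addresses and their host names are illustrative assumptions):
    # /etc/hosts (identical on linux1 and linux2)
    127.0.0.1       localhost.localdomain   localhost

    # Public network (eth0)
    192.168.1.100   linux1
    192.168.1.101   linux2

    # Private interconnect (eth1)
    192.168.2.100   int-linux1
    192.168.2.101   int-linux2

    # Public virtual IPs (activated later by VIPCA / srvctl)
    192.168.1.200   vip-linux1
    192.168.1.201   vip-linux2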
56 7. Configure Network Settings
- Adjusting Network Settings
- Oracle now uses UDP as the default protocol on Linux for interprocess communication, such as Cache Fusion buffer transfers between the instances. It is strongly suggested to adjust the default and maximum send buffer size (SO_SNDBUF socket option) to 256KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256KB. The receive buffers are used by TCP and UDP to hold received data until it is read by the application. The receive buffer cannot overflow because the peer is not allowed to send data beyond the buffer size window. This means that datagrams will be discarded if they don't fit in the socket receive buffer, which could cause the sender to overwhelm the receiver.
- To make the change permanent, add the following lines to the /etc/sysctl.conf file, which is used during the boot process:
- net.core.rmem_default=262144
- net.core.wmem_default=262144
- net.core.rmem_max=262144
- net.core.wmem_max=262144
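- To apply the new values to the running kernel without a reboot (a standard sysctl step, not specific to this article):
    # Re-read /etc/sysctl.conf and apply the settings immediately
    sysctl -p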
57 8. Obtain and Install a Proper Linux Kernel
- http://oss.oracle.com/projects/firewire/dist/files/RedHat/RHEL4/i386/oracle-firewire-modules-2.6.9-22.EL-1286-1.i686.rpm
- Install the supporting FireWire modules as root.
- Install the supporting FireWire modules package by running either of the following:
- rpm -ivh oracle-firewire-modules-2.6.9-22.EL-1286-1.i686.rpm (for single processor)
- OR
- rpm -ivh oracle-firewire-modules-2.6.9-22.ELsmp-1286-1.i686.rpm (for multiple processors)
- Add module options: add the following line to /etc/modprobe.conf
- options sbp2 exclusive_login=0
58 8. Obtain and Install a Proper Linux Kernel
- Connect the FireWire drive to each machine and boot into the new kernel.
- After both machines are powered down, connect each of them to the back of the FireWire drive. Power on the FireWire drive. Finally, power on each Linux server and ensure that each machine boots into the new kernel.
- Check for the SCSI device:
- 01:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
- Second, let's check to see that the modules are loaded:
- lsmod | egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
- sd_mod 13744 0
- sbp2 19724 0
- scsi_mod 106664 3 [sg sd_mod sbp2]
- ohci1394 28008 0 (unused)
- ieee1394 62884 0 [sbp2 ohci1394]
59 8. Obtain and Install a Proper Linux Kernel
- Third, let's make sure the disk was detected and an entry was made by the kernel:
- cat /proc/scsi/scsi
- Attached devices:
- Host: scsi0 Channel: 00 Id: 00 Lun: 00
- Vendor: Maxtor Model: OneTouch Rev: 0200
- Type: Direct-Access ANSI SCSI revision: 06
- Now let's verify that the FireWire drive is accessible for multiple logins and shows a valid login:
- dmesg | grep sbp2
- ieee1394: sbp2: Query logins to SBP-2 device successful
- ieee1394: sbp2: Maximum concurrent logins supported: 3
- ieee1394: sbp2: Number of active logins: 1
- ieee1394: sbp2: Logged into SBP-2 device
- ieee1394: sbp2: Node[01:1023]: Max speed [S400] - Max payload [2048]
- fdisk -l
- Disk /dev/sda: 203.9 GB, 203927060480 bytes
- 255 heads, 63 sectors/track, 24792 cylinders
- Units = cylinders of 16065 * 512 = 8225280 bytes
60 9. Create "oracle" User and Directories (both nodes)
- Perform the following procedure on all nodes in the cluster!
- I will be using the Oracle Cluster File System (OCFS) to store the files required to be shared for Oracle Cluster Ready Services (CRS). When using OCFS, the UID of the UNIX user oracle and the GID of the UNIX group dba must be identical on all machines in the cluster. If either the UID or GID are different, the files on the OCFS file system will show up as "unowned" or may even be owned by a different user. For this article, I will use 175 for the oracle UID and 115 for the dba GID.
- Create Group and User for Oracle
- Let's continue our example by creating the Unix dba group and oracle user account along with all appropriate directories:
- mkdir -p /u01/app
- groupadd -g 115 dba
- useradd -u 175 -g 115 -d /u01/app/oracle -s /bin/bash -c "Oracle Software Owner" -p oracle oracle
- chown -R oracle:dba /u01
- passwd oracle
- su - oracle
- Note: When you are setting the Oracle environment variables for each RAC node, ensure that you assign each RAC node a unique Oracle SID! For this example, I used:
- linux1: ORACLE_SID=orcl1
- linux2: ORACLE_SID=orcl2
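- A sketch of the oracle user's login environment on linux1 (the ORACLE_BASE, ORACLE_HOME, and ORA_CRS_HOME values follow the paths used in this article; on linux2 set ORACLE_SID=orcl2):
    # ~oracle/.bash_profile (excerpt)
    export ORACLE_BASE=/u01/app/oracle
    export ORACLE_HOME=$ORACLE_BASE/product/10.2.0/db_1
    export ORA_CRS_HOME=$ORACLE_BASE/product/crs
    export ORACLE_SID=orcl1
    export PATH=$ORACLE_HOME/bin:$ORA_CRS_HOME/bin:$PATH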
61 9. Create "oracle" User and Directories (both nodes)
- Now, let's create the mount point for the Oracle Cluster File System (OCFS) that will be used to store the files for Oracle Cluster Ready Services (CRS). These commands need to be run as the "root" user account:
- su
- mkdir -p /u02/oradata/orcl
- chown -R oracle:dba /u02
- Oracle Cluster File System (OCFS) version 2
- OCFS version 1 is a great alternative to raw devices. Not only is it easier to administer and maintain, it overcomes the limit of 255 raw devices. However, it is not a general-purpose cluster filesystem. It may only be used to store the following types of files:
- Oracle data files
- Online redo logs
- Archived redo logs
- Control files
- Spfiles
- CRS shared files (Oracle Cluster Registry and CRS voting disk)
62 10. Creating Partitions on the Shared FireWire Storage Device
- Create the following partitions on only one node in the cluster!
- The next step is to create the required partitions on the FireWire (shared) drive. As mentioned previously, we will use OCFS2 to store the two files to be shared for CRS. We will then use ASM for all physical database files (data/index files, online redo log files, control files, SPFILE, and archived redo log files).
- The following table lists the individual partitions that will be created on the FireWire (shared) drive and what files will be contained on them.
- Reboot all nodes in the RAC cluster, then verify the partitioning:
- fdisk -l /dev/sda
63 11. Configure the Linux Servers
- Several of the commands within this section will need to be performed on every node within the cluster every time the machine is booted. This section provides very detailed information about setting shared memory, semaphores, and file handle limits.
- Setting SHMMAX (in /etc/sysctl.conf):
- echo "kernel.shmmax=2147483648" >> /etc/sysctl.conf
- Setting Semaphore Kernel Parameters:
- echo "kernel.sem=250 32000 100 128" >> /etc/sysctl.conf
- Setting File Handles:
- echo "fs.file-max=65536" >> /etc/sysctl.conf
- ulimit
- unlimited
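- In addition to fs.file-max, per-user limits for the oracle account are commonly raised in /etc/security/limits.conf; a sketch with typical Oracle-recommended values (these numbers are assumptions, not taken from this document):
    # /etc/security/limits.conf (excerpt)
    oracle  soft  nproc   2047
    oracle  hard  nproc   16384
    oracle  soft  nofile  1024
    oracle  hard  nofile  65536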
64 12. Configure the hangcheck-timer Kernel Module
- Perform the following configuration procedures on all nodes in the cluster!
- Oracle 9.0.1 and 9.2.0.1 used a userspace watchdog daemon called watchdogd to monitor the health of the cluster and to restart a RAC node in case of a failure. Starting with Oracle 9.2.0.2, the watchdog daemon was deprecated in favor of a Linux kernel module named hangcheck-timer, which addresses availability and reliability problems much better. The hangcheck timer is loaded into the Linux kernel and checks whether the system hangs. It sets a timer and checks the timer after a certain amount of time. There is a configurable threshold that, if exceeded, will reboot the machine. Although the hangcheck-timer module is not required for Oracle CRS, it is highly recommended by Oracle.
- The hangcheck-timer.o Module
- The hangcheck-timer module uses a kernel-based timer that periodically checks the system task scheduler to catch delays in order to determine the health of the system. If the system hangs or pauses, the timer resets the node. The hangcheck-timer module uses the Time Stamp Counter (TSC) CPU register, which is incremented at each clock signal. The TSC offers much more accurate time measurements because this register is updated by the hardware automatically.
- Configuring Hangcheck Kernel Module Parameters
- su
- echo "options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180" >> /etc/modules.conf
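- To load the module now and confirm that the configured parameters took effect (a sketch; the log location assumes a default syslog setup):
    # Load the hangcheck-timer module manually
    modprobe hangcheck-timer

    # Confirm it started with the configured tick and margin values
    grep -i hangcheck /var/log/messages | tail -2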
65 13. Configure RAC Nodes for Remote Access
- Perform the following configuration procedures on all nodes in the cluster!
- When running the Oracle Universal Installer on a RAC node, it will use the rsh (or ssh) command to copy the Oracle software to all other nodes within the RAC cluster. The oracle UNIX account on the node running the Oracle Installer (runInstaller) must be trusted by all other nodes in your RAC cluster. Therefore, you should be able to run r-commands like rsh, rcp, and rlogin on the Linux server you will be running the Oracle installer from, against all other Linux servers in the cluster, without a password. The rsh daemon validates users using the /etc/hosts.equiv file or the .rhosts file found in the user's (oracle's) home directory. (The use of rcp and rsh is not required for normal RAC operation. However, rcp and rsh should be enabled for RAC and patchset installation.)
- Oracle added support in 10g for using the Secure Shell (SSH) tool suite for setting up user equivalence. This article, however, uses the older method of rcp for copying the Oracle software to the other nodes in the cluster. When using the SSH tool suite, the scp command (as opposed to rcp) would be used to copy the software in a very secure manner.
- First, let's make sure that we have the rsh RPMs installed on each node in the RAC cluster:
- rpm -q rsh rsh-server
- rsh-0.17-17
- rsh-server-0.17-17
66 13. Configure RAC Nodes for Remote Access
- To enable the "rsh" service, the "disable" attribute in the /etc/xinetd.d/rsh file must be set to "no" and xinetd must be reloaded. Do that by running the following commands on all nodes in the cluster:
- su
- chkconfig rsh on
- chkconfig rlogin on
- service xinetd reload
- Reloading configuration: [ OK ]
- To allow the "oracle" UNIX user account to be trusted among the RAC nodes, create the /etc/hosts.equiv file on all nodes in the cluster:
- su
- touch /etc/hosts.equiv
- chmod 600 /etc/hosts.equiv
- chown root.root /etc/hosts.equiv
- Now add all RAC nodes to the /etc/hosts.equiv file, similar to the following example, on all nodes in the cluster:
- cat /etc/hosts.equiv
- linux1 oracle
- linux2 oracle
- int-linux1 oracle
- int-linux2 oracle
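- To verify that user equivalence works, each node should be able to run a remote command on the others without a password prompt; a quick check from linux1 as the oracle user:
    # Each command should print the remote host name without asking for a password
    rsh linux2 hostname
    rsh int-linux2 hostname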
67 14. All Startup Commands for Each RAC Node
- Verify that the following startup commands are included on all nodes in the cluster!
- Up to this point, we have examined in great detail the parameters and resources that need to be configured on all nodes for the Oracle RAC 10g configuration. In this section we will take a "deep breath" and recap those parameters, commands, and entries (from previous sections of this document) that you must include in the startup scripts for each Linux node in the RAC cluster:
- /etc/modules.conf
- /etc/sysctl.conf
- /etc/hosts
- /etc/hosts.equiv
- /etc/grub.conf
- /etc/rc.local
68 15. Check RPM Packages for Oracle 10g Release 2
- Perform the following checks on all nodes in the cluster!
- make-3.79.1
- gcc-3.2.3-34
- glibc-2.3.2-95.20
- glibc-devel-2.3.2-95.20
- glibc-headers-2.3.2-95.20
- glibc-kernheaders-2.4-8.34
- cpp-3.2.3-34
- compat-db-4.0.14-5
- compat-gcc-7.3-2.96.128
- compat-gcc-c++-7.3-2.96.128
- compat-libstdc++-7.3-2.96.128
- compat-libstdc++-devel-7.3-2.96.128
- openmotif-2.2.2-16
- setarch-1.3-1
- init 6
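- A one-line check for the whole list (a sketch; any package reported as "not installed" must be added before continuing):
    # Query the required packages on this node
    rpm -q make gcc glibc glibc-devel glibc-headers glibc-kernheaders cpp \
        compat-db compat-gcc compat-gcc-c++ compat-libstdc++ compat-libstdc++-devel \
        openmotif setarch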
69 16. Install and Configure OCFS Release 2
- Most of the configuration procedures in this section should be performed on all nodes in the cluster! Creating the OCFS2 filesystem, however, should be executed on only one node in the cluster.
- It is now time to install OCFS2. OCFS2 is a cluster filesystem that allows all nodes in a cluster to concurrently access a device via the standard filesystem interface. This allows for easy management of applications that need to run across a cluster.
- OCFS Release 1 was released in 2002 to enable Oracle RAC users to run the clustered database without having to deal with RAW devices. The filesystem was designed to store database-related files, such as data files, control files, redo logs, archive logs, etc. OCFS Release 2 (OCFS2), in contrast, has been designed as a general-purpose cluster filesystem. With it, one can store not only database-related files on a shared disk, but also Oracle binaries and configuration files (a shared Oracle Home), making management of RAC even easier.
- Downloading OCFS (available on the Red Hat 4 CDs):
- ocfs2-2.6.9-22.EL-1.2.3-1.i686.rpm (for single processor)
- or
- ocfs2-2.6.9-22.ELsmp-1.2.3-1.i686.rpm (for multiple processors)
- Installing OCFS
- We will be installing the OCFS files onto two single-processor machines. The installation process is simply a matter of running the following command on all nodes in the cluster as the root user account:
70 16. Install and Configure OCFS Release 2
- su
- rpm -Uvh ocfs2-2.6.9-22.EL-1.2.3-1.i686.rpm \
      ocfs2console-1.2.1-1.i386.rpm \
      ocfs2-tools-1.2.1-1.i386.rpm
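- After the RPMs are installed, the O2CB cluster stack still has to be configured, the filesystem created (on one node only), and the volume mounted on all nodes. A sketch of those steps (the device name /dev/sda1 and the label oradatafiles are assumptions; the mount point is the one created earlier):
    # Configure and start the O2CB cluster stack (run on all nodes)
    /etc/init.d/o2cb configure

    # Create the OCFS2 filesystem (run on ONE node only)
    mkfs.ocfs2 -b 4K -C 32K -N 4 -L oradatafiles /dev/sda1

    # Mount the shared volume (run on all nodes)
    mount -t ocfs2 -o datavolume,nointr /dev/sda1 /u02/oradata/orcl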
71 17. Install and Configure Automatic Storage Management and Disks
- Most of the installation and configuration procedures should be performed on all nodes. Creating the ASM disks, however, will only need to be performed on a single node within the cluster.
- In this section, we will configure Automatic Storage Management (ASM) to be used as the filesystem/volume manager for all Oracle physical database files (data, online redo logs, control files, archived redo logs).
- ASM was introduced in Oracle Database 10g and relieves the DBA from having to manage individual files and drives. ASM is built into the Oracle kernel and provides the DBA with a way to manage thousands of disk drives 24x7, for single as well as clustered instances. All the files and directories to be used for Oracle will be contained in a disk group. ASM automatically performs load balancing in parallel across all available disk drives to prevent hot spots and maximize performance, even with rapidly changing data usage patterns.
72 17. Install and Configure Automatic Storage Management and Disks
- Downloading the ASMLib Packages
- Installing ASMLib Packages
- Edit the file /etc/sysconfig/rawdevices as follows (the file's comment explains the raw device binding format: <rawdev> <major> <minor> or <rawdev> <blockdev>, for example /dev/raw/raw1 /dev/sda1 or /dev/raw/raw2 8 5):
- /dev/raw/raw2 /dev/sda2
- /dev/raw/raw3 /dev/sda3
- /dev/raw/raw4 /dev/sda4
- The raw device bindings will be created on each reboot.
- You would then want to change ownership of all raw devices to the "oracle" user account:
- chown oracle:dba /dev/raw/raw2; chmod 660 /dev/raw/raw2
- chown oracle:dba /dev/raw/raw3; chmod 660 /dev/raw/raw3
- chown oracle:dba /dev/raw/raw4; chmod 660 /dev/raw/raw4
- The last step is to reboot the server to bind the devices, or simply restart the rawdevices service:
- service rawdevices restart
73 17. Install and Configure Automatic Storage Management and Disks
- Creating ASM Disks for Oracle
- Install ASMLib 2.0 Packages
- This installation needs to be performed on all nodes as the root user account:
- su
- rpm -Uvh oracleasm-2.6.9-22.EL-2.0.3-1.i686.rpm \
      oracleasmlib-2.0.2-1.i386.rpm \
      oracleasm-support-2.0.3-1.i386.rpm
- Preparing...                 [100%]
- oracleasm-support            [ 33%]
- oracleasm-2.6.9-22.EL        [ 67%]
- oracleasmlib                 [100%]
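- Before the ASM disks can be created, the ASMLib driver has to be configured and loaded on every node (a sketch; answer the prompts with the oracle user and dba group used in this article and enable loading on boot):
    # Configure the ASMLib driver interactively and load it
    /etc/init.d/oracleasm configure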
74 17. Install and Configure Automatic Storage Management and Disks
- su
- /etc/init.d/oracleasm createdisk VOL1 /dev/sda2
- Marking disk "/dev/sda2" as an ASM disk: [ OK ]
- /etc/init.d/oracleasm createdisk VOL2 /dev/sda3
- Marking disk "/dev/sda3" as an ASM disk: [ OK ]
- /etc/init.d/oracleasm createdisk VOL3 /dev/sda4
- Marking disk "/dev/sda4" as an ASM disk: [ OK ]
- If you do receive a failure, try listing all ASM disks using:
- /etc/init.d/oracleasm listdisks
- VOL1
- VOL2
- VOL3
75 17. Install and Configure Automatic Storage Management and Disks
- On all other nodes in the cluster, you must perform a scandisks to recognize the new volumes:
- /etc/init.d/oracleasm scandisks
- Scanning system for ASM disks: [ OK ]
- We can now test that the ASM disks were successfully created by using the following command on all nodes as the root user account:
- /etc/init.d/oracleasm listdisks
- VOL1
- VOL2
- VOL3
76 18. Download Oracle RAC 10g Release 2 Software
- The following download procedures only need to be performed on one node in the cluster!
- The next logical step is to install Oracle Clusterware Release 2 (10.2.0.1.0), Oracle Database 10g Release 2 (10.2.0.1.0), and finally the Oracle Database 10g Companion CD Release 2 (10.2.0.1.0) for Linux x86. However, you must first download and extract the required Oracle software packages from OTN.
- You will be downloading and extracting the required software from Oracle to only one of the Linux nodes in the cluster, namely linux1. You will perform all installs from this machine. The Oracle installer will copy the required software packages to all other nodes in the RAC configuration we set up in Section 13.
- Log in to one of the nodes in the Linux RAC cluster as the oracle user account. In this example, you will be downloading the required Oracle software to linux1 and saving it to /u01/app/oracle/orainstall.
77 19. Install Oracle 10g Clusterware Software
- Perform the following installation procedures on only one node in the cluster! The Oracle Clusterware software will be installed to all other nodes in the cluster by the Oracle Universal Installer.
- You are now ready to install the "cluster" part of the environment: the Oracle Clusterware. In the previous section, you downloaded and extracted the install files for Oracle Clusterware to linux1 in the directory /u01/app/oracle/orainstall/clusterware. This is the only node from which you need to perform the install.
- During the installation of Oracle Clusterware, you will be asked which nodes are involved and should be configured in the RAC cluster. Once the actual installation starts, it will copy the required software to all nodes using the remote access we configured in Section 13 ("Configure RAC Nodes for Remote Access").
- So, what exactly is the Oracle Clusterware responsible for? It contains all of the cluster and database configuration metadata along with several system management features for RAC. It allows the DBA to register and invite an Oracle instance (or instances) to the cluster. During norma