Title: Scalable, Fault-Tolerant NAS for Oracle - The Next Generation


1
Scalable, Fault-Tolerant NAS for Oracle - The
Next Generation
  • Kevin Closson
  • Chief Software Architect
  • Oracle Platform Solutions, PolyServe Inc.

2
The Un-Show Stopper
  • NAS for Oracle is not file serving. Let me
    explain:
  • Think of the GbE NFS I/O paths from the Oracle
    servers to the NAS device as totally direct, with
    no VLAN-style indirection.
  • In these terms, NFS over GbE is just a protocol,
    as is FCP over FibreChannel.
  • The proof is in the numbers.
  • A single dual-socket/dual-core AMD server running
    Oracle10gR2 can push 273MB/s of large I/Os
    (scattered reads, direct path read/write, etc.)
    over triple-bonded GbE NICs!
  • Compare that to the infrastructure and hardware
    costs of 4Gb FCP (450MB/s, but you need 2 cards
    for redundancy).
  • OLTP over modern NFS with GbE is not a
    challenging I/O profile.
  • However, not all NAS devices are created equal,
    by any means.
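A quick back-of-the-envelope check puts the 273MB/s figure in context. This sketch assumes the raw GbE line rate of 1 Gbit/s (125 MB/s per link, per direction) and simply divides; real NFS throughput always sits below that because of Ethernet/IP/TCP/RPC overhead, and link bonding only aggregates bandwidth across multiple flows. The 450MB/s FCP number is the one quoted on the slide, not a measured value.

```python
# Back-of-the-envelope check of the NFS-over-GbE vs. FCP numbers on the slide.
# Assumption: "raw ceiling" = signalling rate / 8, ignoring protocol overhead.

GBE_LINE_RATE_MBPS = 1000 / 8  # 1 Gbit/s GbE ~= 125 MB/s per direction


def bonded_gbe_ceiling(nics: int) -> float:
    """Raw aggregate ceiling (MB/s) of `nics` bonded GbE links."""
    return nics * GBE_LINE_RATE_MBPS


def utilization(observed_mbps: float, ceiling_mbps: float) -> float:
    """Fraction of the raw ceiling actually achieved."""
    return observed_mbps / ceiling_mbps


ceiling = bonded_gbe_ceiling(3)  # three bonded NICs
observed = 273.0                 # MB/s, Oracle large I/O figure from the slide
fcp_4gb = 450.0                  # MB/s, 4Gb FCP figure quoted on the slide

print(f"3 x GbE raw ceiling: {ceiling:.0f} MB/s")
print(f"Observed: {observed:.0f} MB/s ({utilization(observed, ceiling):.0%} of raw)")
print(f"Observed vs. quoted 4Gb FCP: {observed / fcp_4gb:.0%}")
```

On these assumptions the three NICs deliver roughly 73% of their raw ceiling, or about 61% of the quoted single-port FCP rate, from commodity NICs.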

3
Agenda
  • Oracle on NAS
  • NAS Architecture
  • Proof of Concept Testing
  • Special Characteristics

4
Oracle on NAS
5
Oracle on NAS
  • Connectivity
  • A "Fantasyland Dream Grid" would be nearly
    impossible with a FibreChannel switched fabric,
    for instance:
  • 128 nodes means 256 HBAs and 2 switches, each
    with 256 ports, just for the servers; then you
    have to work out the storage paths
  • Simplicity
  • NFS is simple. Anyone with a pulse can plug in
    cat-5 and mount filesystems.
  • MUCH MUCH MUCH MUCH MUCH simpler than:
  • Raw partitions for ASM
  • Raw, OCFS2 for CRS
  • Oracle Home? Local Ext3 or UFS?
  • What a mess
  • Supports shared Oracle Home, shared APPL_TOP too
  • But it is not simpler than a certified third-party
    cluster filesystem; that is a different
    presentation
  • Cost
  • FC HBAs are always going to be more expensive
    than NICs
  • Ports on enterprise-level FC switches are very
    expensive

6
Oracle on NAS
  • NFS Client Improvements
  • Direct IO
  • open() with O_DIRECT works with the Linux NFS
    client, the Solaris NFS client, and likely others
  • Oracle Improvements
  • init.ora: filesystemio_options=directIO
  • No async I/O on NFS, but look at the numbers
  • Oracle runtime checks mount options
  • Caveat: it doesn't always get it right, but at
    least it tries (OSDS)
  • Don't be surprised to see Oracle offer a
    platform-independent NFS client
  • NFS V4 will have more improvements
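Oracle issues these O_DIRECT opens itself when filesystemio_options enables direct I/O. The following Python sketch is not Oracle's code; it only illustrates what an O_DIRECT read looks like from user space on such a mount. The function names are hypothetical, the 4096-byte alignment is an assumption (the real requirement depends on the client and filesystem), and the code falls back to a buffered read when the platform or filesystem rejects O_DIRECT.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment/transfer size; the real rule depends on client and filesystem


def _read_block(path: str, flags: int) -> bytes:
    """Open `path` with `flags` and read up to BLOCK bytes from offset 0."""
    fd = os.open(path, flags)
    try:
        # An anonymous mmap is page-aligned, which satisfies the usual
        # O_DIRECT buffer-alignment requirement on Linux.
        buf = mmap.mmap(-1, BLOCK)
        try:
            n = os.readv(fd, [buf])  # fresh fd, so this reads from offset 0
            return buf[:n]
        finally:
            buf.close()
    finally:
        os.close(fd)


def read_first_block(path: str):
    """Return (data, used_direct_io).

    Tries O_DIRECT first so the read bypasses the client page cache, then
    falls back to a normal buffered read if the platform or filesystem
    rejects it (typically with EINVAL)."""
    o_direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    if o_direct:
        try:
            return _read_block(path, os.O_RDONLY | o_direct), True
        except OSError:
            pass  # filesystem refused O_DIRECT; use the buffered path
    return _read_block(path, os.O_RDONLY), False
```

Pointed at a file on an NFS mount (for example a hypothetical `/u04/oradata/test.dbf`), the second element of the result shows whether the client actually honored O_DIRECT.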

7
NAS Architecture
8
NAS Architecture
  • Single-headed Filers
  • Clustered Single-headed Filers
  • Asymmetrical Multi-headed NAS
  • Symmetrical Multi-headed NAS

9
Single Headed Filer Architecture
10
NAS Architecture Single-headed Filer
GigE Network
Filesystems /u01 /u02 /u03
11
Oracle Servers Accessing a Single-headed Filer
I/O Bottleneck
A single one of these has the same (or more) bus bandwidth as this!
Filesystems /u01 /u02 /u03
12
Oracle Servers Accessing a Single-headed Filer
Single Point of Failure
Highly Available through failover HA, Data Guard,
RAC, etc.
Filesystems /u01 /u02 /u03
13
Clustered Single-headed Filers
14
Architecture Cluster of Single-headed Filers
Paths Active After Failover
Filesystems /u01 /u02
Filesystems /u03
15
Oracle Servers Accessing a Cluster of
Single-headed Filers
16
Architecture Cluster of Single-headed Filers
What if /u03 I/O saturates this Filer?
17
Filer I/O Bottleneck. Resolution: Data Migration
Paths Active After Failover
Filesystems /u01 /u02
Filesystems /u03
Filesystems /u04
Migrate some of the hot data to /u04
18
Data Migration Remedies I/O Bottleneck
NEW Single Point of Failure
Paths Active After Failover
Filesystems /u01 /u02
Filesystems /u03
Filesystems /u04
Migrate some of the hot data to /u04
19
Summary Single-headed Filers
  • Cluster to mitigate S.P.O.F
  • Clustering is a pure afterthought with filers
  • Failover Times?
  • Long. Really, really long.
  • Transparent?
  • Not in many cases.
  • Migrate data to mitigate I/O bottlenecks
  • What if the data hot spot moves over time? The
    "Dog Chasing His Tail" syndrome
  • Poor Modularity
  • Expanded by pairs for data availability
  • What's all this talk about CNS?

20
Asymmetrical Multi-headed NAS Architecture
21
Asymmetrical Multi-headed NAS Architecture
Three Active NAS Heads / Three For Failover
and Pools of Data
FibreChannel SAN


Note: Some variants of this architecture support
M1 Active/Standby, but that doesn't really change
much.
22
Asymmetrical NAS Gateway Architecture
  • Really not much different than clusters of
    single-headed filers
  • 1 NAS head to 1 filesystem relationship
  • Migrate data to mitigate I/O contention
  • Failover not transparent
  • But
  • More Modular
  • Not necessary to scale up by pairs

23
Symmetric Multi-headed NAS
24
HP Enterprise File Services Clustered Gateway
25
Symmetric vs Asymmetric
EFS-CG
26
Enterprise File Services Clustered Gateway
Component Overview
  • Cluster Volume Manager
  • RAID 0
  • Expand Online
  • Fully Distributed, Symmetric Cluster Filesystem
  • The embedded filesystem is a fully distributed,
    symmetric cluster filesystem
  • Virtual NFS Services
  • Filesystems are presented through Virtual NFS
    Services
  • Modular and Scalable
  • Add NAS heads without interruption
  • All filesystems can be presented for read/write
    through any/all NAS heads

27
EFS-CG Clustered Volume Manager
  • RAID 0
  • LUNs are RAID 1, so this implements S.A.M.E.
    (Stripe And Mirror Everything)
  • Expand online
  • Add LUNs, grow volume
  • Up to 16TB
  • Single Volume
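The RAID 0 layer over RAID 1 LUNs can be pictured as a simple address mapping. This is a generic striping sketch, not the EFS-CG volume manager's actual on-disk layout; the 1 MB stripe unit is an assumption, and online growth (adding LUNs) would need restriping or a layout map that this sketch does not model.

```python
from __future__ import annotations

MB = 1024 * 1024
STRIPE_UNIT = 1 * MB  # assumed stripe unit; not stated on the slide


def locate(offset: int, n_luns: int, stripe_unit: int = STRIPE_UNIT) -> tuple[int, int]:
    """Map a byte offset in the striped volume to (lun_index, offset_within_lun).

    Consecutive stripe units rotate across the LUNs (RAID 0); each LUN is
    itself a RAID 1 mirror, so every stripe unit ends up mirrored."""
    stripe_no = offset // stripe_unit  # which stripe unit overall
    lun = stripe_no % n_luns           # round-robin across LUNs
    row = stripe_no // n_luns          # how many full rows precede it on that LUN
    return lun, row * stripe_unit + offset % stripe_unit


for off in (0, 1 * MB, 3 * MB + 5, 4 * MB + 10):
    print(off, locate(off, n_luns=4))
```

With four LUNs, the first four megabytes land on LUNs 0 through 3 in turn and the fifth wraps back to LUN 0, which is why sequential scans spread evenly across every spindle.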

28
The EFS-CG Filesystem
  • All NAS devices have embedded operating systems
    and file systems, but the EFS-CG filesystem is:
  • Fully Symmetric
  • Distributed Lock Manager
  • No Metadata Server or Lock Server
  • General Purpose clustered file system
  • Standard C Library and POSIX support
  • Journaled with Online recovery
  • Proprietary format but uses standard Linux file
    system semantics and system calls including
    flock() and fcntl() clusterwide
  • Expand a single filesystem online up to 16TB, up
    to 254 filesystems in current release.
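Because flock() and fcntl() locks are honored cluster-wide, ordinary POSIX locking code needs no changes. A minimal sketch using Python's `fcntl.lockf`, which wraps fcntl() record locking on Unix; the helper name and file are hypothetical, and the cluster-wide behavior is a property of the filesystem as the slide describes it, not something this code provides.

```python
import fcntl
import os


def append_record_locked(path: str, record: bytes) -> None:
    """Append `record` to `path` while holding an exclusive fcntl() lock.

    On one host this serializes writers across processes; on a filesystem
    that enforces fcntl() locks cluster-wide, the same code would also
    serialize writers running on different nodes."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the exclusive lock is granted
        try:
            os.write(fd, record)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```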

29
EFS-CG Filesystem Scalability
30
Scalability. Single Filesystem Export Using x86
Xeon-based NAS Heads (Old Numbers)
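[Chart: MegaBytes per Second (MB/s) vs. Cluster Size (Nodes), NAS Heads; approximate single-headed filer limit marked]
1 NAS head: 123 MB/s
2 NAS heads: 246 MB/s
4 NAS heads: 493 MB/s
6 NAS heads: 739 MB/s
8 NAS heads: 986 MB/s
9 NAS heads: 1,084 MB/s
10 NAS heads: 1,196 MB/s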
HP StorageWorks Clustered File System is
optimized for both READ and WRITE performance.
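The chart's data points (cluster size paired in order with MB/s) can be turned into a per-head scaling figure. This sketch only divides throughput by head count and compares it with the single-head result; it uses no numbers beyond those on the slide.

```python
# (NAS heads, MB/s) pairs from the chart, matched in order
RESULTS = [(1, 123), (2, 246), (4, 493), (6, 739), (8, 986), (9, 1084), (10, 1196)]


def scaling_efficiency(results):
    """Return (heads, MB/s per head, fraction of perfectly linear scaling)."""
    heads0, mbps0 = results[0]
    per_head_baseline = mbps0 / heads0
    rows = []
    for heads, mbps in results:
        per_head = mbps / heads
        rows.append((heads, per_head, per_head / per_head_baseline))
    return rows


for heads, per_head, eff in scaling_efficiency(RESULTS):
    print(f"{heads:2d} heads: {per_head:6.1f} MB/s per head, {eff:6.1%} of linear")
```

By this measure the 10-head result is about 97% of linear scaling for a single exported filesystem.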
31
Virtual NFS Services
  • Specialized Virtual Host IP
  • Filesystem groups are exported through VNFS
  • VNFS failover and rehosting are 100% transparent
    to the NFS client
  • Including active file descriptors, file locks
    (e.g., fcntl/flock), etc.

32
EFS-CG Filesystems and VNFS
33
Enterprise File Services Clustered Gateway
[Diagram: four NAS heads serving Virtual NFS Services vnfs1, vnfs1b, vnfs2b and vnfs3b, which present filesystems /u01, /u02, /u03 and /u04]
34
EFS-CG Management Console
35
EFS-CG Proof of Concept
36
EFS-CG Proof of Concept
  • Goals
  • Use Oracle10g (10.2.0.1) with a single high-
    performance filesystem for the RAC database and
    measure:
  • Durability
  • Scalability
  • Virtual NFS functionality

37
EFS-CG Proof of Concept
  • The 4 filesystems presented by the EFS-CG were:
  • /u01. This filesystem contained all Oracle
    executables (e.g., ORACLE_HOME)
  • /u02. This filesystem contained the Oracle10gR2
    clusterware files (e.g., OCR, CSS) and some
    datafiles and External Tables for ETL testing
  • /u03. This filesystem was lower-performance space
    used for miscellaneous tests such as disk-to-disk
    backup
  • /u04. This filesystem resided on a
    high-performance volume that spanned two storage
    arrays. It contained the main benchmark database

38
EFS-CG P.O.C. Parallel Tablespace Creation
  • All datafiles created in a single exported
    filesystem
  • Proof of multi-headed, single filesystem write
    scalability

39
EFS-CG P.O.C. Parallel Tablespace Creation
40
EFS-CG P.O.C. Full Table Scan Performance
  • All datafiles located in a single exported
    filesystem
  • Proof of multi-headed, single filesystem
    sequential I/O scalability

41
EFS-CG P.O.C. Parallel Query Scan Throughput
42
EFS-CG P.O.C. OLTP Testing
  • OLTP Database based on an Order Entry Schema and
    workload
  • Test areas
  • Physical I/O Scalability under Oracle OLTP
  • Long Duration Testing


43
EFS-CG P.O.C. OLTP Workload Transaction Avg Cost
Oracle Statistics Average Per Transaction
SGA Logical Reads 33
SQL Executions 5
Physical I/O 6.9
Block Changes 8.5
User Calls 6
GCS/GES Messages Sent 12

Averages with RAC can be deceiving; be aware of
CR (consistent read) sends
44
EFS-CG P.O.C. OLTP Testing
45
EFS-CG P.O.C.OLTP Testing. Physical I/O
Operations
46
EFS-CG Handles All OLTP I/O Types Sufficiently: No
Logging Bottleneck
47
Long Duration Stress Test
  • Benchmarks do not prove durability
  • Benchmarks are sprints
  • Typically 30-60 minute measured runs (e.g.,
    TPC-C)
  • This long-duration stress test was no benchmark
    by any means
  • Ramp OLTP I/O up to roughly 10,000/sec
  • Run non-stop until the aggregate I/O breaks
    through 10 billion physical transfers
  • 10,000 physical I/O transfers per second for
    every second of nearly 12 days
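The "nearly 12 days" figure is simply the total transfer target divided by the sustained rate. A one-function sketch of that arithmetic:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def days_to_reach(total_transfers: float, transfers_per_sec: float) -> float:
    """Days of non-stop running needed to accumulate `total_transfers`."""
    return total_transfers / transfers_per_sec / SECONDS_PER_DAY


print(f"{days_to_reach(10e9, 10_000):.2f} days")  # prints "11.57 days"
```

Ten billion transfers at 10,000 per second is one million seconds, or about 11.6 days of uninterrupted I/O.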

48
Long Duration Stress Test
49
Long Duration Stress Test
50
Long Duration Stress Test
51
(No Transcript)
52
Special Characteristics
53
Special Characteristics
  • The EFS-CG NAS Heads are Linux Servers
  • Tasks can be executed directly within the EFS-CG
    NAS Heads at FCP speed
  • Compression
  • ETL, data importing
  • Backup
  • etc.

54
Example of EFS-CG Special Functionality
  • A table is exported on one of the RAC nodes
  • The export file is then compressed on the EFS-CG
    NAS head
  • CPU from NAS Head, instead of database servers
  • The NAS heads are really just protocol engines.
    I/O DMAs are offloaded to the I/O subsystems.
    There are plenty of spare cycles.
  • Data movement at FCP rate instead of GigE
  • Offload the I/O fabric (NFS paths from servers to
    the EFS-CG)
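On the NAS head, the export file is just a local file on the cluster filesystem, so compressing it there moves the data over FCP rather than across the NFS paths. The slide does not say which compressor was used; this Python sketch assumes gzip, and the helper name and any example path such as `/u02/exports/expdat.dmp` are hypothetical.

```python
import gzip
import os
import shutil


def compress_in_place(src: str) -> str:
    """Gzip `src` to `src + '.gz'`, remove the original, return the new path.

    When run on a NAS head, both the read and the write hit the local
    cluster filesystem instead of crossing the NFS paths."""
    dst = src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, 1024 * 1024)  # stream in 1 MB chunks
    os.remove(src)
    return dst
```

Streaming in fixed-size chunks keeps memory use flat regardless of the export file's size.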

55
Export a Table to NFS Mount
56
Compress it on the NAS Head
57
Questions and Answers
58
Backup Slide
59
EFS-CG Scales Up and Out
EFS-CG NAS Head
EFS-CG NAS Head

SAN