Title: ASM without HW RAID
1. Implementing ASM Without HW RAID: A User's Experience
Luca Canali, CERN
Dawid Wojcik, CERN
UKOUG, Birmingham, December 2008
2. Outline
- Introduction to ASM
- Disk groups, fail groups, normal redundancy
- Scalability and Performance of the solution
- Possible pitfalls, sharing experiences
- Implementation details, monitoring, and tools to ease ASM deployment
3. Architecture and main concepts
- Why ASM?
- Provides the functionality of a volume manager and a cluster file system
- Raw access to storage for performance
- Why ASM-provided mirroring?
- Allows the use of lower-cost storage arrays
- Allows mirroring across storage arrays (arrays are not single points of failure)
- Array (HW) maintenance can be done in a rolling way
- Stretch clusters
4. ASM and cluster DB architecture
- Oracle architecture of redundant low-cost components
5. Files, extents, and failure groups
[Diagram: files and extent pointers; failgroups and ASM mirroring]
6. ASM disk groups
- Example HW: 4 disk arrays with 8 disks each
- An ASM diskgroup is created using all available disks (see the example below)
- The end result is similar to a file system on RAID 10
- ASM allows mirroring across storage arrays
- Oracle RDBMS processes access the storage directly (raw disk access)
[Diagram: an ASM diskgroup mirroring across Failgroup1 and Failgroup2, with striping inside each failgroup]
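As an illustration of this layout, here is a minimal sketch of creating a normal-redundancy diskgroup that mirrors across two arrays; the diskgroup name and disk paths are hypothetical, modeled on the multipath aliases shown later in this talk:

    -- Hypothetical example: one failgroup per storage array, so the two
    -- mirror copies of each extent land on different arrays
    -- (normal redundancy = 2-way mirroring).
    CREATE DISKGROUP data_dg1 NORMAL REDUNDANCY
      FAILGROUP rstor901 DISK '/dev/mpath/rstor901_*p1'
      FAILGROUP rstor902 DISK '/dev/mpath/rstor902_*p1';

ASM stripes files across all disks of the diskgroup and keeps the mirror copies of each extent in different failgroups, which is what makes the result comparable to RAID 10 across arrays.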
7. Performance and scalability
- ASM with normal redundancy
- Stress tested for CERN's use cases
- Scales and performs well
8. Case study: the largest cluster I have ever installed, RAC5
9. Multipathed Fibre Channel
- 8 FC switches, 4 Gbps (10 Gbps uplink)
10. Many spindles
- 26 storage arrays (16 SATA disks each)
11. Case study: I/O metrics for the RAC5 cluster
- Measured, sequential I/O
- Read: 6 GB/sec
- Read-Write: 33 GB/sec
- Measured, small random I/O
- Read: 40K IOPS (8 KB read ops)
- Note
- 410 SATA disks, 26 HBAs on the storage arrays
- Servers: 14 x 44Gbps HBAs, 112 cores, 224 GB of RAM
12. How the test was run
- A custom SQL-based DB workload (see the sketch below)
- IOPS: randomly probe a large table (several TB) via several parallel query slaves (each reads a single block at a time)
- MBPS: read a large (several TB) table with parallel query
- The test table used for the RAC5 cluster was 5 TB in size, created inside a disk group of 70 TB
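The workload itself is not published in the slides; the following is a minimal sketch of the two access patterns described above, assuming a pre-populated test table TEST_BIG with a numeric primary key ID and a PAYLOAD column (both hypothetical):

    -- MBPS test: full scan of the large table with parallel query,
    -- driving large sequential reads from all RAC nodes.
    SELECT /*+ FULL(t) PARALLEL(t, 16) */ COUNT(*)
    FROM   test_big t;

    -- IOPS test: each session repeatedly fetches one row by primary key,
    -- i.e. one random single-block read per probe; run from many
    -- concurrent sessions to reach the aggregate IOPS figure.
    DECLARE
      l_payload test_big.payload%TYPE;
    BEGIN
      FOR i IN 1 .. 100000 LOOP
        BEGIN
          SELECT payload INTO l_payload
          FROM   test_big
          WHERE  id = TRUNC(DBMS_RANDOM.VALUE(1, 1000000000));
        EXCEPTION
          WHEN NO_DATA_FOUND THEN NULL;  -- gaps in the key range are ignored
        END;
      END LOOP;
    END;
    /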
13. Possible pitfalls
- Production stories
- Sharing experiences
- 3 years in production, 550 TB of raw capacity
14. Rebalancing speed
- Rebalancing is performed (and mandatory) after space management operations
- Typically after HW failures (to restore the mirror)
- Goal: balanced space allocation across disks
- Not based on performance or utilization
- ASM instances are in charge of rebalancing
- Scalability of rebalancing operations?
- In 10g serialization wait events can limit scalability
- Even at maximum speed, rebalancing is not always I/O bound (the rebalance power commands are sketched below)
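The rebalance speed is controlled by the rebalance power; a minimal sketch of the relevant commands, with hypothetical diskgroup and disk names:

    -- Drop a failed disk and rebalance at maximum speed
    -- (power ranges from 0 to 11 in 10g/11.1).
    ALTER DISKGROUP data_dg1 DROP DISK data_dg1_0006 REBALANCE POWER 11;

    -- Change the power of an ongoing rebalance.
    ALTER DISKGROUP data_dg1 REBALANCE POWER 8;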
15. Rebalancing, an example
16. VLDB and rebalancing
- Rebalancing operations can move more data than expected
- Example
- 5 TB allocated, 100 disks of 200 GB each
- A disk is replaced (diskgroup rebalance)
- The total I/O workload is 1.6 TB (8x the disk size!)
- How to see this: query v$asm_operation, the EST_WORK column keeps growing during the rebalance (see the query below)
- The issue: excessive repartnering
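A minimal sketch of monitoring a running rebalance from the ASM instance:

    -- Progress of ongoing ASM operations; EST_WORK (estimated allocation
    -- units to move) can keep growing while the rebalance proceeds.
    SELECT group_number, operation, state, power,
           sofar, est_work, est_rate, est_minutes
    FROM   v$asm_operation;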
17. Rebalancing issues wrap-up
- Rebalancing can be slow
- Many hours for very large disk groups
- Associated risk
- A 2nd disk failure while rebalancing
- Worst case: loss of the diskgroup if partner disks fail
18. Fast Mirror Resync
- ASM 10g with normal redundancy does not allow part of the storage to be taken offline
- A transient error in a storage array can cause several hours of rebalancing to drop and re-add the disks
- It is a limiting factor for scheduled maintenance
- 11g has a new feature: fast mirror resync
- A great feature for rolling interventions on HW (see the sketch below)
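A minimal sketch of how fast mirror resync can be used in 11g; the diskgroup and failgroup names are hypothetical, and the feature requires the diskgroup attributes compatible.asm and compatible.rdbms to be at least 11.1:

    -- Allow offline disks to stay offline up to 12 hours before ASM drops them.
    ALTER DISKGROUP data_dg1 SET ATTRIBUTE 'disk_repair_time' = '12h';

    -- Take a whole storage array (one failgroup) offline for maintenance...
    ALTER DISKGROUP data_dg1 OFFLINE DISKS IN FAILGROUP rstor901;

    -- ...and bring it back: only the extents changed in the meantime are resynced.
    ALTER DISKGROUP data_dg1 ONLINE DISKS IN FAILGROUP rstor901;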
19. ASM and filesystem utilities
- Only a few tools can access ASM
- asmcmd, dbms_file_transfer, XDB, FTP
- Limited operations (no copy, rename, etc.)
- They require open DB instances
- File operations are difficult in 10g (see the dbms_file_transfer sketch below)
- 11g asmcmd has the copy command
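As one workaround in 10g, a file can be copied out of ASM with dbms_file_transfer; a minimal sketch with hypothetical directory paths and file names:

    -- Directory objects pointing inside ASM and to a filesystem staging area.
    CREATE DIRECTORY asm_src AS '+DATA_DG1/ORCL/DATAFILE';
    CREATE DIRECTORY fs_dst  AS '/tmp/stage';

    -- Copy one file out of ASM (e.g. a datafile of a read-only or
    -- offline tablespace); file and directory names are illustrative.
    BEGIN
      DBMS_FILE_TRANSFER.COPY_FILE(
        source_directory_object      => 'ASM_SRC',
        source_file_name             => 'users.272.657641887',
        destination_directory_object => 'FS_DST',
        destination_file_name        => 'users01.dbf');
    END;
    /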
20. ASM and corruption
- ASM metadata corruption
- Can be caused by bugs
- One case in production after a disk eviction
- Physical data corruption
- ASM automatically fixes most corruption on the primary extent
- Typically when doing a full backup
- Secondary extent corruption goes undetected until a disk failure or rebalance exposes it
21. Disaster recovery
- Corruption issues were fixed by using a physical standby to move to fresh storage
- For HA, our experience is that disaster recovery is needed
- Standby DB
- On-disk (flash) copy of the DB
22. Implementation details
23. Storage deployment
- Current storage deployment for Physics Databases at CERN
- SAN, FC (4 Gb/s) storage enclosures with SATA disks (8 or 16)
- Linux x86_64, no ASMLib, device mapper instead (naming persistence, HA)
- Over 150 FC storage arrays (production, integration and test) and 2000 LUNs exposed
- Biggest DB over 7 TB (more to come when the LHC starts, estimated growth up to 11 TB/year)
24. Storage deployment
- ASM implementation details
- Storage in JBOD configuration (1 disk -> 1 LUN)
- Each disk is partitioned at the OS level
- 1st partition: 45% of the disk size, the faster part of the disk (short stroke, outer sectors)
- 2nd partition: the rest, the slower part of the disk (full stroke, inner sectors)
25. Storage deployment
- Two diskgroups are created for each cluster (see the example below)
- DATA: data files and online redo logs, on the outer part of the disks
- RECO: flash recovery area destination (archived redo logs and on-disk backups), on the inner part of the disks
- One failgroup per storage array
[Diagram: DATA_DG1 and RECO_DG1 diskgroups spanning Failgroup1 to Failgroup4]
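A minimal sketch of this layout for one cluster, assuming four arrays whose partitions are exposed through device mapper aliases of the form shown later (rstor401_1p1, rstor401_1p2, ...); all names and paths are illustrative:

    -- DATA diskgroup on the first (outer, faster) partitions,
    -- one failgroup per storage array.
    CREATE DISKGROUP data_dg1 NORMAL REDUNDANCY
      FAILGROUP rstor401 DISK '/dev/mpath/rstor401_*p1'
      FAILGROUP rstor402 DISK '/dev/mpath/rstor402_*p1'
      FAILGROUP rstor403 DISK '/dev/mpath/rstor403_*p1'
      FAILGROUP rstor404 DISK '/dev/mpath/rstor404_*p1';

    -- RECO diskgroup on the second (inner, slower) partitions of the same disks.
    CREATE DISKGROUP reco_dg1 NORMAL REDUNDANCY
      FAILGROUP rstor401 DISK '/dev/mpath/rstor401_*p2'
      FAILGROUP rstor402 DISK '/dev/mpath/rstor402_*p2'
      FAILGROUP rstor403 DISK '/dev/mpath/rstor403_*p2'
      FAILGROUP rstor404 DISK '/dev/mpath/rstor404_*p2';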
26. Storage management
- SAN configuration in JBOD mode: many steps, can be time-consuming
- Storage level
- logical disks
- LUNs
- mappings
- FC infrastructure: zoning
- OS: creating the device mapper configuration
- multipath.conf: name persistency, HA
27. Storage management
- Storage manageability
- DBAs set up the initial configuration
- ASM: extra maintenance in case of storage maintenance (disk failure)
- Problems
- How to quickly set up the SAN configuration
- How to manage disks and keep track of the mappings: physical disk -> LUN -> Linux disk -> ASM disk (see the query below)
- Example: SCSI 1013, 2013 -> /dev/sdn, /dev/sdax -> /dev/mpath/rstor901_3 -> ASM disk TEST1_DATADG1_0016
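The ASM end of this mapping can be retrieved from the ASM instance; a minimal sketch:

    -- Map ASM disk names to OS paths and failgroups (run on the ASM instance).
    SELECT g.name AS diskgroup,
           d.name AS asm_disk,
           d.failgroup,
           d.path
    FROM   v$asm_disk d
    JOIN   v$asm_diskgroup g ON g.group_number = d.group_number
    ORDER  BY g.name, d.failgroup, d.name;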
28. Storage management
- Solution
- Configuration DB: a repository of FC switches, port allocations and all SCSI identifiers for all nodes and storages
- Big initial effort
- Easy to maintain
- High ROI
- Custom tools
- Tools to identify
- SCSI (block) devices <-> device mapper device <-> physical storage and FC port
- device mapper device <-> ASM disk
- Automatic generation of the device mapper configuration
29. Storage management
- lssdisks.py (custom-made script)

The following storages are connected:
Host interface 1
  Target ID 100 - WWPN 210000D0230BE0B5 - Storage rstor316, Port 0
  Target ID 101 - WWPN 210000D0231C3F8D - Storage rstor317, Port 0
  Target ID 102 - WWPN 210000D0232BE081 - Storage rstor318, Port 0
  Target ID 103 - WWPN 210000D0233C4000 - Storage rstor319, Port 0
  Target ID 104 - WWPN 210000D0234C3F68 - Storage rstor320, Port 0
Host interface 2
  Target ID 200 - WWPN 220000D0230BE0B5 - Storage rstor316, Port 1
  Target ID 201 - WWPN 220000D0231C3F8D - Storage rstor317, Port 1
  Target ID 202 - WWPN 220000D0232BE081 - Storage rstor318, Port 1
  Target ID 203 - WWPN 220000D0233C4000 - Storage rstor319, Port 1
  Target ID 204 - WWPN 220000D0234C3F68 - Storage rstor320, Port 1

SCSI Id   Block DEV   MPath name   MP status   Storage   Port
-------   ---------   ----------   ---------   -------   ----
0000      /dev/sda    -            -           -         -
30. Storage management
- listdisks.py (custom-made script)

DISK           NAME               GROUP_NAME    FG        H_STATUS  MODE    MOUNT_S  STATE   TOTAL_GB  USED_GB
------------   -----------------  ------------  --------  --------  ------  -------  ------  --------  -------
rstor401_1p1   RAC9_DATADG1_0006  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     111.8     68.5
rstor401_1p2   RAC9_RECODG1_0000  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     119.9      1.7
rstor401_2p1   --                 --            --        UNKNOWN   ONLINE  CLOSED   NORMAL     111.8    111.8
rstor401_2p2   --                 --            --        UNKNOWN   ONLINE  CLOSED   NORMAL     120.9    120.9
rstor401_3p1   RAC9_DATADG1_0007  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     111.8     68.6
rstor401_3p2   RAC9_RECODG1_0005  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     120.9      1.8
rstor401_4p1   RAC9_DATADG1_0002  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     111.8     68.5
rstor401_4p2   RAC9_RECODG1_0002  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     120.9      1.8
rstor401_5p1   RAC9_DATADG1_0001  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     111.8     68.5
rstor401_5p2   RAC9_RECODG1_0006  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     120.9      1.8
rstor401_6p1   RAC9_DATADG1_0005  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     111.8     68.5
rstor401_6p2   RAC9_RECODG1_0007  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     120.9      1.8
rstor401_7p1   RAC9_DATADG1_0000  RAC9_DATADG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     111.8     68.6
rstor401_7p2   RAC9_RECODG1_0001  RAC9_RECODG1  RSTOR401  MEMBER    ONLINE  CACHED   NORMAL     120.9      1.8
31. Storage management
- gen_multipath.py (custom-made script)
- multipath default configuration for PDB
- device mapper alias: naming persistency and multipathing (HA)
- SCSI 1013, 2013 -> /dev/sdn, /dev/sdax -> /dev/mpath/rstor916_1

defaults {
    udev_dir          /dev
    polling_interval  10
    selector          "round-robin 0"
    . . .
}
. . .
multipaths {
    multipath {
        wwid   3600d0230006c26660be0b5080a407e00
        alias  rstor916_CRS
    }
    multipath {
        wwid   3600d0230006c26660be0b5080a407e01
        alias  rstor916_1
    }
    . . .
}
32. Storage monitoring
- ASM-based mirroring means that Oracle DBAs need to be alerted of disk failures and evictions
- Dashboard with a global overview: custom solution (RACMon)
- ASM-level monitoring
- Oracle Enterprise Manager Grid Control
- RACMon: alerts on missing disks and failgroups, plus a dashboard (see the query sketch below)
- Storage-level monitoring
- RACMon: LUN health and storage configuration details on the dashboard
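A minimal sketch of the kind of ASM-level check such monitoring can be based on (the exact checks used by RACMon are not shown in the slides):

    -- Disks belonging to a diskgroup that are not online (candidates excluded).
    SELECT group_number, name, failgroup, path, mode_status, header_status
    FROM   v$asm_disk
    WHERE  group_number <> 0
    AND    mode_status <> 'ONLINE';

    -- Number of disks per failgroup, to spot missing disks after an eviction.
    SELECT group_number, failgroup, COUNT(*) AS disks
    FROM   v$asm_disk
    WHERE  group_number <> 0
    GROUP  BY group_number, failgroup
    ORDER  BY group_number, failgroup;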
33. Storage monitoring
- ASM instance level monitoring
- Storage level monitoring
[Screenshots: dashboard alerts such as "new failing disk on RSTOR614" and "new disk installed on RSTOR903, slot 2"]
34. Conclusions
- Oracle ASM diskgroups with normal redundancy
- Used at CERN instead of HW RAID
- Performance and scalability are very good
- Allows the use of low-cost HW
- Requires more admin effort from the DBAs than high-end storage
- 11g has important improvements
- Custom tools ease administration
35. Q&A
- Thank you
- Links
- http://cern.ch/phydb
- http://www.cern.ch/canali