A Closer Look inside Oracle ASM

1
A Closer Look inside Oracle ASM
  • UKOUG Conference 2007
  • Luca Canali, CERN IT

2
Outline
  • Oracle ASM for DBAs
  • Introduction and motivations
  • ASM is not a black box
  • Investigation of ASM internals
  • Focus on practical methods and troubleshooting
  • ASM and VLDB
  • Metadata, rebalancing and performance
  • Lessons learned from CERN's production DB
    services

3
ASM
  • Oracle Automatic Storage Management
  • Provides the functionality of a volume manager
    and filesystem for Oracle (DB) files
  • Works with RAC
  • Oracle 10g feature aimed at simplifying storage
    management
  • Together with Oracle Managed Files and the Flash
    Recovery Area
  • An implementation of S.A.M.E. methodology
  • Goal of increasing performance and reducing cost

4
ASM for a Clustered Architecture
  • Oracle architecture of redundant low-cost
    components

5
ASM Disk Groups
  • Example HW: 4 disk arrays with 8 disks each
  • An ASM diskgroup is created using all available
    disks
  • The end result is similar to a file system on
    RAID 10
  • ASM allows mirroring across storage arrays
  • Oracle RDBMS processes directly access the
    storage
  • RAW disk access

[Figure labels: ASM Diskgroup, Mirroring, Striping, Failgroup1, Failgroup2]
6
Files, Extents, and Failure Groups
  • Files and extent pointers
  • Failgroups and ASM mirroring

7
ASM Is not a Black Box
  • ASM is implemented as an Oracle instance
  • Familiar operations for the DBA
  • Configured with SQL commands
  • Info in V$ views
  • Logs in udump and bdump
  • Some secret details hidden in X$ tables and
    underscore parameters

8
Selected V$ Views and X$ Tables
View Name          X$ Table            Description
V$ASM_DISKGROUP    X$KFGRP             performs disk discovery and lists diskgroups
V$ASM_DISK         X$KFDSK, X$KFKID    performs disk discovery, lists disks and their usage metrics
V$ASM_FILE         X$KFFIL             lists ASM files, including metadata
V$ASM_ALIAS        X$KFALS             lists ASM aliases, files and directories
V$ASM_TEMPLATE     X$KFTMTA            ASM templates and their properties
V$ASM_CLIENT       X$KFNCL             lists DB instances connected to ASM
V$ASM_OPERATION    X$KFGMG             lists current rebalancing operations
N.A.               X$KFKLIB            available libraries, includes asmlib
N.A.               X$KFDPARTNER        lists disk-to-partner relationships
N.A.               X$KFFXP             extent map table for all ASM files
N.A.               X$KFDAT             allocation table for all ASM disks
9
ASM Parameters
  • Notable ASM instance parameters
  • *.asm_diskgroups='TEST1_DATADG1','TEST1_RECODG1'
  • *.asm_diskstring='/dev/mpath/itstorp'
  • *.asm_power_limit=5
  • *.shared_pool_size=70M
  • *.db_cache_size=50M
  • *.large_pool_size=50M
  • *.processes=100

10
More ASM Parameters
  • Underscore parameters
  • Several undocumented parameters
  • Typically don't need tuning
  • Exception: _asm_ausize and _asm_stripesize
  • May need tuning for VLDB in 10g
  • New in 11g: diskgroup attributes
  • V$ASM_ATTRIBUTE, most notable:
  • disk_repair_time
  • au_size
  • X$KFENV shows underscore attributes

11
ASM Storage Internals
  • ASM Disks are divided in Allocation Units (AU)
  • Default size: 1 MB (_asm_ausize)
  • Tunable diskgroup attribute in 11g
  • ASM files are built as a series of extents
  • Extents are mapped to AUs using a file extent map
  • When using normal redundancy, 2 mirrored
    extents are allocated, each on a different
    failgroup
  • RDBMS read operations access only the primary
    extent of a mirrored couple (unless there is an
    IO error)
  • In 10g the ASM extent size = AU size

12
ASM Metadata Walkthrough
  • Three examples follow of how to read data
    directly from ASM.
  • Motivations
  • Build confidence in the technology, i.e. get a
    feeling of how ASM works
  • It may turn out useful one day to troubleshoot a
    production issue.

13
Example 1: Direct File Access 1/2
  • Goal: reading ASM files with OS tools, using
    metadata information from X$ tables
  • Example: find the 2 mirrored extents of the RDBMS
    spfile
  • sys@ASM1> select GROUP_KFFXP Group#, DISK_KFFXP
    Disk#, AU_KFFXP AU#, XNUM_KFFXP Extent# from
    X$KFFXP where number_kffxp=(select file_number
    from v$asm_alias where name='spfiletest1.ora');
        GROUP#      DISK#        AU#    EXTENT#
    ---------- ---------- ---------- ----------
             1         16      17528          0
             1          4      14838          0

14
Example 1: Direct File Access 2/2
  • Find the disk path
  • sys@ASM1> select disk_number,path from
    v$asm_disk where GROUP_NUMBER=1 and disk_number
    in (16,4);
    DISK_NUMBER PATH
    ----------- ------------------------------------
              4 /dev/mpath/itstor417_1p1
             16 /dev/mpath/itstor419_6p1
  • Read data from disk using dd
  • dd if=/dev/mpath/itstor419_6p1 bs=1024k
    count=1 skip=17528 | strings
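
The dd call above works because the block size is set to the AU size, so the skip count is simply the AU number taken from the extent map. A minimal Python sketch of that arithmetic, assuming the default 1 MB allocation unit; the helper names `au_byte_offset` and `dd_command_for_au` are hypothetical and only build the command line, they do not read any disk:

```python
AU_SIZE = 1024 * 1024  # assumed AU size: 1 MB, the 10g default mentioned in the talk


def au_byte_offset(au_number, au_size=AU_SIZE):
    """Byte offset of an allocation unit from the beginning of the disk."""
    return au_number * au_size


def dd_command_for_au(path, au_number, au_size=AU_SIZE):
    """Build a dd command line that reads exactly one AU (hypothetical helper)."""
    return "dd if={} bs={} count=1 skip={}".format(path, au_size, au_number)


# The two mirrored extents of the spfile found in Example 1
for disk_path, au in [("/dev/mpath/itstor419_6p1", 17528),
                      ("/dev/mpath/itstor417_1p1", 14838)]:
    print(au_byte_offset(au), dd_command_for_au(disk_path, au))
```

Both commands should return the same content, since the two AUs are the two sides of a mirror.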

15
X$KFFXP
Column Name       Description
NUMBER_KFFXP      ASM file number. Join with v$asm_file and v$asm_alias
COMPOUND_KFFXP    File identifier. Join with compound_index in v$asm_file
INCARN_KFFXP      File incarnation id. Join with incarnation in v$asm_file
XNUM_KFFXP        ASM file extent number (mirrored extent pairs have the same extent value)
PXN_KFFXP         Progressive file extent number
GROUP_KFFXP       ASM disk group number. Join with v$asm_disk and v$asm_diskgroup
DISK_KFFXP        ASM disk number. Join with v$asm_disk
AU_KFFXP          Relative position of the allocation unit from the beginning of the disk
LXN_KFFXP         0 -> primary extent, 1 -> mirror extent, 2 -> 2nd mirror copy (high redundancy and metadata)
16
Example 2: A Different Way
  • A different metadata table to reach the same goal
    of reading ASM files directly from OS
  • sys@ASM1> select GROUP_KFDAT Group#,
    NUMBER_KFDAT Disk#, AUNUM_KFDAT AU# from X$KFDAT
    where fnum_kfdat=(select file_number from
    v$asm_alias where name='spfiletest1.ora');
        GROUP#      DISK#        AU#
    ---------- ---------- ----------
             1          4      14838
             1         16      17528

17
X$KFDAT
Column Name (subset)    Description
GROUP_KFDAT             Diskgroup number, join with v$asm_diskgroup
NUMBER_KFDAT            Disk number, join with v$asm_disk
COMPOUND_KFDAT          Disk compound_index, join with v$asm_disk
AUNUM_KFDAT             Disk allocation unit (relative position from the beginning of the disk), join with x$kffxp.au_kffxp
V_KFDAT                 Flag: V = this allocation unit is used, F = AU is free
FNUM_KFDAT              File number, join with v$asm_file
XNUM_KFDAT              Progressive file extent number, join with x$kffxp.pxn_kffxp
18
Example 3: Yet Another Way
  • Using the internal package dbms_diskgroup
    declare
      fileType varchar2(50); fileName varchar2(50);
      fileSz number; blkSz number; hdl number; plkSz number;
      data_buf raw(4096);
    begin
      fileName := '+TEST1_DATADG1/TEST1/spfiletest1.ora';
      -- look up file type, size and block size
      dbms_diskgroup.getfileattr(fileName,fileType,fileSz,blkSz);
      -- open the file read-only and get a handle
      dbms_diskgroup.open(fileName,'r',fileType,blkSz,hdl,plkSz,fileSz);
      -- read one block and print it (RAW is shown as hex)
      dbms_diskgroup.read(hdl,1,blkSz,data_buf);
      dbms_output.put_line(data_buf);
    end;
    /

19
DBMS_DISKGROUP
  • Can be used to read/write ASM files directly
  • It's an Oracle internal package
  • Does not require an RDBMS instance
  • 11g's asmcmd cp command uses dbms_diskgroup

Procedure Name               Parameters
dbms_diskgroup.open          (fileName, openMode, fileType, blkSz, hdl, plkSz, fileSz)
dbms_diskgroup.read          (hdl, offset, blkSz, data_buf)
dbms_diskgroup.createfile    (fileName, fileType, blkSz, fileSz, hdl, plkSz, fileGenName)
dbms_diskgroup.close         (hdl)
dbms_diskgroup.commitfile    (handle)
dbms_diskgroup.resizefile    (handle, fsz)
dbms_diskgroup.remap         (gnum, fnum, virt_extent_num)
dbms_diskgroup.getfileattr   (fileName, fileType, fileSz, blkSz)
20
File Transfer Between OS and ASM
  • The supported tools (10g)
  • RMAN
  • DBMS_FILE_TRANSFER
  • FTP (XDB)
  • WebDAV (XDB)
  • They all require an RDBMS instance
  • In 11g, all the above plus asmcmd
  • cp command
  • Works directly with the ASM instance

21
Strace and ASM 1/3
  • Goal: understand strace output when using ASM
    storage
  • Example:
  • read64(15, "33\0@\"..., 8192, 473128960) = 8192
  • This is a read operation of 8KB from FD 15 at
    offset 473128960
  • What is the segment name, type, file and block?

22
Strace and ASM 2/3
  • From /proc/<pid>/fd I find that FD 15 is
  • /dev/mpath/itstor420_1p1
  • This is disk 20 of diskgroup 1 (from v$asm_disk)
  • From x$kffxp I find the ASM file and extent
  • Note: offset 473128960 = 451 MB + 27 * 8KB
  • sys@ASM1> select number_kffxp, xnum_kffxp from
    x$kffxp where group_kffxp=1 and disk_kffxp=20 and
    au_kffxp=451;
    NUMBER_KFFXP XNUM_KFFXP
    ------------ ----------
             268         17
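
The decomposition of the strace offset into an AU number and a block inside that AU can be checked with a few lines of Python. This is a sketch under the assumptions used in the slides (1 MB allocation units, 8 KB blocks); the function name `split_disk_offset` is hypothetical:

```python
AU_SIZE = 1024 * 1024   # 1 MB allocation unit (10g default)
BLOCK_SIZE = 8192       # 8 KB block, the size of the read seen in strace


def split_disk_offset(offset, au_size=AU_SIZE, block_size=BLOCK_SIZE):
    """Return (AU number, block index inside that AU) for a byte offset on an ASM disk."""
    au_number, remainder = divmod(offset, au_size)
    block_in_au = remainder // block_size
    return au_number, block_in_au


print(split_disk_offset(473128960))  # (451, 27)
```

The first value is what goes into the au_kffxp predicate; the second is needed later to locate the block inside the datafile.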

23
Strace and ASM 3/3
  • From v$asm_alias I find the file alias for file
    268: USERS.268.612033477
  • From the v$datafile view I find the RDBMS file 9
  • From dba_extents I finally find the owner and
    segment name relative to the original IO
    operation
  • sys@TEST1> select owner,segment_name,segment_type
    from dba_extents where FILE_ID=9 and
    27+17*1024*1024/8192 between block_id and
    block_id+blocks;
    OWNER SEGMENT_NAME SEGMENT_TYPE
    ----- ------------ ------------
    SCOTT EMP          TABLE
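
The block number used in the dba_extents lookup combines the ASM extent number (17) with the block position inside the AU (27). A small Python sketch of that step, assuming coarse striping, 10g extents of one AU each, 1 MB AUs and 8 KB blocks; `datafile_block` is a hypothetical name and any file header overhead is ignored:

```python
def datafile_block(extent_number, block_in_au, au_size=1024 * 1024, block_size=8192):
    """Block position inside the datafile for a block found in a given ASM extent.

    Assumes one AU per extent (10g) and a file laid out extent after extent.
    """
    blocks_per_extent = au_size // block_size  # 128 blocks of 8 KB per 1 MB extent
    return extent_number * blocks_per_extent + block_in_au


print(datafile_block(17, 27))  # 2203
```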

24
Investigation of Fine Striping
  • An application finding the layout of
    fine-striped files
  • Explored using strace of an oracle session
    executing alter system dump logfile ..
  • Result: round robin distribution over 8 x 1MB
    extents
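
One way to picture the round robin layout is to map a file offset to the extent that holds it. The sketch below is only a model of the observed behaviour, not ASM's actual code: it assumes a 128 KB stripe size (the value is an assumption, it is not stated in the slides), a stripe width of 8 extents and 1 MB AUs. The function name `fine_stripe_location` is hypothetical:

```python
STRIPE_SIZE = 128 * 1024   # assumed fine-striping chunk size
STRIPE_WIDTH = 8           # round robin over 8 extents, as observed with strace
AU_SIZE = 1024 * 1024      # 1 MB extents (10g: extent size = AU size)


def fine_stripe_location(file_offset):
    """Map a byte offset in a fine-striped file to (extent number, offset inside the extent)."""
    chunk, offset_in_chunk = divmod(file_offset, STRIPE_SIZE)
    chunks_per_set = STRIPE_WIDTH * (AU_SIZE // STRIPE_SIZE)  # chunks in one set of 8 extents
    extent_set, chunk_in_set = divmod(chunk, chunks_per_set)
    extent_in_set = chunk_in_set % STRIPE_WIDTH   # round robin across the 8 extents
    row = chunk_in_set // STRIPE_WIDTH            # how many full rounds came before
    extent_number = extent_set * STRIPE_WIDTH + extent_in_set
    return extent_number, row * STRIPE_SIZE + offset_in_chunk


for offset in (0, 128 * 1024, 1024 * 1024, 8 * 1024 * 1024):
    print(offset, fine_stripe_location(offset))
```

With these assumptions consecutive 128 KB chunks land on extents 0 to 7 in turn, and only after 8 MB does the file move on to the next set of 8 extents.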

25
Metadata Files
  • ASM diskgroups contain hidden files
  • Not listed in V$ASM_FILE (file# < 256)
  • Details are available in X$KFFIL
  • In addition the first 2 AUs of each disk are
    marked as file#0 in X$KFDAT
  • Example (10g)
        GROUP#      FILE# FILESIZE_AFTER_MIRR RAW_FILE_SIZE
    ---------- ---------- ------------------- -------------
             1          1             2097152       6291456
             1          2             1048576       3145728
             1          3           264241152     795869184
             1          4             1392640       6291456
             1          5             1048576       3145728
             1          6             1048576       3145728

26
ASM Metadata 1/2
  • File#0, AU=0: disk header (disk name, etc),
    Allocation Table (AT) and Free Space Table (FST)
  • File#0, AU=1: Partner Status Table (PST)
  • File#1: File Directory (files and their extent
    pointers)
  • File#2: Disk Directory
  • File#3: Active Change Directory (ACD)
  • The ACD is analogous to a redo log, where changes
    to the metadata are logged.
  • Size = 42MB * number of instances
  • Source: Oracle Automatic Storage Management,
    Oracle Press Nov 2007, N. Vengurlekar, M.
    Vallath, R. Long
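
The ACD sizing rule can be cross-checked against the metadata file listing on the previous slide, where file 3 has a size of 264241152 bytes after mirroring. A minimal sketch, assuming 1 MB = 1048576 bytes; the function names are hypothetical:

```python
MB = 1024 * 1024
ACD_MB_PER_INSTANCE = 42  # from the sizing rule on this slide


def acd_size_bytes(instances):
    """Expected size of the Active Change Directory for a number of ASM instances."""
    return ACD_MB_PER_INSTANCE * MB * instances


def instances_from_acd_size(size_bytes):
    """Invert the rule: how many instances a given ACD size corresponds to."""
    return size_bytes // (ACD_MB_PER_INSTANCE * MB)


print(acd_size_bytes(6))                   # 264241152
print(instances_from_acd_size(264241152))  # 6
```

Under this rule the 10g example listing would correspond to a diskgroup served by 6 ASM instances.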

27
ASM Metadata 2/2
  • File#4: Continuing Operation Directory (COD).
  • The COD is analogous to an undo tablespace. It
    maintains the state of active ASM operations such
    as disk or datafile drop/add. The COD log record
    is either committed or rolled back based on the
    success of the operation.
  • File#5: Template directory
  • File#6: Alias directory
  • 11g, File#9: Attribute Directory
  • 11g, File#12: Staleness registry, created when
    needed to track offline disks

28
ASM Rebalancing
  • Rebalancing is performed (and mandatory) after
    space management operations
  • Goal: balanced space allocation across disks
  • Not based on performance or utilization
  • ASM spreads every file across all disks in a
    diskgroup
  • ASM instances are in charge of rebalancing
  • Extent pointer changes are communicated to the
    RDBMS
  • RDBMS ASMB process keeps an open connection to
    ASM
  • This can be observed by running strace against
    ASMB
  • In RAC, extra messages are passed between the
    cluster ASM instances
  • The LMD0 processes of the ASM instances are very
    active during rebalance

29
ASM Rebalancing and VLDB
  • Performance of Rebalancing is important for VLDB
  • An ASM instance can use parallel slaves
  • RBAL coordinates the rebalancing operations
  • ARBx processes pick up chunks of work. By
    default they log their activity in udump
  • Does it scale?
  • In 10g serialization wait events can limit
    scalability
  • Even at maximum speed rebalancing is not always
    I/O bound

30
ASM Rebalancing Performance
  • Tracing ASM rebalancing operations
  • 10046 trace of the arbx processes
  • Oradebug setospid
  • oradebug event 10046 trace name context forever,
    level 12
  • Process log files (in bdump) with orasrp (tkprof
    will not work)
  • Main wait events from my tests with RAC (6 nodes)
  • DFS lock handle
  • Waiting for CI level 5 (cross instance lock)
  • Buffer busy wait
  • unaccounted for
  • enq: AD - allocate AU
  • enq: AD - deallocate AU
  • log write(even)
  • log write(odd)

31
ASM Single Instance Rebalancing
  • Single instance rebalance
  • Faster in RAC if you can rebalance with only 1
    node up (I have observed 20% to 100% speed
    improvement)
  • Buffer busy wait can be the main event
  • It seems to depend on the number of files in the
    diskgroup.
  • Diskgroups with a small number of (large) files
    have more contention (arbx processes operate
    concurrently on the same file)
  • Only seen in tests with 10g
  • 11g has improvements regarding rebalancing
    contention

32
Rebalancing, an Example
Data: D. Wojcik, CERN IT
33
Rebalancing Workload
  • When ASM mirroring is used (e.g. with normal
    redundancy)
  • Rebalancing operations can move more data than
    expected
  • Example
  • 5 TB (allocated): 100 disks, 200 GB each
  • A disk is replaced (diskgroup rebalance)
  • The total IO workload is 1.6 TB (8x the disk
    size!)
  • How to see this: query v$asm_operation, the
    column EST_WORK keeps growing during rebalance
  • The issue: excessive repartnering
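
The figures in this example can be put side by side with a short calculation. The sketch assumes that the 5 TB is the total allocated space spread evenly over the 100 disks and that 1 TB = 1000 GB; `rebalance_ratios` is a hypothetical helper:

```python
def rebalance_ratios(disks, disk_size_gb, allocated_gb, rebalance_io_gb):
    """Compare the rebalance IO workload with the disk size and with the data held per disk."""
    avg_allocated_per_disk_gb = allocated_gb / disks
    io_vs_disk_size = rebalance_io_gb / disk_size_gb
    io_vs_disk_content = rebalance_io_gb / avg_allocated_per_disk_gb
    return avg_allocated_per_disk_gb, io_vs_disk_size, io_vs_disk_content


# 100 disks of 200 GB, 5 TB allocated, 1.6 TB of IO for replacing one disk
print(rebalance_ratios(100, 200, 5000, 1600))  # (50.0, 8.0, 32.0)
```

Under the even-spread assumption the replaced disk held about 50 GB, so the rebalance moved roughly 32 times the data that was actually on it.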

34
ASM Disk Partners
  • ASM diskgroup with normal redundancy
  • Two copies of each extent are written to
    different failgroups
  • Two ASM disks are partners
  • When they have at least one extent set in common
    (they are the 2 sides of a mirror for some data)
  • Each ASM disk has a limited number of partners
  • Typically 10 disk partners: X$KFDPARTNER
  • Helps to reduce the risk associated with 2
    simultaneous disk failures
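
The definition of partners given above (two disks that hold the two sides of a mirror for some extent) can be illustrated from extent map rows such as those returned by X$KFFXP. This is a sketch of the idea only: it derives the relationship from sample rows rather than reading X$KFDPARTNER, and the file numbers and rows are made up for illustration:

```python
from collections import defaultdict


def partners_from_extent_map(rows):
    """rows: (file_number, extent_number, disk_number) tuples, one per extent copy.

    Returns {disk: set of disks that mirror at least one extent with it}.
    """
    disks_by_extent = defaultdict(set)
    for file_number, extent_number, disk_number in rows:
        disks_by_extent[(file_number, extent_number)].add(disk_number)

    partners = defaultdict(set)
    for disks in disks_by_extent.values():
        for disk in disks:
            partners[disk] |= disks - {disk}
    return dict(partners)


# Hypothetical sample: file 1 extent 0 mirrored on disks 16 and 4,
# file 2 extent 0 mirrored on disks 4 and 7
sample_rows = [(1, 0, 16), (1, 0, 4), (2, 0, 4), (2, 0, 7)]
print(partners_from_extent_map(sample_rows))
```

In this sample disk 4 ends up with two partners (16 and 7), while disks 16 and 7 are not partners of each other.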

35
Free and Usable Space
  • When ASM mirroring is used not all the free
    space should be occupied
  • V$ASM_DISKGROUP.USABLE_FILE_MB
  • Amount of free space that can be safely utilized
    taking mirroring into account, and yet be able to
    restore redundancy after a disk failure
  • It is calculated for the worst-case scenario;
    in any case it is a best practice not to let it
    go negative (it can)
  • This can be a problem when deploying a small
    number of large LUNs and/or failgroups
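
As a rough illustration of how the value can go negative, the sketch below uses the relationship described in Oracle's documentation for V$ASM_DISKGROUP: free space minus the space reserved to restore redundancy, divided by the number of mirror copies. The column REQUIRED_MIRROR_FREE_MB is not mentioned in the slides and the sample numbers are made up:

```python
def usable_file_mb(free_mb, required_mirror_free_mb, redundancy="normal"):
    """USABLE_FILE_MB as (FREE_MB - REQUIRED_MIRROR_FREE_MB) / number of copies."""
    copies = {"normal": 2, "high": 3}[redundancy]
    return (free_mb - required_mirror_free_mb) / copies


# Hypothetical diskgroup that needs 400000 MB free to re-mirror after losing a failgroup
print(usable_file_mb(1000000, 400000))  # plenty of free space: positive
print(usable_file_mb(300000, 400000))   # less free space than the reserve: negative
```

With few large failgroups the reserve is a large fraction of the diskgroup, which is why the value turns negative sooner in that kind of deployment.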

36
Fast Mirror Resync
  • ASM 10g with normal redundancy does not allow
    offlining part of the storage
  • A transient error in a storage array can cause
    several hours of rebalancing to drop and add
    disks
  • It is a limiting factor for scheduled
    maintenance
  • 11g has a new feature: fast mirror resync
  • Redundant storage can be put offline for
    maintenance
  • Changes are accumulated in the staleness registry
    (file#12)
  • Changes are applied when the storage is back
    online

37
Read Performance, Random I/O
  • IOPS measured with SQL (synthetic test)

130 IOPS per disk. Destroking: only the external
part of the disks is used
38
Read Performance, Sequential I/O
Limited by HBAs -> 4 x 2 Gb (measured with
parallel query)
39
Implementation Details
  • Multipathing
  • Linux Device Mapper (2.6 kernel)
  • Block devices
  • RHEL4 and 10gR2 allow skipping the raw device mapping
  • External half of the disk for data disk groups
  • JBOD config
  • No HW RAID
  • ASM used to mirror across disk arrays
  • HW
  • Storage arrays (Infortrend): FC controller, SATA
    disks
  • FC (Qlogic): 4Gb switch and HBAs (2Gb in older
    HW)
  • Servers are 2x CPUs, 4GB RAM, 10.2.0.3 on RHEL4,
    RAC of 4 to 8 nodes

40
Conclusions
  • CERN deploys RAC and ASM on Linux on commodity HW
  • 2.5 years of production, 110 Oracle 10g RAC nodes
    and 300TB of raw disk space (Dec 2007)
  • ASM metadata
  • Most critical part, especially rebalancing
  • Knowledge of some ASM internals helps
    troubleshooting
  • ASM on VLDB
  • Know and work around pitfalls in 10g
  • 11g has important manageability and performance
    improvements

41
Q&A
  • Q&A
  • Links
  • http://cern.ch/phydb
  • http://twiki.cern.ch/twiki/bin/view/PSSGroup/ASM_Internals
  • http://www.cern.ch/canali