1
The HP AutoRAID Hierarchical Storage System
John Wilkes, Richard Golding, Carl Staelin, and Tim
Sullivan
  • virtualized disk gets smart

2
HP AutoRAID 2
  • File System Recap
  • OS manages storage of files on storage media
    using a File System
  • storage media
  • comprised of an array of data units, called
    sectors
  • File System
  • organizes sectors into addressable storage units
  • establishes directory structure for accessing
    files
  • FFS and LFS both developed as improvements over
    previous FSes
  • improved performance by optimizing access
  • FFS
  • increased block size to reduce the number of
    block addresses managed in the directory
  • logically grouped cylinders to help ensure
    locality for blocks of a file
  • LFS
  • eliminated seek times by always writing at the
    end of the log
  • introduced new addressable structure called
    extents
  • an extent is a large contiguous set of blocks

3
HP AutoRAID 3
  • Crash Recovery
  • issue is consistency of directory data after a
    crash or power failure
  • directory information typically written after
    the file data is written
  • FFS
  • after a crash you have no way of knowing what
    you were last doing
  • requires a consistency check
  • all inode information must be verified against
    data it maps to
  • inconsistencies cannot always be repaired, data
    can be lost
  • LFS
  • drastically reduces time to recover because of
    checkpointing
  • a checkpoint records a recent time when the files
    and inode map were consistent
  • verify by rolling forward through the log from
    the last checkpoint
  • LFS keeps lots of other metadata and stores some
    of it with the file
  • increasing the odds of restoring consistency
  • But neither can recover from a hardware failure.

4
HP AutoRAID 4
  • RAID! (circa the late 1980s)
  • Redundant Array of Inexpensive (or Independent)
    Disks
  • connect multiple cheap disks into an ARRAY of
    disks, spread data across them!
  • a single disk has less reliability than an array
    of smaller drives with redundancy
  • Virtualization !
  • multiple disks, but the File System sees only one
    virtual unit (doesn't know it's virtual!)
  • requires an ARRAY CONTROLLER, a combination of
    hardware and software
  • handles mapping between where the FS thinks data
    is and where it actually is
  • Redundancy!
  • partial, like parity
  • full, like an extra copy
  • if a single drive in the array is lost, its data
    can be automatically regenerated
  • no longer have to worry too much about drives
    failing!

5
HP AutoRAID 5
  • RAID Levels
  • RAID 1 - Mirroring
  • full redundancy!
  • zero recovery time in case of disk failure, just
    use the copy
  • storage capacity = 50% of the total size of the
    array
  • writes are serialized at some level between the
    two disks
  • so after a crash or power failure the two disks
    are NOT left in an inconsistent state
  • this makes writes slower than just writing to
    one disk
  • a write request does not return until both
    copies have been updated (see the sketch below)
  • transfer rate: same as one disk
  • parallel reads!
  • each copy can service a read request
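
To make the mirrored-write semantics concrete, here is a
minimal Python sketch (illustrative only; the MirroredPair
class and its dict-backed disks are our own stand-ins, not
AutoRAID code):

    class MirroredPair:
        def __init__(self):
            self.disks = [{}, {}]           # two mirror copies (block -> data)

        def write(self, block_no, data):
            # Serialized between the two disks: the request does not
            # return until BOTH copies are updated, which is why a
            # mirrored write is slower than a single-disk write.
            for disk in self.disks:
                disk[block_no] = data

        def read(self, block_no, copy=0):
            # Either copy can service a read, so two independent
            # reads can proceed in parallel, one per disk.
            return self.disks[copy % 2][block_no]

    pair = MirroredPair()
    pair.write(7, b"data")                  # updates both copies
    assert pair.read(7, copy=1) == b"data"  # read served by the second copy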

6
HP AutoRAID 6
  • RAID Levels
  • RAID 3 - Byte level striping, parity on check
    disk
  • spread data by striping byte1 → disk1, byte2 →
    disk2, byte3 → disk3
  • reads and writes of a stripe's bytes happen at
    the same time!
  • transfer rate = (N - 1) × the transfer rate of
    one disk
  • only partial redundancy!
  • the check disk stores parity information
  • parity overhead amounts to one bit per group of
    corresponding bits in a stripe
  • redundancy overhead = 1/N
  • Oops! Byte striping means every disk is involved
    in every request!
  • No parallel reads or writes (see the sketch below)
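
A toy Python sketch of byte-level striping with a check
disk (the stripe_bytes function and its layout are our
illustration, not the controller's actual code):

    def stripe_bytes(data: bytes, n_data_disks: int):
        """Spread bytes round-robin; check disk gets each stripe's XOR."""
        disks = [bytearray() for _ in range(n_data_disks)]
        check = bytearray()
        for i in range(0, len(data), n_data_disks):
            stripe = data[i:i + n_data_disks].ljust(n_data_disks, b"\x00")
            p = 0
            for d, b in enumerate(stripe):
                disks[d].append(b)          # byte d of the stripe -> disk d
                p ^= b
            check.append(p)                 # parity byte for this stripe
        return disks, check

    # bytes A,D -> disk0; B,E -> disk1; C,F -> disk2
    disks, check = stripe_bytes(b"ABCDEF", 3)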

7
HP AutoRAID 7
  • Parity
  • parity is computed using XOR (⊕)
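
A worked example in Python: parity is the bytewise XOR
across the data disks, and XOR-ing the parity with the
surviving disks rebuilds a lost one (the function name is
ours):

    def parity(chunks):
        """XOR corresponding bytes across all chunks."""
        result = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, b in enumerate(chunk):
                result[i] ^= b
        return bytes(result)

    data = [b"\x0f\xf0", b"\xaa\x55", b"\x01\x80"]  # three data disks
    p = parity(data)                                # check-disk contents

    # Disk 1 fails: XOR the parity with the surviving disks to rebuild it.
    rebuilt = parity([p, data[0], data[2]])
    assert rebuilt == data[1]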

8
HP AutoRAID 8
  • RAID Levels
  • RAID 5 - Block level striping, parity
    interleaved
  • striping unit is 1 block: block1 → disk1,
    block2 → disk2, block3 → disk3, etc.
  • blocks of a stripe are written at the same time!
  • transfer rate = (N - 1) × the transfer rate of
    one disk
  • only partial redundancy!
  • parity information is dispersed round-robin among
    all the disks
  • same redundancy overhead as level 3, 1/N
  • Hey! Block striping can mean that every disk is
    NOT involved in a (small) request
  • parallel reads and writes can occur, depending on
    which disks store the involved blocks
  • BUT writes get slower!
  • this happened in RAID 3 too
  • read-modify-write (sketched below)
  • read the old data and old parity
  • recompute/modify the parity
  • write the new data and new parity
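
A minimal Python sketch of that small-write sequence (our
simplification; the round-robin parity placement shown is
one common convention, not necessarily AutoRAID's exact
layout). Updating one block costs two reads and two
writes, because the new parity is derived as
P_new = P_old ⊕ D_old ⊕ D_new:

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def small_write(disks, stripe, data_disk, new_data, n_disks):
        parity_disk = stripe % n_disks           # round-robin parity placement
        old_data    = disks[data_disk][stripe]   # read 1: old data
        old_parity  = disks[parity_disk][stripe] # read 2: old parity
        new_parity  = xor(xor(old_parity, old_data), new_data)
        disks[data_disk][stripe]   = new_data    # write 1: data
        disks[parity_disk][stripe] = new_parity  # write 2: parity

    # usage: 4 disks, stripe 1 keeps its parity on disk 1
    disks = [{1: b"\x00"}, {1: b"\x0f"}, {1: b"\x0f"}, {1: b"\x00"}]
    small_write(disks, stripe=1, data_disk=2, new_data=b"\xff", n_disks=4)
    assert disks[1][1] == b"\xff"       # 0x0f ^ 0x0f ^ 0xff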

9
HP AutoRAID 9
  • RAID 1 vs RAID 5
  • Reads
  • RAID 1 (mirroring)
  • always offers parallel reads
  • RAID 5
  • can only sometimes offer parallel reads
  • depends on where the needed blocks are
  • two read requests that require blocks on the
    same disk must be serialized
  • Writes
  • RAID 1 (mirroring)
  • must complete two writes before the request
    returns
  • granularity of serialization can be smaller than
    a file
  • can't do parallel writes
  • RAID 5
  • typically does read-modify-write to recompute
    parity
  • (HP AutoRAID uses a combo of read-modify-write
    and LFS!)
  • can't do parallel writes either

10
HP AutoRAID 10
  • Storage Hierarchy: HP AutoRAID
  • RAID 1: fast reads and writes, but 50%
    redundancy overhead
  • RAID 5: strong reads, slow writes, 1/N storage
    overhead
  • RAID 1 is fast but expensive, like a cache!
  • RAID 5 is slower but cheaper, like main memory!
  • Neither is optimal under all circumstances
  • SO create a hierarchy
  • use mirroring for active blocks
  • active set = blocks of regularly read and
    written files
  • use RAID 5 for inactive blocks
  • inactive set = blocks of read-only and rarely
    accessed files
  • Sounds hard!
  • Who pushes the data back and forth between the
    sets?
  • How often do you have to do it?
  • if the sets change too often, no time for
    anything else!

11
HP AutoRAID 11
  • Who Minds the Storage Hierarchy?
  • The System Administrator?
  • as long as you don't have to pay them much
  • and if they get it right all the time and don't
    make any mistakes
  • The File System?
  • if so, big plus: the File System knows better
    than anything else who is using which files
  • can best determine active and inactive sets
    based on tracking access patterns
  • BUT, there are a lot of different OSes with
    different File System options
  • that makes deployment hard
  • each File System must be modified in order to
    manage a storage hierarchy
  • An Array Controller?
  • embed the software to manage the hierarchy in
    the hardware of a controller
  • no deployment issues, just add the hardware to
    the system
  • overrules the existing File System
  • lose the ability to track access patterns
  • need a reliable and often correct policy for
    determining active/inactive sets
  • sounds like virtualization

12
HP AutoRAID 12
  • HP AutoRAID (local hard drive gets smart!)
  • the array controller's embedded software manages
    the active/inactive sets
  • application level user interface for
    configuration parameters
  • set up LUNs (virtual logical units)
  • virtualization
  • File System is out of the loop!
  • Consider Mapping
  • the File System thinks it is addressing the
    blocks of a particular file
  • doesn't know the file is actually in a storage
    hierarchy
  • is the requested file in the active set?
  • or the inactive set?
  • which disk is it on?
  • need some mapping between what the file system
    sees and where the data actually resides on disk

13
HP AutoRAID 13
  • Virtual to Physical Mapping
  • Physically
  • the array is structured by an address hierarchy
  • PEGs (Physical Extent Groups) contain 3 or more
    PEXes
  • PEXes (Physical EXtents) are typically 1MB of
    contiguous disk space
  • each PEX is divided into 128KB segments of
    contiguous sectors
  • a segment holds 2 Relocation Blocks (RBs), so an
    RB is 64KB
  • Relocation Blocks serve as the
  • striping unit in RAID 5, the mirroring unit in
    RAID 1,
  • and as the unit of migration between active and
    inactive sets
  • Virtually, the File System sees
  • LUNs (Logical Units)
  • purely virtual: no superblock, no directory, not
    a partition
  • rather, a LUN is a set of RBs that get mapped to
    physical segments when actually used
  • user can create as many LUNs as they want
  • Each LUN has a virtual device table that holds
    the list of RBs assigned to it

14
HP AutoRAID 14
  • Mapping
  • if RB3 migrates from inactive to active, simply
    update the PEX mapping in the PEG table that maps
    RB3 (a sketch of such a map follows)
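
A minimal Python sketch of the virtual-to-physical map
(class and field names are our invention, not HP's data
structures). Migration copies the RB's 64KB payload and
then rewrites a single table entry, so the virtual address
the File System uses never changes:

    MIRRORED, RAID5 = "mirrored", "raid5"

    class RBEntry:
        def __init__(self, peg, pex, segment, storage_class):
            self.peg, self.pex, self.segment = peg, pex, segment
            self.storage_class = storage_class

    class LUN:
        def __init__(self):
            self.table = {}                 # virtual RB number -> RBEntry

        def migrate(self, rb_no, peg, pex, segment, storage_class):
            # After the RB's payload is copied to its new home
            # (copy not shown), update the map in place.
            e = self.table[rb_no]
            e.peg, e.pex, e.segment = peg, pex, segment
            e.storage_class = storage_class

    # usage: RB3 is promoted from a RAID 5 PEG into a mirrored PEG
    lun = LUN()
    lun.table[3] = RBEntry(peg=9, pex=2, segment=5, storage_class=RAID5)
    lun.migrate(3, peg=1, pex=0, segment=7, storage_class=MIRRORED)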

15
HP AutoRAID 15
  • How cool is that?
  • What you can do when you're not in control
    anymore…
  • Hot-pluggable disks
  • take one out and RAID immediately begins
    regenerating missing data
  • or, if one fails, activate a spare, if available
  • array still functions, no down time
  • requests for missing data are given top priority
    for regeneration
  • Create a larger array on the fly
  • size of array is limited to the size of the
    smallest disk
  • so take a small disk out and put a larger disk
    in
  • systematically replace all disks, one by one,
    letting each regenerate
  • when the last bigger disk goes in, the array is
    automatically larger

16
HP AutoRAID 16
  • HP AutoRAID Read and Write Operations
  • RAID 1 Mirrored Storage Class
  • normal RAID Level 1 reads and writes
  • 2 reads can happen in parallel
  • a write is serialized (at the segment level)
    between the two disks
  • both updates must complete before request
    returns (remember the overhead!)
  • RAID 5 Storage Class
  • reads are processed as normal RAID 5 read
    operations
  • reads are parallel if possible
  • writes are log structured
  • when they happen is more complicated
  • RAID 5 Writes happen for 1 of 3 reasons
  • a File System request tries to write data at
    RAID 5
  • results in promotion of requested data to active
    set
  • (no actual write happens at RAID 5 in this case)
  • the Mirrored storage class runs out of space
  • so data is demoted from active to inactive: RBs
    are copied from the mirrored class down to RAID 5
    (see the sketch below)
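
A self-contained Python sketch of this promotion-on-write
behavior (the Hierarchy class and the LRU demotion choice
are our simplification, not necessarily HP's exact policy):

    from collections import OrderedDict

    class Hierarchy:
        def __init__(self, mirrored_capacity):
            self.capacity = mirrored_capacity
            self.mirrored = OrderedDict()   # rb_no -> data, LRU order
            self.raid5 = {}                 # rb_no -> data

        def write(self, rb_no, data):
            if rb_no not in self.mirrored:           # promotion needed
                self.raid5.pop(rb_no, None)          # no write happens at RAID 5
                if len(self.mirrored) >= self.capacity:
                    old, old_data = self.mirrored.popitem(last=False)
                    self.raid5[old] = old_data       # demotion: copy an RB down
            self.mirrored[rb_no] = data              # normal RAID 1 write
            self.mirrored.move_to_end(rb_no)

    h = Hierarchy(mirrored_capacity=2)
    for rb in [1, 2, 3]:
        h.write(rb, b"...")
    assert 1 in h.raid5      # RB1 was demoted to make room for RB3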

17
HP AutoRAID 17
  • Holes, Cleaning, and Garbage Collection
  • Holes come from
  • demotion of RBs from active to inactive leaves
    holes in PEXs of mirrored class
  • holes are managed as a free list
  • promotion of RBs from inactive to active leaves
    holes in PEXs of RAID 5
  • by the way, RAID 5 in HP AutoRAID uses LFS
  • so holes must be garbage collected
  • Cleaning
  • plug the holes
  • RBs are migrated between PEGs to fill some,
    empty others
  • cleaning mirrored class frees up PEGs to
    accommodate bursts or to give to RAID 5
  • cleaning RAID 5 is an alternative to garbage
    collection
  • Garbage Collection
  • normal LFS garbage collection
  • or can be hole-plugging garbage collection to
    fill/free PEGs (sketched below)
  • this performs much better, reducing garbage
    collection work by up to 90%!
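
A Python sketch of the hole-plugging idea (the PEG fields
and plug_holes are illustrative names): rather than copying
every live RB out LFS-style, drain a nearly-empty PEG into
the holes of other PEGs, freeing it with far fewer copies:

    from dataclasses import dataclass, field

    @dataclass
    class PEG:
        live_rbs: list = field(default_factory=list)  # RBs still in use
        holes: int = 0                                # free RB slots

    def plug_holes(source, candidates):
        """Try to empty `source` by moving its live RBs into holes elsewhere."""
        for rb in list(source.live_rbs):
            target = next((p for p in candidates if p.holes > 0), None)
            if target is None:
                return False                 # not enough holes elsewhere
            target.live_rbs.append(rb)       # copy the RB into the hole
            target.holes -= 1
            source.live_rbs.remove(rb)
            source.holes += 1
        return True                          # source PEG can now be freed

    # usage: one nearly-empty PEG drains into a nearly-full one
    a, b = PEG(live_rbs=["rb7"], holes=3), PEG(live_rbs=["rb1", "rb2"], holes=1)
    assert plug_holes(a, [b]) and a.live_rbs == []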

18
HP AutoRAID 18
  • Performance
  • depends most on how much of the active set fits
    into the mirrored class
  • if it all fits, then RAID 5 goes unused;
    performance is that of a RAID 1 array
  • tested OLTP against a slower RAID array and JBOD
  • JBOD = just a bunch of disks, striped, no
    redundancy (so it performs the best!)
  • tested with all of the active set fitting in the
    Mirrored Storage class
  • so no migration overhead
  • AutoRAID lags due to redundancy overhead
  • tested performance for different %s of the active
    set at the mirrored level
  • more disks → a higher % fits in the Mirrored
    Storage Class
  • obviously performance rises with a higher %
    because there is less migration
  • interesting to note: at 8 drives, when all of the
    active set fits,
  • performance still rises because the transfer rate
    is increasing, with more disks to write to

[Figure: OLTP transaction rate for slow RAID, HP AutoRAID,
and JBOD]
[Figure: transaction rate as the number of disks in
AutoRAID increases]
19
HP AutoRAID 19
  • Can the File System help?
  • File System sees virtual disk,
  • probably has its own ideas of how best to lay
    out data blocks to optimize access
  • perhaps by assigning RBs of a LUN to a linear
    set of contiguous blocks
  • BUT are they really going to be contiguous?
  • in the array, RBs can be mapped anywhere and
    most likely are not stored linearly
  • so does this make seek times really bad?
  • ran tests where they initially set up array
  • with all RBs laid out completely linearly
  • with all RBs laid out completely randomly
  • Resulted in only modest performance gains for
    initial linear layout
  • note there is no way to migrate data between
    sets and maintain a linear layout…
  • Conclusion
  • the 64KB RB allocation block may sound big, but
    works just fine
  • remember, large block sizes amortize seek times

20
HP AutoRAID 20
  • Mirrored Storage Class Read Selection Algorithm
  • which copy should be read?
  • possibilities
  • strict alternation
  • keep one disk head on the outer track, the other
    on the inner
  • read from the disk with the shortest queue
  • read from the disk with the shortest seek time
  • strict alternation and inner/outer can give big
    benefits under certain workloads
  • AND can really punish under other workloads
  • shortest queue and shortest seek time yield the
    same modest gain
  • but it is hard to track the shortest seek time
  • so shortest queue wins (sketched below)
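
A one-function Python sketch of the shortest-queue policy
(pick_copy is our name): send each read to whichever
mirror copy currently has the fewest pending requests,
since queue length is cheap to track and a decent proxy
for how busy each disk is:

    def pick_copy(queues):
        """queues: one list of pending requests per mirror copy."""
        return min(range(len(queues)), key=lambda i: len(queues[i]))

    # usage: disk 0 has 3 requests queued, disk 1 has 1 -> read from disk 1
    assert pick_copy([["r1", "r2", "r3"], ["r4"]]) == 1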

21
HP AutoRAID 21
  • Conclusion
  • redundancy protects from data loss due to
    hardware failure
  • different striping units and levels of
    redundancy result in different performance
  • performance depends on type of workload
  • redundancy also introduces overhead
  • 50% for mirroring
  • reduce redundancy overhead by using a storage
    hierarchy
  • implementing different RAID levels for active
    and inactive data
  • storage hierarchy managed by an array controller
  • management software embedded onto hardware
    controller
  • special mapping virtualizes the array
  • File System sees one (or more) virtual logical
    units