1
The HP AutoRAID Hierarchical Storage System
John Wilkes, Richard Golding, Carl Staelin, and Tim
Sullivan
  • virtualized disk gets smart

2
HP AutoRAID 2
  • File System Recap
  • OS manages storage of files on storage media
    using a File System
  • storage media
  • comprised of an array of data units, called
    sectors
  • File System
  • organizes sectors into addressable storage units
  • establishes directory structure for accessing
    files
  • FFS and LFS both developed as improvements over
    previous FSes
  • improved performance by optimizing access
  • FFS
  • increased block size to reduce the number of
    block addresses managed in the directory
  • logically grouped cylinders to help ensure
    locality for blocks of a file
  • LFS
  • eliminated seek times by always writing at the
    end of the log
  • introduced new addressable structure called
    extents
  • an extent is a large contiguous set of blocks

3
HP AutoRAID 3
  • Crash Recovery
  • issue is consistency of directory data after a
    crash or power failure
  • directory information typically written after
    the file data is written
  • FFS
  • after a crash you have no way of knowing what
    you were last doing
  • requires a consistency check
  • all inode information must be verified against
    data it maps to
  • inconsistencies cannot always be repaired, data
    can be lost
  • LFS
  • drastically reduces time to recover because of
    checkpointing
  • a checkpoint records a recent time when the files
    and inode map were consistent
  • verify by rolling forward through the log from
    the last checkpoint
  • LFS keeps lots of other metadata and stores some
    of it with the file
  • increasing the odds of restoring consistency
  • But neither can recover from a hardware failure.

4
HP AutoRAID 4
  • RAID! (circa the late 1980s)
  • Redundant Array of Inexpensive (or Independent)
    Disks
  • connect multiple cheap disks into an ARRAY of
    disks, spread data across them!
  • a single disk has less reliability than an array
    of smaller drives with redundancy
  • Virtualization !
  • multiple disks, but the File System sees only one
    virtual unit (doesn't know it's virtual!)
  • requires an ARRAY CONTROLLER, a combination of
    hardware and software
  • handles mapping between where the FS thinks data
    is and where it actually is
  • Redundancy!
  • partial, like parity
  • full, like an extra copy
  • if a single drive in the array is lost, its data
    can be automatically regenerated
  • no longer have to worry too much about drives
    failing!

5
HP AutoRAID 5
  • RAID Levels
  • RAID 1 - Mirroring
  • full redundancy!
  • zero recovery time in case of disk failure, just
    use the copy
  • storage capacity = 50% of the total size of the
    array
  • writes are serialized at some level between the
    two disks
  • so after a crash or power failure the two disks
    are NOT left in an inconsistent state
  • this makes writes slower than just writing to
    one disk
  • a write request does not return until both
    copies have been updated (see the sketch below)
  • transfer rate: same as one disk
  • parallel reads!
  • each copy can service a read request
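
To make the mirrored-write semantics concrete, here is a
minimal Python sketch (illustrative only; the MirroredPair
class and its dict-backed disks are our own stand-ins, not
AutoRAID code):

    class MirroredPair:
        def __init__(self):
            self.disks = [{}, {}]           # two mirror copies (block -> data)

        def write(self, block_no, data):
            # Serialized between the two disks: the request does not
            # return until BOTH copies are updated, which is why a
            # mirrored write is slower than a single-disk write.
            for disk in self.disks:
                disk[block_no] = data

        def read(self, block_no, copy=0):
            # Either copy can service a read, so two independent
            # reads can proceed in parallel, one per disk.
            return self.disks[copy % 2][block_no]

    pair = MirroredPair()
    pair.write(7, b"data")                  # updates both copies
    assert pair.read(7, copy=1) == b"data"  # read served by the second copy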

6
HP AutoRAID 6
  • RAID Levels
  • RAID 3 - Byte level striping, parity on check
    disk
  • spread data by striping byte1 → disk1, byte2 →
    disk2, byte3 → disk3
  • reads and writes of a stripe's bytes happen at
    the same time!
  • transfer rate = (N - 1) × the transfer rate of
    one disk
  • only partial redundancy!
  • the check disk stores parity information
  • parity overhead amounts to one bit per group of
    corresponding bits in a stripe
  • redundancy overhead = 1/N
  • Oops! Byte striping means every disk is involved
    in every request!
  • No parallel reads or writes (see the sketch below)
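
A toy Python sketch of byte-level striping with a check
disk (the stripe_bytes function and its layout are our
illustration, not the controller's actual code):

    def stripe_bytes(data: bytes, n_data_disks: int):
        """Spread bytes round-robin; check disk gets each stripe's XOR."""
        disks = [bytearray() for _ in range(n_data_disks)]
        check = bytearray()
        for i in range(0, len(data), n_data_disks):
            stripe = data[i:i + n_data_disks].ljust(n_data_disks, b"\x00")
            p = 0
            for d, b in enumerate(stripe):
                disks[d].append(b)          # byte d of the stripe -> disk d
                p ^= b
            check.append(p)                 # parity byte for this stripe
        return disks, check

    # bytes A,D -> disk0; B,E -> disk1; C,F -> disk2
    disks, check = stripe_bytes(b"ABCDEF", 3)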

7
HP AutoRAID 7
  • Parity
  • parity is computed using XOR (⊕)
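
A worked example in Python: parity is the bytewise XOR
across the data disks, and XOR-ing the parity with the
surviving disks rebuilds a lost one (the function name is
ours):

    def parity(chunks):
        """XOR corresponding bytes across all chunks."""
        result = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, b in enumerate(chunk):
                result[i] ^= b
        return bytes(result)

    data = [b"\x0f\xf0", b"\xaa\x55", b"\x01\x80"]  # three data disks
    p = parity(data)                                # check-disk contents

    # Disk 1 fails: XOR the parity with the surviving disks to rebuild it.
    rebuilt = parity([p, data[0], data[2]])
    assert rebuilt == data[1]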

8
HP AutoRAID 8
  • RAID Levels
  • RAID 5 - Block level striping, parity
    interleaved
  • striping unit is 1 block: block1 → disk1,
    block2 → disk2, block3 → disk3, etc.
  • blocks of a stripe are written at the same time!
  • transfer rate = (N - 1) × the transfer rate of
    one disk
  • only partial redundancy!
  • parity information is dispersed round-robin among
    all the disks
  • same redundancy overhead as level 3, 1/N
  • Hey! Block striping can mean that every disk is
    NOT involved in a (small) request
  • parallel reads and writes can occur, depending on
    which disks store the involved blocks
  • BUT writes get slower!
  • this happened in RAID 3 too
  • read-modify-write (sketched below)
  • read the old data and old parity
  • recompute/modify the parity
  • write the new data and new parity
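
A minimal Python sketch of that small-write sequence (our
simplification; the round-robin parity placement shown is
one common convention, not necessarily AutoRAID's exact
layout). Updating one block costs two reads and two
writes, because the new parity is derived as
P_new = P_old ⊕ D_old ⊕ D_new:

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def small_write(disks, stripe, data_disk, new_data, n_disks):
        parity_disk = stripe % n_disks           # round-robin parity placement
        old_data    = disks[data_disk][stripe]   # read 1: old data
        old_parity  = disks[parity_disk][stripe] # read 2: old parity
        new_parity  = xor(xor(old_parity, old_data), new_data)
        disks[data_disk][stripe]   = new_data    # write 1: data
        disks[parity_disk][stripe] = new_parity  # write 2: parity

    # usage: 4 disks, stripe 1 keeps its parity on disk 1
    disks = [{1: b"\x00"}, {1: b"\x0f"}, {1: b"\x0f"}, {1: b"\x00"}]
    small_write(disks, stripe=1, data_disk=2, new_data=b"\xff", n_disks=4)
    assert disks[1][1] == b"\xff"       # 0x0f ^ 0x0f ^ 0xff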

9
HP AutoRAID 9
  • RAID 1 vs RAID 5
  • Reads
  • RAID 1 (mirroring)
  • always offers parallel reads
  • RAID 5
  • can only sometimes offer parallel reads
  • depends on where the needed blocks are
  • two read requests that require blocks on the
    same disk must be serialized
  • Writes
  • RAID 1 (mirroring)
  • must complete two writes before the request
    returns
  • granularity of serialization can be smaller than
    a file
  • can't do parallel writes
  • RAID 5
  • typically does read-modify-write to recompute
    parity
  • (HP AutoRAID uses a combo of read-modify-write
    and LFS!)
  • can't do parallel writes either

10
HP AutoRAID 10
  • Storage Hierarchy: HP AutoRAID
  • RAID 1: fast reads and writes, but 50%
    redundancy overhead
  • RAID 5: strong reads, slow writes, 1/N storage
    overhead
  • RAID 1 is fast but expensive, like a cache!
  • RAID 5 is slower but cheaper, like main memory!
  • Neither is optimal under all circumstances
  • SO create a hierarchy
  • use mirroring for active blocks
  • active set = blocks of regularly read and
    written files
  • use RAID 5 for inactive blocks
  • inactive set = blocks of read-only and rarely
    accessed files
  • Sounds hard!
  • Who pushes the data back and forth between the
    sets?
  • How often do you have to do it?
  • if the sets change too often, no time for
    anything else!

11
HP AutoRAID 11
  • Who Minds the Storage Hierarchy?
  • The System Administrator?
  • as long as you don't have to pay them much
  • and if they get it right all the time and don't
    make any mistakes
  • The File System?
  • if so, big plus: the File System knows better
    than anything else who is using which files
  • can best determine active and inactive sets
    based on tracking access patterns
  • BUT, there are a lot of different OSes with
    different File System options
  • that makes deployment hard
  • each File System must be modified in order to
    manage a storage hierarchy
  • An Array Controller?
  • embed the software to manage the hierarchy in
    the hardware of a controller
  • no deployment issues, just add the hardware to
    the system
  • overrules the existing File System
  • lose the ability to track access patterns
  • need a reliable and often correct policy for
    determining active/inactive sets
  • sounds like virtualization

12
HP AutoRAID 12
  • HP AutoRAID (local hard drive gets smart!)
  • the array controller's embedded software manages
    the active/inactive sets
  • application level user interface for
    configuration parameters
  • set up LUNs (virtual logical units)
  • virtualization
  • File System is out of the loop!
  • Consider Mapping
  • the File System thinks it is addressing the
    blocks of a particular file
  • doesn't know the file is actually in a storage
    hierarchy
  • is the requested file in the active set?
  • or the inactive set?
  • which disk is it on?
  • need some mapping between what the file system
    sees and where the data actually resides on disk

13
HP AutoRAID 13
  • Virtual to Physical Mapping
  • Physically
  • the array is structured by an address hierarchy
  • PEGs (Physical Extent Groups) contain 3 or more
    PEXes
  • PEXes (Physical EXtents) are typically 1MB of
    contiguous disk space
  • each PEX is divided into 128KB segments of
    contiguous sectors
  • a segment holds 2 Relocation Blocks (RBs), so an
    RB is 64KB
  • Relocation Blocks serve as the
  • striping unit in RAID 5, the mirroring unit in
    RAID 1,
  • and as the unit of migration between active and
    inactive sets
  • Virtually, the File System sees
  • LUNs (Logical Units)
  • purely virtual: no superblock, no directory, not
    a partition
  • rather, a LUN is a set of RBs that get mapped to
    physical segments when actually used
  • user can create as many LUNs as they want
  • Each LUN has a virtual device table that holds
    the list of RBs assigned to it

14
HP AutoRAID 14
  • Mapping
  • if RB3 migrates from inactive to active, simply
    update the PEX mapping in the PEG table that maps
    RB3 (a sketch of such a map follows)
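
A minimal Python sketch of the virtual-to-physical map
(class and field names are our invention, not HP's data
structures). Migration copies the RB's 64KB payload and
then rewrites a single table entry, so the virtual address
the File System uses never changes:

    MIRRORED, RAID5 = "mirrored", "raid5"

    class RBEntry:
        def __init__(self, peg, pex, segment, storage_class):
            self.peg, self.pex, self.segment = peg, pex, segment
            self.storage_class = storage_class

    class LUN:
        def __init__(self):
            self.table = {}                 # virtual RB number -> RBEntry

        def migrate(self, rb_no, peg, pex, segment, storage_class):
            # After the RB's payload is copied to its new home
            # (copy not shown), update the map in place.
            e = self.table[rb_no]
            e.peg, e.pex, e.segment = peg, pex, segment
            e.storage_class = storage_class

    # usage: RB3 is promoted from a RAID 5 PEG into a mirrored PEG
    lun = LUN()
    lun.table[3] = RBEntry(peg=9, pex=2, segment=5, storage_class=RAID5)
    lun.migrate(3, peg=1, pex=0, segment=7, storage_class=MIRRORED)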

15
HP AutoRAID 15
  • How cool is that?
  • What you can do when you're not in control
    anymore…
  • Hot-pluggable disks
  • take one out and RAID immediately begins
    regenerating missing data
  • or, if one fails, activate a spare, if available
  • array still functions, no down time
  • requests for missing data are given top priority
    for regeneration
  • Create a larger array on the fly
  • size of array is limited to the size of the
    smallest disk
  • so take a small disk out and put a larger disk
    in
  • systematically replace all disks, one by one,
    letting each regenerate
  • when the last bigger disk goes in, the array is
    automatically larger

16
HP AutoRAID 16
  • HP AutoRAID Read and Write Operations
  • RAID 1 Mirrored Storage Class
  • normal RAID Level 1 reads and writes
  • 2 reads can happen in parallel
  • a write is serialized (at the segment level)
    between the two disks
  • both updates must complete before request
    returns (remember the overhead!)
  • RAID 5 Storage Class
  • reads are processed as normal RAID 5 read
    operations
  • reads are parallel if possible
  • writes are log structured
  • when they happen is more complicated
  • RAID 5 Writes happen for 1 of 3 reasons
  • a File System request tries to write data at
    RAID 5
  • results in promotion of requested data to active
    set
  • (no actual write happens at RAID 5 in this case)
  • the Mirrored storage class runs out of space
  • so data is demoted from active to inactive: RBs
    are copied from the mirrored class down to RAID 5
    (see the sketch below)
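
A self-contained Python sketch of this promotion-on-write
behavior (the Hierarchy class and the LRU demotion choice
are our simplification, not necessarily HP's exact policy):

    from collections import OrderedDict

    class Hierarchy:
        def __init__(self, mirrored_capacity):
            self.capacity = mirrored_capacity
            self.mirrored = OrderedDict()   # rb_no -> data, LRU order
            self.raid5 = {}                 # rb_no -> data

        def write(self, rb_no, data):
            if rb_no not in self.mirrored:           # promotion needed
                self.raid5.pop(rb_no, None)          # no write happens at RAID 5
                if len(self.mirrored) >= self.capacity:
                    old, old_data = self.mirrored.popitem(last=False)
                    self.raid5[old] = old_data       # demotion: copy an RB down
            self.mirrored[rb_no] = data              # normal RAID 1 write
            self.mirrored.move_to_end(rb_no)

    h = Hierarchy(mirrored_capacity=2)
    for rb in [1, 2, 3]:
        h.write(rb, b"...")
    assert 1 in h.raid5      # RB1 was demoted to make room for RB3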

17
HP AutoRAID 17
  • Holes, Cleaning, and Garbage Collection
  • Holes come from
  • demotion of RBs from active to inactive leaves
    holes in PEXs of mirrored class
  • holes are managed as a free list
  • promotion of RBs from inactive to active leaves
    holes in PEXs of RAID 5
  • by the way, RAID 5 in HP AutoRAID uses LFS
  • so holes must be garbage collected
  • Cleaning
  • plug the holes
  • RBs are migrated between PEGs to fill some,
    empty others
  • cleaning mirrored class frees up PEGs to
    accommodate bursts or to give to RAID 5
  • cleaning RAID 5 is an alternative to garbage
    collection
  • Garbage Collection
  • normal LFS garbage collection
  • or can be hole-plugging garbage collection to
    fill/free PEGs (sketched below)
  • this performs much better, reducing garbage
    collection work by up to 90%!
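
A Python sketch of the hole-plugging idea (the PEG fields
and plug_holes are illustrative names): rather than copying
every live RB out LFS-style, drain a nearly-empty PEG into
the holes of other PEGs, freeing it with far fewer copies:

    from dataclasses import dataclass, field

    @dataclass
    class PEG:
        live_rbs: list = field(default_factory=list)  # RBs still in use
        holes: int = 0                                # free RB slots

    def plug_holes(source, candidates):
        """Try to empty `source` by moving its live RBs into holes elsewhere."""
        for rb in list(source.live_rbs):
            target = next((p for p in candidates if p.holes > 0), None)
            if target is None:
                return False                 # not enough holes elsewhere
            target.live_rbs.append(rb)       # copy the RB into the hole
            target.holes -= 1
            source.live_rbs.remove(rb)
            source.holes += 1
        return True                          # source PEG can now be freed

    # usage: one nearly-empty PEG drains into a nearly-full one
    a, b = PEG(live_rbs=["rb7"], holes=3), PEG(live_rbs=["rb1", "rb2"], holes=1)
    assert plug_holes(a, [b]) and a.live_rbs == []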

18
HP AutoRAID 18
  • Performance
  • depends most on how much of the active set fits
    into the mirrored class
  • if it all fits, then RAID 5 goes unused;
    performance is that of a RAID 1 array
  • tested OLTP against a slower RAID array and JBOD
  • JBOD = just a bunch of disks, striped, no
    redundancy (so it performs the best!)
  • tested with all of the active set fitting in the
    Mirrored Storage class
  • so no migration overhead
  • AutoRAID lags due to redundancy overhead
  • tested performance for different %s of the active
    set at the mirrored level
  • more disks → a higher % fits in the Mirrored
    Storage Class
  • obviously performance rises with a higher %
    because there is less migration
  • interesting to note: at 8 drives, when all of the
    active set fits,
  • performance still rises because the transfer rate
    is increasing, with more disks to write to

[Figure: OLTP transaction rate for slow RAID, HP AutoRAID,
and JBOD]
[Figure: transaction rate as the number of disks in
AutoRAID increases]
19
HP AutoRAID 19
  • Can the File System help?
  • File System sees virtual disk,
  • probably has its own ideas of how best to lay
    out data blocks to optimize access
  • perhaps by assigning RBs of a LUN to a linear
    set of contiguous blocks
  • BUT are they really going to be contiguous?
  • in the array, RBs can be mapped anywhere and
    most likely are not stored linearly
  • so does this make seek times really bad?
  • ran tests where they initially set up array
  • with all RBs laid out completely linearly
  • with all RBs laid out completely randomly
  • Resulted in only modest performance gains for
    initial linear layout
  • note there is no way to migrate data between
    sets and maintain a linear layout…
  • Conclusion
  • the 64KB RB allocation block may sound big, but
    works just fine
  • remember, large block sizes amortize seek times

20
HP AutoRAID 20
  • Mirrored Storage Class Read Selection Algorithm
  • which copy should be read?
  • possibilities
  • strict alternation
  • keep one disk head on the outer track, the other
    on the inner
  • read from the disk with the shortest queue
  • read from the disk with the shortest seek time
  • strict alternation and inner/outer can give big
    benefits under certain workloads
  • AND can really punish under other workloads
  • shortest queue and shortest seek time yield the
    same modest gain
  • but it is hard to track the shortest seek time
  • so shortest queue wins (sketched below)
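
A one-function Python sketch of the shortest-queue policy
(pick_copy is our name): send each read to whichever
mirror copy currently has the fewest pending requests,
since queue length is cheap to track and a decent proxy
for how busy each disk is:

    def pick_copy(queues):
        """queues: one list of pending requests per mirror copy."""
        return min(range(len(queues)), key=lambda i: len(queues[i]))

    # usage: disk 0 has 3 requests queued, disk 1 has 1 -> read from disk 1
    assert pick_copy([["r1", "r2", "r3"], ["r4"]]) == 1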

21
HP AutoRAID 21
  • Conclusion
  • redundancy protects from data loss due to
    hardware failure
  • different striping units and levels of
    redundancy result in different performance
  • performance depends on type of workload
  • redundancy also introduces overhead
  • 50% for mirroring
  • reduce redundancy overhead by using a storage
    hierarchy
  • implementing different RAID levels for active
    and inactive data
  • storage hierarchy managed by an array controller
  • management software embedded onto hardware
    controller
  • special mapping virtualizes the array
  • File System sees one (or more) virtual logical
    units