ECE 6160: Advanced Computer Networks - Disk Arrays (slide transcript)

1
ECE 6160: Advanced Computer Networks - Disk Arrays
  • Instructor: Dr. Xubin (Ben) He
  • Email: hexb@tntech.edu
  • Tel: 931-372-3462
  • Course web: http://www.ece.tntech.edu/hexb/616f05

2
Previously
  • Disks
  • Tapes

3
Rotational Media
[Disk diagram labels: platter, head, arm, track, sector, cylinder]
Access time = seek time + rotational delay + transfer time + overhead
  • Seek time: 5-15 milliseconds to move the disk arm and settle on a cylinder
  • Rotational delay: 8 milliseconds for a full rotation at 7200 RPM, about 4 ms average delay
  • Transfer time: 1 millisecond for an 8 KB block at 8 MB/s
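As a rough check on these numbers, the sketch below models the access time of
an 8 KB read in Python; the function name, the controller-overhead figure, and
the default parameter values are illustrative assumptions, not from the slides.

    # Simple disk access time model (times in milliseconds)
    def access_time_ms(seek_ms=10.0, rpm=7200, block_kb=8,
                       transfer_mb_per_s=8.0, overhead_ms=0.5):
        rotational_delay_ms = 0.5 * (60_000 / rpm)             # average = half a rotation
        transfer_ms = (block_kb / 1024) / transfer_mb_per_s * 1000
        return seek_ms + rotational_delay_ms + transfer_ms + overhead_ms

    # access_time_ms() -> about 10 + 4.2 + 1.0 + 0.5 = 15.6 ms; seek time dominates.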
4
Disk Operations
  • Seek: move head to track
  • Rotation: wait for sector under head
  • Transfer: move data to/from disks
  • Overhead
  • Controller delay
  • Queuing delay

Access time = seek time + rotational delay + transfer time + overhead
5
Improving disk performance.
  • Use large sectors to improve bandwidth
  • Use track caches and read ahead
  • Read entire track into on-controller cache
  • Exploit locality (improves both latency and BW)
  • Design file systems to maximize locality
  • Allocate files sequentially on disks (exploit
    track cache)
  • Locate similar files in same cylinder (reduce
    seeks)
  • Locate similar files in nearby cylinders (reduce
    seek distance)
  • Pack bits closer together to improve transfer
    rate and density.
  • Use a collection of small disks to form a large,
    high-performance one --> disk array
  • Striping data across multiple disks allows
    parallel I/O, thus improving performance.

6
Use Arrays of Small Disks?
  • Katz and Patterson asked in 1987
  • Can smaller disks be used to close gap in
    performance between disks and CPUs?

[Figure: conventional designs use four disk sizes (14", 10", 5.25", 3.5")
from high end to low end; the disk array uses a single 3.5" disk design]
7
Replace Small Number of Large Disks with Large
Number of Small Disks! (1988 Disks)

              IBM 3390K      x70 disk array   IBM 3.5" 0061
  Capacity    20 GBytes      23 GBytes        320 MBytes
  Volume      97 cu. ft.     11 cu. ft.       0.1 cu. ft.
  Power       3 KW           1 KW             11 W
  Data Rate   15 MB/s        110 MB/s         1.5 MB/s
  I/O Rate    600 I/Os/s     3900 I/Os/s      55 I/Os/s
  MTTF        250 KHrs       ??? Hrs          50 KHrs
  Cost        $250K          $150K            $2K

Versus the IBM 3390K, the x70 array wins by 9X on volume, 3X
on power, 8X on data rate, and 6X on I/O rate.
Disk Arrays have potential for large data and I/O
rates, high MB per cu. ft., high MB per KW, but
what about reliability?
8
Array Reliability
  • MTTF (Mean Time To Failure): the average time that a
    non-repairable component will operate before
    experiencing failure.
  • Reliability of N disks = Reliability of 1 disk / N
  • 50,000 hours / 70 disks = about 700 hours
  • Disk system MTTF drops from 6 years to 1
    month!
  • Arrays without redundancy are too unreliable to be
    useful!
  • Solution: redundancy.
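A one-line back-of-the-envelope check of that figure, assuming independent
failures so that array MTTF is simply disk MTTF divided by the disk count:

    # Array MTTF under independent failures: MTTF_array = MTTF_disk / N
    disk_mttf_hours = 50_000
    n_disks = 70
    print(disk_mttf_hours / n_disks)   # about 714 hours, i.e. roughly one month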

9
Redundant Arrays of (Inexpensive) Disks
  • Replicate data over several disks so that no data
    will be lost if one disk fails.
  • Redundancy yields high data availability
  • Availability: service is still provided to the user,
    even if some components have failed
  • Disks will still fail
  • Contents reconstructed from data redundantly
    stored in the array
  • --> Capacity penalty to store redundant info
  • --> Bandwidth penalty to update redundant info

11
Levels of RAID
  • The original RAID paper described five categories
    (RAID levels 1-5). (Patterson et al., A case for
    redundant arrays of inexpensive disks (RAID),
    ACM SIGMOD, 1988)
  • Disk striping with no redundancy is now called
    RAID 0 or JBOD (just a bunch of disks).
  • Other kinds have been proposed in the literature:
  • Level 6 (P+Q redundancy), Level 10, RAID 53,
    etc.
  • Except for RAID 0, all the RAID levels trade disk
    capacity for reliability, and the extra
    reliability makes parallelism a practical way to
    improve performance.

12
RAID 0 Nonredundant (JBOD)
  • High I/O performance.
  • Data is not saved redundantly.
  • Single copy of data is striped across multiple
    disks.
  • Low cost.
  • Lack of redundancy.
  • Least reliable: a single disk failure leads to data
    loss.
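A minimal sketch of how RAID 0 striping can map a logical block number to a
(disk, block-within-disk) pair; the function name and the stripe-unit parameter
are illustrative, not part of the slide.

    # RAID 0 address mapping: round-robin stripe units across the disks.
    def raid0_map(logical_block, n_disks, stripe_unit_blocks=1):
        unit = logical_block // stripe_unit_blocks      # which stripe unit overall
        disk = unit % n_disks                           # disk holding that unit
        offset_in_unit = logical_block % stripe_unit_blocks
        block_in_disk = (unit // n_disks) * stripe_unit_blocks + offset_in_unit
        return disk, block_in_disk

    # With 4 disks and 1-block units, blocks 0..3 land on disks 0..3 and block 4
    # wraps back to disk 0, so a large sequential read hits all disks in parallel.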

13
Redundant Arrays of Inexpensive Disks: RAID 1
Disk Mirroring/Shadowing
[Diagram: each disk paired with its mirror in a recovery group]
  • Each disk is fully duplicated onto its mirror
  • Very high availability can be achieved
  • Bandwidth sacrifice on write: one logical write =
    two physical writes
  • Reads may be optimized to minimize queue and disk
    search time
  • Most expensive solution: 100% capacity overhead
Targeted for high I/O rate, high-availability
environments
14
RAID 2 Memory-Style ECC
[Diagram: data disks plus multiple ECC disks and a parity disk]
  • Multiple disks record the ECC information needed to
    determine which disk is at fault
  • A parity disk is then used to reconstruct
    corrupted or lost data
  • Needs log2(number of disks) redundancy disks

15
RAID 3 Bit (Byte) Interleaved Parity
  • Only need one parity disk
  • Write/Read accesses all disks
  • Only one request can be serviced at a time
  • Easy to implement
  • Provides high bandwidth but not high I/O rates

Targeted for high-bandwidth applications:
multimedia, image processing
16
RAID 3
  • Sum computed across recovery group to protect
    against hard disk failures, stored in P disk
  • Logically, a single high capacity, high transfer
    rate disk good for large transfers
  • Wider arrays reduce capacity costs but decrease
    availability
  • 12.5% capacity cost for parity in this
    configuration

Inspiration for RAID 4
  • RAID 3 relies on parity disk to discover errors
    on Read
  • But every sector has an error detection field
  • Rely on error detection field to catch errors on
    read, not on the parity disk
  • Allows independent reads to different disks
    simultaneously

17
RAID 4: Block-Interleaved Parity
  • Blocks = striping units
  • Allows parallel access by multiple I/O
    requests, giving high I/O rates
  • Doing multiple small reads is now faster than
    before (small read requests can be restricted
    to a single disk).
  • Large writes (full stripe) update the parity as
    P = d0 XOR d1 XOR d2 XOR d3
  • Small writes (e.g., a write to d0) update the
    parity as
    P' = d0' XOR d1 XOR d2 XOR d3 = P XOR d0 XOR d0'
  • However, writes are still slow, since the parity
    disk is the bottleneck.
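A minimal sketch of those two parity updates over byte blocks, using XOR; the
function names are illustrative.

    # Full-stripe write: parity is the XOR of all data blocks in the stripe.
    def full_stripe_parity(blocks):
        p = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                p[i] ^= byte
        return bytes(p)

    # Small write: new parity = old parity XOR old data XOR new data,
    # so only the target disk and the parity disk need to be touched.
    def small_write_parity(old_parity, old_data, new_data):
        return bytes(p ^ od ^ nd
                     for p, od, nd in zip(old_parity, old_data, new_data))

This identity is exactly why the single parity disk of RAID 4 becomes the
bottleneck: every small write, to any data disk, must also update it.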

18
Problems of Disk Arrays: Small Writes
(read-modify-write procedure)
RAID-5 Small Write Algorithm:
1 logical write = 2 physical reads + 2 physical
writes
[Diagram: to replace D0 with D0' in a stripe D0 D1 D2 D3 P,
(1) read the old data D0, (2) read the old parity P,
XOR old data, new data, and old parity to form P',
then (3) write D0' and (4) write P', leaving D0' D1 D2 D3 P']
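A sketch of that four-I/O sequence, modeling each disk as a Python list of
equal-length bytes blocks; the function name is illustrative.

    # RAID-5 small write: 1 logical write -> 2 physical reads + 2 physical writes.
    def raid5_small_write(data_disk, parity_disk, block_no, new_data):
        old_data = data_disk[block_no]          # (1) read old data
        old_parity = parity_disk[block_no]      # (2) read old parity
        new_parity = bytes(p ^ od ^ nd
                           for p, od, nd in zip(old_parity, old_data, new_data))
        data_disk[block_no] = new_data          # (3) write new data
        parity_disk[block_no] = new_parity      # (4) write new parity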
19
Inspiration for RAID 5
  • RAID 4 works well for small reads
  • Small writes (write to one disk)
  • Option 1: read the other data disks, create the new
    sum, and write it to the parity disk
  • Option 2: since P holds the old sum, compare old data
    to new data and add the difference to P
  • Small writes are limited by the parity disk: writes to
    D0 and D5 both also write to the P disk. The parity
    disk must be updated on every write operation!

20
Redundant Arrays of Inexpensive Disks: RAID 5
High I/O Rate, Interleaved Parity

Logical disk addresses increase left to right, top to bottom,
and the parity block rotates across the five disk columns:

    D0    D1    D2    D3    P
    D4    D5    D6    P     D7
    D8    D9    P     D10   D11
    D12   P     D13   D14   D15
    P     D16   D17   D18   D19
    D20   D21   D22   D23   P
    ...   ...   ...   ...   ...

Independent writes are possible because of the interleaved parity.
Example: writes to D0 and D5 use disks 0, 1, 3, and 4.
21
RAID 5: Block-Interleaved Distributed Parity
Left-symmetric distribution
  • Parity disk = (block number / 4) mod 5
  • Eliminates the parity-disk bottleneck of RAID 4
  • Best small-read, large-read, and large-write
    performance
  • Can correct any single self-identifying failure
  • Small logical writes take two physical reads and
    two physical writes.
  • Recovery requires reading all non-failed disks
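Taking the placement rule on this slide at face value for a 5-disk array (four
data blocks per stripe), a minimal sketch of where a block's parity lives; the
function name is illustrative.

    # Parity placement per the slide's rule: parity disk = (block number / 4) mod 5
    def parity_disk(block_number, n_disks=5):
        stripe = block_number // (n_disks - 1)   # 4 data blocks per stripe
        return stripe % n_disks

    # Blocks 0-3 map to parity disk 0, blocks 4-7 to parity disk 1, and so on,
    # so parity updates rotate across all five disks instead of hitting one.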

22
RAID 6: P+Q Redundancy
  • An extension to RAID 5, but with two-dimensional
    parity.
  • Each row has a P parity and a Q parity.
  • (Reed-Solomon codes)
  • Has extremely high data fault tolerance and
    can sustain multiple simultaneous drive
    failures
  • Rarely implemented

For more information, please see the paper "A
Tutorial on Reed-Solomon Coding for Fault
Tolerance in RAID-like Systems".
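A minimal sketch of one common way to compute P and Q for a stripe: P as a plain
XOR and Q as a Reed-Solomon-style sum of the data blocks weighted by powers of a
generator in GF(2^8). The field polynomial (0x11d) and generator (2) are common
choices assumed here, not values given on the slides.

    # Multiply two bytes in GF(2^8) modulo the assumed field polynomial.
    def gf_mul(a, b, poly=0x11d):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= poly
            b >>= 1
        return r

    # P = d0 ^ d1 ^ ...;  Q = g^0*d0 + g^1*d1 + ... over GF(2^8), with g = 2.
    def pq_parity(blocks):
        p = bytearray(len(blocks[0]))
        q = bytearray(len(blocks[0]))
        coeff = 1
        for block in blocks:
            for i, byte in enumerate(block):
                p[i] ^= byte
                q[i] ^= gf_mul(coeff, byte)
            coeff = gf_mul(coeff, 2)
        return bytes(p), bytes(q)

With two independent checksums per stripe, any two missing blocks can be solved
for, which is what lets RAID 6 survive two simultaneous disk failures.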
23
Comparison of RAID Levels (N disks, each with
capacity of C)
24
Implementation Considerations
  • Avoiding Stale Data
  • Regenerating Parity after a System Crash
  • Operating with a Failed Disk
  • Orthogonal RAID
  • Striping Unit Size
  • Other RAID Improvement Techniques

25
Avoiding Stale Data
  • Maintain a bit-vector to indicate the validity of
    each logical sector.
  • Avoid Reading Stale Data
  • When a disk fails, the corresponding logical
    sectors must be marked invalid before any read
    access.
  • Avoid Creating Stale Data
  • When an invalid sector has been reconstructed
    onto a spare disk, the corresponding logical
    sectors must be marked valid before any write
    access.
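A toy illustration of that bookkeeping, using a Python set of invalid logical
sectors to stand in for the slide's bit-vector; the helper names are made up.

    # Stale-data bookkeeping: a set standing in for the per-sector validity bit-vector.
    invalid_sectors = set()

    def on_disk_failure(failed_sectors):
        invalid_sectors.update(failed_sectors)    # mark invalid before allowing reads

    def on_sector_reconstructed(sector):
        invalid_sectors.discard(sector)           # mark valid before allowing reuse by writes

    def check_read(sector):
        if sector in invalid_sectors:
            raise IOError("sector is stale; reconstruct it first")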

26
Regenerating Parity after a System Crash
  • Hardware RAID system
  • Before servicing any write request, the
    corresponding parity sectors must be marked
    inconsistent.
  • When bringing the system up after a crash,
    all inconsistent parity sectors must be
    regenerated.
  • Periodically mark parity sectors as consistent
    to avoid having to regenerate a large number of
    parity sectors after each crash.
  • Software RAID system
  • A simple solution
  • Mark the corresponding parity sectors as
    inconsistent before each write operation, and
    mark them consistent after the write operation.
  • A more practical solution
  • Maintain a most recently used pool that keeps
    track of a fixed number of inconsistent parity
    sectors on stable storage.

27
Operating with a Failed Disk
  • A disk array operating with a failed disk can
    potentially lose data in the event of a system
    crash. Therefore, we need to perform some form of
    logging on every write operation to prevent the
    loss. Two elegant methods:
  • Demand reconstruction
  • Requires stand-by disks
  • Any write access to a parity stripe with an
    invalid sector triggers reconstruction of the
    appropriate data immediately onto the spare
    disks.
  • Parity sparing
  • Does not need stand-by disks but requires
    additional metadata
  • Use spares to make smaller disk arrays
  • Smaller arrays mean higher reliability and faster
    reconstruction.
  • On a disk failure, merge smaller arrays into a
    larger one
  • For more information, please see the paper
  • Failure Evaluation of Disk Array Organization

28
Orthogonal RAID
[Diagram: two options for arranging error-correction groups across the disks]
29
Striping Unit in RAID 5
  • S = optimal striping unit
  • N = the number of disks
  • S increases as N increases for read-intensive
    workloads
  • S decreases as N increases for write-intensive
    workloads
  • S is independent of N for an unspecified mix of
    reads and writes
  • Recommended stripe-unit size:
  • S = 1/2 x average disk positioning time x disk
    transfer rate

For more information, see the paper
P.M. Chen, "Striping in a RAID Level 5 Disk
Array", ACM, 1995
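A quick worked example of that rule of thumb, using illustrative numbers (10 ms
average positioning time, 8 MB/s transfer rate) rather than values taken from
the slide.

    # Rule of thumb: S = 1/2 * average positioning time * disk transfer rate
    avg_positioning_s = 0.010          # 10 ms seek + rotation (assumed)
    transfer_rate_bytes = 8_000_000    # 8 MB/s (assumed)
    stripe_unit_bytes = 0.5 * avg_positioning_s * transfer_rate_bytes
    print(stripe_unit_bytes)           # 40,000 bytes, i.e. roughly a 40 KB stripe unit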
30
Other RAID Improvement Techniques
  • Improving small write performance of RAID 5
  • Buffering and caching
  • Floating Parity
  • Shorten the read-modify-write of parity updating
    to nearly a single disk access time on average
  • Basic idea: the new parity block is written
    on the rotationally nearest unallocated block
    following the old parity block.
  • Declustered Parity
  • Distributing the increased load caused by disk
    failures uniformly over all disks
  • Basic idea: construct a RAID system with multiple,
    overlapping parity groups.

31
Other RAIDs
  • HP AutoRAID
  • AFRAID
  • RAPID
  • SwiftRAID
  • TickerTAIP
  • SMDA

32
Berkeley History: RAID-I
  • RAID-I (1989)
  • Consisted of a Sun 4/280 workstation with 128 MB
    of DRAM, four dual-string SCSI controllers, 28
    5.25-inch SCSI disks and specialized disk
    striping software
  • Today RAID is a $19 billion industry; 80% of
    non-PC disks are sold in RAIDs

33
RAID Techniques: goal was performance, popularity
due to reliability of storage

Disk Mirroring / Shadowing (RAID 1)
  • Each disk is fully duplicated onto its "shadow"
  • Logical write = two physical writes
  • 100% capacity overhead

Parity Data Bandwidth Array (RAID 3)
  • Parity computed horizontally
  • Logically a single high-data-bandwidth disk

High I/O Rate Parity Array (RAID 5)
  • Interleaved parity blocks
  • Independent reads and writes
  • Logical write = 2 reads + 2 writes

[Diagram: example bit patterns for mirrored data and horizontally computed parity]
34
RAID 0 Striped Disk Array without Fault
Tolerance
RAID Level 0 requires a minimum of 2 drives to
implement