ECE 6160: Advanced Computer Networks - Disk Arrays (slide transcript)

1
ECE 6160: Advanced Computer Networks - Disk Arrays
  • Instructor: Dr. Xubin (Ben) He
  • Email: hexb@tntech.edu
  • Tel: 931-372-3462
  • Course web: http://www.ece.tntech.edu/hexb/616f05

2
Previously
  • Disks
  • Tapes

3
Rotational Media
[Disk diagram labels: platter, head, arm, track, sector, cylinder]
Access time = seek time + rotational delay + transfer time + overhead
  • Seek time: 5-15 milliseconds to move the disk arm and settle on a cylinder
  • Rotational delay: 8 milliseconds for a full rotation at 7200 RPM, about 4 ms average delay
  • Transfer time: 1 millisecond for an 8 KB block at 8 MB/s
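As a rough check on these numbers, the sketch below models the access time of
an 8 KB read in Python; the function name, the controller-overhead figure, and
the default parameter values are illustrative assumptions, not from the slides.

    # Simple disk access time model (times in milliseconds)
    def access_time_ms(seek_ms=10.0, rpm=7200, block_kb=8,
                       transfer_mb_per_s=8.0, overhead_ms=0.5):
        rotational_delay_ms = 0.5 * (60_000 / rpm)             # average = half a rotation
        transfer_ms = (block_kb / 1024) / transfer_mb_per_s * 1000
        return seek_ms + rotational_delay_ms + transfer_ms + overhead_ms

    # access_time_ms() -> about 10 + 4.2 + 1.0 + 0.5 = 15.6 ms; seek time dominates.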
4
Disk Operations
  • Seek: move head to track
  • Rotation: wait for sector under head
  • Transfer: move data to/from disks
  • Overhead
  • Controller delay
  • Queuing delay

Access time = seek time + rotational delay + transfer time + overhead
5
Improving disk performance.
  • Use large sectors to improve bandwidth
  • Use track caches and read ahead
  • Read entire track into on-controller cache
  • Exploit locality (improves both latency and BW)
  • Design file systems to maximize locality
  • Allocate files sequentially on disks (exploit
    track cache)
  • Locate similar files in same cylinder (reduce
    seeks)
  • Locate similar files in nearby cylinders (reduce
    seek distance)
  • Pack bits closer together to improve transfer
    rate and density.
  • Use a collection of small disks to form a large,
    high-performance one --> disk array
  • Striping data across multiple disks allows
    parallel I/O, thus improving performance.

6
Use Arrays of Small Disks?
  • Katz and Patterson asked in 1987
  • Can smaller disks be used to close gap in
    performance between disks and CPUs?

[Figure: conventional designs use four disk sizes (14", 10", 5.25", 3.5")
from high end to low end; the disk array uses a single 3.5" disk design]
7
Replace Small Number of Large Disks with Large
Number of Small Disks! (1988 Disks)

              IBM 3390K      x70 disk array   IBM 3.5" 0061
  Capacity    20 GBytes      23 GBytes        320 MBytes
  Volume      97 cu. ft.     11 cu. ft.       0.1 cu. ft.
  Power       3 KW           1 KW             11 W
  Data Rate   15 MB/s        110 MB/s         1.5 MB/s
  I/O Rate    600 I/Os/s     3900 I/Os/s      55 I/Os/s
  MTTF        250 KHrs       ??? Hrs          50 KHrs
  Cost        $250K          $150K            $2K

Versus the IBM 3390K, the x70 array wins by 9X on volume, 3X
on power, 8X on data rate, and 6X on I/O rate.
Disk Arrays have potential for large data and I/O
rates, high MB per cu. ft., high MB per KW, but
what about reliability?
8
Array Reliability
  • MTTF (Mean Time To Failure): the average time that a
    non-repairable component will operate before
    experiencing failure.
  • Reliability of N disks = Reliability of 1 disk / N
  • 50,000 hours / 70 disks = about 700 hours
  • Disk system MTTF drops from 6 years to 1
    month!
  • Arrays without redundancy are too unreliable to be
    useful!
  • Solution: redundancy.
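A one-line back-of-the-envelope check of that figure, assuming independent
failures so that array MTTF is simply disk MTTF divided by the disk count:

    # Array MTTF under independent failures: MTTF_array = MTTF_disk / N
    disk_mttf_hours = 50_000
    n_disks = 70
    print(disk_mttf_hours / n_disks)   # about 714 hours, i.e. roughly one month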

9
Redundant Arrays of (Inexpensive) Disks
  • Replicate data over several disks so that no data
    will be lost if one disk fails.
  • Redundancy yields high data availability
  • Availability: service is still provided to the user,
    even if some components have failed
  • Disks will still fail
  • Contents reconstructed from data redundantly
    stored in the array
  • --> Capacity penalty to store redundant info
  • --> Bandwidth penalty to update redundant info

11
Levels of RAID
  • The original RAID paper described five categories
    (RAID levels 1-5). (Patterson et al., A case for
    redundant arrays of inexpensive disks (RAID),
    ACM SIGMOD, 1988)
  • Disk striping with no redundancy is now called
    RAID 0 or JBOD (just a bunch of disks).
  • Other kinds have been proposed in the literature:
  • Level 6 (P+Q redundancy), Level 10, RAID 53,
    etc.
  • Except for RAID 0, all the RAID levels trade disk
    capacity for reliability, and the extra
    reliability makes parallelism a practical way to
    improve performance.

12
RAID 0 Nonredundant (JBOD)
  • High I/O performance.
  • Data is not saved redundantly.
  • Single copy of data is striped across multiple
    disks.
  • Low cost.
  • Lack of redundancy.
  • Least reliable: a single disk failure leads to data
    loss.
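A minimal sketch of how RAID 0 striping can map a logical block number to a
(disk, block-within-disk) pair; the function name and the stripe-unit parameter
are illustrative, not part of the slide.

    # RAID 0 address mapping: round-robin stripe units across the disks.
    def raid0_map(logical_block, n_disks, stripe_unit_blocks=1):
        unit = logical_block // stripe_unit_blocks      # which stripe unit overall
        disk = unit % n_disks                           # disk holding that unit
        offset_in_unit = logical_block % stripe_unit_blocks
        block_in_disk = (unit // n_disks) * stripe_unit_blocks + offset_in_unit
        return disk, block_in_disk

    # With 4 disks and 1-block units, blocks 0..3 land on disks 0..3 and block 4
    # wraps back to disk 0, so a large sequential read hits all disks in parallel.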

13
Redundant Arrays of Inexpensive Disks: RAID 1
Disk Mirroring/Shadowing
[Diagram: each disk paired with its mirror in a recovery group]
  • Each disk is fully duplicated onto its mirror
  • Very high availability can be achieved
  • Bandwidth sacrifice on write: one logical write =
    two physical writes
  • Reads may be optimized to minimize queue and disk
    search time
  • Most expensive solution: 100% capacity overhead
Targeted for high I/O rate, high-availability
environments
14
RAID 2 Memory-Style ECC
[Diagram: data disks plus multiple ECC disks and a parity disk]
  • Multiple disks record the ECC information needed to
    determine which disk is at fault
  • A parity disk is then used to reconstruct
    corrupted or lost data
  • Needs log2(number of disks) redundancy disks

15
RAID 3 Bit (Byte) Interleaved Parity
  • Only need one parity disk
  • Write/Read accesses all disks
  • Only one request can be serviced at a time
  • Easy to implement
  • Provides high bandwidth but not high I/O rates

Targeted for high-bandwidth applications:
multimedia, image processing
16
RAID 3
  • Sum computed across recovery group to protect
    against hard disk failures, stored in P disk
  • Logically, a single high capacity, high transfer
    rate disk good for large transfers
  • Wider arrays reduce capacity costs but decrease
    availability
  • 12.5% capacity cost for parity in this
    configuration

Inspiration for RAID 4
  • RAID 3 relies on parity disk to discover errors
    on Read
  • But every sector has an error detection field
  • Rely on error detection field to catch errors on
    read, not on the parity disk
  • Allows independent reads to different disks
    simultaneously

17
RAID 4: Block-Interleaved Parity
  • Blocks = striping units
  • Allows parallel access by multiple I/O
    requests, giving high I/O rates
  • Doing multiple small reads is now faster than
    before (small read requests can be restricted
    to a single disk).
  • Large writes (full stripe) update the parity as
    P = d0 XOR d1 XOR d2 XOR d3
  • Small writes (e.g., a write to d0) update the
    parity as
    P' = d0' XOR d1 XOR d2 XOR d3 = P XOR d0 XOR d0'
  • However, writes are still slow, since the parity
    disk is the bottleneck.
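A minimal sketch of those two parity updates over byte blocks, using XOR; the
function names are illustrative.

    # Full-stripe write: parity is the XOR of all data blocks in the stripe.
    def full_stripe_parity(blocks):
        p = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                p[i] ^= byte
        return bytes(p)

    # Small write: new parity = old parity XOR old data XOR new data,
    # so only the target disk and the parity disk need to be touched.
    def small_write_parity(old_parity, old_data, new_data):
        return bytes(p ^ od ^ nd
                     for p, od, nd in zip(old_parity, old_data, new_data))

This identity is exactly why the single parity disk of RAID 4 becomes the
bottleneck: every small write, to any data disk, must also update it.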

18
Problems of Disk Arrays: Small Writes
(read-modify-write procedure)
RAID-5 Small Write Algorithm:
1 logical write = 2 physical reads + 2 physical
writes
[Diagram: to replace D0 with D0' in a stripe D0 D1 D2 D3 P,
(1) read the old data D0, (2) read the old parity P,
XOR old data, new data, and old parity to form P',
then (3) write D0' and (4) write P', leaving D0' D1 D2 D3 P']
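A sketch of that four-I/O sequence, modeling each disk as a Python list of
equal-length bytes blocks; the function name is illustrative.

    # RAID-5 small write: 1 logical write -> 2 physical reads + 2 physical writes.
    def raid5_small_write(data_disk, parity_disk, block_no, new_data):
        old_data = data_disk[block_no]          # (1) read old data
        old_parity = parity_disk[block_no]      # (2) read old parity
        new_parity = bytes(p ^ od ^ nd
                           for p, od, nd in zip(old_parity, old_data, new_data))
        data_disk[block_no] = new_data          # (3) write new data
        parity_disk[block_no] = new_parity      # (4) write new parity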
19
Inspiration for RAID 5
  • RAID 4 works well for small reads
  • Small writes (write to one disk)
  • Option 1: read the other data disks, create the new
    sum, and write it to the parity disk
  • Option 2: since P holds the old sum, compare old data
    to new data and add the difference to P
  • Small writes are limited by the parity disk: writes to
    D0 and D5 both also write to the P disk. The parity
    disk must be updated on every write operation!

20
Redundant Arrays of Inexpensive Disks: RAID 5
High I/O Rate, Interleaved Parity

Logical disk addresses increase left to right, top to bottom,
and the parity block rotates across the five disk columns:

    D0    D1    D2    D3    P
    D4    D5    D6    P     D7
    D8    D9    P     D10   D11
    D12   P     D13   D14   D15
    P     D16   D17   D18   D19
    D20   D21   D22   D23   P
    ...   ...   ...   ...   ...

Independent writes are possible because of the interleaved parity.
Example: writes to D0 and D5 use disks 0, 1, 3, and 4.
21
RAID 5: Block-Interleaved Distributed Parity
Left-symmetric distribution
  • Parity disk = (block number / 4) mod 5
  • Eliminates the parity-disk bottleneck of RAID 4
  • Best small-read, large-read, and large-write
    performance
  • Can correct any single self-identifying failure
  • Small logical writes take two physical reads and
    two physical writes.
  • Recovery requires reading all non-failed disks
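Taking the placement rule on this slide at face value for a 5-disk array (four
data blocks per stripe), a minimal sketch of where a block's parity lives; the
function name is illustrative.

    # Parity placement per the slide's rule: parity disk = (block number / 4) mod 5
    def parity_disk(block_number, n_disks=5):
        stripe = block_number // (n_disks - 1)   # 4 data blocks per stripe
        return stripe % n_disks

    # Blocks 0-3 map to parity disk 0, blocks 4-7 to parity disk 1, and so on,
    # so parity updates rotate across all five disks instead of hitting one.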

22
RAID 6: P+Q Redundancy
  • An extension to RAID 5, but with two-dimensional
    parity.
  • Each row has a P parity and a Q parity.
  • (Reed-Solomon codes)
  • Has extremely high data fault tolerance and
    can sustain multiple simultaneous drive
    failures
  • Rarely implemented

For more information, please see the paper "A
Tutorial on Reed-Solomon Coding for Fault
Tolerance in RAID-like Systems".
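A minimal sketch of one common way to compute P and Q for a stripe: P as a plain
XOR and Q as a Reed-Solomon-style sum of the data blocks weighted by powers of a
generator in GF(2^8). The field polynomial (0x11d) and generator (2) are common
choices assumed here, not values given on the slides.

    # Multiply two bytes in GF(2^8) modulo the assumed field polynomial.
    def gf_mul(a, b, poly=0x11d):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= poly
            b >>= 1
        return r

    # P = d0 ^ d1 ^ ...;  Q = g^0*d0 + g^1*d1 + ... over GF(2^8), with g = 2.
    def pq_parity(blocks):
        p = bytearray(len(blocks[0]))
        q = bytearray(len(blocks[0]))
        coeff = 1
        for block in blocks:
            for i, byte in enumerate(block):
                p[i] ^= byte
                q[i] ^= gf_mul(coeff, byte)
            coeff = gf_mul(coeff, 2)
        return bytes(p), bytes(q)

With two independent checksums per stripe, any two missing blocks can be solved
for, which is what lets RAID 6 survive two simultaneous disk failures.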
23
Comparison of RAID Levels (N disks, each with
capacity of C)
24
Implementation Considerations
  • Avoiding Stale Data
  • Regenerating Parity after a System Crash
  • Operating with a Failed Disk
  • Orthogonal RAID
  • Striping Unit Size
  • Other RAID Improvement Techniques

25
Avoiding Stale Data
  • Maintain a bit-vector to indicate the validity of
    each logical sector.
  • Avoid Reading Stale Data
  • When a disk fails, the corresponding logical
    sectors must be marked invalid before any read
    access.
  • Avoid Creating Stale Data
  • When an invalid sector has been reconstructed
    onto a spare disk, the corresponding logical
    sectors must be marked valid before any write
    access.
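A toy illustration of that bookkeeping, using a Python set of invalid logical
sectors to stand in for the slide's bit-vector; the helper names are made up.

    # Stale-data bookkeeping: a set standing in for the per-sector validity bit-vector.
    invalid_sectors = set()

    def on_disk_failure(failed_sectors):
        invalid_sectors.update(failed_sectors)    # mark invalid before allowing reads

    def on_sector_reconstructed(sector):
        invalid_sectors.discard(sector)           # mark valid before allowing reuse by writes

    def check_read(sector):
        if sector in invalid_sectors:
            raise IOError("sector is stale; reconstruct it first")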

26
Regenerating Parity after a System Crash
  • Hardware RAID system
  • Before servicing any write request, the
    corresponding parity sectors must be marked
    inconsistent.
  • When bringing the system up after a crash,
    all inconsistent parity sectors must be
    regenerated.
  • Periodically mark parity sectors as consistent
    to avoid having to regenerate a large number of
    parity sectors after each crash.
  • Software RAID system
  • A simple solution
  • Mark the corresponding parity sectors as
    inconsistent before each write operation, and
    mark them consistent after the write operation.
  • A more practical solution
  • Maintain a most recently used pool that keeps
    track of a fixed number of inconsistent parity
    sectors on stable storage.

27
Operating with a Failed Disk
  • A disk array operating with a failed disk can
    potentially lose data in the event of a system
    crash. Therefore, we need to perform some form of
    logging on every write operation to prevent the
    loss. Two elegant methods:
  • Demand reconstruction
  • Requires stand-by disks
  • Any write access to a parity stripe with an
    invalid sector triggers reconstruction of the
    appropriate data immediately onto the spare
    disks.
  • Parity sparing
  • Does not need stand-by disks but requires
    additional metadata
  • Use spares to make smaller disk arrays
  • Smaller arrays mean higher reliability and faster
    reconstruction.
  • On a disk failure, merge smaller arrays into a
    larger one
  • For more information, please see the paper
  • Failure Evaluation of Disk Array Organization

28
Orthogonal RAID
[Diagram: two options for arranging error-correction groups across the disks]
29
Striping Unit in RAID 5
  • S = optimal striping unit
  • N = the number of disks
  • S increases as N increases for read-intensive
    workloads
  • S decreases as N increases for write-intensive
    workloads
  • S is independent of N for an unspecified mix of
    reads and writes
  • Recommended stripe-unit size:
  • S = 1/2 x average disk positioning time x disk
    transfer rate

For more information, see the paper
P.M. Chen, "Striping in a RAID Level 5 Disk
Array", ACM, 1995
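A quick worked example of that rule of thumb, using illustrative numbers (10 ms
average positioning time, 8 MB/s transfer rate) rather than values taken from
the slide.

    # Rule of thumb: S = 1/2 * average positioning time * disk transfer rate
    avg_positioning_s = 0.010          # 10 ms seek + rotation (assumed)
    transfer_rate_bytes = 8_000_000    # 8 MB/s (assumed)
    stripe_unit_bytes = 0.5 * avg_positioning_s * transfer_rate_bytes
    print(stripe_unit_bytes)           # 40,000 bytes, i.e. roughly a 40 KB stripe unit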
30
Other RAID Improvement Techniques
  • Improving small write performance of RAID 5
  • Buffering and caching
  • Floating Parity
  • Shorten the read-modify-write of parity updating
    to nearly a single disk access time on average
  • Basic idea: the new parity block is written
    on the rotationally nearest unallocated block
    following the old parity block.
  • Declustered Parity
  • Distributing the increased load caused by disk
    failures uniformly over all disks
  • Basic idea: construct a RAID system with multiple,
    overlapping parity groups.

31
Other RAIDs
  • HP AutoRAID
  • AFRAID
  • RAPID
  • SwiftRAID
  • TickerTAIP
  • SMDA

32
Berkeley History: RAID-I
  • RAID-I (1989)
  • Consisted of a Sun 4/280 workstation with 128 MB
    of DRAM, four dual-string SCSI controllers, 28
    5.25-inch SCSI disks and specialized disk
    striping software
  • Today RAID is a $19 billion industry; 80% of
    non-PC disks are sold in RAIDs

33
RAID Techniques: goal was performance, popularity
due to reliability of storage

Disk Mirroring / Shadowing (RAID 1)
  • Each disk is fully duplicated onto its "shadow"
  • Logical write = two physical writes
  • 100% capacity overhead

Parity Data Bandwidth Array (RAID 3)
  • Parity computed horizontally
  • Logically a single high-data-bandwidth disk

High I/O Rate Parity Array (RAID 5)
  • Interleaved parity blocks
  • Independent reads and writes
  • Logical write = 2 reads + 2 writes

[Diagram: example bit patterns for mirrored data and horizontally computed parity]
34
RAID 0 Striped Disk Array without Fault
Tolerance
RAID Level 0 requires a minimum of 2 drives to
implement