1
Deconstructing Storage Arrays
  • Timothy E. Denehy, John Bent, Florentina I.
    Popovici,
  • Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
  • University of Wisconsin, Madison

2
Gray-box Research
  • Computer systems becoming more complex
  • Transistors
  • Lines of code
  • Each component is becoming more complex
  • Interactions between subsystems can affect
  • Performance
  • Reliability
  • Power
  • Security

3
Gray-box Research
  • Interfaces remain the same
  • Changes can be difficult and impractical
  • Support multiple platforms or legacy systems
  • Commercial acceptance for wide-spread adoption
  • Hardware and software phenomenon
  • IA-32 instruction set, POSIX OS, SCSI storage
  • Problem: lack of information

4
Gray-box Solution
  • Treat target system as a gray-box
  • General characteristics are known
  • Extract information from an existing interface
  • e.g. determine cache contents
  • Exploit information to control system behavior
  • e.g. access cached data first

5
Gray-box Information Techniques
  • Make assumptions about target system
  • Observe system inputs and outputs
  • Statistical methods
  • Draw inferences about internal structure
  • Microbenchmarks and probes
  • Parameterize system components
  • Observe system under controlled input

6
Gray-box Applications
  • Gray-box techniques have been used to identify:
  • Memory hierarchy parameters (Saavedra and Smith)
  • Processor cycle time (Staelin and McVoy)
  • Low-level disk characteristics (Worthington et al.)
  • Buffer cache replacement algorithms (Burnett et al.)
  • File system data structures (Sivathanu et al.)
  • Storage array characteristics (Shear)

7
Shear
  • Software tool that automatically determines the
    important properties of a storage array
  • Enables file system performance tuning with
    knowledge of storage array characteristics
  • Acts as a management tool to help configure,
    monitor, and maintain storage arrays

8
Outline
  • Introduction
  • Shear
  • Background
  • Algorithm
  • Case Studies
  • Performance: Stripe-aligned Writes
  • Management: Detecting Misconfiguration, Failure
  • Conclusion

9-14
Shear Goals
  • Determine storage array characteristics
  • Number of disks
  • Chunk size
  • Layout and redundancy scheme
  • [Figure build sequence, labeled SCSI, showing RAID-0, RAID-1, and RAID-5 arrays]

15
Shear Motivation
  • Performance
  • Tune file systems to array characteristics
  • Management
  • Verify configuration
  • Detect failure

16
Shear Techniques
  • Microbenchmarks and probes
  • Controlled, random access read and write patterns
  • Measure response time of access patterns
  • Measure steady-state performance
  • Statistical clustering
  • Automatically classify fast and slow regimes
  • Identify patterns that utilize only a single disk
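
The clustering step can be illustrated with a small sketch. It assumes a simple one-dimensional two-means split, which may differ from the clustering Shear actually uses; the function name `split_fast_slow` and the sample timings are hypothetical.

```python
def split_fast_slow(times, iterations=20):
    """Split response times into a fast and a slow cluster (1-D two-means).

    Illustrative assumption only; not necessarily Shear's clustering method.
    """
    lo, hi = min(times), max(times)
    if lo == hi:
        # All measurements are identical, so there is only one regime.
        return list(times), []
    for _ in range(iterations):
        # Assign each measurement to the nearer of the two centers.
        fast = [t for t in times if abs(t - lo) <= abs(t - hi)]
        slow = [t for t in times if abs(t - lo) > abs(t - hi)]
        # Move each center to the mean of its cluster.
        lo = sum(fast) / len(fast)
        hi = sum(slow) / len(slow)
    return fast, slow


# Hypothetical response times (arbitrary units) for seven access patterns.
fast, slow = split_fast_slow([1.0, 1.1, 0.9, 4.0, 4.2, 1.05, 3.9])
print(sorted(slow))
```

Patterns that land in the slow cluster are the candidates for "utilizes only a single disk".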

17
Shear Assumptions
  • Storage array
  • Layout follows a repeatable pattern
  • Composed of homogeneous disks
  • System
  • Able to bypass the file system and buffer cache
  • Little traffic from other processes

18
Outline
  • Introduction
  • Shear
  • Background
  • Algorithm
  • Case Studies
  • Performance: Stripe-aligned Writes
  • Management: Detecting Misconfiguration, Failure
  • Conclusion

19
Shear Algorithm
  • Pattern size
  • Chunk size
  • Layout of chunks to disks
  • Level of redundancy

20
Determining the Pattern Size
  • Find the size of the layout's repeating pattern
  • Not always the stripe size
  • Choose a hypothetical pattern size
  • Perform random reads at multiples of that
    distance
  • Repeat for a range of pattern sizes
  • Cluster results and identify actual pattern size
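
The steps above can be sketched against a simulated array. This is a minimal model under stated assumptions, not Shear itself: the array is a hypothetical 4-disk RAID-0 with 4-block chunks, the time for a batch of reads is approximated by the load on the busiest disk, and a fixed 90% threshold stands in for the clustering step.

```python
import random
from collections import Counter

N_DISKS, CHUNK = 4, 4  # hypothetical array: 4 disks, 4-block chunks


def disk_of(block):
    """RAID-0 striping: consecutive chunks rotate across the disks."""
    return (block // CHUNK) % N_DISKS


def probe_time(blocks):
    """Simulated batch time: the busiest disk determines completion."""
    return max(Counter(disk_of(b) for b in blocks).values())


def find_pattern_size(max_size, reads=200, seed=0):
    rng = random.Random(seed)
    times = {}
    for size in range(1, max_size + 1):
        # Random reads at multiples of the hypothetical pattern size.
        blocks = [rng.randrange(1000) * size for _ in range(reads)]
        times[size] = probe_time(blocks)
    slowest = max(times.values())
    # Stand-in for clustering: sizes whose reads all pile onto one disk.
    slow_sizes = [s for s, t in times.items() if t >= 0.9 * slowest]
    # Multiples of the pattern size are slow too, so take the smallest.
    return min(slow_sizes)


print(find_pattern_size(32))
```

In this model, reads spaced at the true pattern size all land on the same disk, so that spacing shows up as the slow regime.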

21-39
Pattern Size Example
  • [Animated figure sequence]

40
Shear Algorithm
  • Pattern size
  • Chunk size
  • Layout of chunks to disks
  • Level of redundancy

41
Determining the Chunk Size
  • Chunk size: amount of data contiguously allocated to one disk
  • Find the boundaries between disks
  • Choose a hypothetical boundary offset
  • Perform random reads on both sides of that offset
  • Repeat for all offsets in the pattern size
  • Cluster results and identify actual chunk size
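
A similar sketch illustrates the boundary search. It makes the same assumptions as a simple simulation would: a hypothetical 4-disk RAID-0 with 4-block chunks, a pattern size already known from the previous step, and batch time approximated by the busiest disk. Picking the fastest offsets replaces the clustering step here.

```python
import random
from collections import Counter

N_DISKS, CHUNK = 4, 4        # hypothetical array parameters
PATTERN = N_DISKS * CHUNK    # assumed known from the pattern-size step


def disk_of(block):
    """RAID-0 striping: consecutive chunks rotate across the disks."""
    return (block // CHUNK) % N_DISKS


def boundary_probe_time(offset, pairs=50, seed=0):
    """Read just before and just after `offset` in random pattern repetitions."""
    rng = random.Random(seed)
    load = Counter()
    for _ in range(pairs):
        base = rng.randrange(1, 1000) * PATTERN + offset
        load[disk_of(base - 1)] += 1  # block on the left side of the offset
        load[disk_of(base)] += 1      # block on the right side of the offset
    return max(load.values())


def find_chunk_size():
    times = {off: boundary_probe_time(off) for off in range(PATTERN)}
    fastest = min(times.values())
    # Fast offsets split the reads over two disks, so they are boundaries.
    boundaries = sorted(off for off, t in times.items() if t == fastest)
    # With evenly spaced boundaries, the spacing is the chunk size.
    return boundaries[1] - boundaries[0]


print(find_chunk_size())
```

An offset that falls inside a chunk sends both sides to one disk and is slow; an offset on a disk boundary spreads the reads over two disks and is fast.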

42-53
Chunk Size Example
  • [Animated figure sequence]

54
Shear Algorithm
  • Pattern size
  • Chunk size
  • Layout of chunks to disks
  • Level of redundancy

55
Determining the Read Layout
  • Find mapping of chunks to disks
  • Choose a pair of chunks in the pattern
  • Perform random reads to both chunks
  • Repeat for all pairs of chunks
  • Cluster results and identify chunks on same disk
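
The pairwise test can be sketched as follows. The chunk-to-disk map `ZIGZAG` is a hypothetical 4-disk zig-zag layout used only to drive the simulation, and batch time is again approximated by the busiest disk; selecting the slowest pairs stands in for clustering.

```python
import random
from collections import Counter
from itertools import combinations

N_CHUNKS = 8
ZIGZAG = [0, 1, 2, 3, 3, 2, 1, 0]  # hypothetical chunk-to-disk map


def disk_of_chunk(chunk_index):
    """The layout repeats every N_CHUNKS chunks."""
    return ZIGZAG[chunk_index % N_CHUNKS]


def pair_probe_time(a, b, reads=100, seed=0):
    """Alternate reads between chunks a and b in random pattern repetitions."""
    rng = random.Random(seed)
    load = Counter()
    for i in range(reads):
        chunk = a if i % 2 == 0 else b
        repetition = rng.randrange(1000)
        load[disk_of_chunk(repetition * N_CHUNKS + chunk)] += 1
    return max(load.values())


def same_disk_pairs():
    pairs = combinations(range(N_CHUNKS), 2)
    times = {pair: pair_probe_time(*pair) for pair in pairs}
    slowest = max(times.values())
    # Pairs that contend for one disk take the longest.
    return sorted(p for p, t in times.items() if t == slowest)


print(same_disk_pairs())
```

Chunks that share a disk cannot be read in parallel, so those pairs form the slow group.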

56-73
Read Layout Example
  • RAID-0 ZIG-ZAG, 4 disks
  • Testing chunk pairs in turn: (0, 0), (0, 1), (0, 2), (0, 3), (0, 4),
    (0, 5), (0, 6), (0, 7), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
    (1, 7)
  • Actual chunks on the same disk: (0, 7), (1, 6), (2, 5), (3, 4)

74
Shear Algorithm
  • Pattern size
  • Chunk size
  • Layout of chunks to disks
  • Level of redundancy

75
Determining Level of Redundancy
  • Ratio of read to write bandwidth reveals the type
    of redundancy in the array
  • Expected R/W ratios:
  • RAID-0: 1 (no redundancy)
  • RAID-1: 2 (mirroring)
  • RAID-4: varies (examine write layout)
  • RAID-5: 4 (parity)
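
As a rough illustration of how the expected ratios could be applied, the sketch below picks the level whose expected ratio is nearest to a measured one. The nearest-ratio rule and the name `guess_redundancy` are assumptions for illustration; RAID-4 is left out because its ratio varies and the slide says to examine the write layout instead.

```python
# Expected read/write bandwidth ratios taken from the list above.
EXPECTED_RATIO = {"RAID-0": 1.0, "RAID-1": 2.0, "RAID-5": 4.0}


def guess_redundancy(read_bw, write_bw):
    """Return the level whose expected R/W ratio is nearest the measured one.

    Hypothetical decision rule; RAID-4 is deliberately not handled here.
    """
    ratio = read_bw / write_bw
    return min(EXPECTED_RATIO, key=lambda level: abs(EXPECTED_RATIO[level] - ratio))


# Hypothetical bandwidth measurements in MB/s.
print(guess_redundancy(100.0, 52.0))
```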

76
Shear Experience
  • Shear has been applied to
  • Linux software RAID
  • Poor RAID-5 parity updates
  • Adaptec hardware RAID controller
  • Implements RAID-5 left-asymmetric layout
  • RAID-0
  • RAID-1
  • Chained Declustering
  • RAID-4
  • RAID-5
  • P+Q

77
Outline
  • Introduction
  • Shear
  • Background
  • Algorithm
  • Case Studies
  • Performance: Stripe-aligned Writes
  • Management: Detecting Misconfiguration, Failure
  • Conclusion

78
RAID-5 Performance
  • Small writes on RAID-5 are problematic
  • Require two reads, parity calculation, two writes
  • Writing in full stripes is more efficient
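
The small-write cost follows from how XOR parity is maintained. The sketch below shows the standard read-modify-write parity update next to a full-stripe parity computation; the function names are hypothetical and chunks are modeled as short byte strings.

```python
def full_stripe_parity(chunks):
    """Parity of a full stripe: XOR of all data chunks, no reads needed."""
    parity = bytes(len(chunks[0]))
    for chunk in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, chunk))
    return parity


def small_write_parity(old_data, old_parity, new_data):
    """Read-modify-write update for a single chunk.

    Needs the old data and old parity (two reads) before the new data
    and new parity can be written (two writes).
    """
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
parity = full_stripe_parity(stripe)
new_chunk = b"\xff\x00"
updated = small_write_parity(stripe[1], parity, new_chunk)
# The incremental update matches recomputing parity over the new stripe.
print(updated == full_stripe_parity([stripe[0], new_chunk, stripe[2]]))
```

A full-stripe write already holds every data chunk in memory, so it can compute parity directly and skip both reads.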

79
Stripe-aligned Writes
  • Overcome RAID-5 small write problem
  • Modified Linux disk scheduler
  • Groups writes into full stripes
  • Aligns writes along stripe boundaries
  • Approximately 20 lines of code
  • Experiment
  • Hardware RAID-5, 4 disks, 16 KB chunks
  • Create 100 files of varying sizes
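
The alignment arithmetic can be sketched for the experiment's configuration. This is not the modified Linux scheduler, only an illustration of where stripe boundaries fall: with 4 disks and 16 KB chunks, each RAID-5 stripe holds three data chunks plus one parity chunk, so 48 KB of data. The function name `split_on_stripes` is hypothetical.

```python
CHUNK_KB = 16
N_DISKS = 4
# One chunk per stripe holds parity, leaving three data chunks.
STRIPE_KB = CHUNK_KB * (N_DISKS - 1)


def split_on_stripes(start_kb, length_kb):
    """Split a write into pieces that never cross a stripe boundary.

    Returns (start, length, is_full_stripe) tuples, all in KB.
    """
    pieces = []
    pos, end = start_kb, start_kb + length_kb
    while pos < end:
        # Next stripe boundary strictly after the current position.
        boundary = (pos // STRIPE_KB + 1) * STRIPE_KB
        piece_end = min(boundary, end)
        length = piece_end - pos
        pieces.append((pos, length, length == STRIPE_KB))
        pos = piece_end
    return pieces


for piece in split_on_stripes(40, 112):
    print(piece)
```

Only the pieces flagged as full stripes avoid the read-modify-write penalty, which is why grouping and aligning writes matters.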

80
Stripe-aligned Writes Experiment
  • Simple modification has a large impact

81
Detecting Misconfigurations
  • Software RAID, 4 Disks, 8 KB Chunks
  • What if one disk is accidentally used twice?

82
Detecting Misconfigurations
83
Detecting Failures
  • Software RAID
  • RAID-5 left-symmetric (LS)
  • 10 disks
  • 8 KB chunks

84
Detecting Failures
  • Software RAID
  • RAID-5 left-symmetric (LS)
  • 10 disks
  • 8 KB chunks
  • Disk 5 fails

85
Outline
  • Introduction
  • Shear
  • Background
  • Algorithm
  • Case Studies
  • Performance: Stripe-aligned Writes
  • Management: Detecting Misconfiguration, Failure
  • Conclusion

86
Conclusion
  • Gray-box research
  • Extract / exploit information from existing
    interfaces
  • Shear
  • Extracts information
  • Automatically determines storage array properties
  • Exploits information
  • File system performance tuning
  • Storage management

87
Questions?
  • http://www.cs.wisc.edu/adsl/