Storage Systems Part I

Slides: 50

Provided by: paa5138

1
Storage Systems Part I
INF SERV Media Storage and Distribution Systems

17/10 - 2002

2
Overview

Disks
mechanics and properties
Disk scheduling
traditional
real-time
stream oriented

3
Storage System

The VoD storage system deals with issues like
data retrieval from storage devices
data placement and organization
QoS guarantees, e.g., ensured continuous delivery
must consider the storage sub-system architecture for optimal performance

4
Disks
5
Disks I

Disks are orders of magnitude slower than main memory, but are cheaper and have more capacity
Disks are used to provide persistent storage and to manage huge amounts of information
Because...
...there is a large speed mismatch compared to main memory (this gap will increase according to Moore's law),
...disk I/O is often the main performance bottleneck
...we need to minimize the number of accesses,
...
we must look more closely at how to manage disks

6
Disks II

Two resources of importance
storage space
disk I/O bandwidth
Several approaches to managing multimedia data on disks
specific disk scheduling and large buffers
(traditional file structure)
optimize data placement for contiguous media
(traditional retrieval mechanisms)
combinations of the above

7
Mechanics of Disks
Spindle: the axis the platters rotate around
Tracks: concentric circles on a single platter
Platters: circular platters covered with magnetic material to provide nonvolatile storage of bits
Disk heads: read or alter the magnetism (bits) passing under them. The heads are attached to an arm enabling them to move across the platter surface
Sectors: segments of the track circle separated by non-magnetic gaps. The gaps are often used to identify the beginning of a sector
Cylinders: corresponding tracks on the different platters are said to form a cylinder
8
Disk Specifications
Note 1: disk manufacturers usually denote GB as 10^9 bytes, whereas computer quantities are often powers of 2, i.e., GB is 2^30 bytes

Disk technology develops fast
Some existing (Seagate) disks today

Note 2: there is usually a trade-off between speed and capacity
Note 3: there is a difference between internal and formatted transfer rate. Internal is the rate between the platter and the head only. Formatted is the rate after the signals have passed through the electronics (cabling loss, interference, retransmissions, checksums, etc.)
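This minimal Python sketch makes the unit mismatch in Note 1 concrete. It re-expresses a manufacturer's decimal gigabyte figure in 2^30-byte units. The function name `to_binary_gb` is our own and is used only for illustration.

```python
DECIMAL_GB = 10**9  # "GB" as disk manufacturers usually count it
BINARY_GB = 2**30   # "GB" as computer quantities are often counted

def to_binary_gb(decimal_gb):
    """Express a manufacturer's (10**9-byte) GB figure in 2**30-byte units."""
    return decimal_gb * DECIMAL_GB / BINARY_GB

# e.g., a disk advertised as 36.7 GB holds about 34.2 GB in 2**30-byte units
print(round(to_binary_gb(36.7), 1))  # 34.2
```

The difference is about 7% at this scale, and it grows with each power of 1000 versus 1024.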
9
Disk Capacity

The size of the disk is dependent on
the number of platters
whether the platters use one or both sides
number of tracks per surface
(average) number of sectors per track
number of bytes per sector
Example (Cheetah X15)
4 platters using both sides → 8 surfaces
18497 tracks per surface
617 sectors per track (average)
512 bytes per sector
Total capacity = 8 × 18497 × 617 × 512 ≈ 4.67 × 10^10 bytes ≈ 43.5 GB (GB = 2^30 bytes)
Formatted capacity: 36.7 GB

Note 1: the tracks on the edge of the platter are larger than the tracks close to the spindle. Today, most disks are zoned, i.e., the outer tracks have more sectors than the inner tracks
Note 2: there is a difference between formatted and total capacity. Some of the capacity is used for storing checksums, spare tracks, gaps, etc.
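The capacity example can be checked with a few lines of Python. This sketch multiplies the four factors listed above. Because 617 sectors per track is an average over zones, the result only estimates raw (unformatted) capacity, which is why it exceeds the formatted figure. The helper name `disk_capacity` is hypothetical.

```python
def disk_capacity(surfaces, tracks_per_surface, sectors_per_track, bytes_per_sector):
    """Raw capacity in bytes; for a zoned disk, sectors_per_track is an average."""
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector

# Cheetah X15 figures: 4 double-sided platters -> 8 surfaces
total = disk_capacity(8, 18497, 617, 512)
print(total)                    # 46746210304 bytes
print(round(total / 2**30, 1))  # 43.5 (GB = 2**30 bytes)
```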
10
Disk Access Time I

How do we retrieve data from disk?
position the head over the cylinder (track) on which the block (consisting of one or more sectors) is located
read or write the data block as the sectors move under the head when the platters rotate
The time between issuing a disk request and the moment the block is resident in memory is called disk latency or disk access time

11
Disk Access Time II
[Figure: disk platter, disk head and disk arm]
Disk access time = seek time + rotational delay + transfer time + other delays
12
Disk Access Time Seek Time

Seek time is the time to position the head
the heads require a minimum amount of time to
start and stop moving the head
some time is used for actually moving the head
roughly proportional to the number of cylinders
traveled

Typical average: 10 ms to 40 ms; 7.4 ms (Barracuda 180), 5.7 ms (Cheetah 36), 3.6 ms (Cheetah X15)
13
Disk Access Time Rotational Delay

Time for the disk platters to rotate so the first of the required sectors is under the disk head

Average delay is 1/2 revolution
Typical average: 8.33 ms (3,600 RPM), 5.56 ms (5,400 RPM), 4.17 ms (7,200 RPM), 3.00 ms (10,000 RPM), 2.00 ms (15,000 RPM)
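Here is a small Python sketch of the half-revolution rule. Like the slide, it assumes the wanted sector is on average half a revolution away, and it reproduces the typical values listed above.

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational delay: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

for rpm in (3600, 5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {avg_rotational_delay_ms(rpm):.2f} ms")
```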
14
Disk Access Time Transfer Time

Time for data to be read by the disk head, i.e., the time it takes the sectors of the requested block to rotate past the head
Transfer time = amount of data to read / data transfer rate
Example 1: if a disk has 250 KB per track and operates at 10,000 RPM, we can read from the disk at 40.69 MB/s
Example 2: Barracuda 180: 406 KB per track × 7,200 RPM ≈ 47.58 MB/s
Example 3: Cheetah X15: 316 KB per track × 15,000 RPM ≈ 77.15 MB/s
Access time is dependent on data density and rotation speed
If we have to change track, time must also be added for moving the head

Note: one might achieve these transfer rates when reading continuously on disk, but time must be added for seeks, etc.
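The transfer-rate examples above can be reproduced in Python. This sketch assumes binary units (KB = 1024 bytes, MB = 1024 KB), which are the units that make the figures match. It ignores track switches and zoning. `transfer_rate_mb_s` and `transfer_time_ms` are illustrative names and do not come from any library.

```python
def transfer_rate_mb_s(kb_per_track, rpm):
    """Rate when whole tracks stream past the head (KB = 1024 B, MB = 1024 KB)."""
    revolutions_per_s = rpm / 60
    return kb_per_track * revolutions_per_s / 1024

def transfer_time_ms(block_kb, kb_per_track, rpm):
    """Time for block_kb to rotate past the head: fraction of a track x one revolution."""
    return block_kb / kb_per_track * 60_000 / rpm

print(f"{transfer_rate_mb_s(250, 10_000):.2f}")    # Example 1: 40.69
print(f"{transfer_rate_mb_s(406, 7_200):.2f}")     # Barracuda 180: 47.58
print(f"{transfer_rate_mb_s(316, 15_000):.2f}")    # Cheetah X15: 77.15
print(f"{transfer_time_ms(64, 316, 15_000):.2f}")  # 64 KB on the Cheetah X15: 0.81 ms
```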
15
Disk Access Time Other Delays

There are several other factors which might
introduce additional delays
CPU time to issue and process I/O
contention for controller
contention for bus
contention for memory
verifying block correctness with checksums
(retransmissions)
waiting in scheduling queue
...
Typical values: ≈ 0 (except perhaps for waiting in the queue)

16
Disk Throughput

How much data can we retrieve per second?
Throughput = block size / disk access time
Example: for each operation we have
- average seek
- average rotational delay
- transfer time
- no gaps, etc.
Cheetah X15: 4 KB blocks ≈ 0.69 MB/s, 64 KB blocks ≈ 9.75 MB/s
Barracuda 180: 4 KB blocks ≈ 0.34 MB/s, 64 KB blocks ≈ 4.85 MB/s
(KB = 1024 bytes, MB = 1024 KB)

Note: to increase overall throughput, one should read as much as possible contiguously on disk
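The throughput example can be sketched in Python under the stated assumptions. Every block costs an average seek, an average rotational delay (half a revolution) and its own transfer time, with no gaps or other delays. The seek times, rotation speeds and track sizes are the ones quoted on the earlier slides. Binary units (KB = 1024 bytes, MB = 1024 KB) are an assumption. Including the transfer-time term gives somewhat lower figures than counting seek and rotational delay alone.

```python
def throughput_mb_s(block_kb, avg_seek_ms, rpm, kb_per_track):
    """Throughput when every block pays a full random access (KB = 1024 B, MB = 1024 KB)."""
    ms_per_rev = 60_000 / rpm
    rotational_ms = ms_per_rev / 2                         # average: half a revolution
    transfer_ms = block_kb / kb_per_track * ms_per_rev     # block rotating past the head
    access_ms = avg_seek_ms + rotational_ms + transfer_ms  # no gaps, no other delays
    return block_kb / access_ms * 1000 / 1024              # KB/ms -> MB/s

disks = {"Cheetah X15": (3.6, 15_000, 316), "Barracuda 180": (7.4, 7_200, 406)}
for name, (seek, rpm, track) in disks.items():
    for block in (4, 64):
        rate = throughput_mb_s(block, seek, rpm, track)
        print(f"{name}, {block} KB blocks: {rate:.2f} MB/s")
```

Moving from 4 KB to 64 KB blocks multiplies throughput roughly fourteen-fold. The fixed positioning cost is spread over more data, which is the reason for the note on reading contiguously.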
17
Some Complicating Issues

There are several complicating factors
The other delays described earlier like
consumed CPU time, resource contention, etc.
zoned disks, i.e., outer tracks are longer and therefore usually have more sectors than inner tracks
checksums are also stored with each of the sectors

[Figure: zoned disk with inner and outer tracks]
Note 1: transfer rates are higher on outer tracks
Note 2: gaps between sectors
Note 3: the checksum is read for each track and used to validate the track
Note 4: the checksum is usually calculated using Reed-Solomon interleaved with CRC
Note 5: for older drives the checksum is 16 bytes
Note 6: SCSI disks may be changed by the user to have other sector sizes
18
Writing and Modifying Blocks

A write operation is analogous to a read operation
must add time for block allocation
a complication occurs if the write operation has to be verified: we must wait another rotation and then read the block to see if it is the block we wanted to write
Total write time ≈ read time + time for one rotation
Cannot modify a block directly:
read block into main memory
modify the block
write new content back to disk
(verify the write operation)
Total modify time ≈ read time + time to modify + write time

19
Disk Controllers

To manage the different parts of the disk, we use
a disk controller, which is a small processor
capable of
controlling the actuator moving the head to the
desired track
selecting which platter and surface to use
knowing when the right sector is under the head
transferring data between main memory and disk
New controllers act like small computers themselves
both disk and controller now have their own buffers, reducing disk access time
data on damaged disk blocks/sectors are just moved to spare room on the disk; the system above (OS) does not know this, i.e., a block may lie elsewhere than the OS thinks

20
Efficient Secondary Storage Usage

Many programs assume that their data fits in main memory, but one must assume that the data is larger than main memory
Must take into account the use of secondary storage
there are large access time gaps, i.e., a disk access will probably dominate the total execution time
there may be huge performance improvements if we reduce the number of disk accesses
a slow algorithm with few disk accesses will probably outperform a fast algorithm with many disk accesses
Several ways to optimize:
disk scheduling
block size
multiple disks
prefetching
file management / data placement
memory caching / replacement algorithms

21
Disk Scheduling
22
Disk Scheduling I

Seek time is a dominant factor of total disk I/O time
Let the operating system or disk controller choose which request to serve next, depending on the current head position and the requested blocks' positions on disk (disk scheduling)
Note that disk scheduling ≠ CPU scheduling
a mechanical device: hard to determine (accurate) access times
disk accesses cannot be preempted: a request runs until it finishes
disk I/O is often the main performance bottleneck
General goals
short response time
high overall throughput
fairness (equal probability for all blocks to be accessed in the same time)
Trade-off: seek and rotational delay vs. maximum response time

23
Disk Scheduling II

Several traditional algorithms
First-Come-First-Serve (FCFS)
Shortest Seek Time First (SSTF)
SCAN (and variations)
LOOK (and variations)

24
First-Come-First-Serve (FCFS)

FCFS serves the first arriving request first
Long seeks
Short average response time

incoming requests (in order of arrival): 12, 14, 2, 7, 21, 8, 24
scheduling queue: 12, 14, 2, 7, 21, 8, 24
[Figure: head movement over time]
Note: the lines only indicate some time, not the exact amount
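The scheduling slides do not say where the head starts. This Python sketch, like the ones for the other traditional algorithms, assumes a hypothetical start at cylinder 11 with all seven requests already queued. It counts total head movement in cylinders.

```python
def fcfs(requests):
    """First-come-first-serve: the service order is simply the arrival order."""
    return list(requests)

def head_movement(start, path):
    """Total number of cylinders the head travels when visiting `path` in order."""
    total, pos = 0, start
    for cyl in path:
        total += abs(cyl - pos)
        pos = cyl
    return total

arrivals = [12, 14, 2, 7, 21, 8, 24]    # requested cylinders, in order of arrival
order = fcfs(arrivals)
print(order, head_movement(11, order))  # [12, 14, 2, 7, 21, 8, 24] 63
```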
25
Shortest Seek Time First (SSTF)

SSTF serves closest request first
short seek times
longer maximum response times; may even lead to starvation

incoming requests (in order of arrival): 12, 14, 2, 7, 21, 8, 24
[Figure: scheduling queue and head movement over time]
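Here is a Python sketch of SSTF under the same hypothetical assumptions (head at cylinder 11, all requests queued). Ties go to the earlier arrival, which is an arbitrary choice.

```python
def sstf(start, requests):
    """Repeatedly serve the pending request closest to the current head position."""
    pending = list(requests)
    order, pos = [], start
    while pending:
        nxt = min(pending, key=lambda c: abs(c - pos))  # first of equally close ones
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

def head_movement(start, path):
    """Total number of cylinders the head travels when visiting `path` in order."""
    total, pos = 0, start
    for cyl in path:
        total += abs(cyl - pos)
        pos = cyl
    return total

arrivals = [12, 14, 2, 7, 21, 8, 24]
order = sstf(11, arrivals)
print(order, head_movement(11, order))  # [12, 14, 8, 7, 2, 21, 24] 37
```

The head travels 37 cylinders, compared with 63 for FCFS on the same queue. Cylinders 21 and 24 are served last, though. A steady flow of new requests near the head could postpone them indefinitely, which is the starvation risk noted above.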
26
SCAN

SCAN (elevator) moves head edge to edge and
serves requests on the way
bi-directional
compromise between response time and seek time
optimizations

incoming requests (in order of arrival): 12, 14, 2, 7, 21, 8, 24
[Figure: scheduling queue and head movement over time]
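This Python sketch of SCAN assumes, hypothetically, that the head starts at cylinder 11 moving toward higher cylinders and that cylinders are numbered 1 to 25. The head runs to the edge before reversing. We stop counting once the last request has been served.

```python
def head_movement(start, path):
    """Total number of cylinders the head travels when visiting `path` in order."""
    total, pos = 0, start
    for cyl in path:
        total += abs(cyl - pos)
        pos = cyl
    return total

def scan(start, requests, max_cyl):
    """SCAN moving toward higher cylinders first.

    Returns the service order and the head's path, which touches the upper
    edge before reversing when requests remain below the start position."""
    ahead = sorted(c for c in requests if c >= start)
    behind = sorted((c for c in requests if c < start), reverse=True)
    path = ahead + ([max_cyl] if behind else []) + behind
    return ahead + behind, path

order, path = scan(11, [12, 14, 2, 7, 21, 8, 24], max_cyl=25)
print(order, head_movement(11, path))  # [12, 14, 21, 24, 8, 7, 2] 37
```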
27
C-SCAN

Circular-SCAN moves head from edge to edge
serves requests in one direction only (uni-directional)
improves response time (fairness)

incoming requests (in order of arrival): 12, 14, 2, 7, 21, 8, 24
[Figure: scheduling queue and head movement over time]
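The same hypothetical assumptions (head at cylinder 11 moving up, cylinders 1 to 25) give this Python sketch of C-SCAN. The return trip from the upper edge to the lower edge counts as head movement even though no requests are served on it.

```python
def head_movement(start, path):
    """Total number of cylinders the head travels when visiting `path` in order."""
    total, pos = 0, start
    for cyl in path:
        total += abs(cyl - pos)
        pos = cyl
    return total

def cscan(start, requests, min_cyl, max_cyl):
    """C-SCAN serving only while moving toward higher cylinders.

    After the upper edge the head returns to the lower edge without
    serving anything, then sweeps upward again."""
    ahead = sorted(c for c in requests if c >= start)
    wrapped = sorted(c for c in requests if c < start)
    path = ahead + ([max_cyl, min_cyl] if wrapped else []) + wrapped
    return ahead + wrapped, path

order, path = cscan(11, [12, 14, 2, 7, 21, 8, 24], min_cyl=1, max_cyl=25)
print(order, head_movement(11, path))  # [12, 14, 21, 24, 2, 7, 8] 45
```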
28
SCAN vs. C-SCAN

Why is C-SCAN on average better in reality than SCAN when both serve the same number of requests in two passes?
modern disks must accelerate (speed up and slow down) when seeking
head movement formula: time ≈ α + β·√n, where n is the number of tracks (cylinders) traveled, β is a seek time constant and α is a fixed overhead
[Figure: seek time vs. cylinders traveled; the case where n is large]
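To show why acceleration matters, this Python sketch evaluates a seek model of the form time ≈ α + β·√n, which is our reading of the slide's formula. The parameters α = 1 ms and β = 0.1 ms are made up. The sketch compares one long seek with many short seeks covering the same total distance.

```python
import math

ALPHA_MS = 1.0  # hypothetical fixed overhead per seek (start, stop, settle)
BETA_MS = 0.1   # hypothetical seek time constant

def seek_time_ms(n):
    """Seek time for n cylinders under the model alpha + beta * sqrt(n)."""
    return 0.0 if n == 0 else ALPHA_MS + BETA_MS * math.sqrt(n)

one_long = seek_time_ms(10_000)       # one seek across 10,000 cylinders
many_short = 100 * seek_time_ms(100)  # 100 seeks of 100 cylinders each
print(round(one_long, 1), round(many_short, 1))  # 11.0 200.0
```

Under a purely linear model both cases would cost the same. With a fixed overhead per seek and sub-linear growth, one long seek such as the C-SCAN return costs far less than the same distance split into many short hops.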
29
LOOK and C-LOOK

LOOK (C-LOOK) is a variation of SCAN (C-SCAN)
same schedule as SCAN
does not run to the edges
stops and returns at outer- and innermost request
increased efficiency
SCAN vs. LOOK example

incoming requests (in order of arrival): 12, 14, 2, 7, 21, 8, 24
[Figure: scheduling queue and head movement over time, SCAN vs. LOOK]
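This Python sketch shows LOOK and C-LOOK under the same hypothetical assumptions as before (head at cylinder 11 moving toward higher cylinders). LOOK turns at the last request rather than at the edge. Its head movement here is therefore 35 cylinders, whereas SCAN would travel 11 → 25 → 2 = 37 cylinders for the same service order.

```python
def head_movement(start, path):
    """Total number of cylinders the head travels when visiting `path` in order."""
    total, pos = 0, start
    for cyl in path:
        total += abs(cyl - pos)
        pos = cyl
    return total

def look(start, requests):
    """LOOK moving toward higher cylinders first: turn at the last request."""
    ahead = sorted(c for c in requests if c >= start)
    behind = sorted((c for c in requests if c < start), reverse=True)
    return ahead + behind

def c_look(start, requests):
    """C-LOOK: serve upward only; jump back to the lowest pending request."""
    ahead = sorted(c for c in requests if c >= start)
    wrapped = sorted(c for c in requests if c < start)
    return ahead + wrapped

arrivals = [12, 14, 2, 7, 21, 8, 24]
for name, schedule in (("LOOK", look), ("C-LOOK", c_look)):
    order = schedule(11, arrivals)
    print(name, order, head_movement(11, order))
# LOOK [12, 14, 21, 24, 8, 7, 2] 35
# C-LOOK [12, 14, 21, 24, 2, 7, 8] 41
```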
30
V-SCAN(R)

V-SCAN(R) combines SCAN (LOOK) and SSTF
define an R-sized unidirectional SCAN (LOOK) window, i.e., C-SCAN (C-LOOK)
V-SCAN(0.6) makes a C-SCAN (C-LOOK) window over 60% of the cylinders
uses SSTF for requests outside the window
V-SCAN(0.0) is equivalent to SSTF
V-SCAN(1.0) is equivalent to SCAN (C-LOOK)
V-SCAN(0.2) is supposed to be an appropriate configuration

[Figure: head movement over cylinder numbers 1 to 25]
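The window description above is one way to picture V-SCAN(R). The sketch below uses a different formulation: the penalty form of the V(R) family as usually attributed to Geist and Daniel. That choice is our assumption and is not the slide's exact mechanism. The head picks the closest request (SSTF), but any request that would force it to reverse direction has R × (number of cylinders) added to its distance. With R = 0 this is SSTF. With R = 1 the head never reverses while requests remain ahead of it. Head start (cylinder 11, moving up) and 25 cylinders are hypothetical.

```python
def v_scan(start, requests, r, num_cylinders, moving_up=True):
    """V(R) as SSTF plus a direction-reversal penalty of r * num_cylinders."""
    pending = list(requests)
    order, pos, up = [], start, moving_up
    penalty = r * num_cylinders
    while pending:
        def cost(c):
            reverses = c < pos if up else c > pos  # would the head have to turn?
            return abs(c - pos) + (penalty if reverses else 0)
        nxt = min(pending, key=cost)
        if nxt != pos:
            up = nxt > pos  # direction the head actually moves
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

arrivals = [12, 14, 2, 7, 21, 8, 24]
print(v_scan(11, arrivals, r=0.0, num_cylinders=25))  # SSTF order
print(v_scan(11, arrivals, r=1.0, num_cylinders=25))  # elevator-like sweep
```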
31
Continuous Media Disk Scheduling

Suitability of classical algorithms
minimal disk arm movement (short seek times)
no provision of time or deadlines
generally not suitable
Continuous media requirements
serve both periodic and aperiodic requests
never miss deadline due to aperiodic requests
aperiodic requests must not starve
support multiple streams
balance buffer space and efficiency tradeoff

32
Real-Time Disk Scheduling

Targeted for real-time applications with
deadlines
Several proposed algorithms
earliest deadline first (EDF)
SCAN-EDF
shortest seek and earliest deadline by
ordering/value (SSEDO / SSEDV)
priority SCAN (PSCAN)
...

33
Earliest Deadline First (EDF)

EDF serves the request with the nearest deadline first
non-preemptive (i.e., an arriving request with a shorter deadline must wait)
excessive seeks
poor throughput

incoming requests (in order of arrival), as (cylinder, deadline): (12, 5), (14, 6), (2, 4), (7, 7), (21, 1), (8, 2), (24, 3)
[Figure: scheduling queue and head movement over time]
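This Python sketch applies EDF to the example requests, given as (cylinder, deadline) pairs. As before, the head start at cylinder 11 and the assumption that all requests are already queued are hypothetical.

```python
def edf(requests):
    """Serve pending requests in deadline order (ties keep arrival order)."""
    return [cyl for cyl, _ in sorted(requests, key=lambda r: r[1])]

def head_movement(start, path):
    """Total number of cylinders the head travels when visiting `path` in order."""
    total, pos = 0, start
    for cyl in path:
        total += abs(cyl - pos)
        pos = cyl
    return total

arrivals = [(12, 5), (14, 6), (2, 4), (7, 7), (21, 1), (8, 2), (24, 3)]
order = edf(arrivals)
print(order, head_movement(11, order))  # [21, 8, 24, 2, 12, 14, 7] 80
```

The head travels 80 cylinders. SSTF on the same set of cylinders from the same start travels 37, which illustrates the excessive seeks noted above.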
34
SCAN-EDF

SCAN-EDF combines SCAN and EDF
the real-time aspects of EDF
seek optimizations of SCAN
especially useful if the end of the period of a
batch is the deadline

increase efficiency by modifying the deadlines
method
serve requests with earlier deadline first (EDF)
sort requests with the same deadline by track location (SCAN)

incoming requests (in order of arrival), as (cylinder, deadline): (2, 3), (14, 1), (9, 3), (7, 2), (21, 1), (8, 2), (24, 2), (16, 1)
[Figure: scheduling queue and head movement over time]
Note: similarly, we can combine EDF with C-SCAN, LOOK or C-LOOK
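This Python sketch runs SCAN-EDF on the example above, again assuming a hypothetical head start at cylinder 11 moving toward higher cylinders. Deadlines are taken in EDF order. Requests that share a deadline are served elevator-style from the current head position, continuing in the current direction.

```python
def scan_edf(start, requests, moving_up=True):
    """EDF across deadlines; SCAN (elevator) order among equal deadlines."""
    order, pos, up = [], start, moving_up
    for deadline in sorted({d for _, d in requests}):
        group = sorted(c for c, d in requests if d == deadline)
        if up:
            ahead = [c for c in group if c >= pos]
            behind = [c for c in reversed(group) if c < pos]
        else:
            ahead = [c for c in reversed(group) if c <= pos]
            behind = [c for c in group if c > pos]
        if behind:
            up = not up  # the head turned around within this deadline group
        sweep = ahead + behind
        order.extend(sweep)
        pos = sweep[-1]
    return order

arrivals = [(2, 3), (14, 1), (9, 3), (7, 2), (21, 1), (8, 2), (24, 2), (16, 1)]
print(scan_edf(11, arrivals))  # [14, 16, 21, 24, 8, 7, 2, 9]
```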
35
Stream Oriented Disk Scheduling

Targeted for streaming contiguous media data
Several algorithms proposed
group sweep scheduling (GSS)
mixed disk scheduling strategy
contiguous media file system (CMFS)
lottery scheduling
stride scheduling
batched SCAN (BSCAN)
greedy-but-safe EDF (GS_EDF)
bubble up
MARS scheduler
chello
adaptive disk scheduler for mixed media workloads
(APEX)

multimedia applications may require both RT and NRT data; it is desirable to have all on the same disk
36
Group Sweep Scheduling (GSS)

GSS combines Round-Robin (RR) and SCAN
requests are serviced in rounds (cycles)
principle
divide S active streams into G groups
service the G groups in RR order
service each stream in a group in C-SCAN order
playout can start at the end of the group
special cases
G = S: RR scheduling
G = 1: SCAN scheduling
trade-off between buffer space and disk arm movement
try different values for G and select the one giving the minimum buffer requirement
a large G → smaller groups, more arm movements, smaller buffers (reuse)
a small G → larger groups, fewer arm movements, larger buffers
with high loads and equal playout rates, GSS and SCAN often serve streams in the same order
replacing RR with FIFO and grouping requests by deadline gives SCAN-EDF

37
Group Sweep Scheduling (GSS)

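GSS example: streams A, B, C and D → g1 = {A, C} and g2 = {B, D}
RR group schedule
C-SCAN block schedule within a group

[Figure: placement of blocks A1-A3, B1-B3, C1-C3 and D1-D3 on the disk]
Resulting schedule:
round 1: g1 {A, C}: A1, C1; g2 {B, D}: B1, D1
round 2: g1 {C, A}: C2, A2; g2 {B, D}: B2, D2
round 3: g1 {A, C}: A3, C3; g2 {B, D}: B3, D3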
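This Python sketch implements GSS. The block cylinder positions are made up and were chosen so that the output reproduces the round schedule of the example. Streams are assigned to groups alternately, which puts A and C in g1 and B and D in g2. `gss_schedule` is a hypothetical helper.

```python
def gss_schedule(blocks, num_groups):
    """Group sweep scheduling sketch.

    blocks maps each stream to the cylinders of its blocks, one per round.
    Groups are served in round-robin order; inside a group, blocks are read
    in C-SCAN (ascending cylinder) order."""
    names = sorted(blocks)
    groups = [names[g::num_groups] for g in range(num_groups)]
    rounds = max(len(cyls) for cyls in blocks.values())
    schedule = []
    for r in range(rounds):
        for group in groups:
            sweep = sorted((blocks[s][r], f"{s}{r + 1}") for s in group if r < len(blocks[s]))
            schedule.extend(label for _, label in sweep)
    return schedule

# hypothetical block positions (cylinders) for rounds 1, 2 and 3
blocks = {"A": [3, 18, 5], "B": [9, 6, 2], "C": [20, 11, 14], "D": [16, 23, 8]}
print(gss_schedule(blocks, num_groups=2))
```

Setting `num_groups=1` turns this into one C-SCAN sweep per round. Setting `num_groups=4` (one stream per group) gives plain round-robin. These match the special cases listed for GSS.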
38
Mixed Disk Scheduling Strategy (MDSS)

MDSS combines SSTF with buffer overflow and
underflow prevention
data delivered to several buffers (one per
stream)
disk bandwidth share allocated according to
buffer fill level
SSTF is used to schedule the requests

[Figure: share allocator distributing disk bandwidth to the per-stream buffers; SSTF scheduler ordering the requests]
39
Continuous Media File System Disk Scheduling

CMFS proposes several algorithms
determines a new schedule on completion of each request
orders requests so that no deadline violations occur; delays new streams until it is safe to proceed (admission control)
all based on slack time
amount of time that can be used for non-real-time requests or work-ahead for continuous media requests
based on the amount of data in buffers and the deadlines of the next requests (how long can I delay the request before violating the deadline?)
useful algorithms
greedy: serve one stream as long as possible
cyclic: always serve the stream with the shortest slack time
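This is a simplified Python sketch of slack-time bookkeeping and is not the actual CMFS algorithms. It assumes each stream's deadline equals the playout time left in its buffer and that the service time of its next request is known. The system slack assumes that real-time requests are afterwards served earliest-deadline-first. All names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Stream:
    name: str
    buffered_ms: float  # playout time left in the buffer, i.e., the deadline
    service_ms: float   # estimated disk time to fetch the next block

def stream_slack(s):
    """How long this stream's next request can wait before its buffer runs dry."""
    return s.buffered_ms - s.service_ms

def system_slack(streams):
    """Time available now for non-real-time work if all real-time requests are
    afterwards served earliest-deadline-first and must still meet their deadlines."""
    elapsed, slack = 0.0, float("inf")
    for s in sorted(streams, key=lambda st: st.buffered_ms):
        elapsed += s.service_ms
        slack = min(slack, s.buffered_ms - elapsed)
    return slack

def cyclic_pick(streams):
    """'Cyclic' policy: serve the stream with the shortest slack time."""
    return min(streams, key=stream_slack)

streams = [Stream("A", 100, 20), Stream("B", 60, 15), Stream("C", 200, 25)]
print(cyclic_pick(streams).name, system_slack(streams))  # B 45.0
```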

40
MARS Disk Scheduler

Massively-parallel And Real-time Storage (MARS)
scheduler supports mixed media on a single system
a two-level scheduler
top level: 1 NRT queue and n (≥ 1) RT queues (SCAN, but future GSS, SCAN-EDF, or ...)
uses deficit round-robin fair queuing to assign quantums to each queue per round: divides total bandwidth among queues
bottom level: selects requests from queues according to quantums, uses SCAN order
work-conserving (variable round times, a new round starts immediately)

[Figure: NRT and RT queues feeding the deficit round robin fair queuing job selector]
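This Python sketch covers the two levels described above. The top level makes a deficit round robin pass over the queues. The bottom level sorts the selected batch in SCAN order, which here simply means ascending cylinders. Request sizes, cylinders and the per-queue quantums (16 KB for NRT, 64 KB for RT) are hypothetical.

```python
from collections import deque

def drr_round(queues, quantums, deficits):
    """One deficit-round-robin round; returns the batch in SCAN order.

    Each queue holds (name, size_kb, cylinder) requests. A queue may send
    requests while their sizes fit within its deficit counter."""
    batch = []
    for i, q in enumerate(queues):
        if not q:
            continue
        deficits[i] += quantums[i]
        while q and q[0][1] <= deficits[i]:
            req = q.popleft()
            deficits[i] -= req[1]
            batch.append(req)
        if not q:
            deficits[i] = 0  # an emptied queue does not keep unused credit
    return [name for name, _, _ in sorted(batch, key=lambda r: r[2])]

nrt = deque([("n1", 8, 20), ("n2", 8, 3), ("n3", 8, 9)])
rt = deque([("v1", 64, 15), ("v2", 64, 5)])
deficits = [0, 0]
first = drr_round([nrt, rt], [16, 64], deficits)
second = drr_round([nrt, rt], [16, 64], deficits)
print(first, second)  # ['n2', 'v1', 'n1'] ['v2', 'n3']
```

Because a queue's unused credit carries over to the next round, a queue whose next request is larger than its quantum still gets served eventually. Over time each queue receives bandwidth in proportion to its quantum.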
41
Chello

Chello is part of the Symphony FS supporting
mixed media
two-level scheduling
top level: n (= 3) service classes (queues)
deadline (= end of round) real-time (EDF)
throughput-intensive best effort (FCFS)
interactive best effort (FCFS)
divides total bandwidth among queues according to a static proportional allocation scheme (like the MARS job selector)
bottom level: class-independent scheduler (FCFS)
selects requests from queues according to quantums
sorts requests from each queue in SCAN order when transferred
partially work-conserving (extra requests might be added at the end of the class-independent scheduler if there is space, but rounds are constant)

[Figure: deadline RT, throughput-intensive best-effort and interactive best-effort queues feeding the class-independent scheduler; each queue is sorted in SCAN order when transferred]
42
Adaptive Disk Scheduler for Mixed Media Workloads

APEX is another mixed-media scheduler designed for MM DBSs
two-level scheduling similar to Chello and MARS
uses token buckets for traffic shaping (bandwidth allocation)
the batch builder selects requests in FCFS order from the queues based on the number of tokens; each queue must sort according to deadline (or another strategy)
work-conserving
adds extra requests to a batch if possible
starts an extra batch between ordinary batches

[Figure: queues feeding the batch builder]
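This Python sketch shows token-bucket traffic shaping feeding a batch builder. Several assumptions are not from the slide: one token pays for one request, buckets are refilled once per batch, and requests are taken FCFS from each queue. `ShapedQueue` and `build_batch` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ShapedQueue:
    rate: float    # tokens added per batch (1 token = 1 request, an assumption)
    depth: float   # bucket capacity: the largest burst the queue can save up
    tokens: float = 0.0
    requests: deque = field(default_factory=deque)

def build_batch(queues):
    """Refill each bucket, then take requests FCFS while the queue has tokens."""
    batch = []
    for q in queues:
        q.tokens = min(q.depth, q.tokens + q.rate)  # refill, capped at bucket depth
        while q.requests and q.tokens >= 1:
            batch.append(q.requests.popleft())
            q.tokens -= 1
    return batch

rt = ShapedQueue(rate=2, depth=4, requests=deque(["v1", "v2", "v3"]))
be = ShapedQueue(rate=1, depth=3, requests=deque(["q1", "q2"]))
batches = [build_batch([rt, be]) for _ in range(3)]
print(batches, rt.tokens)  # [['v1', 'v2', 'q1'], ['v3', 'q2'], []] 3.0
```

The idle third batch shows the bucket at work. Unused tokens accumulate up to `depth`, so a queue that was quiet may later send a short burst, while its long-term share stays bounded by `rate`.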
43
APEX, Chello and C-LOOK Comparison

Results from Ketil Lund (2002)
Configuration
Atlas Quantum 10K
Avg. seek: 5.0 ms
Avg. latency: 3.0 ms
transfer rate: 18 to 26 MB/s
data placement: random, video and audio multiplexed
round time: 1 second
block size: 64 KB
Video playback and user queries
Six video clients
Each playing back a random video
Random start time (after 17 secs, all have
started)

44
APEX, Chello and C-LOOK Comparison

Nine different user-query traces, each with the
following characteristics
Inter-arrival time of queries is exponentially
distributed, with a mean of 10 secs
Each query requests between two and 1011 pages
Inter-arrival time of disk requests in a query is
exponentially distributed, with a mean of 9.7ms
Start with one trace, and then add traces, in order to increase workload (→ queries may overlap)
Video data disk requests are assigned to a real-time queue
User-query disk requests are assigned to a best-effort queue
Bandwidth is shared 50/50 between the real-time queue and the best-effort queue
We measure response times (i.e., the time from when a request arrives at the disk scheduler until the data is placed in the buffer) for user-query disk requests, and check whether deadline violations occur for video data disk requests

45
APEX, Chello and C-LOOK Comparison
[Figure: deadline violations (video)]
46
Disk Scheduling Today

Most algorithms assume linear head movement
overhead, but this is not the case (acceleration)
Disk buffer caches may use read-ahead prefetching
The disk parameters exported to the OS may be
completely different from the actual disk
mechanics
Modern disks (often) have a built-in SCAN
scheduler
Actual VoD server implementation (???)
hierarchical software scheduler
several top-level queues, at least
RT (EDF)
NRT (FCFS)
process queues in rounds (RR)
dynamic assignment of quantums
work-conserving with variable round length (full disk bandwidth utilization vs. buffer requirement)
only simple collection of requests according to quantums in the lowest level and forwarding to disk, because ...
...fixed SCAN scheduler in hardware (on disk)

[Figure: EDF / FCFS top-level queues feeding the on-disk SCAN scheduler]
47
The End: Summary
48
Summary

The main bottleneck is disk I/O performance due to disk mechanics: seek time and rotational delays
Many algorithms try to minimize seek overhead (most existing systems use a SCAN derivative)
The world today is more complicated (both different media and unknown disk characteristics)
Next week, distribution (part II)
In two weeks, storage systems (part II)
data placement
multiple disks
memory caching
...

49
Some References

Anderson, D. P., Osawa, Y., Govindan, R.: A File System for Continuous Media, ACM Transactions on Computer Systems, Vol. 10, No. 4, Nov. 1992, pp. 311-337
Elmasri, R. A., Navathe, S.: Fundamentals of Database Systems, Addison Wesley, 2000
Garcia-Molina, H., Ullman, J. D., Widom, J.: Database Systems: The Complete Book, Prentice Hall, 2002
Lund, K.: Adaptive Disk Scheduling for Multimedia Database Systems, PhD thesis, IFI/UniK, UiO (to be finished soon)
Plagemann, T., Goebel, V., Halvorsen, P., Anshus, O.: Operating System Support for Multimedia Systems, Computer Communications, Vol. 23, No. 3, February 2000, pp. 267-289
Seagate Technology, http://www.seagate.com
Sitaram, D., Dan, A.: Multimedia Servers: Applications, Environments, and Design, Morgan Kaufmann Publishers, 2000