Storage Systems Part I
1
Storage Systems Part I
INF SERV Media Storage and Distribution Systems
  • 17/10 - 2002

2
Overview
  • Disks
  • mechanics and properties
  • Disk scheduling
  • traditional
  • real-time
  • stream oriented

3
Storage System
  • A VoD storage system deals with issues like
  • data retrieval from storage devices
  • data placement and organization
  • QoS guarantees like ensured continuous delivery
  • must consider the storage sub-system architecture
    for optimal performance

4
Disks
5
Disks I
  • Disks are orders of magnitude slower than main
    memory, but are cheaper and have more capacity
  • Disks are used to have a persistent system and
    manage huge amounts of information
  • Because...
  • ...there is a large speed mismatch compared to
    main memory (this gap will increase according
    to Moore's law),
  • ...disk I/O is often the main performance
    bottleneck,
  • ...we need to minimize the number of accesses,
  • ...
  • we must look closer at how to manage disks

6
Disks II
  • Two resources of importance
  • storage space
  • disk I/O bandwidth
  • Several approaches to manage multimedia data on
    disks
  • specific disk scheduling and large buffers
    (traditional file structure)
  • optimize data placement for continuous media
    (traditional retrieval mechanisms)
  • combinations of the above

7
Mechanics of Disks
Spindle: the axle around which the platters rotate
Tracks: concentric circles on a single platter
Platters: circular plates covered with magnetic
material to provide non-volatile storage of bits
Disk heads: read or alter the magnetism (bits)
passing under them. The heads are attached to an
arm enabling them to move across the platter surface
Sectors: segments of the track circle separated
by non-magnetic gaps. The gaps are often used to
identify the beginning of a sector
Cylinders: corresponding tracks on the different
platters are said to form a cylinder
8
Disk Specifications
Note 1: disk manufacturers usually denote GB as
10^9 bytes, whereas computer quantities are often
powers of 2, i.e., GB is 2^30 bytes
  • Disk technology develops fast
  • Some existing (Seagate) disks today

Note 2: there is usually a trade-off between
speed and capacity
Note 3: there is a difference between internal
and formatted transfer rate. Internal is the raw
rate off the platter; formatted is what remains
after the signals pass through the electronics
(cabling loss, interference, retransmissions,
checksums, etc.)
9
Disk Capacity
  • The size of the disk depends on
  • the number of platters
  • whether the platters use one or both sides
  • the number of tracks per surface
  • the (average) number of sectors per track
  • the number of bytes per sector
  • Example (Cheetah X15)
  • 4 platters using both sides: 8 surfaces
  • 18497 tracks per surface
  • 617 sectors per track (average)
  • 512 bytes per sector
  • Total capacity: 8 x 18497 x 617 x 512
    ≈ 4.6 x 10^10 bytes ≈ 42.8 GB
  • Formatted capacity: 36.7 GB

Note 1: the tracks on the edge of the platter are
longer than the tracks close to the spindle.
Today, most disks are zoned, i.e., the outer
tracks have more sectors than the inner tracks
Note 2: there is a difference between formatted
and total capacity. Some of the capacity is used
for storing checksums, spare tracks, gaps, etc.
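The capacity calculation above can be sketched in a few lines of Python. Note that the exact product is about 4.67 x 10^10 bytes (≈ 43.5 GB); the slide rounds to 4.6 x 10^10 ≈ 42.8 GB before converting.

```python
# Cheetah X15 capacity calculation from the slide.
surfaces = 4 * 2          # 4 platters, both sides
tracks_per_surface = 18497
sectors_per_track = 617   # average (zoned disk)
bytes_per_sector = 512

total_bytes = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
total_gb = total_bytes / 2**30  # "computer" GB (2^30 bytes)

print(total_bytes)         # 46746210304
print(round(total_gb, 1))
```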
10
Disk Access Time I
  • How do we retrieve data from disk?
  • position the head over the cylinder (track) on
    which the block (consisting of one or more
    sectors) is located
  • read or write the data block as the sectors move
    under the head while the platters rotate
  • The time between the moment a disk request is
    issued and the time the block is resident in
    memory is called disk latency or disk access time

11
Disk Access Time II
(figure: disk platter, disk head, disk arm)
Disk access time = seek time + rotational delay
+ transfer time + other delays
12
Disk Access Time Seek Time
  • Seek time is the time to position the head
  • the disk requires a minimum amount of time to
    start and stop moving the head
  • some time is used for actually moving the head,
    roughly proportional to the number of cylinders
    traveled

Typical averages: 10 ms - 40 ms (older disks),
7.4 ms (Barracuda 180), 5.7 ms (Cheetah 36),
3.6 ms (Cheetah X15)
13
Disk Access Time Rotational Delay
  • Time for the disk platters to rotate so the first
    of the required sectors is under the disk head

Average delay is 1/2 revolution. Typical averages:
8.33 ms (3,600 RPM), 5.56 ms (5,400 RPM),
4.17 ms (7,200 RPM), 3.00 ms (10,000 RPM),
2.00 ms (15,000 RPM)
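The half-revolution rule above can be checked directly; the RPM values are the ones from the slide.

```python
def avg_rotational_delay_ms(rpm):
    # Average delay is half a revolution: (60 / rpm) / 2 seconds.
    return (60.0 / rpm) / 2 * 1000

for rpm in (3600, 5400, 7200, 10000, 15000):
    print(rpm, round(avg_rotational_delay_ms(rpm), 2))
```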
14
Disk Access Time Transfer Time
  • Time for data to be read by the disk head, i.e.,
    time it takes the sectors of the requested block
    to rotate past the head
  • Transfer rate = (data per track) x (rotation speed)
  • Example 1: if a disk has 250 KB per track and
    operates at 10,000 RPM, we can read from the
    disk at 40.69 MB/s
  • Example 2: Barracuda 180: 406 KB per track x
    7,200 RPM ≈ 47.58 MB/s
  • Example 3: Cheetah X15: 316 KB per track x
    15,000 RPM ≈ 77.15 MB/s
  • Access time is dependent on data density and
    rotation speed
  • If we have to change tracks, time must also be
    added for moving the head

Note: one might achieve these transfer rates only
when reading continuously from disk; time must be
added for seeks, etc.
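The transfer-rate rule reproduces all three examples; a small sketch, assuming KB = 1024 bytes and MB = 2^20 bytes, which matches the slide's figures.

```python
def transfer_rate_mb_s(kb_per_track, rpm):
    # One full track passes under the head per revolution.
    revs_per_sec = rpm / 60.0
    bytes_per_sec = kb_per_track * 1024 * revs_per_sec
    return bytes_per_sec / 2**20  # MB/s (binary MB)

print(round(transfer_rate_mb_s(250, 10000), 2))  # Example 1
print(round(transfer_rate_mb_s(406, 7200), 2))   # Barracuda 180
print(round(transfer_rate_mb_s(316, 15000), 2))  # Cheetah X15
```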
15
Disk Access Time Other Delays
  • There are several other factors which might
    introduce additional delays
  • CPU time to issue and process I/O
  • contention for the controller
  • contention for the bus
  • contention for memory
  • verifying block correctness with checksums
    (retransmissions)
  • waiting in the scheduling queue
  • ...
  • Typical values: ≈ 0 (except perhaps waiting
    in the queue)

16
Disk Throughput
  • How much data can we retrieve per second?
  • Throughput = (data size) / (total access time)
  • Example: for each operation we have average
    seek + average rotational delay + transfer time
    (no gaps, etc.)
  • Cheetah X15: 4 KB blocks ≈ 0.71 MB/s, 64 KB
    blocks ≈ 11.42 MB/s
  • Barracuda 180: 4 KB blocks ≈ 0.35 MB/s, 64 KB
    blocks ≈ 5.53 MB/s

Note: to increase overall throughput, one should
read as much as possible contiguously from disk
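The throughput formula can be sketched with a rough model: one operation = average seek + average rotational delay + block transfer. The slide's figures come out slightly higher, presumably due to rounding or different assumed disk parameters, so this is an approximation, not a reproduction.

```python
def throughput_mb_s(block_kb, seek_ms, rot_ms, rate_mb_s):
    # One operation = average seek + average rotational delay + block transfer.
    transfer_ms = (block_kb / 1024.0) / rate_mb_s * 1000.0
    total_ms = seek_ms + rot_ms + transfer_ms
    return (block_kb / 1024.0) / (total_ms / 1000.0)

# Barracuda 180: avg seek 7.4 ms, avg rotational delay 4.17 ms, ~47.58 MB/s
print(throughput_mb_s(4, 7.4, 4.17, 47.58))
# Cheetah X15: avg seek 3.6 ms, avg rotational delay 2.0 ms, ~77.15 MB/s
print(throughput_mb_s(4, 3.6, 2.0, 77.15))
```

Note how the fixed seek and rotation costs dominate for small blocks, which is why larger blocks give an order of magnitude more throughput.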
17
Some Complicating Issues
  • There are several complicating factors
  • the other delays described earlier (consumed
    CPU time, resource contention, etc.)
  • zoned disks, i.e., outer tracks are longer and
    therefore usually have more sectors than inner
    tracks
  • checksums are also stored with each of the sectors

(figure: inner vs. outer tracks)
Note 1: transfer rates are higher on outer tracks
Note 2: gaps between sectors
Note 3: the checksum is read for each track and
used to validate the track
Note 4: the checksum is usually calculated using
Reed-Solomon interleaved with CRC
Note 5: for older drives the checksum is 16 bytes
Note 6: SCSI disks may be changed by the user to
have other sector sizes
18
Writing and Modifying Blocks
  • A write operation is analogous to read operations
  • must add time for block allocation
  • a complication occurs if the write operation has
    to be verified: must wait another rotation and
    then read the block to see if it is the block we
    wanted to write
  • Total write time ≈ read time + time for one
    rotation
  • Cannot modify a block directly
  • read block into main memory
  • modify the block
  • write new content back to disk
  • (verify the write operation)
  • Total modify time ≈ read time + time to modify
    + write time
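A tiny numeric sketch of the two formulas above; all parameter values are hypothetical, chosen only to make the arithmetic concrete.

```python
# Hypothetical parameter values, just to make the formulas concrete.
read_ms = 11.7      # avg seek + avg rotational delay + transfer (4 KB block)
rotation_ms = 8.33  # one full revolution at 7,200 RPM
modify_ms = 0.1     # in-memory modification time

# Total write time (verified) ≈ read time + time for one rotation
write_ms = read_ms + rotation_ms

# Total modify time ≈ read time + time to modify + write time
modify_total_ms = read_ms + modify_ms + write_ms

print(write_ms, modify_total_ms)
```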

19
Disk Controllers
  • To manage the different parts of the disk, we use
    a disk controller, which is a small processor
    capable of
  • controlling the actuator moving the head to the
    desired track
  • selecting which platter and surface to use
  • knowing when the right sector is under the head
  • transferring data between main memory and disk
  • New controllers act like small computers
    themselves
  • both disk and controller now have their own
    buffers, reducing disk access time
  • data on damaged disk blocks/sectors is simply
    moved to spare space on the disk; the system
    above (the OS) does not know this, i.e., a block
    may lie elsewhere than the OS thinks

20
Efficient Secondary Storage Usage
  • Many programs are written as if their data fits
    in main memory, but one must often assume that
    the data is larger than main memory
  • Must take into account the use of secondary
    storage
  • there are large access time gaps, i.e., a disk
    access will probably dominate the total execution
    time
  • there may be huge performance improvements if we
    reduce the number of disk accesses
  • a slow algorithm with few disk accesses will
    probably outperform a fast algorithm with many
    disk accesses
  • Several ways to optimize:
  • disk scheduling
  • block size
  • multiple disks
  • prefetching
  • file management / data placement
  • memory caching / replacement algorithms

21
Disk Scheduling
22
Disk Scheduling I
  • Seek time is a dominant factor of total disk I/O
    time
  • Let the operating system or disk controller
    choose which request to serve next, depending on
    the current position on disk and the requested
    block's position on disk (disk scheduling)
  • Note that disk scheduling ≠ CPU scheduling
  • a disk is a mechanical device: hard to determine
    (accurate) access times
  • disk accesses cannot be preempted: a request
    runs until it finishes
  • disk I/O is often the main performance bottleneck
  • General goals
  • short response time
  • high overall throughput
  • fairness (equal probability for all blocks to be
    accessed in the same time)
  • Tradeoff: seek and rotational delay vs. maximum
    response time

23
Disk Scheduling II
  • Several traditional algorithms
  • First-Come-First-Serve (FCFS)
  • Shortest Seek Time First (SSTF)
  • SCAN (and variations)
  • LOOK (and variations)

24
First-Come-First-Serve (FCFS)
  • FCFS serves the first arriving request first
  • Long seeks
  • Short average response time

incoming requests (in order of arrival):
12, 14, 2, 7, 21, 8, 24

(figure: head movement serving the requests in
arrival order. Note: the lines only indicate
order, not exact amounts of time)
25
Shortest Seek Time First (SSTF)
  • SSTF serves closest request first
  • short seek times
  • longer maximum seek times may even lead to
    starvation

incoming requests (in order of arrival):
12, 14, 2, 7, 21, 8, 24

(figure: head movement and scheduling queue; the
closest pending request is always served first)
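The two policies can be compared with a small simulation. The request list is the one from the slides; the initial head position of cylinder 10 is an assumption, since the slides do not state it.

```python
def fcfs(head, requests):
    # Serve in arrival order; sum the head movement.
    order, moved = [], 0
    for r in requests:
        moved += abs(r - head)
        head = r
        order.append(r)
    return order, moved

def sstf(head, requests):
    # Always serve the pending request closest to the current head position.
    pending, order, moved = list(requests), [], 0
    while pending:
        nxt = min(pending, key=lambda r: abs(r - head))
        moved += abs(nxt - head)
        head = nxt
        pending.remove(nxt)
        order.append(nxt)
    return order, moved

reqs = [12, 14, 2, 7, 21, 8, 24]
print(fcfs(10, reqs))  # ([12, 14, 2, 7, 21, 8, 24], 64)
print(sstf(10, reqs))  # ([12, 14, 8, 7, 2, 21, 24], 38)
```

SSTF covers far fewer cylinders (38 vs. 64 here) but, as the slide notes, distant requests can starve.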
26
SCAN
  • SCAN (elevator) moves head edge to edge and
    serves requests on the way
  • bi-directional
  • compromise between response time and seek time
    optimizations

incoming requests (in order of arrival):
12, 14, 2, 7, 21, 8, 24

(figure: head sweeps from edge to edge, serving
requests from the scheduling queue on the way)
27
C-SCAN
  • Circular-SCAN moves head from edge to edge
  • serves requests on one way uni-directional
  • improves response time (fairness)

incoming requests (in order of arrival):
12, 14, 2, 7, 21, 8, 24

(figure: head sweeps in one direction only,
serving requests from the scheduling queue, then
jumps back to the edge)
28
SCAN vs. C-SCAN
  • Why is C-SCAN on average better in reality than
    SCAN, when both service the same number of
    requests in two passes?
  • modern disks must accelerate (speed up and slow
    down) when seeking
  • head movement formula (approximately):
    seek time ≈ c·√n + fixed overhead,
    where n is the number of cylinders traveled;
    if n is large, seek time ≈ n x (per-track
    seek-time constant) + fixed overhead
29
LOOK and C-LOOK
  • LOOK (C-LOOK) is a variation of SCAN (C-SCAN)
  • same schedule as SCAN
  • does not run to the edges
  • stops and returns at outer- and innermost request
  • increased efficiency
  • SCAN vs. LOOK example

incoming requests (in order of arrival):
12, 14, 2, 7, 21, 8, 24

(figure: scheduling queue 2, 7, 8, 24, 21, 14, 12;
LOOK stops at the outermost and innermost request
instead of the edge)
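The service orders for SCAN and C-SCAN can be sketched similarly. Strictly, this computes the LOOK/C-LOOK orders, since it stops at the last request rather than running to the edge, but the service order is the same; the initial head position of 10 is again an assumption.

```python
def scan_order(head, requests, up=True):
    # Serve requests in the current direction, then reverse (elevator).
    above = sorted(r for r in requests if r >= head)
    below = sorted((r for r in requests if r < head), reverse=True)
    return above + below if up else below + above

def cscan_order(head, requests):
    # Serve in one direction only; jump back and continue upward.
    above = sorted(r for r in requests if r >= head)
    below = sorted(r for r in requests if r < head)
    return above + below

reqs = [12, 14, 2, 7, 21, 8, 24]
print(scan_order(10, reqs))   # [12, 14, 21, 24, 8, 7, 2]
print(cscan_order(10, reqs))  # [12, 14, 21, 24, 2, 7, 8]
```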
30
V-SCAN(R)
  • V-SCAN(R) combines SCAN (LOOK) and SSTF
  • define an R-sized unidirectional SCAN (LOOK)
    window, i.e., C-SCAN (C-LOOK),
  • V-SCAN(0.6) makes a C-SCAN (C-LOOK) window over
    60% of the cylinders
  • uses SSTF for requests outside the window
  • V-SCAN(0.0) is equivalent to SSTF
  • V-SCAN(1.0) is equivalent to SCAN (C-LOOK)
  • V-SCAN(0.2) is supposed to be an appropriate
    configuration

(figure: cylinder numbers 1 to 25 with the
R-sized window)
31
Continuous Media Disk Scheduling
  • Suitability of classical algorithms
  • minimal disk arm movement (short seek times)
  • no provision of time or deadlines
  • generally not suitable
  • Continuous media requirements
  • serve both periodic and aperiodic requests
  • never miss deadline due to aperiodic requests
  • aperiodic requests must not starve
  • support multiple streams
  • balance buffer space and efficiency tradeoff

32
Real-Time Disk Scheduling
  • Targeted for real-time applications with
    deadlines
  • Several proposed algorithms
  • earliest deadline first (EDF)
  • SCAN-EDF
  • shortest seek and earliest deadline by
    ordering/value (SSEDO / SSEDV)
  • priority SCAN (PSCAN)
  • ...

33
Earliest Deadline First (EDF)
  • EDF serves the request with nearest deadline
    first
  • non-preemptive (i.e., a request with a shorter
    deadline must wait)
  • excessive seeks
  • poor throughput

incoming requests (in order of arrival), as
(cylinder, deadline) pairs:
(12,5), (14,6), (2,4), (7,7), (21,1), (8,2), (24,3)

(figure: scheduling queue served in deadline order)
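With the (cylinder, deadline) pairs from the slide, EDF is just a stable sort on the deadline:

```python
# (cylinder, deadline) pairs from the slide.
reqs = [(12, 5), (14, 6), (2, 4), (7, 7), (21, 1), (8, 2), (24, 3)]

# EDF: serve the request with the nearest deadline first (stable sort).
edf_order = [cyl for cyl, deadline in sorted(reqs, key=lambda r: r[1])]
print(edf_order)  # [21, 8, 24, 2, 12, 14, 7]
```

The resulting order jumps back and forth across the disk (21, 8, 24, 2, ...), illustrating the slide's point about excessive seeks.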
34
SCAN-EDF
  • SCAN-EDF combines SCAN and EDF
  • the real-time aspects of EDF
  • seek optimizations of SCAN
  • especially useful if the end of the period of a
    batch is the deadline
  • increase efficiency by modifying the deadlines
  • method
  • serve requests with earlier deadline first (EDF)
  • sort requests with the same deadline by track
    location (SCAN)

incoming requests (in order of arrival), as
(cylinder, deadline) pairs:
(2,3), (14,1), (9,3), (7,2), (21,1), (8,2),
(24,2), (16,1)

(figure: scheduling queue)
Note: similarly, we can combine EDF with C-SCAN,
LOOK or C-LOOK
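The SCAN-EDF rule from the slide is a two-key sort. Here the within-deadline SCAN pass is simplified to an ascending track sort, ignoring head direction.

```python
# (cylinder, deadline) pairs from the slide.
reqs = [(2, 3), (14, 1), (9, 3), (7, 2), (21, 1), (8, 2), (24, 2), (16, 1)]

# Primary key: deadline (EDF); secondary key: track (SCAN within a group).
order = [cyl for cyl, deadline in sorted(reqs, key=lambda r: (r[1], r[0]))]
print(order)  # [14, 16, 21, 7, 8, 24, 2, 9]
```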
35
Stream Oriented Disk Scheduling
  • Targeted for streaming continuous media data
  • Several algorithms proposed
  • group sweep scheduling (GSS)
  • mixed disk scheduling strategy
  • contiguous media file system (CMFS)
  • lottery scheduling
  • stride scheduling
  • batched SCAN (BSCAN)
  • greedy-but-safe EDF (GS_EDF)
  • bubble up
  • MARS scheduler
  • chello
  • adaptive disk scheduler for mixed media workloads
    (APEX)

multimedia applications may require both RT and
NRT data: desirable to have all on the same disk
36
Group Sweep Scheduling (GSS)
  • GSS combines Round-Robin (RR) and SCAN
  • requests are serviced in rounds (cycles)
  • principle
  • divide the S active streams into G groups
  • service the G groups in RR order
  • service each stream in a group in C-SCAN order
  • playout can start at the end of the group
  • special cases
  • G = S: RR scheduling
  • G = 1: SCAN scheduling
  • tradeoff between buffer space and disk arm
    movement
  • try different values for G and select the one
    giving the minimum buffer requirement
  • a large G → smaller groups, more arm movement,
    smaller buffers (reuse)
  • a small G → larger groups, less arm movement,
    larger buffers
  • with high loads and equal playout rates, GSS and
    SCAN often service streams in the same order
  • replacing RR with FIFO and grouping requests by
    deadline gives SCAN-EDF
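The grouping and round-robin steps can be sketched as follows. The round-robin assignment of streams to groups is an assumption; the slides only say that the S streams are divided into G groups (e.g., g1 = {A,C} and g2 = {B,D}).

```python
def gss_groups(streams, g):
    # Assumed: round-robin assignment of the S streams into G groups.
    return [streams[i::g] for i in range(g)]

streams = ["A", "B", "C", "D"]
groups = gss_groups(streams, 2)
print(groups)  # [['A', 'C'], ['B', 'D']] -- matches g1 = {A,C}, g2 = {B,D}

# Groups are then serviced in RR order, one group per service slot;
# within a group, the streams' blocks would be served in C-SCAN order.
schedule = [groups[r % len(groups)] for r in range(4)]
print(schedule)
```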

37
Group Sweep Scheduling (GSS)
  • GSS example: streams A, B, C and D; G = 2 →
    g1 = {A,C} and g2 = {B,D}
  • RR group schedule
  • C-SCAN block schedule within a group

(figure: rounds alternate between g1 = {A,C} and
g2 = {B,D}; within each round, the group's blocks,
e.g., A1 and C1, are served in C-SCAN order)
38
Mixed Disk Scheduling Strategy (MDSS)
  • MDSS combines SSTF with buffer overflow and
    underflow prevention
  • data is delivered to several buffers (one per
    stream)
  • the disk bandwidth share is allocated according
    to buffer fill level
  • SSTF is used to schedule the requests

(figure: per-stream buffers feeding a share
allocator and an SSTF scheduler)
39
Continuous Media File System Disk Scheduling
  • CMFS proposes several algorithms
  • determines a new schedule on completion of each
    request
  • orders requests so that no deadline violations
    occur; delays new streams until it is safe to
    proceed (admission control)
  • all based on slack time
  • the amount of time that can be used for
    non-real-time requests, or
  • work-ahead for continuous media requests
  • based on the amount of data in buffers and the
    deadlines of the next requests (how long can I
    delay the request before violating the deadline?)
  • useful algorithms
  • greedy: serve one stream as long as possible
  • cyclic: always serve the stream with the
    shortest slack time

40
MARS Disk Scheduler
  • Massively-parallel And Real-time Storage (MARS)
    scheduler supports mixed media on a single system
  • two-level scheduling
  • top level: 1 NRT queue and n (≥ 1) RT queues
    (SCAN, but in the future GSS, SCAN-EDF, or
    others)
  • uses deficit round-robin fair queuing to assign
    quantums to each queue per round; divides total
    bandwidth among the queues
  • bottom level: selects requests from the queues
    according to quantums, in SCAN order
  • work-conserving (variable round times, a new
    round starts immediately)

(figure: NRT and RT queues feeding a deficit
round-robin fair queuing job selector)
41
Chello
  • Chello is part of the Symphony FS supporting
    mixed media
  • two-level scheduling
  • top level: n (= 3) service classes (queues)
  • deadline (= end-of-round) real-time (EDF)
  • throughput-intensive best effort (FCFS)
  • interactive best effort (FCFS)
  • divides total bandwidth among the queues
    according to a static proportional allocation
    scheme (similar to the MARS job selector)
  • bottom level: class-independent scheduler (FCFS)
  • selects requests from the queues according to
    quantums
  • sorts the requests from each queue in SCAN order
    when transferred
  • partially work-conserving (extra requests might
    be added at the end of the class-independent
    scheduler if there is space, but rounds are
    constant)

(figure: deadline RT, throughput-intensive
best-effort and interactive best-effort queues;
each queue is sorted in SCAN order when
transferred)
42
Adaptive Disk Scheduler for Mixed Media Workloads
  • APEX is another mixed-media scheduler, designed
    for multimedia DBMSs
  • two-level scheduling similar to Chello and MARS
  • uses token buckets for traffic shaping
    (bandwidth allocation)
  • the batch builder selects requests in FCFS order
    from the queues based on the number of tokens;
    each queue must sort according to deadline (or
    another strategy)
  • work-conserving
  • adds extra requests to a batch if possible
  • starts an extra batch between ordinary batches

(figure: per-class queues feeding the batch builder)
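A minimal token-bucket sketch of the traffic-shaping mechanism mentioned above. The rate and capacity values are arbitrary; this shows the generic mechanism, not APEX's actual implementation.

```python
class TokenBucket:
    # Generic token bucket: `rate` tokens/sec refill, `capacity` burst size.
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.t = capacity, 0.0

    def allow(self, now, cost=1):
        # Refill according to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.t) * self.rate)
        self.t = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

tb = TokenBucket(rate=2, capacity=4)
print([tb.allow(0.0) for _ in range(5)])  # burst of 4 allowed, 5th rejected
print(tb.allow(1.0))                      # 2 tokens refilled by t = 1.0
```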
43
APEX, Chello and C-LOOK Comparison
  • Results from Ketil Lund (2002)
  • Configuration
  • Atlas Quantum 10K
  • avg. seek: 5.0 ms
  • avg. latency: 3.0 ms
  • transfer rate: 18 - 26 MB/s
  • data placement: random; video and audio
    multiplexed
  • round time: 1 second
  • block size: 64 KB
  • Video playback and user queries
  • six video clients
  • each playing back a random video
  • random start time (after 17 secs, all have
    started)

44
APEX, Chello and C-LOOK Comparison
  • Nine different user-query traces, each with the
    following characteristics
  • Inter-arrival time of queries is exponentially
    distributed, with a mean of 10 secs
  • Each query requests between two and 1011 pages
  • Inter-arrival time of disk requests in a query is
    exponentially distributed, with a mean of 9.7ms
  • Start with one trace, and then add traces in
    order to increase the workload (→ queries may
    overlap)
  • Video data disk requests are assigned to a
    real-time queue
  • User-query disk requests to a best-effort queue
  • Bandwidth is shared 50/50 between real-time queue
    and best-effort queue
  • We measure response times (i.e., time from
    request arrived at disk scheduler, until data is
    placed in the buffer) for user-query disk
    requests, and check whether deadline violations
    occur for video data disk requests

45
APEX, Chello and C-LOOK Comparison
(figure: deadline violations for video requests)
46
Disk Scheduling Today
  • Most algorithms assume linear head movement
    overhead, but this is not the case (acceleration)
  • Disk buffer caches may use read-ahead prefetching
  • The disk parameters exported to the OS may be
    completely different from the actual disk
    mechanics
  • Modern disks (often) have a built-in SCAN
    scheduler
  • Actual VoD server implementation (???)
  • hierarchical software scheduler
  • several top-level queues, at least
  • RT (EDF)
  • NRT (FCFS)
  • process queues in rounds (RR)
  • dynamic assignment of quantums
  • work-conserving with variable round length
    (full disk bandwidth utilization vs. buffer
    requirement)
  • only simple collection of requests according to
    quantums at the lowest level, forwarding them to
    the disk, because ...
  • ... there is a fixed SCAN scheduler in hardware
    (on the disk)

(figure: EDF / FCFS top-level queues above the
on-disk SCAN scheduler)
47
The End: Summary
48
Summary
  • The main bottleneck is disk I/O performance due
    to disk mechanics: seek time and rotational
    delays
  • Many algorithms try to minimize seek overhead
    (most existing systems use a SCAN derivative)
  • The world today is more complicated (both
    different media and unknown disk characteristics)
  • Next week: distribution (part II)
  • In two weeks: storage systems (part II)
  • data placement
  • multiple disks
  • memory caching
  • ...

49
Some References
  • Anderson, D. P., Osawa, Y., Govindan, R.: A File
    System for Continuous Media, ACM Transactions on
    Computer Systems, Vol. 10, No. 4, Nov. 1992,
    pp. 311-337
  • Elmasri, R. A., Navathe, S.: Fundamentals of
    Database Systems, Addison Wesley, 2000
  • Garcia-Molina, H., Ullman, J. D., Widom, J.:
    Database Systems: The Complete Book, Prentice
    Hall, 2002
  • Lund, K.: Adaptive Disk Scheduling for
    Multimedia Database Systems, PhD thesis,
    IFI/UniK, UiO (to be finished soon)
  • Plagemann, T., Goebel, V., Halvorsen, P.,
    Anshus, O.: Operating System Support for
    Multimedia Systems, Computer Communications,
    Vol. 23, No. 3, February 2000, pp. 267-289
  • Seagate Technology, http://www.seagate.com
  • Sitaram, D., Dan, A.: Multimedia Servers:
    Applications, Environments, and Design, Morgan
    Kaufmann Publishers, 2000