1
Chapter 7 Storage Systems
2
Outline
  • Introduction
  • Types of Storage Devices
  • Buses Connecting I/O Devices to CPU/Memory
  • Reliability, Availability, and Dependability
  • RAID: Redundant Arrays of Inexpensive Disks
  • Errors and Failures in Real Systems
  • I/O Performance Measures
  • A Little Queuing Theory
  • Benchmarks of Storage Performance and
    Availability
  • Crosscutting Issues
  • Designing an I/O System

3
7.1 Introduction
4
Motivation: Who Cares About I/O?
  • CPU performance: 2× every 18 months
  • I/O performance limited by mechanical delays
    (disk I/O)
  • < 10% improvement per year (I/Os per sec or MB
    per sec)
  • Amdahl's Law: system speed-up limited by the
    slowest part! (see the sketch below)
  • 10% I/O, 10× faster CPU → 5× performance (lose 50%)
  • 10% I/O, 100× faster CPU → 10× performance (lose 90%)
  • I/O bottleneck
  • Diminishing fraction of time in CPU
  • Diminishing value of faster CPUs

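A minimal sketch of the Amdahl's Law numbers above, assuming the I/O fraction (10%) cannot be sped up:

```python
# Amdahl's Law: overall speedup when only the CPU portion gets faster.
def overall_speedup(io_fraction, cpu_speedup):
    return 1.0 / (io_fraction + (1.0 - io_fraction) / cpu_speedup)

print(overall_speedup(0.10, 10))   # ~5.3x: the slide rounds to 5x (lose ~50%)
print(overall_speedup(0.10, 100))  # ~9.2x: the slide rounds to 10x (lose ~90%)
```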
5
Position of I/O in Computer Architecture Past
  • An orphan in the architecture domain
  • I/O meant the non-processor and memory stuff
  • Disk, tape, LAN, WAN, etc.
  • Performance was not a major concern
  • Devices characterized as
  • Extraneous, non-priority, infrequently used, slow
  • Exception: the swap area of disk
  • Part of the memory hierarchy
  • Hence part of system performance, but you're hosed
    if you use it often

6
Position of I/O in Computer Architecture Now
  • Trend: I/O is the bottleneck
  • Communication is frequent
  • Voice response transaction systems, real-time
    video
  • Multimedia expectations
  • Even standard networks come in gigabit/sec
    flavors
  • For multi-computers
  • Result
  • Significant focus on system bus performance
  • Common bridge to the memory system and the I/O
    systems
  • Critical performance component for the SMP server
    platforms

7
System vs. CPU Performance
  • Care about the speed at which user jobs get done
  • Throughput: how many jobs per unit time (system view)
  • Latency: how quick a single job is (user view)
  • Response time: time between when a command is
    issued and when results appear (user view)
  • CPU performance is the main factor when
  • The job mix fits in memory → there are very few
    page faults
  • I/O performance is the main factor when
  • The job is too big for memory → paging is
    dominant
  • The job reads/writes/creates a lot of
    unexpected files
  • OLTP, decision support: database workloads
  • And then there are graphics and specialty I/O devices

8
System Performance
  • Depends on many factors in the worst case
  • CPU
  • Compiler
  • Operating System
  • Cache
  • Main Memory
  • Memory-IO bus
  • I/O controller or channel
  • I/O drivers and interrupt handlers
  • I/O devices (there are many types)
  • Level of autonomous behavior
  • Amount of internal buffer capacity
  • Device specific parameters for latency and
    throughput

9
I/O Systems
[Figure: processor with cache connected over a memory–I/O bus (which may be the same as or different from the CPU–memory bus) to main memory and several I/O controllers serving graphics, disks, and a network; devices signal the processor via interrupts]
10
Keys to a Balanced System
  • It's all about overlap: I/O vs. CPU
  • Time_workload = Time_CPU + Time_I/O − Time_overlap
  • Consider the benefit of just speeding up one
  • Amdahl's Law (see slide 4 as well)
  • Latency vs. Throughput

11
I/O System Design Considerations
  • Depends on type of I/O device
  • Size, bandwidth, and type of transaction
  • Frequency of transaction
  • Defer vs. do now
  • Appropriate memory bus utilization
  • What should the controller do?
  • Programmed I/O
  • Interrupt vs. polled
  • Priority or not
  • DMA
  • Buffering issues - what happens on over-run
  • Protection
  • Validation

12
Types of I/O Devices
  • Behavior
  • Read, Write, Both
  • Once, multiple
  • Size of average transaction
  • Bandwidth
  • Latency
  • Partner - the speed of the slowest link theory
  • Human operated (interactive or not)
  • Machine operated (local or remote)

13
Some I/O Device Characteristics
14
Is I/O Important?
  • Depends on your application
  • Business - disks for file system I/O
  • Graphics - graphics cards or special
    co-processors
  • Parallelism - the communications fabric
  • Our focus: mainline uniprocessing
  • Storage subsystems (Chapter 7)
  • Networks (Chapter 8)
  • Noteworthy Point
  • The traditional orphan
  • But now often viewed more as a front line topic

15
7.2 Types of Storage Devices
16
Magnetic Disks
  • Two important roles
  • Long-term, non-volatile storage: file system and
    OS
  • Lowest level of the memory hierarchy
  • Most of virtual memory is physically resident
    on the disk
  • Long viewed as a bottleneck
  • Mechanical system → slow
  • Hence disks seem an easy target for improved
    technology
  • Disk density improvements have done better
    than Moore's Law

17
Disks are organized into platters, tracks, and
sectors
[Figure: a drive has 1–12 platters (2 sides each),
5,000–30,000 tracks per surface, and 100–500
sectors per track; a sector is the smallest unit
that can be read or written]
18
Physical Organization Options
  • Platters: one or many
  • Density: fixed or variable
    (do all tracks have the same number of sectors?)
  • Organization: sectors, cylinders, and tracks
  • Actuators: 1 or more
  • Heads: 1 per track or 1 per actuator
  • Access: seek time vs. rotational latency
  • Seek time is related to distance, but not linearly
  • Typical rotation: 3,600 to 15,000 RPM
  • Diameter: 1.0 to 3.5 inches

19
Typical Physical Organization
  • Multiple platters
  • Metal disks covered with magnetic recording
    material on both sides
  • Single actuator (since they are expensive)
  • Single R/W head per arm
  • One arm per surface
  • All heads therefore over same cylinder
  • Fixed sector size
  • Variable density encoding
  • Disk controller: usually a built-in processor plus
    buffering

20
Characteristics of Three Magnetic Disks of 2000
21
Anatomy of a Read Access
  • Steps
  • Memory mapped I/O over bus to controller
  • Controller starts access
  • Seek + rotational latency wait
  • Sector is read and buffered (validity checked)
  • Controller says ready or DMAs to main memory and
    then says ready

22
Access Time
  • Access time is the sum of:
  • Seek time: time to move the arm over the proper
    track
  • Very non-linear: acceleration and deceleration times
    complicate it
  • Rotational latency (delay): time for the requested
    sector to rotate under the head (on average, half a
    rotation)
  • Transfer time: time to transfer a block of bits
    (typically a sector) under the read-write head
  • Controller overhead: the overhead the controller
    imposes in performing an I/O access
  • Queuing delay: time spent waiting for the disk to
    become free

23
Access Time Example
  • Assumptions: average seek time 5 ms, transfer
    rate 40 MB/sec, 10,000 RPM, controller overhead
    0.1 ms, no queuing delay
  • What is the average time to r/w a 512-byte
    sector?
  • Answer: 5 ms + 3.0 ms (half a rotation at 10,000
    RPM) + 0.0125 ms (512 bytes at 40 MB/sec) +
    0.1 ms ≈ 8.1 ms (see the sketch below)

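A worked version of the example above as a sketch (all parameters from the slide):

```python
# Average disk access time = seek + rotational latency + transfer + overhead.
seek_ms = 5.0
rpm = 10_000
transfer_mb_s = 40.0
overhead_ms = 0.1
sector_bytes = 512

rotation_ms = 0.5 * 60_000 / rpm                      # average half rotation
transfer_ms = sector_bytes / (transfer_mb_s * 1e6) * 1e3
print(seek_ms + rotation_ms + transfer_ms + overhead_ms)  # ~8.11 ms
```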
24
Cost VS Performance
  • Large-diameter drives have much more data over
    which to amortize the cost of electronics → lowest
    cost per GB
  • Higher sales volume → lower manufacturing cost
  • The 3.5-inch drive, the largest surviving drive in
    2001, also has the highest sales volume, so it
    unquestionably has the best price per GB

25
Future of Magnetic Disks
  • Areal density (bits per unit area) is the common
    improvement metric
  • Trends
  • Until 1988: 29% improvement per year
  • 1988–1996: 60% per year
  • 1997–2001: 100% per year
  • 2001
  • 20 billion bits per square inch
  • 60 billion bits per square inch demonstrated in
    labs

26
Disk Price Trends by Capacity
27
Disk Price Trends: Dollars per MB
28
Cost vs. Access Time for SRAM, DRAM, and Magnetic
Disk
29
Disk Alternatives
  • Optical disks
  • Optical compact disks (CD): 0.65 GB
  • Digital video discs, digital versatile disks
    (DVD): 4.7 GB × 2 sides
  • Rewritable CD (CD-RW) and write-once CD (CD-R)
  • Rewritable DVD (DVD-RAM) and write-once DVD
    (DVD-R)
  • Robotic tape storage
  • Optical juke boxes
  • Tapes: DAT, DLT
  • Flash memory
  • Good for embedded systems
  • Nonvolatile storage and rewritable ROM

30
7.3 Buses Connecting I/O Devices to CPU/Memory
31
I/O Connection Issues
Connecting the CPU to the I/O device world
  • Shared communication link between subsystems
  • Typical choice is a bus
  • Advantages
  • Shares a common set of wires and protocols → low
    cost
  • Often based on a standard (PCI, SCSI, etc.) →
    portability and versatility
  • Disadvantages
  • Poor performance
  • Multiple devices imply arbitration and therefore
    contention
  • Can be a bottleneck

32
I/O Connection Issues Multiple Buses
  • I/O bus
  • Lengthy
  • Many types of connected devices
  • Wide range in device bandwidth
  • Follow a bus standard
  • Accept devices varying in latency and bandwidth
    capabilities
  • CPU-memory bus
  • Short
  • High speed
  • Match to the memory system to maximize CPU-memory
    bandwidth
  • The designer knows all the device types that must
    connect together

33
Typical Bus Synchronous Read Transaction
34
Bus Design Decisions
  • Other things to standardize as well
  • Connectors
  • Voltage and current levels
  • Physical encoding of control signals
  • Protocols for good citizenship

35
Bus Design Decisions (Cont.)
  • Bus master: a device that can initiate a R/W
    transaction
  • Multiple masters: multiple CPUs and I/O devices can
    initiate bus transactions
  • Multiple bus masters need arbitration (fixed
    priority or random)
  • Split transactions for multiple masters
  • Use packets for the full transaction (do not
    hold the bus)
  • A read transaction is broken into read-request
    and memory-reply transactions
  • Make the bus available for other masters while
    the data is read/written from/to the specified
    address
  • Transactions must be tagged
  • Higher bandwidth, but also higher latency

36
Split Transaction Bus
37
Bus Design Decisions (Cont.)
  • Clocking Synchronous vs. Asynchronous
  • Synchronous
  • Include a clock in the control lines, and a fixed
    protocol for address and data relative to the
    clock
  • Fast and inexpensive (little or no logic to
    determine what's next)
  • Everything on the bus must run at the same clock
    rate
  • Short length (due to clock skew)
  • CPU-memory buses
  • Asynchronous
  • Easier to connect a wide variety of devices, and
    lengthen the bus
  • Scale better with technological changes
  • I/O buses

38
Synchronous or Asynchronous?
39
Standards
  • The Good
  • Let the computer and I/O-device designers work
    independently
  • Provides a path for second party (e.g. cheaper)
    competition
  • The Bad
  • Become major performance anchors
  • Inhibit change
  • How to create a standard
  • Bottom-up
  • A company tries to get a standards committee to
    approve its latest philosophy in hopes that
    they'll get the jump on the others (e.g., S-bus,
    PC-AT bus, ...)
  • De facto standards
  • Top-down
  • Design by committee (PCI, SCSI, ...)

40
Some Sample I/O Bus Designs
41
Some Sample Serial I/O Bus
Often used in embedded computers
42
CPU-Memory Buses Found in 2001 Servers
Crossbar Switch
43
Connecting the I/O Bus
  • To main memory
  • The I/O bus and CPU-memory bus may be the same
  • I/O commands on the bus could interfere with the
    CPU's memory accesses
  • Since cache misses are rare, this does not tend to
    stall the CPU
  • The problem is lack of coherency
  • Currently, we consider this case
  • To cache
  • Access
  • Memory-mapped I/O or distinct instruction (I/O
    opcodes)
  • Interrupt vs. Polling
  • DMA or not
  • Autonomous control allows overlap and latency
    hiding
  • However there is a cost impact

44
A typical interface of I/O devices and an I/O bus
to the CPU-memory bus
45
Processor Interface Issues
  • Processor interface
  • Interrupts
  • Memory mapped I/O
  • I/O Control Structures
  • Polling
  • Interrupts
  • DMA
  • I/O Controllers
  • I/O Processors
  • Capacity, Access Time, Bandwidth
  • Interconnections
  • Buses

46
I/O Controller
[Figure: an I/O controller with status lines (ready, done, error), an I/O address register to select the target device, and command/interrupt lines to the CPU]
47
Memory Mapped I/O
Some portions of the memory address space are
assigned to I/O devices. Reads/writes to these
addresses cause data transfers
48
Programmed I/O
  • Polling
  • The I/O module performs the action on behalf of the
    processor
  • But the I/O module does not interrupt the CPU when
    I/O is done
  • The processor is kept busy checking the status of
    the I/O module
  • Not an efficient way to use the CPU unless the
    device is very fast!
  • Transfers proceed byte by byte

49
Interrupt-Driven I/O
  • The processor is interrupted when the I/O module is
    ready to exchange data
  • The processor is free to do other work
  • No needless waiting
  • Consumes a lot of processor time, because every
    word read or written passes through the processor
    and requires an interrupt
  • One interrupt per byte

50
Direct Memory Access (DMA)
  • CPU issues request to a DMA module (separate
    module or incorporated into I/O module)
  • DMA module transfers a block of data directly to
    or from memory (without going through CPU)
  • An interrupt is sent when the task is complete
  • Only one interrupt per block, rather than one
    interrupt per byte
  • The CPU is only involved at the beginning and end
    of the transfer
  • The CPU is free to perform other tasks during
    data transfer

51
Input/Output Processors
52
7.4 Reliability, Availability, and Dependability
53
Dependability, Faults, Errors, and Failures
  • Computer system dependability is the quality of
    delivered service such that reliance can
    justifiably be placed on this service. The
    service delivered by a system is its observed
    actual behavior as perceived by other system(s)
    interacting with this system's users. Each module
    also has an ideal specified behavior, where a
    service specification is an agreed description of
    the expected behavior. A system failure occurs
    when the actual behavior deviates from the
    specified behavior. The failure occurred because
    of an error, a defect in that module. The cause
    of an error is a fault. When a fault occurs, it
    creates a latent error, which becomes effective
    when it is activated; when the error actually
    affects the delivered service, a failure occurs.
    The time between the occurrence of an error and
    the resulting failure is the error latency. Thus,
    an error is the manifestation in the system of a
    fault, and a failure is the manifestation on the
    service of an error.

54
Faults, Errors, and Failures
  • A fault creates one or more latent errors
  • The properties of errors are
  • A latent error becomes effective once activated
  • An error may cycle between its latent and
    effective states
  • An effective error often propagates from one
    component to another, thereby creating new
    errors.
  • A component failure occurs when the error affects
    the delivered service.
  • These properties are recursive and apply to any
    component

55
Example of Faults, Errors, and Failures
  • Example 1
  • A programming mistake is a fault
  • The consequence is an error (or latent error)
  • Upon activation, the error becomes effective
  • When this effective error produces erroneous data
    that affect the delivered service, a failure
    occurs
  • Example 2
  • An alpha particle hitting a DRAM → fault
  • It changes the memory → latent error
  • Affected memory word is read → effective error
  • The effective error produces erroneous data that
    affect the delivered service → failure (if ECC
    corrected the error, a failure would not occur)

56
Service Accomplishment and Interruption
  • Service accomplishment: service is delivered as
    specified
  • Service interruption: delivered service is
    different from the specified service
  • Transitions between these two states are caused
    by failures or restorations

57
Measure Reliability And Availability
  • Reliability: a measure of the continuous service
    accomplishment from a reference initial instant
  • Mean time to failure (MTTF)
  • The reciprocal of MTTF is the rate of failures
  • Service interruption is measured as mean time to
    repair (MTTR)
  • Availability: a measure of the service
    accomplishment w.r.t. the alternation between the
    two states above
  • Measured as MTTF/(MTTF + MTTR)
  • Mean time between failures (MTBF) = MTTF + MTTR

58
Example
  • A disk subsystem
  • 10 disks, each rated at 1,000,000-hour MTTF
  • 1 SCSI controller, 500,000-hour MTTF
  • 1 power supply, 200,000-hour MTTF
  • 1 fan, 200,000-hour MTTF
  • 1 SCSI cable, 1,000,000-hour MTTF
  • Component lifetimes are exponentially distributed
    (component age does not affect the probability of
    failure) and failures are independent (see the
    sketch below)

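A minimal sketch of the answer, assuming independent components with exponentially distributed lifetimes, so failure rates simply add:

```python
# Sum the failure rates (1/MTTF) of every component, then invert.
rate_per_hour = (
    10 * (1 / 1_000_000)   # disks
    + 1 / 500_000          # SCSI controller
    + 1 / 200_000          # power supply
    + 1 / 200_000          # fan
    + 1 / 1_000_000        # SCSI cable
)                          # = 23 failures per million hours
print(1 / rate_per_hour)   # system MTTF ~43,500 hours (about 5 years)
```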
59
Cause of Faults
  • Hardware faults: devices that fail
  • Design faults: faults in software (usually) and
    hardware design (occasionally)
  • Operation faults: mistakes by operations and
    maintenance personnel
  • Environmental faults: fire, flood, earthquake,
    power failure, and sabotage

60
Classification of Faults
  • Transient faults exist for a limited time and are
    not recurring
  • Intermittent faults cause a system to oscillate
    between faulty and fault-free operation
  • Permanent faults do not correct themselves with
    the passing of time

61
Reliability Improvements
  • Fault avoidance: how to prevent, by construction,
    fault occurrence
  • Fault tolerance: how to provide, by redundancy,
    service complying with the service specification
    in spite of faults having occurred or that are
    occurring
  • Error removal: how to minimize, by verification,
    the presence of latent errors
  • Error forecasting: how to estimate, by
    evaluation, the presence, creation, and
    consequences of errors

62
7.5 RAID: Redundant Arrays of Inexpensive Disks
63
3 Important Aspects of File Systems
  • Reliability: is anything broken?
  • Redundancy is the main hack for increased reliability
  • Availability: is the system still available to
    the user?
  • When a single point of failure occurs, is the rest
    of the system still usable?
  • ECC and various correction schemes help (but
    cannot improve reliability)
  • Data integrity
  • You must know exactly what is lost when something
    goes wrong

64
Disk Arrays
  • Multiple arms improve throughput, but do not
    necessarily improve latency
  • Striping
  • Spreading data over multiple disks
  • Reliability
  • The general metric: N devices have 1/N the
    reliability of one
  • Rule of thumb: the MTTF of a disk is about 5 years
  • Hence the need to add redundant disks to compensate
  • MTTR: mean time to repair (or replace) (hours
    for disks)
  • If MTTR is small, then the array's MTTF can be
    pushed out significantly with a fairly small
    redundancy factor

65
Data Striping
  • Bit-level striping: split the bits of each byte
    across multiple disks
  • The number of disks can be a multiple of 8, or can
    divide 8
  • Block-level striping: blocks of a file are
    striped across multiple disks; with n disks,
    block i goes to disk (i mod n) + 1 (see the
    sketch below)
  • Every disk participates in every access
  • The number of I/Os per second is the same as for a
    single disk
  • The data rate (bytes per second) is improved
  • Provides high data-transfer rates, but does not
    improve reliability

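A tiny sketch of the block-to-disk mapping for block-level striping (disk_for_block is a hypothetical helper; disks are numbered 1..n as on the slide):

```python
def disk_for_block(i, n):
    # Block i of a file lands on disk (i mod n) + 1.
    return (i % n) + 1

for i in range(8):                      # with n = 4 disks
    print(f"block {i} -> disk {disk_for_block(i, 4)}")
```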
66
Redundant Arrays of Disks
  • Files are "striped" across multiple disks
  • Availability is improved by adding redundant
    disks
  • If a single disk fails, the lost information can
    be reconstructed from redundant information
  • Capacity penalty to store redundant information
  • Bandwidth penalty to update
  • RAID
  • Redundant Arrays of Inexpensive Disks
  • Redundant Arrays of Independent Disks

67
RAID Levels, Reliability, Overhead
[Table: redundant information and overhead per RAID level]
68
RAID Levels 0 - 1
  • RAID 0: no redundancy (just block striping)
  • Cheap, but unable to withstand even a single
    failure
  • RAID 1: mirroring
  • Each disk is fully duplicated onto its "shadow"
  • Files are written to both; if one fails, flag it and
    get data from the mirror
  • Reads may be optimized: use the disk delivering
    the data first
  • Bandwidth sacrifice on write: one logical write =
    two physical writes
  • Most expensive solution: 100% capacity overhead
  • Targeted for high-I/O-rate, high-availability
    environments
  • RAID 0+1: stripe first, then mirror the stripe
  • RAID 1+0: mirror first, then stripe the mirror

69
RAID Levels 2 3
  • RAID 2: memory-style ECC
  • Cuts down the number of additional disks
  • The actual number of redundant disks depends on
    the correction model
  • RAID 2 is not used in practice
  • RAID 3: bit-interleaved parity
  • Reduces the cost of higher availability to 1/N
    (N = number of disks)
  • Uses one additional redundant disk to hold parity
    information
  • Bit interleaving allows corrupted data to be
    reconstructed
  • Interesting trade-off between increased time to
    recover from a failure and cost reduction due to
    decreased redundancy
  • Parity = sum of the corresponding blocks on all
    data disks (modulo 2)
  • Hence all disks must be accessed on a write: a
    potential bottleneck
  • Targeted for high-bandwidth applications:
    scientific computing, image processing

70
RAID Level 3 Parity Disk (Cont.)
[Figure: a logical record (10010011 11001101 10010011 00110000) is striped across four data disks as physical records, with a fifth disk P holding their modulo-2 parity; 25% capacity cost for parity in this configuration (1/N)]
71
RAID Levels 4 5 6
  • RAID 4: block-interleaved parity
  • Same idea as RAID 3, but the sum is on a per-block
    basis
  • Hence only the parity disk and the target disk
    need be accessed
  • Concurrent writes are still a problem, since the
    parity disk becomes the bottleneck
  • RAID 5: block-interleaved distributed parity
  • Parity blocks are interleaved and distributed over
    all disks
  • Hence parity blocks no longer reside on the same
    disk
  • The probability of write collisions on a single
    drive is reduced
  • Hence higher performance for consecutive writes
  • RAID 6
  • Similar to RAID 5, but stores extra redundant
    information to guard against multiple disk
    failures

72
RAID 4 & 5 Illustration
[Figure: RAID 4 keeps all parity blocks on a dedicated disk; RAID 5 rotates parity blocks across all disks]
Targeted for mixed applications. A logical write
becomes four physical I/Os
73
Small Write Update on RAID 3
74
Small-Write Update on RAID 4/5
RAID-5 small write algorithm: 1 logical write = 2
physical reads + 2 physical writes
[Figure: to replace old data D0 with new data D0'
on a stripe D0 D1 D2 D3 with parity P: (1) read old
data D0, (2) read old parity P, compute new parity
P' = (D0 XOR D0') XOR P, then (3) write D0' and
(4) write P'; see the sketch below]
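A minimal sketch of the RAID-5 small-write parity update (the one-byte values are illustrative; real arrays XOR whole blocks):

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

old_data, new_data = b"\x93", b"\x5a"     # D0 and D0'
old_parity = b"\xcd"                      # P

# After the 2 reads (old data, old parity), compute and write D0' and P'.
new_parity = xor(xor(old_data, new_data), old_parity)
# The stripe still satisfies the parity invariant:
assert xor(new_parity, new_data) == xor(old_parity, old_data)
```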
75
7.6 Errors and Failures in Real Systems
76
Examples
  • Berkeley's Tertiary Disk
  • Tandem
  • VAX
  • FCC

77
Berkeley's Tertiary Disk
18 months of operation
The SCSI backplane, cables, and Ethernet cables were
no more reliable than the data disks
78
7.7 I/O Performance Measures
79
I/O Performance Measures
  • Some similarities with CPU performance measures
  • Bandwidth: 100% utilization is maximum
    throughput
  • Latency: often called response time in the I/O
    world
  • Some are unique
  • Diversity: what device types can be connected to
    the system
  • Capacity: how many devices, and how much storage
    on each unit
  • The usual relationship between bandwidth and
    latency applies

80
Latency VS. Throughput
  • Response time (latency): the time a task takes
    from the moment it is placed in the buffer until
    the server finishes the task
  • Throughput: the average number of tasks completed
    by the server over a time period
  • Knee of the curve (latency vs. throughput): the
    area where a little more throughput results in much
    longer response time, or a little shorter response
    time results in much lower throughput

Response time = queue time + device service time
81
Latency vs. Throughput
[Figure: latency rises steeply as throughput approaches 100% utilization]
82
Transaction Model
  • In an interactive environment, faster response
    time is important
  • Impact of inherently long latency
  • Transaction time = the sum of 3 components
  • Entry time: the time it takes the user (usually
    human) to enter a command
  • System response time: command entry to response
    out
  • Think time: user reaction time between the response
    and the next entry

83
The Impact of Reducing Response Time
84
Transaction Time Oddity
  • As system response time goes down
  • Think time goes down even more
  • Could conclude
  • That system performance magnifies human talent
  • OR conclude that with a fast system less thinking
    is necessary
  • OR conclude that with a fast system less thinking
    is done

85
7.8 A Little Queuing Theory
86
Introduction
  • Helps calculate response time and throughput
  • More interested in the long-term steady state than
    in startup →
  • No. of tasks entering the system = no. of tasks
    leaving the system
  • Little's Law
  • Mean number of tasks in system = arrival rate ×
    mean response time
  • Applies to any system in equilibrium, as long as
    nothing in the black box is creating or destroying
    tasks

87
Little's Law
  • Mean no. of tasks in system = arrival rate × mean
    response time
  • We observe a system for Time_observe
  • No. of tasks completed during Time_observe is
    Number_tasks
  • Sum of the times each task spends in the system =
    Time_accumulated

Mean number of tasks in system = Time_accumulated / Time_observe
Mean response time = Time_accumulated / Number_tasks
Time_accumulated / Time_observe =
  (Time_accumulated / Number_tasks) × (Number_tasks / Time_observe)
  = mean response time × arrival rate
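A small sketch checking Little's Law by simulation, with assumed parameters (Poisson arrivals at 10 tasks/s, fixed 50 ms service, FIFO single server):

```python
import random

random.seed(1)
arrival_rate, service_time = 10.0, 0.05
t = busy_until = time_in_system = 0.0
n = 200_000
for _ in range(n):
    t += random.expovariate(arrival_rate)   # next arrival instant
    start = max(t, busy_until)              # wait if the server is busy
    busy_until = start + service_time
    time_in_system += busy_until - t

mean_response = time_in_system / n
print(arrival_rate * mean_response)         # = mean number of tasks in system
```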
88
Queuing Theory Notation
  • Queuing models assume a state of equilibrium:
    input rate = output rate
  • Notation
  • Time_server: average time to service a task
  • Service rate = 1/Time_server
  • Time_queue: average time per task in the queue
  • Time_system: response time, average time per task
    in the system
  • Time_system = Time_server + Time_queue
  • Arrival rate: average number of arriving
    tasks/second
  • Length_server: average number of tasks in service
  • Length_queue: average number of tasks in the queue
  • Length_system: average number of tasks in the
    system
  • Length_system = Length_server + Length_queue
  • Server utilization = arrival rate / service rate
    (between 0 and 1, in equilibrium)
  • Little's Law → Length_system = arrival rate ×
    Time_system

89
Example
  • An I/O system with a single disk
  • 10 I/O requests per second; average time to
    service = 50 ms
  • Arrival rate = 10 IOPS; service rate = 1/50 ms =
    20 IOPS
  • Server utilization = 10/20 = 0.5
  • Length_queue = arrival rate × Time_queue
  • Length_server = arrival rate × Time_server
  • If the average time to satisfy a disk request is
    50 ms and the arrival rate is 200 IOPS:
  • Length_server = arrival rate × Time_server = 200 ×
    0.05 = 10

90
Response Time
  • Service time completions vs. waiting time for a
    busy server: a randomly arriving event joins a
    queue of arbitrary length when the server is busy,
    otherwise it is serviced immediately (assume
    unlimited-length queues)
  • A single-server queue: the combination of a
    servicing facility that accommodates 1 task at a
    time (server) and a waiting area (queue), together
    called a system
  • Time_queue (assuming a FIFO queue)
  • Time_queue = Length_queue × Time_server + M
  • M = mean time to complete service of the current
    task when the new task arrives, if the server is
    busy
  • A new task can arrive at any instant
  • Use the distribution of a random variable
    (histogram? curve?)
  • M is also called the Average Residual Service Time
    (ARST)

91
Response Time (Cont.)
  • The server spends a variable amount of time with
    tasks
  • Weighted mean: m1 = (f1×T1 + f2×T2 + ... + fn×Tn)/F,
    where F = f1 + f2 + ...
  • Variance = (f1×T1² + f2×T2² + ... + fn×Tn²)/F − m1²
  • Must keep track of the unit of measure (100 ms²
    vs. 0.1 s²)
  • Squared coefficient of variance: C = variance/m1²
  • A unitless measure
  • Three distributions
  • Exponential distribution: C = 1; most tasks are
    short relative to the average, a few are long;
    90% < 2.3 × average, 63% < average
  • Hypoexponential distribution: C < 1; most are
    close to the average; C = 0.5 → 90% < 2.0 ×
    average, only 57% < average
  • Hyperexponential distribution: C > 1; further from
    the average; C = 2.0 → 90% < 2.8 × average, 69% <
    average
  • ARST = 0.5 × weighted mean time × (1 + C)
    (see the sketch below)

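A minimal sketch of the weighted-mean, variance, and C computation above (the bucket counts f and service times T are hypothetical):

```python
f = [10, 20, 10]          # observed tasks per service-time bucket
T = [0.01, 0.05, 0.10]    # service times in seconds (keep one unit!)

F = sum(f)
m1 = sum(fi * ti for fi, ti in zip(f, T)) / F
variance = sum(fi * ti**2 for fi, ti in zip(f, T)) / F - m1**2
C = variance / m1**2                 # squared coefficient of variance, unitless
arst = 0.5 * m1 * (1 + C)            # average residual service time
print(m1, C, arst)
```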
92
Characteristics of Three Distributions
Memoryless: C does not vary over time and does
not depend on the past history of events
93
Timequeue
  • Derive Time_queue in terms of Time_server, server
    utilization, and C (see the sketch below)
  • Time_queue = Length_queue × Time_server + ARST ×
    server utilization
  • Time_queue = (arrival rate × Time_queue) ×
    Time_server + (0.5 × Time_server × (1 + C)) ×
    server utilization
  • Time_queue = Time_queue × server utilization +
    (0.5 × Time_server × (1 + C)) × server utilization
  • Time_queue = Time_server × (1 + C) × server
    utilization / (2 × (1 − server utilization))
  • For an exponential distribution, C = 1.0 →
  • Time_queue = Time_server × server utilization /
    (1 − server utilization)

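The closed form above as a sketch (the M/M/1 case falls out at C = 1):

```python
def time_queue(time_server, utilization, C=1.0):
    # Mean time in queue for an M/G/1 server; C is the squared
    # coefficient of variance of the service-time distribution.
    return time_server * (1 + C) * utilization / (2 * (1 - utilization))

print(time_queue(0.020, 0.2))   # 20 ms server at 20% utilization -> 0.005 s
```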
94
Queuing Theory
  • Predicts the approximate behavior of random
    variables
  • Makes a sharp distinction between past events
    (arithmetic measurements) and future events
    (mathematical predictions)
  • In computer systems, the future relies on the past
    → arithmetic measurements and mathematical
    predictions (distributions) are blurred
  • Queuing model assumptions → M/G/1
  • Equilibrium system
  • Exponential inter-arrival time (time between two
    successive task arrivals), i.e., a fixed arrival
    rate
  • Unlimited sources of requests (infinite
    population model)
  • Unlimited queue length, and a FIFO queue
  • The server starts on the next task immediately
    after finishing the prior one
  • All tasks must be completed
  • One server

95
M/G/1 and M/M/1
  • M/G/1 queue
  • M: exponentially random request arrivals (C = 1)
  • M stands for memoryless or Markovian
  • G: general service distribution (no restrictions)
  • 1: one server
  • M/M/1 queue
  • Exponential service distribution as well (C = 1)
  • Why the exponential distribution (used so often in
    queuing theory)?
  • A collection of many arbitrary distributions acts
    like an exponential distribution (a computer
    system comprises many interacting components)

96
Example
  • A processor sends 10 disk I/Os per second; request
    service times are exponentially distributed; avg.
    disk service time = 20 ms
  • On average, how utilized is the disk?
  • What is the average time spent in the queue?
  • What is the 90th percentile of the queuing time?
  • What is the number of requests in the queue?
  • What is the average response time for a disk
    request?
  • Answer (see the sketch below)
  • Arrival rate = 10 IOPS; service rate = 1/0.02 =
    50 IOPS
  • Server utilization = 10/50 = 0.2
  • Time_queue = Time_server × server utilization /
    (1 − server utilization) = 20 × 0.2 / (1 − 0.2) =
    20 × 0.25 = 5 ms
  • 90th percentile of the queuing time = 2.3 (slide
    91) × 5 = 11.5 ms
  • Length_queue = arrival rate × Time_queue = 10 ×
    0.005 = 0.05
  • Average response time = 5 + 20 = 25 ms
  • Length_system = arrival rate × Time_system = 10 ×
    0.025 = 0.25

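The same example as a sketch (all inputs from the slide; uses the M/M/1 formulas above):

```python
arrival_rate = 10.0       # I/Os per second
time_server = 0.020       # 20 ms average service time

utilization = arrival_rate * time_server                 # 0.2
time_q = time_server * utilization / (1 - utilization)   # 0.005 s = 5 ms
print(utilization, time_q, 2.3 * time_q)    # 0.2, 5 ms, ~11.5 ms (90th pct)
print(arrival_rate * time_q)                # Length_queue = 0.05
print(time_q + time_server)                 # response time = 25 ms
```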
97
7.9 Benchmarks of Storage Performance and
Availability
98
Transaction Processing (TP) Benchmarks
  • TP: database applications, OLTP
  • Concerned with the I/O rate (number of disk
    accesses per second)
  • Started with an anonymous gang of 24 members in
    1985
  • DebitCredit benchmark: simulates bank tellers and
    has as its bottom line the number of debit/credit
    transactions per second (TPS)
  • Tighter, more standard benchmark versions
  • TPC-A, TPC-B
  • TPC-C: complex query processing; a more accurate
    model of a real bank, which models credit analysis
    for loans
  • TPC-D, TPC-H, TPC-R, TPC-W
  • Must also report the cost per TPS
  • Hence the machine configuration is considered

99
TP Benchmarks
100
TP Benchmark -- DebitCredit
  • Disk I/O is random reads and writes of 100-byte
    records, along with occasional sequential writes
  • 2–10 disk I/Os per transaction
  • 5,000–20,000 CPU instructions per disk I/O
  • Performance relies on
  • The efficiency of the TP software
  • How many disk accesses can be avoided by keeping
    information in main memory (cache)!!! → wrong
    for measuring disk I/O
  • Peak TPS
  • Restriction: 90% of transactions must have < 2 sec
    response time
  • For TPS to increase, the number of tellers and the
    size of the account file must also increase (more
    TPS requires more users)
  • This ensures that the benchmark really measures
    disk I/O (not cache)

101
Relationship Among TPS, Tellers, and Account File
Size
The data set generally must scale in size as the
throughput increases
102
SPEC System-Level File Server (SFS) Benchmark
  • SPECsfs: system-level file server benchmark
  • A 1990 agreement by 7 vendors to evaluate NFS
    performance
  • A mix of file reads, writes, and file operations
  • Writes: 50% done as 8KB blocks, 50% as partial
    blocks (1, 2, 4KB)
  • Reads: 85% full-block, 15% partial-block
  • Scales the size of the file system according to
    the reported throughput
  • For every 100 NFS operations per second, the
    capacity must increase by 1 GB
  • Limits average response time, e.g., to 40 ms
  • Does not normalize for different configurations
  • Retired in June 2001 due to bugs

103
SPECsfs
[Figure: overall response time (ms) vs. throughput, illustrating an unfair configuration]
104
SPECWeb
  • A benchmark for evaluating the performance of WWW
    servers
  • The SPECWeb99 workload simulates accesses to a web
    service provider supporting home pages for several
    organizations
  • For each home page, nine files in each of four
    classes
  • Less than 1KB (small icons): 35% of activity
  • 1–10KB: 50% of activity
  • 10–100KB: 14% of activity
  • 100KB–1MB (large documents and images): 1% of
    activity
  • SPECWeb99 results in 2000 for Dell computers
  • Large memory is used as a file cache to reduce
    disk I/O
  • Shows the impact of web server software and the OS
105
SPECWeb99 Results for Dell
106
Examples of Benchmarks of Dependability and
Availability
  • TPC-C has a dependability requirement: it must
    handle a single disk failure
  • Brown and Patterson [2000]
  • Focus on the effectiveness of fault tolerance in
    systems
  • Availability can be measured by examining the
    variations in system QoS metrics over time as
    faults are injected into the system
  • The initial experiment injected a single disk
    fault
  • Software RAID by Linux, Solaris, and Windows 2000
  • Reconstruct data onto a hot spare disk
  • A disk emulator injects faults
  • SPECWeb99 workload

107
Availability Benchmark for Software RAID
(Red Hat 6.0)
(Solaris 7)
108
Availability Benchmark for Software RAID (Cont.)
(Windows 2000)
109
Availability Benchmark for Software RAID (Cont.)
  • The longer the reconstruction (MTTR), the lower
    the availability
  • Increased reconstruction speed implies decreased
    application performance
  • Linux vs. Solaris and Windows 2000
  • RAID reconstruction
  • Linux and Solaris initiate reconstruction
    automatically
  • Windows 2000 requires operators to initiate
    reconstruction manually
  • Managing transient faults
  • Linux: paranoid
  • Solaris and Windows ignore most transient faults

110
7.10 Crosscutting Issues Interface to OS
111
I/O Interface to the OS
  • The OS controls which I/O technique implemented by
    the HW will actually be used
  • Early Unix example
  • 16-bit controllers could only transfer 64KB at a
    time
  • Later controllers went to 32-bit devices
  • And were optimized for much larger blocks
  • Unix, however, did not want to distinguish → so it
    kept the 64KB bias
  • A new I/O controller designed to efficiently
    transfer 1 MB files would never see more than
    63KB at a time under early UNIX

112
Cache Problems -- Stale Data
  • 3 potential copies: cache, memory, and disk
  • Stale data: the CPU or the I/O system could modify
    one copy without updating the other copies
  • Where is the I/O system connected to the computer?
  • To the CPU cache: no stale-data problem
  • All I/O devices and the CPU see the most accurate
    data
  • Multi-level cache systems require multi-level
    inclusion
  • Disadvantages
  • Lost CPU performance → all I/O data goes through
    the cache, but little of it is referenced
  • Arbitration between the CPU and I/O for access to
    the cache
  • To memory: the stale-data problem occurs

113
Connect I/O to Cache
114
Cache-Coherence Problem
[Figure: on output, the memory copy of A is stale relative to the cache; on input, the cached copy B' is stale relative to memory]
115
Stale Data Problem
  • I/O sees stale data on output, because the memory
    data is not up to date
  • Write-through cache: OK
  • Write-back cache:
  • The OS flushes the data to make sure it is not in
    the cache before output
  • HW checks cache tags to see if the data is in the
    cache, and only interacts with the cache if the
    output tries to use in-cache data
  • The CPU sees stale data in the cache on input,
    after I/O has updated memory
  • The OS guarantees the input data area cannot
    possibly be in the cache
  • The OS flushes the data to make sure it is not in
    the cache before input
  • HW checks tags during an input and invalidates the
    data on a conflict

116
DMA and Virtual Memory
  • 2 types of addresses: virtual (VA) and physical
    (PA)
  • Physically addressed I/O: problems for DMA
  • A block size larger than a page
  • will likely not fall on consecutive physical page
    numbers
  • What happens if the OS victimizes a page while DMA
    is in progress?
  • Pin the page in memory (do not allow it to be
    replaced)
  • Or the OS copies user data into the kernel address
    space and then transfers between the kernel
    address space and I/O space

117
Virtual DMA
  • DMA uses VAs that are mapped to PAs during the
    DMA
  • The DMA buffer is sequential in virtual memory,
    but can be scattered in physical memory
  • Virtual addresses provide protection from other
    processes
  • The OS updates the address tables of a DMA if a
    process is moved while using virtual DMA
  • Virtual DMA requires a register in the DMA
    controller for each page to be transferred,
    holding the protection bits and the physical page
    corresponding to each virtual page

118
Virtual DMA Illustration
119
7.11 Designing an I/O System
120
I/O Design Complexities
  • A huge variety of I/O devices
  • Latency
  • Bandwidth
  • Block size
  • Expansion is a must: longer buses, more power, and
    larger cabinets
  • Balanced performance and cost
  • Yet another n-dimensional conflicting-constraint
    problem
  • Yep, it's NP-hard, just like all the rest
  • Experience plays a big role, since the solutions
    are heuristic

121
7 Basic I/O Design Steps
  • List types of I/O devices and buses to be
    supported
  • List physical requirements of I/O devices
  • Volume, power, bus slots, expansion slots or
    cabinets, ...
  • List cost of each device and associated
    controller
  • List the reliability of each I/O device
  • Record CPU resource demands - e.g. cycles
  • Start, support, and complete I/O operation
  • Stalls due to I/O waits
  • Overhead - e.g. cache flushes and context
    switches
  • List memory and bus bandwidth demands
  • Assess the performance of different ways to
    organize I/O devices
  • Of course you'll need to get into queuing theory
    to get it right

122
An Example
  • Impact on the CPU of reading a disk page directly
    into the cache
  • Assumptions
  • 16KB pages, 64-byte cache blocks
  • Addresses of the new page are not in the cache
  • The CPU will not access data in the new page
  • 95% of displaced cache blocks will be read in
    again (miss)
  • Write-back cache; 50% of blocks are dirty
  • I/O buffers a full cache block before writing to
    the cache
  • Accesses and misses are spread uniformly over all
    cache blocks
  • No other interference between the CPU and I/O for
    the cache slots
  • 15,000 misses per 1 million clock cycles when
    there is no I/O
  • Miss penalty 30 CCs, plus 30 more CCs to write
    back dirty blocks
  • 1 page is brought in every 1 million clock cycles
123
An Example (Cont.)
  • Each page fills 16,384/64 = 256 blocks
  • 0.5 × 256 × 30 CCs to write displaced dirty blocks
    to memory
  • 95% × 256 (≈244) blocks are referenced again and
    miss
  • All of them are dirty and will need to be written
    back when replaced
  • 244 × 60 more CCs to write back
  • In total: 128 × 30 + 244 × 60 = 18,480 more CCs,
    on top of 1,000,000 + 15,000×30 + 7,500×30 =
    1,675,000
  • About a 1% decrease in performance (see the sketch
    below)

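The same arithmetic as a sketch (all values from the slides above):

```python
blocks = 16_384 // 64                     # 256 cache blocks per page
dirty_writes = 0.5 * blocks * 30          # displaced dirty blocks written back
extra_misses = 0.95 * blocks * (30 + 30)  # ~244 re-misses, all dirty on replace

baseline = 1_000_000 + 15_000 * 30 + 7_500 * 30   # CCs without the I/O page
print((dirty_writes + extra_misses) / baseline)   # ~0.011, about 1% slower
```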
124
Five More Examples
  • Naive cost-performance design and evaluation
  • Availability of the first example
  • Response time of the first example
  • More realistic cost-performance design and
    evaluation
  • More realistic design for availability and its
    evaluation