Title: Chapter 7 Storage Systems
Outline
- Introduction
- Types of Storage Devices
- Buses Connecting I/O Devices to CPU/Memory
- Reliability, Availability, and Dependability
- RAID Redundant Arrays of Inexpensive Disks
- Errors and Failures in Real Systems
- I/O Performance Measures
- A Little Queuing Theory
- Benchmarks of Storage Performance and Availability
- Crosscutting Issues
- Designing an I/O System
7.1 Introduction
Motivation: Who Cares About I/O?
- CPU performance: 2x every 18 months
- I/O performance limited by mechanical delays (disk I/O): < 10% per year (I/Os per sec or MB per sec)
- Amdahl's Law: system speed-up is limited by the slowest part!
- 10% I/O, 10x faster CPU → 5x overall performance (lose 50%)
- 10% I/O, 100x faster CPU → 10x overall performance (lose 90%)
- I/O bottleneck
- Diminishing fraction of time in CPU
- Diminishing value of faster CPUs
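The two speedup figures above follow directly from Amdahl's Law; a minimal sketch (the function name is ours):

```python
def overall_speedup(io_fraction, cpu_speedup):
    """Amdahl's Law: only the CPU portion (1 - io_fraction) is accelerated."""
    return 1.0 / (io_fraction + (1.0 - io_fraction) / cpu_speedup)

print(round(overall_speedup(0.10, 10), 2))   # → 5.26 (about 5x, half lost)
print(round(overall_speedup(0.10, 100), 2))  # → 9.17 (about 10x, 90% lost)
```

No matter how fast the CPU gets, the overall speedup can never exceed 1/io_fraction = 10x here.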
Position of I/O in Computer Architecture: Past
- An orphan in the architecture domain
- I/O meant the non-processor and memory stuff
- Disk, tape, LAN, WAN, etc.
- Performance was not a major concern
- Devices characterized as
- Extraneous, non-priority, infrequently used, slow
- Exception is swap area of disk
- Part of the memory hierarchy
- Hence part of system performance, but you're hosed if you use it often
Position of I/O in Computer Architecture: Now
- Trends: I/O is the bottleneck
- Communication is frequent
- Voice response transaction systems, real-time video
- Multimedia expectations
- Even standard networks come in gigabit/sec flavors
- For multi-computers
- Result
- Significant focus on system bus performance
- Common bridge to the memory system and the I/O systems
- Critical performance component for SMP server platforms
System vs. CPU Performance
- Care about the speed at which user jobs get done
- Throughput: how many jobs per unit time (system view)
- Latency: how quick for a single job (user view)
- Response time: time between when a command is issued and results appear (user view)
- CPU performance is the main factor when
- The job mix fits in memory → there are very few page faults
- I/O performance is the main factor when
- The job is too big for memory: paging is dominant
- The job reads/writes/creates a lot of unexpected files
- OLTP, decision support: databases
- And then there are graphics and specialty I/O devices
System Performance
- Depends on many factors in the worst case
- CPU
- Compiler
- Operating System
- Cache
- Main Memory
- Memory-IO bus
- I/O controller or channel
- I/O drivers and interrupt handlers
- I/O devices there are many types
- Level of autonomous behavior
- Amount of internal buffer capacity
- Device-specific parameters for latency and throughput
I/O Systems
[Figure: processor with cache connected by a memory-I/O bus (the two buses may be the same or different) to main memory and to I/O controllers for graphics, disks, and a network; controllers signal the processor with interrupts]
Keys to a Balanced System
- It's all about overlap: I/O vs. CPU
- Time_workload = Time_CPU + Time_I/O - Time_overlap
- Consider the benefit of just speeding up one component
- Amdahl's Law (see P4 as well)
- Latency vs. Throughput
I/O System Design Considerations
- Depends on type of I/O device
- Size, bandwidth, and type of transaction
- Frequency of transaction
- Defer vs. do now
- Appropriate memory bus utilization
- What should the controller do
- Programmed I/O
- Interrupt vs. polled
- Priority or not
- DMA
- Buffering issues - what happens on over-run
- Protection
- Validation
Types of I/O Devices
- Behavior
- Read, Write, Both
- Once, multiple
- Size of average transaction
- Bandwidth
- Latency
- Partner: the speed-of-the-slowest-link theory
- Human operated (interactive or not)
- Machine operated (local or remote)
Some I/O Device Characteristics
Is I/O Important?
- Depends on your application
- Business: disks for file-system I/O
- Graphics: graphics cards or special co-processors
- Parallelism: the communications fabric
- Our focus: mainline uniprocessing
- Storage subsystems (Chapter 7)
- Networks (Chapter 8)
- Noteworthy Point
- The traditional orphan
- But now often viewed more as a front line topic
7.2 Types of Storage Devices
Magnetic Disks
- Two important roles
- Long-term, non-volatile storage: file system and OS
- Lowest level of the memory hierarchy
- Most of virtual memory is physically resident on disk
- Long viewed as a bottleneck
- Mechanical system → slow
- Hence they seem an easy target for improved technology
- Disk improvements w.r.t. density have done better than Moore's Law
Disks are organized into platters, tracks, and sectors
- Platters: 1-12, with 2 sides each
- Tracks: 5,000-30,000 per surface
- Sectors: 100-500 per track
- A sector is the smallest unit that can be read or written
Physical Organization Options
- Platters: one or many
- Density: fixed or variable
- (Do all tracks have the same number of sectors?)
- Organization: sectors, cylinders, and tracks
- Actuators: 1 or more
- Heads: 1 per track or 1 per actuator
- Access: seek time vs. rotational latency
- Seek time is related to distance, but not linearly
- Typical rotation: 3,600 to 15,000 RPM
- Diameter: 1.0 to 3.5 inches
Typical Physical Organization
- Multiple platters
- Metal disks covered with magnetic recording material on both sides
- Single actuator (since actuators are expensive)
- Single R/W head per arm
- One arm per surface
- All heads therefore over the same cylinder
- Fixed sector size
- Variable-density encoding
- Disk controller: usually a built-in processor plus buffering
Characteristics of Three Magnetic Disks of 2000
Anatomy of a Read Access
- Steps
- Memory-mapped I/O over the bus to the controller
- Controller starts the access
- Seek + rotational-latency wait
- Sector is read and buffered (validity checked)
- Controller says ready, or DMAs to main memory and then says ready
Access Time
- Access time components
- Seek time: time to move the arm over the proper track
- Very non-linear: acceleration and deceleration times complicate it
- Rotational latency (delay): time for the requested sector to rotate under the head (on average, 0.5 rotations)
- Transfer time: time to transfer a block of bits (typically a sector) under the read/write head
- Controller overhead: the overhead the controller imposes in performing an I/O access
- Queuing delay: time spent waiting for the disk to become free
Access Time Example
- Assumptions: average seek time = 5 ms; transfer rate = 40 MB/sec; 10,000 RPM; controller overhead = 0.1 ms; no queuing delay
- What is the average time to read or write a 512-byte sector?
- Answer: 5 ms (seek) + 3 ms (half a rotation at 10,000 RPM) + about 0.013 ms (512 bytes at 40 MB/sec) + 0.1 ms (controller) ≈ 8.1 ms
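The answer can be checked mechanically from the stated assumptions (variable names are ours):

```python
# Average time to read/write a 512-byte sector, per the slide's assumptions.
seek_ms = 5.0                              # average seek time
rotation_ms = 0.5 / (10_000 / 60) * 1000   # half a rotation at 10,000 RPM
transfer_ms = 512 / 40e6 * 1000            # 512 bytes at 40 MB/sec
controller_ms = 0.1                        # controller overhead

total = seek_ms + rotation_ms + transfer_ms + controller_ms
print(round(total, 2))  # → 8.11
```

Note how seek and rotation dominate: the transfer itself is two orders of magnitude smaller.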
Cost vs. Performance
- Large-diameter drives have much more data over which to amortize the cost of electronics → lowest cost per GB
- Higher sales volume → lower manufacturing cost
- The 3.5-inch drive, the largest surviving drive in 2001, also has the highest sales volume, so it unquestionably has the best price per GB
Future of Magnetic Disks
- Areal density (bits per unit area) is the common improvement metric
- Trends
- Until 1988: 29% improvement per year
- 1988-1996: 60% per year
- 1997-2001: 100% per year
- 2001
- 20 billion bits per square inch
- 60 billion bits per square inch demonstrated in labs
Disk Price Trends by Capacity
Disk Price Trends: Dollars per MB
Cost vs. Access Time for SRAM, DRAM, and Magnetic Disk
Disk Alternatives
- Optical disks
- Optical compact disks (CD): 0.65 GB
- Digital video discs / digital versatile disks (DVD): 4.7 GB x 2 sides
- Rewritable CD (CD-RW) and write-once CD (CD-R)
- Rewritable DVD (DVD-RAM) and write-once DVD (DVD-R)
- Robotic tape storage
- Optical jukeboxes
- Tapes: DAT, DLT
- Flash memory
- Good for embedded systems
- Non-volatile storage and rewritable ROM
7.3 Buses Connecting I/O Devices to CPU/Memory
I/O Connection Issues
Connecting the CPU to the I/O device world
- Shared communication link between subsystems
- Typical choice is a bus
- Advantages
- Shares a common set of wires and protocols → low cost
- Often based on a standard (PCI, SCSI, etc.) → portability and versatility
- Disadvantages
- Poor performance
- Multiple devices imply arbitration and therefore contention
- Can be a bottleneck
I/O Connection Issues: Multiple Buses
- I/O bus
- Lengthy
- Many types of connected devices
- Wide range in device bandwidth
- Follows a bus standard
- Accepts devices varying in latency and bandwidth capabilities
- CPU-memory bus
- Short
- High speed
- Matched to the memory system to maximize CPU-memory bandwidth
- Knows all the types of devices that must connect together
Typical Bus: Synchronous Read Transaction
Bus Design Decisions
- Other things to standardize as well
- Connectors
- Voltage and current levels
- Physical encoding of control signals
- Protocols for good citizenship
Bus Design Decisions (Cont.)
- Bus master: a device that can initiate a R/W transaction
- Multiple masters: multiple CPUs and I/O devices initiate bus transactions
- Multiple bus masters need arbitration (fixed priority or random)
- Split transactions for multiple masters
- Use packets for the full transaction (does not hold the bus)
- A read transaction is broken into read-request and memory-reply transactions
- Makes the bus available for other masters while the data is read/written from/to the specified address
- Transactions must be tagged
- Higher bandwidth, but also higher latency
Split Transaction Bus
Bus Design Decisions (Cont.)
- Clocking: synchronous vs. asynchronous
- Synchronous
- Include a clock in the control lines, and a fixed protocol for address and data relative to the clock
- Fast and inexpensive (little or no logic to determine what's next)
- Everything on the bus must run at the same clock rate
- Short length (due to clock skew)
- Used for CPU-memory buses
- Asynchronous
- Easier to connect a wide variety of devices, and to lengthen the bus
- Scales better with technological changes
- Used for I/O buses
Synchronous or Asynchronous?
Standards
- The Good
- Lets computer and I/O-device designers work independently
- Provides a path for second-party (e.g. cheaper) competition
- The Bad
- Standards can become major performance anchors
- They inhibit change
- How to create a standard
- Bottom-up
- A company tries to get a standards committee to approve its latest philosophy in hopes that it will get the jump on the others (e.g. S-bus, PC-AT bus, ...)
- De facto standards
- Top-down
- Design by committee (PCI, SCSI, ...)
Some Sample I/O Bus Designs
Some Sample Serial I/O Buses
Often used in embedded computers
CPU-Memory Buses Found in 2001 Servers
Crossbar Switch
Connecting the I/O Bus
- To main memory
- The I/O bus and the CPU-memory bus may be the same
- I/O commands on the bus could interfere with the CPU's memory accesses
- Since cache misses are rare, this does not tend to stall the CPU
- The problem is lack of coherency
- Currently, we consider this case
- To cache
- Access
- Memory-mapped I/O or distinct instructions (I/O opcodes)
- Interrupts vs. polling
- DMA or not
- Autonomous control allows overlap and latency hiding
- However, there is a cost impact
A Typical Interface of I/O Devices and an I/O Bus to the CPU-Memory Bus
Processor Interface Issues
- Processor interface
- Interrupts
- Memory-mapped I/O
- I/O control structures
- Polling
- Interrupts
- DMA
- I/O controllers
- I/O processors
- Capacity, access time, bandwidth
- Interconnections
- Buses
I/O Controller
[Figure: the CPU sends commands and I/O addresses to the controller, and receives ready/done/error status and interrupts in return]
Memory-Mapped I/O
Some portions of the memory address space are assigned to I/O devices. Reads and writes to those addresses cause data transfers.
Programmed I/O
- Polling
- The I/O module performs the action on behalf of the processor
- But the I/O module does not interrupt the CPU when the I/O is done
- The processor is kept busy checking the status of the I/O module
- Not an efficient way to use the CPU unless the device is very fast!
- Byte by byte
Interrupt-Driven I/O
- The processor is interrupted when the I/O module is ready to exchange data
- The processor is free to do other work
- No needless waiting
- Still consumes a lot of processor time, because every word read or written passes through the processor and requires an interrupt
- One interrupt per byte
Direct Memory Access (DMA)
- The CPU issues a request to a DMA module (a separate module, or incorporated into the I/O module)
- The DMA module transfers a block of data directly to or from memory (without going through the CPU)
- An interrupt is sent when the transfer is complete
- Only one interrupt per block, rather than one interrupt per byte
- The CPU is involved only at the beginning and end of the transfer
- The CPU is free to perform other tasks during the data transfer
Input/Output Processors
7.4 Reliability, Availability, and Dependability
Dependability, Faults, Errors, and Failures
- Computer system dependability is the quality of delivered service such that reliance can justifiably be placed on this service. The service delivered by a system is its observed actual behavior as perceived by other system(s) interacting with this system's users. Each module also has an ideal specified behavior, where a service specification is an agreed description of the expected behavior. A system failure occurs when the actual behavior deviates from the specified behavior. The failure occurred because of an error, a defect in that module. The cause of an error is a fault. When a fault occurs, it creates a latent error, which becomes effective when it is activated; when the error actually affects the delivered service, a failure occurs. The time between the occurrence of an error and the resulting failure is the error latency. Thus, an error is the manifestation in the system of a fault, and a failure is the manifestation on the service of an error.
Faults, Errors, and Failures
- A fault creates one or more latent errors
- The properties of errors are
- A latent error becomes effective once activated
- An error may cycle between its latent and effective states
- An effective error often propagates from one component to another, thereby creating new errors
- A component failure occurs when the error affects the delivered service
- These properties are recursive and apply to any component
Examples of Faults, Errors, and Failures
- Example 1
- A programming mistake: a fault
- The consequence is an error (or latent error)
- Upon activation, the error becomes effective
- When this effective error produces erroneous data that affect the delivered service, a failure occurs
- Example 2
- An alpha particle hits a DRAM → fault
- It changes the memory → latent error
- The affected memory word is read → effective error
- The effective error produces erroneous data that affect the delivered service → failure (if ECC had corrected the error, a failure would not have occurred)
Service Accomplishment and Interruption
- Service accomplishment: service is delivered as specified
- Service interruption: delivered service is different from the specified service
- Transitions between these two states are caused by failures or restorations
Measuring Reliability and Availability
- Reliability: a measure of continuous service accomplishment from a reference initial instant
- Mean time to failure (MTTF)
- The reciprocal of MTTF is a rate of failures
- Service interruption is measured as mean time to repair (MTTR)
- Availability: a measure of service accomplishment with respect to the alternation between the two states above
- Measured as MTTF / (MTTF + MTTR)
- Mean time between failures (MTBF) = MTTF + MTTR
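The availability formula is a one-liner; a minimal sketch (the 24-hour MTTR below is an illustrative assumption, not from the text):

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time the service is in the accomplishment state."""
    return mttf_hours / (mttf_hours + mttr_hours)

# A hypothetical module: 1,000,000-hour MTTF, repaired in 24 hours on average.
print(availability(1_000_000, 24))
```

Note that shrinking MTTR improves availability just as effectively as stretching MTTF.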
Example
- A disk subsystem
- 10 disks, each rated at 1,000,000-hour MTTF
- 1 SCSI controller, 500,000-hour MTTF
- 1 power supply, 200,000-hour MTTF
- 1 fan, 200,000-hour MTTF
- 1 SCSI cable, 1,000,000-hour MTTF
- Component lifetimes are exponentially distributed (a component's age does not affect its probability of failure), and failures are independent
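Because exponential, independent lifetimes mean component failure rates simply add, the subsystem MTTF follows directly from the slide's ratings; a sketch:

```python
# Failure rates (per hour) add for exponentially distributed, independent parts.
rate_per_hour = (
    10 / 1_000_000    # 10 disks
    + 1 / 500_000     # SCSI controller
    + 1 / 200_000     # power supply
    + 1 / 200_000     # fan
    + 1 / 1_000_000   # SCSI cable
)                     # = 23 failures per 1,000,000 hours

mttf = 1 / rate_per_hour
print(round(mttf))  # → 43478 hours, roughly 5 years
```

The whole subsystem fails far more often than any single component: the weakest parts (fan and power supply) dominate the sum.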
Causes of Faults
- Hardware faults: devices that fail
- Design faults: faults in software (usually) and hardware design (occasionally)
- Operation faults: mistakes by operations and maintenance personnel
- Environmental faults: fire, flood, earthquake, power failure, and sabotage
Classification of Faults
- Transient faults: exist for a limited time and are not recurring
- Intermittent faults: cause a system to oscillate between faulty and fault-free operation
- Permanent faults: do not correct themselves with the passing of time
Reliability Improvements
- Fault avoidance: how to prevent, by construction, fault occurrence
- Fault tolerance: how to provide, by redundancy, service complying with the service specification in spite of faults that have occurred or are occurring
- Error removal: how to minimize, by verification, the presence of latent errors
- Error forecasting: how to estimate, by evaluation, the presence, creation, and consequences of errors
7.5 RAID: Redundant Arrays of Inexpensive Disks
Three Important Aspects of File Systems
- Reliability: is anything broken?
- Redundancy is the main hack for increasing reliability
- Availability: is the system still available to the user?
- When a single point of failure occurs, is the rest of the system still usable?
- ECC and various correction schemes help (but cannot improve reliability)
- Data integrity
- You must know exactly what is lost when something goes wrong
Disk Arrays
- Multiple arms improve throughput, but not necessarily latency
- Striping
- Spreading data over multiple disks
- Reliability
- General metric: N devices have 1/N the reliability of one
- Rule of thumb: the MTTF of a disk is about 5 years
- Hence we need to add redundant disks to compensate
- MTTR: mean time to repair (or replace), hours for disks
- If MTTR is small, then the array's MTTF can be pushed out significantly with a fairly small redundancy factor
Data Striping
- Bit-level striping: split the bits of each byte across multiple disks
- The number of disks can be a multiple of 8, or can divide 8
- Block-level striping: blocks of a file are striped across multiple disks; with n disks, block i goes to disk (i mod n) + 1
- Every disk participates in every access
- The number of I/Os per second is the same as for a single disk
- The amount of data per second is improved
- Provides high data-transfer rates, but does not improve reliability
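The block-to-disk mapping above can be sketched directly (the function name is ours):

```python
def disk_for_block(i, n):
    """Block-level striping: with n disks, block i goes to disk (i mod n) + 1."""
    return (i % n) + 1

# Blocks 0..7 over 4 disks cycle round-robin through disks 1..4.
print([disk_for_block(i, 4) for i in range(8)])  # → [1, 2, 3, 4, 1, 2, 3, 4]
```

A sequential read of n consecutive blocks therefore touches every disk exactly once, which is where the bandwidth gain comes from.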
Redundant Arrays of Disks
- Files are "striped" across multiple disks
- Availability is improved by adding redundant disks
- If a single disk fails, the lost information can be reconstructed from the redundant information
- Capacity penalty to store the redundant information
- Bandwidth penalty to update it
- RAID
- Redundant Arrays of Inexpensive Disks
- Redundant Arrays of Independent Disks
RAID Levels, Reliability, Overhead
Redundant information
RAID Levels 0-1
- RAID 0: no redundancy (just block striping)
- Cheap, but unable to withstand even a single failure
- RAID 1: mirroring
- Each disk is fully duplicated onto its "shadow"
- Files are written to both; if one fails, flag it and get the data from the mirror
- Reads may be optimized: use the disk delivering the data first
- Bandwidth sacrifice on write: one logical write = two physical writes
- Most expensive solution: 100% capacity overhead
- Targeted for high-I/O-rate, high-availability environments
- RAID 0+1: stripe first, then mirror the stripe
- RAID 1+0: mirror first, then stripe the mirror
RAID Levels 2 and 3
- RAID 2: memory-style ECC
- Cuts down the number of additional disks
- The actual number of redundant disks depends on the correction model
- RAID 2 is not used in practice
- RAID 3: bit-interleaved parity
- Reduces the cost of higher availability to 1/N (N = number of disks)
- Uses one additional redundant disk to hold parity information
- Bit interleaving allows corrupted data to be reconstructed
- Interesting trade-off between increased time to recover from a failure and cost reduction due to decreased redundancy
- Parity = sum of the corresponding disk blocks (modulo 2)
- Hence all disks must be accessed on a write: a potential bottleneck
- Targeted for high-bandwidth applications: scientific computing, image processing
RAID Level 3: Parity Disk (Cont.)
Logical record: 10010011 11001101 10010011 . . .
Striped physical records:
1 0 0 1 0 0 1 1
1 1 0 0 1 1 0 1
1 0 0 1 0 0 1 1
0 0 1 1 0 0 0 0
P: parity disk
25% capacity cost for parity in this configuration (1/N)
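Parity as a modulo-2 sum is just bitwise XOR, so XOR-ing the parity with the surviving blocks recovers a lost block; a minimal demonstration using the slide's bit patterns:

```python
# Parity is the bitwise XOR (modulo-2 sum) of the data blocks.
d1, d2, d3 = 0b10010011, 0b11001101, 0b10010011
parity = d1 ^ d2 ^ d3

# Pretend disk 2 failed: its contents equal parity XOR the surviving blocks.
recovered_d2 = parity ^ d1 ^ d3
assert recovered_d2 == d2
print(f"{recovered_d2:08b}")  # → 11001101
```

This works because XOR is its own inverse: x ^ x = 0, so every surviving term cancels out of the parity, leaving only the missing block.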
RAID Levels 4, 5, and 6
- RAID 4: block-interleaved parity
- Similar idea to RAID 3, but the sum is on a per-block basis
- Hence only the parity disk and the target disk need be accessed
- A problem remains with concurrent writes, since the parity disk bottlenecks
- RAID 5: block-interleaved distributed parity
- Parity blocks are interleaved and distributed over all the disks
- Hence parity blocks no longer reside on the same disk
- The probability of write collisions on a single drive is reduced
- Hence higher performance in the consecutive-write situation
- RAID 6
- Similar to RAID 5, but stores extra redundant information to guard against multiple disk failures
RAID 4 and 5 Illustration
RAID 4
RAID 5
- Targeted for mixed applications
- A logical write becomes four physical I/Os
Small Write Update on RAID 3
Small Write Update on RAID 4/5
- RAID 5 small-write algorithm: 1 logical write = 2 physical reads + 2 physical writes
- (1. Read) old data D0; (2. Read) old parity P
- XOR: new parity P' = old data D0 XOR new data D0' XOR old parity P
- (3. Write) new data D0'; (4. Write) new parity P'
- Data blocks D1, D2, D3 are not accessed
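The small-write shortcut can be checked against a full parity recomputation with arbitrary byte values (the values below are illustrative):

```python
# RAID 5 small write: 2 reads (old data, old parity) + 2 writes (new data,
# new parity), with new parity = old parity XOR old data XOR new data.
old_d0, new_d0 = 0b10010011, 0b01100110
d1, d2, d3 = 0b11001101, 0b00111100, 0b01010101

old_parity = old_d0 ^ d1 ^ d2 ^ d3          # what the parity disk held
new_parity = old_parity ^ old_d0 ^ new_d0   # the shortcut: no need to read d1-d3

# The shortcut matches recomputing parity over all the data blocks.
assert new_parity == new_d0 ^ d1 ^ d2 ^ d3
```

This is why only four physical I/Os are needed: the untouched blocks cancel out of the XOR, so they never have to be read.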
7.6 Errors and Failures in Real Systems
Examples
- Berkeley's Tertiary Disk
- Tandem
- VAX
- FCC
Berkeley's Tertiary Disk
- 18 months of operation
- The SCSI backplane, SCSI cables, and Ethernet cables were no more reliable than the data disks
7.7 I/O Performance Measures
I/O Performance Measures
- Some similarities with CPU performance measures
- Bandwidth: 100% utilization is maximum throughput
- Latency: often called response time in the I/O world
- Some unique measures
- Diversity: which device types can be connected to the system
- Capacity: how many devices, and how much storage on each unit
- The usual relationship between bandwidth and latency holds
Latency vs. Throughput
- Response time (latency): the time a task takes from the moment it is placed in the buffer until the server finishes the task
- Throughput: the average number of tasks completed by the server over a time period
- Knee of the curve (latency vs. throughput): the region where a little more throughput costs much longer response time, or a little shorter response time costs much lower throughput
Response time = Queue + Device service time
Latency vs. Throughput
[Figure: latency rises sharply as throughput approaches its maximum]
Transaction Model
- In an interactive environment, faster response time is important
- Impact of inherently long latency
- Transaction time is the sum of 3 components
- Entry time: time it takes the user (usually a human) to enter a command
- System response time: from command entry to response out
- Think time: the user's reaction time between the response and the next entry
The Impact of Reducing Response Time
Transaction Time Oddity
- As system response time goes down
- Think time goes down even more
- One could conclude
- That system performance magnifies human talent
- OR that with a fast system, less thinking is necessary
- OR that with a fast system, less thinking is done
7.8 A Little Queuing Theory
Introduction
- Helps calculate response time and throughput
- More interested in the long-term, steady state than in startup →
- Number of tasks entering the system = number of tasks leaving the system
- Little's Law
- Mean number of tasks in system = arrival rate x mean response time
- Applies to any system in equilibrium, as long as nothing inside the black box is creating or destroying tasks
Little's Law
- Mean number of tasks in system = arrival rate x mean response time
- Observe a system for Time_observe
- Number of tasks completed during Time_observe is Number_tasks
- Sum of the times each task spends in the system = Time_accumulated
- Mean number of tasks in system = Time_accumulated / Time_observe
- Mean response time = Time_accumulated / Number_tasks
- Arrival rate = Number_tasks / Time_observe
- Hence Time_accumulated / Time_observe = (Number_tasks / Time_observe) x (Time_accumulated / Number_tasks), which is Little's Law
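Little's Law can be verified on a small made-up trace of arrival and departure times (all numbers below are illustrative):

```python
# Tasks observed over a 10-second window: (arrival_time, departure_time).
tasks = [(0.0, 2.0), (1.0, 2.5), (3.0, 6.0), (5.0, 9.5), (8.0, 10.0)]
time_observe = 10.0

time_accumulated = sum(dep - arr for arr, dep in tasks)  # total time in system
mean_in_system = time_accumulated / time_observe
mean_response = time_accumulated / len(tasks)
arrival_rate = len(tasks) / time_observe

# Little's Law: mean number in system = arrival rate x mean response time.
assert abs(mean_in_system - arrival_rate * mean_response) < 1e-12
print(mean_in_system)  # → 1.3
```

The identity holds for any trace because both sides are just Time_accumulated / Time_observe rearranged; no distributional assumption is needed.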
Queuing Theory Notation
- Queuing models assume a state of equilibrium: input rate = output rate
- Notation
- Time_server: average time to service a task
- Service rate = 1 / Time_server
- Time_queue: average time per task in the queue
- Time_system: response time, the average time per task in the system
- Time_system = Time_server + Time_queue
- Arrival rate: average number of arriving tasks per second
- Length_server: average number of tasks in service
- Length_queue: average number of tasks in the queue
- Length_system: average number of tasks in the system
- Length_system = Length_server + Length_queue
- Server utilization = arrival rate / service rate (between 0 and 1 at equilibrium)
- Little's Law → Length_system = arrival rate x Time_system
Example
- An I/O system with a single disk
- 10 I/O requests per second; average time to service a request = 50 ms
- Arrival rate = 10 IOPS; service rate = 1/(50 ms) = 20 IOPS
- Server utilization = 10/20 = 0.5
- Length_queue = arrival rate x Time_queue
- Length_server = arrival rate x Time_server
- Now suppose the average time to satisfy a disk request is 50 ms and the arrival rate is 200 IOPS
- Length_server = arrival rate x Time_server = 200 x 0.05 = 10
Response Time
- Service-time completions vs. waiting time for a busy server: a randomly arriving task joins a queue of arbitrary length when the server is busy, and otherwise is serviced immediately (assume unlimited-length queues)
- A single-server queue: the combination of a servicing facility that accommodates 1 task at a time (the server) and a waiting area (the queue), together called a system
- Time_queue (assuming a FIFO queue)
- Time_queue = Length_queue x Time_server + M
- M = mean time to complete service of the current task when a new task arrives, if the server is busy
- A new task can arrive at any instant
- Use the distribution of a random variable: histogram? curve?
- M is also called the Average Residual Service Time (ARST)
Response Time (Cont.)
- The server spends a variable amount of time with tasks
- Weighted mean m1 = (f1 x T1 + f2 x T2 + ... + fn x Tn) / F, where F = f1 + f2 + ...
- Variance = (f1 x T1^2 + f2 x T2^2 + ... + fn x Tn^2) / F - m1^2
- Must keep track of the unit of measure (100 ms^2 vs. 0.1 s^2)
- Squared coefficient of variance: C = variance / m1^2
- A unitless measure (100 ms^2 vs. 0.1 s^2 no longer matters)
- Three distributions
- Exponential distribution: C = 1; most tasks short relative to the average, a few long; 90% < 2.3 x average, 63% < average
- Hypoexponential distribution: C < 1; most close to the average; C = 0.5 → 90% < 2.0 x average, only 57% < average
- Hyperexponential distribution: C > 1; further from the average; C = 2.0 → 90% < 2.8 x average, 69% < average
- ARST = 0.5 x weighted mean time x (1 + C)
Characteristics of Three Distributions
Memoryless: C does not vary over time and does not consider the past history of events
Time_queue
- Derive Time_queue in terms of Time_server, server utilization, and C
- Time_queue = Length_queue x Time_server + ARST x server utilization
- Time_queue = (arrival rate x Time_queue) x Time_server + (0.5 x Time_server x (1 + C)) x server utilization
- Time_queue = Time_queue x server utilization + (0.5 x Time_server x (1 + C)) x server utilization
- Time_queue = Time_server x (1 + C) x server utilization / (2 x (1 - server utilization))
- For the exponential distribution, C = 1.0 →
- Time_queue = Time_server x server utilization / (1 - server utilization)
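The derived formula and its exponential special case can be sketched as two small functions (the names are ours):

```python
def time_queue_mg1(time_server, utilization, c_squared):
    """General form: Ts * (1 + C) * u / (2 * (1 - u))."""
    return time_server * (1 + c_squared) * utilization / (2 * (1 - utilization))

def time_queue_mm1(time_server, utilization):
    """Exponential service (C = 1): Ts * u / (1 - u)."""
    return time_server * utilization / (1 - utilization)

# With C = 1 the general form collapses to the M/M/1 form.
assert time_queue_mg1(20.0, 0.2, 1.0) == time_queue_mm1(20.0, 0.2)
print(time_queue_mm1(20.0, 0.2))  # → 5.0 (ms, for a 20 ms service time)
```

The 1/(1 - utilization) factor is what makes queuing time explode as a device nears saturation.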
Queuing Theory
- Predicts the approximate behavior of random variables
- Makes a sharp distinction between past events (arithmetic measurements) and future events (mathematical predictions)
- In computer systems, the future relies on the past → arithmetic measurements and mathematical predictions (distributions) are blurred
- Queuing model assumptions → M/G/1
- Equilibrium system
- Exponential inter-arrival time (the time between two successive task arrivals), i.e. a fixed arrival rate
- Unlimited sources of requests (infinite population model)
- Unlimited queue length, and a FIFO queue
- The server starts on the next task immediately after finishing the prior one
- All tasks must be completed
- One server
M/G/1 and M/M/1
- M/G/1 queue
- M: exponentially random request arrivals (C = 1)
- M stands for memoryless or Markovian
- G: general service distribution (no restrictions)
- 1: one server
- M/M/1 queue
- The service distribution is exponential as well (C = 1)
- Why is the exponential distribution used so often in queuing theory?
- A collection of many arbitrary distributions acts like an exponential distribution (and a computer system comprises many interacting components)
- Simpler math
Example
- A processor sends 10 disk I/Os per second; requests for service are exponentially distributed; average disk service time = 20 ms
- On average, how utilized is the disk?
- What is the average time spent in the queue?
- What is the 90th percentile of the queuing time?
- What is the number of requests in the queue?
- What is the average response time for a disk request?
- Answers
- Arrival rate = 10 IOPS; service rate = 1/0.02 = 50 IOPS
- Server utilization = 10/50 = 0.2
- Time_queue = Time_server x server utilization / (1 - server utilization) = 20 x 0.2 / (1 - 0.2) = 20 x 0.25 = 5 ms
- 90th percentile of the queuing time = 2.3 x 5 = 11.5 ms (exponential: 90% < 2.3 x average)
- Length_queue = arrival rate x Time_queue = 10 x 0.005 = 0.05
- Average response time = 5 + 20 = 25 ms
- Length_system = arrival rate x Time_system = 10 x 0.025 = 0.25
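The answers above can be reproduced numerically (variable names are ours):

```python
# M/M/1 numbers for the 10-IOPS disk with a 20 ms average service time.
arrival_rate = 10.0      # I/Os per second
time_server = 0.020      # seconds

utilization = arrival_rate * time_server             # = arrival rate / service rate
time_queue = time_server * utilization / (1 - utilization)
time_system = time_queue + time_server

print(round(utilization, 3))                  # → 0.2
print(round(time_queue * 1000, 2))            # → 5.0   (ms in the queue)
print(round(2.3 * time_queue * 1000, 2))      # → 11.5  (ms, 90th percentile)
print(round(arrival_rate * time_queue, 3))    # → 0.05  (requests in the queue)
print(round(arrival_rate * time_system, 3))   # → 0.25  (tasks in the system)
```

Note how Little's Law is used twice at the end: once on the queue alone and once on the whole system.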
7.9 Benchmarks of Storage Performance and Availability
Transaction Processing (TP) Benchmarks
- TP: database applications, OLTP
- Concerned with I/O rate (the number of disk accesses per second)
- Started with an anonymous gang of 24 members in 1985
- DebitCredit benchmark: simulates bank tellers and has as its bottom line the number of debit/credit transactions per second (TPS)
- Tighter, more standard benchmark versions
- TPC-A, TPC-B
- TPC-C: complex query processing, a more accurate model of a real bank, which models credit analysis for loans
- TPC-D, TPC-H, TPC-R, TPC-W
- Must also report the cost per TPS
- Hence the machine configuration is considered
TP Benchmarks
TP Benchmark: DebitCredit
- Disk I/O is random reads and writes of 100-byte records, along with occasional sequential writes
- 2-10 disk I/Os per transaction
- 5,000-20,000 CPU instructions per disk I/O
- Performance relies on
- The efficiency of the TP software
- How many disk accesses can be avoided by keeping information in main memory (cache)!!! → wrong for measuring disk I/O
- Peak TPS
- Restriction: 90% of transactions must have < 2 sec response time
- For TPS to increase, the number of tellers and the size of the account file must also increase (more TPS requires more users)
- This ensures that the benchmark really measures disk I/O (not cache)
Relationship Among TPS, Tellers, and Account File Size
The data set generally must scale in size as the throughput increases
SPEC System-Level File Server (SFS) Benchmark
- SPECsfs: system-level file server benchmark
- 1990 agreement by 7 vendors to evaluate NFS performance
- Mix of file reads, writes, and file operations
- Writes: 50% done in 8-KB blocks, 50% in partial blocks (1, 2, or 4 KB)
- Reads: 85% full-block, 15% partial-block
- Scales the size of the file system according to the reported throughput
- For every 100 NFS operations per second, the capacity must increase by 1 GB
- Limits average response time, e.g. to 40 ms
- Does not normalize for different configurations
- Retired in June 2001 due to bugs
SPECsfs
[Figure: overall response time (ms) vs. throughput for SPECsfs results, including an unfair configuration]
SPECWeb
- Benchmark for evaluating the performance of WWW servers
- The SPECWeb99 workload simulates accesses to a Web service provider supporting home pages for several organizations
- For each home page, nine files in each of four classes
- Less than 1 KB (small icons): 35% of activity
- 1-10 KB: 50% of activity
- 10-100 KB: 14% of activity
- 100 KB-1 MB (large documents and images): 1% of activity
- SPECWeb99 results in 2000 for Dell computers
- Large memory is used as a file cache to reduce disk I/O
- Shows the impact of Web server software and OS
SPECWeb99 Results for Dell
Examples of Benchmarks of Dependability and Availability
- TPC-C has a dependability requirement: the system must handle a single disk failure
- Brown and Patterson [2000]
- Focus on the effectiveness of fault tolerance in systems
- Availability can be measured by examining the variations in system QOS metrics over time as faults are injected into the system
- The initial experiment injected a single disk fault
- Software RAID by Linux, Solaris, and Windows 2000
- Reconstructs data onto a hot-spare disk
- A disk emulator injects the faults
- SPECWeb99 workload
Availability Benchmark for Software RAID
(Red Hat 6.0)
(Solaris 7)
Availability Benchmark for Software RAID (Cont.)
(Windows 2000)
Availability Benchmark for Software RAID (Cont.)
- The longer the reconstruction (the larger the MTTR), the lower the availability
- Increased reconstruction speed implies decreased application performance
- Linux vs. Solaris and Windows 2000
- RAID reconstruction
- Linux and Solaris initiate reconstruction automatically
- Windows 2000 initiates reconstruction manually, via operators
- Managing transient faults
- Linux is paranoid about transient faults
- Solaris and Windows ignore most transient faults
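The trade-off on this slide follows from the standard steady-state availability formula; the MTTF and reconstruction times below are illustrative numbers, not measurements from the benchmark:

```python
def availability(mttf_hours, mttr_hours):
    # Steady-state availability = MTTF / (MTTF + MTTR).
    # Slower RAID reconstruction means a larger effective MTTR and lower
    # availability; reconstructing faster instead steals bandwidth
    # from the running workload.
    return mttf_hours / (mttf_hours + mttr_hours)

for mttr in (1, 10, 100):  # hypothetical reconstruction times, in hours
    print(f"MTTR {mttr:>3} h -> availability {availability(50_000, mttr):.6f}")
```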
1107.10 Crosscutting Issues Interface to OS
111I/O Interface to the OS
- The OS controls which I/O techniques implemented by the HW will actually be used
- Early UNIX "head wedge"
- 16-bit controllers could only transfer 64KB at a time
- Later controllers moved to 32-bit devices
- And are optimized for much larger blocks
- UNIX, however, did not want to distinguish between them, so it kept the 64KB bias
- A new I/O controller designed to efficiently transfer 1MB files would never see more than 64KB at a time under early UNIX
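The 64KB ceiling follows directly from a 16-bit byte count; a minimal sketch (the helper name is made up):

```python
# A 16-bit length field can express at most 2**16 byte counts,
# capping a single transfer at 64 KB no matter how large the file is.
MAX_TRANSFER = 2 ** 16  # 65,536 bytes

def transfers_needed(file_bytes, cap=MAX_TRANSFER):
    # Controller operations early UNIX would issue for one file,
    # given the 64 KB per-transfer ceiling (ceiling division).
    return -(-file_bytes // cap)

# A 1 MB file takes 16 separate 64 KB transfers.
print(transfers_needed(1 << 20))
```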
112Cache Problems -- Stale Data
- 3 potential copies of data: cache, memory, and disk
- Stale data: the CPU or the I/O system could modify one copy without updating the other copies
- Where should the I/O system connect to the computer?
- CPU cache: no stale-data problem
- All I/O devices and the CPU see the most accurate data
- Requires the cache hierarchy to maintain multi-level inclusion
- Disadvantages
- Lost CPU performance: all I/O data goes through the cache, but little of it is ever referenced
- Arbitration between the CPU and I/O for access to the cache
- Memory: the stale-data problem occurs
113Connect I/O to Cache
114Cache-Coherence Problem
[Figure: cache-coherence problem on output (stale A) and on input (stale B')]
115Stale Data Problem
- I/O sees stale data on output because the memory copy is not up to date
- Write-through cache: OK
- Write-back cache
- The OS flushes the data to make sure they are not in the cache before output
- The HW checks cache tags to see if the data are in the cache, and interacts with the cache only if the output tries to use in-cache data
- The CPU sees stale data in the cache on input after I/O has updated memory
- The OS guarantees the input data area cannot possibly be in the cache
- The OS flushes the data to make sure they are not in the cache before input
- The HW checks tags during an input and invalidates the data on a conflict
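The write-back output case can be sketched with a toy model, assuming made-up addresses and values: the newest copy lives only in the cache, so a DMA read of memory returns stale data until the OS flushes:

```python
# Toy model of the stale-data problem on output with a write-back cache.
memory = {"A": "old"}
cache = {}

def cpu_write(addr, value):
    cache[addr] = value                 # write-back: only the cache is updated

def os_flush(addr):
    if addr in cache:
        memory[addr] = cache.pop(addr)  # write the dirty block to memory

def dma_output(addr):
    return memory[addr]                 # I/O reads memory, bypassing the cache

cpu_write("A", "new")
stale = dma_output("A")   # "old" -- I/O sees stale data
os_flush("A")
fresh = dma_output("A")   # "new" -- flushing before output fixes it
print(stale, fresh)
```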
116DMA and Virtual Memory
- 2 types of addresses: virtual (VA) and physical (PA)
- Physically addressed I/O causes problems for DMA
- A block larger than a page will likely not fall on consecutive physical page numbers
- What happens if the OS evicts (victimizes) a page while a DMA is in progress?
- Solutions
- Pin the pages in memory (do not allow them to be replaced)
- Have the OS copy user data into the kernel address space and then transfer between the kernel address space and the I/O space
117Virtual DMA
- DMA uses VAs that are mapped to PAs during the DMA
- The DMA buffer is sequential in virtual memory but can be scattered in physical memory
- Virtual addresses provide protection from other processes
- With virtual DMA, the OS updates the address tables of a DMA if a process is moved
- Virtual DMA requires a register in the DMA controller for each page to be transferred, holding the protection bits and the physical page corresponding to each virtual page
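The per-page registers described above amount to a scatter-gather list; the sketch below builds one descriptor (physical address, byte count) per page touched, using a made-up page table and a 4KB page size:

```python
PAGE = 4096  # assumed page size

# Hypothetical page table: virtual page number -> physical page number.
# The buffer is contiguous in virtual memory but scattered physically.
page_table = {0: 7, 1: 3, 2: 12}

def build_sg_list(vaddr, length):
    # One DMA descriptor per page touched, mirroring what a
    # virtual-DMA controller's per-page registers would hold.
    descs = []
    while length > 0:
        vpn, offset = divmod(vaddr, PAGE)
        chunk = min(length, PAGE - offset)
        descs.append((page_table[vpn] * PAGE + offset, chunk))
        vaddr += chunk
        length -= chunk
    return descs

# Three descriptors covering two pages' worth of data starting at offset 100.
print(build_sg_list(100, 2 * PAGE))
```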
118Virtual DMA Illustration
1197.11 Designing an I/O System
120I/O Design Complexities
- Huge variety of I/O devices
- Latency
- Bandwidth
- Block size
- Expansion is a must: longer buses, larger power supplies and cabinets
- Balanced performance and cost
- Yet another n-dimensional conflicting-constraint problem
- Yep, it's NP-hard just like all the rest
- Experience plays a big role, since the solutions are heuristic
1217 Basic I/O Design Steps
- List types of I/O devices and buses to be
supported - List physical requirements of I/O devices
- Volume, power, bus slots, expansion slots or
cabinets, ... - List cost of each device and associated
controller - List the reliability of each I/O device
- Record CPU resource demands - e.g. cycles
- Start, support, and complete I/O operation
- Stalls due to I/O waits
- Overhead - e.g. cache flushes and context
switches - List memory and bus bandwidth demands
- Assess the performance of different ways to
organize I/O devices - Of course youll need to get into queuing theory
to get it right
122An Example
- Impact on the CPU of reading a disk page directly into the cache
- Assumptions
- 16KB page, 64-byte cache blocks
- Addresses of the new page are not in the cache
- The CPU will not access data in the new page
- 95% of the displaced cache blocks will be read again (and miss)
- Write-back cache, 50% of blocks are dirty
- I/O buffers a full cache block before writing to the cache
- Accesses and misses are spread uniformly over all cache blocks
- No other interference between CPU and I/O for the cache slots
- 15,000 misses per 1 million clock cycles when there is no I/O
- Miss penalty 30 CCs, plus 30 more CCs to write back a dirty block
- 1 page is brought in every 1 million clock cycles
123An Example (Cont.)
- Each page fills 16,384/64 = 256 cache blocks
- Writing the dirty displaced blocks: 0.5 × 256 = 128 blocks, at 30 CCs each
- 95% of the 256 blocks (about 244) are referenced again and miss
- All of them are dirty and will need to be written back when replaced
- These cost 244 × (30 + 30) more CCs
- In total: 128 × 30 + 244 × 60 = 18,480 more CCs, on a base of 1,000,000 + 15,000 × 30 + 7,500 × 30 = 1,675,000 CCs
- About a 1% decrease in performance
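The arithmetic above can be checked directly; every number comes from the stated assumptions:

```python
blocks = 16 * 1024 // 64            # a 16KB page fills 256 cache blocks
dirty_flush = (blocks // 2) * 30    # 128 dirty displaced blocks: 3,840 CCs
re_miss = 244 * (30 + 30)           # ~95% of 256 blocks re-referenced, miss,
                                    # and are written back again: 14,640 CCs
extra = dirty_flush + re_miss       # 18,480 extra CCs per page brought in

# Baseline: CPU cycles plus miss and dirty write-back penalties without I/O.
base = 1_000_000 + 15_000 * 30 + 7_500 * 30   # 1,675,000 CCs
print(f"slowdown: {extra / base:.1%}")        # about 1%
```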
124Five More Examples
- Naive cost-performance design and evaluation
- Availability of the first example
- Response time of the first example
- More realistic cost-performance design and evaluation
- More realistic design for availability and its evaluation