Title: I/O Performance Measures
1I/O Performance Measures
- Austin Orgah
- Chapter 8, Sections 8.6-8.9
2Examples from Disk and File Systems
- How should we compare I/O systems?
- - This is complex because I/O performance depends on many aspects of the I/O system.
- - Designs can also make complex trade-offs between response time and throughput, making it impossible to measure just one aspect in isolation.
3Examples from Disk and File Systems contd
- For example:
- Handling a request as early as possible generally minimizes response time, although greater throughput can be achieved by handling related requests together.
- Throughput may be increased on a disk by grouping requests that access locations that are close together.
- This will increase response time for some requests, probably leading to a larger variation in response time.
4Examples from Disk and File Systems contd
- Although throughput will be increased, some benchmarks constrain the maximum response time to any request, making any of these disk and file optimizations potentially problematic.
5- Several benchmarks have been proposed for determining the performance of disk systems.
- These benchmarks are affected by a variety of system features, such as:
- Disk technology
- How the disks are connected
- The memory system
- The processor
- The file system provided by the operating system
6Important Note
- Terminology/units
- Performance of I/O systems depends on the rate at which the system transfers data.
- The transfer rate depends on the clock rate, which is given in GHz (1 GHz = 10^9 cycles/sec). The transfer rate itself is usually quoted in GB/sec.
- In I/O systems, GBs are measured using base 10, so 1 GB = 10^9 = 1,000,000,000 bytes.
- Memory is measured using base 2, so 1 GB = 2^30 = 1,073,741,824 bytes.
7Important Note Contd
- In base 10, 1 K = 1000
- In base 2, 1 K = 1024
- For calculation, instead of converting between the two, treating the two as if they are equal will introduce little error.
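As a rough check on the size of that error, the gap between the two conventions can be computed directly. This is an illustrative sketch only; the helper name `relative_gap` is hypothetical and not from the slides.

```python
# Illustrative sketch: how far apart are the base-10 and base-2 unit prefixes?
# The helper name relative_gap is hypothetical (not from the slides).

def relative_gap(decimal_unit: int, binary_unit: int) -> float:
    """Fraction by which the base-2 unit exceeds the base-10 unit."""
    return (binary_unit - decimal_unit) / decimal_unit

UNITS = {
    "K": (10**3, 2**10),
    "M": (10**6, 2**20),
    "G": (10**9, 2**30),
}

for name, (dec, binary) in UNITS.items():
    gap = relative_gap(dec, binary)
    print(f"1 {name}: base 10 = {dec:,}  base 2 = {binary:,}  gap = {gap:.1%}")
```

The gap is about 2.4% at K and grows to about 7.4% at G, so the approximation is coarser for larger units.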
8Benchmarks
- Transaction Processing I/O
- File System and Web I/O
9Transaction Processing I/O Benchmarks
- Transaction processing (TP): A type of application that involves handling small, short operations (transactions) that require both I/O and computation. TP applications typically have both response time requirements and a performance measurement based on the throughput of transactions.
- TP applications are mainly concerned with I/O rate, measured as the number of disk accesses per second, rather than data rate, measured in bytes of data per second.
10Transaction Processing I/O Benchmarks
- I/O rate: Performance measure of I/Os per unit time, such as reads per second.
- Data rate: Performance measure of bytes per unit time, such as GB/sec.
- TP applications involve changes to a large database, with the system meeting some response time requirements as well as gracefully handling certain types of failures. For example, banks use TP systems.
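To make the distinction between I/O rate and data rate concrete, here is a minimal sketch. The two workloads and all of their numbers are made-up illustration values, not measurements from the source, and the function name `data_rate` is hypothetical.

```python
# Hypothetical workloads; request sizes and rates are made-up illustration values.

def data_rate(io_per_sec: float, bytes_per_io: float) -> float:
    """Data rate in bytes/sec implied by an I/O rate and a fixed request size."""
    return io_per_sec * bytes_per_io

# Many small accesses, as in transaction processing (assumed: 2 KB each).
tp_io_rate = 10_000
tp_data_rate = data_rate(tp_io_rate, 2_000)

# Few large sequential transfers (assumed: 1 MB each).
stream_io_rate = 100
stream_data_rate = data_rate(stream_io_rate, 1_000_000)

print(f"TP-like:   {tp_io_rate:>6} I/Os per sec, {tp_data_rate / 1e6:.0f} MB/sec")
print(f"Streaming: {stream_io_rate:>6} I/Os per sec, {stream_data_rate / 1e6:.0f} MB/sec")
```

In this sketch the TP-like workload has 100 times the I/O rate but only one fifth of the data rate, which is why the two measures cannot stand in for each other.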
11Transaction Processing I/O Benchmarks
- The best-known set of benchmarks is developed by the Transaction Processing Performance Council (TPC).
- TPC-C, created in 1992, simulates a complex query environment.
- TPC-H models ad hoc decision support: the queries are unrelated, and knowledge of past queries cannot be used to optimize future queries.
12Transaction Processing I/O Benchmarks
- TPC-R simulates a business decision support system where users run a standard set of queries.
- TPC-W is a web-based transaction benchmark that simulates the activities of a business-oriented transactional web server.
- For more information, visit www.tpc.org.
13File System and Web I/O Benchmarks
- File systems stored on disks have a different access pattern.
- Measurements of UNIX file systems (engineering environment) show that:
- 80% of accesses are to files of less than 10 KB.
- 90% of all file accesses are to data with sequential addresses on the disk.
- 67% of the accesses were reads.
- 27% were writes.
- 6% were read-modify-write accesses, which read, modified, and rewrote data to the same location.
- These measurements have led to the creation of synthetic file system benchmarks.
14File System and Web I/O Benchmarks
- A popular synthetic file system benchmark has 5 phases, using 70 files:
- MakeDir: Constructs a directory subtree that is identical in structure to the given directory subtree.
- Copy: Copies every file from the source subtree to the target subtree.
- ScanDir: Recursively traverses a directory subtree and examines the status of every file in it.
- ReadAll: Scans every byte of every file in a subtree once.
- Make: Compiles and links all the files in a subtree.
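To give a feel for what the ScanDir and ReadAll phases exercise, the sketch below walks a directory tree with Python's standard library. It is not the benchmark's actual implementation; the function names `scan_dir` and `read_all` and the 64 KB chunk size are assumptions made for illustration.

```python
import os

# Hypothetical sketch of two phases; NOT the real benchmark code.

def scan_dir(root: str) -> int:
    """ScanDir-like phase: stat every file under root, return files examined."""
    examined = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            os.stat(os.path.join(dirpath, name))  # metadata access only
            examined += 1
    return examined

def read_all(root: str, chunk_size: int = 64 * 1024) -> int:
    """ReadAll-like phase: read every byte of every file once, return bytes read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

Timing `scan_dir` mostly stresses metadata lookups, while timing `read_all` stresses sequential data transfer, which is one reason such a benchmark keeps its phases separate.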
15File System and Web I/O Benchmarks
- In addition to processor benchmarks, SPEC offers a file server benchmark (SPECSFS) and a web server benchmark (SPECWeb).
- SPECSFS is a benchmark for measuring NFS (Network File System) performance using a script of file server requests. It tests the performance of the I/O system, including both disk and network I/O, as well as the processor. It is a throughput-oriented benchmark with important response time requirements.
- SPECWeb is a web server benchmark that simulates multiple clients requesting both static and dynamic pages from a server, as well as clients posting data to the server.
16I/O Performance Versus Processor Performance
- Impact of I/O on System Performance
- Suppose we have a benchmark that executes in 100 s of elapsed time, where 90 s is CPU time and the rest is I/O time. If CPU time improves by 50% per year for the next five years but I/O time doesn't, how much faster will our program run at the end of five years?
- Elapsed time = CPU time + I/O time
- 100 = 90 + I/O time
- Therefore, I/O time = 10 s.
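18- After five years of 50% annual improvement, CPU time falls to 90/1.5^5, or about 12 s, so elapsed time is about 12 + 10 = 22 s.
- CPU improvement over 5 years is
- 90/12 = 7.5
- The improvement in elapsed time is
- 100/22 = 4.5
- So the I/O time increased from 10% to 45% of the elapsed time.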
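The arithmetic for this example can be reproduced with a short sketch. The function name `project` is hypothetical; the inputs (90 s CPU, 10 s I/O, 50% per year, 5 years) are the ones stated in the example.

```python
# Sketch of the elapsed-time projection; the function name is hypothetical.

def project(cpu_s: float, io_s: float, cpu_gain_per_year: float, years: int):
    """Return (CPU time, elapsed time) after `years` of CPU-only improvement."""
    cpu_after = cpu_s / (1 + cpu_gain_per_year) ** years
    return cpu_after, cpu_after + io_s

cpu_after, elapsed_after = project(90.0, 10.0, 0.50, 5)

cpu_speedup = 90.0 / cpu_after            # improvement in CPU time alone
overall_speedup = 100.0 / elapsed_after   # improvement in elapsed time
io_share = 10.0 / elapsed_after           # fraction of elapsed time spent on I/O

print(f"CPU time after 5 years:     {cpu_after:.1f} s")
print(f"Elapsed time after 5 years: {elapsed_after:.1f} s")
print(f"CPU speedup:     {cpu_speedup:.2f}x")
print(f"Overall speedup: {overall_speedup:.2f}x")
print(f"I/O share:       {io_share:.1%}")
```

Without rounding the intermediate times to 12 s and 22 s, the figures come out slightly higher than on the slide (roughly 7.6, 4.6, and 46%), but the conclusion is the same: the overall speedup is far smaller than the CPU speedup.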
19Designing an I/O System
- Two primary specifications that designers encounter in I/O systems:
- Latency constraints
- Bandwidth constraints
- Knowledge of the traffic pattern affects the design and analysis.
20- Latency constraints involve ensuring that the latency to complete an I/O operation is bounded by a certain amount.
- Designing an I/O system to meet a set of bandwidth constraints, given a workload:
- Find the weakest link in the I/O system, which is the component in the I/O path that will constrain the design. Depending on the workload, this component can be anywhere, including the CPU, the memory system, the backplane bus, the I/O bus, the I/O controllers, or the devices. The workload and configuration limits may dictate where the weakest link is located.
- Configure this component to sustain the required bandwidth.
- Determine the requirements for the rest of the system and configure them to support this bandwidth.
21I/O System Design Example
- A CPU that sustains 3 billion instructions/sec and averages 100,000 instructions in the operating system per I/O operation.
- A memory backplane bus capable of sustaining a transfer rate of 1000 MB/sec.
- SCSI Ultra320 controllers with a transfer rate of 320 MB/sec and accommodating up to 7 disks.
- Disk drives with read/write bandwidths of 75 MB/sec and an average seek plus rotational latency of 6 ms.
- If the workload consists of 64 KB reads (where the block is sequential on a track) and the user program needs 200,000 instructions per I/O operation, find the maximum sustainable I/O rate and the number of disks and SCSI controllers required. Assume that the reads can always be done on an idle disk if one exists (i.e., ignore disk conflicts).
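One way to work this example is sketched below, following the weakest-link approach: compute the I/O rate each fixed component can sustain, take the minimum, then size the disks and controllers to match. The constant names are hypothetical, and the sketch assumes base-10 units (64 KB = 64,000 bytes, 1 MB = 10^6 bytes) as is conventional for I/O.

```python
import math

# Parameters from the example; constant names are hypothetical.
CPU_INSTR_PER_SEC = 3e9
OS_INSTR_PER_IO = 100_000
USER_INSTR_PER_IO = 200_000
BUS_BYTES_PER_SEC = 1000e6
CTRL_BYTES_PER_SEC = 320e6
DISKS_PER_CTRL = 7
DISK_BYTES_PER_SEC = 75e6
DISK_LATENCY_SEC = 6e-3
IO_BYTES = 64e3  # assumption: 64 KB taken as 64,000 bytes (base 10)

# Step 1: find the weakest link among the fixed components (CPU and bus).
cpu_io_rate = CPU_INSTR_PER_SEC / (OS_INSTR_PER_IO + USER_INSTR_PER_IO)
bus_io_rate = BUS_BYTES_PER_SEC / IO_BYTES
max_io_rate = min(cpu_io_rate, bus_io_rate)

# Step 2: size the disks. Each read costs seek + rotation + transfer time.
time_per_disk_io = DISK_LATENCY_SEC + IO_BYTES / DISK_BYTES_PER_SEC
disk_io_rate = 1 / time_per_disk_io
disks = math.ceil(max_io_rate / disk_io_rate)

# Step 3: size the controllers. Check that 7 fully busy disks stay under the
# controller's bandwidth on average before packing 7 disks per controller.
ctrl_load = DISKS_PER_CTRL * disk_io_rate * IO_BYTES
controller_not_saturated = ctrl_load <= CTRL_BYTES_PER_SEC
controllers = math.ceil(disks / DISKS_PER_CTRL)

print(f"CPU limit: {cpu_io_rate:.0f} I/Os per sec, bus limit: {bus_io_rate:.0f} I/Os per sec")
print(f"Max sustainable I/O rate: {max_io_rate:.0f} I/Os per sec")
print(f"Per-disk rate: {disk_io_rate:.1f} I/Os per sec, disks needed: {disks}")
print(f"Controller load with 7 disks: {ctrl_load / 1e6:.1f} MB/sec, "
      f"controllers needed: {controllers}")
```

Under these assumptions the CPU is the weakest link at 10,000 I/Os per second, and sustaining that rate takes 69 disks and 10 controllers.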
22Real Stuff A Digital Camera
- Digital cameras are embedded computers with removable, writable, nonvolatile storage and interesting I/O devices. See the Sanyo VPC-SX500.
24Digital Camera Contd
- When powered on, the microprocessor first runs diagnostics on all components and writes any error messages to the liquid crystal display (LCD). When a picture is about to be taken, the photographer holds the shutter halfway so that the microprocessor can take a light reading. The microprocessor then keeps the shutter open to get the necessary light, which is captured by a charge-coupled device (CCD) as red, green, and blue pixels.
25Digital Camera Contd
- The pixels are then scanned out row by row and passed through routines for white balance, color, and aliasing correction, and then stored in a 4 MB frame buffer. The next step is to compress the image into a standard format such as JPEG and store it in the removable flash memory. The microprocessor updates the LCD to show that there is room for one less picture. The camera has other features, such as video recording, sleep mode, and an LCD display, among many others.
26Digital Camera Contd
- The camera allows the use of a Microdrive disk
instead of CompactFlash memory. Fig 8.15 shows
the comparison of both.
27Digital Camera Contd
- The electronic brain of the Sanyo camera is an embedded computer with several special functions embedded on the chip. Chips of this kind are called systems on a chip (SOC). An SOC integrates into a single chip all the parts that were found on a small printed circuit board in the past. It reduces size and lowers power compared to less integrated solutions. The SOC enables the camera to operate on half the number of batteries and to offer a smaller form factor than competitors' cameras.
- Fig 8.16
29- The SOC has two buses. The 16-bit bus is for the many slower I/O devices, such as the Smart Media interface, program and data memory, and DMA. The 32-bit bus is for the SDRAM, the signal processor (which is connected to the CCD), the Motion JPEG encoder, and the NTSC/PAL encoder (which is connected to the LCD). Unlike desktop microprocessors, the SOC has a large variety of I/O buses that it must integrate. This 700 mW chip contains 1.8M transistors in a 10.5 x 10.5 mm die implemented using a 0.35-micron process.
30Fallacies and Pitfalls
- Fallacy: The rated mean time to failure (MTTF) of disks is 1,200,000 hours, or almost 140 years, so disks practically never fail.
- This number far exceeds the lifetime of a disk. For this large MTTF to make some sense, manufacturers argue that the calculation corresponds to a user who buys a disk and keeps replacing it every 5 years (the lifespan of the disk).
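A more intuitive reading of the same MTTF figure is the fraction of a disk population expected to fail each year. The sketch below assumes disks run 24 hours a day, fail at a constant rate, and are replaced when they fail; the population of 1000 disks and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, assuming continuous operation

def mttf_in_years(mttf_hours: float) -> float:
    """Convert a rated MTTF from hours to years."""
    return mttf_hours / HOURS_PER_YEAR

def expected_failures_per_year(n_disks: int, mttf_hours: float) -> float:
    """Expected failures per year in a population of disks, assuming a
    constant failure rate and immediate replacement of failed disks."""
    return n_disks * HOURS_PER_YEAR / mttf_hours

MTTF_HOURS = 1_200_000
N_DISKS = 1000  # hypothetical population size

years = mttf_in_years(MTTF_HOURS)
failures = expected_failures_per_year(N_DISKS, MTTF_HOURS)

print(f"MTTF: {years:.0f} years")
print(f"Expected failures per year among {N_DISKS} disks: {failures:.1f} "
      f"({failures / N_DISKS:.2%} of the population)")
```

Seen this way, a 1,200,000-hour MTTF corresponds to roughly 7 failures per year in every 1000 disks, which is small but clearly not "never".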
31- Fallacy: Magnetic disk storage is on its last legs and will be replaced shortly.
- This is a fallacy and a pitfall. Magnetic bubble memories, optical storage, and holographic storage are unsuccessful contenders. None has matched the combination of characteristics that favor magnetic disks: high reliability, nonvolatility, low cost, reasonable access time, etc. Rather, magnetic storage continues to improve at the same or a faster pace than it has sustained over the past 25 years.
32- Fallacy: A 100 MB/sec bus can transfer 100 MB of data in 1 sec.
- First, you cannot use 100% of any computer resource. For a bus, you would be fortunate to get 70% to 80% of the peak bandwidth. Time to send the address, time to acknowledge the signals, and stalls while waiting to use a busy bus all prevent 100% utilization of a bus. Second, the MB of storage and the MB/sec of bandwidth do not agree: bandwidth is quoted in base 10, while memory is measured in base 2.
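Both effects can be combined in a quick estimate. The sketch below assumes the 100 MB of data is measured in base 2 (as memory is) and the bus bandwidth in base 10, and tries the 70% and 80% utilization figures mentioned above; the function name `transfer_time` is hypothetical.

```python
PEAK_BYTES_PER_SEC = 100 * 10**6  # 100 MB/sec, bandwidth quoted in base 10
PAYLOAD_BYTES = 100 * 2**20       # 100 MB of memory, measured in base 2

def transfer_time(payload_bytes: float, peak_bytes_per_sec: float,
                  utilization: float) -> float:
    """Seconds to move the payload at a given fraction of peak bandwidth."""
    return payload_bytes / (peak_bytes_per_sec * utilization)

for utilization in (1.0, 0.8, 0.7):
    t = transfer_time(PAYLOAD_BYTES, PEAK_BYTES_PER_SEC, utilization)
    print(f"utilization {utilization:.0%}: {t:.2f} s")
```

Even at an unattainable 100% utilization the unit mismatch alone pushes the transfer past 1 second, and at 70% to 80% utilization it takes roughly 1.3 to 1.5 seconds.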
33- Pitfall: Using the peak transfer rate of a portion of the I/O system to make performance projections or performance comparisons.
- The components of an I/O system, from the devices to the controllers to the buses, are specified using their peak bandwidths. These peak bandwidth measurements are often based on unrealistic assumptions about the system or are unattainable because of other system limitations. Amdahl's law tells us that the throughput of an I/O system will be limited by the lowest-performance component in the I/O path.
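As an illustration of how far a peak figure can be from what a workload actually sees, the sketch below reuses the disk parameters from the earlier design example (75 MB/sec peak, 6 ms average seek plus rotational latency, 64 KB reads) and compares the peak rate with the sustained rate per disk. Base-10 units are assumed and the function name `sustained_rate` is hypothetical.

```python
DISK_PEAK_BYTES_PER_SEC = 75e6  # peak read/write bandwidth
DISK_LATENCY_SEC = 6e-3         # average seek plus rotational latency
IO_BYTES = 64e3                 # assumption: 64 KB taken as 64,000 bytes

def sustained_rate(io_bytes: float, latency_sec: float,
                   peak_bytes_per_sec: float) -> float:
    """Bytes/sec actually delivered when every request pays the latency."""
    time_per_io = latency_sec + io_bytes / peak_bytes_per_sec
    return io_bytes / time_per_io

sustained = sustained_rate(IO_BYTES, DISK_LATENCY_SEC, DISK_PEAK_BYTES_PER_SEC)
ratio = DISK_PEAK_BYTES_PER_SEC / sustained

print(f"Peak:      {DISK_PEAK_BYTES_PER_SEC / 1e6:.1f} MB/sec")
print(f"Sustained: {sustained / 1e6:.1f} MB/sec")
print(f"Peak overstates sustained throughput by about {ratio:.0f}x")
```

For this workload a projection based on the 75 MB/sec peak would be off by roughly a factor of eight, because most of each request is spent on seek and rotation rather than transfer.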
34- Pitfall: Using magnetic tapes to back up disks.
- This is a fallacy and a pitfall. Tapes use similar technology to disks. The cost difference between disks and tapes is based on the fact that rotating disks have lower access times than sequential tape access. In the past, a tape could hold the contents of many disks, and since tape was 10 to 100 times cheaper per gigabyte than disk, it was a useful backup medium. Today, disks have improved much more rapidly than tapes, and tapes have compatibility problems that are not imposed on disks.
35- Pitfall: Trying to provide features only within the network versus end to end.
- The concern is providing at a lower level features that can only be accomplished at the highest level, thus only partially satisfying the communication demand.
- Pitfall: Moving functions from the CPU to the I/O processor, expecting to improve performance without a careful analysis.
- A frequent instance of this pitfall is the use of intelligent I/O interfaces, which, because of the higher overhead to set up an I/O request, can turn out to have worse latency than a processor-directed activity.