IO System


Slides: 34

Provided by: ahmed67

Learn more at: http://elearning.najah.edu

1
IO System

CPU performance: improves about 60% per year
I/O system performance: limited by mechanical
delays (disk I/O),
improves < 10% per year
Amdahl's Law: system speed-up is limited by the
slowest part!
Suppose there is a difference of 10% between CPU
time and response time (I/O takes 10% of the
response time), and suppose we speed up
the CPU by a factor of 10 while
neglecting I/O.
We get a speedup of only about 5 times:
1 / (0.1 + 0.9/10) = 5.3
5x performance (a loss of about 50% of CPU
potential)
Suppose we speed up the CPU by a factor of 100
while neglecting I/O.
We get a speedup of only about 10 times:
1 / (0.1 + 0.9/100) = 9.2
10x performance (losing about 90% of CPU potential)
A detailed numerical example is given in class.
I/O bottleneck:
diminishing value of faster CPUs
The analogy is with a car: a very fast engine will
get nowhere if the movement of the wheels is too
slow!
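
The two speed-up figures above can be checked with a short sketch of Amdahl's Law. It assumes, as one reading of the slide, that I/O accounts for 10% of the response time and is not sped up at all; the function name and parameters are illustrative, not from the slides.

```python
def overall_speedup(io_fraction, cpu_speedup):
    """Amdahl's Law: only the CPU share of the response time gets faster."""
    cpu_fraction = 1.0 - io_fraction
    return 1.0 / (io_fraction + cpu_fraction / cpu_speedup)


if __name__ == "__main__":
    # Assumption: I/O is 10% of the original response time.
    for s in (10, 100):
        print(f"CPU {s}x faster -> overall {overall_speedup(0.10, s):.2f}x")
```

Under this assumption the overall speed-up is about 5.3 for a 10x CPU and about 9.2 for a 100x CPU, which the slide rounds to 5 and 10.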

2
Motivation Who Cares About I/O?

Some people still maintain that I/O is not really
important for overall performance.
The argument is that I/O speed does not matter
because the CPU can always switch to another
process if the running process requests an I/O
operation. This argument is valid only in systems
where throughput is the measure of
performance!
If response time is a critical measure of
performance, then the argument is no longer valid!
Response time is critical in personal computers
(only a single user) and in workstations, since there
is only one person (and often only one process)
per CPU!
Also, the price of switching could be very high in
terms of storage and switch time.

3
I/O Systems
[Figure: a processor with its cache sits on a shared
memory-I/O bus together with main memory and three
I/O controllers serving graphics, disks, and the
network; the controllers signal the processor with
interrupts.]
There are several ways of interfacing I/O devices
to the CPU: through the cache, through the memory
bus, or through a separate I/O bus. The figure
shows the low-cost option, in which the memory bus
is also the I/O bus.
4
I/O Interface
Option 1: independent I/O bus, connected through the cache
[Figure: CPU and memory on the memory bus;
interfaces and their peripherals on a separate I/O bus]
Separate I/O instructions (in, out)
Advantage: less of a stale-data problem.
Disadvantage: slow.
Option 2: common memory and I/O bus
[Figure: CPU, memory, and interfaces with their
peripherals all on one bus]
Lines distinguish between I/O and memory
transfers
40 MB/s optimistically; a 10-MIPS
processor completely saturates the bus!
Examples: VME bus, Multibus-II, NuBus
See one more figure in class: bridge-based bus
architecture
5
Technology Trends
Disk capacity now doubles every 18 months;
before 1990 it doubled every 36 months
Today: processing power doubles every 18 months
Today: memory size doubles every 18 months (4X/3yr)
Today: disk capacity doubles every 18 months
Disk positioning rate (seek + rotate) doubles
every ten years!
The I/O gap
6
Storage Technology Drivers

Driven by the prevailing computing paradigm:
1950s: migration from batch to on-line processing
1990s: migration to ubiquitous computing
(computers in phones, books, cars, video cameras;
nationwide fiber optical network with wireless
tails)
Effects on the storage industry:
Embedded storage:
smaller, cheaper, more reliable, lower power
Data utilities:
high capacity, hierarchically managed storage

7
Disk Device Terminology
Purpose:
1. Long-term, non-volatile storage
2. Large, inexpensive, slow level in the memory
hierarchy

3. A collection of platters rotating on a spindle
at a certain RPM (3600 - 7200). Each platter is a
metal disk covered with a magnetic recording
material on both sides. Reading and writing
involve mechanical movement: seeking and
rotating, to be explained next.
8
Devices Magnetic Disks

Purpose
Long-term, nonvolatile storage
Large, inexpensive, slow level in the storage
hierarchy
Characteristics
Seek time (8 ms avg)
positional latency (track)
rotational latency (sector within track)
Transfer rate
About a sector per ms (5-15 MB/s), in
blocks
Queuing delay: time waiting for the disk to
become free
Controller time
Capacity
Gigabytes
Quadruples every 3 years

Track
Sector: the smallest unit that can be
read or written
Cylinder
Platter
Head
7200 RPM = 120 RPS => about 8 ms per revolution
Average rotational latency = 4 ms
128 sectors per track => about 0.0625 ms per sector
1 KB per sector => 16 MB/s
Let's see some numbers (page 490)
9
Disk Device Terminology
Disk latency = queuing time + controller time +
seek time + rotation time + transfer time
Order-of-magnitude times for 4 KB transfers:
Seek: 8 ms or less
Rotate: 4.2 ms at 7200 rpm
Transfer: 1 ms at 7200 rpm
See some pictures of disk design
10
Disk Time Example

Disk Parameters
Transfer size is 8K bytes
Advertised average seek is 12 ms
Disk spins at 7200 RPM
Transfer rate is 4 MB/sec
Controller overhead is 2 ms
Assume that disk is idle so no queuing delay
What is the average disk access time for a sector?
Average seek + average rotational delay + transfer time
+ controller overhead
= 12 ms + 0.5/(7200 RPM/60) + 8 KB/(4 MB/s) + 2 ms
= 12 + 4.17 + 2 + 2 = about 20 ms.
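
The same sum can be written as a small function; this is a sketch under the slide's parameters, with decimal units assumed (8 KB = 8000 bytes, 4 MB/s = 4,000,000 bytes/s) so that the transfer term comes out at exactly 2 ms. The function and parameter names are illustrative.

```python
def disk_access_time_ms(seek_ms, rpm, transfer_bytes, rate_bytes_per_s,
                        controller_ms, queue_ms=0.0):
    """Average access time = queue + controller + seek + rotation + transfer."""
    # On average the head waits half a revolution for the sector to arrive.
    rotation_ms = 0.5 * 60_000.0 / rpm
    transfer_ms = transfer_bytes / rate_bytes_per_s * 1000.0
    return queue_ms + controller_ms + seek_ms + rotation_ms + transfer_ms


if __name__ == "__main__":
    # Assumed decimal units: 8 KB = 8000 B, 4 MB/s = 4_000_000 B/s.
    total = disk_access_time_ms(seek_ms=12, rpm=7200, transfer_bytes=8000,
                                rate_bytes_per_s=4_000_000, controller_ms=2)
    print(f"{total:.2f} ms")
```

Seek time dominates: it contributes 12 of the roughly 20 ms.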

11
Relative Cost of Storage Technology: Late
1995/Early 1996

Type         Capacity   Price           Price/MB     Price           Price/MB
Magnetic Disks
5.25-inch    9.1 GB     $2129           $0.23/MB     $1985           $0.22/MB
3.5-inch     4.3 GB     $1199           $0.27/MB     $999            $0.23/MB
2.5-inch     514 MB     $299            $0.58/MB
             1.1 GB     $345            $0.33/MB
Optical Disks
5.25-inch    4.6 GB     $1695 + $199    $0.41/MB     $1499 + $189    $0.39/MB
PCMCIA Cards
Static RAM   4.0 MB     $700            $175/MB
Flash RAM    40.0 MB    $1300           $32/MB
             175 MB     $3600           $20.50/MB

12
Processor Interface Issues

An interface answers the following questions for
us:
1) How is a user I/O request transformed into a
device command and communicated to the device?
2) How is data actually transferred to or from a
memory location?
3) What is the role of the operating system in
this?
The OS is important since the I/O system is
shared by multiple programs using the CPU. This
sharing needs to be implemented in a fair way.
The CPU cannot do that by itself; it is busy
executing programs.

13
Processor Interface Issues

Processor interface
Interrupts
Memory mapped I/O
I/O Control Structures
Polling
Interrupts
DMA
I/O Controllers
I/O Processors
Capacity, Access Time, Bandwidth
Interconnections
Busses

14
A Need for an I/O Interface

One may wonder why we don't connect peripherals
directly to the system bus. Reasons for not doing
that:
There is a wide variety of peripherals with
various methods of operation. It would be very
impractical to incorporate the necessary logic
within the processor to control each device.
The data transfer rate of a peripheral is much
slower than that of the memory or the processor.
Thus it is impractical to use a high-speed system
bus to communicate directly with a peripheral.
Peripherals often use different data formats and
word lengths than the computer to which they are
attached.
The next question is how to connect the I/O
interface, which may be attached to an I/O bus, to
the CPU.

15
Example of an Interface

Interface to the system bus:
data registers
control/status registers
I/O logic: used for decoding commands from the
processor such as read, write, and scan, and for
address recognition, status reporting, etc.
External device interface (data, status, control)
Functions of the interface:
control and timing
processor communication
device communication
data buffering
error correction
Next: how does the CPU address an I/O device to
send or receive data?

See figures in class.
16
Memory Mapped I/O
Single memory and I/O bus; no separate I/O
instructions
[Figure: CPU, memory (ROM and RAM), and I/O
interfaces with their peripherals all attached to
one bus]
In this mode, there is a single address space for
memory locations and I/O devices. Each I/O device
has unique addresses for its data
and status registers, which are treated just like
any other memory location. The bus contains
data and address lines and some I/O command
lines. The command line specifies whether the
address refers to a memory location or an I/O
device. The alternative solution is an isolated I/O
address space with I/O opcodes. In this case, I/O
ports are only accessible by special I/O
instructions.
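
As a rough illustration of the single address space described above, the toy model below routes ordinary load and store operations either to RAM or to a device's registers, depending only on the address. Everything in it (the `Bus` and `Device` classes, the `IO_BASE` address, the register layout) is a hypothetical sketch, not a description of any real machine.

```python
class Device:
    """Hypothetical peripheral with one status and one data register."""
    READY = 0x1

    def __init__(self):
        self.status = Device.READY
        self.data = 0


class Bus:
    """Single address space: RAM below IO_BASE, device registers above."""
    IO_BASE = 0xFF00           # assumed start of the I/O region
    STATUS_REG = IO_BASE       # assumed address of the status register
    DATA_REG = IO_BASE + 1     # assumed address of the data register

    def __init__(self, device):
        self.ram = [0] * Bus.IO_BASE
        self.device = device

    def load(self, addr):
        # The same load is used for memory and I/O; the address decides.
        if addr == Bus.STATUS_REG:
            return self.device.status
        if addr == Bus.DATA_REG:
            return self.device.data
        return self.ram[addr]

    def store(self, addr, value):
        if addr == Bus.STATUS_REG:
            self.device.status = value
        elif addr == Bus.DATA_REG:
            self.device.data = value
        else:
            self.ram[addr] = value


if __name__ == "__main__":
    dev = Device()
    bus = Bus(dev)
    bus.store(0x0010, 7)            # ordinary memory write
    bus.store(Bus.DATA_REG, 65)     # same operation, but it reaches the device
    print(bus.load(0x0010), dev.data, bus.load(Bus.STATUS_REG))
```

With isolated I/O, the two `store` calls would instead be different instructions (for example a store and an out), each with its own address space.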
17
Benefits of Memory-Mapped I/O

Data Transfer to and from the Processor is
standardized.
The number of connections to the processor chip
or board is reduced.
With the increasing number of address bits (32,
64) etc. there is sufficient extra room to
apportion some of the memory space to I/O.

18
I/O Addressing

In both cases (memory-mapped, isolated I/O),
each I/O device has registers for status (busy,
ready, idle, etc.), and control information.
The CPU sets flags to determine the operation the
I/O device will perform, either through
load/store instructions in memory-mapped, or
through special I/O instructions for the isolated
I/O.
The next question is how is this interaction done?

19
Programmed I/O (Polling)
See diagram in class first and example next.
[Figure: CPU, memory, I/O controller (IOC), and
device]
1. The CPU periodically checks status bits to
see if there is an I/O operation to perform.
2. Busy-wait loop.
3. The CPU ends up doing all the work!
4. Not an efficient way to use the CPU unless the
device is very fast!
The problem with this method is that the
processor has to wait a long time for the I/O
module of concern to be ready for either
reception or transmission of data. While waiting,
the processor must repeatedly interrogate the
module.
20
Polling

1. CPU interrogates the I/O module to check
status of the attached device.
2. The I/O module returns device status
3. If the device is operational and ready to
transmit, the CPU requests the transfer of data,
by means of a command to the I/O module.
4. The I/O module obtains a unit of data from the
external device
5. The data are transferred from the I/O module
to the processor.
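
The five steps above can be sketched as a busy-wait loop. The `IOModule` class below is a made-up stand-in for an I/O module whose device needs a few status checks before each unit of data is ready; it is an illustration under that assumption, not a real driver.

```python
class IOModule:
    """Hypothetical I/O module; its device is ready only after a few polls."""

    def __init__(self, data, polls_until_ready=3):
        self._data = list(data)
        self._delay = polls_until_ready
        self._countdown = polls_until_ready

    def has_data(self):
        return bool(self._data)

    def status_ready(self):
        # Steps 1-2: the CPU interrogates the module, which reports status.
        if self._countdown > 0:
            self._countdown -= 1
            return False
        return True

    def read_unit(self):
        # Steps 3-5: the CPU requests a transfer; the module obtains one
        # unit from the device and hands it to the processor.
        self._countdown = self._delay
        return self._data.pop(0)


def polled_read(module):
    """Read everything by polling; also count the polls that found nothing."""
    received, wasted_polls = [], 0
    while module.has_data():
        while not module.status_ready():
            wasted_polls += 1       # busy wait: the CPU does no useful work
        received.append(module.read_unit())
    return received, wasted_polls


if __name__ == "__main__":
    print(polled_read(IOModule("abc", polls_until_ready=3)))
```

The count of wasted polls is the cost that the following slides quantify.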

21
Overhead of Polling

Three different devices: mouse, floppy disk,
hard disk.
Assume the polling operation (transferring to the
polling routine, accessing the device, and
restarting the user program) takes 400 clock cycles.
The processor runs at 500 MHz.
The mouse must be polled 30 times/second.
The floppy disk transfers data to the processor in
16-bit units and has a data rate of 50 KB/sec.
No data can be missed.
The hard disk transfers data in 16-byte (four-word)
chunks and can transfer at 4 MB/sec. Again, no
data can be missed.
The devices are always busy.

22
Overhead of Polling

Mouse
Clock cycles per second for polling = 30 x 400 =
12,000 cycles per second.
Fraction of the processor clock cycles consumed =
12,000 / (500 x 10^6) = about 0.002%
Polling is good for the mouse in this computer.
It does not degrade the performance
significantly.
Floppy disk
The rate at which we must poll is 50 KB/s
divided by 2 bytes per polling access, which gives
25K polling accesses per second.
Cycles per second for polling = 25K x 400 = 10 x
10^6
Fraction of processor clock cycles consumed =
(10 x 10^6) / (500 x 10^6) = 2%, which could be tolerable.

23
Overhead of Polling

Hard disk
Polling rate is 250K times per second (why?)
(4 MB per second) / (16 bytes per transfer) = 250K
(a quarter of a million).
Cycles per second for polling = 250K x 400 = 100 x 10^6
Fraction of processor consumed = (100 x 10^6) /
(500 x 10^6) = 20%.
One-fifth of the processor is used just for
polling the disk. This is clearly not acceptable.
The alternative to polling is
interrupt-driven I/O, next!
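
The three polling overheads can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (50 KB/s = 50,000 bytes/s, 4 MB/s = 4,000,000 bytes/s), which is what makes the slide's 25K and 250K figures exact; the names are illustrative.

```python
CLOCK_HZ = 500e6          # 500 MHz processor
CYCLES_PER_POLL = 400     # cost of one polling operation


def polling_fraction(polls_per_second):
    """Fraction of all processor cycles spent polling at the given rate."""
    return polls_per_second * CYCLES_PER_POLL / CLOCK_HZ


# Assumed decimal units for the data rates.
mouse = polling_fraction(30)                   # 30 polls per second
floppy = polling_fraction(50_000 / 2)          # 2 bytes per poll
hard_disk = polling_fraction(4_000_000 / 16)   # 16 bytes per poll

if __name__ == "__main__":
    for name, frac in (("mouse", mouse), ("floppy", floppy),
                       ("hard disk", hard_disk)):
        print(f"{name}: {frac * 100:.4f}% of the processor")
```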

24
Interrupt Driven Data Transfer
[Figure: CPU, memory, I/O controller (IOC), and
device. The user program (add, sub, and, or, nop)
is running when (1) an I/O interrupt arrives;
(2) the PC is saved; (3) control transfers to the
interrupt service address; (4) the interrupt
service routine in memory (read, store, ..., rti)
runs and returns to the user program.]
User program progress is only halted during the
actual transfer. To deal with different I/O
devices, interrupt mechanisms have several levels
of priority. These priorities indicate the order
in which the processor should process the
interrupts.
Interrupt algorithm given in class.
25
Overhead of Interrupt-driven I/O

Suppose we have the same hard disk and processor as
before.
The overhead for each transfer, including the
interrupt, is 500 clock cycles.
Let's find the fraction of the processor consumed
if the hard disk is only transferring data 5% of
the time.
The interrupt rate when the disk is busy is the
same as the polling rate, hence
Cycles per second for disk = 250K x 500
= 125 x 10^6 cycles per second (see the
previous example for the 250K rate).

26
Overhead of Interrupt-driven I/O

Fraction of the processor consumed during a
transfer = (125 x 10^6) / (500 x 10^6) = 25%
Assuming that the disk is only transferring data
5% of the time,
the fraction of the processor consumed is 25% x 5%
= 1.25%.
So the absence of overhead when the I/O device is
not actually transferring is the major advantage
of an interrupt-driven interface over polling.
Interrupt-driven I/O relieves the CPU from having
to wait for every I/O event. However, if we use
this method and the disk is transferring, it still
costs 25%.
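
A short sketch of the interrupt-overhead arithmetic, again assuming decimal units (4 MB/s = 4,000,000 bytes/s); the function and parameter names are illustrative.

```python
def interrupt_fraction(bytes_per_s, bytes_per_transfer, cycles_per_interrupt,
                       clock_hz, busy_fraction):
    """Return (fraction while the disk is transferring, average fraction)."""
    interrupts_per_s = bytes_per_s / bytes_per_transfer
    while_busy = interrupts_per_s * cycles_per_interrupt / clock_hz
    # Interrupts cost nothing while the device is idle, so scale by busy time.
    return while_busy, while_busy * busy_fraction


if __name__ == "__main__":
    busy, average = interrupt_fraction(
        bytes_per_s=4_000_000,      # assumed decimal 4 MB/s
        bytes_per_transfer=16,
        cycles_per_interrupt=500,
        clock_hz=500e6,
        busy_fraction=0.05,
    )
    print(f"while transferring: {busy:.2%}, on average: {average:.2%}")
```

Polling the same disk would cost 20% all the time, busy or not, which is the contrast the slide draws.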

27
Direct Memory Access Controllers

A solution to that is DMA: a mechanism for
off-loading the processor and having the device
controller transfer data directly to or from
memory without involving the processor.
The interrupt mechanism is still used by the I/O
device to communicate with the processor but only
on completion of an I/O transfer.
DMA is implemented with a specialized controller
that transfers data between an I/O device and the
memory independent of the processor.

28
Direct Memory Access Controllers

Step 1: The CPU sets up the DMA by supplying the
identity of the device, the operation to perform,
the memory address, and the number of bytes to
transfer.
Step 2: The DMA controller starts the operation,
arbitrates for the bus, and transfers the data.
Step 3: Once the DMA transfer is complete, the
controller interrupts the processor.
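
The three steps can be mimicked in a toy model. In this sketch the `DMAController` class, its `start` method, and the callback standing in for the completion interrupt are all hypothetical; bus arbitration is not modeled, and only a device-to-memory read is shown.

```python
class DMAController:
    """Toy DMA controller that copies device data into memory on its own."""

    def __init__(self, memory):
        self.memory = memory

    def start(self, device_data, mem_addr, count, on_complete):
        # Step 2: the controller moves the data; the CPU is not involved.
        # (Bus arbitration is omitted in this sketch.)
        for i in range(count):
            self.memory[mem_addr + i] = device_data[i]
        # Step 3: a single interrupt at the end, modeled as a callback.
        on_complete(count)


if __name__ == "__main__":
    memory = bytearray(16)
    interrupts = []

    # Step 1: the CPU supplies device, memory address, and byte count.
    dma = DMAController(memory)
    dma.start(device_data=b"DISK", mem_addr=4, count=4,
              on_complete=interrupts.append)

    print(bytes(memory[4:8]), interrupts)
```

The point of the design is that the processor sees one interrupt per block rather than one per 16-byte chunk.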

29
Direct Memory Access
The CPU sends a starting address, direction, and
length count to the DMAC, then issues "start".
[Figure: CPU, memory, DMA controller (DMAC), I/O
controller (IOC), and device]
The DMAC provides handshake signals for the
peripheral controller, and memory addresses and
handshake signals for memory.
30
Overhead of I/O using DMA

Suppose the same processor and hard disk as before.
Assume that the initial setup of a DMA transfer
takes 1000 clock cycles for the processor, and
assume the handling of the interrupt at DMA
completion requires 500 clock cycles for the
processor.
The hard disk has a transfer rate of 4 MB/sec.
The average transfer from disk is 8 KB.
The disk is transferring 100% of the time.
What fraction of the 500 MHz processor is consumed?

31
Overhead of I/O using DMA

Each DMA transfer takes 8 KB / (4 MB/sec) = 0.002
sec.
If the disk is constantly transferring, it
requires
(1000 + 500 cycles/transfer) / (0.002 seconds per
transfer) = 750,000 clock cycles/second.
The processor is 500 MHz, so the fraction of processor
consumed = 750,000 / (500 x 10^6) = 0.15%, or about 0.2%.
Of course, the disk is not always transferring, and
this number will be even lower.
To further relieve the processor from I/O, the
I/O controller could be made more intelligent.
Such a controller is often called an I/O
processor. This processor executes I/O programs
that are already stored.
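
The DMA overhead calculation can be sketched the same way as the earlier ones, again assuming decimal units (8 KB = 8000 bytes, 4 MB/s = 4,000,000 bytes/s); the names are illustrative.

```python
def dma_fraction(setup_cycles, interrupt_cycles, transfer_bytes,
                 rate_bytes_per_s, clock_hz):
    """Fraction of the processor consumed if the disk transfers constantly."""
    seconds_per_transfer = transfer_bytes / rate_bytes_per_s
    # The CPU pays only for setup and the completion interrupt per transfer.
    cycles_per_second = (setup_cycles + interrupt_cycles) / seconds_per_transfer
    return cycles_per_second / clock_hz


if __name__ == "__main__":
    frac = dma_fraction(setup_cycles=1000, interrupt_cycles=500,
                        transfer_bytes=8000,          # assumed decimal 8 KB
                        rate_bytes_per_s=4_000_000,   # assumed decimal 4 MB/s
                        clock_hz=500e6)
    print(f"{frac:.2%} of the processor")
```

Compared with 20% for polling and 25% for interrupt-driven transfers while the disk is busy, DMA reduces the cost to a fraction of a percent.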

32
Input/Output Processors
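[Figure: the CPU and the IOP share the main memory
bus with memory (Mem); devices D1, D2, ..., Dn sit
on the I/O bus behind the IOP. Steps (1) to (4) are
numbered in the figure: the CPU issues an
instruction to the IOP, and the IOP interrupts the
CPU when done.]
Device to/from memory transfers are controlled by
the IOP directly. The IOP steals memory cycles.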
33
Summary

Disk industry growing rapidly; improves
bandwidth 40%/yr,
areal density 60%/year, $/MB faster?
Disk access time = queue + controller + seek +
rotate + transfer
Advertised average seek time benchmark is much
greater than average seek time in practice
Response time vs. bandwidth tradeoffs
Value of faster response time:
0.7 sec off response saves 4.9 sec and 2.0 sec
(70%) of total time per transaction => greater
productivity
everyone gets more done with faster response,
but a novice with fast response = an expert with slow
Processor interface today: peripheral processors,
DMA, I/O bus, interrupts