IO System - PowerPoint PPT Presentation

Provided by: ahmed67

1
IO System
  • CPU performance improves about 60% per year
  • I/O system performance is limited by mechanical
    delays (disk I/O)
  • < 10% per year
  • Amdahl's Law: system speed-up is limited by the
    slowest part!
  • Suppose I/O accounts for 10% of response time
    (CPU time is the other 90%), and suppose we speed
    up the CPU by a factor of 10 while neglecting I/O
  • We get a speedup of only about 5 times:
    1 / (0.1 + 0.9/10) = 5.3
  • 5x performance (a loss of 50% of CPU
    potential)
  • Suppose we speed up the CPU by a factor of 100
    while neglecting I/O
  • We get a speedup of only about 10 times:
    1 / (0.1 + 0.9/100) = 9.2
  • 10x performance (losing 90% of CPU potential)
  • A detailed numerical example given in class.
  • I/O bottleneck
  • Diminishing value of faster CPUs
  • An analogy with a car: a very fast engine will
    get nowhere if the movement of the wheels is too
    slow!
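
The speedup figures on this slide can be checked with a short calculation. This is a minimal sketch, assuming that the "difference of 10" means I/O is 10% of response time; the function name `overall_speedup` is made up for illustration.

```python
def overall_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's Law: only the CPU-bound fraction of response time gets faster."""
    io_fraction = 1.0 - cpu_fraction  # I/O time is left untouched
    return 1.0 / (io_fraction + cpu_fraction / cpu_speedup)


# Assumption: CPU time is 90% of response time, I/O the remaining 10%.
for factor in (10, 100):
    result = overall_speedup(0.9, factor)
    print(f"CPU {factor}x faster -> whole system {result:.2f}x faster")
```

Even an infinitely fast CPU could not push the system past 10x here, because the untouched 10% of I/O time remains.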

2
Motivation: Who Cares About I/O?
  • Some people still maintain that I/O is not really
    important for overall performance.
  • The argument is that I/O speed does not matter
    because the CPU can always switch to another
    process if the running process requests an I/O
    operation. This argument is valid only in systems
    where throughput is the measure of
    performance!
  • If response time is a critical measure of
    performance, then the argument is no longer valid!
  • Response time is critical in personal computers
    and workstations, since there is only one person
    (and often only one process) per CPU!
  • Also, the price of switching could be very high in
    terms of storage and switch time.

3
I/O Systems
[Figure: the processor and its cache share a memory-I/O bus with main memory and three I/O controllers (graphics, two disks, network); the controllers signal the processor through interrupts.]
There are several ways of interfacing I/O devices
to the CPU: through the cache, through the memory
bus, or through a separate I/O bus. The figure
shows a low-cost option in which the memory bus is
also the I/O bus.
4
I/O Interface
[Figure (a): independent I/O bus. The CPU and memory share the memory bus; peripherals attach through interfaces on an independent I/O bus, connected through the cache. Separate I/O instructions (in, out).]
Advantage: less stale-data problem. Disadvantage: slow.
[Figure (b): common memory and I/O bus. The CPU, memory, and peripheral interfaces share one bus; control lines distinguish between I/O and memory transfers. Examples: VME bus, Multibus-II, Nubus.]
40 Mbytes/sec optimistically; a 10 MIPS
processor completely saturates the bus!
See one more figure in class: bridge-based bus
architecture.
5
Technology Trends
  • Disk capacity now doubles every 18 months;
    before 1990, every 36 months
  • Today: processing power doubles every 18 months
  • Today: memory size doubles every 18 months
    (4X/3 yr)
  • Today: disk capacity doubles every 18 months
  • Disk positioning rate (seek + rotate) doubles
    every ten years!
  • The I/O gap
6
Storage Technology Drivers
  • Driven by the prevailing computing paradigm
  • 1950s: migration from batch to on-line processing
  • 1990s: migration to ubiquitous computing
  • computers in phones, books, cars, video cameras,
  • nationwide fiber optical network with wireless
    tails
  • Effects on storage industry
  • Embedded storage
  • smaller, cheaper, more reliable, lower power
  • Data utilities
  • high capacity, hierarchically managed storage

7
Disk Device Terminology
Purpose:
1. Long-term, non-volatile storage
2. Large, inexpensive, slow level in the memory
hierarchy
3. A collection of platters rotating on a spindle
at a certain RPM (3600 - 7200). Each platter is a
metal disk covered with a magnetic recording
material on both sides. Reading and writing
involve mechanical movement (seeking and
rotating), to be explained next.
8
Devices Magnetic Disks
  • Purpose
  • Long-term, nonvolatile storage
  • Large, inexpensive, slow level in the storage
    hierarchy
  • Characteristics
  • Seek Time (8 ms avg)
  • positional latency (track)
  • rotational latency (sector within track)
  • Transfer rate
  • About a sector per ms (5-15 MB/s) (in
    Blocks)
  • Queuing delay: time waiting for the disk to
    become free
  • Controller time
  • Capacity
  • Gigabytes
  • Quadruples every 3 years

[Figure: disk geometry, labeling the platter, head, track, cylinder, and sector. A sector is the smallest unit that can be read or written.]
7200 RPM = 120 RPS => about 8 ms per revolution;
average rotational latency = 4 ms
128 sectors per track => about 0.0625 ms per
sector (8 ms / 128)
1 KB per sector => 16 MB/s
Let's see some numbers (page 490).
9
Disk Device Terminology
Disk Latency = Queuing Time + Controller Time +
Seek Time + Rotation Time + Xfer Time
Order-of-magnitude times for 4K byte transfers:
  • Seek: 8 ms or less
  • Rotate: 4.2 ms @ 7200 rpm
  • Xfer: 1 ms @ 7200 rpm
See some pictures of disk design.
10
Disk Time Example
  • Disk Parameters
  • Transfer size is 8K bytes
  • Advertised average seek is 12 ms
  • Disk spins at 7200 RPM
  • Transfer rate is 4 MB/sec
  • Controller overhead is 2 ms
  • Assume that disk is idle so no queuing delay
  • What is the average disk access time for a sector?
  • Ave seek + ave rot delay + transfer time +
    controller overhead
  • 12 ms + 0.5/(7200 RPM/60) + 8 KB/(4 MB/s) + 2 ms
  • 12 + 4.17 + 2 + 2 = 20.17 ms, or about 20 ms.
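
The same sum can be written as a small function. This is a sketch only: the name `disk_access_ms` and its parameters are illustrative, and it assumes decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes) so that the transfer term comes out to the 2 ms used on the slide.

```python
def disk_access_ms(seek_ms, rpm, transfer_bytes, rate_bytes_per_s,
                   controller_ms, queue_ms=0.0):
    """Average access time = queue + seek + rotation + transfer + controller."""
    rotation_ms = 0.5 * 60_000.0 / rpm  # on average, wait half a revolution
    transfer_ms = transfer_bytes / rate_bytes_per_s * 1000.0
    return queue_ms + seek_ms + rotation_ms + transfer_ms + controller_ms


# Parameters from the slide; the disk is idle, so the queuing delay is zero.
total = disk_access_ms(seek_ms=12, rpm=7200, transfer_bytes=8_000,
                       rate_bytes_per_s=4_000_000, controller_ms=2)
print(f"average access time: {total:.2f} ms")
```

Seek and rotation account for about 16 of the roughly 20 ms, which is why the mechanical delays dominate small transfers.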

11
Relative Cost of Storage Technology, Late
1995/Early 1996
  • Magnetic Disks
  • 5.25": 9.1 GB, $2129 ($0.23/MB); $1985 ($0.22/MB)
  • 3.5": 4.3 GB, $1199 ($0.27/MB); $999 ($0.23/MB)
  • 2.5": 514 MB, $299 ($0.58/MB); 1.1 GB, $345
    ($0.33/MB)
  • Optical Disks
  • 5.25": 4.6 GB, $1695 + $199 ($0.41/MB); $1499 +
    $189 ($0.39/MB)
  • PCMCIA Cards
  • Static RAM: 4.0 MB, $700 ($175/MB)
  • Flash RAM: 40.0 MB, $1300 ($32/MB);
    175 MB, $3600 ($20.50/MB)

12
Processor Interface Issues
  • An interface answers the following questions for
    us:
  • 1) How is a user I/O request transformed into a
    device command and communicated to the device?
  • 2) How is data actually transferred to or from a
    memory location?
  • 3) What is the role of the operating system in
    this?
  • The OS is important because the I/O system is
    shared by multiple programs using the CPU. This
    sharing needs to be implemented in a fair way.
    The CPU cannot do that by itself; it is busy
    executing programs.

13
Processor Interface Issues
  • Processor interface
  • Interrupts
  • Memory mapped I/O
  • I/O Control Structures
  • Polling
  • Interrupts
  • DMA
  • I/O Controllers
  • I/O Processors
  • Capacity, Access Time, Bandwidth
  • Interconnections
  • Busses

14
A Need for an I/O Interface
  • One may wonder why we don't connect peripherals
    directly to the system bus. Reasons for not doing
    that:
  • There are a wide variety of peripherals with
    various methods of operation. It would be very
    impractical to incorporate the necessary logic
    within the processor to control each device.
  • The data transfer rate of the peripheral is much
    slower than that of the memory or the processor.
    Thus it is impractical to use a high-speed system
    bus to communicate directly with a peripheral.
  • Peripherals often use different data formats and
    word lengths than the computer to which they are
    attached.
  • The next question is how to connect the I/O
    interface, which may be attached to an I/O bus,
    to the CPU.

15
Example of an Interface
  • Interface to system bus
  • data registers
  • control/status registers
  • I/O logic: used for decoding commands from the
    processor (such as read, write, scan), address
    recognition, status reporting, etc.
  • External device interface (data, status, control)
  • Functions of the interface:
  • control and timing
  • processor communication
  • device communication
  • data buffering
  • error correction
  • Next: how does the CPU address an I/O device to
    send or receive data?

See figures in class.
16
Memory Mapped I/O
[Figure: the CPU, ROM, RAM, and the peripheral interfaces all share a single memory and I/O bus; there are no separate I/O instructions.]
In this mode, there is a single address space for
memory locations and I/O devices. Each I/O device
has unique addresses for its data and status
registers, which are treated just like any other
memory location. The bus contains data and address
lines and some I/O command lines. The command line
specifies whether the address refers to a memory
location or an I/O device. The alternative
solution is an isolated I/O address space with I/O
opcodes. In this case, I/O ports are accessible
only through special I/O instructions.
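
A toy model can make the single address space concrete. This is a hedged sketch, not a description of any real machine: the classes `Device` and `Bus`, the register addresses, and the `READY` encoding are all invented for illustration. The point is that the same `load` and `store` calls reach either RAM or a device register, depending only on the address.

```python
class Device:
    """Hypothetical peripheral: one data register, one read-only status register."""
    READY = 1  # made-up status encoding

    def __init__(self):
        self.data = 0
        self.status = self.READY


class Bus:
    """Single address space: RAM at low addresses, device registers just above it."""
    RAM_SIZE = 0x1000
    DATA_REG = 0x1000    # hypothetical address of the device data register
    STATUS_REG = 0x1001  # hypothetical address of the device status register

    def __init__(self, device):
        self.ram = [0] * self.RAM_SIZE
        self.device = device

    def load(self, addr):
        if 0 <= addr < self.RAM_SIZE:
            return self.ram[addr]
        if addr == self.DATA_REG:
            return self.device.data
        if addr == self.STATUS_REG:
            return self.device.status
        raise ValueError(f"unmapped address {addr:#x}")

    def store(self, addr, value):
        if 0 <= addr < self.RAM_SIZE:
            self.ram[addr] = value
        elif addr == self.DATA_REG:
            self.device.data = value
        else:
            raise ValueError(f"cannot store to address {addr:#x}")


printer = Device()
bus = Bus(printer)
bus.store(0x0010, 7)                       # ordinary memory write
if bus.load(Bus.STATUS_REG) == Device.READY:
    bus.store(Bus.DATA_REG, ord("A"))      # same operation, but it reaches the device
print(bus.load(0x0010), printer.data)
```

With isolated I/O, the device registers would instead live in a separate address space reached by dedicated instructions rather than by `load` and `store`.
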
17
Benefits of Memory-Mapped I/O
  • Data transfer to and from the processor is
    standardized.
  • The number of connections to the processor chip
    or board is reduced.
  • With the increasing number of address bits (32,
    64, etc.), there is sufficient extra room to
    apportion some of the memory space to I/O.

18
I/O Addressing
  • In both cases (memory-mapped, isolated I/O),
    each I/O device has registers for status (busy,
    ready, idle, etc.), and control information.
  • The CPU sets flags to determine the operation the
    I/O device will perform, either through
    load/store instructions in memory-mapped, or
    through special I/O instructions for the isolated
    I/O.
  • The next question: how is this interaction done?

19
Programmed I/O (Polling)
See diagram in class first and example next.
[Figure: CPU, memory, I/O controller (IOC), and device; the CPU sits in a busy-wait loop on the IOC.]
1. The CPU periodically checks status bits to
see whether an I/O operation is pending.
2. Busy-wait loop.
3. The CPU ends up doing all the work!
4. Not an efficient way to use the CPU unless the
device is very fast!
The problem with this method is that the
processor has to wait a long time for the I/O
module of concern to be ready for either
reception or transmission of data. While waiting,
the processor must repeatedly interrogate the
module.
20
Polling
  • 1. CPU interrogates the I/O module to check
    status of the attached device.
  • 2. The I/O module returns device status
  • 3. If the device is operational and ready to
    transmit, the CPU requests the transfer of data,
    by means of a command to the I/O module.
  • 4. The I/O module obtains a unit of data from the
    external device
  • 5. The data are transferred from the I/O module
    to the processor.
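
The five steps can be sketched as a busy-wait loop against a simulated I/O module. Everything here is illustrative: `IOModule`, `polled_read`, and the idea that the device needs a fixed number of status checks before each unit is ready are assumptions made for the sketch.

```python
class IOModule:
    """Simulated I/O module whose device needs a few polls before each unit is ready."""

    def __init__(self, data, polls_until_ready):
        self._data = list(data)
        self._delay = polls_until_ready
        self._countdown = polls_until_ready

    def has_more(self):
        return bool(self._data)

    def status_ready(self):
        # Steps 1-2: the CPU asks for status and the module answers.
        if self._countdown > 0:
            self._countdown -= 1
            return False
        return True

    def read_unit(self):
        # Steps 3-5: the CPU requests a unit and the module hands it over.
        self._countdown = self._delay
        return self._data.pop(0)


def polled_read(module):
    received, wasted_polls = [], 0
    while module.has_more():
        while not module.status_ready():  # busy wait: the CPU does nothing useful
            wasted_polls += 1
        received.append(module.read_unit())
    return received, wasted_polls


values, wasted = polled_read(IOModule([1, 2, 3], polls_until_ready=2))
print(values, wasted)
```

The `wasted_polls` counter is the cost the following slides quantify: every unsuccessful status check is processor time spent waiting.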

21
Overhead of Polling
  • Three different devices: mouse, floppy disk,
    hard disk.
  • Assume the polling operation (transferring to the
    polling routine, accessing the device, and
    restarting the user program) takes 400 clock
    cycles.
  • Processor is 500 MHz.
  • Mouse must be polled 30 times/second.
  • Floppy disk transfers data to the processor in
    16-bit units and has a data rate of 50 KB/sec.
    No data can be missed.
  • Hard disk transfers data in 16-byte (four-word)
    chunks and can transfer at 4 MB/sec. Again, no
    data can be missed.
  • Devices are always busy.

22
Overhead of Polling
  • Mouse
  • Clock cycles per second for polling: 30 × 400 =
    12,000 cycles per second.
  • Fraction of the processor clock cycles consumed:
  • 12,000 / (500 × 10^6) = 0.002%
  • Polling is good for the mouse in this computer.
    It does not degrade the performance
    significantly.
  • Floppy disk
  • The rate at which we must poll is 50 KB/s
    divided by 2 bytes per polling access, which gives
  • 25K polling accesses per second.
  • Cycles per second for polling: 25K × 400 =
    10 × 10^6
  • Fraction of processor clock cycles consumed:
  • (10 × 10^6) / (500 × 10^6) = 2%, which could be
    tolerable.

23
Overhead of Polling
  • Hard disk
  • Polling rate is 250K times per second (why?)
  • (4 MB per second) / (16 bytes per transfer) = 250K
    (a quarter of a mega).
  • Cycles per second for polling: 250K × 400 =
    100 × 10^6
  • Fraction of processor consumed: (100 × 10^6) /
    (500 × 10^6) = 20%.
  • One-fifth of the processor is used just for
    polling the disk. This is clearly not acceptable.
  • The alternative to polling is
    interrupt-driven I/O, next!
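
The three polling calculations follow one formula, sketched below. The constant and function names are illustrative, and the sketch assumes decimal units (50 KB = 50,000 bytes, 4 MB = 4,000,000 bytes), which is what the 25K and 250K rates on the slides imply.

```python
CLOCK_HZ = 500e6       # 500 MHz processor
CYCLES_PER_POLL = 400  # cost of one polling operation


def polling_fraction(polls_per_second):
    """Fraction of all processor cycles spent polling one device."""
    return polls_per_second * CYCLES_PER_POLL / CLOCK_HZ


mouse = polling_fraction(30)               # 30 polls per second
floppy = polling_fraction(50_000 / 2)      # 50 KB/s in 2-byte units
disk = polling_fraction(4_000_000 / 16)    # 4 MB/s in 16-byte chunks

for name, fraction in (("mouse", mouse), ("floppy", floppy), ("hard disk", disk)):
    print(f"{name}: {fraction:.4%} of the processor")
```

The overhead scales directly with the required polling rate, so the faster the device, the less acceptable polling becomes.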

24
Interrupt Driven Data Transfer
[Figure: the CPU runs the user program (add, sub, and, or, nop). (1) An I/O interrupt arrives from the device through the IOC; (2) the PC is saved; (3) control transfers to the interrupt service address; (4) the interrupt service routine in memory runs (read, store, ..., rti).]
User program progress is halted only during the
actual transfer. To deal with different I/O
devices, interrupt mechanisms have several levels
of priority. These priorities indicate the order
in which the processor should process the
interrupts.
Interrupt algorithm given in class.
25
Overhead of Interrupt-driven I/O
  • Suppose we have the same hard disk and processor
    as before.
  • The overhead for each transfer, including the
    interrupt, is 500 clock cycles.
  • Let's find the fraction of the processor consumed
    if the hard disk is only transferring data 5% of
    the time.
  • The interrupt rate when the disk is busy is the
    same as the polling rate, hence
  • Cycles per second for the disk = 250K × 500
  • = 125 × 10^6 cycles per second (see the
    previous example for the 250K rate).

26
Overhead of Interrupt-driven I/O
  • Fraction of the processor consumed during a
    transfer: (125 × 10^6) / (500 × 10^6) = 25%
  • Assuming that the disk is only transferring data
    5% of the time,
  • the fraction of the processor consumed is
    25% × 5% = 1.25%
  • So the absence of overhead when the I/O device is
    not actually transferring is the major advantage
    of an interrupt-driven interface over polling.
  • Interrupt-driven I/O relieves the CPU from having
    to wait for every I/O event. However, if we use
    this method, it still costs 25% of the processor
    while the disk is transferring.
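
The interrupt-driven figures can be reproduced the same way. This is a sketch under the slide's assumptions; the function name `interrupt_fraction` and its parameters are illustrative, and decimal megabytes are assumed.

```python
def interrupt_fraction(bytes_per_second, bytes_per_interrupt,
                       cycles_per_interrupt, clock_hz, busy_fraction):
    """Processor fraction consumed by interrupts, averaged over busy and idle time."""
    interrupts_per_second = bytes_per_second / bytes_per_interrupt
    while_busy = interrupts_per_second * cycles_per_interrupt / clock_hz
    return while_busy * busy_fraction  # no cost while the device is idle


during_transfer = interrupt_fraction(4_000_000, 16, 500, 500e6, busy_fraction=1.0)
averaged = interrupt_fraction(4_000_000, 16, 500, 500e6, busy_fraction=0.05)
print(f"while transferring: {during_transfer:.2%}, averaged: {averaged:.2%}")
```

Unlike polling, the `busy_fraction` factor applies here because an idle device generates no interrupts and therefore costs nothing.
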

27
Direct Memory Access Controllers
  • A solution to that is DMA: a mechanism for
    off-loading the processor and having the device
    controller transfer data directly to or from
    memory without involving the processor.
  • The interrupt mechanism is still used by the I/O
    device to communicate with the processor but only
    on completion of an I/O transfer.
  • DMA is implemented with a specialized controller
    that transfers data between an I/O device and the
    memory independent of the processor.

28
Direct Memory Access Controllers
  • Step 1: The CPU sets up the DMA by supplying the
    identity of the device, the operation to perform,
    the memory address, and the number of bytes to
    transfer.
  • Step 2: The DMA starts operation, arbitrates for
    the bus, and transfers the data.
  • Step 3: Once the DMA transfer is complete, the
    controller interrupts the processor.
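
The three steps can be modeled in a few lines. This is a toy sketch only: the class `DMAController`, its method names, and the use of a `bytes` object as the "device" are all invented for illustration, and bus arbitration is not modeled.

```python
class DMAController:
    """Toy model of the three DMA steps; all names here are illustrative."""

    def __init__(self, memory, interrupt_handler):
        self.memory = memory
        self.interrupt_handler = interrupt_handler
        self.request = None

    def setup(self, device, operation, address, count):
        # Step 1: the CPU describes the transfer, then is free to do other work.
        self.request = (device, operation, address, count)

    def run(self):
        # Step 2: the controller moves the data itself (arbitration not modeled).
        device, operation, address, count = self.request
        if operation != "read":
            raise NotImplementedError("only device-to-memory reads are modeled")
        for offset in range(count):
            self.memory[address + offset] = device[offset]
        # Step 3: one interrupt signals completion of the whole transfer.
        self.interrupt_handler(count)


memory = bytearray(16)
completed = []  # stands in for the processor's interrupt handler
dma = DMAController(memory, completed.append)
dma.setup(device=b"DISKDATA", operation="read", address=4, count=8)
dma.run()
print(bytes(memory[4:12]), completed)
```

The processor is involved only in `setup` and in the single completion interrupt, not in any of the eight byte copies.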

29
Direct Memory Access
The CPU sends a starting address, direction, and
length count to the DMAC, then issues "start".
[Figure: CPU, memory, DMAC, IOC, and device.]
The DMAC provides handshake signals for the
peripheral controller, and memory addresses and
handshake signals for memory.
30
Overhead of I/O using DMA
  • Suppose the same processor and hard disk as before.
  • Assume that the initial setup of a DMA transfer
    takes 1000 clock cycles for the processor, and
    assume the handling of the interrupt at DMA
    completion requires 500 clock cycles for the
    processor.
  • Hard disk has a transfer rate of 4 MB/sec.
  • Average transfer from disk is 8 KB.
  • Disk is transferring 100% of the time.
  • What fraction of the 500 MHz processor is consumed?

31
Overhead of I/O using DMA
  • Each DMA transfer takes 8 KB / (4 MB/sec) = 0.002
    sec.
  • If the disk is constantly transferring, it
    requires
  • (1000 + 500) cycles per transfer / 0.002 seconds
    per transfer = 750,000 clock cycles/second
  • Processor is 500 MHz, so the fraction of processor
    consumed = 750,000 / (500 × 10^6) = 0.15%, or
    about 0.2%.
  • Of course the disk is not always transferring, so
    this number will be even lower.
  • To further relieve the processor from I/O, the
    I/O controller could be made more intelligent.
    Such a controller is often called an I/O
    processor. This processor executes I/O programs
    already stored.
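
The DMA overhead arithmetic on this slide can be sketched as follows. The function name `dma_fraction` and its parameters are illustrative, and decimal units are assumed (8 KB = 8,000 bytes, 4 MB = 4,000,000 bytes) so that one transfer takes exactly 0.002 seconds.

```python
def dma_fraction(setup_cycles, completion_cycles, transfer_bytes,
                 bytes_per_second, clock_hz):
    """Processor fraction consumed when the disk transfers continuously via DMA."""
    seconds_per_transfer = transfer_bytes / bytes_per_second
    cycles_per_second = (setup_cycles + completion_cycles) / seconds_per_transfer
    return cycles_per_second / clock_hz


fraction = dma_fraction(setup_cycles=1000, completion_cycles=500,
                        transfer_bytes=8_000, bytes_per_second=4_000_000,
                        clock_hz=500e6)
print(f"processor consumed by DMA bookkeeping: {fraction:.2%}")
```

Compared with the 25% cost of interrupting on every 16-byte chunk, paying the setup and completion cost once per 8 KB transfer reduces the overhead by more than two orders of magnitude.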

32
Input/Output Processors
[Figure: the CPU, the IOP, and memory share the main memory bus; devices D1, D2, ..., Dn attach to the IOP over a separate I/O bus. Steps (1) to (4): the CPU issues an instruction to the IOP, and the IOP interrupts the CPU when done.]
Device to/from memory transfers are controlled by
the IOP directly. The IOP steals memory cycles.
33
Summary
  • Disk industry growing rapidly, improves
  • bandwidth 40%/yr,
  • areal density 60%/year, $/MB faster?
  • queue + controller + seek + rotate + transfer
  • Advertised average seek time benchmark is much
    greater than average seek time in practice
  • Response time vs. bandwidth tradeoffs
  • Value of faster response time
  • 0.7 sec off response saves 4.9 sec and 2.0 sec
    (70%) total time per transaction => greater
    productivity
  • everyone gets more done with faster response,
    but novice with fast response = expert with slow
  • Processor interface today: peripheral processors,
    DMA, I/O bus, interrupts