Title: CS 2200 IO
1CS 2200 I/O
- (Lectures based on the work of Jay Brockman,
Sharon Hu, Randy Katz, Peter Kogge, Bill Leahy,
Ken MacKenzie, Richard Murphy, and Michael
Niemier)
2What is it exactly?
- To anyone in computer science or computer engineering, I/O probably has many different meanings
- My research in computer architecture focuses on processor design
- So I/O generally just involves a processor/memory interface
- For a DRAM chip designer, I/O might involve
- A processor/memory interface
- A memory/disk interface
- For an OS designer, I/O might be
- An interrupt from a device, input from the user, etc.
- Basically, it can mean lots of different things
- In computer architecture, levels of the memory hierarchy beyond main memory are often ignored
3Why study I/O?
- We've talked a lot about the CPU time metric
- (In fact, I've probably stressed it quite a bit!)
- CPU time is important
- for measuring how fast an instruction or program is actually executed
- But what's perhaps more important is response time
- The time between when the user types a command and when the results appear
- This might be a better measure of performance
- A brief study of I/O will help complete the picture of a general computer architecture or organization
4A quick example
- Response time is 10% longer than CPU time
- (So, I/O overhead adds 10% to our execution time)
- Can speed up CPU by a factor of 10, but I/O overhead/time will stay the same
- Amdahl's law!
- Only a speedup of 5.5! Roughly ½ of the CPU improvement is wasted
- What if we make the CPU 100 times faster?
- Speedup of 10! 90% of the speedup wasted!
- With CPU performance skyrocketing, if we don't improve I/O, all tasks will just become I/O bound
5Our Road Map
Processor
Memory Hierarchy
I/O Subsystem
Parallel Systems
Networking
6Five Classic Components of a Computer System: all computers since 1946
7... and the software abstractions atop them!
(operating systems, networking)
- Computation: processes, threads
- Communication: I/O devices, the internet
- Storage: virtual memory, files
8I/O Plan
- I/O devices in general
- magnetic disks in particular
- networks in particular
- Hardware interface issues
- tradeoff of performance and convenience
- dealing with external events
- Software abstractions
- example filesystems
- disk head scheduling
- POSIX models all I/O as files
- device drivers
9I/O Types and Rates
Device            Behavior  Partner   Data rate (KB/s)
Keyboard          I         Human     0.01
Mouse             I         Human     0.02
Voice Input       I         Human     0.02
Scanner           I         Human     400
Voice Output      O         Human     0.6
Line Printer      O         Human     1
Laser Printer     O         Human     200
Graphics Display  O         Human     60,000
Modem             I/O       Machine   8
Network           I/O       Machine   6,000
Floppy Disk       S         Machine   100
Optical Disk      S         Machine   1,000
Magnetic Tape     S         Machine   2,000
Magnetic Disk     S         Machine   10,000
(Behavior: I = input, O = output, I/O = input/output, S = storage)
10Mouse
"I got the idea for the mouse while attending a talk at a computer conference. The speaker was so boring that I started daydreaming and hit upon the idea." (Doug Engelbart)
- Uses mechanical counters or optical devices to generate pulses, which increment or decrement counters
- Counter values are determined by polling
11Magnetic Disks
- Drums
- Disks
- Removable disk packs
- Floppy disk
- Invented for IBM Field Engineers
- Contact
- Slow speed
12Magnetic Disks
- Most common form of long-term, rewritable storage device
- Usually considered the lowest level of the memory hierarchy
- How does a magnetic disk work?
- A collection of platters rotates on a spindle at some RPM
- Platters are metal disks covered with magnetic recording material on both sides
- Disk diameters can vary
- Usually: the wider, the faster; the narrower, the cheaper
- Disk surface divided into tracks, which are divided into sectors
- Sectors are the smallest unit that can be written
13A disk, pictorially
- When accessing data, we read or write to a sector
- All sectors are the same size; outer tracks are just less dense
- To read or write, a movable arm with a read/write head moves over each surface
- Cylinder: all tracks under the arms at a given point on all surfaces
- To read or write
- Disk controller moves arm over proper track (a seek)
- The time to move is called the seek time
- When the sector is found, data is transferred
14Disk Terminology
Cylinder: track 'x' on all platters/surfaces
15The speed of light? No.
- Time required for a requested sector to rotate under the read/write head is called the rotational latency or rotational delay
- Mechanical components operate on the order of milliseconds
- No longer moving at the speed of light like in our CPU!
- Time required to actually write or read data is called the transfer time
- (a function of block size, rotation speed, recording density on a track, and speed of the electronics connecting the disk to the computer)
16Disk odds 'n' ends
- Often transfer time is a very small portion of a full access
- It's possible to use techniques (discussed in caches) to help reduce disk overhead. Any thoughts?
- To help reduce complexity, there's usually additional HW called a disk controller
- Disk controller helps manage disk accesses
- but also adds more overhead: controller time
- (Can also have a queuing delay)
- (Time spent waiting for a disk to become free if it's already in use for another access)
17Example average disk access time
- What is the average time to read or write a 512-byte sector for a typical disk?
- The average seek time is given to be 9 ms
- The transfer rate is 4 MB per second
- The disk rotates at 7200 RPM
- The controller overhead is 1 ms
- The disk is currently idle before any requests are made (so there is no queuing delay)
- Average disk access time = average seek time + average rotational delay + transfer time + controller overhead
18Capacity trends and disks
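- Disk capacity is usually expressed in terms of areal density (bits per square inch)
- Areal density = tracks per inch on a disk surface × bits per inch on a track
Cost for 1 GB of magnetic disk space has decreased (and will continue to decrease) almost exponentially over time!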
19Magnetic Disks short overview
- Hard disk
- Higher speed (3600-7200 RPM)
- Larger
- Higher Density
- Multiple platters
- Performance
- Seek time (8-20 ms or faster)
- Rotational latency (4-8 ms)
- Transfer rate (2-40 MB/sec)
20Disk Latency
Disk Latency = Queuing Time + Controller Time + Seek Time + Rotation Time + Transfer Time
Order-of-magnitude times for 4 KB transfers:
- Seek: 8 ms or less
- Rotate: 4.2 ms @ 7200 RPM
- Transfer: 1 ms @ 7200 RPM
21Technology Trends
- Disk capacity now doubles every 18 months; before 1990, every 36 months
- Today: processing power doubles every 18 months
- Today: memory size doubles every 18 months (4X/3yr)
- Today: disk capacity doubles every 18 months
- Disk positioning rate (seek + rotate) doubles every ten years!
The I/O GAP
22Historical Perspective
- 1956 IBM RAMAC; early 1970s Winchester
- Developed for mainframes
- Had proprietary interfaces
- Steady shrink in form factor: 27 in. to 14 in.
- 1970s developments
- 5.25 inch floppy disk form factor (microcode into mainframe)
- Early emergence of industry-standard disk interfaces: ST506, SASI, SMD, ESDI
23Historical Perspective
- Early 1980s
- PCs and first generation workstations
- Mid 1980s
- Client/server computing
- Centralized storage on file server
- Accelerates disk downsizing: 8 inch to 5.25 inch
- Mass market disk drives become a reality
- Industry standards: SCSI, IPI, IDE
- 5.25 inch drives for standalone PCs; end of proprietary interfaces
24Disk History
Data density (Mbit/sq. in.) and capacity of the unit shown (MBytes):
- 1973: 1.7 Mbit/sq. in., 140 MBytes
- 1979: 7.7 Mbit/sq. in., 2,300 MBytes
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"
25Historical Perspective
- Late 1980s/Early 1990s
- Laptops, notebooks, (palmtops)
- 3.5 inch, 2.5 inch, (1.8 inch formfactors)
- Form factor plus capacity drives the market, not so much performance
- Recently, bandwidth improving at 40%/year
- Challenged by DRAM, flash RAM in PCMCIA cards
- still expensive
- unattractive MBytes per cubic inch
- Optical disk fails on performance (e.g., NeXT) but finds a niche (CD-ROM)
26Disk History
- 1989: 63 Mbit/sq. in., 60,000 MBytes
- 1997: 1450 Mbit/sq. in., 2300 MBytes
- 1997: 3090 Mbit/sq. in., 8100 MBytes
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"
27Magnetic Disks
illustration source unknown
28Second Major Example: Networks
- Examples
- System Area Networks (SP2): 100s of nodes, 25 meters per link
- Local Area Networks (Ethernet): 100s of nodes, 1000 meters
- Wide Area Networks (ATM): 1000s of nodes, 5,000,000 meters
Figure: computers (a.k.a. end systems, hosts) connected by an interconnection network (a.k.a. network, communication subnet)
29ABCs of Networks
- Starting point: send bits between 2 computers
- Queue (FIFO) on each end
- Information sent is called a message
- Can send both ways (full duplex)
- Rules for communication? -> protocol
- Inside a computer
- Loads/Stores: Request (Address) & Response (Data)
- Need Request & Response signaling
30Trivial Example
- What is the format of the message?
- Fixed? Number of bytes?
Packet format: Request/Response (1 bit) | Address/Data (32 bits)
- 0: Please send data from Address
- 1: Packet contains data corresponding to request
- Header/Trailer: information to deliver a message
- Payload: data in message (1 word above)
31Extensions
- What if more than 2 computers want to communicate?
- Need computer address field (destination) in packet
- What if packet is garbled in transit?
- Add error detection field in packet (e.g., CRC)
- What if packet is lost?
- More elaborate protocols to detect loss
- What if multiple processes per machine?
- Queue per process to provide protection
- Simple questions such as these lead to elaborate protocols and packet formats -> complexity
- Note: complexity often -> slow
32A Simple Example Revisited
- What is the format of the packet?
- Fixed? Number of bytes?
Packet format: Code (2 bits) | Address/Data (32 bits) | CRC (4 bits)
- 00: Request: Please send data from Address
- 01: Reply: Packet contains data corresponding to request
- 10: Acknowledge request
- 11: Acknowledge reply
33Network Media
- There are different ways to connect computers together
- Can kind of think of it like a memory hierarchy
- Different kinds of media vary in cost, performance, and reliability
- There are several different kinds we'll consider
- Twisted Pair
- Coaxial Cable
- Fiber Optics
- Air
- (first, see board for summary discussion)
34Twisted pair media
- Just a twisted pair of copper wires
- Insulated, about 1mm thick
- Data transfer speeds of
- A few Mb/s over a few kilometers
- 10s of Mb/s over shorter distances
- Uses
- Used lots in the telephone industry
- OK for LANs because of reasonable data transfer
rates
35Coaxial (coax) cable
- A picture of it is included below
- Pretty complicated (and expensive) for a wire
- But very good signal propagation properties
- Good bandwidth
- 10s of Mb/s over a kilometer
- Good for LAN
36Fiber optics
- Replaces copper with plastic and electrons with light
- Usually, 3 basic components
- Transmission medium: fiber optic cable
- Light source: LED or laser diode
- Light detector: photodiode
- A simplex medium: data can only go in 1 direction
- But goes really fast (many Gb/s) and far (100s of km)
37Some comparisons
38The bottom line
- Bandwidth problems can be fixed
- More money -> more wires
- Improving your latency is somewhat more difficult
- After all, the speed of light (299,792.5 km/s) is kinda fixed
39I/O Device Summary
- Disks/Networks are very different, but consider these similarities
- Data handled in batches (sectors, messages)
- Lots of waiting around for external events
- Compatibility is important (more than performance)
- Reliability is important (and requires work to achieve)
- Slow devices are simple (and boring)
- Fast devices may be substantially autonomous
- e.g., graphics
40I/O Hardware Interface Issues
41I/O Hardware
- Basic memory-map w/polling and/or interrupts
- Project 2!
- Advanced bus issues
- Performance vs. compatibility -> multiple busses
- Namespaces
- Smart device controllers
- Direct Memory Access (DMA)
- Arbitration
- Caching issues
- I/O processors
- the wheel of reincarnation
- (see board for preliminary examples)
42Basic I/O: devices as memory, à la Project 2
43Performance vs. Compatibility
- Problem
- Processor-memory is a performance-crucial path ... improve as often as possible!
- I/O controllers made by many vendors ... change is expensive!
45Multiple Busses
Figure (bus hierarchy):
- Processor and cache: cache bus, e.g. 256b, 533MHz
- Processor and main memory: memory bus, e.g. 64b, 533MHz
- A bridge connects the memory bus to the I/O bus (e.g. PCI, 64b, 66MHz)
- I/O controllers on the I/O bus serve graphics, disks (over a disk drive bus, e.g. SCSI 16b, 20MHz), and the network
- I/O controllers signal the processor with interrupts
46Smart Device Controllers
47Polling
- Computer
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
48Polling
- Computer
- Busy bit set? No.
- Set write bit in command register
- Write a byte (or word) of data to Data-out
- Set command ready bit in control register
- Busy bit set? Yes.
- Busy bit set? Yes.
- Controller
- Controller clears busy bit
- Sees command ready
- Set busy bit
49Polling
- Computer
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? Yes.
- Busy bit set? No.
- Controller
- Checks write bit
- Reads data-out
- Does I/O with device
- Clears command ready bit
- Clears error bit
- Clears busy bit
50Polling
- Appropriate when the controller and device are very fast
- Very inefficient when the controller is mostly busy (the processor spends its time asking "Busy bit set?")
- Better solution: interrupts
51(Recall) Interrupt Mechanism Hardware
Address Bus
Processor
Data Bus
Int
Inta
Device 1
Device 2
If the processor decides to handle the interrupt, it asserts the Inta (interrupt acknowledge) line
52(Recall) Interrupts/Exceptions/Traps: protection
Figure: the PC (memory address) over time, moving between user space and kernel space (including I/O kernel space); labeled events: a loop, a system call, an interrupt
53(Recall) Process States
Figure: process state diagram with states New, Ready, Running, Waiting, and Terminated
A longer example using these states comes later in the lecture
54Interrupts
- Used by I/O controllers to communicate with the processor
- Also used by applications to communicate with the OS
- Software interrupt or trap
- OS can now use the same device registers as before
55Interrupts
- Processor
- Initiate I/O
- Context switch to something else
- Receive interrupt; transfer to handler
- Interrupt handler processes data, returns from interrupt
- Resume processing of interrupted task
- I/O Controller
- Initiate I/O with physical device
- Completion (good or bad)
- Generate interrupt
56DMA
- Preceding scheme is effective, but transfers of large blocks of data cause lots of interrupts
- And it uses a sophisticated general-purpose processor for a very specialized function (moving data around)
- Solution: add enough processing power to the device controller (and possibly the bus controller) to allow direct transfer between device and memory
57DMA
N
Processor tells controller to make DMA
transfer. Assume disk to memory. (Includes N
number of bytes)
58DMA
N
Controller gets sector of data from disk.
59DMA
N-1
Controller transfers one word to memory and
updates count. Checks for termination. If not...
60DMA
N-2
Controller transfers one word to memory and
updates count. Checks for termination. If not...
61DMA
N-3
Controller transfers one word to memory and
updates count. Checks for termination. If not...
62DMA
N-4
Controller transfers one word to memory and
updates count. Checks for termination. If not...
63DMA
N-5
Controller transfers one word to memory and
updates count. Checks for termination. If not...
64DMA
0
Controller transfers one word to memory and
updates count. Checks for termination. If done...
65DMA
Controller interrupts processor
66DMA
Processor acknowledges interrupt
67DMA
Controller sends interrupt vector
68DMA
Processor can now have the scheduler take appropriate action (e.g., move the process waiting for I/O into the ready queue)
69Arbitration
- DMA implies multiple owners of the bus
- must decide who owns the bus from cycle to cycle
- Arbitration
- Daisy chain
- Centralized parallel arbitration
- Distributed arbitration by self selection
- Distributed arbitration by collision detection
- (see board for detailed examples and pictures)
70Daisy Chain
Simple but not fair and slow.
71Centralized Parallel Arbitration
- Requires central arbiter
- Each device has separate line
- Central arbiter may become bottleneck
- Used in PCI bus
72Distributed Arbitration by Self Selection
- Each device sees all requestors
- Priority scheme allows each device to know if it gets the bus
- Requires lots of request lines
- Used by Apple NuBus (backplane)
73Distributed Arbitration by Collision Detection
- Devices independently request bus
- Devices have the ability to detect simultaneous requests, or collisions
- Upon collision, a variety of schemes are used to select among requestors
- Used by Ethernet
74Caching Issues
- What happens if the processor has a cached copy of data when a device does DMA?
- Short answer: there's a cache coherence problem. The DMA may change memory, and the processor doesn't see the change. Two solutions:
- Device driver (software) flushes the cache before using DMA
- Elaborate bus hardware maintains consistency by checking the cache on every external bus transaction
75wheel of reincarnation
- Start with simple devices
- Add cute functionality
- Add lots of functionality
- Declare it to be a processor in its own right
- Repeat...
- Graphics community has been around this wheel a
couple of times now.
76Summary
- Example Devices
- often work in blocks
- spend lots of time waiting
- Bus Issues
- Memory map w/polling and/or interrupts (Project 2)
- Performance vs. compatibility -> multiple busses
- Namespaces
- Smart device controllers
- Direct Memory Access (DMA)
- Arbitration
- Caching issues
- I/O processors
- the wheel of reincarnation