Title: I/O Systems
I/O Systems
- I/O Hardware
- Application I/O Interface
- Kernel I/O Subsystem
- Transforming I/O Requests to Hardware Operations
- Streams
- Performance
I/O Hardware
- Wide variety of I/O devices
- Common concepts
  - port
  - bus (daisy chain or shared single medium)
  - controller (host adapter)
- I/O instructions control devices
- Devices have addresses, used by
  - direct I/O instructions
  - memory-mapped I/O
A Typical PC Bus Structure
Device I/O Port Locations on PCs (partial)
Polling
- Determines state of device
  - command-ready
  - busy
  - error
- Busy-wait cycle to wait for I/O from device
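The busy-wait cycle can be sketched in Python against a simulated controller. This is a toy model, not driver code: the `SimulatedDevice` class, its `status` attribute, and the `BUSY`/`ERROR` bit values are hypothetical stand-ins for a real device's status register, and a background thread plays the role of the hardware.

```python
import threading
import time

# Hypothetical status-register bits for a simulated controller.
BUSY = 0x1
ERROR = 0x2

class SimulatedDevice:
    """Toy device: BUSY stays set until a background thread 'finishes' the I/O."""

    def __init__(self, latency_s):
        self.status = BUSY
        self.data = None
        threading.Thread(target=self._complete, args=(latency_s,), daemon=True).start()

    def _complete(self, latency_s):
        time.sleep(latency_s)           # pretend the hardware is working
        self.data = b"sector contents"
        self.status &= ~BUSY            # clear BUSY: the I/O is done

def poll_until_ready(dev):
    """Busy-wait on the status register; return the data and how many checks were wasted."""
    checks = 0
    while dev.status & BUSY:
        checks += 1                     # the CPU does no useful work here
    if dev.status & ERROR:
        raise IOError("device reported an error")
    return dev.data, checks
```

The returned `checks` count makes the cost visible: every iteration is CPU time spent doing nothing useful, which is why polling suits only devices that become ready quickly.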
Interrupts
- CPU interrupt-request line triggered by I/O device
- Interrupt handler receives interrupts
- Maskable to ignore or delay some interrupts
- Interrupt vector to dispatch interrupt to correct handler
  - based on priority
  - some nonmaskable
- Interrupt mechanism also used for exceptions, traps
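A toy model of masking, priority, and vector dispatch follows. The `InterruptController` class is hypothetical, and it assumes that a lower vector number means higher priority; real controllers and CPUs define their own numbering and priority rules.

```python
class InterruptController:
    """Toy interrupt controller: vector table, mask set, and priority-ordered dispatch.

    Assumption: a lower vector number means higher priority.
    """

    def __init__(self, nonmaskable=()):
        self.vector = {}                      # vector number -> handler function
        self.masked = set()                   # vectors currently masked (ignored/delayed)
        self.nonmaskable = set(nonmaskable)   # vectors that masking cannot block
        self.pending = set()

    def register(self, vec, handler):
        self.vector[vec] = handler

    def raise_irq(self, vec):
        """A device asserts the interrupt-request line for `vec`."""
        self.pending.add(vec)

    def dispatch(self):
        """Run handlers for deliverable pending interrupts, highest priority first."""
        serviced = []
        for vec in sorted(self.pending):
            if vec in self.masked and vec not in self.nonmaskable:
                continue                      # masked: left pending until unmasked
            self.pending.discard(vec)
            self.vector[vec]()                # dispatch through the vector table
            serviced.append(vec)
        return serviced
```

Masked interrupts are delayed rather than lost: they stay in `pending` and are delivered on a later `dispatch` once unmasked.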
Interrupt-Driven I/O Cycle
Intel Pentium Processor Event-Vector Table
Direct Memory Access (DMA)
- Used to avoid programmed I/O for large data movement
- Requires DMA controller
- Bypasses CPU to transfer data directly between I/O device and memory
Six-Step Process to Perform DMA Transfer
Application I/O Interface
- I/O system calls abstract device behaviors into generic classes
- Device-driver layer hides differences among I/O controllers from kernel
- Devices vary in many dimensions
  - character-stream or block
  - sequential or random-access
  - sharable or dedicated
  - speed of operation
  - read-write, read only, or write only
A Kernel I/O Structure
Characteristics of I/O Devices
Block and Character Devices
- Block devices include disk drives
  - commands include read, write, seek
  - raw I/O or file-system access
  - file system maps byte location i onto a block number and an offset within that block
  - memory-mapped file access possible
- Character devices include keyboards, mice, serial ports
  - commands include get, put
  - libraries layered on top allow line editing
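The mapping from a byte location i to a block and offset is plain integer division. This sketch assumes a fixed 4096-byte block and a hypothetical per-file `block_map` list standing in for the file system's real metadata.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes

def locate(i, block_map, block_size=BLOCK_SIZE):
    """Map byte offset i in a file to (disk block, offset within that block).

    block_map[k] is the disk block holding the file's k-th logical block
    (a hypothetical stand-in for an inode or FAT lookup).
    """
    if i < 0:
        raise ValueError("negative offset")
    logical_block, offset = divmod(i, block_size)
    return block_map[logical_block], offset
```

For example, byte 10000 of a file whose blocks sit at disk blocks 7, 42, 13 lands in its third logical block (10000 // 4096 = 2), so the call returns disk block 13, offset 1808.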
Network Devices
- Differ enough from block and character devices to have own interface
- Unix and Windows NT/9x/2000 include socket interface
  - separates network protocol from network operation
  - includes select() functionality (avoids polling each connection in turn)
- Wide variation in approaches, e.g.
  - pipes, FIFOs, streams, queues, mailboxes
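A minimal sketch of select() using Python's standard `select` and `socket` modules: one call waits on several sockets and reports which are readable, instead of the program checking each one in a loop. `ready_readers` is a hypothetical helper name.

```python
import select
import socket

def ready_readers(socks, timeout=None):
    """Block until at least one socket is readable (or timeout); return the readable ones."""
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable

if __name__ == "__main__":
    a1, b1 = socket.socketpair()
    a2, b2 = socket.socketpair()
    b2.sendall(b"hello")                 # only the second channel has data
    for s in ready_readers([a1, a2], timeout=1.0):
        print(s.recv(1024))              # reads from a2 only
    for s in (a1, b1, a2, b2):
        s.close()
```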
Clocks and Timers
- Provide current time, elapsed time, timer
- Programmable interval timer used for timings, periodic interrupts
- UNIX system calls
  - timer_create
  - getitimer / setitimer
  - check their man pages
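On Unix, Python exposes the interval timer behind setitimer() through `signal.setitimer`. The hypothetical `count_ticks` helper below arms a periodic `ITIMER_REAL` timer, counts the resulting `SIGALRM` signals, and measures elapsed time with a monotonic clock. It must run in the main thread on a Unix-like system.

```python
import signal
import time

def count_ticks(interval_s, duration_s):
    """Arm a periodic ITIMER_REAL timer and count SIGALRM deliveries (Unix only)."""
    ticks = 0

    def on_alarm(signum, frame):
        nonlocal ticks
        ticks += 1                      # one periodic "timer interrupt"

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    start = time.monotonic()            # elapsed-time clock, unaffected by clock changes
    signal.setitimer(signal.ITIMER_REAL, interval_s, interval_s)  # first expiry, then period
    try:
        while time.monotonic() - start < duration_s:
            time.sleep(interval_s / 4)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # disarm the timer
        signal.signal(signal.SIGALRM, old_handler)
    return ticks, time.monotonic() - start
```

With a 20 ms period over 0.3 s, roughly 15 ticks are expected; the exact count depends on scheduling.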
Blocking and Nonblocking I/O
- Blocking: process suspended until I/O completed
  - easy to use and understand
  - insufficient for some needs
- Nonblocking: I/O call returns as much as available
  - user interface, data copy (buffered I/O)
  - can be implemented via multithreading
  - returns quickly with count of bytes transferred
- Asynchronous: process runs while I/O executes
  - difficult to use
  - I/O subsystem signals process when I/O completed
  - e.g., callback: pointer to completion code
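Nonblocking reads can be sketched with a pipe whose read end is switched to nonblocking mode via `os.set_blocking`. The `try_read` helper is hypothetical; it returns `None` instead of suspending when no data is available, and otherwise returns however many bytes are ready.

```python
import os

def try_read(fd, n):
    """Nonblocking read: return up to n bytes that are available now, or None if none are."""
    try:
        return os.read(fd, n)          # may return fewer than n bytes
    except BlockingIOError:            # EAGAIN/EWOULDBLOCK: nothing to read yet
        return None

if __name__ == "__main__":
    r, w = os.pipe()
    os.set_blocking(r, False)          # mark the read end nonblocking
    print(try_read(r, 100))            # None: the call returns immediately
    os.write(w, b"abc")
    print(try_read(r, 100))            # b'abc': only 3 of the 100 requested bytes
```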
Kernel I/O Subsystem
- Scheduling
  - some I/O request ordering via per-device queue
  - some OSs attempt fairness
- Buffering: storing data in memory while transferring between devices
  - to cope with device speed mismatch
  - to cope with device transfer-size mismatch
  - to maintain "copy semantics"
    - the data transferred is the version in the buffer at the time of the I/O call, even if the application changes the buffer afterwards
Sun Enterprise 6000 Device-Transfer Rates
Kernel I/O Subsystem (Cont.)
- Caching: fast memory holding a copy of data
  - must (eventually) update the original
  - key to performance
- Spooling: holding output for a device
  - used if device can serve only one request at a time
  - e.g., printing
- Device reservation: provides exclusive access to a device
  - system calls for allocation, deallocation
  - potential for deadlock
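Caching with an eventual update of the original can be sketched as a small write-back LRU cache. The `WriteBackCache` class is hypothetical and uses a plain dict as the "disk"; dirty blocks reach the backing store only when evicted or explicitly flushed.

```python
from collections import OrderedDict

class WriteBackCache:
    """LRU block cache; dirty blocks are written to the backing store on eviction or flush."""

    def __init__(self, backing, capacity):
        self.backing = backing          # dict: block number -> bytes (stands in for a disk)
        self.capacity = capacity
        self.blocks = OrderedDict()     # block number -> [data, dirty], oldest first
        self.disk_reads = 0

    def read(self, blk):
        if blk in self.blocks:          # cache hit: no disk access
            self.blocks.move_to_end(blk)
            return self.blocks[blk][0]
        self.disk_reads += 1            # cache miss: fetch from the backing store
        data = self.backing[blk]
        self._insert(blk, data, dirty=False)
        return data

    def write(self, blk, data):
        self._insert(blk, data, dirty=True)   # the original is NOT updated yet

    def _insert(self, blk, data, dirty):
        self.blocks[blk] = [data, dirty]
        self.blocks.move_to_end(blk)
        if len(self.blocks) > self.capacity:
            old, (old_data, old_dirty) = self.blocks.popitem(last=False)
            if old_dirty:
                self.backing[old] = old_data  # eventual update of the original

    def flush(self):
        for blk, entry in self.blocks.items():
            if entry[1]:
                self.backing[blk] = entry[0]
                entry[1] = False
```

Repeated reads of a cached block cost no disk access, which is the performance payoff; the price is that a crash before eviction or `flush` loses the dirty data.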
Error Handling
- OS can recover from disk read errors, device unavailable, transient write failures
- Most calls return an error number or code if an I/O request fails
- System error logs hold problem reports
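The error-code convention is visible from Python: a failing system call raises `OSError` carrying the numeric `errno`, and the standard `errno` module maps it to its symbolic name. `open_or_report` is a hypothetical helper.

```python
import errno
import os

def open_or_report(path):
    """Try to open a file; on failure return (None, errno code, symbolic name)."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return None, e.errno, errno.errorcode[e.errno]
    return fd, 0, None

if __name__ == "__main__":
    print(open_or_report("/nonexistent/definitely/missing"))   # code for ENOENT
```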
Kernel Data Structures
- Kernel keeps state info for I/O components, including open-file tables, network connections, character-device state
- Many complex data structures to track buffers, memory allocation, "dirty" blocks
- Some OSs use object-oriented methods and message passing to implement I/O
UNIX I/O Kernel Structure
I/O Requests to Hardware Operations
- Consider reading a file from disk for a process:
  - determine device holding file
  - translate name to device representation
  - read data from disk into buffer (slow)
  - make data available to requesting process
  - return control to process
Life Cycle of an I/O Request
STREAMS
- STREAM: a full-duplex communication channel between a user-level process and a device
- A STREAM consists of:
  - STREAM head, which interfaces with the user process
  - driver end, which interfaces with the device
  - zero or more STREAM modules between them
- Each module contains a read queue and a write queue
- Message passing is used to communicate between queues
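The head-modules-driver arrangement can be sketched as a chain of queue pairs. This is a simplified, synchronous model: the `Module` and `Stream` classes are hypothetical, each module applies a transform as a message passes through its write queue (toward the device) or read queue (toward the user), and a plain function stands in for the driver end. Real STREAMS queues are serviced asynchronously.

```python
from collections import deque

class Module:
    """A STREAMS-style module with a write queue (downstream) and a read queue (upstream)."""

    def __init__(self, name, down=lambda m: m, up=lambda m: m):
        self.name = name
        self.write_q = deque()
        self.read_q = deque()
        self.down = down    # transform for messages heading toward the device
        self.up = up        # transform for messages heading toward the user

class Stream:
    """Head -> modules -> driver end; messages are passed from queue to queue."""

    def __init__(self, modules, driver):
        self.modules = modules
        self.driver = driver                # callable standing in for the driver end

    def write(self, msg):
        for mod in self.modules:            # downstream, starting next to the head
            mod.write_q.append(msg)
            msg = mod.down(mod.write_q.popleft())
        reply = self.driver(msg)
        for mod in reversed(self.modules):  # the reply travels back upstream
            mod.read_q.append(reply)
            reply = mod.up(mod.read_q.popleft())
        return reply
```

Because each module only sees messages on its own queues, modules can be pushed or removed without changing the head or the driver.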
The STREAMS Structure
Performance
- I/O is a major factor in system performance:
  - CPU executes device driver, kernel I/O code
  - context switches due to interrupts
  - data copying
  - network traffic especially stresses system
Intercomputer Communications
Improving Performance
- Reduce number of context switches
- Reduce data copying
- Reduce interrupts by using large transfers, smart controllers, polling
- Use DMA
- Balance CPU, memory, bus, and I/O performance for highest throughput
Device-Functionality Progression
Mass-Storage Systems
- Disk Structure
- Disk Scheduling
- Disk Management
- Swap-Space Management
- RAID Structure
- Disk Attachment
- Stable-Storage Implementation
- Tertiary Storage Devices
- OS Issues
- Performance Issues
Disk Structure
- Disk drives are addressed as large 1-dimensional arrays of logical blocks (the logical block is the smallest unit of transfer)
- The 1-dimensional array of logical blocks is mapped onto the sectors of the disk sequentially
  - sector 0 is the 1st sector of the 1st track on the outermost cylinder
  - mapping proceeds in order through that track, then the rest of the tracks in that cylinder, then through the rest of the cylinders from outermost to innermost
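The sequential mapping from logical block number to physical position reduces to two divisions. This sketch assumes a uniform geometry (the same number of sectors on every track), which real drives with zoned recording do not have, and numbers tracks and sectors from 0; `Geometry` and `block_to_chs` are hypothetical names.

```python
from typing import NamedTuple

class Geometry(NamedTuple):
    cylinders: int
    tracks_per_cylinder: int    # one track per recording surface
    sectors_per_track: int      # assumed constant on every track

def block_to_chs(lba, g):
    """Map a logical block number to (cylinder, track, sector); cylinder 0 is outermost."""
    per_cylinder = g.tracks_per_cylinder * g.sectors_per_track
    if not 0 <= lba < g.cylinders * per_cylinder:
        raise ValueError("block number out of range")
    cylinder, rest = divmod(lba, per_cylinder)      # fill a whole cylinder first
    track, sector = divmod(rest, g.sectors_per_track)  # then track by track
    return cylinder, track, sector
```

With 4 tracks of 63 sectors per cylinder, block 63 is the first sector of the second track, and block 252 starts cylinder 1.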
Disk Scheduling
- OS is responsible for using hardware efficiently; for disk drives, this means low access time and high disk bandwidth
- Access time has two major components
  - seek time: time for the disk arm to move the head to the desired cylinder
  - rotational latency: time for the disk to rotate the desired sector under the head
- Minimize seek time
- Seek time is roughly proportional to seek distance
- Disk bandwidth: total number of bytes transferred, divided by total time from start to end of transfer
- Total transaction time = access time + bytes / bandwidth
  - realistically, add OS overheads, queuing delay, etc.
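The transaction-time formula can be checked with a few lines of arithmetic. The helper names and sample numbers are illustrative assumptions, not measurements of any particular drive; average rotational latency is taken as half a revolution.

```python
def avg_rotational_latency_ms(rpm):
    """On average the desired sector is half a revolution away from the head."""
    return 0.5 * 60_000.0 / rpm

def transaction_time_ms(seek_ms, rotational_ms, nbytes, bandwidth_bytes_per_s, overhead_ms=0.0):
    """Total time = access time (seek + rotational latency) + bytes / bandwidth,
    plus an optional allowance for OS overhead and queuing delay."""
    access_ms = seek_ms + rotational_ms
    transfer_ms = nbytes / bandwidth_bytes_per_s * 1000.0
    return overhead_ms + access_ms + transfer_ms

if __name__ == "__main__":
    print(avg_rotational_latency_ms(7200))                         # about 4.17 ms
    print(transaction_time_ms(9.0, 4.0, 1_000_000, 100_000_000))   # about 23 ms
```

With those illustrative inputs, positioning the head (13 ms) costs more than moving the 1 MB of data (10 ms), which is why scheduling aims to cut seek time.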
Disk Scheduling (Cont.)
- Several algorithms schedule disk requests
- Illustrate them with a request queue
  - 200 cylinders, numbered 0..199
  - head initially at cylinder 53
  - queue: 98, 183, 37, 122, 14, 124, 65, 67
First-Come, First-Served (FCFS)
- Total head movement: 640 cylinders (sum over all moves)
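FCFS on the example queue can be replayed directly; summing the seek distances reproduces the 640-cylinder total. `fcfs` is a hypothetical helper name.

```python
QUEUE = [98, 183, 37, 122, 14, 124, 65, 67]   # example request queue
HEAD = 53

def fcfs(head, queue):
    """Service requests in arrival order; return (service order, total head movement)."""
    total = 0
    for cyl in queue:
        total += abs(cyl - head)   # seek distance for this request
        head = cyl
    return list(queue), total

if __name__ == "__main__":
    print(fcfs(HEAD, QUEUE)[1])    # 640
```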
Shortest Seek-Time First (SSTF)
- Selects request with minimum seek time from current head position
- SSTF scheduling is a form of SJF scheduling; may cause starvation of some requests
- Next slide: total head movement 236 cylinders (compare FCFS: 640)
  - not optimal: e.g., it can be cut to 208 cylinders
SSTF Example
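SSTF is a greedy loop that always picks the pending request nearest the head. For the example queue this sketch yields 236 cylinders, and `path_length` confirms that the service order 37, 14, 65, 67, 98, 122, 124, 183 needs only 208. Ties are broken in queue order, an arbitrary choice; the helper names are hypothetical.

```python
QUEUE = [98, 183, 37, 122, 14, 124, 65, 67]   # example request queue
HEAD = 53

def sstf(head, queue):
    """Greedy SSTF: repeatedly service the pending request closest to the head."""
    pending, order, total = list(queue), [], 0
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))  # ties: earliest in queue
        pending.remove(nearest)
        total += abs(nearest - head)
        head = nearest
        order.append(nearest)
    return order, total

def path_length(head, order):
    """Total head movement for an explicit service order."""
    total = 0
    for cyl in order:
        total += abs(cyl - head)
        head = cyl
    return total

if __name__ == "__main__":
    print(sstf(HEAD, QUEUE))   # ([65, 67, 37, 14, 98, 122, 124, 183], 236)
    print(path_length(HEAD, [37, 14, 65, 67, 98, 122, 124, 183]))   # 208
```

Greediness is what makes SSTF suboptimal here: moving up to 65 and 67 first forces a long trip down to 14 and then all the way back up.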
SCAN
- Disk arm starts at one end of the disk and moves toward the other end, servicing requests until it reaches the far end; then head movement and service order are reversed
- Sometimes called the elevator algorithm
- Next slide: total head movement 242 cylinders
  - is this better than SSTF?
  - why use this algorithm?
SCAN Example
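SCAN can be sketched by splitting the queue at the head position and sweeping through one side, out to the disk end, and back through the other. Assuming the arm is moving toward cylinder 0 when the queue arrives, this sketch totals 236 cylinders for the example queue; the 242 quoted on the slide may rest on a different assumption, so treat the total as assumption-dependent. The `scan` helper is hypothetical.

```python
QUEUE = [98, 183, 37, 122, 14, 124, 65, 67]   # example request queue
HEAD = 53

def scan(head, queue, direction=-1, lowest=0, highest=199):
    """Elevator algorithm: sweep to the disk end in `direction`, then reverse."""
    below = sorted(c for c in queue if c < head)
    above = sorted(c for c in queue if c >= head)
    if direction < 0:                       # moving toward cylinder 0 first
        first, end, second = below[::-1], lowest, above
    else:                                   # moving toward the last cylinder first
        first, end, second = above, highest, below[::-1]
    total, pos = 0, head
    for cyl in first + [end] + second:      # the arm reaches the disk end before reversing
        total += abs(cyl - pos)
        pos = cyl
    return first + second, total

if __name__ == "__main__":
    print(scan(HEAD, QUEUE))                # ([37, 14, 65, 67, 98, 122, 124, 183], 236)
    print(scan(HEAD, QUEUE, direction=1))   # total 331
```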
C-SCAN
- Provides a more uniform wait time than SCAN
- Head moves from one end of the disk to the other, servicing requests as it goes
  - when it reaches the other end, it immediately returns to the beginning of the disk without servicing requests on the return trip
- Treats cylinders as a circular list that wraps around from the last cylinder to the first
- What's the total head movement now?
- Motivation: the far end of the disk is more likely to have more new requests
C-SCAN (Cont.)
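C-SCAN can be sketched the same way, assuming the arm sweeps toward higher-numbered cylinders on the 0..199 disk. Whether the return seek counts as head movement is a matter of convention, so the hypothetical `cscan` helper takes a flag: for the example queue it reports 382 cylinders with the return trip and 183 without.

```python
QUEUE = [98, 183, 37, 122, 14, 124, 65, 67]   # example request queue
HEAD = 53

def cscan(head, queue, lowest=0, highest=199, count_return=True):
    """C-SCAN sweeping upward: service requests up to the last cylinder, jump back
    to cylinder 0 without servicing anything, then continue upward."""
    above = sorted(c for c in queue if c >= head)
    below = sorted(c for c in queue if c < head)
    total = highest - head                  # upward sweep to the end of the disk
    if below:
        if count_return:
            total += highest - lowest       # return trip, no requests serviced
        total += below[-1] - lowest         # second upward sweep from cylinder 0
    return above + below, total

if __name__ == "__main__":
    print(cscan(HEAD, QUEUE))                       # (..., 382)
    print(cscan(HEAD, QUEUE, count_return=False))   # (..., 183)
```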
C-LOOK
- Version of C-SCAN
- Arm only goes as far as the last request in each direction, then goes right back to the first request at the other end of the disk
- Is this a good idea?
C-LOOK (Cont.)
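C-LOOK differs only in turning around at the last request instead of at the disk edge. Under the same assumptions as C-SCAN (upward sweep, optional counting of the return seek), this hypothetical `clook` helper gives 322 cylinders with the return trip and 153 without for the example queue.

```python
QUEUE = [98, 183, 37, 122, 14, 124, 65, 67]   # example request queue
HEAD = 53

def clook(head, queue, count_return=True):
    """C-LOOK sweeping upward: go only as far as the highest request, then jump
    straight back to the lowest pending request and continue upward."""
    above = sorted(c for c in queue if c >= head)
    below = sorted(c for c in queue if c < head)
    total, pos = 0, head
    if above:
        total += above[-1] - head           # upward sweep stops at the last request
        pos = above[-1]
    if below:
        if count_return:
            total += pos - below[0]         # jump back to the lowest request
        total += below[-1] - below[0]       # continue upward through the rest
    return above + below, total

if __name__ == "__main__":
    print(clook(HEAD, QUEUE))                       # (..., 322)
    print(clook(HEAD, QUEUE, count_return=False))   # (..., 153)
```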
Selecting a Disk-Scheduling Algorithm
- SSTF is common and has a natural appeal
- SCAN and C-SCAN perform better under heavy disk load
- Performance depends on the number and types of requests
- Requests for disk service are influenced by the file-allocation method
- Disk scheduling should be a separate module of the OS, allowing replacement with a different algorithm if necessary
- SSTF and LOOK are both reasonable default choices
Disk Management
- Low-level formatting, or physical formatting: dividing a disk into sectors that the disk controller can read and write
- To use a disk to hold files, the OS needs to record its own data structures on the disk
  - partition the disk into one or more groups of cylinders
  - logical formatting, or "making a file system"
- Boot block to start up system
  - bootstrap code in ROM
  - bootstrap loader program: minimum code in ROM
- Methods such as sector sparing used to handle bad blocks
MS-DOS Disk Layout
Swap-Space Management
- Swap space: virtual memory uses disk space as an extension of main memory
- Swap space can be part of the normal file system or (traditionally) a separate partition
- Swap-space management
  - 4.3BSD allocates swap space when a process starts; it holds the text segment (code) and data segment
  - kernel uses swap maps to track swap-space use
  - Solaris 2 allocates swap space only when a page is forced out of physical memory
4.3 BSD Text-Segment Swap Map
4.3 BSD Data-Segment Swap Map
- Size can grow
- Efficient support for small and large processes
  - each new block allocated is 2x the size of the previous one
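The doubling rule for the data-segment swap map can be sketched as follows. The starting size and the cap on block size are assumptions supplied as parameters (the slide does not give them), sizes are in arbitrary units, and `data_swap_blocks` is a hypothetical name.

```python
def data_swap_blocks(segment_size, first_block, max_block):
    """Block sizes allocated to cover a data segment: each block is twice the
    previous one, capped at max_block (the cap is an assumption of this sketch)."""
    sizes, block, covered = [], first_block, 0
    while covered < segment_size:
        sizes.append(block)
        covered += block
        block = min(block * 2, max_block)
    return sizes

if __name__ == "__main__":
    print(data_swap_blocks(10, 16, 64))    # [16]
    print(data_swap_blocks(100, 16, 64))   # [16, 32, 64]
```

A small process needs only one or two small blocks, while a large one quickly reaches big blocks instead of filling the map with many tiny entries.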
RAID Structure
- RAID: multiple disk drives provide high availability via redundancy
- History
  - large drives used to be cheaper per bit
  - in the 1980s small drives became cheaper per bit
  - but using many drives is unreliable (more parts to fail)
- RAID is arranged into 7 different levels
- RAID: redundant array of inexpensive disks
RAID (Cont.)
- Multiple disks working cooperatively
- Disk striping uses a group of disks as one unit
  - data split across drives
- RAID schemes improve performance and availability (dependability) with redundant data
  - mirroring or shadowing duplicates each disk
  - block-interleaved parity (ECC) needs much less redundancy
- Note: the book says "reliability" here, which is not correct; reliable means nothing breaks
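Striping and block-interleaved parity both come down to simple arithmetic. This sketch assumes equal-sized blocks and shows parity as a bytewise XOR across one stripe, which lets any single lost data block be rebuilt; where the parity block is stored (a dedicated disk or rotated across disks) is not modelled. The function names are hypothetical.

```python
from functools import reduce

def stripe(block, ndisks):
    """Block-level striping: logical block -> (disk index, stripe number)."""
    return block % ndisks, block // ndisks

def parity(blocks):
    """Parity block = bytewise XOR of equal-length blocks in one stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(surviving_blocks, parity_block):
    """Rebuild one lost data block: XOR the parity with all surviving data blocks."""
    return parity(surviving_blocks + [parity_block])

if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    p = parity(data)
    print(reconstruct([data[0], data[2]], p) == data[1])   # True
```

One parity block protects a whole stripe of data blocks, which is why parity needs far less redundancy than mirroring.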
RAID Levels
RAID (0 + 1) and (1 + 0)
Disk Attachment
- Disks may be attached in two ways
  - host-attached, via an I/O port
  - network-attached, via a network connection
Network-Attached Storage
Storage-Area Network
Stable-Storage Implementation
- Write-ahead log requires stable storage
- To implement stable storage
  - replicate information on more than one nonvolatile storage medium, writing the copies sequentially
  - on recovery, an update counts as successful only if both writes completed (i.e., the pair of blocks is equal)
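The two-copy scheme can be sketched with ordinary files standing in for independent nonvolatile devices, using `os.fsync` to force each copy out before the next write starts. The helper names are hypothetical, error detection such as checksums is omitted, and the roll-back policy on mismatch (copy the last replica over the others) is one possible choice, not the only one.

```python
import os
import tempfile

def stable_write(paths, data):
    """Write the same block to each replica in turn, forcing each to disk before the next."""
    for path in paths:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # this copy is on nonvolatile storage before we go on

def recover(paths):
    """Return (contents, update_completed). If the replicas differ, the last update did
    not finish on every copy; roll back by copying the last replica over the others."""
    copies = []
    for path in paths:
        with open(path, "rb") as f:
            copies.append(f.read())
    if all(c == copies[0] for c in copies):
        return copies[0], True
    stable_write(paths[:-1], copies[-1])   # hypothetical roll-back policy
    return copies[-1], False

def demo():
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "copy_a"), os.path.join(d, "copy_b")
        stable_write([a, b], b"v1")
        first = recover([a, b])
        with open(a, "wb") as f:   # simulate a crash after only the first copy was updated
            f.write(b"v2")
        second = recover([a, b])
        third = recover([a, b])
        return first, second, third
```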
Tertiary Storage Devices
- Low cost is the defining characteristic
- Generally uses removable media
  - examples: floppy disks, CDs, DVDs, tapes
Removable Disks
- Floppy disk: flexible disk coated with magnetic material, in a protective plastic case
  - most floppies hold about 1 MB; similar technology is used for removable disks holding more than 1 GB
  - removable magnetic disks can be nearly as fast as hard disks, but are at greater risk of damage
Removable Disks (Cont.)
- Magneto-optic disk records data on a rigid platter coated with magnetic material
  - laser heat is used to amplify a large, weak magnetic field to record a bit
  - laser light is also used to read data (Kerr effect)
  - the magneto-optic head is much farther from the disk surface than a magnetic disk head, and the magnetic material is covered with a protective clear layer, so it resists head crashes
- Optical disks do not use magnetism; they use materials that are altered by laser light. Examples:
  - CD-RW
  - DVD-RW
WORM Disks etc.
- Data on read-write disks can be modified over and over
- WORM (Write Once, Read Many) disks can be written only once
  - thin aluminum film sandwiched between two clear layers
  - to write a bit, the drive uses laser light to burn a small hole through the aluminum; a recording can be destroyed but not altered
  - very durable and reliable
  - examples: CD-R, DVD-R
- Read-only disks, e.g., CD-ROM, DVD, come from the factory already recorded
Tapes
- Compared to disk, tape is less expensive and holds more, but random access is much slower
- Tape is an economical medium if fast random access is not needed, e.g., backup of disk, archive of huge data
- Large tape installations can use robotic tape changers that move tapes between drives and slots in a tape library
  - stacker: library that holds a few tapes
  - silo: library that holds thousands of tapes
- A disk-resident file can be archived to tape for low-cost storage; the computer can stage it back to disk for active use
OS Issues
- Major OS tasks: manage physical devices, and present a virtual machine abstraction to applications
- For hard disks, the OS provides two abstractions
  - raw device: an array of data blocks
  - file system: names, seeks, reads, writes, etc.
Application Interface
- Most OSs handle removable disks almost exactly like fixed disks: a new cartridge is formatted and an empty file system is generated on it
- Tapes are presented as a raw storage medium, i.e., the application does not see files on the tape; it opens the whole tape drive as a raw device
- Usually the drive is reserved for the exclusive use of one application
- Since the OS doesn't provide file-system services, the application must decide how to use the array of blocks
- Since every application makes up its own tape organization, a tape is generally usable only by the program that created it
- Some standards exist, e.g., tar format (give or take variations)
Tape Drives
- Basic operations for a tape drive differ from those of a disk drive
  - locate: positions the tape to a specific logical block, not an entire track (corresponds to seek)
  - read position: returns the logical block number where the tape head is
  - space: enables relative motion
- Tapes are append-only devices: updating a block in the middle effectively erases everything beyond that block
- An EOT (end-of-tape) mark is placed after a block that is written
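The tape operations and the append-only rule can be modelled with a list of blocks. The `Tape` class and its method names only mirror the slide's terms (locate, read position, space); they are not a real tape-drive API.

```python
class Tape:
    """Toy append-only tape: writing at the current position discards everything after it."""

    def __init__(self):
        self.blocks = []   # logical blocks; the EOT mark sits just after the last one
        self.pos = 0       # logical block under the head

    def locate(self, block):
        """Absolute positioning (the tape analogue of seek)."""
        if not 0 <= block <= len(self.blocks):
            raise ValueError("position beyond EOT")
        self.pos = block

    def read_position(self):
        return self.pos

    def space(self, count):
        """Relative motion, forward (positive) or backward (negative)."""
        self.locate(self.pos + count)

    def read(self):
        if self.pos >= len(self.blocks):
            raise EOFError("EOT reached")
        data = self.blocks[self.pos]
        self.pos += 1
        return data

    def write(self, data):
        del self.blocks[self.pos:]    # everything beyond this point is effectively erased
        self.blocks.append(data)      # the EOT mark now follows this block
        self.pos += 1
```

Rewriting block 1 of a four-block tape leaves only two blocks: the update in the middle cut off everything after it.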
File Naming
- Naming files on removable media is especially difficult when data is written on a removable cartridge on one computer and then used on another
- Current OSs generally leave the name-space problem to applications and users, who must work out how to access and interpret the data
- Some kinds of removable media (e.g., CDs in ISO 9660 format) are standardized
Hierarchical Storage Management (HSM)
- A hierarchical storage system extends the hierarchy beyond primary memory and secondary storage to tertiary storage, often a jukebox of tapes or removable disks
- Tertiary storage is incorporated by extending the file system
  - small and frequently used files remain on disk
  - large, old, inactive files are archived to the jukebox
- HSM is usually found in supercomputing centers and other large installations with enormous volumes of data
Speed
- 2 aspects of tertiary storage speed
  - bandwidth
  - latency
- Bandwidth is measured in bytes per second
  - sustained bandwidth: average data rate during a large transfer, i.e., number of bytes / transfer time; the data rate while the data stream is actually flowing
  - effective bandwidth: average over the entire I/O time, including seek or locate and cartridge switching; the drive's overall data rate
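The difference between the two bandwidth measures is just what goes in the denominator. The helper names and the figures in the example are hypothetical.

```python
def sustained_bandwidth(nbytes, transfer_s):
    """Data rate while the data stream is actually flowing."""
    return nbytes / transfer_s

def effective_bandwidth(nbytes, transfer_s, locate_s=0.0, switch_s=0.0):
    """Data rate averaged over the whole I/O, including locate and cartridge-switch time."""
    return nbytes / (transfer_s + locate_s + switch_s)

if __name__ == "__main__":
    # Hypothetical: 1 GB streamed in 100 s, plus 60 s to locate and 40 s to switch cartridges.
    print(sustained_bandwidth(1e9, 100))                               # 10 MB/s
    print(effective_bandwidth(1e9, 100, locate_s=60, switch_s=40))     # 5 MB/s
```

In this made-up case the positioning overhead equals the streaming time, so the effective rate is half the sustained rate.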
Speed (Cont.)
- Access latency: time needed to locate data
  - access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; < 15 milliseconds
  - access on tape requires winding the tape reels until the selected block reaches the tape head; tens or hundreds of seconds
  - generally, random access within a tape cartridge is about 1,000 times slower than random access on disk
- The low cost of tertiary storage results from many cheap cartridges etc. sharing a few expensive drives
- A removable library is best used to store infrequently used data, since the library can satisfy only a relatively low rate of requests (per hour)
Reliability
- A fixed disk drive is likely to be more reliable than a removable disk or tape
- An optical disk is likely to be more reliable than a magnetic disk or tape
- A head crash in a fixed hard disk generally destroys the data, whereas failure of a tape drive or optical disk drive often leaves the data unharmed
Cost
- Main memory is much more expensive than disk storage
- Cost per megabyte of hard disk is competitive with tape if there is only 1 tape per drive
- The cheapest tape drives and the cheapest disk drives have historically had about the same storage capacity
- Tertiary storage gives savings only when the number of removable media is much greater than the number of drives
DRAM Price per MB, 1981-2000
- prices fell by a factor of 4 every 3 years; recently by a factor of 2 every 3 years
Hard Disk Price per MB, 1981-2000
- prices fell by a factor of 2 every 3 years until c. 1985; since then by a factor of 4 every 3 years
Tape Drive Price per MB, 1984-2000
- of most significance if there is a small number of tapes
Summary
- OS hides variations in I/O devices where possible
- I/O interface critical to performance
- Dependability
  - RAID (high availability, fault recovery)
  - stable storage (crash recovery)
- Idea of HSM under threat
  - price trend favours using disks increasingly
- Long-term trends useful to watch; there could be a new breakthrough