Title: Sync I/O causes Deceptive Idleness
1Sync I/O causes Deceptive Idleness
- Solution: non-work-conserving disk scheduler
Sitaram Iyer, Peter Druschel
2Outline
- Identify and analyze a fundamental problem in OS disk subsystems: Deceptive Idleness
- Propose and evaluate a general solution: a non-work-conserving disk scheduler
3Role of an Operating System
- One of the functions of an OS is to manage/schedule hardware resources:
- efficiently
- fairly
- satisfying QoS constraints, etc.
- The OS should do all the right things.
4e.g. Disk service
- Many disk-intensive applications
- e.g. databases, web servers, streaming media, scientific computation, etc.
- Important for the OS to handle disk I/O well
- (disks are slow and often performance-critical)
5Disk Subsystem (anatomy)
[Diagram: Apps -> I/O (FS, VM) -> disk subsystem (Disk Scheduler -> Disk Driver)]
- Disk Scheduler: accepts and queues disk requests; selects and dispatches requests to the driver
6Disk Scheduler operation
Work-conserving:
- Enqueue: if the disk is idle, schedule immediately
- Finish of disk service: if more requests are queued, schedule the next one immediately
- Each of these is a decision point (the scheduling policy selects the request)
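A minimal sketch of the work-conserving loop on this slide, as a toy model rather than any real kernel code. The class name, the `policy` callback and the `dispatched` list are assumptions made for illustration.

```python
class WorkConservingScheduler:
    """Toy model: the disk is never left idle while requests are queued."""

    def __init__(self, policy=None):
        self.queue = []        # requests waiting for service
        self.busy = False      # True while the disk services a request
        self.dispatched = []   # order in which requests reached the driver
        # Scheduling policy: returns the index of the request to pick.
        # FIFO by default (an assumption; any policy can be plugged in).
        self.policy = policy or (lambda queue: 0)

    def enqueue(self, request):
        self.queue.append(request)
        if not self.busy:      # idle? yes: schedule immediately
            self._schedule()

    def finish(self):
        """Called when the disk completes the current request."""
        self.busy = False
        if self.queue:         # more? yes: schedule immediately
            self._schedule()

    def _schedule(self):
        # Decision point: only requests already in the queue are considered.
        request = self.queue.pop(self.policy(self.queue))
        self.busy = True
        self.dispatched.append(request)
```

The key property is in `finish`: the decision is taken at once, so a request that would arrive a moment later cannot be considered.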
7There's a problem!
- Applications that generate requests one after another (i.e. synchronously).
[Diagram: two apps and the scheduler; labeled: Deceptive Idleness]
- The scheduler never gets a chance to schedule two consecutive requests from a process.
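The effect can be reproduced in a small simulation. This is a sketch under stated assumptions: two hypothetical processes A and B each read their own disk region sequentially, each issues its next request only after the previous one completes, and the scheduler picks the queued request nearest to the head (a stand-in for a seek-reducing policy).

```python
def service_order(num_requests):
    """Order in which a work-conserving, seek-reducing scheduler serves
    two synchronous processes (toy model)."""
    next_block = {"A": 0, "B": 1000}   # each process reads its own region
    queue = dict(next_block)           # one outstanding request per process
    head, order = 0, []
    thinking = None                    # process whose next request is not queued yet
    for _ in range(num_requests):
        # Decision point: taken immediately, using only what is queued now.
        proc = min(queue, key=lambda p: abs(queue[p] - head))
        head = queue.pop(proc)
        order.append(proc)
        next_block[proc] = head + 1
        # While this request is serviced, the process that finished just
        # before it ends its think time and issues its next request.
        if thinking is not None:
            queue[thinking] = next_block[thinking]
        thinking = proc
    return order
```

At every decision point the queue holds only the other process's request, so the scheduler alternates between the two regions even though the nearest block is always one step away.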
8Why is this a problem?
- Wait and watch!!
- Analyze effect on two scheduler types
- Seek-reducing, e.g. Elevator
- Proportional, e.g. WFQ
9Seek reducing scheduler (1/2)
disk layout
Sequential 64 KB requests issued by application A
...issued by application B
e.g. achieved throughput on our disk: 21 MB/s
...pretty good!
10Seek reducing scheduler (2/2)
disk layout
Achieved throughput: 5 MB/s, i.e. only 25%
11Proportional scheduler
With Deceptive Idleness, the scheduler alternates between requests from the two processes (achieves 1:1).
Stride typically delivers cumulative disk service in proportion to some requested ratio (1:2).
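A sketch of why a proportional scheduler degrades here, assuming a simplified stride scheduler over exactly two processes. The function name, the ticket values and the `synchronous` flag are illustrative assumptions, not taken from the talk.

```python
STRIDE1 = 1 << 20  # large constant, so strides stay integral

def stride_service(tickets, num_requests, synchronous):
    """Count requests served per process by a toy stride scheduler.

    synchronous=False: every process always has a request queued.
    synchronous=True:  a process issues its next request only after the
                       previous one completes (two-process toy model).
    """
    stride = {p: STRIDE1 // t for p, t in tickets.items()}
    passes = dict(stride)              # next "pass" value of each process
    queued = set(tickets)              # processes with a request queued
    thinking = None                    # process that has not re-issued yet
    counts = {p: 0 for p in tickets}
    for _ in range(num_requests):
        # Serve the queued process with the smallest pass value.
        proc = min(sorted(queued), key=lambda p: passes[p])
        counts[proc] += 1
        passes[proc] += stride[proc]
        if synchronous:
            queued.discard(proc)       # its next request is not there yet
            if thinking is not None:
                queued.add(thinking)   # the other process has re-issued
            thinking = proc
    return counts
```

With tickets `{"A": 1, "B": 2}`, the always-backlogged case serves B about twice as often as A, while the synchronous case alternates strictly, because only one candidate is queued at each decision point.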
12Two solution approaches
- Prefetch: cumbersome, imperfect
- mispredictions are expensive
- Non-work-conserving scheduler
- at decision points, sometimes wait for the subsequent request to arrive!
13NWCS - basic method
- New request: Enqueue
- Select the current best request, then Evaluate it
- Expecting better: wait until a new request arrives or the timeout expires
- As good as it gets, or timeout expired: proceed with current best
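The select/evaluate/wait loop of this slide can be sketched as follows. All names (`nearest`, `nwcs_decide`, `evaluate`, `wait_for_request`) are hypothetical; requests are modeled as `(process, block)` tuples, and the queue is assumed to be non-empty at a decision point.

```python
def nearest(queue, head):
    """Assumed selection policy: the queued request closest to the head."""
    return min(queue, key=lambda req: abs(req[1] - head))

def nwcs_decide(queue, head, evaluate, wait_for_request, select=nearest):
    """One decision point of a non-work-conserving scheduler (toy model).

    evaluate(best, head)      -> how long to wait; 0 means "as good as it gets"
    wait_for_request(timeout) -> a newly arrived request, or None on timeout
    """
    while True:
        best = select(queue, head)
        timeout = evaluate(best, head)
        if timeout <= 0:                     # as good as it gets
            break
        arrived = wait_for_request(timeout)  # keep the disk idle for a while
        if arrived is None:                  # timeout expired
            break
        queue.append(arrived)                # new request: select and evaluate again
    queue.remove(best)                       # proceed with current best
    return best
```

Passing `evaluate` and `wait_for_request` as callbacks keeps the sketch independent of how waiting and estimation are actually implemented.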
14NWCS - evaluate (1/2)
- Estimate, for each process:
- 1. Expected thinktime
- 2. 95-percentile thinktime
- 3. Expected positioning time
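One way such per-process estimates could be kept is sketched below. The slide does not say how the estimates are computed, so the plain mean and the nearest-rank percentile over all recorded samples are assumptions, as are the class and attribute names.

```python
class ProcessStats:
    """Per-process history of think times and positioning times (toy model)."""

    def __init__(self):
        self.thinktimes = []         # gaps between completion and next request
        self.positioning_times = []  # seek + rotation cost of past requests

    def record(self, thinktime, positioning_time):
        self.thinktimes.append(thinktime)
        self.positioning_times.append(positioning_time)

    @property
    def expected_thinktime(self):
        # Assumption: plain arithmetic mean of the samples.
        return sum(self.thinktimes) / len(self.thinktimes)

    @property
    def thinktime_95ile(self):
        # Assumption: nearest-rank 95th percentile, rank = ceil(0.95 * n),
        # computed with integers to avoid floating-point rounding.
        ordered = sorted(self.thinktimes)
        rank = (95 * len(ordered) + 99) // 100
        return ordered[rank - 1]

    @property
    def expected_positioning_time(self):
        return sum(self.positioning_times) / len(self.positioning_times)
```

A real scheduler would more likely use bounded or decayed history so that old behavior fades; that refinement is left out here.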
15NWCS - evaluate (2/2)
- LP = last request-issuing process
- Benefit = Calculate(current_positioning_time) - LP.expected_positioning_time
- Cost = LP.expected_thinktime
- Waiting_duration = (Benefit > Cost ? LP.95ile_thinktime : 0)
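The decision rule on this slide, written out as a small function. This is a sketch: `ProcessEstimates` is a hypothetical container for the three per-process estimates, and the positioning time of the current best request is assumed to be computed elsewhere and passed in.

```python
from dataclasses import dataclass

@dataclass
class ProcessEstimates:
    """Estimates for LP, the process that issued the last request."""
    expected_thinktime: float
    thinktime_95ile: float
    expected_positioning_time: float

def waiting_duration(current_positioning_time, lp):
    """How long to keep the disk idle waiting for LP's next request."""
    # Benefit: positioning time saved by serving LP's next request
    # instead of the current best candidate.
    benefit = current_positioning_time - lp.expected_positioning_time
    # Cost: time the disk is expected to sit idle while LP thinks.
    cost = lp.expected_thinktime
    # Wait only if it pays off; the 95-percentile think time is the timeout.
    return lp.thinktime_95ile if benefit > cost else 0.0
```

For example, with estimates `ProcessEstimates(1.0, 2.5, 0.5)`, a candidate costing 8.0 time units to reach makes waiting worthwhile, while one costing 1.0 does not.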
16Experimental setup
- 550MHz Pentium-III system
- 7200 rpm IBM Deskstar IDE disk
- Slightly modified FreeBSD-4.0 kernel
- Kernel module, 1500 lines of C code
17Microbenchmark
- Two processes, accessing large files using read or mmap.
- Various access patterns
- sequential, alternate, random
- mmap doesn't implement prefetch.
18Microbenchmark
19Apache webserver
Large working set. Files from the CS webserver
trace.
20Andrew Benchmark (fileserver)
- Typical fileserver-like workload
- Five phases: mkdir, cp, stat, scan, gcc
- Many clients, separate repositories
21Andrew Benchmark (fileserver)
3x
22TPC-B database benchmark
- Random update queries into a large MySQL database. Record size: 64 KB.
- Many simultaneous clients.
- Variant 1: two separate databases
- Variant 2: replace Update by Select
23TPC-B database benchmark
24Conclusion
- 1. Deceptive Idleness is a problem for many kinds of disk-intensive applications.
- 2. Non-work-conserving schedulers fix it.
- 3. The one we propose ain't bad at all.
25Sensitivity analysis (bkp)
- Varied application thinktime in three different ways
- NWCS almost always performs better
- Wrote an intelligent adversary
- NWCS sometimes performs slightly worse in a very uncommon case
26Proportional scheduler (bkp)