Title: Concurrency case studies in UNIX
- John Chapin, 6.894, October 26, 1998
OS kernel inherently concurrent
- From 60s multiprogramming
- Context switch on I/O wait
- Reentrant interrupts
- Threads simplified the implementation
- 90s servers, 00s PCs multiprocessing
- Multiple CPUs executing kernel code
Thread-centric concurrency control
- Single CPU kernel
- Only one thread in kernel at a time
- No locks
- Disable interrupts to control concurrency (see the sketch after this list)
- MP kernels inherit this mindset
- 1. Control concurrency of threads
- 2. Add locks to objects only where required
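A minimal sketch of that single-CPU discipline, assuming BSD-style spl() interrupt priority primitives (splhigh/splx are the traditional names); the buffer list and remove_first_buffer are hypothetical illustrations, not code from any particular kernel:

    /* Uniprocessor kernel: no locks. The only concurrency comes from
     * interrupt handlers, so critical sections raise the interrupt
     * priority level instead of locking. */
    struct buf { struct buf *next; };

    int  splhigh(void);   /* assumed primitives: raise and restore */
    void splx(int);       /* the interrupt priority level          */

    struct buf *remove_first_buffer(struct buf **list)   /* hypothetical */
    {
        int s = splhigh();        /* block interrupts that touch *list */
        struct buf *bp = *list;
        if (bp != 0)
            *list = bp->next;     /* safe: only one thread in the kernel */
        splx(s);                  /* drop back to the previous level */
        return bp;
    }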
Case study: memory mapping
- Background
- Page faults
- challenges
- pseudocode
- Page victimization
- challenges
- pseudocode
- Discussion, design lessons
Other interesting patterns
- nonblocking queues (see the sketch after this list)
- asymmetric reader-writer locks
- one lock/object, different lock for chain
- priority donation locks
- immutable message buffers
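As one concrete instance, here is a minimal sketch of a nonblocking queue in its simplest single-producer/single-consumer ring-buffer form. It uses C11 atomics (anachronistic for a 1998 kernel, but the technique is the same), and the names spsc_queue/enqueue/dequeue are illustrative assumptions:

    #include <stdatomic.h>
    #include <stdbool.h>

    #define QSIZE 64u                    /* capacity; power of two */

    struct spsc_queue {
        void *slot[QSIZE];
        atomic_uint head;                /* written only by the producer */
        atomic_uint tail;                /* written only by the consumer */
    };

    /* Producer: never blocks; fails only when the queue is full. */
    bool enqueue(struct spsc_queue *q, void *item)
    {
        unsigned h = atomic_load_explicit(&q->head, memory_order_relaxed);
        unsigned t = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (h - t == QSIZE)
            return false;                /* full */
        q->slot[h % QSIZE] = item;
        atomic_store_explicit(&q->head, h + 1, memory_order_release);
        return true;
    }

    /* Consumer: never blocks; returns null when the queue is empty. */
    void *dequeue(struct spsc_queue *q)
    {
        unsigned t = atomic_load_explicit(&q->tail, memory_order_relaxed);
        unsigned h = atomic_load_explicit(&q->head, memory_order_acquire);
        if (t == h)
            return 0;                    /* empty */
        void *item = q->slot[t % QSIZE];
        atomic_store_explicit(&q->tail, t + 1, memory_order_release);
        return item;
    }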
Page mapping: background
[Figure: a process virtual address space maps through the pfdat array to physical memory]
Life cycle of a page frame
[State diagram: unallocated --Allocate--> invalid --Read from disk--> IO_pending --> valid; valid --Modify--> dirty; valid --Victimize--> unallocated; dirty --Victimize--> pushout --Write to disk--> unallocated]
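The states above map naturally onto a C enum; a sketch using the names that appear in the pseudocode on the following slides:

    /* Page frame states, per the life-cycle diagram:
     * UNALLOCATED -Allocate-> INVALID -read-> IO_PENDING -> VALID,
     * VALID -Modify-> DIRTY -Victimize-> PUSHOUT -write-> UNALLOCATED,
     * VALID -Victimize-> UNALLOCATED. */
    enum pf_status {
        UNALLOCATED,   /* on the free list, bound to no file page   */
        INVALID,       /* allocated, contents not yet read          */
        IO_PENDING,    /* disk read in progress                     */
        VALID,         /* clean copy of the file page               */
        DIRTY,         /* modified; must be written back            */
        PUSHOUT        /* victimized while dirty; write in progress */
    };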
Page fault challenges
- Multiple processes fault to same file page
- Multiple processes fault to same pfdat
- Multiple threads of same process fault to same segment (low frequency)
- Bidirectional mapping between segment pointers and pfdats
- Stop only minimal process set during disk I/O
- Minimize locking/unlocking on fast path
Page fault stage 1
- vfault(virtual_address addr)
- segment.lock()
- if ((pfdat = segment.fetch(addr)) == null)
- pfdat = lookup(s.file, s.pageNum(addr))
- /* returns locked pfdat */
- if (pfdat.status == PUSHOUT)
- /* do something complicated */
- install pfdat in segment
- add segment to pfdat owner list
- else
- pfdat.lock()
Page fault stage 2
- if (pfdat.status == IO_PENDING)
- segment.unlock()
- pfdat.wait()
- goto top of vfault
- else if (pfdat.status == INVALID)
- pfdat.status = IO_PENDING
- pfdat.unlock()
- fetch_from_disk(pfdat)
- pfdat.lock()
- pfdat.status = VALID
- pfdat.notify_all()
Page fault stage 3
- segment.insert_TLB(addr, pfdat.paddr())
- pfdat.unlock()
- segment.unlock()
- restart application
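Read end to end, the three stages might look like the following C-style sketch. The types and helpers (lock/unlock, wait/notify_all, lookup, segment_fetch, and so on) are stand-in names for the kernel's real primitives, and the PUSHOUT case is elided exactly as in the slides:

    /* Sketch of vfault(), stages 1-3 combined; not compilable as-is. */
    void vfault(seg_t *seg, vaddr_t addr)
    {
    restart:
        lock(&seg->lk);                                    /* stage 1 */
        pfdat_t *p = segment_fetch(seg, addr);
        if (p == 0) {
            p = lookup(seg->file, seg_pagenum(seg, addr)); /* returns p locked */
            if (p->status == PUSHOUT) {
                /* do something complicated (elided) */
            }
            segment_install(seg, addr, p);
            owner_list_add(p, seg);
        } else {
            lock(&p->lk);
        }

        if (p->status == IO_PENDING) {                     /* stage 2 */
            unlock(&seg->lk);
            wait(&p->cv, &p->lk);   /* releases p->lk, sleeps, reacquires */
            unlock(&p->lk);
            goto restart;           /* re-run the whole fault */
        } else if (p->status == INVALID) {
            p->status = IO_PENDING; /* claim the read; later faulters wait */
            unlock(&p->lk);
            fetch_from_disk(p);     /* pfdat unlocked across the disk I/O */
            lock(&p->lk);
            p->status = VALID;
            notify_all(&p->cv);
        }

        segment_insert_tlb(seg, addr, pf_paddr(p));        /* stage 3 */
        unlock(&p->lk);
        unlock(&seg->lk);
        /* return: hardware restarts the faulting instruction */
    }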
Page victimization challenges
- Bidirectional mapping between segment pointers and pfdats
- Stop no processes during batch writes
- Deadlock caused by paging thread racing with faulting thread
Page victimization stage 1
- next_victim:
- pfdat p = choose_victim()
- p.lock()
- if (!(p.status == VALID || p.status == DIRTY))
- p.unlock()
- goto next_victim
Page victimization stage 2
- foreach segment s in p.owner_list
- if (s.trylock() == ALREADY_LOCKED)
- p.unlock()
- /* do something! (p.r.d.) */
- remove p from s
- /* also deletes any TLB mappings */
- delete s from p.owner_list
- s.unlock()
Page victimization stage 3
- if (p.status == DIRTY)
- p.status = PUSHOUT
- schedule p for disk write
- p.unlock()
- goto next_victim
- else
- unbind(p.file, p.pageNum, p)
- p.status = UNALLOCATED
- add_to_free_list(p)
- p.unlock()
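The victimization loop, stages 1-3 combined, under the same stand-in primitives as the vfault() sketch above (foreach_owner is an assumed iteration macro). The ALREADY_LOCKED branch is deliberately left unresolved, as in the slides; discussion question (3) asks what a correct resolution must avoid:

    /* Sketch of the paging thread's victimization loop. */
    void victimize_pages(void)
    {
        for (;;) {
            pfdat_t *p = choose_victim();                  /* stage 1 */
            lock(&p->lk);
            if (!(p->status == VALID || p->status == DIRTY)) {
                unlock(&p->lk);
                continue;                  /* next_victim */
            }

            seg_t *s;                                      /* stage 2 */
            foreach_owner(p, s) {
                if (trylock(&s->lk) == ALREADY_LOCKED) {
                    unlock(&p->lk);
                    /* do something! (p.r.d.) -- unresolved here */
                }
                segment_remove(s, p);  /* also deletes any TLB mappings */
                owner_list_delete(p, s);
                unlock(&s->lk);
            }

            if (p->status == DIRTY) {                      /* stage 3 */
                p->status = PUSHOUT;
                schedule_disk_write(p); /* batched; stops no processes */
                unlock(&p->lk);
                continue;                  /* next_victim */
            }
            unbind(p->file, p->pagenum, p);
            p->status = UNALLOCATED;
            add_to_free_list(p);
            unlock(&p->lk);
        }
    }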
Discussion questions (1)
- Why have an IO_PENDING state? Why not just keep the pfdat locked until its data is valid?
- What happens when:
- Some thread discovers IO_PENDING and blocks. Before it restarts, that page is victimized.
- The page chosen as victim is being actively used by application code.
Discussion questions (2)
- What mechanisms ensure that a page is only read from disk once despite multiple processes faulting at the same time?
- Why is it safe to skip checking for PUSHOUT in fault stage 2?
- Write out the invariants that support your reasoning.
Discussion questions (3)
- Louis Reasoner suggests releasing the segment lock at the end of fault stage 1 and reacquiring it for stage 3, to speed up parallel threads. What could go wrong?
- At the point marked p.r.d. (victimization stage 2), Louis suggests goto next_victim. What could go wrong?
Design lessons
- Causes of complexity
- data structure traversed in multiple directions
- high level of concurrency for performance
- Symptoms of complexity
- nontrivial mapping from locks to objects
- invariants relating thread, lock, and object
states across multiple data structures
Loose vs. tight concurrency
- Loose
- Separate subsystems connected by simple protocols
- Use often, for performance or simplicity
- Tight
- Shared data structures with complex invariants
- Only use where you have to
- Minimize code and states involved
Page frame sample invariants
- For all pfdat p:
- (p.status != UNALLOCATED) =>
- lookup(p.file, p.pageNum) == p
- all processes will find the same pfdat
- and, once a fetch has begun, see p.status != INVALID
- therefore only 1 process will read disk
- (p.status == UNALLOCATED ||
- p.status == PUSHOUT)
- => p.owner_list empty
- therefore no TLB mappings to PUSHOUT pages
- avoiding cache consistency problems
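One way to keep such invariants honest is to assert them wherever a pfdat's lock is held; a minimal sketch, using the same stand-in types as the earlier sketches (lookup_nolock and owner_list_empty are assumed helpers):

    #include <assert.h>

    /* Call with p->lk held. */
    void pfdat_check_invariants(pfdat_t *p)
    {
        if (p->status != UNALLOCATED) {
            /* every faulting process must find this same pfdat */
            assert(lookup_nolock(p->file, p->pagenum) == p);
        }
        if (p->status == UNALLOCATED || p->status == PUSHOUT) {
            /* no owners => no TLB mappings can point at the frame,
             * so a pushout write cannot race with application stores */
            assert(owner_list_empty(p));
        }
    }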