Title: VM Design Issues
1VM Design Issues
- Vivek Pai / Kai Li
- Princeton University
2Mini-Gedankenexperimenten
- What's the refresh rate of your monitor?
- What is the access time of a hard drive?
- What response time determines sluggishness or speediness? What's the relation?
- What determines the running speed of a program that's paging heavily?
- If you have a program that pages heavily, what are your options to improve the situation?
3Mechanics
- Let's finish off last lecture
- Memory mapping, Unified VM next time
- No assigned reading yet, may not exist
- Mid-term on track
- Covers everything before it
- Open Q&A session?
- Is there interest?
- If so, when?
4Where We Left Off Last Time
- Various approaches to evicting pages
- Some discussion about why even doing well is hard to implement
- Belady's algorithm for off-line analysis
- We just finished variations on FIFO
- In particular, enhanced FIFO with 2nd chance
5Lessons From Enhanced FIFO
- Observation: it's easier to evict a clean page than a dirty page
- 2nd observation: sometimes the disk and CPU are idle
- Optimization: when the system's free, write dirty pages back to disk, but don't evict
- Called flushing; often falls to the pager daemon
6Least Recently Used (LRU)
- Algorithm
- Replace the page that hasn't been used for the longest time
- Question
- What hardware mechanisms are required to implement LRU?
7Implementing LRU
[Figure: pages 5, 3, 4, 7, 9, 11, 2, 1, 15 in a list running between least recently used and most recently used]
- Perfect
- Use a timestamp on each reference
- Keep a list of pages ordered by time of reference
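The ordered-list idea above can be sketched in a few lines. This is an illustration only, not how a kernel does it: the class name `ExactLRU` and its `reference` method are hypothetical, and updating the list on every single reference is exactly the cost that makes perfect LRU impractical in hardware.

```python
from collections import OrderedDict


class ExactLRU:
    """Exact LRU over a fixed number of frames (hypothetical helper)."""

    def __init__(self, frames):
        self.frames = frames
        self.pages = OrderedDict()  # oldest reference first, newest last

    def reference(self, page):
        """Touch a page; return the evicted page, or None if nothing was evicted."""
        if page in self.pages:
            self.pages.move_to_end(page)  # now the most recently used
            return None
        victim = None
        if len(self.pages) >= self.frames:
            # The front of the list is the least recently used page.
            victim, _ = self.pages.popitem(last=False)
        self.pages[page] = True
        return victim
```

With three frames, referencing pages 1, 2, 3, then 1 again, then 4 evicts page 2, because page 1 was moved to the recent end of the list.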
8Approximate LRU
[Figure: pages ranked from most recently used to least recently used, at three granularities]
- LRU: N categories (pages in order of last reference)
- Crude LRU: 2 categories (pages referenced since the last page fault, pages not referenced since the last page fault)
- 8-bit count: 256 categories (0, 1, 2, 3, ..., 254, 255)
9Aging Not Frequently Used (NFU)
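[Figure: 8-bit aging counters for four pages over five successive clock ticks; one row per page, one column per tick]
00000000  00000000  10000000  01000000  10100000
00000000  10000000  01000000  10100000  01010000
10000000  11000000  11100000  01110000  00111000
00000000  00000000  00000000  10000000  01000000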
- Algorithm
- Shift reference bits into counters
- Pick the page with the smallest counter
- Main difference between NFU and LRU?
- NFU has a short history (counter length)
- How many bits are enough?
- In practice 8 bits are quite good
- Pros: requires only one reference bit
- Cons: requires looking at all counters
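A minimal sketch of the aging step described above, assuming 8-bit counters and a clock tick that hands us the set of pages referenced since the previous tick. The function names `age` and `pick_victim` and the page labels are made up for the example.

```python
def age(counters, referenced, bits=8):
    """One clock tick: shift each counter right and put the reference bit on top."""
    top = 1 << (bits - 1)
    return {
        page: (count >> 1) | (top if page in referenced else 0)
        for page, count in counters.items()
    }


def pick_victim(counters):
    """The page with the smallest counter is the eviction candidate."""
    return min(counters, key=counters.get)


counters = {"A": 0, "B": 0, "C": 0}
for refs in [{"A", "C"}, {"C"}, {"A", "B"}]:
    counters = age(counters, refs)
victim = pick_victim(counters)
```

After these three ticks page C has been referenced twice and page B only once, yet C is the victim: its references are older, so they have been shifted further toward the low-order bits. The "looking at all counters" cost shows up as the `min` over every page.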
10Where Do We Get Storage?
- 32-bit VA to 32-bit PA: no space, right?
- Offset within page is the same
- No need to store offset
- 4KB page = 12 bits of offset
- Those 12 bits are free in PTE
- Page number + other info fit within 32 bits
- Makes storing info easy
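The bit budget above can be shown with a little arithmetic. This sketch assumes 4KB pages and a 32-bit entry; the flag names and positions (`VALID`, `REFERENCED`, `DIRTY`) are hypothetical and chosen only for illustration, not taken from any real architecture.

```python
PAGE_SHIFT = 12                     # 4KB pages: 2**12 bytes
FLAG_MASK = (1 << PAGE_SHIFT) - 1   # low 12 bits of the entry

# Hypothetical flag positions, for illustration only.
VALID, REFERENCED, DIRTY = 0x1, 0x2, 0x4


def make_pte(frame, flags):
    """Pack a 20-bit frame number and up to 12 flag bits into one 32-bit word."""
    return (frame << PAGE_SHIFT) | (flags & FLAG_MASK)


def translate(pte, vaddr):
    """The frame number comes from the entry; the offset comes from the address."""
    frame = pte >> PAGE_SHIFT
    return (frame << PAGE_SHIFT) | (vaddr & FLAG_MASK)
```

Because the offset is never stored, the low 12 bits of the entry are free to carry the flags.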
11x86 Page Table Entry
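[Figure: 32-bit x86 page table entry. Bits 31-12 hold the page frame number; the low 12 bits hold the flags below]
- V: Valid
- W: Writable
- O: Owner (user/kernel)
- Wt: Write-through
- Cd: Cache disabled
- A: Accessed (referenced)
- D: Dirty
- L: PDE maps 4MB
- Gl: Global
- Cw, P, U: Reserved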
12What Happens on Diagonal Lines
- My screen is 1024x768 pixels
- 256 colors = 1 byte per pixel = 0.75MB
- 64K colors = 2 bytes/pixel = 1.5MB
- Page size is 4KB
- Screen is 192 or 384 pages
- 1 page = several horizontal lines
- Diagonal/vertical lines = TLB badness
- Superpages to the rescue
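The numbers above can be checked directly. This sketch assumes a linear, page-aligned framebuffer with rows stored back to back, which the slide does not state; the function names are made up for the example.

```python
PAGE_SIZE = 4096
WIDTH, HEIGHT = 1024, 768


def screen_pages(bytes_per_pixel):
    """Pages needed to hold the whole screen."""
    return WIDTH * HEIGHT * bytes_per_pixel // PAGE_SIZE


def rows_per_page(bytes_per_pixel):
    """Horizontal lines that fit in one page."""
    return PAGE_SIZE // (WIDTH * bytes_per_pixel)


def pages_touched_by_vertical_line(bytes_per_pixel):
    """A full-height vertical line lands on a new page every few rows."""
    return HEIGHT // rows_per_page(bytes_per_pixel)
```

At 1 byte per pixel a page holds 4 rows, so a full-height vertical line touches all 192 pages of the screen; at 2 bytes per pixel it touches all 384. A horizontal line of the same length touches one page.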
13The Big Picture
- We've talked about single evictions
- Most computers are multiprogrammed
- Single eviction decision still needed
- New concern: allocating resources
- How to be fair enough and achieve good overall throughput
- This is a competitive world: local and global resource allocation decisions
14Program Behaviors
- 80/20 rule
- > 80% of memory references are made by < 20% of code
- Locality
- Spatial and temporal
- Working set
- Keeping a set of pages in memory would avoid a lot of page faults
[Figure: page faults versus pages in memory, with the working set marked on the curve]
15Observations re Working Set
- Working set isn't static
- There often isn't a single working set
- Multiple plateaus in previous curve
- Program coding style affects working set
- Working set is hard to gauge
- What's the working set of an interactive program?
16Working Set
- Main idea
- Keep the working set in memory
- An algorithm
- On a page fault, scan through all pages of the process
- If the reference bit is 1, record the current time for the page
- If the reference bit is 0, check the last use time
- If the page has not been used within d, replace the page
- Otherwise, go to the next
- Add the faulting page to the working set
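The scan above might look like the following sketch. The `Page` record and `working_set_scan` are hypothetical names, time is an abstract integer tick, and clearing the reference bit after recording the time is an assumption the slide does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    referenced: bool = False
    last_use: int = 0


def working_set_scan(pages, now, d):
    """Scan a process's pages on a page fault; return the pages to replace."""
    victims = []
    for page in pages:
        if page.referenced:
            page.last_use = now       # record the current time for the page
            page.referenced = False   # assumption: reset the bit for the next scan
        elif now - page.last_use > d:
            victims.append(page)      # not used within d: out of the working set
        # otherwise the page was used within d, so go to the next
    return victims
```

The faulting page would then be added to the process's page list with `last_use` set to `now`.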
17WSClock Paging Algorithm
- Follow the clock hand
- If the reference bit is 1, set the reference bit to 0, record the current time for the page, and go to the next
- If the reference bit is 0, check the last use time
- If the page has been used within d, go to the next
- If the page hasn't been used within d and the modify bit is 1, schedule the page for page-out and go to the next
- If the page hasn't been used within d and the modify bit is 0, replace this page
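A sketch of one sweep of the clock hand, following the steps above. The `Frame` record, the `wsclock` function, and the bound of two laps around the clock are assumptions made for the example; a real pager would wait for the scheduled writes to finish and retry.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page: int
    referenced: bool
    modified: bool
    last_use: int


def wsclock(frames, hand, now, d):
    """Return (victim_index, new_hand, pages_scheduled_for_page_out)."""
    scheduled = []
    for _ in range(2 * len(frames)):           # bounded sweep: an assumption
        frame = frames[hand]
        here, hand = hand, (hand + 1) % len(frames)
        if frame.referenced:
            frame.referenced = False           # clear the bit, note the time
            frame.last_use = now
        elif now - frame.last_use > d:
            if frame.modified:
                if frame.page not in scheduled:
                    scheduled.append(frame.page)   # write back, keep going
            else:
                return here, hand, scheduled       # old and clean: replace it
    return None, hand, scheduled               # no clean victim found this time
```

Old dirty pages are never evicted on the spot; they are queued for page-out so that a later sweep finds them clean.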
18Simulating Modify Bit with Access Bits
- Set pages read-only if they are read-write
- Use a reserved bit to remember if the page is really read-only
- On a write fault (a write to a page marked read-only)
- If it is not really read-only, then record a modify in the data structure and change it to read-write
- Restart the instruction
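A small model of the trick, for the fault taken when a write hits a page that has been marked read-only. The `PTE` record and the function names are hypothetical; the modify bit here lives in software, which is the whole point.

```python
from dataclasses import dataclass


@dataclass
class PTE:
    writable: bool                  # what the hardware enforces
    really_read_only: bool = False  # the reserved bit
    modified: bool = False          # software-maintained modify bit


def protect(pte):
    """Mark a page read-only, remembering whether it really is read-only."""
    pte.really_read_only = not pte.writable
    pte.writable = False
    pte.modified = False


def on_write_fault(pte):
    """Return True if the faulting instruction should be restarted."""
    if pte.really_read_only:
        return False       # genuine protection violation
    pte.modified = True    # first write since protect(): record the modify
    pte.writable = True    # later writes proceed without faulting
    return True
```

Only the first write after `protect` pays for a fault; after that the page is read-write again until the next time the pager resets it.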
19Implementing LRU without Reference Bit
- Some machines have no reference bit
- VAX, for example
- Use the valid bit or access bit to simulate
- Clear all valid bits (even if the pages are valid)
- Use a reserved bit to remember if a page is really valid
- On a page fault
- If it is a valid reference, set the valid bit and place the page in the LRU list
- If it is an invalid reference, do the page replacement
- Restart the faulting instruction
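The fault-driven bookkeeping above can be modeled as follows. `SoftReferenceTracker` is a hypothetical name, the sets stand in for the hardware valid bit and the reserved "really valid" bit, and the resulting order is only as fine-grained as the interval between calls to `clear_valid_bits`.

```python
from collections import OrderedDict


class SoftReferenceTracker:
    """Approximate LRU using faults on deliberately invalidated pages."""

    def __init__(self, resident_pages):
        self.resident = set(resident_pages)  # reserved bit: page is really valid
        self.valid = set()                   # hardware valid bits, all cleared
        self.lru = OrderedDict()             # oldest first

    def access(self, page):
        """Return 'hit', 'soft fault', or 'replace'."""
        if page in self.valid:
            return "hit"                     # no fault, so no bookkeeping
        if page in self.resident:
            self.valid.add(page)             # valid reference: set the valid bit
            self.lru[page] = True
            self.lru.move_to_end(page)       # place at the recent end of the list
            return "soft fault"
        return "replace"                     # invalid reference: page replacement

    def clear_valid_bits(self):
        """Periodically invalidate everything to sample references again."""
        self.valid.clear()
```

Each resident page costs one extra fault per sampling interval, which is the price of not having a reference bit.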
20Demand Paging
- Pure demand paging relies only on faults to bring in pages
- Problems?
- Possibly lots of faults at startup
- Ignores spatial locality
- Remedies
- Loading groups of pages per fault
- Prefetching/preloading
21Speed and Sluggishness
- Slow is > 0.1 seconds (100 ms)
- Speedy is << 0.1 seconds
- Monitors tend to be 60 Hz
- < 16.7 ms between screen paints
- Disks have seek + rotational delay
- Seek is somewhere between 7 and 16 ms
- At 7200 rpm, one rotation = 1/120 sec, about 8 ms; half-rotation is about 4 ms
- Conclusion? One disk access is OK, six are bad
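The conclusion follows from adding up the numbers above. This sketch assumes each access pays a full seek plus an average half rotation and ignores transfer time and queueing; the function names are made up for the example.

```python
RPM = 7200
rotation_ms = 60_000 / RPM           # one full rotation, about 8.3 ms
half_rotation_ms = rotation_ms / 2   # average rotational delay, about 4.2 ms


def access_ms(seek_ms):
    """Time for one disk access: seek plus average rotational delay."""
    return seek_ms + half_rotation_ms


def total_ms(accesses, seek_ms):
    """Time for several back-to-back disk accesses."""
    return accesses * access_ms(seek_ms)
```

One access costs roughly 11 to 20 ms, comfortably below the 100 ms "slow" threshold. Six accesses cost roughly 67 to 121 ms, which is at or past the point where a user notices.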
22Disk Address
- Use physical memory as a cache for disk
- Where to find a page on a page fault?
- PPage field is a disk address
[Figure: virtual address space mapped onto physical memory, with one entry marked invalid]
23Imagine a Global LRU
- Global across all processes
- Idea: when a page is needed, pick the oldest page in the system
- Problems? Process mixes?
- Interactive processes
- Active large-memory sweep processes
- Mitigating damage?
24Amdahl's Law
- Gene Amdahl (IBM, then Amdahl)
- Noticed the bottlenecks to speedup
- Assume speedup affects one component
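- New time = (1 - affected) + affected/speedup, as a fraction of the old time, where "affected" is the fraction of the old time spent in the sped-up component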
- In other words, diminishing returns
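A quick numeric check of the diminishing returns. The function names are made up for the example, and `affected` is taken to be the fraction of the original running time spent in the component being sped up.

```python
def new_time(affected, speedup):
    """Running time after the speedup, as a fraction of the original time."""
    return (1 - affected) + affected / speedup


def overall_speedup(affected, speedup):
    """Whole-program speedup that results from speeding up one component."""
    return 1 / new_time(affected, speedup)
```

If 80% of the time is affected, a 2x speedup of that component gives a new time of about 0.6, a 10x speedup gives about 0.28, and no speedup, however large, gets below 0.2. The unaffected 20% caps the overall gain at 5x.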
25NT x86 Virtual Address Space Layouts
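[Figure: two layouts of the 4-GB virtual address space]
Default layout:
- 00000000 to 7FFFFFFF: application code, globals, per-thread stacks, DLL code
- From 80000000: kernel and executive, HAL, boot drivers
- From C0000000: process page tables, hyperspace
- From C0800000 to FFFFFFFF: system cache, paged pool, nonpaged pool
Layout with 3-GB user space:
- 00000000 to BFFFFFFF: 3-GB user space
- C0000000 to FFFFFFFF: 1-GB system space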
26Virtual Address Space in Win95 and Win98
[Figure: Win95 and Win98 virtual address space layout]
- 00000000 to 7FFFFFFF: user accessible; unique per process (per application), user mode
- From 80000000: shared, process-writable (DLLs, shared memory, Win16 applications); systemwide, user mode
- From C0000000 to FFFFFFFF: operating system (Ring 0 components); systemwide, kernel mode
27Details with VM Management
- Create a process's virtual address space
- Allocate page table entries (reserve in NT)
- Allocate backing store space (commit in NT)
- Put related info into PCB
- Destroy a virtual address space
- Deallocate all disk pages (decommit in NT)
- Deallocate all page table entries (release in NT)
- Deallocate all page frames
28Page States (NT)
- Active: part of a working set, and a PTE points to it
- Transition: I/O in progress (not in any working set)
- Standby: was in a working set, but removed. A PTE points to it; not modified and invalid.
- Modified: was in a working set, but removed. A PTE points to it; modified and invalid.
- Modified no write: same as modified, but no write-back
- Free: free with non-zero content
- Zeroed: free with zero content
- Bad: hardware errors
29Dynamics in NT VM
[Figure: page movement in NT VM among the process working set and the standby, modified, free, zero, and bad lists. Labeled transitions: demand zero fault, page in or allocation, soft faults, working set replacement, modified writer, zero thread]
30Shared Memory
- How to destroy a virtual address space?
- Link all PTEs
- Reference count
- How to swap out/in?
- Link all PTEs
- Operation on all entries
- How to pin/unpin?
- Link all PTEs
- Reference count
[Figure: page tables of Process 1 and Process 2, each with a writable (w) entry pointing into the same set of physical pages]
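The reference-count option for destroying an address space can be sketched like this. `FrameTable` and its methods are hypothetical names, and page tables are modeled as plain dictionaries from virtual page to frame number.

```python
class FrameTable:
    """Per-frame reference counts shared by all page tables (illustrative)."""

    def __init__(self):
        self.refcount = {}

    def map(self, page_table, vpage, frame):
        """Map a virtual page to a frame and count the new reference."""
        page_table[vpage] = frame
        self.refcount[frame] = self.refcount.get(frame, 0) + 1

    def destroy(self, page_table):
        """Tear down an address space; return the frames that became free."""
        freed = []
        for frame in page_table.values():
            self.refcount[frame] -= 1
            if self.refcount[frame] == 0:   # last mapping is gone
                del self.refcount[frame]
                freed.append(frame)
        page_table.clear()
        return freed
```

A shared frame survives the exit of one process and is freed only when the last process mapping it goes away. Swapping, by contrast, needs to reach every PTE that maps the frame, which is what the "link all PTEs" option provides.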
31Copy-On-Write
- Child's virtual address space uses the same page mapping as the parent's
- Make all pages read-only
- Make child process ready
- On a read, nothing happens
- On a write, generates an access fault
- map to a new page frame
- copy the page over
- restart the instruction
[Figure: page tables of the parent process and the child process, each with read-only (r) entries pointing into the same set of physical pages]
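A toy model of the sequence above. All names here (`Memory`, `fork`, `write`) are made up for the example, page tables are dictionaries, and every page is assumed to have been writable before the fork. Skipping the copy when only one mapping remains is an extra optimization, not something the slide mentions.

```python
class Memory:
    """Physical frames with reference counts (illustrative)."""

    def __init__(self):
        self.frames = {}      # frame number -> bytearray of page contents
        self.refcount = {}
        self.next_frame = 0

    def alloc(self, data):
        frame = self.next_frame
        self.next_frame += 1
        self.frames[frame] = bytearray(data)
        self.refcount[frame] = 1
        return frame


def fork(memory, parent):
    """Child shares every frame with the parent; both sides become read-only."""
    child = {}
    for vpage, entry in parent.items():
        entry["writable"] = False
        child[vpage] = {"frame": entry["frame"], "writable": False}
        memory.refcount[entry["frame"]] += 1
    return child


def write(memory, table, vpage, offset, value):
    entry = table[vpage]
    if not entry["writable"]:                  # access fault
        old = entry["frame"]
        if memory.refcount[old] > 1:           # still shared: copy the page over
            memory.refcount[old] -= 1
            entry["frame"] = memory.alloc(memory.frames[old])
        entry["writable"] = True               # then restart the instruction
    memory.frames[entry["frame"]][offset] = value
```

Reads never fault, so pages that neither process writes are never copied.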
32Issues of Copy-On-Write
- How to destroy an address space?
- Same as shared memory case?
- How to swap in/out?
- Same as shared memory
- How to pin/unpin?
- Same as shared memory