Title: Paging, Page Tables, and Such
1. Paging, Page Tables, and Such
2. Today's Topics
- Page Replacement Strategies
- Making Paging Fast
- Reducing the Overhead of Page Tables
3. Review: Working Sets
[Figure: requests/second of throughput vs. number of page frames allocated to a process; too few frames causes thrashing, too many frames is over-allocation]
4. Page Replacement
- What happens when we take a page fault and we've run out of memory?
- Goal: keep each process's working set in memory
- Giving more than the working set is not necessary
- Key issue: how do we identify working sets?
5. Belady's Algorithm
- Evict the page that won't be used for the longest time in the future (sketch below)
- This page is probably not in the working set
- If it is in the working set, we're thrashing
- This is optimal!
- Minimizes the number of page faults
- Major problem: this requires a crystal ball
- There is no good way to predict future memory accesses
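A minimal sketch of Belady's victim selection, assuming (unrealistically) that the full future reference string is known; the class and variable names are illustrative:

import java.util.*;

/** Belady's optimal replacement: evict the resident page whose next use
 *  lies farthest in the future. Purely illustrative -- it requires the
 *  future reference string, which no real OS has. */
class Belady {
    static int victim(Set<Integer> resident, List<Integer> future) {
        int victim = -1, farthest = -1;
        for (int page : resident) {
            int next = future.indexOf(page);   // position of this page's next use
            if (next == -1) return page;       // never used again: perfect victim
            if (next > farthest) { farthest = next; victim = page; }
        }
        return victim;
    }

    public static void main(String[] args) {
        Set<Integer> resident = new HashSet<>(List.of(1, 2, 3));
        List<Integer> future = List.of(2, 1, 2, 1, 2);  // page 3 is never used again
        System.out.println(victim(resident, future));   // prints 3
    }
}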
6. How Good are These Page Replacement Algorithms?
- LIFO
- Newest page is kicked out
- FIFO
- Oldest page is kicked out
- Random
- Random page is kicked out
- LRU
- Least recently used page is kicked out
7. Temporal Locality
- Assumption: recently accessed pages will be accessed again soon
- Use the past to predict the future
- LIFO is horrendous
- Random is also pretty bad
- LRU is pretty good
- FIFO is mediocre
- VAX/VMS used a form of FIFO because of hardware limitations
8. Implementing LRU: Approach 1
on each memory reference:
    long timeStamp = System.currentTimeMillis();
    sortedList.insert(pageFrameNumber, timeStamp);
- Problem: this is too inefficient
- Time stamp and data structure manipulation on each memory operation
- Too complex for hardware
9. Making LRU Efficient
- Use hardware support
- A reference bit is set when a page is accessed
- It can be cleared by the OS
- Trade off accuracy for speed
- It suffices to find a pretty old page
[PTE layout: 20-bit page frame number | 2-bit prot | M | R | V (one bit each)]
10. Approach 2: LRU Approximation with Reference Bits
- For each page, maintain a set of reference bits
- Let's call it a reference byte
- Periodically, shift the HW reference bit into the highest-order bit of the reference byte (sketch below)
- Suppose the reference byte was 10101010
- If the HW bit was set, the new reference byte becomes 11010101
- The frame with the lowest value is the LRU page
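A sketch of the periodic shift, assuming the OS can read and clear each frame's hardware reference bit; the field and method names are illustrative:

/** LRU approximation with reference bytes: on each timer tick, shift the
 *  HW reference bit into the top of a per-frame history byte, then clear
 *  the HW bit. Smaller history value = less recently used. */
class ReferenceBytes {
    int[] refByte;        // one 8-bit history per page frame
    boolean[] hwRefBit;   // stands in for the hardware-set reference bit

    ReferenceBytes(int frames) { refByte = new int[frames]; hwRefBit = new boolean[frames]; }

    /** Called periodically (e.g., on a timer interrupt). */
    void tick() {
        for (int f = 0; f < refByte.length; f++) {
            // 10101010 with HW bit set becomes 11010101, as in the slide's example
            refByte[f] = ((hwRefBit[f] ? 0x80 : 0) | (refByte[f] >> 1)) & 0xFF;
            hwRefBit[f] = false;    // the OS clears the HW bit after sampling it
        }
    }

    /** The frame with the smallest history value is (approximately) LRU. */
    int lruFrame() {
        int best = 0;
        for (int f = 1; f < refByte.length; f++)
            if (refByte[f] < refByte[best]) best = f;
        return best;
    }
}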
11. Analyzing Reference Bits
- Pro: does not impose overhead on every memory reference
- The interval rate can be configured
- Con: scanning all page frames can still be inefficient
- e.g., 4 GB of memory with 4 KB pages → 1 million page frames
12. Approach 3: LRU Clock
- Use only a single bit per page frame
- Basically, this is a degenerate form of reference bits
- On page eviction (sketch below):
- Scan through the list of reference bits
- If the value is zero, replace this page
- If the value is one, set the value to zero and keep scanning
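A sketch of the eviction scan, assuming one reference bit per frame; names are illustrative:

/** Clock replacement: a hand sweeps a circular array of reference bits.
 *  Frames with bit 1 get a second chance (bit cleared); the first frame
 *  with bit 0 is evicted. */
class ClockReplacer {
    boolean[] refBit;   // one reference bit per page frame
    int hand = 0;       // current clock-hand position

    ClockReplacer(int frames) { refBit = new boolean[frames]; }

    int evict() {
        while (true) {
            if (!refBit[hand]) {                      // not referenced since last sweep
                int victim = hand;
                hand = (hand + 1) % refBit.length;
                return victim;
            }
            refBit[hand] = false;                     // give it a second chance
            hand = (hand + 1) % refBit.length;
        }
    }
}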
13. Why Clock?
Typically implemented with a circular queue of reference bits, with a clock hand sweeping around it.
[Figure: a circular queue of 0/1 reference bits]
14. Analyzing Clock
- Pro: very low overhead
- Only runs when a page needs to be evicted
- Takes the first page that hasn't been referenced
- Con: isn't very accurate (one measly bit!)
- Degenerates into FIFO if all reference bits are set
- Pro: but the algorithm is self-regulating
- If there is a lot of memory pressure, the clock runs more often (and its bits are more up-to-date)
15. When Does LRU Do Badly?
- LRU performs poorly when there is little temporal locality
- Example: many database workloads
SELECT * FROM Employees WHERE Salary < 25000
- A full table scan touches each page exactly once, so recency predicts nothing
16. Today's Topics
- Page Replacement Strategies
- Making Paging Fast
- Reducing the Overhead of Page Tables
17. Review: Mechanics of Address Translation
[Figure: a virtual address splits into a virtual page number and an offset; the page table maps the virtual page to a page frame in physical memory, and the physical address is the page frame number combined with the offset]
Problem: page tables live in memory
18. Making Paging Fast
- We must avoid a page table lookup for every memory reference
- This would double memory access time
- Solution: the Translation Lookaside Buffer (TLB)
- Fancy name for a cache
- The TLB stores a subset of PTEs (page table entries)
- TLBs are small and fast (16-48 entries)
- Can be accessed for free
19. TLB Details
- In practice, most (> 99%) of memory translations are handled by the TLB
- Each processor has its own TLB
- The TLB is fully associative
- Any TLB slot can hold any PTE
- The full VPN is the cache key
- All entries are searched in parallel
- Who fills the TLB? Two options:
- Hardware (x86) walks the page table on a TLB miss
- Software (MIPS, Alpha): a routine fills the TLB on a miss
- The TLB itself needs a replacement policy
- Usually implemented in hardware (LRU); see the sketch below
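A toy model of the TLB as a small, LRU-evicting cache keyed by VPN. A real TLB searches all entries in parallel in hardware; here a map stands in for that, and the page-table walk is just another map lookup. All names are illustrative:

import java.util.*;

/** TLB sketch: hit = free translation; miss = walk the page table and
 *  cache the result, evicting the least recently used entry when full. */
class TinyTlb {
    static final int CAPACITY = 48;                        // TLBs are small
    Map<Long, Long> entries = new LinkedHashMap<>(16, 0.75f, true) {
        protected boolean removeEldestEntry(Map.Entry<Long, Long> e) {
            return size() > CAPACITY;                      // LRU eviction
        }
    };
    Map<Long, Long> pageTable;                             // stands in for the real walk

    TinyTlb(Map<Long, Long> pageTable) { this.pageTable = pageTable; }

    long translate(long vpn) {                             // assumes vpn is mapped;
        Long pfn = entries.get(vpn);                       // a real miss would fault
        if (pfn == null) {
            pfn = pageTable.get(vpn);                      // slow path: page-table walk
            entries.put(vpn, pfn);                         // fill the TLB
        }
        return pfn;
    }
}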
20. What Happens on a Context Switch?
- Each process has its own address space
- So, each process has its own page table
- So, page-table entries are only relevant for a particular process
- Thus, the TLB must be flushed on a context switch
- This is why context switches are so expensive
21. Ben's Idea
- We can avoid flushing the TLB if entries are associated with an address space
- When would this work well?
- When would this not work well?
[PTE layout: 20-bit page frame number | 2-bit prot | M | R | V (one bit each) | 4-bit ASID]
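A sketch of how the ASID changes the cache key, extending the toy TLB model above; the Key record and method names are illustrative, not a real hardware interface:

import java.util.*;

/** ASID-tagged TLB: entries are keyed by (ASID, VPN), so entries from
 *  other address spaces can stay cached across a context switch -- they
 *  simply never match the current process's lookups. */
class AsidTlb {
    record Key(int asid, long vpn) {}       // ASID + VPN form the cache key
    Map<Key, Long> entries = new HashMap<>();

    void fill(int asid, long vpn, long pfn) {
        entries.put(new Key(asid, vpn), pfn);
    }

    Long lookup(int asid, long vpn) {       // null = TLB miss
        return entries.get(new Key(asid, vpn));
    }
}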
22. TLB Management Pain
- The TLB is a cache of page table entries
- The OS must ensure that page tables and TLB entries stay in sync
- Massive pain: TLB consistency across multiple processors
- Q: how do we implement LRU if reference bits are stored in the TLB?
- One answer: we don't
- Windows uses FIFO for multiprocessor machines
23. Today's Topics
- Page Replacement Strategies
- Making Paging Fast
- Reducing the Overhead of Page Tables
24. Page Table Overhead
- For large address spaces, page table sizes can become enormous
- Example: the Alpha architecture
- 64-bit address space, 8 KB pages
Num PTEs = 2^64 / 2^13 = 2^51
Assuming 8 bytes per PTE: Num Bytes = 2^54 = 16 petabytes
And this is per-process!
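The slide's arithmetic, checked in code (using 1 petabyte = 2^50 bytes):

/** Size of a flat page table for a 64-bit address space with 8 KB pages
 *  and 8-byte PTEs. */
class PageTableSize {
    public static void main(String[] args) {
        long numPtes = 1L << (64 - 13);     // 2^64 bytes / 2^13 bytes per page = 2^51 PTEs
        long numBytes = numPtes * 8;        // 8 bytes per PTE => 2^54 bytes
        System.out.println(numBytes >> 50); // prints 16 (petabytes), per process!
    }
}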
25. Optimizing for Sparse Address Spaces
- Observation: very little of the address space is in use at a given time
- This is why virtual memory works
- Basic idea: only allocate page tables where we need to
- And fill in new page tables on demand
[Figure: a sparsely populated virtual address space]
26. Implementing Sparse Address Spaces
- We need a data structure to keep track of the page tables we have allocated
- And this structure must be small
- Otherwise, we've defeated our original goal
- Solution: multi-level page tables
- Page tables of page tables
- "Any problem in CS can be solved with a layer of indirection"
27. Two-Level Page Tables
[Figure: the virtual address splits into a master page number, a secondary page number, and an offset; the master page table points to secondary page tables, whose entries hold page frame numbers; the physical address is the page frame number combined with the offset; some master entries are empty]
Key point: not all secondary page tables must be allocated (see the sketch below)
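A minimal sketch of a two-level table with on-demand secondary allocation; the sizes (10-bit master index, 10-bit secondary index) and names are illustrative:

/** Two-level page table: the master table holds references to secondary
 *  tables, which are allocated only on first touch, so unused regions of
 *  the address space cost nothing. */
class TwoLevelPageTable {
    static final int BITS = 10, SIZE = 1 << BITS;
    long[][] master = new long[SIZE][];   // null = secondary table not allocated

    void map(long vpn, long pfn) {
        int hi = (int) (vpn >> BITS), lo = (int) (vpn & (SIZE - 1));
        if (master[hi] == null)
            master[hi] = new long[SIZE];  // fill in a secondary table on demand
        master[hi][lo] = pfn;
    }

    long lookup(long vpn) {
        int hi = (int) (vpn >> BITS), lo = (int) (vpn & (SIZE - 1));
        long[] secondary = master[hi];
        return (secondary == null) ? -1 : secondary[lo];  // -1 = unmapped (fault)
    }
}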
28. Generalizing
- Early architectures used 1-level page tables
- VAX and x86 used 2-level page tables
- SPARC uses 3-level page tables
- Alpha and 68030 use 4-level page tables
- Key thing: the outermost level must be wired down (pinned in physical memory) in order to break the recursion
29. Cool Paging Tricks
- Basic idea: exploit the layer of indirection between virtual and physical memory
30. Trick 1: Shared Memory
- Allow different processes to share physical memory
[Figure: two virtual address spaces mapping some of their pages to the same physical frames]
31. Trick 2: Copy-on-Write
- Recall that fork() copies the parent's address space to the child
- This is inefficient, especially if the child calls exec()
- Copy-on-write allows for a fast copy by using shared pages
- If the child tries to write to a page, the OS intervenes and makes a copy of the target page
- Implementation: pages are shared as read-only
- The OS intercepts write faults (see the sketch below)
[PTE layout: page frame number | prot | M | R | V]
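A sketch of the write-fault path under the implementation just described (shared read-only pages, OS intercepts write faults); the Pte fields, frame array, and freeFrame parameter are illustrative:

/** Copy-on-write fault handler: the first write to a shared page copies
 *  just that page into a free frame and makes the copy privately writable. */
class CopyOnWrite {
    static class Pte { long pfn; boolean writable; boolean cow; }

    void onWriteFault(Pte pte, byte[][] frames, int freeFrame) {
        if (pte.cow) {
            byte[] src = frames[(int) pte.pfn];
            frames[freeFrame] = src.clone();  // copy only the faulting page
            pte.pfn = freeFrame;              // point the PTE at the private copy
            pte.writable = true;              // future writes proceed normally
            pte.cow = false;
        }
    }
}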
32. Trick 3: Memory-Mapped Files
- Normally, files are accessed with system calls
- Open, read, write, close
- Memory mapping allows a program to access a file
with load/store operations
[Figure: a region of the virtual address space mapped onto the file Foo.txt]
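Java exposes memory mapping through FileChannel.map(): after mapping, the file's contents are read with ordinary loads through the buffer instead of read() system calls. The file name Foo.txt is just the example from the figure:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MapFile {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of("Foo.txt"),
                                               StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte first = buf.get(0);          // a load, not a read() syscall
            System.out.println((char) first);
        }
    }
}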