Title: Practical, transparent operating system support for superpages
1Practical, transparent operating system support
for superpages
- Juan Navarro, Sitaram Iyer
- Peter Druschel, Alan Cox
Rice University
OSDI 2002
2Overview
- Increasing cost in TLB miss overhead
- growing working sets
- TLB size does not grow at same pace
- Processors now provide superpages
- one TLB entry can map a large region
- OSs have been slow to harness them
- no transparent superpage support for apps
- This talk: a practical and transparent solution
to support superpages
3Translation look-aside buffer
- TLB caches virtual-to-physical address translations
- TLB coverage
- amount of memory mapped by TLB
- amount of memory that can be accessed without TLB misses
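As a rough illustration of the definition above, coverage can be computed as the number of TLB entries times the page size each entry maps. The sketch below is only an example under stated assumptions: the function name `tlb_coverage` is hypothetical, and the entry count and page sizes are taken from the Alpha 21264 setup listed in the experimental setup slide.

```python
KB = 1024
MB = 1024 * KB

def tlb_coverage(entries: int, page_size: int) -> int:
    """Bytes reachable without a TLB miss, assuming every entry
    maps a distinct page of the same size."""
    return entries * page_size

# 128-entry data TLB with 8 KB base pages
base = tlb_coverage(128, 8 * KB)
# Same TLB in the best case, where every entry maps a 4 MB superpage
best = tlb_coverage(128, 4 * MB)
print(base // MB, "MB vs", best // MB, "MB")
```

The entry count stays the same in both cases; only the amount of memory mapped per entry changes.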
4TLB coverage trend
TLB coverage as percentage of main memory
Factor of 1000 decrease in 15 years
5How to increase TLB coverage
- Typical TLB coverage: about 1 MB
- Use superpages!
- large and small pages
- Increase TLB coverage
- no increase in TLB size
6What are these superpages anyway?
- Memory pages of larger sizes
- supported by most modern CPUs
- Otherwise, same as normal pages
- power of 2 size
- use only one TLB entry
- contiguous
- aligned (physically and virtually)
- uniform protection attributes
- one reference bit, one dirty bit
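The constraints listed above can be read as a checklist that a run of base pages must pass before it can be mapped by one superpage entry. The following is a minimal sketch of that checklist, not the talk's implementation; `can_be_superpage`, its parameters, and the 8 KB base page size are assumptions made for illustration.

```python
BASE_PAGE = 8 * 1024  # assumed base page size (8 KB)

def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

def can_be_superpage(vaddr, paddrs, prots):
    """paddrs[i] is the physical address backing the i-th base page
    starting at virtual address vaddr; prots[i] is its protection."""
    n = len(paddrs)
    if n == 0 or len(prots) != n or not is_power_of_two(n):
        return False
    size = n * BASE_PAGE
    # aligned both virtually and physically to the superpage size
    if vaddr % size != 0 or paddrs[0] % size != 0:
        return False
    # physically contiguous frames
    contiguous = all(paddrs[i] == paddrs[0] + i * BASE_PAGE for i in range(n))
    # uniform protection attributes
    uniform = all(p == prots[0] for p in prots)
    return contiguous and uniform
```

For example, eight contiguous 8 KB frames starting at a 64 KB-aligned physical address, mapped at a 64 KB-aligned virtual address with identical protection, pass the check; shifting the physical start by one base page fails it.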
7A superpage TLB
Alpha: 8, 64, 512 KB; 4 MB
Itanium: 4, 8, 16, 64, 256 KB; 1, 4, 16, 64, 256 MB
[Figure: a TLB holding a base page entry (size = 1) and a superpage entry (size = 4), each translating a virtual address in virtual memory to a physical address in physical memory]
8II The superpage problem
9Issue 1: superpage allocation
[Figure: page B in virtual memory and its frame in physical memory, shown against superpage boundaries]
- How / when / what size to allocate?
10Issue 2: promotion
- Promotion: create a superpage out of a set of smaller pages
- mark page table entry of each base page
- When to promote?
- Forcibly populate pages? May incur I/O cost or increase internal fragmentation.
11Issue 3: demotion
- Demotion: convert a superpage into smaller pages
- when page attributes of base pages of a superpage become non-uniform
- during partial pageouts
12Issue 4: fragmentation
- Memory becomes fragmented due to
- use of multiple page sizes
- scattered wired (non-pageable) pages
- Contiguity: contended resource
- OS must
- use contiguity restoration techniques
- trade off impact of contiguity restoration
against superpage benefits
13Previous approaches
- Reservations
- one superpage size only
- Relocation
- move pages at promotion time
- must recover copying costs
- Eager superpage creation (IRIX, HP-UX)
- size specified by user: non-transparent
- Hardware support
- Contiguous virtual superpage mapped to discontiguous physical base pages
- Demotion issues not addressed
- large pages partially dirty/referenced
14III Design
15Key observation
Once an application touches the first page of a
memory object then it is likely that it will
quickly touch every page of that object
- Example: array initialization
- Opportunistic policies
- superpages as large and as soon as possible
- as long as no penalty if wrong decision
16Superpage allocation
Preemptible reservations
[Figure: page B in virtual memory mapped to a frame in physical memory; the remaining frames within the superpage boundaries are marked as reserved frames]
How much do we reserve? Goal: good TLB coverage, without internal fragmentation.
17Allocation: reservation size
- Opportunistic policy
- Go for biggest size that is no larger than the memory object (e.g., file)
- If required size not available, try preemption before resigning to a smaller size
- preempted reservation had its chance
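The policy above can be sketched as a loop over superpage sizes from largest to smallest. This is a simplified illustration, not the talk's code: `pick_reservation`, the `allocate` and `preempt` callbacks, and the size table (in base pages, mirroring 4 MB, 512 KB, 64 KB and 8 KB with 8 KB base pages) are all assumptions.

```python
SIZES = [512, 64, 8, 1]  # assumed sizes in base pages, largest first

def pick_reservation(fault_page, object_pages, allocate, preempt):
    """Return (start_page, size) of the reservation to create for a fault
    at fault_page in an object of object_pages pages, or None.

    allocate(size) -> bool: hypothetical, tries to grab a free aligned extent.
    preempt(size)  -> bool: hypothetical, tries to free one by preemption.
    """
    for size in SIZES:
        start = fault_page - fault_page % size  # aligned extent containing the fault
        if start + size > object_pages:
            continue  # would be larger than the memory object
        # try a free extent first, then preemption, before going smaller
        if allocate(size) or (preempt(size) and allocate(size)):
            return start, size
    return None
```

For a fault on page 10 of a 100-page object, the 512-page size is skipped because it exceeds the object, and the 64-page extent starting at page 0 is chosen if one can be allocated or obtained through preemption.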
18Allocation: managing reservations
[Figure: reservation lists, one per size of the largest unused (and aligned) chunk: 4, 2, 1]
- best candidate for preemption at front
- reservation whose most recently populated frame
was populated the least recently
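One way to picture these lists is as one queue per "largest unused chunk" size, each ordered by the time of the reservation's most recent population. The sketch below is an assumption-laden illustration: the class `ReservationLists`, its method names, and the rule of falling back to the next larger list when a list is empty are choices made here, not details stated on the slide.

```python
from collections import OrderedDict

class ReservationLists:
    def __init__(self, sizes):
        # one queue per largest-unused-chunk size; head = least recently populated
        self.lists = {s: OrderedDict() for s in sizes}

    def touch(self, rid, largest_free):
        """Record that a frame of reservation rid was just populated and
        that its largest unused aligned chunk is now largest_free."""
        for queue in self.lists.values():
            queue.pop(rid, None)
        if largest_free in self.lists:  # fully populated reservations drop out
            self.lists[largest_free][rid] = True  # append at the tail

    def preempt(self, needed):
        """Pick the reservation to preempt to obtain a chunk of size needed."""
        for size in sorted(self.lists):
            if size >= needed and self.lists[size]:
                rid, _ = self.lists[size].popitem(last=False)  # take the head
                return rid
        return None
```

Because every population moves a reservation to the tail of its list, the head is always the reservation whose most recent population is the oldest, which matches the "best candidate at front" rule.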
19Incremental promotions
- Promotion policy: opportunistic
[Figure: a region promoted incrementally to superpages of size 2, 4, and 8 as it becomes populated]
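A minimal sketch of incremental promotion, assuming that a region is promoted to a given size as soon as the aligned extent of that size around the newly populated page is fully populated. The function name `promotable_size` and the size table (2, 4, 8 base pages, as in the figure) are illustrative assumptions.

```python
SIZES = [2, 4, 8]  # superpage sizes in base pages, smallest first

def promotable_size(populated, page):
    """Largest size whose aligned extent containing `page` is fully
    populated; returns 1 when no superpage can be formed yet."""
    best = 1
    for size in SIZES:
        start = page - page % size
        if all(p in populated for p in range(start, start + size)):
            best = size
        else:
            break  # a larger extent contains this one, so it cannot be full either
    return best
```

Calling it after each page fault on pages 0 through 7 in order would yield promotions to size 2 after page 1, size 4 after page 3, and size 8 after page 7.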
20Speculative demotions
- One reference bit per superpage
- How do we detect portions of a superpage not referenced anymore?
- On memory pressure, demote superpages when resetting ref bit
- Re-promote (incrementally) as pages are referenced
- Demote also when the page daemon selects a base page as a victim page.
21Demotions: dirty superpages
- One dirty bit per superpage
- what's dirty and what's not?
- page out entire superpage
- Demote on first write to clean superpage
- Re-promote (incrementally) as other pages are
dirtied
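The idea above can be modeled in a few lines. This is a deliberately simplified sketch: the `Superpage` class and `on_write` function are hypothetical, and re-promotion here happens only once every base page is dirty, whereas the talk describes re-promoting incrementally through intermediate sizes.

```python
class Superpage:
    def __init__(self, n_pages):
        self.n_pages = n_pages
        self.promoted = True             # currently mapped as one superpage
        self.dirty = [False] * n_pages   # per-base-page dirty state tracked by the OS

def on_write(sp, page):
    """Model a write to base page `page`; returns whether sp is promoted."""
    if sp.promoted and not any(sp.dirty):
        # first write to a clean superpage: demote so that each base
        # page gets its own dirty bit
        sp.promoted = False
    sp.dirty[page] = True
    if not sp.promoted and all(sp.dirty):
        # all base pages dirty again: attributes are uniform, re-promote
        sp.promoted = True
    return sp.promoted
```

With this model, a single write to a clean four-page superpage leaves only one base page marked dirty, so a later pageout writes back one page instead of the entire superpage.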
22Fragmentation control
- Low contiguity: modified page daemon for victim selection
- restore contiguity
- move clean, inactive pages to the free list
- minimize impact
- prefer pages that contribute the most to contiguity
- Cluster wired pages
23IV Experimental evaluation
24Experimental setup
- FreeBSD 4.3
- Alpha 21264, 500 MHz, 512 MB RAM
- 8 KB, 64 KB, 512 KB, 4 MB pages
- 128-entry DTLB, 128-entry ITLB
- Unmodified applications
25Best-case benefits
- TLB miss reduction usually above 95%
- SPEC CPU2000 integer
- 11.2% improvement (0% to 38%)
- SPEC CPU2000 floating point
- 11.0% improvement (-1.5% to 83%)
- Other benchmarks
- FFT (200^3 matrix): 55%
- 1000x1000 matrix transpose: 655%
- 30% or more in 8 out of 35 benchmarks
26Why multiple superpage sizes
- Improvements with only one superpage size vs. all
sizes
27Conclusions
- Superpages: 30% improvement
- transparently realized; low overhead
- Contiguity restoration is necessary
- sustains benefits; low impact
- Multiple page sizes are important
- scales to very large superpages
28Thanks!
- Source code and more info at
- www.cs.rice.edu/jnavarro/superpages
29Backup slides
30Superpage allocation
[Figure: page B in virtual memory and its frame in physical memory, shown against superpage boundaries, illustrating copying costs]
31Population maps
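[Figure: population map for a reservation, a tree with levels for size 8, size 4, and size 2; each node is marked with its population status: none, partial, or full]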
- Keep track of population status of reservations
- Also for memory objects, to find reserved frames
- Nodes lazily created; ops are O(log n)
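A population map of this kind can be sketched as a tree whose nodes are created only when a page beneath them is first populated. The binary layout (each node splits into two halves), the `PopNode` class, and the `populate` function below are assumptions chosen to match the size 8 / 4 / 2 levels of the figure, not the talk's actual data structure.

```python
class PopNode:
    def __init__(self, size):
        self.size = size      # base pages covered by this node
        self.count = 0        # populated base pages underneath
        self.children = {}    # created lazily, keyed by child index (0 or 1)

    def status(self):
        if self.count == 0:
            return "none"
        return "full" if self.count == self.size else "partial"

def populate(node, page):
    """Mark base page `page` (offset within node) as populated.
    Returns True if the page was not populated before."""
    if node.size == 1:
        if node.count == 1:
            return False
        node.count = 1
        return True
    half = node.size // 2
    idx = page // half
    if idx not in node.children:      # lazy node creation
        node.children[idx] = PopNode(half)
    added = populate(node.children[idx], page % half)
    if added:
        node.count += 1
    return added
```

Each update walks one root-to-leaf path, so its cost grows with the depth of the tree, which is logarithmic in the reservation size; a node whose status becomes "full" marks an extent that is a candidate for promotion.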
32Fragmentation control impact
- Run web server concurrently with an app that
continually demands 512 KB chunks
- Impact for web server
- <1% overhead of daemon
- 3% degradation due to deviation from LRU
- But for the other app
- 30% of requests for 512 KB are granted (9 times
more than with original daemon)