Title: CPSC 318 Computer Structures, Lecture 17: Virtual Memory
1. CPSC 318 Computer Structures, Lecture 17: Virtual Memory
- Dr. Son Vuong
- (vuong_at_cs.ubc.ca)
- March 23, 2004
2. Why Caches?
[Figure: processor vs. DRAM performance, 1980-2000, log scale. CPU performance ("Moore's Law") grows ~60%/yr while DRAM grows only ~7-9%/yr, so the processor-memory performance gap grows ~50%/year.]
- 1989: first Intel CPU with cache on chip
- 1998: Pentium III has two levels of cache on chip
3. Review (1/2)
- Caches are NOT mandatory:
- Processor performs arithmetic
- Memory stores data
- Caches simply make things go faster
- Each level of the memory hierarchy is just a subset of the next higher level
- Caches speed things up due to temporal locality: store data used recently
- Block size > 1 word speeds things up due to spatial locality: store words adjacent to the ones used recently
4. Review (2/2)
- Cache design choices:
- size of cache: speed vs. capacity
- direct-mapped vs. associative
- for N-way set associative: choice of N
- block replacement policy
- 2nd-level cache?
- write-through vs. write-back?
- Use a performance model to pick between choices, depending on programs, technology, budget, ...
5Another View of the Memory Hierarchy
Regs
Upper Level
Instr. Operands
Faster
Cache
Blocks
L2 Cache
Blocks
Memory
Pages
Disk
Files
Larger
Tape
Lower Level
6. Virtual Memory
- If the Principle of Locality allows caches to offer (usually) the speed of cache memory with the size of DRAM, then why not apply the idea recursively at the next level to get the speed of DRAM with the size of disk?
- Called Virtual Memory
- Also allows the OS to share memory and protect programs from each other
- Today, more important for protection than as just another level of the memory hierarchy
- Historically, it predates caches
7. Virtual to Physical Address Translation
[Figure: the program operates in its virtual address space; a hardware mapping translates each virtual address (instruction fetch, load, store) to a physical address in physical memory (incl. caches).]
- Each program operates in its own virtual address space, as if it were the only program running
- Each process is protected from the others
- OS can decide where each goes in memory
- Hardware (HW) provides the virtual → physical mapping
8. Mapping Virtual Memory to Physical Memory
- Divide memory into equal-sized chunks (pages of about 4 KB)
- Any chunk of virtual memory can be assigned to any chunk of physical memory
[Figure: a virtual address space (stack at the top, address 0 at the bottom) mapped page-by-page onto a 64 MB physical memory.]
9. Virtual Memory Mapping Function
- Cannot have a simple function to predict an arbitrary mapping
- Use table lookup (the Page Table) for mappings; the virtual page number is the index
- Virtual Memory Mapping Function:
- Physical Offset = Virtual Offset
- Physical Page Number = PageTable[Virtual Page Number]
- (Physical Page is also called a Page Frame)
10. Paging Organization (assume 1 KB pages)
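The mapping function above can be sketched in a few lines of Python for the 1 KB pages assumed on this slide. The page-table contents here are made up purely for illustration:

```python
# Sketch of paging with 1 KB pages: the low 10 bits (offset) pass through
# unchanged, the high bits (virtual page number) are looked up in a table.
PAGE_SIZE = 1024          # 1 KB pages -> 10-bit page offset
OFFSET_BITS = 10

# Hypothetical page table: virtual page number -> physical page number
page_table = {0: 7, 1: 3, 2: 5}

def translate(virtual_addr):
    vpn = virtual_addr >> OFFSET_BITS          # virtual page number (index)
    offset = virtual_addr & (PAGE_SIZE - 1)    # unchanged by translation
    ppn = page_table[vpn]                      # PageTable[VPN]
    return (ppn << OFFSET_BITS) | offset       # concatenation, not addition

# Virtual address 0x0404 = page 1, offset 4 -> physical page 3, offset 4
print(hex(translate(0x0404)))  # 0xc04
```

Note that the physical page number and offset are concatenated, not added, exactly as the slide's formula "Physical Offset = Virtual Offset" implies.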
11. Page Table
- A page table is an operating-system structure that contains the mapping of virtual addresses to physical locations
- There are several different ways, all up to the operating system, to keep this data around
- Each process running in the operating system has its own page table
- State of a process = PC, all registers, plus page table
- OS changes page tables by changing the contents of the Page Table Base Register
12. Address Mapping: Page Table
[Figure: the virtual page number indexes the Page Table; the physical page address it returns is concatenated (not added) with the page offset to form the physical memory address.]
- Each Page Table entry holds a Valid bit (V), Access Rights (A.R.), and a Physical Page Address (P.P.A.)
- Entries whose page is not resident hold a Disk Address (Disk A.) instead, pointing to the page on disk
- Page Table is located in physical memory
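A minimal sketch of the entry check described above, with the slide's fields (Valid, Access Rights, Physical Page Address, Disk Address). The field names and the PageFault signal are hypothetical Python stand-ins, not a real OS API:

```python
# Illustrative page table entry with the slide's fields: Valid bit (V),
# Access Rights (A.R.), Physical Page Address (P.P.A.), and a disk
# address used when the page is not resident in memory.
from dataclasses import dataclass

@dataclass
class PTE:
    valid: bool
    rights: str        # e.g. "r" or "rw" (access rights)
    ppn: int           # physical page number, meaningful only if valid
    disk_addr: int     # where the page lives on disk if not valid

class PageFault(Exception):
    pass

def lookup(pte: PTE, want_write: bool) -> int:
    if not pte.valid:
        raise PageFault(f"page on disk at {pte.disk_addr}")
    if want_write and "w" not in pte.rights:
        raise PageFault("protection violation")
    return pte.ppn

print(lookup(PTE(True, "rw", ppn=42, disk_addr=0), want_write=True))  # 42
```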
13. Notes on Page Table
- Solves the fragmentation problem: all chunks are the same size, so all holes can be used
- OS must reserve Swap Space on disk for each process
- To grow a process, ask the Operating System
- If there are unused pages, the OS uses them first
- If not, the OS swaps some old pages to disk
- (Least Recently Used policy picks which pages to swap)
- Each process has its own Page Table
- Will add details, but the Page Table is the essence of Virtual Memory
14. Comparing the Two Levels of Hierarchy

  Cache version                         Virtual Memory version
  Block (or Line)                       Page
  Miss                                  Page Fault
  Block size: 32-64 B                   Page size: 4 KB-8 KB
  Placement: Direct Mapped,             Fully Associative
    N-way Set Associative
  Replacement: LRU or Random            Least Recently Used (LRU)
  Write-Through or Write-Back           Write-Back
15. Virtual Memory Problem #1
- Map every address? That is 1 indirection via the Page Table in memory per virtual address
- → 1 virtual memory access = 2 physical memory accesses → SLOW!
- Observation: since there is locality in pages of data, there must be locality in the virtual address translations of those pages
- Since small is fast, why not use a small cache of virtual-to-physical address translations to make translation fast?
- For historical reasons, this cache is called a Translation Lookaside Buffer, or TLB
16. Translation Lookaside Buffers
- TLBs are usually small, typically 128-256 entries
- Like any other cache, the TLB can be direct mapped, set associative, or fully associative
[Figure: the processor issues a virtual address (VA) to the TLB Lookup; on a hit, the physical address (PA) goes straight to the Cache; on a miss, Translation via the page table supplies the PA. The Cache returns data on a hit or goes to Main Memory on a miss.]
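The lookup path above can be sketched as a tiny direct-mapped TLB in front of a page table. The sizes and the backing `page_table` dict below are illustrative only (the slide notes real TLBs have 128-256 entries):

```python
# Minimal direct-mapped TLB sketch: each VPN maps to exactly one slot;
# on a miss we "walk" the page table (a dict here) and refill the slot.
TLB_ENTRIES = 8   # toy size; real TLBs are 128-256 entries

tlb = [None] * TLB_ENTRIES                          # slot: (vpn, ppn) or None
page_table = {vpn: vpn + 100 for vpn in range(64)}  # fake mappings

def tlb_translate(vpn):
    slot = vpn % TLB_ENTRIES
    entry = tlb[slot]
    if entry is not None and entry[0] == vpn:
        return entry[1], "hit"
    ppn = page_table[vpn]             # TLB miss: consult the page table
    tlb[slot] = (vpn, ppn)            # refill the TLB entry
    return ppn, "miss"

print(tlb_translate(3))   # (103, 'miss') — cold miss
print(tlb_translate(3))   # (103, 'hit')  — translation now cached
print(tlb_translate(11))  # (111, 'miss') — conflicts with vpn 3 (same slot)
```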
17. Typical TLB Format

  Virtual Address | Physical Address | Dirty | Ref | Valid | Access Rights

- TLB is just a cache on the page table mappings
- TLB access time is comparable to cache access time (much less than main memory access time)
- Dirty: since we use write-back, we need to know whether or not to write the page to disk when it is replaced
- Ref: used to help calculate LRU on replacement
- Cleared by the OS periodically, then checked to see if the page was referenced
18. What if We Don't Have Enough Memory?
- We choose some other page belonging to a program and transfer it onto the disk if it is dirty
- If it is clean (the disk copy is up-to-date), just overwrite that data in memory
- We choose the page to evict based on a replacement policy (e.g., LRU)
- And update that program's page table to reflect the fact that its memory moved somewhere else
19. Virtual Memory Problem #2
- Not enough physical memory!
- Only, say, 64 MB of physical memory
- N processes, each with 4 GB (2^32 B) of virtual memory!
- Could have 1K virtual pages per physical page!
- Spatial Locality to the rescue:
- Each page is 4 KB: lots of nearby references
- No matter how big the program is, at any time it is only accessing a few pages
- Working Set: the recently used pages
20. Virtual Memory Problem #3
- Page Table too big!
- 4 GB Virtual Memory ÷ 4 KB pages → ~1 million Page Table Entries → 4 MB just for the Page Table of 1 process; 25 processes → 100 MB for Page Tables!
- A variety of solutions trade off the memory size of the mapping function against slower handling of TLB misses
- Make the TLB large enough, and highly associative, so we rarely miss on address translation
- CS 315 will go over more options in greater depth
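The arithmetic above checks out directly, assuming 4-byte page table entries (a common textbook assumption, not stated on the slide):

```python
# Size of a flat page table for a 4 GB virtual address space, 4 KB pages.
VIRTUAL_SPACE = 4 * 2**30     # 4 GB
PAGE_SIZE = 4 * 2**10         # 4 KB
PTE_SIZE = 4                  # bytes per page table entry (assumed)

entries = VIRTUAL_SPACE // PAGE_SIZE   # one entry per virtual page
table_bytes = entries * PTE_SIZE

print(entries)                       # 1048576 entries (~1 million)
print(table_bytes // 2**20)          # 4   (MB per process)
print(25 * table_bytes // 2**20)     # 100 (MB for 25 processes)
```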
21. 2-Level Page Table
22. Page Table Shrink
- Only have a second-level page table for the valid entries of the super-level (first-level) page table
23. Space Savings for Multi-Level Page Table
- If only 10% of the entries of the Super Page Table are valid, then the total mapping size is roughly 1/10th of a single-level page table
- Exercise 7.35 explores the exact size
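A sketch of the two-level lookup: second-level tables exist only for valid first-level ("super") entries, so unmapped regions cost nothing. The 10-bits-per-level split below matches a 32-bit address with 4 KB pages, but the concrete structure is an illustration, not the slide's exact layout:

```python
# Two-level page table: a 20-bit VPN split into two 10-bit indices.
# Second-level tables are allocated lazily, only when a page is mapped.
LEVEL_BITS = 10
LEVEL_SIZE = 1 << LEVEL_BITS     # 1024 entries per table

super_table = {}                 # first-level index -> second-level dict

def map_page(vpn, ppn):
    hi, lo = vpn >> LEVEL_BITS, vpn & (LEVEL_SIZE - 1)
    super_table.setdefault(hi, {})[lo] = ppn   # allocate 2nd level on demand

def translate(vpn):
    hi, lo = vpn >> LEVEL_BITS, vpn & (LEVEL_SIZE - 1)
    second = super_table.get(hi)
    if second is None or lo not in second:
        return None              # no mapping -> page fault
    return second[lo]

map_page(0x00321, 7)
print(translate(0x00321))        # 7
print(translate(0xFFFFF))        # None — that 2nd-level table was never built
```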
24. Three Advantages of Virtual Memory
- 1) Translation:
- Program can be given a consistent view of memory, even though physical memory is scrambled
- Makes multiple processes reasonable
- Only the most important part of the program (the Working Set) must be in physical memory
- Contiguous structures (like stacks) use only as much physical memory as necessary, yet can still grow later
25. Three Advantages of Virtual Memory (cont.)
- 2) Protection:
- Different processes are protected from each other
- Different pages can be given special behavior (Read Only, invisible to user programs, etc.)
- Kernel data is protected from user programs
- Very important for protection from malicious programs → far more viruses under Microsoft Windows
- A special mode in the processor (kernel mode) allows the processor to change the page table/TLB
- 3) Sharing:
- Can map the same physical page to multiple users (shared memory)
26. Crossing the System Boundary
- System loads the user program into memory and gives it use of the processor
- Switching back to the system:
- SYSCALL: request a service, I/O
- TRAP (e.g., overflow)
- Interrupt
[Figure: the user side (process, memory) vs. the system side (I/O bus, device data registers) of the boundary.]
27. Instruction Set Support for VM/OS
- How do we prevent a user program from changing page tables and going anywhere?
- A bit in the Status Register determines whether we are in user mode or OS (kernel) mode: the Kernel/User bit (KU) (0 → kernel, 1 → user)
- On an exception/interrupt: disable interrupts (IE ← 0) and go into kernel mode (KU ← 0)
- Only change the page table when in kernel mode (Operating System)
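A toy model of that mode check. The bit positions are made up for the sketch; only the behavior (KU = 0 means kernel, exceptions clear IE and KU) follows the slide:

```python
# Toy status register with the slide's Kernel/User bit (KU: 0 = kernel,
# 1 = user) and an Interrupt Enable bit (IE). Bit positions are invented.
KU_BIT = 0b01   # Kernel/User bit
IE_BIT = 0b10   # Interrupt Enable bit

def take_exception(status):
    """On exception/interrupt: clear IE and KU, i.e. enter kernel mode."""
    return status & ~(KU_BIT | IE_BIT)

def write_page_table(status):
    if status & KU_BIT:                # KU = 1 means user mode
        raise PermissionError("page table writable only in kernel mode")
    return "page table updated"

user_status = KU_BIT | IE_BIT          # user mode, interrupts enabled
kernel_status = take_exception(user_status)
print(write_page_table(kernel_status))  # page table updated
```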
28. Four Questions for the Memory Hierarchy
- Q1: Where can a block be placed in the upper level? (Block placement)
- Q2: How is a block found if it is in the upper level? (Block identification)
- Q3: Which block should be replaced on a miss? (Block replacement)
- Q4: What happens on a write? (Write strategy)
29. Q1: Where Can a Block Be Placed in the Upper Level?
- Example: block 12 placed in an 8-block cache
- Fully associative, direct mapped, or 2-way set associative
- Set-associative mapping: set = block number mod number of sets
[Figure: three 8-block caches (block numbers 0-7); the 2-way set-associative cache is divided into sets 0-3.]
- Fully associative: block 12 can go anywhere
- Direct mapped: block 12 can go only into block 4 (12 mod 8)
- Set associative: block 12 can go anywhere in set 0 (12 mod 4)
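The three placements above fall out of one formula once associativity is a parameter. This helper (names are my own) computes the candidate slots for any associativity, with direct mapped and fully associative as the two extremes:

```python
# Candidate cache slots for a block under varying associativity,
# using the slide's rule: set = block number mod number of sets.
BLOCKS = 8

def candidate_slots(block_no, assoc):
    """assoc = 1 (direct mapped) ... BLOCKS (fully associative)."""
    num_sets = BLOCKS // assoc
    s = block_no % num_sets
    return list(range(s * assoc, s * assoc + assoc))   # the slots in set s

print(candidate_slots(12, 1))   # [4]                  direct mapped: 12 mod 8
print(candidate_slots(12, 2))   # [0, 1]               2-way: set 0 = 12 mod 4
print(candidate_slots(12, 8))   # [0, 1, ..., 7]       fully associative
```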
30. Q2: How Is a Block Found in the Upper Level?
[Figure: the address divided into Tag, Index (set select), and Block Offset (data select).]
- Direct indexing (using index and block offset), tag compares, or a combination
- Increasing associativity shrinks the index, expands the tag
31. Q3: Which Block Is Replaced on a Miss?
- Easy for Direct Mapped
- Set Associative or Fully Associative:
- Random
- LRU (Least Recently Used)
- Miss rates (%):

  Associativity:   2-way         4-way         8-way
  Size             LRU    Ran    LRU    Ran    LRU    Ran
  16 KB            5.2    5.7    4.7    5.3    4.4    5.0
  64 KB            1.9    2.0    1.5    1.7    1.4    1.5
  256 KB           1.15   1.17   1.13   1.13   1.12   1.12
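A minimal LRU policy for one fully associative set can be sketched with an ordered dictionary (this tracks recency the way software would; hardware uses approximations):

```python
# LRU replacement for one set: an OrderedDict keeps blocks oldest-first,
# so the least recently used victim is always at the front.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # block number -> None, oldest first

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)      # refresh recency on a hit
            return "hit"
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)     # evict least recently used
        self.blocks[block] = None
        return "miss"

s = LRUSet(ways=2)
print(s.access(1), s.access(2), s.access(1), s.access(3))  # miss miss hit miss
print(list(s.blocks))   # [1, 3] — block 2 was the LRU victim
```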
32. Q4: What to Do on a Write Hit?
- Write-through:
- update the word in the cache block and the corresponding word in memory
- Write-back:
- update the word in the cache block
- allow the memory word to be stale
- → add a dirty bit to each line, indicating that memory must be updated when the block is replaced
- → OS flushes the cache before I/O!
- Performance trade-offs?
- WT: read misses cannot result in writes
- WB: repeated writes to the same block cause no extra memory writes
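The trade-off can be made concrete by counting memory writes under each policy. This is a bare sketch (the "cache" is just a dict, and we assume every write hits):

```python
# Write-through vs. write-back on repeated write hits to one word,
# counting how many writes actually reach memory under each policy.
class Cache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.data, self.dirty = {}, set()
        self.mem_writes = 0

    def write(self, addr, value):         # assume the block is present (hit)
        self.data[addr] = value
        if self.write_back:
            self.dirty.add(addr)          # defer: just mark the line dirty
        else:
            self.mem_writes += 1          # write-through: update memory now

    def evict(self, addr):
        if addr in self.dirty:
            self.mem_writes += 1          # write the dirty line back once
            self.dirty.discard(addr)

wt, wb = Cache(write_back=False), Cache(write_back=True)
for c in (wt, wb):
    for v in range(5):
        c.write(0x40, v)                  # 5 repeated writes to one word
    c.evict(0x40)
print(wt.mem_writes, wb.mem_writes)       # 5 1 — WB coalesces repeated writes
```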
33. Virtual Memory Overview
- Let's say we're fetching some data:
- Check the TLB (input: VPN, output: PPN)
- hit: fetch translation
- miss: check the page table (in memory)
- Page table hit: fetch translation
- Page table miss: page fault; fetch the page from disk to memory, return the translation to the TLB
- Check the cache (input: PPN, output: data)
- hit: return value
- miss: fetch value from memory
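The steps above can be sketched end-to-end. Every structure here is a toy dict standing in for real hardware/OS state, and the physical-page allocator is deliberately naive:

```python
# End-to-end sketch of the slide's flow: TLB -> page table -> (page fault
# brings the page from disk) -> physically addressed cache.
tlb, page_table, cache, memory = {}, {}, {}, {}
disk = {vpn: f"page-{vpn}" for vpn in range(16)}   # backing store (toy data)
events, next_ppn = [], 20                          # next free physical page

def load(vpn):
    global next_ppn
    if vpn in tlb:                            # 1. TLB lookup
        ppn = tlb[vpn]; events.append("tlb hit")
    elif vpn in page_table:                   # 2. page table walk in memory
        ppn = tlb[vpn] = page_table[vpn]; events.append("tlb miss")
    else:                                     # 3. page fault: fetch from disk
        ppn, next_ppn = next_ppn, next_ppn + 1
        memory[ppn] = disk[vpn]
        page_table[vpn] = tlb[vpn] = ppn
        events.append("page fault")
    if ppn in cache:                          # 4. physically addressed cache
        events.append("cache hit")
    else:
        events.append("cache miss"); cache[ppn] = memory[ppn]
    return cache[ppn]

load(5); load(5)
print(events)   # ['page fault', 'cache miss', 'tlb hit', 'cache hit']
```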
34. Paging/Virtual Memory with TLB
[Figure: two user virtual address spaces (User A and User B, each with Code, Static, Heap, and Stack segments from address 0 upward) mapped through page tables/TLB into a single 64 MB physical memory.]
35. Virtual Memory Overview
- TLB is usually small, typically 128-256 entries
- Block size: 1-2 page table entries (4-8 B each)
- Hit time: 0.5-1 cycle
- Miss penalty: 10-30 cycles
- Miss rate: 0.01-1%
[Figure: the same TLB/cache datapath as before: VA → TLB Lookup → (hit: PA to Cache; miss: Translation supplies PA) → data from Cache or Main Memory.]
36. Address Translation: 3 Concept Tests
[Figure: a TLB holding Virtual Page Number (V.P.N.) → Physical Page Number (P.P.N.) entries, used for the concept-test questions.]
37. Cache/VM/TLB Summary (1/3)
- The Principle of Locality:
- Programs access a relatively small portion of the address space at any instant of time
- Temporal Locality: locality in time
- Spatial Locality: locality in space
- Caches, TLBs, and Virtual Memory are all understood by examining how they deal with 4 questions: 1) Where can a block be placed? 2) How is a block found? 3) Which block is replaced on a miss? 4) How are writes handled?
38. Cache/VM/TLB Summary (2/3)
- Virtual Memory allows protected sharing of memory between processes, with less swapping to disk and less fragmentation than always-swap or base/bound schemes
- 3 Problems:
- 1) Not enough memory: Spatial Locality means a small Working Set of pages is OK
- 2) A TLB reduces the performance cost of VM
- 3) Need a more compact representation to reduce the memory cost of a simple 1-level page table, especially for 64-bit addresses
39. Cache/VM/TLB Summary (3/3)
- Virtual memory was controversial at the time: can SW automatically manage 64 KB across many programs?
- 1000X DRAM growth removed the controversy
- Today VM allows many processes to share a single memory without having to swap all processes to disk; VM's role in protection is today more important than its role in the memory hierarchy
- Today CPU time is a function of (ops, cache misses) vs. just f(ops). What does this mean for compilers, data structures, and algorithms?
40. Reading Quiz
- 1. The page table is a memory data structure that maps virtual pages to physical pages. It would seem, then, that every virtual memory access would result in two physical memory accesses: one to translate into a physical address via the page table, and a second to get to the actual data. How do operating systems and processors avoid such a high overhead for virtual memory?
- 2. In addition to having a valid bit and a dirty bit, some page tables have a reference bit. If the bit is one, it means that the page has been accessed since the last time the operating system set the bit to zero. What is the purpose of such a bit? Can you think of a way to get a similar effect without a reference bit?
41. Reading Quiz
- 1. The standard four questions for memory hierarchies emphasize the similarities between caches and virtual memory. Some combinations of the options that make sense for caches would be silly in a virtual memory system. Which combinations would you never expect to see in a real system? Why?
- 2. Is a TLB a cache? If so, what is it a cache of? Is it OK to use either write-through or write-back? Why?
42. Bonus (remaining slides could appear after slide 43): Impact of Caches?
- 1960-1985: Speed = f(no. of operations)
- 1990s:
- Pipelined execution, fast clock rate
- Out-of-order execution
- Superscalar
- 2001: Speed = f(non-cached memory accesses)?
43. Quicksort vs. Radix Sort as We Vary the Number of Keys: Instructions
[Figure: instructions/key vs. set size in keys, for radix sort and quicksort.]
44. Quicksort vs. Radix Sort as We Vary the Number of Keys: Instructions and Time
[Figure: time/key and instructions/key vs. set size in keys, for radix sort and quicksort.]
45. Quicksort vs. Radix Sort as We Vary the Number of Keys: Cache Misses
- What is the proper approach to fast algorithms?
[Figure: cache misses/key vs. set size in keys, for radix sort and quicksort.]
46. Bonus Slide: Kernel/User Mode
- Generally restrict device access and page table access to the OS
- HOW?
- Add a mode bit to the machine: K/U
- Only allow SW in kernel mode to access device registers and the page table
- If user programs could access I/O devices and page tables directly?
- they could destroy each other's data, ...
- they might break the devices, ...