Title: Cache Controller, Translation Look-Aside Buffers, Virtual vs. Physical Cache Design
Slide 1: Cache Controller, Translation Look-Aside Buffers, Virtual vs. Physical Cache Design
- ECE 411 - Fall 2009
- Lecture 8
Slide 2: Cache Controller FSM
- Could partition into separate states to reduce clock cycle time
Slide 3: Cache Performance Example
- Given
  - I-cache miss rate = 2%
  - D-cache miss rate = 4%
  - Miss penalty = 100 cycles
  - Base CPI (ideal cache) = 2
  - Loads and stores are 36% of instructions
- Miss cycles per instruction
  - I-cache: 0.02 × 100 = 2
  - D-cache: 0.36 × 0.04 × 100 = 1.44
- Actual CPI = 2 + 2 + 1.44 = 5.44
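The slide's CPI arithmetic can be reproduced with a short script (the helper name is just for illustration):

```python
# Effective CPI = base CPI + I-cache stall cycles + D-cache stall cycles,
# following the slide's example numbers.
def effective_cpi(base_cpi, i_miss_rate, d_miss_rate, mem_frac, miss_penalty):
    i_stall = i_miss_rate * miss_penalty              # every instruction is fetched
    d_stall = mem_frac * d_miss_rate * miss_penalty   # only loads/stores access data
    return base_cpi + i_stall + d_stall

cpi = effective_cpi(base_cpi=2, i_miss_rate=0.02, d_miss_rate=0.04,
                    mem_frac=0.36, miss_penalty=100)
print(round(cpi, 2))  # 5.44
```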
Slide 4: Average Access Time
- Hit time is also important for performance
- Average memory access time (AMAT)
  - AMAT = Hit time + Miss rate × Miss penalty
- Example
  - CPU with hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
  - AMAT = 1 + 0.05 × 20 = 2 cycles
  - 2 cycles per instruction
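The AMAT formula is easy to sanity-check in code:

```python
# AMAT = hit time + miss rate * miss penalty, as defined on the slide.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

print(amat(1, 0.05, 20))  # 2.0
```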
Slide 5: Virtual to Physical Address Translation
- Paged virtual memory
  - Fixed-size pages (e.g., 4 KB)
  - Virtual address is subdivided into a virtual page number and a page offset
  - Virtual page number is translated into a physical page number
  - Page offset remains unchanged
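For 4 KB pages, the VPN/offset split is just a shift and a mask. A minimal sketch (names are mine):

```python
PAGE_SIZE = 4096                            # 4 KB pages, as on the slide
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # log2(4096) = 12

def split_vaddr(vaddr):
    """Split a virtual address into (virtual page number, page offset)."""
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & (PAGE_SIZE - 1)        # low 12 bits pass through unchanged
    return vpn, offset

vpn, off = split_vaddr(0x402ABC)
print(hex(vpn), hex(off))  # 0x402 0xabc
```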
Slide 6: Page Tables
- A page table stores placement information
  - An array of page table entries (PTEs), indexed by virtual page number
  - A page table register in the CPU points to the page table in physical memory
- If the page is present in memory
  - The PTE stores the physical page number
  - Plus other status bits (referenced, dirty, ...)
- If the page is not present
  - The PTE can refer to a location in swap space on disk
Slide 7: Translation Using a Page Table
- There is also protection information, etc., in each entry.
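A single-level walk like the one pictured on these slides might be sketched as follows (the dict-based PTE layout is an assumption for illustration, not the hardware format):

```python
PAGE_BITS = 12  # 4 KB pages

def translate(page_table, vaddr):
    """Translate a virtual address via a single-level page table."""
    vpn = vaddr >> PAGE_BITS
    pte = page_table[vpn]                    # table is indexed by VPN
    if not pte['present']:
        # Page fault: the OS would fetch the page from swap and retry.
        raise LookupError('page fault')
    ppn = pte['ppn']
    # Physical address = physical page number + unchanged page offset.
    return (ppn << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))

pt = {0x402: {'present': True, 'ppn': 0x77}}
print(hex(translate(pt, 0x402ABC)))  # 0x77abc
```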
Slide 8: Mapping Pages to Physical Memory and Disk Storage
Slide 9: Fast Translation Using a TLB
- Address translation in general requires extra memory references
  - One to access the PTE
  - Then the actual memory access
  - Real translation systems do even more accesses due to 2-level translation
- But access to page tables has good locality
  - Many accesses are made to a small number of pages
- So use a fast cache of PTEs within the CPU
  - Called a Translation Look-aside Buffer (TLB)
  - Typical: 16-512 PTEs, 0.5-1 cycle for a hit, 10-100 cycles for a miss, 0.01%-1% miss rate
  - Misses can be handled by hardware or software
Slide 10: Fast Translation Using a TLB
- We will work through a 2-way set-associative TLB design on the board.
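As a rough software model of the board example, a 2-way set-associative TLB might look like this (the LRU replacement policy is my assumption; the slides don't specify one):

```python
class TwoWayTLB:
    """Toy 2-way set-associative TLB mapping VPN -> PPN."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        # Each set holds up to two (vpn, ppn) pairs; most recently used last.
        self.sets = [[] for _ in range(num_sets)]

    def lookup(self, vpn):
        ways = self.sets[vpn % self.num_sets]   # low VPN bits select the set
        for i, (v, p) in enumerate(ways):
            if v == vpn:
                ways.append(ways.pop(i))        # refresh LRU order
                return p                        # TLB hit
        return None                             # TLB miss: walk the page table

    def insert(self, vpn, ppn):
        ways = self.sets[vpn % self.num_sets]
        if len(ways) == 2:
            ways.pop(0)                         # evict the least recently used way
        ways.append((vpn, ppn))

tlb = TwoWayTLB(num_sets=8)
tlb.insert(0x402, 0x77)
print(hex(tlb.lookup(0x402)))  # 0x77
```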
Slide 11: TLB Misses
- If the page is in memory
  - Load the PTE from memory and retry
  - Could be handled in hardware
    - Can get complex for more complicated page table structures
  - Or in software
    - Raise a special exception, with an optimized handler
- If the page is not in memory (page fault)
  - The OS handles fetching the page and updating the page table
  - Then restart the faulting instruction
Slide 12: TLB and Cache Interaction
- The MAR holds the virtual address
- If the cache tag uses the physical address
  - Need to translate before the cache lookup
- Alternative: use a virtual address tag
  - Complications due to aliasing
Slide 13: Memory Protection
- Different processes can share parts of their virtual address spaces
  - But need to protect against errant access
  - Requires OS assistance
- Hardware support for OS protection
  - Privileged supervisor mode (aka kernel mode)
  - Privileged instructions
  - Page tables and other state information accessible only in supervisor mode
  - System call exception (e.g., syscall in MIPS)
Slide 14: Block Placement
- Determined by associativity
  - Direct mapped (1-way associative): one choice for placement
  - n-way set associative: n choices within a set
  - Fully associative: any location
- Higher associativity reduces miss rate
  - But increases complexity, cost, and access time
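The placement choices above can be illustrated by computing the candidate frames for a block under each associativity (an illustrative helper, not from the slides):

```python
# For a cache with num_frames total frames organized into sets of size
# assoc, return the frame indices where a given block may be placed.
def candidate_frames(block_addr, num_frames, assoc):
    num_sets = num_frames // assoc
    set_idx = block_addr % num_sets
    # Set i owns the contiguous frames [i * assoc, (i + 1) * assoc).
    return list(range(set_idx * assoc, set_idx * assoc + assoc))

print(candidate_frames(12, 8, 1))  # direct mapped: [4]
print(candidate_frames(12, 8, 2))  # 2-way: [0, 1]
print(candidate_frames(12, 8, 8))  # fully associative: [0, 1, 2, 3, 4, 5, 6, 7]
```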
Slide 15: 2-Level TLB Organization
Slide 16: 3-Level Cache Organization
Slide 17: Previously...
- Discussed virtual memory
  - Decouples the program address space from the physical implementation of memory
- Discussed caching
  - Exploits spatial and temporal locality in instruction and data accesses
- Will discuss
  - How do caches and virtual memory interact?
Slide 18: Caches and Virtual Memory
- Do we send virtual or physical addresses to the cache?
  - Virtual: faster, because we don't have to translate
    - Issue: different programs can reference the same virtual address, which either creates a security hole or requires flushing the cache on every context switch
  - Physical: slower, but no security issue
- Actually, there are four possibilities
Slide 19: Virtually Addressed, Virtually Tagged
- Only translate the address on a cache miss
- [Diagram: the virtual address's tag/set/offset fields index the tag and data arrays directly; hit/miss is determined from the virtual tag, with no TLB on the lookup path]
Slide 20: Physically Addressed, Physically Tagged
- [Diagram: the virtual address first goes through the TLB; the resulting physical address's tag/set/offset fields then index the tag and data arrays]
Slide 21: Physically Addressed, Virtually Tagged
- Worst of both worlds, pretty much never used
- [Diagram: the TLB translates the address before the cache lookup, the physical set/offset fields index the tag and data arrays, but the virtual tag is used for hit/miss detection]
Slide 22: Virtually Addressed, Physically Tagged
- Speed of using the virtual address for cache lookup, security of using the physical address for hit/miss detection
- Very common in real systems
- [Diagram: the virtual address's set/offset fields index the tag and data arrays while the TLB translates in parallel; the resulting physical tag is compared for hit/miss]
Slide 23: Virtually Addressed, Physically Tagged Caches
- Issue: want the set bits of an address to be the same in both the virtual and the physical address
  - Might have multiple virtual addresses that map onto the same physical address (example: sharing data between programs)
  - Only the offset bits of the virtual address are guaranteed not to change when we translate to the physical address
Slide 24: Virtually Addressed, Physically Tagged Caches
- Implication: log2(# of sets × cache line length) must be ≤ log2(page size)
  - Length of the (set + offset) fields in the cache must be ≤ the length of the page-offset field
  - Each way in the cache must be ≤ a page in capacity
  - Sometimes leads designers to select very-associative caches in order to get the capacity they want
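The capacity constraint can be checked mechanically (the function name is mine). Note how the same total cache size passes at high associativity but fails at low associativity, which is exactly why designers reach for associativity here:

```python
# A VIPT cache is alias-free when each way fits in one page,
# i.e., num_sets * line_bytes <= page_bytes.
def vipt_ok(num_sets, line_bytes, page_bytes):
    return num_sets * line_bytes <= page_bytes

# 32 KB cache, 64 B lines, 4 KB pages:
print(vipt_ok(64, 64, 4096))   # 8-way  -> 64 sets,  way = 4 KB:  True
print(vipt_ok(256, 64, 4096))  # 2-way  -> 256 sets, way = 16 KB: False
```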
Slide 25: Putting It All Together