1
Cache Controller, Translation Look-Aside
Buffers, Virtual vs. Physical Cache Design
  • ECE 411 - Fall 2009
  • Lecture 8

2
Cache Controller FSM
Could partition into separate states to reduce
clock cycle time
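A hedged sketch of one common organization (the four-state Idle / Compare-Tag / Write-Back / Allocate controller; the lecture's exact FSM may differ):

    # States of a simple write-back, write-allocate cache controller.
    IDLE, COMPARE_TAG, WRITE_BACK, ALLOCATE = range(4)

    def next_state(state, valid_request, hit, dirty, mem_ready):
        if state == IDLE:
            return COMPARE_TAG if valid_request else IDLE
        if state == COMPARE_TAG:
            if hit:
                return IDLE                        # hit: request completes
            return WRITE_BACK if dirty else ALLOCATE
        if state == WRITE_BACK:                    # evict the dirty victim
            return ALLOCATE if mem_ready else WRITE_BACK
        if state == ALLOCATE:                      # fetch the missing block
            return COMPARE_TAG if mem_ready else ALLOCATE

Splitting COMPARE_TAG into more states shortens the critical path, at the cost of more cycles per access.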
3
Cache Performance Example
  • Given
  • I-cache miss rate: 2%
  • D-cache miss rate: 4%
  • Miss penalty: 100 cycles
  • Base CPI (ideal cache): 2
  • Loads and stores are 36% of instructions
  • Miss cycles per instruction (see the sketch below)
  • I-cache: 0.02 × 100 = 2
  • D-cache: 0.36 × 0.04 × 100 = 1.44
  • Actual CPI = 2 + 2 + 1.44 = 5.44
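A minimal sketch of this calculation in Python (all numbers from the slide):

    # Stall cycles per instruction from each cache, added to the base CPI.
    base_cpi = 2.0
    miss_penalty = 100                           # cycles
    icache_stalls = 0.02 * miss_penalty          # every instruction is fetched
    dcache_stalls = 0.36 * 0.04 * miss_penalty   # only loads/stores touch the D-cache
    actual_cpi = base_cpi + icache_stalls + dcache_stalls
    print(actual_cpi)                            # 5.44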

4
Average Access Time
  • Hit time is also important for performance
  • Average memory access time (AMAT)
  • AMAT = Hit time + Miss rate × Miss penalty
  • Example
  • CPU with hit time 1 cycle, miss penalty 20
    cycles, I-cache miss rate 5%
  • AMAT = 1 + 0.05 × 20 = 2 cycles
  • 2 cycles per instruction (one I-fetch per instruction)
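The same formula as a tiny Python sketch (values from the slide):

    def amat(hit_time, miss_rate, miss_penalty):
        # Average memory access time in cycles.
        return hit_time + miss_rate * miss_penalty

    print(amat(1, 0.05, 20))   # 2.0 cycles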

5
Virtual to Physical Address Translation
  • Paged Virtual Memory - Fixed-size pages (e.g.,
    4K)
  • Virtual Address is subdivided into Virtual Page
    Number and Page Offset
  • Virtual Page Number is translated into Physical
    Page Number
  • Page Offset remains unchanged
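As a concrete illustration, a sketch assuming 4 KB pages (so a 12-bit offset):

    PAGE_OFFSET_BITS = 12                            # 4 KB pages

    def split_va(va):
        vpn = va >> PAGE_OFFSET_BITS                 # translated
        offset = va & ((1 << PAGE_OFFSET_BITS) - 1)  # passes through unchanged
        return vpn, offset

    print([hex(x) for x in split_va(0x12345ABC)])    # ['0x12345', '0xabc']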

6
Page Tables
  • A Page Table stores placement information
  • An array of page table entries, indexed by
    virtual page number
  • Page table register in CPU points to page table
    in physical memory
  • If page is present in memory
  • PTE stores the physical page number
  • Plus other status bits (referenced, dirty, etc.)
  • If page is not present
  • PTE can refer to location in swap space on disk
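A minimal software model of that lookup (hypothetical PTE fields, assuming 4 KB pages):

    PAGE_OFFSET_BITS = 12

    def translate(page_table, va):
        vpn = va >> PAGE_OFFSET_BITS        # index into the page table
        pte = page_table[vpn]
        if not pte["present"]:
            raise LookupError("page fault: PTE points into swap space")
        offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
        return (pte["ppn"] << PAGE_OFFSET_BITS) | offset  # PPN replaces VPN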

7
Translation Using a Page Table
There is also protection information, etc. in
each entry.
8
Mapping Pages to Physical Memory and Disk Storage
9
Fast Translation Using a TLB
  • Address translation in general requires extra
    memory references
  • One to access the PTE
  • Then the actual memory access
  • Real translation systems do even more accesses
    due to 2-level translation.
  • But access to page tables has good locality
  • Many accesses are made to a small number of pages
  • So use a fast cache of PTEs within the CPU
  • Called a Translation Look-aside Buffer (TLB)
  • Typical: 16–512 PTEs, 0.5–1 cycle for hit, 10–100
    cycles for miss, 0.01%–1% miss rate
  • Misses could be handled by hardware or software
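A toy model of the idea, with the TLB as a small cache of translations keyed by VPN (a sketch; real TLBs are fixed-capacity associative hardware):

    tlb = {}   # vpn -> ppn; hardware would cap this at, e.g., 16-512 entries

    def tlb_translate(page_table, vpn):
        if vpn in tlb:
            return tlb[vpn]                # TLB hit: no page-table access
        ppn = page_table[vpn]["ppn"]       # TLB miss: read the PTE from memory
        tlb[vpn] = ppn                     # refill, so the retry hits
        return ppn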

10
Fast Translation Using a TLB
We will work through a 2-way set associative TLB
design on the board.
11
TLB Misses
  • If page is in memory
  • Load the PTE from memory and retry
  • Could be handled in hardware
  • Can get complex for more complicated page table
    structures
  • Or in software
  • Raise a special exception, with optimized handler
  • If page is not in memory (page fault)
  • OS handles fetching the page and updating the
    page table
  • Then restart the faulting instruction
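That flow, as a hedged sketch of a software miss handler (os_page_fault is a hypothetical stand-in for the OS's handler):

    def os_page_fault(vpn):
        # Hypothetical: fetch the page from disk, update the PTE,
        # then restart the faulting instruction.
        raise NotImplementedError

    def handle_tlb_miss(page_table, tlb, vpn):
        pte = page_table[vpn]
        if pte["present"]:
            tlb[vpn] = pte["ppn"]   # load the PTE into the TLB and retry
        else:
            os_page_fault(vpn)      # page fault: the OS fetches the page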

12
TLB and Cache Interaction
  • MAR holds virtual address
  • If cache tag uses physical address
  • Need to translate before cache lookup
  • Alternative: use a virtual address tag
  • Complications due to aliasing

13
Memory Protection
  • Different processes can share parts of their
    virtual address spaces
  • But need to protect against errant access
  • Requires OS assistance
  • Hardware support for OS protection
  • Privileged supervisor mode (aka kernel mode)
  • Privileged instructions
  • Page tables and other state information only
    accessible in supervisor mode
  • System call exception (e.g., syscall in MIPS)

14
Block Placement
  • Determined by associativity
  • Direct mapped (1-way associative)
  • One choice for placement
  • n-way set associative
  • n choices within a set
  • Fully associative
  • Any location
  • Higher associativity reduces miss rate
  • Increases complexity, cost, and access time
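A sketch of how placement follows from associativity (hypothetical geometry):

    def candidate_slots(block_addr, num_sets, ways):
        # Direct mapped: ways == 1; fully associative: num_sets == 1.
        set_index = block_addr % num_sets
        return [(set_index, way) for way in range(ways)]

    print(candidate_slots(0x1234, num_sets=64, ways=1))  # one choice
    print(candidate_slots(0x1234, num_sets=16, ways=4))  # four choices in a set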

15
2-Level TLB Organization
16
3-Level Cache Organization
17
Previously...
  • Discussed Virtual Memory
  • Decouples program address space from the physical
    implementation of memory
  • Discussed Caching
  • Exploits spatial and temporal locality in
    instruction and data accesses
  • Will Discuss
  • How do caches and virtual memory interact?

18
Caches and Virtual Memory
  • Do we send virtual or physical addresses to the
    cache?
  • Virtual → faster, because we don't have to translate first
  • Issue: different programs can reference the same
    virtual address, which either creates a security hole or
    requires flushing the cache every time you
    context switch
  • Physical → slower, but no security issue
  • Actually, there are four possibilities

19
Virtually Addressed, Virtually Tagged
Only translate the address on a cache miss.
(Diagram: the virtual address's Tag / Set / Offset fields index the Tag Array and Data Array directly; hit detection compares virtual tags, with no TLB on the hit path.)
20
Physically Addressed, Physically Tagged
(Diagram: the virtual address first goes through the TLB; the resulting physical address's Tag / Set / Offset fields then index the Tag Array and Data Array for hit detection.)
21
Physically Addressed, Virtually Tagged
Worst of both worlds; pretty much never used.
(Diagram: the virtual address is translated by the TLB and the physical address's Set / Offset drive the lookup, yet hit detection still compares virtual tags.)
22
Virtually Addressed, Physically Tagged
Speed of using virtual address for cache lookup,
security of using physical address for hit/miss
detection. Very common in real systems.
(Diagram: the virtual address's Set / Offset fields index the Tag Array and Data Array while the TLB translates in parallel; the resulting physical Tag is compared for hit/miss detection.)
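A sketch of that parallel lookup (hypothetical geometry: 4 KB pages, 64-byte lines, 64 sets, so the index lies entirely within the page offset):

    PAGE_OFFSET_BITS = 12
    LINE_BITS, SET_BITS = 6, 6   # LINE_BITS + SET_BITS <= PAGE_OFFSET_BITS

    def vipt_lookup(cache, tlb, va):
        # The index bits sit inside the page offset, so they are identical
        # in the virtual and physical address; hardware reads the set while
        # the TLB translates in parallel.
        set_index = (va >> LINE_BITS) & ((1 << SET_BITS) - 1)
        ppn = tlb[va >> PAGE_OFFSET_BITS]
        # With SET_BITS + LINE_BITS == PAGE_OFFSET_BITS, the physical tag
        # is exactly the physical page number.
        for stored_tag, data in cache[set_index]:
            if stored_tag == ppn:
                return data        # hit, detected with the physical tag
        return None                # miss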
23
Virtually Addressed, Physically Tagged Caches
  • Issue: we want the set bits of an address to be the
    same in both the virtual and physical address
  • We might have multiple virtual addresses that map
    onto the same physical address (example: sharing
    data between programs)
  • Only the offset bits of the virtual address are
    guaranteed not to change when we translate to a
    physical address

24
Virtually Addressed, Physically Tagged Caches
  • Implication: log2(# of sets × cache line length)
    must be ≤ log2(page size)
  • Length of the (set + offset) fields in the cache must be ≤
    length of the offset field in each page
  • Each way in the cache must be ≤ a page in
    capacity
  • Sometimes leads designers to select
    highly-associative caches in order to get the
    capacity they want
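For instance, a sketch with assumed numbers: with 4 KB pages each way is capped at one page, so capacity only grows with associativity:

    PAGE_SIZE = 4096   # bytes; each way must fit within a page

    def min_ways(cache_capacity, page_size=PAGE_SIZE):
        return (cache_capacity + page_size - 1) // page_size

    print(min_ways(32 * 1024))   # a 32 KB cache needs at least 8 ways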

25
Putting it all together