The Memory Hierarchy II (CPSC 321)



1
The Memory Hierarchy II CPSC 321
  • Andreas Klappenecker

2
Today's Menu
  • Cache
  • Virtual Memory
  • Translation Lookaside Buffer

3
Caches
  • Why? How?

4
Memory
  • Users want large and fast memories
  • SRAM is too expensive for main memory
  • DRAM is too slow for many purposes
  • Compromise: build a memory hierarchy

5
Locality
  • If an item is referenced, then
  • it will be referenced again soon
    (temporal locality)
  • nearby data will be referenced soon
    (spatial locality)
  • Why does code have locality? (see the sketch below)
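
A minimal C sketch (not from the slides) of why typical code
exhibits both kinds of locality:

    #include <stdio.h>

    int main(void) {
        int a[1024] = {0};
        int sum = 0;
        /* i and sum are touched on every iteration (temporal
           locality); a[] is traversed in order, so consecutive
           accesses fall into the same or neighboring cache
           blocks (spatial locality). */
        for (int i = 0; i < 1024; i++)
            sum += a[i];
        printf("%d\n", sum);
        return 0;
    }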

6
Direct Mapped Cache
  • Map the address modulo the number of blocks in
    the cache, x -> x mod B

7
Direct Mapped Cache
The index is determined by address mod 1024
  • Cache with 1024 = 2^10 words
  • The tag from the cache is compared against the upper
    portion of the address (field split sketched below)
  • If tag = upper 20 bits and the valid bit is set, then
    we have a cache hit; otherwise it is a cache miss
  • What kind of locality are we taking advantage of?
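
A minimal C sketch, under this slide's parameters (1024 one-word
blocks, 4-byte words, 32-bit addresses), of how an address splits
into tag, index, and byte offset; the variable names are mine:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t addr = 0x12345678;
        uint32_t byte_offset = addr & 0x3;         /* bits  1..0            */
        uint32_t index = (addr >> 2) & 0x3FF;      /* bits 11..2:
                                                      word address mod 1024 */
        uint32_t tag = addr >> 12;                 /* bits 31..12:
                                                      upper 20 bits         */
        printf("tag=%u index=%u offset=%u\n", tag, index, byte_offset);
        /* Hit if cache[index].valid && cache[index].tag == tag. */
        return 0;
    }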

8
Direct Mapped Cache
  • Taking advantage of spatial locality

9
Bits in a Cache
  • How many total bits are required for a
    direct-mapped cache with 16 KB of data, 4 word
    blocks, assuming a 32 bit address?
  • 16 KB = 4K words = 2^12 words
  • Block size of 4 words => 2^10 blocks
  • Each block has 4 x 32 = 128 bits of data, plus tag
    and valid bit
  • tag + valid bit = (32 - 10 - 2 - 2) + 1 = 19
  • Total cache size = 2^10 x (128 + 19) = 2^10 x 147 bits
  • Therefore, 147 Kbits are needed for the cache
    (recomputed below)
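
The same arithmetic, recomputed in a small C sketch
(illustrative only):

    #include <stdio.h>

    /* 16 KB of data, 4-word blocks, 32-bit addresses. */
    int main(void) {
        int words       = 16 * 1024 / 4;   /* 2^12 words of data         */
        int blocks      = words / 4;       /* 2^10 blocks                */
        int data_bits   = 4 * 32;          /* 128 data bits per block    */
        int index_bits  = 10;              /* log2(blocks)               */
        int offset_bits = 2 + 2;           /* block offset + byte offset */
        int tag_bits    = 32 - index_bits - offset_bits;  /* 18          */
        int per_block   = data_bits + tag_bits + 1;       /* +1 valid = 147 */
        printf("total = %d Kbits\n", blocks * per_block / 1024);  /* 147 */
        return 0;
    }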

10
Cache Block Mapping
  • Direct mapped cache
  • a block goes in exactly one place in the cache
  • Fully associative cache
  • a block can go anywhere in the cache
  • it is difficult to find a block
  • parallel comparison speeds up the search
  • Set associative cache
  • a block can go to a (small) number of places
  • a compromise between the two extremes above

11
Cache Types
12
Set Associative Caches
  • Each block maps to a unique set
  • the block can be placed into any element of that
    set
  • Position is given by
  • (Block number) modulo (# of sets in cache)
  • If the sets contain n elements, then the cache is
    called n-way set associative (see the sketch below)
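
A minimal C sketch of the set-index computation; the block count
and associativity are assumed values, not from the slides:

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_BLOCKS 1024
    #define WAYS       4
    #define NUM_SETS   (NUM_BLOCKS / WAYS)   /* 256 sets */

    int main(void) {
        uint32_t block_number = 5000;
        /* A block maps to exactly one set, but may occupy
           any of the WAYS entries within that set. */
        uint32_t set = block_number % NUM_SETS;
        printf("block %u -> set %u (any of %d ways)\n",
               block_number, set, WAYS);
        return 0;
    }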

17
Summary: Where can a Block be Placed?
18
Summary: How is a Block Found?
19
Virtual Memory
20
Virtual Memory
  • Processor generates virtual addresses
  • Memory is accessed using physical addresses
  • Virtual and physical memory are broken into blocks
    of memory, called pages
  • A virtual page may be
  • absent from main memory, residing on the disk
  • or may be mapped to a physical page

21
Virtual Memory
  • Main memory can act as a cache for the secondary
    storage (disk)
  • Virtual address generated by processor (left)
  • Address translation (middle)
  • Physical addresses (right)

22
Pages: virtual memory blocks
  • Page faults: if data is not in memory, retrieve
    it from disk
  • huge miss penalty, thus pages should be fairly
    large (e.g., 4 KB)
  • reducing page faults is important (LRU is worth
    the price)
  • faults can be handled in software instead of
    hardware
  • write-through takes too long, so we use
    write-back
  • Example: page size 2^12 = 4 KB, 2^18 physical pages
    (address split sketched below)
  • main memory ≤ 1 GB, virtual memory ≤ 4 GB
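
A minimal C sketch of the virtual-address split under this
slide's example parameters (4 KB pages, so 12 offset bits);
the names are mine:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t vaddr  = 0xDEADBEEF;
        uint32_t vpn    = vaddr >> 12;    /* virtual page number:
                                             2^20 pages cover 4 GB */
        uint32_t offset = vaddr & 0xFFF;  /* offset within the page */
        printf("vpn=0x%05X offset=0x%03X\n", vpn, offset);
        return 0;
    }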

23
Page Faults
  • Incredibly high penalty for a page fault
  • Reduce the number of page faults by optimizing
    page placement
  • Use fully associative placement
  • a full search of pages is impractical
  • pages are located by a full table that indexes
    the memory, called the page table
  • the page table resides in main memory
24
Page Tables
The page table maps each page either to a page in
main memory or to a page stored on disk (a lookup
sketch follows below)
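
A hypothetical C sketch of a page-table lookup; the entry layout
and names are assumptions, since real formats are machine-specific:

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        bool     valid;          /* page resident in main memory?    */
        uint32_t physical_page;  /* PPN if valid; else the page is
                                    on disk                          */
    } pte_t;

    static pte_t page_table[1 << 20];   /* one entry per virtual page */

    /* Translate a virtual address, or return -1 to signal a page
       fault that the OS must handle by fetching the page from disk. */
    int64_t translate(uint32_t vaddr) {
        pte_t pte = page_table[vaddr >> 12];
        if (!pte.valid)
            return -1;                              /* page fault */
        return ((int64_t)pte.physical_page << 12) | (vaddr & 0xFFF);
    }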
25
Page Tables

26
Making Memory Access Fast
  • Page tables slow us down
  • Memory access will take at least twice as long
  • access the page table in memory
  • access the page
  • What can we do?

Memory access is local => use a cache that keeps
track of recently used address translations,
called the translation lookaside buffer
27
Making Address Translation Fast
  • A cache for address translations: the translation
    lookaside buffer (TLB); a sketch follows below
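
A toy C sketch of a TLB consulted before the page-table walk from
the earlier sketch; the size, direct-mapped organization, and
names are assumptions:

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 64

    typedef struct {
        bool     valid;
        uint32_t vpn;   /* virtual page number acts as the tag */
        uint32_t ppn;   /* cached physical page number         */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    int64_t translate(uint32_t vaddr);   /* page-table walk (slow path) */

    int64_t translate_fast(uint32_t vaddr) {
        uint32_t vpn = vaddr >> 12;
        tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
        if (e->valid && e->vpn == vpn)       /* TLB hit: no extra access */
            return ((int64_t)e->ppn << 12) | (vaddr & 0xFFF);
        int64_t paddr = translate(vaddr);    /* TLB miss: walk the table */
        if (paddr < 0)
            return -1;                       /* page fault: OS takes over */
        e->valid = true;                     /* refill the TLB entry      */
        e->vpn   = vpn;
        e->ppn   = (uint32_t)(paddr >> 12);
        return paddr;
    }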

28
Translation Lookaside Buffer
  • Some typical values for a TLB (a worked example
    follows the list)
  • TLB size: 32-4096 entries
  • Block size: 1-2 page table entries (4-8 bytes
    each)
  • Hit time: 0.5-1 clock cycle
  • Miss penalty: 10-30 clock cycles
  • Miss rate: 0.01%-1%
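
As a hypothetical worked example using mid-range values from this
list: with a 1-cycle hit time, a 20-cycle miss penalty, and a 0.5%
miss rate, the average translation cost is about
1 + 0.005 x 20 = 1.1 cycles per access.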

29
TLBs and Caches
30
More Modern Systems
  • Very complicated memory systems

31
Some Issues
  • Processor speeds continue to increase very
    fast, much faster than either DRAM or disk
    access times
  • Design challenge: dealing with this growing
    disparity
  • Trends
  • synchronous SRAMs (provide a burst of data)
  • redesign DRAM chips to provide higher bandwidth
    or processing
  • restructure code to increase locality
  • use prefetching (make the cache visible to the ISA)