Title: The Memory Hierarchy II (CPSC 321)
1. The Memory Hierarchy II (CPSC 321)
2. Today's Menu
- Cache
- Virtual Memory
- Translation Lookaside Buffer
3. Caches
4. Memory
- Users want large and fast memories
- SRAM is too expensive for main memory
- DRAM is too slow for many purposes
- Compromise
- Build a memory hierarchy
5. Locality
- If an item is referenced, then
- it will be referenced again soon
- (temporal locality)
- nearby data will be referenced soon
- (spatial locality)
- Why does code have locality?
6. Direct Mapped Cache
- Mapping: address modulo the number of blocks in
  the cache, x -> x mod B
7. Direct Mapped Cache
- Cache with 1024 = 2^10 words
- The index is determined by address mod 1024
- The tag from the cache is compared against the upper
  portion of the address
- If tag = upper 20 bits of the address and the valid bit
  is set, then we have a cache hit; otherwise it is a
  cache miss
- What kind of locality are we taking advantage of?
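As a sketch (not part of the original slides), the address split for this cache can be written out in C; the 20-bit tag and 10-bit index follow from a 32-bit byte address, 1024 one-word blocks, and a 2-bit byte offset:

    #include <stdint.h>
    #include <stdio.h>

    /* Address split for a direct-mapped cache with 1024 one-word blocks:
       [31..12] tag (20 bits) | [11..2] index (10 bits) | [1..0] byte offset */
    int main(void) {
        uint32_t addr  = 0x12345678;            /* example byte address  */
        uint32_t index = (addr >> 2) & 0x3FF;   /* word address mod 1024 */
        uint32_t tag   = addr >> 12;            /* upper 20 bits         */
        printf("index = %u, tag = 0x%05X\n", (unsigned)index, (unsigned)tag);
        /* a hit requires cache[index].valid && cache[index].tag == tag */
        return 0;
    }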
8. Direct Mapped Cache
- Taking advantage of spatial locality
9. Bits in a Cache
- How many total bits are required for a
  direct-mapped cache with 16 KB of data and 4-word
  blocks, assuming a 32-bit address?
- 16 KB = 4K words = 2^12 words
- Block size of 4 words => 2^10 blocks
- Each block has 4 x 32 = 128 bits of data + tag + valid bit
- tag + valid bit = (32 - 10 - 2 - 2) + 1 = 19 bits
- Total cache size = 2^10 x (128 + 19) = 2^10 x 147 bits
- Therefore, 147 Kbits are needed for the cache
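The same arithmetic can be checked with a few lines of C (a sketch; the variable names are illustrative):

    #include <stdio.h>

    int main(void) {
        int data_bits  = 4 * 32;                   /* 4 words per block        */
        int index_bits = 10;                       /* 2^10 blocks              */
        int tag_bits   = 32 - index_bits - 2 - 2;  /* minus block/byte offsets */
        int per_block  = data_bits + tag_bits + 1; /* + valid bit = 147 bits   */
        printf("%d Kbits\n", (1 << index_bits) * per_block / 1024);  /* 147 */
        return 0;
    }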
10. Cache Block Mapping
- Direct mapped cache
- a block goes in exactly one place in the cache
- Fully associative cache
- a block can go anywhere in the cache
- it is harder to find a block
- tags are compared in parallel to speed up the search
- Set associative cache
- a block can go to a (small) number of places
- compromise between the two extremes above
11. Cache Types
12. Set Associative Caches
- Each block maps to a unique set
- The block can be placed into any element of that set
- Position is given by
  (block number) modulo (number of sets in the cache)
- If the sets contain n elements, then the cache is
  called n-way set associative
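A minimal sketch of the placement rule in C (the cache dimensions here are illustrative, not from the slides):

    #include <stdio.h>

    int main(void) {
        int num_blocks   = 1024;   /* total blocks in the cache */
        int n            = 4;      /* 4-way set associative     */
        int num_sets     = num_blocks / n;
        int block_number = 12345;  /* memory block being placed */
        printf("block %d -> set %d (any of %d ways)\n",
               block_number, block_number % num_sets, n);
        return 0;
    }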
17. Summary: Where Can a Block Be Placed?
18. Summary: How Is a Block Found?
19. Virtual Memory
20. Virtual Memory
- Processor generates virtual addresses
- Memory is accessed using physical addresses
- Virtual and physical memory are broken into blocks
  of memory, called pages
- A virtual page may be
- absent from main memory, residing on the disk
- or mapped to a physical page
21. Virtual Memory
- Main memory can act as a cache for the secondary
  storage (disk)
- Virtual address generated by processor (left)
- Address translation (middle)
- Physical addresses (right)
22. Pages: virtual memory blocks
- Page faults: if the data is not in memory, retrieve
  it from disk
- huge miss penalty, thus pages should be fairly
  large (e.g., 4 KB)
- reducing page faults is important (LRU is worth
  the price)
- faults can be handled in software instead of
  hardware
- write-through takes too long, so we use write-back
- Example: page size 2^12 = 4 KB, 2^18 physical pages
- main memory <= 1 GB, virtual memory <= 4 GB
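With the 4 KB page size above, a 32-bit virtual address splits into a 20-bit virtual page number and a 12-bit page offset; a small C sketch:

    #include <stdint.h>
    #include <stdio.h>

    /* [31..12] virtual page number (2^20 pages in 4 GB) | [11..0] offset */
    int main(void) {
        uint32_t vaddr  = 0xDEADBEEF;   /* example virtual address */
        uint32_t vpn    = vaddr >> 12;
        uint32_t offset = vaddr & 0xFFF;
        printf("VPN = 0x%05X, offset = 0x%03X\n", (unsigned)vpn, (unsigned)offset);
        return 0;
    }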
23. Page Faults
- Incredibly high penalty for a page fault
- Reduce the number of page faults by optimizing page
  placement
- Use fully associative placement
- a full search of pages is impractical
- pages are located via a table that indexes the
  memory, called the page table
- the page table resides in main memory
24. Page Tables
The page table maps each virtual page either to a page
in main memory or to a page stored on disk
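A minimal page-table walk might look like the following C sketch (the entry layout and names are assumptions for illustration, not from the slides):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        bool     valid;   /* page resident in main memory?          */
        uint32_t ppn;     /* physical page number if valid; if not, */
                          /* the page lives on disk (page fault)    */
    } pte_t;

    static pte_t page_table[1 << 20];   /* one entry per virtual page */

    uint32_t translate(uint32_t vaddr) {
        uint32_t vpn = vaddr >> 12;
        if (!page_table[vpn].valid) {
            /* page fault: the OS would fetch the page from disk */
            printf("page fault on VPN 0x%05X\n", (unsigned)vpn);
            return 0;
        }
        return (page_table[vpn].ppn << 12) | (vaddr & 0xFFF);
    }

    int main(void) {
        page_table[0xDEADB] = (pte_t){ true, 0x00042 };
        printf("paddr = 0x%08X\n", (unsigned)translate(0xDEADBEEF));
        return 0;
    }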
25. Page Tables
26. Making Memory Access Fast
- Page tables slow us down
- Memory access will take at least twice as long
- access the page table in memory
- access the page itself
- What can we do?
Memory access is local => use a cache that keeps
track of recently used address translations,
called a translation lookaside buffer (TLB)
27. Making Address Translation Fast
- A cache for address translations: the translation
  lookaside buffer
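A sketch of the idea in C, checking the TLB before falling back to the page table (the direct-mapped organization and 64-entry size here are illustrative assumptions; real TLBs are often fully associative):

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 64

    typedef struct {
        bool     valid;
        uint32_t vpn;   /* virtual page number, used as the tag */
        uint32_t ppn;   /* cached translation                   */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Returns true on a TLB hit and writes the translation to *ppn;
       on a miss the page table in memory is walked and the entry refilled. */
    bool tlb_lookup(uint32_t vaddr, uint32_t *ppn) {
        uint32_t vpn = vaddr >> 12;
        tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
        if (e->valid && e->vpn == vpn) {
            *ppn = e->ppn;
            return true;
        }
        return false;   /* miss: walk the page table, then refill *e */
    }

    int main(void) {
        uint32_t ppn;
        tlb[0xDEADB % TLB_ENTRIES] = (tlb_entry_t){ true, 0xDEADB, 0x42 };
        return tlb_lookup(0xDEADBEEF, &ppn) ? 0 : 1;   /* hit */
    }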
28. Translation Lookaside Buffer
- Some typical values for a TLB
- TLB size: 32-4096 entries
- Block size: 1-2 page table entries (4-8 bytes each)
- Hit time: 0.5-1 clock cycle
- Miss penalty: 10-30 clock cycles
- Miss rate: 0.01%-1%
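Plugging in mid-range values gives a rough average translation cost (an illustrative calculation, not from the slides): with a 1-cycle hit time, a 0.5% miss rate, and a 20-cycle miss penalty, the average is 1 + 0.005 x 20 = 1.1 clock cycles per translation.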
29. TLBs and Caches
30. More Modern Systems
- Very complicated memory systems
31. Some Issues
- Processor speeds continue to increase very fast,
  much faster than either DRAM or disk access times
- Design challenge: dealing with this growing disparity
- Trends
- synchronous SRAMs (provide a burst of data)
- redesign DRAM chips to provide higher bandwidth
  or processing
- restructure code to increase locality (see the sketch
  after this list)
- use prefetching (make the cache visible to the ISA)
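As a concrete instance of restructuring code for locality (a sketch, not from the slides): C stores arrays row-major, so interchanging the loops below turns stride-N accesses into unit-stride accesses that reuse each fetched cache block:

    #define N 1024
    double a[N][N];

    /* Poor spatial locality: consecutive iterations touch a[i][j]
       a whole row apart, so each access may miss in the cache. */
    double sum_column_major(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    /* Good spatial locality: unit-stride accesses walk through each
       cache block word by word before moving on. */
    double sum_row_major(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    int main(void) { return (int)(sum_row_major() + sum_column_major()); }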