Title: The Memory Hierarchy II (CPSC 321)
1. The Memory Hierarchy II (CPSC 321)
2. Today's Menu
- Cache
- Virtual Memory
- Translation Lookaside Buffer
3. Caches
4. Memory
- Users want large and fast memories
- SRAM is too expensive for main memory
- DRAM is too slow for many purposes
- Compromise
- Build a memory hierarchy
5. Locality
- If an item is referenced, then
- it will be referenced again soon
- (temporal locality)
- nearby data will be referenced soon
- (spatial locality)
- Why does code have locality?
6. Direct Mapped Cache
- Mapping: address modulo the number of blocks in
  the cache, x -> x mod B
7. Direct Mapped Cache
- Cache with 1024 = 2^10 words
- The index is determined by address mod 1024
- The tag from the cache is compared against the upper
  portion of the address
- If tag = upper 20 bits of the address and the valid bit
  is set, then we have a cache hit; otherwise it is a
  cache miss
- What kind of locality are we taking advantage of?
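As a sketch (not part of the original slides), the address split for this cache can be written out in C; the 20-bit tag and 10-bit index follow from a 32-bit byte address, 1024 one-word blocks, and a 2-bit byte offset:

    #include <stdint.h>
    #include <stdio.h>

    /* Address split for a direct-mapped cache with 1024 one-word blocks:
       [31..12] tag (20 bits) | [11..2] index (10 bits) | [1..0] byte offset */
    int main(void) {
        uint32_t addr  = 0x12345678;            /* example byte address  */
        uint32_t index = (addr >> 2) & 0x3FF;   /* word address mod 1024 */
        uint32_t tag   = addr >> 12;            /* upper 20 bits         */
        printf("index = %u, tag = 0x%05X\n", (unsigned)index, (unsigned)tag);
        /* a hit requires cache[index].valid && cache[index].tag == tag */
        return 0;
    }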
8. Direct Mapped Cache
- Taking advantage of spatial locality
9. Bits in a Cache
- How many total bits are required for a
  direct-mapped cache with 16 KB of data and 4-word
  blocks, assuming a 32-bit address?
- 16 KB = 4K words = 2^12 words
- Block size of 4 words => 2^10 blocks
- Each block has 4 x 32 = 128 bits of data + tag + valid bit
- tag + valid bit = (32 - 10 - 2 - 2) + 1 = 19 bits
- Total cache size = 2^10 x (128 + 19) = 2^10 x 147 bits
- Therefore, 147 Kbits are needed for the cache
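The same arithmetic can be checked with a few lines of C (a sketch; the variable names are illustrative):

    #include <stdio.h>

    int main(void) {
        int data_bits  = 4 * 32;                   /* 4 words per block        */
        int index_bits = 10;                       /* 2^10 blocks              */
        int tag_bits   = 32 - index_bits - 2 - 2;  /* minus block/byte offsets */
        int per_block  = data_bits + tag_bits + 1; /* + valid bit = 147 bits   */
        printf("%d Kbits\n", (1 << index_bits) * per_block / 1024);  /* 147 */
        return 0;
    }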
10. Cache Block Mapping
- Direct mapped cache
- a block goes in exactly one place in the cache
- Fully associative cache
- a block can go anywhere in the cache
- it is harder to find a block
- tags are compared in parallel to speed up the search
- Set associative cache
- a block can go to a (small) number of places
- compromise between the two extremes above
11. Cache Types
12. Set Associative Caches
- Each block maps to a unique set
- The block can be placed into any element of that set
- Position is given by
  (block number) modulo (number of sets in the cache)
- If the sets contain n elements, then the cache is
  called n-way set associative
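A minimal sketch of the placement rule in C (the cache dimensions here are illustrative, not from the slides):

    #include <stdio.h>

    int main(void) {
        int num_blocks   = 1024;   /* total blocks in the cache */
        int n            = 4;      /* 4-way set associative     */
        int num_sets     = num_blocks / n;
        int block_number = 12345;  /* memory block being placed */
        printf("block %d -> set %d (any of %d ways)\n",
               block_number, block_number % num_sets, n);
        return 0;
    }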
17. Summary: Where Can a Block Be Placed?
18. Summary: How Is a Block Found?
19. Virtual Memory
20. Virtual Memory
- Processor generates virtual addresses
- Memory is accessed using physical addresses
- Virtual and physical memory are broken into blocks
  of memory, called pages
- A virtual page may be
- absent from main memory, residing on the disk
- or mapped to a physical page
21. Virtual Memory
- Main memory can act as a cache for the secondary
  storage (disk)
- Virtual address generated by processor (left)
- Address translation (middle)
- Physical addresses (right)
22. Pages: virtual memory blocks
- Page faults: if the data is not in memory, retrieve
  it from disk
- huge miss penalty, thus pages should be fairly
  large (e.g., 4 KB)
- reducing page faults is important (LRU is worth
  the price)
- faults can be handled in software instead of
  hardware
- write-through takes too long, so we use write-back
- Example: page size 2^12 = 4 KB, 2^18 physical pages
- main memory <= 1 GB, virtual memory <= 4 GB
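With the 4 KB page size above, a 32-bit virtual address splits into a 20-bit virtual page number and a 12-bit page offset; a small C sketch:

    #include <stdint.h>
    #include <stdio.h>

    /* [31..12] virtual page number (2^20 pages in 4 GB) | [11..0] offset */
    int main(void) {
        uint32_t vaddr  = 0xDEADBEEF;   /* example virtual address */
        uint32_t vpn    = vaddr >> 12;
        uint32_t offset = vaddr & 0xFFF;
        printf("VPN = 0x%05X, offset = 0x%03X\n", (unsigned)vpn, (unsigned)offset);
        return 0;
    }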
23. Page Faults
- Incredibly high penalty for a page fault
- Reduce the number of page faults by optimizing page
  placement
- Use fully associative placement
- a full search of pages is impractical
- pages are located via a table that indexes the
  memory, called the page table
- the page table resides in main memory
24. Page Tables
The page table maps each virtual page either to a page
in main memory or to a page stored on disk
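A minimal page-table walk might look like the following C sketch (the entry layout and names are assumptions for illustration, not from the slides):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        bool     valid;   /* page resident in main memory?          */
        uint32_t ppn;     /* physical page number if valid; if not, */
                          /* the page lives on disk (page fault)    */
    } pte_t;

    static pte_t page_table[1 << 20];   /* one entry per virtual page */

    uint32_t translate(uint32_t vaddr) {
        uint32_t vpn = vaddr >> 12;
        if (!page_table[vpn].valid) {
            /* page fault: the OS would fetch the page from disk */
            printf("page fault on VPN 0x%05X\n", (unsigned)vpn);
            return 0;
        }
        return (page_table[vpn].ppn << 12) | (vaddr & 0xFFF);
    }

    int main(void) {
        page_table[0xDEADB] = (pte_t){ true, 0x00042 };
        printf("paddr = 0x%08X\n", (unsigned)translate(0xDEADBEEF));
        return 0;
    }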
25. Page Tables
26. Making Memory Access Fast
- Page tables slow us down
- Memory access will take at least twice as long
- access the page table in memory
- access the page itself
- What can we do?
Memory access is local => use a cache that keeps
track of recently used address translations,
called a translation lookaside buffer (TLB)
27. Making Address Translation Fast
- A cache for address translations: the translation
  lookaside buffer
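A sketch of the idea in C, checking the TLB before falling back to the page table (the direct-mapped organization and 64-entry size here are illustrative assumptions; real TLBs are often fully associative):

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 64

    typedef struct {
        bool     valid;
        uint32_t vpn;   /* virtual page number, used as the tag */
        uint32_t ppn;   /* cached translation                   */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Returns true on a TLB hit and writes the translation to *ppn;
       on a miss the page table in memory is walked and the entry refilled. */
    bool tlb_lookup(uint32_t vaddr, uint32_t *ppn) {
        uint32_t vpn = vaddr >> 12;
        tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
        if (e->valid && e->vpn == vpn) {
            *ppn = e->ppn;
            return true;
        }
        return false;   /* miss: walk the page table, then refill *e */
    }

    int main(void) {
        uint32_t ppn;
        tlb[0xDEADB % TLB_ENTRIES] = (tlb_entry_t){ true, 0xDEADB, 0x42 };
        return tlb_lookup(0xDEADBEEF, &ppn) ? 0 : 1;   /* hit */
    }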
28. Translation Lookaside Buffer
- Some typical values for a TLB
- TLB size: 32-4096 entries
- Block size: 1-2 page table entries (4-8 bytes each)
- Hit time: 0.5-1 clock cycle
- Miss penalty: 10-30 clock cycles
- Miss rate: 0.01%-1%
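Plugging in mid-range values gives a rough average translation cost (an illustrative calculation, not from the slides): with a 1-cycle hit time, a 0.5% miss rate, and a 20-cycle miss penalty, the average is 1 + 0.005 x 20 = 1.1 clock cycles per translation.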
29. TLBs and Caches
30. More Modern Systems
- Very complicated memory systems
31. Some Issues
- Processor speeds continue to increase very fast,
  much faster than either DRAM or disk access times
- Design challenge: dealing with this growing disparity
- Trends
- synchronous SRAMs (provide a burst of data)
- redesign DRAM chips to provide higher bandwidth
  or processing
- restructure code to increase locality (see the sketch
  after this list)
- use prefetching (make the cache visible to the ISA)
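As a concrete instance of restructuring code for locality (a sketch, not from the slides): C stores arrays row-major, so interchanging the loops below turns stride-N accesses into unit-stride accesses that reuse each fetched cache block:

    #define N 1024
    double a[N][N];

    /* Poor spatial locality: consecutive iterations touch a[i][j]
       a whole row apart, so each access may miss in the cache. */
    double sum_column_major(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    /* Good spatial locality: unit-stride accesses walk through each
       cache block word by word before moving on. */
    double sum_row_major(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    int main(void) { return (int)(sum_row_major() + sum_column_major()); }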