ELEC 5200-002/6200-002 Computer Architecture and Design, Fall 2006: Memory Organization (Chapter 7)
(Lecture 13, Fall 2006, Dec. 1, 4)
1
ELEC 5200-002/6200-002 Computer Architecture and
Design, Fall 2006: Memory Organization (Chapter 7)
  • Vishwani D. Agrawal
  • James J. Danaher Professor
  • Department of Electrical and Computer Engineering
  • Auburn University, Auburn, AL 36849
  • http://www.eng.auburn.edu/vagrawal
  • vagrawal@eng.auburn.edu

2
Types of Computer Memories
From the cover of A. S. Tanenbaum, Structured
Computer Organization, Fifth Edition, Upper
Saddle River, New Jersey: Pearson Prentice Hall,
2006.
3
Electronic Memory Devices
Memory technology | Typical access time | Clock rate | Cost per GB in 2004
SRAM | 0.5-5 ns | 0.2-2.0 GHz | $4,000-$10,000
DRAM | 50-70 ns | 15-20 MHz | $100-$200
Magnetic disk | 5-20 ms | 50-200 Hz | $0.50-$2
4
Random Access Memory (RAM)
[Block diagram: address bits → address decoder → memory cell array ↔ read/write circuits ↔ data bits]
5
Static RAM (SRAM) Cell
[Diagram: SRAM cell accessed through a word line and a complementary bit-line pair (bit, bit)]
6
Dynamic RAM (DRAM) Cell
[Diagram: DRAM cell accessed through a word line and a single bit line]
7
Cost of 40GB Memory
Type of memory | Cost | Clock rate
SRAM | $160,000 | 0.2-2.0 GHz
DRAM | $4,000 | 15-20 MHz
Disk | $20 | 50-200 Hz
8
Trying to Buy a Laptop Computer? Two Years Ago
IBM ThinkPad X40 23717GU: 1.20 GHz Low Voltage
Intel Pentium M, 1MB L2 Cache,
Microsoft Windows XP Professional, 512 MB
DRAM, 40 GB Hard Drive, 2.71
lbs, 12.1" XGA (1024x768), IBM Embedded Security
Subsystem 2.0, Intel PRO/Wireless Network
Connection 802.11b, Gigabit Ethernet, integrated
graphics (Intel Extreme Graphics 2), no CD/DVD drive
(PROP, fixed bay). Availability: within 2
weeks. $2,149.00 IBM web price; $1,741.65 sale
price.
9
2006
Choose a Lenovo 3000 V Series to customize & buy
From $999.00
Sale price $949.00
Processor: Intel Core 2 Duo T5500 (1.66GHz, 2MB L2,
667MHz FSB)
Total memory: 512MB PC2-5300 DDR2 SDRAM
Hard drive: 80GB, 5400rpm Serial ATA
Weight: 4.0 lbs
10
Cache
  • The processor does all memory operations with the cache.
  • Miss: if the requested word is not in cache, a block
    of words containing the requested word is brought
    into cache, and then the processor request is
    completed.
  • Hit: if the requested word is in cache, the read or
    write operation is performed directly in cache,
    without accessing main memory.
  • Block: the minimum amount of data transferred
    between cache and main memory.

[Diagram: Processor ↔ words ↔ Cache (small, fast memory) ↔ blocks ↔ Main memory (large, inexpensive, slow)]
11
Invention of Cache
M. V. Wilkes, "Slave Memories and Dynamic Storage
Allocation," IEEE Transactions on Electronic
Computers, vol. EC-14, no. 2, pp. 270-271, April
1965.
12
Cache Performance
Processor
  • Average access time
  • = T1 h + Tm (1 − h)
  • = Tm − (Tm − T1) h
  • where
  • T1 = cache access time (small)
  • Tm = memory access time (large)
  • h = hit rate (0 ≤ h ≤ 1)
  • Hit rate is also known as hit ratio;
  • miss rate = 1 − hit rate

[Diagram: Processor → Cache (small, fast memory; access time T1) → Main memory (large, inexpensive, slow; access time Tm)]
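The formula above can be checked with a short sketch (Python; the 1-cycle and 100-cycle access times are illustrative, not from the slides):

```python
# Average access time for a single cache level:
# t_avg = h*T1 + (1 - h)*Tm, equivalently Tm - (Tm - T1)*h.
def avg_access_time(t1, tm, h):
    """t1: cache access time, tm: memory access time, h: hit rate in [0, 1]."""
    return h * t1 + (1 - h) * tm

# The two algebraic forms of the slide agree:
assert abs(avg_access_time(1, 100, 0.95) - (100 - (100 - 1) * 0.95)) < 1e-9
print(round(avg_access_time(1, 100, 0.95), 2))  # 5.95
```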
13
Average Access Time
[Plot: average access time Tm − (Tm − T1) h vs. miss rate (1 − h), rising from T1 at h = 1 to Tm at h = 0. Acceptable miss rate < 10%; desirable miss rate < 5%.]
14
Comparing Performance
  • Processor without cache, CPI = 1
  • Assume memory access time of 10 cycles
  • Assume 30% of instructions require memory access
  • Processor with cache:
  • Assume hit rate 0.95 for instructions, 0.90 for
    data
  • Assume miss penalty (time to read memory into
    cache and from it) is 17 cycles
  • Comparing times of 100 instructions:
  • Time without cache = 100×10 + 30×10 = 1,300 cycles
  • Time with cache = 100×(0.95×1 + 0.05×17)
    + 30×(0.90×1 + 0.10×17) = 258 cycles
  • Speedup = 1,300/258 = 5.04
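The arithmetic above can be reproduced directly (Python sketch using the slide's numbers):

```python
# 100 instructions, 30% make a data access, 10-cycle memory access,
# 17-cycle miss penalty, hit rates 0.95 (instructions) / 0.90 (data).
def time_without_cache(n_instr=100, data_frac=0.30, mem=10):
    # every fetch and every data access goes to the 10-cycle memory
    return n_instr * mem + n_instr * data_frac * mem

def time_with_cache(n_instr=100, data_frac=0.30, h_i=0.95, h_d=0.90, penalty=17):
    instr = n_instr * (h_i * 1 + (1 - h_i) * penalty)          # fetches
    data = n_instr * data_frac * (h_d * 1 + (1 - h_d) * penalty)  # data accesses
    return instr + data

speedup = time_without_cache() / time_with_cache()
print(round(speedup, 2))  # 5.04
```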

15
Controlling Miss Rate
  • Increase cache size:
  • More blocks can be kept in cache; the chance of a miss
    is reduced.
  • A larger cache is slower.
  • Increase block size:
  • More data available; reduced chance of a miss.
  • Fewer blocks in cache; increased chance of a miss.
  • Larger blocks need more time to swap.

[Diagrams: a cache of many small blocks vs. a cache of fewer, larger blocks, each mapped onto a large memory]
16
Increasing Hit Rate
  • Hit rate increases with cache size.
  • Hit rate mildly depends on block size.

[Plot: hit rate h (90-100%, i.e., miss rate 0-10%) vs. block size (16B, 32B, 64B, 128B, 256B) for cache sizes 4KB, 16KB and 64KB. Larger blocks decrease the chance of getting fragmented data and improve the chance of getting localized data, but hit rate depends only mildly on block size.]
17
The Locality Principle
  • A program tends to access data that form a
    physical cluster in the memory; multiple
    accesses may be made within the same block.
  • Physical localities are temporal and may shift
    over longer periods of time; data not used for
    some time is less likely to be used in the
    future. Upon a miss, the least recently used (LRU)
    block can be overwritten by a new block.
  • P. J. Denning, "The Locality Principle,"
    Communications of the ACM, vol. 48, no. 7, pp.
    19-24, July 2005.

18
Data Locality, Cache, Blocks
[Diagram: the data needed by a program form clusters (Block 1, Block 2) in memory; increase block size to match the locality size, and increase cache size to hold most of the needed data.]
19
Types of Caches
  • Direct-mapped cache:
  • Memory is divided into partitions the size of the cache.
  • Each partition is subdivided into blocks; a block can
    occupy only its one assigned position in the cache.
  • Set-associative cache:
  • A block can be placed in any of several cache
    positions forming a set.

20
Direct-Mapped Cache
[Diagram: blocks of data needed by a program (Block 1, Block 2) are swapped into the cache; the resident block at the mapped position is swapped out, even if a least recently used (LRU) block elsewhere would be a better victim.]
21
Set-Associative Cache
[Diagram: blocks of data needed by a program (Block 1, Block 2) are swapped into the cache; the least recently used (LRU) block in the target set is swapped out.]
22
Direct-Mapped Cache
Main memory: 32-word word-addressable memory,
addresses 00000 through 11111.
Cache of 8 blocks, block size = 1 word.
index (local address): 000 001 010 011 100 101 110 111
tag:                    00  10  11  01  01  00  10  11
cache address = tag | index
memory address 11 101 → tag = 11, index = 101
23
Direct-Mapped Cache
Main memory: 32-word word-addressable memory,
addresses 00000 through 11111.
Cache of 4 blocks, block size = 2 words.
index (local address): 00 01 10 11
tag:                   00 11 00 10
block offset: 0 or 1
cache address = tag | index | block offset
memory address 11 10 1 → tag = 11, index = 10, block offset = 1
24
Number of Tag and Index Bits
Cache size: w words
Main memory size: W words
Each word in cache has a unique index (local
address). Number of index bits = log2 w. Index bits
are shared with the block offset when a block
contains more than one word. Assume partitions of
w words each in the main memory. There are W/w such
partitions, each identified by a tag. Number of
tag bits = log2(W/w)
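These two formulas can be sketched in Python (the names w and W mirror the slide):

```python
import math

# Index and tag widths for a cache of w words inside a memory of W words,
# both powers of two, block size 1 word.
def index_bits(w):
    return int(math.log2(w))       # log2(cache words)

def tag_bits(W, w):
    return int(math.log2(W // w))  # log2(number of cache-sized partitions)

# 32-word memory, 8-word cache (the direct-mapped example above):
print(index_bits(8), tag_bits(32, 8))  # 3 2
```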
25
Direct-Mapped Cache (Byte Address)
Main memory: 32-word byte-addressable memory;
word addresses 00000 00 through 11111 00 (the low
two bits are the byte offset).
Cache of 8 blocks, block size = 1 word.
index: 000 001 010 011 100 101 110 111
tag:    00  10  11  01  01  00  10  11
cache address = tag | index
memory address 11 101 00 → tag = 11, index = 101,
byte offset = 00
26
Finding a Word in Cache
[Diagram: 7-bit byte address of a 32-word memory split into a 2-bit tag, 3-bit index and 2-bit byte offset. The index selects one of 8 cache entries (valid bit, 2-bit tag, data word); the stored tag is compared with the address tag: 1 = hit, 0 = miss. Cache size 8 words, block size 1 word.]
27
How Many Bits Does the Cache Have?
  • Consider a main memory:
  • 32 words; byte address is 7 bits wide: b6 b5 b4
    b3 b2 b1 b0
  • Each word is 32 bits wide
  • Assume that the cache block size is 1 word (32 bits of
    data) and the cache contains 8 blocks.
  • The cache requires, for each word:
  • a 2-bit tag, and one valid bit
  • Total storage needed in cache
  • = (blocks in cache) × (data bits/block + tag
    bits + valid bit)
  • = 8 × (32 + 2 + 1) = 280 bits
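The count can be verified mechanically (a small Python sketch):

```python
# Total cache storage: every block carries its data bits, a tag and a valid bit.
def cache_bits(n_blocks, data_bits, tag_bits, valid_bits=1):
    return n_blocks * (data_bits + tag_bits + valid_bits)

# 8 blocks of one 32-bit word, 2-bit tag, 1 valid bit:
print(cache_bits(8, 32, 2))  # 280
```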

28
A More Realistic Cache
  • Consider a 4 GB, byte-addressable main memory:
  • 1G words; byte address is 32 bits wide: b31…b16
    b15…b2 b1 b0
  • Each word is 32 bits wide
  • Assume that the cache block size is 1 word (32 bits of
    data) and the cache contains 64 KB of data, or 16K words,
    i.e., 16K blocks.
  • Number of cache index bits = 14, because 16K =
    2^14
  • Tag size = 32 − byte offset bits − index bits = 32
    − 2 − 14 = 16 bits
  • The cache requires, for each word:
  • a 16-bit tag, and one valid bit
  • Total storage needed in cache
  • = (blocks in cache) × (data bits/block + tag
    size + valid bit)
  • = 2^14 × (32 + 16 + 1) = 16×2^10×49 = 784×2^10 bits = 784
    Kb = 98 KB
  • Physical storage/Data storage = 98/64 = 1.53
29
Cache Bits for 4-Word Block
  • Consider a 4 GB, byte-addressable main memory:
  • 1G words; byte address is 32 bits wide: b31…b16
    b15…b2 b1 b0
  • Each word is 32 bits wide
  • Assume that the cache block size is 4 words (128 bits of
    data) and the cache contains 64 KB of data, or 16K words,
    i.e., 4K blocks.
  • Number of cache index bits = 12, because 4K = 2^12
  • Tag size = 32 − byte offset bits − block offset bits
    − index bits
  • = 32 − 2 − 2 − 12 = 16 bits
  • The cache requires, for each block:
  • a 16-bit tag, and one valid bit
  • Total storage needed in cache
  • = (blocks in cache) × (data bits/block + tag size
    + valid bit)
  • = 2^12 × (4×32 + 16 + 1) = 4×2^10×145 = 580×2^10 bits = 580
    Kb = 72.5 KB
  • Physical storage/Data storage = 72.5/64 = 1.13
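Both 64KB-cache calculations follow from one function that derives the tag width from the 32-bit byte address (a Python sketch of the slides' arithmetic):

```python
# Total storage for a direct-mapped cache of n_blocks blocks of block_words
# 32-bit words, addressed by addr_bits-bit byte addresses.
def cache_storage_bits(addr_bits, block_words, n_blocks):
    index = (n_blocks - 1).bit_length()            # log2(n_blocks)
    block_offset = (block_words - 1).bit_length()  # log2(words per block)
    tag = addr_bits - 2 - block_offset - index     # 2 bits of byte offset
    return n_blocks * (block_words * 32 + tag + 1)  # data + tag + valid bit

one_word = cache_storage_bits(32, 1, 16 * 1024)   # 1-word blocks
four_word = cache_storage_bits(32, 4, 4 * 1024)   # 4-word blocks
print(one_word / 8 / 1024, four_word / 8 / 1024)  # 98.0 72.5 (KB)
```

The larger block size amortizes the tag and valid bit over four words, which is why the overhead ratio drops from 1.53 to 1.13.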

30
Using Larger Cache Block (4 Words)
[Diagram: 32-bit byte address (4GB = 1G words) b31…b0 split into a 16-bit tag (b31…b16), a 12-bit index (b15…b4), a 2-bit block offset (b3 b2) and a 2-bit byte offset (b1 b0). The index (0000 0000 0000 through 1111 1111 1111, 4K entries) selects a cache entry holding a valid bit, 16-bit tag and 128-bit (4-word) data block; the tag compare gives 1 = hit, 0 = miss, and the block offset drives a MUX that selects one of the 4 words. Cache size 16K words, block size 4 words.]
31
Interleaved Memory
  • Reduces miss penalty.
  • Memory designed to read the words of a block
    simultaneously, in one read operation.
  • Example:
  • Cache block size = 4 words
  • Interleaved memory with 4 banks
  • Suppose memory access takes 15 cycles
  • Miss penalty = 1 cycle to send address + 15
    cycles to read a block + 4 cycles to send data to
    cache = 20 cycles
  • Without interleaving, miss penalty = 1 + 4×15 + 4
    = 65 cycles
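The two miss penalties can be sketched with one function (Python; the cycle counts are the slide's example values):

```python
# Miss penalty: 1 cycle to send the address, one 15-cycle access per
# sequential read, and 1 cycle per word to send data to the cache.
def miss_penalty(block_words, mem_cycles=15, banks=1):
    reads = -(-block_words // banks)  # ceil division: sequential reads needed
    return 1 + reads * mem_cycles + block_words

print(miss_penalty(4, banks=4))  # 20 (4 banks read the block in parallel)
print(miss_penalty(4, banks=1))  # 65 (4 sequential 15-cycle reads)
```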

[Diagram: Processor ↔ words ↔ Cache (small, fast memory) ↔ blocks ↔ Main memory organized as four interleaved banks (memory bank 0 through memory bank 3)]
32
Handling a Miss
  • A miss occurs when data at the required memory
    address is not found in cache.
  • Controller actions:
  • Stall the pipeline
  • Freeze contents of all registers
  • Activate a separate cache controller
  • If the cache is full:
  • select the least recently used (LRU) block in
    cache for overwriting
  • if the selected block has inconsistent data, take
    proper action
  • Copy the block containing the requested address
    from memory
  • Restart the instruction

33
Miss During Instruction Fetch
  • Send the original PC value (PC − 4) to the memory.
  • Instruct main memory to perform a read and wait
    for the memory to complete the access.
  • Write cache entry.
  • Restart the instruction whose fetch failed.

34
Writing to Memory
  • Cache and memory become inconsistent when data is
    written into cache but not to memory: the cache
    coherence problem.
  • Strategies to handle inconsistent data:
  • Write-through:
  • Always write to memory and cache simultaneously.
  • Writing to memory is 100 times slower than to
    cache.
  • Write buffer:
  • Write to cache, and to a buffer for writing to
    memory.
  • If the buffer is full, the processor must wait.

35
Writing to Memory Write-Back
  • Write-back (or copy-back) writes only to cache,
    but sets a "dirty" bit in the block where the write
    is performed.
  • When a block with its dirty bit on is to be
    overwritten in the cache, it is first written to
    the memory.

36
AMD Opteron Microprocessor
L2: 1MB, block 64B, write-back
L1 (split, 64KB each): block 64B, write-back
37
Cache Hierarchy
Processor
  • Average access time
  • = h1 T1 + (1 − h1)[h2 T2 + (1 − h2) Tm]
  • where
  • T1 = L1 cache access time (smallest)
  • T2 = L2 cache access time (small)
  • Tm = memory access time (large)
  • h1, h2 = hit rates (0 ≤ h1, h2 ≤ 1)
  • Average access time is reduced by adding a cache.

[Diagram: Processor → L1 Cache (SRAM; access time T1) → L2 Cache (DRAM; access time T2) → Main memory (large, inexpensive, slow; access time Tm)]
38
Average Access Time
[Plot: average access time h1 T1 + (1 − h1)[h2 T2 + (1 − h2) Tm] vs. L1 miss rate (1 − h1), with T1 < T2 < Tm. All curves start at T1 when h1 = 1; at h1 = 0 they reach Tm for h2 = 0, (T2 + Tm)/2 for h2 = 0.5, and T2 for h2 = 1.]
39
Processor Performance Without Cache
  • 5 GHz processor, cycle time = 0.2 ns
  • Memory access time = 100 ns = 500 cycles
  • Ignoring memory access, CPI = 1
  • Considering memory access:
  • CPI = 1 + stall cycles
  •     = 1 + 500 = 501

40
Performance with 1 Level Cache
  • Assume hit rate h1 = 0.95
  • L1 access time = 0.2 ns = 1 cycle
  • CPI = 1 + stall cycles
  •     = 1 + 0.95×0 + 0.05×500
  •     = 26
  • Processor speed increase due to cache
  • = 501/26 = 19.3

41
Performance with 2 Level Caches
  • Assume:
  • L1 hit rate h1 = 0.95
  • L2 hit rate h2 = 0.90
  • L2 access time = 5 ns = 25 cycles
  • CPI = 1 + stall cycles
  •     = 1 + 0.95×0 + 0.05×(0.90×25 + 0.10×525)
  •     = 1 + 1.125 + 2.625 = 4.75
  • Processor speed increase due to caches
  • = 501/4.75 = 105.5
  • Speed increase due to L2 cache
  • = 26/4.75 = 5.47
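The three CPI figures of slides 39-41 follow from one stall-cycle expression (Python sketch; an L2 miss costs the L2 probe plus the 500-cycle memory access, i.e., 525 cycles):

```python
# CPI = 1 + (L1 miss rate) * (cost of servicing an L1 miss).
# With no L2 (h2 = 0 here means "no L2"), an L1 miss costs the full memory access.
def cpi(h1=0.0, h2=0.0, l2=25, mem=500):
    miss_cost = (h2 * l2 + (1 - h2) * (l2 + mem)) if h2 else mem
    return 1 + (1 - h1) * miss_cost

no_cache = cpi()                    # no cache at all
l1_only = cpi(h1=0.95)              # L1 only
two_level = cpi(h1=0.95, h2=0.90)   # L1 + L2
print(round(no_cache, 2), round(l1_only, 2), round(two_level, 2))  # 501.0 26.0 4.75
print(round(no_cache / two_level, 1))  # 105.5
```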

42
Miss Rate of Direct-Mapped Cache
[Diagram: the 32-word word-addressable memory (addresses 00000 00 through 11111 00, low two bits the byte offset) mapped onto a cache of 8 blocks, block size 1 word; tags 00 10 11 01 01 00 10 11 at indexes 000-111. A needed block must replace the block at its fixed index, even when a least recently used (LRU) block elsewhere would be a better victim. cache address = tag | index; memory address 11 101 00 → tag 11, index 101.]
43
Miss Rate of Direct-Mapped Cache
Memory references to addresses 0, 8, 0, 6, 8, 16
(word addresses).
Cache of 8 blocks, block size = 1 word.
1. miss (0: tag 00, index 000)
2. miss (8: tag 01, index 000, replaces 0)
3. miss (0: tag 00, index 000, replaces 8)
4. miss (6: tag 00, index 110)
5. miss (8: tag 01, index 000, replaces 0)
6. miss (16: tag 10, index 000, replaces 8)
Every reference misses: addresses 0, 8 and 16 all
conflict at index 000.
44
Fully-Associative Cache (8-Way Set Associative)
[Diagram: the same 32-word memory mapped onto a fully-associative cache of 8 blocks (equivalently, 8-way set-associative with a single set), block size 1 word. There is no index; each block stores a full 5-bit tag, and a needed block may replace the least recently used (LRU) block anywhere in the cache. cache address = tag; memory address 11101 00 → tag 11101.]
45
Miss Rate Fully-Associative Cache
Memory references to addresses 0, 8, 0, 6, 8, 16.
Cache of 8 blocks, block size = 1 word.
1. miss (0: tag 00000)
2. miss (8: tag 01000)
3. hit (0)
4. miss (6: tag 00110)
5. hit (8)
6. miss (16: tag 10000)
Tags in cache after the sequence: 00000, 01000,
00110, 10000; the remaining blocks stay unused.
Four misses instead of six: full associativity
removes the conflicts at index 000.
46
Finding a Word in Associative Cache
[Diagram: 7-bit byte address b6…b0 split into a 5-bit tag and 2-bit byte offset; there is no index. The tag must be compared with all tags in the cache (8 entries, each with a valid bit, 5-bit tag and data word); any match gives the data, 1 = hit, 0 = miss. Cache size 8 words, block size 1 word.]
47
Eight-Way Set-Associative Cache
[Diagram: eight-way set-associative cache of 8 words, block size 1 word; 7-bit byte address b6…b0 split into a 5-bit tag and byte offset. All eight entries (valid bit, tag, data) are compared with the address tag in parallel; the matching way's data is selected by an 8-to-1 multiplexer. 1 = hit, 0 = miss.]
48
Two-Way Set-Associative Cache
[Diagram: the same 32-word memory mapped onto a two-way set-associative cache of 8 blocks, block size 1 word: 4 sets (2-bit index 00-11), each holding two blocks with 3-bit tags. A needed block may replace the least recently used (LRU) of the two blocks in its set. cache address = tag | index; memory address 111 01 00 → tag 111, index 01.]
49
Miss Rate Two-Way Set-Associative Cache
Memory references to addresses 0, 8, 0, 6, 8, 16.
Cache of 8 blocks, two-way set-associative
(4 sets, index 00-11), block size = 1 word.
1. miss (0: tag 000, index 00)
2. miss (8: tag 010, index 00, second way)
3. hit (0)
4. miss (6: tag 001, index 10)
5. hit (8)
6. miss (16: tag 100, index 00, replaces LRU tag 000)
Four misses: with two ways per set, addresses 0 and
8 can coexist at index 00.
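The three traces (slides 43, 45 and 49) can be reproduced with one small LRU cache simulator (Python sketch; word addresses, 1-word blocks):

```python
# Simulate an n_blocks-block cache with the given associativity; LRU
# replacement within each set. Returns the number of misses.
def count_misses(refs, n_blocks=8, ways=1):
    n_sets = n_blocks // ways
    sets = [[] for _ in range(n_sets)]  # each set: list of tags, LRU first
    misses = 0
    for addr in refs:
        s, tag = addr % n_sets, addr // n_sets
        if tag in sets[s]:
            sets[s].remove(tag)         # hit: move tag to MRU position below
        else:
            misses += 1
            if len(sets[s]) == ways:
                sets[s].pop(0)          # set full: evict the LRU tag
        sets[s].append(tag)             # tag is now most recently used
    return misses

refs = [0, 8, 0, 6, 8, 16]
print(count_misses(refs, ways=1))  # 6  (direct-mapped)
print(count_misses(refs, ways=2))  # 4  (two-way set-associative)
print(count_misses(refs, ways=8))  # 4  (fully associative)
```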
50
Two-Way Set-Associative Cache
[Diagram: two-way set-associative cache of 8 words, block size 1 word; 7-bit byte address b6…b0 split into a 3-bit tag, 2-bit index (sets 00-11) and byte offset. The index selects a set; both ways' tags are compared in parallel, and the matching way's data passes through a 2-to-1 MUX. 1 = hit, 0 = miss.]
51
Virtual Memory System
[Diagram: the processor issues a virtual (or logical) address; the MMU (memory management unit) translates it into a physical address, which is used to access the cache and main memory; data returns to the processor. The disk connects to main memory through DMA (direct memory access).]
52
Virtual vs. Physical Address
  • The processor assumes a certain memory addressing
    scheme:
  • A block of data is called a virtual page
  • An address is called a virtual (or logical) address
  • Main memory may have a different addressing
    scheme:
  • The memory address is called a physical address
  • The MMU translates virtual addresses to physical
    addresses
  • The complete address translation table is large and
    is kept in main memory
  • The MMU contains a TLB (translation lookaside buffer),
    a small cache of the address translation
    table → address translation can create its own
    hits and misses
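The translation path can be sketched as follows (Python; the names `page_table`, `tlb` and `translate`, and the toy mappings, are illustrative, not from the slides):

```python
PAGE_BITS = 12                     # 4KB pages, as on the next slide

page_table = {0: 7, 1: 3, 2: 9}    # virtual page number -> physical frame (toy data)
tlb = {}                           # small cache of page_table entries

def translate(vaddr):
    vpn = vaddr >> PAGE_BITS                  # virtual page number
    offset = vaddr & ((1 << PAGE_BITS) - 1)   # offset within the page
    if vpn in tlb:                 # TLB hit: no page-table access needed
        frame = tlb[vpn]
    else:                          # TLB miss: walk the page table in memory
        frame = page_table[vpn]    # a missing entry here would be a page fault
        tlb[vpn] = frame
    return (frame << PAGE_BITS) | offset      # physical address

print(hex(translate(0x1234)))  # virtual page 1 -> frame 3: 0x3234
```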

53
Page Fault
[Diagram: Processor with cache and MMU (TLB) ↔ main memory (cached pages, page table) ↔ disk (all data, organized in 4KB pages, accessed by physical addresses); pages are written back, same as in a write-back cache.]
Cache miss: a required block is not found in cache.
TLB miss: a required virtual address is not found
in the TLB.
Page fault: a required page is not found in main
memory.
A page fault in virtual memory is similar to a
miss in cache.