ELEC 5200-002/6200-002 Computer Architecture and Design, Fall 2006: Memory Organization (Chapter 7)
(Lecture 13, Fall 2006, Dec. 1, 4)
1
ELEC 5200-002/6200-002 Computer Architecture and
Design, Fall 2006: Memory Organization (Chapter 7)
  • Vishwani D. Agrawal
  • James J. Danaher Professor
  • Department of Electrical and Computer Engineering
  • Auburn University, Auburn, AL 36849
  • http://www.eng.auburn.edu/vagrawal
  • vagrawal@eng.auburn.edu

2
Types of Computer Memories
From the cover of A. S. Tanenbaum, Structured
Computer Organization, Fifth Edition, Upper
Saddle River, New Jersey: Pearson Prentice Hall,
2006.
3
Electronic Memory Devices
Memory technology | Typical access time | Clock rate | Cost per GB in 2004
SRAM | 0.5-5 ns | 0.2-2.0 GHz | $4,000-$10,000
DRAM | 50-70 ns | 15-20 MHz | $100-$200
Magnetic disk | 5-20 ms | 50-200 Hz | $0.50-$2
4
Random Access Memory (RAM)
[Block diagram: address bits → address decoder → memory cell array ↔ read/write circuits ↔ data bits]
5
Static RAM (SRAM) Cell
[Diagram: SRAM cell accessed through a word line and a complementary bit-line pair (bit, bit)]
6
Dynamic RAM (DRAM) Cell
[Diagram: DRAM cell accessed through a word line and a single bit line]
7
Cost of 40GB Memory
Type of memory | Cost | Clock rate
SRAM | $160,000 | 0.2-2.0 GHz
DRAM | $4,000 | 15-20 MHz
Disk | $20 | 50-200 Hz
8
Trying to Buy a Laptop Computer? Two Years Ago
IBM ThinkPad X40 23717GU: 1.20 GHz Low Voltage
Intel Pentium M, 1MB L2 Cache,
Microsoft Windows XP Professional, 512 MB
DRAM, 40 GB Hard Drive, 2.71
lbs, 12.1" XGA (1024x768), IBM Embedded Security
Subsystem 2.0, Intel PRO/Wireless Network
Connection 802.11b, Gigabit Ethernet, integrated
graphics (Intel Extreme Graphics 2), no CD/DVD drive
(PROP, fixed bay). Availability: within 2
weeks. $2,149.00 IBM web price; $1,741.65 sale
price.
9
2006
Choose a Lenovo 3000 V Series to customize & buy
From $999.00
Sale price $949.00
Processor: Intel Core 2 Duo T5500 (1.66GHz, 2MB L2,
667MHz FSB)
Total memory: 512MB PC2-5300 DDR2 SDRAM
Hard drive: 80GB, 5400rpm Serial ATA
Weight: 4.0 lbs
10
Cache
  • The processor does all memory operations with the cache.
  • Miss: if the requested word is not in cache, a block
    of words containing the requested word is brought
    into cache, and then the processor request is
    completed.
  • Hit: if the requested word is in cache, the read or
    write operation is performed directly in cache,
    without accessing main memory.
  • Block: the minimum amount of data transferred
    between cache and main memory.

[Diagram: Processor ↔ words ↔ Cache (small, fast memory) ↔ blocks ↔ Main memory (large, inexpensive, slow)]
11
Invention of Cache
M. V. Wilkes, "Slave Memories and Dynamic Storage
Allocation," IEEE Transactions on Electronic
Computers, vol. EC-14, no. 2, pp. 270-271, April
1965.
12
Cache Performance
Processor
  • Average access time
  • = T1 h + Tm (1 − h)
  • = Tm − (Tm − T1) h
  • where
  • T1 = cache access time (small)
  • Tm = memory access time (large)
  • h = hit rate (0 ≤ h ≤ 1)
  • Hit rate is also known as hit ratio;
  • miss rate = 1 − hit rate

[Diagram: Processor → Cache (small, fast memory; access time T1) → Main memory (large, inexpensive, slow; access time Tm)]
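The formula above can be checked with a short sketch (Python; the 1-cycle and 100-cycle access times are illustrative, not from the slides):

```python
# Average access time for a single cache level:
# t_avg = h*T1 + (1 - h)*Tm, equivalently Tm - (Tm - T1)*h.
def avg_access_time(t1, tm, h):
    """t1: cache access time, tm: memory access time, h: hit rate in [0, 1]."""
    return h * t1 + (1 - h) * tm

# The two algebraic forms of the slide agree:
assert abs(avg_access_time(1, 100, 0.95) - (100 - (100 - 1) * 0.95)) < 1e-9
print(round(avg_access_time(1, 100, 0.95), 2))  # 5.95
```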
13
Average Access Time
[Plot: average access time Tm − (Tm − T1) h vs. miss rate (1 − h), rising from T1 at h = 1 to Tm at h = 0. Acceptable miss rate < 10%; desirable miss rate < 5%.]
14
Comparing Performance
  • Processor without cache, CPI = 1
  • Assume memory access time of 10 cycles
  • Assume 30% of instructions require memory access
  • Processor with cache:
  • Assume hit rate 0.95 for instructions, 0.90 for
    data
  • Assume miss penalty (time to read memory into
    cache and from it) is 17 cycles
  • Comparing times of 100 instructions:
  • Time without cache = 100×10 + 30×10 = 1,300 cycles
  • Time with cache = 100×(0.95×1 + 0.05×17)
    + 30×(0.90×1 + 0.10×17) = 258 cycles
  • Speedup = 1,300/258 = 5.04
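The arithmetic above can be reproduced directly (Python sketch using the slide's numbers):

```python
# 100 instructions, 30% make a data access, 10-cycle memory access,
# 17-cycle miss penalty, hit rates 0.95 (instructions) / 0.90 (data).
def time_without_cache(n_instr=100, data_frac=0.30, mem=10):
    # every fetch and every data access goes to the 10-cycle memory
    return n_instr * mem + n_instr * data_frac * mem

def time_with_cache(n_instr=100, data_frac=0.30, h_i=0.95, h_d=0.90, penalty=17):
    instr = n_instr * (h_i * 1 + (1 - h_i) * penalty)          # fetches
    data = n_instr * data_frac * (h_d * 1 + (1 - h_d) * penalty)  # data accesses
    return instr + data

speedup = time_without_cache() / time_with_cache()
print(round(speedup, 2))  # 5.04
```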

15
Controlling Miss Rate
  • Increase cache size:
  • More blocks can be kept in cache; the chance of a miss
    is reduced.
  • A larger cache is slower.
  • Increase block size:
  • More data available; reduced chance of a miss.
  • Fewer blocks in cache; increased chance of a miss.
  • Larger blocks need more time to swap.

[Diagrams: a cache of many small blocks vs. a cache of fewer, larger blocks, each mapped onto a large memory]
16
Increasing Hit Rate
  • Hit rate increases with cache size.
  • Hit rate mildly depends on block size.

[Plot: hit rate h (90-100%, i.e., miss rate 0-10%) vs. block size (16B, 32B, 64B, 128B, 256B) for cache sizes 4KB, 16KB and 64KB. Larger blocks decrease the chance of getting fragmented data and improve the chance of getting localized data, but hit rate depends only mildly on block size.]
17
The Locality Principle
  • A program tends to access data that form a
    physical cluster in the memory; multiple
    accesses may be made within the same block.
  • Physical localities are temporal and may shift
    over longer periods of time; data not used for
    some time is less likely to be used in the
    future. Upon a miss, the least recently used (LRU)
    block can be overwritten by a new block.
  • P. J. Denning, "The Locality Principle,"
    Communications of the ACM, vol. 48, no. 7, pp.
    19-24, July 2005.

18
Data Locality, Cache, Blocks
[Diagram: the data needed by a program form clusters (Block 1, Block 2) in memory; increase block size to match the locality size, and increase cache size to hold most of the needed data.]
19
Types of Caches
  • Direct-mapped cache:
  • Memory is divided into partitions the size of the cache.
  • Each partition is subdivided into blocks; a block can
    occupy only its one assigned position in the cache.
  • Set-associative cache:
  • A block can be placed in any of several cache
    positions forming a set.

20
Direct-Mapped Cache
[Diagram: blocks of data needed by a program (Block 1, Block 2) are swapped into the cache; the resident block at the mapped position is swapped out, even if a least recently used (LRU) block elsewhere would be a better victim.]
21
Set-Associative Cache
[Diagram: blocks of data needed by a program (Block 1, Block 2) are swapped into the cache; the least recently used (LRU) block in the target set is swapped out.]
22
Direct-Mapped Cache
Main memory: 32-word word-addressable memory,
addresses 00000 through 11111.
Cache of 8 blocks, block size = 1 word.
index (local address): 000 001 010 011 100 101 110 111
tag:                    00  10  11  01  01  00  10  11
cache address = tag | index
memory address 11 101 → tag = 11, index = 101
23
Direct-Mapped Cache
Main memory: 32-word word-addressable memory,
addresses 00000 through 11111.
Cache of 4 blocks, block size = 2 words.
index (local address): 00 01 10 11
tag:                   00 11 00 10
block offset: 0 or 1
cache address = tag | index | block offset
memory address 11 10 1 → tag = 11, index = 10, block offset = 1
24
Number of Tag and Index Bits
Cache size: w words
Main memory size: W words
Each word in cache has a unique index (local
address). Number of index bits = log2 w. Index bits
are shared with the block offset when a block
contains more than one word. Assume partitions of
w words each in the main memory. There are W/w such
partitions, each identified by a tag. Number of
tag bits = log2(W/w)
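These two formulas can be sketched in Python (the names w and W mirror the slide):

```python
import math

# Index and tag widths for a cache of w words inside a memory of W words,
# both powers of two, block size 1 word.
def index_bits(w):
    return int(math.log2(w))       # log2(cache words)

def tag_bits(W, w):
    return int(math.log2(W // w))  # log2(number of cache-sized partitions)

# 32-word memory, 8-word cache (the direct-mapped example above):
print(index_bits(8), tag_bits(32, 8))  # 3 2
```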
25
Direct-Mapped Cache (Byte Address)
Main memory: 32-word byte-addressable memory;
word addresses 00000 00 through 11111 00 (the low
two bits are the byte offset).
Cache of 8 blocks, block size = 1 word.
index: 000 001 010 011 100 101 110 111
tag:    00  10  11  01  01  00  10  11
cache address = tag | index
memory address 11 101 00 → tag = 11, index = 101,
byte offset = 00
26
Finding a Word in Cache
[Diagram: 7-bit byte address of a 32-word memory split into a 2-bit tag, 3-bit index and 2-bit byte offset. The index selects one of 8 cache entries (valid bit, 2-bit tag, data word); the stored tag is compared with the address tag: 1 = hit, 0 = miss. Cache size 8 words, block size 1 word.]
27
How Many Bits Does the Cache Have?
  • Consider a main memory:
  • 32 words; byte address is 7 bits wide: b6 b5 b4
    b3 b2 b1 b0
  • Each word is 32 bits wide
  • Assume that the cache block size is 1 word (32 bits of
    data) and the cache contains 8 blocks.
  • The cache requires, for each word:
  • a 2-bit tag, and one valid bit
  • Total storage needed in cache
  • = (blocks in cache) × (data bits/block + tag
    bits + valid bit)
  • = 8 × (32 + 2 + 1) = 280 bits
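The count can be verified mechanically (a small Python sketch):

```python
# Total cache storage: every block carries its data bits, a tag and a valid bit.
def cache_bits(n_blocks, data_bits, tag_bits, valid_bits=1):
    return n_blocks * (data_bits + tag_bits + valid_bits)

# 8 blocks of one 32-bit word, 2-bit tag, 1 valid bit:
print(cache_bits(8, 32, 2))  # 280
```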

28
A More Realistic Cache
  • Consider a 4 GB, byte-addressable main memory:
  • 1G words; byte address is 32 bits wide: b31…b16
    b15…b2 b1 b0
  • Each word is 32 bits wide
  • Assume that the cache block size is 1 word (32 bits of
    data) and the cache contains 64 KB of data, or 16K words,
    i.e., 16K blocks.
  • Number of cache index bits = 14, because 16K =
    2^14
  • Tag size = 32 − byte offset bits − index bits = 32
    − 2 − 14 = 16 bits
  • The cache requires, for each word:
  • a 16-bit tag, and one valid bit
  • Total storage needed in cache
  • = (blocks in cache) × (data bits/block + tag
    size + valid bit)
  • = 2^14 × (32 + 16 + 1) = 16×2^10×49 = 784×2^10 bits = 784
    Kb = 98 KB
  • Physical storage/Data storage = 98/64 = 1.53
29
Cache Bits for 4-Word Block
  • Consider a 4 GB, byte-addressable main memory:
  • 1G words; byte address is 32 bits wide: b31…b16
    b15…b2 b1 b0
  • Each word is 32 bits wide
  • Assume that the cache block size is 4 words (128 bits of
    data) and the cache contains 64 KB of data, or 16K words,
    i.e., 4K blocks.
  • Number of cache index bits = 12, because 4K = 2^12
  • Tag size = 32 − byte offset bits − block offset bits
    − index bits
  • = 32 − 2 − 2 − 12 = 16 bits
  • The cache requires, for each block:
  • a 16-bit tag, and one valid bit
  • Total storage needed in cache
  • = (blocks in cache) × (data bits/block + tag size
    + valid bit)
  • = 2^12 × (4×32 + 16 + 1) = 4×2^10×145 = 580×2^10 bits = 580
    Kb = 72.5 KB
  • Physical storage/Data storage = 72.5/64 = 1.13
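Both 64KB-cache calculations follow from one function that derives the tag width from the 32-bit byte address (a Python sketch of the slides' arithmetic):

```python
# Total storage for a direct-mapped cache of n_blocks blocks of block_words
# 32-bit words, addressed by addr_bits-bit byte addresses.
def cache_storage_bits(addr_bits, block_words, n_blocks):
    index = (n_blocks - 1).bit_length()            # log2(n_blocks)
    block_offset = (block_words - 1).bit_length()  # log2(words per block)
    tag = addr_bits - 2 - block_offset - index     # 2 bits of byte offset
    return n_blocks * (block_words * 32 + tag + 1)  # data + tag + valid bit

one_word = cache_storage_bits(32, 1, 16 * 1024)   # 1-word blocks
four_word = cache_storage_bits(32, 4, 4 * 1024)   # 4-word blocks
print(one_word / 8 / 1024, four_word / 8 / 1024)  # 98.0 72.5 (KB)
```

The larger block size amortizes the tag and valid bit over four words, which is why the overhead ratio drops from 1.53 to 1.13.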

30
Using Larger Cache Block (4 Words)
[Diagram: 32-bit byte address (4GB = 1G words) b31…b0 split into a 16-bit tag (b31…b16), a 12-bit index (b15…b4), a 2-bit block offset (b3 b2) and a 2-bit byte offset (b1 b0). The index (0000 0000 0000 through 1111 1111 1111, 4K entries) selects a cache entry holding a valid bit, 16-bit tag and 128-bit (4-word) data block; the tag compare gives 1 = hit, 0 = miss, and the block offset drives a MUX that selects one of the 4 words. Cache size 16K words, block size 4 words.]
31
Interleaved Memory
  • Reduces miss penalty.
  • Memory designed to read the words of a block
    simultaneously, in one read operation.
  • Example:
  • Cache block size = 4 words
  • Interleaved memory with 4 banks
  • Suppose memory access takes 15 cycles
  • Miss penalty = 1 cycle to send address + 15
    cycles to read a block + 4 cycles to send data to
    cache = 20 cycles
  • Without interleaving, miss penalty = 1 + 4×15 + 4
    = 65 cycles
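The two miss penalties can be sketched with one function (Python; the cycle counts are the slide's example values):

```python
# Miss penalty: 1 cycle to send the address, one 15-cycle access per
# sequential read, and 1 cycle per word to send data to the cache.
def miss_penalty(block_words, mem_cycles=15, banks=1):
    reads = -(-block_words // banks)  # ceil division: sequential reads needed
    return 1 + reads * mem_cycles + block_words

print(miss_penalty(4, banks=4))  # 20 (4 banks read the block in parallel)
print(miss_penalty(4, banks=1))  # 65 (4 sequential 15-cycle reads)
```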

[Diagram: Processor ↔ words ↔ Cache (small, fast memory) ↔ blocks ↔ Main memory organized as four interleaved banks (memory bank 0 through memory bank 3)]
32
Handling a Miss
  • A miss occurs when data at the required memory
    address is not found in cache.
  • Controller actions:
  • Stall the pipeline
  • Freeze contents of all registers
  • Activate a separate cache controller
  • If the cache is full:
  • select the least recently used (LRU) block in
    cache for overwriting
  • if the selected block has inconsistent data, take
    proper action
  • Copy the block containing the requested address
    from memory
  • Restart the instruction

33
Miss During Instruction Fetch
  • Send the original PC value (PC − 4) to the memory.
  • Instruct main memory to perform a read and wait
    for the memory to complete the access.
  • Write cache entry.
  • Restart the instruction whose fetch failed.

34
Writing to Memory
  • Cache and memory become inconsistent when data is
    written into cache but not to memory: the cache
    coherence problem.
  • Strategies to handle inconsistent data:
  • Write-through:
  • Always write to memory and cache simultaneously.
  • Writing to memory is 100 times slower than to
    cache.
  • Write buffer:
  • Write to cache, and to a buffer for writing to
    memory.
  • If the buffer is full, the processor must wait.

35
Writing to Memory Write-Back
  • Write-back (or copy-back) writes only to cache,
    but sets a "dirty" bit in the block where the write
    is performed.
  • When a block with its dirty bit on is to be
    overwritten in the cache, it is first written to
    the memory.

36
AMD Opteron Microprocessor
L2: 1MB, block 64B, write-back
L1 (split, 64KB each): block 64B, write-back
37
Cache Hierarchy
Processor
  • Average access time
  • = h1 T1 + (1 − h1)[h2 T2 + (1 − h2) Tm]
  • where
  • T1 = L1 cache access time (smallest)
  • T2 = L2 cache access time (small)
  • Tm = memory access time (large)
  • h1, h2 = hit rates (0 ≤ h1, h2 ≤ 1)
  • Average access time is reduced by adding a cache.

[Diagram: Processor → L1 Cache (SRAM; access time T1) → L2 Cache (DRAM; access time T2) → Main memory (large, inexpensive, slow; access time Tm)]
38
Average Access Time
[Plot: average access time h1 T1 + (1 − h1)[h2 T2 + (1 − h2) Tm] vs. L1 miss rate (1 − h1), with T1 < T2 < Tm. All curves start at T1 when h1 = 1; at h1 = 0 they reach Tm for h2 = 0, (T2 + Tm)/2 for h2 = 0.5, and T2 for h2 = 1.]
39
Processor Performance Without Cache
  • 5 GHz processor, cycle time = 0.2 ns
  • Memory access time = 100 ns = 500 cycles
  • Ignoring memory access, CPI = 1
  • Considering memory access:
  • CPI = 1 + stall cycles
  •     = 1 + 500 = 501

40
Performance with 1 Level Cache
  • Assume hit rate h1 = 0.95
  • L1 access time = 0.2 ns = 1 cycle
  • CPI = 1 + stall cycles
  •     = 1 + 0.95×0 + 0.05×500
  •     = 26
  • Processor speed increase due to cache
  • = 501/26 = 19.3

41
Performance with 2 Level Caches
  • Assume:
  • L1 hit rate h1 = 0.95
  • L2 hit rate h2 = 0.90
  • L2 access time = 5 ns = 25 cycles
  • CPI = 1 + stall cycles
  •     = 1 + 0.95×0 + 0.05×(0.90×25 + 0.10×525)
  •     = 1 + 1.125 + 2.625 = 4.75
  • Processor speed increase due to caches
  • = 501/4.75 = 105.5
  • Speed increase due to L2 cache
  • = 26/4.75 = 5.47
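The three CPI figures of slides 39-41 follow from one stall-cycle expression (Python sketch; an L2 miss costs the L2 probe plus the 500-cycle memory access, i.e., 525 cycles):

```python
# CPI = 1 + (L1 miss rate) * (cost of servicing an L1 miss).
# With no L2 (h2 = 0 here means "no L2"), an L1 miss costs the full memory access.
def cpi(h1=0.0, h2=0.0, l2=25, mem=500):
    miss_cost = (h2 * l2 + (1 - h2) * (l2 + mem)) if h2 else mem
    return 1 + (1 - h1) * miss_cost

no_cache = cpi()                    # no cache at all
l1_only = cpi(h1=0.95)              # L1 only
two_level = cpi(h1=0.95, h2=0.90)   # L1 + L2
print(round(no_cache, 2), round(l1_only, 2), round(two_level, 2))  # 501.0 26.0 4.75
print(round(no_cache / two_level, 1))  # 105.5
```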

42
Miss Rate of Direct-Mapped Cache
[Diagram: the 32-word word-addressable memory (addresses 00000 00 through 11111 00, low two bits the byte offset) mapped onto a cache of 8 blocks, block size 1 word; tags 00 10 11 01 01 00 10 11 at indexes 000-111. A needed block must replace the block at its fixed index, even when a least recently used (LRU) block elsewhere would be a better victim. cache address = tag | index; memory address 11 101 00 → tag 11, index 101.]
43
Miss Rate of Direct-Mapped Cache
Memory references to addresses 0, 8, 0, 6, 8, 16
(word addresses).
Cache of 8 blocks, block size = 1 word.
1. miss (0: tag 00, index 000)
2. miss (8: tag 01, index 000, replaces 0)
3. miss (0: tag 00, index 000, replaces 8)
4. miss (6: tag 00, index 110)
5. miss (8: tag 01, index 000, replaces 0)
6. miss (16: tag 10, index 000, replaces 8)
Every reference misses: addresses 0, 8 and 16 all
conflict at index 000.
44
Fully-Associative Cache (8-Way Set Associative)
[Diagram: the same 32-word memory mapped onto a fully-associative cache of 8 blocks (equivalently, 8-way set-associative with a single set), block size 1 word. There is no index; each block stores a full 5-bit tag, and a needed block may replace the least recently used (LRU) block anywhere in the cache. cache address = tag; memory address 11101 00 → tag 11101.]
45
Miss Rate Fully-Associative Cache
Memory references to addresses 0, 8, 0, 6, 8, 16.
Cache of 8 blocks, block size = 1 word.
1. miss (0: tag 00000)
2. miss (8: tag 01000)
3. hit (0)
4. miss (6: tag 00110)
5. hit (8)
6. miss (16: tag 10000)
Tags in cache after the sequence: 00000, 01000,
00110, 10000; the remaining blocks stay unused.
Four misses instead of six: full associativity
removes the conflicts at index 000.
46
Finding a Word in Associative Cache
[Diagram: 7-bit byte address b6…b0 split into a 5-bit tag and 2-bit byte offset; there is no index. The tag must be compared with all tags in the cache (8 entries, each with a valid bit, 5-bit tag and data word); any match gives the data, 1 = hit, 0 = miss. Cache size 8 words, block size 1 word.]
47
Eight-Way Set-Associative Cache
[Diagram: eight-way set-associative cache of 8 words, block size 1 word; 7-bit byte address b6…b0 split into a 5-bit tag and byte offset. All eight entries (valid bit, tag, data) are compared with the address tag in parallel; the matching way's data is selected by an 8-to-1 multiplexer. 1 = hit, 0 = miss.]
48
Two-Way Set-Associative Cache
[Diagram: the same 32-word memory mapped onto a two-way set-associative cache of 8 blocks, block size 1 word: 4 sets (2-bit index 00-11), each holding two blocks with 3-bit tags. A needed block may replace the least recently used (LRU) of the two blocks in its set. cache address = tag | index; memory address 111 01 00 → tag 111, index 01.]
49
Miss Rate Two-Way Set-Associative Cache
Memory references to addresses 0, 8, 0, 6, 8, 16.
Cache of 8 blocks, two-way set-associative
(4 sets, index 00-11), block size = 1 word.
1. miss (0: tag 000, index 00)
2. miss (8: tag 010, index 00, second way)
3. hit (0)
4. miss (6: tag 001, index 10)
5. hit (8)
6. miss (16: tag 100, index 00, replaces LRU tag 000)
Four misses: with two ways per set, addresses 0 and
8 can coexist at index 00.
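The three traces (slides 43, 45 and 49) can be reproduced with one small LRU cache simulator (Python sketch; word addresses, 1-word blocks):

```python
# Simulate an n_blocks-block cache with the given associativity; LRU
# replacement within each set. Returns the number of misses.
def count_misses(refs, n_blocks=8, ways=1):
    n_sets = n_blocks // ways
    sets = [[] for _ in range(n_sets)]  # each set: list of tags, LRU first
    misses = 0
    for addr in refs:
        s, tag = addr % n_sets, addr // n_sets
        if tag in sets[s]:
            sets[s].remove(tag)         # hit: move tag to MRU position below
        else:
            misses += 1
            if len(sets[s]) == ways:
                sets[s].pop(0)          # set full: evict the LRU tag
        sets[s].append(tag)             # tag is now most recently used
    return misses

refs = [0, 8, 0, 6, 8, 16]
print(count_misses(refs, ways=1))  # 6  (direct-mapped)
print(count_misses(refs, ways=2))  # 4  (two-way set-associative)
print(count_misses(refs, ways=8))  # 4  (fully associative)
```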
50
Two-Way Set-Associative Cache
[Diagram: two-way set-associative cache of 8 words, block size 1 word; 7-bit byte address b6…b0 split into a 3-bit tag, 2-bit index (sets 00-11) and byte offset. The index selects a set; both ways' tags are compared in parallel, and the matching way's data passes through a 2-to-1 MUX. 1 = hit, 0 = miss.]
51
Virtual Memory System
[Diagram: the processor issues a virtual (or logical) address; the MMU (memory management unit) translates it into a physical address, which is used to access the cache and main memory; data returns to the processor. The disk connects to main memory through DMA (direct memory access).]
52
Virtual vs. Physical Address
  • The processor assumes a certain memory addressing
    scheme:
  • A block of data is called a virtual page
  • An address is called a virtual (or logical) address
  • Main memory may have a different addressing
    scheme:
  • The memory address is called a physical address
  • The MMU translates virtual addresses to physical
    addresses
  • The complete address translation table is large and
    is kept in main memory
  • The MMU contains a TLB (translation lookaside buffer),
    a small cache of the address translation
    table → address translation can create its own
    hits and misses
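The translation path can be sketched as follows (Python; the names `page_table`, `tlb` and `translate`, and the toy mappings, are illustrative, not from the slides):

```python
PAGE_BITS = 12                     # 4KB pages, as on the next slide

page_table = {0: 7, 1: 3, 2: 9}    # virtual page number -> physical frame (toy data)
tlb = {}                           # small cache of page_table entries

def translate(vaddr):
    vpn = vaddr >> PAGE_BITS                  # virtual page number
    offset = vaddr & ((1 << PAGE_BITS) - 1)   # offset within the page
    if vpn in tlb:                 # TLB hit: no page-table access needed
        frame = tlb[vpn]
    else:                          # TLB miss: walk the page table in memory
        frame = page_table[vpn]    # a missing entry here would be a page fault
        tlb[vpn] = frame
    return (frame << PAGE_BITS) | offset      # physical address

print(hex(translate(0x1234)))  # virtual page 1 -> frame 3: 0x3234
```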

53
Page Fault
[Diagram: Processor with cache and MMU (TLB) ↔ main memory (cached pages, page table) ↔ disk (all data, organized in 4KB pages, accessed by physical addresses); pages are written back, same as in a write-back cache.]
Cache miss: a required block is not found in cache.
TLB miss: a required virtual address is not found
in the TLB.
Page fault: a required page is not found in main
memory.
A page fault in virtual memory is similar to a
miss in cache.