Title: CSL718 : Memory Hierarchy
1CSL718 Memory Hierarchy
- Cache Memories
- 6th Feb, 2006
2Memory technologies
- Semiconductor (random access)
  - Registers
  - SRAM
  - DRAM
  - FLASH
- Magnetic (random + sequential access)
  - FDD
  - HDD
- Optical (random + sequential access)
  - CD
  - DVD
3Hierarchical structure
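[Figure: memory hierarchy between the CPU and the last level, compared by size, cost/bit, and speed. The memory nearest the CPU is the smallest, has the highest cost/bit, and is the fastest; the memory farthest from the CPU is the biggest, has the lowest cost/bit, and is the slowest.]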
4System Configuration (e-bay price Rs. 37,500)
Processor: Intel P4 3.2GHz (800FSB) 1024k CPU with Hyper Threading
CPU Fan: P4 Heavy Duty Cooling Fan With Heat Sink
Motherboard: D915G express chipset 800FSB (up to 3.6GHz support)
Memory: 1GB DDR400 PC3200 DUAL CHANNEL RAM
Video Card: GeForce FX 6200 256MB 16x PCI-e video with TV out
Hard drive: 160GB 7200RPM UDMA-150 SATA
CD drive: 52x32x52x16x CDRW DVD ROM drive
Floppy drive: Sony 1.44MB 3.5" drive
Sound: AC 97 6 ch 5.1 Full duplex digital sound, stereo speakers
Network: 10/100 RJ45 onboard network (Ethernet, cable or DSL)
Modem: 56k v92 modem
Ports: Six USB 2.0 ports, 1 serial, 1 parallel, 1 microphone jack
Case: Black i BOX 522 Mid Tower 400w power supply (front USB)
Keyboard: Black PS2 Windows Keyboard
Mouse: Black PS2 Scroll Mouse
Monitor: 17" SAMSUNG 793S MONITOR
5Main Memory for Pentium IV: DDR (double data rate) DRAM
Size     Interface   Price
128 MB   PC-333      Rs. 599
256 MB   PC-333      Rs. 1,299
1 GB     PC-333      Rs. 4,999
1 GB     PC-400      Rs. 5,299
6Disk drives: Seagate Barracuda 7200 RPM
Capacity Price
40 GB Rs. 2,999
80 GB Rs. 3,499
120 GB Rs. 4,499
160 GB Rs. 4,799
200 GB Rs. 5,500
250 GB Rs. 6,999
300 GB Rs. 9,900
400 GB Rs. 14,950
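The two price lists make the cost/bit gap between adjacent levels concrete. The following is a rough sketch that uses only the prices quoted on the slides; the helper name `rs_per_gb` is illustrative and not part of the lecture.

```python
# Prices in Rs., copied from the DRAM and disk slides; capacities in GB.
dram_prices = {0.125: 599, 0.25: 1299, 1.0: 4999}      # PC-333 modules
disk_prices = {40: 2999, 80: 3499, 120: 4499, 160: 4799,
               200: 5500, 250: 6999, 300: 9900, 400: 14950}


def rs_per_gb(prices):
    """Map each capacity to its price per GB (hypothetical helper)."""
    return {cap: price / cap for cap, price in prices.items()}


dram = rs_per_gb(dram_prices)
disk = rs_per_gb(disk_prices)

for cap, cost in dram.items():
    print(f"DRAM {cap:7.3f} GB: Rs. {cost:8.2f} per GB")
for cap, cost in disk.items():
    print(f"Disk {cap:7.0f} GB: Rs. {cost:8.2f} per GB")

# Compare the 1 GB DRAM module with the 160 GB disk of the sample system.
ratio = dram[1.0] / disk[160]
print(f"DRAM costs about {ratio:.0f} times more per GB than disk")
```

On these figures the per-GB cost of DRAM is more than a hundred times that of disk, which is the economic reason for building a hierarchy rather than one large fast memory.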
7Data transfer between levels
[Figure: the processor issues an access to the upper level, which results in a hit or a miss. On a miss, data is transferred between levels; the unit of transfer is a block.]
8Principle of locality
- Temporal Locality
  - references to the same location are repeated in time
- Spatial Locality
  - references to nearby locations (repeated in space)
  - Special case: Sequential Locality
9Memory Hierarchy Analysis
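- Memory $M_i$: $M_1, M_2, \ldots, M_n$
- Capacity $s_i$: $s_1 < s_2 < \cdots < s_n$
- Unit cost $c_i$: $c_1 > c_2 > \cdots > c_n$
- Total cost: $C_{total} = \sum_i c_i \cdot s_i$
- Access time: $t_i = \tau_1 + \tau_2 + \cdots + \tau_i$ ($\tau_i$ at level $i$)
  - $\tau_1 < \tau_2 < \cdots < \tau_n$
- Hit ratios $h_i(s_i)$: $h_1 < h_2 < \cdots < h_n = 1$
- Effective time: $T_{eff} = \sum_i m_i \cdot h_i \cdot t_i = \sum_i m_i \cdot \tau_i$
  - Miss before level $i$: $m_i = (1-h_1)(1-h_2) \cdots (1-h_{i-1})$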
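The effective-time expression can be checked numerically. The sketch below takes h_i as the hit ratio at level i, tau_i as the time added by level i, t_i as the cumulative time tau_1 + ... + tau_i, and m_i as the fraction of accesses that miss at every level before i. It evaluates both forms of the sum so that they can be compared. The function name and the sample numbers are assumptions for illustration, not values from the lecture.

```python
def effective_time(h, tau):
    """Return (sum of m_i*h_i*t_i, sum of m_i*tau_i) for an n-level hierarchy.

    h[i]   : hit ratio at level i+1 (the last level must have ratio 1.0)
    tau[i] : access time added by level i+1
    """
    assert len(h) == len(tau) and h[-1] == 1.0
    m = 1.0           # m_1 = 1: every access reaches level 1
    t = 0.0           # running t_i = tau_1 + ... + tau_i
    by_hits = 0.0     # accumulates m_i * h_i * t_i
    by_levels = 0.0   # accumulates m_i * tau_i
    for h_i, tau_i in zip(h, tau):
        t += tau_i
        by_hits += m * h_i * t
        by_levels += m * tau_i
        m *= (1.0 - h_i)   # becomes m_{i+1}
    return by_hits, by_levels


# Illustrative (assumed) three-level hierarchy.
a, b = effective_time(h=[0.95, 0.99, 1.0], tau=[1, 10, 100])
print(a, b)
```

The two sums agree because m_i * h_i = m_i - m_{i+1}, so the first form telescopes into the second once h_n = 1.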
10Cache Types
- Instruction / Data / Unified / Split
- Split vs. Unified
  - Split allows specializing each part
  - Unified allows best use of the capacity
- On-chip / Off-chip
  - on-chip: fast but small
  - off-chip: large but slow
- Single level / Multi level
11Cache Policies
- Placement: what gets placed where?
- Read: when? from where?
- Load: order of bytes/words?
- Fetch: when to fetch new block?
- Replacement: which one?
- Write: when? to where?
12Block placement strategies
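- Direct mapped
- Set associative
- Fully associative
[Figure: three panels, one per strategy, for a cache of 8 blocks (numbered 0 to 7) that is grouped into 4 sets (numbered 0 to 3) in the set associative case. Each panel shows the Tag and Data arrays and the entries that are searched for a given block.]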
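The three strategies differ only in how many cache frames a given memory block may occupy. A minimal sketch, assuming the usual modulo mapping from block address to set; the function name `candidate_frames` and the example block address 12 are illustrative choices, not taken from the slide.

```python
def candidate_frames(block_addr, num_frames, ways):
    """Return the cache frame numbers where a memory block may be placed.

    ways == 1          -> direct mapped
    ways == num_frames -> fully associative
    anything between   -> set associative
    """
    assert num_frames % ways == 0
    num_sets = num_frames // ways
    set_index = block_addr % num_sets      # assumed modulo mapping
    first = set_index * ways               # frames of a set are contiguous here
    return list(range(first, first + ways))


# An 8-frame cache, as in the figure; block address 12 is an arbitrary example.
print(candidate_frames(12, 8, ways=1))   # direct mapped
print(candidate_frames(12, 8, ways=2))   # 2-way set associative (4 sets)
print(candidate_frames(12, 8, ways=8))   # fully associative
```

The number of frames returned is also the number of tags that have to be searched on each access, which is the cost side of higher associativity.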
13Organization/placement policy
- Cache: Set 1, ..., Set S
- Set: Sector 1, Sector 2, ..., Sector SE, plus LRU state
- Sector: Tag, Block 1, Block 2, ..., Block B
- Block: AU 1, AU 2, ..., AU A, plus V, D, S bits
14Addressing Cache
Address fields: Sector Name | Set Index | Block | Displacement
- Sector Name: compared to Tags
- Set Index: selects set
- Block: selects Block
- Displacement: selects AU
- Early select: access data after tag matching
- Late select: access data while tag matching
15Cache organization example
[Figure: example organization with 8 sets (numbered 1 to 8). Each set holds 2 sectors; each sector has one Tag and 2 blocks; each block has V and D bits and 2 AUs.]
16Cache access mechanism
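[Figure: direct-mapped cache access. The 32-bit address (bits 31 to 0) is split into an 18-bit tag, a 12-bit index, and a 2-bit byte offset. The index selects one of 4096 entries (0 to 4095), each holding v, tag, and data fields. The stored 18-bit tag is compared with the address tag to produce Hit, and the entry supplies 32 bits of Data.]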
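The access mechanism for this configuration (18-bit tag, 12-bit index, 2-bit byte offset, 4096 one-word entries) can be sketched in a few lines. The class and method names are hypothetical, and the `fill` step stands in for whatever the miss handling does in a real design.

```python
TAG_BITS, INDEX_BITS, OFFSET_BITS = 18, 12, 2


def split_address(addr):
    """Split a 32-bit address into (tag, index, byte_offset)."""
    byte_offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, byte_offset


class DirectMappedCache:
    """One 32-bit word per entry; 2**12 = 4096 entries."""

    def __init__(self):
        n = 1 << INDEX_BITS
        self.valid = [False] * n
        self.tag = [0] * n
        self.data = [0] * n

    def read(self, addr):
        """Return (hit, word); word is None on a miss."""
        tag, index, _ = split_address(addr)
        hit = self.valid[index] and self.tag[index] == tag
        return hit, (self.data[index] if hit else None)

    def fill(self, addr, word):
        """Install a word after a miss (replaces whatever was at that index)."""
        tag, index, _ = split_address(addr)
        self.valid[index] = True
        self.tag[index] = tag
        self.data[index] = word


cache = DirectMappedCache()
print(cache.read(0x12345678))       # miss on a cold cache
cache.fill(0x12345678, 99)
print(cache.read(0x12345678))       # hit after the fill
```

Two addresses that differ only in their tag bits map to the same entry and evict each other, which is the conflict behaviour that set associativity is meant to reduce.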
17Cache with 4 word blocks
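[Figure: cache with 4-word blocks. The 32-bit address (bits 31 to 0) is split into an 18-bit tag, a 10-bit index, a 2-bit block offset, and a 2-bit byte offset. The index selects one of 1024 entries (0 to 1023), each holding v, tag, and four 32-bit data words. An 18-bit tag comparison produces Hit, and a Mux controlled by the block offset selects the Data word.]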
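With 4-word blocks the address gains a block-offset field, and a multiplexer picks one word out of the selected block. A minimal sketch for the field widths shown (18-bit tag, 10-bit index, 2-bit block offset, 2-bit byte offset); the data layout and function names are assumptions made for illustration.

```python
def split_address(addr):
    """Split a 32-bit address into (tag, index, block_offset, byte_offset)."""
    byte_offset = addr & 0x3
    block_offset = (addr >> 2) & 0x3     # which of the 4 words in the block
    index = (addr >> 4) & 0x3FF          # 10 bits -> 1024 entries
    tag = addr >> 14                     # remaining 18 bits
    return tag, index, block_offset, byte_offset


def read_word(lines, addr):
    """lines: list of 1024 entries, each None or (tag, [w0, w1, w2, w3]).

    Returns (hit, word); word is None on a miss.
    """
    tag, index, block_offset, _ = split_address(addr)
    entry = lines[index]
    if entry is None or entry[0] != tag:
        return False, None
    return True, entry[1][block_offset]  # the mux: select one word


lines = [None] * 1024
tag, index, _, _ = split_address(0x12345678)
lines[index] = (tag, [10, 11, 12, 13])
print(read_word(lines, 0x12345678))
```

Compared with one-word entries, the same data capacity needs a quarter as many tags, and a single miss brings in neighbouring words that spatial locality makes likely to be used.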
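18 4-way set associative cache
[Figure: 4-way set associative cache access. The 32-bit address (bits 31 to 0) is split into a 20-bit tag, an 8-bit index, a 2-bit block offset, and a 2-bit byte offset. The index selects one of 256 sets (0 to 255). Each of the four ways holds v, a 20-bit tag, and 128 bits of data. A per-way Mux picks a 32-bit word, four tag comparisons produce Hit, and a final Mux selects the Data word from the matching way.]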
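The 4-way lookup can be sketched in software as a loop over the ways of the selected set; in hardware the four tag comparisons happen in parallel. Field widths follow the figure (20-bit tag, 8-bit index, 2-bit block offset, 2-bit byte offset); the data layout and names are illustrative assumptions.

```python
NUM_SETS, WAYS = 256, 4


def split_address(addr):
    """Split a 32-bit address into (tag, index, block_offset, byte_offset)."""
    byte_offset = addr & 0x3
    block_offset = (addr >> 2) & 0x3
    index = (addr >> 4) & 0xFF           # 8 bits -> 256 sets
    tag = addr >> 12                     # remaining 20 bits
    return tag, index, block_offset, byte_offset


def lookup(sets, addr):
    """sets[index][way] is None or (tag, [w0, w1, w2, w3]).

    Returns (hit, way, word); way and word are None on a miss.
    """
    tag, index, block_offset, _ = split_address(addr)
    for way, entry in enumerate(sets[index]):      # parallel compare in hardware
        if entry is not None and entry[0] == tag:
            return True, way, entry[1][block_offset]
    return False, None, None


sets = [[None] * WAYS for _ in range(NUM_SETS)]
tag, index, _, _ = split_address(0x12345678)
sets[index][3] = (tag, [0, 1, 2, 3])
print(lookup(sets, 0x12345678))
```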
19Read policies
- Sequential or concurrent
  - sequential: initiate memory access only after detecting a miss
  - concurrent: initiate memory access along with cache access in anticipation of a miss
- With or without forwarding
  - without forwarding: give data to CPU after filling the missing block in cache
  - with forwarding: forward data to CPU as it gets filled in cache
20Read Policies
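[Figure: timing diagrams for the four combinations. Each cache step takes 1 time unit and the memory access takes T; $p_m$ is the miss probability.]
- Sequential Simple: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot (T + 2)$
- Concurrent Simple: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot (T + 1)$
- Sequential Forward: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot (T + 1)$
- Concurrent Forward: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot T$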
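The four expressions share one shape, Teff = (1 - pm) * 1 + pm * (T + k), where the extra miss overhead k is 2, 1, 1, and 0 for sequential simple, concurrent simple, sequential forward, and concurrent forward respectively. The sketch below compares them; the values pm = 0.05 and T = 10 are assumed for illustration only.

```python
# Extra time units added to T on a miss, read off the four slide formulas.
EXTRA_ON_MISS = {
    "sequential simple": 2,
    "concurrent simple": 1,
    "sequential forward": 1,
    "concurrent forward": 0,
}


def t_eff(policy, pm, T):
    """Effective access time with a 1-unit hit time and miss probability pm."""
    miss_time = T + EXTRA_ON_MISS[policy]
    return (1 - pm) * 1 + pm * miss_time


for policy in EXTRA_ON_MISS:
    print(f"{policy:20s} {t_eff(policy, pm=0.05, T=10):.3f}")
```

Concurrent access and forwarding each remove one time unit from the miss path, so their benefit scales with pm and matters most when the miss rate is high.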
21Load policies
[Figure: a 4 AU block (AU 0 to AU 3) with a cache miss on AU 1, showing how the block is loaded under each policy.]
- Block Load
- Load Forward
- Fetch Bypass (wrap around load)
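The three load policies can be read as three orders in which the AUs of a block arrive after a miss. The sketch below encodes one common interpretation, which is an assumption here because the slide's figure did not survive extraction: block load brings the whole block starting at AU 0, load forward brings only the missed AU and those after it, and fetch bypass starts at the missed AU and wraps around to the start of the block.

```python
def load_order(policy, block_size, miss_au):
    """Return the order in which AUs are loaded (assumed interpretation)."""
    if policy == "block load":
        return list(range(block_size))
    if policy == "load forward":
        return list(range(miss_au, block_size))
    if policy == "fetch bypass":
        # wrap-around load: start at the missed AU, then wrap to AU 0
        return [(miss_au + k) % block_size for k in range(block_size)]
    raise ValueError(f"unknown policy: {policy}")


# The slide's case: a 4 AU block with a miss on AU 1.
for policy in ("block load", "load forward", "fetch bypass"):
    print(policy, load_order(policy, block_size=4, miss_au=1))
```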
22Fetch Policies
- Fetch on miss (demand fetching)
- Software prefetching
- Hardware Prefetching
23Fetch Policies
- Demand fetching
  - fetch only when required (miss)
- Hardware prefetching
  - automatically prefetch next block
- Software prefetching
  - programmer decides to prefetch
  - questions:
    - how much ahead (prefetch distance)
    - how often
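The effect of next-block hardware prefetching on a sequential access pattern can be illustrated with a toy miss counter. This sketch makes strong simplifying assumptions that are not from the lecture: the cache never evicts, a prefetch is issued on every access, and a prefetched block arrives before it is needed (a prefetch distance of one block with zero latency).

```python
def count_misses(block_trace, prefetch_next=False):
    """Count demand misses over a trace of block numbers.

    Assumes unbounded capacity and instantaneous prefetches (toy model).
    """
    resident = set()
    misses = 0
    for b in block_trace:
        if b not in resident:
            misses += 1
            resident.add(b)
        if prefetch_next:
            resident.add(b + 1)      # bring in the next sequential block
    return misses


trace = [0, 0, 1, 1, 2, 2, 3, 3]     # sequential walk, two accesses per block
print(count_misses(trace))                       # demand fetching
print(count_misses(trace, prefetch_next=True))   # next-block prefetching
```

With a real memory latency the prefetch has to be issued early enough to arrive in time, which is exactly the "how much ahead" question on the slide.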
24Software Control of Cache
- Software visible cache
  - mode selection (WT, WB, etc.)
  - block flush
  - block invalidate
  - block prefetch
25Replacement Policies
- Least Recently Used (LRU)
- Least Frequently Used (LFU)
- First In First Out (FIFO)
- Random
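The four policies can be compared by counting misses on the same reference trace. The following is a minimal sketch for a fully associative cache; the function name, the tie-breaking rule for LFU (oldest resident block among the least frequently used), and the sample trace are assumptions made for illustration.

```python
import random
from collections import Counter, OrderedDict


def simulate(trace, capacity, policy, seed=0):
    """Count misses for a fully associative cache holding `capacity` blocks."""
    rng = random.Random(seed)
    resident = OrderedDict()   # insertion order; LRU moves hits to the end
    freq = Counter()           # reference counts of resident blocks (for LFU)
    misses = 0
    for b in trace:
        if b in resident:
            freq[b] += 1
            if policy == "LRU":
                resident.move_to_end(b)      # mark as most recently used
            continue
        misses += 1
        if len(resident) >= capacity:
            if policy in ("LRU", "FIFO"):
                victim = next(iter(resident))            # front of the order
            elif policy == "LFU":
                victim = min(resident, key=lambda x: freq[x])
            elif policy == "Random":
                victim = rng.choice(list(resident))
            else:
                raise ValueError(f"unknown policy: {policy}")
            del resident[victim]
            del freq[victim]
        resident[b] = True
        freq[b] = 1
    return misses


trace = [1, 2, 3, 1, 4, 1, 2]
for policy in ("LRU", "LFU", "FIFO", "Random"):
    print(policy, simulate(trace, capacity=3, policy=policy))
```

On this trace FIFO evicts block 1 just before it is reused, while LRU and LFU keep it; which policy wins in general depends on the reference pattern.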
26Write Policies
- Write Hit
  - Write Back
  - Write Through
- Write Miss
  - Write Back
  - Write Through (with or without Write Allocate)
- Buffers are used in all cases to hide latencies
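The difference between the two write policies shows up in how many writes reach memory. The sketch below counts them for a small direct-mapped cache with one-word blocks. It assumes that write back always allocates on a write miss, it applies the `write_allocate` flag only to write through, and it ignores the write buffers mentioned above; all names are illustrative.

```python
def memory_writes(ops, policy, num_lines=4, write_allocate=True):
    """Count writes sent to memory.

    ops    : list of ('r' or 'w', block_address)
    policy : 'WT' (write through) or 'WB' (write back)
    Dirty blocks still in the cache at the end are not flushed.
    """
    tags = [None] * num_lines
    dirty = [False] * num_lines
    writes = 0
    for op, addr in ops:
        line = addr % num_lines
        hit = tags[line] == addr
        if op == "w" and policy == "WT":
            writes += 1                      # every store goes to memory
            if not hit and write_allocate:
                tags[line] = addr            # also bring the block into cache
            continue
        # Reads under either policy, and all write-back stores.
        if not hit:
            if policy == "WB" and dirty[line]:
                writes += 1                  # write the evicted dirty block back
                dirty[line] = False
            tags[line] = addr
        if op == "w":
            dirty[line] = True               # memory is updated only on eviction
    return writes


# Five stores to block 0, then a read of block 4, which maps to the same line.
ops = [("w", 0)] * 5 + [("r", 4)]
print(memory_writes(ops, "WT"))
print(memory_writes(ops, "WB"))
```

Write back collapses repeated stores to the same block into one memory write at eviction time, at the price of a dirty bit per block and a slower miss when the victim is dirty.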