Title: Memory, Hierarchical Memory Systems, Cache Memory
1 Memory, Hierarchical Memory Systems, Cache Memory
- Prof. Sin-Min Lee
- Department of Computer Science
CS147 Lecture 14
2 The Five Classic Components of a Computer
3 The Processor Picture
4 Processor/Memory Bus
PCI Bus
I/O Busses
5 Technology Trends

Capacity and speed (latency) by technology:
Logic: capacity 2x in 3 years; speed 2x in 3 years
DRAM: capacity 4x in 3 years; speed 2x in 10 years
Disk: capacity 4x in 3 years; speed 2x in 10 years

DRAM generations:
Year  Size    Cycle Time
1980  64 Kb   250 ns
1983  256 Kb  220 ns
1986  1 Mb    190 ns
1989  4 Mb    165 ns
1992  16 Mb   145 ns
1995  64 Mb   120 ns

(1000:1 growth in capacity vs. 2:1 improvement in cycle time)
6 Predicting Performance Change: Moore's Law

- Original version: The density of transistors in an integrated circuit will double every year. (Gordon Moore, Intel, 1965)
- Current version: Cost/performance of silicon chips doubles every 18 months.
9 Processor-DRAM Memory Gap (latency)
18 The connection between the CPU and the cache is very fast; the connection between the CPU and main memory is slower.
28 There are three methods of block placement:

- Direct mapped: If each block has only one place it can appear in the cache, the cache is said to be direct mapped. The mapping is usually (Block address) MOD (Number of blocks in cache).
- Fully associative: If a block can be placed anywhere in the cache, the cache is said to be fully associative.
- Set associative: If a block can be placed in a restricted set of places in the cache, the cache is said to be set associative. A set is a group of blocks in the cache. A block is first mapped onto a set, and then the block can be placed anywhere within that set. The set is usually chosen by bit selection, that is, (Block address) MOD (Number of sets in cache).
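The three placement rules above reduce to simple modular arithmetic. A minimal sketch, using hypothetical sizes (a cache of 8 blocks, organized as 4 sets of 2 for the set-associative case):

```python
NUM_BLOCKS = 8  # assumed cache size in blocks
NUM_SETS = 4    # assumed number of sets (2-way set associative)

def direct_mapped_line(block_addr):
    # Direct mapped: exactly one legal line per block.
    return block_addr % NUM_BLOCKS

def set_associative_set(block_addr):
    # Set associative: the block maps to one set; it may occupy any way in it.
    return block_addr % NUM_SETS

# Fully associative: every line is a legal destination.
fully_associative_lines = list(range(NUM_BLOCKS))

print(direct_mapped_line(12))   # 12 MOD 8 = 4
print(set_associative_set(12))  # 12 MOD 4 = 0
```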
31 Cache (cont.)

Bits 2-4 of the main memory address form the cache index; the upper 5 bits (the tag) are stored in the cache along with the data. If the tag stored at the line selected by the index matches the tag of the address the CPU requested, it's a cache hit.
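The split described above can be sketched with bit operations. The slide does not state the total address width, so the code assumes a 10-bit address with a 2-bit block offset (bits 0-1) below the 3-bit index (bits 2-4) and the 5-bit tag (bits 5-9):

```python
def split_address(addr):
    # Assumed layout: | tag (5 bits) | index (3 bits) | offset (2 bits) |
    offset = addr & 0b11           # bits 0-1: byte within the block
    index = (addr >> 2) & 0b111    # bits 2-4: selects one of 8 cache lines
    tag = (addr >> 5) & 0b11111    # bits 5-9: stored alongside the data
    return tag, index, offset

tag, index, offset = split_address(0b1011001110)
print(tag, index, offset)  # 22 3 2
```

A hit check then amounts to comparing the stored tag at line `index` against the requested tag.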
46 A pictorial example for a cache with only 4 blocks and a memory with only 16 blocks.
49 Replacement Policies

- Whenever there is a miss, the information must be read from main memory, and the cache is updated with this new information: one line is replaced with the new block.
- Policies for choosing that line vary. The three most commonly used are FIFO, LRU, and Random.
50 FIFO Replacement Policy

- First in, first out: replaces the oldest line in the cache, regardless of the last time that line was accessed.
- The main benefit is that this is easy to implement.
- The principal drawback is that a line may not stay in the cache for long: you may find yourself constantly removing and re-adding the same block of memory.
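A minimal FIFO sketch, assuming a small fully associative cache of 4 lines and an arbitrary block trace:

```python
from collections import deque

class FIFOCache:
    def __init__(self, size=4):
        self.size = size
        self.lines = deque()  # oldest block at the left

    def access(self, block):
        if block in self.lines:
            return True  # hit: FIFO does NOT reorder on a hit
        if len(self.lines) == self.size:
            self.lines.popleft()  # evict the oldest line
        self.lines.append(block)
        return False

cache = FIFOCache()
trace = ["A", "B", "C", "D", "A", "E", "A"]
hits = sum(cache.access(b) for b in trace)
print(hits)  # 1
```

Note how the final access to A misses: A was the oldest line when E arrived and was evicted even though it had just been used, which is exactly the drawback described above.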
51 Hit Ratio

- The hit ratio (hits divided by the sum of hits and misses) is a measure of cache performance.
- A well-designed cache can have a hit ratio close to 1.
- When cache hits far outnumber misses, system performance improves dramatically.
52 Total accesses: 14; hits: 4; hit ratio: 4/14 = 2/7
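As a quick check of the slide's arithmetic, 4 hits out of 14 accesses does reduce to 2/7:

```python
from fractions import Fraction

hits, accesses = 4, 14
ratio = Fraction(hits, accesses)  # reduces automatically
print(ratio)  # 2/7
```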
53 LRU Replacement Policy

- Least Recently Used: the line that was accessed least recently is replaced with the new block of data.
- The benefit is that this tends to keep the most recently (and usually most frequently) accessed lines in the cache.
- The drawback is that this can be difficult and costly to implement, especially if there are many lines to consider.
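A sketch of LRU using an ordered map (the sizes and trace are the same hypothetical ones as in the FIFO sketch, for comparison):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, size=4):
        self.size = size
        self.lines = OrderedDict()  # LRU victim at the left end

    def access(self, block):
        if block in self.lines:
            self.lines.move_to_end(block)  # refresh recency on a hit
            return True
        if len(self.lines) == self.size:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[block] = True
        return False

cache = LRUCache()
trace = ["A", "B", "C", "D", "A", "E", "A"]
hits = sum(cache.access(b) for b in trace)
print(hits)  # 2
```

On this trace LRU gets 2 hits where FIFO gets 1: the hit on A refreshes its recency, so E evicts B instead, and the final access to A still hits.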
55 Random Replacement Policy

- With this policy, the line that is replaced is chosen randomly.
- Performance is close to that of LRU, and the implementation is much simpler.
56 Mapping Technique

- The cache mapping technique is another factor that determines how effective the cache is, that is, what its hit ratio and speed will be. The three types are:
- 1. Direct Mapped Cache: Each memory location is mapped to a single cache line that it shares with many others; only one of the many addresses that share this line can use it at a given time. This is the simplest technique both in concept and in implementation. Using this cache means the circuitry to check for hits is fast and easy to design, but the hit ratio is relatively poor compared to the other designs because of its inflexibility. Motherboard-based system caches are typically direct mapped.
- 2. Fully Associative Cache: Any memory location can be cached in any cache line. This is the most complex technique and requires sophisticated search hardware when checking for a hit. This can slow the whole cache down, but it offers the best theoretical hit ratio, since there are so many options for caching any memory address.
- 3. N-Way Set Associative Cache: "N" is typically 2, 4, or 8. A compromise between the two previous designs: the cache is broken into sets of "N" lines each, and any memory address can be cached in any of those "N" lines. This improves hit ratios over the direct mapped cache without incurring a severe search penalty (since "N" is kept small). 2-way and 4-way set associative caches are common in processor level 1 caches.
57 Comparison of Cache Mapping Techniques: 1. Direct Mapped Cache

- The direct mapped cache is the simplest form of cache and the easiest to check for a hit.
- Since there is only one possible place that any memory location can be cached, there is nothing to search: the line either contains the memory information we are looking for, or it doesn't.
- Unfortunately, the direct mapped cache also has the worst hit ratio, because, again, there is only one place that any address can be stored.
58 Direct Mapped Cache Example

Access: A B C A D B E F A C D B G C H I A B
Line 0: A B B A A B B B A A A B B B B B A B
Line 1: - - - - D D D D D D D D D D D D D D
Line 2: - - C C C C C C C C C C C C C C C C
Line 3: - - - - - - - - - - - - G G G G G G
Line 4: - - - - - - E E E E E E E E E E E E
Line 5: - - - - - - - F F F F F F F F F F F
Line 6: - - - - - - - - - - - - - - - I I I
Line 7: - - - - - - - - - - - - - - H H H H
Hit?:   - - - - - - - - - Y Y - - Y - - - -
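The trace above can be reproduced in a few lines. The block numbers below are assumed (the slide shows only letters), but they are chosen so that each letter maps to the line shown, with A and B sharing line 0 (e.g. blocks 0 and 8 in an 8-line cache):

```python
# Hypothetical block numbers consistent with the example's line assignments.
block_num = {"A": 0, "B": 8, "C": 2, "D": 1, "E": 4,
             "F": 5, "G": 3, "H": 7, "I": 6}
NUM_LINES = 8

lines = [None] * NUM_LINES
trace = list("ABCADBEFACDBGCHIAB")
hits = []
for blk in trace:
    line = block_num[blk] % NUM_LINES  # direct mapped: one legal line
    hits.append(lines[line] == blk)
    lines[line] = blk

print(sum(hits))                              # 3
print([b for b, h in zip(trace, hits) if h])  # ['C', 'D', 'C']
```

The three hits all come from blocks that happen not to share a line with another active block; A and B keep evicting each other from line 0.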
59 2. Fully Associative Cache

- The fully associative cache has the best hit ratio because any line in the cache can hold any address that needs to be cached.
- This means the problem seen in the direct mapped cache disappears, because there is no single dedicated line that an address must use.
- However, this cache suffers from search overhead. If a given address can be stored in any of 16,384 lines, how do you know where it is? Even with specialized hardware to do the searching, a performance penalty is incurred, and it applies to every access to memory, hit or miss, because searching the cache is how a hit is determined in the first place.
60 Associative Cache Example

Access: A B C A D B E F A C D B G C H I A B
Line 0: A B B A A B B B A A A B B B B B A B
Line 1: - - - - D D D D D D D D D D D D D D
Line 2: - - C C C C C C C C C C C C C C C C
Line 3: - - - - - - - - - - - - G G G G G G
Line 4: - - - - - - E E E E E E E E E E E E
Line 5: - - - - - - - F F F F F F F F F F F
Line 6: - - - - - - - - - - - - - - - I I I
Line 7: - - - - - - - - - - - - - - H H H H
Hit?:   - - - - - - - - - Y Y - - Y - - - -
61 3. N-Way Set Associative Cache

- The set associative cache is a good compromise between the direct mapped and fully associative caches.
- Each address is mapped to a certain set of cache locations.
- The address space is divided into blocks of m bytes (the cache line size), discarding the bottom log2(m) address bits.
- An "n-way set associative" cache with S sets has n cache locations in each set. Block b is mapped to set "b mod S" and may be stored in any of the n locations in that set, with its upper address bits as a tag. To determine whether block b is in the cache, set "b mod S" is searched associatively for the tag.
- In the "real world", the direct mapped and set associative caches are by far the most common. Direct mapping is used more for level 2 caches on motherboards, while the higher-performance set associative cache is found more commonly in the smaller primary caches contained within processors.
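The lookup described above can be sketched as an LRU-within-set structure. The parameters (S = 4 sets, n = 2 ways) are assumed for illustration:

```python
from collections import OrderedDict

S, N = 4, 2  # assumed: 4 sets, 2 ways per set
sets = [OrderedDict() for _ in range(S)]  # LRU way at the left end of each set

def access(block):
    s = sets[block % S]          # block b maps to set b mod S
    if block in s:
        s.move_to_end(block)     # hit: refresh recency within the set
        return True
    if len(s) == N:
        s.popitem(last=False)    # evict the LRU way in this set only
    s[block] = True
    return False

# Blocks 0 and 4 both map to set 0 but can coexist in its two ways,
# so the second round of accesses hits where a direct mapped cache would miss.
results = [access(0), access(4), access(0), access(4)]
print(results)  # [False, False, True, True]
```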
62 2-Way Set-Associative Example

(Each cell shows the blocks in the set with an LRU age suffix: 0 = most recently used.)

Access: A  | B     | C     | A     | D     | B     | E     | F     | A     | C     | D     | B     | G     | C     | H     | I     | A     | B
Set 0:  A0 | A1 B0 | A1 B0 | A0 B1 | A0 B1 | A1 B0 | E0 B1 | E0 B1 | E1 A0 | E1 A0 | E1 A0 | B0 A1 | B0 A1 | B0 A1 | B0 A1 | B0 A1 | B1 A0 | B0 A1
Set 1:  -  | -     | -     | -     | D0    | D0    | D0    | D1 F0 | D1 F0 | D1 F0 | D0 F1 | D0 F1 | D0 F1 | D0 F1 | D0 F1 | D0 F1 | D0 F1 | D0 F1
Set 2:  -  | -     | C0    | C0    | C0    | C0    | C0    | C0    | C0    | C0    | C0    | C0    | C0    | C0    | C0    | C1 I0 | C1 I0 | C1 I0
Set 3:  -  | -     | -     | -     | -     | -     | -     | -     | -     | -     | -     | -     | G0    | G0    | G1 H0 | G1 H0 | G1 H0 | G1 H0
Hit?:   -  | -     | -     | Y     | -     | Y     | -     | -     | -     | Y     | Y     | -     | -     | Y     | -     | -     | Y     | Y
63 Summary of Mapping Techniques

Cache Type             | Hit Ratio                        | Search Speed
Direct Mapped          | Good                             | Best
Fully Associative      | Best                             | Moderate
N-Way Set Associative  | Very good, better as N increases | Good, but gets worse as N increases