Memory Subsystem and Cache

Provided by: jli45
1
Memory Subsystem and Cache
  • Adapted from lecture notes of Dr. Patterson and
    Dr. Kubiatowicz of UC Berkeley and
  • Rabi Mahapatra & Hank Walker

2
The Big Picture
[Figure: the components of a computer: Processor (Control and Datapath), Memory, Input, Output]
3
Technology Trends (cont'd)
Processor-DRAM Memory Gap (latency)
[Figure: CPU and DRAM curves plotted against Time (1980 to 2000) on a vertical axis running from 1 to 1000]
  • CPU ("Moore's Law"): µProc 60%/yr. (2X/1.5 yr)
  • DRAM ("Less' Law?"): DRAM 9%/yr. (2X/10 yrs)
  • Processor-Memory Performance Gap grows 50%/year
4
The Goal: Large, Fast, Cheap Memory!
  • Fact:
  • Large memories are slow
  • Fast memories are small
  • How do we create a memory that is large, cheap
    and fast (most of the time)?
  • Hierarchy
  • Parallelism

5
  • By taking advantage of the principle of locality,
    the hierarchy can:
  • Present the user with as much memory as is
    available in the cheapest technology.
  • Provide access at the speed offered by the
    fastest technology.

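[Figure: memory hierarchy, from the Processor (Control, Datapath) outward: Registers, On-Chip Cache, Second Level Cache (SRAM), Main Memory (DRAM), Secondary Storage (Disk). Speed (ns) ranges from about 1 at the registers through 10 and 100 to 10,000,000 (10 ms) at disk; Size (bytes) ranges from 100s through Ks and Ms to Gs.]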
6
Memory Hierarchy (1/4)
  • Processor
  • executes programs
  • runs on the order of nanoseconds to picoseconds
  • needs to access code and data for programs: where
    are these?
  • Disk
  • HUGE capacity (virtually limitless)
  • VERY slow: runs on the order of milliseconds
  • so how do we account for this gap?

7
Memory Hierarchy (2/4)
  • Memory (DRAM)
  • smaller than disk (not limitless capacity)
  • contains a subset of the data on disk: basically
    portions of programs that are currently being run
  • much faster than disk: memory accesses don't slow
    down the processor quite as much
  • Problem: memory is still too slow (hundreds of
    nanoseconds)
  • Solution: add more layers (caches)

8
Memory Hierarchy (3/4)
[Figure: levels in the memory hierarchy, labeled from Higher to Lower]
9
Memory Hierarchy (4/4)
  • If a level is closer to the Processor, it must be:
  • smaller
  • faster
  • a subset of all lower levels (contains most
    recently used data)
  • Each lower level contains at least all the data in
    the levels above it
  • Lowest Level (usually disk) contains all
    available data

10
Analogy: Library
  • You're writing a term paper (Processor) at a
    table in Evans
  • Evans Library is equivalent to disk
  • essentially limitless capacity
  • very slow to retrieve a book
  • Table is memory
  • smaller capacity means you must return a book when
    the table fills up
  • easier and faster to find a book there once
    you've already retrieved it

11
Analogy: Library (cont'd)
  • Open books on table are cache
  • smaller capacity: only a very few open books
    fit on the table; again, when the table fills up,
    you must close a book
  • much, much faster to retrieve data
  • Illusion created: whole library open on the
    tabletop
  • Keep as many recently used books open on table as
    possible, since you are likely to use them again
  • Also keep as many books on table as possible,
    since that is faster than going to the library

12
Memory Hierarchy Basics
  • Disk contains everything.
  • When the Processor needs something, bring it into
    all higher levels of memory.
  • Cache contains copies of data in memory that are
    being used.
  • Memory contains copies of data on disk that are
    being used.
  • Entire idea is based on Temporal Locality: if we
    use it now, we'll want to use it again soon (a
    Big Idea)

13
Caches: Why Does It Work?
  • Temporal Locality (Locality in Time)
  • => Keep most recently accessed data items closer
    to the processor
  • Spatial Locality (Locality in Space)
  • => Move blocks consisting of contiguous words to
    the upper levels
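
A rough way to see both kinds of locality is to count how many distinct blocks an access pattern touches. The sketch below is not from the slides: `count_block_fetches`, the 16-byte block size, and the three access patterns are all assumptions chosen for illustration, and the model ignores cache capacity entirely (a block is fetched once and then stays).

```python
def count_block_fetches(addresses, block_size):
    """Count how many blocks must be fetched, assuming a block stays
    available forever once it has been brought in (capacity is ignored)."""
    seen_blocks = set()
    fetches = 0
    for addr in addresses:
        block = addr // block_size  # which block holds this byte address
        if block not in seen_blocks:
            seen_blocks.add(block)
            fetches += 1
    return fetches


# Hypothetical access patterns, 16-byte blocks assumed.
sequential = list(range(64))            # spatial locality: neighbors share blocks
strided = [i * 16 for i in range(64)]   # no spatial locality: one block per access
repeated = [0, 4, 8] * 10               # temporal locality: same items reused

print(count_block_fetches(sequential, 16))  # 4
print(count_block_fetches(strided, 16))     # 64
print(count_block_fetches(repeated, 16))    # 1
```

The sequential and repeated patterns need far fewer fetches per access than the strided one, which is the effect the two bullets above describe.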

14
Cache Design Issues
  • How do we organize cache?
  • Where does each memory address map to? (Remember
    that cache is subset of memory, so multiple
    memory addresses map to the same cache location.)
  • How do we know which elements are in cache?
  • How do we quickly locate them?

15
Direct Mapped Cache
  • In a direct-mapped cache, each memory address is
    associated with one possible block within the
    cache
  • Therefore, we only need to look in a single
    location in the cache to see whether the data is
    in the cache
  • Block is the unit of transfer between cache and
    memory

16
Direct Mapped Cache (cont'd)
  • Cache Location 0 can be occupied by data from:
  • Memory location 0, 4, 8, ...
  • In general: any memory location that is a multiple
    of 4
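
The mapping on this slide can be sketched as a modulo operation. The code below assumes a cache with 4 locations and a 16-location memory, which is inferred from the "multiple of 4" rule rather than stated in the transcript; `cache_location` is a hypothetical helper name.

```python
NUM_CACHE_LOCATIONS = 4   # assumed: inferred from the "multiple of 4" rule
NUM_MEMORY_LOCATIONS = 16  # assumed: small memory, for illustration only


def cache_location(memory_address, num_locations=NUM_CACHE_LOCATIONS):
    """Direct mapping: each memory address has exactly one possible cache slot."""
    return memory_address % num_locations


# List which memory locations compete for each cache location.
for loc in range(NUM_CACHE_LOCATIONS):
    sharers = [a for a in range(NUM_MEMORY_LOCATIONS) if cache_location(a) == loc]
    print(f"cache location {loc}: memory locations {sharers}")
```

Cache location 0 is shared by memory locations 0, 4, 8 and 12, matching the bullets above.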

17
Issues with Direct Mapped Cache
  • Since multiple memory addresses map to same cache
    index, how do we tell which one is in there?
  • What if we have a block size > 1 byte?
  • Result: divide memory address into three fields

18
Example of a direct mapped cache
  • For a 2^N byte cache
  • The uppermost (32 - N) bits are always the Cache
    Tag
  • The lowest M bits are the Byte Select (Block Size
    = 2^M)

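[Figure: 32-bit address split into three fields; the Cache Tag and Cache Index together form the Block address]
  Bits 31 to 10: Cache Tag (Example 0x50), stored as part of the cache state
  Bits 9 to 5:   Cache Index (Ex 0x01)
  Bits 4 to 0:   Byte Select (Ex 0x00)
[Figure: cache array with a Valid Bit, a Cache Tag and 32 bytes of Cache Data per row]
  Row 0:  Byte 0 ... Byte 31
  Row 1:  Byte 32 ... Byte 63 (Cache Tag 0x50)
  ...
  Row 31: Byte 992 ... Byte 1023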
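
The field split for this example can be sketched with shifts and masks. The sketch assumes N = 10 (a 1 KB cache, from the 1024 bytes in the figure) and M = 5 (32-byte blocks); `split_address` is a hypothetical helper, not something defined in the slides.

```python
def split_address(addr, cache_bits, block_bits):
    """Split an address into (tag, index, byte_select).

    cache_bits is N (cache holds 2**N bytes); block_bits is M (block is 2**M bytes).
    """
    byte_select = addr & ((1 << block_bits) - 1)          # lowest M bits
    index_bits = cache_bits - block_bits                  # N - M bits of index
    index = (addr >> block_bits) & ((1 << index_bits) - 1)
    tag = addr >> cache_bits                              # uppermost (32 - N) bits
    return tag, index, byte_select


# Rebuild the address whose fields match the figure: tag 0x50, index 0x01, byte 0x00.
addr = (0x50 << 10) | (0x01 << 5) | 0x00
print(hex(addr))  # 0x14020

tag, index, byte_select = split_address(addr, cache_bits=10, block_bits=5)
print(hex(tag), hex(index), hex(byte_select))  # 0x50 0x1 0x0
```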
19
Terminology
  • All fields are read as unsigned integers.
  • Index: specifies the cache index (which row of
    the cache we should look in)
  • Offset: once we've found the correct block,
    specifies which byte within the block we want
  • Tag: the remaining bits after offset and index
    are determined; these are used to distinguish
    between all the memory addresses that map to the
    same location

20
Terminology (cont'd)
  • Hit: data appears in some block in the upper
    level (example: Block X)
  • Hit Rate: the fraction of memory accesses found in
    the upper level
  • Hit Time: time to access the upper level, which
    consists of
  • RAM access time + time to determine hit/miss
  • Miss: data needs to be retrieved from a block in
    the lower level (Block Y)
  • Miss Rate = 1 - (Hit Rate)
  • Miss Penalty: time to replace a block in the
    upper level +
  • time to deliver the block to the processor
  • Hit Time << Miss Penalty

[Figure: Upper Level Memory holds Blk X and exchanges data To Processor and From Processor; Lower Level Memory holds Blk Y]
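
These terms are commonly combined into an average access time: hit time plus miss rate times miss penalty. The slides do not state this formula, so treat the sketch below as a standard textbook relation added for illustration; the function name and the numbers (1 ns hit time, 5% miss rate, 100 ns miss penalty) are made up.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average time per access: every access pays the hit time,
    and the fraction that misses also pays the miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Illustrative numbers only (not from the slides), all times in nanoseconds.
hit_rate = 0.95
miss_rate = 1 - hit_rate  # Miss Rate = 1 - (Hit Rate)
print(average_access_time(hit_time=1, miss_rate=miss_rate, miss_penalty=100))
```

Because Hit Time << Miss Penalty, even a small miss rate dominates the average, which is why the hit rate matters so much.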
21
How is the hierarchy managed ?
  • Registers <-> Memory
  • by compiler (programmer?)
  • cache <-> memory
  • by the hardware
  • memory <-> disks
  • by the hardware and operating system (virtual
    memory)
  • by the programmer (files)

22
Example
  • Suppose we have 16 KB of data in a direct-mapped
    cache with 4-word blocks
  • Determine the size of the tag, index and offset
    fields if we're using a 32-bit architecture
  • Offset
  • need to specify correct byte within a block
  • block contains 4 words = 16 bytes = 2^4
    bytes
  • need 4 bits to specify correct byte
  • (or 2 bits to specify correct word)

23
Example (cont'd)
  • Index (index into an array of blocks)
  • need to specify correct row in cache
  • cache contains 16 KB = 2^14 bytes
  • block contains 2^4 bytes (4 words)
  • # rows/cache = # blocks/cache (since there's
    one block/row) = (bytes/cache) / (bytes/row)
    = (2^14 bytes/cache) / (2^4 bytes/row)
    = 2^10 rows/cache
  • need 10 bits to specify this many rows

24
Example (cont'd)
  • Tag: use remaining bits as tag
  • tag length = mem addr length -
    offset - index = 32 - 4 -
    10 bits = 18 bits
  • so tag is leftmost 18 bits of memory address
  • (or 20 bits if the 32-bit address is a word
    address, since the offset then needs only 2 bits)
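
The arithmetic of this example can be checked with a few lines of code. This is a minimal sketch assuming byte addressing and power-of-two sizes; `field_widths` is a hypothetical helper name.

```python
def field_widths(cache_bytes, block_bytes, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache.

    Assumes byte addressing and that both sizes are powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1   # log2(bytes per block)
    rows = cache_bytes // block_bytes            # one block per row
    index_bits = rows.bit_length() - 1           # log2(rows per cache)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# 16 KB of data, 4-word (16-byte) blocks, 32-bit addresses.
print(field_widths(16 * 1024, 16))  # (18, 10, 4)
```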

25
Accessing data in cache
[Table: Memory, with columns Address (hex) and Value of Word]
  • Ex. 16KB of data, direct-mapped, 4 word blocks
  • Read 4 addresses
  • 0x00000014, 0x0000001C, 0x00000034, 0x00008014
  • Memory values on right
  • only cache/memory level of hierarchy

26
Accessing data in cache (cont'd)
  • 4 Addresses
  • 0x00000014, 0x0000001C, 0x00000034, 0x00008014
  • 4 Addresses divided (for convenience) into Tag,
    Index, Byte Offset fields

Tag                 Index       Offset
000000000000000000  0000000001  0100
000000000000000000  0000000001  1100
000000000000000000  0000000011  0100
000000000000000010  0000000001  0100
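
The same division into fields can be reproduced with shifts and masks. This sketch assumes the 18/10/4 split worked out in the example; `fields` is a hypothetical helper name.

```python
ADDRESSES = [0x00000014, 0x0000001C, 0x00000034, 0x00008014]

OFFSET_BITS = 4   # 16-byte blocks
INDEX_BITS = 10   # 1024 rows


def fields(addr):
    """Return (tag, index, offset) for a 32-bit byte address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


# Print each address as an 18-bit tag, 10-bit index and 4-bit offset.
for addr in ADDRESSES:
    tag, index, offset = fields(addr)
    print(f"0x{addr:08X}: {tag:018b} {index:010b} {offset:04b}")
```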
27
16 KB Direct Mapped Cache, 16B blocks
  • Valid bit determines whether anything is stored
    in that row (when computer initially turned on,
    all entries are invalid)

[Figure: cache table, one row per Index]
28
Read 0x00000014 = 0...0 0...001 0100
  • Tag field 000000000000000000, Index field
    0000000001, Offset 0100
29
So we read block 1 (0000000001)
  • Tag field 000000000000000000, Index field
    0000000001, Offset 0100
30
No valid data
  • Tag field 000000000000000000, Index field
    0000000001, Offset 0100
31
So load that data into cache, setting tag, valid
  • Tag field 000000000000000000, Index field
    0000000001, Offset 0100
[Cache state: row 1 is valid with tag 0 and words a, b, c, d; the other rows shown are invalid]
32
Read from cache at offset, return word b
  • Tag field 000000000000000000, Index field
    0000000001, Offset 0100
[Cache state: row 1 is valid with tag 0 and words a, b, c, d; the other rows shown are invalid]
33
Read 0x0000001C = 0...0 0...001 1100
  • Tag field 000000000000000000, Index field
    0000000001, Offset 1100
[Cache state: row 1 is valid with tag 0 and words a, b, c, d; the other rows shown are invalid]
34
Data valid, tag OK, so read offset, return word d
  • Tag field 000000000000000000, Index field
    0000000001, Offset 1100
[Cache state: row 1 is valid with tag 0 and words a, b, c, d; the other rows shown are invalid]
35
Read 0x00000034 = 0...0 0...011 0100
  • Tag field 000000000000000000, Index field
    0000000011, Offset 0100
[Cache state: row 1 is valid with tag 0 and words a, b, c, d; the other rows shown are invalid]
36
So read block 3
  • Tag field 000000000000000000, Index field
    0000000011, Offset 0100
[Cache state: row 1 is valid with tag 0 and words a, b, c, d; the other rows shown are invalid]
37
No valid data
  • Tag field 000000000000000000, Index field
    0000000011, Offset 0100
[Cache state: row 1 is valid with tag 0 and words a, b, c, d; the other rows shown are invalid]
38
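Load that cache block, return word f
  • Tag field 000000000000000000, Index field
    0000000011, Offset 0100
[Cache state: row 1 is valid with tag 0 and words a, b, c, d; row 3 is valid with tag 0 and words e, f, g, h; the other rows shown are invalid]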
39
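Read 0x00008014 = 0...10 0...001 0100
  • Tag field 000000000000000010, Index field
    0000000001, Offset 0100
[Cache state: row 1 is valid with tag 0 and words a, b, c, d; row 3 is valid with tag 0 and words e, f, g, h; the other rows shown are invalid]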
40
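So read Cache Block 1, Data is Valid
  • Tag field 000000000000000010, Index field
    0000000001, Offset 0100
[Cache state: row 1 is valid with tag 0 and words a, b, c, d; row 3 is valid with tag 0 and words e, f, g, h; the other rows shown are invalid]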
41
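Cache Block 1 Tag does not match (0 != 2)
  • Tag field 000000000000000010, Index field
    0000000001, Offset 0100
[Cache state: row 1 is valid with tag 0 and words a, b, c, d; row 3 is valid with tag 0 and words e, f, g, h; the other rows shown are invalid]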
42
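Miss, so replace block 1 with new data & tag
  • Tag field 000000000000000010, Index field
    0000000001, Offset 0100
[Cache state: row 1 is valid with tag 2 and words i, j, k, l; row 3 is valid with tag 0 and words e, f, g, h; the other rows shown are invalid]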
43
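And return word j
  • Tag field 000000000000000010, Index field
    0000000001, Offset 0100
[Cache state: row 1 is valid with tag 2 and words i, j, k, l; row 3 is valid with tag 0 and words e, f, g, h; the other rows shown are invalid]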
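
The four reads walked through on the preceding slides can be replayed with a small simulation. This is a sketch under stated assumptions, not the presenters' code: the class and variable names are hypothetical, and the memory contents (words a to l) are inferred from the words the walkthrough returns, since the original memory table did not survive in the transcript.

```python
BLOCK_BYTES = 16   # 4 words of 4 bytes each
WORD_BYTES = 4
NUM_ROWS = 1024    # 16 KB / 16 bytes per row

# Assumed memory contents, inferred from the walkthrough (word address -> value).
MEMORY = {}
for base, words in [(0x10, "abcd"), (0x30, "efgh"), (0x8010, "ijkl")]:
    for w, value in enumerate(words):
        MEMORY[base + WORD_BYTES * w] = value


class DirectMappedCache:
    def __init__(self):
        # When the computer is first turned on, all entries are invalid.
        self.valid = [False] * NUM_ROWS
        self.tags = [0] * NUM_ROWS
        self.data = [[None] * 4 for _ in range(NUM_ROWS)]

    def read(self, addr):
        """Return ("hit" or "miss", word) for a byte address."""
        offset = addr % BLOCK_BYTES
        index = (addr // BLOCK_BYTES) % NUM_ROWS
        tag = addr // (BLOCK_BYTES * NUM_ROWS)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            # Miss: load the whole block from memory, then set tag and valid bit.
            base = addr - offset
            self.data[index] = [MEMORY[base + WORD_BYTES * w] for w in range(4)]
            self.tags[index] = tag
            self.valid[index] = True
        return ("hit" if hit else "miss"), self.data[index][offset // WORD_BYTES]


cache = DirectMappedCache()
results = [cache.read(a) for a in (0x00000014, 0x0000001C, 0x00000034, 0x00008014)]
print(results)  # [('miss', 'b'), ('hit', 'd'), ('miss', 'f'), ('miss', 'j')]
```

The last read misses even though row 1 is valid, because its tag (2) differs from the stored tag (0); the block holding a, b, c, d is replaced.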
44
Things to Remember
  • We would like to have the capacity of disk at the
    speed of the processor; unfortunately this is not
    feasible.
  • So we create a memory hierarchy:
  • each successively higher level contains the most
    used data from the next lower level
  • Exploit temporal and spatial locality