Title: CS 161 Ch 7: Memory Hierarchy LECTURE 16

Slide 1: CS 161 Ch 7: Memory Hierarchy, Lecture 16
- Instructor: L.N. Bhuyan
- www.cs.ucr.edu/bhuyan
Slide 2: Cache Access Time
- With Load Bypass: Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
- OR Without Load Bypass: Average Memory Access Time = Time for a hit + Miss rate x Miss penalty
Slide 3: Unified vs Split Caches
- Unified Cache
  - Low miss ratio because more space is available for either instructions or data
  - Low cache bandwidth because instructions and data cannot be read at the same time due to one port
- Split Cache
  - High miss ratio because either instructions or data may run out of space even though space is available in the other cache
  - High bandwidth because an instruction and data can be accessed at the same time
- Example
  - 16KB each (I & D): Inst miss rate = 0.64%, Data miss rate = 6.47%
  - 32KB unified: Aggregate miss rate = 1.99%
  - Which is better (ignore L2 cache)?
  - Assume 33% data ops => 75% of accesses come from instructions (1.0/1.33)
  - hit time = 1, miss time = 50
  - Note that a data hit incurs 1 extra stall cycle for the unified cache (only one port)
  - AMAT(Harvard) = 75% x (1 + 0.64% x 50) + 25% x (1 + 6.47% x 50) = 2.05
  - AMAT(Unified) = 75% x (1 + 1.99% x 50) + 25% x (1 + 1 + 1.99% x 50) = 2.24
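The AMAT comparison above can be checked numerically. A minimal Python sketch (the `amat` helper name is ours; the miss rates, miss penalty, and access mix come from the slide):

```python
# Worked AMAT comparison for split vs unified caches.
# Assumes hit time = 1 cycle, miss penalty = 50 cycles,
# 75% instruction / 25% data accesses, as stated on the slide.

def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Split (Harvard) caches: 16 KB each, separate instruction and data miss rates.
amat_split = 0.75 * amat(1, 0.0064, 50) + 0.25 * amat(1, 0.0647, 50)

# Unified 32 KB cache: aggregate miss rate 1.99%, plus 1 extra stall cycle
# on data accesses because the single port is busy with the instruction fetch.
amat_unified = 0.75 * amat(1, 0.0199, 50) + 0.25 * amat(1 + 1, 0.0199, 50)

print(amat_split)    # ≈ 2.05 cycles
print(amat_unified)  # ≈ 2.24 cycles
```

Despite its higher aggregate miss rate, the split design wins here because the unified cache's single port stalls every data access.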
Slide 4: Static RAM (SRAM)
- Six transistors in a cross-connected fashion
- Provides regular AND inverted outputs
- Implemented in a CMOS process

Single-Port 6-T SRAM Cell
Slide 5: Dynamic Random Access Memory - DRAM
- DRAM organization is similar to SRAM, except that each bit of DRAM is constructed using a pass transistor and a capacitor, shown on the next slide
- Fewer transistors per bit gives higher density, but slow discharge through the capacitor
- The capacitor needs to be recharged, or refreshed, giving rise to a high cycle time. Q: What is the difference between access time and cycle time?
- Uses a two-level decoder, as shown later. Note that 2,048 bits are accessed per row, but only one bit is used
Slide 6: Dynamic RAM
- SRAM cells exhibit high speed / poor density
- DRAM: simple transistor/capacitor pairs in high-density form

[Figure: DRAM cell array - a word line gates a pass transistor connecting a storage capacitor C to the bit line; a sense amp reads the bit line]
Slide 7: DRAM Logical Organization (4 Mbit)
- Access time of DRAM = Row access time + column access time + refreshing

[Figure: 4 Mbit DRAM - multiplexed address bits feed a row decoder and a column decoder around a 2,048 x 2,048 memory array; sense amps and I/O drive the data pins D and Q; each storage cell sits at the intersection of a word line and a bit line]

- Square root of bits per RAS/CAS
Slide 8: Virtual Memory
- Idea 1: Many programs share DRAM memory so that context switches can occur
- Idea 2: Allow a program to be written without memory constraints - the program can exceed the size of the main memory
- Idea 3: Relocation - parts of the program can be placed at different locations in memory instead of one big chunk
- Virtual Memory:
  - (1) DRAM memory holds many programs running at the same time (processes)
  - (2) Use DRAM memory as a kind of cache for disk
Slide 9: Disk Technology in Brief
- Disk is mechanical memory

[Figure: disk platter with tracks and R/W arm; 3600 - 7200 RPM rotation speed]

- Disk Access Time = seek time + rotational delay + transfer time
  - usually measured in milliseconds
- A miss to disk is extremely expensive
  - typical access time = millions of clock cycles
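The access-time formula above is easy to evaluate. A small sketch with hypothetical but typical numbers (the seek and transfer times are our assumptions; the RPM range comes from the slide; the average rotational delay is half a revolution):

```python
# Back-of-envelope disk access time.
# Access time = seek time + rotational delay + transfer time.

def avg_access_ms(seek_ms, rpm, transfer_ms):
    # On average the desired sector is half a revolution away.
    half_revolution_ms = 0.5 * 60_000 / rpm   # 60,000 ms per minute
    return seek_ms + half_revolution_ms + transfer_ms

# A 7200 RPM disk: half a revolution takes 60000/7200/2 ≈ 4.17 ms.
t = avg_access_ms(seek_ms=8.0, rpm=7200, transfer_ms=0.5)
print(t)  # ≈ 12.7 ms -- millions of cycles for a processor clocked in GHz
```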
Slide 10: Virtual Memory Has Its Own Terminology
- Each process has its own private virtual address space (e.g., 2^32 bytes); the CPU actually generates virtual addresses
- Each computer has a physical address space (e.g., 128 MegaBytes of DRAM), also called real memory
- Address translation: mapping virtual addresses to physical addresses
- Allows multiple programs to use (different chunks of physical) memory at the same time
- Also allows some chunks of virtual memory to be kept on disk, not in main memory (to exploit the memory hierarchy)
Slide 11: Mapping Virtual Memory to Physical Memory
- Divide memory into equal-sized chunks (say, 4KB each)
- Any chunk of Virtual Memory can be assigned to any chunk of Physical Memory (a page)

[Figure: a single process's virtual memory (code, static, heap, stack, starting at address 0) mapped page by page onto a 64 MB physical memory]
Slide 12: Handling Page Faults
- A page fault is like a cache miss
  - Must find the page in the lower level of the hierarchy
- If the valid bit is zero, the Physical Page Number points to a page on disk
- When the OS starts a new process, it creates space on disk for all the pages of the process, sets all valid bits in the page table to zero, and sets all Physical Page Numbers to point to disk
- This is called Demand Paging - pages of the process are loaded from disk only as needed
Slide 13: Comparing the 2 Levels of Hierarchy
- Cache                                            | Virtual Memory
- Block or Line                                    | Page
- Miss                                             | Page Fault
- Block Size: 32-64B                               | Page Size: 4K-16KB
- Placement: Direct Mapped, N-way Set Associative  | Fully Associative
- Replacement: LRU or Random                       | Least Recently Used (LRU) approximation
- Write Thru or Back                               | Write Back
- How Managed: Hardware                            | Software/Hardware (Operating System)
Slide 14: How to Perform Address Translation?
- VM divides memory into equal-sized pages
- Address translation relocates entire pages
  - offsets within the pages do not change
  - if we make the page size a power of two, the virtual address separates into two fields
  - like the cache index and offset fields

virtual address = | Virtual Page Number | Page Offset |
Slide 15: Mapping a Virtual to a Physical Address (1KB page size)
- Virtual Address: Virtual Page Number (bits 31..10) | Page Offset (bits 9..0)
- Translation maps the Virtual Page Number to a Physical Page Number; the Page Offset passes through unchanged
- Physical Address: Physical Page Number (bits 29..10) | Page Offset (bits 9..0)
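The split above is just bit slicing. A short sketch assuming the slide's 1 KB pages (10-bit offset); the function name and sample address are ours:

```python
# Splitting a virtual address into page number and offset (1 KB pages).

OFFSET_BITS = 10
PAGE_SIZE = 1 << OFFSET_BITS          # 1024 bytes

def split(vaddr):
    vpn = vaddr >> OFFSET_BITS        # upper bits select the virtual page
    offset = vaddr & (PAGE_SIZE - 1)  # lower 10 bits pass through translation unchanged
    return vpn, offset

vpn, offset = split(0x12345)
print(hex(vpn), hex(offset))  # 0x48 0x345
```

Only the page number goes through the page table; the offset is concatenated back onto the physical page number.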
Slide 16: Address Translation
- We want fully associative page placement
- How to locate the physical page?
  - Search is impractical (too many pages)
- A page table is a data structure which contains the mapping of virtual pages to physical pages
  - There are several different ways, all up to the operating system, to keep this data around
- Each process running in the system has its own page table
Slide 17: Address Translation: Page Table

[Figure: the virtual page number of the Virtual Address (VA) indexes the Page Table; each entry holds a Valid bit, Access Rights (A.R.), and a Physical Page Number (P.P.N.), which combines with the offset to form the Physical Memory Address (PA); invalid entries point to disk]

- The Page Table is located in physical memory
- Access Rights: None, Read Only, Read/Write, Executable
Slide 18: Optimizing for Space
- Page Table too big!
  - 4GB Virtual Address Space / 4 KB pages => 2^20 (about 1 million) Page Table Entries => 4 MB just for the Page Table of a single process!
- There is a variety of solutions that trade off Page Table size for slower performance
  - Use a limit register to restrict page table size and let it grow with more pages; multilevel page tables; paging the page tables; etc.
  - (Take the O/S class to learn more)
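The arithmetic above can be verified directly; this sketch assumes 4-byte page table entries (a common size, implied but not stated by the slide's 4 MB figure):

```python
# Page table size check: 4 GB virtual address space, 4 KB pages.

VA_BITS = 32                           # 4 GB virtual address space
PAGE_BITS = 12                         # 4 KB pages
PTE_BYTES = 4                          # assumed entry size

entries = 1 << (VA_BITS - PAGE_BITS)   # 2**20 virtual pages to map
table_bytes = entries * PTE_BYTES
print(entries, table_bytes // 2**20)   # 1048576 entries, 4 MB per process
```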
Slide 19: How to Translate Fast?
- Problem: Virtual Memory requires two memory accesses!
  - one to translate the Virtual Address into a Physical Address (page table lookup)
  - one to transfer the actual data (cache hit)
  - But the Page Table is in physical memory!
- Observation: since there is locality in pages of data, there must be locality in the virtual addresses of those pages!
- Why not create a cache of virtual-to-physical address translations to make translation fast? (smaller is faster)
- For historical reasons, such a page table cache is called a Translation Lookaside Buffer, or TLB
Slide 20: Typical TLB Format

Entry fields: | Virtual Page Nbr (tag) | Physical Page Nbr (data) | Valid | Ref | Dirty | Access Rights |

- The TLB is just a cache of the page table mappings
- Dirty: since we use write back, we need to know whether or not to write the page to disk when it is replaced
- Ref: used to calculate LRU on replacement
- TLB access time is comparable to cache (much less than main memory access time)
Slide 21: Translation Look-Aside Buffers
- The TLB is usually small, typically 32-4,096 entries
- Like any other cache, the TLB can be fully associative, set associative, or direct mapped

[Figure: the Processor sends a virtual addr. to the TLB; a TLB hit yields the physical addr., which goes to the Cache (a cache hit returns data; a miss fetches data from Main Memory); a TLB miss consults the Page Table; a page fault/protection violation invokes the OS Fault Handler, which brings the page in from Disk Memory]
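The flow in the figure can be sketched as a lookup with a fallback page-table walk. This is a toy model: the dictionaries, names, and fault handling are ours, and no replacement policy or capacity limit is shown:

```python
# A TLB is just a small cache of page-table entries.

page_table = {0x48: 0x7F, 0x49: 0x80}  # VPN -> PPN, maintained by the OS
tlb = {}                                # small cache of recent translations

def translate(vpn):
    if vpn in tlb:                      # TLB hit: no extra memory access needed
        return tlb[vpn]
    if vpn in page_table:               # TLB miss: walk the page table in memory
        tlb[vpn] = page_table[vpn]      # ...and cache the translation
        return tlb[vpn]
    raise LookupError("page fault")     # OS fault handler loads the page from disk

print(hex(translate(0x48)))  # first access misses the TLB and walks the table
print(hex(translate(0x48)))  # second access hits the TLB
```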
Slide 22: DECStation 3100 / MIPS R2000

[Figure: address translation and cache lookup datapath]
- Virtual Address (32 bits): Virtual page number (20 bits, 31..12) | Page offset (12 bits, 11..0)
- TLB: 64 entries, fully associative; each entry holds Valid, Dirty, Tag, and Physical page number; a tag match asserts TLB hit
- Physical Address: Physical page number (20 bits) | Page offset (12 bits)
- Cache: 16K entries, direct mapped; the physical address splits into a 16-bit Physical address tag, a 14-bit Cache index, and a 2-bit Byte offset; each cache entry holds Valid, Tag, and 32 bits of Data; a tag match asserts Cache hit
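The cache-side field widths on this slide can be checked with bit slicing (the function name and sample address are ours; the 16/14/2 split is from the slide):

```python
# Slicing a DECStation 3100 physical address for its direct-mapped cache:
# 16-bit tag | 14-bit index (16K entries) | 2-bit byte offset.

def cache_fields(paddr):
    byte_offset = paddr & 0x3           # bits 1..0
    index = (paddr >> 2) & 0x3FFF       # bits 15..2: 14 bits -> 16K entries
    tag = paddr >> 16                   # bits 31..16
    return tag, index, byte_offset

tag, index, off = cache_fields(0xABCD1234)
print(hex(tag), hex(index), off)  # 0xabcd 0x48d 0
```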
Slide 23: Real Stuff: Pentium Pro Memory Hierarchy
- Address Size: 32 bits (VA, PA)
- VM Page Size: 4 KB, 4 MB
- TLB organization: separate i, d TLBs (i-TLB: 32 entries, d-TLB: 64 entries); 4-way set associative; LRU approximated; hardware handles misses
- L1 Cache: 8 KB, separate i, d; 4-way set associative; LRU approximated; 32-byte block; write back
- L2 Cache: 256 or 512 KB