CS 161 Ch 7: Memory Hierarchy LECTURE 16

1
CS 161 Ch 7: Memory Hierarchy LECTURE 16
  • Instructor: L.N. Bhuyan
  • www.cs.ucr.edu/bhuyan

2
Cache Access Time
  • With load bypass: Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
  • Without load bypass: Average Memory Access Time = Time for a hit + Miss Rate x Miss Penalty
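A minimal Python sketch of these two formulas; the hit time, miss rate, and miss penalty values below are made-up numbers for illustration:

```python
def amat_with_load_bypass(hit_time, miss_rate, miss_penalty):
    # With load bypass, the hit time is only paid on hits;
    # misses pay the full miss penalty instead.
    return hit_time * (1 - miss_rate) + miss_penalty * miss_rate

def amat_without_load_bypass(hit_time, miss_rate, miss_penalty):
    # Without load bypass, every access pays the hit time,
    # and misses pay the miss penalty on top of it.
    return hit_time + miss_rate * miss_penalty

# Assumed example values: 1-cycle hit, 5% miss rate, 50-cycle penalty
print(amat_with_load_bypass(1, 0.05, 50))     # 0.95 + 2.5 = 3.45
print(amat_without_load_bypass(1, 0.05, 50))  # 1.0  + 2.5 = 3.5
```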
3
Unified vs Split Caches
  • Unified Cache
  • Lower miss ratio because more space is available for either instructions or data
  • Lower cache bandwidth because an instruction and data cannot be read at the same time through the single port
  • Split Cache
  • Higher miss ratio because either instructions or data may run out of space even though space is available in the other cache
  • Higher bandwidth because an instruction and data can be accessed at the same time
  • Example
  • 16 KB I and D caches: Inst miss rate = 0.64%, Data miss rate = 6.47%
  • 32 KB unified: Aggregate miss rate = 1.99%
  • Which is better (ignore L2 cache)?
  • Assume 33% data ops ⇒ 75% of accesses are instruction fetches (1.0/1.33)
  • hit time = 1, miss time = 50
  • Note that a data hit has 1 extra stall for the unified cache (only one port)
  • AMAT(Harvard) = 75% x (1 + 0.64% x 50) + 25% x (1 + 6.47% x 50) = 2.05
  • AMAT(Unified) = 75% x (1 + 1.99% x 50) + 25% x (1 + 1 + 1.99% x 50) = 2.24
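A short Python check of the two AMAT numbers on this slide (75%/25% instruction/data split, 50-cycle miss time, and the extra 1-cycle stall on unified-cache data accesses):

```python
inst_frac, data_frac = 0.75, 0.25   # 75% instruction / 25% data accesses
miss_time = 50

# Split (Harvard) caches: instructions and data each see their own
# miss rate, and there is no port conflict.
amat_harvard = (inst_frac * (1 + 0.0064 * miss_time)
                + data_frac * (1 + 0.0647 * miss_time))

# Unified cache: one aggregate miss rate, but a data access stalls one
# extra cycle because the single port is busy with the instruction fetch.
amat_unified = (inst_frac * (1 + 0.0199 * miss_time)
                + data_frac * (1 + 1 + 0.0199 * miss_time))

print(amat_harvard)  # ≈ 2.049 (slide: 2.05)
print(amat_unified)  # ≈ 2.245 (slide: 2.24) -> the split cache wins here
```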

4
Static RAM (SRAM)
  • Six transistors in a cross-coupled configuration
  • Provides both regular and inverted outputs
  • Implemented in CMOS process

Single Port 6-T SRAM Cell
5
Dynamic Random Access Memory - DRAM
  • DRAM organization is similar to SRAM, except that each bit of DRAM is constructed using a pass transistor and a capacitor, shown on the next slide
  • Fewer transistors per bit gives higher density, but slow discharge through the capacitor
  • The capacitor needs to be recharged or refreshed, giving rise to a high cycle time. Q: What is the difference between access time and cycle time? (Access time is the delay from the start of a request until the data is available; cycle time is the minimum time between successive requests, which for DRAM is longer because the read is destructive and the cell must be rewritten.)
  • Uses a two-level decoder as shown later. Note that 2,048 bits are accessed per row, but only one bit is used

6
Dynamic RAM
  • SRAM cells exhibit high speed / poor density
  • DRAM: simple transistor/capacitor pairs in high-density form

[Figure: 1-T DRAM cell. The word line gates a pass transistor that connects the storage capacitor C to the bit line, which is read by a sense amp.]
7
DRAM logical organization (4 Mbit)
  • Access time of DRAM = row access time + column access time + refreshing

[Figure: 4-Mbit DRAM organization. A row decoder driven by the multiplexed address lines (A0-A10) selects one word line of a 2,048 x 2,048 array of storage cells; the sense amps/I-O and column decoder then select a single bit for the data pins D and Q.]
  • Square root of the bits per RAS/CAS (sqrt(4 Mbit) = 2,048)

8
Virtual Memory
  • Idea 1: Many programs share DRAM memory so that context switches can occur
  • Idea 2: Allow a program to be written without memory constraints; the program can exceed the size of main memory
  • Idea 3: Relocation: parts of the program can be placed at different locations in memory instead of one big chunk
  • Virtual Memory
  • (1) DRAM memory holds many programs running at the same time (processes)
  • (2) Use DRAM memory as a kind of cache for disk

9
Disk Technology in Brief
[Figure: disk platter with concentric tracks and an R/W arm; 3,600 - 7,200 RPM rotation speed.]
  • Disk is mechanical memory
  • Disk access time = seek time + rotational delay + transfer time
  • Usually measured in milliseconds
  • A miss to disk is extremely expensive
  • Typical access time = millions of clock cycles
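A small Python sketch of the access-time formula above; the seek time, RPM, transfer rate, and request size are assumed illustrative values:

```python
def disk_access_time_ms(seek_ms, rpm, bytes_requested, transfer_mb_per_s):
    # Average rotational delay: half a rotation, in milliseconds.
    rotational_delay_ms = 0.5 * (60_000 / rpm)
    # Time to transfer the requested bytes, in milliseconds.
    transfer_ms = bytes_requested / (transfer_mb_per_s * 1e6) * 1000
    return seek_ms + rotational_delay_ms + transfer_ms

# Assumed example: 8 ms seek, 7,200 RPM, 4 KB request, 50 MB/s transfer
print(disk_access_time_ms(8, 7200, 4096, 50))  # ~12.25 ms: millions of cycles at GHz clocks
```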

10
Virtual Memory has its own terminology
  • Each process has its own private virtual address space (e.g., 2^32 bytes); the CPU actually generates virtual addresses
  • Each computer has a physical address space
    (e.g., 128 MegaBytes DRAM) also called real
    memory
  • Address translation mapping virtual addresses to
    physical addresses
  • Allows multiple programs to use (different chunks
    of physical) memory at same time
  • Also allows some chunks of virtual memory to be
    represented on disk, not in main memory (to
    exploit memory hierarchy)

11
Mapping Virtual Memory to Physical Memory
  • Divide memory into equal-sized chunks (say, 4 KB each)
  • Any chunk of virtual memory can be assigned to any chunk of physical memory (a page)

[Figure: the virtual memory of a single process (code, static, heap, and stack regions, addresses starting at 0) mapped page by page into a 64 MB physical memory.]
12
Handling Page Faults
  • A page fault is like a cache miss: must find the page in a lower level of the hierarchy
  • If the valid bit is zero, the Physical Page Number points to a page on disk
  • When the OS starts a new process, it creates space on disk for all the pages of the process, sets all valid bits in the page table to zero, and sets all Physical Page Numbers to point to disk
  • Called Demand Paging: pages of the process are loaded from disk only as needed

13
Comparing the 2 levels of hierarchy
  Cache                                  Virtual Memory
  -----                                  --------------
  Block or Line                          Page
  Miss                                   Page Fault
  Block size: 32-64 B                    Page size: 4 KB-16 KB
  Placement: Direct Mapped or            Placement: Fully Associative
    N-way Set Associative
  Replacement: LRU or Random             Replacement: LRU approximation
  Write: Thru or Back                    Write: Back
  Managed by: Hardware                   Managed by: Software (Operating System)

14
How to Perform Address Translation?
  • VM divides memory into equal-sized pages
  • Address translation relocates entire pages
  • Offsets within the pages do not change
  • If the page size is a power of two, the virtual address separates into two fields (like the cache index and offset fields), as sketched below

[Figure: virtual address = Virtual Page Number | Page Offset]
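A minimal Python sketch of that split, assuming a power-of-two page size (the 4 KB value is just an example):

```python
PAGE_SIZE = 4096                          # assumed example: 4 KB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # log2(4096) = 12

def split_virtual_address(va):
    # High bits select the virtual page; low bits are unchanged by translation.
    vpn = va >> OFFSET_BITS
    offset = va & (PAGE_SIZE - 1)
    return vpn, offset

print(split_virtual_address(0x12345678))  # (74565, 1656) = (0x12345, 0x678)
```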
15
Mapping Virtual to Physical Address
[Figure: a 32-bit virtual address with bits 31..10 as the Virtual Page Number and bits 9..0 as the Page Offset (1 KB page size). Translation replaces the Virtual Page Number with a Physical Page Number (bits 29..10); the Page Offset passes through unchanged, forming the Physical Address.]
16
Address Translation
  • Want fully associative page placement
  • How to locate the physical page?
  • Search is impractical (too many pages)
  • A page table is a data structure that contains the mapping of virtual pages to physical pages
  • There are several different ways, all up to the operating system, to organize this data
  • Each process running in the system has its own page table

17
Address Translation: Page Table

[Figure: the virtual page number of the Virtual Address (VA) indexes the Page Table; each entry holds a Valid bit, Access Rights (A.R.), and a Physical Page Number (P.P.N.), which is combined with the offset to form the Physical Memory Address (PA). Invalid entries point to disk.]
  • The Page Table is located in physical memory
  • Access Rights: None, Read Only, Read/Write, Executable
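A toy Python model of this lookup, illustrative only: the entry fields mirror the figure, and the 1 KB page size matches the earlier slides, but the table contents are made up.

```python
OFFSET_BITS = 10                      # 1 KB pages, as on the earlier slides

class PageTableEntry:
    def __init__(self, valid, rights, ppn):
        self.valid = valid            # is the page resident in physical memory?
        self.rights = rights          # e.g. "R", "RW", "X", or None
        self.ppn = ppn                # physical page number (points to disk if invalid)

def translate(page_table, va, access="R"):
    vpn = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    entry = page_table[vpn]
    if not entry.valid:
        raise RuntimeError("page fault: OS must load the page from disk")
    if entry.rights is None or access not in entry.rights:
        raise RuntimeError("protection violation")
    return (entry.ppn << OFFSET_BITS) | offset

# Assumed example: virtual page 2 maps to physical page 5
pt = {2: PageTableEntry(valid=True, rights="RW", ppn=5)}
print(hex(translate(pt, (2 << 10) | 0x2A)))  # 0x142a = (5 << 10) | 0x2A
```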
18
Optimizing for Space
  • Page Table too big!
  • 4 GB virtual address space / 4 KB pages ⇒ 2^20 (about 1 million) page table entries ⇒ 4 MB just for the page table of a single process!
  • Variety of solutions that trade page table size for slower performance:
  • Use a limit register to restrict page table size and let it grow with more pages; multilevel page tables; paging the page tables; etc.
  • (Take an O/S class to learn more)
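The arithmetic on this slide, checked in a few lines of Python (the 4-byte entry size is an assumption implied by the 4 MB total):

```python
virtual_space = 4 * 2**30          # 4 GB virtual address space
page_size = 4 * 2**10              # 4 KB pages
entry_size = 4                     # bytes per page table entry (assumed)

entries = virtual_space // page_size
print(entries)                     # 1048576 = 2**20 entries
print(entries * entry_size)        # 4194304 bytes = 4 MB per process
```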

19
How to Translate Fast?
  • Problem: virtual memory requires two memory accesses!
  • One to translate the virtual address into a physical address (page table lookup)
  • One to transfer the actual data (cache hit)
  • But the page table is in physical memory!
  • Observation: since there is locality in pages of data, there must be locality in the virtual addresses of those pages!
  • Why not create a cache of virtual-to-physical address translations to make translation fast? (smaller is faster)
  • For historical reasons, such a page table cache is called a Translation Lookaside Buffer, or TLB

20
Typical TLB Format
  tag:  Virtual Page Nbr
  data: Physical Page Nbr | Valid | Ref | Dirty | Access Rights
  • The TLB is just a cache of the page table mappings
  • Dirty: since we use write back, need to know whether or not to write the page to disk when it is replaced
  • Ref: used to calculate LRU on replacement
  • TLB access time is comparable to cache access time (much less than main memory access time)

21
Translation Look-Aside Buffers
  • TLB is usually small, typically 32-4,096 entries
  • Like any other cache, the TLB can be fully
    associative, set associative, or direct mapped

[Figure: the Processor sends a virtual address to the TLB. On a TLB hit, the physical address goes to the Cache, which returns data on a hit or fetches from Main Memory on a miss. On a TLB miss, the Page Table in main memory supplies the translation; a page fault or protection violation traps to the OS fault handler, which brings the page in from Disk.]
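A toy Python sketch of this flow, assuming a small fully associative TLB with LRU replacement in front of a page table; all sizes and mappings are illustrative, not any real machine's:

```python
from collections import OrderedDict

OFFSET_BITS = 10                           # 1 KB pages, as before

class TLB:
    """Tiny fully associative TLB with LRU replacement (illustrative only)."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()       # vpn -> ppn, ordered by recency

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]       # TLB hit
        return None                        # TLB miss

    def insert(self, vpn, ppn):
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
        self.entries[vpn] = ppn

def translate(tlb, page_table, va):
    vpn, offset = va >> OFFSET_BITS, va & ((1 << OFFSET_BITS) - 1)
    ppn = tlb.lookup(vpn)
    if ppn is None:                        # TLB miss: walk the page table
        if vpn not in page_table:
            raise RuntimeError("page fault: OS loads the page from disk")
        ppn = page_table[vpn]
        tlb.insert(vpn, ppn)
    return (ppn << OFFSET_BITS) | offset

pt = {2: 5, 3: 7}                          # vpn -> ppn (assumed mappings)
tlb = TLB()
print(hex(translate(tlb, pt, (2 << 10) | 0x2A)))  # TLB miss, then fill: 0x142a
print(hex(translate(tlb, pt, (2 << 10) | 0x30)))  # TLB hit: 0x1430
```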
22
DECStation 3100/MIPS R2000
[Figure: virtual address (32 bits) = 20-bit virtual page number (bits 31..12) | 12-bit page offset (bits 11..0). The TLB (64 entries, fully associative) holds Valid, Dirty, Tag, and a 20-bit physical page number per entry; a TLB hit yields the physical address = 20-bit physical page number | 12-bit page offset. The cache (16K entries, direct mapped) splits the physical address into a 16-bit physical address tag, a 14-bit cache index, and a 2-bit byte offset, and returns 32-bit data on a data-cache hit.]
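A quick Python sketch of the DECStation 3100 physical-address breakdown above; the field widths come from the figure, and the sample address is arbitrary:

```python
def r2000_fields(pa):
    # Physical address fields per the figure:
    # 16-bit tag | 14-bit cache index | 2-bit byte offset
    byte_offset = pa & 0x3
    cache_index = (pa >> 2) & ((1 << 14) - 1)
    tag = pa >> 16
    return tag, cache_index, byte_offset

pa = 0xDEADBEEF                 # arbitrary 32-bit physical address
print(r2000_fields(pa))         # (57005, 12219, 3) = (0xDEAD, 0x2FBB, 0x3)
```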
23
Real Stuff: Pentium Pro Memory Hierarchy
  • Address size: 32 bits (VA, PA)
  • VM page size: 4 KB, 4 MB
  • TLB organization: separate i,d TLBs (i-TLB: 32 entries, d-TLB: 64 entries), 4-way set associative, LRU approximated, hardware handles misses
  • L1 cache: 8 KB, separate i,d, 4-way set associative, LRU approximated, 32-byte block, write back
  • L2 cache: 256 or 512 KB