Title: Conventional DRAM Organization
1. Conventional DRAM Organization
- A d × w DRAM stores d·w total bits, organized as d supercells of size w bits.
[Figure: a 16 × 8 DRAM chip. The memory controller drives 2 address bits (addr) and exchanges 8 data bits (data, to/from the CPU); the 16 supercells are arranged as 4 rows × 4 cols, e.g., supercell (2,1); an internal row buffer holds the currently active row.]
2. Reading DRAM Supercell (2,1)
- Step 1(a): Row access strobe (RAS) selects row 2.
- Step 1(b): Row 2 is copied from the DRAM array to the row buffer.
[Figure: the memory controller sends RAS = 2 over the 2-bit addr lines; row 2 of the 16 × 8 DRAM chip is copied into the internal row buffer.]
3. Reading DRAM Supercell (2,1)
- Step 2(a): Column access strobe (CAS) selects column 1.
- Step 2(b): Supercell (2,1) is copied from the buffer to the data lines, and eventually back to the CPU.
[Figure: the memory controller sends CAS = 1 over the 2-bit addr lines; supercell (2,1) moves from the internal row buffer onto the 8-bit data lines.]
4. Memory Modules
[Figure: a 64 MB memory module consisting of eight 8M × 8 DRAMs (DRAM 0 … DRAM 7), attached to the memory controller; each DRAM supplies one byte of supercell (i,j).]
5. Typical Bus Structure Connecting CPU and Memory
- A bus is a collection of parallel wires that carry address, data, and control signals.
- Buses are typically shared by multiple devices.
[Figure: the CPU chip (register file, ALU, bus interface) connects via the system bus to an I/O bridge, which connects via the memory bus to main memory.]
6. Memory Read Transaction (1)
- CPU places address A on the memory bus.
- Load operation: movl A, %eax
[Figure: the bus interface puts A on the system bus; main memory holds word x at address A; register %eax awaits the result.]
7. Memory Read Transaction (2)
- Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
- Load operation: movl A, %eax
[Figure: main memory drives x onto the memory bus, through the I/O bridge toward the bus interface.]
8. Memory Read Transaction (3)
- CPU reads word x from the bus and copies it into register %eax.
- Load operation: movl A, %eax
[Figure: x arrives at the bus interface and lands in %eax.]
9. Memory Write Transaction (1)
- CPU places address A on the bus. Main memory reads it and waits for the corresponding data word to arrive.
- Store operation: movl %eax, A
[Figure: %eax holds y; the bus interface puts A on the bus.]
10. Memory Write Transaction (2)
- CPU places data word y on the bus.
- Store operation: movl %eax, A
[Figure: y travels from the bus interface across the I/O bridge toward main memory.]
11. Memory Write Transaction (3)
- Main memory reads data word y from the bus and stores it at address A.
- Store operation: movl %eax, A
[Figure: main memory stores y at address A.]
12. Disk Geometry
- Disks consist of platters, each with two surfaces.
- Each surface consists of concentric rings called tracks.
- Each track consists of sectors separated by gaps.
[Figure: a surface rotating about the spindle, showing tracks (e.g., track k), sectors, and gaps.]
13. I/O Bus
[Figure: the CPU chip (register file, ALU, bus interface) connects via the system bus to the I/O bridge, and via the memory bus to main memory. The I/O bus hangs off the bridge and hosts a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]
14. Reading a Disk Sector (1)
- CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.
15. Reading a Disk Sector (2)
- Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
16. Reading a Disk Sector (3)
- When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special interrupt pin on the CPU).
17. An Example Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices sit at the top; larger, slower, and cheaper (per byte) storage devices sit toward the bottom:
- L0: CPU registers (hold words retrieved from the L1 cache)
- L1: on-chip L1 cache (SRAM)
- L2: off-chip L2 cache (SRAM)
- L3: main memory (DRAM)
- L4: local secondary storage (local disks)
- L5: remote secondary storage (distributed file systems, Web servers)
18. Caching in a Memory Hierarchy
- The larger, slower, cheaper storage device at level k+1 is partitioned into blocks.
[Figure: level k (the smaller, faster device) holds copies of a subset of the level-k+1 blocks, e.g., blocks 4 and 10; level k+1 is partitioned into blocks 0 – 15.]
19. General Caching Concepts
- Program needs object d, which is stored in some block b.
- Cache hit:
  - Program finds b in the cache at level k. E.g., block 14.
- Cache miss:
  - b is not at level k, so the level k cache must fetch it from level k+1. E.g., block 12.
  - If the level k cache is full, then some current block must be replaced (evicted). Which one is the victim?
    - Placement policy: where can the new block go? E.g., b mod 4.
    - Replacement policy: which block should be evicted? E.g., LRU.
[Figure: a request for block 14 hits at level k (holding blocks such as 4, 9, 14, 3); a request for block 12 misses, so 12 is fetched from level k+1 (blocks 0 – 15) and placed at position 12 mod 4 = 0.]
20. Cache Memories
- Cache memories are small, fast SRAM-based memories managed automatically in hardware.
- They hold frequently accessed blocks of main memory.
- The CPU looks first for data in L1, then in L2, then in main memory.
- Typical bus structure:
[Figure: the CPU chip contains the register file, ALU, and L1 cache; a cache bus connects to an off-chip L2 cache; the bus interface connects via the system bus, I/O bridge, and memory bus to main memory.]
21. Inserting an L1 Cache Between the CPU and Main Memory
- The tiny, very fast CPU register file has room for four 4-byte words. The transfer unit between the CPU register file and the cache is a 4-byte block.
- The small, fast L1 cache has room for two 4-word blocks (line 0 and line 1). The transfer unit between the cache and main memory is a 4-word block (16 bytes).
- The big, slow main memory has room for many 4-word blocks, e.g., block 10 (a b c d), block 21 (p q r s), block 30 (w x y z), ...
22. General Organization of a Cache Memory
- A cache is an array of sets. Each set contains one or more lines. Each line holds a block of data.
- Per line: 1 valid bit, t tag bits, and a block of B = 2^b bytes.
- S = 2^s sets, E lines per set.
- Cache size: C = B × E × S data bytes.
[Figure: sets 0 … S−1, each containing E lines; each line holds a valid bit, a tag, and block bytes 0 … B−1.]
23. Addressing Caches
The m-bit address A is divided into three fields, from bit m−1 down to bit 0: <tag> (t bits), <set index> (s bits), and <block offset> (b bits).
The word at address A is in the cache if the tag bits in one of the valid lines in set <set index> match <tag>. The word contents begin at offset <block offset> bytes from the beginning of the block.
[Figure: the s set-index bits of A select a set; each line in the set holds a valid bit, a tag, and block bytes 0 … B−1.]
24. Direct-Mapped Cache
- Simplest kind of cache.
- Characterized by exactly one line per set (E = 1).
[Figure: sets 0 … S−1, each containing a single line (valid bit, tag, cache block).]
25. Accessing Direct-Mapped Caches
- Set selection:
  - Use the set index bits to determine the set of interest.
[Figure: the s set-index bits of the address (here 00001) select set 1 from sets 0 … S−1.]
26. Accessing Direct-Mapped Caches
- Line matching and word selection:
  - Line matching: find a valid line in the selected set with a matching tag.
  - Word selection: then extract the word.
[Figure: selected set i holds a valid line with tag 0110 containing words w0 … w3 in block bytes 0 … 7; the address's tag 0110 matches, and block offset 100₂ = 4 selects the word starting at byte 4.]
27. Direct-Mapped Cache Simulation
- M = 16 byte addresses, B = 2 bytes/block, S = 4 sets, E = 1 entry/set.
- Address trace (reads): 0 [0000₂], 1 [0001₂], 13 [1101₂], 8 [1000₂], 0 [0000₂]
28. Why Use Middle Bits as Index?
- High-order bit indexing:
  - Adjacent memory lines would map to the same cache entry.
  - Poor use of spatial locality.
- Middle-order bit indexing:
  - Consecutive memory lines map to different cache lines.
  - Can hold a C-byte region of the address space in the cache at one time.
[Figure: a 4-line cache (lines 00 – 11) against memory lines 0000 … 1111. With high-order bit indexing, the top two address bits select the cache line, so runs of adjacent memory lines collide; with middle-order bit indexing, the middle two bits select the line, so consecutive memory lines spread across all four cache lines.]
29. Set Associative Caches
- Characterized by more than one line per set (here E = 2 lines per set).
[Figure: sets 0 … S−1, each containing two lines (valid bit, tag, cache block).]
30. Accessing Set Associative Caches
- Set selection:
  - Identical to direct-mapped cache.
[Figure: the set-index bits of the address (here 00001) select set 1; each set holds two lines.]
31. Accessing Set Associative Caches
- Line matching and word selection:
  - Must compare the tag in each valid line in the selected set.
[Figure: selected set i holds two valid lines, one with tag 1001 and one with tag 0110 (words w0 … w3 in bytes 0 … 7); the address's tag 0110 matches the second line, and block offset 100₂ = 4 selects the word starting at byte 4.]
32. Set Associative Cache Example
- E = 2, B = 4, S = 8. Words: 0x0E34, 0x0DD5, 0x1FE4.
- And list a memory address that will hit in Set 3.

Set | Tag V  B0 B1 B2 B3 | Tag V  B0 B1 B2 B3
 0  | 09  1  86 30 3F 10 | 00  0  -- -- -- --
 1  | 45  1  60 4F E0 23 | 38  1  00 BC 0B 37
 2  | EB  0  -- -- -- -- | 0B  0  -- -- -- --
 3  | 06  0  -- -- -- -- | 32  1  12 08 7B AD
 4  | C7  1  06 78 07 C5 | 05  1  40 67 C2 3B
 5  | 71  1  0B DE 18 4B | 6E  0  -- -- -- --
 6  | 91  1  A0 B7 26 2D | F0  0  -- -- -- --
 7  | 46  0  -- -- -- -- | DE  1  12 C0 88 37
33. Multi-Level Caches
- Options: separate data and instruction caches, or a unified cache.
[Figure: Processor (Regs) → L1 d-cache and L1 i-cache → Unified L2 Cache → Memory → disk]

size / speed / $/Mbyte / line size (larger, slower, cheaper going down):
- Regs: 200 B, 3 ns, 8 B line
- L1 caches: 8-64 KB, 3 ns, 32 B line
- L2 cache: 1-4 MB SRAM, 6 ns, $100/MB, 32 B line
- Memory: 128 MB DRAM, 60 ns, $1.50/MB, 8 KB line
- disk: 30 GB, 8 ms, $0.05/MB
34. Motivations for Virtual Memory
- Use physical DRAM as a cache for the disk:
  - The address space of a process can exceed physical memory size.
  - The sum of the address spaces of multiple processes can exceed physical memory.
- Simplify memory management:
  - Multiple processes resident in main memory, each with its own address space.
  - Only active code and data is actually in memory.
  - Allocate more memory to a process as needed.
- Provide protection:
  - One process can't interfere with another, because they operate in different address spaces.
  - A user process cannot access privileged information; different sections of address spaces have different permissions.
35. Motivation #1: DRAM as a Cache for Disk
- The full address space is quite large:
  - 32-bit addresses: ~4,000,000,000 (4 billion) bytes
  - 64-bit addresses: ~16,000,000,000,000,000,000 (16 quintillion) bytes
- Disk storage is ~300X cheaper than DRAM storage:
  - 80 GB of DRAM: ~$33,000
  - 80 GB of disk: ~$110
- To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk.
36. Levels in Memory Hierarchy
[Figure: Register ←8 B→ Cache ←32 B→ Memory ←4 KB→ Disk Memory; the register/cache/memory boundaries are managed as caches, the memory/disk boundary by virtual memory.]

size / speed / $/Mbyte / line size (larger, slower, cheaper going down):
- Register: 32 B, 1 ns, 8 B line
- Cache: 32 KB-4 MB, 2 ns, $125/MB, 32 B line
- Memory: 1024 MB, 30 ns, $0.20/MB, 4 KB line
- Disk: 100 GB, 8 ms, $0.001/MB
37. DRAM vs. SRAM as a Cache
- DRAM vs. disk is more extreme than SRAM vs. DRAM.
- Access latencies:
  - DRAM: ~10X slower than SRAM.
  - Disk: ~100,000X slower than DRAM.
- Importance of exploiting spatial locality:
  - The first byte is ~100,000X slower than successive bytes on disk,
  - vs. a ~4X improvement for page-mode vs. regular accesses to DRAM.
- Bottom line: design decisions for DRAM caches are driven by the enormous cost of misses.
38. Impact of Properties on Design
- If DRAM were to be organized like an SRAM cache, how would we set the following design parameters?
  - Line size? Large, since disk is better at transferring large blocks.
  - Associativity? High, to minimize miss rate.
  - Write-through or write-back? Write-back, since we can't afford to perform small writes to disk.
- What would the impact of these choices be on:
  - miss rate: extremely low, << 1%
  - hit time: must match cache/DRAM performance
  - miss latency: very high, ~20 ms
  - tag storage overhead: low, relative to block size
39. A System with Physical Memory Only
- Examples: most Cray machines, early PCs, nearly all embedded systems, etc.
- Addresses generated by the CPU correspond directly to bytes in physical memory.
40. A System with Virtual Memory
- Examples: workstations, servers, modern PCs, etc.
- Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table).
[Figure: virtual addresses pass through the page table to physical addresses 0 … P−1 in memory, or to locations on disk.]
41. Page Faults (like Cache Misses)
- What if an object is on disk rather than in memory?
  - The page table entry indicates that the virtual address is not in memory.
  - An OS exception handler is invoked to move data from disk into memory:
    - The current process suspends; others can resume.
    - The OS has full control over placement, etc.
[Figure: before and after the fault, the CPU issues virtual addresses that the page table maps either to physical addresses in memory or to locations on disk.]
42. Servicing a Page Fault
- (1) Initiate block read: the processor signals the controller to read a block of length P starting at disk address X and store it starting at memory address Y.
- (2) Read occurs: a direct memory access (DMA) transfer under control of the I/O controller.
- (3) I/O controller signals completion: it interrupts the processor, and the OS resumes the suspended process.
[Figure: processor (Reg) and cache on the memory-I/O bus with memory and the I/O controller/disk; arrows mark (1) initiate block read, (2) DMA transfer, (3) read done.]
43. Motivation #2: Memory Management
- Multiple processes can reside in physical memory.
- How do we resolve address conflicts? What if two processes access something at the same address?
[Figure: Linux/x86 process memory image, from high to low addresses: kernel virtual memory (memory invisible to user code); user stack (%esp); memory-mapped region for shared libraries; runtime heap (via malloc), bounded by the brk ptr; uninitialized data (.bss); initialized data (.data); program text (.text); forbidden region at address 0.]
44. Solution: Separate Virtual Address Spaces
- Virtual and physical address spaces are divided into equal-sized blocks.
  - The blocks are called pages (both virtual and physical).
- Each process has its own virtual address space.
  - The operating system controls how virtual pages are assigned to physical memory.
[Figure: process 1's virtual pages (VP 1, VP 2, …, up to N−1) map to physical pages such as PP 2; process 2's pages (up to M−1) map to PP 10, etc.; a shared physical page, PP 7 (e.g., read-only library code), appears in both address spaces.]
45. Motivation #3: Protection
- The page table entry contains access rights information.
- Hardware enforces this protection (traps into the OS if a violation occurs).
[Figure: per-process page tables for process i and process j mapping into the same physical memory.]
46. VM Address Translation
- Virtual address space: V = {0, 1, …, N−1}
- Physical address space: P = {0, 1, …, M−1}, with M < N
- Address translation: MAP: V → P ∪ {∅}
  - For virtual address a:
    - MAP(a) = a′ if the data at virtual address a is at physical address a′ in P.
    - MAP(a) = ∅ if the data at virtual address a is not in physical memory (either invalid or stored on disk).
47. VM Address Translation: Hit
[Figure: the processor issues virtual address a; the hardware address translation mechanism (part of the on-chip memory management unit, MMU) produces physical address a′, which is sent to main memory.]
48. VM Address Translation: Miss
[Figure: the processor issues virtual address a; the hardware address translation mechanism (part of the on-chip MMU) detects a page fault and invokes the fault handler; the OS transfers the page from secondary memory to main memory (only on a miss), after which the access completes with physical address a′.]
49. VM Address Translation
- Parameters:
  - P = 2^p : page size (bytes)
  - N = 2^n : virtual address limit
  - M = 2^m : physical address limit
[Figure: the virtual address (bits n−1 … 0) splits into a virtual page number (bits n−1 … p) and page offset (bits p−1 … 0); address translation maps it to a physical address (bits m−1 … 0) with a physical page number and the same page offset.]
- Page offset bits don't change as a result of translation.
50. Page Tables
[Figure: a memory-resident page table, indexed by virtual page number, holds a valid bit plus a physical page or disk address per entry; valid entries (valid = 1) point into physical memory, while invalid entries (valid = 0) point into disk storage (a swap file or regular file system file).]
51. Address Translation via Page Table
52. Page Table Operation
- Translation:
  - Separate (set of) page table(s) per process.
  - The VPN forms an index into the page table (it points to a page table entry).
53. Page Table Operation
- Computing the physical address:
  - The page table entry (PTE) provides information about the page.
  - If (valid bit = 1), the page is in memory: use the physical page number (PPN) to construct the address.
  - If (valid bit = 0), the page is on disk: page fault.
54. Page Table Operation
- Checking protection:
  - The access rights field indicates allowable access (e.g., read-only, read-write, execute-only).
  - Typically supports multiple protection modes (e.g., kernel vs. user).
  - A protection violation fault is raised if the user doesn't have the necessary permission.
55. Integrating VM and Cache
- Most caches are physically addressed:
  - Accessed by physical addresses.
  - Allows multiple processes to have blocks in the cache at the same time, and to share pages.
  - The cache doesn't need to be concerned with protection issues: access rights are checked as part of address translation.
- Perform address translation before the cache lookup:
  - But this could involve a memory access itself (of the PTE).
  - Of course, page table entries can also become cached.
56. Speeding Up Translation with a TLB
- Translation lookaside buffer (TLB):
  - A small hardware cache in the MMU.
  - Maps virtual page numbers to physical page numbers.
  - Contains complete page table entries for a small number of pages.
57. Simple Memory System: Page Table
- Only the first 16 entries are shown.

VPN PPN Valid | VPN PPN Valid
00  28  1     | 08  13  1
01  --  0     | 09  17  1
02  33  1     | 0A  09  1
03  02  1     | 0B  --  0
04  --  0     | 0C  --  0
05  16  1     | 0D  2D  1
06  --  0     | 0E  11  1
07  --  0     | 0F  0D  1
58. Simple Memory System: TLB
- TLB: 16 entries, 4-way set associative.

Set | Tag PPN Valid | Tag PPN Valid | Tag PPN Valid | Tag PPN Valid
 0  | 03  --  0     | 09  0D  1     | 00  --  0     | 07  02  1
 1  | 03  2D  1     | 02  --  0     | 04  --  0     | 0A  --  0
 2  | 02  --  0     | 08  --  0     | 06  --  0     | 03  --  0
 3  | 07  --  0     | 03  0D  1     | 0A  34  1     | 02  --  0
59. Simple Memory System: Example
- Addressing:
  - 14-bit virtual addresses
  - 12-bit physical addresses
  - Page size = 64 bytes
[Figure: the virtual address splits into a virtual page number (VPN, bits 13 - 6) and virtual page offset (VPO, bits 5 - 0); the physical address splits into a physical page number (PPN, bits 11 - 6) and physical page offset (PPO, bits 5 - 0).]
60. Simple Memory System: Cache
- Cache: 16 lines, 4-byte line size, direct mapped.

Idx Tag Valid B0 B1 B2 B3 | Idx Tag Valid B0 B1 B2 B3
 0  19  1     99 11 23 11 |  8  24  1     3A 00 51 89
 1  15  0     -- -- -- -- |  9  2D  0     -- -- -- --
 2  1B  1     00 02 04 08 |  A  2D  1     93 15 DA 3B
 3  36  0     -- -- -- -- |  B  0B  0     -- -- -- --
 4  32  1     43 6D 8F 09 |  C  12  0     -- -- -- --
 5  0D  1     36 72 F0 1D |  D  16  1     04 96 34 15
 6  31  0     -- -- -- -- |  E  13  1     83 77 1B D3
 7  16  1     11 C2 DF 03 |  F  14  0     -- -- -- --
61. Address Translation Example #1
- Virtual address 0x03D4
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
62. Address Translation Example #2
- Virtual address 0x0B8F
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
63. Address Translation Example #3
- Virtual address 0x0040
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
64. Multi-Level Page Tables
- Given:
  - 4 KB (2^12) page size
  - 32-bit address space
  - 4-byte PTE
- Problem:
  - A single-level table would need 4 MB: 2^20 PTEs × 4 bytes.
- Common solution:
  - Multi-level page tables, e.g., a 2-level table (P6):
    - Level 1 table: 1024 entries, each of which points to a Level 2 page table.
    - Level 2 table: 1024 entries, each of which points to a page.
[Figure: Level 1 table entries point to Level 2 tables, whose entries point to pages.]