Title: Conventional DRAM Organization
1. Conventional DRAM Organization
- A d × w DRAM stores d·w total bits, organized as d supercells of size w bits.
[Figure: a 16 × 8 DRAM chip. The memory controller drives 2 address bits (addr) and exchanges 8 data bits (data, to/from the CPU); the 16 supercells are arranged as 4 rows × 4 cols, e.g., supercell (2,1); an internal row buffer holds the currently active row.]
2. Reading DRAM Supercell (2,1)
- Step 1(a): Row access strobe (RAS) selects row 2.
- Step 1(b): Row 2 is copied from the DRAM array to the row buffer.
[Figure: the memory controller sends RAS = 2 over the 2-bit addr lines; row 2 of the 16 × 8 DRAM chip is copied into the internal row buffer.]
3. Reading DRAM Supercell (2,1)
- Step 2(a): Column access strobe (CAS) selects column 1.
- Step 2(b): Supercell (2,1) is copied from the buffer to the data lines, and eventually back to the CPU.
[Figure: the memory controller sends CAS = 1 over the 2-bit addr lines; supercell (2,1) moves from the internal row buffer onto the 8-bit data lines.]
4. Memory Modules
[Figure: a 64 MB memory module consisting of eight 8M × 8 DRAMs (DRAM 0 … DRAM 7), attached to the memory controller; each DRAM supplies one byte of supercell (i,j).]
5. Typical Bus Structure Connecting CPU and Memory
- A bus is a collection of parallel wires that carry address, data, and control signals.
- Buses are typically shared by multiple devices.
[Figure: the CPU chip (register file, ALU, bus interface) connects via the system bus to an I/O bridge, which connects via the memory bus to main memory.]
6. Memory Read Transaction (1)
- CPU places address A on the memory bus.
- Load operation: movl A, %eax
[Figure: the bus interface puts A on the system bus; main memory holds word x at address A; register %eax awaits the result.]
7. Memory Read Transaction (2)
- Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
- Load operation: movl A, %eax
[Figure: main memory drives x onto the memory bus, through the I/O bridge toward the bus interface.]
8. Memory Read Transaction (3)
- CPU reads word x from the bus and copies it into register %eax.
- Load operation: movl A, %eax
[Figure: x arrives at the bus interface and lands in %eax.]
9. Memory Write Transaction (1)
- CPU places address A on the bus. Main memory reads it and waits for the corresponding data word to arrive.
- Store operation: movl %eax, A
[Figure: %eax holds y; the bus interface puts A on the bus.]
10. Memory Write Transaction (2)
- CPU places data word y on the bus.
- Store operation: movl %eax, A
[Figure: y travels from the bus interface across the I/O bridge toward main memory.]
11. Memory Write Transaction (3)
- Main memory reads data word y from the bus and stores it at address A.
- Store operation: movl %eax, A
[Figure: main memory stores y at address A.]
12. Disk Geometry
- Disks consist of platters, each with two surfaces.
- Each surface consists of concentric rings called tracks.
- Each track consists of sectors separated by gaps.
[Figure: a surface rotating about the spindle, showing tracks (e.g., track k), sectors, and gaps.]
13. I/O Bus
[Figure: the CPU chip (register file, ALU, bus interface) connects via the system bus to the I/O bridge, and via the memory bus to main memory. The I/O bus hangs off the bridge and hosts a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]
14. Reading a Disk Sector (1)
- CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.
15. Reading a Disk Sector (2)
- Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
16. Reading a Disk Sector (3)
- When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special interrupt pin on the CPU).
17. An Example Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices sit at the top; larger, slower, and cheaper (per byte) storage devices sit toward the bottom:
- L0: CPU registers (hold words retrieved from the L1 cache)
- L1: on-chip L1 cache (SRAM)
- L2: off-chip L2 cache (SRAM)
- L3: main memory (DRAM)
- L4: local secondary storage (local disks)
- L5: remote secondary storage (distributed file systems, Web servers)
18. Caching in a Memory Hierarchy
- The larger, slower, cheaper storage device at level k+1 is partitioned into blocks.
[Figure: level k (the smaller, faster device) holds copies of a subset of the level-k+1 blocks, e.g., blocks 4 and 10; level k+1 is partitioned into blocks 0 – 15.]
19. General Caching Concepts
- Program needs object d, which is stored in some block b.
- Cache hit:
  - Program finds b in the cache at level k. E.g., block 14.
- Cache miss:
  - b is not at level k, so the level k cache must fetch it from level k+1. E.g., block 12.
  - If the level k cache is full, then some current block must be replaced (evicted). Which one is the victim?
    - Placement policy: where can the new block go? E.g., b mod 4.
    - Replacement policy: which block should be evicted? E.g., LRU.
[Figure: a request for block 14 hits at level k (holding blocks such as 4, 9, 14, 3); a request for block 12 misses, so 12 is fetched from level k+1 (blocks 0 – 15) and placed at position 12 mod 4 = 0.]
20. Cache Memories
- Cache memories are small, fast SRAM-based memories managed automatically in hardware.
- They hold frequently accessed blocks of main memory.
- The CPU looks first for data in L1, then in L2, then in main memory.
- Typical bus structure:
[Figure: the CPU chip contains the register file, ALU, and L1 cache; a cache bus connects to an off-chip L2 cache; the bus interface connects via the system bus, I/O bridge, and memory bus to main memory.]
21. Inserting an L1 Cache Between the CPU and Main Memory
- The tiny, very fast CPU register file has room for four 4-byte words. The transfer unit between the CPU register file and the cache is a 4-byte block.
- The small, fast L1 cache has room for two 4-word blocks (line 0 and line 1). The transfer unit between the cache and main memory is a 4-word block (16 bytes).
- The big, slow main memory has room for many 4-word blocks, e.g., block 10 (a b c d), block 21 (p q r s), block 30 (w x y z), ...
22. General Organization of a Cache Memory
- A cache is an array of sets. Each set contains one or more lines. Each line holds a block of data.
- Per line: 1 valid bit, t tag bits, and a block of B = 2^b bytes.
- S = 2^s sets, E lines per set.
- Cache size: C = B × E × S data bytes.
[Figure: sets 0 … S−1, each containing E lines; each line holds a valid bit, a tag, and block bytes 0 … B−1.]
23. Addressing Caches
The m-bit address A is divided into three fields, from bit m−1 down to bit 0: <tag> (t bits), <set index> (s bits), and <block offset> (b bits).
The word at address A is in the cache if the tag bits in one of the valid lines in set <set index> match <tag>. The word contents begin at offset <block offset> bytes from the beginning of the block.
[Figure: the s set-index bits of A select a set; each line in the set holds a valid bit, a tag, and block bytes 0 … B−1.]
24. Direct-Mapped Cache
- Simplest kind of cache.
- Characterized by exactly one line per set (E = 1).
[Figure: sets 0 … S−1, each containing a single line (valid bit, tag, cache block).]
25. Accessing Direct-Mapped Caches
- Set selection:
  - Use the set index bits to determine the set of interest.
[Figure: the s set-index bits of the address (here 00001) select set 1 from sets 0 … S−1.]
26. Accessing Direct-Mapped Caches
- Line matching and word selection:
  - Line matching: find a valid line in the selected set with a matching tag.
  - Word selection: then extract the word.
[Figure: selected set i holds a valid line with tag 0110 containing words w0 … w3 in block bytes 0 … 7; the address's tag 0110 matches, and block offset 100₂ = 4 selects the word starting at byte 4.]
27. Direct-Mapped Cache Simulation
- M = 16 byte addresses, B = 2 bytes/block, S = 4 sets, E = 1 entry/set.
- Address trace (reads): 0 [0000₂], 1 [0001₂], 13 [1101₂], 8 [1000₂], 0 [0000₂]
28. Why Use Middle Bits as Index?
- High-order bit indexing:
  - Adjacent memory lines would map to the same cache entry.
  - Poor use of spatial locality.
- Middle-order bit indexing:
  - Consecutive memory lines map to different cache lines.
  - Can hold a C-byte region of the address space in the cache at one time.
[Figure: a 4-line cache (lines 00 – 11) against memory lines 0000 … 1111. With high-order bit indexing, the top two address bits select the cache line, so runs of adjacent memory lines collide; with middle-order bit indexing, the middle two bits select the line, so consecutive memory lines spread across all four cache lines.]
29. Set Associative Caches
- Characterized by more than one line per set (here E = 2 lines per set).
[Figure: sets 0 … S−1, each containing two lines (valid bit, tag, cache block).]
30. Accessing Set Associative Caches
- Set selection:
  - Identical to direct-mapped cache.
[Figure: the set-index bits of the address (here 00001) select set 1; each set holds two lines.]
31. Accessing Set Associative Caches
- Line matching and word selection:
  - Must compare the tag in each valid line in the selected set.
[Figure: selected set i holds two valid lines, one with tag 1001 and one with tag 0110 (words w0 … w3 in bytes 0 … 7); the address's tag 0110 matches the second line, and block offset 100₂ = 4 selects the word starting at byte 4.]
32. Set Associative Cache Example
- E = 2, B = 4, S = 8. Words: 0x0E34, 0x0DD5, 0x1FE4.
- And list a memory address that will hit in Set 3.

Set | Tag V  B0 B1 B2 B3 | Tag V  B0 B1 B2 B3
 0  | 09  1  86 30 3F 10 | 00  0  -- -- -- --
 1  | 45  1  60 4F E0 23 | 38  1  00 BC 0B 37
 2  | EB  0  -- -- -- -- | 0B  0  -- -- -- --
 3  | 06  0  -- -- -- -- | 32  1  12 08 7B AD
 4  | C7  1  06 78 07 C5 | 05  1  40 67 C2 3B
 5  | 71  1  0B DE 18 4B | 6E  0  -- -- -- --
 6  | 91  1  A0 B7 26 2D | F0  0  -- -- -- --
 7  | 46  0  -- -- -- -- | DE  1  12 C0 88 37
33. Multi-Level Caches
- Options: separate data and instruction caches, or a unified cache.
[Figure: Processor (Regs) → L1 d-cache and L1 i-cache → Unified L2 Cache → Memory → disk]

size / speed / $/Mbyte / line size (larger, slower, cheaper going down):
- Regs: 200 B, 3 ns, 8 B line
- L1 caches: 8-64 KB, 3 ns, 32 B line
- L2 cache: 1-4 MB SRAM, 6 ns, $100/MB, 32 B line
- Memory: 128 MB DRAM, 60 ns, $1.50/MB, 8 KB line
- disk: 30 GB, 8 ms, $0.05/MB
34. Motivations for Virtual Memory
- Use physical DRAM as a cache for the disk:
  - The address space of a process can exceed physical memory size.
  - The sum of the address spaces of multiple processes can exceed physical memory.
- Simplify memory management:
  - Multiple processes resident in main memory, each with its own address space.
  - Only active code and data is actually in memory.
  - Allocate more memory to a process as needed.
- Provide protection:
  - One process can't interfere with another, because they operate in different address spaces.
  - A user process cannot access privileged information; different sections of address spaces have different permissions.
35. Motivation #1: DRAM as a Cache for Disk
- The full address space is quite large:
  - 32-bit addresses: ~4,000,000,000 (4 billion) bytes
  - 64-bit addresses: ~16,000,000,000,000,000,000 (16 quintillion) bytes
- Disk storage is ~300X cheaper than DRAM storage:
  - 80 GB of DRAM: ~$33,000
  - 80 GB of disk: ~$110
- To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk.
36. Levels in Memory Hierarchy
[Figure: Register ←8 B→ Cache ←32 B→ Memory ←4 KB→ Disk Memory; the register/cache/memory boundaries are managed as caches, the memory/disk boundary by virtual memory.]

size / speed / $/Mbyte / line size (larger, slower, cheaper going down):
- Register: 32 B, 1 ns, 8 B line
- Cache: 32 KB-4 MB, 2 ns, $125/MB, 32 B line
- Memory: 1024 MB, 30 ns, $0.20/MB, 4 KB line
- Disk: 100 GB, 8 ms, $0.001/MB
37. DRAM vs. SRAM as a Cache
- DRAM vs. disk is more extreme than SRAM vs. DRAM.
- Access latencies:
  - DRAM: ~10X slower than SRAM.
  - Disk: ~100,000X slower than DRAM.
- Importance of exploiting spatial locality:
  - The first byte is ~100,000X slower than successive bytes on disk,
  - vs. a ~4X improvement for page-mode vs. regular accesses to DRAM.
- Bottom line: design decisions for DRAM caches are driven by the enormous cost of misses.
38. Impact of Properties on Design
- If DRAM were to be organized like an SRAM cache, how would we set the following design parameters?
  - Line size? Large, since disk is better at transferring large blocks.
  - Associativity? High, to minimize miss rate.
  - Write-through or write-back? Write-back, since we can't afford to perform small writes to disk.
- What would the impact of these choices be on:
  - miss rate: extremely low, << 1%
  - hit time: must match cache/DRAM performance
  - miss latency: very high, ~20 ms
  - tag storage overhead: low, relative to block size
39. A System with Physical Memory Only
- Examples: most Cray machines, early PCs, nearly all embedded systems, etc.
- Addresses generated by the CPU correspond directly to bytes in physical memory.
40. A System with Virtual Memory
- Examples: workstations, servers, modern PCs, etc.
- Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table).
[Figure: virtual addresses pass through the page table to physical addresses 0 … P−1 in memory, or to locations on disk.]
41. Page Faults (like Cache Misses)
- What if an object is on disk rather than in memory?
  - The page table entry indicates that the virtual address is not in memory.
  - An OS exception handler is invoked to move data from disk into memory:
    - The current process suspends; others can resume.
    - The OS has full control over placement, etc.
[Figure: before and after the fault, the CPU issues virtual addresses that the page table maps either to physical addresses in memory or to locations on disk.]
42. Servicing a Page Fault
- (1) Initiate block read: the processor signals the controller to read a block of length P starting at disk address X and store it starting at memory address Y.
- (2) Read occurs: a direct memory access (DMA) transfer under control of the I/O controller.
- (3) I/O controller signals completion: it interrupts the processor, and the OS resumes the suspended process.
[Figure: processor (Reg) and cache on the memory-I/O bus with memory and the I/O controller/disk; arrows mark (1) initiate block read, (2) DMA transfer, (3) read done.]
43. Motivation #2: Memory Management
- Multiple processes can reside in physical memory.
- How do we resolve address conflicts? What if two processes access something at the same address?
[Figure: Linux/x86 process memory image, from high to low addresses: kernel virtual memory (memory invisible to user code); user stack (%esp); memory-mapped region for shared libraries; runtime heap (via malloc), bounded by the brk ptr; uninitialized data (.bss); initialized data (.data); program text (.text); forbidden region at address 0.]
44. Solution: Separate Virtual Address Spaces
- Virtual and physical address spaces are divided into equal-sized blocks.
  - The blocks are called pages (both virtual and physical).
- Each process has its own virtual address space.
  - The operating system controls how virtual pages are assigned to physical memory.
[Figure: process 1's virtual pages (VP 1, VP 2, …, up to N−1) map to physical pages such as PP 2; process 2's pages (up to M−1) map to PP 10, etc.; a shared physical page, PP 7 (e.g., read-only library code), appears in both address spaces.]
45. Motivation #3: Protection
- The page table entry contains access rights information.
- Hardware enforces this protection (traps into the OS if a violation occurs).
[Figure: per-process page tables for process i and process j mapping into the same physical memory.]
46. VM Address Translation
- Virtual address space: V = {0, 1, …, N−1}
- Physical address space: P = {0, 1, …, M−1}, with M < N
- Address translation: MAP: V → P ∪ {∅}
  - For virtual address a:
    - MAP(a) = a′ if the data at virtual address a is at physical address a′ in P.
    - MAP(a) = ∅ if the data at virtual address a is not in physical memory (either invalid or stored on disk).
47. VM Address Translation: Hit
[Figure: the processor issues virtual address a; the hardware address translation mechanism (part of the on-chip memory management unit, MMU) produces physical address a′, which is sent to main memory.]
48. VM Address Translation: Miss
[Figure: the processor issues virtual address a; the hardware address translation mechanism (part of the on-chip MMU) detects a page fault and invokes the fault handler; the OS transfers the page from secondary memory to main memory (only on a miss), after which the access completes with physical address a′.]
49. VM Address Translation
- Parameters:
  - P = 2^p : page size (bytes)
  - N = 2^n : virtual address limit
  - M = 2^m : physical address limit
[Figure: the virtual address (bits n−1 … 0) splits into a virtual page number (bits n−1 … p) and page offset (bits p−1 … 0); address translation maps it to a physical address (bits m−1 … 0) with a physical page number and the same page offset.]
- Page offset bits don't change as a result of translation.
50. Page Tables
[Figure: a memory-resident page table, indexed by virtual page number, holds a valid bit plus a physical page or disk address per entry; valid entries (valid = 1) point into physical memory, while invalid entries (valid = 0) point into disk storage (a swap file or regular file system file).]
51. Address Translation via Page Table
52. Page Table Operation
- Translation:
  - Separate (set of) page table(s) per process.
  - The VPN forms an index into the page table (it points to a page table entry).
53. Page Table Operation
- Computing the physical address:
  - The page table entry (PTE) provides information about the page.
  - If (valid bit = 1), the page is in memory: use the physical page number (PPN) to construct the address.
  - If (valid bit = 0), the page is on disk: page fault.
54. Page Table Operation
- Checking protection:
  - The access rights field indicates allowable access (e.g., read-only, read-write, execute-only).
  - Typically supports multiple protection modes (e.g., kernel vs. user).
  - A protection violation fault is raised if the user doesn't have the necessary permission.
55. Integrating VM and Cache
- Most caches are physically addressed:
  - Accessed by physical addresses.
  - Allows multiple processes to have blocks in the cache at the same time, and to share pages.
  - The cache doesn't need to be concerned with protection issues: access rights are checked as part of address translation.
- Perform address translation before the cache lookup:
  - But this could involve a memory access itself (of the PTE).
  - Of course, page table entries can also become cached.
56. Speeding Up Translation with a TLB
- Translation lookaside buffer (TLB):
  - A small hardware cache in the MMU.
  - Maps virtual page numbers to physical page numbers.
  - Contains complete page table entries for a small number of pages.
57. Simple Memory System: Page Table
- Only the first 16 entries are shown.

VPN PPN Valid | VPN PPN Valid
00  28  1     | 08  13  1
01  --  0     | 09  17  1
02  33  1     | 0A  09  1
03  02  1     | 0B  --  0
04  --  0     | 0C  --  0
05  16  1     | 0D  2D  1
06  --  0     | 0E  11  1
07  --  0     | 0F  0D  1
58. Simple Memory System: TLB
- TLB: 16 entries, 4-way set associative.

Set | Tag PPN Valid | Tag PPN Valid | Tag PPN Valid | Tag PPN Valid
 0  | 03  --  0     | 09  0D  1     | 00  --  0     | 07  02  1
 1  | 03  2D  1     | 02  --  0     | 04  --  0     | 0A  --  0
 2  | 02  --  0     | 08  --  0     | 06  --  0     | 03  --  0
 3  | 07  --  0     | 03  0D  1     | 0A  34  1     | 02  --  0
59. Simple Memory System: Example
- Addressing:
  - 14-bit virtual addresses
  - 12-bit physical addresses
  - Page size = 64 bytes
[Figure: the virtual address splits into a virtual page number (VPN, bits 13 - 6) and virtual page offset (VPO, bits 5 - 0); the physical address splits into a physical page number (PPN, bits 11 - 6) and physical page offset (PPO, bits 5 - 0).]
60. Simple Memory System: Cache
- Cache: 16 lines, 4-byte line size, direct mapped.

Idx Tag Valid B0 B1 B2 B3 | Idx Tag Valid B0 B1 B2 B3
 0  19  1     99 11 23 11 |  8  24  1     3A 00 51 89
 1  15  0     -- -- -- -- |  9  2D  0     -- -- -- --
 2  1B  1     00 02 04 08 |  A  2D  1     93 15 DA 3B
 3  36  0     -- -- -- -- |  B  0B  0     -- -- -- --
 4  32  1     43 6D 8F 09 |  C  12  0     -- -- -- --
 5  0D  1     36 72 F0 1D |  D  16  1     04 96 34 15
 6  31  0     -- -- -- -- |  E  13  1     83 77 1B D3
 7  16  1     11 C2 DF 03 |  F  14  0     -- -- -- --
61. Address Translation Example #1
- Virtual address 0x03D4
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
62. Address Translation Example #2
- Virtual address 0x0B8F
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
63. Address Translation Example #3
- Virtual address 0x0040
  - VPN ___  TLBI ___  TLBT ____  TLB hit? __  Page fault? __  PPN ____
- Physical address
  - Offset ___  CI ___  CT ____  Hit? __  Byte ____
64. Multi-Level Page Tables
- Given:
  - 4 KB (2^12) page size
  - 32-bit address space
  - 4-byte PTE
- Problem:
  - A single-level table would need 4 MB: 2^20 PTEs × 4 bytes.
- Common solution:
  - Multi-level page tables, e.g., a 2-level table (P6):
    - Level 1 table: 1024 entries, each of which points to a Level 2 page table.
    - Level 2 table: 1024 entries, each of which points to a page.
[Figure: Level 1 table entries point to Level 2 tables, whose entries point to pages.]