Transcript: Conventional DRAM Organization
1
Conventional DRAM Organization
  • d × w DRAM
  • d · w total bits organized as d supercells of size w bits

[Diagram: 16 × 8 DRAM chip. The memory controller sends a 2-bit row/column address (addr) and transfers 8 bits of data to/from the CPU. The chip holds a 4 × 4 array of supercells (rows 0-3, cols 0-3); supercell (2,1) is highlighted, and an internal row buffer stages one row at a time.]
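A minimal sketch (an assumption for illustration, not from the slides) of how a linear supercell number splits into the row and column that the controller sends with RAS and CAS:

    #include <stdio.h>

    /* 16 x 8 DRAM: d = 16 supercells arranged as a 4 x 4 grid. */
    #define ROWS 4
    #define COLS 4

    int main(void) {
        for (int cell = 0; cell < ROWS * COLS; cell++) {
            int row = cell / COLS;   /* sent first, with RAS */
            int col = cell % COLS;   /* sent second, with CAS */
            printf("supercell %2d -> (row %d, col %d)\n", cell, row, col);
        }
        return 0;
    }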
2
Reading DRAM Supercell (2,1)
  • Step 1(a): Row access strobe (RAS) selects row 2.
  • Step 1(b): Row 2 is copied from the DRAM array to the row buffer.
[Diagram: the memory controller sends RAS = 2 on the 2-bit addr lines; row 2 of the 16 × 8 DRAM array is copied into the internal row buffer.]
3
Reading DRAM Supercell (2,1)
  • Step 2(a): Column access strobe (CAS) selects column 1.
  • Step 2(b): Supercell (2,1) is copied from the buffer to the data lines, and eventually back to the CPU.
[Diagram: the memory controller sends CAS = 1 on the addr lines; supercell (2,1) moves from the internal row buffer onto the 8-bit data lines and into the controller's internal buffer.]
4
Memory Modules
[Diagram: 64 MB memory module consisting of eight 8M × 8 DRAMs (DRAM 0 ... DRAM 7). The memory controller reads supercell (i,j) from each chip; each chip contributes 8 bits, so the module supplies 64-bit words.]
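A hypothetical sketch of how a controller could assemble one 64-bit word from such a module, taking byte k of the word from DRAM k; read_supercell is a dummy stand-in for the RAS/CAS access to one chip:

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for the RAS/CAS access to one chip; returns dummy data. */
    static uint8_t read_supercell(int chip, int row, int col) {
        return (uint8_t)(chip * 16 + row * 4 + col);
    }

    /* Assemble a 64-bit word: byte k comes from DRAM k. */
    uint64_t read_word(int row, int col) {
        uint64_t word = 0;
        for (int chip = 0; chip < 8; chip++)
            word |= (uint64_t)read_supercell(chip, row, col) << (8 * chip);
        return word;
    }

    int main(void) {
        printf("word at (2,1) = 0x%016llx\n",
               (unsigned long long)read_word(2, 1));
        return 0;
    }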
5
Typical Bus Structure Connecting CPU and Memory
  • A bus is a collection of parallel wires that
    carry address, data, and control signals.
  • Buses are typically shared by multiple devices.

[Diagram: CPU chip containing the register file, ALU, and bus interface; the system bus runs to the I/O bridge, and the memory bus runs from the bridge to main memory.]
6
Memory Read Transaction (1)
  • CPU places address A on the memory bus.

[Diagram: load operation movl A, %eax. The CPU's bus interface places address A on the memory bus; main memory holds word x at address A; the destination is register %eax.]
7
Memory Read Transaction (2)
  • Main memory reads A from the memory bus, retrieves word x, and places it on the bus.

[Diagram: main memory places x on the memory bus; it travels through the I/O bridge toward the CPU's bus interface.]
8
Memory Read Transaction (3)
  • CPU reads word x from the bus and copies it into register %eax.

[Diagram: x arrives over the bus and is copied into register %eax.]
9
Memory Write Transaction (1)
  • CPU places address A on bus. Main memory reads
    it and waits for the corresponding data word to
    arrive.

[Diagram: store operation movl %eax, A. Register %eax holds y; the bus interface places address A on the bus.]
10
Memory Write Transaction (2)
  • CPU places data word y on the bus.

[Diagram: the CPU places data word y on the bus; it travels toward main memory.]
11
Memory Write Transaction (3)
  • Main memory reads data word y from the bus and stores it at address A.

[Diagram: y leaves the bus through the I/O bridge and is stored at address A in main memory.]
12
Disk Geometry
  • Disks consist of platters, each with two
    surfaces.
  • Each surface consists of concentric rings called
    tracks.
  • Each track consists of sectors separated by gaps.

[Diagram: a disk surface with concentric tracks around the spindle; track k is shown divided into sectors separated by gaps.]
13
I/O Bus
[Diagram: the CPU chip's (register file, ALU) bus interface joins the system bus; the I/O bridge connects it to the memory bus (main memory) and to the I/O bus. On the I/O bus sit a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]
14
Reading a Disk Sector (1)
The CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.
[Diagram: the command travels from the CPU chip (register file, ALU) over the bus interface and I/O bus to the disk controller.]
15
Reading a Disk Sector (2)
The disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
[Diagram: the sector moves from the disk through the disk controller and over the I/O bus directly into main memory, bypassing the CPU.]
16
Reading a Disk Sector (3)
When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special interrupt pin on the CPU).
[Diagram: the interrupt signal travels from the disk controller to the CPU chip.]
17
An Example Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices sit at the top of the hierarchy; larger, slower, and cheaper (per byte) devices sit below:
  • L0: CPU registers (hold words retrieved from the L1 cache)
  • L1: on-chip L1 cache (SRAM)
  • L2: off-chip L2 cache (SRAM)
  • L3: main memory (DRAM)
  • L4: local secondary storage (local disks)
  • L5: remote secondary storage (distributed file systems, Web servers)
18
Caching in a Memory Hierarchy
Larger, slower, cheaper storage device at level k+1 is partitioned into blocks.
[Diagram: the level k cache holds copies of a subset of the level k+1 blocks (e.g., blocks 4 and 10); level k+1 is partitioned into blocks 0 ... 15.]
19
General Caching Concepts
  • Program needs object d, which is stored in some block b.
  • Cache hit
  • Program finds b in the cache at level k. E.g., block 14.
  • Cache miss
  • b is not at level k, so the level k cache must fetch it from level k+1. E.g., block 12.
  • If the level k cache is full, then some current block must be replaced (evicted). Which one is the victim?
  • Placement policy: where can the new block go? E.g., b mod 4.
  • Replacement policy: which block should be evicted? E.g., LRU (a small sketch follows the diagram below).

[Diagram: a request for block 14 hits at level k; a request for block 12 misses at level k, is fetched from level k+1, and replaces a resident block (placement: 12 mod 4 = 0).]
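A minimal sketch of an LRU replacement policy, assuming a tiny fully associative level-k cache with four slots (the data structures are illustrative, not from the slides):

    #include <stdio.h>

    #define SLOTS 4
    static int slot[SLOTS]     = {-1, -1, -1, -1};  /* resident block numbers */
    static int last_used[SLOTS];                    /* time of last access    */
    static int now = 0;

    void access_block(int b) {
        int victim = 0;
        now++;
        for (int i = 0; i < SLOTS; i++) {
            if (slot[i] == b) {                     /* cache hit */
                last_used[i] = now;
                printf("block %d: hit\n", b);
                return;
            }
            if (last_used[i] < last_used[victim])
                victim = i;                         /* track least recently used */
        }
        if (slot[victim] < 0)
            printf("block %d: cold miss, filling slot %d\n", b, victim);
        else
            printf("block %d: miss, evicting block %d\n", b, slot[victim]);
        slot[victim] = b;                           /* fetch from level k+1 */
        last_used[victim] = now;
    }

    int main(void) {
        int trace[] = {14, 12, 14, 3, 9, 12};
        for (int i = 0; i < 6; i++) access_block(trace[i]);
        return 0;
    }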
20
Cache Memories
  • Cache memories are small, fast SRAM-based
    memories managed automatically in hardware.
  • Hold frequently accessed blocks of main memory
  • CPU looks first for data in L1, then in L2, then
    in main memory.
  • Typical bus structure

[Diagram: on the CPU chip, the register file and ALU connect to the L1 cache; a cache bus links an off-chip L2 cache; the bus interface connects via the system bus, I/O bridge, and memory bus to main memory.]
21
Inserting an L1 Cache Between the CPU and Main
Memory
The tiny, very fast CPU register file has room for four 4-byte words. The transfer unit between the CPU register file and the cache is a 4-byte block.
The small, fast L1 cache has room for two 4-word blocks (lines 0 and 1). The transfer unit between the cache and main memory is a 4-word block (16 bytes).
The big, slow main memory has room for many 4-word blocks (e.g., block 10 holding words a b c d, block 21 holding p q r s, block 30 holding w x y z).
22
General Org of a Cache Memory
Cache is an array of sets. Each set contains one or more lines. Each line holds a block of data.
  • t tag bits per line
  • 1 valid bit per line
  • B = 2^b bytes per cache block
  • E lines per set
  • S = 2^s sets
[Diagram: sets 0 ... S-1, each holding E lines; each line has a valid bit, a tag, and block bytes 0 ... B-1.]
Cache size: C = B × E × S data bytes
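As a worked check of the formula, using the direct-mapped cache that appears later in this deck (B = 4, E = 1, S = 16):

    #include <stdio.h>

    int main(void) {
        int B = 4, E = 1, S = 16;   /* parameters of the later example cache */
        printf("C = B x E x S = %d data bytes\n", B * E * S);   /* prints 64 */
        return 0;
    }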
23
Addressing Caches
Address A (m bits, m-1 ... 0) divides into three fields: <tag> (t bits), <set index> (s bits), and <block offset> (b bits).
The word at address A is in the cache if the tag bits in one of the valid lines in set <set index> match <tag>. The word contents begin at offset <block offset> bytes from the beginning of the block.
[Diagram: the set-index bits of A select one of the sets 0 ... S-1; each line in the selected set holds a valid bit, a tag, and block bytes 0 ... B-1.]
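A minimal sketch of the field extraction with shifts and masks; the widths below (b = 2, s = 4) are illustrative assumptions, and any t + s + b = m split works the same way:

    #include <stdint.h>
    #include <stdio.h>

    #define B_BITS 2    /* b: block offset bits (illustrative) */
    #define S_BITS 4    /* s: set index bits (illustrative)    */

    uint32_t block_offset(uint32_t a) { return a & ((1u << B_BITS) - 1); }
    uint32_t set_index(uint32_t a)    { return (a >> B_BITS) & ((1u << S_BITS) - 1); }
    uint32_t tag_bits(uint32_t a)     { return a >> (B_BITS + S_BITS); }

    int main(void) {
        uint32_t a = 0x1234;
        printf("addr 0x%x: tag 0x%x, set %u, offset %u\n",
               a, tag_bits(a), set_index(a), block_offset(a));
        return 0;
    }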
24
Direct-Mapped Cache
  • Simplest kind of cache
  • Characterized by exactly one line per set.

  • E = 1 line per set
[Diagram: sets 0 ... S-1, each containing a single line (valid bit, tag, cache block).]
25
Accessing Direct-Mapped Caches
  • Set selection
  • Use the set index bits to determine the set of
    interest.

[Diagram: the address (t tag bits | s set-index bits | b offset bits) carries set index 00001, selecting set 1; that set's line is then examined.]
26
Accessing Direct-Mapped Caches
  • Line matching and word selection
  • Line matching: find a valid line in the selected set with a matching tag.
  • Word selection: then extract the word.

[Diagram: selected set i contains a valid line with tag 0110 whose block (bytes 0 ... 7) holds the 4-byte word w (bytes w0 ... w3) starting at byte 4. The address's tag bits (0110) match, and block offset 100 (= 4) locates the start of w.]
27
Direct-Mapped Cache Simulation
M = 16 byte addresses, B = 2 bytes/block, S = 4 sets, E = 1 entry/set.
Address trace (reads): 0 [0000], 1 [0001], 13 [1101], 8 [1000], 0 [0000]
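A small simulation of exactly this configuration (4-bit addresses give t = 1 tag bit, s = 2 set bits, b = 1 offset bit); replaying the trace prints miss, hit, miss, miss, miss:

    #include <stdio.h>

    /* Direct-mapped cache: M = 16 addresses (4 bits), B = 2 (b = 1),
     * S = 4 (s = 2), E = 1, so t = 4 - 2 - 1 = 1 tag bit. */
    int main(void) {
        int valid[4] = {0}, tags[4];
        int trace[] = {0, 1, 13, 8, 0};

        for (int i = 0; i < 5; i++) {
            int a   = trace[i];
            int set = (a >> 1) & 0x3;   /* 2 set-index bits */
            int tag = a >> 3;           /* 1 tag bit        */
            if (valid[set] && tags[set] == tag) {
                printf("addr %2d: hit  (set %d)\n", a, set);
            } else {
                printf("addr %2d: miss (set %d)\n", a, set);
                valid[set] = 1;         /* load block, evicting any old line */
                tags[set]  = tag;
            }
        }
        return 0;
    }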
28
Why Use Middle Bits as Index?
  • High-Order Bit Indexing
  • Adjacent memory lines would map to the same cache entry
  • Poor use of spatial locality
  • Middle-Order Bit Indexing
  • Consecutive memory lines map to different cache lines
  • Can hold a C-byte region of the address space in cache at one time
[Diagram: 16 memory lines (0000 ... 1111) mapped onto a 4-line cache (00 ... 11). With high-order bit indexing, a run of adjacent lines (e.g., 0000 ... 0011) all land in the same cache line; with middle-order bit indexing, consecutive lines 0000, 0001, 0010, 0011 land in cache lines 00, 01, 10, 11.]
29
Set Associative Caches
  • Characterized by more than one line per set

  • E = 2 lines per set
[Diagram: sets 0 ... S-1, each containing two lines (valid bit, tag, cache block).]
30
Accessing Set Associative Caches
  • Set selection
  • identical to direct-mapped cache

[Diagram: set index 00001 selects set 1, exactly as in the direct-mapped case; both lines of the selected set then become candidates for line matching.]
31
Accessing Set Associative Caches
  • Line matching and word selection
  • must compare the tag in each valid line in the
    selected set.

[Diagram: selected set i contains two valid lines, one with tag 1001 and one with tag 0110. The address's tag bits (0110) match the second line, and block offset 100 (= 4) locates the start of the 4-byte word w (bytes w0 ... w3).]
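A hedged sketch of the line-matching loop for an E-way set; the struct layout and B = 8 are illustrative assumptions:

    #include <stdint.h>
    #include <stdio.h>

    #define E 2                     /* lines per set */

    struct line {
        int      valid;
        uint32_t tag;
        uint8_t  block[8];          /* B = 8 bytes (illustrative) */
    };

    /* Return a pointer to the requested byte, or NULL on a miss.
     * Unlike a direct-mapped cache, every line in the set is checked. */
    const uint8_t *lookup(struct line set[E], uint32_t tag, int offset) {
        for (int i = 0; i < E; i++)
            if (set[i].valid && set[i].tag == tag)
                return &set[i].block[offset];   /* word selection */
        return 0;                               /* miss */
    }

    int main(void) {
        struct line set[E] = {
            {1, 0x9, {0}},
            {1, 0x6, {'a','b','c','d','e','f','g','h'}}
        };
        const uint8_t *p = lookup(set, 0x6, 4);
        if (p) printf("hit: byte %c\n", *p);
        else   printf("miss\n");
        return 0;
    }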
32
  • E = 2, B = 4, S = 8. Words: 0x0E34, 0x0DD5, 0x1FE4
  • And list a memory address that will hit in Set 3.

Set  Tag  V  B0 B1 B2 B3    Tag  V  B0 B1 B2 B3
 0   09   1  86 30 3F 10    00   0  -- -- -- --
 1   45   1  60 4F E0 23    38   1  00 BC 0B 37
 2   EB   0  -- -- -- --    0B   0  -- -- -- --
 3   06   0  -- -- -- --    32   1  12 08 7B AD
 4   C7   1  06 78 07 C5    05   1  40 67 C2 3B
 5   71   1  0B DE 18 4B    6E   0  -- -- -- --
 6   91   1  A0 B7 26 2D    F0   0  -- -- -- --
 7   46   0  -- -- -- --    DE   1  12 C0 88 37
33
Multi-Level Caches
  • Options: separate data and instruction caches, or a unified cache

[Diagram: Processor with Regs feeding an L1 d-cache and L1 i-cache, then a Unified L2 Cache, then Memory, then disk.]

              size          speed   $/Mbyte    line size
Regs          200 B         3 ns               8 B
L1 d/i-cache  8-64 KB       3 ns               32 B
Unified L2    1-4 MB SRAM   6 ns    $100/MB    32 B
Memory        128 MB DRAM   60 ns   $1.50/MB   8 KB
disk          30 GB         8 ms    $0.05/MB

larger, slower, cheaper
34
Motivations for Virtual Memory
  • Use Physical DRAM as a Cache for the Disk
  • Address space of a process can exceed physical memory size
  • Sum of address spaces of multiple processes can exceed physical memory
  • Simplify Memory Management
  • Multiple processes resident in main memory
  • Each process with its own address space
  • Only active code and data are actually in memory
  • Allocate more memory to a process as needed
  • Provide Protection
  • One process can't interfere with another
  • because they operate in different address spaces
  • User process cannot access privileged information
  • different sections of address spaces have different permissions

35
Motivation 1: DRAM as a Cache for Disk
  • Full address space is quite large
  • 32-bit addresses: ~4,000,000,000 (4 billion) bytes
  • 64-bit addresses: ~16,000,000,000,000,000,000 (16 quintillion) bytes
  • Disk storage is ~300X cheaper than DRAM storage
  • 80 GB of DRAM: ~$33,000
  • 80 GB of disk: ~$110
  • To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk

36
Levels in Memory Hierarchy
[Diagram: Register, Cache, Memory, and Disk Memory levels, with transfer units of 8 B (register/cache), 32 B (cache/memory), and 4 KB (memory/disk, via virtual memory).]

              size         speed   $/Mbyte     line size
Register      32 B         1 ns                8 B
Cache         32 KB-4 MB   2 ns    $125/MB     32 B
Memory        1024 MB      30 ns   $0.20/MB    4 KB
Disk Memory   100 GB       8 ms    $0.001/MB

larger, slower, cheaper
37
DRAM vs. SRAM as a Cache
  • DRAM vs. disk is more extreme than SRAM vs. DRAM
  • Access latencies
  • DRAM is ~10X slower than SRAM
  • Disk is ~100,000X slower than DRAM
  • Importance of exploiting spatial locality
  • First byte is ~100,000X slower than successive bytes on disk
  • vs. a ~4X improvement for page-mode vs. regular accesses to DRAM
  • Bottom line
  • Design decisions made for DRAM caches are driven by the enormous cost of misses
38
Impact of Properties on Design
  • If DRAM were organized like an SRAM cache, how would we set the following design parameters?
  • Line size?
  • Large, since disk is better at transferring large blocks
  • Associativity?
  • High, to minimize miss rate
  • Write through or write back?
  • Write back, since we can't afford to perform small writes to disk
  • What would the impact of these choices be on:
  • miss rate? Extremely low, << 1%
  • hit time? Must match cache/DRAM performance
  • miss latency? Very high, ~20 ms
  • tag storage overhead? Low, relative to block size

39
A System with Physical Memory Only
  • Examples
  • most Cray machines, early PCs, nearly all
    embedded systems, etc.
  • Addresses generated by the CPU correspond
    directly to bytes in physical memory

40
A System with Virtual Memory
  • Examples
  • workstations, servers, modern PCs, etc.

[Diagram: CPU-generated virtual addresses pass through a page table that maps them either to physical addresses in memory (pages 0 ... P-1) or to locations on disk.]
  • Address Translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (page table)
41
Page Faults (like Cache Misses)
  • What if an object is on disk rather than in
    memory?
  • Page table entry indicates virtual address not in
    memory
  • OS exception handler invoked to move data from
    disk into memory
  • current process suspends, others can resume
  • OS has full control over placement, etc.

[Diagrams: before the fault, the CPU's virtual address reaches a page table entry that points to disk; after the fault, the entry points to the page's new location in memory.]
42
Servicing a Page Fault
  • (1) Initiate block read
  • Processor signals controller: read block of length P starting at disk address X and store starting at memory address Y
  • (2) Read occurs
  • Direct Memory Access (DMA)
  • Under control of I/O controller
  • (3) I/O controller signals completion
  • Interrupts processor
  • OS resumes suspended process
[Diagram: processor (Reg) and cache sit on the memory-I/O bus with memory, the I/O controller, and the disk; (1) the read command goes to the controller, (2) the DMA transfer fills memory, (3) a read-done interrupt returns to the processor.]
43
Motivation 2: Memory Management
  • Multiple processes can reside in physical memory.
  • How do we resolve address conflicts?
  • what if two processes access something at the same address?

Linux/x86 process memory image (highest to lowest address):
  • kernel virtual memory (memory invisible to user code)
  • stack (%esp)
  • memory-mapped region for shared libraries
  • runtime heap (via malloc), bounded by the brk ptr
  • uninitialized data (.bss)
  • initialized data (.data)
  • program text (.text)
  • forbidden region at address 0
44
Solution: Separate Virtual Address Spaces
  • Virtual and physical address spaces divided into equal-sized blocks
  • blocks are called pages (both virtual and physical)
  • Each process has its own virtual address space
  • operating system controls how virtual pages are assigned to physical memory

[Diagram: the Virtual Address Space for Process 1 (0 ... N-1) and for Process 2 (0 ... N-1) are translated into one Physical Address Space (DRAM, 0 ... M-1); e.g., process 1's VP 1 maps to PP 2, process 2's VP 1 maps to PP 10, and both processes can share a page such as PP 7 (e.g., read-only library code).]
45
Motivation 3: Protection
  • Page table entry contains access rights
    information
  • hardware enforces this protection (trap into OS
    if violation occurs)

[Diagram: separate page tables for Process i and Process j, both mapping into shared physical Memory.]
46
VM Address Translation
  • Virtual Address Space
  • V = {0, 1, ..., N-1}
  • Physical Address Space
  • P = {0, 1, ..., M-1}
  • M < N
  • Address Translation
  • MAP: V → P ∪ {∅}
  • For virtual address a:
  • MAP(a) = a′ if data at virtual address a is at physical address a′ in P
  • MAP(a) = ∅ if data at virtual address a is not in physical memory
  • Either invalid or stored on disk

47
VM Address Translation Hit
[Diagram: the processor issues virtual address a; the hardware address-translation mechanism, part of the on-chip memory management unit (MMU), produces physical address a′, which goes to main memory.]
48
VM Address Translation Miss
[Diagram: translation of virtual address a misses (page fault); the fault handler is invoked, the OS transfers the page from secondary memory into main memory (only on a miss), and the access then completes with physical address a′. The translation hardware is part of the on-chip MMU.]
49
VM Address Translation
  • Parameters
  • P = 2^p = page size (bytes)
  • N = 2^n = virtual address limit
  • M = 2^m = physical address limit

virtual address (bits n-1 ... 0):   [ virtual page number (bits n-1 ... p) | page offset (bits p-1 ... 0) ]
                                        | address translation
                                        v
physical address (bits m-1 ... 0):  [ physical page number (bits m-1 ... p) | page offset (bits p-1 ... 0) ]

Page offset bits don't change as a result of translation.
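A small sketch of the split and recombination, assuming an illustrative p = 12 (4 KB pages) and a PPN that would come from the page table:

    #include <stdint.h>
    #include <stdio.h>

    #define P_BITS 12                  /* p: page size = 2^12 = 4 KB (assumed) */

    int main(void) {
        uint32_t va  = 0x00403ABC;
        uint32_t vpn = va >> P_BITS;              /* virtual page number   */
        uint32_t off = va & ((1u << P_BITS) - 1); /* page offset, unchanged */
        uint32_t ppn = 0x2D;                      /* from the page table (assumed) */
        uint32_t pa  = (ppn << P_BITS) | off;
        printf("va 0x%08x -> vpn 0x%x, offset 0x%x -> pa 0x%08x\n",
               va, vpn, off, pa);
        return 0;
    }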
50
Page Tables
Memory-resident page table (each entry holds a physical page number or a disk address).
[Diagram: the table is indexed by Virtual Page Number; entries with Valid = 1 point into Physical Memory, and entries with Valid = 0 point into Disk Storage (a swap file or regular file system file).]
51
Address Translation via Page Table
52
Page Table Operation
  • Translation
  • Separate (set of) page table(s) per process
  • VPN forms index into page table (points to a page
    table entry)

53
Page Table Operation
  • Computing Physical Address
  • Page Table Entry (PTE) provides information about the page
  • if (valid bit = 1) then the page is in memory:
  • use physical page number (PPN) to construct the address
  • if (valid bit = 0) then the page is on disk:
  • page fault
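A hedged sketch of this decision in code; the PTE layout, table size, and page_fault stand-in are assumptions, not from the slides:

    #include <stdint.h>
    #include <stdlib.h>

    #define P_BITS 12                  /* assumed 4 KB pages */

    struct pte {
        unsigned valid : 1;
        unsigned ppn   : 20;           /* physical page number (assumed width) */
    };

    /* Small table for the sketch; assumes vpn < 256. */
    static struct pte page_table[1 << 8];

    static uint32_t page_fault(uint32_t vpn) {
        (void)vpn;   /* a real OS would fetch the page from disk,
                        update the PTE, and restart the access */
        exit(1);
    }

    uint32_t translate(uint32_t va) {
        uint32_t vpn = va >> P_BITS;
        uint32_t off = va & ((1u << P_BITS) - 1);
        struct pte e = page_table[vpn];
        if (!e.valid)
            return page_fault(vpn);                /* page on disk   */
        return ((uint32_t)e.ppn << P_BITS) | off;  /* page in memory */
    }

    int main(void) {
        page_table[3] = (struct pte){1, 0x2D};
        return (int)(translate(0x3ABC) != 0x2DABC);  /* 0 on success */
    }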

54
Page Table Operation
  • Checking Protection
  • Access rights field indicates allowable access
  • e.g., read-only, read-write, execute-only
  • typically support multiple protection modes (e.g., kernel vs. user)
  • Protection violation fault if user doesn't have the necessary permission

55
Integrating VM and Cache
  • Most Caches Physically Addressed
  • Accessed by physical addresses
  • Allows multiple processes to have blocks in cache
    at same time
  • Allows multiple processes to share pages
  • Cache doesn't need to be concerned with protection issues
  • Access rights checked as part of address
    translation
  • Perform Address Translation Before Cache Lookup
  • But this could involve a memory access itself (of
    the PTE)
  • Of course, page table entries can also become
    cached

56
Speeding up Translation with a TLB
  • Translation Lookaside Buffer (TLB)
  • Small hardware cache in MMU
  • Maps virtual page numbers to physical page
    numbers
  • Contains complete page table entries for small
    number of pages
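A minimal sketch of a TLB probe performed before the page-table walk; the 4-sets-by-4-ways shape mirrors the TLB two slides ahead, but the data structures are illustrative:

    #include <stdint.h>
    #include <stdio.h>

    #define TLB_SETS 4
    #define TLB_WAYS 4

    struct tlb_entry { int valid; uint32_t tag, ppn; };
    static struct tlb_entry tlb[TLB_SETS][TLB_WAYS];

    /* Return the PPN for a VPN, or -1 to signal a TLB miss
     * (on a miss, the MMU would walk the page table instead). */
    int tlb_lookup(uint32_t vpn) {
        uint32_t set = vpn % TLB_SETS;     /* low VPN bits index the set */
        uint32_t tag = vpn / TLB_SETS;     /* remaining bits are the tag */
        for (int w = 0; w < TLB_WAYS; w++)
            if (tlb[set][w].valid && tlb[set][w].tag == tag)
                return (int)tlb[set][w].ppn;
        return -1;
    }

    int main(void) {
        /* VPN 0x0F: set = 3, tag = 0x03 (matches the TLB table below). */
        tlb[3][0] = (struct tlb_entry){1, 0x03, 0x0D};
        printf("VPN 0x0F -> PPN 0x%02X\n", (unsigned)tlb_lookup(0x0F));
        return 0;
    }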

57
Simple Memory System Page Table
  • Only the first 16 entries are shown

VPN  PPN  Valid    VPN  PPN  Valid
00   28   1        08   13   1
01   --   0        09   17   1
02   33   1        0A   09   1
03   02   1        0B   --   0
04   --   0        0C   --   0
05   16   1        0D   2D   1
06   --   0        0E   11   1
07   --   0        0F   0D   1
58
Simple Memory System TLB
  • TLB
  • 16 entries
  • 4-way associative

Set  Tag  PPN  Valid   Tag  PPN  Valid   Tag  PPN  Valid   Tag  PPN  Valid
0    03   --   0       09   0D   1       00   --   0       07   02   1
1    03   2D   1       02   --   0       04   --   0       0A   --   0
2    02   --   0       08   --   0       06   --   0       03   --   0
3    07   --   0       03   0D   1       0A   34   1       02   --   0
59
Simple Memory System Example
  • Addressing
  • 14-bit virtual addresses
  • 12-bit physical addresses
  • Page size = 64 bytes

Address format: the 14-bit virtual address splits into an 8-bit Virtual Page Number (VPN) and a 6-bit Virtual Page Offset (VPO); the 12-bit physical address splits into a 6-bit Physical Page Number (PPN) and a 6-bit Physical Page Offset (PPO). The 64-byte page size fixes the 6-bit offsets.
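A sketch that slices addresses of this system into the fields the examples below ask for (TLB: 4 sets, so TLBI is the low 2 bits of the VPN; cache: 16 four-byte lines, so the offset CO is 2 bits and the index CI is 4 bits); the example addresses here are arbitrary, not the exercise inputs:

    #include <stdint.h>
    #include <stdio.h>

    /* VA = VPN(8) | VPO(6); TLBI = low 2 bits of VPN, TLBT = high 6.
     * PA = PPN(6) | PPO(6); CO = 2 bits, CI = 4 bits, CT = 6 bits. */
    int main(void) {
        uint32_t va  = 0x0123;                 /* arbitrary example VA */
        uint32_t vpn = va >> 6, vpo = va & 0x3F;
        printf("VA 0x%04x: VPN 0x%02x, VPO 0x%02x, TLBI %u, TLBT 0x%02x\n",
               va, vpn, vpo, vpn & 0x3, vpn >> 2);

        uint32_t pa = 0x2AF;                   /* arbitrary example PA */
        printf("PA 0x%03x: CO %u, CI %u, CT 0x%02x\n",
               pa, pa & 0x3, (pa >> 2) & 0xF, pa >> 6);
        return 0;
    }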
60
Simple Memory System Cache
  • Cache
  • 16 lines
  • 4-byte line size
  • Direct mapped

Idx  Tag  Valid  B0 B1 B2 B3    Idx  Tag  Valid  B0 B1 B2 B3
0    19   1      99 11 23 11    8    24   1      3A 00 51 89
1    15   0      -- -- -- --    9    2D   0      -- -- -- --
2    1B   1      00 02 04 08    A    2D   1      93 15 DA 3B
3    36   0      -- -- -- --    B    0B   0      -- -- -- --
4    32   1      43 6D 8F 09    C    12   0      -- -- -- --
5    0D   1      36 72 F0 1D    D    16   1      04 96 34 15
6    31   0      -- -- -- --    E    13   1      83 77 1B D3
7    16   1      11 C2 DF 03    F    14   0      -- -- -- --
61
Address Translation Example 1
  • Virtual Address 0x03D4
  • VPN ___  TLBI ___  TLBT ___  TLB Hit? ___  Page Fault? ___  PPN ___
  • Physical Address
  • Offset ___  CI ___  CT ___  Hit? ___  Byte ___

62
Address Translation Example 2
  • Virtual Address 0x0B8F
  • VPN ___  TLBI ___  TLBT ___  TLB Hit? ___  Page Fault? ___  PPN ___
  • Physical Address
  • Offset ___  CI ___  CT ___  Hit? ___  Byte ___

63
Address Translation Example 3
  • Virtual Address 0x0040
  • VPN ___  TLBI ___  TLBT ___  TLB Hit? ___  Page Fault? ___  PPN ___
  • Physical Address
  • Offset ___  CI ___  CT ___  Hit? ___  Byte ___

64
Multi-Level Page Tables
  • Given
  • 4 KB (2^12) page size
  • 32-bit address space
  • 4-byte PTE
  • Problem
  • Would need a 4 MB page table!
  • 2^20 PTEs × 4 bytes
  • Common solution
  • multi-level page tables
  • e.g., 2-level table (P6)
  • Level 1 table: 1024 entries, each of which points to a Level 2 page table
  • Level 2 table: 1024 entries, each of which points to a page
[Diagram: one Level 1 Table whose entries point to Level 2 Tables; Level 2 entries point to pages.]
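A hedged sketch of the two-level walk; the array-of-pointers layout is illustrative (real tables live in page-aligned physical memory). The space saving: the Level 1 table is only 1024 × 4 B = 4 KB, and Level 2 tables are allocated only for the parts of the address space actually in use:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* 32-bit VA = L1 index (10) | L2 index (10) | page offset (12). */
    static uint32_t *l1_table[1024];           /* entries point to L2 tables */

    uint32_t translate(uint32_t va) {
        uint32_t i1  = va >> 22;
        uint32_t i2  = (va >> 12) & 0x3FF;
        uint32_t off = va & 0xFFF;
        uint32_t *l2 = l1_table[i1];
        if (l2 == NULL || l2[i2] == 0) {       /* unmapped: would page-fault */
            fprintf(stderr, "page fault at 0x%08x\n", va);
            exit(1);
        }
        return (l2[i2] << 12) | off;           /* l2[i2] holds the PPN */
    }

    int main(void) {
        static uint32_t l2[1024];
        l2[3] = 0x5A;                          /* map VPN 3 of region 0 */
        l1_table[0] = l2;
        printf("0x%08x -> 0x%08x\n", 0x3ABCu, translate(0x3ABC));
        return 0;
    }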