CS252 Graduate Computer Architecture Lecture 5 Memory Technology


Provided by: johnkubi

Learn more at: https://people.eecs.berkeley.edu

1
CS252 Graduate Computer Architecture
Lecture 5: Memory Technology

February 5, 2001
Phil Buonadonna

2
Main Memory Background

Random Access Memory (vs. Serial Access Memory)
Different flavors at different levels
Physical makeup (CMOS, DRAM)
Low-level architectures (FPM, EDO, BEDO, SDRAM)
Cache uses SRAM: Static Random Access Memory
No refresh (6 transistors/bit vs. 1 transistor)
Size: DRAM/SRAM 4-8; Cost/Cycle time: SRAM/DRAM 8-16
Main Memory is DRAM: Dynamic Random Access Memory
Dynamic since it needs to be refreshed periodically (8 ms, 1% of the time)
Addresses divided into 2 halves (memory as a 2D matrix)
RAS or Row Access Strobe
CAS or Column Access Strobe

3
Static RAM (SRAM)

Six transistors in cross-connected fashion
Provides regular AND inverted outputs
Implemented in CMOS process

[Figure: single-port 6-T SRAM cell]
4
SRAM Read Timing (typical)

tAA (access time for address): how long it takes to get stable output after a change in address.
tACS (access time for chip select): how long it takes to get stable output after CS is asserted.
tOE (output-enable time): how long it takes for the three-state output buffers to leave the high-impedance state when OE and CS are both asserted.
tOZ (output-disable time): how long it takes for the three-state output buffers to enter the high-impedance state after OE or CS is negated.
tOH (output-hold time): how long the output data remains valid after a change to the address inputs.

5
SRAM Read Timing (typical)
[Timing diagram: ADDR, CS_L, OE_L, and DOUT waveforms with WE_L held HIGH, showing stable address intervals, valid data intervals, and tOE]
6
Dynamic RAM

SRAM cells exhibit high speed but poor density
DRAM: simple transistor/capacitor pairs in high-density form

[Figure: DRAM cell array with word line, storage capacitor C, bit line, and sense amp]
7
Basic DRAM Cell

Planar Cell
Polysilicon-diffusion capacitance, diffused bitlines
Problem: uses a lot of area (< 1 Mb)
You can't just ride the process curve to shrink C (discussed later)

8
Advanced DRAM Cells

Stacked cell (Expand UP)

9
Advanced DRAM Cells

Trench Cell (Expand DOWN)

10
DRAM Operations

Write
Charge bitline HIGH or LOW and set wordline HIGH
Read
Bit line is precharged to a voltage halfway between HIGH and LOW, and then the word line is set HIGH.
Depending on the charge in the cap, the precharged bitline is pulled slightly higher or lower.
Sense amp detects the change
Explains why the cap can't shrink
Need to sufficiently drive the bitline
Increased density => increased parasitic capacitance
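The read operation can be illustrated with a simple charge-sharing model: when the word line goes HIGH, the cell capacitor and the bit-line capacitance settle to a common voltage. The sketch below is only a rough model under that assumption; the function name and all component values are hypothetical and do not come from the lecture.

```python
def bitline_swing(v_cell, v_precharge, c_cell, c_bitline):
    """Voltage change on the bit line after the cell shares its charge with it.

    Charge conservation gives:
        dV = (v_cell - v_precharge) * c_cell / (c_cell + c_bitline)
    """
    return (v_cell - v_precharge) * c_cell / (c_cell + c_bitline)


# Hypothetical values, chosen only for illustration (not from the lecture).
VDD = 3.3            # volts; a stored 1 is assumed to sit at VDD
C_BITLINE = 300e-15  # farads of parasitic bit-line capacitance

for c_cell in (30e-15, 15e-15):
    dv = bitline_swing(VDD, VDD / 2, c_cell, C_BITLINE)
    print(f"C_cell = {c_cell * 1e15:.0f} fF -> swing = {dv * 1e3:.0f} mV")
```

In this model, halving the cell capacitance roughly halves the signal the sense amp has to detect, which is one way to see why the capacitor cannot simply shrink with the process.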

11
DRAM logical organization (4 Mbit)
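[Block diagram: address inputs A0...A10 feed the row decoder and the column decoder; the row decoder drives the word lines of a 2,048 x 2,048 memory array of storage cells; sense amps and I/O sit between the array and the column decoder, with data input D and data output Q]

Square root of bits per RAS/CAS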
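For the 4 Mbit organization, a 22-bit address splits into two 11-bit halves, one per strobe. A minimal sketch of that split follows; the choice of the upper half as the row address and the function name are assumptions made for illustration, not details given on the slide.

```python
ROW_BITS = 11  # 2,048 rows
COL_BITS = 11  # 2,048 columns


def split_address(bit_address):
    """Return (row, column) for a bit address in a 2,048 x 2,048 array."""
    if not 0 <= bit_address < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range for a 4 Mbit array")
    row = bit_address >> COL_BITS                 # upper half, latched with RAS (assumed)
    column = bit_address & ((1 << COL_BITS) - 1)  # lower half, latched with CAS (assumed)
    return row, column


print(split_address(2049))  # second row, second column
```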

12
So, Why do I freaking care?

By its nature, DRAM isn't built for speed
Response times depend on capacitive circuit properties, which get worse as density increases
DRAM process isn't easy to integrate into CMOS process
DRAM is off chip
Connectors, wires, etc. introduce slowness
IRAM efforts are looking at integrating the two
Memory architectures are designed to minimize the impact of DRAM latency
Low level: memory chips
High level: memory designs
You will pay, and then some, for a good memory system.

13
So, Why do I freaking care?

1960-1985: Speed = f(no. operations)
1990:
Pipelined execution and fast clock rate
Out-of-order execution
Superscalar instruction issue
1998: Speed = f(non-cached memory accesses)
What does this mean for compilers? Operating systems? Algorithms? Data structures?

14
4 Key DRAM Timing Parameters

tRAC: minimum time from RAS line falling to valid data output.
Quoted as the speed of a DRAM when you buy it
A typical 4 Mbit DRAM has tRAC = 60 ns
Treated as the speed of the DRAM since it is on the purchase sheet?
tRC: minimum time from the start of one row access to the start of the next.
tRC = 110 ns for a 4 Mbit DRAM with a tRAC of 60 ns
tCAC: minimum time from CAS line falling to valid data output.
15 ns for a 4 Mbit DRAM with a tRAC of 60 ns
tPC: minimum time from the start of one column access to the start of the next.
35 ns for a 4 Mbit DRAM with a tRAC of 60 ns

15
DRAM Read Timing

Every DRAM access begins at the assertion of RAS_L
2 ways to read: early or late relative to CAS

[Timing diagram: one DRAM read cycle time, showing CAS_L, the address bus A (row address, junk, column address), WE_L, OE_L, and the data bus D (high-Z, junk, data out, high-Z), with the read access time and output enable delay marked]
Early read cycle: OE_L asserted before CAS_L
Late read cycle: OE_L asserted after CAS_L
16
DRAM Performance

A 60 ns (tRAC) DRAM can:
Perform a row access only every 110 ns (tRC)
Perform a column access (tCAC) in 15 ns, but the time between column accesses is at least 35 ns (tPC)
In practice, external address delays and turning around buses make it 40 to 50 ns
These times do not include the time to drive the addresses off the microprocessor or the memory controller overhead!
Can it be made faster?

17
Admin

Hand in homework assignment
New assignment is/will be on the class website.

18
Fast Page Mode DRAM

Page: all bits on the same row (spatial locality)
Don't need to wait for the wordline to recharge
Toggle CAS with new column address
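To get a feel for the benefit, the timing parameters quoted earlier for a 60 ns part (tRAC = 60, tRC = 110, tPC = 35 ns) can be plugged into a rough model. This is a sketch under simplifying assumptions: it ignores controller and bus overhead, and it assumes the first read of a page costs tRAC and each further read in the same row costs one tPC. The function names are made up for illustration.

```python
T_RAC = 60   # ns, RAS falling to valid data
T_RC = 110   # ns, start of one row access to start of the next
T_PC = 35    # ns, start of one column access to start of the next


def random_row_reads_ns(n):
    """Time until the n-th data word when every read opens a different row."""
    return (n - 1) * T_RC + T_RAC


def page_mode_reads_ns(n):
    """Time until the n-th data word when all reads hit one open row."""
    return T_RAC + (n - 1) * T_PC


for n in (1, 4, 8):
    print(n, random_row_reads_ns(n), page_mode_reads_ns(n))
```

Under these assumptions, four reads take 390 ns with a full row cycle each and 165 ns within one page.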

19
Extended Data Out (EDO)

Overlap data output with CAS toggle
Later brother: Burst EDO (CAS toggle used to get next address)

20
Synchronous DRAM

Has a clock input.
Data output is in bursts, with each element clocked
Flavors: SDRAM, DDR

21
RAMBUS (RDRAM)

Protocol-based RAM with a narrow (16-bit) bus
High clock rate (400 MHz), but long latency
Pipelined operation
Multiple arrays, with data transferred on both edges of the clock

[Figures: RAMBUS bank; RDRAM memory system]
22
RDRAM Timing
23
DRAM History

DRAMs: capacity +60%/yr, cost -30%/yr
2.5X cells/area, 1.5X die size in 3 years
'98 DRAM fab line costs $2B
DRAM only: density, leakage v. speed
Rely on increasing no. of computers and memory per computer (60% market)
SIMM or DIMM is the replaceable unit => computers use any generation DRAM
Commodity, second-source industry => high volume, low profit, conservative
Little organization innovation in 20 years
Don't want to be chip foundries (bad for RDRAM)
Order of importance: 1) Cost/bit 2) Capacity
First RAMBUS: 10X BW, +30% cost => little impact

24
Main Memory Organizations

Simple:
CPU, Cache, Bus, Memory same width (32 or 64 bits)
Wide:
CPU/Mux 1 word; Mux/Cache, Bus, Memory N words (Alpha: 64 bits and 256 bits; UltraSPARC: 512)
Interleaved:
CPU, Cache, Bus 1 word; Memory N modules (4 modules); example is word interleaved

25
Main Memory Performance

Timing model (word size is 32 bits):
1 to send address, 6 access time, 1 to send data
Cache block is 4 words
Simple M.P. = 4 x (1 + 6 + 1) = 32
Wide M.P. = 1 + 6 + 1 = 8
Interleaved M.P. = 1 + 6 + 4 x 1 = 11
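The three miss penalties follow directly from the timing model, and a small sketch makes it easy to try other block sizes. The function names and parameter defaults are illustrative; the interleaved case assumes there are at least as many banks as words in the block, so all accesses overlap.

```python
def simple_miss_penalty(block_words, send_addr=1, access=6, send_data=1):
    """One-word-wide memory: every word pays address + access + transfer."""
    return block_words * (send_addr + access + send_data)


def wide_miss_penalty(block_words, width_words, send_addr=1, access=6, send_data=1):
    """Memory and bus width_words wide: one full access per bus-width chunk."""
    transfers = -(-block_words // width_words)  # ceiling division
    return transfers * (send_addr + access + send_data)


def interleaved_miss_penalty(block_words, send_addr=1, access=6, send_data=1):
    """Word-interleaved banks: accesses overlap, transfers are serialized.

    Assumes at least block_words banks.
    """
    return send_addr + access + block_words * send_data


print(simple_miss_penalty(4), wide_miss_penalty(4, 4), interleaved_miss_penalty(4))
```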

26
Independent Memory Banks

Memory banks for independent accesses vs. faster sequential accesses
Multiprocessor
I/O
CPU with hit under n misses, non-blocking cache
Superbank: all memory active on one block transfer (or Bank)
Bank: portion within a superbank that is word interleaved (or Subbank)

[Figure: address divided into a superbank number and a superbank offset, with the superbank offset further divided into a bank number and a bank offset]
27
Independent Memory Banks

How many banks?
number of banks ≥ number of clocks to access a word in a bank
For sequential accesses; otherwise will return to the original bank before it has the next word ready
Increasing DRAM capacity => fewer chips => fewer banks

RIMMs can have a HOTSPOT (literally)
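The rule of thumb for the number of banks can be checked with a toy simulation of sequential accesses. The model is an assumption made for illustration: one new access can be issued per clock, words are interleaved across banks by address, and a bank stays busy for a fixed number of clocks after it starts an access.

```python
def sequential_access_cycles(num_banks, bank_busy_clocks, num_words):
    """Clocks needed to issue num_words sequential word accesses."""
    ready_at = [0] * num_banks          # clock at which each bank is free again
    clock = 0
    for word in range(num_words):
        bank = word % num_banks         # word-interleaved banks
        start = max(clock, ready_at[bank])  # stall if the bank is still busy
        ready_at[bank] = start + bank_busy_clocks
        clock = start + 1               # at most one issue per clock
    return clock


print(sequential_access_cycles(8, 6, 16))  # enough banks: no stalls
print(sequential_access_cycles(4, 6, 16))  # too few banks: stalls appear
```

In this model, 16 sequential words take 16 clocks with 8 banks and a 6-clock bank access, but 22 clocks with only 4 banks, because the stream returns to a bank before it is ready.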
28
Avoiding Bank Conflicts

Lots of banks
int x[256][512];
for (j = 0; j < 512; j = j+1)
    for (i = 0; i < 256; i = i+1)
        x[i][j] = 2 * x[i][j];
Even with 128 banks, since 512 is a multiple of 128, conflict on word accesses
SW: loop interchange or declaring array not power of 2 ("array padding")
HW: prime number of banks
bank number = address mod number of banks
address within bank = address / number of words in bank
modulo and divide per memory access with prime no. banks?
address within bank = address mod number of words in bank
bank number? easy if 2^N words per bank
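The conflict in the loop above, and the effect of the software and hardware fixes, can be seen by counting how many distinct banks one column walk touches. This sketch assumes a row-major array of one-word elements starting at word address 0 and uses the mapping bank number = address mod number of banks; the function name is made up for illustration.

```python
ROWS = 256  # rows of the array x in the loop above


def banks_touched(num_banks, row_length, column=0):
    """Distinct banks hit while walking down one column of a row-major array."""
    return len({(i * row_length + column) % num_banks for i in range(ROWS)})


print(banks_touched(128, 512))  # original layout, 128 banks
print(banks_touched(128, 513))  # array padded to 513 words per row
print(banks_touched(127, 512))  # prime number of banks
```

With 128 banks and rows of 512 words, every access in the inner loop lands in the same bank. Padding each row to 513 words, or using 127 banks, spreads the same accesses over all banks.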

29
Fast Bank Number

Chinese Remainder Theorem: as long as two sets of integers ai and bi follow these rules
bi = x mod ai, 0 ≤ bi < ai, 0 ≤ x < a0 × a1 × a2 × ...
and ai and aj are co-prime if i ≠ j, then the integer x has only one solution (unambiguous mapping)
bank number = b0, number of banks = a0 (= 3 in example)
address within bank = b1, number of words in bank = a1 (= 8 in example)
N-word address 0 to N-1, prime no. of banks, words per bank a power of 2

                        Seq. Interleaved     Modulo Interleaved
Bank Number:              0    1    2          0    1    2
Address within Bank:
  0                       0    1    2          0   16    8
  1                       3    4    5          9    1   17
  2                       6    7    8         18   10    2
  3                       9   10   11          3   19   11
  4                      12   13   14         12    4   20
  5                      15   16   17         21   13    5
  6                      18   19   20          6   22   14
  7                      21   22   23         15    7   23
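The modulo-interleaved layout for the 3-bank, 8-words-per-bank example can be regenerated with two modulo operations per address. A minimal sketch, with names chosen for illustration:

```python
NUM_BANKS = 3        # a0 in the example (prime)
WORDS_PER_BANK = 8   # a1 in the example (power of 2)


def modulo_interleave(address):
    """Return (bank number, address within bank) for a word address."""
    return address % NUM_BANKS, address % WORDS_PER_BANK


# Build the inverse map; co-prime moduli guarantee no two addresses collide.
layout = {modulo_interleave(a): a for a in range(NUM_BANKS * WORDS_PER_BANK)}

for offset in range(WORDS_PER_BANK):
    print(offset, [layout[(bank, offset)] for bank in range(NUM_BANKS)])
```

Because 3 and 8 are co-prime, all 24 addresses map to distinct (bank, offset) pairs, and since there are 8 words per bank, the offset needs only the low 3 bits of the address rather than a divide.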
30
DRAMs per PC over Time
[Chart: number of DRAM chips per PC. Columns are DRAM generations ('86: 1 Mb, '89: 4 Mb, '92: 16 Mb, '96: 64 Mb, '99: 256 Mb, '02: 1 Gb); rows are minimum memory sizes (4 MB, 8 MB, 16 MB, 32 MB, 64 MB, 128 MB, 256 MB)]
31
Need for Error Correction!

Motivation:
Failures/time proportional to number of bits!
As DRAM cells shrink, they become more vulnerable
Went through a period in which the failure rate was low enough without error correction that people didn't do correction
DRAM banks too large now
Servers always used corrected memory systems
Basic idea: add redundancy through parity bits
Simple but wasteful version:
Keep three copies of everything, vote to find the right value
200% overhead, so not good!
Common configuration: random error correction
SEC-DED (single error correct, double error detect)
One example: 64 data bits + 8 parity bits (11% overhead)
Papers up on reading list from last term tell you how to do these types of codes
Really want to handle failures of physical components as well
Organization is multiple DRAMs/SIMM, multiple SIMMs
Want to recover from failed DRAM and failed SIMM!
Requires more redundancy to do this
All major vendors are thinking about this in high-end machines
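The two redundancy schemes mentioned above can be compared with a short sketch. The lecture does not say which SEC-DED code is meant; the check-bit count below assumes an extended Hamming code, which is one standard construction, and the function names are made up for illustration.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three stored copies of a word."""
    return (a & b) | (a & c) | (b & c)


def sec_ded_check_bits(data_bits):
    """Check bits for an extended Hamming SEC-DED code (assumed construction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:  # Hamming condition for single-error correction
        r += 1
    return r + 1                         # one extra overall parity bit for double-error detection


check = sec_ded_check_bits(64)
print(check, check / (64 + check))       # check bits and their share of the stored word
print(bin(majority_vote(0b1010, 0b1010, 0b0010)))  # one corrupted copy is outvoted
```

For 64 data bits this gives 8 check bits, which is 8 of 72 stored bits, or about 11%, compared with two extra copies of every bit for triple redundancy.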