
Provided by: tangwa


Title: CEG3470

1
CEG3470
Digital Circuits (Spring 2009)
Lecture 6: Memory Decoder Design
Slides courtesy of DIC 2/e and EE141 notes
from Prof. Jan Rabaey

Tang Wai Chung, Matthew

2
Why Memory?
'Penryn' 45nm die
Dual-core chips will get up to 6MB of cache,
while quad-core parts will get up to 12MB.
From http://regmedia.co.uk/2007/01/25/intel_penryn_3.jpg
3
Semiconductor Memory Classification
4
Random Access Memory (RAM)

Static (SRAM)
Data stored as long as supply is applied
Larger (6 transistors/cell)
Fast
Differential (usually)
Dynamic (DRAM)
Periodic refresh required
Smaller (1-3 transistors/cell)
Slower
Single-ended

5
Random Access Chip Architecture

Conceptual linear array
Each box holds some data
But this does not lead to a nice layout shape
Too long and skinny
Create a 2-D array
Decode row and column address to get data
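A minimal sketch of folding a linear address into a 2-D array. It assumes, hypothetically, that the high-order bits select the row (word line) and the low-order bits select the column; `split_address` is an illustrative helper, and real memories may map address bits differently.

```python
def split_address(addr: int, col_bits: int) -> tuple[int, int]:
    """Fold a linear address into (row, column) for a 2-D array.

    Assumption: the upper bits pick the row (word line) and the lower
    col_bits bits pick the column; real memories may map bits differently.
    """
    row = addr >> col_bits               # upper bits -> row index
    col = addr & ((1 << col_bits) - 1)   # lower bits -> column index
    return row, col


# A 1024-entry linear array folded into 32 rows x 32 columns (5 column bits)
print(split_address(37, 5))     # (1, 5) because 37 = 1 * 32 + 5
print(split_address(1023, 5))   # (31, 31)
```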

6
Basic Memory Array

Core
keep square, within a 2:1 ratio
rows are word lines
columns are bit lines
data in and out on columns
Decoders
needed to reduce the total number of pins: $N+M$
address lines for $2^{N+M}$ bits of storage, e.g. if
$N+M = 20$, then $2^{20}$ = 1 Mb.
Multiplexing
used to select one or more columns for input or
output of data
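The pin-count arithmetic and the 2:1 rule can be checked with a small sketch. It assumes N counts row-address bits and M counts column-address bits, and that the extra bit of an odd total goes to the rows; `split_address_bits` is a hypothetical helper, not something defined on the slide.

```python
def split_address_bits(total_bits: int) -> tuple[int, int]:
    """Split N + M address bits into (N row bits, M column bits).

    Assumption: rows get the extra bit when total_bits is odd, so the
    array is either square (1:1) or 2:1 tall.
    """
    n = (total_bits + 1) // 2
    m = total_bits - n
    return n, m


for total in (16, 20, 21):
    n, m = split_address_bits(total)
    rows, cols = 2 ** n, 2 ** m
    print(f"{total} address pins -> {rows * cols} bits as {rows} x {cols} ({rows // cols}:1)")
```

For 20 pins this gives a 1024 x 1024 core (1 Mb), matching the example above; an odd pin count can only be split 2:1.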

7
Memory Architecture Decoders
[Figure: N words (Word 0 to Word N-1), each M bits wide and built from storage cells, with one select signal per word (S0 to SN-1) and an M-bit input-output port]
Intuitive architecture for an N x M memory: too many select signals (N words require N select signals)
8
Memory Architecture Decoders
[Figure: the same N-word x M-bit array, with a decoder driven by address bits A0 to AK-1 generating the word selects; M-bit input-output]
Decoder reduces the number of select signals to $K = \log_2 N$
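As a rough illustration, a binary-to-one-hot decoder turns K address bits into N = 2^K select lines. The sketch below is a behavioral model only (no gates or timing); `decode` is an illustrative name.

```python
import math


def decode(address: int, k: int) -> list[int]:
    """Binary decoder: k address bits -> 2**k one-hot select lines S0..S(2**k - 1)."""
    n = 1 << k
    if not 0 <= address < n:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(n)]


N = 8                     # number of words (a power of two)
K = int(math.log2(N))     # K = log2 N address bits instead of N select wires
print(K, decode(5, K))    # 3 [0, 0, 0, 0, 0, 1, 0, 0]
```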
9
Row Decoders

Collection of $2^M$ complex gates organized in
a regular and dense fashion

(N)AND Decoder
NOR Decoder
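Logically, both decoder styles compute the same word-line function: an AND of matching address literals equals, by De Morgan, a NOR of the complementary literals. This sketch checks only that logic identity for a small hypothetical 3-bit decoder; it does not model the circuit differences (such as the NAND form producing an active-low output that needs an inverter).

```python
from itertools import product

K = 3  # address bits; small enough to check every address exhaustively


def and_wordline(addr: tuple[int, ...], row: int) -> bool:
    """(N)AND style: word line = AND of literals that match the row index."""
    return all(addr[i] == ((row >> i) & 1) for i in range(len(addr)))


def nor_wordline(addr: tuple[int, ...], row: int) -> bool:
    """NOR style: word line = NOR of the complementary (mismatch) literals."""
    return not any(addr[i] != ((row >> i) & 1) for i in range(len(addr)))


for addr in product((0, 1), repeat=K):          # addr[i] holds address bit A_i
    selected = sum(bit << i for i, bit in enumerate(addr))
    rows = [r for r in range(2 ** K) if nor_wordline(addr, r)]
    assert rows == [selected]                    # exactly one word line fires
    assert all(and_wordline(addr, r) == nor_wordline(addr, r) for r in range(2 ** K))
print("AND and NOR forms agree for all", 2 ** K, "addresses")
```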
10
Decoder Design Example
Look at a decoder for a 256 x 256 memory block
(8 KBytes)
11
Problem Setup

Goal: build the fastest possible decoder with static
CMOS logic
What we know:
We need 256 AND gates, each of which
drives one word line

$2^N$ gates
$2N$ address lines
$N = 8$
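The sizes in this setup follow from a few lines of arithmetic. The sketch assumes the $2N$ address lines are each address bit in true and complement form, which matches the 16 address wires used in the fan-out calculation.

```python
N = 8                              # address bits
rows, cols = 256, 256              # 256 x 256 block from the example
bits = rows * cols                 # 65536 bits
kbytes = bits // 8 // 1024         # 8 KB
and_gates = 2 ** N                 # one AND gate per word line: 256
address_lines = 2 * N              # assumed: each bit in true and complement form -> 16
print(bits, kbytes, and_gates, address_lines)   # 65536 8 256 16
```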
12
Problem Setup (1)

Each word line has 256 cells connected to it.
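Total output load is $256 \times C_{cell} + C_{wire}$
Assume that the decoder input capacitance is $C_{address} = 4 \times C_{cell}$
Each address line drives $2^8/2 = 128$ AND gates
$A_0$ drives half of the gates, $\overline{A_0}$ the other half
Neglecting $C_{wire}$, the fan-out on each one of the
16 address wires is
$FB = \frac{2^8}{2} \times \frac{256\,C_{cell}}{4\,C_{cell}} = 128 \times 64 = 2^{13}$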
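The fan-out figure can be reproduced by treating it as branching effort B times electrical effort F. In this sketch capacitances are in units of $C_{cell}$ and $C_{wire}$ is neglected, as on the slide.

```python
import math

# Capacitances in units of C_cell; C_wire is neglected, as on the slide.
N = 8
cells_per_wordline = 256
C_load = cells_per_wordline        # 256 C_cell per word line
C_address = 4                      # decoder input capacitance, 4 C_cell

B = 2 ** N // 2                    # branching: each address wire drives 128 gates
F = C_load // C_address            # electrical effort per gate: 256 / 4 = 64
FB = F * B
print(B, F, FB, math.log2(FB))     # 128 64 8192 13.0
```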

13
Decoder Fan-out

An FB of at least $2^{13}$ means that we will want to use
more than $\log_4(2^{13}) = 6.5$ stages to implement the
AND8
Need many stages anyway
So what is the best way to implement the AND
gate?
We will see next that it's the one with the most
stages and the least complicated gates
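To see why so many stages are needed, the sketch below evaluates the delay of an n-stage chain driving the path fan-out $2^{13}$. It assumes an inverter chain (logical effort 1 per stage) with a parasitic delay of 1 per stage; `delay` and `P_INV` are illustrative names, and real decoder gates will shift the numbers.

```python
import math

FB = 2 ** 13                 # path fan-out from the decoder setup
P_INV = 1.0                  # assumed parasitic delay per stage (inverter chain)


def delay(n: int) -> float:
    """Normalized delay of an n-stage inverter chain with path effort FB."""
    return n * FB ** (1 / n) + n * P_INV


n_opt = math.log(FB, 4)      # stage effort ~4 rule of thumb
print(round(n_opt, 2))       # 6.5
for n in (2, 4, 6, 7, 8):
    print(n, round(delay(n), 1))
best = min(range(1, 13), key=delay)
print("best integer stage count:", best)   # 7
```

Under these assumptions a 2-stage implementation is several times slower than a 6 to 8 stage one, and 7 stages is the best integer choice.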

14
Example 8-input AND
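NAND8 + INV: g = 10/3, 1; G = 10/3; P = 8 + 1 = 9
NAND4 + NOR2: g = 2, 5/3; G = 10/3; P = 4 + 2 = 6
NAND2 + NOR2 + NAND2 + INV: g = 4/3, 5/3, 4/3, 1; G = 80/27; P = 2 + 2 + 2 + 1 = 7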
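The G and P values for the three 8-input AND options can be recomputed from the standard logical-effort formulas, g = (n + 2)/3 for an n-input NAND and g = (2n + 1)/3 for an n-input NOR. The sketch also assumes a parasitic delay of n (in units of the inverter parasitic) for an n-input NAND or NOR and 1 for an inverter.

```python
from fractions import Fraction
from math import prod


def nand_g(n: int) -> Fraction:
    return Fraction(n + 2, 3)       # logical effort of an n-input NAND


def nor_g(n: int) -> Fraction:
    return Fraction(2 * n + 1, 3)   # logical effort of an n-input NOR


INV = (Fraction(1), 1)              # inverter: g = 1, p = 1 (p in units of p_inv)

# (g, p) per stage; assumes p = n for an n-input NAND or NOR
options = {
    "NAND8-INV": [(nand_g(8), 8), INV],
    "NAND4-NOR2": [(nand_g(4), 4), (nor_g(2), 2)],
    "NAND2-NOR2-NAND2-INV": [(nand_g(2), 2), (nor_g(2), 2), (nand_g(2), 2), INV],
}

summary = {}
for name, stages in options.items():
    G = prod(g for g, _ in stages)
    P = sum(p for _, p in stages)
    summary[name] = (G, P)
    print(f"{name}: G = {G}, P = {P}")
```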
15
8-input AND

Using 2-input NAND gates
8-input gate takes 6 stages
Total LE is $(4/3)^3 \approx 2.4$
So PE is $2.4 \times 2^{13}$, giving an optimal $N \approx 7.1$
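The 2-input NAND numbers can be checked directly: three NAND2 levels (8 to 4 to 2 to 1 signals), each followed by an inverter, give 6 stages and a logical effort of $(4/3)^3$. This sketch uses the exact value 64/27 rather than the rounded 2.4; the optimal stage count still rounds to 7.1.

```python
import math

stages = ["NAND2", "INV"] * 3            # AND8 as a tree of NAND2 + INV: 8 -> 4 -> 2 -> 1
G = (4 / 3) ** stages.count("NAND2")     # inverters contribute g = 1
FB = 2 ** 13
PE = G * FB                              # path effort
n_opt = math.log(PE, 4)                  # stage count for a stage effort of ~4
print(len(stages), round(G, 2), round(PE), round(n_opt, 1))   # 6 2.37 19418 7.1
```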

16
Decoder So Far

256 8-input AND gates
Each built out of tree of NAND gates and
inverters
Issue
Every address line has to drive 128 gates (and
wire) right away
Can't build gates small enough, which forces us to add
buffers just to drive the address inputs

[Figure: 16 address lines driving 256 AND gates, ending with word lines wl254 and wl255]
17
Look Inside Each AND8 Gate
[Figure: gate trees for word lines wl254 and wl255, each with inputs a0 to a7]
18
Predecoders

Use a single gate for each of the shared terms
e.g., from $A_0$, $\overline{A_0}$, $A_1$, and $\overline{A_1}$, generate four
signals $A_0A_1$, $A_0\overline{A_1}$, $\overline{A_0}A_1$, $\overline{A_0}\,\overline{A_1}$
In other words, we are decoding smaller groups of
address bits first
And using the predecoded outputs to do the rest
of the decoding
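A behavioral sketch of predecoding: each small group of address bits is decoded to one-hot first, and each final word-line gate ANDs one predecoded line from every group. The group size of 2 (and the requirement that it divide the address width) is an assumption for illustration; `predecode` and `decode_with_predecode` are hypothetical names.

```python
def predecode(bits: list[int]) -> list[int]:
    """One-hot decode of a small group of address bits (bits[0] is the LSB)."""
    value = sum(b << i for i, b in enumerate(bits))
    return [int(j == value) for j in range(1 << len(bits))]


def decode_with_predecode(addr: int, n_bits: int = 8, group: int = 2) -> list[int]:
    """Full decode built from predecoders; assumes n_bits is a multiple of group."""
    mask = (1 << group) - 1
    bits = [(addr >> i) & 1 for i in range(n_bits)]
    # Predecode stage: one one-hot bundle per group of address bits
    bundles = [predecode(bits[i:i + group]) for i in range(0, n_bits, group)]
    word_lines = []
    for row in range(1 << n_bits):
        # Final decoder gate: AND of one predecoded line taken from each bundle
        terms = [bundle[(row >> (k * group)) & mask] for k, bundle in enumerate(bundles)]
        word_lines.append(int(all(terms)))
    return word_lines


print(predecode([1, 0]))                 # A0 = 1, A1 = 0 -> [0, 1, 0, 0]
wl = decode_with_predecode(0xFE)         # address 254
print(wl.index(1), sum(wl))              # 254 1
```

With 2-bit groups, an 8-bit address needs four predecoders and each final gate has 4 inputs instead of 8.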

19
Predecoder and Decoder
[Figure: predecoders on address pairs a0 a1, a2 a3, and a4 a5 feeding the final decoder]
20
Predecoder/Decoder Layout

Predecoder outputs run along the height of the memory
array.
The decoder must match the height of the RAM cell.

[Figure: layout showing the Address input, the Final Decoders, and the SRAM Cell Array]
21
Predecoder Options

Two options for predecoding

Option 1
Option 2
22
Predecoder Options (2)

Larger predecoders are usually better
More stages before the long wires
Decreases their effect on the circuit
Fewer long wires switch
Lower power
Easier to fit a 2-input gate into the cell pitch
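The trade-off can be made concrete by counting wires and gate inputs for an 8-bit row address. This sketch assumes the address is split into equal groups that are each predecoded to one-hot, with group size 1 standing for plain true/complement address lines; `predecode_stats` is a hypothetical helper.

```python
def predecode_stats(n_bits: int, group: int) -> dict[str, int]:
    """Wire and gate counts when n_bits are predecoded in equal groups of `group` bits."""
    groups = n_bits // group                       # assumes group divides n_bits
    return {
        "long_wires": groups * 2 ** group,         # predecoded lines running along the array
        "final_gate_inputs": groups,               # fan-in of each final (word-line) gate
        "gates_per_wire": 2 ** (n_bits - group),   # final gates loading each long wire
        "max_switching_wires": 2 * groups,         # per changed group: one line falls, one rises
    }


for group in (1, 2, 4):   # group = 1 is plain true/complement address lines
    print(group, predecode_stats(8, group))
```

Under these assumptions, 4-bit groups leave 2-input final gates, 16 gates per long wire instead of 128, and at most 4 long wires switching per access, at the cost of 32 long wires instead of 16.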

23
What We Now Know

Given the decoder structure, input capacitance, and final
load
Can size the entire chain using LE for minimum
delay
Is this the best we can do in terms of power
too?
Not necessarily: we probably want to reduce sizes
(especially on the final decoder inputs)
Is there anything else we can do to improve
energy even further?
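Sizing a chain for minimum delay with logical effort means giving every stage the same effort f = (GBF)^(1/N) and working backwards from the load. The sketch below applies this to the NAND2/INV word-line chain, under the hypothetical assumption that each address wire's $4\,C_{cell}$ is shared evenly by its 128 gates ($C_{cell}/32$ per gate) and the load is $256\,C_{cell}$; `size_chain` is an illustrative helper, and capacitances are in units of $C_{cell}$.

```python
import math


def size_chain(g: list[float], b: list[float], c_in: float, c_load: float):
    """Size a gate chain for minimum delay with logical effort.

    g[i]: logical effort of stage i; b[i]: branching effort at stage i's output;
    c_in: input capacitance of the first stage; c_load: final load.
    Returns the common stage effort f and each stage's input capacitance.
    """
    n = len(g)
    path_effort = math.prod(g) * math.prod(b) * c_load / c_in   # H = G * B * F
    f = path_effort ** (1 / n)                                   # equal effort per stage
    caps = [0.0] * n
    c_next = c_load
    for i in reversed(range(n)):             # work backwards from the load
        caps[i] = g[i] * b[i] * c_next / f   # from f = g * b * C_next / C_in
        c_next = caps[i]
    return f, caps


# Hypothetical use: the NAND2/INV chain for one word line, with each address
# wire's 4 C_cell shared by 128 gates (1/32 C_cell per gate) and a 256 C_cell load.
g = [4 / 3, 1, 4 / 3, 1, 4 / 3, 1]
f, caps = size_chain(g, [1] * 6, c_in=4 / 128, c_load=256)
print(round(f, 2), [round(c, 3) for c in caps])
```

The backward pass recovers the specified input capacitance at the first stage, which is a quick consistency check on the sizing. A first-stage input of only $C_{cell}/32$ is one way to see why the address inputs end up needing buffers.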