1
CEG3470
Digital Circuits (Spring 2009)
Lecture 6: Memory Decoder Design
Courtesy slides from DIC 2/e and EE141 notes
from Prof. Jan Rabaey
  • Tang Wai Chung, Matthew

2
Why Memory?
'Penryn' 45nm die
Dual-core chips will get up to 6MB of cache,
while quad-core parts will get up to 12MB.
From http://regmedia.co.uk/2007/01/25/intel_penryn_3.jpg
3
Semiconductor Memory Classification
4
Random Access Memory (RAM)
  • Static (SRAM)
  • Data stored as long as supply is applied
  • Larger (6 transistors/cell)
  • Fast
  • Differential (usually)
  • Dynamic (DRAM)
  • Periodic refresh required
  • Smaller (1-3 transistors/cell)
  • Slower
  • Single Ended

5
Random Access Chip Architecture
  • Conceptual linear array
  • Each box holds some data
  • But this does not lead to a nice layout shape
  • Too long and skinny
  • Create a 2-D array
  • Decode row and column address to get data

6
Basic Memory Array
  • Core
  • keep square, within a 2:1 ratio
  • rows are word lines
  • columns are bit lines
  • data in and out on columns
  • Decoders
  • needed to reduce total number of pins: N+M
    address lines for 2^(N+M) bits of storage, e.g. if
    N+M = 20, then 2^20 = 1 Mb.
  • Multiplexing
  • used to select one or more columns for input or
    output of data
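
The row/column split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_address` and the choice of using the upper bits for the row are assumptions, not something the slides specify.

```python
def split_address(addr: int, row_bits: int, col_bits: int) -> tuple[int, int]:
    """Split a flat address into (row, column) for a 2-D memory array.

    Assumption: the upper bits select the row (word line) and the lower
    bits select the column; a real design may assign the bits differently.
    """
    if not 0 <= addr < 1 << (row_bits + col_bits):
        raise ValueError("address out of range")
    row = addr >> col_bits                  # drives the row decoder
    col = addr & ((1 << col_bits) - 1)      # drives the column multiplexer
    return row, col


# 10 row bits + 10 column bits = 20 address lines for 2**20 bits (1 Mb).
ROW_BITS, COL_BITS = 10, 10
total_bits = 1 << (ROW_BITS + COL_BITS)
row, col = split_address(74565, ROW_BITS, COL_BITS)
print(total_bits, row, col)
```

With 20 address pins the chip reaches all 2^20 bits, whereas a one-select-per-bit scheme would need 2^20 separate select lines.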

7
Memory Architecture Decoders
[Figure: N words of M bits each (Word 0 to Word N-1), one select signal per word (S0 to SN-1), each word built from storage cells; input-output is M bits wide]
Intuitive architecture for N x M memory
Too many select signals: N words = N select signals
8
Memory Architecture Decoders
[Figure: a decoder driven by address bits A0 to AK-1 selects one of N words (Word 0 to Word N-1) of M bits each; input-output is M bits wide]
Decoder reduces the number of select signals: K = log2 N
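
As a rough behavioural sketch of what the decoder does (not a circuit model), the K address bits can be expanded into N one-hot select signals as follows. The helper names `address_bits` and `decode` are made up for this illustration.

```python
def address_bits(n_words: int) -> int:
    """Number of address bits K needed to select one of n_words words."""
    return (n_words - 1).bit_length()


def decode(address: int, k: int) -> list[int]:
    """One-hot decode a K-bit address into 2**K select signals."""
    if not 0 <= address < 1 << k:
        raise ValueError("address out of range")
    return [1 if word == address else 0 for word in range(1 << k)]


K = address_bits(256)          # 256 words need K = log2(256) address bits
selects = decode(5, 3)         # 3-bit address 5 raises only S5
print(K, selects)
```

Exactly one select signal is high for any valid address, which is the property the word lines rely on.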
9
Row Decoders
  • Collection of 2^M complex gates organized in
    regular and dense fashion

(N)AND Decoder
NOR Decoder
10
Decoder Design Example
Look at decoder for 256 x 256 memory block
(8 KBytes)
11
Problem Setup
  • Goal: Build the fastest possible decoder with static
    CMOS logic
  • What we know:
  • Basically need 256 AND gates, each one of them
    drives one word line

2^N gates
2N address lines
N = 8
12
Problem Setup (1)
  • Each word line has 256 cells connected to it.
  • Total output load is 256 x Ccell + Cwire
  • Assume that decoder input capacitance is Caddress
    = 4 x Ccell
  • Each address drives 2^8/2 = 128 AND gates
  • A0 drives half of the gates, its complement A0'
    the other half of the gates
  • Neglecting Cwire, the fan-out on each one of the
    16 address wires is
    F x B = (256 x Ccell / (4 x Ccell)) x 128 = 2^6 x 2^7 = 2^13
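
The fan-out on an address wire follows directly from the numbers in this setup. A minimal sketch of the arithmetic, with capacitances normalized to Ccell = 1 (the variable names are chosen here for illustration):

```python
N_ADDRESS_BITS = 8
CELLS_PER_WORD_LINE = 256
C_CELL = 1.0                                  # normalized unit capacitance

c_load = CELLS_PER_WORD_LINE * C_CELL         # word-line load, Cwire neglected
c_address = 4 * C_CELL                        # assumed decoder input capacitance

branching = 2**N_ADDRESS_BITS // 2            # each address wire feeds 128 gates
electrical_effort = c_load / c_address        # 256 / 4 = 64
fan_out = branching * electrical_effort       # 128 * 64 = 8192 = 2**13

print(branching, electrical_effort, fan_out)
```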

13
Decoder Fan-out
  • FB of at least 2^13 means that we will want to use
    more than log4(2^13) = 6.5 stages to implement the
    AND8
  • Need many stages anyways
  • So what is the best way to implement the AND
    gate?
  • Will see next that it's the one with the most
    stages and least complicated gates

14
Example 8-input AND
g = 10/3, 1; G = 10/3; P = 8 + 1
g = 2, 5/3; G = 10/3; P = 4 + 2
g = 4/3, 5/3, 4/3, 1; G = 80/27; P = 2 + 2 + 2 + 1
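
The path logical effort G and parasitic delay P of the three implementations can be checked with exact fractions. In the sketch below the gate names are inferred from the standard logical-effort values (10/3 for an 8-input NAND, 2 for a 4-input NAND, 5/3 for a 2-input NOR, 4/3 for a 2-input NAND, 1 for an inverter); the original figure is not available, so treat the labels as an assumption. The delay formula assumes no branching.

```python
from fractions import Fraction as Fr
from math import prod

# Each stage is (logical effort g, parasitic delay p), values as on the slide.
IMPLEMENTATIONS = {
    "NAND8 + INV": [(Fr(10, 3), 8), (Fr(1), 1)],
    "NAND4 + NOR2": [(Fr(2), 4), (Fr(5, 3), 2)],
    "NAND2 + NOR2 + NAND2 + INV": [
        (Fr(4, 3), 2), (Fr(5, 3), 2), (Fr(4, 3), 2), (Fr(1), 1)
    ],
}


def path_logical_effort(stages):
    """G is the product of the stage logical efforts."""
    return prod(g for g, _ in stages)


def path_parasitic(stages):
    """P is the sum of the stage parasitic delays."""
    return sum(p for _, p in stages)


def min_delay(stages, h):
    """Minimum path delay N * (G * H)**(1/N) + P for electrical effort h."""
    n = len(stages)
    return n * float(path_logical_effort(stages) * h) ** (1 / n) + path_parasitic(stages)


for name, stages in IMPLEMENTATIONS.items():
    print(name, path_logical_effort(stages), path_parasitic(stages))
```

Calling `min_delay(stages, h)` for a small and a large `h` gives a feel for when the deeper structure wins: its lower G and larger stage count matter more as the load grows.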
15
8-input AND
  • Using 2-input NAND gates
  • 8-input gate takes 6 stages
  • Total LE is (4/3)^3 ≈ 2.4
  • So PE is 2.4 x 2^13, giving an optimal N of about 7.1
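
The stage-count estimate can be reproduced numerically. This sketch assumes a target stage effort of 4 (hence the base-4 logarithm) and counts only the three NAND2 levels toward the logical effort, since inverters contribute a factor of 1.

```python
from math import log

LE_NAND2 = 4 / 3
total_le = LE_NAND2**3            # three NAND2 levels in the AND8 tree
fan_out = 2**13                   # F x B from the problem setup

path_effort = total_le * fan_out
n_without_le = log(fan_out, 4)    # estimate ignoring logical effort
n_hat = log(path_effort, 4)       # estimate including logical effort

print(round(total_le, 2), round(n_without_le, 2), round(n_hat, 2))
```

Including the logical effort pushes the estimate from about 6.5 to about 7.1 stages, so the 6-stage NAND2 tree is slightly short of the optimum.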

16
Decoder So Far
  • 256 8-input AND gates
  • Each built out of tree of NAND gates and
    inverters
  • Issue
  • Every address line has to drive 128 gates (and
    wire) right away
  • Can't build gates small enough - forces us to add
    buffers just to drive address inputs

[Figure: 16 address lines driving 256 gates, one per word line (..., wl254, wl255)]
17
Look Inside Each AND8 Gate
[Figure: the AND8 gates for wl254 and wl255, each taking the eight address inputs a0 to a7 in true or complemented form]
18
Predecoders
  • Use a single gate for each of the shared terms
  • e.g., from A0, A0', A1, and A1' (where ' denotes
    the complement), generate four signals A0A1,
    A0'A1, A0A1', A0'A1'
  • In other words, we are decoding smaller groups of
    address bits first
  • And using the predecoded outputs to do the rest
    of the decoding
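
A behavioural sketch of this two-level scheme, assuming an 8-bit address predecoded in pairs of bits (the function names are hypothetical and the address width must be even): each pair produces four one-hot signals, and each final gate ANDs one signal from every pair.

```python
def predecode_pair(a_lo: int, a_hi: int) -> list[int]:
    """Four one-hot predecoded signals for a pair of address bits.

    Index 2*hi + lo is high when (a_hi, a_lo) == (hi, lo).
    """
    return [int(a_lo == lo and a_hi == hi) for hi in (0, 1) for lo in (0, 1)]


def decode_with_predecoders(address: int, n_bits: int = 8) -> list[int]:
    """Decode via 2-bit predecoders followed by final AND gates."""
    bits = [(address >> i) & 1 for i in range(n_bits)]
    groups = [predecode_pair(bits[i], bits[i + 1]) for i in range(0, n_bits, 2)]
    word_lines = []
    for word in range(1 << n_bits):
        selected = 1
        for g, lines in enumerate(groups):
            # The final gate for this word taps one predecoded line per group.
            selected &= lines[(word >> (2 * g)) & 0b11]
        word_lines.append(selected)
    return word_lines


word_lines = decode_with_predecoders(254)
print(word_lines.index(1), sum(word_lines))
```

The result is the same one-hot word-line pattern as a flat AND8 decoder, but each final gate needs only four inputs and the shared pair terms are computed once.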

19
Predecoder and Decoder
[Figure: predecoders on address pairs (a0, a1), (a2, a3), (a4, a5) feeding the final decoder gates]
20
Predecoder/Decoder Layout
  • Predecoder outputs run along height of the memory
    array.
  • Decoder must match height of RAM cell.

SRAM Cell Array
Final Decoders
Address
21
Predecoder Options
  • Two options for predecoding

Option 1
Option 2
22
Predecoder Options (2)
  • Larger predecode usually better
  • More stages before the long wires
  • Decreases their effect on the circuit
  • Fewer long wires switch
  • Lower power
  • Easier to fit 2-input gate into cell pitch
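
To make the trade-off concrete, the sketch below compares two predecode group sizes for an 8-bit address. The figure with the two options is not available, so the choice of 2-bit versus 4-bit groups is an assumption for illustration, and the helper `predecode_stats` is hypothetical.

```python
def predecode_stats(n_bits: int, group_bits: int) -> dict[str, int]:
    """Wire and gate counts for predecoding n_bits in groups of group_bits."""
    if n_bits % group_bits:
        raise ValueError("group size must divide the address width")
    groups = n_bits // group_bits
    lines_per_group = 2**group_bits
    return {
        "long_wires": groups * lines_per_group,       # predecoded lines along the array
        "final_gate_inputs": groups,                  # one line per group per word
        "gates_per_wire": 2**n_bits // lines_per_group,
        # Each group is one-hot, so an address change toggles at most
        # two lines per group (one falls, one rises).
        "max_wires_switching": 2 * groups,
    }


small = predecode_stats(8, 2)   # four 2-bit predecoders, 4-input final gates
large = predecode_stats(8, 4)   # two 4-bit predecoders, 2-input final gates
print(small)
print(large)
```

Under these assumptions the larger predecode uses more long wires in total, but fewer of them switch per access, each one drives fewer final gates, and the final gate shrinks to two inputs.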

23
What We Now Know
  • Given decoder structure, input capacitance, final
    load
  • Can size the entire chain using LE for minimum
    delay
  • Is this the best we can do in terms of power
    too?
  • Not necessarily: probably want to reduce sizes
    (especially on final decoder inputs)
  • Is there anything else we can do to improve
    energy even further?
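
Sizing a chain with logical effort for minimum delay can be sketched as follows. The example chain is a simplification made up for this illustration: seven stages (an inverter followed by three NAND2/inverter pairs), with all of the 128-way branching lumped at the first stage, an input capacitance of 4 Ccell and a load of 256 Ccell. A real decoder distributes the branching across the predecoder and final stages.

```python
from math import prod


def size_chain(g, b, c_in, c_load):
    """Return (stage effort, per-stage input capacitances) for minimum delay.

    g[i] is the logical effort of stage i and b[i] its branching factor
    (how many copies of the next stage it drives).
    """
    n = len(g)
    path_effort = prod(g) * prod(b) * (c_load / c_in)
    f = path_effort ** (1 / n)          # equal effort per stage
    caps = [0.0] * n
    c_next = c_load
    for i in reversed(range(n)):
        # Stage effort: f = g[i] * (b[i] * c_next) / caps[i]
        caps[i] = g[i] * b[i] * c_next / f
        c_next = caps[i]
    return f, caps


# Hypothetical decoder path in units of Ccell.
g = [1, 4 / 3, 1, 4 / 3, 1, 4 / 3, 1]
b = [128, 1, 1, 1, 1, 1, 1]
stage_effort, input_caps = size_chain(g, b, c_in=4.0, c_load=256.0)
print(round(stage_effort, 2), [round(c, 2) for c in input_caps])
```

The first computed capacitance comes back as the specified input capacitance, which is a useful sanity check on the sizing. Shrinking the later stages below these values trades some delay for lower switched capacitance.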