Title: Converting Behavioral Verilog to Transistor Counts
E-Voting Machine - Design Presentation
- Group M1
- Bohyun Jessica Kim
- Jonathan Chiang
- Chi Ho Yoon
- Donald Cober
Mon. Sept 29
System Hardware Component Diagram, Gate-level Data Path, Updated Transistor Estimates, Floorplan
Secure Electronic Voting Terminal
Status Update
- Behavioral Verilog Entire System
- Gate-level Hardware Block Diagram
- Updated Transistor Count Calculations
- Initial Floorplan
- Structural Verilog Entire System
- Refined Floorplan
[Figure: system hardware block diagram. Blocks shown: Data Bus, Card Reader, Fingerprint Scanner, User Input, Display, Machine Init FSM, User ID FSM, Selection FSM, Confirmation FSM, Encryption Key SRAM, User ID SRAM, Choice SRAM, Write-in SRAM, Message ROM, Key Register, COMMS Register, Selection Counter, TX_Check, Shift Register In, Shift Register Out. Datapath elements shown: 8-bit MUXes (inputs 0/1), 8-bit Add/Sub, 8-bit Full Adders, XORs and 8-bit REGs, with a "constant init" input and transistor annotations T = 88 and T = 128.]
SUPER MUX!
- SuperMux
  - Our data flow consists of shuffling 8 bits of data from a source to a destination
    - These sources and destinations are SRAMs, User Input, Comms, etc.
    - Many are bidirectional
  - Since only one piece of data will be sent at a time, it makes sense to use a bus configuration for data movement rather than a set of giant muxes
  - We can gate which srcs/dests (drop points) are connected to the bus with one level of pass logic
  - This way the data will only ever go through two layers of pass logic:
    - Get onto the bus
    - Get off of the bus
  - We will still call this the SuperMux for legacy purposes
  - Layout will be fun
[Figure: 8-bit data bus data[7:0] with two drop points.]
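The bus idea above can be sketched in software. This is only a rough behavioral model, not the team's design: the `DataBus` class, the drop-point names and the example bytes are all hypothetical, chosen to show one source gated onto the bus and one destination gated off it per transfer.

```python
class DataBus:
    """Behavioral sketch of an 8-bit shared bus with gated drop points."""

    def __init__(self):
        self.sources = {}       # name -> callable returning a byte
        self.destinations = {}  # name -> callable accepting a byte

    def add_source(self, name, read):
        self.sources[name] = read

    def add_destination(self, name, write):
        self.destinations[name] = write

    def transfer(self, src, dest):
        """Enable exactly one source gate and one destination gate."""
        value = self.sources[src]() & 0xFF  # first layer of pass logic: onto the bus
        self.destinations[dest](value)      # second layer of pass logic: off the bus
        return value


# Hypothetical drop points, for illustration only.
key_sram = [0] * 4
card_bytes = iter([0x12, 0x34, 0x56, 0x78])

bus = DataBus()
bus.add_source("CARD", lambda: next(card_bytes))
bus.add_destination("KEY_SRAM_0", lambda v: key_sram.__setitem__(0, v))
bus.transfer("CARD", "KEY_SRAM_0")
```

Because `transfer` names a single source and a single destination, the model cannot put two drivers on the bus at once, which is the property the pass-logic gating is meant to guarantee.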
Tiny Encryption Algorithm Project Specs
Original Implementation
- 64-bit blocks: two 32-bit inputs
- 128-bit key: four 32-bit keys (K0, K1, K2, K3)
- Feistel structure: symmetric structure used in block ciphers
- Magic constant 0x9E3779B9 (Delta) = 2^32 / 1.6180339887 (golden ratio)
- 64 Feistel rounds = 32 cycles
E-Voting Machine Implementation
- 16-bit blocks: two 8-bit inputs
- 32-bit key: four 8-bit keys
- 32 Feistel rounds = 16 cycles
Decision
- Scale the golden ratio 1.6 up by a factor of 10 to 16, scale 2^16 by 10 to get 655360, and divide 655360 / 16 to get Delta. This avoids using floating point in the key scheduler.
- New Delta = 0xA000. Truncate the least significant digit to 0xA00 so that the sum fits in 16 bits when decrypting, since 0xA00 x 8 cycles = 0x5000.
Hardware
- 4- and 5-bit shifters
- 16-bit multipliers
- 16-bit adder/subtractor
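The Delta arithmetic and the TEA round structure described above can be checked with a short software reference model. This is a sketch under assumptions: the function names, the `width` and `cycles` parameters and the example key and plaintext are illustrative and do not come from the slides.

```python
PHI = 1.6180339887  # golden ratio as written on the slide

delta_32 = int(2**32 / PHI)         # original TEA constant
delta_scaled = (2**16 * 10) // 16   # integer-only derivation from the slide
decrypt_start = 0xA00 * 8           # truncated Delta times 8 cycles


def tea_encrypt(v0, v1, key, delta, cycles, width):
    """One TEA cycle = two Feistel rounds (update v0, then v1)."""
    mask = (1 << width) - 1
    total = 0
    for _ in range(cycles):
        total = (total + delta) & mask
        v0 = (v0 + (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & mask
        v1 = (v1 + (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & mask
    return v0, v1


def tea_decrypt(v0, v1, key, delta, cycles, width):
    """Undo tea_encrypt by running the rounds in reverse order."""
    mask = (1 << width) - 1
    total = (delta * cycles) & mask
    for _ in range(cycles):
        v1 = (v1 - (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & mask
        v0 = (v0 - (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & mask
        total = (total - delta) & mask
    return v0, v1


# Arbitrary example key and plaintext (not from the slides).
key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
ciphertext = tea_encrypt(0xDEADBEEF, 0x0BADF00D, key, delta_32, 32, 32)
plaintext = tea_decrypt(*ciphertext, key, delta_32, 32, 32)
```

Passing a smaller `width` and `cycles` lets the same model be used to experiment with the scaled-down 8-bit halves; the Delta to use at that width is a design decision the model leaves to the caller.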
COMMS BLOCK Hardware Implementation 1
States  inA[7:0]  inB[7:0]  sel_out  sel_shift[1:0]  sel_sum  v_out[7:0]
(1)     delta     sum[7:0]  0        00              0        v_out0 = sum[7:0]
(2)     v1        sum[7:0]  0        01              1        v_out1 = (CD)
(3)     v1 << 4   k0        1        10              0        v_out2 = (AB) ^ (CD)
(4)     v1 >> 5   k1        1        11              0        v_out3 = (AB) ^ (CD) ^ (EF)
(5)     v0        out3      0        1               1        v_outx = V0 + ((AB) ^ (CD) ^ (EF))
States (6)-(9) are the same as above except that they use k2 and k3, with v1 and v0 swapped.
The implementation goes through 9 states (clk cycles) each iteration to update the output function v_outx.
Reuses:
- (1x) 8-bit full adder/sub (ripple carry): 16 x 8 = 128
- (2x) 2:1 8-bit MUX for output pass-through: 4 x 8 x 2 = 64
- (8x) 2-input XOR: 6 x 8 = 48
- (1x) 8-bit REG: 11 x 8 = 88
- (1x) 4:1 8-bit MUX for shifting selection: 12 x 8 = 96
In addition, the logic will iterate 8 times and be controlled by an FSM that uses:
- (2x) 3:1 8-bit MUX for state input selection: 8 x 8 x 2 = 128
- (2x) 1-bit counter adder for updating the cycle: 16 x 2 = 32
- (2x) 1-bit REG for storing the updated cycle: 11 x 2 = 22
Total: 606
Advantages
- Saves transistors and area for the Comms Block
Disadvantages
- Very heavy pass logic from the MUX layers and XORs
- High clk frequency required, since the same components are reused to calculate v_outx in stages. This translates to higher power consumption, since we are trying to do more with less hardware.
Tradeoff
- Every 8-bit MUX uses 4 x 8 = 32 transistors, compared to 16 x 8 = 128 transistors for an 8-bit full adder. However, MUXes carry heavy pass logic, so this is an area vs. power tradeoff.
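The Implementation 1 total can be re-derived from the per-bit costs quoted on the slide. The sketch below is just that arithmetic; the dictionary keys and the `total_transistors` helper are names chosen for illustration.

```python
# Transistors per bit, as used in the slide's products.
PER_BIT = {"full_adder": 16, "mux2": 4, "mux3": 8, "mux4": 12, "xor": 6, "reg": 11}

# (description, instance count, bits per instance, cell type)
IMPLEMENTATION_1 = [
    ("8-bit full adder/sub", 1, 8, "full_adder"),
    ("2:1 8-bit MUX, output pass-through", 2, 8, "mux2"),
    ("2-input XOR", 8, 1, "xor"),
    ("8-bit REG", 1, 8, "reg"),
    ("4:1 8-bit MUX, shift selection", 1, 8, "mux4"),
    ("3:1 8-bit MUX, state input selection", 2, 8, "mux3"),
    ("1-bit counter adder", 2, 1, "full_adder"),
    ("1-bit REG", 2, 1, "reg"),
]


def total_transistors(parts):
    """Sum count * bits * per-bit cost over every listed component."""
    return sum(count * bits * PER_BIT[cell] for _, count, bits, cell in parts)


impl1_total = total_transistors(IMPLEMENTATION_1)
print(impl1_total)
```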
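[Figure: Implementation 1 datapath. Two 3:1 8-bit MUXes select inA[7:0] and inB[7:0]; a logical shifter is selected by sel_shift[1:0] through a 4:1 8-bit MUX (annotated T = 64); 2:1 8-bit MUXes (T = 32) are controlled by sel_sum and sel_out, with an 8'h00 input; an 8-bit Full Adder/Sub (T = 128), XORs (T = 48), a 1-bit Full Adder, a 1-bit REG (clk) and an 8-bit REG (clk, T = 88) produce v_outx.]
Logical shifter code (sel_shift[1:0] -> inA[7:0]):
00  delta
01  v1
10  v1 << 4
11  v1 >> 5
sum += delta
v0 += ((v1 << 4) + k0) ^ (v1 + sum) ^ ((v1 >> 5) + k1)
v1 += ((v0 << 4) + k2) ^ (v0 + sum) ^ ((v0 >> 5) + k3)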
COMMS BLOCK Hardware Implementation 2
[Figure: sum, delta, 2:1 8-bit MUX (sel_out) and 8-bit Add/Sub.]
sel_out  output
0        pass sum, V1
1        pass new sum, V0
Implementation 2 does concurrent calculations for all 3 parts of the function and completes a full iteration of calculations in 2 clk cycles.
Uses:
- (1x) 8-bit full adder/sub (ripple carry): 16 x 8 = 128
- (3x) 8-bit full adder (ripple carry): 128 x 3 = 384
- (4x) 2:1 8-bit MUX for output pass-through: 4 x 8 x 4 = 128
- (16x) 2-input XOR: 6 x 16 = 96
- (2x) 8-bit REG: 11 x 8 x 2 = 176
- (1x) 1-bit counter adder for updating the cycle: 16
- (1x) 1-bit REG for storing the updated cycle: 11
Total: 939
In addition, the logic will not need a complex FSM; it just needs to do 8 iterations.
Advantages
- Low pass logic, speed performance, low power; MUX logic transistor count essentially halved
Disadvantages
- Higher transistor count and larger area
Tradeoff
- Larger area, but the reduced MUX pass logic and the absence of a complex FSM simplify the design, increase speed and minimize power.
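One rough way to put the two options side by side is to multiply transistor count by clock cycles. The transistor-cycle product below is an illustrative figure of merit assumed here, not a metric used on the slides; the inputs (606 and 939 transistors, 9 and 2 cycles per iteration, 8 iterations) are the values quoted above.

```python
ITERATIONS = 8  # both implementation slides state 8 iterations

implementations = {
    "Implementation 1": {"transistors": 606, "cycles_per_iteration": 9},
    "Implementation 2": {"transistors": 939, "cycles_per_iteration": 2},
}


def summarize(impl):
    """Return (total clock cycles, transistors * total clock cycles)."""
    total_cycles = impl["cycles_per_iteration"] * ITERATIONS
    return total_cycles, impl["transistors"] * total_cycles


summary = {name: summarize(impl) for name, impl in implementations.items()}
for name, (cycles, product) in summary.items():
    print(f"{name}: {cycles} cycles, {product} transistor-cycles")
```

Under this assumed metric, the extra 333 transistors of Implementation 2 buy a 4.5x reduction in cycle count, which is consistent with the speed and power argument made in the tradeoff.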
[Figure: Implementation 2 datapath. Inputs K0, K1, V0, V1 and the shifted operands {V1[3:0], 4'b0} and {5'b0, V1[7:5]}; three 2:1 8-bit MUXes (T = 32 each) controlled by sel_out; 8-bit Full Adders (T = 128 each); XORs; a 1-bit Full Adder and 1-bit REG (clk); 8-bit REGs (clk, T = 88); output v_outx.]
sum += delta
v0 += ((v1 << 4) + k0) ^ (v1 + sum) ^ ((v1 >> 5) + k1)
v1 += ((v0 << 4) + k2) ^ (v0 + sum) ^ ((v0 >> 5) + k3)
E-Voting TEA Gate Level Hardware
Full Adder
Common full adder: Mirror Adder
- Uses 28 transistors (including 4 transistors in inverters)
- NMOS and PMOS networks are completely symmetrical
Logic:
S = a ⊕ b ⊕ Carryin
Carryout = (a ⊕ b)·Carryin + (a·b)
E-Voting TEA Gate Level Hardware
Full Adder
What we decided to use in this project: 1-bit full adder
- Uses pass-transistor logic for computing XNOR
- Sum bit equals A ⊕ B ⊕ Cin, where A and B are the 2 inputs and Cin is the carry-in input
- Muxing at the bottom selects the Cout bit to carry out
- Will use this adder 8 times to compute all 8 bits of data
- Uses inverters to strengthen the signal at the end of each XNOR
- Uses only 16 transistors, yet gives a strong signal
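The logic of this XNOR-plus-mux adder can be checked at the bit level. The sketch below models only the Boolean behavior, not the transistor circuit; the function names are illustrative, and the carry mux is written as "Cout = A when A equals B, otherwise Cin", which is one standard way such a mux is wired and is assumed here.

```python
def xnor(a, b):
    """1 when the two bits are equal."""
    return 1 - (a ^ b)


def full_adder(a, b, cin):
    same = xnor(a, b)          # first XNOR stage: A == B ?
    s = xnor(same, cin)        # second XNOR stage gives A ^ B ^ Cin
    cout = a if same else cin  # mux: A == B passes A, otherwise passes Cin
    return s, cout


def ripple_add8(x, y, cin=0):
    """Chain eight 1-bit adders, least significant bit first."""
    total = 0
    carry = cin
    for i in range(8):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


result, carry_out = ripple_add8(200, 100)  # 300 = 0x12C -> sum 0x2C, carry 1
```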
E-Voting TEA Gate Level Hardware
MUX, XOR
XOR
- To avoid using two T-gates
- Uses 6 transistors (XNOR + inv)
T-gate Mux
- 4 transistors
- Very tiny, hence difficult to lay out
E-Voting TEA Gate Level Hardware
REG
TSPC Register
- True single-phase clock flip-flop
- Advantages: single clock distribution, small area for clock lines, high speed and no clock skew
- We will use 8T instead of 9T
SRAM Gate Level Hardware
SRAM Cell
- 6T SRAM cell
- Smaller transistor size
- Lower energy dissipation
- Efficient layout
SRAM Gate Level Hardware
Address Decoder
- Combination of inverters and NAND gates
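A decoder built from inverters and NAND gates can be modeled as follows. This is a generic sketch, not the team's gate-level schematic: it assumes one NAND per word line fed by true or inverted address bits, followed by an inverter to make the line active-high.

```python
def inv(bit):
    return 1 - bit


def nand(*bits):
    return 0 if all(bits) else 1


def decode(address, n_bits):
    """Return the 2**n_bits word lines; exactly one is 1."""
    true_bits = [(address >> i) & 1 for i in range(n_bits)]
    inverted_bits = [inv(b) for b in true_bits]
    word_lines = []
    for row in range(2 ** n_bits):
        # Each row's NAND sees the true bit where the row index has a 1,
        # and the inverted bit where it has a 0.
        inputs = [true_bits[i] if (row >> i) & 1 else inverted_bits[i]
                  for i in range(n_bits)]
        word_lines.append(inv(nand(*inputs)))
    return word_lines


lines = decode(2, 2)  # 2-bit address, as for the 4-byte Key SRAM
```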
SRAM Gate Level Hardware
SRAM
- Input/output tri-state buffers?
- Need for a sense amplifier?
[Figure: Machine Initialization FSM connections. Data Bus, Machine Initialization FSM, Encryption Key SRAM (4 byte); signal: 1-bit Activate next.]
[Figure: User ID FSM connections. User Input, Data Bus, User ID FSM, Card Reader, Fingerprint Scanner, Display (8-bit Data); signals: 1-bit Activate this, 1-bit Reactivate this, 1-bit Activate next.]
[Figure: Selection FSM connections. Data Bus, Selection FSM, User Input, Selection Counter (3-bit Count, 8-bit Data), Display (8-bit Data); signals: 1-bit Activate this, 1-bit Reactivate this, 1-bit Activate next.]
[Figure: Confirmation FSM connections. TX_Check (1-bit TX_good), User Input, Data Bus, Confirmation FSM, User ID SRAM (8 byte, 1-bit Reset), Choice SRAM (4 byte, 1-bit Reset), Write-in SRAM (64 byte), COMMS, Message ROM, Display (8-bit Data); signals: 1-bit Activate this, 1-bit Reactivate Selection, 1-bit Reactivate User ID.]
SUPER MUX!
The statement that we only transfer one byte of data at a time is technically false. For example:
- When the Message ROM is sending a message to the COMMS
- The COMMS are using data from the Encryption Key SRAM to encode the message
[Figure: Encryption Key SRAM (4 byte), COMMS and Message ROM on the Data Bus.]
We can circumvent this by hardwiring the Encryption Key SRAM data to the COMMS Key input in addition to attaching it to the bus. This only works because the Key SRAM will never be active on the data bus while the COMMS are accessing it.
SUPER MUX!
Other hardwired connections
Choice SRAM / TX Check
The transmission check confirms that the data sent to the main computer and held in its current session matches the choices stored in our SRAM. During the Confirmation FSM, the SRAM data is sent to the main computer and the main computer echoes it back. The echo is streamed into the TX Check (as well as the display), and the TX Check compares it, as it is streaming, to the Choice SRAM.
Write-In SRAM / User Input
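The streaming comparison performed by the TX Check can be sketched as a byte-by-byte compare against the Choice SRAM. The function name, the `tx_good` flag handling and the example choices are assumptions for illustration; in particular, treating a short or long echo as a failure is a choice made here, not something the slide specifies.

```python
def tx_check(echo_stream, choice_sram):
    """Compare each echoed byte to the Choice SRAM as it arrives."""
    count = 0
    tx_good = True
    for echoed in echo_stream:
        # Any mismatch, or any byte beyond the stored choices, clears tx_good.
        if count >= len(choice_sram) or echoed != choice_sram[count]:
            tx_good = False
        count += 1
    # Assumed here: the echo must also cover every stored byte.
    return tx_good and count == len(choice_sram)


choice_sram = [0x01, 0x03, 0x02, 0x04]  # hypothetical 4-byte Choice SRAM contents
ok = tx_check(iter([0x01, 0x03, 0x02, 0x04]), choice_sram)
corrupted = tx_check(iter([0x01, 0x03, 0x07, 0x04]), choice_sram)
```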
Converting Behavioral Verilog to Transistor Counts
module machine_init_fsm(clk, cardDetectSig, commDetectSig, actNext,
                        mux_src, mux_dest, message, address);
  // Port declarations, state encodings (s1..s6) and the *_SRC, *_DEST and
  // KEY_REQUEST constants are omitted on the slide.

  // Initialize
  initial begin
    actNext    = 0;
    state      = 0;
    next_state = 1'b0;
  end

  // Main FSM
  always @* begin
    if (!actNext) begin
      case (state)
        s1: begin
          mux_src  = 0;
          mux_dest = 0;
          // Wait for card data
          if (cardDetectSig) begin
            // Send card data to the Key SRAM
            next_address = 0;
            next_state   = s2;
          end
        end
        s2: begin
          mux_src  = CARD_SRC;
          mux_dest = KEY_SRAM_DEST;
          // Read in 4 bytes from the card reader
          if (address == 3) begin
            next_state = s3;
          end
          next_address = address + 1;
        end
        s3: begin
          // Send a key request to the comms
          message    = KEY_REQUEST;
          mux_src    = MESSAGE_SRC;
          mux_dest   = COMMS_DEST;
          next_state = s4;
        end
        s4: begin
          mux_src      = 0;
          mux_dest     = 0;
          next_address = 0;
          // Wait for data to arrive
          if (commDetectSig == 0) begin
            next_state = s4;
          end else begin
            next_state = s5;
          end
        end
        s5: begin
          mux_src  = COMMS_SRC;
          mux_dest = KEY_SRAM_DEST;
          // Read in 4 bytes from the comms
          if (address == 3) begin
            next_state = s6;
          end
          next_address = address + 1;
        end
        s6: begin
          // Proceed: release the bus and activate the next block
          mux_src      = 9'bzzzzzzzzz;
          mux_dest     = 8'bzzzzzzzz;
          message      = 3'bzzz;
          address      = 2'bzz;
          next_address = 2'bzz;
          actNext      = 1;
        end
      endcase
    end else begin
      mux_src      = 9'bzzzzzzzzz;
      mux_dest     = 8'bzzzzzzzz;
      message      = 3'bzzz;
      address      = 2'bzz;
      next_address = 2'bzz;
    end
  end

  // State Register
  always @(posedge clk) begin
    state   <= next_state;
    address <= next_address;
  end
endmodule
State  src      dest   message
1      0        0      0
2      CARD     KEY    0
3      MESSAGE  COMMS  KEY_REQUEST
4      0        0      0
5      COMMS    KEY    0
6      z        NEXT   z
- Machine Init FSM
  - Create registers
    - 6 states => 3 D flip-flops
    - 2-bit SRAM address
  - State change logic
    - Most changes are sequentially incrementing
    - Flip-flops are configured as counters
  - Further logic
    - Remaining logic consists of output signals generated mostly by state
    - Random logic can be approximated based on the number and configuration of outputs
5 distinct 1-bit outputs
Each 1-bit output is derived from a 3-bit input (state)
Approx. 2 1/2 two-input gates for each
10 transistors for each distinct output
50 transistors total for random logic
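The estimation rule above can be written as a small function. This is a sketch of the rule as read from the slides, under stated assumptions: 11 transistors per register bit (taken from the 8-bit REG figure of 88) and 10 random-logic transistors per distinct output; the function and constant names are illustrative.

```python
FF_TRANSISTORS = 11    # per register bit, from 8-bit REG = 88 transistors
LOGIC_PER_OUTPUT = 10  # random-logic transistors per distinct output


def fsm_estimate(states, address_bits, distinct_outputs):
    """Return (register bits, random-logic transistors, total transistors)."""
    state_bits = (states - 1).bit_length()  # flip-flops needed to encode the states
    registers = state_bits + address_bits
    random_logic = LOGIC_PER_OUTPUT * distinct_outputs
    return registers, random_logic, registers * FF_TRANSISTORS + random_logic


# Machine Init FSM: 6 states, 2-bit SRAM address, 5 distinct outputs.
machine_init = fsm_estimate(states=6, address_bits=2, distinct_outputs=5)
print(machine_init)
```

The same function can be applied to the other FSM blocks to see how the register and random-logic terms contribute to each estimate.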
Converting Behavioral Verilog to Transistor Counts (cont)
Block              States  Address  Registers  Distinct Outputs  Random  Transistors
Machine Init FSM   6       2 bits   5          5                 50      105
User ID FSM        12      3 bits   7          13                130     207
Selection FSM      7       2 bits   5          9                 90      145
Confirmation FSM   9       6 bits   10         8                 80      170
User Input         NA      6 bits   14         20                90      244
Selection Counter  NA      NA       3          3                 0       33
TX Compare         NA      2 bits   3          1                 0       33

Block         Points on Bus  T-gates  Transistors
Data Bus MUX  13             104      208

Block        Messages    Inputs  Gates / Bit         Transistors
Message ROM  8 (1 byte)  8       7 (35 transistors)  280

Total: 1425
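The Data Bus MUX and Message ROM rows and the slide total can be reproduced with a few lines of arithmetic. The sketch assumes 2 transistors per T-gate and one T-gate per bit per bus point, which is what the 13, 104 and 208 figures imply; variable names are illustrative, and the per-block figures are copied from the table rather than re-derived.

```python
BUS_WIDTH = 8
TRANSISTORS_PER_TGATE = 2  # assumed: one NMOS plus one PMOS pass device

bus_points = 13
bus_tgates = bus_points * BUS_WIDTH
bus_transistors = bus_tgates * TRANSISTORS_PER_TGATE

rom_transistors = 35 * BUS_WIDTH  # 35 transistors per output bit, 8 bits

# Per-block totals as listed on the slide.
control_blocks = {
    "Machine Init FSM": 105,
    "User ID FSM": 207,
    "Selection FSM": 145,
    "Confirmation FSM": 170,
    "User Input": 244,
    "Selection Counter": 33,
    "TX Compare": 33,
}

slide_total = sum(control_blocks.values()) + bus_transistors + rom_transistors
print(bus_tgates, bus_transistors, rom_transistors, slide_total)
```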
Converting Behavioral Verilog to Transistor Counts (cont)
Block          Bits  Address transistors   Transistors
Key SRAM       32    8(2^2) + 2(2) = 36    228
User ID SRAM   64    8(2^3) + 2(3) = 70    454
Choice SRAM    32    8(2^2) + 2(2) = 36    228
Write-In SRAM  512   8(2^6) + 2(6) = 524   3596

Block             Bits                                         Transistors
COMMs             (see COMMS BLOCK Hardware Implementation 2)  939
Shift In          8                                            88
Shift Out         8                                            88
Input/Output MUX  8                                            32
Register          16                                           176

Total: 7254
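The SRAM rows follow a simple pattern that can be checked in code: 6 transistors per bit for the 6T cell, plus address transistors of 8 x 2^n + 2n for n address bits. The sketch below assumes that reading of the table; the function name is illustrative, and 1425 is the total carried over from the previous table.

```python
def sram_transistors(bits, address_bits):
    """Return (address transistors, total transistors) for a 6T-cell SRAM."""
    address = 8 * 2 ** address_bits + 2 * address_bits
    return address, 6 * bits + address


srams = {
    "Key SRAM": sram_transistors(32, 2),
    "User ID SRAM": sram_transistors(64, 3),
    "Choice SRAM": sram_transistors(32, 2),
    "Write-In SRAM": sram_transistors(512, 6),
}

comms_and_io = 939 + 88 + 88 + 32 + 176  # COMMs, Shift In, Shift Out, I/O MUX, Register
previous_slide_total = 1425

grand_total = (sum(total for _, total in srams.values())
               + comms_and_io + previous_slide_total)
print(srams, grand_total)
```

The Write-In SRAM accounts for roughly half of the 7254 transistors, so any change to its 64-byte size dominates the overall estimate.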
[Figure: floorplan. Blocks placed: Encryption Key SRAM, Machine Init FSM, User ID SRAM, User ID FSM, COMMS, Choice SRAM, Selection FSM, Comm Register, Shift In, Confirmation FSM, MUX, User Input, Shift Out, Write-In SRAM.]
Questions?
Thank you!