Title: Chapter 2: Custom Single-Purpose Processors
Outline
- Introduction
- Combinational Logic
- Sequential Logic
- Custom Single-Purpose Processor Design
- RT-level Custom Single-Purpose Processor Design
- Optimizing Custom Single-Purpose Processors
- Summary
Introduction
- Processor
- A digital circuit that performs a computation task
- Consists of a controller and a datapath
- General-purpose: can perform a variety of computation tasks
- Single-purpose: performs one particular computation task
- Custom single-purpose: performs a non-standard task
- A custom single-purpose processor may be
- Fast, small, and low power
- But has high NRE cost, longer time-to-market, and less flexibility
Combinational Logic
- Transistor
- The basic electrical component in digital systems
- Acts as an on/off switch
- Voltage at the gate controls whether current flows from source to drain
- Don't confuse this gate with a logic gate
- CMOS transistor on silicon
CMOS Transistor Implementations
- Complementary Metal Oxide Semiconductor
- We refer to logic levels
- Typically 0 is 0 V and 1 is 5 V
- Two basic CMOS transistor types
- nMOS conducts if gate = 1
- pMOS conducts if gate = 0
- Hence "complementary"
- Basic gates are built from the two basic CMOS types
- Inverter, NAND, NOR
Basic Logic Gates
- Each gate is represented symbolically, with a Boolean equation, and with a truth table.
- Driver: $F = x$
- Inverter: $F = x'$
- AND: $F = xy$
- NAND: $F = (xy)'$
- OR: $F = x + y$
- NOR: $F = (x + y)'$
- XOR: $F = x \oplus y$
Basic Combinational Logic Design
- Combinational circuit
- A digital circuit whose output is purely a function of its present inputs
- Has no memory of past inputs
- A simple technique to design a combinational circuit from basic logic gates:
- Problem description
- Truth table
- Output equations
- Minimized output equations (by Karnaugh maps)
- Draw the circuit diagram
Combinational Logic Design
A) Problem description: y is 1 if a is equal to 1, or if b and c are both 1. z is 1 if b or c is equal to 1, but not both, or if all three are 1.
RT-Level Combinational Components
- Combinational components are often called register-transfer (RT) level components
- Multiplexor (selector)
- Decoder
- Adder
- Comparator
- Arithmetic-logic unit (ALU)
- Shifter
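Combinational Components
- Multiplexor: $O = I_0$ if $S = 0\ldots00$, $O = I_1$ if $S = 0\ldots01$, ..., $O = I_{m-1}$ if $S = 1\ldots11$
- Decoder: $O_0 = 1$ if $I = 0\ldots00$, $O_1 = 1$ if $I = 0\ldots01$, ..., $O_{n-1} = 1$ if $I = 1\ldots11$; with enable input $e$, all $O$s are 0 if $e = 0$
- Adder: sum = $A + B$ (first n bits), carry = (n+1)th bit of $A + B$; with carry-in input $C_i$, sum = $A + B + C_i$
- Comparator: less = 1 if $A < B$; equal = 1 if $A = B$; greater = 1 if $A > B$
- ALU: $O = A \mathbin{op} B$, where op is determined by $S$; may have status outputs carry, zero, etc.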
Multiplexor (Selector)
- Allows only one of its data inputs to pass through to the output
- m-by-1 multiplexor: m data inputs, 1 data output
- n-bit multiplexor
- Each data input, as well as the output, consists of n lines
- n is independent of the number of select lines
- 4-bit 8×1 multiplexor
- If the select lines choose input 6 ($S = 110$) and $I_6 = 1110$, then the output is 1110
- $O = I_0$ if $S = 0\ldots00$, $O = I_1$ if $S = 0\ldots01$, ..., $O = I_{m-1}$ if $S = 1\ldots11$
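Decoder
- Converts its binary input I into a one-hot output O (exactly one output line is 1)
- $\log_2 n \times n$ decoder: $\log_2 n$ input lines, n output lines
- An extra input called enable
- When enable is 0, all outputs are 0
- $O_0 = 1$ if $I = 0\ldots00$, $O_1 = 1$ if $I = 0\ldots01$, ..., $O_{n-1} = 1$ if $I = 1\ldots11$
- With enable input $e$: all $O$s are 0 if $e = 0$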
Adder
- Adds two n-bit binary inputs A and B, generating an n-bit output sum along with an output carry
- sum = $A + B$ (first n bits); carry = (n+1)th bit of $A + B$
- With carry-in input $C_i$: sum = $A + B + C_i$
Comparator
- Compares two n-bit binary inputs A and B, generating outputs that indicate whether A is less than, equal to, or greater than B
- less = 1 if $A < B$; equal = 1 if $A = B$; greater = 1 if $A > B$
Arithmetic-Logic Unit (ALU)
- Performs a variety of arithmetic and logic functions on its two n-bit binary inputs A and B
- Select lines S choose the current function
- Common functions: addition, subtraction, AND, OR
- $O = A \mathbin{op} B$, where op is determined by S
- May have status outputs: carry, zero, etc.
Sequential Logic
- Sequential circuit
- A digital circuit whose outputs are a function of the present as well as previous input values
- Has memory
- The basic sequential circuit is the flip-flop
- Stores a single bit
- D flip-flop
- SR flip-flop
- JK flip-flop
RT-Level Sequential Components
- Register
- Shift register
- Counter
Sequential Components
- Register: $Q = 0$ if clear = 1; $Q = I$ if load = 1 and clock = 1; $Q = Q_{prev}$ otherwise
- Shift register: $Q$ = LSB; content shifted; $I$ stored in MSB
- Counter: $Q = 0$ if clear = 1; $Q = Q_{prev} + 1$ if count = 1 and clock = 1
Register
- Stores n bits from its n-bit data input I, with those stored bits appearing at its output Q
- Parallel-load register
- All n bits of the register can be stored in parallel
- $Q = 0$ if clear = 1; $Q = I$ if load = 1 and clock = 1; $Q = Q_{prev}$ otherwise
Shift Register
- Stores n bits, but these bits cannot be stored in parallel. Instead, they must be shifted into the register serially, meaning one bit per clock edge.
- 1-bit data input I: I is stored in the MSB, the content is shifted, and the LSB is shifted out and appears at its output Q
- $Q$ = LSB; content shifted; $I$ stored in MSB
Counter
- A register that can also increment
- A common counter feature is both up and down counting (incrementing and decrementing), which requires an additional control input
- $Q = 0$ if clear = 1; $Q = Q_{prev} + 1$ if count = 1 and clock = 1
Sequential Logic Design
- Problem description
- Translate it to a state diagram, called a finite-state machine (FSM)
- Implement the FSM
- Use a register to store the current state, and combinational logic to generate the output values and the next state
- State table
- Assign each state a unique binary value, and create a truth table for the combinational logic
- Minimized output equations (by Karnaugh maps)
- Draw the combinational logic circuit
Sequential Logic Design (Cont.)
A) Problem description: You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.
- Given this implementation model (a state register plus combinational logic), sequential logic design quickly reduces to combinational logic design
Sequential Logic Design (cont.)
Custom Single-Purpose Processor Design
- A basic processor consists of a controller and a datapath
- Datapath
- Stores and manipulates a system's data
- Contains register units, functional units, and connection units such as wires and multiplexors
- Controller
- Configures the datapath by setting its control inputs (e.g., register loads and multiplexor selects)
Custom Single-Purpose Processor Basic Model
(Figure: (a) controller and datapath, connected by datapath control inputs and datapath control outputs; the controller handles external control inputs and outputs, and the datapath handles external data inputs and outputs. (b) A view inside the controller (next-state and control logic, state register) and the datapath (registers, functional units).)
Example: Greatest Common Divisor
- Building a single-purpose processor implementing the GCD program
- First create the algorithm
- Convert the algorithm to a complex state machine
- Divide the functionality into a datapath part and a controller part
- Construct the datapath
- Construct the controller
- Perform optimizations to datapath and controller
Example: Greatest Common Divisor (Cont.)
- First create the algorithm
- Convert the algorithm to a complex state machine
- Known as an FSMD: finite-state machine with datapath
- Can use templates to perform such a conversion
(b) Desired functionality (statement numbers correspond to FSMD states):

    int x, y;                 /* 0 */
    while (1) {               /* 1 */
        while (!go_i);        /* 2 */
        x = x_i;              /* 3 */
        y = y_i;              /* 4 */
        while (x != y) {      /* 5 */
            if (x < y)        /* 6 */
                y = y - x;    /* 7 */
            else
                x = x - y;    /* 8 */
        }
        d_o = x;              /* 9 */
    }

(c) State diagram (figure)
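State Diagram Templates
- Assignment statement: a = b; next statement. Maps to one state that performs a = b, followed by the state for the next statement.
- Loop statement: while (cond) { loop-body-statements } next statement. Maps to a condition state C; on cond, control goes through the loop-body states and a join state J back to C; on !cond, it goes to the next statement.
- Branch statement: if (c1) c1 stmts else if (c2) c2 stmts else other stmts; next statement. Maps to a condition state C with transitions c1 to the c1 stmts, !c1*c2 to the c2 stmts, and !c1*!c2 to the other stmts; all three paths meet at a join state J, which leads to the next statement.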
Creating the Datapath
- Create a register for any declared variable
- Create a functional unit for each arithmetic operation
- Connect the ports, registers, and functional units
- Based on reads and writes
- Use multiplexors for multiple sources
- Create a unique identifier
- For each datapath component control input and output
Creating the Controller's FSM
- Same structure as FSMD
- Replace complex actions/conditions with datapath
configurations
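Splitting into a Controller and Datapath
Controller FSM (external input go_i; datapath status outputs x_neq_y and x_lt_y), listing each state's 4-bit encoding, datapath control settings, and transitions:
- State 1 (0000): transition 1 to state 2 (the !1 transition is never taken)
- State 2 (0001): !go_i to state 2-J; !(!go_i) to state 3
- State 2-J (0010): to state 2
- State 3 (0011): x_sel = 0, x_ld = 1; to state 4
- State 4 (0100): y_sel = 0, y_ld = 1; to state 5
- State 5 (0101): x_neq_y = 1 to state 6; x_neq_y = 0 to state 9
- State 6 (0110): x_lt_y = 1 to state 7; x_lt_y = 0 to state 8
- State 7 (0111): y_sel = 1, y_ld = 1; to state 6-J
- State 8 (1000): x_sel = 1, x_ld = 1; to state 6-J
- State 6-J (1001): to state 5-J
- State 5-J (1010): to state 5
- State 9 (1011): d_ld = 1; to state 1-J
- State 1-J (1100): to state 1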
Controller State Table for the GCD Example
Completing the GCD Custom Single-Purpose Processor Design
- We finished the datapath
- We have a state table for the next-state and control logic
- All that's left is combinational logic design
- This is not an optimized design, but we see the basic steps
RT-level Custom Single-Purpose Processor Design
- We often start with a state machine rather than a program
- The cycle-by-cycle timing of a system is central to its behavior, but programming languages don't typically support cycle-by-cycle description
- Example
- A bus bridge that converts a 4-bit bus to an 8-bit bus
- One device (the sender) sends an 8-bit number to another device (the receiver)
- The receiver can receive all 8 bits at once
- The sender sends 4 bits at a time: first the low-order 4 bits, then the high-order 4 bits
- Start with an FSMD
- Known as register-transfer (RT) level design
RT-level Custom Single-Purpose Processor Design Example
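RT-Level Custom Single-Purpose Processor Design Example (Cont.)
(Figure: the Bridge split into (a) a controller and (b) a datapath. Interface signals: rdy_in, rdy_out, clk, data_in(4), data_out. Datapath registers data_lo, data_hi, and data_out are loaded by the controller signals data_lo_ld, data_hi_ld, and data_out_ld; clk goes to all registers.)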
Optimizing Custom Single-Purpose Processors
- Optimization is the task of making design metric values the best possible
- Optimization opportunities
- original program
- FSMD
- datapath
- FSM
Optimizing the Original Program
- Analyze program attributes and look for areas of possible improvement
- number of computations
- size of variables
- time and space complexity
- operations used
- multiplication and division are very expensive
- The choice of algorithm can have perhaps the
biggest impact on the efficiency of the desired
processor.
Optimizing the Original Program (cont.)
Original program:

    int x, y;                     /* 0 */
    while (1) {                   /* 1 */
        while (!go_i);            /* 2 */
        x = x_i;                  /* 3 */
        y = y_i;                  /* 4 */
        while (x != y) {          /* 5 */
            if (x < y)            /* 6 */
                y = y - x;        /* 7 */
            else
                x = x - y;        /* 8 */
        }
        d_o = x;                  /* 9 */
    }

Optimized program:

    int x, y, r;                  /* 0 */
    while (1) {                   /* 1 */
        while (!go_i);            /* 2 */
        // x must be the larger number
        if (x_i >= y_i) {         /* 3 */
            x = x_i;              /* 4 */
            y = y_i;              /* 5 */
        } else {                  /* 6 */
            x = y_i;              /* 7 */
            y = x_i;              /* 8 */
        }
        while (y != 0) {          /* 9 */
            r = x % y;            /* 10 */
            x = y;                /* 11 */
            y = r;                /* 12 */
        }
        d_o = x;                  /* 13 */
    }

Replace the subtraction operation(s) with a modulo operation in order to speed up the program.
- GCD(42, 8) with subtraction: the loop test is evaluated 9 times, with (x, y) taking the values (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).
- GCD(42, 8) with modulo: the loop test is evaluated 3 times, with (x, y) taking the values (42, 8), (8, 2), (2, 0).
Optimizing the FSMD
- Scheduling
- The task of assigning operations from the original program to states in an FSMD
- The scheduling obtained using the template-based method can be improved
- Areas of possible improvement
- Merge states
- States with constants on transitions can be eliminated, since the transition taken is already known
- States with independent operations can be merged (e.g., x = x_i, y = y_i)
- Separate states
- States that require complex operations (e.g., a*b*c*d) can be broken into smaller states to reduce hardware size
- A designer must be aware of whether output timing may or may not be modified
Optimizing the FSMD (cont.)
(Figure: original FSMD and optimized FSMD)
- Eliminate state 1: its transitions have constant values
- Merge state 2 and state 2-J: there is no loop operation between them
- Merge state 3 and state 4: the assignment operations are independent of one another
- Merge state 5 and state 6: the transitions from state 6 can be done in state 5
- Eliminate states 5-J and 6-J: the transitions from each state can be done from state 7 and state 8, respectively
- Eliminate state 1-J: the transition from state 1-J can be done directly from state 9
Optimizing the Datapath
- Sharing of functional units
- A one-to-one mapping of operations to functional units, as done previously, is not necessary
- e.g., one subtractor for x - y and another subtractor for y - x
- If the same operation occurs in different states, those occurrences can share a single functional unit
- e.g., a single subtractor, with multiplexors choosing whether its inputs are x and y, or instead y and x
- Multi-functional units
- ALUs support a variety of operations, so one ALU can be shared among operations occurring in different states
Optimizing the FSM
- Designing a sequential circuit to implement an FSM also provides some opportunities for optimization
- State encoding
- The task of assigning a unique bit pattern to each state in an FSM
- The size of the state register and of the combinational logic varies for different encodings
- Can be treated as an ordering problem
- State minimization
- The task of merging equivalent states into a single state
- Two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state
Summary
- Designing a custom single-purpose processor requires understanding of various aspects of digital design
- Design of a circuit to implement Boolean functions
- Combinational design
- Build a truth table
- Optimize the output functions
- Draw a circuit
- Design of a circuit to implement a state diagram
- Sequential design
- Draw an implementation model with a state register and a combinational logic block
- Assign a binary encoding to each state
- Draw a state table
- Repeat the combinational design process for this table
Summary (Cont.)
- Design of a single-purpose processor circuit to implement a program
- Schedule the program's statements into a complex state diagram (FSMD)
- Construct a datapath
- Create a new state diagram (FSM) that replaces complex actions and conditions with datapath control operations
- Design a controller circuit for the new state diagram using sequential design
- Much optimization can be performed at each level of design
- CAD tools can be of great assistance