Title: Verilog, Pipelined Processors CPSC 321
1 Verilog, Pipelined Processors, CPSC 321
2 Today's Menu
- Verilog
- Pipelined Processor
3 Recall n-bit Ripple Carry Adder
module ripple(cin, X, Y, S, cout);
    parameter n = 4;
    input cin;
    input [n-1:0] X, Y;
    output [n-1:0] S;
    output cout;
    reg [n-1:0] S;
    reg [n:0] C;
    reg cout;
    integer k;

    always @(X or Y or cin) begin
        C[0] = cin;
        for (k = 0; k <= n-1; k = k+1) begin
            S[k] = X[k] ^ Y[k] ^ C[k];
            C[k+1] = (X[k] & Y[k]) | (C[k] & X[k]) | (C[k] & Y[k]);
        end
        cout = C[n];
    end
endmodule
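One way to check the adder is to compare it against the built-in `+` operator for every input combination. The sketch below repeats the module from the slide so that it is self-contained; the testbench `ripple_tb` and its variable names are hypothetical and not part of the lecture material.

```verilog
module ripple(cin, X, Y, S, cout);
  parameter n = 4;
  input cin;
  input [n-1:0] X, Y;
  output [n-1:0] S;
  output cout;
  reg [n-1:0] S;
  reg [n:0] C;
  reg cout;
  integer k;

  always @(X or Y or cin) begin
    C[0] = cin;
    for (k = 0; k <= n-1; k = k + 1) begin
      S[k] = X[k] ^ Y[k] ^ C[k];
      C[k+1] = (X[k] & Y[k]) | (C[k] & X[k]) | (C[k] & Y[k]);
    end
    cout = C[n];
  end
endmodule

// Hypothetical testbench: tries all 2^9 combinations of cin, X, Y.
module ripple_tb;
  reg cin;
  reg [3:0] X, Y;
  wire [3:0] S;
  wire cout;
  integer i, errors;

  ripple dut(cin, X, Y, S, cout);

  initial begin
    errors = 0;
    #1;                          // make sure the adder block is already waiting
    for (i = 0; i < 512; i = i + 1) begin
      {cin, X, Y} = i;           // low 9 bits of i: 1 + 4 + 4
      #1;                        // let the always block run
      if ({cout, S} !== X + Y + cin) begin
        errors = errors + 1;
        $display("mismatch: X=%d Y=%d cin=%b", X, Y, cin);
      end
    end
    $display("%0d mismatches", errors);
    $finish;
  end
endmodule
```

If the adder is correct, the run should report zero mismatches. The comparison is done on five bits (`{cout, S}`), so the carry out is checked as well.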
4 Recall: = versus <=
initial begin
    a = 1; b = 2; c = 3; x = 4;
    #5 a = b + c;    // wait 5 units, grab b, c,
                     // compute a = b+c = 2+3
    d = a;           // d = 5 = b+c at time t=5.
    x <= #6 b + c;   // grab b+c now at t=5, don't stop,
                     // assign x=5 at t=11.
    b <= #2 a;       // grab a at t=5
                     // (end of last blocking statement).
                     // Deliver b=5 at t=7.
                     // previous x is unaffected by change of b.
end
5 Recall: = versus <=
initial begin
    a = 1; b = 2; c = 3; x = 4;
    #5 a = b + c;
    d = a;           // time t=5
    x <= #6 b + c;   // assign x=5 at time t=11
    b <= #2 a;       // assign b=5 at time t=7
    y <= #1 b + c;   // grab b+c at t=5, don't stop,
                     // assign y=5 at t=6.
    #3 z = b + c;    // grab b+c at t=8 (#5 + #3); b is already 5 by then,
                     // so assign z=8 at t=8.
    w <= x;          // assign w=4 at t=8.
                     // (starting at last blocking assignment)
end
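The timing of these assignments is easiest to verify by running them. The following is a minimal sketch that wraps the statements in a module and adds a `$monitor` call, which prints a line whenever one of the listed variables changes. The module name `assign_demo` and the choice of `integer` variables are assumptions made for illustration.

```verilog
module assign_demo;   // hypothetical module name
  integer a, b, c, d, w, x, y, z;

  initial begin
    a = 1; b = 2; c = 3; x = 4;
    #5 a = b + c;      // blocking: the block waits 5 units, then assigns
    d = a;
    x <= #6 b + c;     // non-blocking: value grabbed now, delivered later
    b <= #2 a;
    y <= #1 b + c;
    #3 z = b + c;      // blocking again: uses whatever b holds 3 units later
    w <= x;
  end

  // Print every change together with its time stamp.
  initial
    $monitor("t=%0d a=%0d b=%0d d=%0d w=%0d x=%0d y=%0d z=%0d",
             $time, a, b, d, w, x, y, z);
endmodule
```

Variables that have not been assigned yet are printed as `x` (unknown). The trace makes it easy to see which value of `b` the statement `#3 z = b + c` actually reads.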
6 Confused?
a = b + c;    // blocking assignment
a <= b + c;   // non-blocking assignment
#2            // delay by 2 time units
Blocking assignment with delay? Probably wrong!
Non-blocking assignment without delay? Bad idea!
7 Address Register
`define REG_DELAY 1
module add_reg(clk, reset, addr, reg_addr);
    input clk, reset;
    input [15:0] addr;
    output [15:0] reg_addr;
    reg [15:0] reg_addr;

    always @(posedge clk)
        if (reset)
            reg_addr <= #(`REG_DELAY) 16'h00;
        else
            reg_addr <= #(`REG_DELAY) addr;   // the input port is named addr
endmodule
8 Concurrency Example
module concurrency_example;
    initial begin
        #1 $display("Block 1 stmt 1");
        $display("Block 1 stmt 2");
        #2 $display("Block 1 stmt 3");
    end
    initial begin
        $display("Block 2 stmt 1");
        #2 $display("Block 2 stmt 2");
        #2 $display("Block 2 stmt 3");
    end
endmodule

Output:
Block 2 stmt 1
Block 1 stmt 1
Block 1 stmt 2
Block 2 stmt 2
Block 1 stmt 3
Block 2 stmt 3
9 Concurrency: fork and join
module concurrency_example;
    initial fork
        #1 $display("Block 1 stmt 1");
        $display("Block 1 stmt 2");
        #2 $display("Block 1 stmt 3");
    join
    initial fork
        $display("Block 2 stmt 1");
        #2 $display("Block 2 stmt 2");
        #2 $display("Block 2 stmt 3");
    join
endmodule

Output (the order of lines printed in the same time step may vary between simulators):
Block 1 stmt 2
Block 2 stmt 1
Block 1 stmt 1
Block 1 stmt 3
Block 2 stmt 2
Block 2 stmt 3
10 Begin-End vs. Fork-Join
- In begin-end blocks, the statements are sequential and the delays are additive
- In fork-join blocks, the statements are concurrent and the delays are independent
- The two constructs can be used to compound statements. Nesting begin-end statements is not useful; neither is nesting fork-join statements.
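Mixing the two constructs is where nesting becomes useful. The sketch below, with a hypothetical module name, places a begin-end block inside a fork-join: the two branches of the fork start together, while the statements inside the begin-end branch run one after another.

```verilog
module nesting_example;   // hypothetical module name
  initial fork
    begin                                         // branch 1: delays add up
      #1 $display("t=%0d branch 1, first", $time);
      #2 $display("t=%0d branch 1, second", $time);
    end
    #2 $display("t=%0d branch 2", $time);         // branch 2: independent delay
  join
endmodule
```

Branch 1 prints at t=1 and t=3 (1 + 2), while branch 2 prints at t=2, measured from the start of the fork.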
11 Displaying Results
- a = 4'b0011
- $display("The value of a is %b", a);
- The value of a is 0011
- $display("The value of a is %0b", a);
- The value of a is 11
- If you use $display to print a value that is changing during this time step, then you might get the new or the old value; use $strobe to get the new value
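The difference between the two tasks shows up as soon as a non-blocking assignment is pending in the same time step. This is a minimal sketch with a hypothetical module name:

```verilog
module strobe_example;   // hypothetical module name
  reg [3:0] a;

  initial begin
    a = 4'b0011;
    a <= 4'b1100;                      // update is applied at the end of the time step
    $display("display: a = %b", a);    // executes immediately, sees 0011
    $strobe ("strobe:  a = %b", a);    // executes after all updates, sees 1100
  end
endmodule
```

Here `$display` prints the old value 0011 because the non-blocking update has not been applied yet, whereas `$strobe` waits until the end of the time step and prints 1100.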
12 Displaying Results
- Standard displaying functions
  - $display, $write, $strobe, $monitor
- Writing to a file instead of stdout
  - $fdisplay, $fwrite, $fstrobe, $fmonitor
- Format specifiers
  - %b, %0b, %d, %0d, %h, %0h, %c, %s
13 Display Example
module f1;
    integer f;
    initial begin
        f = $fopen("myFile");
        $fdisplay(f, "Hello, bla bla");
    end
endmodule
14 Finite State Automata
15 Moore Machines
[Figure: Moore machine block diagram with input, next state logic, present state register, and output logic]
- The output of a Moore machine depends only on the current state. Output logic and next state logic are sometimes merged.
16 Mealy Machines
[Figure: Mealy machine block diagram with input, next state logic, present state register, and output logic]
- The output of a Mealy machine depends on the current state and the input.
17 State Machine Modeling
- reg = state register, nsl = next state logic, ol = output logic
- Model reg separate, nsl separate, ol separate
  - 3 always blocks of combinatorial logic; easy to maintain.
- Combine reg and nsl, keep ol separate
  - The state register and the next state logic are strongly correlated; it is usually more efficient to combine these two.
- Combine nsl and ol, keep register separate
  - Messy! Don't do that!
- Combine everything into one always block
  - Can only be used for a Moore state machine. Why?
- Combine register and output logic into one always block
  - Can only be used for a Mealy state machine.
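As an illustration of the second style (state register and next state logic combined, output logic separate), here is a minimal sketch of a two-state Moore machine. The module `pulse_fsm`, its ports, and the state names are hypothetical and chosen only for this example.

```verilog
`define S_IDLE 1'b0
`define S_BUSY 1'b1
`define REG_DELAY 1

// Hypothetical Moore machine: busy is high for one clock period after go.
module pulse_fsm(clk, reset, go, busy);
  input clk, reset, go;
  output busy;
  reg busy;
  reg state;

  // State register and next state logic combined in one clocked block.
  always @(posedge clk)
    if (reset)
      state <= #(`REG_DELAY) `S_IDLE;
    else
      case (state)
        `S_IDLE: if (go) state <= #(`REG_DELAY) `S_BUSY;
        `S_BUSY: state <= #(`REG_DELAY) `S_IDLE;
      endcase

  // Output logic kept separate; it reads the state only (Moore).
  always @(state)
    busy = (state == `S_BUSY);
endmodule
```

With this split there is no separate `next_state` variable: the clocked block decides the next state and stores it in one step.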
18 Example: Automatic Food Cooker
19 Moore Machine Example
- Automatic food cooker
  - Has a supply of food
  - Can load food into the heater when requested
  - Cooker unloads the food when cooking is done
20 Automated Cooker
- Outputs from the machine
  - load: signal that sends food into the cooker
  - heat: signal that turns on the heater
  - unload: signal that removes food from cooker
  - beep: signal that alerts that food is done
21 Automated Cooker
- Inputs
  - clock
  - start: start the load, cook, unload cycle
  - temp_ok: temperature sensor detecting when preheating is done
  - done: signal from timer when done
  - quiet: Should cooker beep?
22 Cooker
module cooker(
    clock, start, temp_ok, done, quiet, load, heat, unload, beep
);
    input clock, start, temp_ok, done, quiet;
    output load, heat, unload, beep;
    reg load, heat, unload, beep;
    reg [2:0] state, next_state;
23 Defining States
`define IDLE 3'b000
`define PREHEAT 3'b001
`define LOAD 3'b010
`define COOK 3'b011
`define EMPTY 3'b100
You can refer to these states as `IDLE, `PREHEAT, etc. Symbolic names are a good idea!
24 State Register Block
`define REG_DELAY 1
always @(posedge clock)
    state <= #(`REG_DELAY) next_state;
25 Next State Logic
always @(state or start or temp_ok or done)
// whenever there is a change in input
begin
    case (state)
        `IDLE:    if (start) next_state = `PREHEAT;
        `PREHEAT: if (temp_ok) next_state = `LOAD;
        `LOAD:    next_state = `COOK;
        `COOK:    if (done) next_state = `EMPTY;
        `EMPTY:   next_state = `IDLE;
        default:  next_state = `IDLE;
    endcase
end
26 Output Logic
always @(state)
begin
    if (state == `LOAD) load = 1; else load = 0;
    if (state == `EMPTY) unload = 1; else unload = 0;
    if (state == `EMPTY && quiet == 0) beep = 1;
    else beep = 0;
    if (state == `PREHEAT ||
        state == `LOAD ||
        state == `COOK) heat = 1;
    else heat = 0;
end
27
module cooker(clock,...)

`define IDLE 3'b000
`define PREHEAT 3'b001
`define LOAD 3'b010
`define COOK 3'b011
`define EMPTY 3'b100

`define REG_DELAY 1
always @(posedge clock)
    state <= #(`REG_DELAY) next_state;

always @(state or start or temp_ok or done)
begin
    case (state)
        `IDLE:    if (start) next_state = `PREHEAT;
        `PREHEAT: if (temp_ok) next_state = `LOAD;
        `LOAD:    next_state = `COOK;
        `COOK:    if (done) next_state = `EMPTY;
        `EMPTY:   next_state = `IDLE;
        default:  next_state = `IDLE;
    endcase
end

always @(state)
begin
    if (state == `LOAD) load = 1; else load = 0;
    if (state == `EMPTY) unload = 1; else unload = 0;
    if (state == `EMPTY && quiet == 0) beep = 1;
    else beep = 0;
    if (state == `PREHEAT ||
        state == `LOAD ||
        state == `COOK) heat = 1;
    else heat = 0;
end
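To watch the cooker step through its states, the pieces can be assembled into one file and driven by a small testbench. The following is a sketch under stated assumptions: the full port list is taken from the earlier Cooker slide, an `endmodule` is added, and the testbench `cooker_tb` with its stimulus timing is hypothetical.

```verilog
`define IDLE    3'b000
`define PREHEAT 3'b001
`define LOAD    3'b010
`define COOK    3'b011
`define EMPTY   3'b100
`define REG_DELAY 1

module cooker(clock, start, temp_ok, done, quiet, load, heat, unload, beep);
  input clock, start, temp_ok, done, quiet;
  output load, heat, unload, beep;
  reg load, heat, unload, beep;
  reg [2:0] state, next_state;

  always @(posedge clock)                       // state register
    state <= #(`REG_DELAY) next_state;

  always @(state or start or temp_ok or done)   // next state logic
    case (state)                                // next_state keeps its value
      `IDLE:    if (start)   next_state = `PREHEAT;   // when no branch assigns it
      `PREHEAT: if (temp_ok) next_state = `LOAD;
      `LOAD:    next_state = `COOK;
      `COOK:    if (done)    next_state = `EMPTY;
      `EMPTY:   next_state = `IDLE;
      default:  next_state = `IDLE;
    endcase

  always @(state) begin                         // output logic
    if (state == `LOAD) load = 1; else load = 0;
    if (state == `EMPTY) unload = 1; else unload = 0;
    if (state == `EMPTY && quiet == 0) beep = 1; else beep = 0;
    if (state == `PREHEAT || state == `LOAD || state == `COOK) heat = 1;
    else heat = 0;
  end
endmodule

// Hypothetical testbench: one full load, cook, unload cycle.
module cooker_tb;
  reg clock, start, temp_ok, done, quiet;
  wire load, heat, unload, beep;

  cooker dut(clock, start, temp_ok, done, quiet, load, heat, unload, beep);

  always #5 clock = ~clock;                     // clock period of 10 time units

  initial begin
    $monitor("t=%0d state=%b load=%b heat=%b unload=%b beep=%b",
             $time, dut.state, load, heat, unload, beep);
    clock = 0;
    #1  start = 0; temp_ok = 0; done = 0; quiet = 0;
    #20 start = 1;                              // request a cooking cycle
    #10 start = 0;
    #20 temp_ok = 1;                            // preheating finished
    #40 done = 1;                               // timer expired
    #30 $finish;
  end
endmodule
```

The inputs are first driven one time unit after the start so that the next state logic is certain to see the change; the unknown initial state then falls through to the `default` branch and the machine settles in IDLE after the first clock edge.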
28 Pipelined Processor
29 Basic Idea
30 Time Required for Load Word
- Assume that a lw instruction needs
  - 2 ns for instruction fetch
  - 1 ns for register read
  - 2 ns for ALU operation
  - 2 ns for data access
  - 1 ns for register write
- Total time = 8 ns
31 Non-Pipelined vs. Pipelined Execution
32 Question
- What is the average speed-up for pipelined versus non-pipelined execution in case of load word instructions?
Average speed-up is 4-fold!
33 Reason
- Assuming ideal conditions:
$$\text{time between instructions}_{\text{pipelined}} = \frac{\text{time between instructions}_{\text{nonpipelined}}}{\text{number of pipe stages}}$$
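The 4-fold figure can be reproduced with a short calculation. The sketch below assumes five pipeline stages, a clock period of 2 ns set by the slowest stage of the load word timing, and no hazards; $N$ denotes the number of consecutive lw instructions.

```latex
\begin{align*}
T_{\text{nonpipelined}}(N) &= 8N \ \text{ns} \\
T_{\text{pipelined}}(N) &= 5 \cdot 2 + 2\,(N - 1) = 2\,(N + 4) \ \text{ns} \\
\text{speed-up}(N) &= \frac{8N}{2\,(N + 4)} = \frac{4N}{N + 4}
  \longrightarrow 4 \quad (N \to \infty)
\end{align*}
```

For three instructions this gives 24 ns versus 14 ns. The limit is 4 rather than 5 because the stages are not perfectly balanced: the ideal formula would require 8/5 = 1.6 ns per stage, but the slowest stage needs 2 ns.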
34 MIPS Appreciation Day
- All MIPS instructions have the same length
  - => simplifies the pipeline design
  - fetch in first stage and decode in second stage
- Compare with 80x86
  - Instructions 1 byte to 17 bytes
  - Pipelining is much more challenging
35 Obstacles to Pipelining
- Structural Hazards
  - hardware cannot support the combination of instructions in the same clock cycle
- Control Hazards
  - need to make a decision based on the results of one instruction while another is still executing
- Data Hazards
  - instruction depends on results of an instruction still in the pipeline
36 Structural Hazards
- Laundry examples
  - if you have a washer-dryer combination instead of a separate washer and dryer
  - separate washer and dryer, but roommate is busy doing something else and does not put clothes away [sic!]
- Computer architecture
  - competition in accessing hardware resources, e.g., access memory at the same time
37 Control Hazards
- Control hazards arise from the need to make a decision based on results of an instruction in the pipeline
- Branches: What is the next instruction?
- How can we resolve the problem?
  - Stall the pipeline until computations are done
  - or predict the result
  - delayed decision
38 Stall on Branch
- Assume that all branch computations are done in stage 2
- Delay by one cycle to wait for the result
39 Branch Prediction
- Predict branch result
- For example, always predict that the branch is not taken (e.g. reasonable for while loops)
  - if choice is correct, then pipeline runs at full speed
  - if choice is incorrect, then pipeline stalls
40 Branch Prediction
41 Delayed Branch
42 Data Hazards
- A data hazard results if an instruction depends on the result of a previous instruction
  - add $s0, $t0, $t1
  - sub $t2, $s0, $t3 // $s0 to be determined
- These dependencies happen often, so it is not possible to avoid them completely
- Use forwarding to get missing data from internal resources once available
43 Forwarding
- add $s0, $t0, $t1
- sub $t2, $s0, $t3
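The decision of when to forward can be written as a small block of combinational logic. The following is only a sketch for the first ALU operand, not the design from the lecture: the module name, all signal names, and the 2-bit encoding are assumptions, loosely following the usual pipeline registers of a five-stage MIPS pipeline (ID/EX, EX/MEM, MEM/WB).

```verilog
// Hypothetical forwarding check for the first ALU operand.
module forward_a(id_ex_rs, ex_mem_rd, ex_mem_regwrite,
                 mem_wb_rd, mem_wb_regwrite, forward);
  input [4:0] id_ex_rs;        // source register of the instruction in EX
  input [4:0] ex_mem_rd;       // destination of the instruction one stage ahead
  input       ex_mem_regwrite; // that instruction writes a register
  input [4:0] mem_wb_rd;       // destination of the instruction two stages ahead
  input       mem_wb_regwrite;
  output [1:0] forward;        // 00: register file, 10: from EX/MEM, 01: from MEM/WB
  reg [1:0] forward;

  always @(id_ex_rs or ex_mem_rd or ex_mem_regwrite
           or mem_wb_rd or mem_wb_regwrite)
    if (ex_mem_regwrite && ex_mem_rd != 0 && ex_mem_rd == id_ex_rs)
      forward = 2'b10;         // newest result wins
    else if (mem_wb_regwrite && mem_wb_rd != 0 && mem_wb_rd == id_ex_rs)
      forward = 2'b01;
    else
      forward = 2'b00;
endmodule
```

In the add/sub pair above, sub reads $s0 while add is one stage ahead, so the first condition matches and the ALU result of add is used directly instead of the stale register value. Register 0 is excluded because it always reads as zero in MIPS.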
44 Single Cycle Datapath
45 Pipelined Version