Verilog, Pipelined Processors CPSC 321 - PowerPoint PPT Presentation

1
Verilog, Pipelined Processors CPSC 321
  • Andreas Klappenecker

2
Today's Menu
  • Verilog
  • Pipelined Processor

3
Recall n-bit Ripple Carry Adder
    module ripple(cin, X, Y, S, cout);
      parameter n = 4;
      input cin;
      input [n-1:0] X, Y;
      output [n-1:0] S;
      output cout;
      reg [n-1:0] S;
      reg [n:0] C;
      reg cout;
      integer k;

      always @(X or Y or cin) begin
        C[0] = cin;
        for (k = 0; k <= n-1; k = k+1) begin
          S[k] = X[k] ^ Y[k] ^ C[k];
          C[k+1] = (X[k] & Y[k]) | (C[k] & X[k]) | (C[k] & Y[k]);
        end
        cout = C[n];
      end
    endmodule
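The loop in the ripple-carry adder can be checked bit by bit outside a simulator. The following is a minimal Python sketch of the same sum and carry equations; the function name `ripple_add` and the integer encoding of X and Y are assumptions made for illustration and are not part of the slides.

```python
def ripple_add(cin, x, y, n=4):
    """Bit-level model of the n-bit ripple-carry loop; returns (sum, cout)."""
    carry = cin & 1                       # C[0] = cin
    total = 0
    for k in range(n):                    # k = 0 .. n-1, as in the for loop
        xk = (x >> k) & 1
        yk = (y >> k) & 1
        sk = xk ^ yk ^ carry              # S[k] = X[k] ^ Y[k] ^ C[k]
        carry = (xk & yk) | (carry & xk) | (carry & yk)   # C[k+1]
        total |= sk << k
    return total, carry                   # cout = C[n]


print(ripple_add(0, 5, 6))    # (11, 0)
print(ripple_add(1, 15, 0))   # (0, 1)
```

The second call shows the carry rippling through all four positions: 15 + 0 + 1 overflows the 4-bit sum and sets cout.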
4
Recall: = versus <=
    initial begin
      a = 1; b = 2; c = 3; x = 4;
      #5 a = b + c;   // wait 5 units, grab b, c,
                      // compute a = b + c = 2 + 3
      d = a;          // d = 5 = b + c at time t = 5.
      x <= #6 b + c;  // grab b + c now at t = 5, don't stop,
                      // assign x = 5 at t = 11.
      b <= #2 a;      // grab a at t = 5
                      // (end of last blocking statement).
                      // Deliver b = 5 at t = 7.
                      // previous x is unaffected by change of b.
    end

5
Recall: = versus <=
    initial begin
      a = 1; b = 2; c = 3; x = 4;
      #5 a = b + c;
      d = a;          // time t = 5
      x <= #6 b + c;  // assign x = 5 at time t = 11
      b <= #2 a;      // assign b = 5 at time t = 7
      y <= #1 b + c;  // grab b + c at t = 5, don't stop,
                      // assign y = 5 at t = 6.
      #3 z = b + c;   // grab b + c at t = 8 (5 + 3); b is already 5
                      // by then, so assign z = 5 + 3 = 8 at t = 8.
      w <= x;         // assign w = 4 at t = 8.
                      // (starting at last blocking assignment)
    end
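The timing of this block can be traced with a very small event model. The Python sketch below is only an approximation of the Verilog scheduling rules, written under two stated assumptions: a delayed blocking assignment holds up the whole block, and a non-blocking assignment samples its right-hand side immediately and delivers the value later without waiting. The function `run_initial_block` and the tuple encoding of the statements are hypothetical helpers, not anything from a Verilog tool.

```python
def run_initial_block(statements, values):
    """Trace one initial block; return a time-ordered log of (time, name, value)."""
    now = 0
    pending = []   # scheduled non-blocking updates: (time, order, name, value)
    log = []

    def deliver(before):
        # Apply scheduled updates that fall strictly before time `before`.
        for update in sorted(u for u in pending if u[0] < before):
            pending.remove(update)
            t, _, name, value = update
            values[name] = value
            log.append((t, name, value))

    for order, (kind, delay, name, expr) in enumerate(statements):
        if kind == "blocking":
            now += delay                  # the delay stalls the whole block
            deliver(now)
            values[name] = expr(values)
            log.append((now, name, values[name]))
        else:                             # "nonblocking"
            # sample the right-hand side now, deliver later, do not wait
            pending.append((now + delay, order, name, expr(values)))
    deliver(float("inf"))                 # flush whatever is still scheduled
    return sorted(log, key=lambda entry: entry[0])


statements = [
    ("blocking",    5, "a", lambda v: v["b"] + v["c"]),   # #5 a = b + c;
    ("blocking",    0, "d", lambda v: v["a"]),            # d = a;
    ("nonblocking", 6, "x", lambda v: v["b"] + v["c"]),   # x <= #6 b + c;
    ("nonblocking", 2, "b", lambda v: v["a"]),            # b <= #2 a;
    ("nonblocking", 1, "y", lambda v: v["b"] + v["c"]),   # y <= #1 b + c;
    ("blocking",    3, "z", lambda v: v["b"] + v["c"]),   # #3 z = b + c;
    ("nonblocking", 0, "w", lambda v: v["x"]),            # w <= x;
]
values = {"a": 1, "b": 2, "c": 3, "x": 4}
for entry in run_initial_block(statements, values):
    print(entry)
```

Under these assumptions the trace is a = 5 and d = 5 at t = 5, y = 5 at t = 6, b = 5 at t = 7, z = 8 and w = 4 at t = 8, and x = 5 at t = 11. Note that z picks up b after its delivery at t = 7, so it evaluates 5 + 3.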

6
Confused?
    a = b + c;    // blocking assignment
    a <= b + c;   // non-blocking assignment
    #2            // delay by 2 time units
Blocking assignment with delay? Probably wrong!
Non-blocking assignment without delay? Bad idea!
7
Address Register
    `define REG_DELAY 1
    module add_reg(clk, reset, addr, reg_addr);
      input clk, reset;
      input [15:0] addr;
      output [15:0] reg_addr;
      reg [15:0] reg_addr;
      always @(posedge clk)
        if (reset)
          reg_addr <= #(`REG_DELAY) 16'h00;
        else
          reg_addr <= #(`REG_DELAY) addr;
    endmodule

8
Concurrency Example
    module concurrency_example;
      initial begin
        #1 $display("Block 1 stmt 1");
        $display("Block 1 stmt 2");
        #2 $display("Block 1 stmt 3");
      end
      initial begin
        $display("Block 2 stmt 1");
        #2 $display("Block 2 stmt 2");
        #2 $display("Block 2 stmt 3");
      end
    endmodule

Output:
    Block 2 stmt 1
    Block 1 stmt 1
    Block 1 stmt 2
    Block 2 stmt 2
    Block 1 stmt 3
    Block 2 stmt 3

9
Concurrency fork and join
    module concurrency_example;
      initial fork
        #1 $display("Block 1 stmt 1");
        $display("Block 1 stmt 2");
        #2 $display("Block 1 stmt 3");
      join
      initial fork
        $display("Block 2 stmt 1");
        #2 $display("Block 2 stmt 2");
        #2 $display("Block 2 stmt 3");
      join
    endmodule

Output:
    Block 1 stmt 2
    Block 2 stmt 1
    Block 1 stmt 1
    Block 1 stmt 3
    Block 2 stmt 2
    Block 2 stmt 3
10
Begin-End vs. Fork-Join
  • In begin-end blocks, the statements are
    sequential and the delays are additive
  • In fork-join blocks, the statements are
    concurrent and the delays are independent
  • The two constructs can be combined to build compound
    statements. Nesting begin-end inside begin-end is not
    useful; neither is nesting fork-join inside fork-join.
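The difference between additive and independent delays can be worked out by hand for the two concurrency examples. Here is a small Python sketch of that arithmetic; the helper names `begin_end_times` and `fork_join_times` are made up for this illustration, and each list holds the delay written in front of each statement (0 if there is none).

```python
from itertools import accumulate


def begin_end_times(delays):
    """Sequential block: each delay is added to the time already elapsed."""
    return list(accumulate(delays))


def fork_join_times(delays):
    """Parallel block: every statement counts its delay from the block start."""
    return list(delays)


block1 = [1, 0, 2]   # #1 stmt 1;    stmt 2; #2 stmt 3
block2 = [0, 2, 2]   #    stmt 1; #2 stmt 2; #2 stmt 3

print(begin_end_times(block1))   # [1, 1, 3]
print(begin_end_times(block2))   # [0, 2, 4]
print(fork_join_times(block1))   # [1, 0, 2]
print(fork_join_times(block2))   # [0, 2, 2]
```

Sorting the statements of both blocks by these times reproduces the two output listings shown with the concurrency examples. Statements that share a time step may appear in either order, depending on the simulator.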

11
Displaying Results
    a = 4'b0011;
    $display("The value of a is %b", a);
        The value of a is 0011
    $display("The value of a is %0b", a);
        The value of a is 11
  • If you use $display to print a value that is changing
    during this time step, then you might get the new or
    the old value; use $strobe to get the new value

12
Displaying Results
  • Standard displaying functions
      • $display, $write, $strobe, $monitor
  • Writing to a file instead of stdout
      • $fdisplay, $fwrite, $fstrobe, $fmonitor
  • Format specifiers
      • %b, %0b, %d, %0d, %h, %0h, %c, %s, ...

13
Display Example
    module f1;
      integer f;
      initial begin
        f = $fopen("myFile");
        $fdisplay(f, "Hello, bla bla");
      end
    endmodule
14
Finite State Automata
15
Moore Machines
(Block diagram: input, next state logic, present state register, output logic)
  • The output of a Moore machine depends
    only on the current state. Output logic and
    next state logic are sometimes merged.

16
Mealy Machines
(Block diagram: input, next state logic, present state register, output logic)
  • The output of a Mealy machine depends on the
    current state and the input.

17
State Machine Modeling
  • reg: state register, nsl: next state logic, ol:
    output logic
  • Model reg separate, nsl separate, ol separate
      • Three always blocks (a clocked register plus two
        blocks of combinational logic); easy to maintain.
  • Combine reg and nsl, keep ol separate
      • The state register and the next state logic are
        strongly correlated; it is usually more efficient
        to combine these two.
  • Combine nsl and ol, keep register separate
      • Messy! Don't do that!
  • Combine everything into one always block
      • Can only be used for a Moore state machine. Why?
  • Combine register and output logic into one always
    block
      • Can only be used for a Mealy state machine.

18
Example Automatic Food Cooker
19
Moore Machine Example
  • Automatic food cooker
  • Has a supply of food
  • Can load food into the heater when requested
  • Cooker unloads the food when cooking done

20
Automated Cooker
  • Outputs from the machine
      • load: signal that sends food into the cooker
      • heat: signal that turns on the heater
      • unload: signal that removes food from the cooker
      • beep: signal that alerts that food is done

21
Automated Cooker
  • Inputs
      • clock
      • start: start the load, cook, unload cycle
      • temp_ok: temperature sensor detecting when
        preheating is done
      • done: signal from timer when done
      • quiet: should the cooker beep?

22
Cooker
    module cooker(
      clock, start, temp_ok, done, quiet, load, heat,
      unload, beep
    );
      input clock, start, temp_ok, done, quiet;
      output load, heat, unload, beep;
      reg load, heat, unload, beep;
      reg [2:0] state, next_state;

23
Defining States
    `define IDLE    3'b000
    `define PREHEAT 3'b001
    `define LOAD    3'b010
    `define COOK    3'b011
    `define EMPTY   3'b100

You can refer to these states as `IDLE,
`PREHEAT, etc. Symbolic names are a good idea!
24
State Register Block
    `define REG_DELAY 1
    always @(posedge clock)
      state <= #(`REG_DELAY) next_state;

25
Next State Logic
    always @(state or start or temp_ok or done)
    // whenever there is a change in input
    begin
      case (state)
        `IDLE:    if (start) next_state = `PREHEAT;
        `PREHEAT: if (temp_ok) next_state = `LOAD;
        `LOAD:    next_state = `COOK;
        `COOK:    if (done) next_state = `EMPTY;
        `EMPTY:   next_state = `IDLE;
        default:  next_state = `IDLE;
      endcase
    end

26
Output Logic
    always @(state)
    begin
      if (state == `LOAD) load = 1; else load = 0;
      if (state == `EMPTY) unload = 1; else unload = 0;
      if (state == `EMPTY && quiet == 0) beep = 1;
      else beep = 0;
      if (state == `PREHEAT ||
          state == `LOAD ||
          state == `COOK) heat = 1;
      else heat = 0;
    end
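To see the cooker step through its states, the next state logic and the output logic can be mirrored in a few lines of Python. This is only a behavioral sketch, not a replacement for simulating the Verilog; it assumes the machine stays in its current state whenever the tested input is 0, and the function names and the `run` helper are invented for this example.

```python
IDLE, PREHEAT, LOAD, COOK, EMPTY = range(5)


def next_state(state, start, temp_ok, done):
    """Next state logic, mirroring the case statement."""
    if state == IDLE:
        return PREHEAT if start else IDLE
    if state == PREHEAT:
        return LOAD if temp_ok else PREHEAT
    if state == LOAD:
        return COOK
    if state == COOK:
        return EMPTY if done else COOK
    return IDLE                      # EMPTY and the default case


def outputs(state, quiet):
    """Output logic; returns (load, heat, unload, beep) as 0/1 values."""
    load = int(state == LOAD)
    heat = int(state in (PREHEAT, LOAD, COOK))
    unload = int(state == EMPTY)
    beep = int(state == EMPTY and not quiet)
    return load, heat, unload, beep


def run(inputs, quiet=0):
    """Apply one (start, temp_ok, done) tuple per clock cycle, starting in IDLE."""
    state = IDLE
    trace = []
    for start, temp_ok, done in inputs:
        trace.append((state, outputs(state, quiet)))
        state = next_state(state, start, temp_ok, done)   # clock edge
    return trace


cycles = [(1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0)]
for state, outs in run(cycles):
    print(state, outs)
```

With this input sequence the machine visits IDLE, PREHEAT, PREHEAT, LOAD, COOK, EMPTY. The heater is on in the three middle states, and unload and beep are both 1 only in EMPTY when quiet is 0.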

27
    module cooker(clock,...);

    `define IDLE    3'b000
    `define PREHEAT 3'b001
    `define LOAD    3'b010
    `define COOK    3'b011
    `define EMPTY   3'b100

    `define REG_DELAY 1
    always @(posedge clock)
      state <= #(`REG_DELAY) next_state;

    always @(state or start or temp_ok or done)
    begin
      case (state)
        `IDLE:    if (start) next_state = `PREHEAT;
        `PREHEAT: if (temp_ok) next_state = `LOAD;
        `LOAD:    next_state = `COOK;
        `COOK:    if (done) next_state = `EMPTY;
        `EMPTY:   next_state = `IDLE;
        default:  next_state = `IDLE;
      endcase
    end

    always @(state)
    begin
      if (state == `LOAD) load = 1; else load = 0;
      if (state == `EMPTY) unload = 1; else unload = 0;
      if (state == `EMPTY && quiet == 0) beep = 1;
      else beep = 0;
      if (state == `PREHEAT ||
          state == `LOAD ||
          state == `COOK) heat = 1;
      else heat = 0;
    end

    endmodule

28
Pipelined Processor
29
Basic Idea
30
Time Required for Load Word
  • Assume that an lw instruction needs
      • 2 ns for instruction fetch
      • 1 ns for register read
      • 2 ns for ALU operation
      • 2 ns for data access
      • 1 ns for register write
  • Total time: 8 ns

31
Non-Pipelined vs. Pipelined Execution
32
Question
  • What is the average speed-up for
    pipelined versus non-pipelined execution
    in the case of load word instructions?

Average speed-up is 4-fold!
33
Reason
  • Assuming ideal conditions:
    time between instructions (pipelined) =
    time between instructions (nonpipelined) / number of pipe stages
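The 4-fold figure can be recomputed from the stage times given for lw. The sketch below assumes that the pipelined clock period is set by the slowest stage (2 ns) and that a new instruction enters the pipeline every cycle; the function names are chosen for this example only.

```python
stage_ns = {
    "instruction fetch": 2,
    "register read": 1,
    "ALU operation": 2,
    "data access": 2,
    "register write": 1,
}


def nonpipelined_time(n, stages):
    """Each instruction runs start to finish before the next one begins."""
    return n * sum(stages.values())


def pipelined_time(n, stages):
    """Every stage takes one clock cycle, as long as the slowest stage."""
    cycle = max(stages.values())
    return (len(stages) + n - 1) * cycle


for n in (3, 1000):
    slow = nonpipelined_time(n, stage_ns)
    fast = pipelined_time(n, stage_ns)
    print(n, slow, fast, round(slow / fast, 2))
# 3 24 14 1.71
# 1000 8000 2008 3.98
```

The time between instructions drops from 8 ns to 2 ns, so the speed-up approaches 4 for long instruction sequences. It stays below the number of pipe stages (5) because the stages are not perfectly balanced: ideal conditions would require all five stages to take 8/5 = 1.6 ns.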

34
MIPS Appreciation Day
  • All MIPS instructions have the same length
      • => simplifies the pipeline design
      • fetch in first stage and decode in second stage
  • Compare with 80x86
      • Instructions: 1 byte to 17 bytes
      • Pipelining is much more challenging

35
Obstacles to Pipelining
  • Structural Hazards
      • hardware cannot support the combination of
        instructions in the same clock cycle
  • Control Hazards
      • need to make a decision based on the results of one
        instruction while others are still executing
  • Data Hazards
      • instruction depends on the result of an instruction
        still in the pipeline

36
Structural Hazards
  • Laundry examples
      • if you have a washer-dryer combination instead of
        a separate washer and dryer
      • separate washer and dryer, but roommate is busy
        doing something else and does not put clothes
        away (sic!)
  • Computer architecture
      • competition in accessing hardware resources,
        e.g., access memory at the same time

37
Control Hazards
  • Control hazards arise from the need to
    make a decision based on results of an
    instruction in the pipeline
  • Branches: what is the next instruction?
  • How can we resolve the problem?
      • Stall the pipeline until the computation is done
      • or predict the result
      • or use a delayed decision

38
Stall on Branch
  • Assume that all branch computations are done in
    stage 2
  • Delay by one cycle to wait for the result

39
Branch Prediction
  • Predict branch result
  • For example, always predict that the branch
    is not taken
    (e.g., reasonable for while loops)
      • if choice is correct, then pipeline runs at full
        speed
      • if choice is incorrect, then pipeline stalls
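The benefit of predicting "not taken" can be estimated with a rough cycles-per-instruction calculation. The sketch below assumes a one-cycle penalty for every stall, in line with branches being resolved in stage 2. The branch frequency and the fraction of taken branches are hypothetical numbers picked for illustration, not measurements.

```python
def cpi_always_stall(branch_fraction, penalty=1):
    """Every branch costs `penalty` extra cycles."""
    return 1 + branch_fraction * penalty


def cpi_predict_not_taken(branch_fraction, taken_fraction, penalty=1):
    """Only branches that turn out to be taken cost `penalty` extra cycles."""
    return 1 + branch_fraction * taken_fraction * penalty


BRANCH_FRACTION = 0.2   # hypothetical: 1 in 5 instructions is a branch
TAKEN_FRACTION = 0.5    # hypothetical: half of the branches are taken

print(cpi_always_stall(BRANCH_FRACTION))                        # 1.2
print(cpi_predict_not_taken(BRANCH_FRACTION, TAKEN_FRACTION))   # 1.1
```

With these made-up numbers, prediction halves the branch overhead. The better the prediction matches the actual behavior of the program, the closer the pipeline gets to one instruction per cycle.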

40
Branch Prediction
41
Delayed Branch
42
Data Hazards
  • A data hazard results if an instruction depends
    on the result of a previous instruction
        add $s0, $t0, $t1
        sub $t2, $s0, $t3   // $s0 to be determined
  • These dependencies happen often, so it is not
    possible to avoid them completely
  • Use forwarding to get missing data from internal
    resources once available
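Spotting such a dependency is mechanical: compare the destination register of one instruction with the source registers of the next. The Python sketch below does this for adjacent instructions only and assumes three-register instructions of the form "op dest, src1, src2"; the helpers `parse` and `raw_hazards` are hypothetical names, and a real hazard unit would also look further back in the pipeline.

```python
def parse(instr):
    """Split 'add $s0, $t0, $t1' into (op, dest, [sources])."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]


def raw_hazards(program):
    """Return (producer index, consumer index, register) for adjacent pairs."""
    hazards = []
    for i in range(len(program) - 1):
        _, dest, _ = parse(program[i])
        _, _, sources = parse(program[i + 1])
        if dest in sources:
            hazards.append((i, i + 1, dest))
    return hazards


program = ["add $s0, $t0, $t1", "sub $t2, $s0, $t3"]
print(raw_hazards(program))   # [(0, 1, '$s0')]
```

The reported pair is exactly the case that forwarding addresses: the sub needs $s0 before the add has written it back to the register file.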

43
Forwarding
        add $s0, $t0, $t1
        sub $t2, $s0, $t3

44
Single Cycle Datapath
45
Pipelined Version