Title: Modeling CPU
Slide 1: Modeling CPUs Using Different MOCs: A Case Study
- Trevor C. Meyerowitz
- Advisor Alberto Sangiovanni-Vincentelli
- 290n Final Presentation
- May 15, 2002
Slide 2: Outline
- Introduction
- Motivation
- The Simple CPU to be modeled
- The Domains Investigated
- Modeling a Non-Pipelined Processor
- Modeling a Pipelined Processor
- Demo
- Conclusions
Slide 3: Motivation
- Processor designs are becoming much larger and more complicated
- Many instructions in flight at a single time
- Strange orderings, speculation
- This can be very hard to verify
- We are developing a methodology to help alleviate these problems
- Using different models of computation (MOCs) can potentially simplify the design task
- Ptolemy II allows us to compare a variety of these MOCs in a unified framework
Slide 4: The Simple CPU
- Processor Statistics
- Small Instruction Set
- ADD, SUB, ADDI, SUBI, and BNE
- Only Integer Operations
- 128 registers, 128 entry instruction memory
- This is enough to be interesting
- Data dependencies
- Control flow
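To make the instruction set concrete, here is a minimal Java sketch of a sequential reference interpreter for this CPU. This is the kind of model the Ptolemy II versions could be checked against. The class and field names are hypothetical. The `RD, RS, RT, Val` field layout is borrowed from the demo program. BNE is assumed to compare RS with RT and to branch by a PC-relative offset held in Val. The slides do not specify these details.

```java
import java.util.List;

// The five opcodes listed on the slide (integer operations only).
enum Op { ADD, SUB, ADDI, SUBI, BNE }

// Hypothetical encoding; field roles (especially for BNE) are assumptions.
record Inst(Op op, int rd, int rs, int rt, int val) {}

final class SimpleCpu {
    static final int NUM_REGS = 128;   // 128 registers
    static final int IMEM_SIZE = 128;  // 128-entry instruction memory

    final int[] regs = new int[NUM_REGS];
    final Inst[] imem = new Inst[IMEM_SIZE];
    int pc = 0;

    SimpleCpu(List<Inst> program) {
        if (program.size() > IMEM_SIZE) throw new IllegalArgumentException("program too long");
        for (int i = 0; i < program.size(); i++) imem[i] = program.get(i);
    }

    /** Executes one instruction; returns false once the PC leaves the program. */
    boolean step() {
        if (pc < 0 || pc >= IMEM_SIZE || imem[pc] == null) return false;
        Inst in = imem[pc];
        int nextPc = pc + 1;
        switch (in.op()) {
            case ADD  -> regs[in.rd()] = regs[in.rs()] + regs[in.rt()];
            case SUB  -> regs[in.rd()] = regs[in.rs()] - regs[in.rt()];
            case ADDI -> regs[in.rd()] = regs[in.rs()] + in.val();
            case SUBI -> regs[in.rd()] = regs[in.rs()] - in.val();
            // Assumed semantics: PC-relative branch when RS != RT.
            case BNE  -> {
                if (regs[in.rs()] != regs[in.rt()]) nextPc = pc + in.val();
            }
        }
        pc = nextPc;
        return true;
    }

    /** Runs until the program ends or maxSteps instructions have executed. */
    void run(int maxSteps) {
        for (int i = 0; i < maxSteps && step(); i++) { }
    }
}
```

As an example, consider a five-instruction countdown loop: ADDI, ADDI, SUBI, a BNE back by two, then SUB. After three iterations it leaves R1 = 0 and R2 = 15. The data dependencies and the control flow mentioned above are both exercised.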
Slide 5: The Domains Investigated
- Process Networks
- Untimed Model
- Kahn-MacQueen Semantics
- Infinite Queues
- Blocking Reads
- Fully Deterministic
- Schedule Independent
- Synchronous Reactive
- Untimed Model
- Instantaneous Communication and Computation
- Iterates Until a Fixed Point is Found
- Actors must be monotonic (a signal, once known, cannot change within an iteration)
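The "iterate until a fixed point" rule can be illustrated with a small Java sketch of one SR tick. It is a simplification, not Ptolemy II's SR director. A signal is either unknown (`null`) or known, absent signals are not modeled, and the names `SrTick` and `Actor` are hypothetical. Each pass re-evaluates every actor. Because a known signal may never change, each signal is refined at most once, so the loop terminates.

```java
import java.util.function.Function;

// One synchronous tick: start with every signal unknown and re-evaluate
// actors until no signal changes (a fixed point).
final class SrTick {
    /** An actor reads the whole signal vector and may define one output signal. */
    record Actor(int output, Function<Integer[], Integer> fn) {}

    static Integer[] fixedPoint(int numSignals, Actor... actors) {
        Integer[] sig = new Integer[numSignals]; // null means "unknown"
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Actor a : actors) {
                Integer v = a.fn().apply(sig);
                if (v == null) continue;          // inputs not known yet
                Integer old = sig[a.output()];
                if (old == null) {                // unknown -> known: a refinement
                    sig[a.output()] = v;
                    changed = true;
                } else if (!old.equals(v)) {      // known values must never change
                    throw new IllegalStateException("non-monotonic actor on signal " + a.output());
                }
            }
        }
        return sig;
    }
}
```

Consider a doubler fed by an incrementer fed by a constant 4, listed in reverse dependency order. The actors take several passes but still settle on 4, 5, 10. With monotonic actors, the result does not depend on the evaluation order.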
Slide 6: The Non-Pipelined Processor
- Code and netlist reused for both domains (i.e., these are domain-polymorphic actors)
- Represented in Ptolemy II as Fetch, Regfile, Execute, and a Delay
- Fetch only after the previous instruction has completed
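The Process Networks version of this loop can be sketched in Java with one thread per actor. The channels are unbounded `LinkedBlockingQueue`s, whose `take()` gives Kahn-style blocking reads. This is an illustrative assumption rather than the Ptolemy II implementation:
- the Regfile is folded into Exec;
- the ISA is reduced to a hypothetical ADDI;
- the Delay is modeled as the initial PC token.

Because Fetch blocks until Exec sends back the next PC, only one instruction is ever in flight.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Kahn-style loop: Fetch -> Exec -> (next PC) -> Delay -> Fetch.
final class PnLoop {
    /** Hypothetical ADDI-only instruction: regs[rd] = regs[rs] + imm. */
    record AddI(int rd, int rs, int imm) {}
    /** Token from Fetch to Exec; inst == null means "halt". */
    record Fetched(int pc, AddI inst) {}

    static int[] run(AddI[] program) {
        BlockingQueue<Integer> pcChan = new LinkedBlockingQueue<>();   // Exec -> Delay -> Fetch
        BlockingQueue<Fetched> instChan = new LinkedBlockingQueue<>(); // Fetch -> Exec
        int[] regs = new int[128];
        pcChan.add(0); // the Delay's initial token: start at PC 0

        Thread fetch = new Thread(() -> {
            try {
                while (true) {
                    int pc = pcChan.take(); // blocking read: waits for the previous instruction
                    AddI inst = pc < program.length ? program[pc] : null;
                    instChan.put(new Fetched(pc, inst));
                    if (inst == null) return; // ran past the program: halt
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread exec = new Thread(() -> {
            try {
                while (true) {
                    Fetched f = instChan.take(); // blocking read
                    if (f.inst() == null) return;
                    AddI i = f.inst();
                    regs[i.rd()] = regs[i.rs()] + i.imm();
                    pcChan.put(f.pc() + 1); // completion releases the next fetch
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetch.start();
        exec.start();
        try {
            fetch.join(); // join() also makes Exec's register writes visible here
            exec.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return regs;
    }
}
```

Running `ADDI R1, R0, 5; ADDI R2, R1, 3; ADDI R1, R2, 1` leaves R1 = 9 and R2 = 8, regardless of how the threads are scheduled.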
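Slide 7: Non-Pipelined Processor Pseudocode (Fetch, Regfile)
public class Fetch {
  public void fire() {
    pc = input_pc.get(0);                      // blocking read of the next PC
    <rs, rt, rd, val, inst> = readIMEM(pc);    // decode the instruction at pc
    output_inst.send(0, inst);
    output_regs.send(rs, rt);                  // register numbers to read
  }
}
public class Reg {
  public void fire() {
    if (read_mode) {                           // phase 1: read operands
      inst = input_get_op_codes();
      <rs_v, rt_v> = read_regs();
      output_regs.send(0, inst);
      output_regs.send(rs_v, rt_v);
    } else {                                   // phase 2: write back the result
      rd_v = input_get_write_vals();
      write_values();
    }
    read_mode = !read_mode;                    // alternate phases on every firing
  }
}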
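The alternating `read_mode` in Reg is what serializes register accesses. A minimal Java sketch of that protocol follows. Two assumptions are made for the sketch: the single `fire()` is split into `fireRead` and `fireWrite`, and ports are replaced with method arguments. The guard exceptions are added to expose out-of-order firings.

```java
// Two-phase register file: firings must alternate read, write, read, ...
final class TwoPhaseRegFile {
    private final int[] regs = new int[128];
    private boolean readMode = true;

    /** Read phase: returns {regs[rs], regs[rt]} for the incoming instruction. */
    int[] fireRead(int rs, int rt) {
        if (!readMode) throw new IllegalStateException("expected a write-back firing");
        readMode = false; // the next firing must be the write-back
        return new int[] { regs[rs], regs[rt] };
    }

    /** Write phase: commits the executed instruction's result. */
    void fireWrite(int rd, int value) {
        if (readMode) throw new IllegalStateException("expected an operand-read firing");
        regs[rd] = value;
        readMode = true;
    }
}
```

Because a second read cannot happen before the write-back, the order "read inst_1, write inst_1, read inst_2" is enforced by construction.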
Slide 8: Non-Pipelined Processor Pseudocode (Execute)
public class Exec {
  public void fire() {
    if (!write_mode) {                         // phase 1: compute
      reg_vals = input_reg_vals();
      inst_type = read_inst();
      results = exec_inst(inst_type, reg_vals);
    } else {                                   // phase 2: emit results
      write_values(rd, results);
      write_next_pc(results);
    }
    write_mode = !write_mode;
  }
}
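Here is a Java sketch of Exec's two firings. It assumes the five opcodes from the instruction set, PC + 1 sequencing, and a PC-relative BNE offset; the pseudocode pins down none of these. `fireCompute` stands in for `exec_inst`. `fireWrite` returns what `write_values` and `write_next_pc` would send. The class and record names are hypothetical.

```java
// Exec as two alternating firings: compute, then emit write-back and next PC.
final class ExecActor {
    enum Op { ADD, SUB, ADDI, SUBI, BNE }

    /** Emitted by the write phase: an optional register write and the next PC. */
    record Result(boolean writesReg, int rd, int value, int nextPc) {}

    private boolean writeMode = false;
    private Result pending;

    /** First firing: the equivalent of exec_inst(inst_type, reg_vals). */
    void fireCompute(Op op, int pc, int rd, int rsVal, int rtVal, int imm) {
        if (writeMode) throw new IllegalStateException("previous result not yet written");
        pending = switch (op) {
            case ADD  -> new Result(true, rd, rsVal + rtVal, pc + 1);
            case SUB  -> new Result(true, rd, rsVal - rtVal, pc + 1);
            case ADDI -> new Result(true, rd, rsVal + imm, pc + 1);
            case SUBI -> new Result(true, rd, rsVal - imm, pc + 1);
            // Assumed: branch by imm when the operands differ; no register write.
            case BNE  -> new Result(false, rd, 0, rsVal != rtVal ? pc + imm : pc + 1);
        };
        writeMode = true;
    }

    /** Second firing: the equivalent of write_values(rd, results) and write_next_pc(results). */
    Result fireWrite() {
        if (!writeMode) throw new IllegalStateException("nothing computed yet");
        writeMode = false;
        return pending;
    }
}
```

An ADD at PC 0 with operands 4 and 5 yields a write of 9 and next PC 1. A BNE at PC 7 with unequal operands and offset -3 yields next PC 4 and no register write.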
Slide 9: Non-Pipelined Processor: Differences Between Domains
- SR required that we put the register read and register write in different iterations, and split execution from the writing of its results
- Process networks cannot query port status
- SR requires the use of prefire and postfire conditions
- We shared code between the two domains; SR probably has more flexibility
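The prefire/postfire point is easiest to see in a sketch. In Ptolemy II's SR domain, the director may call `fire()` several times in one iteration while it searches for a fixed point. State such as `read_mode` should therefore only be committed in `postfire()`. The Java class below is a hypothetical stand-in, not the Ptolemy II actor interfaces.

```java
// Hypothetical SR-style actor: fire() is idempotent within a tick,
// and state is committed exactly once, in postfire().
final class ToggleActor {
    private boolean readMode = true;   // committed state
    private String lastOutput;         // recomputed by every fire()

    boolean prefire() { return true; } // always ready in this sketch

    /** May run many times per tick; it reads state but never updates it. */
    void fire() { lastOutput = readMode ? "read" : "write"; }

    /** Runs once per tick, after the fixed point is reached. */
    boolean postfire() {
        readMode = !readMode;
        return true;                   // keep iterating
    }

    String output() { return lastOutput; }

    /** One tick: fire repeatedly (as a fixed-point search might), then postfire once. */
    static String tick(ToggleActor a, int fireCount) {
        if (!a.prefire()) return null;
        for (int i = 0; i < fireCount; i++) a.fire();
        a.postfire();
        return a.output();
    }
}
```

Firing three times within one tick still toggles the mode only once. Had the toggle lived in `fire()`, the phase would depend on how many times the director happened to fire the actor.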
Slide 10: Pipelined Processor
- Only required recoding of fetch behavior
- Fetch every iteration
- Only stall after branches (no branch prediction)
- No forwarding logic is required!?
- This is because two register reads can't occur without a register write happening between them
- Due to the PN determinism requirement
- Also true in SR because of the actors' read/write states
- We could probably structure SR to require forwarding logic (a lower level of abstraction!)
Slide 11: Pipelined Processor Fetch Pseudocode
public class Fetch {
  public void fire() {
    if (initial_firing || prev_inst_is_branch) {
      pc = input_pc.get(0);                    // blocking read of the resolved PC
    }
    <rs, rt, rd, val, inst> = readIMEM(pc);
    output_inst.send(0, inst);
    output_regs.send(rs, rt);
    pc = pc + 1;                               // predict sequential flow
  }
}
- The blocking read of input_pc causes Fetch to stall until the branch is finished.
- If there is no branch, Fetch immediately fires again with pc + 1!
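The stalling behavior can be sketched in Java. The sketch assumes the PC input is a `BlockingQueue` and uses a hypothetical `isBranch` table in place of decoding `readIMEM(pc)`. Fetch only performs the blocking read on its first firing or right after fetching a branch. Otherwise it increments its own PC.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Pipelined Fetch: block on the PC channel only initially or after a branch.
final class PipelinedFetch {
    private final boolean[] isBranch;             // stand-in for decoding readIMEM(pc)
    private final BlockingQueue<Integer> inputPc; // resolved PCs sent back by Exec
    private boolean initialFiring = true;
    private boolean prevInstIsBranch = false;
    private int pc;

    PipelinedFetch(boolean[] isBranch, BlockingQueue<Integer> inputPc) {
        this.isBranch = isBranch;
        this.inputPc = inputPc;
    }

    /** One firing: returns the PC of the instruction sent downstream. */
    int fire() {
        if (initialFiring || prevInstIsBranch) {
            try {
                pc = inputPc.take();              // stall until the branch resolves
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            initialFiring = false;
        }
        int fetched = pc;
        prevInstIsBranch = isBranch[fetched];
        pc = pc + 1;                              // no branch: next firing proceeds at once
        return fetched;
    }
}
```

Suppose there is a branch at PC 1 and the branch target 4 is waiting in the PC channel. Successive firings then fetch PCs 0, 1, and 4.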
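Slide 12: Pipelining and Forwarding (t0)
Pipeline figure (stages: Program, Fetch, Reg File, Exec); instructions shown as Inst RD, RS(value), RT(value):
- Inst_2: R3, R1(?), R1(?)
- Inst_1: R1, R2(4), R3(5)
- Register file state: R1 = 2, R2 = 4, R3 = 5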
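Slide 13: Pipelining and Forwarding (t1)
Pipeline figure (stages: Program, Fetch, Reg File, Exec); instructions shown as Inst RD, RS(value), RT(value):
- Inst_2: R3, R1(2), R1(2)
- Inst_1: R1(9), R2(4), R3(5)
- Register file state: R1 = 2, R2 = 4, R3 = 5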
This is an error! Inst_2 should read R1 as 9. We can solve this by adding forwarding logic or by stalling the pipeline.
The PN and SR models don't have this problem because they enforce the order: read Inst_1, write Inst_1, read Inst_2.
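The forwarding fix can be sketched with the slide's numbers. The sketch assumes both instructions are ADDs: Inst_1 is R1 = R2 + R3 and Inst_2 is R3 = R1 + R1. The method names are hypothetical. `readOperand` is a one-path bypass: it takes Exec's result when Exec's destination matches the source register being read.

```java
// Single forwarding path from Exec to the Reg File stage, using the slide's values.
final class Forwarding {
    /** Operand read in the Reg File stage, optionally bypassing from Exec. */
    static int readOperand(int[] regFile, int src, int exRd, int exResult, boolean forward) {
        if (forward && src == exRd) return exResult; // value still in Exec
        return regFile[src];                         // possibly stale register
    }

    /** Inst_2's result (the new R3), computed with or without forwarding. */
    static int inst2Result(boolean forward) {
        int[] regs = new int[4];
        regs[1] = 2; regs[2] = 4; regs[3] = 5;       // register file state at t1
        int inst1Result = regs[2] + regs[3];         // Inst_1 in Exec: R1 = 9
        int a = readOperand(regs, 1, 1, inst1Result, forward);
        int b = readOperand(regs, 1, 1, inst1Result, forward);
        return a + b;
    }
}
```

Without forwarding, Inst_2 reads the stale R1 = 2 and computes 4. With forwarding, it sees 9 and computes 18. Stalling Inst_2 until R1 is written would also give 18, at the cost of a bubble.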
Slide 14: Pipelined Processor with Branch Prediction
- Still in order, but branches are predicted instead of stalling
- Requires recoding of Fetch and the Register File
- Fetch
- Performs branch prediction
- Handles mispredicts
- Register File
- Keeps a queue of instructions
- Stall on dependencies
- Only write resolved instructions to regfile
- This represents one refinement path
- Biased towards Process Networks
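The Register File refinement can be sketched in Java as an in-order queue. Everything here is a hypothetical illustration:
- each entry carries a destination register (or -1 for a branch) and a value that stays `null` until Exec resolves it;
- reads stall while any queued entry targets their source register;
- only resolved entries at the head are committed;
- a mispredict squashes everything issued after the branch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// In-order queue of in-flight instructions; only resolved heads are committed.
final class SpeculativeRegFile {
    static final class Entry {
        final int rd;   // destination register, or -1 for a branch
        Integer value;  // null until Exec resolves the instruction
        Entry(int rd) { this.rd = rd; }
    }

    final int[] regs = new int[128];
    private final Deque<Entry> queue = new ArrayDeque<>();

    /** Enqueues a newly fetched (possibly speculative) instruction. */
    Entry issue(int rd) {
        Entry e = new Entry(rd);
        queue.addLast(e);
        return e;
    }

    /** A reader of src stalls while any queued entry will write src. */
    boolean mustStall(int src) {
        for (Entry e : queue) if (e.rd == src) return true;
        return false;
    }

    /** Writes resolved instructions at the head of the queue, in program order. */
    void commit() {
        while (!queue.isEmpty() && queue.peekFirst().value != null) {
            Entry e = queue.pollFirst();
            if (e.rd >= 0) regs[e.rd] = e.value;
        }
    }

    /** On a mispredict, drops every entry issued after the branch. */
    void squashAfter(Entry branch) {
        while (!queue.isEmpty() && queue.peekLast() != branch) queue.pollLast();
    }
}
```

Take an instruction fetched down the wrong path after an unresolved branch. Even if it resolves early, it is never written, because the branch ahead of it blocks the commit until the squash removes it.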
Slide 15: Demo Time
Program Code (Inst RD, RS, RT (Val)):
  ADD 5 5 5
  ADD 6 5 5
  BNE 5 20 -3
  ADD 7 6 6
  ADD 8 7 7
  ADD 9 8 8
  ADD 10 9 9
  SUB 11 10 50
Slide 16: Outline
- Introduction
- Modeling a Non-Pipelined Processor
- Modeling a Pipelined Processor
- Demo
- Conclusions
- Other Architectural Features
- Observations
- Future Work
Slide 17: Other Architectural Features
- Out-of-order execution
- Requires breaking the PN model
- Superscalar execution
- Multiple fetches at once; might be problematic to do in PN
- Memory systems
- Initially simple, more complicated when refinements are added
Slide 18: Observations
- Process Networks are relatively easy to use and are quite predictable
- Process Networks are great for initial abstract models
- Synchronous Reactive is simpler to work with than discrete event (DE), but more complicated to design than PN
- PN doesn't deal well with ordering refinements, but SR handles them better
- We envision a methodology where you start with a PN model and then move to an SR model
Slide 19: Future Work
- Look at implementing other architectural features
- Examine relaxing PN's requirements
- Look at domain specific actors
- Examine composing different MOCs
- Introduce timing
Slide 20: Thank You!