Title: CS 2200 Lecture 7: Datapaths, Control Logic, Single/Multi-cycle
- (Lectures based on the work of Jay Brockman,
Sharon Hu, Randy Katz, Peter Kogge, Bill Leahy,
Ken MacKenzie, Richard Murphy, and Michael
Niemier)
MIPS dataflow
The organization of a computer
- Von Neumann model
  - Stored-program machine: instructions are represented as numbers
  - Programs can be stored in memory to be read/written just like numbers
[Figure: the organization of a computer; labels: Compiler, Processor (Control, Datapath), Memory, Input, Output]
Functions of Each Component
- Datapath: performs data manipulation operations
  - arithmetic logic unit (ALU)
  - floating point unit (FPU)
- Control: directs operation of other components
  - finite state machines
  - micro-programming
- Memory: stores instructions and data
  - random access vs. sequential access
  - volatile vs. non-volatile
  - RAMs (SRAM, DRAM), ROMs (PROM, EEPROM), disk
  - tradeoff between speed and cost/bit
- Input/Output and I/O devices: interface to the environment
  - mouse, keyboard, display, device drivers
The Performance Perspective
- Performance of a machine is determined by
  - instruction count, clock cycles per instruction, clock cycle time
  - (Last time: 210 ns vs. 1100 ns)
- Processor design (datapath and control) determines
  - clock cycles per instruction
  - clock cycle time
- We will discuss two implementations.
  - Single-cycle implementation (the a + bx + cx^2 example)
    - Advantage: one clock cycle per instruction
    - Disadvantage: less flexible
  - Multiple-cycle implementation (bus based)
    - Advantages: shorter clock cycle times, different numbers of cycles for different instructions, functional unit sharing
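The three factors combine multiplicatively. Here is a minimal Python sketch of the CPU-time equation. The instruction count, CPIs and cycle times are made-up illustration values, not figures from the lecture.

```python
def cpu_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical numbers: the same program on two processor designs.
single = cpu_time_ns(instruction_count=100, cpi=1.0, cycle_time_ns=8.0)  # CPI 1, long cycle
multi = cpu_time_ns(instruction_count=100, cpi=3.5, cycle_time_ns=2.0)   # CPI > 1, short cycle
print(single, multi)  # 800.0 700.0
```

With these hypothetical numbers the multi-cycle design wins despite its higher CPI. Different values could reverse the outcome.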
Review of MIPS Instruction Formats
- All MIPS instructions are 32 bits (4 bytes) long.
- R-type
- I-Type
- J-type
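The format figures did not survive extraction. As a sketch, the decoder below splits a 32-bit word using the standard MIPS field widths:

- op: 6 bits
- rs, rt, rd: 5 bits each
- shamt: 5 bits
- funct: 6 bits
- imm16: 16 bits
- target: 26 bits

The function name `decode` is ours. The decoder treats only opcodes 2 (j) and 3 (jal) as J-type, which holds for the basic MIPS instruction set.

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its R/I/J-type fields."""
    op = (word >> 26) & 0x3F               # bits 31-26, present in every format
    if op == 0:                             # R-type: op rs rt rd shamt funct
        return {"type": "R", "op": op,
                "rs": (word >> 21) & 0x1F, "rt": (word >> 16) & 0x1F,
                "rd": (word >> 11) & 0x1F, "shamt": (word >> 6) & 0x1F,
                "funct": word & 0x3F}
    if op in (0x02, 0x03):                  # J-type (j, jal): op target
        return {"type": "J", "op": op, "target": word & 0x03FFFFFF}
    # Everything else in our subset is I-type: op rs rt imm16
    return {"type": "I", "op": op,
            "rs": (word >> 21) & 0x1F, "rt": (word >> 16) & 0x1F,
            "imm16": word & 0xFFFF}
```

For example, `decode(0x012A4020)` decodes add $t0, $t1, $t2 and yields rs = 9, rt = 10, rd = 8, funct = 0x20.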
The MIPS Subset
- Consider a subset of instructions:
  - memory-reference: lw, sw
  - arithmetic-logical: add, sub, and, or, slt
  - branching: beq, j
- Organizational overview:
  - fetch an instruction based on the content of the PC
  - decode the instruction
  - fetch operands (read one or two registers)
  - execute (effective address calculation / arithmetic-logical operations / comparison)
  - store result (write to memory / write to register / update PC)
At the simplest level, this is how the Von Neumann, RISC model works.
Implementation Overview
Simplest view of a Von Neumann, RISC μP
- Abstract / simplified view
- 2 types of signals: data and control
- Clocking strategy: all storage elements are clocked by the same clock edge
[Figure: labels: PC, Instruction Memory (Address, Instruction), Register File (Ra, Rb, Rw), ALU, Data Memory (Address, Data)]
Single Cycle Implementation
- Each instruction takes one cycle to complete.
- We wait for everything to settle down so that the right thing gets done.
  - The ALU might not produce the right answer right away.
  - Write signals, along with the clock, determine when to write.
- Cycle time is determined by the length of the longest path.
Referring back two slides: which instruction takes the longest?
Instruction Fetch Unit
- Fetch the instruction: mem[PC]
- Update the program counter:
  - sequential code: PC <- PC + 4
  - branch and jump: PC <- something else
[Figure: PC feeds the Next Addr Logic and the Instruction Memory address; the Instruction Memory outputs the 32-bit Instruction Word]
R-Type Instructions
- Instruction format
- RTL
  - Instruction fetch: mem[PC]
  - ALU operation: reg[rd] <- reg[rs] op reg[rt]
  - Go to next instruction: PC <- PC + 4
- Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields.
- The actual ALU operation and register write should occur after decoding the instruction.
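The RTL above can be mimicked in a few lines of Python. This is a behavioral sketch, not a hardware description:

- `regs` is a hypothetical 32-entry list standing in for the register file.
- The funct codes are the standard MIPS values for the subset.
- Overflow trapping on add/sub is ignored.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


# funct field -> ALU operation, for the R-type instructions in our subset
R_OPS = {
    0x20: lambda a, b: (a + b) & MASK32,                  # add (overflow trap ignored)
    0x22: lambda a, b: (a - b) & MASK32,                  # sub
    0x24: lambda a, b: a & b,                             # and
    0x25: lambda a, b: a | b,                             # or
    0x2A: lambda a, b: int(to_signed(a) < to_signed(b)),  # slt
}


def exec_rtype(regs: list, pc: int, rs: int, rt: int, rd: int, funct: int) -> int:
    """reg[rd] <- reg[rs] op reg[rt]; returns the next PC (PC + 4)."""
    result = R_OPS[funct](regs[rs], regs[rt])   # Ra = rs, Rb = rt
    if rd != 0:                                 # Rw = rd; MIPS $0 stays 0
        regs[rd] = result
    return (pc + 4) & MASK32
```

For instance, `exec_rtype(regs, pc, 9, 10, 8, 0x22)` performs a sub with rs = 9, rt = 10, rd = 8.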
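Datapath for R-Type Instructions
[Figure: register file of 32 32-bit registers; 5-bit read addresses Ra = rs and Rb = rt and 5-bit write address Rw = rd; 32-bit outputs BusA and BusB feed the ALU (controlled by ALUctr), whose result returns on the 32-bit BusW; RegWr enables the write]
- Register timing
  - Registers can always be read.
  - A register write happens only when RegWr is high, at the falling edge of the clock.
  - (Note: unlike the LC-2200, there are multiple read ports here.)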
I-Type Arithmetic/Logic Instructions
- Instruction format
- RTL for arithmetic operations, e.g., ADDI
  - Instruction fetch: mem[PC]
  - Add operation: reg[rt] <- reg[rs] + SignExt(imm16)
  - Go to next instruction: PC <- PC + 4
- The same approach also covers the other immediate instructions.
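The only new piece of hardware here is the extender. The sketch below contrasts two modes:

- Sign extension, which ADDI needs.
- Zero extension, where the upper 16 bits are forced to 0.

The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16: int) -> int:
    """SignExt(imm16): copy bit 15 into the upper 16 bits of a 32-bit value."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def zero_ext16(imm16: int) -> int:
    """Zero extension: the upper 16 bits are forced to 0."""
    return imm16 & 0xFFFF


def addi(regs: list, rs: int, rt: int, imm16: int) -> None:
    """reg[rt] <- reg[rs] + SignExt(imm16), wrapped to 32 bits."""
    if rt != 0:                      # MIPS register $0 is hardwired to 0
        regs[rt] = (regs[rs] + sign_ext16(imm16)) & MASK32
```

Adding 0xFFFFFFFC (that is, -4) to 10 wraps around to 6 in 32 bits. That is why sign extension lets ADDI handle negative immediates with an ordinary adder.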
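Datapath for I-Type A/L Instructions
[Figure: the R-type datapath plus a RegDst mux choosing rd or rt as Rw, an Extender widening imm16 from 16 to 32 bits, and an ALUSrc mux choosing BusB or the extended immediate as the second ALU input]
- Note that we reuse the ALU.
- Extender: in some cases it must zero out the first (upper) 16 bits.
- In MIPS, the destination register is in different places in the instruction (rd for R-type, rt for I-type), so we need a mux (RegDst).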
I-Type Load/Store Instructions
- Instruction format
- RTL for load/store operations, e.g., LW
  - Instruction fetch: mem[PC]
  - Compute memory address: Addr <- reg[rs] + SignExt(imm16)
  - Load data into register: reg[rt] <- mem[Addr]
  - Go to next instruction: PC <- PC + 4
- How about store?
  - Same thing, except the third step becomes mem[Addr] <- reg[rt]
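Here is a behavioral sketch of the load and store RTL, including the store case. For brevity, memory is modeled as a Python dict from byte address to 32-bit word, which is an assumption of the sketch. The function names are ours.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16: int) -> int:
    """Signed value of a 16-bit immediate."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def lw(regs: list, mem: dict, rs: int, rt: int, imm16: int) -> int:
    """Addr <- reg[rs] + SignExt(imm16); reg[rt] <- mem[Addr]."""
    addr = (regs[rs] + sign_ext16(imm16)) & MASK32
    if rt != 0:
        regs[rt] = mem.get(addr, 0)      # unwritten locations read as 0 here
    return addr


def sw(regs: list, mem: dict, rs: int, rt: int, imm16: int) -> int:
    """Addr <- reg[rs] + SignExt(imm16); mem[Addr] <- reg[rt] (no register write)."""
    addr = (regs[rs] + sign_ext16(imm16)) & MASK32
    mem[addr] = regs[rt]
    return addr
```

Both instructions share the address computation. They differ only in the direction of the final transfer and in whether the register file is written.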
Datapath for Load/Store Instructions
[Figure annotations: the data memory needs a control signal, an address input, and 32 bits of data]
I-Type Branch Instructions
- Instruction format
- RTL for branch operations, e.g., BEQ
  - Instruction fetch: mem[PC]
  - Compute condition: Cond <- reg[rs] - reg[rt]
  - Calculate the next instruction's address:
    - if (Cond == 0) then PC <- PC + 4 + (SignExt(imm16) × 4)
    - else PC <- PC + 4
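Here is the branch RTL written out as a sketch. The condition is that the ALU's subtraction result is zero. The offset is a signed word count, hence the × 4. `beq_next_pc` is a hypothetical helper name.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16: int) -> int:
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """Cond <- reg[rs] - reg[rt]; PC <- PC + 4 (+ SignExt(imm16) x 4 if Cond == 0)."""
    cond = (rs_val - rt_val) & MASK32          # the ALU subtracts; Zero = (cond == 0)
    if cond == 0:
        return (pc + 4 + sign_ext16(imm16) * 4) & MASK32
    return (pc + 4) & MASK32
```

An offset of 0xFFFF (that is, -1) branches back to the branch instruction itself. The offset is relative to PC + 4, not PC.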
Datapath for Branch Instructions
[Figure: the I-type datapath (register file, RegDst and ALUSrc muxes, Extender on imm16, ALU with a Zero output) plus Next Addr Logic, which takes the PC and drives the Instruction Memory address]
- We'll define the Next Addr Logic next (it will need the PC and the Zero test condition from the ALU).
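Next Address Logic
[Figure: 30-bit next-address logic; labels: PC, ADD with CarryIn, MUX, SignExt on imm16 (16 bits in), Branch, Zero, Instruction Memory]
- The adder path contains PC + 4.
- Why 30 bits? Instructions are word-aligned, so the low 2 bits of the PC are always 00 (subtlety: see Chapter 5 in your text).
- We may not want to change the PC if the BEQ condition is not met. (This implicitly says the computation happens anyway, so we have to be sure we don't change things we don't want to change.)
- If this is a branch instruction AND Zero is set, the control signal can be generated automatically (Branch AND Zero).
When does the correct new PC become available? Can we do better?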
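The low two bits of an instruction address are always 00, so the logic can work on the 30-bit word address PC<31:2>. The sketch below assumes:

- One adder whose carry-in supplies the +1 word (+4 bytes).
- A mux, selected by Branch AND Zero, that adds either SignExt(imm16) or 0.

The function names are hypothetical. The wrapper shows the result matches the byte-address RTL from the branch slide.

```python
MASK30 = (1 << 30) - 1


def sign_ext16(imm16: int) -> int:
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_word_addr(pc30: int, imm16: int, branch: bool, zero: bool) -> int:
    """Next PC<31:2> = PC<31:2> + (mux output) + CarryIn, all 30 bits wide."""
    select = branch and zero                      # Branch AND Zero drives the mux
    addend = sign_ext16(imm16) if select else 0   # mux: SignExt(imm16) or 0
    carry_in = 1                                  # +1 word == +4 bytes
    return (pc30 + addend + carry_in) & MASK30


def next_byte_addr(pc: int, imm16: int, branch: bool, zero: bool) -> int:
    """Wrapper: byte PC in, byte PC out; the low two bits stay 00."""
    return next_word_addr(pc >> 2, imm16, branch, zero) << 2
```

Working in words removes the × 4: the immediate is already a word offset, so no shifter is needed on this path.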
J-Type Jump Instructions
- Instruction format
- RTL operations, e.g., J
  - Instruction fetch: mem[PC]
  - Set up PC: PC <- (PC + 4)<31:28> CONCAT (target<25:0> × 4)
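Here is the jump RTL as a sketch, with a hypothetical function name. The upper 4 bits come from PC + 4, not PC. The two differ only when PC + 4 crosses a 256 MB boundary.

```python
def jump_target(pc: int, target26: int) -> int:
    """PC <- (PC + 4)<31:28> CONCAT (target<25:0> x 4)."""
    upper4 = (pc + 4) & 0xF0000000            # bits 31..28 of PC + 4
    return upper4 | ((target26 & 0x03FFFFFF) << 2)
```

For instance, `jump_target(0x1FFFFFFC, 1)` gives 0x20000004, because PC + 4 already lies in the next 256 MB region.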
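Instruction Fetch Unit
[Figure: the next address logic extended for jumps; a Jump-controlled mux selects either the 30-bit sequential/branch address or PC<31:28> concatenated with Instruction<25:0>; other labels: ADD with CarryIn, SignExt on imm16, Branch, Zero, Instruction Memory]
- Why PC<31:28>? (subtlety: see page 383 in your text)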
A Single Cycle Datapath
[Figure: complete single-cycle datapath; labels include PCSrc, Add, 4, instruction memory, registers, ALU (ALUctr, 3 bits; Zero; ALU result), ALUSrc, sign extend, data memory (Address, Write data, Read data), MemRead, MemWrite, MemtoReg, RegWrite, and muxes]
Still to do: add Jump.
Control logic for a single cycle machine
Recall: Implementation Overview (simplest view of a Von Neumann, RISC μP)
The HW needed, plus control
[Figure: single cycle MIPS machine]
When we talk about control, we talk about these blocks.
Implementing Control
- Implementation steps (review):
  1. Identify control inputs and control outputs
  2. Make a control signal table for each cycle
  3. Derive control logic from the control table
- As you've seen (and as we'll review), this logic can take on many forms: combinational logic, ROMs, microcode, or combinations.
I promise this is not a hard thing to do. Don't be intimidated by the complex datapath.
Single Cycle Control Input/Output
- Control Inputs
- Opcode (6 bits)
- How about R-type instructions?
- Control Outputs
- RegDst
- ALUSrc
- MemtoReg
- RegWrite
- MemRead
- MemWrite
- Branch
- Jump
- ALUctr
Step 2: Make a control signal table for each cycle
Control Signal Table
[Table: control inputs (opcode) and control outputs for each instruction class, starting with R-type]
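The table itself is missing here. For reference, the sketch below encodes the single-cycle main-control truth table for this subset as it commonly appears in Patterson & Hennessy. It is extended with the Jump output listed on the previous slide. The opcodes are the standard MIPS values, and `None` marks a don't-care. Treat it as a reconstruction, not the original slide.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "Jump", "ALUOp")
X = None  # don't care

# opcode -> (instruction class, control outputs in SIGNALS order)
CONTROL_TABLE = {
    0b000000: ("R-type", (1, 0, 0, 1, 0, 0, 0, 0, 0b10)),
    0b100011: ("lw",     (0, 1, 1, 1, 1, 0, 0, 0, 0b00)),
    0b101011: ("sw",     (X, 1, X, 0, 0, 1, 0, 0, 0b00)),
    0b000100: ("beq",    (X, 0, X, 0, 0, 0, 1, 0, 0b01)),
    0b000010: ("j",      (X, X, X, 0, 0, 0, 0, 1, X)),
}


def main_control(opcode: int) -> dict:
    """Look up the control outputs for a 6-bit opcode."""
    _name, values = CONTROL_TABLE[opcode]
    return dict(zip(SIGNALS, values))
```

Only mux selects are ever don't-cares. Signals that change state (RegWrite, MemWrite, Branch, Jump) always have a definite value.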
Main control, ALU control
[Figure: Main Control takes the 6-bit OP (opcode) field and produces the 2-bit ALUOp plus the other control signals; ALU Control takes ALUOp and the 6-bit Func field and produces the 3-bit ALUctr that drives the ALU]
- Use the OP field to generate ALUOp (encoding)
  - This control signal is fed to the ALU control block
- Use the Func field and ALUOp to generate ALUctr (decoding)
  - Specifically, this sets 3 ALU control signals: B-Invert, Carry-in, operation
Main control, ALU control
Or in other words, ALUOp means:
- 00: ALU performs add
- 01: ALU performs sub
- 10: ALU does what the function code says (see p. 284 for more)
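Here is a behavioral sketch of the ALU control block. ALUOp 00 and 01 force add and subtract, and ALUOp 10 defers to the funct field. The 3-bit ALUctr codes follow the encoding on the Generating ALUctr slide: bit 2 is B-negate, and bits 1:0 select and/or/adder/less. The funct values are the standard MIPS codes for add, sub, and, or, slt.

```python
# ALUctr encodings: bit 2 = B-negate, bits 1:0 = output select
# (and = 00, or = 01, adder = 10, less = 11)
AND, OR, ADD, SUB, SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# funct codes (standard MIPS) for the R-type instructions in our subset
FUNCT_TO_ALUCTR = {0x20: ADD, 0x22: SUB, 0x24: AND, 0x25: OR, 0x2A: SLT}


def alu_control(alu_op: int, funct: int) -> int:
    """Decode ALUOp (from main control) plus funct into the 3-bit ALUctr."""
    if alu_op == 0b00:               # lw/sw: address calculation needs add
        return ADD
    if alu_op == 0b01:               # beq: compare by subtracting
        return SUB
    return FUNCT_TO_ALUCTR[funct]    # ALUOp = 10: "do what the function code says"
```

This two-level decoding keeps the main control small: it only has to know the instruction class, not every R-type function.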
Generating ALUctr
[Figure: ALU bit slice; the and, or, adder, and less outputs feed an output mux]
- ALUctr<1:0>: select ALU output (and = 00, or = 01, adder = 10, less = 11)
- ALUctr<2>: B-negate (drives both C-in and B-invert)
- Invert B and C-in must both be 1 for subtract
The Logic
This table is used to generate the actual Boolean logic gates that produce ALUctr.
We could generate the gates by hand; this is often done with software.
[Table: ALUctr<2:0> as a function of ALUOp1, ALUOp0 and the func<5:0> bits F3, F2, F1, F0 (X = don't care); example: ALUctr = 110 for SUB/BEQ]
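Below is one minimized set of gate equations consistent with the table, given as a sketch. The equations are written as bitwise operations on 0/1 values. The loop checks them against every row of the expected ALUctr table for this subset. Only F3 through F0 are needed, because F5 and F4 are the same (10) for every R-type instruction in the subset.

```python
def alu_ctr_gates(alu_op1, alu_op0, f3, f2, f1, f0):
    """Two-level logic for ALUctr<2:0>; inputs are 0/1 bits."""
    ctr2 = alu_op0 | (alu_op1 & f1)          # ALUctr<2> = ALUOp0 + ALUOp1·F1
    ctr1 = (1 - alu_op1) | (1 - f2)          # ALUctr<1> = ~ALUOp1 + ~F2
    ctr0 = alu_op1 & (f3 | f0)               # ALUctr<0> = ALUOp1·(F3 + F0)
    return (ctr2 << 2) | (ctr1 << 1) | ctr0


def bits(funct):
    """Return (F3, F2, F1, F0) of a 6-bit funct field."""
    return ((funct >> 3) & 1, (funct >> 2) & 1, (funct >> 1) & 1, funct & 1)


# Expected rows: (ALUOp, funct, ALUctr); funct is a don't care for ALUOp 00/01
ROWS = [(0b00, 0x00, 0b010), (0b01, 0x00, 0b110),
        (0b10, 0x20, 0b010), (0b10, 0x22, 0b110), (0b10, 0x24, 0b000),
        (0b10, 0x25, 0b001), (0b10, 0x2A, 0b111)]

for alu_op, funct, expected in ROWS:
    got = alu_ctr_gates(alu_op >> 1, alu_op & 1, *bits(funct))
    assert got == expected, (alu_op, hex(funct), got)
print("gate equations match the table")
```

For ALUOp 00 and 01, ALUOp1 = 0 masks out every F term. So whatever lands in the low bits of an lw, sw, or beq word cannot disturb ALUctr.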
Recall
[Figure: single cycle MIPS machine]
Recall that for MIPS we have to build a Main Control block and an ALU Control block.
Well, here's what we did
[Figure: single cycle MIPS machine]
We came up with the information needed to generate this logic, which fits here in the datapath.
Single cycle versus multi-cycle
Single Cycle Implementation
- Calculate the cycle time, assuming negligible delays except:
  - memory (2 ns), ALU and adders (2 ns), register file access (1 ns)
Single-Cycle Implementation (Cont'd)
- Single-cycle, fixed-length clock
  - CPI = 1
  - Clock cycle = propagation delay of the longest datapath operations among all instruction types
  - Easy to implement
- Single-cycle, variable-length clock
  - CPI = 1
  - Clock cycle = $\sum_i f_i \cdot d_i$, where $f_i$ is the fraction of type-$i$ instructions and $d_i$ is the propagation delay of the type-$i$ datapath operations
  - Better than the previous, but impractical to implement
- Disadvantages
  - What if we have floating-point operations?
  - How about component usage?
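To put numbers to the two slides above, here is a sketch that sums the given unit delays along each instruction class's path. It then compares a fixed-length clock with a variable-length clock. The instruction mix is hypothetical. The sketch also assumes the branch-target adder works in parallel with the ALU, so it adds no delay.

```python
# Delays from the slide (ns); everything else is assumed negligible.
MEM, ALU, REG = 2, 2, 1

# Functional units on each instruction class's critical path
PATHS = {
    "R-type": [MEM, REG, ALU, REG],       # fetch, read regs, ALU, write reg
    "lw":     [MEM, REG, ALU, MEM, REG],  # ... plus the data memory read
    "sw":     [MEM, REG, ALU, MEM],       # no register write-back
    "beq":    [MEM, REG, ALU],            # compare in the ALU
    "j":      [MEM],                      # fetch, then form the target
}
delay = {k: sum(v) for k, v in PATHS.items()}

# Fixed-length clock: every instruction pays for the slowest one.
fixed_cycle = max(delay.values())

# Variable-length clock, weighted by a HYPOTHETICAL instruction mix.
mix = {"R-type": 0.44, "lw": 0.24, "sw": 0.12, "beq": 0.18, "j": 0.02}
avg_cycle = sum(mix[k] * delay[k] for k in mix)

print(delay)                 # {'R-type': 6, 'lw': 8, 'sw': 7, 'beq': 5, 'j': 2}
print(fixed_cycle)           # 8
print(round(avg_cycle, 2))   # 6.34
```

With this mix the variable-length clock averages about 6.34 ns against 8 ns, roughly a 1.26× improvement. That gain is what makes the idea tempting even though it is impractical to build.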
Multiple Cycle Alternative
- Break an instruction into smaller steps
- Execute each step in one cycle
- Execution sequence
  - Balance the amount of work to be done
  - Restrict each cycle to use only one major functional unit
- At the end of a cycle
  - Store values for use in later cycles (why?)
  - Introduce additional internal registers
- The advantages
  - Cycle time is much shorter
  - Different instructions take different numbers of cycles to complete
  - A functional unit can be used more than once per instruction
Multiple-Cycle Implementation
- Datapath
  - Component sharing: ALU, instruction/data memory
    - ALU used to compute addresses and to increment the PC
    - Memory used for instructions AND data
  - Additional elements: MUXes, Instruction Register, Target Register
  - If a value needs to be alive during multiple cycles, it must stay unchanged during the whole time
- Control
  - Needed for each datapath element during each clock cycle
Five Step Execution
- 1. Instruction Fetch (Ifetch)
  - Fetch the instruction at address (PC)
  - Store the instruction in register IR
  - Increment the PC
- 2. Instruction Decode and Register Fetch (Decode)
  - Decode the instruction format, read registers
  - Store register contents in registers A and B
  - Compute the branch target address, store it in ALUOut
- 3. Execution, Memory Address Computation, or Branch Completion (Execute)
  - Compute the memory address (for LW and SW), or
  - Perform the R-type operation (for R-type instructions), or
  - Update the PC (for Branch and Jump)
  - Store the memory address or register operation result in ALUOut
Five Step Execution (cont'd)
- 4. Memory Access or R-type Instruction Completion (MemRead/RegWrite)
  - Read memory at address ALUOut, store it in MDR (LW), or
  - Write ALUOut content into the register file (R-type), or
  - Write B into memory at address ALUOut (SW)
- 5. Write-back step (WrBack)
  - Write the memory content read into the register file
- Number of cycles for an instruction:
  - R-type
  - lw
  - sw
  - Branch or Jump
An exercise for the user.
Some Simple Questions
- How many cycles will it take to execute this code?
      lw  $t2, 0($t3)
      lw  $t3, 4($t3)
      beq $t2, $t3, Label    # assume branch not taken
      add $t5, $t2, $t3
      sw  $t5, 8($t3)
  Label: ...
- What is going on during the 8th cycle of execution?
- In what cycle does the actual addition of $t2 and $t3 take place?
[Figure: timeline of clock cycles, marked 1, 5, 10, 15, 20]
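One way to check your answers is a sketch that lays the instructions end to end. It uses the cycle counts implied by the five-step description: R-type 4, lw 5, sw 4, branch or jump 3. The helper names are ours.

```python
# Cycles per instruction class in the five-step multi-cycle design
CYCLES = {"lw": 5, "sw": 4, "R": 4, "beq": 3, "j": 3}

program = ["lw", "lw", "beq", "R", "sw"]   # branch assumed not taken

starts, t = [], 1
for op in program:
    starts.append(t)                       # first cycle of this instruction
    t += CYCLES[op]
total = t - 1


def step_in(cycle):
    """Return (instruction index, step number) active in a given cycle."""
    for i, (s, op) in enumerate(zip(starts, program)):
        if s <= cycle < s + CYCLES[op]:
            return i, cycle - s + 1
    return None


print(total)              # 21
print(step_in(8))         # (1, 3): 2nd lw, step 3 (address computation)
print(starts[3] + 2)      # 16: the add's ALU operation happens in its step 3
```

Here the 8th cycle is step 3 (address computation) of the second lw, and the addition happens in cycle 16.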
Transition slide: 5 steps in detail
Step 1: Instruction Fetch
- Use the PC to get the instruction and put it in IR.
- Increment the PC by 4 and put the result back in the PC.
- Can you write this using RTL notation?
  - IR <- Memory[PC]; PC <- PC + 4
- What is the advantage of updating the PC now?
Step 2: Instruction Decode and Register Fetch
- Read registers rs and rt in case we need them
- Compute the branch address in case the instruction is a branch
- RTL (these happen in parallel):
  - A <- Reg[IR[25:21]]
  - B <- Reg[IR[20:16]]
  - ALUOut <- PC + (sign-extend(IR[15:0]) << 2)
- Did we set any control lines based on the instruction type? (We are busy "decoding" it in our control logic.)
Step 3 (Instruction dependent)
- The ALU performs 1 of 3 functions, based on the instruction type:
  - Memory reference: ALUOut <- A + sign-extend(IR[15:0])
  - R-type: ALUOut <- A op B
  - Branch: if (A == B) then PC <- ALUOut
Step 4 (R-type or memory access)
- Loads and stores access memory:
  - MDR <- Memory[ALUOut], or
  - Memory[ALUOut] <- B
- R-type instructions finish: Reg[IR[15:11]] <- ALUOut
- When does the write actually take place? At the end of the cycle, on the clock edge.
Step 5: Write-Back
- Reg[IR[20:16]] <- MDR
- What about all the other instructions?
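To see the five steps as a sequence of register transfers, here is a sketch that walks a single lw through them, one RTL group per clock cycle. The state dict, the memory dict keyed by byte address, and the function names are all hypothetical. The instruction word used in the test is lw $t0, 4($sp).

```python
MASK32 = 0xFFFFFFFF


def sext(imm16: int) -> int:
    """Signed value of the low 16 bits."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(s: dict, mem: dict, reg: list) -> int:
    """Walk one lw through steps 1-5; returns the number of cycles used."""
    # Step 1: IR <- Memory[PC]; PC <- PC + 4
    s["IR"] = mem[s["PC"]]
    s["PC"] = (s["PC"] + 4) & MASK32
    ir = s["IR"]
    # Step 2 (in parallel): A <- Reg[IR[25:21]]; B <- Reg[IR[20:16]];
    #                       ALUOut <- PC + (sign-extend(IR[15:0]) << 2)
    s["A"] = reg[(ir >> 21) & 0x1F]
    s["B"] = reg[(ir >> 16) & 0x1F]
    s["ALUOut"] = (s["PC"] + (sext(ir) << 2)) & MASK32   # speculative branch target
    # Step 3: ALUOut <- A + sign-extend(IR[15:0])
    s["ALUOut"] = (s["A"] + sext(ir)) & MASK32
    # Step 4: MDR <- Memory[ALUOut]
    s["MDR"] = mem[s["ALUOut"]]
    # Step 5: Reg[IR[20:16]] <- MDR
    rt = (ir >> 16) & 0x1F
    if rt != 0:                           # MIPS $0 stays 0
        reg[rt] = s["MDR"]
    return 5
```

Step 2's ALUOut is overwritten in step 3. Computing the branch target speculatively costs nothing, because the ALU would otherwise sit idle in that cycle.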
Single cycle
[Figure: single-cycle datapath]
Multi-cycle
[Figure: multi-cycle datapath]
(Now the critical path depends on the longest delay through the string of components used in any 1 of the 5 steps.)
- Where do we need to insert muxes?
- Other functional units?
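Execution Sequence Summary
[Table: RTL for each step and instruction class; only the common first two steps survive:]
- IR <- Memory[PC]
- PC <- PC + 4
- A <- Reg[IR[25:21]]
- B <- Reg[IR[20:16]]
- ALUOut <- PC + (SignExt(IR[15:0]) << 2)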
Multiple Cycle Design
- Break up instructions into steps; each step takes 1 cycle
  - Balance the work to be done
  - Restrict each cycle to use only 1 major functional unit
- At the end of a cycle
  - Store values for use in later cycles (the easiest thing to do)
  - Introduce additional internal registers
Control Signals
- New (multi-cycle):
  - PC: PCWrite, PCWriteCond, PCSource
  - Memory: IorD, MemRead, MemWrite
  - IR: IRWrite
  - Reg. File: RegWrite, MemtoReg, RegDst
  - ALU: ALUSrcA, ALUSrcB, ALUOp, ALUCnt.
- Old (single-cycle): RegDst, MemToReg, RegWrite, MemRead, MemWrite, Branch, ALUSrc, ALUOp, ALUCnt.
Implementing the Control
- The values of the control signals depend upon
  - what instruction is being executed
  - which step is being performed
- Use this accumulated information to specify a finite state machine:
  - use a state diagram, or
  - use microprogramming
- The implementation can be derived from the specification
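Graphical Specification of FSM
[Figure: state diagram, reconstructed below as a list; each state asserts the signals shown, and unlisted signals are deasserted or don't care]
- State 0, Instruction fetch (start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00 → state 1
- State 1, Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 → state 2 (LW or SW), 6 (R-type), 8 (BEQ), or 9 (J)
- State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00 → state 3 (LW) or 5 (SW)
- State 3, Memory access (LW): MemRead, IorD = 1 → state 4
- State 4, Memory read completion: RegDst = 0, RegWrite, MemtoReg = 1 → state 0
- State 5, Memory access (SW): MemWrite, IorD = 1 → state 0
- State 6, Execution (R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10 → state 7
- State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0 → state 0
- State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01 → state 0
- State 9, Jump completion: PCWrite, PCSource = 10 → state 0
This tells us what values are needed and during which step.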
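The state diagram translates directly into a next-state function plus a table of outputs. This sketch uses the standard MIPS opcodes and the state numbers above. Unlisted signals are taken as 0 or don't care, and the function names are hypothetical.

```python
# Standard MIPS opcodes for the subset
LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02

# Asserted outputs per state; anything not listed is 0 or don't care.
OUTPUTS = {
    0: dict(MemRead=1, ALUSrcA=0, IorD=0, IRWrite=1, ALUSrcB=0b01,
            ALUOp=0b00, PCWrite=1, PCSource=0b00),               # fetch
    1: dict(ALUSrcA=0, ALUSrcB=0b11, ALUOp=0b00),                # decode
    2: dict(ALUSrcA=1, ALUSrcB=0b10, ALUOp=0b00),                # address
    3: dict(MemRead=1, IorD=1),                                  # lw access
    4: dict(RegDst=0, RegWrite=1, MemtoReg=1),                   # lw write-back
    5: dict(MemWrite=1, IorD=1),                                 # sw access
    6: dict(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b10),                # R-type execute
    7: dict(RegDst=1, RegWrite=1, MemtoReg=0),                   # R-type completion
    8: dict(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b01,
            PCWriteCond=1, PCSource=0b01),                       # branch completion
    9: dict(PCWrite=1, PCSource=0b10),                           # jump completion
}


def next_state(state: int, op: int) -> int:
    if state == 0:
        return 1
    if state == 1:                              # dispatch on the opcode
        return {LW: 2, SW: 2, RTYPE: 6, BEQ: 8, J: 9}[op]   # KeyError if unknown
    if state == 2:
        return 3 if op == LW else 5
    if state in (3, 6):
        return state + 1                        # 3 -> 4, 6 -> 7
    return 0                                    # 4, 5, 7, 8, 9: back to fetch


def trace(op: int) -> list:
    """States visited by one instruction, starting at instruction fetch."""
    states, s = [0], next_state(0, op)
    while s != 0:
        states.append(s)
        s = next_state(s, op)
    return states
```

Tracing each opcode visits 5, 4, 4, 3, and 3 states for lw, sw, R-type, beq, and j. This matches the cycle counts of the five-step execution.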
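Finite State Machine for Control
[Figure: control FSM; a block of control logic (could be a ROM, logic, etc.) takes the opcode and the current state as inputs and produces the datapath control outputs and the next state]
- The control logic is inside this box (it could be implemented in many different ways).
- The outputs we want now also depend on the current state.
- The inputs now also include the previous state.
- (We still might need the ALU control logic, and hence the function code, developed earlier.)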
Microprogramming
- For our example, state diagrams and combinational logic are more than adequate
  - But we're dealing with a small subset of the MIPS processor
  - The full MIPS instruction set has over 100 instructions
  - In one implementation, instructions take from 1 to 20 clock cycles
  - Control would be much more complex in this case
- Another alternative: microcoding
  - Think of the control signals that must be asserted in a state as an instruction to be executed by the datapath
  - Call these microinstructions
Micro-instructions
- Microinstruction
  - The set of datapath control signals that must be asserted in a given state
  - Executing it has the effect of asserting the control signals specified by the microinstruction
- How do we sequence?
  - In some cases, fetch the next microinstruction
    - The next microinstruction depends only on the state
  - In others, consider inputs
    - i.e., the next microinstruction depends on the state and the inputs
  - Like assembly language, we must branch explicitly
- Microprogramming
  - Designing control as a program that implements machine instructions in simpler terms
Microprogramming guidelines
- Make each field of the microinstruction responsible for specifying a non-overlapping set of control signals
  - Signals that are never asserted simultaneously may share the same field
- Have fields that a.) control datapath elements and b.) handle sequencing (i.e., selecting the next microinstruction)
- Microinstructions are usually stored in a ROM or PLA
  - Therefore we can assign addresses
  - Like choosing state numbers for FSM states
Example fields
[Table: microinstruction fields and the signals each controls]
Choosing the next instruction
- How do we choose what's next?
  - Increment the address of the current microinstruction to obtain the next
    - Put Seq in the sequencing field
    - (Most common case, usually the default)
  - Branch to the microinstruction that begins execution of the next MIPS instruction
    - Place Fetch in the sequencing field
  - Choose the next microinstruction based on control unit inputs
    - This is called a dispatch
    - Usually implemented by creating a table containing the addresses of the target microinstructions
    - (May be implemented in a ROM)
Dispatch tables
- Often (and realistically) there is more than 1
- Example: the state diagram constructed earlier
  - We would need 2 dispatch tables here
    - 1 to dispatch from state 1
    - 1 to dispatch from state 2
- Indicate that the next microinstruction should be chosen by a dispatch operation by placing "Dispatch i" in the sequencing field (i is the table number)
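Here is a sketch of a microcode sequencer with the two dispatch tables. Only the label and sequencing fields are modeled; the datapath-control fields are omitted. The microprogram layout is modeled on the labels (Fetch, Mem1, LW2, SW2, Rformat1, BEQ1, JUMP1) used in the Patterson & Hennessy microprogram this lecture follows. Addresses and function names are otherwise our own.

```python
# (label, sequencing field) per microinstruction; index = microcode address
MICROPROGRAM = [
    ("Fetch",    "Seq"),
    (None,       "Dispatch 1"),
    ("Mem1",     "Dispatch 2"),
    ("LW2",      "Seq"),
    (None,       "Fetch"),
    ("SW2",      "Fetch"),
    ("Rformat1", "Seq"),
    (None,       "Fetch"),
    ("BEQ1",     "Fetch"),
    ("JUMP1",    "Fetch"),
]
ADDR = {label: i for i, (label, _) in enumerate(MICROPROGRAM) if label}

LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02
DISPATCH = {   # dispatch tables, indexed by opcode
    1: {RTYPE: "Rformat1", J: "JUMP1", BEQ: "BEQ1", LW: "Mem1", SW: "Mem1"},
    2: {LW: "LW2", SW: "SW2"},
}


def next_upc(upc: int, op: int) -> int:
    """Apply the sequencing field of the microinstruction at address upc."""
    seq = MICROPROGRAM[upc][1]
    if seq == "Seq":
        return upc + 1                     # increment the micro-PC
    if seq == "Fetch":
        return ADDR["Fetch"]               # begin the next MIPS instruction
    table = int(seq.split()[1])            # "Dispatch i" -> table i
    return ADDR[DISPATCH[table][op]]


def run(op: int) -> list:
    """Microcode addresses executed for one MIPS instruction."""
    upc, visited = ADDR["Fetch"], []
    while True:
        visited.append(upc)
        upc = next_upc(upc, op)
        if upc == ADDR["Fetch"]:
            return visited
```

The addresses visited for each opcode line up one-for-one with the FSM states. The microprogram is just another encoding of the same control.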
Recall
[Figure: the multi-cycle control FSM state diagram from the Graphical Specification of FSM slide, repeated]
Possible Values
[Table: possible values for each microinstruction field]
Creating the microprogram
- In a microprogram, there are 2 situations where we could leave a field of a microinstruction blank:
  - When a field that controls a functional unit or causes state to be written (i.e., the Memory field or the ALU dest field) is blank, no control signals should be asserted
  - When a field only specifies control of a multiplexor that determines the input to a functional unit (i.e., SRC1), leaving it blank means that we do not care about the input to the functional unit (or the output of the multiplexor)
Example
- The first component of every instruction execution is to fetch the instruction, decode it, and compute the sequential and branch target PC
  - These correspond directly to the first 2 steps of execution described earlier (see pp. 385-388)
  - The 2 microinstructions needed for the first two steps are below
[Table: the two Fetch microinstructions]
Example
- To understand each microinstruction, look at the effect of a group of fields
  - In the first microinstruction, the fields asserted and their effects are:
[Table: fields asserted and their effects]
- Label field: contains the label Fetch, which will be used in the Sequencing field when the microprogram wants to start execution of the next instruction.
The entire microprogram
[Table: the complete microprogram]
Control Example
- Can you generate the control signal table?
- How about a micro-programmed implementation?
Sample Microinstruction
- Ifetch: IR <- Mem[PC]; PC <- PC + 4
- Microinstruction: 1d011ddd000100d11 (d = don't care)
A few words on MIPS exceptions
What is an exception?
- Exception
  - An event other than a branch or a jump that changes the normal flow of instruction execution
  - Often called an interrupt as well
- Examples
Processing exceptions
- For the OS to process an exception, it must know why it was caused and which instruction caused it
  - (i.e., arithmetic exception, invalid instruction)
- One method (used in MIPS)
  - Have a status register called the Cause register
  - It holds a field that indicates the reason for the exception
- Another method: vectored interrupts
  - The address to which control is transferred is determined by the cause of the exception
  - The OS knows the reason for the exception by the address at which it is initiated
Need more HW
- To process exceptions we need more HW
- EPC
  - A 32-bit register that holds the address of the affected instruction
  - (Needed even with vectored interrupts)
- Cause
  - Register used to record the cause of the exception
  - In MIPS, 32 bits
- We'll also need 2 more control signals
  - EPCWrite and CauseWrite
Finally, augmenting our FSM
[Figure: the multi-cycle control FSM from before (states 0 to 9, unchanged) with two new exception states]
- State 10, undefined instruction (reached from state 1 when Op = other): IntCause = 0, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11 → state 0
- State 11, arithmetic overflow (reached on Overflow during an R-type instruction): IntCause = 1, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11 → state 0
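Here is a behavioral sketch of what states 10 and 11 do in one cycle. `HANDLER_ADDR` is a placeholder, not a claim about where a particular MIPS implementation jumps. The overflow helper models the detection that sends an R-type add to state 11.

```python
MASK32 = 0xFFFFFFFF
HANDLER_ADDR = 0x80000180   # HYPOTHETICAL exception handler address for this sketch

UNDEFINED_INSTRUCTION, ARITH_OVERFLOW = 0, 1   # IntCause values


def take_exception(cpu: dict, int_cause: int) -> None:
    """States 10/11: Cause <- IntCause; EPC <- PC - 4; PC <- handler."""
    cpu["Cause"] = int_cause                  # CauseWrite
    cpu["EPC"] = (cpu["PC"] - 4) & MASK32     # EPCWrite; ALUSrcB = 01 (4), ALUOp = 01 (subtract)
    cpu["PC"] = HANDLER_ADDR                  # PCWrite, PCSource = 11


def add_with_overflow_check(a: int, b: int) -> tuple:
    """Signed 32-bit add; returns (result, overflowed)."""
    def signed(x):
        return x - (1 << 32) if x & 0x80000000 else x
    total = signed(a) + signed(b)
    return total & MASK32, not (-(1 << 31) <= total < (1 << 31))
```

Subtracting 4 matters because the PC was already incremented during instruction fetch. EPC therefore ends up holding the address of the offending instruction itself.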
CS 2200 Lecture 7: Interrupts, Memory-Mapped I/O
Interrupts
- What's an interrupt?
  - One idea: an unsolicited procedure call
  - The actual procedure called is an exception/trap/interrupt handler
- Why do we need them?
  - Or put another way: what would we have to do if we didn't have them?
  (Example: constantly or periodically check I/O, peripheral devices, etc.)
Interrupts
- How can interrupts be generated?
Interrupts
- Different types (2200 definitions)
  - Exception: associated with a certain instruction
    - Overflow
    - Illegal instruction
  - Trap: system calls
  - Interrupt: an asynchronous event not associated with a certain instruction (e.g., an I/O device)
Interrupts/Exceptions/Traps
Interrupts
- Hardware
  - The system bus contains 1 or more interrupt lines
  - Need to know who is interrupting
    - Might put a device type code on the data lines
    - Might put the address of a table entry
    - Might put the address of the handling routine
  - May have a priority scheme
    - What would priority be based on?
    - How would it work?
  - What has to happen?
    (i.e., what do we do? Consider the case where the interrupt is caused by HW.)
Interrupts
- Hardware (continued)
  - Save the current PC on the stack
    - Why the stack?
    - Other possibilities?
  - Go somewhere to handle the interrupt
    - Check each device?
      - Must be quick
    - Interrupt vector table
      - Located in low memory
      - Table of pointers
      (The interrupt might tell the CPU to go to this table; the specific location holds a pointer to the routine that handles it, analogous to assembly code.)
Interrupts
- Hardware (continued)
  - What if we get interrupted while handling an interrupt?
  - What do we do when handling the interrupt is complete?
    - Special instruction: RETI
  - Can a user disable interrupts?
    - (Consider: disable interrupts, followed by while(1);)
Interrupts
- Software
  - System call (monitor call)
    - Why do we need such a construct?
  - Concept of mode
    - Mode bit
    - User mode
      - Can execute a limited instruction set
    - Supervisor, kernel, or monitor mode
      - Used by the OS
      - Can execute all instructions
      - Switch to user mode before returning to the user
Interrupts
- Interrupt handler code
  - Like a function
  - Pointed to by the vector table or by an address supplied by the device
  - Must save the state of the interrupted process (very much like a procedure call)
Today: Interrupts
- A. Running example: an I/O device
  - e.g., a network interface
- B. Interrupt mechanics: hardware
- C. Interrupt mechanics: software (handlers)
- D. Aside: CPU load of interrupts
- E. Generalizing interrupts/exceptions/traps
  - and connecting back to protection
A. Running Example
- I/O device: a network interface
Network Interface (NI)?
Crude Network Interface: input-only
- 1. The network sends us messages: we need some state to store those messages
- 2. We need to know that messages have arrived
- 3. We need some scheme to be sure we read a message before the network overwrites it
Crude Network Interface
- 1. Data area
- DAV bit (Data AVailable bit): 2. set by the network, 3. reset by software
How to connect it? Make it look like another memory unit
- Could use combinational logic in control to help check/process
Memory-Mapped I/O
- The NI is a 17-word block mapped to 0xF0000000
- Existing 1024-word memory at 0x00000000
- How do you wire up two memory units?
  - hardware question
- How do you read messages from the NI?
  - software question
[Figure: LC-2200 address space, with marks at 0xFFFFFFFF, 0xF0000000, 0x000003FF, and 0x00000000]
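Here is a sketch that addresses both questions at once:

- An address decoder that steers each access to the memory or to the NI.
- A software routine that reads a message only when DAV is set.

Addresses are treated as word addresses (the 1024-word memory spans 0x000 to 0x3FF). The NI layout, 16 data words followed by a DAV word, is an assumption made for illustration; the slides do not specify it.

```python
MEM_BASE, MEM_WORDS = 0x00000000, 1024
NI_BASE, NI_WORDS = 0xF0000000, 17
NI_DAV = 16   # HYPOTHETICAL layout: words 0-15 = data area, word 16 = DAV bit


class Bus:
    """Two 'memory units' on one bus; addresses are word addresses."""

    def __init__(self):
        self.mem = [0] * MEM_WORDS
        self.ni = [0] * NI_WORDS

    def select(self, addr: int):
        """Address decoder: pick the unit that responds and the offset within it."""
        if MEM_BASE <= addr < MEM_BASE + MEM_WORDS:
            return self.mem, addr - MEM_BASE
        if NI_BASE <= addr < NI_BASE + NI_WORDS:
            return self.ni, addr - NI_BASE
        raise ValueError(f"no device at {addr:#010x}")

    def load(self, addr: int) -> int:
        unit, offset = self.select(addr)
        return unit[offset]

    def store(self, addr: int, value: int) -> None:
        unit, offset = self.select(addr)
        unit[offset] = value


def read_message(bus: Bus):
    """Software side: read the data area only if DAV is set, then reset DAV."""
    if bus.load(NI_BASE + NI_DAV) == 0:
        return None                                     # nothing has arrived
    message = [bus.load(NI_BASE + i) for i in range(NI_DAV)]
    bus.store(NI_BASE + NI_DAV, 0)                      # reset by software
    return message
```

In hardware, `select` corresponds to comparing the upper address bits against each unit's base. In this sketch, the network side would set `bus.ni[NI_DAV] = 1` after filling the data area.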
Memory-Mapped Devices
- Network, disk, display, sound, keyboard, mouse
- Add the data/control registers of each to the address space
- And continuously check for input??
B. Interrupt Mechanics: Hardware
Interrupts
Interrupts
[Figure (used on the following slides): Processor, Device 1, and Device 2 connected by an Address Bus and a Data Bus, plus an Int line and, later, an Inta line]
Add an interrupt request line. A device wishing to interrupt asserts this line.
Interrupts
The interrupt line is connected to the processor control (state machine).
Interrupts
At the beginning of every instruction execution sequence, a check is made on the status of the "int" line.
Interrupts
If "int" is asserted, special states can be used to handle the interrupt.
Interrupts
If the processor decides to handle the interrupt, it asserts the inta (interrupt acknowledge) line.
Interrupts
If Device 1 was one of the devices asserting "int", it receives the acknowledgement and doesn't pass it on.
Interrupts
If Device 1 wasn't one of the devices asserting "int", it receives the acknowledgement and passes it on.
Interrupts
Assume it's Device 2 that wants to interrupt.
Interrupts
Now knowing that the processor is listening, Device 2 can put the address of its entry in the interrupt vector table onto the data bus.
Interrupts
[Figure: as before, plus Memory holding the interrupt vector table]
The interrupt vector table is located in very low memory and consists of a table of pointers to interrupt handling routines.
Interrupts
This allows the processor to jump to the code to handle the interrupt.
Interrupts
Once complete, the handler executes a "return from interrupt" instruction.
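The sequence on these slides condenses into a sketch:

1. Check INT at each instruction boundary.
2. If it is asserted, take the device's vector and index the table in low memory.
3. Save the PC and jump to the handler.

`VECTOR_TABLE_BASE`, the word-addressed PC increment, and the `saved_pc` slot are assumptions. Where to keep the return address is exactly the open question raised later.

```python
VECTOR_TABLE_BASE = 0x0   # HYPOTHETICAL: vector table starts at address 0 (low memory)


def instruction_boundary(cpu: dict, memory: dict, int_line: bool, vector: int = 0) -> str:
    """Check INT before the next instruction; take the interrupt if asserted."""
    if int_line:
        # CPU asserts INTA; the interrupting device drives `vector` on the data bus.
        cpu["saved_pc"] = cpu["PC"]                        # return address
        cpu["PC"] = memory[VECTOR_TABLE_BASE + vector]     # pointer to the handler
        return "interrupt"
    cpu["PC"] += 1      # stand-in for executing the current (word-addressed) instruction
    return "normal"


def reti(cpu: dict) -> None:
    """Return from interrupt: resume the interrupted program."""
    cpu["PC"] = cpu["saved_pc"]
```

The table adds one memory read of indirection, but devices never need to know where handler code lives. The OS can change a handler by rewriting a table entry.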
Hardware Mechanics Summary
- 1. Interrupt signal (INT)
  - devices-to-CPU
- 2. Interrupt Acknowledge (IACK)
  - CPU-to-devices
- 3. Forced procedure call to the interrupt handler
Hardware Mechanics Summary: Subtleties
- 1. Interrupt signal (INT)
  - devices-to-CPU
- 2. Interrupt Acknowledge (IACK)
  - CPU-to-devices
  - With multiple interrupts, which device goes first??
- 3. Forced procedure call to the interrupt handler
  - How do you get the address of the interrupt handler??
  - Where do you keep the return address?
- n. Potential recursion
  - What if you get an interrupt while servicing an interrupt??
IACK Problem: one solution is to daisy-chain the IACK line
[Figure: Processor, Device 1, and Device 2 on the Address and Data Buses; the Inta line runs from the processor to Device 1 and then on to Device 2]
Limitations? Alternatives?
If Device 1 was one of the devices asserting "int", it receives the acknowledgement and doesn't pass it on.
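Here is a sketch of the daisy-chained acknowledge. INTA enters at the device nearest the processor, and each device either absorbs it or passes it along. The list-of-booleans interface is hypothetical.

```python
def daisy_chain_iack(requesting: list) -> int:
    """Propagate INTA from the device nearest the CPU outward.

    requesting[i] is True if device i (0 = closest to the CPU) is asserting INT.
    Returns the index of the device that keeps the acknowledge, or -1 if none.
    """
    for i, wants_service in enumerate(requesting):
        if wants_service:
            return i          # absorbs INTA and does not pass it on
        # otherwise this device passes INTA to the next device in the chain
    return -1
```

Position in the chain fixes the priority. A busy device near the processor can therefore starve devices further down the chain.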
Which-Handler Problem
(i.e., how do we handle the interruption in the CPU?)
- Options?
  - 1. One handler: leave dispatch to software!
  - 2. Interrupt vector table
    - device provides a number at IACK time
    - CPU (microcode) uses the number to index into a table
    - CPU jumps to the address in that table
    - Illustrated in the preceding slides
  - 3. Raw vector
    - device provides an address at IACK time and the CPU jumps there
    - used in Project 2
Crude Network Interface à la Project 2
- Add an 18th word, NIVEC: a pointer to the interrupt handler
Return-Address Problem
- A standard procedure call uses JALR and saves the return address in register RA
- An interrupt procedure call can't use RA
  - it's unpredictable and would smash whatever is there!
- Options?
  - many...
  - Last time: PRJ2 dedicates a processor register, K0
Recursive Interrupt Problem
[Figure: Processor, Device 1, Device 2, and Memory (interrupt vector table) on the Address and Data Buses, with Int and Inta lines]
What if Device 2 interrupts while the handler for Device 1 is running? Or vice versa? Or a double interrupt from the same device?
Recursive Interrupt Problem
[Figure: as above, with an "intr enable" bit (shown as 0) inside the processor]
Add an interrupt enable bit to the processor:
1. cleared at interrupt time
2. set at RETI time
3. EI/DI instructions
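Here is a sketch of the interrupt enable bit with the three rules above. The `CPU` class and its single `saved_pc` slot are assumptions for illustration.

```python
class CPU:
    def __init__(self):
        self.pc = 0
        self.int_enable = True
        self.saved_pc = None

    def check_interrupt(self, int_line: bool, handler: int) -> bool:
        """At an instruction boundary: take the interrupt only if enabled."""
        if int_line and self.int_enable:
            self.saved_pc = self.pc
            self.int_enable = False     # rule 1: cleared at interrupt time
            self.pc = handler
            return True
        return False                    # request stays pending on the INT line

    def reti(self) -> None:
        self.pc = self.saved_pc
        self.int_enable = True          # rule 2: set at RETI time

    def ei(self) -> None:
        self.int_enable = True          # rule 3: EI instruction

    def di(self) -> None:
        self.int_enable = False         # rule 3: DI instruction
```

With a single saved-PC slot, re-enabling interrupts inside a handler (EI) would let a second interrupt overwrite it. That is why nested handlers save state on the stack before re-enabling.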
C. Interrupt Mechanics: Software
Example Device Interrupt (say, arrival of a network message)
User program, interrupted by an external interrupt:
    ...
    add  r1,r2,r3
    subi r4,r1,4
    slli r4,r4,2
        --- Hiccup(!): external interrupt arrives here ---
    lw   r2,0(r4)
    lw   r3,4(r4)
    add  r2,r2,r3
    sw   8(r4),r2
    ...
Interrupt handler:
    Save registers          (callee save)
    ...
    lw   r1,20(r0)          (code to handle the interrupt)
    lw   r2,0(r1)
    addi r3,r0,5
    sw   0(r1),r3
    ...
    Restore registers       (callee restore)
    Clear current Int       (reset bit)
    RETI                    (return from interrupt)
Interrupt Mechanisms
- Basic mechanism: a forced subroutine call (transfer of control with a saved return address)
- Must have a means to disable interrupts to prevent nested, recursive interrupts
  - one bit
- Additions for performance
  - selective disable of multiple interrupt sources (priority level or a bit per source)
  - hardware to encode the source of the interrupt
(If another interrupt comes along, we wait or keep trying to send.)
Nested Interrupts
(If a higher-priority interrupt comes along, we could process it first.)
User program:
    ...
    add  r1,r2,r3
    subi r4,r1,4
    slli r4,r4,2
        --- Hiccup(!): network interrupt arrives here ---
    lw   r2,0(r4)
    lw   r3,4(r4)
    add  r2,r2,r3
    sw   8(r4),r2
    ...
Network interrupt handler:
    Raise priority
    Reenable all ints       (could now be interrupted by the disk)
    Save registers
    ...
    lw   r1,20(r0)
    lw   r2,0(r1)
    addi r3,r0,5
    sw   0(r1),r3
    ...
    Restore registers
    Clear current int
    Disable all ints
    Restore priority
    RTE
Note that priority must be raised to avoid recursive interrupts!
Example Handler
- Init code
  - Write to the NIVEC register
- Handler code
  - save all registers used by the handler to the stack
  - do the handler action
  - restore all registers used by the handler from the stack
  - JALR K0, ZERO
D. CPU Load of Interrupts
- Interrupts cost some CPU time
Suppose we have lots of devices
[Figure: Processor with many devices (Device 1, Device 2, ..., Device 37) on the Address and Data Buses]
All generating interrupts...
How do you know there's enough CPU time?
    Device     Rate     Handler time
    -------    -----    ------------
    Network    100/s    1 ms
    Display    50/s     10 ms
- What fraction of the CPU is consumed by interrupts?
- Could we add a sound card if it took 5 ms, 100/s?
How do you know there's enough CPU time?
    Device     Rate     Handler time    CPU load
    -------    -----    ------------    --------
    Network    100/s    1 ms            10%
    Display    50/s     10 ms           50%
- Network: 100 int/s × 1 ms/int × 1 s/1000 ms = 0.1
- Display: 50 int/s × 10 ms/int × 1 s/1000 ms = 0.5
- Sound card: 100 int/s × 5 ms/int × 1 s/1000 ms = 0.5
- What fraction of the CPU is consumed by interrupts? → 60%
- Could we add a sound card if it took 5 ms, 100/s? → That would be 50% more... no! 60% + 50% > 100%
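Here is the same arithmetic as a sketch. Each device contributes rate × handler time, and the sum is the fraction of each second spent in handlers. The device table mirrors the slide; the function name is ours.

```python
devices = {
    # name: (interrupts per second, handler time in ms)
    "network": (100, 1),
    "display": (50, 10),
}


def cpu_fraction(devs: dict) -> float:
    """Fraction of each second spent in handlers: sum of rate x time."""
    return sum(rate * ms / 1000 for rate, ms in devs.values())


load = cpu_fraction(devices)
with_sound = cpu_fraction({**devices, "sound": (100, 5)})
print(f"{load:.0%}")         # 60%
print(with_sound > 1.0)      # True: 110% of the CPU is not available
```

Anything above 1.0 means the handlers alone would need more than the whole CPU, before any user program runs.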
E. Generalization
- Interrupts for internal events
- Interrupts as part of protection
Interrupt/Exception/Trap Classifications
- Interrupts: caused by asynchronous, outside events
  - I/O devices requiring service (disk, network)
  - Clock interrupts (real-time scheduling)
- Exceptions: relevant to the current instruction
  - Faults, arithmetic traps, other synchronous traps
- Traps: deliberately caused by the current instruction
  - Invoke software on behalf of the currently executing process
- Other, e.g., hardware failure
  - Non-recoverable ECC error, power outage, FPU is on fire...
  - asynchronous
  - not necessarily recoverable
Interrupt/Exception/Trap Classifications
WARNING: Inconsistent Terminology Zone (the first of several, unfortunately)
- Interrupts: caused by asynchronous, outside events
- Exceptions: synchronous but unintentional
- Traps: synchronous, intentional
- HP (Hennessy & Patterson): "exceptions", of which some are interrupts
- SGG (Silberschatz, Galvin, Gagne): "interrupts", of which some are exceptions/traps
- Occasionally seen:
  - fault (as in page fault... an exception in our terminology)
  - machine check (unrecoverably fatal condition)
Interrupts and Protection
- Interrupts and protection are orthogonal
- However, conventionally, interrupts switch into supervisor (kernel) state
  - some interrupt handlers must be protected
  - deliberately invoked traps (software traps) make a nice interface for system calls
  - therefore, it has been convenient to have all interrupts go to the kernel
Summary (note: a wrap-up visualization follows)
- A. I/O devices memory-map their state
- B. Interrupt mechanics: hardware
- C. Interrupt mechanics: software (handlers)
- D. CPU load of interrupts: compute the % of time
- E. General mechanism: interrupts/exceptions/traps
Visualization of Program Execution
[Figure: PC (memory address) plotted against time]
Visualization of Program Execution
[Figure: PC (memory address) against time, marking a procedure call, a loop, and an interrupt]
Program Execution w/Protection
[Figure: PC (memory address) against time, split into user space and kernel space; marks a loop, a system call, and an interrupt]
1. Interrupts go to kernel mode. 2. System calls switch to kernel mode to interact with I/O.
Program Execution w/Protection (w/I/O)
[Figure: as above, with an additional I/O (kernel) space region]
Bonus Slides
Example Device Interrupt (say, arrival of a network message)
[Repeat of the user program and prioritized handler listing from the Nested Interrupts slide]
Alternative: Polling (again, for arrival of a network message)
    Disable Network Intr
    ...
    subi r4,r1,4
    slli r4,r4,2
    lw   r2,0(r4)
    lw   r3,4(r4)
    add  r2,r2,r3
    sw   8(r4),r2
    lw   r1,12(r0)          (polling point: check device register)
    beq  r1,no_mess
    lw   r1,20(r0)          (handler)
    lw   r2,0(r1)
    addi r3,r0,5
    sw   0(r1),r3
    Clear Network Intr
no_mess:
    ...
Delays of Interrupts/Polling
- Interrupts
  - disrupt the pipeline (usually must wait for a pipeline flush)
  - save/restore registers
  - other housekeeping (priority adjustments, kernel stuff)
- Polling
  - must perform the check whether there's an event waiting to be processed or not
  - if the check is periodic, event delivery is delayed by half a period on average if events arrive at random
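Here is where the half-period figure comes from. If arrivals are spread uniformly over a polling period, the wait until the next poll averages half the period. The sketch below spreads arrivals evenly, as a deterministic stand-in for random arrivals, and averages the wait. The function name is hypothetical.

```python
def average_polling_delay(period: float, samples: int = 1000) -> float:
    """Average wait between an event's arrival and the next poll.

    Arrivals are spread evenly over one polling period; each waits until
    the poll at the end of that period.
    """
    total = 0.0
    for k in range(samples):
        arrival = (k + 0.5) * period / samples   # midpoint of each time slot
        total += period - arrival                # time until the next poll
    return total / samples
```

For a 10 ms polling period this returns 5 ms, half the period. The worst case is still a full period.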
Is Polling faster or slower than Interrupts?
- Polling is faster!
  - The compiler knows which registers are in use at the polling point. Hence, it does not need to save and restore registers (or not as many).
  - Other interrupt overhead is avoided (pipeline flush, trap priorities, etc.).
- Interrupts are faster!
  - The overhead of polling instructions is incurred regardless of whether or not the handler is run. This could add to inner-loop delay.
  - The device may have to wait for service for a long time.
- When to use one or the other?
  - Multi-axis tradeoff
  - Frequent, regular events are good for polling, as long as the device can be controlled at user level.
  - Interrupts are good for infrequent/irregular events.
  - Interrupts are good for ensuring predictable service of events.