Title: INSTRUCTION PIPELINING
1 INSTRUCTION PIPELINING
2 What is pipelining?
- Instruction pipelining improves CPU performance by
overlapping instruction fetch and execution.
- The 8086 microprocessor has two blocks:
- BIU (Bus Interface Unit)
- EU (Execution Unit)
- The BIU performs all bus operations, such as
fetching instructions, reading and writing operands
in memory, and calculating the addresses of memory
operands. The instruction bytes are transferred to
the instruction queue.
- The EU executes instructions from the instruction
byte queue.
- Both units operate asynchronously, which gives the
8086 an overlapping instruction fetch and execution
mechanism called pipelining.
3 INSTRUCTION PIPELINING
- The first stage fetches the instruction and
buffers it.
- When the second stage is free, the first stage
passes it the buffered instruction.
- While the second stage is executing the
instruction, the first stage takes advantage of
any unused memory cycles to fetch and buffer the
next instruction.
- This is called instruction prefetch or fetch
overlap.
4 Inefficiency in two stage
instruction pipelining
- There are two reasons for this inefficiency:
- The execution time will generally be longer than
the fetch time. Thus the fetch stage may have to
wait for some time before it can empty its
buffer.
- When a conditional branch occurs, the address of
the next instruction to be fetched is unknown.
The execution stage then has to wait while the
next instruction is fetched.
5 Two stage instruction pipelining
[Figure: two-stage instruction pipeline with Fetch and Execute stages. Simplified view and expanded view, showing the wait, new address, and discard paths.]
6 Decomposition of instruction processing
- To gain further speedup, the pipeline must have
more stages (6 stages):
- Fetch instruction (FI)
- Decode instruction (DI)
- Calculate operands, i.e. effective addresses (CO)
- Fetch operands (FO)
- Execute instruction (EI)
- Write operand (WO)
7 SIX STAGES OF INSTRUCTION PIPELINING
- Fetch Instruction (FI): read the next expected
instruction into a buffer.
- Decode Instruction (DI): determine the opcode and
the operand specifiers.
- Calculate Operands (CO): calculate the effective
address of each source operand.
- Fetch Operands (FO): fetch each operand from
memory. Operands in registers need not be fetched.
- Execute Instruction (EI): perform the indicated
operation and store the result.
- Write Operand (WO): store the result in memory.
8 Timing diagram for instruction pipeline
operation
9 High efficiency of instruction pipelining
- The timing diagram assumes the following:
- All stages are of equal duration.
- Each instruction goes through all six stages
of the pipeline.
- All the stages can be performed in parallel.
- There are no memory conflicts.
- All the accesses occur simultaneously.
- Under these assumptions, the instruction pipeline
in the previous diagram works very efficiently and
gives high performance.
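With equal-duration stages and no stalls, instruction i (counting from 1) occupies stage s (counting from 1) in time unit i + s - 1, so n instructions finish in k + (n - 1) time units on a k-stage pipeline. The following is a minimal Python sketch of that ideal timing; the function names are illustrative and do not come from the slides.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def ideal_schedule(n_instructions, stages=STAGES):
    """Return {instruction: {stage: time_unit}} for a stall-free pipeline."""
    schedule = {}
    for i in range(1, n_instructions + 1):
        # With s counted from 0, stage s of instruction i runs in time unit i + s.
        schedule[i] = {name: i + s for s, name in enumerate(stages)}
    return schedule

def total_time(n_instructions, n_stages=len(STAGES)):
    """Time units needed when every stage takes one unit and nothing stalls."""
    return n_stages + (n_instructions - 1)

if __name__ == "__main__":
    for i, row in ideal_schedule(9).items():
        print(f"I{i}: " + " ".join(f"{stage}@{t}" for stage, t in row.items()))
    print("total time units:", total_time(9))
```

For comparison, running the same nine instructions one after another without any overlap would take 9 x 6 = 54 time units.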
10 Limits to performance enhancement
- The factors that limit performance are:
- If the six stages are not of equal duration,
there will be some waiting time at various
stages.
- A conditional branch instruction can invalidate
several instruction fetches.
- An interrupt is an unpredictable event.
- Register and memory conflicts.
- The CO stage may depend on the contents of a
register that could be altered by a previous
instruction that is still in the pipeline.
11 Effect of conditional branch on
instruction pipeline operation
12 Conditional branch instructions
- Assume that instruction 3 is a conditional
branch to instruction 15.
- Until the instruction is executed, there is no way
of knowing which instruction will come next.
- The pipeline simply loads the next instruction in
sequence and executes it.
- The branch is not determined until the end of time
unit 7.
- During time unit 8, instruction 15 enters the
pipeline.
- No instructions complete during time units 9
through 12.
- This is the performance penalty incurred because
we could not anticipate the branch.
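The penalty can be reproduced with a few lines of arithmetic. The sketch below assumes the six-stage pipeline with one time unit per stage, a taken branch that is resolved at the end of its EI stage, and a pipeline that is emptied before the branch target is fetched; the function names are hypothetical.

```python
N_STAGES = 6    # FI, DI, CO, FO, EI, WO
EI_OFFSET = 4   # EI is the 5th stage, so it runs 4 time units after FI

def completion_times(branch_position, n_after_branch):
    """Time units in which instructions complete (finish WO) when the
    instruction at `branch_position` is a taken conditional branch."""
    times = []
    # Instructions up to and including the branch flow without stalls.
    for i in range(1, branch_position + 1):
        times.append(i + N_STAGES - 1)
    branch_resolved = branch_position + EI_OFFSET   # known at end of this unit
    first_target_fetch = branch_resolved + 1
    # Target instructions enter the emptied pipeline one per time unit.
    for j in range(n_after_branch):
        times.append(first_target_fetch + j + N_STAGES - 1)
    return times

def idle_units(times):
    """Time units between first and last completion in which nothing completes."""
    done = set(times)
    return [t for t in range(min(times), max(times) + 1) if t not in done]

if __name__ == "__main__":
    times = completion_times(branch_position=3, n_after_branch=2)
    print(times)
    print(idle_units(times))
```

With the branch at instruction 3, the branch is resolved in time unit 7, the target is fetched in time unit 8, and the idle units come out as 9 through 12.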
13 Simple pattern for high performance
- Two factors frustrate this simple pattern for
high performance:
- At each stage of the pipeline, there is some
overhead involved in moving data from buffer to
buffer and in performing various preparation and
delivery functions. This overhead lengthens the
execution time of a single instruction. This is
significant when sequential instructions are
logically dependent, either through heavy use of
branching or through memory access dependencies.
- The amount of control logic required to handle
memory and register dependencies and to optimize
the use of the pipeline increases enormously with
the number of stages.
14 Six-stage CPU instruction pipeline
15 THANK YOU
16 8086 Pin Function
17 Pin Diagram
18 Pin Functions
- Of the 40 pins, 32 have the same function in
minimum and maximum mode.
- The remaining 8 pins have different functions in
minimum and maximum mode.
- The following pins have the same function in both
modes.
19 Symbol AD15 - AD0, Pin No. 39, 2-16, Type I/O
- ADDRESS DATA BUS: time-multiplexed memory/IO
address (T1) and data (T2, T3, TW, T4) bus.
- These lines are active HIGH and float to 3-state
OFF during interrupt acknowledge and local bus
"hold acknowledge".
20 Symbol A19/S6, A18/S5, A17/S4, A16/S3, Pin No. 35 - 38, Type O
- Address/Status lines
- These lines carry the address during T1 and then
status during T2, T3, TW, T4.
- S5 reflects the IF flag condition, and S6 is LOW.
A17/S4 A16/S3 Characteristics
0 (Low) 0 Alternate Data
0 1 Stack
1 (High) 0 Code or none
1 1 Data
21 Symbol BHE/S7, Pin No. 34, Type O
BHE A0 Characteristics
0 0 Whole word from even location
0 1 Upper byte from/to odd address
1 0 Lower byte from/to even address
1 1 None
22 Symbol RD, Pin No. 32, Type O
- Read: RD is active LOW during the T2, T3 and TW
clocks of a read cycle and indicates that the
processor is performing a memory or I/O read.
23 Symbol READY, Pin No. 22, Type I
- Ready: signal received from memory or I/O
devices to indicate the completion of data
transfer.
- Synchronized by the 8284 clock generator.
24 Symbol INTR, Pin No. 18, Type I
- Interrupt Request: level-triggered input received
from the interrupting device.
- Sampled during the last clock of each instruction
cycle.
- A subroutine is vectored to through the interrupt
vector table (IVT) if the interrupt enable flag (IF)
is set.
25 Symbol TEST, Pin No. 23, Type I
- Test: input examined by the WAIT instruction. If
TEST is LOW, the processor continues execution;
otherwise it waits in an idle state.
26 Symbol NMI, Pin No. 17, Type I
- Non-Maskable Interrupt: edge-triggered input that
causes a TYPE 2 interrupt.
- Not maskable internally by software.
27 Symbol RESET, Pin No. 21, Type I
- Reset: input that causes the processor to
immediately terminate its present activity.
- Must be HIGH for at least 4 clock cycles.
28 Symbol CLK, Pin No. 19, Type I
- Clock: provides the basic timing for the
processor and bus controller.
- It is asymmetric, with a 33% duty cycle, to provide
optimized internal timing.
29 Symbol Vcc, Pin No. 40
30 Symbol GND, Pin No. 1, 20
31 Symbol MN/MX, Pin No. 33, Type I
- MINIMUM/MAXIMUM: indicates which mode the
processor is to operate in.
- HIGH indicates minimum mode (single-processor
system).
- LOW indicates maximum mode (multi-processor
system).
32 Pins having different functions in maximum mode
- Pins 24 to 31 have different functions in maximum
mode, as explained below.
33 Symbol S2, S1, S0, Pin No. 26-28, Type O
- Status: active during T4, T1, and T2 and
returned to the passive state (1, 1, 1) during T3
or during TW when READY is HIGH.
- Used by the 8288 Bus Controller to generate all
memory and I/O access control signals.
34 S2 S1 S0 Characteristics
0 0 0 Interrupt Acknowledge
0 0 1 Read I/O Port
0 1 0 Write I/O Port
0 1 1 Halt
1 0 0 Code Access
1 0 1 Read Memory
1 1 0 Write Memory
1 1 1 Passive
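The status table maps directly onto a lookup keyed by the three status bits. The sketch below only restates the table; the names `BUS_CYCLE` and `decode_status` are illustrative and do not model the 8288 itself.

```python
# Keys are (S2, S1, S0); values are the bus cycle types from the table.
BUS_CYCLE = {
    (0, 0, 0): "Interrupt Acknowledge",
    (0, 0, 1): "Read I/O Port",
    (0, 1, 0): "Write I/O Port",
    (0, 1, 1): "Halt",
    (1, 0, 0): "Code Access",
    (1, 0, 1): "Read Memory",
    (1, 1, 0): "Write Memory",
    (1, 1, 1): "Passive",
}

def decode_status(s2, s1, s0):
    """Return the bus cycle type for a given S2, S1, S0 combination."""
    return BUS_CYCLE[(s2, s1, s0)]

if __name__ == "__main__":
    print(decode_status(1, 0, 0))
```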
35 Symbol RQ/GT0, RQ/GT1, Pin No. 30, 31, Type I/O
- Request/Grant: pins used by other local bus
masters to force the processor to release the
local bus at the end of the processor's current
bus cycle.
- RQ/GT0 has higher priority than RQ/GT1.
36 Symbol LOCK, Pin No. 29, Type O
- LOCK: output indicating that other system bus
masters are not to gain control of the system bus
while LOCK is active LOW.
- Activated by the "LOCK" prefix instruction and
remains active until the completion of the next
instruction.
37 Symbol QS1, QS0, Pin No. 24, 25, Type O
- Queue Status: the queue status is valid during
the CLK cycle after which the queue operation is
performed.
QS1 QS0 Characteristics
0 0 No Operation
0 1 First Byte of Op Code from Queue
1 0 Empty the Queue
1 1 Subsequent Byte from Queue
38 Pins having different functions in minimum mode
- Pins 24 to 31 have different functions in minimum
mode, as explained below.
39 Symbol M/IO, Pin No. 28, Type O
- Status line used to distinguish a memory access
from an I/O access.
- HIGH for memory operations.
- LOW for I/O operations.
40 Symbol WR, Pin No. 29, Type O
- Write: indicates that the processor is performing
a write memory or write I/O cycle.
41 Symbol INTA, Pin No. 24, Type O
- Interrupt Acknowledge: used as a read strobe
for interrupt acknowledge cycles.
- Active LOW during T2, T3 and TW of each interrupt
acknowledge cycle.
42 Symbol ALE, Pin No. 25, Type O
- Address Latch Enable: a HIGH pulse, active
during T1 of any bus cycle.
- Provided by the processor to latch the address
into the 8282/8283 address latch.
43 Symbol DT/R, Pin No. 27, Type O
- Data Transmit/Receive: used to control the
direction of data flow through the transceiver.
44 Symbol DEN, Pin No. 26, Type O
- Data Enable: provided as an output enable for the
8286/8287 in a minimum system which uses the
transceiver.
45 Symbol HOLD, HLDA, Pin No. 31, 30, Type I, O
- Hold: indicates that another master is requesting
a local bus "hold".
- The processor receiving the "hold" request will
issue HLDA (HIGH) as an acknowledgement.
46 Happy Learning
- Email: madhuoruganti_at_sreenidhi.edu.in
47 Combinational Circuits
48 Outline
- Boolean Algebra
- Decoder
- Encoder
- MUX
49 History: Computer and the Rationalist
- Modern research issues in AI are formed and
evolve through a combination of historical,
social and cultural pressures.
- The rationalist tradition had an early proponent
in Plato, and was continued through the
writings of Pascal, Descartes, and Leibniz.
- For the rationalist, the external world is
reconstructed through the clear and distinct
ideas of mathematics.
50 History: Development of Formal Logic
- The goal of creating a formal language for
thought also appears in the work of George Boole,
another 19th century mathematician whose work
must be included in the roots of AI.
- The importance of Boole's accomplishment is in
the extraordinary power and simplicity of the
system he devised.
51 Three Operations
- The three basic Boolean operations can be defined
arithmetically as follows:
- x AND y = xy
- x OR y = x + y - xy
- NOT x = 1 - x
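A quick way to check arithmetic definitions of AND, OR and NOT is to compare them with Python's logical operators for every combination of 0 and 1. The sketch below does that; the function names are illustrative.

```python
from itertools import product

def AND(x, y):
    return x * y              # product of the two values

def OR(x, y):
    return x + y - x * y      # sum minus product, so 1 OR 1 stays 1

def NOT(x):
    return 1 - x              # complement

def definitions_hold():
    """Compare the arithmetic forms with Python's logical operators on 0/1."""
    for x, y in product((0, 1), repeat=2):
        if AND(x, y) != int(bool(x) and bool(y)):
            return False
        if OR(x, y) != int(bool(x) or bool(y)):
            return False
    return all(NOT(x) == int(not x) for x in (0, 1))

if __name__ == "__main__":
    print(definitions_hold())
```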
52 Boolean function and logic diagram
- Boolean algebra: deals with binary variables
and the logic operations operating on those
variables.
- Logic diagram: composed of graphic symbols for
logic gates. A simple circuit sketch that
represents the inputs and outputs of Boolean
functions.
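53 Basic Identities of Boolean Algebra
(x' denotes the complement of x, + is OR, and juxtaposition or · is AND)
- (1) x + 0 = x
- (2) x · 0 = 0
- (3) x + 1 = 1
- (4) x · 1 = x
- (5) x + x = x
- (6) x · x = x
- (7) x + x' = 1
- (8) x · x' = 0
- (9) x + y = y + x
- (10) xy = yx
- (11) x + (y + z) = (x + y) + z
- (12) x(yz) = (xy)z
- (13) x(y + z) = xy + xz
- (14) x + yz = (x + y)(x + z)
- (15) (x + y)' = x'y'
- (16) (xy)' = x' + y'
- (17) (x')' = x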
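Because each variable can only be 0 or 1, every identity of Boolean algebra can be verified by trying all combinations. The sketch below checks the seventeen standard identities this way, using Python's `|` for OR and `&` for AND; the dictionary layout and names are illustrative.

```python
from itertools import product

def NOT(x):
    return 1 - x

# Each entry returns True when the identity holds for the given x, y, z.
IDENTITIES = {
    1: lambda x, y, z: (x | 0) == x,
    2: lambda x, y, z: (x & 0) == 0,
    3: lambda x, y, z: (x | 1) == 1,
    4: lambda x, y, z: (x & 1) == x,
    5: lambda x, y, z: (x | x) == x,
    6: lambda x, y, z: (x & x) == x,
    7: lambda x, y, z: (x | NOT(x)) == 1,
    8: lambda x, y, z: (x & NOT(x)) == 0,
    9: lambda x, y, z: (x | y) == (y | x),
    10: lambda x, y, z: (x & y) == (y & x),
    11: lambda x, y, z: (x | (y | z)) == ((x | y) | z),
    12: lambda x, y, z: (x & (y & z)) == ((x & y) & z),
    13: lambda x, y, z: (x & (y | z)) == ((x & y) | (x & z)),
    14: lambda x, y, z: (x | (y & z)) == ((x | y) & (x | z)),
    15: lambda x, y, z: NOT(x | y) == (NOT(x) & NOT(y)),
    16: lambda x, y, z: NOT(x & y) == (NOT(x) | NOT(y)),
    17: lambda x, y, z: NOT(NOT(x)) == x,
}

def failing_identities():
    """Return the numbers of identities that fail for some 0/1 assignment."""
    return [number for number, holds in IDENTITIES.items()
            if not all(holds(x, y, z) for x, y, z in product((0, 1), repeat=3))]

if __name__ == "__main__":
    print(failing_identities())
```

An empty list means every identity held for all eight assignments of x, y, z.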
54 Gates
- Refer to the hardware used to implement Boolean
operators.
- The most basic gates are
55 Boolean function and truth table
57 Decoder
- Accepts a value and decodes it
- Output corresponds to the value of the n inputs
- Consists of:
- Inputs (n)
- Outputs (2^n, numbered from 0 to 2^n - 1)
- Selectors / Enable (active high or active low)
58 The truth table of 2-to-4 Decoder
59 2-to-4 Decoder
60 2-to-4 Decoder
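61 The truth table of 3-to-8 Decoder
A2 A1 A0 D0 D1 D2 D3 D4 D5 D6 D7
0 0 0 1 0 0 0 0 0 0 0
0 0 1 0 1 0 0 0 0 0 0
0 1 0 0 0 1 0 0 0 0 0
0 1 1 0 0 0 1 0 0 0 0
1 0 0 0 0 0 0 1 0 0 0
1 0 1 0 0 0 0 0 1 0 0
1 1 0 0 0 0 0 0 0 1 0
1 1 1 0 0 0 0 0 0 0 1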
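The decoder behaviour can be sketched in a few lines: the n input bits form a binary number, and only the output with that number is driven to 1. This is a behavioural sketch under the assumption of an active-high enable; the function name and the MSB-first argument order are choices made for this example.

```python
def decoder(value_bits, enable=1):
    """n-to-2^n decoder. value_bits is MSB-first, e.g. (A2, A1, A0).
    Returns [D0, D1, ..., D(2^n - 1)]; all 0 when enable is 0."""
    index = 0
    for bit in value_bits:
        index = (index << 1) | bit          # build the binary number
    return [1 if (enable and i == index) else 0
            for i in range(2 ** len(value_bits))]

if __name__ == "__main__":
    for value in range(8):
        bits = ((value >> 2) & 1, (value >> 1) & 1, value & 1)
        print(bits, decoder(bits))
```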
62 3-to-8 Decoder
63 3-to-8 Decoder with Enable
64 Decoder Expansion
- Combine two or more small decoders with enable
inputs to form a larger decoder.
- Example: a 3-to-8-line decoder constructed from
two 2-to-4-line decoders.
- The MSB is connected to the enable inputs.
- If A2 = 0, the upper decoder is enabled; if
A2 = 1, the lower decoder is enabled.
65 Decoder Expansion
66 Combining two 2-4 decoders to form one 3-8
decoder using the enable input
The highest bit is used for the enables.
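The expansion can be sketched by wiring two 2-to-4 decoders so that the highest bit, A2, drives their enables. The sketch assumes active-high enables, so the upper decoder receives the complement of A2; the function names are illustrative.

```python
def decoder_2to4(a1, a0, enable):
    """2-to-4 decoder with active-high enable; returns [D0, D1, D2, D3]."""
    index = (a1 << 1) | a0
    return [1 if (enable and i == index) else 0 for i in range(4)]

def decoder_3to8(a2, a1, a0):
    """3-to-8 decoder built from two 2-to-4 decoders sharing A1 and A0."""
    upper = decoder_2to4(a1, a0, enable=1 - a2)   # D0..D3, enabled when A2 = 0
    lower = decoder_2to4(a1, a0, enable=a2)       # D4..D7, enabled when A2 = 1
    return upper + lower

if __name__ == "__main__":
    print(decoder_3to8(1, 0, 1))
```

The same idea extends to larger decoders: the extra high-order bits select which of the smaller decoders is enabled.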
67 How about a 4-16 decoder?
- How many 3-8 decoders are needed?
- How many 2-4 decoders are needed?
69 Encoders
- Perform the inverse operation of a decoder
- 2^n (or fewer) input lines and n output lines
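As an illustration, an 8-to-3 encoder can be written with one OR per output bit: each output ORs together the inputs whose index has that bit set. The sketch assumes exactly one input is 1 at a time (a plain encoder, not a priority encoder); the function name is illustrative.

```python
def encoder_8to3(d):
    """d = [D0, ..., D7] with exactly one input at 1. Returns (A2, A1, A0)."""
    a0 = d[1] | d[3] | d[5] | d[7]   # indices with bit 0 set
    a1 = d[2] | d[3] | d[6] | d[7]   # indices with bit 1 set
    a2 = d[4] | d[5] | d[6] | d[7]   # indices with bit 2 set
    return a2, a1, a0

if __name__ == "__main__":
    for i in range(8):
        one_hot = [1 if j == i else 0 for j in range(8)]
        print(one_hot, encoder_8to3(one_hot))
```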
70 Encoders
71 Encoders with OR gates
74 Multiplexer (MUX)
A multiplexer uses addressing (select) bits to
choose one of several input bits to be the output.
- A selector chooses a single data input and passes
it to the MUX output.
- It has one output, and one input is selected at a
time.
75 Function table with enable
76 4-to-1 line multiplexer
A 2^n-to-1 MUX has n selection lines. For this
4-to-1 line multiplexer n is 2, which means 2
selection lines, S0 and S1.
S1 S0 F
0 0 I0
0 1 I1
1 0 I2
1 1 I3
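The function table can be written as a sum of products: each AND term combines one decoded select combination with one data input, and a final OR merges the four terms. A minimal sketch, with an illustrative function name:

```python
def mux4(i0, i1, i2, i3, s1, s0):
    """4-to-1 multiplexer: returns the input selected by (S1, S0)."""
    n1, n0 = 1 - s1, 1 - s0          # complemented select lines
    return ((n1 & n0 & i0) |         # S1 S0 = 00 selects I0
            (n1 & s0 & i1) |         # S1 S0 = 01 selects I1
            (s1 & n0 & i2) |         # S1 S0 = 10 selects I2
            (s1 & s0 & i3))          # S1 S0 = 11 selects I3

if __name__ == "__main__":
    print(mux4(0, 1, 0, 0, s1=0, s0=1))
```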
77 Multiplexer (MUX)
- Consists of:
- Inputs (multiple): 2^n
- Output (single)
- Selectors (number depends on the number of inputs): n
- Enable (active high or active low)
78 Multiplexers versus decoders
- A multiplexer uses n binary select bits to
choose from a maximum of 2^n unique input lines.
- Decoders have 2^n output lines, while
multiplexers have only one output line.
- The output of the multiplexer is the data input
whose index is specified by the n-bit code.
79 Multiplexer Versus Decoder
2-to-4 Decoder
4-to-1 Multiplexer
Note that the multiplexer has an extra OR gate.
A1 and A0 are the two inputs of the decoder. The
multiplexer has four inputs plus two selects.
80 Cascading multiplexers
Using three 2-1 MUXes to make one 4-1 MUX
S1 S0 F
0 0 I0
0 1 I1
1 0 I2
1 1 I3
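The cascade can be sketched as two levels: two 2-to-1 MUXes choose within each input pair using S0, and a third chooses between their outputs using S1. The function names are illustrative.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer: returns a when s = 0 and b when s = 1."""
    return b if s else a

def mux4_from_mux2(i0, i1, i2, i3, s1, s0):
    """4-to-1 multiplexer built from three 2-to-1 multiplexers."""
    low = mux2(i0, i1, s0)       # first level: pick within (I0, I1)
    high = mux2(i2, i3, s0)      # first level: pick within (I2, I3)
    return mux2(low, high, s1)   # second level: pick the pair

if __name__ == "__main__":
    print(mux4_from_mux2(0, 0, 1, 0, s1=1, s0=0))
```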
81 Example: Construct an 8-to-1 multiplexer using
2-to-1 multiplexers.
S2 S1 S0 F
0 0 0 I0
0 0 1 I1
0 1 0 I2
0 1 1 I3
1 0 0 I4
1 0 1 I5
1 1 0 I6
1 1 1 I7
[Figure: 8-to-1 multiplexer built from 2-1 MUX blocks, with inputs I0 to I7, select and enable lines, and output F]
82 Example: Construct an 8-to-1 multiplexer using one
2-to-1 multiplexer and two 4-to-1 multiplexers
S2 S1 S0 X
0 0 0 I0
0 0 1 I1
0 1 0 I2
0 1 1 I3
1 0 0 I4
1 0 1 I5
1 1 0 I6
1 1 1 I7
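One way to realize this construction is to let S1 and S0 drive both 4-to-1 MUXes and let S2 drive the 2-to-1 MUX that picks between their outputs. The sketch below models the MUXes behaviourally; the function names are illustrative.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer: returns a when s = 0 and b when s = 1."""
    return b if s else a

def mux4(inputs, s1, s0):
    """4-to-1 multiplexer over a sequence of four inputs."""
    return inputs[(s1 << 1) | s0]

def mux8(inputs, s2, s1, s0):
    """8-to-1 multiplexer from two 4-to-1 MUXes and one 2-to-1 MUX."""
    low = mux4(inputs[0:4], s1, s0)    # handles I0..I3
    high = mux4(inputs[4:8], s1, s0)   # handles I4..I7
    return mux2(low, high, s2)         # S2 picks between the two halves

if __name__ == "__main__":
    labels = ["I0", "I1", "I2", "I3", "I4", "I5", "I6", "I7"]
    print(mux8(labels, 1, 0, 1))
```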
83 Quadruple 2-to-1 Line Multiplexer
Used to supply four bits to the output. In this
case there are two inputs of four bits each.
84 Quadruple 2-to-1 Line Multiplexer
E (Enable) S (Select) Y (Output)
0 X All 0s
1 0 A
1 1 B
85 Sequential circuits
- Part 2: implementation, analysis & design
86 More summer fashion
- SR is one of 4 basic flip flops common in
computer design
- The others can all be constructed from SR; they are:
- JK
- D (data)
- T (toggle)
87 JK flip flop
- Resolves the undefined transition in SR
- J input acts like S (sets device)
- K acts like R (resets)
- When JK = 11, we have the toggle condition: switch
from one state to the other
88 Implementation of JK flip flop
89 JK flip flop implementation
- If JK = 00, SR = 00 because of the AND gates, so
the SR flip flop won't change state when clocked
90 JK flip flop implementation
- If JK = 10, R must be 0
- If Q = 0, Q' = 1, so SR = 10 (the set condition):
the flip flop will change state (to Q = 1)
- If Q = 1, Q' = 0, so SR = 00 (stable condition):
the flip flop stays in Q = 1
91 JK flip flop implementation
- If JK = 01, the final state is Q = 0 (analogous
to JK = 10)
92 JK flip flop implementation
- If JK = 11, Q connects directly to R, and Q' to S
- So if Q = 0, SR = 10, so Q = 1
- If Q = 1, SR = 01, so Q = 0
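The cases above can be reproduced with a small behavioural model. It assumes the usual construction in which AND gates feed the SR flip flop with S = J AND Q' and R = K AND Q; that wiring is inferred from the slide text rather than taken from the missing figure, and the function names are illustrative.

```python
def sr_next(q, s, r):
    """Next state of a clocked SR flip flop (SR = 11 is undefined)."""
    if s and r:
        raise ValueError("SR = 11 is undefined")
    if s:
        return 1      # set
    if r:
        return 0      # reset
    return q          # SR = 00: no change

def jk_next(q, j, k):
    """Next state of a JK flip flop built around an SR flip flop."""
    s = j & (1 - q)   # S = J AND Q'
    r = k & q         # R = K AND Q, so S and R are never both 1
    return sr_next(q, s, r)

if __name__ == "__main__":
    for j in (0, 1):
        for k in (0, 1):
            for q in (0, 1):
                print(f"J={j} K={k} Q={q} -> Q(t+1)={jk_next(q, j, k)}")
```

Because Q and Q' are never 1 together, the undefined SR = 11 input cannot occur, which is how the JK design resolves the undefined transition.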
93 D flip flop
- D = data; one input + CP
- Q(t+1) is independent of Q(t); it depends only on
the value of D at time t
- D flip flop holds data until the next pulse
94 Constructing registers
- Can use D flip flops to construct individual bits
of registers; one signal is sent to each bit
- Setting/resetting a flip flop requires a 1 signal
on exactly one of its input lines; CP restricts the
incoming signal to the appropriate time so the
device remains in sync
- D is split in 2, with one half inverted, so there
is always 1 true and 1 false on the data lines
- Since CP is usually false, both inputs are
normally 0 (no change in flip flop)
- When the clock goes high, one of the 2 lines
(S or R) delivers a 1
95 Device select signal
- Used in combination with the CP & D signals to
determine if a register should send or receive data
- When one register is to send to another, 3
simultaneous signals are sent to each register:
- clock
- device select
- send or receive
- All 3 are ANDed together to indicate that a
specific register should send or receive at a
specific time
96 T flip flop
- T stands for Toggle
- Like D, has one input + CP
- Acts like a control line that specifies selective
toggle
- If T = 0, the flip flop doesn't change; if T = 1,
it toggles
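The D and T flip flops differ only in their next-state rule: D copies its input, while T toggles the stored bit when its input is 1. A behavioural sketch, one call per clock pulse, with illustrative function names:

```python
def d_next(q, d):
    """D flip flop: Q(t+1) = D(t), regardless of Q(t)."""
    return d

def t_next(q, t):
    """T flip flop: hold when T = 0, toggle when T = 1."""
    return q ^ t

def run(next_state, q0, inputs):
    """Apply one input per clock pulse and return the sequence of states."""
    states = [q0]
    for value in inputs:
        states.append(next_state(states[-1], value))
    return states

if __name__ == "__main__":
    print("D:", run(d_next, 0, [1, 1, 0, 1]))
    print("T:", run(t_next, 0, [1, 1, 0, 1]))
```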
97 Implementation of T flip flop
98 General sequential network
- Sequential circuit: interconnection of gates &
flip flops
- All gates can be grouped conceptually as a
combinational network, all flip flops as a group of
state registers
- Between clock pulses, the combinational part
produces output; the amount of time needed depends
on the number of gates in the net
99 General sequential network
- Arrows: one or more connecting lines
- I/O lines: connections to external environment
- Arrow between boxes: input lines to flip flops
- Clock line: assumed but not shown
100 Hardware analysis vs. design
- Analysis: determine output given input and
sequential network
- Design: input and output are known; need to
determine makeup of sequential network
- General approach:
- construct state transition table and transition
diagram
- determine output stream for given input stream
101 Excitation table
- The excitation table is a design tool for
constructing circuits from a given type of
flip-flop
- Given the desired transition from Q(t) to
Q(t+1), what inputs are necessary to make the
transition happen?
102 Characteristic table vs. Excitation table for SR
flip flop
- Characteristic table: tells what the next state
is, given current input and current state
- Excitation table: tells what the current input
must be, given current state and desired next state
103 Sequential analysis
- Step 1: List all possible combinations of current
state and current input in an analysis table
- Step 2: For each combination, compute the output
and the current inputs to the state registers
- Step 3: From the characteristic table, determine
the next state and construct the state transition
table and diagram
104 Example problem
- State registers: FFA & FFB (T flip flops)
- Combinational circuit
- inputs:
- X1 AND B (TA)
- X2 OR A (TB)
- TA & TB are inputs to FFA & FFB
- output:
- B AND X1 (Y)
105 Example problem
- 2 flip flops, so 4 possible states
- 2 inputs, so 4 possible input combinations
A B
0 0
0 1
1 0
1 1
X1 X2
0 0
0 1
1 0
1 1
106 Example problem
- Given a state (AB) and an input (X1X2):
- what is the output?
- what will be the state after CP?
- 16 possible answers, as shown on next slide
107 Analysis table for sample problem circuit
- 1st 4 columns list possible combinations of
initial state & initial input
- By the logic diagram, we know:
- Y(t) = X1(t) AND B(t)
- TA(t) = X1(t) AND B(t)
- TB(t) = X2(t) OR A(t)
- Compute next 3 columns given above
- Compute last 2 from:
- characteristic table for T flip flop
- initial state of flip flop
- flip flop's initial input
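The analysis table for the sample circuit can be generated mechanically. The sketch below uses the example's equations (Y = X1 AND B, TA = X1 AND B, TB = X2 OR A) and the T flip flop rule that the next state is the current state XOR T; the function name and column order are choices made for this example.

```python
from itertools import product

def analysis_table():
    """Rows of (A, B, X1, X2, Y, TA, TB, A_next, B_next) for all 16 cases."""
    rows = []
    for a, b, x1, x2 in product((0, 1), repeat=4):
        y = x1 & b            # output
        ta = x1 & b           # input to flip flop A
        tb = x2 | a           # input to flip flop B
        a_next = a ^ ta       # T flip flop: toggle when T = 1
        b_next = b ^ tb
        rows.append((a, b, x1, x2, y, ta, tb, a_next, b_next))
    return rows

if __name__ == "__main__":
    print("A B X1 X2 | Y TA TB | A+ B+")
    for a, b, x1, x2, y, ta, tb, a_next, b_next in analysis_table():
        print(f"{a} {b} {x1}  {x2}  | {y} {ta}  {tb}  | {a_next}  {b_next}")
```

Keeping only the state, input, next-state and output columns of each row gives the state transition table.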
108 State transition table
- Table shows simple rearrangement of selected
columns from table on previous slide
- For given initial state A(t)B(t) and input
X1(t)X2(t), lists next state A(t+1)B(t+1) and
initial output Y(t)
- States listed as ordered pairs: next state
followed by initial output
109 State transition diagram
- Easier to visualize circuit behavior
- Transitions listed as ordered pairs of input
followed by initial output, with slash separator
110 Asynchronous inputs
- An asynchronous input changes the state of a
flip-flop immediately, without regard to CP
- Preset: sets Q to 1
- Clear: clears Q to 0
- Used to initialize the state of a machine
- Normal operation: both lines 0
111 Sequential design
- Given the state transition diagram, the output,
and the type of flip-flop to be used, design the
combinational circuit
- Any unused input combinations or unused states
are don't care conditions
- 2^n states are possible with n flip-flops
112 Design steps
- Step 1: In a design table, list the initial
state, input, and output, and from the transition
diagram list the next state
- Step 2: Use the excitation table for the given
type of flip-flop to determine the input required
for the state registers
- Step 3: Use Karnaugh maps to design a minimized
two-level circuit for each flip-flop input
113 Sample problem
114 Design table for sample problem
115 Sequential design: K-maps
- Each flip flop input in the problem can be
considered a function of four variables:
- initial state (AB)
- input (X1X2)
- To design the combinational circuit we need a
4-variable K-map for each flip flop input
116 K-maps for sample problem
- Figures a and b show K-maps for the S & R
inputs to FFA
- Row values are AB, columns are X1X2
- X1X2 = 00 is a don't care condition for both
inputs, so the first column of both tables is X
117 K-maps for sample problem
- Figures c and d show inputs to FFB
- Note that we can take advantage of don't care
conditions to minimize the circuit
118 Resulting circuit with original spec
119 K-map circuit for output Y
120 Another look at the register
- Basic building block of instruction set
architecture
- Array of D flip flops; each is a bit in the register
- Common clock line connected to all flip flops
- Number of flip flops doesn't affect the speed of
the load operation because all receive the clock
signal simultaneously
121 Memory
- Conceptually, main memory is just a big array of
registers
- Input: address lines, control lines, data lines
- Data lines are bidirectional (output also)
- Control signals:
- CS: Chip select, to enable or select the memory
chip
- WE: Write enable, to write or store a memory word
to the chip
- OE: Output enable, to enable the output buffer to
read a word from the chip
122 Memory chips
Storage capacity of each is identical (512 bits):
left uses 8-bit words, right uses 1-bit words.
Generally, a chip with 2^n words has n address lines.
123 Memory access
- To store a word (memory write):
- Select chip by setting CS to 1
- Put data and address on the bus and set WE to 1
- To retrieve a word (memory read):
- Select chip by setting CS to 1
- Put address on the bus, set OE to 1, and read the
data on the bus
124 4 x 2 memory chip
- 2 address lines (A0, A1) & 2 data lines (D0, D1)
- Stores 4 2-bit words
- each bit is a D flip flop
- Address lines drive a 2 x 4 decoder
- 1 output is 1, other 3 are 0
- line with the 1 signal selects the row of D flip
flops that make up the word accessed by the chip
125 Closer look
Diagram below shows implementation of Read
enable box
Alphabet soup: WE = write enable; CS = chip
select; OE = output enable; MMV = monostable
multivibrator (CP)
126 Read Enable
- Three normal modes:
- CS = 0 (chip not selected)
- CS = 1, WE = 1, OE = 0 (chip selected for write)
- CS = 1, WE = 0, OE = 1 (chip selected for read)
- WE & OE not permitted to be 1 at same time
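The three modes can be sketched as a small behavioural model of a 4 x 2 memory chip. The class and method names are hypothetical, each stored bit stands in for one D flip flop, and the order of the two data bits in a word is an assumption made for this example.

```python
class Memory4x2:
    """4 words x 2 bits; each stored bit stands in for one D flip flop."""

    def __init__(self):
        self.words = [[0, 0] for _ in range(4)]

    def access(self, cs, we, oe, a1, a0, data=None):
        """Perform one access; returns the word read, or None otherwise."""
        if not cs:
            return None                        # chip not selected
        if we and oe:
            raise ValueError("WE and OE must not both be 1")
        row = (a1 << 1) | a0                   # 2 x 4 decoder selects one row
        if we:
            self.words[row] = list(data)       # memory write
            return None
        if oe:
            return tuple(self.words[row])      # memory read
        return None

if __name__ == "__main__":
    chip = Memory4x2()
    chip.access(cs=1, we=1, oe=0, a1=1, a0=0, data=(1, 0))
    print(chip.access(cs=1, we=0, oe=1, a1=1, a0=0))
```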
127 Memory types: volatile
- SRAM: Static random access memory
- most closely resembles the model we've seen
- advantage: fast
- disadvantage: large; several transistors
required for each bit cell
- DRAM: Dynamic RAM
- overcomes size problem of SRAM: one transistor,
one capacitor per cell
- advantage: high capacity
- disadvantage: relatively slow because it requires
a refresh operation
128 Memory types: non-volatile
- ROM: Read-only memory
- Simplest type, ROM, is prewritten to spec by
manufacturer; can't be overwritten
- PROM: Programmable ROM; user can write once (by
blowing embedded fuses); can't be overwritten
- EPROM: Erasable PROM; can be wiped out &
reprogrammed (requires removal from computer)
129 Memory types: non-volatile
- EEPROM: Electrically erasable PROM
- Like EPROM, but doesn't require removal to
reprogram
- Can reprogram individual cell (doesn't have to be
whole chip)
- Flash memory: A type of EEPROM
- flash card is array of flash chips
- flash drive has interface circuitry to mimic hard
drive