1
William Stallings Computer Organization and
Architecture
  • Chapter 12
  • CPU Structure
  • and Function

2
Topics
  • Processor Organization
  • Register Organization
  • Instruction Cycle
  • Instruction Pipelining

3
CPU Organization
  • What does the CPU do?
  • Fetch instructions
  • Interpret instructions
  • Fetch data
  • Process data
  • Write data
  • Major components of the CPU
  • ALU
  • Control unit
  • Registers

4
Internal Structure of CPU
5
Registers
  • CPU must have some working space (temporary
    storage)
  • e.g., store data temporarily, remember where to
    get next instruction
  • Registers = temporary storage in the CPU
  • Number and function vary between processor
    designs
  • One of the major design decisions
  • Top level of memory hierarchy
  • Small set of high-speed (expensive) storage

6
Register Organization
  • Two types of registers
  • User-visible registers
  • may be referenced by assembly-level instructions
  • Control and status registers
  • used by the control unit to control CPU operations
    and by OS programs

7
User Visible Registers
  • General-Purpose
  • Data
  • Address
  • Condition Codes

8
General-Purpose Registers
  • May be true general purpose
  • e.g., PDP-11 R0–R7
  • May be restricted
  • e.g., stack pointer, floating-point registers
  • May be used for data or addressing
  • Data
  • e.g., accumulator, Motorola 68000 D0–D7
  • Addressing
  • e.g., segment registers, stack pointer, Motorola
    68000 A0–A7

9
Design Issues of General-Purpose Registers (1)
  • General purpose vs. specialized
  • General purpose
  • Increases flexibility and programmer options
  • Increases instruction size and complexity
  • Specialized (with implicit specification in opcode)
  • Smaller (faster) instructions, since implicit
    specification saves bits
  • Less flexibility
  • Which is better?
  • No final and best answer
  • Trend toward specialized

10
Design Issues of General-Purpose Registers (2)
  • How many?
  • Between 8 and 32
  • Fewer → more memory references
  • More does not noticeably reduce memory references
    and takes up processor real estate
  • See also RISC (later)
  • Advantages of using hundreds of registers?

11
Design Issues of General-Purpose Registers (3)
  • How big?
  • Large enough to hold full address
  • Large enough to hold full word
  • Often possible to combine two data registers
  • e.g., PDP-11: two registers hold one long integer

12
Condition Code Registers
  • Sets of individual bits
  • Set by the CPU as the result of an operation
  • e.g., result of last operation was zero → Z bit
    set to 1
  • Can be read (implicitly) by programs
  • e.g., branch if zero
  • Might be set by programs
  • Some instructions can set or clear condition codes
    (see the sketch below)
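A minimal Python sketch of this behavior (the toy ALU and flag names are
illustrative assumptions, not any real processor's condition-code set):

    # Toy ALU operation that updates condition codes as a side effect.
    def alu_sub(a, b, flags):
        result = a - b
        flags["Z"] = (result == 0)   # Z bit set to 1 if the result was zero
        flags["N"] = (result < 0)    # sign of the last result
        return result

    flags = {"Z": False, "N": False}
    alu_sub(5, 5, flags)             # result is zero ...
    if flags["Z"]:                   # ... read implicitly by "branch if zero"
        print("BEQ would be taken")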

13
Control and Status Registers
  • Program Counter
  • Instruction Register
  • Memory Address Register
  • Memory Buffer Register
  • Refresh your memory: what do these all do?
  • Program Status Word (PSW)

14
Program Status Word
  • A set of bits
  • Includes Condition Codes
  • Sign of last result
  • Zero
  • Carry
  • Overflow
  • Extension
  • Equal
  • Interrupt enable/disable, interrupt mask
  • Supervisor

15
Supervisor Mode
  • Kernel mode
  • Monitor mode
  • Allows privileged instructions to execute
  • e.g., system (service) call
  • Used by operating system
  • Not available to user programs

16
Program Status Word - Example
  • Motorola 68000's PSW
  • System byte: trace mode (T), supervisor status (S),
    interrupt mask (I2 I1 I0)
  • User byte: condition codes (X N Z V C)

Bit:  15 14 13 12 11 10  9  8 |  7  6  5  4  3  2  1  0
Flag:  T  -  S  -  - I2 I1 I0 |  -  -  -  X  N  Z  V  C
      (system byte)           | (user byte)
17
Other Registers
  • May have registers pointing to (see OS)
  • Process control blocks
  • Interrupt vector register
  • Page table pointer
  • Design issues
  • OS Support
  • Allocation between registers and memory
  • CPU design and operating system design are
    closely linked!!

18
Example Register Organizations
19
Instruction Cycle
  • Revisit

20
Indirect Cycle
  • May require memory access to fetch operands
  • Indirect addressing requires more memory accesses
  • Can be thought of as an additional instruction subcycle

21
Instruction Cycle with Indirect
22
Instruction Cycle State Diagram
23
Data Flow (Instruction Fetch)
  • Depends on CPU design
  • In general
  • Fetch
  • 1. PC contains address of next instruction
  • 2. Address moved to MAR
  • 3. Address placed on address bus
  • 4. Control unit requests memory read
  • 5. Result placed on data bus, copied to MBR, then
    to IR
  • 6. Meanwhile, PC incremented by 1 (see the sketch below)
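A register-transfer sketch of these steps in Python (addresses and memory
contents are made up for illustration):

    memory = {100: "ADD A, B", 101: "BEQ NEXT"}   # toy instruction memory

    PC = 100                 # 1. PC holds address of next instruction
    MAR = PC                 # 2. address moved to MAR, then onto address bus
    MBR = memory[MAR]        # 3-5. control unit requests read; result to MBR ...
    IR = MBR                 #      ... then copied to IR
    PC += 1                  # 6. meanwhile PC is incremented
    print(IR, PC)            # -> ADD A, B 101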

24
Data Flow (Fetch Diagram)
25
Data Flow (Data Fetch)
  • IR is examined
  • If indirect addressing is used, an indirect cycle
    is performed
  • 1. Rightmost N bits of MBR (the address field of
    the instruction format: opcode | address)
    transferred to MAR
  • 2. Control unit requests memory read
  • 3. Result (address of operand) moved to MBR
    (see the sketch below)
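Continuing the sketch in Python, the indirect cycle turns the instruction's
address field into the operand's real address (all values are made up):

    memory = {200: 300, 300: 42}   # location 200 holds the operand's address

    MBR = 200                # address field of the fetched instruction
    MAR = MBR                # 1. rightmost N bits of MBR transferred to MAR
    MBR = memory[MAR]        # 2-3. memory read; MBR now holds the operand's
                             #      address (300)
    operand = memory[MBR]    # the later data fetch then uses that address
    print(operand)           # -> 42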

26
Data Flow (Indirect Diagram)
27
Data Flow (Execute)
  • May take many forms
  • Depends on instruction being executed
  • May include
  • Memory read/write
  • Input/Output
  • Register transfers
  • ALU operations

28
Data Flow (Interrupt)
  • Simple, predictable
  • Current PC saved to allow resumption after the
    interrupt
  • Steps
  • 1. Contents of PC copied to MBR
  • 2. Special memory location (e.g., stack pointer)
    loaded to MAR
  • 3. Control unit requests memory WRITE
  • 4. MBR written to memory
  • 5. PC loaded with address of interrupt-handling
    routine
  • (Next instruction (first of interrupt handler)
    can then be fetched; see the sketch below)
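A Python sketch of these steps (the addresses, handler location, and the use
of a plain dict as memory are all assumptions):

    memory = {}
    SP  = 0xFF               # special memory location (stack pointer)
    PC  = 0x20               # where execution would have continued
    ISR = 0x80               # address of the interrupt-handling routine

    MBR = PC                 # 1. contents of PC copied to MBR
    MAR = SP                 # 2. special memory location loaded into MAR
    memory[MAR] = MBR        # 3-4. control unit requests WRITE; MBR to memory
    PC = ISR                 # 5. PC loaded with the handler's address
    # the next fetch gets the handler's first instruction; the PC saved at
    # memory[SP] allows resumption afterwards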

29
Data Flow (Interrupt Diagram)
30
Instruction Pipelining
  • Similar to an assembly line in a manufacturing plant
  • Products at various stages can be worked on
    simultaneously
  • ⇒ performance improved
  • First attempt: 2 stages
  • Fetch
  • Execution

31
Prefetch
  • Fetch: accesses main memory
  • Execution: usually does not access main memory
  • Can fetch next instruction during execution of
    current instruction
  • Called instruction prefetch or fetch overlap
  • Ideally, instruction cycle time would be halved
  • (if duration of Fetch = duration of Execution)

32
Improved Performance (1)
  • But not doubled in reality, why?
  • Fetch usually shorter than execution
  • Prefetch more than one instruction?
  • Any jump or branch (branching) means that
    prefetched instructions are not the required
    instructions
  • e.g., ADD A, B
  •       BEQ NEXT
  •       ADD B, C
  • NEXT: SUB C, D

33
Improved Performance (2)
  • Add more stages to improve performance
  • Reduce time loss due to branching by guessing
  • Prefetch instruction after branching instruction
  • If not branched
  • use the prefetched instruction
  • else
  • discard the prefetched instruction
  • fetch new instruction

34
Two Stage Instruction Pipeline
35
Pipelining
  • More stages → more speedup (the more the merrier)
  • FI: Fetch instruction
  • DI: Decode instruction
  • CO: Calculate operands (i.e., effective addresses)
  • FO: Fetch operands
  • EI: Execute instruction
  • WO: Write result
  • Various stages are of nearly equal duration
  • Overlap these operations (see the sketch below)
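A short Python sketch that prints this overlap as a timing diagram (it
assumes no branches and equal-duration stages):

    STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

    def timing(n, stages=STAGES):
        k = len(stages)
        for i in range(n):                  # instruction i+1 enters at cycle i+1
            row = ["  "] * i + stages + ["  "] * (n - 1 - i)
            print(f"I{i + 1:<2} " + " ".join(row))
        print("total time units:", k + (n - 1))

    timing(9)   # 9 instructions, 6 stages -> 14 time units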

36
Timing of Pipeline
37
Speedup of Pipelining (1)
  • 9 instructions, 6 stages
  • w/o pipelining: 9 × 6 = 54 time units
  • w/ pipelining: 6 + (9 − 1) = 14 time units
  • speedup: 54 / 14 ≈ 3.9
  • Q: 100 instructions, 6 stages, speedup = 600 / 105 ≈ 5.7
  • Q: n instructions, k stages, speedup = nk / (k + n − 1)
  • Can you prove it (formally)?

38
Speedup of Pipelining (2)
  • Parameters
  • τ = pipeline cycle time = time to advance a set of
    instructions one stage
  • k = number of stages
  • n = number of instructions
  • Assuming no branches, time to execute n instructions:
    Tk = [k + (n − 1)]τ
  • Time to execute n instructions without pipelining:
    T1 = nkτ

39
Speedup of Pipelining (3)
  • Speedup of k-stage pipelining compared to no
    pipelining:
    Sk = T1 / Tk = nkτ / [k + (n − 1)]τ = nk / (k + n − 1)
  • Q: n instructions, k stages: as n → ∞, Sk → k
  • Can you prove it (formally)? (see the sketch below)
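The formula evaluated in Python for the quiz numbers (a sketch; with
equal-duration stages and no branches, τ cancels out):

    def speedup(n, k):
        """S_k = T1 / Tk = n * k / (k + (n - 1))"""
        return (n * k) / (k + (n - 1))

    print(speedup(9, 6))     # 54 / 14   -> about 3.86
    print(speedup(100, 6))   # 600 / 105 -> about 5.71
    # as n grows, the speedup approaches k (here, 6)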

40
Pipelining - Discussion
  • Not all stages are needed by every instruction
  • e.g., LOAD: WO not needed
  • Timing is set up assuming all stages are needed
    by each instruction
  • ⇒ simplifies pipeline hardware
  • Also assumes all stages can be performed in parallel
  • but e.g., FI, FO, and WO all access memory
    → memory conflicts

41
Limitation by Branching
  • Conditional branch instructions can invalidate
    several instruction prefetches
  • In our example (see next slide)
  • Instruction 3 is a conditional branch to
    instruction 15
  • Next instruction's address won't be known until
    instruction 3 is executed (at time unit 7)
  • ⇒ pipeline must be cleared
  • No instruction finishes from time units 9 to 12
  • ⇒ performance penalty

42
Branch in a Pipeline
43
Six Stage Instruction Pipeline
44
Alternative Pipeline Depiction
45
Speedup Factors with Instruction Pipelining
46
Limitation by Data Dependencies
  • Data needed by the current instruction may depend on
    a previous instruction that is still in the pipeline
  • e.g., A ← B + C
  •       D ← A + E (see the sketch below)
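A tiny Python sketch that detects this read-after-write dependency (the
(destination, sources) representation is an illustration only):

    i1 = ("A", {"B", "C"})    # A <- B + C
    i2 = ("D", {"A", "E"})    # D <- A + E

    def raw_hazard(first, second):
        # second reads a register that first has not yet written back
        return first[0] in second[1]

    print(raw_hazard(i1, i2)) # True: i2's FO must wait for i1's WO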

47
Performance of Pipeline
  • Ideally: more stages → more speedup
  • However,
  • more overhead in moving data between buffers
  • more overhead in preparation
  • more complex pipeline hardware circuitry

48
Dealing with Branches
  • Multiple Streams
  • Prefetch Branch Target
  • Loop buffer
  • Branch prediction
  • Delayed branching

49
Multiple Streams
  • Have two pipelines
  • Prefetch each branch into a separate pipeline
  • Use appropriate pipeline
  • Problems
  • Leads to bus and register contention delays
  • Multiple branches (i.e., additional branches
    entering the pipeline before the original branch
    decision is made) lead to further pipelines being
    needed
  • Can still improve performance
  • e.g., IBM 370/168

50
Prefetch Branch Target
  • Target of branch is prefetched in addition to
    instructions following branch
  • Keep target until branch is executed
  • If branch is taken, target is already prefetched
  • Used by IBM 360/91

51
Loop Buffer (1)
  • Small, very fast memory
  • Maintained by fetch (IF) stage of pipeline
  • Contains the n most recently fetched instructions
    in sequence
  • If a branch is to be taken
  • hardware checks whether the target is in the buffer
  • If YES then
  • next instruction is fetched from the buffer
  • else
  • fetch from memory (see the sketch below)
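A Python sketch of this check (buffer size and interface are assumptions;
real hardware matches on address bits):

    from collections import deque

    class LoopBuffer:
        def __init__(self, size=8):
            self.buf = deque(maxlen=size)   # n most recently fetched instructions

        def remember(self, addr, instr):    # maintained by the fetch stage
            self.buf.append((addr, instr))

        def fetch(self, target, memory):
            for addr, instr in self.buf:    # hardware checks the buffer first
                if addr == target:
                    return instr            # hit: no memory access needed
            instr = memory[target]          # miss: fetch from memory ...
            self.remember(target, instr)    # ... and keep it for the next pass
            return instr

    lb = LoopBuffer()
    mem = {0: "CMP", 1: "BNE 0"}
    print(lb.fetch(0, mem), lb.fetch(0, mem))   # second call hits the buffer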

52
Loop Buffer (2)
  • Reduce memory access time
  • Very good for small loops or jumps
  • If buffer is big enough to contain entire loop,
    instructions in the loop need to be fetched from
    memory only once at the first iteration
  • cf. cache
  • Used by CRAY-1

53
Loop Buffer Diagram
54
Branch Prediction (1)
  • Predict whether a branch will be taken
  • If the prediction is right
  • ⇒ no branch penalty
  • If the prediction is wrong
  • ⇒ empty the pipeline
  • fetch the correct instruction
  • ⇒ branch penalty

55
Branch Prediction (2)
  • Prediction techniques
  • Static
  • Predict never taken
  • Predict always taken
  • Predict by opcode
  • Dynamic
  • Taken/not taken switch
  • Branch history table

56
Branch Prediction (3)
  • Predict never taken
  • Assume the jump will not happen
  • Always fetch the next instruction
  • e.g., 68020, VAX 11/780
  • VAX will not prefetch after a branch if a page
    fault would result (OS vs. CPU design)
  • Predict always taken
  • Assume the jump will happen
  • Always fetch the target instruction

57
Branch Prediction (4)
  • Predict by opcode
  • Some instructions are more likely to result in a
    jump than others
  • Can achieve up to 75% success
  • Taken/not-taken switch
  • Based on previous history
  • Good for loops
  • Branch history table
  • Like a cache to look up (see the sketch below)
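A Python sketch of the taken/not-taken switch as the usual 2-bit saturating
counter (the concrete state encoding here is an assumption):

    class TwoBitPredictor:
        def __init__(self):
            self.state = 0          # 0,1: predict not taken; 2,3: predict taken

        def predict(self):
            return self.state >= 2

        def update(self, taken):    # step one state toward the actual outcome
            self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

    p = TwoBitPredictor()
    for outcome in [True, True, False, True]:   # loop-like, mostly taken
        print("predict taken?", p.predict(), "actual:", outcome)
        p.update(outcome)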

58
Branch Prediction Flowchart
59
Branch Prediction State Diagram
60
Dealing With Branches
61
Delayed Branching
  • Do not take the jump until you have to
  • Rearrange instructions so that the branch instruction
    occurs later than actually desired (Chapter 13)

62
Intel 80486 Pipelining
  • Fetch
  • From cache or external memory
  • Put in one of two 16-byte prefetch buffers
  • Fill buffer with new data as soon as old data is
    consumed
  • Average 5 instructions fetched per load
  • Independent of other stages, to keep buffers full
  • Decode stage 1
  • Opcode and address-mode info
  • At most the first 3 bytes of the instruction
  • Can direct the D2 stage to get the rest of the
    instruction
  • Decode stage 2
  • Expands opcode into control signals
  • Computes complex address modes
  • Execute
  • ALU operations, cache access, register update
  • Writeback
  • Updates registers and flags
  • Results sent to cache and to bus-interface write
    buffers

63
80486 Instruction Pipeline Examples
64
Foreground Reading
  • Processor examples
  • Stallings Chapter 12
  • Manufacturer web sites and specs
  • Web pages etc.