Computer architecture - PowerPoint PPT Presentation

About This Presentation
Title:

Computer architecture

Description:

Title: Computer architecture (original Polish title: Architektura komputerów). Author: Piotr Bilski. Last modified by: Piotr Bilski. Created: 2/18/2006 11:51:16 PM. Format: PowerPoint presentation.

Slides: 45

Transcript and Presenter's Notes



1
Computer architecture
  • Lecture 6: Processor structure
  • Piotr Bilski

2
Processor tasks
  • Instruction fetching
  • Instruction interpretation
  • Data fetching
  • Data processing
  • Data saving
  • These tasks justify the existence of registers
    (temporary storage inside the processor)

3
Internal processor structure
[Figure: internal structure of the processor: ALU (status flags, shifter, complementer, arithmetic and Boolean logic), registers, control unit]
4
Block diagram of the Pentium 3 processor
5
Block diagram of the P6 core (Pentium Pro, 1995)
  • Front-end of the processor
  • Core
  • Completion unit

6
Register types
  • Accessible to the user (addressing, data, etc.)
  • Inaccessible to the user (control, status)
  • This categorization is not formal!

7
Registers accessible to the user
  • General Purpose Registers (GPR)
  • Data
  • Addressing (segment, stack pointer, index)
  • Condition codes (status flags),
    read-only!

8
Control and status registers
  • Basic:
  • Program Counter (PC)
  • Instruction Register (IR)
  • Memory Address Register (MAR)
  • Memory Buffer Register (MBR)
  • Program Status Word (PSW)
  • Interrupt Vector Register
  • Page Table Pointer

9
Program Status Word
[Figure: 16-bit PSW layout, bits 0 to 15, with fields P, R, OTHER, S, Z, O, I, N]
  • S: sign bit
  • Z: zero bit, set if the operation result is zero
  • P: carry bit
  • R: logical comparison result bit
  • O: overflow bit
  • I: interrupt enable/disable bit
  • N: supervisor mode bit
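The legend above can be made concrete with a minimal Python sketch that computes the S, Z, P (carry) and O bits for a 16-bit addition. It models only the flag values, not their positions inside the PSW; the function name `add16_flags` is hypothetical, and the R, I and N bits are left out because an addition does not set them.

```python
MASK16 = 0xFFFF
SIGN16 = 0x8000


def add16_flags(a: int, b: int):
    """Add two 16-bit values; return (result, flags) with S, Z, P and O."""
    a &= MASK16
    b &= MASK16
    full = a + b
    result = full & MASK16
    flags = {
        "S": 1 if result & SIGN16 else 0,      # sign: top bit of the result
        "Z": 1 if result == 0 else 0,          # zero result
        "P": 1 if full > MASK16 else 0,        # carry out of bit 15
        # overflow: operands have equal signs but the result's sign differs
        "O": 1 if (~(a ^ b) & (a ^ result) & SIGN16) else 0,
    }
    return result, flags


print(add16_flags(0x7FFF, 0x0001))  # (32768, {'S': 1, 'Z': 0, 'P': 0, 'O': 1})
print(add16_flags(0xFFFF, 0x0001))  # (0, {'S': 0, 'Z': 1, 'P': 1, 'O': 0})
```

The two calls show that carry and overflow are independent: 0x7FFF + 1 overflows in signed arithmetic without a carry, while 0xFFFF + 1 carries out without a signed overflow.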
10
Registers in the Motorola MC68000 processor
  • 32-bit data and address registers
  • Specialization: 8 data registers (D0-D7) and 9
    address registers (two of them, the user and
    supervisor stack pointers, are used
    interchangeably in the user and supervisor modes)
  • Address bus 24-bit, data bus 16-bit
  • A7 register used as the Stack Pointer (SP)
  • Status register (SR), 16-bit; its lower byte is
    the condition code register (CCR)
  • Program counter (PC), 32-bit
  • Instructions must be stored at even addresses

11
Registers in the Intel 8086 Processor
  • 16-bit address and data registers
  • Data/General Purpose Registers (AX, BX, CX, DX)
  • Pointer and index registers (SP, BP, SI, DI)
  • Segment registers (CS, DS, SS, ES)
  • Instruction pointer (IP)
  • Status (flags) register
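The segment registers listed above combine with a 16-bit offset to form a 20-bit physical address (segment * 16 + offset). A minimal sketch with the hypothetical helper `physical_address`:

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, kept to 20 bits."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1000, 0x0234)))  # 0x10234
print(hex(physical_address(0x1023, 0x0004)))  # 0x10234 (same byte)
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0 (wraps around at 1 MB)
```

Different segment:offset pairs can name the same byte, and on the 8086 addresses past 1 MB wrap around to the bottom of memory.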

12
Intel 8086 Registers (cont.)
  • AX: accumulator
  • BX: base
  • CX: count
  • DX: data
  • SP: stack pointer
  • BP: base pointer
  • SI: source index
  • DI: destination index
13
Intel 386 to Pentium processor registers
Organization
  • 32-bit data and address registers
  • Eight general purpose registers (EAX, EBX, ECX,
    EDX, ESP, EBP, ESI, EDI)
  • For backward compatibility, the lower 16 bits of
    each register are accessible as the original
    16-bit register (e.g., AX is the lower half of EAX)
  • 32-bit status register (EFLAGS)
  • 32-bit instruction pointer (EIP)
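The backward-compatibility point can be sketched with bit masks: AX is the low 16 bits of EAX, and AH/AL are the high and low bytes of AX. The helper names are hypothetical; the sketch only models the aliasing, not a real register file.

```python
def subregisters(eax: int) -> dict:
    """Split a 32-bit EAX value into its 16-bit and 8-bit aliases."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF                      # lower 16 bits
    return {"EAX": eax, "AX": ax, "AH": ax >> 8, "AL": ax & 0xFF}


def write_ax(eax: int, ax: int) -> int:
    """Writing AX changes only the lower 16 bits of EAX."""
    return (eax & 0xFFFF0000) | (ax & 0xFFFF)


print({k: hex(v) for k, v in subregisters(0x12345678).items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
print(hex(write_ax(0x12345678, 0xABCD)))  # 0x1234abcd
```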

14
Floating-point registers of the Pentium processor
  • Eight 80-bit numerical registers
  • 16-bit control register
  • 16-bit status register
  • 16-bit tag word (describes the content type of
    each floating-point register)
  • 48-bit instruction pointer
  • 48-bit data pointer
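The tag word assigns 2 bits to each of the eight physical registers (00 valid, 01 zero, 10 special such as NaN or infinity, 11 empty). A minimal decoding sketch; `decode_tag_word` is a hypothetical helper, and the register order is R0 to R7, not the stack-relative ST(i) order.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def decode_tag_word(tag_word: int) -> list:
    """Return the tag of physical registers R0..R7 (2 bits each, R0 in bits 1-0)."""
    return [TAG_NAMES[(tag_word >> (2 * i)) & 0b11] for i in range(8)]


print(decode_tag_word(0xFFFF))  # all 'empty' (state after FINIT)
print(decode_tag_word(0xFFF4))  # R0 'valid', R1 'zero', the rest 'empty'
```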

15
EFLAGS register
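[Figure: EFLAGS register layout, bits 0 to 31 (positions 0, 15, 21 and 31 marked), with flags CF, PF, AF, ZF, SF, TF, IF, DF, OF, IOPL, NT, RF, VM, AC, VIF, VIP, ID]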
  • TF: trap flag
  • IF: interrupt enable flag
  • DF: direction flag
  • IOPL: input/output privilege level (2-bit field)
  • RF: resume flag
  • AC: alignment check
  • ID: identification flag
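To read an EFLAGS value, each flag is tested at its bit position (for example CF at bit 0, ZF at bit 6, IF at bit 9, ID at bit 21) and IOPL is extracted as a 2-bit field at bits 12-13. A sketch with the hypothetical helper `decode_eflags`; reserved bits such as bit 1 (always 1) are ignored.

```python
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value: int) -> tuple:
    """Return (names of set one-bit flags, IOPL value)."""
    set_flags = [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]
    iopl = (value >> 12) & 0b11          # 2-bit field in bits 12-13
    return set_flags, iopl


print(decode_eflags(0x00000246))  # (['PF', 'ZF', 'IF'], 0)
print(decode_eflags(0x00003202))  # (['IF'], 3)
```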

16
Registers in the Athlon 64 processor
  • Compatibility with the x86-64 architecture (40-bit
    physical address space, 48-bit virtual address
    space)
  • 64-bit data and address registers
  • 8 general purpose registers (RAX, RBX, RCX, RDX,
    RBP, RSI, RDI, RSP), which also work (as 32-bit
    registers) in the compatibility mode
  • In 64-bit mode, 8 additional general purpose
    registers (R8-R15) are available (in Athlon 64 as
    well as Opteron)
  • 16 SSE registers (XMM0-XMM15)
  • 8 x87 floating-point registers, 80-bit
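With a 48-bit virtual address space, x86-64 requires "canonical" addresses: bits 63 down to 47 must all be equal. A minimal check, with the hypothetical name `is_canonical` and the virtual address width as a parameter:

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits - 1) of addr are all 0 or all 1."""
    addr &= (1 << 64) - 1                # treat addr as a 64-bit value
    top = addr >> (va_bits - 1)          # the bits that must be identical
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


print(is_canonical(0x00007FFFFFFFFFFF))  # True  (top of the lower half)
print(is_canonical(0x0000800000000000))  # False (inside the non-canonical gap)
print(is_canonical(0xFFFF800000000000))  # True  (bottom of the upper half)
```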

17
Registers in the PowerPC processor
  • 32 general purpose registers (64-bit) and the
    fixed-point exception register (XER)
  • 32 floating-point registers (64-bit) and the
    floating-point status and control register (FPSCR)
  • Branch processing unit registers: 32-bit
    condition register (CR), 64-bit count register
    (CTR) and link register (LR)

18
Instruction cycle (state diagram)
[Figure: instruction cycle states: instruction address calculation, instruction fetch, instruction decoding, argument address calculation, argument fetching, data operation, argument address calculation, writing the argument, interrupt checking, interrupt handling; loops for indirect addressing, multiple arguments, multiple results and return for data; with no interrupts the instruction is executed and the next one is fetched]
19
Instruction fetching cycle
[Figure: fetch cycle data flow between the processor (PC, MAR, MBR, IR, CU) and memory over the address, data and control buses]
20
Indirect cycle
[Figure: indirect cycle data flow between the processor (MAR, MBR, CU) and memory over the address, data and control buses]
21
Interrupt cycle
[Figure: interrupt cycle data flow between the processor (PC, MAR, MBR, CU) and memory over the address, data and control buses]
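The three register-transfer diagrams above (fetch, indirect and interrupt cycles) can be traced on a toy machine. This is a hypothetical sketch: instructions are Python tuples `(opcode, indirect, address)`, there is a single accumulator, and `handler` and `save_addr` (where the interrupted PC is stored) are made-up parameters, not features of any real processor.

```python
from dataclasses import dataclass, field


@dataclass
class CPU:
    memory: list
    pc: int = 0
    mar: int = 0
    mbr: object = 0
    ir: object = None
    acc: int = 0
    interrupt_pending: bool = False
    handler: int = 0      # hypothetical interrupt handler address
    save_addr: int = 0    # hypothetical cell where the old PC is saved
    trace: list = field(default_factory=list)

    def fetch_cycle(self):
        self.mar = self.pc                  # MAR <- PC
        self.mbr = self.memory[self.mar]    # MBR <- memory[MAR]
        self.pc += 1                        # PC <- PC + 1
        self.ir = self.mbr                  # IR <- MBR
        self.trace.append("fetch")

    def indirect_cycle(self):
        op, _, addr = self.ir
        self.mar = addr                     # MAR <- address field of IR
        self.mbr = self.memory[self.mar]    # MBR <- memory[MAR] (real address)
        self.ir = (op, False, self.mbr)     # IR address field <- MBR
        self.trace.append("indirect")

    def interrupt_cycle(self):
        self.mbr = self.pc                  # MBR <- PC
        self.mar = self.save_addr           # MAR <- save address
        self.memory[self.mar] = self.mbr    # memory[MAR] <- MBR
        self.pc = self.handler              # PC <- handler address
        self.interrupt_pending = False
        self.trace.append("interrupt")

    def step(self):
        self.fetch_cycle()
        if self.ir[1]:                      # indirect bit set
            self.indirect_cycle()
        op, _, addr = self.ir
        if op == "LOAD":
            self.acc = self.memory[addr]
        elif op == "ADD":
            self.acc += self.memory[addr]
        self.trace.append("execute")
        if self.interrupt_pending:          # interrupts checked after execution
            self.interrupt_cycle()


memory = [("LOAD", False, 4), ("ADD", True, 5), None, 0, 10, 6, 32]
cpu = CPU(memory=memory, handler=7, save_addr=3)
cpu.step()                    # LOAD 4: ACC = 10
cpu.interrupt_pending = True  # a device raises an interrupt
cpu.step()                    # ADD via pointer in cell 5: ACC = 10 + 32, then interrupt
print(cpu.acc, cpu.pc, memory[3])  # 42 7 2
print(cpu.trace)
```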
22
Pipeline
  • Problem: during the instruction cycle, only one
    instruction is processed at a time
  • Solution: divide the cycle into smaller stages
  • Condition: there must be time instants when no
    main memory access is required!

23
Pipeline example - laundry
Without pipelining: 3 hours per load, 9 hours for all three loads
  LA DR PA | LA DR PA | LA DR PA   (cycles 1, 2, 3 one after another)
With pipelining: still 3 hours per load, but only 5 hours for all three loads!
  LA DR PA
     LA DR PA
        LA DR PA
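The laundry timing follows from a simple count, assuming each of the three phases (LA, DR, PA) takes one hour: sequentially every load needs all three slots, while in the pipeline each further load finishes one slot after the previous one. The function names are hypothetical.

```python
def sequential_time(jobs: int, stages: int, stage_time: float) -> float:
    """Each job passes through all stages before the next one starts."""
    return jobs * stages * stage_time


def pipelined_time(jobs: int, stages: int, stage_time: float) -> float:
    """First job needs all stages; every further job finishes one stage later."""
    return (stages + jobs - 1) * stage_time


print(sequential_time(3, 3, 1))  # 9 (hours)
print(pipelined_time(3, 3, 1))   # 5 (hours)
```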
24
Prefetch
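[Figure: two-stage instruction pipeline with instruction fetch and execution stages. Simplified view: fetched instructions pass to execution, which produces results. Expanded view: either stage may wait for the other, and when execution supplies a new (branch) address, the prefetched instruction is discarded]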
  • NOTE: the speedup is smaller than twofold, as
    the memory access lasts longer than the
    instruction execution

25
Basic phases of the instruction cycle
  • Instruction fetching (FI)
  • Instruction decoding (DI)
  • Operand address calculation (CO)
  • Operand fetching (FO)
  • Instruction execution (EI)
  • Writing the outcome (WO)

[Figure: timing diagram over cycles 1 to 11: instructions I1 to I4 each pass through FI, DI, CO, FO, EI, WO, each starting one cycle after the previous one]
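The six-stage timing diagram can be regenerated with a short script: instruction i enters FI in cycle i and moves one stage per cycle, so n instructions need k + n - 1 cycles for k stages. A sketch assuming ideal conditions (no hazards, equal stage times); `timing_diagram` is a hypothetical name.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n_instr: int, stages: list = STAGES):
    """Return (total cycles, one text row per instruction) for an ideal pipeline."""
    total = len(stages) + n_instr - 1
    rows = []
    for i in range(n_instr):
        cells = ["  "] * total                 # two-character empty slot per cycle
        for s, name in enumerate(stages):
            cells[i + s] = name                # instruction i is in stage s at cycle i+s
        rows.append(f"I{i + 1}: " + " ".join(cells))
    return total, rows


total, rows = timing_diagram(4)
print(total, "cycles")   # 9 cycles
for row in rows:
    print(row.rstrip())
```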
26
Branches and pipelining
[Figure: timing diagram over cycles 1 to 13: I1 and I2 pass through all six stages (FI, DI, CO, FO, EI, WO); I3 to I6 are only partially processed (up to FO, CO, DI and FI respectively) and are discarded; I21 and I22 then pass through all six stages]
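Assuming, as the timing diagram on this slide suggests, that I2 is a branch to I21 resolved in the EI stage, the instructions fetched behind it are discarded and fetching restarts one cycle after EI. A minimal count of the cost, with the hypothetical helper `cycles_with_branches`:

```python
def cycles_with_branches(executed: int, stages: int = 6,
                         taken_branches: int = 0, resolve_stage: int = 5) -> int:
    """Cycles for an ideal pipeline plus a flush penalty per taken branch.

    Instructions fetched after a taken branch are discarded until the branch
    is resolved in `resolve_stage` (5 = EI), so each taken branch costs
    resolve_stage - 1 extra cycles.
    """
    base = stages + executed - 1
    penalty = taken_branches * (resolve_stage - 1)
    return base + penalty


print(cycles_with_branches(4))                    # 9: no branch taken
print(cycles_with_branches(4, taken_branches=1))  # 13: I1, I2, I21, I22
```

With the four instructions that actually complete (I1, I2, I21, I22) and one taken branch resolved in stage 5, this gives 6 + 4 - 1 + 4 = 13 cycles, which matches the 13-cycle time axis of the diagram.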
27
Pipeline implementation algorithm
28
Problems of pipelining
  • Successive pipeline stages do not last the same
    amount of time
  • Transferring data between the buffers may
    significantly increase the pipeline execution time
  • Register and memory dependencies and pipeline
    optimization can be handled only at a high cost
    (complex control logic)

29
Efficiency of pipelining
  • Cycle execution time
  • Time required to execute all the instructions
  • Instruction pipeline acceleration (speedup) ratio
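Assuming the usual textbook definitions (k stages, n instructions, $\tau_i$ the delay of stage i, d the latch delay between stages), the three quantities named above can be written as:

```latex
\begin{aligned}
\tau &= \max_{1 \le i \le k} \tau_i + d \\
T_{k,n} &= \bigl[k + (n-1)\bigr]\,\tau \\
S_k &= \frac{T_{1,n}}{T_{k,n}}
     = \frac{n k \tau}{\bigl[k + (n-1)\bigr]\tau}
     = \frac{n k}{k + (n-1)}
\end{aligned}
```

For the laundry example (k = 3, n = 3) this gives $S_3 = 9/5 = 1.8$, and $S_k$ approaches k as n grows.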
30
Example of the pipeline efficiency
31
Pipelines in modern processors
  • Pentium 3: 10 stages
  • Athlon: 10 stages for the ALU, 15 stages for the FPU
  • Pentium M: 12 stages
  • Athlon 64 / 64 X2: 12 stages for the ALU, 17 stages
    for the FPU
  • Pentium 4 Northwood: 20 stages (hyperpipeline!)
  • Pentium 4 Prescott: 31 stages
  • Core 2 Duo: 14 stages

32
Hazards
  • Hazards are disturbances of pipelined execution
  • There are data, resource (structural) and control
    hazards
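A read-after-write data hazard occurs when an instruction needs a register that an earlier instruction, still in the pipeline, has not yet written. A minimal detector, assuming a hypothetical `window` of how many preceding instructions may still be in flight; the `Instr` record and the sample program are made up for illustration.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]        # register written (None if none)
    srcs: tuple                # registers read


def raw_hazards(program: list, window: int) -> list:
    """Return (producer, consumer, register) index triples for RAW hazards.

    window: how many preceding instructions may still be in flight
    (a hypothetical stand-in for the pipeline depth).
    """
    hazards = []
    for i, consumer in enumerate(program):
        for j in range(max(0, i - window), i):
            producer = program[j]
            if producer.dest is not None and producer.dest in consumer.srcs:
                hazards.append((j, i, producer.dest))
    return hazards


program = [
    Instr("ADD R1, R2, R3", "R1", ("R2", "R3")),
    Instr("SUB R4, R1, R5", "R4", ("R1", "R5")),
    Instr("AND R6, R7, R8", "R6", ("R7", "R8")),
    Instr("OR  R9, R4, R1", "R9", ("R4", "R1")),
]
print(raw_hazards(program, window=2))  # [(0, 1, 'R1'), (1, 3, 'R4')]
```

The read of R1 by the last instruction is not reported because, with a window of 2, the ADD that produced it has already left the pipeline.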

33
Branch handling
  • Multiple pipelines (multiple streams)
  • Prefetching the branch target
  • Loop buffer
  • Branch prediction
  • Delayed branch

34
Multiple pipelines
  • Both instructions that may follow a branch are
    loaded into two pipelines and processed
    simultaneously
  • The main problem is providing memory access for
    both pipelines at the same time

35
Prefetch and loop buffer
Prefetch
  • When a branch instruction is decoded, the target
    instruction is fetched as well. It is stored until
    the branch is executed

Loop buffer
  • A small, fast buffer storing the most recently
    fetched (subsequent) instructions is maintained
  • It is useful for loops and conditional branches:
    if the branch target is already in the buffer, it
    does not have to be fetched from memory
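The benefit of a loop buffer can be estimated by counting memory fetches. This sketch assumes a hypothetical buffer that keeps the `size` most recently fetched instruction addresses; real loop buffers differ in organization.

```python
from collections import OrderedDict


class LoopBuffer:
    """Keeps the `size` most recently fetched instruction addresses."""

    def __init__(self, size: int):
        self.size = size
        self.lines = OrderedDict()     # address -> present, oldest first
        self.memory_fetches = 0

    def fetch(self, addr: int) -> str:
        if addr in self.lines:
            self.lines.move_to_end(addr)    # mark as most recently used
            return "buffer"
        self.memory_fetches += 1            # miss: go to main memory
        self.lines[addr] = True
        if len(self.lines) > self.size:
            self.lines.popitem(last=False)  # evict the oldest entry
        return "memory"


def run_loop(buffer_size: int, body: range, iterations: int) -> int:
    buf = LoopBuffer(buffer_size)
    for _ in range(iterations):
        for addr in body:
            buf.fetch(addr)
    return buf.memory_fetches


print(run_loop(8, range(100, 105), 10))  # 5: the loop body fits
print(run_loop(4, range(100, 105), 10))  # 50: body too large, every fetch misses
```

The second case shows why the loop must fit entirely in the buffer: with a body one instruction too long, each fetch evicts exactly the instruction needed next.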

36
Conditional Branch Prediction
  • Static
  • Branch never taken (Sun SPARC, MIPS)
  • Branch always taken
  • Prediction by operation code
  • Dynamic
  • Taken/not taken switch
  • Branch history table
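The static policies and the dynamic taken/not taken switch (a single history bit) can be compared on a branch trace. The trace is an assumption for illustration: a loop-closing branch taken three times and then not taken, repeated three times.

```python
def accuracy(predictions: list, outcomes: list) -> float:
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


def static(taken: bool, outcomes: list) -> list:
    """Static prediction: always the same guess."""
    return [taken] * len(outcomes)


def one_bit_switch(outcomes: list, initial: bool = False) -> list:
    """Dynamic taken/not-taken switch: predict the previous outcome."""
    state, predictions = initial, []
    for outcome in outcomes:
        predictions.append(state)
        state = outcome
    return predictions


trace = [True, True, True, False] * 3   # hypothetical loop-branch outcomes
print(accuracy(static(True, trace), trace))     # 0.75
print(accuracy(static(False, trace), trace))    # 0.25
print(accuracy(one_bit_switch(trace), trace))   # 0.5
```

The switch mispredicts twice per loop, once at the exit and once on re-entry, which is the usual motivation for using more than one history bit.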

37
Static prediction
  • The simplest method, used as a fallback, for
    instance in the Motorola MPC7450 processor
  • Pentium 4 allowed the code to include hints
    telling the static predictor whether the branch
    is likely to be taken (so-called prediction hints)

38
Dynamic prediction of the conditional branches
  • The history of each conditional branch
    instruction is stored
  • It is represented by bits stored in the cache
    memory
  • Every instruction has its own history bits
  • Another solution is a table storing information
    about the outcomes of conditional branches

39
History bits prediction
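A common form of history-bit prediction uses two bits as a saturating counter, so a single unusual outcome does not flip the prediction. This is a sketch under that assumption (the state machine in the original figure may differ), run on a hypothetical loop-branch trace.

```python
class TwoBitCounter:
    """States 0-1 predict 'not taken', states 2-3 predict 'taken'."""

    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # move one step towards the actual outcome, saturating at 0 and 3
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_hits(counter: TwoBitCounter, outcomes: list) -> int:
    hits = 0
    for outcome in outcomes:
        hits += counter.predict() == outcome
        counter.update(outcome)
    return hits


trace = [True, True, True, False] * 3   # hypothetical loop-branch outcomes
print(count_hits(TwoBitCounter(0), trace))  # 7 of 12
print(count_hits(TwoBitCounter(2), trace))  # 9 of 12
```

Once warmed up, the counter mispredicts only the loop exit, not the re-entry that follows it.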
40
Branch history table
Columns: branch instruction address | history bits | target instruction
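A branch history table with the three columns listed above can be sketched as a dictionary keyed by the branch address. Assumptions: each entry holds a two-bit counter as its history bits, and an instruction occupies one address unit, so the fall-through address is `pc + 1`; class and method names are hypothetical.

```python
class BranchHistoryTable:
    """Entries: branch address -> [history bits (2-bit counter), target address]."""

    def __init__(self):
        self.entries = {}

    def next_fetch(self, pc: int) -> int:
        """Predict the address to fetch after the branch at pc."""
        entry = self.entries.get(pc)
        if entry is None or entry[0] < 2:
            return pc + 1            # predicted not taken: fall through
        return entry[1]              # predicted taken: stored target

    def update(self, pc: int, taken: bool, target: int) -> None:
        counter, _ = self.entries.get(pc, [1, target])  # new entries: weakly not taken
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
        self.entries[pc] = [counter, target]


bht = BranchHistoryTable()
print(bht.next_fetch(0x40))          # 65 (0x41): unknown branch, fall through
bht.update(0x40, True, 0x10)
print(bht.next_fetch(0x40))          # 16 (0x10): predicted taken
```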




41
Local Branch Prediction
  • Requires a separate history buffer for each
    branch instruction, although the history table
    can be common for all instructions
  • Pentium MMX, Pentium 2 and 3 processors have
    local prediction circuits with 4 history bits and
    16 entries for every branch instruction
  • Local prediction efficiency is estimated at about
    97%
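A two-level local predictor along the lines described above keeps, for each branch, 4 history bits that select one of 16 two-bit counters. A minimal sketch; the names and the initial counter value are assumptions, and real implementations share and tag their tables differently.

```python
class LocalPredictor:
    HISTORY_BITS = 4

    def __init__(self):
        self.history = {}    # branch address -> last 4 outcomes (newest in bit 0)
        self.counters = {}   # branch address -> 16 two-bit counters

    def _entry(self, pc: int):
        if pc not in self.history:
            self.history[pc] = 0
            self.counters[pc] = [1] * (1 << self.HISTORY_BITS)  # weakly not taken
        return self.history[pc], self.counters[pc]

    def predict(self, pc: int) -> bool:
        history, counters = self._entry(pc)
        return counters[history] >= 2

    def update(self, pc: int, taken: bool) -> None:
        history, counters = self._entry(pc)
        if taken:
            counters[history] = min(3, counters[history] + 1)
        else:
            counters[history] = max(0, counters[history] - 1)
        mask = (1 << self.HISTORY_BITS) - 1
        self.history[pc] = ((history << 1) | int(taken)) & mask


def simulate(predictor: LocalPredictor, pc: int, outcomes: list) -> list:
    results = []
    for outcome in outcomes:
        results.append(predictor.predict(pc) == outcome)
        predictor.update(pc, outcome)
    return results


results = simulate(LocalPredictor(), 0x40, [True, True, True, False] * 20)
print(results.count(False))  # 6 mispredictions, all during warm-up
```

On a loop branch with period 4 (taken three times, then not taken) it mispredicts only during warm-up and then predicts every outcome correctly, including the loop exit, because each 4-bit history pattern maps to exactly one next outcome.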

42
Global Branch Prediction
  • A common history for all branch instructions is
    stored. It allows dependencies between different
    branch instructions to be taken into account
  • On its own, rarely a better solution than local
    prediction
  • Hybrid solutions: a shared global prediction unit
    combined with a history table (AMD processors,
    Pentium M, Core, Core 2)

43
Branch Prediction Unit
  • A processor circuit responsible for predicting
    disturbances in sequential code execution
    (branches)
  • Often connected with the micro-operation cache
  • In the Pentium 4 processor, the branch prediction
    buffer has 4096 entries, in the Pentium 3 only
    512. Partly thanks to this, the former mispredicts
    about a third less often than the latter

44
Location of the Branch Prediction Unit