Computer architecture

Original title: Architektura komputerów (Computer architecture). Author: Piotr Bilski. Last modified by: Piotr Bilski. Created: 2/18/2006 11:51:16 PM.

1
Computer architecture

Lecture 6: Processor structure
Piotr Bilski

2
Processor tasks

Instruction fetching
Instruction interpretation
Data fetching
Data processing
Data saving
These tasks justify the existence of registers
(temporary storage inside the processor)

3
Internal processor structure
[Figure: block diagram showing the ALU, status flags, registers, shifter, complementer, arithmetic and Boolean logic, and the control unit]
4
Block diagram of the Pentium 3 processor
5
Block diagram of the P6 core (Pentium Pro), 1995

Front-end of the processor
Core
Completion unit

6
Register types

Accessible to the user (addressing, data, etc.)
Inaccessible to the user (control, status)
This classification is informal!

7
Registers accessible to the user

General Purpose Registers (GPR)
Data
Addressing (segment pointer, stack pointer, index)
Condition codes (status flags): read-only!

8
Control and status registers

Basic:
Program Counter (PC)
Instruction Register (IR)
Memory Address Register (MAR)
Memory Buffer Register (MBR)
Others:
Program Status Word (PSW)
Interrupt Vector Register
Page Table Pointer

9
Program Status Word
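[Figure: 16-bit PSW layout, bits 0 to 15 (positions 0, 3, 4 and 15 marked), with fields P, R, S, Z, O, I, N and a field labelled OTHER]
S: sign bit
Z: set if the operation result is zero
P: carry bit
R: logical comparison result bit
O: overflow bit
I: interrupt enable/disable
N: supervisor mode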
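
As a rough illustration of how the sign, zero, carry and overflow bits listed above could be derived, the sketch below adds two values and reports the flags. The 8-bit width, the function name `add_with_flags` and the dictionary layout are assumptions made for this example; the slide does not specify them.

```python
def add_with_flags(a: int, b: int, width: int = 8):
    """Add two unsigned `width`-bit values and derive PSW-style flags.

    Flag letters follow the slide: S sign, Z zero, P carry, O overflow.
    The width and the return format are illustrative assumptions.
    """
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    a &= mask
    b &= mask
    full = a + b              # unbounded sum, used to detect the carry out
    result = full & mask      # value that fits in the register
    flags = {
        "S": int(bool(result & sign_bit)),
        "Z": int(result == 0),
        "P": int(full > mask),
        # Signed overflow: both operands share a sign that the result lacks.
        "O": int(bool(~(a ^ b) & (a ^ result) & sign_bit)),
    }
    return result, flags


if __name__ == "__main__":
    print(add_with_flags(0x7F, 0x01))
    print(add_with_flags(0xFF, 0x01))
```

The first call overflows the signed range (127 + 1), so it sets S and O; the second wraps around to zero, so it sets Z and P.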
10
Registers in the Motorola MC68000 processor

Data and address registers (32-bit)
Specialization: 8 data registers (D0-D7) and 9
address registers (two of them are used alternately,
one in user mode and one in supervisor mode)
Address bus: 24-bit; data bus: 16-bit
The A7 register is used as the Stack Pointer (SP)
Status register (SR): 16-bit; its low byte is the condition code register (CCR)
Program counter (PC): 32-bit
Instructions are stored at even addresses

11
Registers in the Intel 8086 Processor

16-bit address and data registers
Data/General Purpose Registers (AX, BX, CX, DX)
Pointer and index registers (SP, BP, SI, DI)
Segment registers (CS, DS, SS, ES)
Instruction pointer
Status register (flags)

12
Intel 8086 Registers (cont.)
Data/general purpose registers:
AX: Accumulator
BX: Base
CX: Count
DX: Data
Pointer and index registers:
SP: Stack pointer
BP: Base pointer
SI: Source index
DI: Destination index
13
Register organization of the Intel 386 to Pentium processors

32-bit data and address registers
Eight General Purpose Registers (EAX, EBX, ECX,
EDX, ESP, EBP, ESI, EDI)
For backward compatibility, the lower half of
each register is accessible as a 16-bit register
32-bit status register (EFLAGS)
32-bit instruction pointer
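
The backward-compatible view of a 32-bit register can be sketched with plain bit masks. The function names `sub_registers` and `write_ax` are hypothetical. The AH/AL split of AX is not mentioned on the slide; it is included here as a well-known property of the x86 register set.

```python
def sub_registers(eax: int) -> dict:
    """Return the 16-bit and 8-bit views that alias the low part of EAX."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF                 # low 16 bits: the 8086-compatible AX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,       # high byte of AX
        "AL": ax & 0xFF,              # low byte of AX
    }


def write_ax(eax: int, value: int) -> int:
    """Writing AX replaces only the low 16 bits; the upper half of EAX is kept."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)


if __name__ == "__main__":
    print({k: hex(v) for k, v in sub_registers(0x12345678).items()})
    print(hex(write_ax(0x12345678, 0xABCD)))
```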

14
Floating-point registers of the Pentium processor

Eight 80-bit numerical registers
16-bit control register
16-bit status register
16-bit tag word (describes the content type of each numerical register)
48-bit instruction pointer
48-bit data pointer

15
EFLAGS register
[Figure: EFLAGS register layout, bits 0 to 31, with positions 0, 15, 21 and 31 marked. Flags shown: CF, PF, AF, ZF, SF, TF, IF, DF, OF, IOPL, NT, RF, VM, AC, VIF, VIP, ID]

TF: trap flag
IF: interrupt enable flag
DF: direction flag
IOPL: I/O privilege level
RF: resume flag
AC: alignment check
ID: identification flag
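
A minimal sketch of reading individual flags out of an EFLAGS value follows. The bit positions are not given in the transcript; they are the ones documented in Intel's architecture manuals. The names `EFLAGS_BITS` and `decode_eflags` are illustrative only.

```python
# Bit positions as documented for the IA-32 EFLAGS register (assumption:
# taken from Intel's manuals, not from the slide itself).
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value: int) -> dict:
    """Split a 32-bit EFLAGS value into named single-bit flags plus IOPL."""
    flags = {name: (value >> bit) & 1 for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> 12) & 0b11   # two-bit field in bits 12-13
    return flags


if __name__ == "__main__":
    decoded = decode_eflags(0x246)
    print([name for name, bit in decoded.items() if bit])
```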

16
Registers in the Athlon 64 processor

Compatible with the x86-64 architecture (40-bit
physical address space, 48-bit virtual address
space)
64-bit data and address registers
8 general purpose registers (RAX, RBX, RCX, RDX,
RBP, RSI, RDI, RSP), also usable in the 32-bit
compatibility mode
In 64-bit mode, 8 additional general purpose
registers (R8-R15) are available
16 SSE registers (XMM0-XMM15)
8 x87 floating-point registers (80-bit)

17
Registers in the PowerPC processor

32 general purpose registers (64-bit) and an exception
register (XER)
32 floating-point unit registers (64-bit) and a
status and control register (FPSCR)
Branch processing unit registers: 32-bit
condition register, 64-bit count register and link
register

18
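Instruction cycle
[Figure: instruction cycle state diagram with the following states]
Instruction address calculation
Instruction fetch
Instruction decoding
Argument address calculation (with indirect addressing; repeated for multiple arguments)
Argument fetching
Data operation
Argument address calculation (with indirect addressing; repeated for multiple results)
Writing argument
Interrupt checking (no interrupts: instruction executed, fetch the next one)
Interrupt handling
A "return to data" transition leads from the data operation back to argument address calculation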
19
Instruction fetching cycle
[Figure: data flow in the instruction fetch cycle. The processor (PC, MAR, MBR, IR, CU) is connected to memory through the address, data and control buses]
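
The register transfers in the fetch cycle can be sketched as below. This is only an illustration under simplifying assumptions: memory is a word-addressed Python list, every instruction occupies one word, and the class name `Cpu` and function name `fetch` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    pc: int = 0    # Program Counter
    mar: int = 0   # Memory Address Register
    mbr: int = 0   # Memory Buffer Register
    ir: int = 0    # Instruction Register


def fetch(cpu: Cpu, memory: list) -> None:
    """Perform one instruction fetch cycle (simplified)."""
    cpu.mar = cpu.pc             # PC -> MAR, address goes out on the address bus
    cpu.mbr = memory[cpu.mar]    # memory read, word arrives over the data bus
    cpu.pc += 1                  # control unit advances PC to the next instruction
    cpu.ir = cpu.mbr             # MBR -> IR, ready for decoding


if __name__ == "__main__":
    cpu = Cpu(pc=2)
    fetch(cpu, [10, 20, 30, 40])
    print(cpu)
```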
20
Indirect cycle
[Figure: data flow in the indirect cycle. The processor (MAR, MBR, CU) is connected to memory through the address, data and control buses]
21
Interrupt cycle
[Figure: data flow in the interrupt cycle. The processor (PC, MAR, MBR, CU) is connected to memory through the address, data and control buses]
22
Pipeline

Problem: during the instruction cycle only one
instruction is processed
Solution: divide the cycle into smaller stages
Condition: there must be time instants when no main memory
access is required!

[Figure: timeline of Cycle 1, Cycle 2 and Cycle 3]
23
Pipeline example - laundry
Without pipelining: 3 hours per cycle, 9 hours for all
CYCLE 1     CYCLE 2     CYCLE 3
LA DR PA    LA DR PA    LA DR PA
With pipelining: 3 hours per cycle, 5 hours for all!
LA DR PA
   LA DR PA
      LA DR PA
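
The two totals in the laundry example can be checked with a few lines of arithmetic. The sketch assumes three stages of one hour each and three loads, which matches the 9-hour and 5-hour figures on the slide; the function name `total_hours` is hypothetical.

```python
def total_hours(loads: int, stages: int, stage_hours: float = 1.0,
                pipelined: bool = True) -> float:
    """Total time to push `loads` items through `stages` equal-length stages."""
    if loads == 0:
        return 0.0
    if pipelined:
        # The first load needs all stages; each further load adds one stage time.
        return (stages + loads - 1) * stage_hours
    # Without overlap every load runs through all stages on its own.
    return loads * stages * stage_hours


if __name__ == "__main__":
    print(total_hours(3, 3, pipelined=False))
    print(total_hours(3, 3, pipelined=True))
```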
24
Prefetch
[Figure: two-stage pipeline. Simplified view: Instruction -> Instruction fetch -> Instruction -> Execution -> Result. Extended view: the same two stages with "Waiting" states on both, a "New address" path from execution back to fetch, and a "Discard" output from the fetch stage]

NOTE: the speedup is less than twofold, because
memory access takes longer than instruction
execution

25
Basic phases of the instruction cycle

Instruction fetching (FI)
Instruction decoding (DI)
Operand address calculation (CO)
Operand fetching (FO)
Instruction execution (EI)
Writing the result (WO)

    1  2  3  4  5  6  7  8  9  10 11
I1  FI DI CO FO EI WO
I2     FI DI CO FO EI WO
I3        FI DI CO FO EI WO
I4           FI DI CO FO EI WO
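
A timing diagram of this kind can be generated mechanically: instruction i simply starts one cycle after instruction i-1. The sketch below assumes an ideal pipeline with no stalls; `timing_diagram` and `total_cycles` are names chosen for the example.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n_instructions: int, stages=STAGES) -> list:
    """Return one text row per instruction, shifted right by one stage each."""
    rows = []
    for i in range(n_instructions):
        cells = ["  "] * i + list(stages)   # i empty cycles before the first stage
        rows.append(f"I{i + 1:<3}" + " ".join(cells))
    return rows


def total_cycles(n_instructions: int, stages=STAGES) -> int:
    """Cycles until the last instruction leaves the ideal pipeline."""
    return len(stages) + n_instructions - 1


if __name__ == "__main__":
    print("\n".join(timing_diagram(4)))
    print("cycles:", total_cycles(4))
```

For four instructions and six stages this gives nine cycles, compared with 24 if the instructions were executed strictly one after another.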
26
Branches and pipelining
    1  2  3  4  5  6  7  8  9  10 11 12 13
I1  FI DI CO FO EI WO
I2     FI DI CO FO EI WO
I3        FI DI CO FO
I4           FI DI CO
I5              FI DI
I6                 FI
I21                   FI DI CO FO EI WO
I22                      FI DI CO FO EI WO
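
The 13 cycles in this slide can be reproduced with a small calculation. The sketch rests on a reading of the diagram that the transcript does not state explicitly: I2 is assumed to be a taken branch whose outcome becomes known in the EI stage, so the four instructions fetched behind it are discarded. The function names are hypothetical.

```python
def branch_penalty(resolve_stage: int = 5) -> int:
    """Cycles lost when a taken branch resolves in stage `resolve_stage` (1-based)."""
    return resolve_stage - 1


def cycles_with_taken_branch(before: int, after: int, stages: int = 6,
                             resolve_stage: int = 5) -> int:
    """Total cycles for `before` instructions (the last one a taken branch)
    followed by `after` instructions from the branch target."""
    if after == 0:
        return before + stages - 1
    # The branch enters the pipeline in cycle `before` and resolves here:
    resolve_cycle = before + resolve_stage - 1
    # The k-th target instruction is fetched in cycle resolve_cycle + k
    # and needs `stages` cycles to complete.
    return resolve_cycle + after + stages - 1


if __name__ == "__main__":
    print(cycles_with_taken_branch(before=2, after=2))
    print(branch_penalty())
```

Under these assumptions, four instructions that would need 9 cycles in an undisturbed pipeline need 13 when the second one is a taken branch.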
27
Pipeline implementation algorithm
28
Problems with pipelining

Successive pipeline stages do not take the same amount
of time
Transferring data between the buffers may
significantly increase pipeline execution time
Dependencies between registers and memory that
hinder pipeline optimization can be minimized only
at a high cost

29
Pipelining efficiency
Cycle execution time
Time required to execute all the instructions
Instruction pipeline speedup ratio
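
The formulas for these three quantities did not survive in the transcript. The sketch below uses the forms commonly given in textbooks, which is an assumption about what the slide showed: the cycle time is the slowest stage delay plus the latch delay d, k stages need (k + n - 1) cycles for n instructions, and the speedup is n*k / (k + n - 1). All function and parameter names are illustrative.

```python
def cycle_time(stage_delays, latch_delay: float = 0.0) -> float:
    """Pipeline cycle time: slowest stage plus the inter-stage latch delay."""
    return max(stage_delays) + latch_delay


def pipelined_time(n: int, k: int, tau: float) -> float:
    """Time to execute n instructions on a k-stage pipeline with cycle time tau."""
    return (k + n - 1) * tau


def speedup(n: int, k: int) -> float:
    """Ratio of non-pipelined time (n * k * tau) to pipelined time."""
    return (n * k) / (k + n - 1)


if __name__ == "__main__":
    tau = cycle_time([1.0, 2.0, 1.5], latch_delay=0.1)
    print(tau, pipelined_time(4, 6, tau), speedup(4, 6))
    print(speedup(1000, 6))
```

For large n the speedup approaches k, the number of stages, which is why it stays well below 6 for only four instructions.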
30
Example of the pipeline efficiency
31
Pipelines of modern processors

Pentium 3: 10 stages
Athlon: 10 stages for ALU, 15 stages for FPU
Pentium M: 12 stages
Athlon 64 / 64 X2: 12 stages for ALU, 17 stages
for FPU
Pentium 4 Northwood: 20 stages (hyperpipeline!)
Pentium 4 Prescott: 31 stages
Core 2 Duo: 14 stages

32
Hazards

Hazards are disruptions of pipelined execution
There are data, resource and control hazards
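
As one example of a data hazard, the sketch below looks for read-after-write dependencies: an instruction that reads a register written by one of the instructions just ahead of it. The `Instr` tuple, the register names and the `window` of two preceding instructions are assumptions for illustration; a real pipeline defines the critical distance by its stage layout.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    dest: str                 # register written by the instruction
    srcs: Tuple[str, ...]     # registers read by the instruction


def raw_hazards(program, window: int = 2):
    """List (producer, consumer, register) triples that are close enough to clash."""
    hazards = []
    for i, later in enumerate(program):
        for earlier in program[max(0, i - window):i]:
            if earlier.dest in later.srcs:
                hazards.append((earlier.name, later.name, earlier.dest))
    return hazards


if __name__ == "__main__":
    program = [
        Instr("I1", "r1", ("r2", "r3")),
        Instr("I2", "r4", ("r1", "r5")),   # reads r1 right after I1 writes it
        Instr("I3", "r6", ("r7", "r8")),
    ]
    print(raw_hazards(program))
```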

33
Branch handling

Multiple pipelines
Prefetching the branch target
Loop buffer
Branch prediction
Delayed branch

34
Multiple pipelines

Both instructions that may follow a branch are
loaded into two pipelines and processed simultaneously
The main problem is obtaining memory access for
both instructions

35
Prefetch and loop buffer
Prefetch

When a branch instruction is decoded, the target
instruction is fetched as well. It is held until the
branch is executed

Loop buffer

A small memory buffer is created to hold the
successive instructions
It is useful when conditional branch
instructions and loops are involved

36
Conditional Branch Prediction

Static:
Branch never taken (Sun SPARC, MIPS)
Branch always taken
Prediction based on the operation code
Dynamic:
Taken/not taken switch
Branch history table

37
Static prediction

The simplest method; used as a fallback, for
instance in the Motorola MPC7450 processor
The Pentium 4 allowed hints to be inserted in the code
suggesting whether static prediction should assume
the branch is taken or not (so-called prediction hints)

38
Dynamic prediction of the conditional branches

The history of each conditional branch instruction is
stored
It is represented by bits stored in the cache
memory
Every instruction has its own history bits
Another solution is a table storing
information about conditional branch results
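
To make the idea of history bits concrete, the sketch below compares a single taken/not taken bit with a two-bit saturating counter on a loop-closing branch. Both schemes are common textbook forms and are assumed here; the transcript does not say which state machine the slides used. Class and function names are invented for the example.

```python
class OneBitPredictor:
    """Predicts that the branch does whatever it did last time."""

    def __init__(self):
        self.taken = False

    def predict(self) -> bool:
        return self.taken

    def update(self, taken: bool) -> None:
        self.taken = taken


class TwoBitPredictor:
    """Saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counter = 0

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def mispredictions(predictor, outcomes) -> int:
    """Count wrong predictions over a sequence of actual branch outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A loop branch: taken four times, then falls through; the loop runs 3 times.
    outcomes = ([True] * 4 + [False]) * 3
    print(mispredictions(OneBitPredictor(), outcomes))
    print(mispredictions(TwoBitPredictor(), outcomes))
```

With two bits, a single loop exit no longer flips the prediction, so the branch is mispredicted once per loop run instead of twice after the warm-up.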

39
History bits prediction
40
Branch history table
Table columns: branch instruction address | history bits | target instruction
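
A branch history table with these three columns can be sketched as a mapping from the branch address to its history bits and target. The example assumes a two-bit counter as the history, stores the target address rather than the target instruction itself, and uses an unbounded Python dictionary where real hardware has a fixed number of entries. The class name `BranchHistoryTable` and its methods are hypothetical.

```python
class BranchHistoryTable:
    """Maps a branch instruction address to [history counter, target address]."""

    def __init__(self):
        self.entries = {}

    def lookup(self, address: int, fallthrough: int) -> int:
        """Return the predicted next fetch address for the branch at `address`."""
        entry = self.entries.get(address)
        if entry is not None and entry[0] >= 2:   # history says "taken"
            return entry[1]
        return fallthrough                        # unknown branch or "not taken"

    def update(self, address: int, taken: bool, target: int) -> None:
        """Record the actual outcome once the branch has executed."""
        counter = self.entries.get(address, [1, target])[0]
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
        self.entries[address] = [counter, target]


if __name__ == "__main__":
    bht = BranchHistoryTable()
    print(hex(bht.lookup(0x100, fallthrough=0x104)))
    bht.update(0x100, taken=True, target=0x200)
    print(hex(bht.lookup(0x100, fallthrough=0x104)))
```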

41
Local Branch Prediction

Requires a separate history buffer for each
instruction, although the history table can be
common for all instructions
Pentium MMX, Pentium 2 and 3 processors have local
prediction circuits with 4 history bits and 16
entries for each conditional branch instruction
Local prediction efficiency is estimated at 97%
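
A local predictor with 4 history bits and 16 pattern entries can be sketched as follows. This is an illustration of the general two-level scheme, not a description of the actual Pentium circuits: each branch keeps its last four outcomes, and that 4-bit pattern selects one of 16 two-bit counters. The per-branch pattern table, the initial counter value and all names are assumptions.

```python
class LocalPredictor:
    HISTORY_BITS = 4

    def __init__(self):
        self.history = {}    # branch address -> last 4 outcomes as a bit pattern
        self.patterns = {}   # branch address -> 16 two-bit counters

    def _state(self, address: int):
        hist = self.history.setdefault(address, 0)
        table = self.patterns.setdefault(address, [1] * (1 << self.HISTORY_BITS))
        return hist, table

    def predict(self, address: int) -> bool:
        hist, table = self._state(address)
        return table[hist] >= 2

    def update(self, address: int, taken: bool) -> None:
        hist, table = self._state(address)
        # Train the counter selected by the current history pattern.
        table[hist] = min(3, table[hist] + 1) if taken else max(0, table[hist] - 1)
        # Shift the new outcome into the 4-bit history.
        mask = (1 << self.HISTORY_BITS) - 1
        self.history[address] = ((hist << 1) | int(taken)) & mask


if __name__ == "__main__":
    predictor = LocalPredictor()
    misses = 0
    for taken in [True, True, True, False] * 10:
        misses += predictor.predict(0x40) != taken
        predictor.update(0x40, taken)
    print("mispredictions:", misses)
```

Because four history bits are enough to tell where the branch is inside a repeating taken-taken-taken-not-taken pattern, all mispredictions in this run happen during the warm-up.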

42
Global Branch Prediction

A common history for all branch instructions is
stored in memory. This makes it possible to account
for dependencies between different branch
instructions
It is rarely a better solution than local
prediction
Hybrid solutions: a shared global prediction unit
combined with the history table (AMD processors,
Pentium M, Core, Core 2)

43
Branch Prediction Unit

A processor circuit responsible for predicting
disruptions of sequential code execution
Often coupled with the micro-operation cache
memory
In the Pentium 4 processor, the branch prediction
buffer has 4096 entries; in the Pentium 3, only 512.
Therefore the former has a 33 percent better hit
ratio than the latter

44
Location of the Branch Prediction Unit
