Title: Intel
1 Intel's P6 Processor Family Architecture
?a?a??t??? Georgios, Student ID 438, Postgraduate
Programme in Informatics, 3rd Track: Advanced
Computer Architectures, Winter Semester 2001
2 Intel IA-32 Processor Families
- P6
- Pentium Pro
- Pentium II
- Celeron
- Pentium III
3 Members of the P6 Family
- Pentium Pro
  - 133-200 MHz
  - 66 MHz bus
  - 0.5 µm
  - 256 KB, 512 KB, or 1 MB L2 cache
  - 8 KB + 8 KB L1 cache
- Pentium II
  - The consumer version of the Pentium Pro
  - External 512 KB L2 cache; Slot 1 vs. socket
  - 233-300 MHz (Klamath), up to 450 MHz (Deschutes)
  - 66-100 MHz bus
  - 0.25 µm
  - Better 16-bit code performance
  - 16 KB L1 cache
- Celeron
  - Budget version of the Pentium II
  - 66-100 MHz bus
  - 128-256 KB L2 cache
  - 266 MHz to 1.2 GHz
  - 0.25 µm, 0.18 µm, 0.13 µm
  - SIMD
  - 32 KB L1 cache
- Pentium III
  - 450-550 MHz (Katmai), up to 1.33 GHz (Coppermine)
  - 133 MHz bus
  - 0.25 µm, 0.18 µm, 0.13 µm
  - SIMD extensions
  - 256 KB on-chip cache
  - Single Edge Contact Cartridge v2
4 Competition (at the introduction of the P6 generation)
- CPUs
  - AMD's K5
  - NexGen's Nx586
  - Cyrix's M1
- Advantages
  - Better pipelining
  - Process technology: 0.5 µm initially, then 0.35 µm
  - 133 MHz CPU clock
  - On-chip 256 KB L2 cache
- Features already introduced by competitors
  - Micro-ops
  - Dynamic execution (out of order)
  - Branch prediction
5 P6 at a Glance
6 What Sets the P6 Architecture Apart
- Bus / memory bottleneck
  - Improves performance using existing, inexpensive memory technologies.
  - Minimizes traffic on the system bus. The P6 uses less than 25% of the bus bandwidth -> more CPUs and/or I/O.
- Serial instruction execution bottleneck
  - Dynamic execution: out-of-order execution, data flow analysis, branch prediction, speculative execution.
  - Deeper pipelining
- Manufacturing improvements
  - Degree of miniaturization.
  - Higher operating frequency, first external and then internal as well.
7 Level 3 Superscalar Engine
- Each pipeline stage becomes smaller and the number of stages increases.
- 3 execution units
  - Fetch/Decode Unit: fills the instruction pool with instructions according to the instruction pointer. On a branch, it tries to predict the target.
  - Dispatch/Execute Unit: executes any instruction in the instruction pool whose operands are available, using dataflow analysis to find the order in which the program runs fastest. Results are not yet committed (speculative execution).
  - Retirement Unit: examines the instruction pool for instructions that must be committed and sends their changes to memory.
- Throughput
  - Fetch/Decode: up to 3 instructions per cycle
  - Dispatch/Execute: up to 5 instructions per cycle (3 typical)
  - Retirement: up to 3 instructions per cycle
8 Non-Blocking Caches
- Dual-ported caches
  - L1 performance: one LOAD and one STORE simultaneously in every cycle.
- Non-blocking, 4-stage pipelined L2 cache
  - A cache miss does not stall the CPU.
  - 4 requests are serviced at a time, and a further 12 can be waiting.
- Transaction bus
  - Transaction support: 4 requests can be outstanding for a single P6, and 8 for the bus as a whole.
  - No cycles are lost waiting.
- Out-of-order bus
  - In an MP environment, the bus protocol uses Retry and Defer mechanisms to respond out of order.
- 64 GB of cacheable memory
9 MP Ready
- Communication with the L2 cache over a private bus
  - Frees the system bus; simpler support hardware
- MESI
- Snooping memory
  - Keeps the cache state valid.
- 4 CPUs for full bus utilization
- Integrated APIC
  - Simpler support hardware
10 P6 Miscellaneous
- Dual Independent Bus Architecture (DIB): the bus for internal operation is independent of the bus used to communicate with the system, and the two run at entirely different frequencies.
- L1 code cache: 4-way set associative, organized in 32-byte lines; it tracks Next_IP requests.
- L1 data cache: 2-way set associative, organized in 32-byte lines; it tracks the addresses issued by the instruction execution unit.
- Registers: 40 physical 64-bit registers used for renaming. 8 logical data registers, visible to the programmer. 8 FP registers and 8 MMX registers, which are aliased onto the FP registers. Six 16-bit segment registers and one flags register.
- Performance counters: internal registers that keep statistics.
11 P6 Functional Units Diagram
- Key points
- In order Section
- Instruction Cache
- Instruction Fetch / Decode Units
- Data Cache
- Out of Order Section
- Execution Engine
- Micro-ops (load / store)
- Reservation Stations
12 Pipeline Stages
Branch check, Next ID update
Commit
Instruction execution
Fetch, rotate, and mark 16 bytes
Marking and dispatch to the instruction pool
IP update
Decode
Register renaming
Selection and reservation of instructions from the instruction pool for execution
13 Instruction Fetching
- Instruction prefetcher
- Next IP determines the next memory location from which an instruction is requested.
- The instruction cache answers the Next IP request with 16 aligned bytes.
- The 3 instruction decoders receive the 16 bytes after they have been rotated and the instruction boundaries (start/end) have been marked.
- 128-bit bus
14 Instruction Decoders
- Decoders
  - One general decoder: one macro-op per cycle.
  - Two simple decoders: one micro-op per cycle each. If a simple decoder encounters a macro-op, it stalls for one cycle and the instruction is moved to the general decoder.
- Micro-ops and macro-ops
  - Simple instructions of the register-register form are only one micro-op.
  - Load instructions are only one micro-op.
  - Store instructions have two micro-ops.
  - Simple read-modify instructions are two micro-ops.
  - Simple instructions of the register-memory form have two to three micro-ops.
  - Simple read-modify-write instructions are four micro-ops.
  - Complex instructions generally have more than four micro-ops, so they take multiple cycles to decode.
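The decoder rules above can be explored with a small throughput model. This is only a sketch under stated assumptions: the function name `decode_cycles` is hypothetical, the general decoder is assumed to accept instructions of up to four micro-ops, each simple decoder is assumed to accept only single-micro-op instructions, and an instruction with more than four micro-ops is simplified to one cycle spent alone in the general decoder (real microcoded instructions take longer).

```python
def decode_cycles(uop_counts):
    """Estimate decode cycles for a list of per-instruction micro-op counts.

    Assumed model (not Intel's documented algorithm): one general decoder
    plus two simple decoders that only take single-micro-op instructions.
    """
    cycles = 0
    i = 0
    n = len(uop_counts)
    while i < n:
        cycles += 1
        first = uop_counts[i]
        i += 1                      # the general decoder always takes one instruction
        if first > 4:
            continue                # assumed: a microcoded instruction decodes alone
        for _ in range(2):          # the two simple decoders
            if i < n and uop_counts[i] == 1:
                i += 1
            else:
                break               # multi-micro-op instruction waits for the general decoder
    return cycles
```

Under this model, `decode_cycles([4, 1, 1, 2, 1, 1])` needs two cycles, while `decode_cycles([1, 2, 1])` also needs two, because the two-micro-op instruction cannot use a simple decoder. This is why instruction ordering matters to decode bandwidth.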
15 Instruction Decoding (µ-ops)
- The decoder translates each IA-32 instruction into triadic µ-ops
  - 2 logical sources, 1 logical destination
- Usually: 1 IA-32 instruction -> 1 µ-op
- Less often: 1 IA-32 instruction -> 4 µ-ops
- Very rarely: Microcode Instruction Sequencer
  - Microcode: a set of pre-programmed sequences of simple µ-ops
16 Instruction Decoding (RAT)
- The µ-ops are sent to the RAT (Register Alias Table) in order.
- Register renaming maps the logical IA-32 registers onto physical P6 registers.
- The µ-ops are annotated with status bits (allocator stage).
- The result is forwarded to the instruction waiting area (instruction pool).
- The instruction pool is the ReOrder Buffer (ROB), implemented as a content-addressable memory with 40 entries.
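The renaming step can be illustrated with a toy alias table. This is a minimal sketch, not the P6 implementation: the class name `RegisterAliasTable`, its methods, and the simple free list are assumptions made for illustration, with the 40-entry size taken from the slide.

```python
class RegisterAliasTable:
    """Toy register alias table: logical register name -> physical entry."""

    def __init__(self, num_physical=40):
        self.free = list(range(num_physical))  # unused physical entries
        self.alias = {}                        # logical name -> physical index

    def rename(self, dst, src1, src2):
        """Rename one triadic micro-op (2 sources, 1 destination)."""
        # Sources read the current mapping; None means the value still lives
        # in the architectural register file (assumed convention).
        p1 = self.alias.get(src1)
        p2 = self.alias.get(src2)
        if not self.free:
            raise RuntimeError("no free physical entry; the allocator would stall")
        pd = self.free.pop(0)   # every write gets a fresh physical entry
        self.alias[dst] = pd
        return pd, p1, p2
```

Because each write to `eax` receives a new physical entry, a later instruction that overwrites `eax` no longer has to wait for earlier readers of the old value, which is what removes false dependencies.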
17 Instruction Dispatch/Execution
- Instructions are selected from the instruction pool according to their status.
  - Are the operands available?
  - Is the required execution unit available?
- The RS removes the instruction from the ROB and forwards it to the unit. Its position in the ROB does not matter.
- 5-ported RS (as in the figure)
- Max 5 µ-ops per cycle, 3 sustained
- Data flow analysis (pseudo-FIFO)
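The selection rule above (operands ready, unit free, oldest first) can be sketched as follows. The function name `select_for_dispatch`, the dictionary layout, and the unit names are hypothetical; scanning oldest-first is only an approximation of the pseudo-FIFO policy the slide mentions.

```python
def select_for_dispatch(pool, free_units, max_issue=5):
    """Pick micro-ops to dispatch this cycle.

    pool: list of dicts in allocation (oldest-first) order with keys
          'name', 'ready' (operands available) and 'unit' (required unit).
    free_units: names of execution units that are idle this cycle.
    """
    chosen = []
    units = set(free_units)
    for uop in pool:                       # oldest first approximates pseudo-FIFO
        if len(chosen) == max_issue:
            break
        if uop["ready"] and uop["unit"] in units:
            units.remove(uop["unit"])      # one micro-op per unit per cycle
            chosen.append(uop["name"])
    return chosen
```

A micro-op that is not ready is simply skipped, so younger ready micro-ops can overtake it. That overtaking is the out-of-order behaviour.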
18 Instruction Dispatch/Execution (Branches)
- Some of the µ-ops are branches.
- They are tagged as they enter, in order, with the jump target address and the fall-through address.
- When the branch actually executes, the outcome is compared with what was predicted.
- On a correct prediction, the branch is retired, along with all the instructions between it and the next branch (speculative execution).
- The BTB predicts most branches, but not all. It holds 512 entries of previous branches and their targets.
- On a misprediction, the Jump Execution Unit (JEU) removes all the instructions from the ROB and the pipeline resumes from the new, correct address.
- Cost
  - Not taken, BTB hit: no penalty.
  - Taken, BTB hit: one-cycle delay (in fetch and issue).
  - Mispredicted: at least 9 cycles (the length of the in-order issue pipeline, one IF cycle, and the time to retire the correct branch). Typically 10, at most 26 cycles.
  - Static predictions, conditional and unconditional, on hit: 5-6 cycles.
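The cost figures above combine into an average penalty per branch. The sketch below is illustrative arithmetic only: the function name, the 90% accuracy, and the 60% taken fraction are assumed example inputs, while the 1-cycle and 10-cycle costs are the "taken on hit" and "typical misprediction" figures from the list.

```python
def expected_branch_penalty(accuracy, taken_fraction,
                            taken_hit_cost=1, mispredict_cost=10):
    """Average penalty cycles per branch under a simple two-outcome model.

    Correctly predicted not-taken branches are assumed to cost nothing.
    """
    correct = accuracy * taken_fraction * taken_hit_cost
    wrong = (1.0 - accuracy) * mispredict_cost
    return correct + wrong
```

With 90% accuracy and 60% of branches taken, this gives about 0.54 + 1.0 = 1.54 cycles per branch. The misprediction term dominates even though mispredictions are rare.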
19 Branch Prediction (Dynamic)
- 4 branch predictions per 128-bit line.
- 2-level adaptive (Yeh's method).
- 4 bits of information per branch. Predicts sequences of branches.
- Return Stack Buffer (RET instructions)
- The BTB is checked for a previous execution of the branch.
- If there is no BTB entry, static prediction is used -> the BTB is updated.
- Addition of the CMOV instruction.
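A two-level adaptive predictor with 4 bits of history per branch can be sketched as below. This is a generic textbook-style model, not the P6's actual BTB: the class name `TwoLevelPredictor`, the per-branch pattern table, the 2-bit saturating counters, and the "weakly not taken" starting value are all assumptions.

```python
class TwoLevelPredictor:
    """Per-branch history register indexing 2-bit saturating counters."""

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.history = {}    # branch address -> recent outcomes as a bit pattern
        self.counters = {}   # (address, history pattern) -> counter in 0..3

    def predict(self, addr):
        h = self.history.get(addr, 0)
        # Counters start at 1 (weakly not taken); 2 or 3 means predict taken.
        return self.counters.get((addr, h), 1) >= 2

    def update(self, addr, taken):
        h = self.history.get(addr, 0)
        c = self.counters.get((addr, h), 1)
        self.counters[(addr, h)] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the newest outcome into the history register.
        self.history[addr] = ((h << 1) | int(taken)) & self.mask
```

Fed a strictly alternating taken / not-taken branch, this model stops mispredicting after a short warm-up, because each history pattern learns its own next outcome. A single counter without history would keep getting such a sequence wrong.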
20 Branch Prediction (Static)
21 Retire Unit
- Examines the state of the µ-ops in the ROB.
- Those that have completed must be removed from the ROB.
- It must preserve their order as IA-32 instructions, even amid interrupts, traps, faults, breakpoints, and mispredictions.
- As it decides which instructions to commit next, it forwards them, in order, to the Retirement Register File.
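In-order retirement can be sketched with a queue. The function name `retire` and the `(name, done)` tuple layout are hypothetical, and the default limit of 3 per cycle follows the retirement rate quoted earlier in the deck.

```python
from collections import deque


def retire(rob, max_per_cycle=3):
    """Retire completed micro-ops from the head of the ROB, in program order.

    rob: deque of (name, done) tuples, oldest first. It is modified in place.
    Returns the names retired this cycle.
    """
    retired = []
    # Stop at the first unfinished micro-op: younger ones must wait,
    # even if they have already completed out of order.
    while rob and len(retired) < max_per_cycle and rob[0][1]:
        retired.append(rob.popleft()[0])
    return retired
```

For example, with `deque([("a", True), ("b", False), ("c", True)])` only `a` retires. `c` has finished executing but stays in the ROB until `b` completes, which is what keeps the architectural state consistent across faults and mispredictions.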
22 Bus Interface Unit
- Two types of instructions: loads (1 µ-op: address, width, register) and stores (2 µ-ops, one generating the address and one the data).
- Stores are never executed speculatively, because there is no way to undo them.
- Stores are never reordered among themselves.
- A store executes only when both of its µ-ops are ready and no earlier stores are pending.
- The MOB acts like an RS and reorder buffer for loads and stores. It allows loads to pass other loads and stores, and re-dispatches them once any blocking conditions (resource dependencies) are lifted.
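The store rules above can be expressed as a small check. This sketch models only two of them (both µ-ops ready, no earlier store pending); the function name and dictionary keys are hypothetical, and the non-speculation rule is left out for brevity.

```python
def next_store_to_execute(stores):
    """Return the index of the store allowed to execute, or None.

    stores: list of dicts in program order with boolean keys
            'addr_ready', 'data_ready' and 'done'.
    """
    for i, store in enumerate(stores):
        if store["done"]:
            continue
        # This is the oldest pending store. It may go only if both of its
        # micro-ops are ready; younger stores may never pass it.
        if store["addr_ready"] and store["data_ready"]:
            return i
        return None
    return None
```

A younger store whose address and data are both ready still waits behind an older store that is missing its data, which is how stores stay in program order.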
23 Execution Details
- A load and a store to the same address can execute in the same cycle.
- Store latency has little effect on performance (3-5% according to Intel).
- Register renaming: every read of a logical register refers to the same physical register. Every subsequent write refers to a new physical register.
- An FMUL cannot be executed in the cycle immediately after the previous FMUL.
- FPU stages as in the P5, i.e.:
  - Conversion of the operands to the internal format
  - Execution of the operation at a higher level of precision
  - Rounding and conversion of the operands to the standard format
  - Error reporting
24 Execution Modes
- Protected mode. The native state of the processor. All instructions and architectural features are available, providing the highest performance and capability. Recommended mode for all new applications and operating systems. Also offers the ability to directly execute real-address mode 8086 software in a protected, multi-tasking environment. This (Virtual-8086) mode is not actually a processor mode but a protected mode attribute that can be enabled for any task.
- Real-address mode. Provides the programming environment of the Intel 8086 processor, with a few extensions (such as the ability to switch to protected or system management mode). The processor is placed in real-address mode following power-up or a reset. From real-address mode, only a single instruction is required to switch to protected mode.
- System management mode. A standard architectural feature unique to all Intel processors, beginning with the Intel386 SL processor. Provides an operating system or executive with a transparent mechanism for implementing platform-specific functions such as power management. The processor enters SMM when the external SMM interrupt pin (SMI#) is activated or an SMI is received from the advanced programmable interrupt controller (APIC). In SMM, the processor switches to a separate address space while saving the entire context of the currently running program or task. SMM-specific code may then be executed transparently. Upon returning from SMM, the processor is placed back into its state prior to the system management interrupt.
25 Addressing Modes
- Flat memory model: memory appears to a program as a single, continuous address space, called a linear address space. Code (a program's instructions), data, and the procedure stack are all contained in this address space. The linear address space is byte addressable.
- Segmented memory model: memory appears to a program as a group of independent address spaces called segments. Code, data, and stacks are typically contained in separate segments. To address a byte in a segment, a program must issue a logical address (far pointer), which consists of a segment selector and an offset. The segment selector identifies the segment to be accessed and the offset identifies a byte in the address space of the segment. A program can address up to 16,383 segments of different sizes and types. The processor translates each logical address into a linear address to access a memory location, transparently to the application program. Segmentation increases the reliability of programs and systems.
- Real-address model: same as the Intel 8086 processor, for backward compatibility. Uses a specific implementation of segmented memory in which the linear address space for the program and the operating system/executive consists of an array of equally sized segments.
- In protected mode, any of these memory models can be used.
- In real-address mode and system management mode, only the real-address model is available.
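For the real-address model, the translation from a logical address to a linear address is simple enough to show directly: the 16-bit segment value is shifted left by four bits and the 16-bit offset is added. The function name `real_mode_linear` is hypothetical, and the sketch ignores the 1 MB wrap-around behaviour of the original 8086.

```python
def real_mode_linear(segment, offset):
    """Translate a real-address-mode segment:offset pair to a linear address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Each segment starts on a 16-byte boundary, so segments overlap:
    # many segment:offset pairs name the same linear address.
    return (segment << 4) + offset
```

For instance, `real_mode_linear(0x1234, 0x0010)` and `real_mode_linear(0x1235, 0x0000)` both give `0x12350`, which illustrates the "array of equally sized segments" laid over one address space.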
26 Miscellaneous Topics
- Data flow analysis
  - If more than one instruction in the ROB is ready to be sent to an execution unit, the choice is made by a pseudo-FIFO algorithm.
- ECC
  - Extending the parity checking on the address bus signals and the transaction checking, the P6 supports error checking and correction on the data signals of both the L2 cache bus and the system bus. This protects useful data: single-bit errors are corrected and double-bit errors are detected. Errors can be logged, so that system failures can later be traced.
- MMX
  - Matrix Math Extensions. A form of SIMD.
  - An extension of the basic x86 instruction repertoire for handling images, video, and audio, which first appears in the P5.
- Streaming SIMD Extensions
  - 70 new instructions for image processing, 3D rendering, audio, video, and speech recognition.
- Intel Processor Serial Number (PII and PIII)
- Difficulty executing unaligned instructions
  - These instructions come from legacy 16-bit code. This forced the rapid introduction of the PII.
- PGE (Page Global Enable)
  - Allows pages to be marked as global (e.g. kernel pages) so that their TLB entries are not flushed on a context switch.
27 MESI Model
Goal: track the state of each cache line, so that lines are not discarded unnecessarily.
- Modified
  - The data has been changed by a write hit. It must be written back to main memory.
- Exclusive
  - The data is held only in the current cache.
- Shared
  - The data may also be held in other caches, but has not been changed.
- Invalid
  - The initial state, in which the cache line is invalid.
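The four states above can be tied together with a transition function. This is a simplified textbook MESI sketch, not the exact P6 bus protocol: the function name `mesi_next`, the event names, and the `others_have_copy` flag are assumptions, and write-backs and invalidation messages are only noted in comments.

```python
def mesi_next(state, event, others_have_copy=False):
    """Return the next MESI state ('M', 'E', 'S' or 'I') of one cache line.

    event: 'local_read', 'local_write', 'snoop_read' or 'snoop_write'.
    """
    if state not in ("M", "E", "S", "I"):
        raise ValueError("unknown state: %r" % (state,))
    if event == "local_read":
        if state == "I":
            # A miss: the line arrives Shared if another cache has it.
            return "S" if others_have_copy else "E"
        return state
    if event == "local_write":
        # From S or I the other copies must be invalidated first.
        return "M"
    if event == "snoop_read":
        # A Modified line is written back before it becomes Shared.
        return "I" if state == "I" else "S"
    if event == "snoop_write":
        # Another processor takes ownership; our copy is no longer valid.
        return "I"
    raise ValueError("unknown event: %r" % (event,))
```

A typical sequence for one line is I -> E (first read, no sharers) -> M (local write) -> S (another CPU reads it) -> I (another CPU writes it).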
28 Associative Caches / CAM
- Associative
  - A block can map to any line of the cache.
  - The address is translated into a tag and a word.
  - The tag uniquely identifies a block.
  - The tag of every line is checked for a match -> search delay.
- Set associative
  - The cache is divided into sets.
  - Each set contains several lines.
  - A given block maps to one of the lines of one and only one set (n-way, where n is the number of lines). A 2-way set-associative cache contains sets of two lines. Multiple blocks map to the same set.
  - Search time drops dramatically.
- CAM
  - The data itself describes its storage location, reducing the need for searching.
  - Some data has one and only one storage location, which is predetermined.
  - Similar data is stored in neighbouring locations.
  - One supplies the data and receives the address within a single cycle.
  - Hardware CAM components have been implemented recently.
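The set-associative mapping described above amounts to slicing an address into tag, set index, and byte offset. The sketch below assumes, for illustration, an 8 KB, 2-way cache with 32-byte lines (the shape the deck gives for the Pentium Pro L1 data cache). The function name `split_address` and its defaults are hypothetical.

```python
def split_address(addr, cache_bytes=8 * 1024, ways=2, line_bytes=32):
    """Split an address into (tag, set_index, offset) for an n-way cache."""
    sets = cache_bytes // (ways * line_bytes)   # 128 sets with the defaults
    offset = addr % line_bytes                  # byte within the line
    set_index = (addr // line_bytes) % sets     # the one set this block maps to
    tag = addr // (line_bytes * sets)           # identifies the block within the set
    return tag, set_index, offset
```

Two addresses that differ by `line_bytes * sets` (4096 with these defaults) land in the same set with different tags, so only the `ways` lines of that set need to be compared on a lookup rather than every line in the cache.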
29 Results
- 90% correct branch prediction, even when the branches are deeply nested
- 25% utilization of the system bus bandwidth
- Improved integer instruction performance
30 Comparisons
31 References
- Intel's P6 Uses Decoupled Superscalar Design, by Linley Gwennap (MicroDesign Resources, vol. 9, no. 2, 16 February 1995)
- The P6 Architecture: Background Information for Developers, by Intel Corporation (1995 - p6arc.pdf)
- IA-32 Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference, by Intel Corporation (2001 - 24547104.pdf)
- Pentium Pro Family Developer's Manual, Vol. 2: Programmer's Reference Manual, by Intel Corporation (Dec 1995 - 24269101.pdf)
- The Pentium Pro at 150, 166, 180 and 200 MHz, by Intel Corporation (Jun 1997 - 24276905.pdf)
- The Intel Architecture Optimization Manual, by Intel Corporation (1997 - 24281601.pdf)
- Pentium II Developer's Manual, by Intel Corporation (Oct 1999 - 24350201.pdf)
- The Intel Celeron Processor up to 1.1 GHz Datasheet, by Intel Corporation (Aug 2001 - 24365819.pdf)
- The P6 Family of Processors Hardware Developer's Manual, by Intel Corporation (Sep 1998 - 24400101.pdf)
- Pentium III Processor at 450 MHz to 1.13 GHz Datasheet, by Intel Corporation (Jul 2000 - 24445208.pdf)
32 Functional Units (alternative view)