IA-32 from a System Architecture Perspective. Jochen Liedtke, Theo Ungerer, SS 1999

1
IA-32 from a System Architecture Perspective
Jochen Liedtke, Theo Ungerer, SS 1999
  • Lecture: Thursday 15:45-17:15, Room -102,
    Computer Science main building (on 22.4., 29.4.,
    6.5. as a tele-lecture!)
  • Office hours: Ungerer, Thursday 10:00-11:30,
    Room 159, Bldg. 20.20; Liedtke: not yet
    known
  • Info: http://goethe.ira.uka.de/ungerer/

2
Course Schedule
  • 15.4. IA-32 and Pentium II/III processor
  • 22.4. Memory hierarchy design: cache 1 (Hen/Pat
    ch. 5.1, 5.2)
  • 29.4. Memory hierarchy design: cache 2 (Hen/Pat
    ch. 5.3, 5.4, 5.5)
  • 6.5. Cache -- Consequences for Systems
    Construction
  • Sample: Pentium II/III
  • 20.5. Memory hierarchy design: main memory
    (Hen/Pat ch. 5.6)
  • Cache Memory -- Consequences for
    Systems Construction
  • 27.5. Memory hierarchy design: virtual memory
    (Hen/Pat ch. 5.7, 5.8, 5.9)
  • 10.6. The Segment System -- And Why Nobody Used
    It
  • The VM System -- Potential, Problems and
    Tricks
  • Sample processor: Pentium II/III

3
Course Schedule
  • 17.6. Bus system (memory vs. I/O bus)
  • 24.6. Chip sets, board design and PCI bus
  • 1.7. Operating system interface, UNIX file
    system? (Hen/Pat ch. 6.6-6.8)
  • 8.7. Kernel and User -- HW Support and Annoyance
  • Multiprocessor Systems -- Support, Problems and
    Limitations
  • Virtualizing a Machine?

4
Literature
  • J. L. Hennessy, D. A. Patterson: Computer
    Architecture: A Quantitative Approach. Morgan
    Kaufmann Publishers, 2nd edition, 1996.
  • Intel: Pentium II Processor Developer's Manual,
    October 1997.
  • Intel: Intel Architecture Software Developer's
    Manual, Vol. 1-3, 1997.
  • B. Shriver, B. Smith: The Anatomy of a
    High-Performance Microprocessor: A Systems
    Perspective. IEEE Computer Society Press, 1998.

5
Today's Lecture
  • IA-32 Intel Pentium II/III

6
The Intel P5 and P6 family
7
Micro-Dataflow in the PentiumPro (1995)
  • ... The flow of the Intel Architecture
    instructions is predicted and these instructions
    are decoded into micro-operations (uops), or
    series of uops, and these uops are
    register-renamed, placed into an out-of-order
    speculative pool of pending operations, executed
    in dataflow order (when operands are ready), and
    retired to permanent machine state in source
    program order. ...
  • R. P. Colwell, R. L. Steck: A 0.6 µm BiCMOS
    Processor with Dynamic Execution. International
    Solid State Circuits Conference, Feb. 1995.

8
PentiumPro and Pentium II
  • The PentiumPro, Pentium II and Pentium III
    processors use basically the same dynamic
    execution (i.e. out-of-order superscalar)
    microarchitecture principles.
  • Three-way superscalar, pipelined
    microarchitecture.
  • Decoupled, multi-stage superpipeline.
  • The Pentium II has twelve stages, with a
    pipestage time 33 percent shorter than that of
    the Pentium processor. => Higher clock rate on
    any given manufacturing process. => Less work
    per pipe stage, in exchange for more stages.
  • A wide instruction window using an instruction
    pool.
  • The execute phase is replaced by decoupled issue,
    execute, and retire phases.
  • => Instruction execution can start in any order,
    but instructions are always retired in the
    original program order.
  • Processors in the P6 family may be thought of as
    three independent engines coupled with an
    instruction pool.
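
The clock-rate consequence of the shorter pipestage time can be checked with one line of arithmetic. The sketch below assumes the clock period equals the pipestage time, so a stage that is 33 percent shorter permits a clock about 1/(1 - 0.33) times faster on the same process. The function name `clock_rate_gain` is illustrative and not from the lecture.

```python
def clock_rate_gain(stage_time_reduction: float) -> float:
    """Relative clock-rate gain when every pipeline stage takes
    `stage_time_reduction` (a fraction in [0, 1)) less time.

    Assumption: the clock period is set by the pipestage time alone.
    """
    if not 0.0 <= stage_time_reduction < 1.0:
        raise ValueError("reduction must be a fraction in [0, 1)")
    return 1.0 / (1.0 - stage_time_reduction)


gain = clock_rate_gain(0.33)
print(f"{gain:.2f}x")  # prints 1.49x
```

This is only the clock-rate side of the trade: the deeper pipeline also raises the cost of a branch misprediction, which the estimate ignores.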

9
PentiumPro Processor and Pentium II
Microarchitecture
10
Pentium II
11
Pentium II: The In-Order Section
  • The instruction fetch unit (IFU) accesses a
    non-blocking I-cache and contains the Next IP
    unit.
  • The Next IP unit provides the I-cache index
    (based on inputs from the BTB), trap/interrupt
    status, and branch-misprediction indications from
    the integer FUs.
  • Branch prediction:
  • two-level adaptive scheme of Yeh and Patt,
  • the BTB contains 512 entries and maintains
    branch history information and the predicted
    branch target address.
  • Branch misprediction penalty: at least 11 cycles,
    on average 15 cycles.
  • The instruction decoder unit (IDU) is composed of
    three separate decoders.
  • A decoder breaks an IA-32 instruction down into
    µops, each consisting of an opcode, two source
    operands, and one destination operand. These
    µops are of fixed length.
  • Most IA-32 instructions are converted directly
    into a single µop (by any of the three
    decoders),
  • some instructions are decoded into one to four
    µops (by the general decoder),
  • more complex instructions are used as indices
    into the microcode instruction sequencer (MIS),
    which generates the appropriate stream of
    µops.
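
The two-level adaptive scheme named in the branch-prediction bullets can be illustrated with a small model. This is a minimal sketch, not the Pentium II implementation: the 4-bit history length, the per-branch dictionaries, and the initial counter value are all assumptions, and the 512-entry BTB capacity and target addresses are not modelled. The first level records the recent outcomes of a branch; the second level is a 2-bit saturating counter selected by that history pattern.

```python
class TwoLevelPredictor:
    """Per-branch history register selecting a 2-bit saturating counter."""

    def __init__(self, history_bits: int = 4):  # history length is an assumption
        self.mask = (1 << history_bits) - 1
        self.history = {}   # branch address -> recent outcomes as a bit pattern
        self.counters = {}  # (branch address, pattern) -> counter in 0..3

    def predict(self, pc: int) -> bool:
        pattern = self.history.get(pc, 0)
        # Counter values 2 and 3 mean "predict taken"; 1 is the assumed default.
        return self.counters.get((pc, pattern), 1) >= 2

    def update(self, pc: int, taken: bool) -> None:
        pattern = self.history.get(pc, 0)
        counter = self.counters.get((pc, pattern), 1)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.counters[(pc, pattern)] = counter
        # Shift the newest outcome into the history register.
        self.history[pc] = ((pattern << 1) | int(taken)) & self.mask


def accuracy(predictor, pc, outcomes):
    """Fraction of correct predictions while training on `outcomes`."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(outcomes)


# A loop branch that is taken three times, then falls through, repeatedly.
loop_branch = [True, True, True, False] * 50
print(accuracy(TwoLevelPredictor(), 0x400, loop_branch))
```

Because each 4-bit history pattern of this loop is always followed by the same outcome, the counters settle after a few iterations and the periodic fall-through is predicted correctly, which a single 2-bit counter per branch could not do.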

12
Pentium II: The In-Order Section (Continued)
  • The µops are sent to the register alias table
    (RAT), where register renaming is performed,
    i.e., the logical IA-32 register references are
    converted into references to physical
    registers.
  • Then, with added status information, the µops
    continue to the reorder buffer (ROB) and to the
    reservation station unit (RSU).
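
Register renaming through an alias table can be sketched in a few lines. The class below is a hypothetical model, not Intel's design: it hands out physical registers from a simple free list and never recycles them, whereas a real core reclaims them at retirement. It shows the essential step from the slide, translating logical IA-32 register names into physical register numbers for each micro-op.

```python
class RegisterAliasTable:
    """Maps logical register names to physical register numbers."""

    def __init__(self, logical_regs, num_physical):
        if num_physical < len(logical_regs):
            raise ValueError("need at least one physical register per logical one")
        self.free = list(range(num_physical))
        # Initially every logical register owns one physical register.
        self.alias = {name: self.free.pop(0) for name in logical_regs}

    def rename(self, dest, src1, src2):
        """Return (physical dest, physical src1, physical src2) for one micro-op."""
        # Sources use the current mapping, so they read the newest producer.
        p_src1, p_src2 = self.alias[src1], self.alias[src2]
        if not self.free:
            raise RuntimeError("no free physical register (a real core would stall)")
        # A fresh destination removes write-after-read and write-after-write hazards.
        p_dest = self.free.pop(0)
        self.alias[dest] = p_dest
        return p_dest, p_src1, p_src2


rat = RegisterAliasTable(["EAX", "EBX", "ECX"], num_physical=8)
print(rat.rename("EAX", "EBX", "ECX"))  # (3, 1, 2)
print(rat.rename("EBX", "EAX", "ECX"))  # (4, 3, 2): reads the renamed EAX
print(rat.rename("EAX", "EBX", "EBX"))  # (5, 4, 4): second EAX write gets its own register
```

The two writes to EAX land in different physical registers, so the later write no longer has to wait for earlier readers of EAX; only true data dependences remain.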

13
The Fetch/Decode Unit
14
The Out-of-Order Execute Section
  • When the µops flow into the ROB, they effectively
    take a place in program order.
  • The µops also go to the RSU, which forms a
    central instruction window with 20 reservation
    stations (RS), each capable of hosting one µop.
  • µops are issued to the FUs according to dataflow
    constraints and resource availability, without
    regard to the original ordering of the program.
  • After completion, the result goes to two
    different places: the RSU and the ROB.
  • The RSU has five ports and can issue at a peak
    rate of 5 µops per cycle.
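
Dataflow-order issue from the reservation stations can be illustrated with a toy scheduler. The window size of 20 and the five issue ports come from the slide; everything else is an assumption made for the sketch: every micro-op completes one cycle after issue, any port accepts any micro-op (real ports are tied to specific functional units), and the oldest ready micro-ops are picked first. The names `Uop`, `issue_cycle`, and `simulate` are hypothetical.

```python
from dataclasses import dataclass, field

RS_ENTRIES = 20   # reservation stations (from the slide)
ISSUE_PORTS = 5   # peak issue rate per cycle (from the slide)


@dataclass
class Uop:
    seq: int                                    # position in program order
    waits_on: set = field(default_factory=set)  # seq numbers of producer micro-ops


def issue_cycle(stations, completed):
    """Remove and return up to ISSUE_PORTS micro-ops whose operands are ready."""
    ready = [u for u in stations if u.waits_on <= completed]
    issued = ready[:ISSUE_PORTS]  # oldest-first selection is an assumption
    for u in issued:
        stations.remove(u)
    return issued


def simulate(uops):
    """Return the seq numbers issued in each cycle (one-cycle latency assumed)."""
    stations, completed, schedule = list(uops), set(), []
    if len(stations) > RS_ENTRIES:
        raise ValueError("more micro-ops than reservation stations")
    while stations:
        issued = issue_cycle(stations, completed)
        if not issued:
            raise ValueError("unsatisfiable dependency")
        completed |= {u.seq for u in issued}
        schedule.append([u.seq for u in issued])
    return schedule


# Micro-op 1 needs the result of 0, and 3 needs the result of 1; 2 is independent.
print(simulate([Uop(0), Uop(1, {0}), Uop(2), Uop(3, {1})]))  # [[0, 2], [1], [3]]
```

Micro-op 2 issues ahead of micro-op 1 even though it comes later in the program, because nothing it reads is still pending.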

15
Latencies and throughput for Pentium II FUs
16
Issue/Execute Unit
17
The In-Order Retire Section
  • A µop can be retired
  • if its execution is completed,
  • if it is its turn in program order,
  • and if no interrupt, trap, or misprediction
    occurred.
  • Retirement means taking data that was
    speculatively created and writing it into the
    retirement register file (RRF).
  • Three µops per clock cycle can be retired.
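
The three retirement conditions can be sketched as a loop over the head of the reorder buffer. The retire width of three is from the slide; the dictionary layout of a ROB entry and the handling of a fault (simply discarding the faulting micro-op and everything younger) are simplifying assumptions, and `retire_cycle` is a hypothetical name.

```python
from collections import deque

RETIRE_WIDTH = 3  # micro-ops retired per clock cycle (from the slide)


def retire_cycle(rob, rrf):
    """Retire up to RETIRE_WIDTH completed micro-ops from the head of the ROB."""
    retired = []
    while rob and len(retired) < RETIRE_WIDTH:
        head = rob[0]                      # only the oldest entry may retire
        if not head["done"]:
            break                          # younger micro-ops must wait behind it
        if head.get("fault"):
            rob.clear()                    # assumed: squash it and all younger work
            break
        rob.popleft()
        rrf[head["dest"]] = head["value"]  # commit the speculative result
        retired.append(head["seq"])
    return retired


rob = deque(
    {"seq": i, "dest": f"r{i}", "value": 10 * i, "done": i != 2} for i in range(5)
)
rrf = {}
print(retire_cycle(rob, rrf))  # [0, 1]: seq 2 is unfinished and blocks 3 and 4
rob[0]["done"] = True          # seq 2 completes
print(retire_cycle(rob, rrf))  # [2, 3, 4]
```

Results of micro-ops 3 and 4 exist before micro-op 2 finishes, but they become architecturally visible in the RRF only once everything older has retired.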

18
Retire Unit
19
The Pentium II Pipeline
20
Pentium Pro Processor Basic Execution
Environment
21
Application Programming Registers
22
Multimedia Unit - Typical Instruction Execution
  • Therefore called the SIMD principle or subword
    parallelism

23
MMX TECHNOLOGY
  • Eight MMX registers (MM0 through MM7).
  • Four MMX data types (packed bytes, packed words,
    packed doublewords, and quadword).
  • The MMX instruction set.
  • The MMX technology uses the single instruction,
    multiple data (SIMD) technique for performing
    arithmetic and logical operations on the bytes,
    words, or doublewords packed into 64-bit MMX
    registers.
  • For example, the PADDSB instruction adds 8 signed
    bytes from the source operand to 8 signed bytes
    in the destination operand and stores 8
    byte-results in the destination operand.
  • The SIMD technique allows the same operation to
    be carried out on multiple data elements in
    parallel. The MMX technology supports parallel
    operations on byte, word, and doubleword data
    elements when contained in MMX registers.
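
The PADDSB example can be emulated lane by lane to show what the packed operation computes. PADDSB adds signed bytes with saturation, so a result outside -128..127 is clamped instead of wrapping around. The sketch models a 64-bit MMX register as 8 bytes; the helper names `pack`, `unpack`, and `to_signed8` are illustrative, and the loop runs sequentially where the hardware handles all eight lanes at once.

```python
def to_signed8(byte: int) -> int:
    """Interpret an unsigned byte (0..255) as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte


def pack(values):
    """Pack eight signed byte values into an 8-byte 'register'."""
    return bytes(v & 0xFF for v in values)


def unpack(register: bytes):
    return [to_signed8(b) for b in register]


def paddsb(dest: bytes, src: bytes) -> bytes:
    """Add eight signed bytes lane by lane with signed saturation."""
    if len(dest) != 8 or len(src) != 8:
        raise ValueError("operands must be 8 bytes (64 bits) each")
    lanes = []
    for d, s in zip(dest, src):
        total = to_signed8(d) + to_signed8(s)
        total = max(-128, min(127, total))  # saturate instead of wrapping
        lanes.append(total & 0xFF)
    return bytes(lanes)


mm0 = pack([100, -100, 1, 0, -1, 127, -128, 50])
mm1 = pack([100, -100, 2, 0, 1, 1, -1, -50])
print(unpack(paddsb(mm0, mm1)))  # [127, -128, 3, 0, 0, 127, -128, 0]
```

The first two lanes show the saturation: 100 + 100 is clamped to 127 and -100 + -100 to -128, which is the behaviour wanted for pixel and audio samples.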

24
Pentium II Offspring
  • Pentium III (Feb. 99), formerly code-named
    Katmai, initially at 450 MHz (0.25 micron
    technology) and at 500 MHz:
  • two 32 kB primary caches, faster floating-point
    performance,
  • the ISSE (Internet Streaming SIMD Extensions)
    instruction set, formerly Katmai New
    Instructions (KNI), which includes floating-point
    SIMD instructions and 128-bit floating-point
    SIMD registers to accelerate 3D graphics.
  • Coppermine will be a shrink of Pentium III down
    to 0.18 micron.
  • Cascades will be a cheaper version of Pentium III
    Xeon with a clock speed of more than 600 MHz and
    an on-die 256 kB L2 cache.
  • For mid-2000, Intel expects to launch Merced, the
    first member of Intel's P7 family of 64-bit
    processors based on EPIC.