Title: Arun Hariharan N.M.S.U
2 MOTIVATION
- Need for high speed computing and Architecture
- More complex compilers (JAVA)
- Large Database Systems
- Distributed Computing on Internet
- Peer competition from other manufacturers
3 GOALS OF ARCHITECTURE
- Overcome performance limiters
- Branches
- Memory Latency
- Sequential Program Model
- Long Architectural Life
- Large Register File
- Fully interlocked architecture
- Not tied to any particular design
- No fixed issue (e.g. instruction length)
4 REGISTER RESOURCES
- 128 65-bit general registers (1 KB) (64 + 1 NaT)
- 128 82-bit floating-point registers
- Space for up to 128 64-bit special-purpose application registers (1 KB)
- Eight 64-bit branch registers for function call linkage and return
- 64 one-bit predicate registers
7 INSTRUCTION ENCODING
- Also called Template
- Helps to decode and route instructions
- Marks stops, i.e. the end of an instruction group
- Key Words
- Long life
- Instruction bundle
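To make the bundle and template idea concrete, here is a minimal sketch that packs and unpacks a 128-bit IA-64 bundle as a 5-bit template followed by three 41-bit instruction slots. The field widths follow the IA-64 bundle format; the function names `encode_bundle` and `decode_bundle` and the sample values are made up for this illustration.

```python
TEMPLATE_BITS = 5   # template field: routes the slots to unit types, marks stops
SLOT_BITS = 41      # each of the three instruction slots
SLOT_MASK = (1 << SLOT_BITS) - 1


def encode_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer."""
    assert 0 <= template < (1 << TEMPLATE_BITS)
    assert len(slots) == 3
    bundle = template
    for i, slot in enumerate(slots):
        assert 0 <= slot <= SLOT_MASK
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def decode_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
             for i in range(3)]
    return template, slots
```

Because the template sits in a fixed position, hardware can read it first and route all three slots without decoding them one by one.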
10 DISTRIBUTING RESPONSIBILITY
- Shift a lot of the complexity to the compiler
- ILP
- Out-of-Order Execution
- Control Flow Parallelism
- Influencing dynamic events: the hardware learns hints from the compiler about branch prediction and instruction/data caching and prefetching.
11 ILP (Instruction Level Parallelism)
- Sequential in-order execution was not enough to achieve maximum parallelism
- Out-of-order execution: it is the compiler's task to create instruction groups so that all instructions in an instruction group can be safely executed in parallel
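One way to picture the compiler's grouping job is the greedy sketch below. It is a simplified model, not an actual IA-64 compiler algorithm: each instruction is a hypothetical `(dest, sources)` tuple, and a new group starts whenever an instruction reads or rewrites a register already written in the current group.

```python
def form_instruction_groups(instructions):
    """Greedily split straight-line code into groups with no internal
    read-after-write or write-after-write register dependences."""
    groups, current, written = [], [], set()
    for dest, sources in instructions:
        conflict = dest in written or any(s in written for s in sources)
        if conflict:
            # Close the current group (a "stop") and start a new one.
            groups.append(current)
            current, written = [], set()
        current.append((dest, sources))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


code = [
    ("r1", ("r2", "r3")),   # r1 = r2 op r3
    ("r4", ("r5", "r6")),   # independent of the first instruction
    ("r7", ("r1", "r4")),   # needs r1 and r4, so it starts a new group
    ("r8", ("r7",)),        # needs r7, so it starts another group
]
groups = form_instruction_groups(code)
```

Here the first two instructions share a group and could issue together, while the last two each need their own group.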
12 CONTROL FLOW PARALLELISM
- Traditional execution
  - Compare a and 0
  - Check flag if true
  - Store flag value for further computation
  - Compare b < 5
  - Check flag if true
  - Store flag value for further computation
  - ...
  - Compare if any one had set the flag.
  - Move 8 to r3
- In IA-64
  - Initialize p1 to false
  - Set compare conditions prerequisite
  - Compare in parallel
  - Branch
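The two styles can be contrasted with a small sketch. The operators are assumptions: the slide only says "compare a and 0" and "compare b lt 5", so `a == 0` and `b < 5` are used here for illustration, and the function names are hypothetical.

```python
def traditional(a, b, r3):
    """Sequential style: each compare waits for the previous flag check."""
    flag = (a == 0)
    if not flag:
        flag = (b < 5)
    if flag:
        r3 = 8
    return r3


def parallel_compare(a, b, r3):
    """IA-64 style: p1 starts false and independent compares OR into it."""
    p1 = False                      # initialize p1 to false
    compares = (a == 0, b < 5)      # no compare depends on another one
    for result in compares:         # each compare can only set p1, never clear it
        p1 = p1 or result
    return 8 if p1 else r3          # predicated move: r3 = 8 only if p1 is set
```

Since every compare can only turn p1 on, the order of the compares does not matter, which is what lets hardware issue them in the same cycle.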
13 FINDING AND CREATING PARALLELISM
BRANCHES LIMIT ILP
- Sequential, no-predict: normal bank teller
- Sequential, predict: fill out slip in advance (predict whether deposit or withdrawal)
- Predicated execution: fill out both slips, throw away whichever is wrong
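The bank-teller analogy maps onto code roughly as follows. This is only a sketch: the function names and the deposit/withdrawal arithmetic are invented for illustration, and the Python conditional expression stands in for hardware discarding the result of the instruction whose predicate is false.

```python
def branchy(is_deposit, balance, amount):
    """Branching version: only one arm runs, chosen by a branch."""
    if is_deposit:
        balance = balance + amount
    else:
        balance = balance - amount
    return balance


def predicated(is_deposit, balance, amount):
    """Predicated version: both 'slips' are filled out, one is thrown away."""
    p1, p2 = is_deposit, not is_deposit    # complementary predicates from one compare
    deposit_result = balance + amount      # would execute under predicate p1
    withdrawal_result = balance - amount   # would execute under predicate p2
    return deposit_result if p1 else withdrawal_result
```

The predicated version does more work per call, but it has no branch to mispredict.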
14 FINDING AND CREATING PARALLELISM (cont.)
Scheduling and Speculation
- Moving basic blocks ahead of barriers: it is the compiler's task, instead of the processor's, to find a possible route and schedule it.
- Use of basic blocks (a basic block is a straight-line sequence of instructions with one entry and one exit)
- Best possible route: the most likely flow of the program (speculation); not all instructions are executed
- Compilers have a bird's-eye view of the program, unlike the processor.
15 CONTROL SPECULATION
- Removing branches is expensive
- Not all branches can be removed
- Moving basic blocks can cause exceptions
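The exception problem can be sketched with a toy model of a speculative load. IA-64 provides a speculative load (`ld.s`) that defers a fault by setting the NaT bit, and a check (`chk.s`) that branches to recovery code. Everything else below is a hypothetical Python stand-in: `NAT`, `speculative_load`, `check`, `lookup`, and the use of a dict as memory.

```python
NAT = object()   # stands in for the NaT ("Not a Thing") bit on a register


def speculative_load(memory, address):
    """Like ld.s: return the value, or NAT instead of faulting on a bad address."""
    return memory.get(address, NAT)


def check(value, recovery):
    """Like chk.s: if the speculative value is NaT, run the recovery code."""
    return recovery() if value is NAT else value


def lookup(memory, ptr):
    value = speculative_load(memory, ptr)          # load hoisted above the guard
    if ptr is not None:                            # the original guarding branch
        return check(value, lambda: memory[ptr])   # a real fault surfaces only here
    return 0                                       # NaT value is simply never used
```

When the guard fails, the deferred fault is never observed, so hoisting the load is safe. When the guard passes and the address really is bad, the recovery path raises the error at the point where the original program would have raised it.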
16 DATA SPECULATION
17 REGISTER MODEL
- 128 64-bit registers, of which 32 are fixed for µP operations (like RISC)
- 96 are free for the compiler to use.
- Effectively unlimited register use is possible, as registers are paged to memory in the background by the RSE (Register Stack Engine)
- alloc specifies the number of registers for locals and outputs (parameters to calls).
- Registers are renamed so that each procedure's frame starts at register 32 and can extend to 127.
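The renaming can be sketched as a simple base-plus-offset mapping. This is a simplified model, not the exact hardware mechanism: `physical_register` and `call` are hypothetical names, and the frame is described only by a base and a count of local registers.

```python
NUM_STATIC = 32     # r0-r31: fixed registers
NUM_STACKED = 96    # r32-r127: stacked registers handed out by alloc


def physical_register(logical, frame_base):
    """Map a logical register number to a physical one for the current frame."""
    if logical < NUM_STATIC:
        return ("static", logical)
    return ("stacked", (frame_base + logical - NUM_STATIC) % NUM_STACKED)


def call(frame_base, num_locals):
    """On a call, the callee's frame begins at the caller's first output register."""
    return (frame_base + num_locals) % NUM_STACKED


caller_base = 0
callee_base = call(caller_base, num_locals=4)   # caller has 4 locals, then outputs
```

With 4 locals, the caller's first output register is r36. After the call, the same physical register is visible to the callee as r32, so parameters are passed without copying.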
18 RSE (Register Stack Engine)
- Automatically saves/restores stack registers without software intervention (can work synchronously)
- Provides the illusion of infinite physical registers by mapping the stack of physical registers to memory
- Overflow: alloc needs more registers than are available
- Underflow: return needs to restore a frame saved in memory
- The RSE may be designed to use otherwise unused memory bandwidth to perform register spill and fill operations in the background (asynchronously, speculatively loading and storing data)
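Overflow and underflow can be illustrated with a counting model. This is a sketch under simplifying assumptions: it tracks only how many registers are resident or spilled, ignores the overlap between a caller's outputs and a callee's inputs, and spills synchronously. The class and method names are hypothetical.

```python
class RegisterStackEngine:
    def __init__(self, capacity=96):
        self.capacity = capacity   # physical stacked registers available
        self.frames = []           # sizes of active frames, oldest first
        self.resident = 0          # registers currently held in the register file
        self.spilled = 0           # registers currently saved in memory

    def alloc(self, size):
        """Allocate a new frame; returns how many registers had to be spilled."""
        if size > self.capacity:
            raise ValueError("frame larger than the physical register file")
        overflow = max(0, self.resident + size - self.capacity)
        self.resident -= overflow          # the oldest registers go to memory
        self.spilled += overflow
        self.frames.append(size)
        self.resident += size
        return overflow

    def ret(self):
        """Return from the current frame; returns how many registers were filled."""
        self.resident -= self.frames.pop()
        caller = self.frames[-1] if self.frames else 0
        underflow = max(0, caller - self.resident)
        self.spilled -= underflow          # bring the caller's registers back
        self.resident += underflow
        return underflow
```

With a deliberately tiny file of 8 registers, two 5-register frames force a spill of 2 registers on the second `alloc`, and the matching `ret` fills those 2 registers back in.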
19 SOFTWARE PIPELINE
- Time complexity is expressed in O(n) notation
- This notation is used to count time spent in loops
- That is because loops take most of the execution time
- Can we execute loop iterations in parallel?
- Answer: yes, if we resolve some problems:
  - Managing the loop count
  - Handling the renaming of registers for the pipeline
  - Finishing the work in progress when the loop ends
  - Starting the pipeline when the loop is entered
  - Unrolling to expose cross-iteration parallelism
- IA-64 solution
  - Special architecture
  - Loop count (LC)
  - Epilog count (EC)
  - Use of register rename base (rrb)
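How LC, EC, and register rotation cooperate can be sketched by simulating a three-stage loop (load, add, store). This is an illustrative model, not IA-64 code: the list rotations stand in for the rename base being adjusted, the stage predicates play the role of rotating predicate registers, and the bookkeeping at the end of each pass imitates a counted loop branch. The function name and the three-stage split are assumptions.

```python
def software_pipelined_add(src, inc):
    """Compute [x + inc for x in src] with a 3-stage software pipeline."""
    dst = []
    if not src:
        return dst
    stages = 3
    lc = len(src) - 1             # loop count: iterations still to be started
    ec = stages                   # epilog count: passes needed to drain the pipeline
    pred = [True, False, False]   # stage predicates: only stage 1 is on at entry
    regs = [None, None, None]     # rotating registers carrying values between stages
    load_idx = 0
    while True:
        # One kernel pass: every enabled stage works on a different iteration.
        if pred[2]:
            dst.append(regs[2])               # stage 3: store
        if pred[1]:
            regs[1] = regs[1] + inc           # stage 2: add
        if pred[0]:
            regs[0] = src[load_idx]           # stage 1: load
            load_idx += 1
        # Loop-branch bookkeeping: keep starting iterations while LC > 0,
        # then count EC down while the pipeline drains.
        if lc > 0:
            lc -= 1
            new_pred, taken = True, True
        elif ec > 1:
            ec -= 1
            new_pred, taken = False, True
        else:
            ec -= 1
            new_pred, taken = False, False
        # Rotation: each value and predicate moves on to the next stage.
        pred = [new_pred] + pred[:-1]
        regs = [None] + regs[:-1]
        if not taken:
            return dst
```

For n elements the kernel runs n + 2 passes: the first two fill the pipeline (prolog), and the last two drain it (epilog), all using the same loop body.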
21 SUMMARY
- Synergy
- ILP by compiler and hardware
- Data and Control Speculation
- Multi-chip and multi-processing
- EPIC: Explicitly Parallel Instruction Computing
22 RISC vs. IA-64: Whitepaper by Intel and HP (1999)
- RISC architectures claim to match many of the features of IA-64 with similar sounding instructions. However, just like a tank formed by bolting weapons and armor to an old truck, the benefits are limited to specific conditions, but fall short in the heat of battle.
- Existing RISC architectures that use cmoves and similar instructions may remove branches, but at the cost of adding so many instructions that the benefits are nearly outweighed by the code-bloat (hardly worth the trade-off). The reason why ILP works with IA-64 is the use of completely new architectural constructs such as predicates that are not available to any existing RISC architecture.
- Traditional RISC architectures can use a non-faulting load to avoid costly error handling when loading data ahead of time which may not be valid. But if you want to turn off the errors, why have errors in the first place? Traditional RISC architectures face one of two alternatives: add extra error-checking code which, once again, cancels out the performance benefit of speculative execution, or work without a net, risking disastrous undetected errors due to turning off the error messages. IA-64 gets around both problems by offering a novel architectural approach to dealing with errors when loading data.
23 Benchmark comparison
24 BACKWARD COMPATIBILITY
Intel promises compatibility with 32-bit software (IA-32). It should be possible to run software in real mode (16 bits), protected mode (32 bits) and virtual 8086 mode (16 bits).
26 Questions?
REFERENCES
- Ricardo Zelenovsky and Alexandre Mendonca, Intel 64-bit Architecture, 2001
- Bruce Jacob, The IA-64 Architecture, University of Maryland (College Park)
- Whitepaper, IA-64 Architecture Innovations, HP and Intel, 1999
- Carole Dulong et al., An Overview of the Intel IA-64 Compiler
- M. F. Guest, Intel's Itanium IA-64 Processor: Overview and Initial Experience, CLRC Daresbury Laboratory