Title: Instruction Set Architecture Overview
Target ISA: Intel® Itanium® IA-64 (Itanium 2)
CECS 440, Spring 2003
- Team
- James Callahan
- Charles Pickman
Date: May 5, 2003
Class: MW 7-7:50 PM, Professor G. C. Hill
2. Contents
- Section (Page)
- Introduction: Overcoming CPU Bottlenecks (3)
- Introduction: Itanium® Chronology (4)
- Introduction: Technology Roadmap (5)
- Introduction: Photos (6-7)
- Introduction: Exploded Packaging and Concept (8)
- Introduction: Articles (9-11)
- Introduction: Overview, EPIC (12)
- Introduction: Implemented in Lite and Not (13)
- Introduction: Hardware Architecture (14)
- Itanium: Branches and Predication (15)
- Itanium: General Instruction Format (16-17)
- Itanium: Using Predication to Eliminate Branches (18-19)
- Itanium: Memory Hierarchy (20)
- Itanium: Speculation (21-27)
- ISA Classification (28)
- Register Set: Integer (29)
- Data Types (30)
- Addressing Modes (31)
3. Overcoming CPU Bottlenecks
- Why 64 bits?
- VLSI technology is increasing the number of transistors available on a single die.
- Compiler technology is now very advanced, but it still has some limitations.
- Multithreading is becoming more pervasive.
- "Media-rich" workloads imply parallelism.
- Modularity and scalability will become increasingly important.
- Goals for Intel's next-generation CPUs
- Simplicity
- Extensibility
- Parallelism
- Compiler-oriented
- 64-bit computing
- Extremely large file support
- Extremely large physical memory support
- A huge virtual address space for applications
- 64-bit computation
4. Introduction - Itanium® Chronology
- 1994 - Intel and Hewlett-Packard begin working together on Itanium (codename Merced)
- 1999 - Prototypes were promised for release mid-year 1999
- 2000 - Demonstrated a 4-CPU Itanium at LinuxWorld; rollout delayed until 2001
- 2,000 units shipped for demonstration and 500 units sold
- 2001 - Itanium 2 (codename McKinley) is due to arrive in late 2001, eclipsing the first Itanium rollout
- 2002 - Reported cost of Itanium development exceeds $1 billion
- Federal patent suits find Intel guilty of using Intergraph technology in Itanium
- 2003 - Supercomputing applications finally kick in and show what this bold new Intel architecture can do!
5. Introduction - Intel® Itanium® Processor Family Roadmap
6. Introduction - Photos
Itanium-1 (L3 Cache External to Die)
Itanium-2
7. Itanium® CPU Layout
8. Itanium® Exploded Packaging and Concept
- Designed to take complexity away from the processor, making the programmer, compiler, and assembler more complex.
- 3x5 cartridge
- CPU + L3 cache
- 130 W power
- 420 mm²
- Transistors
- CPU: 25 million
- L3 Cache: 300 million
9. One of Many Supercomputer Itanium Articles
Intel Itanium Architecture to be Foundation for One of World's Most Powerful Scientific Computing Systems
August 9, 2001
3,300 Intel Processors to be Linked in a System Capable of Calculating More Than 13.6 Trillion Operations Per Second
Intel today announced that its Itanium family of
processors will be used to build a distributed
scientific computing system expected to be the
largest of its kind in the world. The computing
system, dubbed the "TeraGrid," is part of a $53 million award by the National Science Foundation
(NSF) to four facilities to address complex
scientific research by creating a Distributed
Terascale Facility (DTF). The TeraGrid will link
computers powered by more than 3,300 Intel
Itanium family processors. It will be capable of
more than 13.6 trillion calculations per second
(13.6 teraflops) and have the ability to store,
access and share more than 450 trillion bytes of
information. The TeraGrid will be accessible to
researchers across the United States so that they
can more quickly analyze, simulate and help solve
some of the most complex scientific problems.
Examples of research areas include molecular
modeling for disease detection, cures and drug
discovery, automobile crash simulations, research
on alternative energy sources and climate and
atmospheric simulations for more accurate weather
predictions. "The Itanium processor family is
bringing a new level of performance, scalability
and lower costs to high-performance computing,"
said Abhi Talwalkar, Intel vice president and
assistant general manager, Enterprise Platforms
Group. "Today's NSF award is a major show of
support for Itanium technology. All of us at
Intel are proud of the role our products play in
helping to advance the progress of scientific
discovery." The system announced today has been
dubbed "TeraGrid" due to its speed, distributed
design and deployment across multiple networked
geographic sites. It will achieve "tera"
performance with its ability to calculate
trillions of floating point operations per second
(teraflops) and store trillions of bytes
(terabytes) of data. The grid is a resource for
researchers to mutually access the system and
collaborate using shared computing hardware,
software and information. Expected to be
available in 2002, the TeraGrid is planned to be
the most comprehensive distributed scientific
computing infrastructure of its kind. It will
build upon an existing one-teraflops solution
with more than 300 Itanium processors now being
deployed at the National Center for
Supercomputing Applications (NCSA). The TeraGrid
will be based on both Intel's Itanium and
"McKinley" processors. McKinley is the code name
for the second product in Intel's Itanium
processor family, due in 2002. The largest
portion of the DTF computing power will be at the
NCSA at the University of Illinois in
Urbana-Champaign. NCSA has three DTF partners
which will also deploy Itanium systems: the San Diego Supercomputer Center (SDSC) at the University of California, San Diego; Argonne National Laboratory in suburban Chicago; and the California Institute of Technology in Pasadena. The system will consist of clustered
IBM servers running the Linux operating system,
and will be connected by a Qwest high-speed
optical network. In addition to providing the
processors powering the IBM systems, Intel will
supply the TeraGrid with key compilers, software,
tools and engineering design, and tuning support
services. The Itanium architecture design
enables breakthrough capabilities in processing
terabytes of data at high speeds and processing
complex computations. Itanium-based solutions are
providing the highest levels of floating-point
performance for complex, numerically intensive applications, surpassing many of the best
RISC-based results and benchmarks to date. The
Itanium processor's floating-point engine enables
up to 6.4 billion operations per second and
includes increased system memory bandwidth.
Intel, the world's largest chip maker, is also a
leading manufacturer of computer, networking and
communications products. Additional information
about Intel is available at http://www.intel.com/pressroom/. Intel is a registered trademark and
Itanium is a trademark of Intel Corporation.
Third party marks and brands are property of
their respective holders.
http://www.teragrid.org/news/080901_intel.html
10. Recent Itanium® Articles
April 10, 2003
http://www.businessweek.com/technology/cnet/stories/996357.htm
Itanium gets supercomputing software Researchers
build full Itanium support into software that can
be used to assemble supercomputers out of
clusters of Linux computers. Researchers at
the National Partnership for Advanced
Computational Infrastructure have built full
Itanium support into software that can be used to
assemble supercomputers out of clusters of Linux
computers. Version 2.3.2 of the NPACI Rocks
software, code-named Annapurna, is the first
version to support Itanium, Intel's high-end
processor, NPACI said in a statement Thursday.
The software makes it easier to install the Linux
operating system on numerous computers despite
differences between each machine. There already
was an Itanium version of the Rocks software, but
it didn't include all the software components of
the version for computers using Intel's Pentium
and Xeon or Advanced Micro Devices' Athlon chips.
The move will make it easier for Rocks users to
add Itanium systems into clusters that use the
other chips, according to Philip Papadopoulos,
program director for the San Diego Supercomputing
Center's (SDSC) grid and cluster computing group.
Because Itanium understands a completely
different set of instructions from lower-end
Intel processors, software must be completely
rebuilt for the newer chips. That barrier has
hindered adoption of Itanium in broad business
markets, but it's been less of a problem in the
supercomputing niche, where customers often
control their own software instead of relying on
products such as Oracle's database or Computer
Associates' management software. Indeed,
Gartner analyst John Enck said in a March 26
report that Itanium systems are fine for
supercomputing clusters and will expand this year
to some mainstream markets. "Gartner believes
(the Itanium processor family) is safe for
high-performance computer clusters immediately
and will be ready for mainstream database use on
all operating systems by year-end 2003," Enck
said. "Other application usage models will
quickly follow." The NPACI Rocks software is
being used at a host of academic and government
sites, including Northwestern University, Pacific
Northwest National Laboratory, the Scripps
Institution of Oceanography, Stanford University
and the University of Macedonia. Rocks is an
open-source program that's developed by the NPACI
at the SDSC, by the University of California at Berkeley, Singapore Computer Systems, and individual programmers. It's based on Red Hat
Linux version 7.3. The program includes cluster
software for tasks such as sending messages from
one computer to another, monitoring each system's
performance and scheduling jobs across the
cluster. By Stephen Shankland, Staff Writer,
CNET News.com
11. Recent Itanium® Articles
Thursday, Apr 24 at 16:31 PDT
Singapore - The
Linux Competency Centre at Singapore Computer
Systems (SCS-LCC) has commissioned a new
60-processor Intel Itanium 2-based cluster
for the Singapore-MIT Alliance (SMA) at the
National University of Singapore. The SMA
cluster, named HydraIII, is the first large-scale
Intel Itanium 2-based Beowulf cluster to be
deployed into production using the open-source
Rocks cluster toolkit, whose development is led
by the San Diego Supercomputer Center. The
cluster was installed with Rocks and had
applications running in less than a day. "The
rapid deployment by SCS of the HP system
demonstrates that 64-bit high performance
clusters are now as easy to build as 32-bit x86
processor systems," said Leslie Ong, Director, HP
Business Critical Systems, South East Asia. "Such
efficiency in rollout underscores the growing
momentum to move to open standards from
proprietary systems in the scientific community,
he added. "The increasing demand for
high-performance computing power will be a major
driver of computing innovation throughout the
next decade. We expect clusters and grids using
the open standard Intel Itanium processor family
to deliver the performance and affordability
required by the industry," said William Wu,
Itanium processor family marketing manager, Asia
Pacific. The HydraIII cluster supports about 50 SMA
researchers and post-graduate students involved
in various projects, ranging from computational
fluid dynamics to bio-engineering. The cluster
consists of fifteen HP rx5670 nodes, each with
four Itanium 2 processors, and is interconnected
with a high-performance, high-bandwidth,
low-latency switching system from Myrinet. The
cluster's operating system software is Red Hat
Linux, managed by the tools of NPACI Rocks
version 2.3.2. Current Linpack performance achieves around 70% of theoretical peak processing power (240 GFLOPS), at 167 GFLOPS over
the Myrinet interconnect. "We are very pleased
with the performance and ease of management of
the Rocks-based Itanium 2 cluster," said Prof.
Khoo Boo Cheong, Program Co-Chair of High
Performance Computation for Engineered Systems at
SMA. "We intend to encourage more researchers to
migrate to HydraIII over the next few months. The
technical expertise and assistance that the
SCS-LCC team has provided to us made a huge
difference to our transition to 64-bit Linux
parallel computing." "The team took less than a day to install the cluster with Rocks and get the cluster operational. This is a testimony to
the amount of work that has gone into making
Rocks one of the best and easiest to use cluster
toolkits in the world," said Laurence Liew,
manager of the SCS Linux Competency Centre.
"SCS Linux Competency Centre collaborates
closely with the San Diego Supercomputer Center
on NPACI Rocks and provides critical support in
the areas of file systems and queuing systems,"
said Dr Philip Papadopoulos, program director for
SDSC's Grid and Cluster Computing group. "The
Rocks user community benefits greatly from SCS'
expertise and their significant contributions to
this community toolkit."
http://www.supercomputingonline.com/article.php?sid=1392
12. Overview - EPIC (Explicitly Parallel Instruction Computing)
- Designed to take complexity away from the processor hardware, making the programmer, compiler, and assembler more complex.
- Much of the parallelism is handled by the compiler, with hardware support.
- The compiler can spend days, with many resources, optimizing (parallelizing) the code at the vendor.
- All runtime user applications benefit from the optimal parallel code, so IA-64 does not need to optimize at runtime.
- Many hardware- and compiler-driven methods are used to speed up operation.
- A large (10-stage) pipeline increases speed but requires accurate branch prediction; this is an important reason why predication is provided (explained later).
- Branch misses are very difficult to repair because of the large pipeline.
- Predication simply uses a 1-bit predicate register to allow either branch of an if statement to take effect; both branches of all predicated if statements are run concurrently.
- Predication allows both branch streams to be merged into a single stream, eliminating the branches and misses that would need to be corrected.
- Many functional hardware units are available for performing operations in parallel.
- Instructions are bundled into groups of 3, with an added 5-bit template, for a complete 128-bit instruction bundle.
- The 3 instructions in the bundle are determined to be non-interfering by the compiler.
- Speculative loads allow operands to be fetched in advance, hiding memory access latency.
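The bundle just described (three 41-bit slots plus a 5-bit template: 3 x 41 + 5 = 128 bits) can be sketched as a small C packer. This is an illustrative encoder only, assuming the template occupies bits 0-4 and the slots bits 5-45, 46-86, and 87-127; the names `bundle_t`, `put_bits`, and `pack_bundle` are invented for this sketch, not Intel tooling.

```c
#include <stdint.h>

/* Illustrative 128-bit bundle, held as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } bundle_t;

/* Write `width` low bits of `value` starting at bit `pos` of the
   bundle; a field may straddle the lo/hi boundary (as slot 1 does). */
static void put_bits(bundle_t *b, int pos, int width, uint64_t value) {
    for (int i = 0; i < width; i++) {
        uint64_t bit = (value >> i) & 1u;
        int p = pos + i;
        if (p < 64) b->lo |= bit << p;
        else        b->hi |= bit << (p - 64);
    }
}

/* Pack a 5-bit template and three 41-bit instruction slots:
   5 + 3*41 = 128 bits, filling the bundle exactly. */
bundle_t pack_bundle(uint8_t template5, uint64_t slot0,
                     uint64_t slot1, uint64_t slot2) {
    bundle_t b = {0, 0};
    put_bits(&b, 0, 5, template5 & 0x1Fu);
    put_bits(&b, 5, 41, slot0);
    put_bits(&b, 46, 41, slot1);
    put_bits(&b, 87, 41, slot2);
    return b;
}
```

Packing all-ones fields sets all 128 bits, which is a quick check that the template and three slots tile the bundle with no gaps or overlap.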
13. Overview - Itanium Lite: Implemented / Not Implemented
- Product Features Implemented
  - IA-64 ISA
  - RISC instruction set
  - Predication (note: all instructions take 1 clock cycle to execute)
  - Control speculation
  - Branch adder
  - Physical register subset: 32 registers, each 64 bits
  - Split L1 cache for instructions and data; each has independent non-blocking main memory access
  - Instructions are 41-bit fixed-format
  - Delayed branch, with NOP insertion after branches, in anticipation of being pipelined in the future (ref. 6, p. 558)
  - A single NOP insertion is adequate as a placeholder after all conditional branches to avoid performing unintended instructions
  - When the pipeline is eventually implemented, the placeholder NOPs can be replaced with a sufficient number of NOP insertions
- Features Not Presently Supported
  - IA-32 ISA
  - Pipeline
  - Floating point
  - Data speculation
  - Multiple execution units
14. Introduction - Hardware Architecture
15. Branches and Predication
- Traditional Architectures
  - Intel estimates that 20% to 30% of processor performance is eaten up by branch mispredictions.
  - Branches limit your freedom to schedule the code for optimum performance.
- If-Then-Else Conditional Statement
  - Could evaluate the If, then depending on the outcome process the Then or the Else path.
  - The alternative is branch prediction: while waiting for the If, just guess which branch will be taken and execute it.
  - If you get it right, you haven't wasted any time; if you get it wrong, that's where that 20-30% performance hit comes into effect. But even assuming you get it right, you might still have a number of execution slots going to waste.

Predication: EPIC deals with the problems that branching introduces by just getting rid of branches whenever it can. When IA-64 comes upon a conditional branch, instead of trying to predict which branch the program will take, it just takes them both. To understand how this process works, it's best to look at an example.
16. Intel Itanium Instruction Format
- A typical Itanium instruction is a three-operand instruction, with the following syntax:
- (qp) mnemonic.comp1.comp2 dests = srcs
- Some examples of different Itanium instructions:
  - Simple instruction: add r1 = r2, r3
  - Predicated instruction: (p4) add r1 = r2, r3
  - Instruction with immediate: add r1 = r2, r3, 1
  - Instruction with completer: cmp.eq p3 = r2, r4
17. Intel Itanium Instruction Format
- (qp): A qualifying predicate is a predicate register indicating whether or not the instruction is executed. When the value of the register is true (1), the instruction is executed. When the value of the register is false (0), the instruction is executed as a NOP.
- Instructions that are not explicitly preceded by a predicate assume the first predicate register, p0, which is always true. Some instructions cannot be predicated.
- mnemonic: A unique name identifying the instruction.
- comp1, comp2: Some instructions may include one or more completers. Completers indicate optional variations on the basic mnemonic.
- dests, srcs: Most Itanium instructions have at least two source operands and a destination operand. Source operands are used as input; typically they are registers or immediates. The destination operand(s) is typically a register to which the result is written.
18. Using Predication to Eliminate Branches
- Predication is the conditional execution of instructions based on a qualifying predicate.
- When the predicate is true (1), the instruction is executed.
- When it is false (0), the instruction is treated as a NOP.
- Predicates are set by various instructions, including the compare instructions.
- Predication enables you to convert a control dependency to a data dependency, thus eliminating branches in the code.

These code examples show the control flow of code with and without predication. In the predicated code example below, a data dependency exists between the cmp and the two predicated instructions, which execute in parallel.
Predicated Code:
    movl r1 = type
    ld4 r2 = [r1]
    cmp.eq p1, p2 = 'a', r2
    cmp.eq p3, p4 = 'b', r2
    (p1) add r2 = 10, r2
    (p3) add r2 = 20, r2
    st4 [r1] = r2

C Code Example:
    switch (type) {
    case 'a': type = type + 10; break;
    case 'b': type = type + 20; break;
    default:  break;
    }
19. Predication Summary
- All conditional instructions are predicated
- Avoids short branches that inject bubbles into the pipeline
- Executes both branch paths simultaneously
- Discards the irrelevant path as the predicate is evaluated
- Delays the final result's effect, allowing time to resolve qualifying predicates

Example 1:
    Original code:           r1 = r2 + r3
    Predicated pseudo-code:  if (p5) r1 = r2 + r3
    Predicated code:         (p5) add r1 = r2, r3

Example 2:
    Original code:           if (a > b) c = c + 1
                             else d = (d << e) + f
    Predicated pseudo-code:  pT, pF = compare(a > b)
                             if (pT) c = c + 1
                             if (pF) d = (d << e) + f
    Predicated code:         cmp pT, pF = ra, rb
                             (pT) add c = 1, c
                             (pF) shladd d = d, e, f
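Example 2 can be mirrored in branch-free C to show what predication buys: both paths' results are computed, and a mask built from the predicate gates which write-back takes effect, so no conditional branch is needed. This is a model of the idea, not real IA-64 semantics; the names `state_t` and `predicated_update` are invented here, and `(d << e) + f` stands in for the shladd form above.

```c
#include <stdint.h>

typedef struct { int64_t c, d; } state_t;

/* Compute both paths, then select by predicate mask (no branches). */
state_t predicated_update(state_t s, int64_t a, int64_t b,
                          int64_t e, int64_t f) {
    int64_t pT = -(int64_t)(a > b);     /* all-ones if a > b, else 0 */
    int64_t pF = ~pT;                   /* complementary predicate   */
    int64_t c_then = s.c + 1;           /* (pT) add c = 1, c         */
    int64_t d_else = (s.d << e) + f;    /* (pF) shladd d = d, e, f   */
    s.c = (c_then & pT) | (s.c & pF);   /* write-back gated by pT    */
    s.d = (d_else & pF) | (s.d & pT);   /* write-back gated by pF    */
    return s;
}
```

With a > b, only c changes; otherwise only d changes. The effect matches a taken/not-taken branch pair, but both computations are issued unconditionally, which is the point of predication.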
20. Memory Hierarchy
- A solution to obtaining quick memory access relies on locality of reference
- Most programs do not access all code or data uniformly
- Generally, smaller hardware is faster than larger hardware
- Faster hardware is expensive
- Any instruction load or data load can take a large number of CPU clocks (a large amount of time, or latency)
- Speculation (pre-fetching) reduces the effective access time of instructions and data
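The locality claim can be made concrete with a toy direct-mapped cache model. The geometry here (8 lines of 64 bytes) and all names are invented for illustration, not Itanium's actual cache parameters; the sketch just counts hits to show that sequential accesses reuse cached lines while large-stride accesses keep missing.

```c
/* Toy direct-mapped cache: 8 lines of 64 bytes (illustrative sizes). */
#define LINES 8
#define LINE_BYTES 64

typedef struct { long tag[LINES]; int valid[LINES]; } cache_t;

/* Replay an address trace and count hits. */
int count_hits(cache_t *c, const long *addrs, int n) {
    int hits = 0;
    for (int i = 0; i < n; i++) {
        long block = addrs[i] / LINE_BYTES;   /* which memory line   */
        int idx = (int)(block % LINES);       /* direct-mapped slot  */
        if (c->valid[idx] && c->tag[idx] == block) {
            hits++;                           /* locality pays off   */
        } else {
            c->valid[idx] = 1;                /* miss: fill the line */
            c->tag[idx] = block;
        }
    }
    return hits;
}
```

A sequential walk over 256 bytes in 4-byte steps touches only 4 lines and hits on 60 of 64 accesses; a 512-byte stride maps every access to slot 0 with a new tag and never hits, which is why a small fast cache only works when programs exhibit locality.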
21. Speculation
- Fast processor speeds are of limited value if computational registers sit idle while the processor retrieves required data from memory
- Speculation allows the compiler to identify future data needs, so essential data can be pre-loaded into the processor
- This technique can significantly reduce or eliminate processor wait times
- There is no 100% guarantee that any speculative attempt to perform either an instruction (control) or data fetch ahead of time will be successful
- Many hardware/ISA mechanisms attempt to reduce the negative impacts of bad speculations
22. Control Speculation
- A load transfers data stored in memory to a general register and can take a long time
- The data transferred can be either software instructions from a program or purely data
- To reduce effective access time, special mechanisms are provided to allow for compiler-directed speculation
- Control speculation is a compiler optimization
- An instruction or sequence of instructions is executed before it is known (exactly) that the dynamic control flow of the program will actually reach the point in the program where the sequence of instructions is needed
- Starting execution early allows the compiler to overlap the execution with other work, increasing parallelism and decreasing overall execution time
- This optimization is performed when it is determined that the calculation will be required
- In cases where the control flow does not need the calculation, the results are discarded or not used
- Since the speculative instruction sequence may not be required after all, any exceptions should be delayed until the actual sequence is known to be required
- A mechanism is provided for these exceptions to be recorded and deferred, to be signalled later
- A special token is written into the target register's extra bit, NaT (Not a Thing)
23. Control Speculation
- Instructions are either speculative or non-speculative
- Non-speculative instructions raise exceptions immediately and are unsafe to schedule before they are known to be executed
- Speculative instructions defer exceptions, so they can be scheduled before they are needed
- At the point in the program where it is known that the speculative calculation result is necessary, a speculation check (chk.s) instruction is used
- The check looks for the deferred exception token in NaT
- If no deferred exception is found, the speculative calculation was successful and execution continues normally
- If a deferred exception token is found, the speculative calculation was unsuccessful and must be re-done, this time by branching to a new address
- A branch is taken to a new address with a non-speculative version of the same code
- On this second try to run the code, the exceptions are handled normally (non-speculatively)

Original code:
    if (a > b) load(ld_addr1, target1)
    else load(ld_addr2, target2)

Speculated code:
    sload(ld_addr1, target1)
    sload(ld_addr2, target2)
    /* other operations, including uses of target1 and target2 */
    if (a > b) scheck(target1, recovery_addr1)
    else scheck(target2, recovery_addr2)
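The deferred-exception flow above can be modeled in a few lines of C. This is a behavioral sketch only: `spec_load` and `chk_s` mimic, but do not implement, the real speculative load and chk.s instructions; a faulting access is modeled as a NULL pointer, and the NaT token as a flag carried with the result.

```c
#include <stdint.h>
#include <stddef.h>

/* A register value together with its NaT ("Not a Thing") bit. */
typedef struct { int64_t value; int nat; } sreg_t;

/* Speculative load: instead of trapping, a faulting access (modeled
   here as a NULL address) defers the exception by setting NaT. */
sreg_t spec_load(const int64_t *addr) {
    sreg_t r = {0, 0};
    if (addr == NULL) { r.nat = 1; return r; }   /* defer the fault */
    r.value = *addr;
    return r;
}

/* Speculation check: returns 1 when a deferred exception token is
   present, i.e. the non-speculative recovery path must be run. */
int chk_s(sreg_t r) { return r.nat; }
```

A successful speculative load passes the check and its value is usable; a deferred fault surfaces only at the check, which is exactly when the program is known to need the result.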
24. Control Speculation
- Computational instructions do not generally cause exceptions
- The only instructions that generate deferred exception tokens are speculative loads
- Other speculative instructions propagate deferred exception tokens, but do not generate them
- Compare instructions (cmp and tbit) read general registers and write one or two predicate registers
- If any source contains a deferred exception token, all predicate targets are either cleared or left unchanged
- Software uses this method to ensure any dependent conditional branches are not taken and any dependent predicated instructions are nullified
- Deferred exception tokens can also be tested using test NaT (tnat)
- tnat tests the NaT bit corresponding to the specified general register and writes two predicate results
- A non-speculative instruction that reads a register containing a deferred exception token will raise a Register NaT Consumption fault
- Such instructions can be thought of as performing a non-recoverable speculation check operation
- The operating system also has control over exception deferral
- The OS has the option to select which exceptions are deferred automatically in hardware
- Other exceptions may be handled (and possibly deferred) by software
- Special register Spill and Fill instructions store and load a register to/from memory while preserving any deferred exception token
25. Data Speculation
- Similar to control speculation; allows the compiler to schedule instructions across some types of ambiguous data dependencies
- An ambiguous data or memory dependency exists between a store, which updates the memory state, and a load from memory to registers, when it cannot be determined whether the load and store might access overlapping regions of memory
- A store that cannot be disambiguated relative to a particular load is said to be ambiguous relative to that load
- In such cases, the compiler cannot change the order in which the load and store instructions were originally specified in the program
- To overcome this scheduling limitation, a special kind of load instruction called an advanced load can be scheduled to execute earlier than the one or more stores that are ambiguous relative to that load
26. Data Speculation
- The compiler can also speculate operations that are dependent upon the advanced load, and later insert a check instruction to determine whether the speculation was successful
- For data speculation, the check can be placed anywhere the original non-speculative data load would have been scheduled
- A data-speculative sequence of instructions consists of an advanced load, zero or more instructions dependent on the value of that load, and a check instruction

Original code:
    store(st_addr, data)
    load(ld_addr, target)
    use(target)

Speculated code:
    aload(ld_addr, target)
    /* other operations, including uses of target */
    store(st_addr, data)
    acheck(target, recovery_addr)
    use(target)
27. Data Speculation
- Data Speculation and Instructions
- Advanced loads are available in many forms (integer, floating-point, floating-point pair)
- When an advanced load is executed, it allocates an entry in a structure called the Advanced Load Address Table (ALAT). Later, when a corresponding check instruction (e.g., chk.a) is executed, the presence of an entry indicates that the data speculation succeeded
- The advanced load check (chk.a) is used when an advanced load and several instructions that depend on the loaded data value are scheduled before a store that is ambiguous relative to that advanced load
- The chk.a works like the chk.s: if the speculation was successful, execution continues inline and no recovery is necessary
- If the speculation was unsuccessful, the chk.a branches to compiler-generated recovery code
- The recovery code contains instructions that re-execute all the work that was dependent on the failed data-speculative load, up to the point of the check instruction
- The ALAT is searched for a matching entry to determine success or failure
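The ALAT mechanism on this slide can also be modeled behaviorally in C. This is a sketch only: the table size, indexing by target register number, and all function names are invented here, not the hardware's actual policy. The advanced load records its address, an overlapping store evicts the matching entry, and the check reports whether the entry survived.

```c
#include <stdint.h>

/* Toy Advanced Load Address Table, indexed by target register. */
#define ALAT_SIZE 8

typedef struct { uintptr_t addr[ALAT_SIZE]; int valid[ALAT_SIZE]; } alat_t;

/* ld.a analogue: record the address the advanced load read from. */
void advanced_load(alat_t *t, int reg, uintptr_t addr) {
    t->addr[reg % ALAT_SIZE] = addr;
    t->valid[reg % ALAT_SIZE] = 1;
}

/* Store analogue: evict any entry whose recorded address matches
   (a real ALAT checks for overlap, not just exact equality). */
void alat_store(alat_t *t, uintptr_t addr) {
    for (int i = 0; i < ALAT_SIZE; i++)
        if (t->valid[i] && t->addr[i] == addr) t->valid[i] = 0;
}

/* chk.a analogue: 1 = speculation succeeded, 0 = run recovery code. */
int chk_a(const alat_t *t, int reg) { return t->valid[reg % ALAT_SIZE]; }
```

A store to an unrelated address leaves the entry intact and the check succeeds; a store that hits the advanced load's address evicts the entry, so the check fails and recovery code would re-execute the dependent work.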
28. ISA Classification
- ISA classification is based on the operand addressing of data manipulation operations (i.e., ADD, SUB, MUL)
- Two parameters of interest, (M, N): N is the maximum number of operands that can be explicitly addressed, and M is the maximum number of operands that can be explicitly addressed in memory
- The Itanium is classified as a (0,3)
- Three address operands for each data manipulation instruction
- Zero memory-direct operands
- Generally this is known as a RISC ISA classification
- Note: the bundling of 3 instructions to make a 128-bit word is generally considered very long instruction word (VLIW), so Itanium combines features from both complex and RISC processors
29. Register Set - Integer
- 32 x 64-bit general-purpose registers
  - The zero address (r0) returns a zero value
- 32 x 1-bit Not-a-Thing (NaT) registers, corresponding to the general-purpose registers
  - The zero address returns a zero value
- 64 x 1-bit predicate registers
  - The zero address (p0) returns a one value
30. Data Types
- Integer only, no floating point
- 64-bit integer
- Byte ordering: big-endian
31. Addressing Modes
- The Itanium has only one simple addressing mode: register indirect.
- This reduces the amount of overhead per clock cycle, since it does not have to deal with the address-generation units required for multiple addressing modes.

Example 1:
    ld8 r1 = [r3]
    ; loads 8 bytes from the address indicated by the value in r3 into register r1

Example 2:
    st8 [r3] = r2
    ; stores 8 bytes from register r2 to the address indicated by the value in r3

- PC-relative addressing is also used to perform branches
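In C terms, register-indirect addressing is just a pointer dereference: the only address the memory operation sees is the one already held in a register. A minimal sketch of the two examples above (the function names are invented; C's `int64_t` stands in for a 64-bit general register):

```c
#include <stdint.h>

/* ld8 r1 = [r3]: load 8 bytes from the address held in r3. */
int64_t ld8(const int64_t *r3) { return *r3; }

/* st8 [r3] = r2: store 8 bytes from r2 to the address held in r3. */
void st8(int64_t *r3, int64_t r2) { *r3 = r2; }
```

Any richer mode (base + offset, scaled index) must be synthesized with ordinary adds before the access, which is the design trade-off the slide describes: no address-generation units, at the cost of extra explicit arithmetic.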
32. Instruction Set Format
33. Instruction Set - Itanium Lite
34. Instruction Set - Itanium Lite
35. Lite Instruction Formats Not Covered