Title: Computer Architecture and Organization
Slide 1: Computer Architecture and Organization
Miles Murdocca and Vincent Heuring
Chapter 10 Advanced Computer Architecture
Slide 2: Chapter Contents
- 10.1 Parallel Architecture
- 10.2 Superscalar Machines and the PowerPC
- 10.3 VLIW Machines and the Itanium
- 10.4 Case Study: Extensions to the Instruction
Set: The Intel MMX/SSEX and Motorola AltiVec
SIMD Instructions
- 10.5 Programmable Logic Devices and Custom ICs
- 10.6 Unconventional Architectures
Slide 3: Parallel Speedup and Amdahl's Law
- In the context of parallel processing, speedup
can be computed as
    S = T_sequential / T_parallel
- Amdahl's law, for p processors and a fraction f
of unparallelizable code:
    S = 1 / (f + (1 - f) / p)
- For example, if f = 10% of the operations must
be performed sequentially, then speedup can be no
greater than 10 regardless of how many processors
are used.
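The bound can be checked numerically. The following is a minimal Python sketch of Amdahl's law in its usual form, S = 1 / (f + (1 - f) / p); the function name `amdahl_speedup` is an illustrative choice, not something taken from the slides.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Speedup on p processors when a fraction f of the work is sequential."""
    if not 0.0 <= f <= 1.0 or p < 1:
        raise ValueError("need 0 <= f <= 1 and p >= 1")
    return 1.0 / (f + (1.0 - f) / p)


if __name__ == "__main__":
    # With f = 0.10 the speedup approaches, but never reaches, 1 / f = 10.
    for p in (1, 10, 100, 1000):
        print(p, round(amdahl_speedup(0.10, p), 2))
```

With f = 0.10, ten processors give a speedup of about 5.26, and even 1000 processors stay below 10.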
Slide 4: Efficiency and Throughput
- Efficiency is the ratio of speedup to the
number of processors used. For a speedup of 5.3
with 10 processors, the efficiency is
5.3 / 10 = 0.53, or 53%.
- Throughput is a measure of how much computation
is achieved over time, and is of special concern
for I/O-bound and pipelined applications. For the
case of a four-stage pipeline that remains
filled, in which each pipeline stage completes
its task in 10 ns, the average time to complete
an operation is 10 ns even though it takes 40 ns
to execute any one operation. The overall
throughput for this situation is then
1 operation / 10 ns = 10^8 operations per second.
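The arithmetic on this slide can be reproduced with a short Python sketch. The helper names are illustrative, and the model assumes an ideal pipeline that never stalls.

```python
def efficiency(speedup: float, processors: int) -> float:
    """Ratio of achieved speedup to the number of processors used."""
    return speedup / processors


def pipeline_latency(stages: int, stage_time_s: float) -> float:
    """Time for one operation to pass through every stage."""
    return stages * stage_time_s


def pipeline_throughput(stage_time_s: float) -> float:
    """Operations per second for a pipeline that stays full."""
    return 1.0 / stage_time_s


if __name__ == "__main__":
    print(efficiency(5.3, 10))           # about 0.53
    print(pipeline_latency(4, 10e-9))    # about 4e-08 s, i.e. 40 ns
    print(pipeline_throughput(10e-9))    # about 1e8 operations per second
```

Latency depends on the number of stages, while throughput depends only on the stage time, which is why the two figures differ by a factor of four here.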
Slide 5: Flynn Taxonomy
Classification of architectures according to
the Flynn taxonomy: (a) SISD; (b) SIMD; (c) MIMD;
(d) MISD.
Slide 6: Network Topologies
Network topologies: (a) crossbar; (b) bus; (c)
ring; (d) mesh; (e) star; (f) tree; (g) perfect
shuffle; (h) hypercube.
Slide 7: Crossbar
Internal organization of a crossbar.
Slide 8: Crosspoint Settings
(a) Crosspoint settings for connections 0 → 3
and 3 → 0; (b) adjusted settings to accommodate
connection 1 → 1.
Slide 9: Three-Stage Clos Network
Slide 10: 12-Channel Three-Stage Clos Network with n = p = 6
Slide 11: 12-Channel Three-Stage Clos Network with n = p = 2
Slide 12: 12-Channel Three-Stage Clos Network with n = p = 4
Slide 13: 12-Channel Three-Stage Clos Network with n = p = 3
Slide 14: C function computes (x^2 + y^2) × y^2
Slide 15: Dependency Graph
(a) Control sequence for C program; (b)
dependency graph for C program.
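A dependency graph of this kind can be sketched in Python. The example below assumes the C function on the preceding slide computes (x^2 + y^2) × y^2 and that it is broken into four temporaries; both the expression and the temporary names `t1` to `t4` are assumptions, since the figure itself is not reproduced in this text.

```python
# Each operation maps to the set of operations whose results it needs.
# The decomposition into t1..t4 is hypothetical.
DEPS = {
    "t1 = x * x": set(),
    "t2 = y * y": set(),
    "t3 = t1 + t2": {"t1 = x * x", "t2 = y * y"},
    "t4 = t3 * t2": {"t3 = t1 + t2", "t2 = y * y"},
}


def parallel_levels(deps):
    """Group operations into levels; operations within a level are independent."""
    remaining = dict(deps)
    done = set()
    levels = []
    while remaining:
        ready = sorted(op for op, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("cycle in dependency graph")
        levels.append(ready)
        done.update(ready)
        for op in ready:
            del remaining[op]
    return levels


if __name__ == "__main__":
    for step, ops in enumerate(parallel_levels(DEPS), start=1):
        print(step, ops)
```

Under these assumptions the two squarings fall in the same level and could run in parallel, while the addition and the final multiplication must follow in sequence.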
Slide 16: Matrix Multiplication
(a) Problem setup for Ax = b; (b) equations for
computing the b_i.
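Each element of b is the dot product of one row of A with x, that is, b_i = a_i1 x_1 + a_i2 x_2 + ... + a_in x_n. The following Python sketch computes the b_i one row at a time; the function name `mat_vec` and the sample values are illustrative only.

```python
def mat_vec(A, x):
    """Return b = A x, computing each b_i as an independent dot product."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have len(x) entries")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


if __name__ == "__main__":
    A = [[1, 2],
         [3, 4]]
    x = [5, 6]
    print(mat_vec(A, x))  # [17, 39]
```

Because no b_i depends on any other, each row's dot product could in principle be assigned to a separate processor.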
Slide 17: Matrix Multiplication Dependency Graph
Slide 18: The PowerPC 601 Architecture
Slide 19: 128-Bit IA-64 Instruction Word
Each 41-bit instruction consists of three
register addresses (7 bits each, for 128 possible
registers), a predicate register (6 bits), and the
opcode and flags or general purpose register (14
bits, varies by instruction). Three such
instructions plus a 5-bit template field fill the
128-bit word.
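The field widths can be checked by packing and unpacking a 41-bit value. This Python sketch uses the widths given on the slide; the field names and the ordering of fields within the word are assumptions made for illustration, not a statement of the exact IA-64 encoding.

```python
# (name, width in bits), listed from the least-significant bit upward.
# Names and ordering are assumed for this sketch.
FIELDS = [
    ("predicate", 6),
    ("reg1", 7),
    ("reg2", 7),
    ("reg3", 7),
    ("opcode_flags", 14),
]

INSTRUCTION_BITS = sum(width for _, width in FIELDS)   # 41
TEMPLATE_BITS = 5                                      # IA-64 template field
BUNDLE_BITS = 3 * INSTRUCTION_BITS + TEMPLATE_BITS     # 128


def pack(values):
    """Pack a dict of field values into one 41-bit integer."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack(word):
    """Split a 41-bit integer back into its fields."""
    out = {}
    for name, width in FIELDS:
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


if __name__ == "__main__":
    sample = {"predicate": 3, "reg1": 127, "reg2": 0, "reg3": 64, "opcode_flags": 1}
    assert unpack(pack(sample)) == sample
    print(INSTRUCTION_BITS, BUNDLE_BITS)  # 41 128
```

The widths add up as the slide states: 3 × 7 + 6 + 14 = 41 bits per instruction, and 3 × 41 + 5 = 128 bits per word.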
Slide 20: Itanium Instruction Types
Slide 21: Allowable Combinations of IA-64 Instruction Types
Assigned to Instruction Slots
Slide 22: IA-64 Instruction Issues
Maximum number of IA-64 instructions that can
be executed for each pairing of bundles.
Slide 23: Intel MMX (MultiMedia eXtensions)
Vector addition of eight bytes by the Intel
PADDB mm0, mm1 instruction.
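The effect of PADDB can be modeled in Python as eight independent byte additions, each wrapping around modulo 256. This is a behavioral sketch only; the function name `paddb` and the list representation of a 64-bit register are illustrative.

```python
def paddb(a, b):
    """Lane-wise add of two 8-byte vectors with wraparound (modulo 256)."""
    if len(a) != 8 or len(b) != 8:
        raise ValueError("expected eight byte lanes in each operand")
    return [(x + y) & 0xFF for x, y in zip(a, b)]


if __name__ == "__main__":
    mm0 = [250, 1, 2, 3, 4, 5, 6, 7]
    mm1 = [10, 1, 1, 1, 1, 1, 1, 1]
    print(paddb(mm0, mm1))  # [4, 2, 3, 4, 5, 6, 7, 8]
```

The first lane shows the wraparound: 250 + 10 = 260, which is stored as 4 because no carry passes into the neighboring byte.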
Slide 24: Intel and Motorola Vector Registers
Intel aliases the MMX registers onto the
floating-point registers: the eight 64-bit MMX
registers occupy the low 64 bits of the Pentium's
eight floating-point registers, which therefore do
double duty. Motorola implements 32 128-bit
vector registers as a new set, separate and
distinct from the floating-point registers.
Slide 25: MMX and AltiVec Arithmetic Instructions
Slide 26: Comparing Two MMX Byte Vectors for Equality
Slide 27: Conditional Assignment of an MMX Byte Vector
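Comparison and conditional assignment on byte vectors can be sketched in Python. The sketch assumes the usual mask-based approach: a lane-wise equality compare (as PCMPEQB does) yields 0xFF in equal lanes and 0x00 elsewhere, and the mask then selects between two sources with AND, AND-NOT, and OR. The figures for these slides are not reproduced here, so the exact instruction sequence they show is not confirmed; the function names are illustrative.

```python
def pcmpeqb(a, b):
    """Lane-wise equality: 0xFF where the bytes match, 0x00 where they differ."""
    return [0xFF if x == y else 0x00 for x, y in zip(a, b)]


def select(mask, if_true, if_false):
    """Per-lane choice: (mask AND if_true) OR ((NOT mask) AND if_false)."""
    return [(m & t) | (~m & 0xFF & f) for m, t, f in zip(mask, if_true, if_false)]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [1, 0, 3, 0, 5, 0, 7, 0]
    mask = pcmpeqb(a, b)
    print(mask)                       # [255, 0, 255, 0, 255, 0, 255, 0]
    print(select(mask, [9] * 8, a))   # [9, 2, 9, 4, 9, 6, 9, 8]
```

The selection involves no branch: every lane is processed the same way, and the mask alone determines which source value survives.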
Slide 28: A PAL Device
PLAs and PALs are similar except that the OR
gates in a PAL have a fixed number of inputs and
the inputs are not programmable. PALs are more
prevalent than PLAs because they are easier to
manufacture and are less complex.
Slide 29: Complex Programmable Logic Device
CPLDs are PAL-like or PLA-like blocks that can be
combined with programmable interconnections.
Commercial CPLDs may contain as many as 200,000
equivalent gates and have over 3,000 macrocells.
Slide 30: Field Programmable Gate Array
Unlike CPLDs, which employ large logic blocks and
fewer interconnection options, FPGAs employ small
logic blocks that can be programmably
interconnected.
Slide 31: Quantum Computing
Single-particle interference experiment.
Slide 32: Multi-Valued Logic
Truth tables for binary and ternary comparison
functions.
Slide 33: Neural Networks
Model of a living neuron, and model of an
artificial neuron (below).
Slide 34: Artificial Neural Network Example
Two simple, feed-forward neural networks with
inputs, weights, and thresholds as shown.
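A threshold neuron and a small feed-forward network can be sketched in Python. The weights and thresholds below are hypothetical values chosen for illustration, since the figure's actual values are not reproduced in this text; the sketch also assumes a neuron fires when its weighted input sum meets or exceeds its threshold.

```python
def neuron(inputs, weights, threshold):
    """Output 1 if the weighted sum of inputs reaches the threshold, else 0."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def and_gate(a, b):
    # Hypothetical weights (1, 1) with threshold 2: fires only when both inputs are 1.
    return neuron((a, b), (1, 1), 2)


def or_gate(a, b):
    # Hypothetical weights (1, 1) with threshold 1: fires when either input is 1.
    return neuron((a, b), (1, 1), 1)


def xor_network(a, b):
    """Two-layer feed-forward network: output fires for OR but not AND."""
    hidden = (or_gate(a, b), and_gate(a, b))
    return neuron(hidden, (1, -1), 1)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_gate(a, b), or_gate(a, b), xor_network(a, b))
```

A single neuron of this kind can realize AND or OR, while XOR needs the second layer, which is one reason feed-forward networks are built from more than one neuron.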