Title: The x86 Server Platform
1. The x86 Server Platform
- … Resistance is futile.
- Dec 6, 2004
2. Server Shipments: Total vs. x86
3. Market Share: Servers, United States, 2Q04
Michael McLaughlin, "Market Share: Servers, United States, 2Q04," 7 October 2004, Gartner.
4. x86 Platform CPUs
- Intel
- Xeon MP: Gallatin (future: Potomac)
- Xeon SP/DP with EM64T: Nocona
- Itanium II MP: Madison (future: Montecito)
- AMD
- Opteron
5. Gallatin (MP)
- 130 nm
- 3 GHz
- 4 MB L3 cache
- FSB: 400 MHz
6. ES7000: 32 Gallatins
7. Nocona: Single Processor with EM64T
- 90 nm
- Clock speed: 3.2-3.6 GHz
- L3: 4 MB
- FSB: 800 MHz
8. Itanium II: Madison
- 130 nm
- 9 MB L3 cache
- 1.6 GHz
- FSB: 400 MHz
11. STOP
- Why Multi-Core?
- … And while we're at it, why Multi-Threading?
- It's all about the balance of
- Silicon real estate
- Compiler technology
- Cost
- Power
- … to meet the constant pressure to double performance every 18 months
12. Memory Latency vs. CPU Speed
[Chart: microprocessor on-chip clock frequency (GHz) vs. commodity DRAM access frequency ((10⁻⁹ sec)⁻¹), both on log scales from 0.01 to 10.0, by production year 1990-2010. The CPU curve climbs steadily while DRAM stays nearly flat.]
13. Processor Architecture
- When latency → 0 and bandwidth → ∞ we will have the perfect CPU
- A great deal of innovation has centered around approximating this perfect world
- CISC
- CPU Cache
- RISC
- EPIC
- Multi-Threading
- Multiple Cores
14. Complex Instruction Set Computer
- Hardware implements assembler instructions
- MULT A, B
- Hardware loads the registers, multiplies, and stores the result (see the sketch below)
- Multiple clocks are needed for an instruction
- RAM requirements are relatively small
- Compilers translate high-level languages down to assembler instructions on Von Neumann hardware
http://www.hardwarecentral.com/hardwarecentral/tutorials/2427
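
A minimal C sketch of the MULT A, B idea. The commented instruction sequence is illustrative, not actual compiler output:

```c
/* Illustrative sketch of the slide's MULT A, B example. On a CISC
   machine (e.g. x86) the compiler can emit a single instruction that
   reads an operand from memory and multiplies:
       mov  eax, [a]
       imul eax, [b]    ; one instruction: load + multiply
   The hardware sequences the micro-steps internally over several
   clocks, which is why few instructions and little RAM are needed. */
int mult(int a, int b) {
    return a * b;
}
```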
15. CPU Cache
- When CPU speeds started to increase, memory latency emerged as a bottleneck
- CPU caches were used to keep local references close to the CPU
- For SMP systems, memory banks were more than a clock away
- It is not uncommon today to find 3 orders of magnitude between the fastest and slowest memory latency (see the sketch below)
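
The cache-vs-DRAM gap is easy to observe with a pointer-chasing probe. This is a minimal sketch; the array sizes, iteration count, and use of clock() are illustrative assumptions:

```c
/* Minimal latency probe: walk a random pointer chain that either
   fits in cache or spills to DRAM. Sattolo's shuffle builds a single
   cycle so the prefetcher cannot guess the next address. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double ns_per_load(size_t n, size_t steps) {
    size_t *next = malloc(n * sizeof *next);
    if (!next) return 0.0;
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {    /* Sattolo: one big cycle */
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    volatile size_t p = 0;
    clock_t t0 = clock();
    for (size_t s = 0; s < steps; s++) p = next[p];
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    free(next);
    return secs * 1e9 / steps;
}

int main(void) {
    printf("16 KB chain: %.1f ns/load (cache)\n",
           ns_per_load(16384 / sizeof(size_t), 20000000));
    printf("64 MB chain: %.1f ns/load (DRAM)\n",
           ns_per_load((64u << 20) / sizeof(size_t), 20000000));
    return 0;
}
```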
16. Reduced Instruction Set Computer
- Hardware is simplified; fewer transistors are needed for the full instruction set
- RAM requirements are higher, to store intermediate results and more code
- Compilers are more complex
- Clock speeds increase because instructions are simpler
- Deterministic, simple instructions allow pipelining (see the sketch below)
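
For contrast with the CISC sketch above, here is the same multiply decomposed RISC-style; the MIPS-like sequence in the comment is illustrative:

```c
/* On a RISC machine the multiply becomes several simple instructions:
       lw   t0, 0(a0)    # load a
       lw   t1, 0(a1)    # load b
       mul  t2, t0, t1   # register-to-register multiply
       sw   t2, 0(a2)    # store the result
   More instructions and more RAM traffic, but each instruction is
   uniform and deterministic -- exactly what deep pipelines need. */
void mult_risc(const int *a, const int *b, int *result) {
    *result = *a * *b;
}
```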
17. Pipelining
[Diagram: pipeline stages filling up, with utilization rising from 25% busy through 40%, 60%, and 80% to 100% busy. Higher clock speeds! See the arithmetic sketch below.]
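
A back-of-the-envelope sketch of why pipelining pays; the stage and instruction counts are illustrative values:

```c
/* Ideal pipeline arithmetic: S stages, N instructions.
   Unpipelined: S * N cycles (each instruction uses every stage before
   the next one starts). Pipelined: S + (N - 1) cycles (one instruction
   completes per cycle once the pipe is full). */
#include <stdio.h>

int main(void) {
    double S = 5, N = 1000;                  /* illustrative values */
    double speedup = (S * N) / (S + N - 1);
    printf("speedup = %.2fx (approaches S as N grows)\n", speedup);
    return 0;
}
```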
18. Branch Prediction
- While processing in parallel, branches occur
- Branch prediction is used to increase the probability that a specific branch will be followed
- If the prediction is incorrect, the pipeline is dead and the CPU stalls
- Statistics
- 10-20% of instructions are branches
- Predictions are incorrect about 10% of the time
- As the pipeline deepens, the probability of a miss increases and more cycles are discarded
- 80-deep pipeline / 20% branches / 10% miss ≈ 80% chance of a miss and a penalty of up to 80 cycles (see the arithmetic below)
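
A quick check of the slide's arithmetic; treating branch outcomes as independent is a simplifying assumption:

```c
/* 80-deep pipeline, ~20% branches, ~10% misprediction rate:
   about 16 branches are in flight at once, and the chance that at
   least one mispredicts is 1 - 0.9^16, roughly 80%. A miss flushes
   the pipe, discarding up to 80 cycles of work. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double depth = 80.0, branch_frac = 0.20, miss_rate = 0.10;
    double in_flight = depth * branch_frac;             /* 16 branches */
    double p_any_miss = 1.0 - pow(1.0 - miss_rate, in_flight);
    printf("P(miss in flight) = %.0f%%, penalty up to %.0f cycles\n",
           p_any_miss * 100.0, depth);
    return 0;
}
```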
19. Itanium II EPIC Instruction Set: Explicitly Parallel Instruction Computing
- The compiler can indicate code that can be executed in parallel
- Both branches are pipelined (see the sketch below)
- No lost cycles due to misprediction
- The pipeline can be deeper
- Complexity continues to move into the compiler
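
A minimal C illustration of what "both branches are pipelined" means; the predicated sequence in the comment is a sketch of the Itanium-style idea, not actual compiler output:

```c
/* A compiler can replace this branch with predicated instructions
   (Itanium) or a conditional move (x86 cmov). Both "sides" flow
   through the pipeline and the predicate selects the result, so
   there is nothing to mispredict and nothing to flush:
       cmp.lt p1, p2 = x, 0    // set predicates
       (p1)   mov    x = 0     // executes only if x < 0  */
int clamp_to_zero(int x) {
    return x < 0 ? 0 : x;      /* branch-free candidate */
}
```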
20. Multi-Threading
22. Multiple Cores
- Fabrication sizes continue to diminish
- The additional real estate has been used to put more and more memory on the die
- Multi-core technology provides a new way to exploit the additional space
- Clock rates cannot continue to climb due to the excessive heat
- P = C · V² · f, where C = switch capacitance, V = supply voltage, f = clock frequency (worked example below)
- Multiple cores are the next step to providing faster execution times for applications
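
A worked example of P = C · V² · f; the 15% voltage reduction at half frequency is an illustrative assumption:

```c
/* Dynamic power P = C * V^2 * f, with values normalized to 1.0.
   Two cores at half the clock can match one fast core's aggregate
   instruction rate while drawing less power, because supply voltage
   can drop along with frequency. */
#include <stdio.h>

int main(void) {
    double C = 1.0, V = 1.0, f = 1.0;
    double one_core  = C * V * V * f;            /* = 1.00 */
    double V2 = 0.85 * V, f2 = 0.5 * f;          /* assumed scaling */
    double two_cores = 2.0 * C * V2 * V2 * f2;   /* ~= 0.72 */
    printf("one fast core:        P = %.2f\n", one_core);
    printf("two half-speed cores: P = %.2f\n", two_cores);
    return 0;
}
```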
23. (End of 2005?)
30. AMD Opteron 800 Series
- 130 nm
- Clock speed: 1.4-2.4 GHz
- L2: 1 MB
- 6.4 GB/s HyperTransport
31. Architectural Comparison
[Diagram: 4-way Opteron vs. 4-way Xeon topologies. Opteron: four CPUs linked point-to-point by 6.4 GB/s HyperTransport, each with its own 144-bit DDR memory interface and PCI-X bridges. Xeon: four CPUs sharing a 6.4 GB/s front-side bus into an SNC, with memory address buffers, I/O hubs, PCI-X bridges, and another bridge on the chipset.]
32. Mapping Workloads onto Architecture
- Consider a dichotomy of workloads
- Large Memory Model: needs a large, single system image and a large amount of coherent memory
- Database apps: SQL Server / Oracle
- Business Intelligence: Data Warehousing, Analytics
- Memory-resident databases
- 64-bit architectures allow memory addressability above 1 TB
- Small/Medium Memory Model: can be cost-effective for workloads that do not require extensive shared memory/state
- Stateless applications and Web services
- Web servers
- Clusters of systems for parallelized applications and grids
33. Large Server Vendors
- Intel announcement (Nov 19)
- Otellini said product development, marketing, and software efforts (for Itanium) will all now be aimed at "greater than four-way systems". He also said, "The mainframe isn't dead. That's where I'd like to push Itanium over time."
- The size of the SMP is affected by Intel's chipset support for coherent memory
- OEM vendors (Unisys, HP, SGI, Fujitsu, IBM)
- Each has a unique chipset to build basic four-ways into large SMP systems
- IBM has POWER5, which is a direct competitor
- Intel 32-bit and EM64T
- This could emerge as the flagship product
34. Where Are We Going?
- Since the early CISC computers, we have moved more and more of the complexity out to the compiler to achieve parallelism and fully exploit the silicon real estate
- The power requirements, along with the smaller fabrication sizes, have pushed the CPU vendors to exploit multiple cores
- The key to performance for these future machines will be the application's ability to exploit parallelism