Title: Rethinking data centers
1. Rethinking data centers
"There is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success, than to take the lead in the introduction of a new order of things."
- Niccolò Machiavelli
- Chuck Thacker
- Microsoft Research
- October 2007
2. Problems
- Today, we need a new order of things.
- Problem: Today, data centers aren't designed as systems. We need to apply systems engineering to:
- Packaging
- Power distribution
- Computers themselves
- Management
- Networking
- I'll discuss all of these.
- Note: Data centers are not mini-Internets.
3. System design: Packaging
- We build large buildings, load them up with lots of expensive power and cooling, and only then start installing computers.
- Pretty much need to design for the peak load (power distribution, cooling, and networking).
- We build them in remote areas.
- Near hydro dams, but not near construction workers.
- They must be human-friendly, since we have to tinker with them a lot.
4. Packaging: Another way
- Sun's Black Box
- Use a shipping container and build a parking lot instead of a building.
- Doesn't need to be human-friendly.
- Assembled at one location, computers and all. A global shipping infrastructure already exists.
- Sun's version uses a 20-foot box; 40 feet might be better (they've got other problems, too).
- Requires only networking, power, and cooled water, so build near a river. Need a pump, but no chiller (might have environmental problems).
- Expand as needed, in sensible increments.
- Rackable has a similar system. So (rumored) does Google.
5. The Black Box
6. Sun's problems
- Servers are off-the-shelf (with fans).
- They go into the racks sideways, since they're designed for front-to-back airflow.
- Servers have a lot of packaging (later slide).
- Cables exit front and back (mostly back).
- Rack must be extracted to replace a server.
- They have a tool, but the procedure is complex.
- Rack as a whole is supported on springs.
- Fixing these problems: later slides.
7. Packaging
- A 40-foot container holds two rows of 16 racks.
- Each rack holds 40 1U servers, plus a network switch. Total per container: 1280 servers.
- If each server draws 200 W, the rack is 8 kW and the container is 256 kW.
- A 64-container data center is 16 MW, plus inefficiency and cooling. 82K computers. (Arithmetic sketched below.)
- Each container has independent fire suppression. Reduces insurance cost.
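A back-of-the-envelope check of the numbers above, as a small Python sketch; the 200 W per-server draw and the 64-container count are the figures from these bullets.

    # Container and data-center sizing, using the figures from this slide.
    SERVERS_PER_RACK = 40            # 1U servers per rack
    RACKS_PER_CONTAINER = 2 * 16     # two rows of 16 racks in a 40-foot container
    WATTS_PER_SERVER = 200           # assumed per-server draw
    CONTAINERS = 64

    servers_per_container = SERVERS_PER_RACK * RACKS_PER_CONTAINER   # 1280
    rack_kw = SERVERS_PER_RACK * WATTS_PER_SERVER / 1000             # 8 kW
    container_kw = rack_kw * RACKS_PER_CONTAINER                     # 256 kW
    total_servers = servers_per_container * CONTAINERS               # 81,920 (~82K)
    total_mw = container_kw * CONTAINERS / 1000                      # ~16.4 MW before cooling losses

    print(total_servers, total_mw)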
8. System design: Power distribution
- The current picture
- Servers run on 110/220 V 60 Hz 1Φ AC.
- Grid -> Transformer -> Transfer switch -> Servers.
- Transfer switch connects batteries -> UPS / Diesel generator.
- Lots and lots of batteries. Batteries require safety systems (acid, hydrogen).
- The whole thing nets out at 40-50% efficient -> large power bills.
9. Power: Another way
- Servers run on 12 V 400 Hz 3Φ AC.
- Synchronous rectifiers on the motherboard make DC (with little filtering).
- Rack contains a step-down transformer. Distribution voltage is 480 VAC.
- Diesel generator supplies backup power.
- Uses a flywheel to store energy until the generator comes up.
- When engaged, the frequency and voltage drop slightly, but nobody cares.
- Distribution chain:
- Grid -> 60 Hz motor -> 400 Hz generator -> Transformer -> Servers.
- Much more efficient (probably > 85%); see the comparison sketch below.
- The Cray-1 (300 kW, 1971) was powered this way.
- Extremely high reliability.
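A rough sketch of why the two distribution chains differ so much. The per-stage efficiencies below are illustrative assumptions, chosen only to land near the end-to-end figures the slides give (40-50% for the conventional chain, probably > 85% for the proposed one); they are not numbers from the talk.

    # Illustrative end-to-end efficiency comparison; per-stage values are assumptions.
    from math import prod

    def chain_efficiency(stages):
        """Multiply per-stage efficiencies to get the end-to-end figure."""
        return prod(eff for _, eff in stages)

    conventional = [  # Grid -> transformer -> UPS/batteries -> per-server PSU
        ("transformer", 0.95),
        ("double-conversion UPS", 0.80),
        ("distribution losses", 0.90),
        ("per-server power supply", 0.70),
    ]
    proposed = [      # Grid -> motor/400 Hz generator -> rack transformer -> on-board rectifiers
        ("motor-generator set", 0.93),
        ("rack step-down transformer", 0.97),
        ("on-board synchronous rectifiers", 0.95),
    ]

    print(f"conventional: {chain_efficiency(conventional):.0%}")  # ~48%
    print(f"proposed:     {chain_efficiency(proposed):.0%}")      # ~86%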
10. The Computers
- We currently use commodity servers designed by HP, Rackable, and others.
- Higher quality and reliability than the average PC, but they still operate in the PC ecosystem.
- IBM doesn't.
- Why not roll our own?
11. Designing our own
- Minimize SKUs
- One for compute. Lots of CPU, lots of memory, relatively few disks.
- One for storage. Modest CPU and memory, lots of disks.
- Like Petabyte?
- Maybe one for DB apps. Properties TBD.
- Maybe Flash memory has a home here.
- What do they look like?
12. More on Computers
- Use custom motherboards. We design them, optimizing for our needs, not the needs of disparate applications.
- Use commodity disks.
- All cabling exits at the front of the rack.
- So that it's easy to extract the server from the rack.
- Use no power supplies other than on-board inverters (we have these now).
- Error correct everything possible, and check what we can't correct.
- When a component doesn't work to its spec, fix it or get another, rather than just turning the feature off.
- For the storage SKU, don't necessarily use Intel/AMD processors.
- Although both seem to be beginning to understand that low power/energy efficiency is important.
13. Management
- We're doing pretty well here.
- Automatic management is getting some traction; opex isn't overwhelming.
- Could probably do better:
- Better management of servers to provide more fine-grained control than just reboot/reimage.
- Better sensors to diagnose problems and predict failures.
- Measuring airflow is very important (not just finding out whether the fan blades turn, since there are no fans in the server).
- Measure more temperatures than just the CPU.
- Modeling to predict failures? (Toy sketch below.)
- Machine learning is your friend.
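A toy illustration of the "modeling to predict failures" bullet: score a server's risk from airflow and temperature readings. The sensor fields, weights, and thresholds here are invented for the example; a real system would learn them from fleet history (which is where machine learning comes in).

    # Toy failure-risk score from per-server sensors; names and thresholds are
    # illustrative assumptions, not part of any real management system.
    from dataclasses import dataclass

    @dataclass
    class ServerSensors:
        inlet_temp_c: float    # air temperature at the server inlet
        outlet_temp_c: float   # air temperature at the server outlet
        airflow_cfm: float     # measured airflow through the sled

    def risk_score(s: ServerSensors) -> float:
        """Crude heuristic: a large inlet/outlet delta with low airflow suggests trouble."""
        delta_t = s.outlet_temp_c - s.inlet_temp_c
        low_airflow = max(0.0, 60.0 - s.airflow_cfm) / 60.0   # 0 when airflow is healthy
        running_hot = max(0.0, delta_t - 15.0) / 15.0         # 0 when delta-T is normal
        return min(1.0, 0.6 * low_airflow + 0.4 * running_hot)

    if risk_score(ServerSensors(25.0, 48.0, 30.0)) > 0.5:
        print("flag server for proactive service")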
14. Networking
- A large part of the total cost.
- Large routers are very expensive, and command astronomical margins.
- They are relatively unreliable; we sometimes see correlated failures in replicated units.
- Router software is large, old, and incomprehensible, and is frequently the cause of problems.
- Serviced by the manufacturer, and they're never on site.
- By designing our own, we save money and improve reliability.
- We can also get exactly what we want, rather than paying for a lot of features we don't need (data centers aren't mini-Internets).
15. Data center network differences
- We know (and define) the topology.
- The number of nodes is limited.
- Broadcast/multicast is not required.
- Security is simpler.
- No malicious attacks, just buggy software and misconfiguration.
- We can distribute applications to servers to spread load and minimize hot spots.
16. Data center network design goals
- Reduce the need for large switches in the core.
- Simplify the software. Push complexity to the edge of the network.
- Improve reliability.
- Reduce capital and operating cost.
17. One approach to data center networks
- Use a 64-node destination-addressed hypercube at the core.
- In the containers, use source-routed trees.
- Use standard link technology, but don't need standard protocols.
- Mix of optical and copper cables. Short runs are copper, long runs are optical.
18. Basic switches
- Type 1: Forty 1 Gb/sec ports, plus two 10 Gb/sec ports.
- These are the leaf switches. The center uses 2048 of them, one per rack.
- Not replicated (if a switch fails, we lose 1/2048 of total capacity).
- Type 2: Ten 10 Gb/sec ports.
- These are the interior switches.
- There are two variants, one for destination routing, one for tree routing. Hardware is identical; only the FPGA bit streams differ.
- The hypercube uses 64 switches, and the containers use 10 each: 704 switches total (counted in the sketch below).
- Both are implemented with Xilinx Virtex-5 FPGAs.
- Type 1 also uses Gb Ethernet PHY chips.
- These contain 10 Gb PHYs, plus buffer memories and logic.
- We can prototype both types with BEE3.
- Real switches use less expensive FPGAs.
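A quick sanity check of the switch counts, using the 64-container, 32-racks-per-container figures from the packaging slide:

    # Switch counts: one Type 1 per rack, 10 Type 2 per container, 64 Type 2 in the core cube.
    CONTAINERS = 64
    RACKS_PER_CONTAINER = 32

    type1_leaf = CONTAINERS * RACKS_PER_CONTAINER   # 2048 rack switches
    type2_tree = CONTAINERS * 10                    # 640 in-container switches
    type2_cube = 64                                 # core hypercube switches

    print(type1_leaf, type2_tree + type2_cube)      # 2048 and 704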
19. Data center network
20. Hypercube properties
- Minimum hop count.
- Even load distribution for all-all communication.
- Reasonable bisection bandwidth.
- Can route around switch/link failures.
- Simple routing:
- OutPort = f(Dest xor NodeNum); see the sketch after this list.
- No routing tables.
- Switches are identical.
- Use destination-based input buffering, since a given link carries packets for a limited number of destinations.
- Link-by-link flow control to eliminate drops.
- Links are short enough that this doesn't cause a buffering problem.
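A minimal sketch of the OutPort = f(Dest xor NodeNum) rule for the 64-node (dimension-6) cube. Resolving the lowest-numbered differing address bit first is an assumption made for illustration; the slide only says the out-port is a function of Dest xor NodeNum.

    # Dimension-order routing on a 64-node (6-dimension) hypercube.
    # Assumed convention: port i connects a node to the neighbor whose address
    # differs in bit i, and the lowest differing bit is resolved first.
    DIMENSIONS = 6

    def out_port(dest, node):
        """Output port for a packet at `node` headed to `dest`, or None if it has arrived."""
        assert 0 <= dest < 1 << DIMENSIONS and 0 <= node < 1 << DIMENSIONS
        diff = dest ^ node                       # bits where the two addresses disagree
        if diff == 0:
            return None                          # arrived: deliver locally, no routing table needed
        return (diff & -diff).bit_length() - 1   # index of the lowest set bit

    def route(src, dest):
        """Nodes visited along the way; hop count equals the Hamming distance of src ^ dest."""
        path, node = [src], src
        while node != dest:
            node ^= 1 << out_port(dest, node)    # crossing port i flips address bit i
            path.append(node)
        return path

    print(route(0b000000, 0b101101))             # [0, 1, 5, 13, 45]: four hops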
21. A 16-node (dimension 4) hypercube
22. Routing within a container
23. Bandwidths
- Servers -> network: 82 Tb/sec (arithmetic sketched below).
- Containers -> cube: 2.5 Tb/sec.
- Average will be lower (by how much is unclear).
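One way to read these peak figures, assuming the server and container counts from the earlier slides (about 82K servers with 1 Gb/sec ports, 64 containers). The per-container uplink number is derived here, not stated in the talk.

    # Peak bandwidth arithmetic, derived from the earlier counts.
    SERVERS = 64 * 32 * 40            # ~82K servers
    SERVER_LINK_GBPS = 1              # one 1 Gb/sec port per server
    CONTAINERS = 64
    CONTAINERS_TO_CUBE_TBPS = 2.5     # figure from this slide

    servers_to_network_tbps = SERVERS * SERVER_LINK_GBPS / 1000               # ~82 Tb/sec
    per_container_uplink_gbps = CONTAINERS_TO_CUBE_TBPS * 1000 / CONTAINERS   # ~39 Gb/sec, a few 10 Gb links

    print(servers_to_network_tbps, per_container_uplink_gbps)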
24. Objections to this scheme
- Commodity hardware is cheaper.
- This is commodity hardware, even for one center (and we build many).
- And in the case of the network, it's not cheaper. Large switches command very large margins.
- Standards are better.
- Yes, but only if they do what you need, at acceptable cost.
- It requires too many different skills.
- Not as many as you might think.
- And we would work with engineering/manufacturing partners who would be the ultimate manufacturers. This model has worked before.
- If this stuff is so great, why aren't others doing it?
- They are.
25. Conclusion
- By treating data centers as systems, and doing full-system optimization, we can achieve:
- Lower cost, both opex and capex.
- Higher reliability.
- Incremental scale-out.
- More rapid innovation as technology improves.
26. Questions?