Title: Internet Scale Overlay Hosting
1. Internet Scale Overlay Hosting
Jon Turner, with Patrick Crowley, John DeHart, Brandon Heller, Fred Kuhns, Sailesh Kumar, John Lockwood, Jing Lu, Mike Wilson, Charlie Wiseman and Dave Zar
2. Overview
- Overlay networks are a key tool for overcoming Internet limitations
  - CDNs use overlay methods to enhance performance
  - also useful for voice, video streaming, multi-player games
  - myriad other capabilities demonstrated using overlays
- Overlay hosting services can enable more widespread use of overlays
  - PlanetLab has demonstrated the potential in the research space
  - nothing exactly comparable yet in the commercial space
  - although utility computing is similar in spirit
- Need more integrated and scalable platforms
  - internet-scale traffic volumes
  - low latency for delay-sensitive traffic
  - flexible resource allocation and traffic isolation
3. Overlay Hosting Service
[Figure: overlay nodes hosted on shared hosting platforms form overlay networks over a provisioned backbone, with access via the Internet]
- Flexible platforms shared by multiple overlays
- Provisioned backbone; Internet used for access
4. Overlay Hosting Platform
- Processing Engines (PEs) implement overlay nodes
  - GPE: conventional server blade
  - NPE: network processor blade
    - nearly 4 Mp/s per NP vs. 50 Kp/s
    - about 100 µs latency vs. 1-300 ms
  - shared or dedicated
- IO cards terminate external links and mux/demux streams (see the sketch after this list)
- Shared PEs are managed by the substrate
- Dedicated PEs may be fully controlled by the overlay
  - switch and IO cards provide protection and isolation
- PEs in larger overlay nodes are linked by a logical switch
  - allows scaling up for higher throughput
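To make the IO card's mux/demux role concrete, here is a minimal sketch, assuming UDP-tunnel encapsulation on external links (as used later in the evaluation) and a hypothetical demux table keyed by destination UDP port; the structure and function names are illustrative, not the SPP line-card code.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical demux entry: maps an external (UDP tunnel) flow to the
 * processing engine and per-slice queue that host the corresponding overlay. */
struct demux_entry {
    uint16_t udp_dst_port;   /* tunnel port assigned to the slice */
    uint8_t  pe_id;          /* GPE or NPE hosting the slice      */
    uint8_t  queue_id;       /* per-slice queue on that PE        */
};

#define DEMUX_TABLE_SIZE 256
static struct demux_entry demux_table[DEMUX_TABLE_SIZE];
static size_t demux_entries;

/* Find the PE/queue for an arriving tunnel packet.  A real line card would
 * use a TCAM or hash lookup; linear search keeps the sketch short. */
static const struct demux_entry *demux_lookup(uint16_t udp_dst_port)
{
    for (size_t i = 0; i < demux_entries; i++)
        if (demux_table[i].udp_dst_port == udp_dst_port)
            return &demux_table[i];
    return NULL;   /* unknown flow: drop, or hand to a default slow path */
}
```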
5. PlanetLab
- Canonical overlay hosting service, built on a PC platform
- Applications run as user-space processes in virtual machines
- Effective and important research testbed
- But low throughput and widely variable latency limit its potential as a service deployment platform
[Figure notes: slice descriptions are obtained from the PlanetLab database; applications are standard socket programs; VMs are scheduled using token buckets (sketched below)]
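The token-bucket VM scheduling noted above is a standard mechanism; a minimal, generic sketch of it (not PlanetLab's actual scheduler code) follows. Each VM or slice would get its own bucket and is eligible to run only while the consume operation succeeds.

```c
#include <stdint.h>
#include <stdbool.h>

/* Generic token bucket: 'rate' tokens are added per second up to 'depth';
 * a VM (or packet) may run/send only if enough tokens remain. */
struct token_bucket {
    double   tokens;       /* current fill level       */
    double   rate;         /* tokens added per second  */
    double   depth;        /* maximum number of tokens */
    uint64_t last_ns;      /* time of the last refill  */
};

static bool tb_consume(struct token_bucket *tb, uint64_t now_ns, double cost)
{
    double elapsed = (now_ns - tb->last_ns) / 1e9;
    tb->last_ns = now_ns;

    tb->tokens += elapsed * tb->rate;          /* refill */
    if (tb->tokens > tb->depth)
        tb->tokens = tb->depth;                /* cap at bucket depth */

    if (tb->tokens < cost)
        return false;                          /* not eligible yet */
    tb->tokens -= cost;
    return true;                               /* schedule the VM / send */
}
```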
6Supercharging PlanetLab
slow-path runs in a standard PlanetLab environment
exceptional packets forwarded to slow-path
existing PlanetLab applications can run unchanged
on GPE
fast-path handles most traffic
fast-path runs on a network processor
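A rough sketch of the per-packet dispatch implied above: traffic the network processor can handle stays on the fast-path, while anything exceptional is handed up to the slice's unchanged slow-path on the GPE. The hook names are hypothetical; the real fast-path is slice-specific NP code.

```c
#include <stdbool.h>

struct packet;   /* opaque packet handle on the NP */

/* Hypothetical hooks standing in for slice-specific fast-path code. */
bool fastpath_classify(struct packet *p);   /* true if handled here      */
void fastpath_forward(struct packet *p);    /* rewrite + enqueue output  */
void send_to_slowpath(struct packet *p);    /* deliver to GPE process    */

/* Per-packet dispatch: most traffic stays on the fast path; anything the
 * fast path cannot handle (unknown flow, control message, unusual header)
 * is forwarded to the unchanged PlanetLab application on the GPE. */
void npe_dispatch(struct packet *p)
{
    if (fastpath_classify(p))
        fastpath_forward(p);
    else
        send_to_slowpath(p);
}
```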
7SPP Components
conventional server which coordinates system
components and synchronizes with PlanetLab
conventional server blades supporting standard
PlanetLab environment
blade containing 10GE data switch and 1GE control
switch
dual Intel IXP 2850 blade which forwards packets
to correct PEs
dual Intel IXP 2850 blades supporting application
fast-paths
8. ATCA Boards
- Radisys switch blade
  - up to 16-slot chassis
  - 10 GbE fabric switch
  - 1 GbE control switch
  - full VLAN support
  - scaling up: 5x10 GbE to front, 2 more to back
- Radisys NP blades (for LC and NPE)
  - dual IXP 2850 NPs
  - 3x RDRAM, 4x SRAM
  - shared TCAM
  - 2x10 GbE to backplane
  - 10x1 GbE external IO (or 1x10 GbE)
- Intel server blades (for CP and GPE)
  - dual Xeons (2 GHz)
  - 4x1 GbE
  - on-board disk
  - Advanced Mezzanine Card slot
9. What You Need to Build Your Own
10. IXP 2850 Overview
- 16 multi-threaded MicroEngines (MEs)
  - 8 thread contexts with rapid switching capability
  - fast nearest-neighbor connections for pipelined apps
- 3 SDRAM and 4 SRAM channels (optional TCAM)
- Management Processor (MP) for control
11. Pipelining and Multi-threading
- Limited program store per ME
  - parallelize by dividing the program among pipeline stages
- Use multi-threading to hide memory latency (sketched below)
  - high latency to off-chip memory (>100 cycles)
  - modest locality of reference in network workloads
  - interleave memory accesses to keep the processor busy
  - sequenced hand-offs between threads maintain packet order
  - works well when processing time variation is limited
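A hedged sketch of the latency-hiding pattern, written in plain C with stand-in primitives rather than real IXP microengine intrinsics: each thread of a pipeline stage issues an asynchronous memory read, swaps out while it completes, and hands off to the next thread in a fixed order so packets stay in sequence.

```c
#include <stdint.h>

struct packet;

/* Illustrative primitives standing in for microengine hardware support:
 * an asynchronous memory read, a context swap while it completes, and
 * inter-thread signals used to serialize hand-offs between threads. */
void mem_read_async(void *dst, uint64_t addr, unsigned len);
void swap_out_until_read_done(void);     /* yield; another thread runs   */
void wait_for_signal(int sig);           /* wait for the previous thread */
void send_signal(int sig);               /* release the next thread      */

struct packet *next_packet(void);
uint64_t lookup_addr(const struct packet *p);
void pass_to_next_stage(struct packet *p, uint32_t result);

/* One thread of one pipeline stage.  While this thread waits 100+ cycles
 * for off-chip memory, the other thread contexts of the same MicroEngine
 * run, keeping the processor busy.  The ordered signal chain preserves
 * packet order across the stage. */
void stage_thread(int my_sig, int next_sig)
{
    for (;;) {
        struct packet *p = next_packet();
        uint32_t table_entry;

        mem_read_async(&table_entry, lookup_addr(p), sizeof table_entry);
        swap_out_until_read_done();        /* hide the memory latency       */

        wait_for_signal(my_sig);           /* my turn in the hand-off order */
        pass_to_next_stage(p, table_entry);
        send_signal(next_sig);             /* let the next thread proceed   */
    }
}
```

This also shows why the technique wants limited processing-time variation: a thread that runs long holds up every thread behind it in the signal chain.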
12. NPE Hosting Multiple Apps
- Parse and Header Format blocks include slice-specific code (dispatch sketched below)
  - Parse extracts header fields to form the lookup key
  - Header Format makes required changes to header fields
- Lookup uses the opaque key for a TCAM lookup
- Multiple static code options can be supported
  - multiple slices per code option
  - each slice has its own filters, queues and block of private memory
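A simplified sketch of the per-slice dispatch described above, assuming a hypothetical table of static code options: Parse builds an opaque key, a shared Lookup stage consults the TCAM, and Header Format rewrites the outgoing header. All names are illustrative, not NPE source.

```c
#include <stdint.h>
#include <string.h>

struct packet;

/* Opaque lookup key: the shared Lookup stage never interprets it, so each
 * code option can pack whatever header fields it needs. */
struct lookup_key { uint8_t bytes[16]; };

struct lookup_result { uint32_t out_queue; uint32_t next_hop; };

/* Slice-specific entry points supplied by each static code option. */
struct code_option {
    void (*parse)(const struct packet *p, struct lookup_key *key);
    void (*hdr_format)(struct packet *p, const struct lookup_result *r);
};

/* Shared TCAM lookup; entries for different slices are kept disjoint by
 * qualifying the key with the slice id (illustrative). */
int tcam_lookup(uint16_t slice_id, const struct lookup_key *key,
                struct lookup_result *result);

void enqueue(struct packet *p, uint32_t queue);
void drop(struct packet *p);

void npe_process(struct packet *p, uint16_t slice_id,
                 const struct code_option *opt)
{
    struct lookup_key key;
    struct lookup_result res;

    memset(&key, 0, sizeof key);
    opt->parse(p, &key);                       /* slice-specific parse     */

    if (tcam_lookup(slice_id, &key, &res) != 0) {
        drop(p);                               /* or hand to the slow path */
        return;
    }
    opt->hdr_format(p, &res);                  /* slice-specific rewrite   */
    enqueue(p, res.out_queue);                 /* per-slice private queue  */
}
```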
13Sharing the NPE
each application has private queues
each application has private lookup entries
forms key for lookup
formats outgoing packet headers
14. System Control
[Diagram: the CP runs the Global Node Manager and Global Resource Manager, with a control interface to PLC and to users over the Internet; each GPE runs a Local Node Manager, a Local Resource Manager and slice VMs; the NPE's Fast-path Manager configures fast-paths and their filters; the LC's Line Card Manager configures the data interfaces; all SPP components communicate over the control switch]
- Example control operations: instantiate a new application, open a socket, instantiate a fast-path (one possible request flow is sketched below)
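One plausible reading of the "instantiate fast-path" request flow through the managers named in the diagram; every function here is hypothetical and stands in for a control message over the 1 GbE control switch, not the actual SPP control interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical control-plane calls, one per component on the slide. */
bool grm_reserve_npe_resources(uint16_t slice_id); /* Global Resource Manager (CP) */
bool fpm_create_fastpath(uint16_t slice_id);       /* Fast-path Manager (NPE)      */
bool lcm_install_filters(uint16_t slice_id);       /* Line Card Manager (LC)       */

/* Local Resource Manager on a GPE handling a slice's request to add a
 * fast path to its existing slow-path VM. */
bool lrm_instantiate_fastpath(uint16_t slice_id)
{
    if (!grm_reserve_npe_resources(slice_id))  /* CP checks global capacity   */
        return false;
    if (!fpm_create_fastpath(slice_id))        /* NPE sets up queues, filters */
        return false;
    return lcm_install_filters(slice_id);      /* LC steers traffic to NPE    */
}
```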
15. Evaluation
- Slice 1: IPv4
  - packets arrive/depart in UDP tunnels
- Slice 2: Internet Indirection Infrastructure (i3)
  - packet identifiers are matched against triggers, which map them to IP addresses
  - no match at the local node results in Chord forwarding (sketched below)
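To make the i3 slice concrete: a receiver inserts a trigger (identifier, address); a packet carries an identifier, and a node holding a matching trigger sends the packet to the trigger's address, while any other node forwards it along the Chord ring toward the identifier's home node. A compact sketch under those assumptions (not the actual i3 fast-path code):

```c
#include <stdint.h>

#define ID_BYTES 32   /* i3 identifiers are 256 bits */

struct i3_id { uint8_t b[ID_BYTES]; };

struct trigger {               /* receiver-inserted (identifier -> address) */
    struct i3_id id;
    uint32_t     ip_addr;
    uint16_t     udp_port;
};

struct packet;

const struct trigger *trigger_match(const struct i3_id *id);  /* local trigger store  */
void send_to_endpoint(struct packet *p, uint32_t ip, uint16_t port);
void chord_forward(struct packet *p, const struct i3_id *id); /* next hop on the ring */

void i3_handle(struct packet *p, const struct i3_id *dst_id)
{
    const struct trigger *t = trigger_match(dst_id);

    if (t != NULL)
        send_to_endpoint(p, t->ip_addr, t->udp_port);  /* trigger matched here   */
    else
        chord_forward(p, dst_id);                      /* not ours: keep routing */
}
```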
16. IPv4 Throughput Comparison
- 10x improvement for 1400-byte payloads
- 80x improvement for 0-byte payloads
- NPE almost keeps up with the full line rate for 0-byte payloads
17. So, what this means is...
- price-performance advantage of >15x
- also a big power and space advantage
18. IPv4 Latency Comparison
- 8 IPv4 instances
- Measured ping delay against background traffic
19. IPv4/i3 Fast-Path Throughput Comparison
[Chart: fast-path throughput for IPv4 and i3 with 0 B and 40 B payloads]
- constant input rate of 5 Gb/s
20. Scaling Up
- 14-slot chassis
  - 3 Line Cards
  - 2 switch blades
  - 9 processing blades (NP or server)
- Multi-chassis systems
  - direct connection using expansion ports: up to 7 chassis
  - indirect connection using separate 10 GbE switches: up to 24 chassis
21. Other ATCA Components
22. Open Network Lab
- Internet-accessible networking lab (onl.wustl.edu)
  - built around a set of extensible gigabit routers
  - intuitive Remote Lab Interface makes it easy to get started
  - extensive facilities for performance monitoring
- Expansion underway
  - 14 new Network Processor (NP) based routers
    - packet processing implemented in software for greater flexibility
    - high-performance plugin subsystem for user-added features
  - supports larger experiments and more concurrent users
  - 70 new rack-mount computers to serve as end systems
  - 4 stackable 48-port GbE switches for configuring experiments
23. Sample ONL Session
[Screenshot of a Remote Lab Interface session showing: network configuration, routing table, bandwidth usage, queue lengths, queue parameters, packet losses, router plugin commands, and an ssh window to a host showing ping delays]
24. ONL NP Router
25. Expanded ONL Configuration
26. Equipment Photos
27. Summary
- Next step: add NetFPGA to SPP and ONL
- Interesting time for networking research
  - highly capable subsystem components readily available
  - many vendors, variety of products
  - greater opportunity for network service innovation
- Growing role of multi-core processors
  - to use them effectively, must design for parallelism
  - requires deeper understanding of performance
- Conventional servers have dreadful performance on IO-intensive applications
  - partly hardware, but mostly software
  - to fix this, need to push the fast-path down into drivers and program for multi-core parallelism