1
Multi-Level Architecture for Data Plane
Virtualization
  • Eric Keller
  • Oral General Exam
  • 5/5/08

2
The Internet (and IP)
  • Usage of the Internet is continuously evolving
  • The way packets are forwarded (IP) hasn't
  • Meant for communication between machines
  • Address tied to fixed location
  • Hierarchical addressing
  • Best-effort delivery
  • Addresses easy to spoof
  • Great innovation at the edge (Skype/VoIP,
    BitTorrent)
  • Programmability of hosts at the application layer
  • Can't add new functionality into the network

3
Proposed Modifications
  • Many proposals to modify some aspect of IP
  • No single one is best
  • Difficult to deploy
  • Publish/Subscribe mechanism for objects
  • Instead of routing on machine address, route on
    object ID
  • e.g. DONA (Data oriented network architecture),
    scalable simulation
  • Route through intermediary points
  • Instead of communication between machines
  • e.g. i3 (internet indirection infrastructure),
    DOA (delegation oriented architecture)
  • Flat Addressing to separate location from ID
  • Instead of hierarchical based on location
  • e.g. ROFL (routing on flat labels), SEIZE
    (scalable and efficient, zero-configuration
    enterprise)

4
Challenges
  • Want to Innovate in the Network
  • Can't, because networks are closed
  • Need to lower barrier for who innovates
  • Allow individuals to create a network and define
    its functionality
  • Virtualization as a possible solution
  • For both network of future and overlay networks
  • Programmable and sharable
  • Examples: PlanetLab, VINI

5
Network Virtualization
  • Running multiple virtual networks at the same
    time over a shared physical infrastructure
  • Each virtual network composed of virtual routers
    having custom functionality

[Diagram: physical machines hosting virtual routers; a virtual network (e.g. the blue one) is its blue virtual routers plus the blue links.]
6
Virtual Network Tradeoffs
  • Goal: Enable custom data planes per virtual
    network
  • Challenge: How to create the shared network nodes

[Diagram: tradeoff triangle of Programmability, Performance, Isolation]
7
Virtual Network Tradeoffs
  • Goal: Enable custom data planes per virtual
    network
  • Challenge: How to create the shared network nodes

Programmability: How easy is it to add new
functionality? What is the range of new
functionality that can be added? Does it extend
beyond software routers?
8
Virtual Network Tradeoffs
  • Goal: Enable custom data planes per virtual
    network
  • Challenge: How to create the shared network nodes

Isolation: Does resource usage by one virtual
network have an effect on others? Faults? How
secure is it given a shared substrate?
9
Virtual Network Tradeoffs
  • Goal: Enable custom data planes per virtual
    network
  • Challenge: How to create the shared network nodes

Performance: How much overhead is there for
sharing? What is the forwarding rate? Throughput?
Latency?
10
Virtual Network Tradeoffs
  • Network Containers
  • Duplicate stack or data structures
  • e.g. Trellis, OpenVZ, Logical Router
  • Extensible Routers
  • Assemble custom routers from common functions
  • e.g. Click, Router Plug Ins, Scout
  • Virtual Machines
  • Run operating system on top of another operating
    system
  • e.g. Xen, PL-VINI (Linux-VServer)

[Diagram: each approach plotted on the Programmability/Performance/Isolation tradeoff triangle.]
11
Outline
  • Architecture
  • Implementation
  • Virtualizing Click in the Kernel
  • Challenges with kernel execution
  • Extending beyond commodity hardware
  • Evaluation
  • Conclusion/Future Work

13
User Experience (Creating a virtual network)
  • Custom functionality
  • Custom user environment on each node (for
    controlling the virtual router)
  • Specify a single node's packet handling as a graph
    of common functions
  • Isolated from others sharing same node
  • Allocated share of resources (e.g. CPU, memory,
    bandwidth)
  • Protected from faults in others (e.g. another
    virtual router crashing)
  • Highest performance possible

For example:
[Diagram: a user control environment (determines shortest paths, populates routing tables through a config/query interface) sits above a packet-processing graph of elements A1-A5 (check header, destination lookup) running from the input devices to the output devices.]
14
Lightweight Virtualization
  • Combine graphs into a single graph
  • Provides lightweight virtualization
  • Add extra packet processing (e.g. mux/demux)
  • Needed to direct packets to the correct graph
  • Add resource accounting

[Diagram: Graph 1 and Graph 2 combined into a master graph, with the shared input and output ports muxed/demuxed to each subgraph; a minimal sketch of the combine step follows.]
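A sketch of what the combine step produces (the Packet fields and the subgraph representation are hypothetical stand-ins, not Click's real classes):

    #include <cstdint>
    #include <functional>
    #include <vector>

    struct Packet { uint32_t vnet_id; uint32_t len; };  // assumed demux key
    using Subgraph = std::function<void(Packet&)>;      // one network's graph

    struct MasterGraph {
        std::vector<Subgraph> subgraphs;   // one per virtual network
        std::vector<uint64_t> bytes_seen;  // the added resource accounting

        // The added mux/demux stage: direct each packet to the correct graph.
        void input(Packet& p) {
            if (p.vnet_id < subgraphs.size()) {
                bytes_seen[p.vnet_id] += p.len;
                subgraphs[p.vnet_id](p);
            }
        }
    };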
15
Increasing Performance and Isolation
  • Partition into multiple graphs across multiple
    targets
  • Each target with different capabilities
  • Performance, Programmability, Isolation
  • Add connectivity between targets
  • Unified run-time interface (it appears as a
    single graph)
  • To query and configure the forwarding capabilities

[Diagram: the master graph partitioned into per-target graphs (Target0, Target1, Target2), then combined again with connectivity added between the targets.]
16
Examples of Multi-Level
  • Fast Path/Slow Path
  • IPv4 forwarding in fast path, exceptions in slow
    path
  • i3: Chord ring lookup function in fast path,
    request handling in slow path
  • Preprocessing
  • IPSec: encryption/decryption in HW, the rest in
    SW
  • Offloading
  • TCP Offload
  • TCP Splicing
  • Pipeline of coarse grain services
  • e.g. transcoding, firewall
  • SoftRouter from Bell Labs

17
Outline
  • Architecture
  • Implementation
  • Virtualizing Click in the Kernel
  • Challenges with kernel execution
  • Extending beyond commodity hardware
  • Evaluation
  • Conclusion/Future Work

18
Implementation
  • Each network has custom functionality
  • Specified as graph of common functions
  • Click modular router
  • Each network allocated share of resources
  • e.g. CPU
  • Linux-VServer: single resource accounting for
    both control and packet processing
  • Each network protected from faults in others
  • Library of elements considered safe
  • Container for unsafe elements
  • Highest performance possible
  • FPGA for modules with HW option, Kernel for
    modules without

19
Click Background
  • Software architecture for building flexible and
    configurable routers
  • Widely used commercially and in research
  • Easy to use, flexible, high performance (but not
    sharable)
  • Routers assembled from packet-processing modules
    (Elements), as in the sketch below
  • Simple and complex
  • Processing is a directed graph
  • Includes a scheduler
  • Schedules tasks (a series of elements)
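To make the element model concrete, here is a minimal pass-through element in the style of Click's C++ element API (the element name and counter are illustrative, not from the talk):

    #include <click/config.h>
    #include <click/element.hh>
    CLICK_DECLS

    // A trivial element: counts packets and passes them through unchanged.
    class CountPackets : public Element {
      public:
        const char *class_name() const { return "CountPackets"; }
        const char *port_count() const { return "1/1"; }  // one in, one out
        Packet *simple_action(Packet *p) { _count++; return p; }
      private:
        uint64_t _count = 0;
    };

    CLICK_ENDDECLS
    EXPORT_ELEMENT(CountPackets)

A configuration then wires elements into the directed graph, e.g. FromDevice(eth0) -> CountPackets -> Discard;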

20
Linux-VServer
21
Linux-VServer + Click + NetFPGA
[Diagram: per-network click control processes run in VServer contexts; a coordinating process drives per-target installers for Click and for Click on NetFPGA.]
22
Outline
  • Architecture
  • Implementation
  • Virtualizing Click in the Kernel
  • Challenges with kernel execution
  • Extending beyond commodity hardware
  • Evaluation
  • Conclusion/Future Work

23
Virtual Kernel Mode Click
  • Want to run in kernel mode
  • Close to 10x higher performance than user mode
  • Use library of safe elements
  • Since the kernel is a shared execution space
  • Need resource accounting
  • Click scheduler does not do resource accounting
  • Want resource accounting system-wide (i.e. not
    just inside of packet processing)

24
Resource Accounting with VServer
  • Purpose of Resource Accounting
  • Provides isolation between virtual networks
  • Unified resource accounting
  • For packet processing and control
  • VServer's Token Bucket Extension to the Linux
    scheduler
  • Controls eligibility of processes/threads to run
  • Integrating with Click
  • Each individual Click configuration assigned to
    its own thread
  • Each thread associated with a VServer context
  • Basic mechanism is to manipulate the task_struct

25
Outline
  • Architecture
  • Implementation
  • Virtualizing Click in the Kernel
  • Challenges with kernel execution
  • Extending beyond commodity hardware
  • Evaluation
  • Conclusion/Future Work

26
Unyielding Threads
  • Linux kernel threads are cooperative (i.e. must
    yield)
  • Token scheduler controls when eligible to start
  • A single long task can cause short-term disruptions
  • Affecting delay and jitter on other virtual
    networks
  • Token bucket does not go negative
  • Long term, a virtual network can get more than
    its share

[Diagram: token bucket: tokens added at rate A, bucket size S, minimum M tokens to execute, one token consumed per scheduler tick; a sketch of these semantics follows.]
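A minimal sketch of the token-bucket semantics in the diagram (parameters A, S, M as above; an illustration of the described behavior, not the actual Linux-VServer scheduler code):

    #include <algorithm>
    #include <cstdint>

    struct TokenBucket {
        uint64_t tokens;   // current fill; clamped at zero, never negative
        uint64_t rate_A;   // A: tokens added per fill interval
        uint64_t size_S;   // S: bucket capacity
        uint64_t min_M;    // M: minimum tokens needed to become runnable

        void refill()         { tokens = std::min(tokens + rate_A, size_S); }
        bool eligible() const { return tokens >= min_M; }
        void tick()           { if (tokens > 0) --tokens; }  // one token per
    };                                                       // scheduler tick

Because the bucket never goes negative, a long unyielding task is not charged for cycles it used beyond an empty bucket, which is the source of the long-term unfairness above.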
27
Unyielding Threads (solution)
  • Determine maximum allowable execution time
  • e.g. from token bucket parameters, network
    guarantees
  • Determine the pipeline's execution time
  • Elements from the library have known execution
    times
  • Custom elements' execution times are unknown
  • Break the pipeline up (for known)
  • Execute inside a container (for unknown); see the
    sketch below

[Diagram: a three-element pipeline shown unsplit, split at element boundaries, and with elements moved to user space via FromKernel/ToUser elements.]
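A sketch of that partitioning rule (element names, costs, and the cycle budget are illustrative):

    #include <string>
    #include <vector>

    struct Elem    { std::string name; long cycles; bool cost_known; };
    struct Segment { std::vector<Elem> elems; bool in_container; };

    // Group known-cost elements into kernel segments that fit the maximum
    // allowable execution time; isolate unknown-cost elements in a container.
    std::vector<Segment> partition(const std::vector<Elem>& pipe, long max_cycles) {
        std::vector<Segment> out;
        Segment cur{{}, false};
        long used = 0;
        for (const Elem& e : pipe) {
            if (!e.cost_known) {                         // unknown: container
                if (!cur.elems.empty()) out.push_back(cur);
                out.push_back({{e}, true});
                cur = {{}, false}; used = 0;
            } else if (used + e.cycles > max_cycles) {   // split the pipeline
                if (!cur.elems.empty()) out.push_back(cur);
                cur = {{e}, false}; used = e.cycles;
            } else {
                cur.elems.push_back(e); used += e.cycles;
            }
        }
        if (!cur.elems.empty()) out.push_back(cur);
        return out;
    }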
28
Custom Elements Written in C
  • Elements have access to global state
  • Kernel state/functions
  • Click global state
  • Possible mitigations
  • Pre-compile in user mode
  • Pre-compile with restricted header files
  • Not perfect
  • With C, you can manipulate arbitrary pointers
  • Instead, custom elements are unknown (unsafe)
  • Execute in container in user space

29
Outline
  • Architecture
  • Implementation
  • Virtualizing Click in the Kernel
  • Challenges with kernel execution
  • Extending beyond commodity hardware
  • Evaluation
  • Conclusion/Future Work

30
Extending beyond commodity HW
  • PC with programmable NIC (e.g. NetFPGA)
  • FPGA on PCI card
  • 4 GigE ports
  • On-board SRAM and DRAM
  • Jon Turner's pool of processing elements with
    crossbar
  • PEs can be GPP, NPU, FPGA
  • Switch fabric crossbar

[Diagram: partition between FPGA and software; generalized to partitioning among PEs.]
31
FPGA Click
  • Two previous approaches
  • Cliff: Click graph to Verilog, standard
    interface on modules
  • CUSP: optimize the Click graph by parallelizing
    internal statements
  • Our approach
  • Build on Cliff by integrating FPGAs into Click
    (the tool)
  • Software Analogies
  • Connection to outside environment
  • Packet Transfer
  • Element specification and implementation
  • Run-time querying and configuration
  • Memory
  • Notifiers
  • Annotations

Example: FromDevice(eth0) -> Element(LEN 5) -> ToDevice(eth0)
32
Outline
  • Architecture
  • Implementation
  • Virtualizing Click in the Kernel
  • Challenges with kernel execution
  • Extending beyond commodity hardware
  • Evaluation
  • Conclusion/Future Work

33
Experimental Evaluation
  • Is multi-level the right approach?
  • i.e. is it worth the effort to support kernel and
    FPGA
  • Does programmability imply less performance?
  • What is the overhead of virtualization?
  • From the container, when you need to go to user
    space
  • From using multiple threads when running in the
    kernel
  • Are the virtual networks isolated in terms of
    resource usage?
  • What is the maximum short-term disruption from
    unyielding threads?
  • How long can a task run without leading to
    long-term unfairness?

34
Setup
[Diagram: testbed: nodes n0-n3 connected to the router under test (rtr); PC3000s on Emulab, 3 GHz, 2 GB RAM. The generator sends packets from n0 to n1, tagged with a timestamp; rtr (Linux or a Click config) modifies the IP and Ethernet headers to be from n1 to n2; the receiver diffs the current time against the packet time and stores the average in memory.]
35
Is multi-level the right approach?
  • Performance benefit going from user to kernel,
    and from kernel to FPGA
  • Does programmability imply less performance?
  • Not sacrificing performance by introducing
    programmability

36
What is the overhead of virtualization? From the
container
  • When you must go to user space, what is the cost
    of executing in a container?
  • Overhead of executing in a VServer is minimal

37
What is the overhead of virtualization? From
using multiple threads
[Diagram: the same Click graph (a 4-port-router compound element) in each thread, with PollDevice round-robining traffic between the threads (each runs X tasks per yield) and out via ToDevice.]
38
How long to run before yielding
  • Tasks per yield
  • Low -> high context switching, I/O executes often
  • High -> low context switching, I/O executes
    infrequently (see the sketch below)
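A sketch of the knob being swept (run_one_task and yield_thread are placeholders for Click's task scheduler and the kernel's cooperative yield):

    void run_one_task();   // placeholder: run one scheduled Click task
    void yield_thread();   // placeholder: cooperatively yield the kernel thread

    void thread_loop(int tasks_per_yield) {
        for (;;) {
            for (int i = 0; i < tasks_per_yield; ++i)
                run_one_task();
            yield_thread();  // low X: frequent yields, more context switches;
        }                    // high X: rare yields, I/O serviced infrequently
    }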

39
What is the overhead of virtualization? From
using multiple threads
  • Given the sweet spot for each virtual network
  • Increasing the number of virtual networks from 1 to
    10 does not hurt aggregate performance
    significantly
  • Alternatives to consider
  • Single threaded with VServer
  • Single threaded, modify Click to do resource
    accounting
  • Integrate polling into threads

40
What is the maximum short-term disruption from
unyielding threads?
  • Profile of (some) Elements
  • Standard N-port router example - 5400 cycles
    (1.8 us at 3 GHz)
  • RadixIPLookup (167k entries) - 1000 cycles
  • Simple Elements
  • CheckLength - 400 cycles
  • Counter - 700 cycles
  • HashSwitch - 450 cycles
  • Maximum disruption is the length of the longest task
  • Possible to break up pipelines

[Diagram: round-trip cycle count measured over InfiniteSource -> Elem -> Discard(NoFree); a measurement sketch follows.]
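A sketch of the cycle-count measurement (x86 TSC; the packet push is a stand-in for the element under test). At the testbed's 3 GHz, the quoted 5400 cycles is the 1.8 us above:

    #include <cstdint>
    #include <x86intrin.h>  // __rdtsc()

    // Time one push of a packet through the element(s) under test.
    uint64_t cycles_for(void (*push_packet)(void*), void* pkt) {
        uint64_t start = __rdtsc();
        push_packet(pkt);          // e.g. the standard N-port router path
        return __rdtsc() - start;  // subtract measured call overhead for a
    }                              // fair per-element comparison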
41
How long can a task run without leading to
long-term unfairness?
[Diagram: two InfiniteSource -> 4-port-router -> Discard pipelines; one also includes a cycle-consuming 'Chewy' element and is limited to 15; cycles are counted for each.]
42
How long can a task run without leading to
long-term unfairness?
[Graph: results with 10k extra cycles per task, plus a zoomed-in view.]
  • Tasks longer than 1 token can lead to unfairness
  • Run long-executing elements in user space
  • The performance overhead of user space is then not
    as big of an issue

43
Outline
  • Architecture
  • Implementation
  • Virtualizing Click in the Kernel
  • Challenges with kernel execution
  • Extending beyond commodity hardware
  • Evaluation
  • Conclusion/Future Work

44
Conclusion
  • Goal: Enable custom data planes per virtual
    network
  • Tradeoffs
  • Performance
  • Isolation
  • Programmability
  • Built a multi-level version of Click
  • FPGA
  • Kernel
  • Container

45
Future Work
  • Scheduler
  • Investigate alternatives to improve efficiency
  • Safety
  • Process to certify element as safe (can it be
    automated?)
  • Applications
  • Deploy on VINI testbed
  • Virtual router migration
  • HW/SW Codesign Problem
  • Partition decision making
  • Specification of elements (G language)

46
Questions
47
Backup
48
Signs of Openness
  • There are signs that network owners and equipment
    providers are opening up
  • Peer-to-peer and network provider collaboration
  • Allowing intelligent selection of peers
  • e.g. Pando/Verizon (P4P), BitTorrent/Comcast
  • Router Vendor API
  • Allowing creation of software to run on routers
  • e.g. Juniper PSDP, Cisco AXP
  • Cheap and easy access to compute power
  • Define functionality and communication between
    machines
  • e.g. Amazon EC2, Sun Grid

49
Example 1 User/Kernel Partition
  • Execute unsafe elements in container
  • Add communication elements

[Diagram: safe elements s1-s3 run in the kernel; unsafe element u1 runs in a user-space container; they are connected by ToUser (tu) / FromKernel (fk) and ToKernel (tk) / FromUser (fu) communication elements.]
50
Example 2: Non-Commodity HW
  • PC with programmable NIC (e.g. NetFPGA)
  • FPGA on PCI card
  • 4 GigE ports
  • On-board SRAM and DRAM
  • Jon Turner's pool of processing elements with
    crossbar
  • PEs can be GPP, NPU, FPGA
  • Switch fabric crossbar

[Diagram: partition between FPGA and software; generalized to partitioning among PEs.]
51
Example 2: Non-Commodity HW
  • Redrawing the picture for FPGA/SW
  • Elements can have a HW implementation, a SW
    implementation, or both (choose one)

[Diagram: software element sw1 and hardware elements hw1-hw3, connected across the software/FPGA boundary by ToCPU (tc) / FromDevice (fd) and ToDevice (td) / FromCPU (fc) elements.]
52
Connection to outside environment
  • In Linux, the board is a set of devices (e.g.
    eth0)
  • Can query Linux for what's available
  • Network driver (to read/write packets)
  • Inter-process communication (for communication
    with handlers)
  • The FPGA is a chip on a board
  • Using eth0 needs
  • Pins to connect to
  • Some on-chip logic (in the form of an IP core)
  • Board API
  • Specify available devices
  • Specify size of address block - used by the char
    driver
  • Provide an elaborate() function
  • Generates a top-level Verilog module
  • Generates a UCF file (pin assignments)

53
Packet Transfer
  • In software it is a function call
  • In the FPGA, use a pipeline of elements with a
    standard interface
  • Option 1: Stream the packet through, one word at a
    time
  • Could just be the header
  • Push/pull is a bit tricky
  • Option 2: Pass a pointer
  • But would have to go to memory (inefficient)

[Diagram: Element1 streams to Element2 over a standard interface with data, ctrl, valid, and ready signals.]
54
Element specification and implementation
  • Need
  • Meta-data
  • Specify packet processing
  • Specify run-time query handling (next slide)
  • Meta-data
  • Use Click C API
  • Ports
  • Registers to use specific devices
  • e.g. FromDevice(eth0) registers to use eth0
  • Packet Processing
  • Use C to print out Verilog, as in the sketch
    below
  • Specialized based on instantiation parameters
    (config string)
  • Standard interface for packets
  • Standard interface for handlers
  • Currently a memory-mapped register
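A sketch of an element emitting its Verilog, specialized by its config string (module and port names are illustrative; the data/ctrl/valid/ready ports follow the standard packet interface from the Packet Transfer slide):

    #include <cstdio>

    // Print a module specialized by the element's LEN parameter.
    void emit_check_length(std::FILE* f, int max_len) {
        std::fprintf(f,
            "module check_length(\n"
            "  input         clk,\n"
            "  input  [63:0] in_data,  input  in_ctrl,"
            "  input  in_valid,  output in_ready,\n"
            "  output [63:0] out_data, output out_ctrl,"
            "  output out_valid, input  out_ready\n"
            ");\n"
            "  // body elided: drop packets longer than %d words\n"
            "endmodule\n",
            max_len);
    }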

55
Run-time querying and configuration
  • Query state and update configuration in elements
  • e.g. add ADDR/MASK GW OUT

  • When creating an element
  • Request an address block
  • Specify software handlers
  • Uses read/write methods to get data
  • Allocating addresses
  • Given the total size, and
  • the size of each element's requested block
  • Generating decode logic

[Diagram: user-space click and telnet go through a char driver in the kernel and across PCI to decode logic on the FPGA, which routes register accesses to elem1-elem3; a sketch of this path follows.]
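A sketch of a handler read along that path (the device name and the element's base offset are assumptions; real addresses come from the address-block allocation above):

    #include <cstdint>
    #include <fcntl.h>
    #include <unistd.h>

    // Read one 32-bit register of an element through the char driver.
    uint32_t read_element_reg(const char* dev, off_t elem_base, off_t reg) {
        int fd = open(dev, O_RDWR);  // e.g. "/dev/fpga_click" (assumed name)
        uint32_t val = 0;
        pread(fd, &val, sizeof(val), elem_base + reg);  // decode logic on the
        close(fd);                                      // FPGA steers this to
        return val;                                     // the right element
    }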
56
Memory
  • In software
  • malloc
  • static arrays
  • Share tables through global variables or by
    passing pointers
  • Elements that do no packet processing (passed as a
    configuration string to elements)
  • In FPGA
  • Elements have local memory (registers/BRAM)
  • Unshared (off-chip) memories - treat like a
    device
  • Shared (off-chip) global memories (unimplemented)
  • Globally shared vs. shared between a subset of
    elements
  • Elements that do no packet processing
    (unimplemented)

57
Notifiers, Annotations
  • Notifiers
  • Element registers as listener or notifier
  • In FPGA, create extra signal(s) from notifier to
    listener
  • Annotations
  • Extra space in the Packet data structure
  • Used to mark a packet with info not in the packet
  • Which input port the packet arrived on
  • Result of a lookup
  • In software
  • Fixed byte array
  • In FPGA
  • The packet is streamed through, so adding extra
    bytes is simple

58
User/Kernel Communication
  • Add communication elements
  • Use mknod for each direction
  • ToUser/FromUser store packets and provide file
    operations
  • ToKernel/FromKernel use file I/O; see the sketch
    below

[Diagram: safe elements s1-s3 in the kernel and unsafe element u1 in a user-space container, connected by ToUser (tu) / FromKernel (fk) and ToKernel (tk) / FromUser (fu).]
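A sketch of the container side of this path (device names and one-packet-per-read framing are assumptions):

    #include <fcntl.h>
    #include <unistd.h>

    void container_loop(const char* to_user_dev, const char* from_user_dev) {
        int in  = open(to_user_dev,   O_RDONLY);  // packets queued by ToUser
        int out = open(from_user_dev, O_WRONLY);  // feeds FromUser in the kernel
        char pkt[2048];
        for (;;) {
            ssize_t n = read(in, pkt, sizeof(pkt));
            if (n <= 0) break;
            // ... run the unsafe element u1 over pkt ...
            write(out, pkt, n);
        }
        close(in);
        close(out);
    }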
59
FPGA/Software Communication
  • Add communication elements
  • ToCPU/FromCPU use a device that communicates with
    Linux over the PCI bus
  • Network driver in Linux
  • ToDevice/FromDevice standard Click elements

[Diagram: software element sw1 and hardware elements hw1-hw3, connected by ToCPU (tc) / FromDevice (fd) and ToDevice (td) / FromCPU (fc).]