Substrate Control: Overview - PowerPoint PPT Presentation

Slides: 27
Provided by: fredk5
1
Substrate Control Overview
  • Fred Kuhns
  • fredk_at_arl.wustl.edu
  • Applied Research Laboratory
  • Washington University in St. Louis

2
Defining Terms and Models
3
The SPP Node
  • Slice instantiation
  • Allocate a virtual machine (VM) instance on a GPE
  • may request a code option instance, NPE resources
    and bandwidth
  • Share a common set of (global) IP addresses
  • UDP/TCP port space is shared across GPEs/NPEs
  • Line card TCAM filters direct traffic
  • unregistered traffic originating outside the
    node is sent to the CP.
  • unregistered traffic originating within the node
    uses NAT (on the line card)
  • an application may register server ports. This
    causes a filter to be inserted in the line card
    directing traffic to a specific GPE
  • an application must register ports (or tunnels)
    associated with fast path instances
  • It is assumed that fast path instances will use
    tunnels (overlays) to send traffic between
    routing nodes.
  • Currently we only support UDP tunnels but will
    extend to include GRE and possibly others.

[Figure: SPP node data path. Labels: Internet, LC (Ingress, Egress), GPE, NPE, SCD (ARP, NAT). The line card maps each flow to an internal destination. Local delivery and exception traffic uses an internal UDP tunnel.]
4
Meta-Interfaces and Tunnels
  • Slice fast paths (code option instance plus
    allocated resources) are assumed to sit at one
    end of a tunnel
  • currently only UDP tunnels are supported.
  • A UDP tunnel is defined by the 4-tuple: UDP tunnel
    = {peer ipaddr, peer port, local ipaddr, local
    port}
  • Meta-interface (MI): represents a tunnel
    endpoint as viewed by a slice's fast path
    router. A meta-interface is defined by the local
    endpoint's address: Meta-Interface = {local
    ipaddr, local UDP port}
  • The encapsulated packet is processed by the fast
    path.
  • packet is always encapsulated within a tunnel by
    the substrate
  • code option instance processes the encapsulated
    frame
  • In the SPP context, the slice registers MIs and
    the substrate manages encapsulation headers
  • Guard against forging the source address
  • A filter is installed in the corresponding line
    card's TCAM to send matching packets to the
    correct NPE
  • The NPE's decap module verifies the encapsulation
    header and provides isolation between slices
    (based on local IP and port number values in the
    tunnel header)
  • Fabric VLANs are used to provide link level
    isolation between slice instances. The VLAN label
    is also used by the substrate to associate
    packets with slice fast paths.

MI   IP Address     UDP Port
0    192.168.1.2    6060
1    192.168.1.3    6060
2    192.168.1.2    6061
3    192.168.1.2    6062
4    192.168.1.3    6061
5    192.168.1.3    6062
6    192.168.1.3    6063
[Figure: fast path (FPx) with meta-interfaces 0 to 6. MI: local tunnel endpoint (UDP), {external ipaddr, udp_port}.]
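
The tunnel and meta-interface definitions above can be sketched as two small records. This is only an illustration: the class names, the `SLICE_MIS` dictionary and the `receiving_mi` helper are hypothetical, and the endpoint values are the ones from the meta-interface table on this slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MetaInterface:
    """Local tunnel endpoint: {local ipaddr, local UDP port}."""
    ipaddr: str
    port: int


@dataclass(frozen=True)
class UdpTunnel:
    """4-tuple: {peer ipaddr, peer port, local ipaddr, local port}."""
    peer_ipaddr: str
    peer_port: int
    local: MetaInterface


# Meta-interfaces registered by one slice, keyed by MI number
# (values copied from the table on this slide).
SLICE_MIS: Dict[int, MetaInterface] = {
    0: MetaInterface("192.168.1.2", 6060),
    1: MetaInterface("192.168.1.3", 6060),
    2: MetaInterface("192.168.1.2", 6061),
    3: MetaInterface("192.168.1.2", 6062),
    4: MetaInterface("192.168.1.3", 6061),
    5: MetaInterface("192.168.1.3", 6062),
    6: MetaInterface("192.168.1.3", 6063),
}


def receiving_mi(dst_ip: str, dst_port: int,
                 mis: Dict[int, MetaInterface] = SLICE_MIS) -> Optional[int]:
    """Return the MI whose local endpoint matches the tunnel header.

    Hypothetical stand-in for the decap check: a packet whose local
    IP/port pair was never registered maps to no meta-interface.
    """
    endpoint = MetaInterface(dst_ip, dst_port)
    for mi, registered in mis.items():
        if registered == endpoint:
            return mi
    return None
```

For example, `receiving_mi("192.168.1.3", 6061)` returns MI 4, while an unregistered pair such as `("192.168.1.2", 7000)` returns `None`.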
5
Lookup Table, TCAM, Use
6
Lookup Filters: Key, Action and Result
  • A lookup key is created from the packet's
    header fields and the receiving meta-interface
  • code option extracts fields from the encapsulated
    packet
  • substrate adds the receiving meta-interface
    identifier
  • If no entry is found then the packet's no_route
    exception attribute is set; otherwise a result is
    returned containing an action field and
    forwarding information (output meta-interface and
    next hop address)
  • a code option may define additional exception
    attributes
  • The complete filter specification: {lookup_key,
    result_vector}
  • lookup_key = {RxMI, copt_key}
  • RxMI: meta-interface ID on which the packet was
    received.
  • copt_key: lookup key defined by the code option.
    The IPv4 key is {daddr(32), saddr(32), sport(16),
    dport(16), tcp_flgs(8), proto(8)}
  • result_vector = {sindx, action, qid, TxMI,
    nexthop}
  • sindx: stats index
  • action: packet disposition, one of {drop, fwd,
    ld}
  • drop: drop packet
  • fwd: forward packet using next hop value
    (fwdkey)
  • ld: local delivery, code option instance has
    local address information??
  • qid: packet queue
  • TxMI: meta-interface used for sending the packet;
    corresponds to a previously registered local
    tunnel endpoint. Used to fill in the local
    address of the outgoing packet's tunnel header.
  • nexthop: tunnel endpoint for the next hop. For
    UDP tunnels, this is the IP address and UDP port
    number of the next hop device.
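
As a rough sketch of the filter specification above, the key and result can be written as records and the lookup as a function that either returns a result vector or sets the no_route exception. The class and function names are hypothetical, and a dictionary (exact match) stands in for the TCAM, which in reality performs masked, priority-ordered matching.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Action(Enum):
    DROP = "drop"   # drop packet
    FWD = "fwd"     # forward using the next hop value
    LD = "ld"       # local delivery


@dataclass(frozen=True)
class LookupKey:
    rxmi: int                   # added by the substrate
    copt_key: Tuple[int, ...]   # defined by the code option


@dataclass(frozen=True)
class ResultVector:
    sindx: int
    action: Action
    qid: int
    txmi: int
    nexthop: Optional[Tuple[str, int]]  # (ipaddr, UDP port) for UDP tunnels


def lookup(table: Dict[LookupKey, ResultVector],
           key: LookupKey) -> Tuple[Optional[ResultVector], Set[str]]:
    """Return (result, exception attributes) for one packet.

    Assumption: exact-match dictionary lookup, used only to show the
    shape of the key and the result.
    """
    result = table.get(key)
    if result is None:
        return None, {"no_route"}
    return result, set()
```

A code option that defines further exception attributes would add them to the returned set alongside `no_route`.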

7
Slice view of the Lookup Key
[Figure: user-specified lookup key (four 32-bit words). Fields and widths: xsid (12 bits), xmi (N bits), slice-defined fields (128-N bits).]
  • When a packet is received the substrate creates a
    lookup key using the target slice's xsid and the
    receiving meta-interface. The remaining bits are
    defined by the code option.
  • xsid represents the internal slice ID and may
    differ from the value of xsid. For implementation
    efficiency, this is the VLAN identifier assigned
    to the slice.
  • xmi: internal representation of the
    meta-interface (MI), an encoding of the receiving
    tunnel endpoint.
  • For UDP tunnels this field includes a 4-bit
    interface ID and the 16-bit local UDP port
    number. The 4-bit ID is used as an index into a
    table of local IP addresses.
  • The IPv4 code option defined fields are shown
    below where pr is the IP protocol field and tcp
    is the TCP header flags.
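
The substrate-defined part of the key can be illustrated by packing a 12-bit xsid, a 4-bit interface ID and a 16-bit local UDP port into one 32-bit word. The bit positions chosen here (xsid in the top bits, then interface ID, then port) are an assumption made for the example, and the function names are hypothetical; the slides only give the field widths.

```python
def pack_substrate_word(xsid: int, if_id: int, udp_port: int) -> int:
    """Pack xsid (12 bits) and xmi (4-bit interface ID + 16-bit port).

    Assumed layout, most significant bit first: xsid | if_id | udp_port.
    """
    if not 0 <= xsid < (1 << 12):
        raise ValueError("xsid must fit in 12 bits")
    if not 0 <= if_id < (1 << 4):
        raise ValueError("interface ID must fit in 4 bits")
    if not 0 <= udp_port < (1 << 16):
        raise ValueError("UDP port must fit in 16 bits")
    xmi = (if_id << 16) | udp_port
    return (xsid << 20) | xmi


def unpack_substrate_word(word: int):
    """Inverse of pack_substrate_word: returns (xsid, if_id, udp_port)."""
    return (word >> 20) & 0xFFF, (word >> 16) & 0xF, word & 0xFFFF
```

With this layout, xsid 0x010 on interface 2 with local port 6061 (0x17AD) packs to 0x010217AD.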

8
IPv4 TCAM Filter Formats (on NPE)
[Figure: TCAM key and result formats.]
Key
  • Substrate defined: vlan (11), T (1), RX port
    (16), if (4). RX port and if represent the input
    meta-interface.
  • Defined by the IPv4 code option, 112 bits: daddr
    (32), saddr (32), sport (16), dport (16),
    tcp/proto (16; the figure also lists widths 6, 8
    and 2 here)
  • T = 0: normal lookup. T = 1: substrate-only
    lookup.
Result, 64 bits
  • Field labels in the figure: L, D, rsv, TX IP
    daddr, TX dport, TX sport, rsv, QM, rsv, sindx,
    qid, Sch
  • Field widths in the figure: 32, 16, 12, 15, 16,
    3, 11, 3, 1, 1, 2, 16
  • D: drop packet. L: local delivery.
  • TX IP address and sport represent the output
    meta-interface. The dport is provided by the
    slice. (RMP maps miid to tx tunnel params, uses
    the dport provided by the slice.)
  • sindx: global stats index (SCD maps the slice's
    sindx to the global value)
  • qid: 20-bit internal qid (SCD maps the slice's
    miid to QM and Sch. SCD also maps the slice's qid
    to the global qid value)
Slice parameters
  • Key: input miid, IPv4 fltr {daddr, saddr, sport,
    dport, tcp/proto}
  • Result: flags {Drop, GPE}, sindx, output miid,
    QID
9
Lookup
  • Parse block makes the copt_key.
  • Substrate adds the xsid and xmi fields.
  • Substrate uses the TxMI and nexthop fields to
    construct encapsulation header

[Figure: decap and parse block feed Lookup A. Key: xsid | RxMI | copt_key. Result: sindx | action | qid | TxMI | nexthop, of which TxMI | nexthop is passed on.]
10
Version 2 and Multicast
  • In version 2 there will be 2 stages to the
    lookup; add a fanout (count) to lookup B.
  • if fanout > 1 then the entry holds the address of
    the fanout block, else the result vector. Fanout
    blocks are chained.
  • TxMI includes a 4-bit interface vector field
    that is used to look up the interface IP address
    and MAC address.

[Figure: version 2 two-stage lookup. decap and parse block build the lookup_key for LookupA, which returns action | sindx | rindx (result_index). LookupB returns sindx | qid | TxMI | nexthop. Fanout table entries: qid | TxMI | nexthop. Figure notes: "sindex passed from side A", "overloaded with fanout address", "VLAN table in header format and VLAN table in Decap/Parse".]
11
Lookup Example
  • When a code option is requested the slice is
    allocated the requested number of TCAM entries,
    fid ∈ {0, ..., Nf-1}
  • all TCAM operations accept a TCAM entry ID (fid)
  • Entries are listed in priority order with fid 0
    the highest priority and entry Nf-1 the lowest.
  • It is up to the slice control path to order the
    lookup entries.
  • For example, if we have the simple routing
    database:
  • 10.10.2.1/32 Local delivery (GPE)
  • 10.5.2.0/24 NH A
  • 10.5.1.0/24 NH B
  • 10.5.0.0/16 NH C

Slice Meta-Interfaces
MI   IP Address     UDP Port
0    192.168.1.2    6060
1    10.50.10.2     6061
2    10.50.10.2     6062
3    10.1.1.1       6060

Slice BW Allocations
Interface   BW        ipAddr
0           BE        192.168.1.2
1           100Mbps   10.50.10.2
2           10Mbps    10.1.1.1

Slice Queue Bindings
QID   Interface   BW    max Bytes
0     0           -     Local
1     1           40    1024
2     1           60    1024
3     2           100   1024

Desired Route Table (LPM)
prefix          TxMI   nexthop
10.10.2.1/32    0      Local
10.5.2.0/24     1      NH A
10.5.1.0/24     2      NH B
10.5.0.0/16     3      NH C
  • Then the control software could use the
    following:
  • write_fltr(fid, rxmi, prefix, width, action,
    qid, TxMI, nexthop)
  • write_fltr(0, , 10.10.2.1, 0xFFFFFFFF, LD)
  • write_fltr(1, , 10.5.2.0, 0xFFFFFF00, fwd,
    1, 1, NHA)
  • write_fltr(2, , 10.5.1.0, 0xFFFFFF00, fwd,
    2, 2, NHB)
  • write_fltr(3, , 10.5.0.0, 0xFFFF0000, fwd,
    3, 3, NHC)
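
To see why the entry order matters, here is a minimal software model of a priority-ordered ternary table loaded with the four filters above. The `Tcam` class is hypothetical and matches on the destination address only. The slide leaves the rxmi argument blank, so the sketch assumes that a missing rxmi (`None`) acts as a wildcard.

```python
import ipaddress


class Tcam:
    """Toy model: entries are searched from fid 0 (highest priority) up."""

    def __init__(self, nf):
        self.entries = [None] * nf

    def write_fltr(self, fid, rxmi, prefix, mask, action,
                   qid=None, txmi=None, nexthop=None):
        value = int(ipaddress.IPv4Address(prefix)) & mask
        self.entries[fid] = (rxmi, value, mask, (action, qid, txmi, nexthop))

    def lookup(self, rxmi, daddr):
        addr = int(ipaddress.IPv4Address(daddr))
        for fid, entry in enumerate(self.entries):
            if entry is None:
                continue
            e_rxmi, value, mask, result = entry
            # Assumption: rxmi=None in an entry matches any receiving MI.
            if e_rxmi is not None and e_rxmi != rxmi:
                continue
            if addr & mask == value:
                return fid, result
        return None  # corresponds to the no_route exception


tcam = Tcam(4)
tcam.write_fltr(0, None, "10.10.2.1", 0xFFFFFFFF, "LD")
tcam.write_fltr(1, None, "10.5.2.0", 0xFFFFFF00, "fwd", 1, 1, "NHA")
tcam.write_fltr(2, None, "10.5.1.0", 0xFFFFFF00, "fwd", 2, 2, "NHB")
tcam.write_fltr(3, None, "10.5.0.0", 0xFFFF0000, "fwd", 3, 3, "NHC")
```

A packet for 10.5.2.7 hits fid 1 (NH A) before it can reach the shorter 10.5.0.0/16 entry at fid 3. If the /16 were written at fid 0 instead, it would shadow both /24 routes.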

12
Example IPv4 LPM
  • In general for longest prefix match a good
    strategy is to divide allocated filters into 32
    sets
  • For example assume 1024 TCAM entries have been
    allocated and we are using LPM.
  • Divide the filters into 32 sets of 32 filters
    each and associate a prefix length with each
  • Then for a particular prefix width add it to the
    appropriate set.
  • Entries within a set are non-overlapping so their
    order doesn't matter.
  • This is the scheme used by software written by
    IDT, the manufacturer of the TCAM we currently
    use.

Prefix Width   Filter ID Range
32             0 - 31
31             32 - 63
w              (32-w)*32 + (0...31)
1              992 - 1023
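
The table above reduces to a one-line calculation. A small sketch, with a hypothetical function name and the set size of 32 taken from the 1024-entry example:

```python
def fid_range(width: int, set_size: int = 32) -> range:
    """Filter IDs reserved for prefixes of the given width (1..32).

    Longer prefixes get lower filter IDs and so higher priority.
    """
    if not 1 <= width <= 32:
        raise ValueError("prefix width must be between 1 and 32")
    first = (32 - width) * set_size
    return range(first, first + set_size)
```

For instance, a /24 prefix lands in filter IDs 256 through 287, ahead of any /16 entry in 512 through 543.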
13
Keeping track of TCAM entries
  • Substrate will have to manage the mapping of VM
    TCAM filter IDs to the actual filter ID.
  • VM control software will use a normalized filter
    index list (starts at 0 and has the requested
    number of filter entries).
  • The SCD (xscale daemon) must map the per-VM index
    into the actual TCAM Index.
  • Source for managing TCAM entries.
  • NPU A and B share a common TCAM and index range
    so this must be managed across the two xscales.
  • See C implementation of the RangeMap class in
    WUSRC/range
  • Class will also be used for managing the QID name
    space.
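
The mapping from a per-VM index to an actual TCAM index could look roughly like the following. This is not the RangeMap class mentioned above, whose interface is not shown in the slides. It is a hypothetical stand-in that hands out actual indices from a shared free list, so a slice's entries need not be contiguous.

```python
class IndexMap:
    """Maps per-slice indices (0..n-1) to indices in a shared table."""

    def __init__(self, total):
        self.free = list(range(total))  # actual indices not yet assigned
        self.slices = {}                # xsid -> list of actual indices

    def alloc(self, xsid, count):
        if xsid in self.slices:
            raise ValueError("slice already has an allocation")
        if count > len(self.free):
            raise ValueError("not enough free entries")
        self.slices[xsid] = self.free[:count]
        del self.free[:count]

    def release(self, xsid):
        # Return the slice's entries to the free list, lowest index first.
        self.free = sorted(self.free + self.slices.pop(xsid))

    def to_actual(self, xsid, index):
        actual = self.slices[xsid]
        if not 0 <= index < len(actual):
            raise IndexError("index outside the slice's allocation")
        return actual[index]
```

The same structure would serve for the QID name space: only the total size and the meaning of the index change.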

14
Control Software: Resource Management
15
System Resource Manager
[Figure: system components. Labels: CP, SRM, Resource DB, GPE, planetlab OS, root context, vnet, RMP, vmx, NMP, NPE, SCD, SRAM, FPk, FPx, LC. The SRM and Resource DB support fast path configuration via the PLC. Exception and local delivery traffic includes a shim header with RxMI.]
16
Partitioning of (substrate) Responsibilities
  • Virtual Machine (slice control SW): application
    logic, code option specific control and data
    operations.
  • traditional PlanetLab slice operations
  • manage code option specific lookup tables, stats,
    memory and configuration blocks
  • implements the interface with the fast path for
    exception and local delivery traffic
  • vnet
  • flow isolation: filtering traffic through the
    Linux kernel
  • add support for VLAN-based filtering and port
    reservation
  • Resource Manager Proxy (aka Local Resource
    Manager)
  • all VM commands are issued to the RMP
  • the RMP is able to validate command sender
    (authenticate)
  • enforce access restrictions (authorize)
  • decouples VMs from substrate control entities.
    That is, maps exported abstractions and
    interfaces to specific hardware and software
    interfaces.
  • verifies (or inserts) substrate message header
    slice IDs to prevent deliberate or accidental
    masquerading - part of ensuring isolation and
    security.
  • in tandem with SRM implements device independent
    logic
  • System Resource Manager
  • device independent logic
  • responsible for implementing and enforcing
  • system resource abstractions
  • resource isolation and allocation policies

17
Responsibilities
[Figure: SRM (the Decider), RMP and SCD (NPE). The RMP requests allocations from the SRM. System tables: Interfaces {ifn, type, ipaddr, linkBW, availBW}, ...; endpoint (port) maps: resvMap, availMap, usedMaps; xsidMap. Per-slice tables: xsid, vlan. SCD (NPE): tables in the data path.]
  • RMP Responsibilities
  • Translate slice MI to local endpoint. Either call
    the SRM or cache mappings.
  • Add xsid to the subMsg header
  • Pass through identifiers mapped by the SCD: qid,
    fid and stats.
  • Pass through relative queue weights; the SCD maps
    them to global weights.

[Figure: SCD tables in SRAM. Labels: base, real indx, {xsid, offset}, VLAN Table (vlan, {xsid, offset}), fid, {xsid, size}, Queue Params, sid, qid, HF Control Block?, {xsid, range}, make allocation.]
  • SCD Responsibilities
  • Translate slice-specific indices to global
    indices: qid, fid and stats.
  • Knows the location of all tables
  • Interprets commands to add, remove and modify
    entries in data path tables.
  • Knows the per-slice interface BW allocation and
    maps relative queue weight to global weight.
  • Each interface scheduler is assigned (by the SRM)
    a max rate.

[Figure notes: code option control blocks? Ranges are not required to be contiguous. Per-interface scheduler and rate limits. Per-slice data: slice maps, xsid to {qidMap, fidMap, statsMap}, interface BW.]
18
Queuing and allocating Interface Bandwidth
19
Simple Queuing Example
Slice Interface and Queue Allocations: {Port, BW, QList}, QList = {qid, weight, threshold, ...}
Physical Port (Interface) Attributes: {ifn, type, ipaddr, linkBW, availBW}
  • ifn: interface number
  • type: Internet, Peering
Operations: get_interfaces(), get_ifattrs(ifn), get_ifpeer(ifn), alloc_ifbw(ifn, xsid, bw)
[Figure: NPE with FP slice1 queues q10, q11, ..., q1n (qid in 0...n-1) and FP slice2 queues q20, q21, ..., q2m (qid in 0...m-1), each set feeding a wrr scheduler. Other labels: LC, GPE, FP1, FP2, ipAddr, linkBW, BW1, BW11, BW21.]
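
A minimal sketch of how alloc_ifbw and the per-queue weights might fit together. The admission rule (refuse a request larger than availBW) and the proportional split of a slice's bandwidth across its queues are assumptions made for illustration; the slides name the operation and the attributes but do not give the formulas. The `Interface` class and `queue_rates` helper are hypothetical.

```python
class Interface:
    """Physical port attributes: ifn, linkBW and availBW (in Mbps)."""

    def __init__(self, ifn, link_bw):
        self.ifn = ifn
        self.link_bw = link_bw
        self.avail_bw = link_bw
        self.alloc = {}  # xsid -> bandwidth allocated to that slice

    def alloc_ifbw(self, xsid, bw):
        # Assumed policy: admit only if the request fits in availBW.
        if bw <= 0 or bw > self.avail_bw:
            return False
        self.avail_bw -= bw
        self.alloc[xsid] = self.alloc.get(xsid, 0) + bw
        return True


def queue_rates(slice_bw, weights):
    """Split a slice's interface bandwidth across its queues.

    Assumption: each queue gets a share proportional to its relative
    weight, as a weighted round robin scheduler would approximate.
    """
    total = sum(weights.values())
    return {qid: slice_bw * w / total for qid, w in weights.items()}
```

With the queue bindings from the lookup example (interface 1 at 100 Mbps, queues 1 and 2 weighted 40 and 60), `queue_rates(100, {1: 40, 2: 60})` gives 40 Mbps and 60 Mbps.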
20
Substrate Message Format
21
Substrate Message
  • Assume a simple command/response (two-way)
    messaging framework, but one-way schemes will
    also be supported.
  • Supports asynchronous communications using a
    message ID.
  • The command field is overloaded for the return
    code.
  • Every server is expected to implement a simple
    Version command (cmd 0) which returns the
    server's ID and version number as two 32-bit
    fields.
  • primary use is for monitoring health of servers
    and debugging.
  • All other command values are unique only to a
    particular server.
  • Uses UDP as the transport protocol.
  • All commands are expected to be idempotent

[Figure: msg header layout, four fields each spanning bits 0 to 15.]
  • mlen: total message length, including the header.
  • mid: message ID, used to support asynchronous
    message processing.
  • cid: context identifier. Specifies the context
    within which the message is processed. A value of
    0 indicates substrate context.
  • cmd: command to execute or a return code.
  • The 4 header fields are each 16 bits.
  • body: 0 or more bytes of command data.
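
The header described above maps directly onto four unsigned 16-bit fields. The sketch below assumes the fields appear on the wire in the order they are listed (mlen, mid, cid, cmd) and in network byte order; the function names are hypothetical.

```python
import struct

# Four unsigned 16-bit fields, network byte order: mlen, mid, cid, cmd.
HEADER = struct.Struct("!HHHH")


def pack_msg(mid: int, cid: int, cmd: int, body: bytes = b"") -> bytes:
    """Build a substrate message; mlen covers the header and the body."""
    return HEADER.pack(HEADER.size + len(body), mid, cid, cmd) + body


def unpack_msg(data: bytes):
    """Return (mid, cid, cmd, body), checking mlen against the datagram."""
    mlen, mid, cid, cmd = HEADER.unpack_from(data)
    if mlen != len(data):
        raise ValueError("mlen does not match the datagram length")
    return mid, cid, cmd, data[HEADER.size:mlen]
```

A Version request (cmd 0) in substrate context with message ID 1 is then just the 8-byte header `00 08 00 01 00 00 00 00`, and a reply would carry the server ID and version as two 32-bit fields in the body.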

22
Overview
  • In the interface specifications I provide a
    C-like description of the operations and results.
  • The descriptions are only intended to describe
    the actual message format, data fields and
    returned results. They are not meant to specify
    an application-level library.
  • The arguments are to be encoded into the message
    body in the order they are given, using network
    byte order (big endian) and without padding.
  • All commands result in one of the following:
  • No return response: one-way call semantics.
  • An error occurs processing the message, or the
    command encounters an unexpected condition or
    error. In this case the return message will have
    the error return code in the cmd field.
  • The command completes and does not indicate an
    error to the message framework. The message
    result code then indicates success and the
    message body contains any result data.

23
Example Message
  • Slice with xsid of 0x10 requests the allocation
    of a global UDP (protocol number 17) port for the
    local IP address 128.252.130.34 (hex 0x80FC8222).
  • Assume the alloc_port command ID is 4.
  • port = alloc_port(0x80FC8222, 0, 17)
  • Allocate a global UDP (protocol number 17) port
    for the local IP address 128.252.130.34 (hex
    0x80FC8222), and let the system assign the next
    available port number (requested port 0).
  • The resource manager allocates port 5050
    (0x13BA); the return code of 0 indicates success.

[Figures: Command Message and Reply Message layouts.]
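
The two messages of this example can be reconstructed in bytes under a few stated assumptions: the argument widths (32-bit address, 16-bit port, 8-bit protocol), the use of the slice's xsid as the cid, a 16-bit port in the reply body, and the message ID 1 are all guesses for illustration. Only the header layout, the command ID 4, the argument values and the returned port 5050 come from the slides.

```python
import struct

HEADER = struct.Struct("!HHHH")   # mlen, mid, cid, cmd
ALLOC_PORT_ARGS = struct.Struct("!IHB")  # assumed: ipaddr, port, proto
ALLOC_PORT = 4                    # command ID given in the example

# Command: alloc_port(0x80FC8222, 0, 17) from the slice with xsid 0x10.
body = ALLOC_PORT_ARGS.pack(0x80FC8222, 0, 17)
command = HEADER.pack(HEADER.size + len(body), 1, 0x10, ALLOC_PORT) + body

# Reply: cmd field reused as the return code (0 = success), body holds
# the allocated port (assumed to be a single 16-bit field).
reply_body = struct.pack("!H", 5050)
reply = HEADER.pack(HEADER.size + len(reply_body), 1, 0x10, 0) + reply_body

# Decoding the reply on the client side.
mlen, mid, cid, rc = HEADER.unpack_from(reply)
(port,) = struct.unpack_from("!H", reply, HEADER.size)
```

Because the `!` prefix selects network byte order with no alignment padding, the argument block is exactly 7 bytes, which matches the "without padding" rule in the overview.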
24
NAT
25
  • Problem
  • UDP, TCP: 2 or more GPEs attempt to use the same
    global IP, port and proto
  • ICMP: ???
