Control Update 1: Phase 0


1
Control Update 1: Phase 0
  • Fred Kuhns
  • fredk_at_arl.wustl.edu
  • Applied Research Laboratory
  • Department of Computer Science and Engineering
  • Washington University in St. Louis

2
What's in the slides?
  • Some guiding requirements impacting design
  • What will the overlay networks look like (to me)
  • simple picture summarizing the relationships
    between the diversified networking model with our
    current design
  • Mapping IP to Ethernet addresses
  • simple picture depicting how we may associate the
    MAC layer next hop with a network layer next hop
  • Basic slice creation (i.e. conventional PlanetLab
    slice)
  • Creating an NP-based slice (what we add)
  • Run-time (production) support
  • dynamic control and configuration
    requirements/needs
  • Boot/configure time support
  • initial configuration of data plane and any debug
    needs
  • Meta-Router control
  • local delivery and exception packets
  • Configuration tool (cmd shell)
  • Testing: packet generation using sp.

3
Goals/Charge
  • Create high performance PlanetLab node
  • Maintain compatibility with existing plab
    nodes/interfaces
  • external interfaces: same as existing plab
  • where possible conform to existing plab
    abstractions, models, interfaces and development
    paradigms.
  • Extend interfaces
  • internal interfaces: add NP abstractions,
    distributed resource management.
  • Special issues/concerns
  • Node audit service: Meta-Net traffic (flow)
    accounting conforming to existing NetFlow stats
  • Virtual machine model and node manager interface:
    extending rspec to account for NPs
  • Slice model: extending to include heterogeneous
    nodes (realizing slivers)

4
Visualizing Ports, Links and Nodes
  • Meta-router uses a single UDP port number (i.e.
    meta-port)
  • any host/router may send traffic to the
    advertised IP address/UDP port pair
  • Only works if all meta-net traffic uses a single
    line card and physical port
  • Meta-router uses a UDP port per physical
    interface in use.
  • UDP tunnels act as meta-links
  • define a unique UDP tunnel between pairs of
    meta-routers
  • may have multiple UDP ports for each physical
    interface in use.

[Figure: two diagrams of hosts/routers labeled IPA, IPB, IPC and IPD; the first shows a port P0 at each, the second shows ports P0, P1, ..., Pm.]


5
Mapping IP to Ethernet Destination: Simple Case
  • A Meta-Router encapsulates its packets within a
    UDP datagram using the destination IP address and
    port number obtained from the lookup.
  • The packet is then sent to the line card
    encapsulated within an Ethernet 802.1p/q frame.
    The Ethernet destination address is obtained from
    the lookup.
  • The line card must replace the Ethernet header
    with one specifying the MAC layer next hop (eth
    addr).
  • For the demo we will assume there is only one
    next hop Ethernet device.

Simplifying assumption: for a given physical
output port, all packets use the same Ethernet
header, in particular the same Ethernet
destination address, regardless of the IP
destination address.

[Figure: Substrate Router containing Meta-Router and Line Card; labels IPW, Eth1, Eth2, IP rtr, IPZ, IPX, IPY.]

6
Mapping IP to Ethernet Destination: Not so Simple
  • Context: In general we cannot assume there will
    only be one next hop Ethernet device.
  • Problem: We cannot assume the destination IP
    address corresponds to the next hop Ethernet
    device (the current design's built-in
    assumption).
  • Solutions
  • Create table mapping packet IP destination
    addresses to next hop Ethernet addresses.
  • Line card performs IP route lookup to obtain the
    next hop IP address then uses ARP.
  • Meta-router supplies the next hop IP address then
    use ARP.
  • Meta-router supplies the next hop
    Ethernet address.
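
The first solution listed above (a table mapping packet IP destination addresses to next hop Ethernet addresses) could look roughly like the sketch below. It is only an illustration: the class name, the longest-prefix-match policy and the example MAC addresses are assumptions, not part of the design described in the slides.

```python
import ipaddress


class NextHopTable:
    """Maps destination IP prefixes to next hop Ethernet (MAC) addresses."""

    def __init__(self):
        self._entries = []  # list of (network, mac) pairs

    def add(self, prefix, mac):
        self._entries.append((ipaddress.ip_network(prefix), mac))
        # Keep the longest prefixes first so lookup() returns the most
        # specific match (an assumed policy, not stated in the slides).
        self._entries.sort(key=lambda e: e[0].prefixlen, reverse=True)

    def lookup(self, ip_dst):
        addr = ipaddress.ip_address(ip_dst)
        for net, mac in self._entries:
            if addr in net:
                return mac
        return None  # no mapping known for this destination


# Hypothetical entries: a default next hop plus one more specific route.
table = NextHopTable()
table.add("0.0.0.0/0", "00:00:00:00:00:01")
table.add("192.168.100.0/24", "00:00:00:00:00:02")
```

With these entries, `table.lookup("192.168.100.4")` selects the more specific /24 entry, while any other address falls back to the default next hop.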

[Figure: Substrate Router containing Meta-Router and Line Card, with an Ethernet Switch; labels IPW, Eth1, Eth2, Eth3, IPY.]

7
A meta-router may use multiple physical ports
[Figure: Meta-Router (NPE), Ethernet Switch (in chassis), Line Card, RTM.]


8
Basic Slice Creation: No changes
  • Slice information is entered into PLC database.
  • Current: Node manager polls PLC for slice data.
  • Planned: PLC contacts Node manager proactively.
  • Node manager (pl_conf) periodically retrieves
    slice table.
  • updates slice information
  • creates/deletes slices
  • Node manager (nm) instantiates new virtual
    machine (vserver) for slice.
  • User logs into vserver using ssh
  • uses existing plab mechanisms on GPE.

[Figure: NPE and GPE connected through an Ethernet Switch (Eth1, Eth2, Eth3) to the Line card (NPE); labels root ctx, NM, RM, per Slice contexts, slice X, new Slice (Y), X1, Preallocated Ports (UDP), sys-sw vnet.]

Lookup table (TCAM)
  filter     result
  TUNX       Eth2, VLANX
  default    Eth1, VLAN0

Default configuration: forward traffic to the
(single) GPE, in this case the user's ssh login
session.

9
Requesting NP
  • User requests shared-NP
  • Specify code option
  • Request UDP port number for overlay tunnel
  • Request local UDP port for exception traffic
  • Substrate Resource Manager
  • Configure SW: Assign local VLAN to new meta
    router. Enable VLAN on switch ports.
  • Configure NPE: allocate NP with requested code
    option (decision considers both current load and
    available options)
  • Configure LC(s): Allocate an externally visible
    UDP port number (from the preallocated pool of
    UDP ports for the external IP address). Add
    filter(s)
  • Ingress: packet's destination port maps to local
    (chassis) VLAN and MAC destination address
  • Egress: IP destination address (??) maps to MAC
    destination address and RTM physical output port
  • Configure GPE: Open local UDP port for exception
    and local delivery traffic from NPE. Transfer
    local port (socket) and results to client slice
[Figure: GPE and NPE connected through an Ethernet Switch (Eth1, Eth2, Eth3) to the Line card (NPE); labels root ctx, NM, RM, per Slice contexts, slice X, Slice Y, X, Y, VLANY, Preallocated Ports (UDP), sys-sw vnet.]

Exception and local delivery traffic. Only need
to install filter in TCAM.

Lookup table (TCAM)
  filter     result
  TUNX       Eth2, VLANX
  TUNY       Eth2, VLANY
  default    Eth1, VLAN0

Meta-network traffic uses UDP tunnels. Only need
to install filter in TCAM.

10
Software maintained Tables/Maps
  • Mappings/Associations needed for creating filters

Meta-Port to Physical Interface Table

  Substrate Interface                Meta-Port Identifier
  Line Card   Physical Interface     External IP Address   UDP port
  Slot 0      Port 3                 192.168.100.4         32405
  ...
  Slot 1      Port 5                 192.168.100.4         32400

Line Card Next Hop Table

  Physical Port   Next Hop MAC
  1               Eth1
  ...
  N               EthN
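
As a rough illustration of how the two tables could be combined when creating filters, the sketch below resolves a meta-port (external IP address, UDP port) to its line card, physical interface and next hop MAC. The two meta-port rows are copied from the table; the names `MetaPort`, `NEXT_HOP_MAC` and `resolve_meta_port`, the per-line-card keying and the symbolic MAC names are assumptions made for the example.

```python
from typing import NamedTuple


class MetaPort(NamedTuple):
    slot: int      # line card slot
    port: int      # physical interface on that line card
    ext_ip: str    # external IP address advertised for the meta-port
    udp_port: int  # UDP port identifying the meta-port


# Rows copied from the Meta-Port to Physical Interface Table.
META_PORTS = [
    MetaPort(0, 3, "192.168.100.4", 32405),
    MetaPort(1, 5, "192.168.100.4", 32400),
]

# Line Card Next Hop Table, keyed here by (slot, physical port).
# Physical port N maps to the symbolic next hop "EthN", as in the table;
# keeping one table per line card is an assumption.
NEXT_HOP_MAC = {(0, 3): "Eth3", (1, 5): "Eth5"}

# Index the meta-ports by the tunnel endpoint that identifies them.
BY_TUNNEL = {(m.ext_ip, m.udp_port): m for m in META_PORTS}


def resolve_meta_port(ext_ip, udp_port):
    """Return (slot, physical port, next hop MAC) for a meta-port."""
    m = BY_TUNNEL[(ext_ip, udp_port)]
    return m.slot, m.port, NEXT_HOP_MAC[(m.slot, m.port)]
```
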
11
Configure Ethernet Switch: Step 1
  • Allocate next unused VLAN id for meta-net.
  • In this scenario can a meta-net have multiple
    meta-routers instantiated on a node?
  • If so then do we allocate switch bandwidth and a
    VLAN id for the meta-net or for each meta-router?
  • Configure Ethernet switch
  • enable VLAN id on applicable ports
  • need to know line card to meta-port (i.e. IP
    tunnel) mappings
  • if using external GigE switch then use SNMP
    (python module pysnmp)
  • if using Radisys blade then use SNMP???
  • set default QoS parameters, which are???
  • other ??
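
A minimal sketch of "allocate next unused VLAN id" follows. It assumes ids are handed out lowest-first from the range 1 to 4094 (VID 0 and VID 4095 are special in 802.1Q) and that the owner is an opaque key, so the open question of whether a VLAN belongs to a meta-net or to each meta-router is left to the caller. The class and method names are hypothetical.

```python
class VlanAllocator:
    """Hands out the lowest unused VLAN id in [first, last]."""

    def __init__(self, first=1, last=4094):
        if not 1 <= first <= last <= 4094:
            raise ValueError("VLAN ids must lie in 1..4094")
        self._first = first
        self._last = last
        self._in_use = {}  # vlan id -> owner (meta-net or meta-router)

    def allocate(self, owner):
        for vid in range(self._first, self._last + 1):
            if vid not in self._in_use:
                self._in_use[vid] = owner
                return vid
        raise RuntimeError("no free VLAN ids")

    def release(self, vid):
        # Raises KeyError if the id was never allocated.
        del self._in_use[vid]

    def owner(self, vid):
        return self._in_use.get(vid)
```

Enabling the allocated id on the switch ports (for example over SNMP) is a separate step and is not shown here.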

12
Configure NPE: Step 2
  • vlan table
  • code option and instance number
  • memory for code options
  • instance base address, size and index/instance
  • each instance is given an instance number to use
    for indexing into a common code option block of
    memory
  • each code option is assigned a block of memory
  • code option base address and size. Also Max
    number of instances that can be supported.
  • Select NPE to host client MR
  • Select eligible NPEs (those that have the
    requested code option)
  • Select best NPE based on current load and do
    what???
  • Configure NPE
  • Add entry to SRAM table mapping VLAN and port to
    MR instance
  • What does this table look like?
  • Where is it?
  • Allocate memory block in SRAM for MR.
  • Where in SRAM are the eligible blocks located?
  • How do I reference the block?
  • 1) allocate memory for code option at load time
    2) allocate memory dynamically
  • Allocate 3 counter blocks for MR

13
Configure LC(s): Step 3
  • User may request specific UDP port number
  • Open UDP socket (on GPE)
  • open socket and bind to external IP address and
    UDP port number. This prevents other slices or
    the system from using selected port
  • Configure line card to forward tunnel(s) to
    correct NPE and MR instance
  • Add ingress and egress entries to TCAM
  • how do I know the IP-to-Ethernet destination
    address mapping for egress filter?
  • For both ingress and egress allocate QID and
    configure QM with rate and threshold parameters
    for MR.
  • Do I need to allocate a Queue (whatever this
    means)?
  • Need to keep track of qids (assign qid when
    creating an instance, etc.)
  • For egress I need to know the output physical
    port number. I may also need to know this for
    ingress (if we are using external sw).

14
Configuring GPE: Step 4
  • Assign local UDP port to client for receiving
    exception and local delivery traffic.
  • user may request specific port number.
  • use either a preallocated socket or open a new
    one.
  • use UNIX domain socket to pass socket back to
    client along with other results.
  • all traffic will use this UDP tunnel; this means
    the client must perform IP protocol processing of
    the encapsulated packet in user space.
  • for exception traffic this makes sense.
  • for local delivery traffic the client can use a
    tun/tap interface to send packet back into Linux
    kernel so it can perform more complicated
    processing (such as TCP connection management).
    Need to experiment with this.
  • should we assign a unique local IP address for
    each slice?
  • Result of shared-NPE allocation and socket sent
    back to client.
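
Passing an open socket back to the client over a UNIX domain socket can be sketched with `socket.send_fds` and `socket.recv_fds` (Python 3.9+, Unix only). The JSON result format, the function names and the use of `socketpair` to stand in for the resource manager/client connection are assumptions made for the example.

```python
import json
import socket


def send_allocation(chan, udp_sock, result):
    """Send the result record and the UDP socket's descriptor together."""
    payload = json.dumps(result).encode()
    socket.send_fds(chan, [payload], [udp_sock.fileno()])


def recv_allocation(chan):
    """Receive the result record and rebuild a socket from the descriptor."""
    data, fds, _flags, _addr = socket.recv_fds(chan, 4096, 1)
    udp_sock = socket.socket(fileno=fds[0])
    return udp_sock, json.loads(data)


# Stand-in for the resource manager <-> client UNIX domain connection.
rm_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# The resource manager opens and binds the local UDP port ...
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.bind(("127.0.0.1", 0))

# ... and hands it to the client along with the other results.
send_allocation(rm_end, udp, {"udp_port": udp.getsockname()[1]})
client_udp, result = recv_allocation(client_end)
```

After the transfer both processes hold a descriptor for the same bound socket, so the sender can close its copy without releasing the port.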

15
Run-Time Support for Clients
  • Managing entries in NPE TCAM (lookup)
  • add/remove entry
  • list entries
  • NPE Statistics
  • Allocate 2 blocks of counters: pre-queue and
    post-queue.
  • clear block counter pair (Byte/Pkt) ???
  • get block counter pair (Byte/pkt)
  • specify block and index
  • get once, get periodic
  • get counter group (Byte/pkt)
  • specify counter group as set of tuples (index,
    block)
  • SRAM read/write
  • read/write MR instance specific SRAM memory block
  • relative address and byte count, writes include
    new value as byte array.
  • Line card: meta-interface packet counters, byte
    counters, rates and queue thresholds
  • get/set meta-interface rate/threshold
  • Other
  • Register next hop nodes as the tuple (IPdst,
    ETHdst), where IPdst is the destination address
    in the IP packet. The ETHdst is the corresponding
    Ethernet address.
  • Can we assume the destination ethernet address is
    always the same?
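
The SRAM read/write operation (relative address and byte count, writes carrying the new value as a byte array) might be checked as in the sketch below. A `bytearray` stands in for SRAM, and the class name and the bounds-checking policy are assumptions; the point is only that a client request is validated against its own MR instance block before any memory is touched.

```python
class InstanceMemory:
    """Bounds-checked view of one MR instance's block inside SRAM."""

    def __init__(self, sram, base, size):
        self._sram = sram  # bytearray standing in for SRAM
        self._base = base  # start of this instance's block
        self._size = size  # length of this instance's block in bytes

    def _check(self, offset, count):
        if offset < 0 or count < 0 or offset + count > self._size:
            raise ValueError("access outside the instance's memory block")

    def read(self, offset, count):
        self._check(offset, count)
        start = self._base + offset
        return bytes(self._sram[start:start + count])

    def write(self, offset, data):
        self._check(offset, len(data))
        start = self._base + offset
        self._sram[start:start + len(data)] = data


sram = bytearray(64)
mr = InstanceMemory(sram, base=16, size=8)
mr.write(2, b"\x01\x02")
```
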

16
Boot-time Support
  • Initialize GPE
  • Initialize NPE
  • Initialize LC
  • things to init
  • spi switch
  • memory
  • microengine code download
  • tables??
  • default Line card tables
  • default code paths
  • TCAM

17
IP Meta Router Control
  • All meta-net traffic arrives via a UDP tunnel
    using a local IP address.
  • raw IP packets must be handled in user space.
  • complete exception traffic processing in user
    space.
  • local delivery traffic: can we inject it into the
    Linux kernel so it performs transport layer
    protocol processing? This would also allow
    applications to use the standard socket interface.
  • should we use two different IP tunnels, one for
    exception traffic and one for local delivery?
  • Configuration responsibilities?
  • Stats monitoring for demo?
  • get counter values
  • support for traceroute and ping
  • ONL-like monitoring tool
  • Adding/removing routes
  • static routing tables or do we run a routing
    protocol?

18
IP-Meta Router
  • Internal packet format has changed.
  • see Jing's slides
  • Redirect not in this version of the meta-router

19
XScale Control Software
  • Substrate Interface
  • Raw interface for reading/writing arbitrary
    memory locations.
  • substrate stats?
  • add new meta-router
  • Meta-router/Slice interface
  • all requests go through a local entity (managed)
  • not needed: authenticate client
  • validate request (verify memory location and
    operation)
  • Node Initialization
  • ??

20
Command/Configuration Tool
  • Simple command interpreter with syntax similar to
    lisp
  • Basic syntax: expr cmd argarg ( expr
    ) array scalar string
  • Commands are either arithmetic expressions or
    some system defined operation (mem, vmem, set,
    etc.)
  • Command arguments are typed scalar and array
    values: integers, doubles and strings
  • Allows you to read/write any location in physical
    memory interactively or via a script.

  arg type   POSIX      Traditional
  dw1        uint8_t    unsigned char
  dw2        uint16_t   unsigned short
  dw4        uint32_t   unsigned int
  dw8        uint64_t   unsigned long long
  string     NA         char BUF
  double     double     double
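
To make the type table concrete, the sketch below maps the tool's argument types onto fixed-width encodings using Python's `struct` module. It is not the tool's implementation; the dictionary, the function name and the little-endian default are assumptions (the byte order of the target memory is not stated in the slides).

```python
import struct

# Argument type -> struct format character with the same width as the
# POSIX type in the table (standard sizes, no padding).
DW_FORMATS = {
    "dw1": "B",     # uint8_t
    "dw2": "H",     # uint16_t
    "dw4": "I",     # uint32_t
    "dw8": "Q",     # uint64_t
    "double": "d",  # double
}


def pack_values(arg_type, values, byte_order="<"):
    """Encode a scalar list or array of one type as raw bytes."""
    fmt = byte_order + DW_FORMATS[arg_type] * len(values)
    return struct.pack(fmt, *values)


# Width in bytes of each type, derived from the format characters.
WIDTHS = {name: struct.calcsize("<" + fmt) for name, fmt in DW_FORMATS.items()}
```
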
21
Example Operations
cmd> a = (dw4 0x01010101 \
          0x02020202 \
          0x03030303)
cmd> b = a + (dw4 0x01010101 0x02 4)
<b,0x2020202,0x2020204,0x4040408>
cmd> c = 3 + b[2] * 2 - 4
<c, 134744079>
cmd> (dw4 c)
<TEMP16, 0x808080f (RO)>
cmd> t = "text one" \
         " two"
<t, "text one two">
cmd> set
Symbol Table
<a,0x1010101,0x2020202,0x3030303>
<b,0x2020202,0x2020204,0x3030307>
<c,101058061>
<t,"text one two">
cmd> help
Usage:
  <type>: type is one of int, dw8, dw4, dw2, dw1, dbl
  load "file_name"
  mem: commands to manage internal memory maps
    mem read maps
    mem show maps
    mem read paddr type count
    mem write paddr value
  vmem: read/write to kernel virtual memory
    vmem read vaddr type count
    vmem write vaddr value
22
Reading Memory Maps
  • cmd> mem read maps
  • Adding symbols
  • <DRAM0_PADDR, 0> <DRAM0_VADDR, 0xa7480000>
    <DRAM0_SIZE, 0x20000000>
  • <DRAM0_CSR_PADDR, 0xd0009000>
    <DRAM0_CSR_VADDR, 0xa73d0000> <DRAM0_CSR_SIZE,
    0x1000>
  • <DRAM1_CSR_PADDR, 0xd000a000>
    <DRAM1_CSR_VADDR, 0xa73f0000> <DRAM1_CSR_SIZE,
    0x1000>
  • <DRAM2_CSR_PADDR, 0xd000b000>
    <DRAM2_CSR_VADDR, 0xa7410000> <DRAM2_CSR_SIZE,
    0x1000>
  • <SRAM0_PADDR, 0x80000000> <SRAM0_VADDR, 0>
    <SRAM0_SIZE, 0>
  • <SRAM1_PADDR, 0x90000000> <SRAM1_VADDR,
    0xc7490000> <SRAM1_SIZE, 0x800000>
  • <SRAM2_PADDR, 0xa0000000> <SRAM2_VADDR,
    0xc7ca0000> <SRAM2_SIZE, 0x800000>
  • <SRAM3_PADDR, 0xb0000000> <SRAM3_VADDR,
    0xc84b0000> <SRAM3_SIZE, 0x800000>
  • <SRAM0_CSR_PADDR, 0xcc010000>
    <SRAM0_CSR_VADDR, 0xa7440000> <SRAM0_CSR_SIZE,
    0x1000>
  • <SRAM1_CSR_PADDR, 0xcc410000>
    <SRAM1_CSR_VADDR, 0xa7450000> <SRAM1_CSR_SIZE,
    0x1000>
  • <SRAM2_CSR_PADDR, 0xcc810000>
    <SRAM2_CSR_VADDR, 0xa7460000> <SRAM2_CSR_SIZE,
    0x1000>
  • <SRAM3_CSR_PADDR, 0xccc10000>
    <SRAM3_CSR_VADDR, 0xa7470000> <SRAM3_CSR_SIZE,
    0x1000>
  • cmd> mem show maps
  • DRAM Channel 0 kpa 0x00000000, kva 0xa7480000,
    Size 536870912 (cachable 0, bufferable 0)
  • DRAM CSR Ch 0 kpa 0xd0009000, kva 0xa73d0000,
    Size 65536 (cachable 0, bufferable 0)
  • DRAM CSR Ch 1 kpa 0xd000a000, kva 0xa73f0000,
    Size 65536 (cachable 0, bufferable 0)
  • DRAM CSR Ch 2 kpa 0xd000b000, kva 0xa7410000,
    Size 65536 (cachable 0, bufferable 0)

23
Possible configuration script
  • set MYTABLE_START 0xXXXXXXX
  • mem write MYTABLE_START (dw4 0x00000000 \
        DEFAULT_ADDR \
        DEFAULT_VLAN)
  • set ETHER_ADDR 00e44d330000
  • ETHER_ADDR5 2
  • mem write ETHER_TABLE0 ETHER_BASE
  • ETHER_BASE5 3
  • mem write ETHER_TABLE0 ETHER_BASE
  • mem write (MYTABLE_START 20) (mem read
    SOMEPLACE dw4 1)

24
Testing: Generating Packets
  • sp <arguments>
  • ---------Packet/data Sending Rate
    --------------
  • (-n|--pcnt) n Number of pkts to send.
    default 100000.
  • (-x|--pps) rate Pkt/sec, default 100
  • --Kbps rate Kbps for IP datagrams.
    default 0 Kbps
  • --KBps KBps KBps for IP datagram
  • ---------When pkts are sent, see below
    for description--------
  • (-m|--mode) m m is one of
    contburstswait
  • (-p|--period) p send <b> pkts every
    <p> msecs. default 0 msec
  • (-B|--batch) b Number of pkts to
    send in a batch, default 0 pkts
  • --pdelay n nsec inter-packet gap
  • ---------Packet Size, Specify only one
    --------------------------
  • --dlen b Size of payload in bytes,
    default 4
  • --------Flags affecting pkt size or
    content -----------------
  • --dtype) type Packet data type (zero,
    seq, UDP)
  • --ftype type Type of frame to send
    (raw, udp, tcp, data)
  • --file name Name of file containing
    the raw packet data (ftype raw)
  • ---------Network addressing information
    -------------------------
  • --sa host Use local address "host",
    default INADDR_ANY

25
Example Command
  • Example using a constant inter-packet gap
  • sp -n 10 --pps 1 --mode cont --ftype raw \
       --file Rx_NPUA_Dev_0_Port_0.log --ifn eth2
  • -n 10: send a total of 10 packets
  • --pps 1: send at a rate of 1 packet per
    second
  • --mode cont: use a constant inter-packet delay
    calculated from pps
  • --ftype raw: selects the RAW packet interface
    protocol family
  • --file Rx_NPUA_Dev_0_Port_0.log: read packet
    contents from file
  • --ifn eth2: send packets out interface eth2
  • Or for low packet rates use burst mode
  • sp -n 10 --pps 1 --mode burst --ftype raw \
       --file Rx_NPUA_Dev_0_Port_0.log --ifn eth2
  • only difference is the option --mode burst
  • Printing the help message
  • sp --help
  • ...
  • I have copied an example packet file into the bin
    directory (/opt/bin)

26
Example File: Rx_NPUA_Dev_0_Port_0.log
  • 0102030405060708090a0b0c81000aaa080045000050000000
    00ff003a5ac0a80001c0a8000200010002003cff1b00850000
    4500003000000000ff113a69c0a80001c0a800020001000200
    1cd3b4ddddddddddddddddddddddddddddddddddddddddcaa0
    8273
  • 0102030405060708090a0b0c81000aaa080045000050000000
    00ff003a5ac0a80001c0a8000200010002003cfedc00c40000
    4500003000000000ff113a69c0a80001c0a800020001000200
    1cd3b4dddddddddddddddddddddddddddddddddddddddd3069
    42f9
  • 0102030405060708090a0b0c81000aaa08004500004c000000
    00ff003a5ec0a80001c0a80002000100020038ffa855000030
    00000000ff112a69c0a80001c0a8000200010002001cd3b4dd
    dddddddddddddddddddddddddddddddddddddddb16526b

27
Testing Environment: Generating Traffic
  • What sort of packet generation features are
    useful? What do you need?
  • Generate packets identical to those used in
    simulation?
  • specify on command line?
  • Do you need to generate arbitrary Ethernet
    headers or can we preconfigure the host's
    Ethernet interface to use VLANs?
  • Do you need to specify arbitrary UDP tunnel
    headers or can we use the standard socket
    mechanism to establish the tunnel?
  • The encapsulated IP and transport headers will be
    built up by the program (sp) and thus must be
    specified on the command line.
  • or is there a default encapsulated header that
    will do and can be preconfigured at compile time?
    This can be overridden at run time.

28
Expected Ethernet Frame Format (see 802.3ac)
  • Ethernet Hdr: Destination Address (6 B), Source
    Address (6 B), EtherType (vlan 0x8100), prio,
    CFI, VID, Original EtherType
  • Tunnel Headers
  • IP Hdr: Version, HdrLen, TOS, Total length,
    Identification, Flags, Fragment offset, TTL,
    Protocol, IP Header checksum, IP Source Address,
    IP Destination Address
  • UDP Hdr: sport, dport, length, cksum
  • Encapsulated IP Datagram: Version, HdrLen, TOS,
    Total length, Identification, Flags, Fragment
    offset, TTL, Protocol, IP Header checksum, IP
    Source Address, IP Destination Address, transport
    header, Payload
  • Frame Check Sequence (FCS)

Tag control information (TCI): Priority (3-bits),
Canonical format indicator (CFI) (1-bit), VLAN ID
(VID) (12-bit), Length/Type (16-bit). CFI should
always be set to zero (CFI = 0). VID = 0
identifies priority frames (what does this
mean?). VID = 4095 (0xfff) is reserved. Minimum
frame size is 65B
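
The TCI layout described above (3-bit priority, 1-bit CFI, 12-bit VID) can be packed as in the sketch below. The function name and argument defaults are assumptions; the checks follow the constraints stated on the slide, rejecting the reserved VID 4095 and defaulting CFI to zero.

```python
import struct


def build_vlan_tag(vid, priority=0, cfi=0, tpid=0x8100):
    """Return the 4-byte 802.1Q tag: TPID followed by the TCI."""
    if not 0 <= vid <= 4094:
        raise ValueError("VID must be in 0..4094 (4095 is reserved)")
    if not 0 <= priority <= 7:
        raise ValueError("priority is a 3-bit field")
    if cfi not in (0, 1):
        raise ValueError("CFI is a 1-bit field")
    # priority occupies bits 15..13, CFI bit 12, VID bits 11..0.
    tci = (priority << 13) | (cfi << 12) | vid
    return struct.pack("!HH", tpid, tci)
```

For example, `build_vlan_tag(0xaaa)` reproduces the tag bytes `81 00 0a aa` seen in the example packet file.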