Code Review for IPv4 Metarouter Header Format

Provided by: kareny
1
Code Review for IPv4 Metarouter Header Format
Jing Lu jl1_at_arl.wustl.edu
2
Header Format
[Figure: block diagram with blocks labeled Rx, Substr Decap, Parse, Lookup, Header Format, Tx]
  • Main functions
    • Put on MN Internal header (slow path), tunnel frame header
      (IP/UDP header) and Ethernet VLAN header, based on
      • Exception flags raised by Parse block
        • TTL expired: bit 0 of exception flags
        • IP option: bit 1 of exception flags
      • Lookup result
        • Hit, Drop, Local delivery bits
      • If Rx UDP DPort = Tx UDP SPort, packet should be redirected
    • Increment pre-queue packet counter and byte counter for each
      incoming packet based on counter index
    • Update buffer descriptor with new buffer/packet size, buffer
      offset and counter index
    • Pass relevant fields to QM
  • NN communication
  • Single thread
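
The flag tests listed above can be sketched in plain C. This is only an illustration: the bit positions (bit 0 for TTL expired, bit 1 for IP option) come from the slide, while the `hf_inputs` struct, the function names, and the reading of the redirect condition as an equality test are assumptions, not the reviewed code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as stated on the slide. */
#define EXC_TTL_EXPIRED (1u << 0) /* bit 0 of exception flags */
#define EXC_IP_OPTION   (1u << 1) /* bit 1 of exception flags */

/* Hypothetical container for the inputs these tests need. */
typedef struct {
    uint16_t exception_flags; /* 12 bits used */
    uint16_t rx_udp_dport;
    uint16_t tx_udp_sport;
} hf_inputs;

static bool hf_ttl_expired(const hf_inputs *in)
{
    return (in->exception_flags & EXC_TTL_EXPIRED) != 0;
}

static bool hf_has_ip_option(const hf_inputs *in)
{
    return (in->exception_flags & EXC_IP_OPTION) != 0;
}

static bool hf_needs_redirect(const hf_inputs *in)
{
    /* Assumption: the comparison on the slide is an equality test. */
    return in->rx_udp_dport == in->tx_udp_sport;
}
```

Keeping each test in its own small function mirrors the way the slide lists them as separate conditions.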

3
Where is the code
  • Dispatch loop
    • IPv4_MR\src\dispatch_loop\PL\hdr_format_dl.c,h
    • IPv4_MR\src\dispatch_loop\PL\dl_source.c,h
    • IPv4_MR\src\dispatch_loop\PL\nn_rings.c,h
  • Header format
    • IPv4_MR\src\hdr_format\PL\hdr_format.c,h
  • IPv4 header format
    • IPv4_MR\src\ipv4\PL\ipv4_hdr_format.c,h
  • External dependencies
    • Ring data format
      • IPv4_MR/src/dispatch_loop/PL/ring_formats.h
    • System definitions and memory locations
      • IPv4_MR/build/PL/dispatch_loop/dl_system.h

4
Required Includes
  • Files
    • IXA_SDK_4.0\microengineC\src\intrinsic.c
    • IXA_SDK_4.0\microengineC\src\rtl.c
  • Directories
    • IXA_SDK_4.0\src\library\microblocks_library\microc\
    • IXA_SDK_4.0\MicroengineC\include\..\..\..\..\
    • IXA_SDK_4.0\src\library\dataplane_library\microc\
  • These are required to gain access to the buffer
    libraries and intrinsic functions!

5
Input and Output
Output ring data, written by Hdr Format (fields grouped into 32-bit rows):
  Buf Handle (32b)
  Port (4b) | QID (20b) | Rsv_1 (4b) | Rsv_2 (4b)
  Cntr Index (16b) | MN Frame Length (16b)
Input ring data, from Lookup to Hdr Format (fields grouped into 32-bit rows):
  Buf Handle (32b)
  IP Pkt Length (16b) | IP Pkt Offset (16b)
  Rx UDP DPort (16b) | Slice ID (VLAN) (16b)
  Cntr Index (16b) | Rsvd (1b) | H (1b) | D (1b) | Exception Bits (12b) | LD (1b)
  Tx IP DAddr (32b)
  Tx UDP SPort (16b) | Tx UDP DPort (16b)
  Port (4b) | QID (20b) | DA (8b)
  Slice data pointer (32b)
  Code opt (4b) | Rsv2 (12b) | Rx UDP SPort (16b)
  Rx IP SAddr (32b)
Legend: H = Hit, D = Drop, LD = Local Delivery; Exception bit 0 = TTL, Exception bit 1 = IP Option
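
As an illustration of the three-word output format, the sketch below packs and unpacks the fields with shifts and masks. The field widths come from the slide; the bit positions inside each word (Port in the top four bits, then QID, with the reserved nibbles left zero) and every name in the code are assumptions, since the transcript does not preserve the exact layout.

```c
#include <stdint.h>

/* Hypothetical unpacked view of the Hdr Format output fields. */
typedef struct {
    uint32_t buf_handle;   /* 32b */
    uint8_t  port;         /* 4b  */
    uint32_t qid;          /* 20b */
    uint16_t cntr_index;   /* 16b */
    uint16_t mn_frame_len; /* 16b */
} hf_out_fields;

/* Pack into three 32-bit words. Assumed layout of word 1:
 * [31:28] Port, [27:8] QID, [7:0] Rsv_1 and Rsv_2 (left as zero). */
static void hf_out_pack(const hf_out_fields *f, uint32_t w[3])
{
    w[0] = f->buf_handle;
    w[1] = ((uint32_t)(f->port & 0xFu) << 28) | ((f->qid & 0xFFFFFu) << 8);
    w[2] = ((uint32_t)f->cntr_index << 16) | f->mn_frame_len;
}

/* Inverse of hf_out_pack under the same assumed layout. */
static void hf_out_unpack(const uint32_t w[3], hf_out_fields *f)
{
    f->buf_handle   = w[0];
    f->port         = (uint8_t)(w[1] >> 28);
    f->qid          = (w[1] >> 8) & 0xFFFFFu;
    f->cntr_index   = (uint16_t)(w[2] >> 16);
    f->mn_frame_len = (uint16_t)(w[2] & 0xFFFFu);
}
```

Masking each field before shifting keeps an out-of-range Port or QID from spilling into its neighbor.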
6
Initialization
  • Static configuration by XScale
    • Control block (12B)
      • Ethernet address
      • IP address (global IP)
    • Slice info table per slice (36B)
      • GPE IP address (local IP)
      • NPE IP address (local IP)
      • GPE Ethernet address
      • UDP SRC port
      • UDP DST port
      • Port
      • QID for local delivery
      • QID for exception packets

typedef struct _hdr_format_control_block {
    unsigned int eth_addr_hi32;
    unsigned int eth_addr_lo16;
    unsigned int this_ip_addr;
} hdr_format_control_block;

typedef struct _hdr_format_slice_info_table {
    unsigned int gpe_ip_addr;
    unsigned int npe_ip_addr;
    unsigned int gpe_eth_addr_hi32;
    unsigned int gpe_eth_addr_lo16;
    unsigned int udp_src_port;
    unsigned int udp_dst_port;
    unsigned int port;
    unsigned int ld_qid;
    unsigned int excpt_qid;
} hdr_format_slice_info_table;
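
The stated sizes (12B and 36B) follow from three and nine 32-bit fields. The host-side sketch below checks that at compile time and shows one possible way to turn the split Ethernet address into six bytes. It uses `uint32_t` in place of `unsigned int`, and the struct names, the helper `eth_addr_bytes`, and the byte order of the hi32/lo16 split are all assumptions.

```c
#include <stdint.h>

/* Host-side mirrors of the two structures, with fixed-width fields. */
typedef struct {
    uint32_t eth_addr_hi32;
    uint32_t eth_addr_lo16;
    uint32_t this_ip_addr;
} control_block_sketch;

typedef struct {
    uint32_t gpe_ip_addr;
    uint32_t npe_ip_addr;
    uint32_t gpe_eth_addr_hi32;
    uint32_t gpe_eth_addr_lo16;
    uint32_t udp_src_port;
    uint32_t udp_dst_port;
    uint32_t port;
    uint32_t ld_qid;
    uint32_t excpt_qid;
} slice_info_sketch;

_Static_assert(sizeof(control_block_sketch) == 12, "control block is 12B");
_Static_assert(sizeof(slice_info_sketch) == 36, "slice info entry is 36B");

/* Assumption: hi32 holds the first four address bytes and the low
 * 16 bits of lo16 hold the last two. */
static void eth_addr_bytes(uint32_t hi32, uint32_t lo16, uint8_t out[6])
{
    out[0] = (uint8_t)(hi32 >> 24);
    out[1] = (uint8_t)(hi32 >> 16);
    out[2] = (uint8_t)(hi32 >> 8);
    out[3] = (uint8_t)hi32;
    out[4] = (uint8_t)(lo16 >> 8);
    out[5] = (uint8_t)lo16;
}
```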
7
Global Variables
  • Externally defined global variables
    • In hdr_format_dl.c
      • ring_in
      • ring_out
      • dlNextBlock
  • Initialization variables shared by all threads
    • In hdr_format.c
      • this_ip_addr
      • eth_addr_hi32
      • eth_addr_lo16
      • partial_ip_cksum (computed on known IP header fields)
  • header_format_init() will read the control block
    in SRAM and initialize these variables
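
A partial checksum of this kind can be sketched as follows. This is not the reviewed code: it assumes the fields known at initialization are version/header length/TOS (0x4500), a zero ID/flags/fragment-offset word, TTL, protocol UDP (17) and the source address, and that total length and destination address are added per packet. The function names are hypothetical.

```c
#include <stdint.h>

/* Sum the 16-bit words of the IP header fields assumed constant. */
static uint32_t ip_cksum_partial(uint32_t src_ip, uint8_t ttl)
{
    uint32_t sum = 0;
    sum += 0x4500u;                     /* version 4, 5-word header, TOS 0 */
    /* ID, flags and fragment offset are assumed zero and add nothing. */
    sum += ((uint32_t)ttl << 8) | 17u;  /* TTL and protocol (UDP) */
    sum += src_ip >> 16;
    sum += src_ip & 0xFFFFu;
    return sum;
}

/* Add the per-packet fields, fold the carries, and complement. */
static uint16_t ip_cksum_finish(uint32_t partial, uint16_t total_len,
                                uint32_t dst_ip)
{
    uint32_t sum = partial + total_len + (dst_ip >> 16) + (dst_ip & 0xFFFFu);
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

With this split, the per-packet work is three additions, a fold and a complement, which is the kind of saving a precomputed partial sum is meant to give.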

8
Header Data Structure
Ethernet VLAN Header (18B)
  • DstAddr (6B)
  • SrcAddr (6B)
  • Type802.1Q (2B)
  • VLAN (2B)
  • TypeIP (2B)
IP Header (20B)
  • Ver/HLen/Tos (2B)
  • Len (2B)
  • ID/Flags/FragOffset (4B)
  • TTL (1B)
  • Protocol UDP (1B)
  • Hdr Cksum (2B)
  • Src Addr (4B)
  • Dst Addr (4B)
UDP Header (8B)
  • Src Port (2B)
  • Dst Port (2B)
  • UDP length (2B)
  • UDP checksum (2B)
MN Internal Header (8,16B)
  • Rsvd, Type (4B)
  • hdr_length (2B)
  • Rx UDP DPort (2B)
  • Rx IP SAddr (4B)
  • Rx UDP SPort (2B)
  • Type dependent data (8B)
Figure legend: fields are marked either "Same for all pkts" or "Vary per pkt"
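
The Ethernet VLAN, IP and UDP header sizes above can be checked with byte-array structs, which have no padding. This is a sketch for illustration only; the type names and the `mr_put_be16` helper are hypothetical, and the MN internal header is left out because its layout varies by type.

```c
#include <stdint.h>

typedef struct {
    uint8_t dst_addr[6];
    uint8_t src_addr[6];
    uint8_t type_8021q[2];
    uint8_t vlan[2];
    uint8_t type_ip[2];
} mr_eth_vlan_hdr;

typedef struct {
    uint8_t ver_hlen_tos[2];
    uint8_t len[2];
    uint8_t id_flags_fragoff[4];
    uint8_t ttl;
    uint8_t protocol;
    uint8_t hdr_cksum[2];
    uint8_t src_addr[4];
    uint8_t dst_addr[4];
} mr_ip_hdr;

typedef struct {
    uint8_t src_port[2];
    uint8_t dst_port[2];
    uint8_t length[2];
    uint8_t checksum[2];
} mr_udp_hdr;

typedef struct {
    mr_eth_vlan_hdr eth;
    mr_ip_hdr       ip;
    mr_udp_hdr      udp;
} mr_tunnel_hdr;

_Static_assert(sizeof(mr_eth_vlan_hdr) == 18, "Ethernet VLAN header is 18B");
_Static_assert(sizeof(mr_ip_hdr) == 20, "IP header is 20B");
_Static_assert(sizeof(mr_udp_hdr) == 8, "UDP header is 8B");
_Static_assert(sizeof(mr_tunnel_hdr) == 46, "18 + 20 + 8 = 46B");

/* Store a 16-bit value in network (big-endian) byte order. */
static void mr_put_be16(uint8_t dst[2], uint16_t v)
{
    dst[0] = (uint8_t)(v >> 8);
    dst[1] = (uint8_t)(v & 0xFFu);
}
```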
9
Function and Performance
Functions                              | Memory access      | Processing cycles (common case/worst case)
Dequeue ring_in data                   | NN 9W reads        | 42/42
Construct MN int hdr                   |                    | 44/86
Construct IP, UDP, Ethernet, VLAN hdr  |                    | 64/73
Set IP checksum                        |                    | 12/12
Set UDP checksum                       |                    | 11/11
Write hdr to DRAM                      | DRAM 46-58B writes | 37/40
Inc Pre_queue Cnt                      | SRAM 8B writes     | 15/15
Update buffer descriptor               | SRAM 10B writes    | 66/66
Enqueue ring_out data                  | NN 3W writes       | 27/27
Total                                  |                    | 318/372
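
The per-function cycle counts on this slide add up to the stated totals of 318 (common case) and 372 (worst case). A small check, with the pairing of counts to functions taken from the slide and all C names hypothetical:

```c
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned    common; /* common-case cycles */
    unsigned    worst;  /* worst-case cycles */
} hf_step;

static const hf_step hf_steps[] = {
    { "Dequeue ring_in data",                  42, 42 },
    { "Construct MN int hdr",                  44, 86 },
    { "Construct IP, UDP, Ethernet, VLAN hdr", 64, 73 },
    { "Set IP checksum",                       12, 12 },
    { "Set UDP checksum",                      11, 11 },
    { "Write hdr to DRAM",                     37, 40 },
    { "Inc Pre_queue Cnt",                     15, 15 },
    { "Update buffer descriptor",              66, 66 },
    { "Enqueue ring_out data",                 27, 27 },
};

/* Sum either the common-case or the worst-case column. */
static unsigned hf_total(int worst_case)
{
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof hf_steps / sizeof hf_steps[0]; i++)
        sum += worst_case ? hf_steps[i].worst : hf_steps[i].common;
    return sum;
}
```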
10
Performance
  • 372 cycles for CPU processing
  • 1300 cycles latency
  • Expected performance (90B min IPv4 packet (78B min
    IPv4 MN + 12B IFS))
    • (201/372) * 5 Gbps = 2.7 Gbps
  • To achieve 5 Gbps, need two MEs running in parallel
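
The estimate above can be reproduced with a few lines of C. The sketch reads 201 as the per-packet cycle budget at 5 Gbps for a 90B packet, which is an interpretation of the slide rather than something it states; the function names are hypothetical.

```c
/* Throughput one ME can sustain when each packet costs
 * cycles_per_pkt but the line-rate budget is budget_cycles. */
static double expected_gbps(unsigned budget_cycles, unsigned cycles_per_pkt,
                            double line_rate_gbps)
{
    return (double)budget_cycles / (double)cycles_per_pkt * line_rate_gbps;
}

/* Number of MEs needed in parallel to stay within the budget. */
static unsigned mes_needed(unsigned budget_cycles, unsigned cycles_per_pkt)
{
    return (cycles_per_pkt + budget_cycles - 1) / budget_cycles; /* ceiling */
}
```

With the slide's numbers, `expected_gbps(201, 372, 5.0)` is about 2.7 and `mes_needed(201, 372)` is 2, matching the two conclusions on the slide.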

11
IPv4 Internal Header Format
[Figure: header layout with fields Type (28b), 0000, Length (2B), Rx UDP DPort (2B) and Type Dependent Data (8B); the figure also labels Tx UDP DPort (2B), Tx UDP SPort (2B), Tx IP DAddr (4B), Rx IP SAddr (4B) and Rx UDP SPort (2B)]

Path           | Category  | Type field | Reason                                        | Outgoing MN Internal Hdr
GPE->NPE       |           | 0          | Reclassify                                    | Rx UDP DPort if set, otherwise Rx UDP DPort FwdKey
GPE->NPE       |           |            |                                               |
NPE->Egress LC | Fast path |            |                                               | No MN Int Hdr
NPE->GPE       | Exception | 2          | No route                                      | Rx UDP DPort
NPE->GPE       | Exception | 3          | Expired TTL                                   | Rx UDP DPort
NPE->GPE       | Exception | 4          | IP w/ options                                 | Rx UDP DPort, FwdKey
NPE->GPE       | Exception | 5          | Redirect due to Rx UDP DPort = Tx UDP SPort   | Rx UDP DPort, FwdKey
NPE->GPE       | Control   | 6          | Local delivery                                | Rx UDP DPort
NPE->GPE       | Control   | 7          | Inspect                                       | Rx UDP DPort
NPE->GPE       | Debug     | 8          | Monitor                                       | Rx UDP DPort
NPE->GPE       | Debug     | 9          | Log due to error in pkts                      | Rx UDP DPort

FwdKey = Tx UDP DPort, Tx UDP SPort, Tx IP DAddr
12
Construct ipv4 MN Internal header
[Flowchart: decision boxes "Drop bit set?", "Hit bit set?", "TTL expired?", "Local DL?", "IP option?" and "Redirect?", each with Yes/No branches, ending in "return"]
Actions shown in the flowchart:
  • Set NR bit in type
  • Set TTL bit in type, set Rx UDP DPort, Length = 4
  • Set LD bit in type, set Rx UDP DPort, Length = 4
  • Set OPT bit in type, set Rx UDP DPort, set TypeDependData, Length = 12
  • Set RD bit in type, set Rx UDP DPort, set TypeDependData, Length = 12
86 cycles for the worst case, 44 cycles for the common case
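
The lengths in the flowchart fit a simple rule: 4 bytes when only Rx UDP DPort is carried, and 4 + 8 = 12 bytes when the 8B type-dependent data is added. The sketch below encodes that rule. It uses the Type field values from the IPv4 Internal Header Format slide, although the flowchart speaks of setting bits in the type; the enum, macro and function names are hypothetical, and the no-route length is not given on the slide, so it is reported as unknown.

```c
/* Type field values as listed on the IPv4 Internal Header Format slide. */
typedef enum {
    MN_TYPE_NO_ROUTE       = 2,
    MN_TYPE_TTL_EXPIRED    = 3,
    MN_TYPE_IP_OPTION      = 4,
    MN_TYPE_REDIRECT       = 5,
    MN_TYPE_LOCAL_DELIVERY = 6
} mn_type;

#define MN_HDR_BASE_LEN  4 /* length stated for the TTL and LD cases */
#define MN_TYPE_DATA_LEN 8 /* TypeDependData (8B) */

/* Nonzero when the flowchart sets TypeDependData for this type. */
static int mn_type_has_data(mn_type t)
{
    return t == MN_TYPE_IP_OPTION || t == MN_TYPE_REDIRECT;
}

/* Length value per the flowchart, or -1 where the slide gives none. */
static int mn_hdr_length(mn_type t)
{
    switch (t) {
    case MN_TYPE_TTL_EXPIRED:
    case MN_TYPE_LOCAL_DELIVERY:
        return MN_HDR_BASE_LEN;
    case MN_TYPE_IP_OPTION:
    case MN_TYPE_REDIRECT:
        return MN_HDR_BASE_LEN + MN_TYPE_DATA_LEN;
    default:
        return -1; /* not stated on the slide */
    }
}
```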
13
Testing MR Header Format
[Figure: test setup with blocks Stub Parse, Dummy Lookup and Hdr Format]
Ring data from Stub Parse to Dummy Lookup (fields grouped into 32-bit rows):
  Buf Handle (32b)
  IP Pkt Length (16b) | IP Pkt Offset (16b)
  Lookup Key[143-112]: Slice ID/Rx UDP DPort (32b)
  Lookup Key[111-80]: DA (32b)
  Lookup Key[79-48]: SA (32b)
  Lookup Key[47-16]: Ports (32b)
  Lookup Key[15-0]: Proto/TCP_Flags (16b) | Exception Bits (12b) | L Flags (4b)
  Slice Data Ptr (32b)
  Code opt (4b) | Rsv2 (12b) | Rx UDP SPort (16b)
  Rx IP SAddr (32b)
Ring data from Dummy Lookup to Hdr Format (fields grouped into 32-bit rows):
  Buf Handle (32b)
  IP Pkt Length (16b) | IP Pkt Offset (16b)
  Rx UDP DPort (16b) | Slice ID (VLAN) (16b)
  Cntr Index (16b) | Rsvd (1b) | H (1b) | D (1b) | Exception Bits (12b) | LD (1b)
  Tx IP DAddr (32b)
  Tx UDP SPort (16b) | Tx UDP DPort (16b)
  Port (4b) | QID (20b) | DA (8b)
  Slice data pointer (32b)
  Code opt (4b) | Rsv2 (12b) | Rx UDP SPort (16b)
  Rx IP SAddr (32b)
Legend: H = Hit, D = Drop, LD = Local Delivery; Exception bit 0 = TTL, Exception bit 1 = IP Option
  • Dummy Lookup block enumerates all combinations
    of the five bits and generates corresponding NN
    ring data to Hdr Format.
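
Enumerating five independent bits gives 32 test cases. The sketch below shows one way to generate them, assuming the five bits are Hit, Drop, Local Delivery, TTL exception and IP option exception; the mapping of counter bits to flags and all names are hypothetical.

```c
#include <stdbool.h>

#define NUM_FLAG_BITS 5
#define NUM_COMBOS    (1u << NUM_FLAG_BITS) /* 32 combinations */

typedef struct {
    bool hit;
    bool drop;
    bool local_delivery;
    bool ttl_expired;
    bool ip_option;
} lookup_bits;

/* Map a counter value 0..31 to one combination of the five bits. */
static lookup_bits combo_to_bits(unsigned i)
{
    lookup_bits b;
    b.hit            = ((i >> 0) & 1u) != 0;
    b.drop           = ((i >> 1) & 1u) != 0;
    b.local_delivery = ((i >> 2) & 1u) != 0;
    b.ttl_expired    = ((i >> 3) & 1u) != 0;
    b.ip_option      = ((i >> 4) & 1u) != 0;
    return b;
}

/* Fill out[] with every combination and return how many were written. */
static unsigned fill_combos(lookup_bits out[NUM_COMBOS])
{
    unsigned i;
    for (i = 0; i < NUM_COMBOS; i++)
        out[i] = combo_to_bits(i);
    return i;
}
```

A test driver could walk the filled array and build one ring entry per combination.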

14
Possible Optimizations
Functions                              | Memory access      | Optimizations                                                     | Processing cycles (common case/worst case)
Dequeue ring_in data                   | NN 9W reads        | More efficient Dequeue                                            | 42/42 -10
Construct MN int hdr                   |                    | Reduce redundant assignments for worst case                       | 44/86 -15
Construct IP, UDP, Ethernet, VLAN hdr  |                    | Static fields only initialized by the first packet in each thread | 64/73 -20
Set IP checksum                        |                    |                                                                   | 12/12
Set UDP checksum                       |                    |                                                                   | 11/11
Write hdr to DRAM                      | DRAM 46-58B writes |                                                                   | 37/40
Inc Pre_queue Cnt                      | SRAM 8B writes     | Aligned SRAM writes, use assembler                                | 15/15 -6
Update buffer descriptor               | SRAM 10B writes    | Similar to DRAM writes                                            | 66/66 -30
Enqueue ring_out data                  | NN 3W writes       |                                                                   | 27/27
Total                                  |                    |                                                                   | 318/372 -81
15
Implementation Status
  • Add dynamic statistics
    • Packet counter for fast path packets
    • Packet counter for exception path packets
    • Packet counter per exception case
  • Decide which field in buffer descriptor to store
    counter index
  • Run 8-thread simulation