Title: Code Review for IPv4 Metarouter Header Format
Jing Lu jl1_at_arl.wustl.edu
Header Format
[Figure: block diagram of the metarouter pipeline showing the Rx, Substr Decap, Parse, Lookup, Header Format and Tx blocks]
- Main functions
  - Add the MN internal header (slow path only), the tunnel frame header (IP/UDP header) and the Ethernet VLAN header, based on:
    - Exception flags raised by the Parse block
      - TTL expired: bit 0 of the exception flags
      - IP option: bit 1 of the exception flags
    - Lookup result
      - Hit, Drop and Local delivery bits
    - If Rx UDP DPort equals Tx UDP SPort, the packet should be redirected
  - Increment the pre-queue packet counter and byte counter for each incoming packet, based on the counter index
  - Update the buffer descriptor with the new buffer/packet size, buffer offset and counter index
  - Pass the relevant fields to the QM
- NN communication
- Single thread
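The per-packet counter update listed above can be sketched in portable C. This is a minimal sketch, assuming one 8-byte record (4B packet count plus 4B byte count) per 16-bit counter index, which would line up with the "SRAM 8B writes" entry in the cycle table. The names `prequeue_cntr`, `prequeue_cntrs` and `inc_prequeue_cntr` are hypothetical, and a plain array stands in for the SRAM counter table, so no IXP intrinsics are shown.

```c
#include <stdint.h>

/* Cntr Index is 16 bits wide, so the table covers every possible index. */
#define NUM_COUNTERS 65536u

/* Hypothetical layout: one packet/byte counter pair per counter index. */
typedef struct {
    uint32_t pkts;   /* pre-queue packet counter */
    uint32_t bytes;  /* pre-queue byte counter   */
} prequeue_cntr;

/* Plain memory standing in for the SRAM counter table (assumption). */
static prequeue_cntr prequeue_cntrs[NUM_COUNTERS];

/* Count one incoming packet of `len` bytes against `cntr_index`. */
void inc_prequeue_cntr(uint16_t cntr_index, uint32_t len)
{
    prequeue_cntr *c = &prequeue_cntrs[cntr_index];
    c->pkts  += 1;
    c->bytes += len;
}
```

Which length is counted (IP packet length or MN frame length) is left to the caller, since the slide does not say.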
Where Is the Code
- Dispatch loop
  - IPv4_MR\src\dispatch_loop\PL\hdr_format_dl.c,h
  - IPv4_MR\src\dispatch_loop\PL\dl_source.c,h
  - IPv4_MR\src\dispatch_loop\PL\nn_rings.c,h
- Header format
  - IPv4_MR\src\hdr_format\PL\hdr_format.c,h
- IPv4 header format
  - IPv4_MR\src\ipv4\PL\ipv4_hdr_format.c,h
- External dependencies
  - Ring data format
    - IPv4_MR/src/dispatch_loop/PL/ring_formats.h
  - System definitions and memory locations
    - IPv4_MR/build/PL/dispatch_loop/dl_system.h
Required Includes
- Files
  - IXA_SDK_4.0\microengineC\src\intrinsic.c
  - IXA_SDK_4.0\microengineC\src\rtl.c
- Directories
  - IXA_SDK_4.0\src\library\microblocks_library\microc\
  - IXA_SDK_4.0\MicroengineC\include\..\..\..\..\
  - IXA_SDK_4.0\src\library\dataplane_library\microc\
- These are required to gain access to the buffer libraries and intrinsic functions.
Input and Output
- Input from Lookup to Hdr Format
  - Buf Handle (32b)
  - IP Pkt Length (16b), IP Pkt Offset (16b)
  - Rx UDP DPort (16b), Slice ID (VLAN) (16b)
  - Cntr Index (16b); Rsvd, H, D and LD flags (1b each); Exception Bits (12b)
  - Tx IP DAddr (32b)
  - Tx UDP SPort (16b), Tx UDP DPort (16b)
  - Port (4b), QID (20b), DA (8b)
  - Slice data pointer (32b)
  - Code opt (4b), Rsv2 (12b), Rx UDP SPort (16b)
  - Rx IP SAddr (32b)
- Output from Hdr Format (to the QM)
  - Buf Handle (32b)
  - Port (4b), QID (20b), Rsv_1 (4b), Rsv_2 (4b)
  - Cntr Index (16b), MN Frame Length (16b)
- H = Hit, D = Drop, LD = Local Delivery; Exception bit 0 = TTL, Exception bit 1 = IP Option
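The output entry is three 32-bit words, which matches the "NN 3W writes" in the cycle table. A minimal sketch of packing and unpacking it in portable C rather than Micro-C follows. It assumes each word holds its fields in the order listed on the slide, most significant first, with the reserved nibbles written as zero. The type and function names are hypothetical; the authoritative layout is whatever `ring_formats.h` defines.

```c
#include <stdint.h>

/* Hypothetical view of one Hdr Format -> QM ring entry. */
typedef struct {
    uint32_t buf_handle;
    uint8_t  port;          /* 4 bits used  */
    uint32_t qid;           /* 20 bits used */
    uint16_t cntr_index;
    uint16_t mn_frame_len;
} qm_entry;

/* Word 0: Buf Handle
 * Word 1: Port(4) | QID(20) | Rsv_1(4) | Rsv_2(4)   (assumed bit order)
 * Word 2: Cntr Index(16) | MN Frame Length(16)     (assumed bit order) */
void pack_ring_out(const qm_entry *e, uint32_t w[3])
{
    w[0] = e->buf_handle;
    w[1] = ((uint32_t)(e->port & 0xFu) << 28) | ((e->qid & 0xFFFFFu) << 8);
    w[2] = ((uint32_t)e->cntr_index << 16) | e->mn_frame_len;
}

void unpack_ring_out(const uint32_t w[3], qm_entry *e)
{
    e->buf_handle   = w[0];
    e->port         = (uint8_t)(w[1] >> 28);
    e->qid          = (w[1] >> 8) & 0xFFFFFu;
    e->cntr_index   = (uint16_t)(w[2] >> 16);
    e->mn_frame_len = (uint16_t)(w[2] & 0xFFFFu);
}
```

For example, port 3 and QID 0x12345 pack into word 1 as 0x31234500 under this assumed layout.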
Initialization
- Static configuration by XScale
  - Control block (12B)
    - Ethernet address
    - IP address (global IP)
  - Slice info table, per slice (36B)
    - GPE IP address (local IP)
    - NPE IP address (local IP)
    - GPE Ethernet address
    - UDP SRC port
    - UDP DST port
    - Port
    - QID for local delivery
    - QID for exception packets
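typedef struct _hdr_format_control_block {
    unsigned int eth_addr_hi32;     /* Ethernet address, upper 32 bits */
    unsigned int eth_addr_lo16;     /* Ethernet address, lower 16 bits */
    unsigned int this_ip_addr;      /* global IP address */
} hdr_format_control_block;         /* 3 x 4B = 12B */

typedef struct _hdr_format_slice_info_table {
    unsigned int gpe_ip_addr;       /* GPE IP address (local IP) */
    unsigned int npe_ip_addr;       /* NPE IP address (local IP) */
    unsigned int gpe_eth_addr_hi32; /* GPE Ethernet address, upper 32 bits */
    unsigned int gpe_eth_addr_lo16; /* GPE Ethernet address, lower 16 bits */
    unsigned int udp_src_port;      /* UDP SRC port */
    unsigned int udp_dst_port;      /* UDP DST port */
    unsigned int port;              /* output port */
    unsigned int ld_qid;            /* QID for local delivery */
    unsigned int excpt_qid;         /* QID for exception packets */
} hdr_format_slice_info_table;      /* 9 x 4B = 36B per slice */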
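Both structures store a 48-bit Ethernet address as a 32-bit high word plus a 16-bit low word. A small sketch of that split and its inverse follows, assuming byte 0 of the address is the most significant byte of `eth_addr_hi32` and that bytes 4 and 5 sit in the low 16 bits of `eth_addr_lo16`. The helper names `mac_to_words` and `words_to_mac` are hypothetical.

```c
#include <stdint.h>

/* Split a 6-byte Ethernet address into hi32/lo16 words (assumed byte order). */
void mac_to_words(const uint8_t mac[6], uint32_t *hi32, uint32_t *lo16)
{
    *hi32 = ((uint32_t)mac[0] << 24) | ((uint32_t)mac[1] << 16) |
            ((uint32_t)mac[2] << 8)  |  (uint32_t)mac[3];
    *lo16 = ((uint32_t)mac[4] << 8)  |  (uint32_t)mac[5];
}

/* Rebuild the 6-byte address, e.g. when writing DstAddr/SrcAddr of the
 * Ethernet VLAN header. */
void words_to_mac(uint32_t hi32, uint32_t lo16, uint8_t mac[6])
{
    mac[0] = (uint8_t)(hi32 >> 24);
    mac[1] = (uint8_t)(hi32 >> 16);
    mac[2] = (uint8_t)(hi32 >> 8);
    mac[3] = (uint8_t)hi32;
    mac[4] = (uint8_t)(lo16 >> 8);
    mac[5] = (uint8_t)lo16;
}
```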
Global Variables
- Externally defined global variables
  - In hdr_format_dl.c
    - ring_in
    - ring_out
    - dlNextBlock
- Initialization variables shared by all threads
  - In hdr_format.c
    - this_ip_addr
    - eth_addr_hi32
    - eth_addr_lo16
    - partial_ip_cksum (computed over the IP header fields that are known in advance)
  - header_format_init() reads the control block in SRAM and initializes these variables
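`partial_ip_cksum` works because the IP header checksum is a ones'-complement sum of 16-bit words: the words that never change can be summed once at init time and the per-packet words added later. The sketch below rests on stated assumptions. The constant words are taken to be Ver/HLen/ToS = 0x4500, an Identification of 0, the Flags/FragOffset and TTL passed in, protocol UDP, and the source address. Len and Dst Addr are treated as the per-packet fields. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define IPV4_VER_IHL_TOS 0x4500u  /* version 4, 20-byte header, ToS 0 (assumed) */
#define IP_PROTO_UDP     17u

/* Sum of the 16-bit header words shared by every tunnel packet. Left
 * unfolded and uncomplemented so per-packet words can be added later. */
uint32_t ip_partial_cksum(uint32_t src_addr, uint8_t ttl, uint16_t flags_frag)
{
    uint32_t sum = 0;
    sum += IPV4_VER_IHL_TOS;                       /* Ver/HLen/ToS           */
    sum += 0x0000u;                                /* Identification (assumed 0) */
    sum += flags_frag;                             /* Flags/FragOffset       */
    sum += ((uint32_t)ttl << 8) | IP_PROTO_UDP;    /* TTL/Protocol           */
    sum += src_addr >> 16;                         /* Src Addr, high half    */
    sum += src_addr & 0xFFFFu;                     /* Src Addr, low half     */
    return sum;
}

/* Add the per-packet words (Len, Dst Addr), fold the carries back into
 * 16 bits and complement to get the Hdr Cksum value. */
uint16_t ip_finish_cksum(uint32_t partial, uint16_t total_len, uint32_t dst_addr)
{
    uint32_t sum = partial + total_len + (dst_addr >> 16) + (dst_addr & 0xFFFFu);
    sum = (sum & 0xFFFFu) + (sum >> 16);
    sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For the header 4500 0073 0000 4000 4011 with source 192.168.0.1 and destination 192.168.0.199, `ip_finish_cksum(ip_partial_cksum(0xC0A80001, 0x40, 0x4000), 0x0073, 0xC0A800C7)` gives 0xB861.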
Header Data Structure
- Ethernet VLAN header (18B)
  - DstAddr (6B)
  - SrcAddr (6B)
  - Type 802.1Q (2B)
  - VLAN (2B)
  - Type IP (2B)
- IP header (20B)
  - Ver/HLen/ToS (2B)
  - Len (2B)
  - ID/Flags/FragOffset (4B)
  - TTL (1B)
  - Protocol UDP (1B)
  - Hdr Cksum (2B)
  - Src Addr (4B)
  - Dst Addr (4B)
- UDP header (8B)
  - Src Port (2B)
  - Dst Port (2B)
  - UDP length (2B)
  - UDP checksum (2B)
- MN internal header (8B or 16B); fields shown:
  - Rsvd, Type (4B)
  - hdr_length (2B)
  - Rx UDP DPort (2B)
  - Rx IP SAddr (4B)
  - Rx UDP SPort (2B)
  - Type dependent data (8B)
- The slide marks fields as either "Same for all pkts" or "Vary per pkt".
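The 18B + 20B + 8B layout above adds up to the 46-byte tunnel frame header. A minimal sketch of writing it in network byte order follows, in portable C. It assumes ID/Flags/FragOffset are zero, leaves both checksums zero for the separate "Set IP checksum" and "Set UDP checksum" steps, and carries the slice ID in the VLAN tag as the input format suggests. `tunnel_params`, `build_tunnel_hdr`, `put16` and `put32` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define ETH_VLAN_HDR_LEN 18
#define IP_HDR_LEN       20
#define UDP_HDR_LEN       8
#define TUNNEL_HDR_LEN  (ETH_VLAN_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN) /* 46 */

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)(v >> 16)); put16(p + 2, (uint16_t)v); }

/* Hypothetical per-packet inputs. */
typedef struct {
    uint8_t  dst_mac[6], src_mac[6];
    uint16_t vlan;                  /* VLAN tag carrying the slice ID */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint16_t payload_len;           /* bytes after the UDP header */
} tunnel_params;

void build_tunnel_hdr(uint8_t *h, const tunnel_params *t, uint8_t ttl)
{
    memset(h, 0, TUNNEL_HDR_LEN);   /* checksums and ID/flags stay zero */
    memcpy(h, t->dst_mac, 6);       /* DstAddr */
    memcpy(h + 6, t->src_mac, 6);   /* SrcAddr */
    put16(h + 12, 0x8100);          /* Type 802.1Q */
    put16(h + 14, t->vlan);         /* VLAN */
    put16(h + 16, 0x0800);          /* Type IP */

    uint8_t *ip = h + ETH_VLAN_HDR_LEN;
    put16(ip, 0x4500);              /* Ver 4, HLen 5, ToS 0 */
    put16(ip + 2, (uint16_t)(IP_HDR_LEN + UDP_HDR_LEN + t->payload_len));
    ip[8] = ttl;                    /* TTL */
    ip[9] = 17;                     /* Protocol UDP */
    put32(ip + 12, t->src_ip);      /* Src Addr */
    put32(ip + 16, t->dst_ip);      /* Dst Addr */

    uint8_t *udp = ip + IP_HDR_LEN;
    put16(udp, t->src_port);        /* Src Port */
    put16(udp + 2, t->dst_port);    /* Dst Port */
    put16(udp + 4, (uint16_t)(UDP_HDR_LEN + t->payload_len)); /* UDP length */
}
```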
Function and Performance
Function | Memory access | Processing cycles (common/worst case)
Dequeue ring_in data | NN 9W reads | 42/42
Construct MN int hdr | | 44/86
Construct IP, UDP, Ethernet, VLAN hdr | | 64/73
Set IP checksum | | 12/12
Set UDP checksum | | 11/11
Write hdr to DRAM | DRAM 46-58B writes | 37/40
Inc Pre_queue Cnt | SRAM 8B writes | 15/15
Update buffer descriptor | SRAM 10B writes | 66/66
Enqueue ring_out data | NN 3W writes | 27/27
Total | | 318/372
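Performance
- 372 cycles of CPU processing per packet (worst case)
- 1300 cycles of latency
- Expected performance with 90B minimum IPv4 packets (78B minimum IPv4 MN packet + 12B IFS): (201/372) × 5 Gbps ≈ 2.7 Gbps
  - 201 is the per-packet cycle budget at 5 Gbps: a 90B (720-bit) packet arrives every 144 ns, which is about 201 cycles assuming a 1.4 GHz microengine clock
- To achieve 5 Gbps, two MEs need to run in parallel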
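The throughput estimate can be reproduced with a few lines of arithmetic. This sketch assumes a 1.4 GHz microengine clock (the slide does not state it) and treats MEs as adding their cycle budgets linearly; the function names are hypothetical.

```c
/* Assumed microengine clock; not stated on the slide. */
#define ME_CLOCK_HZ 1400000000.0

/* Cycles available per packet when `wire_bytes` packets arrive back to
 * back at `link_bps`: 90B = 720 bits, 720 / 5e9 = 144 ns, ~201.6 cycles. */
double cycle_budget(double wire_bytes, double link_bps)
{
    double pkt_time_s = wire_bytes * 8.0 / link_bps;
    return pkt_time_s * ME_CLOCK_HZ;
}

/* Link rate one ME sustains when each packet costs `cycles_per_pkt`. */
double one_me_gbps(double budget, double cycles_per_pkt, double link_gbps)
{
    return budget / cycles_per_pkt * link_gbps;
}

/* Smallest number of MEs whose combined budget covers the per-packet cost. */
int mes_needed(double budget, double cycles_per_pkt)
{
    int n = 1;
    while (n * budget < cycles_per_pkt)
        n++;
    return n;
}
```

With the worst-case cost of 372 cycles, `one_me_gbps(201, 372, 5)` is about 2.70 Gbps and `mes_needed` returns 2, matching the slide.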
IPv4 Internal Header Format
- Header fields
  - Type (28b), followed by 4 zero bits (0000)
  - Length (2B)
  - Rx UDP DPort (2B)
  - Type dependent data (8B)
- Other fields shown: Tx UDP DPort (2B), Tx UDP SPort (2B), Tx IP DAddr (4B), Rx IP SAddr (4B), Rx UDP SPort (2B)
- FwdKey = Tx UDP DPort, Tx UDP SPort, Tx IP DAddr

Path | Category | Type field | Reason | Outgoing MN Internal Hdr
GPE->NPE | | 0 | Reclassify | Rx UDP DPort if set, otherwise Rx UDP DPort, FwdKey
GPE->NPE | | | |
NPE->Egress LC | Fast path | | | No MN Int Hdr
NPE->GPE | Exception | 2 | No route | Rx UDP DPort
NPE->GPE | Exception | 3 | Expired TTL | Rx UDP DPort
NPE->GPE | Exception | 4 | IP w/ options | Rx UDP DPort, FwdKey
NPE->GPE | Exception | 5 | Redirect due to Rx UDP DPort = Tx UDP SPort | Rx UDP DPort, FwdKey
NPE->GPE | Control | 6 | Local delivery | Rx UDP DPort
NPE->GPE | Control | 7 | Inspect | Rx UDP DPort
NPE->GPE | Debug | 8 | Monitor | Rx UDP DPort
NPE->GPE | Debug | 9 | Log due to error in pkts | Rx UDP DPort
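The table and the 8B/16B header size suggest that Length counts the bytes after the type word: 4 when only Rx UDP DPort follows, 12 when the 8B FwdKey is appended. This interpretation is inferred, not stated. A sketch of writing the header under that reading follows, assuming the 28-bit type occupies the upper bits of the first word with the 4 zero bits last, and the FwdKey fields follow in the order listed. `fwd_key` and `write_mn_hdr` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)(v >> 16)); put16(p + 2, (uint16_t)v); }

/* FwdKey: Tx UDP DPort, Tx UDP SPort, Tx IP DAddr (8B). */
typedef struct {
    uint16_t tx_udp_dport;
    uint16_t tx_udp_sport;
    uint32_t tx_ip_daddr;
} fwd_key;

/* Writes the MN internal header into h (room for 16 bytes) and returns
 * its size: 8B without type dependent data, 16B with a FwdKey. */
unsigned write_mn_hdr(uint8_t *h, uint32_t type28, uint16_t rx_udp_dport,
                      const fwd_key *key)
{
    uint16_t length = (key != NULL) ? 12 : 4;
    put32(h, (type28 & 0x0FFFFFFFu) << 4);   /* Type (28b), then 0000 */
    put16(h + 4, length);                    /* Length */
    put16(h + 6, rx_udp_dport);              /* Rx UDP DPort */
    if (key != NULL) {                       /* type dependent data */
        put16(h + 8,  key->tx_udp_dport);
        put16(h + 10, key->tx_udp_sport);
        put32(h + 12, key->tx_ip_daddr);
    }
    return 4u + length;
}
```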
Construct IPv4 MN Internal Header
[Figure: flowchart for constructing the MN internal header]
- Decisions tested: Drop bit set?, Hit bit set?, TTL expired? (tested on two branches), Local DL?, Redirect?, IP option?
- Actions
  - No route: set NR bit in type
  - TTL expired: set TTL bit in type, set Rx UDP DPort, Length = 4
  - Local delivery: set LD bit in type, set Rx UDP DPort, Length = 4
  - IP option: set OPT bit in type, set Rx UDP DPort, set TypeDependData, Length = 12
  - Redirect: set RD bit in type, set Rx UDP DPort, set TypeDependData, Length = 12
- 86 cycles in the worst case, 44 cycles in the common case
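The decisions above can be sketched as a single function. The exact wiring of the flowchart did not survive, so the test order here is an assumption: Drop first, then TTL expired (which appears on both branches), then miss (no route), then local delivery, redirect and IP option. The redirect test assumes equality of Rx UDP DPort and Tx UDP SPort. No route is given Length 4 because the table lists only Rx UDP DPort for it. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXC_TTL_EXPIRED (1u << 0)   /* exception bit 0 */
#define EXC_IP_OPTION   (1u << 1)   /* exception bit 1 */

typedef enum {
    MN_DROP, MN_FAST_PATH, MN_NO_ROUTE, MN_TTL,
    MN_LOCAL_DL, MN_REDIRECT, MN_IP_OPTION
} mn_reason;

typedef struct {
    mn_reason reason;
    uint16_t  length;   /* MN header Length field; 0 = no MN header */
} mn_decision;

mn_decision decide_mn_hdr(bool drop, bool hit, bool local_dl, uint16_t exc_bits,
                          uint16_t rx_udp_dport, uint16_t tx_udp_sport)
{
    mn_decision d = { MN_FAST_PATH, 0 };
    if (drop)                        { d.reason = MN_DROP;      return d; }
    if (exc_bits & EXC_TTL_EXPIRED)  { d.reason = MN_TTL;       d.length = 4;  return d; }
    if (!hit)                        { d.reason = MN_NO_ROUTE;  d.length = 4;  return d; }
    if (local_dl)                    { d.reason = MN_LOCAL_DL;  d.length = 4;  return d; }
    if (rx_udp_dport == tx_udp_sport){ d.reason = MN_REDIRECT;  d.length = 12; return d; }
    if (exc_bits & EXC_IP_OPTION)    { d.reason = MN_IP_OPTION; d.length = 12; return d; }
    return d;   /* hit, no exception: fast path, no MN internal header */
}
```

The fast-path case falls through every test, which is consistent with the common case costing fewer cycles than the worst case, although the exact cycle counts depend on the real implementation.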
Testing MR Header Format
[Figure: test setup with Stub Parse feeding Dummy Lookup, which feeds Hdr Format]
- Stub Parse output
  - Buf Handle (32b)
  - IP Pkt Length (16b), IP Pkt Offset (16b)
  - Lookup Key 143-112: Slice ID/Rx UDP DPort (32b)
  - Lookup Key 111-80: DA (32b)
  - Lookup Key 79-48: SA (32b)
  - Lookup Key 47-16: Ports (32b)
  - Lookup Key 15-0: Proto/TCP_Flags (16b); Exception Bits (12b); L Flags (4b)
  - Slice Data Ptr (32b)
  - Code opt (4b), Rsv2 (12b), Rx UDP SPort (16b)
  - Rx IP SAddr (32b)
- Dummy Lookup output: same format as the Lookup input to Hdr Format (Buf Handle; IP Pkt Length/Offset; Rx UDP DPort, Slice ID (VLAN); Cntr Index, Rsvd/H/D/LD flags, Exception Bits; Tx IP DAddr; Tx UDP SPort/DPort; Port, QID, DA; Slice data pointer; Code opt, Rsv2, Rx UDP SPort; Rx IP SAddr)
- H = Hit, D = Drop, LD = Local Delivery; Exception bit 0 = TTL, Exception bit 1 = IP Option
- The Dummy Lookup block enumerates all combinations of the five bits (H, D, LD, TTL and IP option) and generates the corresponding NN ring data for Hdr Format.
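Enumerating five independent bits gives 2^5 = 32 test vectors. A sketch of generating the flags word for each combination follows. It assumes the Cntr Index occupies the upper 16 bits and the lower half is laid out, most significant first, as Rsvd(1) H(1) D(1) LD(1) Exception Bits(12), with TTL in exception bit 0 and IP option in bit 1. The exact bit positions of H, D and LD are a guess, and the names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions in the lower 16 bits of the Cntr Index word. */
#define FLAG_HIT   (1u << 14)
#define FLAG_DROP  (1u << 13)
#define FLAG_LD    (1u << 12)
#define EXC_TTL    (1u << 0)    /* exception bit 0: TTL expired */
#define EXC_IPOPT  (1u << 1)    /* exception bit 1: IP option   */

/* Map a 5-bit test vector (one bit per condition) onto the word. */
uint32_t make_flags_word(unsigned combo, uint16_t cntr_index)
{
    uint32_t f = 0;
    if (combo & 1u)  f |= FLAG_HIT;
    if (combo & 2u)  f |= FLAG_DROP;
    if (combo & 4u)  f |= FLAG_LD;
    if (combo & 8u)  f |= EXC_TTL;
    if (combo & 16u) f |= EXC_IPOPT;
    return ((uint32_t)cntr_index << 16) | f;
}

/* Fill `out` with one word per combination; returns the count (32). */
unsigned enumerate_combinations(uint32_t out[32], uint16_t cntr_index)
{
    unsigned n = 0;
    for (unsigned combo = 0; combo < 32; combo++)
        out[n++] = make_flags_word(combo, cntr_index);
    return n;
}
```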
Possible Optimizations
Function | Memory access | Processing cycles (common/worst case) | Possible saving | Optimization
Dequeue ring_in data | NN 9W reads | 42/42 | -10 |
Construct MN int hdr | | 44/86 | -15 | Reduce redundant assignments for the worst case
Construct IP, UDP, Ethernet, VLAN hdr | | 64/73 | -20 | Initialize static fields only for the first packet in each thread
Set IP checksum | | 12/12 | |
Set UDP checksum | | 11/11 | |
Write hdr to DRAM | DRAM 46-58B writes | 37/40 | |
Inc Pre_queue Cnt | SRAM 8B writes | 15/15 | -6 | Aligned SRAM writes, use assembler
Update buffer descriptor | SRAM 10B writes | 66/66 | -30 | Aligned SRAM writes, use assembler
Enqueue ring_out data | NN 3W writes | 27/27 | |
Total | | 318/372 | -81 |
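The "static fields only initialized by the first packet in each thread" idea can be sketched as a header template that is filled once and then only patched per packet. In this sketch a file-scope static stands in for per-thread storage on the microengine (an assumption), only a few static fields are shown, and IP Len and Dst Addr are assumed to be the per-packet fields. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define TUNNEL_HDR_LEN 46   /* Ethernet VLAN (18B) + IP (20B) + UDP (8B) */

/* Stand-in for a per-thread header template (assumption). */
static uint8_t  hdr_template[TUNNEL_HDR_LEN];
static int      template_ready = 0;
static unsigned full_inits = 0;   /* times the static fields were written */

/* Writes fields that are identical for every packet (subset shown). */
static void init_static_fields(uint8_t *h)
{
    memset(h, 0, TUNNEL_HDR_LEN);
    h[12] = 0x81; h[13] = 0x00;   /* Type 802.1Q */
    h[16] = 0x08; h[17] = 0x00;   /* Type IP */
    h[18] = 0x45;                 /* Ver 4, HLen 5 */
    h[27] = 17;                   /* Protocol UDP */
    full_inits++;
}

/* First call fills the template; later calls patch per-packet fields only. */
const uint8_t *build_hdr(uint16_t ip_total_len, uint32_t dst_ip)
{
    if (!template_ready) {
        init_static_fields(hdr_template);
        template_ready = 1;
    }
    hdr_template[20] = (uint8_t)(ip_total_len >> 8);   /* IP Len */
    hdr_template[21] = (uint8_t)ip_total_len;
    hdr_template[34] = (uint8_t)(dst_ip >> 24);        /* IP Dst Addr */
    hdr_template[35] = (uint8_t)(dst_ip >> 16);
    hdr_template[36] = (uint8_t)(dst_ip >> 8);
    hdr_template[37] = (uint8_t)dst_ip;
    return hdr_template;
}
```

The saving comes from skipping the constant writes on every packet after the first. The trade-off is that each thread must keep its own template copy.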
Implementation Status
- Add dynamic statistics
  - Packet counter for fast path packets
  - Packet counter for exception path packets
  - Packet counter per exception case
- Decide which field in the buffer descriptor stores the counter index
- Run an 8-thread simulation
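The planned dynamic statistics could take a shape like the sketch below. It is an assumption on two counts: every packet that leaves with an MN internal header is counted on the exception path, and the per-case counters are indexed by the type field values 0 to 9 from the internal header table. The struct, function and "negative type means fast path" convention are hypothetical.

```c
#include <stdint.h>

#define MN_NUM_TYPES 10   /* type field values 0..9 in the internal header table */

typedef struct {
    uint32_t fast_path_pkts;
    uint32_t exception_path_pkts;
    uint32_t per_case[MN_NUM_TYPES];   /* indexed by type field value */
} hdr_format_stats;

/* Hypothetical convention: a negative type means fast path (no MN header). */
void count_pkt(hdr_format_stats *s, int type)
{
    if (type < 0) {
        s->fast_path_pkts++;
        return;
    }
    s->exception_path_pkts++;
    if (type < MN_NUM_TYPES)
        s->per_case[type]++;
}
```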