Title: Block Design Review: ONL Header Format
1 Block Design Review: ONL Header Format
Michael Wilson, mlw2_at_arl.wustl.edu
http://www.arl.wustl.edu/projects/techX
2 Revision History
- 4/10/07 (MLW)
  - Released
- 4/11/07 (MLW)
  - Updates from feedback at 10 April meeting
3 Header Format Inputs/Outputs
[Figure: ONL NP router block diagram. Blocks shown: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QM (1 ME), HdrFmt (1 ME), Tx (1 ME), Plugin1 through Plugin5, and the xScale. Interconnects shown: NN rings, a Scratch Ring, and an SRAM Ring. Memories shown: TCAM, Assoc. Data ZBT-SRAM, and SRAM (labels 32KW, 64KW, 32KW Each).]
- slide taken from ONL_NProuter.ppt
5 Contents
- Overview
- Latency Analysis
- Code Locations (Planned)
- Test Procedures (Planned)
- Implementation Status
6 Overview
- Initialization
  - Initialize local table of Source MAC addresses for output ports
- Processing (Main Loop)
  - Receive handle from QM
  - Copy to output registers
    - Buffer Handle (from NN ring if not chained, from buffer descriptor Buffer_Next otherwise)
    - Destination MAC, EtherType (from buffer descriptor)
    - Source MAC address (from local memory, indexed by port)
  - If chained, free the header buffer
  - Update Stats (index from buffer descriptor)
  - Forward packet to TX
  - Update TX Counters
- Header Format will be written in C, not microcode
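
The main-loop steps above can be sketched in plain C. This is a host-side illustration under stated assumptions, not the microengine code: the names `buf_desc_t`, `tx_msg_t`, `src_mac_table`, `hdrfmt_init`, `hdrfmt_step`, and `NUM_PORTS` are hypothetical, the `chained` flag is passed in explicitly, and ordinary memory accesses stand in for dl_source(), dl_sink(), and the ring and SRAM operations. Returning -1 for an out-of-range port is only a placeholder, since that behavior is still undecided.

```c
#include <stdint.h>
#include <string.h>

#define NUM_PORTS 5            /* assumption: illustrative port count */

/* Hypothetical subset of the buffer descriptor used by Header Format. */
typedef struct {
    uint32_t buffer_next;      /* next buffer handle when chained */
    uint8_t  dst_mac[6];
    uint16_t ether_type;
    uint16_t stats_index;
} buf_desc_t;

/* Hypothetical record forwarded to TX. */
typedef struct {
    uint32_t buf_handle;
    uint8_t  dst_mac[6];
    uint8_t  src_mac[6];
    uint16_t ether_type;
} tx_msg_t;

/* Local table of source MAC addresses, one per output port. */
static uint8_t src_mac_table[NUM_PORTS][6];

/* Initialization: fill the local source MAC table. */
void hdrfmt_init(const uint8_t macs[NUM_PORTS][6]) {
    memcpy(src_mac_table, macs, sizeof src_mac_table);
}

/* One pass of the main loop for a single handle.
 * Returns 0 on success, -1 if the port is out of range (placeholder).
 * *freed_handle receives the header buffer handle when chained, else 0. */
int hdrfmt_step(uint32_t handle, unsigned port, int chained,
                const buf_desc_t *desc, tx_msg_t *out,
                uint32_t *freed_handle, uint32_t *stats_counters) {
    if (port >= NUM_PORTS) return -1;

    /* Copy to output registers. */
    out->buf_handle = chained ? desc->buffer_next : handle;
    memcpy(out->dst_mac, desc->dst_mac, 6);
    memcpy(out->src_mac, src_mac_table[port], 6);
    out->ether_type = desc->ether_type;

    /* If chained, the header buffer is the one to free. */
    *freed_handle = chained ? handle : 0;

    /* Update stats, indexed from the buffer descriptor. */
    stats_counters[desc->stats_index] += 1;
    return 0;
}
```

The caller is assumed to supply a `stats_counters` array large enough for every `stats_index` value it uses.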
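7 Latency Analysis
Critical Path Latency: 360 Cycles
[Flowchart. Steps shown, in order: dl_source (negligible cycles); Is Valid? (No / Yes); Read Buffer Descriptor (150 cycles); Read Source MAC; Is Chained? (No / Yes); Write Buffer_Next = NULL; Write Stats; dl_sink (60 cycles). The figure carries further latency labels of 150 cycles and 60 cycles; which steps they belong to is not recoverable from the extracted text.]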
8 Performance
- What is our performance target?
- To hit 5 Gb rate
  - Minimum Ethernet frame: 76B
    - 64B frame + 12B InterFrame Spacing
  - 5 Gb/sec * 1B/8b * 1 packet/76B = 8.22 Mpkt/sec
- IXP ME processing
  - 1.4 GHz clock rate
  - 1.4 Gcycle/sec * 1 sec/8.22 Mpkt = 170.3 cycles per packet
  - compute budget (MEs * 170)
    - 1 ME: 170 cycles
    - 2 ME: 340 cycles
    - 3 ME: 510 cycles
    - 4 ME: 680 cycles
  - latency budget (threads * 170)
    - 1 ME: 8 threads: 1360 cycles
    - 2 ME: 16 threads: 2720 cycles
    - 3 ME: 24 threads: 4080 cycles
    - 4 ME: 32 threads: 5440 cycles
- slide taken from ONL_NProuter.ppt
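
The budget arithmetic above can be reproduced with a few lines of C. This is a small sketch for checking the numbers; the function names `pkts_per_sec`, `cycles_per_pkt`, and `print_budgets` are made up for illustration.

```c
#include <stdio.h>

/* Packets per second at a given line rate and per-packet wire size. */
static double pkts_per_sec(double bits_per_sec, double bytes_per_pkt) {
    return bits_per_sec / 8.0 / bytes_per_pkt;
}

/* Cycles available per packet on one ME at the given clock rate. */
static double cycles_per_pkt(double clock_hz, double pkt_rate) {
    return clock_hz / pkt_rate;
}

/* Prints the packet rate, per-packet budget, and the per-ME tables.
 * The tables use the slide's rounded figure of 170 cycles per packet. */
void print_budgets(void) {
    double rate = pkts_per_sec(5e9, 64 + 12);   /* 64B frame + 12B spacing */
    double per_pkt = cycles_per_pkt(1.4e9, rate);
    printf("%.2f Mpkt/sec, %.1f cycles/packet\n", rate / 1e6, per_pkt);
    for (int me = 1; me <= 4; me++)
        printf("%d ME: compute %d cycles, latency %d cycles (%d threads)\n",
               me, me * 170, me * 8 * 170, me * 8);
}
```

With unrounded intermediate values the per-packet budget comes out near 170.2 cycles; the 170.3 figure results from first rounding the packet rate to 8.22 Mpkt/sec. Either way the tables use 170.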
9 dl_sink Semantics
- One of my optimizations requires a change to dl_sink semantics. In pseudo-code:
  - signal_t sig1, sig2, sig3
  - send_stats(stats, sig1) // 60 cycles
  - free_block(hdr_buf, sig2) // 60 cycles
  - dl_sink(data_buf, sig3) // 60 cycles
  - wait(sig1, sig2, sig3) // 60+60+60 -> 60
- As of the 10 April meeting, this optimization is no longer necessary for Header Format.
  - Header Format has enough slack to skip exotic optimizations
  - Header Format can start all of the scratch ring writes, then call dl_sink, and do the wait after dl_sink. PLC does not have this option, but this doesn't impact Header Format.
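
The benefit of issuing the three operations before a single wait can be illustrated with a simple latency model. This is an idealized sketch, assuming the operations overlap fully and ignoring issue cost and contention; `serial_latency` and `overlapped_latency` are hypothetical helper names.

```c
/* Latency if each operation is issued and waited on before the next. */
unsigned serial_latency(const unsigned *lat, int n) {
    unsigned sum = 0;
    for (int i = 0; i < n; i++) sum += lat[i];
    return sum;
}

/* Latency if all operations are issued first and waited on together:
 * under this idealized model the wait ends when the slowest one does. */
unsigned overlapped_latency(const unsigned *lat, int n) {
    unsigned max = 0;
    for (int i = 0; i < n; i++)
        if (lat[i] > max) max = lat[i];
    return max;
}
```

For three 60-cycle operations, the serial model gives 180 cycles and the overlapped model gives 60.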
10 File locations (in /ONL_Router/)
- Code
  - src/hdrFormat/ONL/hdrfmt.c
- Includes
  - src/dispatch_loop/ONL/dl_source.h,c
    - dl_source() and dl_sink() functions
11 Test and Validation
- All validation tests will be done with 8 threads
- Header Format has no loops and only two conditionals. All code paths will be tested once.
  - Invalid handle (Valid bit not set)
  - Unchained packet
  - Chained packet
- Need to decide correct behavior in the face of erroneous input (port out of range)
- Test back-pressure from TX through HdrFmt to QM
- HdrFormat will be tested at high speeds to ensure I/O contention is not an issue
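
The three code paths listed above follow from the two conditionals, which can be sketched as a small classifier. This is only an illustration: the Valid bit position (`HANDLE_VALID_BIT`), the use of a non-zero Buffer_Next to mean "chained", and the names `hdrfmt_path_t` and `classify_path` are all assumptions, not taken from the ONL code.

```c
#include <stdint.h>

/* Assumption: bit 31 of the handle is the Valid bit. The real position
 * is defined by the QM-to-HdrFmt interface, not by this sketch. */
#define HANDLE_VALID_BIT (1u << 31)

typedef enum { PATH_INVALID, PATH_UNCHAINED, PATH_CHAINED } hdrfmt_path_t;

/* First conditional: is the handle valid?
 * Second conditional: is the packet chained (Buffer_Next non-zero)? */
hdrfmt_path_t classify_path(uint32_t handle, uint32_t buffer_next) {
    if (!(handle & HANDLE_VALID_BIT)) return PATH_INVALID;
    return buffer_next != 0 ? PATH_CHAINED : PATH_UNCHAINED;
}
```

One test input per enum value is enough to exercise every path once.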
12 Implementation Status
- Still in pseudo-code
  - Working on a C-equivalent of the HdrFmt Stub as a framework for my implementation
- Bugs
  - Doesn't compile, as there is no source yet.
- Untested
  - Everything
- Optimizations not taken (but available if needed later)
  - The Buffer_Next field of the buffer descriptor can be read and written back-to-back because the memory controller guarantees in-order execution. Thus, we don't need to read, check to see if we need to write, and then write. We can issue both at once and worry afterward. This won't work with multi-buffer payload, but neither will the rest of Header Format.
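
The back-to-back read and write of Buffer_Next can be modeled on a host machine as follows. This is a sketch under the stated in-order assumption, not IXP code: `mem_cmd_t`, `run_in_order`, and `read_and_clear_next` are hypothetical names, and a simple loop stands in for the memory controller's command queue.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { CMD_READ, CMD_WRITE } cmd_kind_t;

typedef struct {
    cmd_kind_t kind;
    uint32_t  *addr;     /* word being accessed */
    uint32_t   value;    /* data for CMD_WRITE */
    uint32_t  *result;   /* destination for CMD_READ */
} mem_cmd_t;

/* Model of an in-order controller: commands complete in issue order. */
void run_in_order(const mem_cmd_t *cmds, int n) {
    for (int i = 0; i < n; i++) {
        if (cmds[i].kind == CMD_READ)
            *cmds[i].result = *cmds[i].addr;
        else
            *cmds[i].addr = cmds[i].value;
    }
}

/* Issue the read of Buffer_Next and the write of NULL together.
 * Because the read is ordered first, it returns the old value;
 * a non-zero result means the packet was chained. */
uint32_t read_and_clear_next(uint32_t *buffer_next) {
    uint32_t old = 0;
    mem_cmd_t cmds[2] = {
        { CMD_READ,  buffer_next, 0, &old },
        { CMD_WRITE, buffer_next, 0, NULL },
    };
    run_in_order(cmds, 2);
    return old;
}
```

In the unchained case the write is redundant (it stores NULL over NULL), which is the "worry afterward" part: the check happens after both commands are already issued.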
13 Extra Slides
14 ONL Buffer Descriptor
LW0: Buffer_Next (32b)
LW1: Buffer_Size (16b)
LW2: Reserved (4b) | Free_list 0000 (4b) | Ref_Cnt (8b) | Packet_Size (16b)
LW3: MAC DAddr_47_32 (16b) | Stats Index (16b)
LW4: MAC DAddr_31_00 (32b)
LW5: Reserved (16b) | EtherType (16b)
LW6: Reserved (32b)
LW7: Packet_Next (32b)
Legend (the mapping from legend entries to fields is not recoverable from the extracted text):
- 1 Written by Rx, Added to by Copy, Decremented by Freelist Mgr
- Written by Freelist Mgr
- Written by Rx
- Written by Copy
- Written by Rx and Plugins
- Written by QM
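
The descriptor layout above can be sketched in C as eight 32-bit longwords with accessor functions. The bit positions are an assumption: fields are taken to be packed within each longword in the order listed, most-significant bits first, which the slide text does not confirm. The names `onl_buf_desc_t` and `desc_*` are hypothetical. Plain shifts and masks are used instead of C bitfields because bitfield ordering is implementation-defined.

```c
#include <stdint.h>

/* Eight 32-bit longwords, LW0..LW7, as listed on the slide. */
typedef struct {
    uint32_t lw[8];
} onl_buf_desc_t;

/* Assumed packing: listed order, most-significant bits first. */
static inline uint32_t desc_buffer_next(const onl_buf_desc_t *d) { return d->lw[0]; }
static inline uint32_t desc_ref_cnt(const onl_buf_desc_t *d)     { return (d->lw[2] >> 16) & 0xFFu; }
static inline uint32_t desc_packet_size(const onl_buf_desc_t *d) { return d->lw[2] & 0xFFFFu; }
static inline uint32_t desc_stats_index(const onl_buf_desc_t *d) { return d->lw[3] & 0xFFFFu; }
static inline uint32_t desc_ether_type(const onl_buf_desc_t *d)  { return d->lw[5] & 0xFFFFu; }
static inline uint32_t desc_packet_next(const onl_buf_desc_t *d) { return d->lw[7]; }

/* Destination MAC: bits 47..32 from LW3 (upper half), bits 31..0 from LW4. */
static inline uint64_t desc_dst_mac(const onl_buf_desc_t *d) {
    return ((uint64_t)(d->lw[3] >> 16) << 32) | d->lw[4];
}
```

The accessors cover the fields Header Format reads according to the Overview (Buffer_Next, destination MAC, EtherType, Stats Index) plus a few neighbors for context.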