Title: An NP-Based Router for the Open Network Lab Design
1 An NP-Based Router for the Open Network Lab Design
John DeHart
2 ARP Notes
- Add a 4th database for IP-to-MAC translations.
- The ARP DB will be populated either statically at configuration time or dynamically by the ARP daemon on the XScale.
- This ARP DB will be queried at the same time as the other 3, and if a result is returned it can be used for attached hosts.
- NH Router filter and route results will be populated with either NH IP or NH MAC addresses.
- If NH IP, then PLC will send to the XScale with the NH IP and the address of the filter result that needs to be overwritten with the new NH MAC address.
- If NH MAC, then PLC will use that.
- If neither, then use the result from the ARP DB lookup.
- If that had no result, then send to the XScale with the DAddr from the pkt for it to perform ARP and populate the ARP DB.
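The decision order in the ARP notes can be sketched as follows. This is only an illustration: the `LookupResult` type, the action strings, and the dictionary standing in for the ARP DB are hypothetical names, not part of the router code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class LookupResult:
    """Hypothetical filter/route result: holds an NH IP or an NH MAC (or neither)."""
    nh_ip: Optional[str] = None
    nh_mac: Optional[str] = None


def resolve_next_hop(result: LookupResult,
                     arp_db: Dict[str, str],
                     pkt_daddr: str) -> Tuple[str, str]:
    """Return (action, value) following the order given in the ARP notes."""
    if result.nh_ip is not None:
        # NH IP only: the XScale resolves it and overwrites the filter result.
        return ("to_xscale_resolve_nh_ip", result.nh_ip)
    if result.nh_mac is not None:
        # NH MAC present: PLC uses it directly.
        return ("use_mac", result.nh_mac)
    mac = arp_db.get(pkt_daddr)
    if mac is not None:
        # Neither present: fall back to the ARP DB hit (attached host).
        return ("use_mac", mac)
    # No ARP DB entry: the XScale performs ARP for the packet's DAddr.
    return ("to_xscale_arp", pkt_daddr)
```

The sketch treats the ARP DB as a simple map keyed by destination address; in the design it is a fourth TCAM database queried alongside the other three.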
3 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QM (1 ME), HdrFmt (1 ME), Tx (1 ME), Plugin0-Plugin4, Stats (1 ME), FreeList Mgr (1 ME), xScale, and TCAM with Assoc. Data ZBT-SRAM, connected by Large SRAM Rings (64KW), Small SRAM Rings (512W), Scratch Rings, and NN Rings. Other labels: xScale (3 Rings?); LD; Except Errors; Plugin to XScale Ctrl, Update RLI Msgs. Legend: New; Mostly Unchanged; Needs Some Mod.; Needs A Lot Of Mod. Block lists shown beside the legend: "Rx Mux HF Copy Plugins Tx" and "Tx, QM Parse Plugin XScale".]
4 Performance
- What is our performance target?
- To hit 5 Gb rate
- Minimum Ethernet frame: 76B
- 64B frame + 12B InterFrame Spacing
- 5 Gb/sec * 1B/8b * 1 packet/76B = 8.22 Mpkt/sec
- IXP ME processing
- 1.4 GHz clock rate
- 1.4 Gcycle/sec * 1 sec/8.22 Mpkt = 170.3 cycles per packet
- compute budget (MEs * 170)
- 1 ME: 170 cycles
- 2 ME: 340 cycles
- 3 ME: 510 cycles
- 4 ME: 680 cycles
- latency budget (threads * 170)
- 1 ME: 8 threads: 1360 cycles
- 2 ME: 16 threads: 2720 cycles
- 3 ME: 24 threads: 4080 cycles
- 4 ME: 32 threads: 5440 cycles
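The budget arithmetic above can be reproduced with a few lines of Python. The constant and function names are illustrative only. Note that the unrounded packet rate gives about 170.2 cycles per packet; the slide's 170.3 follows from dividing by the rounded 8.22 Mpkt/sec.

```python
LINK_RATE_BPS = 5e9          # 5 Gb/s target
MIN_FRAME_BYTES = 64 + 12    # 64B frame + 12B inter-frame spacing
ME_CLOCK_HZ = 1.4e9          # 1.4 GHz microengine clock
THREADS_PER_ME = 8
CYCLES_PER_PKT = 170         # rounded per-packet budget used on the slide


def packets_per_second() -> float:
    """Packet rate at the target link rate with minimum-size frames."""
    return LINK_RATE_BPS / (8 * MIN_FRAME_BYTES)


def cycles_per_packet() -> float:
    """ME cycles available per packet arrival on one ME."""
    return ME_CLOCK_HZ / packets_per_second()


def compute_budget(num_mes: int) -> int:
    """Cycles of computation available per packet across num_mes MEs."""
    return num_mes * CYCLES_PER_PKT


def latency_budget(num_mes: int) -> int:
    """Cycles of latency tolerated per packet with 8 threads per ME."""
    return num_mes * THREADS_PER_ME * CYCLES_PER_PKT


if __name__ == "__main__":
    print(f"{packets_per_second() / 1e6:.2f} Mpkt/s")
    print(f"{cycles_per_packet():.1f} cycles/pkt")
    for n in range(1, 5):
        print(n, compute_budget(n), latency_budget(n))
```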
5 Performance Results July 2007
- Methodology
- Set workbench to stop after each 100 pkts received
- Record Tx pkt count and cycle count at Rx pkt counts of 0, 100, 200, 300, 400, 500
- Performance metric is Tx rate between Rx pkt counts of 100 and 500.
- Input Rate: 7.44 Mpkts/s
- Minimum sized UDP packets
- Ethernet Frame wire occupancy: 84 Bytes
- 64 Byte Ethernet Frame
- 12 Byte Ethernet Interframe Spacing
- 8 Byte Ethernet Preamble
- Ethernet wire bit rate: 4999.20 Mb/s
- Output Rates for various configurations
- All Real Blocks
- 40 Queues (QM worst case): 3.35 Mpkts/s
- Each packet causes eviction and reload of a queue
- 5 Queues (QM best case): 3.81 Mpkts/s
- All queues remain resident, no evicting and reloading
- Some Stub Blocks
- PLC: 3.95 Mpkts/s
6 Performance Results July 2007
- Methodology
- Set workbench to stop after each 100 pkts received
- Record Tx pkt count and cycle count at Rx pkt counts of 0, 100, 200, 300, 400, 500
- Performance metric is Tx rate between Rx pkt counts of 100 and 500.
- Some measurement artifacts possible due to Tx being just about to transmit
- Input Rate: 7.44 Mpkts/s (Minimum sized UDP packets)
- Ethernet Frame wire occupancy: 84 Bytes
- 64 Byte Ethernet Frame
- 12 Byte Ethernet Interframe Spacing
- 8 Byte Ethernet Preamble
- Ethernet wire bit rate: 4.99920 Gb/s
- Bottleneck blocks: QM (45), Mux (90), PLC (97)
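As a quick check of the wire-occupancy arithmetic, the sketch below multiplies the packet rate by the 84-byte on-wire footprint. The names are illustrative. With the rounded input rate of 7.44 Mpkts/s the product is 4999.68 Mb/s; the slide's 4999.20 Mb/s would correspond to an input rate of roughly 7.4393 Mpkts/s, so the small gap is presumably rounding of the displayed rate.

```python
FRAME_BYTES = 64       # minimum Ethernet frame
IFS_BYTES = 12         # inter-frame spacing
PREAMBLE_BYTES = 8     # preamble

WIRE_BYTES_PER_PKT = FRAME_BYTES + IFS_BYTES + PREAMBLE_BYTES  # 84 bytes


def wire_rate_bps(pkts_per_sec: float) -> float:
    """Bits per second occupied on the wire at a given packet rate."""
    return pkts_per_sec * WIRE_BYTES_PER_PKT * 8


def pkt_rate_for_wire_rate(bps: float) -> float:
    """Packet rate that produces a given wire bit rate."""
    return bps / (WIRE_BYTES_PER_PKT * 8)


if __name__ == "__main__":
    print(wire_rate_bps(7.44e6) / 1e6, "Mb/s")
    print(pkt_rate_for_wire_rate(4999.20e6) / 1e6, "Mpkts/s")
```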
7 Performance Results July 2007
8 Inter Block Rings
- Scratch Rings (sizes in 32b Words: 128, 256, 512, 1024)
- XScale -> MUX
- 3 Words per pkt
- 256 Word Ring
- 256/3 pkts
- PLC -> XScale
- 3 Words per pkt
- 256 Word Ring
- 256/3 pkts
- MUX -> PLC
- 3 Words per pkt
- 256 Word Ring
- 256/3 pkts
- -> QM
- 3 Words per pkt
- 1024 Word Ring
- 1024/3 Pkts
- HF -> TX
- 5 Words per pkt
9 Inter Block Rings
- SRAM Rings (sizes in 32b KW: 0.5, 1, 2, 4, 8, 16, 32, 64)
- RX -> MUX
- 2 Words per pkt
- 64KW Ring
- 32K Pkts
- PLC -> Plugins (5 of them)
- 3 Words per pkt
- 64KW Rings
- 21K Pkts
- Plugins -> MUX (1 serving all plugins)
- 3 Words per pkt
- 64KW Ring
- 21K Pkts
- NN Rings (128 32b words)
- QM -> HF
- 1 Word per pkt
- 128 Pkts
- Plugin N -> Plugin N+1 (for N=1 to N=4)
- Words per pkt is plugin dependent
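The packet capacities quoted for the rings are the ring size in words divided by the per-packet entry size, rounded down. A minimal sketch, with a hypothetical helper name:

```python
KW = 1024  # one "KW" is 1024 32-bit words


def ring_capacity_pkts(ring_words: int, words_per_pkt: int) -> int:
    """Whole packets that fit in a ring of ring_words 32-bit words."""
    if ring_words <= 0 or words_per_pkt <= 0:
        raise ValueError("sizes must be positive")
    return ring_words // words_per_pkt


if __name__ == "__main__":
    print(ring_capacity_pkts(256, 3))       # scratch ring, 3 words per pkt
    print(ring_capacity_pkts(1024, 3))      # scratch ring into QM
    print(ring_capacity_pkts(64 * KW, 2))   # SRAM ring, Rx to Mux
    print(ring_capacity_pkts(64 * KW, 3))   # SRAM ring, plugins
    print(ring_capacity_pkts(128, 1))       # NN ring, QM to HF
```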
10 ONL SRAM Buffer Descriptor
- Problem
- With the use of Filters, Plugins and recycling back around for reclassification, we can end up with an arbitrary number of copies of one packet in the system at a time.
- Each copy of a packet could end up going to an output port and need a different MAC DAddr from all the other copies.
- Having one Buffer Descriptor per packet regardless of the number of copies will not be sufficient.
- Solution
- When there are multiple copies of the packet in the system, each copy will need a separate Header buffer descriptor which will contain the MAC DAddr for that copy.
- When the Copy block gets a packet for which it only needs to send one copy to QM, it will read the current reference count and, if this copy is the ONLY copy in the system, it will not prepend the Header buffer descriptor.
- SRAM buffer descriptors are the scarce resource and we want to optimize their use.
- Therefore we do NOT want to always prepend a Header buffer descriptor.
- Otherwise, Copy will prepend a Header buffer descriptor to each copy going to the QM.
- Copy does NOT need to prepend a Header buffer descriptor to copies going to plugins.
- Copy does NOT need to prepend a Header buffer descriptor to a copy going to the XScale.
- The Header buffer descriptors will come from the same pool (freelist 0) as the PacketPayload buffer descriptors.
- There is no advantage to associating these Header buffer descriptors with small DRAM buffers.
- DRAM is not the scarce resource.
- SRAM buffer descriptors are the scarce resource.
- We want to avoid getting a descriptor coming in to PLC for reclassification with the Header buffer descriptor chained in front of the payload buffer descriptor.
- Plugins and XScale should append a Header buffer descriptor when they are sending something that has copies that is going directly to the QM or to Mux and PLC for PassThrough.
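The prepend rule for the Copy block can be summarized as a small predicate. This is a sketch under assumptions: the destination strings and the function name are hypothetical, and "ONLY copy in the system" is interpreted here as a reference count of 1.

```python
def copy_prepends_header(dest: str, ref_cnt: int, copies_to_qm: int) -> bool:
    """Decide whether Copy prepends a Header buffer descriptor to one copy.

    dest          -- "qm", "plugin", or "xscale" (hypothetical labels)
    ref_cnt       -- current reference count read from the buffer descriptor
    copies_to_qm  -- number of copies Copy is sending to the QM for this packet
    """
    if dest in ("plugin", "xscale"):
        # Copies to plugins or the XScale never get a Header descriptor.
        return False
    if dest != "qm":
        raise ValueError(f"unknown destination: {dest}")
    # Assumption: ref_cnt == 1 means this is the only copy in the system.
    only_copy = copies_to_qm == 1 and ref_cnt == 1
    return not only_copy
```

The single-copy shortcut is what saves SRAM buffer descriptors in the normal unicast case, where no Header descriptor is allocated at all.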
11 ONL SRAM Buffer Descriptor
LW0: Buffer_Next (32b)
LW1: Buffer_Size (16b)
LW2: Reserved (4b), Free_list 0000 (4b), Ref_Cnt (8b), Packet_Size (16b)
LW3: MAC DAddr_47_32 (16b), Stats Index (16b)
LW4: MAC DAddr_31_00 (32b)
LW5: Reserved (16b), EtherType (16b)
LW6: Reserved (32b)
LW7: Packet_Next (32b)
Legend (field-to-writer key from the slide):
- 1: Written by Rx, Added to by Copy, Decremented by Freelist Mgr
- Written by Freelist Mgr
- Written by Rx
- Written by Copy
- Written by Rx and Plugins
- Written by QM
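For readers who want to experiment with the descriptor, the eight longwords can be packed into 32 bytes as sketched below. Several details are assumptions made only for this illustration: big-endian byte order, the first-listed field of each longword occupying the most significant bits, and the 16-bit Offset field (described in the field list but not shown in the layout above) occupying the low half of LW1. The function names are hypothetical.

```python
import struct


def pack_descriptor(buffer_next: int, buffer_size: int, offset: int,
                    free_list: int, ref_cnt: int, packet_size: int,
                    mac_daddr: int, stats_index: int, ether_type: int,
                    packet_next: int = 0) -> bytes:
    """Pack one 32-byte SRAM buffer descriptor (bit positions are assumed)."""
    lw1 = (buffer_size & 0xFFFF) << 16 | (offset & 0xFFFF)       # Offset placement assumed
    lw2 = (free_list & 0xF) << 24 | (ref_cnt & 0xFF) << 16 | (packet_size & 0xFFFF)
    lw3 = ((mac_daddr >> 32) & 0xFFFF) << 16 | (stats_index & 0xFFFF)
    lw4 = mac_daddr & 0xFFFFFFFF                                 # MAC DAddr bits 31..0
    lw5 = ether_type & 0xFFFF                                    # upper 16 bits reserved
    lw6 = 0                                                      # reserved
    return struct.pack(">8I", buffer_next, lw1, lw2, lw3, lw4, lw5, lw6, packet_next)


def unpack_longwords(desc: bytes) -> tuple:
    """Return the eight 32-bit longwords LW0..LW7."""
    return struct.unpack(">8I", desc)
```

A unicast descriptor for a 46-byte IP packet could then be built with `pack_descriptor(0, 46, 0x18E, 0, 1, 46, 0x001122334455, 7, 0x0800)`.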
12 ONL DRAM Buffer and SRAM Buffer Descriptor
- SRAM Buffer Descriptor Fields
- Buffer_Next: ptr to next buffer in a multi-buffer packet
- Buffer_Size: number of bytes in the associated DRAM buffer
- Packet_Size: total number of bytes in the pkt
- QM (dequeue) uses this to decrement qlength
- Offset: byte offset into DRAM buffer where packet (ethernet frame) starts. From RX:
- 0x180: Constant offset to start of Ethernet Hdr
- 0x18E: Constant offset to start of IP/ARP/etc hdr
- However, Plugins can do ANYTHING so we cannot depend on the constant offsets.
- The following slides will, however, assume that nothing funny has happened.
- Freelist: Id of freelist that this buffer came from and should be returned to when it is freed
- Ref_Cnt: Number of copies of this buffer currently in the system
- MAC_DAddr: Ethernet MAC Destination Address that should be used for this packet
- Stats Index: Index into statistics counters that should be used for this packet
- EtherType: Ethernet Type field that should be used for this packet
- Packet_Next: ptr to next packet in the queue when this packet is queued by the QM
[Diagram: SRAM buffer descriptor (Buffer_Next, Buffer_Size, Free_list, Ref_Cnt, Packet_Size, Stats Index, MAC DAddr, EtherType, Packet_Next, Reserved fields) alongside a DRAM buffer laid out by offset: 0x000 Empty, 0x180 Ethernet Hdr, 0x18E IP Packet, 0x800.]
13 ONL DRAM Buffer and SRAM Buffer Descriptor
- Normal Unicast case
- One copy of packet being sent to one output port
- SRAM Buffer Descriptor Fields
- Buffer_Next: NULL
- Buffer_Size: IP_Pkt_Length
- Packet_Size: IP_Pkt_Length
- Offset: 0x18E
- Freelist: 0
- Ref_Cnt: 1
- MAC_DAddr: <result of lookup>
- Stats Index: <from lookup result>
- EtherType: 0x0800 (IP)
- Packet_Next: <as used by QM>
[Diagram: DRAM buffer laid out by offset: 0x000 Empty, 0x180 Ethernet Hdr, 0x18E IP Packet, 0x800.]
14 ONL DRAM Buffer and SRAM Buffer Descriptor
- Multi-copy case
- >1 copy of packet in system
- This copy going from Copy to QM to go out on an output port
[Diagram: two DRAM buffers side by side, each spanning 0x000 to 0x800. One is Empty at 0x000, 0x180, and 0x18E; the other has Empty at 0x000, Ethernet Hdr at 0x180, and IP Packet at 0x18E.]
15 ONL DRAM Buffer and SRAM Buffer Descriptor
- Multi-copy case (continued)
- >1 copy of packet in system
- This copy going from Copy to QM to go out on an output port
- Header Buf Descriptor
- SRAM Buffer Descriptor Fields
- Buffer_Next: ptr to payload buf desc
- Buffer_Size: 0 (Don't Care)
- Packet_Size: IP_Pkt_Length
- Offset: 0 (Don't Care)
- Freelist: 0
- Ref_Cnt: 1
- MAC_DAddr: <result of lookup>
- Stats Index: <from lookup result>
- Different copies of the same packet may actually have different Stats Indices
- EtherType: 0x0800 (IP)
- Packet_Next: <as used by QM>
[Diagram: two DRAM buffers side by side, each spanning 0x000 to 0x800. One is Empty at 0x000, 0x180, and 0x18E; the other has Empty at 0x000, Ethernet Hdr at 0x180, and IP Packet at 0x18E.]
16 ONL DRAM Buffer and SRAM Buffer Descriptor
- Multi-copy case (continued)
- >1 copy of packet in system
- This copy going from Copy to QM to go out on an output port
- Payload Buf Descriptor
- SRAM Buffer Descriptor Fields
- Buffer_Next: NULL
- Buffer_Size: IP_Pkt_Length
- Packet_Size: IP_Pkt_Length
- Offset: 0x18E
- Freelist: 0
- Ref_Cnt: <number of copies currently in system>
- MAC_DAddr: <don't care>
- Stats Index: <should not be used>
- EtherType: <don't care>
- Packet_Next: <should not be used>
[Diagram: SRAM buffer descriptor (Buffer_Next, Buffer_Size, Free_list, Ref_Cnt, Packet_Size, Stats Index, MAC DAddr, EtherType, Packet_Next, Reserved fields) alongside two DRAM buffers, each spanning 0x000 to 0x800. One is Empty at 0x000, 0x180, and 0x18E; the other has Empty at 0x000, Ethernet Hdr at 0x180, and IP Packet at 0x18E.]
17 ONL SRAM Buffer Descriptor
- Rx writes
- Buffer_size <- ethernet frame length
- Packet_size <- ethernet frame length
- Offset <- 0x180
- Freelist <- 0
- Mux Block writes
- Buffer_size <- (frame length from Rx) - 14
- Packet_size <- (frame length from Rx) - 14
- Offset <- 0x18E
- Freelist <- 0
- Ref_cnt <- 1
- Copy Block initializes a newly allocated Hdr desc
- Buffer_Next to point to original payload buffer
- Buffer_size <- 0 (don't care, no one should be using this field)
- Packet_size <- IP Pkt Length (should be length from input ring)
- Offset <- 0 (don't care, no one should be using this field)
- Freelist <- 0
- Ref_cnt <- 1
- Stats_Index <- from lookup result
18 SRAM Usage
- What will be using SRAM?
- Buffer descriptors
- Current MR supports 229,376 buffers
- 32 Bytes per SRAM buffer descriptor
- 7 MBytes
- Queue Descriptors
- Current MR supports 65536 queues
- 16 Bytes per Queue Descriptor
- 1 MByte
- Queue Parameters
- 16 Bytes per Queue Params (actually only 12 used in SRAM)
- 1 MByte
- QM Scheduling structure
- Current MR supports 13109 batch buffers per QM ME
- 44 Bytes per batch buffer
- 576796 Bytes
- QM Port Rates
- 4 Bytes per port
- Plugin scratch memory
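The SRAM totals above follow directly from count times entry size. The short sketch below reproduces them; the constant names are illustrative, and the count of 5 ports for the port-rate table is an assumption taken from the 5-port configuration described elsewhere in the deck.

```python
MB = 1024 * 1024

BUFFER_DESC_BYTES = 229_376 * 32     # buffer descriptors
QUEUE_DESC_BYTES = 65_536 * 16       # queue descriptors
QUEUE_PARAM_BYTES = 65_536 * 16      # queue parameters (only 12 bytes used each)
SCHED_BYTES = 13_109 * 44            # QM scheduling batch buffers per QM ME
PORT_RATE_BYTES = 5 * 4              # assumes 5 ports at 4 bytes each


def as_mb(num_bytes: int) -> float:
    """Convert a byte count to binary megabytes."""
    return num_bytes / MB


if __name__ == "__main__":
    for name, size in [("buffer desc", BUFFER_DESC_BYTES),
                       ("queue desc", QUEUE_DESC_BYTES),
                       ("queue params", QUEUE_PARAM_BYTES),
                       ("QM sched", SCHED_BYTES),
                       ("port rates", PORT_RATE_BYTES)]:
        print(f"{name}: {size} bytes ({as_mb(size):.2f} MB)")
```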
19 SRAM Bank Allocation
- SRAM Banks
- Bank0
- 4 MB total, 2MB per NPU
- Same interface/bus as TCAM
- Bank1-3
- 8 MB each
- Criteria for how SRAM banks should be allocated?
- Size
- SRAM Bandwidth
- How many SRAM accesses per packet are needed for the various SRAM uses?
- QM needs buffer desc and queue desc in same bank
20 Proposed SRAM Bank Allocation
- SRAM Bank 0
- TCAM
- Lookup Results
- SRAM Bank 1 (2.5MB/8MB)
- QM Queue Params (1MB)
- QM Scheduling Struct (0.5 MB)
- QM Port Rates (20B)
- Large Inter-Block Rings (1MB)
- SRAM Rings are of sizes (in Words): 0.5K, 1K, 2K, 4K, 8K, 16K, 32K, 64K
- Rx -> Mux (2 Words per pkt): 64KW (32K pkts): 128KB
- -> Plugin (3 Words per pkt): 64KW each (21K Pkts each): 640KB
- <- Plugin (3 Words per pkt): 64KW (21K Pkts): 256KB
- SRAM Bank 2 (8MB/8MB)
- Buffer Descriptors (7MB)
- Queue Descriptors (1MB)
- SRAM Bank 3 (6MB/8MB)
- Stats Counters (1MB)
- Global Registers (256 * 4B)
- Plugin scratch memory (5MB, 1MB per plugin)
21 Queues and QIDs
- Assigned Queues vs. Datagram Queues
- A flow or set of flows can be assigned to a specific Queue by assigning a specific QID to its/their filter(s) and/or route(s)
- A flow can be assigned to use a Datagram queue by assigning QID=0 to its filter(s) and/or route(s)
- There are 64 datagram queues
- If it sees a lookup result with a QID=0, the PLC block will calculate the datagram QID for the result based on the following hash function
- DG QID = SA[9:8] SA[6:5] DA[6:5]
- Concatenate IP src addr bits 9 and 8, IP src addr bits 6 and 5, IP dst addr bits 6 and 5
- Who/What assigns QIDs to flows?
- The ONL User can assign QIDs to flows or sets of flows using the RLI
- The XScale daemon can assign QIDs to flows on behalf of the User/RLI if so requested
- User indicates that they want an assigned QID but they want the system to pick it for them and report it back to them.
- The ONL User indicates that they want to use a datagram queue and the data path (Copy block) calculates the QID using a defined hash fct
- Using the same QID for all copies of a multicast does not work
- The QM does not partition QIDs across ports
- We cannot assume that the User will partition the QIDs so we will have to enforce a partitioning.
22 Queues and QIDs (continued)
- Proposed partitioning of QIDs
- QID[15:13]: Port Number 0-4 (numbered 1-5)
- Copy block will add these bits
- QID[12:0]: per port queues
- 8128 Reserved queues per port
- 64 datagram queues per port
- yyy1 0000 00xx xxxx: Datagram queues for port <yyy>
- QIDs 64-8191 per port: Reserved Queues
- QIDs 0-63 per port: Datagram Queues
- With this partitioning, only 13 bits of the QID should be made available to the ONL User.
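The datagram hash and the port partitioning described above can be sketched as bit operations. The function names are hypothetical, and the value placed in QID[15:13] is left to the caller because the slides do not settle whether the field carries the port number 0-4 or the 1-5 numbering.

```python
import ipaddress


def bits(value: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def datagram_qid(src_ip: int, dst_ip: int) -> int:
    """6-bit datagram queue index: SA[9:8], SA[6:5], DA[6:5] concatenated."""
    return bits(src_ip, 9, 8) << 4 | bits(src_ip, 6, 5) << 2 | bits(dst_ip, 6, 5)


def full_qid(port_field: int, per_port_qid: int) -> int:
    """Combine the 3-bit port field (QID[15:13]) with the 13-bit per-port QID."""
    if not 0 <= port_field < 8:
        raise ValueError("port field must fit in 3 bits")
    if not 0 <= per_port_qid < 8192:
        raise ValueError("per-port QID must fit in 13 bits")
    return port_field << 13 | per_port_qid


if __name__ == "__main__":
    sa = int(ipaddress.IPv4Address("10.0.3.96"))
    da = int(ipaddress.IPv4Address("192.168.1.32"))
    dg = datagram_qid(sa, da)
    print(dg, hex(full_qid(2, dg)))
```

Because the hash uses six address bits, it always lands in the 0-63 datagram range of a port's 13-bit QID space.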
23 Lookups
- How will lookups be structured?
- Three Databases
- Route Lookup: Containing Unicast and Multicast Entries
- Unicast
- Port: Can be wildcarded
- Longest Prefix Match on DAddr
- Routes should be sorted in the DB with longest prefixes first.
- Multicast
- Port: Can be wildcarded?
- Exact Match on DAddr
- Longest Prefix Match on SAddr
- Routes should be sorted in the DB with longest prefixes first.
- Primary Filter
- Filters should be sorted in the DB with higher priority filters first
- Auxiliary Filter
- Filters should be sorted in the DB with higher priority filters first
- Priority between Primary Filter and Route Lookup
- A priority will be stored with each Primary Filter
- A priority will be assigned to RLs (all routes have same priority)
24 Route Lookup
- Route Lookup Key (72b)
- Port (3b): Can be a wildcard (for Unicast, probably not for Multicast)
- Value of 111b in Port field can be used to denote a packet that originated from the XScale
- Value of 110b in Port field can be used to denote a packet that originated from a Plugin
- Ports numbered 0-4
- PluginTag (5b): Can be a wildcard (for Unicast, probably not for Multicast)
- Plugins numbered 0-4
- DAddr (32b)
- Prefixed for Unicast
- Exact Match for Multicast
- SAddr (32b)
- Unicast entries always have this and its mask set to 0
- Prefixed for Multicast
- Route Lookup Result (96b)
- Unicast/Multicast Fields (determined by IP_MCast_Valid bit: 1 = MCast, 0 = Unicast) (13b)
- IP_MCast Valid (1b)
- MulticastFields (12b)
- Plugin/Port Selection Bit (1b)
- 0: Send pkt to both Port and Plugin. Does it get the MCast CopyVector?
25 Lookup Key and Results Formats
140 Bit Key: IP DAddr (32b), IP SAddr (32b), DPort (16b), SPort (16b), Proto (8b), TCP Flags (12b), Exceptions (16b), P (3b), P Tag (5b)
[Key diagram brackets labeled: RL; PF and AF]
32 Bit Result in TCAM Assoc. Data SRAM
96 Bit Result in QDR SRAM Bank0
PF: QID (16b), Stats Index (16b), UCast MCast (12b), V (4b)
AF: QID (16b), Stats Index (16b), Uni Cast (8b), V (4b), S B (2b), Res (2b)
RL: QID (16b), Stats Index (16b), UCast MCast (12b), V (4b)
TCAM Ctrl Bits: D = Done, H = HIT, MH = Multi-Hit
26 Exception Bits in Lookup Key
140 Bit Key: IP DAddr (32b), IP SAddr (32b), DPort (16b), SPort (16b), Proto (8b), TCP Flags (12b), Exceptions (16b), P (3b), P Tag (5b)
[Key diagram brackets labeled: RL; PF and AF]
Exceptions (16b): Non-IP (1b), ARP (1b), IP Opt (1b), TTL (1b), Reserved (12b)
- Exception Bits
- TTL: TTL has expired. It was 0 or 1 on arriving packet
- IP Opt: IP Packet contained Options
- ARP: Ethertype field in ethernet header was ARP
- Non-IP: Ethertype field in ethernet header was NOT IP
- NOTE: An ARP packet will have ARP bit and Non-IP bit set
27 UCast/MCast Bits
- Format of the UCast/MCast fields in Ring data going to XScale and Plugins
28 Primary Filter
- Primary Filter Lookup Key (140b)
- Port (3b): Can be a wildcard (for Unicast, probably not for Multicast)
- Value of 111b in Port field to denote coming from the XScale
- Ports numbered 0-4
- PluginTag (5b): Can be a wildcard (for Unicast, probably not for Multicast)
- Plugins numbered 0-4
- DAddr (32b)
- SAddr (32b)
- Protocol (8b)
- DPort (16b)
- Sport (16b)
- TCP Flags (12b)
- Exception Bits (16b): Allow for directing of packets based on defined exceptions
- Primary Filter Result (104b)
- Unicast/Multicast Fields (determined by IP_MCast_Valid bit: 1 = MCast, 0 = Unicast) (13b)
- IP_MCast Valid (1b)
- MulticastFields (12b)
- Plugin/Port Selection Bit (1b)
- 0: Send pkt to ports and plugins indicated by MCast Copy Vector.
29 Auxiliary Filter
- Auxiliary Filter Lookup Key (140b)
- Port (3b): Can be a wildcard (for Unicast, probably not for Multicast)
- Value of 111b in Port field to denote coming from the XScale
- Ports numbered 0-4
- PluginTag (5b): Can be a wildcard (for Unicast, probably not for Multicast)
- Plugins numbered 0-4
- DAddr (32b)
- SAddr (32b)
- Protocol (8b)
- DPort (16b)
- Sport (16b)
- TCP Flags (12b)
- Exception Bits (16b)
- Allow for directing of packets based on defined exceptions
- Can be wildcarded.
- Auxiliary Filter Lookup Result (93b)
- Unicast Fields (8b) (No Multicast fields)
- Drop Bit (1b) (Should never actually be set by control software, but keep here for symmetry with other Unicast Fields)
- 0: handle normally
30 TCAM Operations for Lookups
- Five TCAM Operations of interest
- Lookup (Direct)
- 1 DB, 1 Result
- Multi-Hit Lookup (MHL) (Direct)
- 1 DB, < 8 Results
- Simultaneous Multi-Database Lookup (SMDL) (Direct)
- 2 DB, 1 Result Each
- DBs must be consecutive!
- Care must be given when assigning segments to DBs that use this operation. There must be a clean separation of even and odd DBs and segments.
- Multi-Database Lookup (MDL) (Indirect)
- < 8 DB, 1 Result Each
- Simultaneous Multi-Database Lookup (SMDL) (Indirect)
- 2 DB, 1 Result Each
- Functionally same as Direct version but key presentation and DB selection are different.
- DBs need not be consecutive.
- Care must be given when assigning segments to DBs that use this operation. There must be a clean separation of even and odd DBs and segments.
31 Lookups Proposed Design
- Use SRAM Bank 0 (2 MB per NPU) for all Results
- B0 Byte Address Range: 0x000000 - 0x3FFFFF
- 22 bits
- B0 Word Address Range: 0x000000 - 0x3FFFFC
- 20 bits
- Two trailing 0s
- Use 32-bit Associated Data SRAM result for Address of actual Result
- Done: 1b
- Hit: 1b
- MHit: 1b
- Priority: 8b
- Present for Primary Filters, for RL and Aux Filters should be 0
- SRAM B0 Word Address: 21b
- 1 spare bit
- Use Multi-Database Lookup (MDL) Indirect for searching all 3 DBs
- Order of fields in Key is important.
- Each thread will need one TCAM context
- Route DB
- Lookup Size: 68b (3 32b words transferred across QDR intf)
32 Block Interfaces
- The next set of slides show the block interfaces
- These slides are still very much a work in
progress
33 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QM (1 ME), HdrFmt (1 ME), Tx (1 ME), Plugin0-Plugin4, xScale, and TCAM with Assoc. Data ZBT-SRAM, connected by SRAM Rings (64KW), Scratch Rings, and NN Rings.]
34 ONL NP Router
[Block diagram of the ONL NP Router blocks and rings.]
35 ONL NP Router
[Block diagram of the ONL NP Router blocks and rings, with a ring data format annotated.]
Ring data format fields:
- Buffer Handle (24b), Rsv (4b), Out Port (4b)
- L3 (IP, ARP, ...) Pkt Length (16b), QID (16b)
- Stats Index (16b), In Port (3b), Plugin Tag (5b), Flags (8b)
Flags:
- Src: Source (2b): 00 = Rx, 01 = XScale, 10 = Plugin, 11 = Undefined
- PT (1b): PassThrough (1) / Classify (0)
- Reserved (5b)
36 ONL NP Router
[Block diagram of the ONL NP Router blocks and rings, with a ring data format annotated.]
Ring data format fields:
- Buffer Handle (24b), Rsv (8b)
- QID (16b), Rsv (4b), Out Port (4b), Rsv (8b)
- L3 (IP, ARP, ...) Pkt Length (16b), Reserved (16b)
Note: QM will not do any Stats Operations so it does not need the Stats Index. But the QM code is nasty enough that it will not be easy to change the format for the input. We will attack that change when we do other optimizations for QM.
37 ONL NP Router
[Block diagram of the ONL NP Router blocks and rings.]
38 ONL NP Router
[Block diagram of the ONL NP Router blocks and rings.]
39 ONL NP Router
[Block diagram of the ONL NP Router blocks and rings, with two ring data formats annotated.]
Flags (8b): Why pkt is being sent to Plugin
- TTL (1b): TTL expired
- Options (1b): IP Options present
- NoRoute (1b): No matching route or filter
- NonIP (1b): Non IP Packet received
- ARP_Needed (1b): NH_IP valid, but no MAC
- NH_Invalid (1b): NH_IP AND NH_MAC both invalid
- Reserved (2b): currently unused
First ring data format:
- Buffer Handle (24b), Rsv (8b)
- L3 (IP, ARP, ...) Pkt Length (16b), QID (16b)
- Stats Index (16b), In Port (3b), Plugin Tag (5b), Flags (8b)
- NH MAC DA[47:16] (32b)
- NH MAC DA[15:0] (16b), EtherType (16b)
- Unicast/MCast Bits (16b), Reserved (16b)
Second ring data format:
- Buffer Handle (24b), MC (1b), Out Port (4b), Rsv (3b)
- L3 (IP, ARP, ...) Pkt Length (16b), QID (16b)
- Stats Index (16b), In Port (3b), Rsv (8b), Plugin Tag (5b)
- NH MAC DA[47:16] (32b)
- NH MAC DA[15:0] (16b), EtherType (16b)
- Unicast/MCast Bits (16b), Reserved (16b)
MC: 1 = Multiple copies of this pkt exist in the system; 0 = This is the only copy of pkt
40 ONL NP Router
[Block diagram of the ONL NP Router blocks and rings, with a ring data format annotated.]
Ring data format fields:
- Buffer Handle (24b), Rsv (4b), Out Port (4b)
- L3 (IP, ARP, ...) Pkt Length (16b), QID (16b)
- Stats Index (16b), In Port (3b), Plugin Tag (5b), Flags (8b)
Flags:
- PT (1b): PassThrough (1) / Classify (0)
- Reserved (7b)
41 ONL NP Router
[Block diagram of the ONL NP Router blocks and rings, with a ring data format annotated.]
Flags (8b): Why pkt is being sent to XScale
- TTL (1b): TTL expired
- Options (1b): IP Options present
- NoRoute (1b): No matching route or filter
- NonIP (1b): Non IP Packet received
- ARP_Needed (1b): NH_IP valid, but no MAC
- NH_Invalid (1b): NH_IP AND NH_MAC both invalid
- Reserved (2b): currently unused
Ring data format fields:
- Buffer Handle (24b), Rsv (8b)
- L3 (IP, ARP, ...) Pkt Length (16b), QID (16b)
- Stats Index (16b), In Port (3b), Plugin Tag (5b), Flags (8b)
- NH MAC DA[47:16] (32b)
- NH MAC DA[15:0] (16b), EtherType (16b)
- Unicast/MCast Bits (16b), Reserved (16b)
42 ONL NP Router
[Block diagram of the ONL NP Router blocks and rings, with a ring data format annotated.]
Ring data format fields:
- Buffer Handle (24b), Rsv (4b), Out Port (4b)
- L3 (IP, ARP, ...) Pkt Length (16b), QID (16b)
- Stats Index (16b), In Port (3b), Plugin Tag (5b), Flags (8b)
Flags:
- PassThrough/Classify (1b)
- Reserved (7b)
43 ONL NP Router
[Block diagram of the ONL NP Router blocks and rings, with a ring data format annotated.]
Ring data format fields:
- Stats Index (16b), Opcode (4b), Data (12b)
44 ONL NP Router
[Block diagram of the ONL NP Router blocks and rings.]
45 Extra Slides
- Everything after this is either OLD or is just
extra support data for me to use.
46 ONL NP Router
[Block diagram labels: TCAM, Assoc. Data ZBT-SRAM, xScale, Parse, Lookup, Copy, QM, Plugins, SRAM]
- Input Data
- Buffer Handle
- In Plugin
- In Port
- Out Port
- Flags
- Source (3b): Rx/XScale/Plugin
- PassThrough/Classify (1b)
- Reserved (4b)
- QID
- Frame Length
- Stats Index
- Exception Bits (16b)
- TTL Expired
- IP Options present
- No Route
- Auxiliary Result
- Valid (1b)
- CopyVector (10b)
- Control Flags
- PassThrough/Reclassify
- Primary Result
- Valid (1b)
- CopyVector (10b)
- NH IP/MAC (48b)
- QID (16b)
- LD (1b) Send to XScale
- Drop (1b) Drop pkt
- Valid Bits (3b)
- NH IP Valid (1b)
- NH MAC Valid (1b)
- IP_MCast Valid (1b)
- Key (136b)
- Port/Plugin (4b)
- 0-4 Port
- 5-9 Plugin
- 15 XScale
- DAddr (32b)
- SAddr (32b)
- Protocol (8b)
- DPort (16b)
- Sport (16b)
- TCP Flags (12b)
- Exception Bits (16b)
47 Lookup Results
- Results of a lookup could be
- 1 PF/RL Result
- IP Unicast 1 packet sent to a Port
- Plugin Unicast 1 packet sent to a Plugin
- Unicast with Plugin Copies
- 0 or 1 packet sent to a port
- 1-5 copies sent to plugin(s)
- IP Multicast
- 0-10 copies sent
- 1 to each of 5 ports and one to each of 5 plugins
- 1 Aux Filter Result
- IP Unicast 1 packet sent to a Port
- Plugin Unicast 1 packet sent to a Plugin
- Unicast with Plugin Copies
- 0 or 1 packet sent to a port
- 1-5 copies sent to plugin(s)
- IP Multicast
- 0 or 1 copy sent to a Port
- 1-5 copies sent to plugins
48 PLC
- Main()
- {
-   if (PassThrough)
-   {
-     Copy()
-   }
-   else
-   {
-     Parse()
-     if (!Drop)
-     {
-       Lookup()
-       if (!Drop)
-       {
-         Copy()
-       }
-     }
-   }
- }
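The control flow of Main() can be exercised with stub stages, as in the sketch below. The dictionary-based packet and the callable parameters are hypothetical stand-ins for the real Parse, Lookup, and Copy code; either stage may set the drop flag.

```python
from typing import Callable, Dict, List


def plc_main(pkt: Dict,
             parse: Callable[[Dict], None],
             lookup: Callable[[Dict], None],
             copy: Callable[[Dict], None]) -> None:
    """Run one packet through Parse, Lookup, Copy following the Main() outline."""
    if pkt.get("pass_through"):
        # PassThrough packets skip classification entirely.
        copy(pkt)
        return
    parse(pkt)
    if not pkt.get("drop"):
        lookup(pkt)
        if not pkt.get("drop"):
            copy(pkt)


def run_with_trace(pkt: Dict, drop_in: str = "") -> List[str]:
    """Helper for experiments: record which stages ran; drop_in names a stage that drops."""
    trace: List[str] = []

    def stage(name: str) -> Callable[[Dict], None]:
        def run(p: Dict) -> None:
            trace.append(name)
            if name == drop_in:
                p["drop"] = True
        return run

    plc_main(pkt, stage("parse"), stage("lookup"), stage("copy"))
    return trace
```

For example, `run_with_trace({}, drop_in="parse")` shows that neither Lookup nor Copy runs once Parse marks the packet for drop.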
49 PLC
- Lookup()
- {
-   write KEY to TCAM
-   use timestamp delay to wait appropriate time
-   while (!DoneBit) // DONE Bit BUG Fix requires reading just first word
-     read 1 word from Results Mailbox
-     check DoneBit
-   done
-   read words 2 and 3 from Results Mailbox
-   if (PrimaryFilter and RouteLookup results HIT)
-   {
-     compare priorities
-     PrimaryResult.Valid <- TRUE
-     store higher priority result as Primary Result (read result from SRAM Bank0)
-   }
-   else if (PrimaryFilter results HIT)
-   {
-     PrimaryResult.Valid <- TRUE
-     PrimaryResults. <- PrimaryFilter. (read result from SRAM Bank0)
-   }
-   else if (RouterLookup results HIT)
50 PLC
- Copy()
- {
-   currentRefCnt <- Read(Buffer Descriptor Ref Cnt)
-   copyCount <- 0
-   outputData.bufferHandle <- inputData.bufferHandle
-   outputData.QID <- inputData.QID
-   outputData.frameLength <- inputData.frameLength
-   outputData.statsIndex <- inputData.statsIndex
-   if (PassThrough) // It came from either XScale or Plugin, process inputData
-   {
-     copyCount <- 1
-     if (inputData.outPort == XScale)
-     {
-       // Do we need to include any additional flags when sending to XScale?
-       outputData.outPort <- inputData.outPort
-       outputData.Flags <- inputData.Flags
-       outputData.inPort <- inputData.inPort
-       outputData.Plugin <- inputData.Plugin
-       // Packets to XScale do not (we think) need additional Header buf desc.
-       sendToXScale()
-     }
51 PLC
-   else // Process Lookup Results
-   {
-     // PrimaryResult is either Primary Filter or Route Lookup, depending on Priority
-     if (PrimaryResult.Valid == TRUE)
-     {
-       if (PrimaryResult.IP_MCastValid == TRUE)
-       {
-         IP_MCast_Daddr = read DRAM
-         MacDAddr = calculateMCast(IP_MCast_Daddr)
-       }
-       else // Unicast
-       {
-         if (countPorts(PrimaryResult.copyVector) > 1)
-         {
-           ILLEGAL
-         }
-         if (PrimaryResult.NH_Mac_Valid == TRUE)
-         {
-           MacDAddr = PrimaryResult.NH_Address
-         }
-       }
-       copyCount = copyCount + countOnes(PrimaryResult.copyVector)
-     }
-     if (AuxiliaryResult.Valid == TRUE)
-       if (countPorts(AuxiliaryResult.copyVector) > 1)
52 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QM (1 ME), HdrFmt (1 ME), Tx (1 ME), Plugin0-Plugin4, Stats (1 ME), FreeList Mgr (1 ME), xScale, and TCAM with Assoc. Data ZBT-SRAM, connected by SRAM Rings (64KW), Scratch Rings, and NN Rings. Legend: New; Mostly Unchanged; Needs Some Mod.; Needs A Lot Of Mod. Block lists shown beside the legend: "Tx, QM Parse Plugin XScale" and "QM Copy Plugins".]
53 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QueueManager (1 ME), HdrFmt (1 ME), Tx (2 ME), TCAM, SRAM.]
54 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QueueManager (1 ME), HdrFmt (1 ME), Tx (2 ME), TCAM, SRAM.]
55 ONL NP Router
[Block diagram label: TCAM]
- Copy
- Port: Identifies Source MAC Addr
- Write it to buffer descriptor or let HF determine it via port?
- Unicast
- Valid MAC
- Write MAC Addr to Buffer descriptor and queue pkt
- No Valid MAC
- Prepare pkt to be sent to XScale for ARP processing
- Multicast
- Calculate Ethernet multicast Dst MAC Addr
- Fct(IP Multicast Dst Addr)
- Write Dst MAC Addr to buf desc.
- Same for all copies!
- For each bit set in copy bit vector
- Queue a packet to port represented by bit in bit vector.
- Reference Count in buffer desc.
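The slides leave Fct(IP Multicast Dst Addr) unspecified. A plausible candidate, assumed here rather than taken from the deck, is the standard IPv4 mapping from RFC 1112: the fixed prefix 01:00:5E followed by the low 23 bits of the group address. The function name is hypothetical.

```python
import ipaddress


def multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast address to its Ethernet multicast MAC (RFC 1112 mapping)."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF            # keep the low 23 bits of the group address
    mac = 0x01005E000000 | low23            # fixed 01:00:5E prefix, bit 23 stays 0
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


if __name__ == "__main__":
    print(multicast_mac("224.0.0.1"))
    print(multicast_mac("239.255.255.250"))
```

Because the MAC depends only on the group address, it is the same for every copy of a multicast packet, which matches the "Same for all copies!" note.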
[Block diagram label: Parse, Lookup, PHFCopy (3 MEs)]
- Parse
- Do IP Router checks
- Extract lookup key
- Lookup
- Perform lookups: potentially three lookups
- Route Lookup
- Primary Filter lookup
- Auxiliary Filter lookup
56 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (4 MEs), QueueManager (1 ME), HdrFmt (1 ME), Tx (1 ME), Stats (1 ME), five Plugins, xScale, TCAM with Assoc. Data ZBT-SRAM, SRAM, and large SRAM rings.]
Annotations on the diagram:
- add configurable per port delay (up to 150 ms total delay)
- add large SRAM ring
- Plugin write access to QM Scratch Ring
- Each output has common set of QiDs
- Multicast copies use same QiD for all outputs
- QiD ignored for plugin copies
57 ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (4 MEs), QueueManager (1 ME), HdrFmt (1 ME), Tx (1 ME), Stats (1 ME), Plugin0, xScale, TCAM, SRAM, and NN rings.]
Annotations on the diagram:
- Each output has common set of QiDs
- Multicast copies use same QiD for all outputs
- QiD ignored for plugin copies
59 PLC
- Input Data
- Buffer Handle
- In Plugin
- In Port
- Out Port
- Flags
- Source (3b) Rx/XScale/Plugin
- PassThrough/Classify (1b)
- Reserved (4b)
- QID
- Frame Length
- Stats Index
- Control Flags
- PassThrough/Reclassify
- Key (136b)
- Port/Plugin (4b)
- 0-4 Port
- 5-9 Plugin
- 15 XScale
- Primary Result
- Valid (1b)
- CopyVector (10b)
- NH IP/MAC (48b)
- QID (16b)
- LD (1b) Send to XScale
- Drop (1b) Drop pkt
- Valid Bits (3b)
- NH IP Valid (1b)
- NH MAC Valid (1b)
- IP_MCast Valid (1b)
- Auxiliary Result
- Valid (1b)
- CopyVector (10b)
- NH IP/MAC (48b)
- QID (16b)
- LD (1b) Send to XScale
- Drop (1b) Drop pkt
- NH IP Valid (1b)
60SRAM Buffer Descriptor
- Problem
- With the use of Filters, Plugins and recycling
back around for reclassification, we can end up
with an arbitrary number of copies of one packet
in the system at a time.
- Each copy of a packet could end up going to an
output port and need a different MAC DAddr from
all the other packets
- Having one Buffer Descriptor per packet
regardless of the number of copies will not be
sufficient.
- Solution
- When there are multiple copies of the packet in
the system, each copy will need a separate Header
buffer descriptor which will contain the MAC
DAddr for that copy.
- When the Copy block gets a packet that it only
needs to send one copy to QM, it will read the
current reference count and if this copy is the
ONLY copy in the system, it will not prepend the
Header buffer descriptor.
- SRAM buffer descriptors are the scarce resource
and we want to optimize their use.
- Therefore we do NOT want to always prepend a
header buffer descriptor
- Otherwise, Copy will prepend a Header buffer
descriptor to each copy going to the QM.
- Copy does not need to prepend a Header buffer
descriptor to copies going to plugins
- We have to think some more about the case of
copies going to the XScale.
- The Header buffer descriptors will come from the
same pool (freelist 0) as the PacketPayload
buffer descriptors.
- There is no advantage to associating these Header
buffer descriptors with small DRAM buffers.
- DRAM is not the scarce resource
- SRAM buffer descriptors are the scarce resource.
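The Copy block's prepend decision described above can be sketched as a single predicate. This is a minimal sketch under stated assumptions, not the actual microengine code: the function name and the string destination labels are hypothetical, and a reference count of 1 is assumed to mean "this is the only copy in the system".

```python
def needs_header_descriptor(dest: str, ref_count: int, copies_to_qm: int) -> bool:
    """Return True if Copy should prepend a Header buffer descriptor.

    dest         -- "qm", "plugin" or "xscale" (hypothetical labels)
    ref_count    -- current reference count read from the buffer descriptor
    copies_to_qm -- number of copies Copy is sending to the QM for this packet
    """
    if dest == "plugin":
        # Copies going to plugins never get a Header buffer descriptor.
        return False
    if dest == "xscale":
        # The slides leave this case open, so the sketch does not guess.
        raise NotImplementedError("XScale copies are still undecided")
    if dest != "qm":
        raise ValueError(f"unknown destination {dest!r}")
    # Single copy to the QM and no other copy in the system (assumed to be
    # ref_count == 1): skip the prepend to save a scarce SRAM descriptor.
    if copies_to_qm == 1 and ref_count == 1:
        return False
    return True
```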
61MR Buffer Descriptor
LW0: Buffer_Next (32b)
LW1: Buffer_Size (16b) | Offset (16b)
LW2: Packet_Size (16b) | Reserved (8b) | Free_list 0000 (4b) | Reserved (4b)
LW3: Reserved (16b) | Stats Index (16b)
LW4: Reserved (16b) | Reserved (8b) | Reserved (4b) | Reserved (4b)
LW5: Reserved (32b) | Reserved (4b) | Reserved (4b)
LW6: Reserved (16b) | Reserved (16b)
LW7: Packet_Next (32b)
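As an illustration of how one long word of this descriptor might be handled, the sketch below packs and unpacks LW2 (Packet_Size, Reserved, Free_list, Reserved). The bit positions are an assumption: the slide gives only field widths, and the sketch assumes the fields are listed from most significant to least significant bit. The function names are hypothetical.

```python
def pack_lw2(packet_size: int, free_list: int = 0) -> int:
    """Pack LW2 assuming Packet_Size in bits 31:16 and Free_list in bits 7:4."""
    if not 0 <= packet_size <= 0xFFFF:
        raise ValueError("Packet_Size is 16 bits wide")
    if not 0 <= free_list <= 0xF:
        raise ValueError("Free_list is 4 bits wide")
    # Reserved fields (bits 15:8 and 3:0 under this assumption) stay zero.
    return (packet_size << 16) | (free_list << 4)


def unpack_lw2(word: int) -> dict:
    """Inverse of pack_lw2 under the same assumed bit layout."""
    return {
        "packet_size": (word >> 16) & 0xFFFF,
        "free_list": (word >> 4) & 0xF,
    }
```

With freelist 0, as the slide shows for this descriptor, a 1500-byte packet would pack to 0x05DC0000 under this layout.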
62Intel Buffer Descriptor
LW0: Buffer_Next (32b)
LW1: Buffer_Size (16b) | Offset (16b)
LW2: Packet_Size (16b) | Hdr_Type (8b) | Free_list (4b) | Rx_stat (4b)
LW3: Input_Port (16b) | Output_Port (16b)
LW4: Next_Hop_ID (16b) | Fabric_Port (8b) | Reserved (4b) | NHID type (4b)
LW5: FlowID (32b) | ColorID (4b) | Reserved (4b)
LW6: Class_ID (16b) | Reserved (16b)
LW7: Packet_Next (32b)
63SRAM Accesses Per Packet
- To support 8.22 M pkts/sec
- we can Read 24 Words and Write 24 Words per pkt
(200M/8.22M)
- Rx
- SRAM Dequeue (1 Word)
- To retrieve a buffer descriptor from free list
- Write buffer desc (2 Words)
- Parse
- Lookup
- TCAM Operations
- Reading Results
- Copy
- Write buffer desc (3 Words)
- Ref_cnt
- MAC DAddr
- Stats Index
- Pre-Q stats increments
- Read 2 Words
- Write 2 Words
- HF
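The per-packet SRAM budget above follows from dividing the 200M accesses per second by the 8.22 M pkts/sec target. The sketch below reproduces that arithmetic and tallies only the accesses that the slide quantifies. Counting the SRAM Dequeue as a read is an assumption, and the entries for Parse, Lookup and HF are omitted because the slide gives no word counts for them.

```python
SRAM_OPS_PER_SEC = 200e6   # from the slide: 200M
PKTS_PER_SEC = 8.22e6      # from the slide: 8.22 M pkts/sec


def words_per_packet_budget(ops=SRAM_OPS_PER_SEC, pkts=PKTS_PER_SEC) -> int:
    """Whole SRAM words available per packet, in each direction."""
    return int(ops // pkts)


# (description, words read, words written); only items quantified on the slide.
LISTED_ACCESSES = [
    ("SRAM Dequeue (free list)", 1, 0),   # assumed to count as a read
    ("Rx: write buffer desc", 0, 2),
    ("Copy: write buffer desc", 0, 3),
    ("Pre-Q stats increments", 2, 2),
]


def tally(accesses):
    """Sum read and write words over a list of access entries."""
    reads = sum(r for _, r, _ in accesses)
    writes = sum(w for _, _, w in accesses)
    return reads, writes
```

Comparing `tally(LISTED_ACCESSES)` with `words_per_packet_budget()` shows how much of the budget the quantified items use before Lookup and QM traffic is added.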
64QM SRAM Accesses Per Packet
- QM (Worst case analysis)
- Enqueue (assume queue is idle and not loaded in
Q-Array)
- Write Q-Desc (4 Words)
- Eviction of Least Recently Used Queue
- Write Q-Params ?
- When we evict a Q do we need to write its params
back?
- The Q-Length is the only thing that the QM is
changing.
- Looks like it writes it back every time it
enqueues or dequeues
- AND it writes it back when it evicts (we can
probably remove the one when it evicts)
- Read Q-Desc (4 Words)
- Read Q-Params (3 Words)
- Q-Length, Threshold, Quantum
- Write Q-Length (1 Word)
- SRAM Enqueue -- Write (1 Word)
- Scheduling structure accesses?
- They are done once every 5 pkts (when running
full rate)
- Dequeue (assume queue is not loaded in Q-Array)
- Write Q-Desc (4 Words)
- Write Q-Params ?
65QM SRAM Accesses Per Packet
- QM (Worst case analysis)
- Total Per Pkt accesses
- Queue Descriptors and Buffer Enq/Deq
- Write 9 Words
- Read 9 Words
- Queue Params
- Write 2 Words
- Read 6 Words
- Scheduling Structure Accesses Per Iteration
(batch of 5 packets)
- Advance Head: Read 11 Words
- Write Tail: Write 11 Words
- Update Freelist
- Read 2 Words
- OR
- Write 5 Words
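To compare these QM totals with the per-packet SRAM budget, the scheduling accesses can be amortized over the batch of 5 packets. The sketch below does that arithmetic. Spreading the batch cost evenly across packets is an assumption, and the function name is hypothetical.

```python
def qm_words_per_packet(freelist_op: str, batch: int = 5):
    """Return (reads, writes) per packet for the QM worst case.

    freelist_op -- "read" (2 words) or "write" (5 words); the slide lists
                   these as alternatives for the freelist update.
    """
    reads = 9 + 6    # queue descriptors / buffer enq-deq + queue params
    writes = 9 + 2
    sched_reads = 11   # advance head
    sched_writes = 11  # write tail
    if freelist_op == "read":
        sched_reads += 2
    elif freelist_op == "write":
        sched_writes += 5
    else:
        raise ValueError("freelist_op must be 'read' or 'write'")
    # Amortize the once-per-batch scheduling accesses over the batch.
    return reads + sched_reads / batch, writes + sched_writes / batch
```

Under this assumption both alternatives stay below 24 words per packet in each direction, before the accesses of the other blocks are added.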
66TCAM Core Lookup Performance
Routes
Filters
- Lookup/Core size of 72 or 144 bits, Freq = 200 MHz
- CAM Core can support 100M searches per second
- For 1 Router on each of NPUA and NPUB
- 8.22 MPkt/s per Router
- 3 Searches per Pkt (Primary Filter, Aux Filter,
Route Lookup)
- Total Per Router: 24.66 M Searches per second
- TCAM Total: 49.32 M Searches per second
- So, the CAM Core can keep up
- Now let's look at the LA-1 Interfaces
67TCAM LA-1 Interface Lookup Performance
Routes
Filters
- Lookup/Core size of 144 bits (ignore for now that
Route size is smaller)
- Each LA-1 interface can support 40M searches per
second.
- For 1 Router on each of NPUA and NPUB (each NPU
uses a separate LA-1 Intf)
- 8.22 MPkt/s per Router
- Maximum of 3 Searches per Pkt (Primary Filter,
Aux Filter, Route Lookup)
- Max of 3 assumes they are each done as a separate
operation
- Total Per Interface: 24.66 M Searches per second
- So, the LA-1 Interfaces can keep up
- Now let's look at the AD SRAM Results
68TCAM Assoc. Data SRAM Results Performance
- 8.22M 72b or 144b lookups
- 32b results consume 1/12
- 64b results consume 1/6
- 128b results consume 1/3
Routes
Filters
- Lookup/Core size of 72 or 144 bits, Freq = 200 MHz,
SRAM Result Size of 128 bits
- Associated SRAM can support up to 25M searches
per second.
- For 1 Router on each of NPUA and NPUB
- 8.22 MPkt/s per Router
- 3 Searches per Pkt (Primary Filter, Aux Filter,
Route Lookup)
- Total Per Router: 24.66 M Searches per second
- TCAM Total: 49.32 M Searches per second
- So, the Associated Data SRAM can NOT keep up
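The three TCAM capacity checks (CAM core, LA-1 interface, Associated Data SRAM) all use the same arithmetic: packet rate times searches per packet times the number of routers sharing the resource. The sketch below collects them in one place using the figures from the slides. The table layout and function names are hypothetical.

```python
PKT_RATE_M = 8.22        # Mpkt/s per router
SEARCHES_PER_PKT = 3     # Primary Filter, Aux Filter, Route Lookup

# resource -> (capacity in M searches/sec, routers sharing it)
RESOURCES = {
    "CAM core": (100, 2),                        # shared by NPUA and NPUB
    "LA-1 interface": (40, 1),                   # one interface per NPU
    "Assoc. Data SRAM, 128b results": (25, 2),   # shared by both NPUs
}


def demand_msearches(routers: int) -> float:
    """Search demand in M searches/sec for the given number of routers."""
    return routers * PKT_RATE_M * SEARCHES_PER_PKT


def keeps_up(resource: str) -> bool:
    """True if the resource capacity covers the demand placed on it."""
    capacity, routers = RESOURCES[resource]
    return demand_msearches(routers) <= capacity


if __name__ == "__main__":
    for name in RESOURCES:
        print(f"{name}: {'keeps up' if keeps_up(name) else 'can NOT keep up'}")
```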
69Lookups Latency
- Three searches in one MDL Indirect Operation
- Latencies for operation
- QDR xfer time 6 clock cycles
- 1 for MDL Indirect subinstruction
- 5 for 144 bit key transferred across QDR Bus
- Instruction Fifo 2 clock cycles
- Synchronizer 3 clock cycles
- Execution Latency search dependent
- Re-Synchronizer 1 clock cycle
- Total 12 clock cycles
70Lookups Latency
- 144 bit DB, 32 bits of AD (two of these)
- Instruction Latency 30
- Core blocking delay 2
- Backend latency 8
- 72 bit DB, 32 bits of AD
- Instruction Latency 30
- Core blocking delay 2
- Backend latency 8
- Latency of first search (144 bit DB)
- 11 + 30 = 41 clock cycles
- Latency of subsequent searches
- (previous search latency) - (backend latency of
previous search) + (core block delay of previous
search) + (backend latency of this search)
- Latency of second 144 bit search
- 41 - 8 + 2 + 8 = 43
- Latency of third search (72 bit)
- 43 - 8 + 2 + 8 = 45 clock cycles
- 45 QDR Clock cycles (200 MHz clock) = 315 IXP
Clock cycles (1400 MHz clock)
- This is JUST for the TCAM operation, we also need
to read the SRAM
- SRAM Read to retrieve TCAM Results Mailbox (3
words one per search)
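The chained latency calculation above can be written as a short loop. This sketch uses the slide's figures (11 cycles of front-end overhead, and instruction latency 30, core blocking delay 2, backend latency 8 for each search). The function name and the tuple layout are hypothetical.

```python
def search_latencies(front_end_cycles, searches):
    """Completion latency in QDR clock cycles for each search in one
    multi-search operation.

    searches -- list of (instruction_latency, core_blocking_delay,
                backend_latency) tuples, in issue order.
    """
    latencies = []
    for i, (instr, _block, backend) in enumerate(searches):
        if i == 0:
            total = front_end_cycles + instr
        else:
            _, prev_block, prev_backend = searches[i - 1]
            # previous latency - previous backend + previous core blocking
            # delay + this search's backend latency
            total = latencies[-1] - prev_backend + prev_block + backend
        latencies.append(total)
    return latencies


QDR_MHZ, IXP_MHZ = 200, 1400
three_searches = [(30, 2, 8), (30, 2, 8), (30, 2, 8)]
qdr_cycles = search_latencies(11, three_searches)   # [41, 43, 45]
ixp_cycles = qdr_cycles[-1] * (IXP_MHZ // QDR_MHZ)  # 45 * 7 = 315
```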
71Lookups SRAM Bandwidth
- Analysis is PER LA-1 QDR Interface
- That is, each of NPUA and NPUB can do the
following.
- 16-bit QDR SRAM at 200 MHz
- Separate read and write bus
- Operations on rising and falling edge of each
clock
- 32 bits of read AND 32 bits of write per clock
tick
- QDR Write Bus
- 6 32-bit cycles per instruction
- Cycle 0
- Write Address bus contains the TCAM Indirect
Instruction
- Write Data bus contains the TCAM Indirect MDL
Sub-Instruction
- Cycles 1-5
- Write Data bus contains the 5 words of the Lookup
Key
- Write Bus can support 200M/6 = 33.33 M
searches/sec
- QDR Read Bus
- Retrieval of Results Mailbox
- 3 32-bit cycles per instruction
- Retrieval of two full results from QDR SRAM Bank
0
- 6 32-bit cycles per instruction
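The bus-rate figures above come from dividing the 200 MHz QDR clock by the number of 32-bit cycles each instruction occupies. The sketch below applies that division to the write bus and to each read-bus item separately. The slide does not give a combined read-bus figure, so none is computed here, and the function name is hypothetical.

```python
QDR_CLOCK_HZ = 200e6  # one 32-bit read and one 32-bit write per clock tick


def instructions_per_sec(cycles_per_instruction: int) -> float:
    """Instructions per second a QDR bus can carry at the given cycle cost."""
    if cycles_per_instruction <= 0:
        raise ValueError("cycle count must be positive")
    return QDR_CLOCK_HZ / cycles_per_instruction


write_bus = instructions_per_sec(6)       # instruction word + 5 key words
read_mailbox = instructions_per_sec(3)    # Results Mailbox retrieval
read_two_results = instructions_per_sec(6)  # two full results from Bank 0
```

The write-bus value matches the 33.33 M searches/sec stated on the slide.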
72Lookups
- Route Lookup
- Key (72b)
- Port (4b) Can be a wildcard (for Unicast,
probably not for Multicast)
- Value of 1111b in Port field to denote coming
from the XScale
- Ports numbered 0-4
- Plugin (4b) Can be a wildcard (for Unicast,
probably not for Multicast)
- Plugins numbered 0-4
- DAddr (32b)
- Prefixed for Unicast
- Exact Match for Multicast
- SAddr (32b)
- Unicast entries always have this and its mask set
to 0
- Prefixed for Multicast
- Result (99b)
- CopyVector (11b)
- One bit for each of the 5 ports and 5 plugins and
one bit for the XScale
- PluginOutputPortVector(5b) (under consideration)
- This would allow users to send packets to a
plugin which could then send it along to output
port(s). The copyvector is not useful for this
since bits set in the copyvector would cause the
Copy block to send out multiple copies to
different places.
- QID (16b)
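The 72-bit route lookup key (Port 4b, Plugin 4b, DAddr 32b, SAddr 32b) can be illustrated by packing the four fields into one integer. The field order within the key is an assumption (listed order, most significant field first), and `pack_route_key` and `XSCALE_PORT` are hypothetical names. Wildcards would be expressed through the TCAM mask, which this sketch does not model.

```python
import ipaddress

XSCALE_PORT = 0b1111  # Port value denoting a packet coming from the XScale


def pack_route_key(port: int, plugin: int, daddr: str, saddr: str = "0.0.0.0") -> int:
    """Pack a 72-bit route key: Port | Plugin | DAddr | SAddr (assumed order)."""
    if not 0 <= port <= 0xF or not 0 <= plugin <= 0xF:
        raise ValueError("Port and Plugin are 4-bit fields")
    d = int(ipaddress.IPv4Address(daddr))
    s = int(ipaddress.IPv4Address(saddr))
    # Unicast entries leave SAddr (and its mask) at 0, hence the default.
    return (port << 68) | (plugin << 64) | (d << 32) | s


def key_bytes(key: int) -> bytes:
    """Render the key as 9 bytes (72 bits), most significant byte first."""
    return key.to_bytes(9, "big")
```

For example, `pack_route_key(1, 0, "10.0.0.1")` places port 1 in the top nibble and the destination address in bits 63:32 under this assumed layout.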
73Lookups
- Filter Lookup
- Key (140b)
- Port (4b) Can be a wildcard (for Unicast,
probably not for Multicast)
- Value of 1111b in Port field to denote coming
from the XScale
- Ports numbered 0-4
- Plugin (4b) Can be a wildcard (for Unicast,
probably not for Multicast)
- Plugins numbered 0-4
- DAddr (32b)
- SAddr (32b)
- Protocol (8b)