Title: A 50-Gb/s IP Router
A 50-Gb/s IP Router
- Authors: Craig Partridge et al., IEEE/ACM Transactions on Networking, June 1998
- Presenter: Srinivas R. Avasarala
- CS Dept., Purdue University
Why a Gigabit Router?
- Transmission link bandwidths are improving at very fast rates
- Network usage is expanding
- Host adapters, operating systems, switches, and multiplexers also need to get faster for improved network performance
- The goal of this work is to show that routers can keep pace with these other technologies
Goals of a Multi-Gigabit Router (MGR)
- Enough internal bandwidth to move packets between its interfaces at gigabit rates
- Enough packet processing power to forward several million packets per second (MPPS)
- Conformance to protocol standards
- The MGR achieves forwarding rates of up to 32 MPPS with 50 Gb/s of full-duplex backplane capacity
Router Architecture
- Multiple line cards, each with one or more network interfaces
- Forwarding engine cards (FEs), which make packet forwarding decisions
- A high-speed switch
- A network processor
Router Architecture (figure)
Major Innovations
- Each FE has a complete set of the routing tables
- A switched fabric is used instead of the traditional shared bus
- FEs are on boards distinct from the line cards
- Use of an abstract link-layer header
- Inclusion of QoS processing in the router
The Forwarding Engine Processor
- A 415-MHz DEC Alpha 21164 processor
- A 64-bit, 32-register, superscalar RISC processor
- 2 integer units: E0, E1
- 2 floating-point units: FA, FM
- Each cycle, one instruction can be scheduled to each unit, processing up to 4 instructions (a quad) as a group
Forwarding Engine Caches
- 3 internal caches
- First-level instruction cache (Icache): 8 kB
- First-level data cache (Dcache): 8 kB
- An on-chip secondary cache (Scache): 96 kB, used as a cache of recent routes; it can store approximately 12,000 routes at 64 bits per route (see the note after this list)
- An external tertiary cache (Bcache): 16 MB
- Divided into two 8 MB banks
- One bank stores the entire forwarding table
- The other is updated by the network processor via the PCI bus
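To see where the 12,000-route figure comes from: at 64 bits (8 bytes) per route, the 96 kB Scache holds roughly 96 × 1024 / 8 = 12,288 entries, i.e. about 12,000 cached routes.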
Forwarding Engine Hardware
- Headers are placed in a request FIFO queue
- The Alpha reads from the queue head, examines the header, makes a route decision, and informs the inbound card
- Each entry includes 24 or 56 bytes of the packet plus an 8-byte abstract link-layer header; the Alpha reads a minimum of 32 bytes
- The Alpha writes out the updated header, indicating the outbound interface to use (dispatching info)
- The updated header contains the outbound link-layer address and a flow id used for packet scheduling (a hypothetical layout is sketched after this list)
- A unique approach to ARP!
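To picture what the forwarding engine hands back, here is a hypothetical C layout for that dispatch record (outbound interface, outbound link-layer address, flow id). The field names and widths are assumptions for illustration; the paper's actual 8-byte abstract link-layer header format is not reproduced here.

    #include <stdint.h>

    /* Hypothetical layout of the dispatching info the Alpha writes back to the
     * inbound line card: where to send the packet, the outbound link-layer
     * address, and a flow id for the QoS scheduler.  Widths are illustrative. */
    struct dispatch_info {
        uint16_t out_card;            /* destination line card              */
        uint16_t out_interface;       /* interface on that card             */
        uint8_t  link_layer_addr[8];  /* outbound link-layer address        */
        uint16_t flow_id;             /* used later for packet scheduling   */
    };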
Forwarding Engine Software
- A few hundred lines of code
- 85 instructions in the common case, taking a minimum of 42 cycles
- This gives a peak forwarding rate of about 9.8 MPPS (415 MHz / 42 cycles ≈ 9.8 MPPS)
- The fast path of the code runs in 3 stages, each with about 20-30 instructions (10-15 cycles)
Fast path of the code
- Stage 1
- Basic error checking to see whether the header really is from an IP datagram
- Confirm that the packet/header lengths are reasonable
- Confirm that the IP header has no options
- Compute the hash offset into the route cache and load the route
- Start loading the next header
Fast path of the code
- Stage 2
- Check whether the cached route matches the destination of the datagram
- If not, do an extended lookup in the route table in the Bcache (a lookup sketch follows this list)
- Update the TTL and checksum fields
- Stage 3
- Put the updated TTL, checksum, and route information into the IP header, along with link-layer info from the forwarding table
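To make the cache-then-fallback structure of stages 1-2 concrete, here is a minimal C sketch. The hash function, table size, and entry layout are assumptions for illustration, not the MGR's actual Alpha code; the full-table lookup is stubbed out.

    #include <stdint.h>

    /* Hypothetical route-cache entry: about 64 bits of route data per entry,
     * matching the slide's figure of roughly 12,000 entries in the 96 kB Scache. */
    struct route_entry {
        uint32_t dest;           /* destination address this entry matches */
        uint16_t out_interface;  /* outbound interface (dispatching info)  */
        uint16_t flow_id;        /* flow id used for packet scheduling     */
    };

    #define CACHE_ENTRIES 8192   /* power of two for cheap masking (assumed) */

    static struct route_entry route_cache[CACHE_ENTRIES];

    /* Stub standing in for the extended lookup in the Bcache forwarding table. */
    static struct route_entry full_table_lookup(uint32_t dest)
    {
        struct route_entry e = { dest, 0, 0 };
        return e;
    }

    /* Hash the destination, check the cached route, and fall back to the
     * full forwarding table on a miss, refilling the cache slot. */
    struct route_entry lookup_route(uint32_t dest)
    {
        uint32_t idx = (dest ^ (dest >> 16)) & (CACHE_ENTRIES - 1); /* assumed hash */
        struct route_entry e = route_cache[idx];

        if (e.dest != dest) {            /* cache miss: extended lookup */
            e = full_table_lookup(dest);
            route_cache[idx] = e;
        }
        return e;
    }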
An exception!
- The IP header checksum is not verified, only updated
- The incremental update algorithm (sketched after this list) is safe because if the checksum is bad, it remains bad
- Reason: checksum verification is expensive, and it is a large penalty to pay for a rare error that can be caught end-to-end
- Verification requires 17 instructions and a minimum of 14 cycles, increasing the forwarding time by about 21%
- IPv6 does not include a header checksum either!
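Here is a minimal C sketch of an incremental header-checksum update in the spirit of RFC 1624 (equation 3), applied to a TTL decrement. It illustrates the technique, not the MGR's Alpha code; the example values in main() are made up. Because the update only folds in the difference between the old and new word, a checksum that arrived corrupted stays corrupted, which is the property the slide relies on.

    #include <stdint.h>
    #include <stdio.h>

    /* RFC 1624 (Eq. 3) incremental update:  HC' = ~( ~HC + ~m + m' )
     * where HC is the old checksum, m the old value of a 16-bit header word,
     * and m' its new value, all in one's-complement arithmetic. */
    static uint16_t csum_update(uint16_t hc, uint16_t m_old, uint16_t m_new)
    {
        uint32_t sum = (uint16_t)~hc;
        sum += (uint16_t)~m_old;
        sum += m_new;
        while (sum >> 16)                      /* fold end-around carries */
            sum = (sum & 0xFFFF) + (sum >> 16);
        return (uint16_t)~sum;
    }

    int main(void)
    {
        /* Decrement the TTL: the TTL/protocol word of the IPv4 header
         * drops by 0x0100.  Values below are purely illustrative. */
        uint16_t old_word = 0x4006;            /* TTL 0x40, protocol 6 (TCP) */
        uint16_t new_word = old_word - 0x0100; /* TTL decremented by one     */
        uint16_t old_csum = 0xB861;            /* assumed incoming checksum  */

        printf("checksum 0x%04X -> 0x%04X\n",
               old_csum, csum_update(old_csum, old_word, new_word));
        return 0;
    }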
Some datagrams not handled in the fast path
- Headers whose destination misses in the cache
- Headers with errors
- Headers with IP options
- Datagrams that require fragmentation
- Multicast datagrams
- Requires multicast routing, which is based on the source address and inbound link as well
- Requires multiple copies of the header to be sent to different line cards
Instruction set
- 27 of the instructions do bit, byte, or word manipulation, needed to extract various fields from the headers
- These instructions can only execute in E0, resulting in contention (and making checksum verification costly)
- Floating-point instructions account for 12, but they have no impact on performance since they only update SNMP values and can be interleaved
- Loads (6) and stores (4) are kept to a minimum
Issues in forwarding design
- Why not use an ASIC in place of the engine?
- Since the IP protocol is stable, why not do it?
- The answer depends on where the router will be deployed: a corporate LAN or an ISP's backbone?
- How effective is a route cache?
- A full route lookup is only about 5 times more expensive than a cache hit, so only modest hit rates are needed to make the cache worthwhile (see the worked example after this list)
- And modest hit rates seem to be assured because of packet trains
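To make the 5x figure concrete (illustrative arithmetic, not a measurement from the paper): with a route-cache hit rate h, the average lookup cost is h + 5(1 - h) cache-hit times, so even a 75% hit rate gives 0.75 + 5 × 0.25 = 2.0 cache-hit times on average, versus 5.0 with no cache at all.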
Abstract link layer header
- Designed to keep the forwarding engine and its code simple
The Switched Bus
- Instead of the conventional shared bus, the MGR uses a 15-port point-to-point switch
- A limitation of a point-to-point switch is that it does not support one-to-many transfers
- The switch has 2 interfaces to each function card
- Data interface: 75 input and 75 output pins, clocked at 51.84 MHz
- Allocation interface: 2 request pins, 2 inhibit pins, 1 input status pin, and 1 output status pin, clocked at 25.92 MHz
Data transfer in the switch
- An epoch is 16 ticks of the data clock (8 ticks of the allocation clock)
- Up to 15 simultaneous transfers can occur in an epoch
- Each transfer is 1024 bits of data plus 176 auxiliary bits for parity and control
- Aggregate data bandwidth is 49.77 Gb/s (58.32 Gb/s including the auxiliary bits), or about 3.3 Gb/s per line card (see the arithmetic check after this list)
- The 1024 bits are sent in two 64B blocks
- Function cards are expected to wait several epochs for another 64B block to fill the transfer
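As a check on these numbers: an epoch lasts 16 / 51.84 MHz ≈ 308.6 ns, so one 1024-bit transfer per epoch is about 3.32 Gb/s per card; 15 simultaneous transfers give 15 × 1024 bits / 308.6 ns ≈ 49.77 Gb/s of data, and counting the 176 auxiliary bits (1200 bits per transfer) gives ≈ 58.32 Gb/s.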
Scheduling of the switch
- Minimum of 4 epochs to schedule and complete a transfer, but scheduling is pipelined
- Epoch 1: the source card signals that it has data to send to the destination card
- Epoch 2: the switch allocator schedules the transfer
- Epoch 3: the source and destination cards are notified and told to configure themselves
- Epoch 4: the transfer takes place
- Flow control is done through the inhibit pins
The Switch Allocator Card
- Takes connection requests from function cards
- Takes inhibit requests from destination cards
- Computes a transfer configuration for each epoch
- 15 × 15 = 225 possible pairings, with 15! possible patterns
- Disadvantages of the simple allocator:
- Unfair: there is a preference for low-numbered sources
- Requires evaluating 225 positions per epoch, which is too fast for an FPGA
The Switch Allocator Card (figure)
The Switch Allocator
- Solution to the unfairness problem: random shuffling of sources and destinations (a sketch follows this list)
- Solution to the timing problem: parallel evaluation of multiple locations
- Requests from forwarding engines get priority over those from line cards, to avoid header contention on the line cards
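To make the shuffling idea concrete, here is a minimal C sketch of a randomized allocator: shuffle the source and destination evaluation orders, then greedily grant each source a requested, still-free destination. The request matrix, the greedy single-pass policy, and the sizes are assumptions for illustration; the real allocator is an FPGA design that also evaluates positions in parallel and applies the FE-over-line-card priority.

    #include <stdbool.h>
    #include <stdlib.h>

    #define PORTS 15   /* 15-port switch, as in the MGR */

    /* Fisher-Yates shuffle of an evaluation order, so no port is
     * systematically favoured the way a fixed low-to-high scan is. */
    static void shuffle(int order[], int n)
    {
        for (int i = n - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
    }

    /* request[s][d] is true if source s wants to send to destination d and d
     * has not raised its inhibit pin.  grant[s] receives the destination
     * granted to s for this epoch, or -1 if s gets no transfer. */
    void allocate_epoch(bool request[PORTS][PORTS], int grant[PORTS])
    {
        bool dest_taken[PORTS] = { false };
        int src_order[PORTS], dst_order[PORTS];

        for (int i = 0; i < PORTS; i++) {
            src_order[i] = i;
            dst_order[i] = i;
            grant[i] = -1;
        }
        shuffle(src_order, PORTS);
        shuffle(dst_order, PORTS);

        /* Greedily pair each source (in random order) with a free destination. */
        for (int i = 0; i < PORTS; i++) {
            int s = src_order[i];
            for (int k = 0; k < PORTS; k++) {
                int d = dst_order[k];
                if (request[s][d] && !dest_taken[d]) {
                    grant[s] = d;
                    dest_taken[d] = true;
                    break;
                }
            }
        }
    }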
Line Card Design
- A line card in the MGR can have up to 16 interfaces on it, all of the same type
- The total bandwidth of all interfaces on a card must not exceed 2.5 Gb/s (a quick check follows the list below); the difference between 2.5 and 3.3 Gb/s allows for the transfer of headers to and from the forwarding engines
- Can support:
- 1 OC-48c 2.4 Gb/s SONET interface
- 4 OC-12c 622 Mb/s SONET interfaces
- 3 HIPPI 800 Mb/s interfaces
- 16 100Mb/s Ethernet or FDDI interfaces
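As a quick check, each of these configurations respects the 2.5 Gb/s cap: 1 × 2.4 Gb/s = 2.4 Gb/s, 4 × 622 Mb/s ≈ 2.49 Gb/s, 3 × 800 Mb/s = 2.4 Gb/s, and 16 × 100 Mb/s = 1.6 Gb/s.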
Line card: Inbound packet processing
- Assigns a packet id and breaks the data into a chain of 64B pages (a sketch of the page chain follows this list)
- The first page is sent to the FE to get routing info
- 2 complications:
- Multicasting: the FE sends multiple copies of the updated first page for a single packet
- ATM: cells are 53 bytes, so segmentation and reassembly (SAR) is needed; OAM cells between interfaces on the same card must be sent directly in a single page
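Here is a hypothetical C sketch of the page-chaining structure the slide describes: a per-packet id plus a linked chain of 64-byte pages, with the first page being the one handed to the forwarding engine. Field names and the helper are illustrative assumptions, not the line card's actual data layout.

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_BYTES 64        /* packets are broken into 64-byte pages */

    struct page {
        uint8_t      data[PAGE_BYTES];
        uint8_t      len;        /* valid bytes in this page              */
        struct page *next;       /* next page in the chain, or NULL       */
    };

    struct packet {
        uint32_t     packet_id;  /* id assigned when the packet arrives   */
        struct page *first_page; /* holds the headers; sent to the FE     */
        struct page *last_page;  /* tail pointer for cheap appends        */
    };

    /* Append a filled page to the packet's chain. */
    static void packet_append_page(struct packet *p, struct page *pg)
    {
        pg->next = NULL;
        if (p->last_page)
            p->last_page->next = pg;
        else
            p->first_page = pg;
        p->last_page = pg;
    }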
Line card: Outbound packet processing
- Receives pages of a packet from the switch
- Assembles them in a list
- Creates a packet record pointing to the list
- Passes the packet record to the QoS processor (an FPGA), which does scheduling based on flow ids
- Any link-layer scheduling is done separately later
Network Processor, Routing Tables
- A 233-MHz Alpha 21064 processor
- Access to the line cards is through a PCI bridge
- Runs NetBSD 1.1 UNIX
- All routing protocols run on the network processor
- FEs have only small tables with minimal info
- The network processor periodically downloads new tables into the FEs
- The FEs then switch memory banks and invalidate the route cache (a sketch of this pattern follows below)
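A minimal C sketch of that double-bank update pattern, under assumed types and sizes: the network processor fills the inactive Bcache bank, then the FE flips to it and clears its route cache so no stale routes survive. This is an illustration of the idea, not the MGR's actual code or synchronization.

    #include <stdint.h>
    #include <string.h>

    #define TABLE_ENTRIES (1u << 16)   /* size chosen only for illustration */
    #define CACHE_ENTRIES 8192

    struct fwd_entry {
        uint32_t prefix, mask;
        uint16_t out_interface;
    };

    /* Two Bcache banks: one holds the table in use, the other is being
     * rewritten by the network processor over the PCI bus. */
    static struct fwd_entry bank[2][TABLE_ENTRIES];
    static int active_bank = 0;

    /* On-chip route cache, cleared whenever the table changes. */
    static struct fwd_entry route_cache[CACHE_ENTRIES];

    /* Called on the FE once the network processor has finished filling the
     * inactive bank: flip to the new table and invalidate the route cache. */
    void switch_to_new_table(void)
    {
        active_bank ^= 1;
        memset(route_cache, 0, sizeof(route_cache));
    }

    /* Lookups always go against the currently active bank. */
    const struct fwd_entry *table_base(void)
    {
        return bank[active_bank];
    }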
Conclusions
- The paper makes 2 important contributions
- It emphasizes that examining every header improves robustness and security
- It shows that it is feasible to build routers that can serve in emerging high-speed networks
- In all, an excellent paper providing complete and intricate details about high-speed router design