Title: Packet Classification
- Tisson Mathew
- tisson.k.mathew_at_intel.com
Introduction: Papers Covered
- "Scalable High Speed IP Routing Lookups" - M. Waldvogel, G. Varghese, J. Turner, B. Plattner
- "Scalable High-Speed Prefix Matching" - M. Waldvogel, G. Varghese, J. Turner, B. Plattner
- "Building a Robust Software-Based Router Using Network Processors" - T. Spalink, S. Karlin, L. Peterson, Y. Gottlieb
Agenda
- Existing longest prefix matching algorithms and their analysis
- The papers' proposed algorithm and its analysis
- Software-based router using network processors (Intel IXP 1200)
Acronyms
- CIDR - Classless Inter-Domain Routing
- BMP - Best Matching Prefix
- MPLS - Multiprotocol Label Switching
- CAM - Content Addressable Memory
- NP - Network Processor
- MAC - Media Access Control
Existing Solutions for BMP
- Modified exact matching schemes
  - Binary search: O(log2 2N), N -> number of routing table entries
  - Exact match: O(W), W -> address length
- Trie based
  - BSD radix trie: O(W), W -> address length; requires up to W memory accesses
  - W -> 32 for IPv4 and 128 for IPv6
Existing Solutions (contd.)
- Hardware solutions
  - Match done in parallel
  - Ternary CAMs are expensive
  - Can't keep pace with software
- Protocol-based solutions
  - IP switching / tag switching
  - Boundary routers make full route decisions
  - Good performance within a level of the network hierarchy
  - Requires BMP on portions of the network
- Caching
  - Implemented with CAMs -> expensive
  - Helps, but doesn't avoid BMP lookups
Scalable High-Speed Prefix Matching
- Hashing: one hash table per prefix length
- Binary search on the hash tables
- Backtracking avoidance: adding markers and precomputation
Hash Tables
- One hash table per prefix length
Hash Table Terms
- L -> array of hash tables, one per prefix length
- L[i].length -> length of the prefixes at the i-th position
- L[i].hash -> pointer to the hash table for the i-th prefix length
Linear Search Procedure
- To search for address D:
  1. Start at the longest prefix length l.
  2. Extract the l most significant bits of D and search the hash table for prefix length l.
  3. If the l bits of D are present in the table, the BMP is found. Otherwise, repeat step 2 with the largest prefix length smaller than l, until all prefix lengths have been searched.
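The linear procedure can be sketched in Python as below. This is a minimal illustration, not code from the papers: addresses and prefixes are written as bit strings (like the 7-bit examples used later), Python dicts stand in for the per-length hash tables, and the prefix table and next-hop names are hypothetical.

```python
# Hypothetical toy routing table: bit-string prefix -> next hop.
PREFIXES = {"1": "A", "00": "B", "111": "C", "1000": "E", "11001": "D"}


def build_tables(prefixes):
    """One hash table (dict) per prefix length: tables[length][prefix] = hop."""
    tables = {}
    for prefix, hop in prefixes.items():
        tables.setdefault(len(prefix), {})[prefix] = hop
    return tables


def linear_search(tables, addr):
    """Probe prefix lengths from longest to shortest; the first hit is the BMP."""
    for length in sorted(tables, reverse=True):
        hop = tables[length].get(addr[:length])  # l most significant bits of D
        if hop is not None:
            return addr[:length], hop
    return None  # no prefix matches addr


tables = build_tables(PREFIXES)
print(linear_search(tables, "1100100"))  # ('11001', 'D')
print(linear_search(tables, "1111000"))  # ('111', 'C')
```

In the worst case a lookup costs one hash probe per distinct prefix length, which is the O(W) behaviour that binary search on the lengths tries to improve.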
Binary Search Procedure
1. Get the middle level l of the hash table array L.
2. Extract the l most significant bits of D (l is the prefix length of that level).
3. Search the hash table for prefix length l.
4. If found, remember the entry and search longer prefix lengths; otherwise, search shorter prefix lengths.
Note: add markers to indicate longer prefixes, and modify binary search to do backtracking.
Binary Search Steps
1. E.g. find the BMP of 1100100 -> correct
2. E.g. find the BMP of 1111000 -> BMP missed
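A BMP miss of this kind can be reproduced with a naive binary search over the prefix lengths, i.e. without markers. The sketch below is illustrative only: the bit-string prefixes and next hops are hypothetical and were chosen so that the probe at the middle length (4) finds nothing for 1111000, which sends the search toward shorter lengths even though a 7-bit prefix matches.

```python
# Hypothetical table with one prefix at each length 1..7.
PREFIXES = {"1": "A", "00": "B", "111": "C", "0000": "X",
            "11001": "D", "100000": "Y", "1111000": "F"}

TABLES = {}
for p, hop in PREFIXES.items():
    TABLES.setdefault(len(p), {})[p] = hop  # one hash table per prefix length
LENGTHS = sorted(TABLES)


def naive_binary_search(addr):
    """Binary search on prefix lengths WITHOUT markers (can miss the BMP)."""
    lo, hi, best = 0, len(LENGTHS) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        length = LENGTHS[mid]
        hop = TABLES[length].get(addr[:length])
        if hop is not None:
            best = (addr[:length], hop)  # remember it, then try longer lengths
            lo = mid + 1
        else:
            hi = mid - 1                 # nothing here: try shorter lengths
    return best


def true_bmp(addr):
    """Reference answer: the longest prefix in the table that matches addr."""
    matches = [p for p in PREFIXES if addr.startswith(p)]
    if not matches:
        return None
    best = max(matches, key=len)
    return best, PREFIXES[best]


print(naive_binary_search("1111000"))  # ('1', 'A')  -> BMP missed
print(true_bmp("1111000"))             # ('1111000', 'F')
```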
Adding Markers
- E.g. find the BMP of 1111000 -> correct
- The marker is 1111; markers are NOT needed at all levels (and can be shared)
- About a 25% increase on a typical routing database
Backtracking Avoidance
- Make each marker node a record containing a variable bmp -> the BMP of the marker
- bmp is computed when the marker is inserted into its hash table
- When a marker is found and the search for longer prefixes fails, Marker.bmp is the BMP
- So backtracking is NOT needed
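Markers and the precomputed bmp field can be combined as in the sketch below. It is a minimal illustration under stated assumptions, not the papers' implementation: prefixes are bit strings, dicts are the per-length hash tables, a marker is placed at every shorter level on the binary-search path toward a prefix's length, and bmp is computed by brute force after all insertions rather than at insertion time. The prefix table is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Match = Optional[Tuple[str, str]]  # (prefix, next hop), or None


@dataclass
class Entry:
    is_prefix: bool    # True for a real prefix, False for a pure marker
    bmp: Match = None  # precomputed best matching prefix for this entry


def build(prefixes: Dict[str, str]):
    lengths = sorted({len(p) for p in prefixes})
    tables: Dict[int, Dict[str, Entry]] = {l: {} for l in lengths}
    for p in prefixes:
        tables[len(p)][p] = Entry(is_prefix=True)
        # Follow the binary-search path toward len(p); leave a marker at every
        # shorter level the search probes on the way (markers are shared).
        lo, hi = 0, len(lengths) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            level = lengths[mid]
            if level == len(p):
                break
            if level < len(p):
                tables[level].setdefault(p[:level], Entry(is_prefix=False))
                lo = mid + 1
            else:
                hi = mid - 1
    # Precompute bmp for every entry (brute force here; a router would do
    # this when the entry is inserted).
    for table in tables.values():
        for key, entry in table.items():
            matches = [p for p in prefixes if key.startswith(p)]
            if matches:
                best = max(matches, key=len)
                entry.bmp = (best, prefixes[best])
    return lengths, tables


def lookup(lengths: List[int], tables, addr: str) -> Match:
    lo, hi, best = 0, len(lengths) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        entry = tables[lengths[mid]].get(addr[:lengths[mid]])
        if entry is not None:
            best = entry.bmp   # marker or prefix hit: remember its bmp
            lo = mid + 1       # then look for a longer match
        else:
            hi = mid - 1       # miss: only shorter lengths can match
    return best                # no backtracking needed


PREFIXES = {"1": "A", "00": "B", "111": "C", "0000": "X",
            "11001": "D", "100000": "Y", "1111000": "F"}
lengths, tables = build(PREFIXES)
print(lookup(lengths, tables, "1111000"))  # ('1111000', 'F')
print(lookup(lengths, tables, "1111111"))  # ('111', 'C'), via marker 1111's bmp
```

The second lookup hits marker 1111, fails at the longer levels, and returns the marker's precomputed bmp directly instead of backtracking.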
Optimizations
Modify the binary tree structure to search the most promising prefix length layers first (8, 16, 24)
Mutating Binary Search
- Improve the search tree after each match
- Every match with a marker X means that we only need to search among the set of prefixes for which X is a prefix
- Whenever we get a match and move to a subtrie, we only need to do binary search on the levels of the new subtrie
Rope Search
A rope is the sequence of levels that the binary search tree would follow on repeated failures. A marker can store the rope -> M.rope
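The default rope is easy to compute: it is the chain of middle levels a binary search visits when every probe fails. The helper below is a hedged sketch of only that computation; `rope` is a hypothetical name, and in Rope Search proper the list passed in would contain only the levels used by prefixes extending the marker, with the search switching to a hit entry's own rope rather than continuing down the current one.

```python
from typing import List, Optional


def rope(levels: List[int], lo: int = 0, hi: Optional[int] = None) -> List[int]:
    """Levels a binary search over levels[lo..hi] probes if every probe fails.

    Each failure moves the search to the lower half, so the rope is the
    chain of middles: mid, then the middle of the lower half, and so on.
    """
    if hi is None:
        hi = len(levels) - 1
    strands = []
    while lo <= hi:
        mid = (lo + hi) // 2
        strands.append(levels[mid])
        hi = mid - 1
    return strands


IPV4_LEVELS = list(range(1, 33))        # prefix lengths 1..32
print(rope(IPV4_LEVELS))                 # [16, 8, 4, 2, 1]
# After a hit at level 16, only levels 17..32 remain; in a plain
# (non-mutating) search a marker at 16 could store this rope as M.rope.
print(rope(IPV4_LEVELS, lo=16))          # [24, 20, 18, 17]
```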
Implementation
- Every hash table entry is associated with a bmp field and a rope field, both precomputed
- Precomputation requires complex insertion routines
- Addition of new prefixes is rare in the field, but the route to a prefix may change frequently
Implementation (contd.)
- Adding or deleting a prefix may change the bmp values of one or more markers
- Adding a new prefix may change the ropes of a number of entries
- Solution: batch changes and rebuild the search tree
Speed and Memory Usage
Summary
- Scalable, distributed, adaptable
- Secure distributed storage
- Distributed key storage
- Protocols for bandwidth fairness enforcement
- Need optimized insert / delete algorithms
- Route database size keeps increasing
- Additions without rebalancing may degrade performance over time
Software Routers
- Using network processors, e.g. Intel IXP 1200
- Need for software services -> firewalls, intrusion detection, level-n switching, overlays
- Moving from NIC processors to network processors
- The paper describes a router built using a 733 MHz Pentium III and an IXP 1200 PCI board
Characteristics of IXP 1200
- High packets-per-second (pps) rates -> e.g. a 2.5 Gbps link has to process 6.1 M minimum-sized packets per second
- 6 MicroEngines: parallelism to hide memory latency
- Processing and memory operations are done in parallel using hardware contexts in the MicroEngines
- IXP 1200 includes the standard blocks of a networking ASIC, with flexible data movement and programmable parallel machines
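The 6.1 M packets/s figure translates into a tight per-packet time budget. The sketch below does the arithmetic; the 200 MHz MicroEngine clock is an assumed value for illustration only, and the calculation ignores hardware contexts and memory latency, assuming simply that packets are spread evenly over the 6 MicroEngines.

```python
def ns_per_packet(pps):
    """Time available per packet at a given packet rate, in nanoseconds."""
    return 1e9 / pps


def cycles_per_packet(pps, clock_hz, engines=1):
    """Cycles each engine may spend on a packet if packets are spread evenly."""
    return clock_hz * engines / pps


PPS = 6.1e6        # minimum-sized packets per second (from the slide)
CLOCK_HZ = 200e6   # ASSUMED MicroEngine clock, for illustration only

print(round(ns_per_packet(PPS), 1))                    # 163.9
print(round(cycles_per_packet(PPS, CLOCK_HZ), 1))      # 32.8
print(round(cycles_per_packet(PPS, CLOCK_HZ, 6), 1))   # 196.7
```

Even spread over six engines, a couple of hundred cycles must cover all header checks and memory accesses, which is why overlapping processing with memory operations matters.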
Router Design
- Data plane -> forwarding packets
- Control plane -> runs OSPF, RSVP and LDP
- The data plane runs on the NP and processes packets at line speed
- The data plane does minimal processing (e.g. validating the header, decrementing the TTL, recomputing the checksum), but it may run QoS
- The control plane runs on the host processor and expects fewer packets to process (only during route changes)
- The control plane is compute intensive (e.g. OSPF computing a new routing table)
Packet Switching Paths
- Path 1: cycles at MicroEngines only
- Path 2: cycles at MicroEngines and StrongARM
- Path 3: cycles at MicroEngines, StrongARM and Pentium
Software Router Architecture
- Classifier C reads packets from an input port and, based on fields in the packet header, selects a forwarder F
- Forwarder F applies some function to the packet and sends the modified packet to an output queue
- Output scheduler S selects one of its non-empty queues and transmits a packet to the output port
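The C -> F -> S structure can be sketched as follows. The header fields (`proto`, `ttl`, `out`), the two forwarders and the round-robin policy are assumptions made for illustration; the architecture itself only requires that S pick some non-empty queue.

```python
from collections import deque


def ip_forwarder(pkt):
    """Hypothetical forwarder F: decrement TTL, use the output queue in the header."""
    return pkt["out"], dict(pkt, ttl=pkt["ttl"] - 1)


def host_forwarder(pkt):
    """Hypothetical forwarder F for everything else: queue 0, unmodified."""
    return 0, pkt


def classify(pkt, forwarders):
    """Classifier C: select a forwarder from a header field ('proto' here)."""
    return forwarders.get(pkt["proto"], forwarders["default"])


class RoundRobinScheduler:
    """Output scheduler S: serve non-empty queues in round-robin order."""

    def __init__(self, n_queues):
        self.queues = [deque() for _ in range(n_queues)]
        self.next_q = 0  # queue to consider first on the next transmit

    def enqueue(self, q, pkt):
        self.queues[q].append(pkt)

    def transmit(self):
        n = len(self.queues)
        for i in range(n):
            q = (self.next_q + i) % n
            if self.queues[q]:
                self.next_q = (q + 1) % n
                return self.queues[q].popleft()
        return None  # every queue is empty


forwarders = {"ip": ip_forwarder, "default": host_forwarder}
sched = RoundRobinScheduler(2)
for pkt in [{"proto": "ip", "ttl": 64, "out": 1, "id": 1},
            {"proto": "ip", "ttl": 10, "out": 1, "id": 2},
            {"proto": "arp", "id": 3}]:
    q, out = classify(pkt, forwarders)(pkt)  # C selects F; F returns (queue, pkt)
    sched.enqueue(q, out)

print([sched.transmit()["id"] for _ in range(3)])  # [3, 1, 2]
```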
IXP 1200 Block Diagram
Blocks
- The major blocks are:
  - XScale processor -- General purpose 32-bit RISC processor compatible with the ARM Version 5 architecture. The XScale is used to initialize and manage the chip, and can be used for higher layer network processing tasks.
  - Microengines (MEs) -- 6 32-bit programmable engines specialized for network processing. Microengines do the main data plane processing per packet.
  - DRAM -- Typically used to buffer packets.
  - SRAM -- Typically used to store the routing table and per-flow state, if any.
  - Scratchpad memory -- 4 KBytes of storage for general purpose use.
Forwarding Pipeline
- Each FIFO is an addressable 16-slot x 64-byte register file
- The unit of data in the IXP 1200 is a MAC-packet (MP) of 64 bytes
- DMA operations need to be mutex protected
- Staged input processing
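Since the MP is the unit of data, a frame is handled as a sequence of 64-byte pieces. The short Python sketch below shows only that segmentation; it does not model the 16-slot FIFOs, DMA or the mutexes, and `split_into_mps` is a hypothetical name.

```python
MP_SIZE = 64  # bytes per MAC-packet (MP)


def split_into_mps(frame: bytes) -> list:
    """Cut a frame into 64-byte MPs; only the last MP may be shorter."""
    return [frame[i:i + MP_SIZE] for i in range(0, len(frame), MP_SIZE)]


frame = bytes(150)                       # a 150-byte frame of zeros
mps = split_into_mps(frame)
print([len(mp) for mp in mps])           # [64, 64, 22]
print(len(split_into_mps(bytes(1500))))  # 24
```

A 14-byte Ethernet header followed by a standard 20-byte IPv4 header fits inside the first MP, so per-packet header decisions can be made before the remaining MPs arrive.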
MicroEngine Protocol Processing (IP)
- Validate the IP header
- Decrement the TTL field
- Recompute the checksum
- Update the destination MAC address to the one found in the routing table
- Update the source MAC address to that of the output port
- Copy the MP into DRAM for the output queue, or pass it to the StrongARM if it involves additional IP options
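The first three header steps can be sketched in Python as follows. It is a hedged illustration rather than MicroEngine code: `forward_ip` and its helpers are hypothetical names, the MAC rewrites and the DRAM copy are not shown, and the checksum is patched incrementally (RFC 1624) instead of being recomputed from scratch, one common way to keep the per-packet cost low. The sample header is an ordinary 20-byte IPv4 header.

```python
import struct


def ones_complement_fold(total: int) -> int:
    """Fold carries back into the low 16 bits (one's-complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 checksum over a header; returns 0 when run over a valid header."""
    data = bytes(header)
    if len(data) % 2:
        data += b"\x00"
    words = struct.unpack(f"!{len(data) // 2}H", data)
    return ~ones_complement_fold(sum(words)) & 0xFFFF


def forward_ip(header: bytearray) -> bool:
    """Validate the header, decrement TTL and patch the checksum in place.

    Returns False if the packet should be dropped (or handed to a slower path).
    """
    ihl = header[0] & 0x0F
    if header[0] >> 4 != 4 or ihl < 5 or len(header) < ihl * 4:
        return False                      # not a well-formed IPv4 header
    if ipv4_checksum(header[:ihl * 4]) != 0:
        return False                      # corrupted header
    if header[8] <= 1:
        return False                      # TTL would expire here
    old_word = struct.unpack_from("!H", header, 8)[0]  # TTL | protocol
    header[8] -= 1
    new_word = struct.unpack_from("!H", header, 8)[0]
    old_sum = struct.unpack_from("!H", header, 10)[0]
    # RFC 1624 incremental update: HC' = ~(~HC + ~m + m')
    total = (~old_sum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    struct.pack_into("!H", header, 10, ~ones_complement_fold(total) & 0xFFFF)
    return True


hdr = bytearray.fromhex("4500007300004000" "4011b861" "c0a80001c0a800c7")
print(forward_ip(hdr), hdr[8], hex(struct.unpack_from("!H", hdr, 10)[0]))
# True 63 0xb961
```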
Queuing
- Protocol processing on the first MP chooses the destination (output) queue for the entire packet
- Check queues for ready packets, keep queue tail pointers in registers (16 max), prioritize queues
- Queue contention avoidance using a mutex
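The queuing rules can be sketched with Python threads standing in for MicroEngine contexts. This is an assumption-laden illustration: `OutputQueue`, `enqueue_packet` and the routing rule are hypothetical names, `threading.Lock` plays the role of the mutex, and priorities and the tail pointers kept in registers are not modelled. In this sketch the lock also keeps each packet's MPs contiguous in its queue.

```python
import threading
from collections import deque


class OutputQueue:
    """Hypothetical output queue shared by several input contexts."""

    def __init__(self):
        self.mps = deque()
        self.lock = threading.Lock()  # mutex to avoid contention on the queue


def enqueue_packet(mps, queues, route):
    """Route on the first MP only; all MPs of the packet follow that choice."""
    dest = queues[route(mps[0])]
    with dest.lock:                   # one context at a time touches this queue
        dest.mps.extend(mps)          # the packet's MPs stay contiguous
    return dest


def first_byte_route(mp):
    """Hypothetical routing decision: queue index from the first byte's low bit."""
    return mp[0] & 1


queues = [OutputQueue(), OutputQueue()]
workers = []
for ctx in range(4):                  # four "contexts", each enqueuing a packet
    pkt = [bytes([ctx, i]) for i in range(3)]  # 3 MPs tagged (context, index)
    t = threading.Thread(target=enqueue_packet, args=(pkt, queues, first_byte_route))
    workers.append(t)
    t.start()
for t in workers:
    t.join()

print(len(queues[0].mps), len(queues[1].mps))  # 6 6
```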
MicroEngine Evaluation
The router's performance is greatly influenced by the queuing discipline selected
Key Learnings
- IXP 1200 is not easy to program
- No compiler for the MicroEngines (do it all in assembly?)
- Static configuration is port dependent; the software needs to be redesigned if the number of ports changes
- Met all required processing speeds
StrongARM / XScale Stage
- Accesses DRAM directly with minimal overhead
- Input contexts process packets as usual, but upon detecting that a packet requires service from the XScale (e.g. a route miss in the cache), they enqueue the packet in the XScale queue
- Interrupt the XScale when a new packet is in the queue
- The XScale does the output processing for the packet
Pentium Stage
- Move packets from the IXP 1200 to the Pentium over the PCI bus
- Intelligent I/O standard
- Process routing protocols, VPN
Conclusion
- Staged computation; parallelism using contexts
- Speed decreases as the stage goes higher
- Need more software tools (e.g. a MicroEngine compiler)
- The distinction between software-based and hardware-based routers becomes less meaningful
Where to Get More Information
- http://www.intel.com/design/network/products/npfamily/ixp1200.htm for network processors
- http://www.tik.ee.ethz.ch/mwa/HPPC/ for longest prefix matching algorithms