Packet Classification - PowerPoint PPT Presentation

Title: Packet Classification

1
Packet Classification

Tisson Mathew
tisson.k.mathew_at_intel.com

2
Introduction: Papers Covered

"Scalable High Speed IP Routing Lookups" - M.
Waldvogel, G. Varghese, J. Turner, B. Plattner
"Scalable High-Speed Prefix Matching" - M.
Waldvogel, G. Varghese, J. Turner, B. Plattner
"Building a Robust Software-Based Router Using
Network Processors" - T. Spalink, S. Karlin, L.
Peterson, Y. Gottlieb

3
Agenda

Existing longest prefix matching algorithms -
Analysis
The papers' proposed algorithm - Analysis
Software-based router using Network Processors
(Intel IXP 1200)

4
Acronyms

CIDR - Classless Inter-Domain Routing
BMP - Best Matching Prefix
MPLS - Multiprotocol Label Switching
CAM - Content-Addressable Memory
NP - Network Processor
MAC - Media Access Control

5
Existing Solutions for BMP

Modified exact matching schemes
Binary search: O(log2 2N), where N is the number
of routing table entries
Exact match: O(W), where W is the address length
Trie based
BSD radix trie: O(W), where W is the address
length; requires W memory accesses
W = 32 for IPv4 and 128 for IPv6

6
Existing Solution Contd.

Hardware solutions
Match done in parallel
Ternary CAMs are expensive
Can't keep pace with software
Protocol-based solutions
IP switching / tag switching
Boundary routers make the full route decisions
Good performance within a level of the network
hierarchy
Still requires BMP on portions of the network
Caching
Implemented with CAMs -> expensive
Helps but doesn't avoid BMP lookups

7
Scalable High-Speed Prefix Matching

Hashing: one hash table per prefix length
Binary search: on the hash tables
Backtracking avoidance: adding markers and
precomputation

8
Hash Tables

One hash table per prefix length

9
Hash Table terms

L -> array of tables, sorted by prefix length
L[i].length -> length of the prefixes at the i-th
position
L[i].hash -> pointer to the hash table for the
i-th prefix length

10
Linear Search Procedure

To search for address D:
1. Start at the longest prefix length l.
2. Extract the first l bits of D and look them up
in the hash table for length l.
3. If the l bits of D are present in the table,
the BMP is found. Otherwise repeat steps 2 and 3
for the largest prefix length smaller than l,
until all prefix lengths have been searched.
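
The linear search above can be sketched in a few lines of Python. This is only an illustration, not code from the papers: addresses and prefixes are modeled as bit strings, each hash table is a Python set, and the names build_tables and linear_search are made up for this example.

```python
# Sketch of linear search over per-length hash tables.
# Assumption: prefixes and addresses are bit strings such as "1100100".

def build_tables(prefixes):
    """Group prefixes into one hash table (here a set) per prefix length."""
    tables = {}
    for prefix in prefixes:
        tables.setdefault(len(prefix), set()).add(prefix)
    return tables

def linear_search(tables, address):
    """Try prefix lengths from longest to shortest; the first hit is the BMP."""
    for length in sorted(tables, reverse=True):
        if len(address) < length:
            continue  # address too short to carry a prefix of this length
        candidate = address[:length]  # first `length` bits of the address
        if candidate in tables[length]:
            return candidate
    return None  # no prefix matches

tables = build_tables(["1", "00", "111", "110010"])
print(linear_search(tables, "1100100"))  # 110010
print(linear_search(tables, "1111000"))  # 111
```

In the worst case this probes one hash table per distinct prefix length, which is the cost that binary search on prefix lengths tries to reduce.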

11
Binary Search Procedure

1. Get the middle level of the hash table array L.
2. Extract the most significant l bits of D, where
l is the prefix length of that level.
3. Search the hash table for prefix length l.
4. If found, remember the found entry and search
the longer prefix lengths; otherwise search the
shorter prefix lengths.
Note: add markers to indicate longer prefixes, and
modify the binary search to do backtracking.

12
Binary Search Steps

1. E.g. find BMP of 1100100 -> correct
2. E.g. find BMP of 1111000 -> BMP miss

13
Adding Markers

E.g. find BMP of 1111000 -> correct
- Marker is 1111
- Markers are not needed at all levels (shared)
- 25% increase on a typical routing database

14
Backtracking Avoidance

Make the marker node a record containing a
variable bmp: the BMP of the marker
bmp is computed when the marker is inserted into
its hash table
When a marker is found and the search for a longer
prefix fails, Marker.bmp is the BMP
So backtracking is NOT needed
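
A rough Python sketch of binary search over prefix lengths, with markers that carry a precomputed bmp, is given below. It is a simplified illustration under stated assumptions rather than the papers' implementation: prefixes are bit strings, each hash table is a dict mapping a bit string to its bmp, the bmp is found by brute force at build time, and the names build and lookup are hypothetical.

```python
# Sketch: binary search on prefix lengths with markers and precomputed bmp.
# Assumption: prefixes and addresses are bit strings; names are illustrative.

def build(prefixes):
    """Return (levels, tables); tables[length][bits] holds the entry's bmp."""
    real = set(prefixes)
    levels = sorted({len(p) for p in real})
    tables = {length: {} for length in levels}

    def bmp_of(bits):
        # Longest real prefix of `bits` (brute force, done only at build time).
        for length in range(len(bits), 0, -1):
            if bits[:length] in real:
                return bits[:length]
        return None

    for p in real:
        target = levels.index(len(p))
        lo, hi = 0, len(levels) - 1
        while lo <= hi:  # replay the path a lookup would take toward p
            mid = (lo + hi) // 2
            if mid == target:
                break
            if mid < target:
                # Leave a marker so a lookup knows to try longer lengths.
                marker = p[:levels[mid]]
                tables[levels[mid]][marker] = bmp_of(marker)
                lo = mid + 1
            else:
                hi = mid - 1
        tables[len(p)][p] = p  # a real prefix is its own bmp
    return levels, tables

def lookup(levels, tables, address):
    """Binary search over levels; marker.bmp removes the need to backtrack."""
    lo, hi = 0, len(levels) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        length = levels[mid]
        key = address[:length]
        if len(address) >= length and key in tables[length]:
            if tables[length][key] is not None:
                best = tables[length][key]  # remember the bmp seen so far
            lo = mid + 1  # hit (prefix or marker): try longer lengths
        else:
            hi = mid - 1  # miss: try shorter lengths
    return best

levels, tables = build(["1", "00", "1100", "1111010"])
print(lookup(levels, tables, "1100100"))  # 1100
print(lookup(levels, tables, "1110000"))  # 1
```

In the second lookup the search first hits the marker 11 at length 2, then misses at length 4. Because the marker already stores bmp = 1, the result is available without going back to the length-1 table.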

15
Optimizations

Modify the binary tree structure to search the
most promising prefix length layers first
(8, 16, 24)

16
Mutating Binary Search

Improve the search tree after each match
Every match with a marker X means that we only
need to search among the set of prefixes for
which X is a prefix
Whenever we get a match and move to a subtrie,
we only need to do binary search on the levels of
the new subtrie

17
Rope Search

A rope is the sequence of levels that the binary
search would follow on repeated failures
A marker can store the rope -> M.rope

18
Implementation

Every hash table entry is associated with a bmp
field and a rope field, both of which are
precomputed
Precomputation requires complex insertion
routines
Adding a new prefix is rare in the field, but the
route to a prefix may change frequently

19
Implementation contd.

Adding or deleting a prefix in a hash table may
change the bmp values of marker(s)
Adding a new prefix may change the rope of a
number of entries
Solution: batch the changes and rebuild the
search tree

20
Speed and Memory usage
21
Summary

Scalable, distributed, adaptable
Secure distributed storage
Distributed key storage
Protocols for bandwidth fairness enforcement
Need optimized Insert / Delete algorithms
Route database size increases
Additions without rebalancing may degrade
performance over time.

22
Software Routers

Using Network Processors, e.g. Intel IXP 1200
Need for software services: firewalls, intrusion
detection, level-n switching, overlays
Moving from NICs plus a processor to Network
Processors
The paper describes a router built using a 733 MHz
Pentium III (P3) and an IXP 1200 PCI board

23
Characteristics of IXP 1200

High packets-per-second (pps) rate: e.g. a 2.5
Gbps link has to process 6.1 M minimum-sized
packets per second
6 MicroEngines: parallelism to hide memory
latency
Processing and memory operations are done in
parallel using hardware contexts in the
MicroEngines
The IXP 1200 combines the standard blocks of a
networking ASIC with flexible data movement and
programmable parallel machines

24
Router Design

Data plane: forwards packets
Control plane: runs OSPF, RSVP and LDP
Data plane runs on the NP and processes packets
at line speed
Data plane does minimal processing (e.g.
validates the header, decrements TTL, recomputes
the checksum), but it may also run QoS
Control plane runs on the host processor and
expects fewer packets to process (only during
route changes)
Control plane is compute intensive (e.g. OSPF
computing a new routing table)

25
Packet Switching Paths

Path 1: cycles at the MicroEngines only
Path 2: cycles at the MicroEngines and StrongARM
Path 3: cycles at the MicroEngines, StrongARM and
Pentium

26
Software Router Architecture

Classifier C reads packets from an input port
and, based on fields in the packet header,
selects a Forwarder F
Forwarder F applies some function to the packet
and sends the modified packet to an output queue
Output Scheduler S selects one of its non-empty
queues and transmits a packet to the output port
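
The classifier / forwarder / scheduler split can be mimicked in a toy Python model. This is only a sketch: packets are plain dicts, the header field names ("proto", "ttl", "queue") are invented for the example, and the round-robin policy in the scheduler is an assumption, since the slide does not say which discipline S uses.

```python
from collections import deque

def ip_forwarder(packet):
    """Forwarder F for IP: return a modified copy with TTL decremented."""
    out = dict(packet)
    out["ttl"] -= 1
    return out

def default_forwarder(packet):
    """Forwarder F for everything else: pass the packet through unchanged."""
    return dict(packet)

def classify(packet, forwarders):
    """Classifier C: pick a forwarder from a header field (here 'proto')."""
    return forwarders.get(packet["proto"], forwarders["default"])

class OutputScheduler:
    """Output scheduler S: serves non-empty queues round-robin (assumed)."""

    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]
        self.next_queue = 0

    def enqueue(self, index, packet):
        self.queues[index].append(packet)

    def transmit(self):
        """Return the next packet to send, or None if all queues are empty."""
        count = len(self.queues)
        for offset in range(count):
            index = (self.next_queue + offset) % count
            if self.queues[index]:
                self.next_queue = (index + 1) % count
                return self.queues[index].popleft()
        return None

forwarders = {"ip": ip_forwarder, "default": default_forwarder}
scheduler = OutputScheduler(num_queues=2)
arrivals = [
    {"proto": "ip", "ttl": 64, "queue": 0},
    {"proto": "arp", "ttl": 1, "queue": 1},
]
for pkt in arrivals:
    forwarder = classify(pkt, forwarders)
    scheduler.enqueue(pkt["queue"], forwarder(pkt))

sent = [scheduler.transmit() for _ in range(3)]
print(sent)
```

Keeping C, F and S as separate pieces is what lets a forwarder be swapped for another service without touching the input or output side.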

27
IXP 1200 Block Diagram
28
Blocks

The major blocks are:
XScale processor -- General-purpose 32-bit RISC
processor compatible with the ARM Version 5
architecture. XScale is used to initialize and
manage the chip, and can be used for higher-layer
network processing tasks.
Microengines (MEs) -- Six 32-bit programmable
engines specialized for network processing.
Microengines do the main per-packet data plane
processing.
DRAM -- Typically used to buffer packets.
SRAM -- Typically used to store the routing
table and per-flow state, if any.
Scratchpad memory -- 4 KBytes of storage for
general-purpose use.

29
Forwarding pipeline

Each FIFO is an addressable 16-slot x 64-byte
register file
The unit of data in the IXP 1200 is a MAC-Packet
(MP) of 64 bytes
DMA operations need to be mutex protected
Staged input processing

30
MicroEngine Protocol Processing IP

Validate IP Header
Decrement TTL field
Recompute checksum
Update Destination MAC address to the one found
in the routing table
Update Source MAC Address to that of the output
port
Copy the MP into DRAM for the output queue, or
pass it to the StrongARM if it involves
additional IP options
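
The header steps listed above (validate, decrement TTL, recompute checksum) can be illustrated in Python on a raw 20-byte IPv4 header. This is a sketch, not MicroEngine code: the function names are made up, the MAC address rewrite is left out, and raising ValueError stands in for handing the packet to a slower path.

```python
import struct

def checksum(header: bytes) -> int:
    """Internet checksum: ones-complement of the ones-complement 16-bit sum."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

def forward_ipv4(header: bytes) -> bytes:
    """Validate the header, decrement TTL, and recompute the checksum."""
    if len(header) < 20:
        raise ValueError("truncated header")
    version, ihl = header[0] >> 4, header[0] & 0x0F
    if version != 4:
        raise ValueError("not IPv4")
    if ihl != 5:
        raise ValueError("IP options present: hand off to a slower path")
    if checksum(header[:20]) != 0:
        raise ValueError("bad header checksum")
    ttl = header[8]
    if ttl <= 1:
        raise ValueError("TTL expired")
    out = bytearray(header[:20])
    out[8] = ttl - 1                # decrement TTL
    out[10:12] = b"\x00\x00"        # zero the checksum field before summing
    out[10:12] = struct.pack("!H", checksum(bytes(out)))
    return bytes(out)

# Example header: TTL 0x40, protocol UDP, checksum 0xb861.
header = bytes.fromhex("4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7")
forwarded = forward_ipv4(header)
print(forwarded.hex())
```

A valid header sums to zero when its own checksum field is included, which is why the validation step can reuse the same checksum function.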

31
Queuing

Protocol processing on the first MP chooses the
destination (output) queue for the entire packet
Check queues for ready packets, keep queue tail
pointers in registers (16 max), prioritize
queues.
Queue contention is avoided using a mutex

32
MicroEngine Evaluation

Router performance is greatly influenced by the
queuing discipline selected

33
Key Learnings

The IXP 1200 is not easy to program
No compiler for the MicroEngines (do it all in
assembly?)
Static configuration is port dependent; the
software must be redesigned if the number of
ports changes
Met all processing speeds

34
StrongARM / XScale Stage

Accesses DRAM directly with minimal overhead
Input contexts process packets as usual, but upon
detecting that a packet requires service from the
XScale (e.g. a route miss in the cache), they
enqueue the packet in the XScale queue
The XScale is interrupted on a new packet in the
queue
The XScale does the output processing for the
packet

35
Pentium Stage

Move packets from the IXP 1200 to the Pentium
over the PCI bus
Intelligent I/O standard
Process routing protocols, VPN

36
Conclusion

Staged computation and parallelism using contexts
Speed decreases as the stage goes higher
Need more software tools (a MicroEngine compiler)
The software-based vs. hardware-based router
distinction becomes less meaningful

37
Where to Get More Information

http://www.intel.com/design/network/products/npfamily/ixp1200.htm
for Network Processors
http://www.tik.ee.ethz.ch/mwa/HPPC/ for longest
prefix matching algorithms
