Packet Classification
Transcript and Presenter's Notes
1
Packet Classification
  • Tisson Mathew
  • tisson.k.mathew@intel.com

2
Introduction: Papers Covered
  • "Scalable High Speed IP Routing Lookups" - M.
    Waldvogel, G. Varghese, J. Turner, B. Plattner
  • "Scalable High-Speed Prefix Matching" - M.
    Waldvogel, G. Varghese, J. Turner, B. Plattner
  • "Building a Robust Software-Based Router Using
    Network Processors" - T. Spalink, S. Karlin, L.
    Peterson, Y. Gottlieb

3
Agenda
  • Existing longest prefix matching algorithms -
    analysis
  • The papers' proposed algorithms - analysis
  • Software-based router using network processors
    (Intel IXP 1200)

4
Acronyms
  • CIDR - Classless Inter-Domain Routing
  • BMP - Best Matching Prefix
  • MPLS - Multi-Protocol Label Switching
  • CAM - Content Addressable Memory
  • NP - Network Processor
  • MAC - Media Access Control

5
Existing Solutions for BMP
  • Modified exact matching schemes
  • Binary search: O(log2 2N), N -> number of
    routing table entries
  • Exact match: O(W), W -> address length
  • Trie based (see the trie sketch below)
  • BSD radix trie: O(W), W -> address length,
    and requires W memory accesses
  • W = 32 for IPv4 and 128 for IPv6
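A minimal Python sketch (not from the papers) of a trie-style lookup, illustrating why the cost quoted above scales with the address length W. It uses a plain binary trie rather than the path-compressed BSD radix trie, and assumes addresses are handled as bit strings; all names are illustrative.

class TrieNode:
    def __init__(self):
        self.children = [None, None]   # one child per bit value
        self.next_hop = None           # set only if a prefix ends at this node

def trie_lookup(root, addr_bits):
    # Walk at most W bits, remembering the last node that stores a prefix.
    node, best = root, None
    for bit in addr_bits:
        node = node.children[int(bit)]
        if node is None:
            break                      # no longer prefix exists on this path
        if node.next_hop is not None:
            best = node.next_hop       # longest matching prefix seen so far
    return best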

6
Existing Solution Contd.
  • Hardware solutions
  • Matching is done in parallel
  • Ternary CAMs are expensive
  • Can't keep pace with software
  • Protocol-based solutions
  • IP switching / tag switching
  • Boundary routers make full route decisions
  • Good performance within a level of the network
    hierarchy
  • Still requires BMP on portions of the network
  • Caching
  • Implemented with CAMs -> expensive
  • Helps, but doesn't avoid BMP lookups

7
Scalable High-Speed Prefix Matching
  • Hashing: one hash table per prefix length
  • Binary search: over the hash tables
  • Backtracking avoidance: adding markers and
    precomputation

8
Hash Tables
  • One hash table per prefix length

9
Hash Table terms
  • L -> the table (an array of levels)
  • L[i].length -> length of the prefixes stored at
    the i-th position
  • L[i].hash -> pointer to the hash table for the
    i-th prefix length (see the sketch below)
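A minimal sketch, with illustrative names, of the array L described above: one record per prefix length, each holding that length and a hash table keyed by the first L[i].length bits of an address.

from dataclasses import dataclass, field

@dataclass
class Level:
    length: int                                 # L[i].length
    hash: dict = field(default_factory=dict)    # L[i].hash: prefix bits -> entry

# Example with two prefix lengths; addresses and prefixes are bit strings,
# and the entry dicts stand in for whatever route information is stored.
L = [Level(4), Level(7)]
L[0].hash["1111"] = {"next_hop": "A"}
L[1].hash["1111000"] = {"next_hop": "B"}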

10
Linear Search Procedure
  • To search for address D:
  • 1. Start at the longest prefix length l.
  • 2. Extract the first l bits of D and search the
    hash table for length l.
  • 3. If the l bits of D are present in that table,
    the BMP is found. Otherwise repeat steps 1-2
    with the largest prefix length smaller than l,
    until all prefix lengths have been searched
    (see the sketch below).
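A hedged sketch of the linear search above, reusing the Level records from the previous sketch and treating the address D as a bit string.

def linear_search(L, D):
    # Probe prefix lengths from longest to shortest; the first hit is the BMP.
    for level in sorted(L, key=lambda lv: lv.length, reverse=True):
        entry = level.hash.get(D[:level.length])
        if entry is not None:
            return entry               # best (longest) matching prefix
    return None                        # no prefix matches; fall back to default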

11
Binary Search Procedure
1. Get the middle level of the hash table array L.
2. Extract the most significant l bits of D (l is
   that level's prefix length).
3. Search the hash table for prefix length l.
4. If found, remember the found entry and search
   the longer prefix lengths; otherwise search the
   shorter prefix lengths.
Note: add markers to indicate longer prefixes, and
modify the binary search to backtrack when a
marker hit leads nowhere.
12
Binary Search Steps
1. E.g. find the BMP of 1100100 -> correct
2. E.g. find the BMP of 1111000 -> BMP missed
13
Adding Markers
E.g. find the BMP of 1111000 -> correct; the marker
is 1111. Markers are NOT needed at all levels
(they are shared); roughly a 25% increase on a
typical routing database.
14
Backtracking Avoidance
  • Make the marker node a record containing a
    variable bmp -> the BMP of the marker string
  • bmp is computed when the marker is inserted into
    its hash table
  • When a marker is found and the longer-prefix
    search fails, Marker.bmp is the BMP
  • So backtracking is NOT needed (see the sketch
    below)
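A minimal sketch of the resulting backtracking-free binary search, assuming the Level array from the earlier sketches and that every real prefix or marker entry carries its precomputed BMP under the key "bmp"; this is an illustration, not the paper's code.

def binary_search_bmp(L, D):
    # L: Level records sorted by ascending prefix length; D: address bit string.
    lo, hi = 0, len(L) - 1
    best = None                        # best matching prefix known so far
    while lo <= hi:
        mid = (lo + hi) // 2
        entry = L[mid].hash.get(D[:L[mid].length])
        if entry is not None:          # hit on a real prefix or a marker
            if entry.get("bmp") is not None:
                best = entry["bmp"]    # precomputed BMP, so no backtracking
            lo = mid + 1               # only longer lengths can improve it
        else:
            hi = mid - 1               # miss: only shorter lengths can match
    return best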

15
Optimizations
Modify the binary search tree to probe the most
promising prefix lengths first (8, 16, 24).
16
Mutating Binary Search
  • Improve the search tree after each match
  • Every match with a marker X means we only need
    to search among the set of prefixes for which X
    is a prefix
  • Whenever we get a match and move to a subtrie,
    we only need to binary-search the levels of the
    new subtrie

17
Rope Search
The rope is the sequence of levels that the binary
search tree would follow on repeated failures. A
marker can store its rope -> M.rope (see the
sketch below).
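A hedged sketch of rope-guided search under two assumptions: hash tables are indexed directly by prefix length, and an entry may carry a precomputed "rope" (the list of prefix lengths the binary search of its subtrie would probe on repeated failures) alongside its "bmp". Names are illustrative.

def rope_search(tables, D, initial_rope):
    # tables: {prefix_length: {prefix bits -> entry}}; D: address bit string.
    best, rope = None, list(initial_rope)      # levels still to probe
    while rope:
        length = rope.pop(0)                   # next level on the current rope
        entry = tables.get(length, {}).get(D[:length])
        if entry is not None:
            if entry.get("bmp") is not None:
                best = entry["bmp"]            # BMP known so far
            rope = list(entry.get("rope", ())) # hit: switch to this entry's rope
        # on a miss, simply continue with the remaining levels of the rope
    return best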
18
Implementation
  • Every hash table entry is associated with a bmp
    field and a rope field, both precomputed
  • Precomputation requires complex insertion
    routines
  • Addition of a new prefix is rare in the field,
    but the route to a prefix may change frequently

19
Implementation contd.
  • Addition or deletion of a prefix in a hash table
    may change the bmp values of markers
  • Addition of a new prefix may change the ropes of
    a number of entries
  • Solution: batch the changes and rebuild the
    search tree

20
Speed and Memory usage
21
Summary
  • Scalable, distributed, adaptable
  • Secure distributed storage
  • Distributed key storage
  • Protocols for bandwidth fairness enforcement
  • Need optimized Insert / Delete algorithms
  • Route database size increases
  • Additions without rebalancing may degrade
    performance over time.

22
Software Routers
  • Using network processors, e.g. the Intel IXP 1200
  • Need for software services -> firewalls, intrusion
    detection, level-n switching, overlays
  • Moving from NICs plus a host processor to network
    processors
  • The paper describes a router built using a 733 MHz
    Pentium III and an IXP 1200 PCI board

23
Characteristics of IXP 1200
  • High packet-per-second (pps) rates -> e.g. a
    2.5 Gbps link has to process 6.1 M minimum-sized
    packets per second
  • 6 MicroEngines: parallelism to hide memory
    latency
  • Processing and memory operations are done in
    parallel using hardware contexts in the
    MicroEngines
  • The IXP 1200 includes the standard blocks of a
    networking ASIC, with flexible data movement and
    programmable parallel machines

24
Router Design
  • Data plane -> forwards packets
  • Control plane -> runs OSPF, RSVP and LDP
  • The data plane runs on the NP and processes
    packets at line speed
  • The data plane does minimal processing (e.g.
    validating the header, decrementing the TTL,
    recomputing the checksum) but may also run QoS
  • The control plane runs on the host processor and
    expects fewer packets to process (only during
    route changes)
  • The control plane is compute intensive (e.g.
    OSPF computing a new routing table)

25
Packet Switching Paths
Path 1: cycles at the MicroEngines only
Path 2: cycles at the MicroEngines + StrongARM
Path 3: cycles at the MicroEngines, StrongARM +
Pentium
26
Software Router Architecture
  • The classifier C reads packets from the input
    port and, based on fields in the packet header,
    selects a forwarder F
  • The forwarder F applies some function to the
    packet and sends the modified packet to an
    output queue
  • The output scheduler S selects one of its
    non-empty queues and transmits the packet to the
    output port (see the sketch below)
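A toy Python sketch (not the paper's code) of the classifier -> forwarder -> output scheduler structure described above; the packet fields, the classification rule, and the round-robin scheduler are all illustrative assumptions.

from collections import deque

out_queues = {0: deque(), 1: deque()}          # one output queue per port

def classify(packet):
    # Classifier C: pick a forwarder based on header fields (toy rule).
    return forward_ip if packet.get("proto") == "ip" else forward_drop

def forward_ip(packet):
    # Forwarder F: apply some function, then place the packet on an output queue.
    packet["ttl"] -= 1
    out_queues[packet["out_port"]].append(packet)

def forward_drop(packet):
    pass                                       # toy forwarder: discard the packet

def schedule():
    # Scheduler S: pick a non-empty queue and hand its head packet to the port.
    for port, queue in out_queues.items():
        if queue:
            return port, queue.popleft()
    return None

# Usage: classify(pkt)(pkt), then call schedule() repeatedly to transmit.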

27
IXP 1200 Block Diagram
28
Blocks
  • The major blocks are:
  • XScale processor -- a general-purpose 32-bit
    RISC processor compatible with the ARM v5
    architecture. The XScale is used to initialize
    and manage the chip, and can also be used for
    higher-layer network processing tasks.
  • Microengines (MEs) -- six 32-bit programmable
    engines specialized for network processing. The
    Microengines do the main per-packet data plane
    processing.
  • DRAM -- typically used to buffer packets.
  • SRAM -- typically used to store the routing
    table and per-flow state, if any.
  • Scratchpad memory -- 4 KB of storage for
    general-purpose use.

29
Forwarding pipeline
Each FIFO is an addressable 16-slot x 64-byte
register file. The unit of data in the IXP 1200 is
the MAC-packet (MP) of 64 bytes. DMA operations
need to be mutex protected; input processing is
staged.
30
MicroEngine Protocol Processing - IP
  • Validate the IP header
  • Decrement the TTL field
  • Recompute the checksum
  • Update the destination MAC address to the one
    found in the routing table
  • Update the source MAC address to that of the
    output port
  • Copy the MP into DRAM for the output queue, or
    pass it to the StrongARM if the packet involves
    additional IP options (see the sketch below)
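A minimal sketch, in Python rather than MicroEngine assembly, of the IP-header steps listed above (validation, TTL decrement, checksum recomputation). The MAC rewrites and the DRAM copy are hardware specific and omitted; the header is assumed to be the full IHL x 4 bytes.

import struct

def ip_checksum(header: bytes) -> int:
    # One's-complement sum of the header's 16-bit words.
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def process_ipv4_header(header: bytearray) -> bool:
    # Validate version, header length and checksum (a valid header sums to 0).
    if header[0] >> 4 != 4 or (header[0] & 0x0F) < 5:
        return False
    if ip_checksum(bytes(header)) != 0:
        return False
    if header[8] <= 1:                         # TTL would expire
        return False
    header[8] -= 1                             # decrement TTL
    header[10:12] = b"\x00\x00"                # zero the checksum field
    struct.pack_into("!H", header, 10, ip_checksum(bytes(header)))
    return True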

31
Queuing
  • Protocol processing on the first MP chooses the
    destination (output) queue for the entire packet
  • Check queues for ready packets, keep queue tail
    pointers in registers (16 max), and prioritize
    queues
  • Queue contention is avoided using a mutex

32
MicroEngine Evaluation
The router's performance is greatly influenced by
the queuing discipline selected.
33
Key Learnings
  • The IXP 1200 is not easy to program
  • No compiler for the MicroEngines (do it all in
    assembly?)
  • The static configuration is port dependent; the
    software needs to be re-designed if the number
    of ports changes
  • Met all processing-speed targets

34
StrongARM / XScale Stage
  • Accesses DRAM directly with minimal overhead
  • Input contexts process packets as usual, but on
    detecting that a packet requires service from
    the XScale (e.g. a route miss in the cache),
    they enqueue the packet in the XScale queue
  • The XScale is interrupted when a new packet is
    placed in the queue
  • The XScale does the output processing for the
    packet

35
Pentium Stage
  • Move packets from the IXP 1200 to the Pentium
    over the PCI bus
  • Intelligent I/O standard
  • Process routing protocols, VPNs

36
Conclusion
  • Staged computation; parallelism using contexts
  • Speed decreases as the stage goes higher
  • Need more software tools (a MicroEngine
    compiler)
  • The distinction between software-based and
    hardware-based routers becomes less meaningful

37
Where to Get More Information
  • http://www.intel.com/design/network/products/npfamily/ixp1200.htm
    for network processors
  • http://www.tik.ee.ethz.ch/mwa/HPPC/ for longest
    prefix matching algorithms