Title: Packet Classification Using Coarse-Grained Tuple Spaces
1Packet Classification Using Coarse-Grained Tuple
Spaces
Haoyu Song, Jon Turner and Sarang
Dharmapurikar www.arl.wustl.edu
2Overview
- Two-dimensional packet classification problem
- in list of 2d filters, find first match for given
address pair - (1011,0111) lt101,10gt, lt10,011gt, lt1,01gt
- Limitations of current solutions
- fast algorithmic methods require excessive space
(50x) - TCAM has high cost per bit, significant power
usage - Combining cross-product and tuple-space search
- hybrid strategy with range of time-space tradeoff
options - Improving 1d lookups
- combining tree bitmap and Bloom filters
- Possible extensions
3Cross-Product Method
- Procedure
- do 1d lookup on all fields
- combine results into lookup key in cross-product
table - direct lookup table or hash table
- Fast, but space grows as nk for n filters, k
fields
10100, 01110
S0D1
42D Tuple Space Search
- Group by prefix length
- hash table per group
- up to 33 x 33 1,089 groups
- in practice 30-100 occupied tuples
- Rectangle search
- markers to guide search
- at most 33 probes, often less
- hard to update
- Pruned tuple space search
- 1d search on src/dest fields
- find prefix lengths that match src/dest fields of
packet - search intersecting tuples
- if k matching prefixes, at most k2 probes
5Coarse-Grained Tuple Space
- Select coarse-grained partition of tuple space
- Build cross-product table per sub-space
- Search procedure
- 1d lookups for LPM
- probe each subspace
- terminate early if possible
- Pruning
- identify candidate sub-spaces during 1d lookup
- probe selected sub-spaces
- Space/time tradeoff
6Performance of Basic Algorithm
- Equal size divisions of 2d tuple space
- Ratio of cross-products to filter set size
- 2x2 partition brings space usage to 2x minimum
- maximum of four probes required
- compared to 30-90 for simple tuple space search
- Pruning of limited use for filter sets of size
lt104
7Performance of Best Configurations
8Alternate Partitioning Approaches
- Arbitrary sub-spaces are possible
- potential for fewer regions with good space
efficiency - Preliminary results mixed
- may be useful for smaller filter sets
- More evaluation needed
Note filters of form ltprefix,gt and lt,prefixgt
stored in 1d data structures
9Fast 1d Lookups
Tree Bitmap
Hashing Bloom Filters
- Multibit trie
- Co-located children
- Bitmaps for
- prefix nodes
- subtree presence
- 4 bit stride implies 8 memory accesses
- Expand prefixes to standard lengths
- Off-chip hash table per length
- On-chip Bloom filters to avoid unproductive
probes - Large space requirements for good worst-case
performance
10Fast and Compact 1d Lookups
- Insert tree bitmap subtree roots into off-chip
hash tables and on-chip Bloom filters - Lookup prefix of subtree roots in Bloom filters
- if match on length k and all shorter lengths,
probe off-chip table for length k - Reduction in on-chip memory for Bloom filters
- shape-shifting trie yields further space reduction
111d Lookup Performance
200K IPv4 prefixes 5 bit stride for tree bitmap 8
bit on-chip root table
4 Bloom filters 1 BF entry for every 2 prefixes 1
off-chip probes (4 incl. FP)
2 Bloom filters 1 BF entry for every 6 prefixes 2
off-chip probes (4 incl. FP)
12Practical Configuration
- Configure 1d lookups for 1 off-chip probe each
(excluding false positives) - about 5 bits per prefix for Bloom filters with
low FP rate - Record ltprefix,gt and lt,prefixgt filters in 1d
lookup data structures - also proposed in recent paper by Kounavis, et.
al. - Divide remaining filters among four subspaces
- approximately 2 off-chip hash table entries per
filter - at most four probes
- With single QDR SRAM at 200 MHz, 32 bit word size
can do 200 million probes per second - about 33 million packets/second
- 40 byte packets at 10 Gb/s
13Possible Extensions
- More extensive evaluation
- scaling to larger filter sets 100-200K filters
- integrated evaluation of 1d and 2d lookups
- systematic evaluation of alternate partitioning
strategies - Alternate representations of filter sub-spaces
- any filter set data structure is candidate
- using decision trees, can skip 1d lookups
- Generalization to more dimensions
- handling fields with ranges (for port numbers)
- coarse-grained grouping of tuple-spaces defined
on nesting level - can we beat TCAM?