Longest Prefix Matching: Trie-based Techniques - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Longest Prefix Matching: Trie-based Techniques
  • CS 685 Network Algorithmics
  • Spring 2006

2
The Problem
  • Given:
  • A database of prefixes with associated next hops, say
  • 1000101 → 128.44.2.3
  • 01101100 → 4.33.2.1
  • 10001 → 124.33.55.12
  • 10 → 151.63.10.111
  • 01 → 4.33.2.1
  • 1000100101 → 128.44.2.3
  • A destination IP address, e.g. 120.16.8.211
  • Find the longest matching prefix and its next hop

3
Constraints
  • Handle 150,000 prefixes in database
  • Complete lookup in minimum-sized (40-byte) packet transmission time
  • At OC-768 (40 Gbps): 40 bytes = 320 bits, and 320 bits / 40 Gbps = 8 nsec
  • High degree of multiplexing: packets from 250,000 flows interleaved
  • Database updated every few milliseconds
  • ⇒ performance is determined by the number of memory accesses

4
Basic ("Unibit") Trie Approach
  • Recursive data structure (a tree)
  • Nodes represent prefixes in the database
  • Root corresponds to the prefix of length zero
  • A node for prefix x has three fields:
  • 0-branch: pointer to the node for prefix x0 (if present)
  • 1-branch: pointer to the node for prefix x1 (if present)
  • Next-hop info for x (if present)

Example Database: a: 0 → x, b: 01000 → y, c: 011 → z, d: 1 → w,
e: 100 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x
5
Example Database: a: 0 → x, b: 01000 → y, c: 011 → z, d: 1 → w,
e: 100 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x
[Figure: unibit trie for the example database; edges labeled 0 and 1]
6
Trie Search Algorithm

typedef struct foo {
    struct foo *trie_0, *trie_1;
    NEXTHOPINFO trie_info;
} TRIENODE;

NEXTHOPINFO best = NULL;
TRIENODE *np = root;
unsigned int bit = 0x80000000;

while (np != NULL) {
    if (np->trie_info) best = np->trie_info;
    // check next bit
    if (addr & bit) np = np->trie_1;
    else np = np->trie_0;
    bit >>= 1;
}
return best;
7
Conserving Space
  • Sparse database ⇒ wasted space
  • Long chains of trie nodes with only one non-NULL pointer
  • Solution: handle "one-way" branches with special nodes
  • Encode the bits corresponding to the missing nodes using text strings

8
Example Database: a: 0 → x, b: 01000 → y, c: 011 → z, d: 1 → w,
e: 100 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x
[Figure: unibit trie for the example database, showing long one-way branch chains]
9
Example Database: a: 0 → x, b: 01000 → y, c: 011 → z, d: 1 → w,
e: 100 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x
[Figure: the same trie with one-way branches replaced by special nodes carrying text strings, e.g. "00"]
10
Bigger Issue: Slow!
  • Computing one bit at a time is too slow
  • Worst case: one memory access per bit (32 accesses!)
  • Solution: compute n bits at a time
  • n = stride length
  • Use n-bit chunks of the address as the index into an array in each trie node
  • How to handle prefixes whose length is not a multiple of n?
  • Extend them, replicating entries as needed (see the sketch below)
  • E.g. n = 3: prefix 1 becomes 100, 101, 110, 111
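
A minimal C sketch of this expansion step (not from the slides; the function name and bit layout are illustrative): the prefix is stored in the low-order len bits of an unsigned int, its length is rounded up to the next multiple of the stride n, and the freed low-order bits are enumerated.

#include <stdio.h>

/* Expand a prefix, given in the low-order `len` bits of `bits`, to the
 * next multiple of the stride n, printing the 2^(newlen - len) entries. */
static void expand_prefix(unsigned bits, int len, int n) {
    int newlen = ((len + n - 1) / n) * n;   /* round length up to a multiple of n */
    int extra  = newlen - len;              /* free low-order bits to enumerate */
    for (unsigned fill = 0; fill < (1u << extra); fill++) {
        unsigned e = (bits << extra) | fill;
        for (int i = newlen - 1; i >= 0; i--)   /* print MSB first */
            putchar((e >> i) & 1 ? '1' : '0');
        putchar('\n');
    }
}

int main(void) {
    expand_prefix(0x1, 1, 3);   /* prefix 1, stride 3: prints 100 101 110 111 */
    return 0;
}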

11
Extending Prefixes
Original Database: a: 0 → x, b: 01000 → y, c: 011 → z, d: 1 → w,
e: 100 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x
Expanded Database: a0: 00 → x, a1: 01 → x, b0: 010000 → y, b1: 010001 → y,
c0: 0110 → z, c1: 0111 → z, d0: 10 → w, d1: 11 → w, e0: 1000 → u,
e1: 1001 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x
Example: stride length = 2
12
Expanded Database: a0: 00 → x, a1: 01 → x, b0: 010000 → y, b1: 010001 → y,
c0: 0110 → z, c1: 0111 → z, d0: 10 → w, d1: 11 → w, e0: 1000 → u,
e1: 1001 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x
Total cost: 40 pointers (22 null). Max memory accesses: 3
13
Example Database: a: 0 → x, b: 01000 → y, c: 011 → z, d: 1 → w,
e: 100 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x
[Figure: unibit trie with one-way branches compressed, for comparison]
Total cost: 46 pointers (21 null). Max memory accesses: 5
14
Choosing Fixed Stride Lengths
  • We are trading space for time
  • Larger stride length ⇒ fewer memory accesses
  • Larger stride length ⇒ more wasted space
  • Use the largest stride length that will fit in memory and complete the required accesses within the time budget

15
Updating
  • Insertion:
  • Keep a unibit version of the trie, with each node labeled with its longest matching prefix and that prefix's length
  • To insert P, search for P, remembering the last node, until:
  • A null pointer (not present), or
  • The last stride in P is reached
  • Expand P as needed to match the stride length
  • Overwrite any existing entries with length less than P's (see the sketch below)
  • Deletion is similar:
  • Find the entry for the prefix to be deleted
  • Remove its entry (from the unibit copy also!)
  • Expand any entries that were "covered" by the deleted prefix
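
To make the overwrite rule concrete, here is a hedged C sketch of insertion into a single node of a fixed-stride trie (the types and names are hypothetical, not from the slides): P is expanded to the node's stride, and each expanded slot is overwritten only if the prefix currently stored there is no longer than P.

typedef const char *NEXTHOPINFO;   /* stand-in for the slides' type */

typedef struct {
    int len;             /* length of the prefix stored here (-1 = empty) */
    NEXTHOPINFO info;
} SLOT;

/* Insert prefix (bits, len), len <= n, with next hop h into one node of
 * stride n: expand to n bits, overwriting only entries written by a
 * prefix no longer than this one. */
void node_insert(SLOT *slots, int n, unsigned bits, int len, NEXTHOPINFO h) {
    int extra = n - len;
    unsigned base = bits << extra;
    for (unsigned fill = 0; fill < (1u << extra); fill++) {
        SLOT *s = &slots[base | fill];
        if (s->len <= len) {          /* never clobber a longer prefix */
            s->len = len;
            s->info = h;
        }
    }
}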

16
Variable Stride Lengths
  • It is not necessary that every node have the same
    stride length
  • Reduce waste by allowing stride length to vary
    per node
  • Actual stride length encoded in pointer to the
    trie node
  • Nodes with fewer used pointers can have smaller
    stride lengths

17
Expanded Database: a0: 00 → x, a1: 01 → x, b: 01000 → y, c0: 0110 → z,
c1: 0111 → z, d0: 10 → w, d1: 11 → w, e: 100 → u, f: 1100 → z,
g: 1101 → u, h: 1110 → z, i: 1111 → x
[Figure: variable-stride trie with node strides of 1 bit, 2 bits, 2 bits, and 1 bit]
Total waste: 16 pointers. Max memory accesses: 3.
Note: encoding the stride length costs 2 bits/pointer
18
Calculating Stride Lengths
  • How to pick stride lengths?
  • We have two variables to play with: height and stride length
  • Trie height determines lookup speed ⇒ set the max height first
  • Call it h
  • Then choose strides to minimize storage
  • Define the cost of a trie T, C(T):
  • If T is a single node: the number of array locations in the node
  • Else: the number of array locations in the root + Σi C(Ti), where the Ti are the children of T
  • Straightforward recursive solution (see the sketch below):
  • Root stride s results in y ≤ 2^s subtries T1, ..., Ty
  • For each possible s, recursively compute optimal strides for the C(Ti) using height limit h − 1
  • Choose the root stride s minimizing the total cost 2^s + Σi C(Ti)
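
The recursion can be written directly. Below is a hedged C sketch (types and helper names are hypothetical, not from the slides): cost(T, h) returns the minimum total array locations for a multibit trie of height at most h covering the prefixes in the unibit trie T. Memoizing it per (node, height) is essentially the dynamic program on the next slide.

#include <limits.h>

typedef struct unode {
    struct unode *child[2];   /* unibit trie: 0- and 1-branch */
} UNODE;

static int cost(const UNODE *T, int h);

/* Depth of the unibit trie below T (0 for a leaf). */
static int udepth(const UNODE *T) {
    int d = 0;
    for (int b = 0; b < 2; b++)
        if (T->child[b]) {
            int c = 1 + udepth(T->child[b]);
            if (c > d) d = c;
        }
    return d;
}

/* Sum of cost(D, h) over all descendants D of T exactly s levels down. */
static int subcost(const UNODE *T, int s, int h) {
    if (s == 0) return cost(T, h);
    int sum = 0;
    for (int b = 0; b < 2; b++)
        if (T->child[b]) sum += subcost(T->child[b], s - 1, h);
    return sum;
}

/* C(T) = min over root strides s of 2^s + sum_i C(Ti), height budget h. */
static int cost(const UNODE *T, int h) {
    int d = udepth(T);
    if (d == 0) return 0;            /* leaf: nothing below to cover */
    if (h == 1) return 1 << d;       /* one node must cover all remaining bits */
    int best = INT_MAX;
    for (int s = 1; s <= d; s++) {   /* try every root stride */
        int c = (1 << s) + subcost(T, s, h - 1);
        if (c < best) best = c;
    }
    return best;
}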

19
Calculating Stride Lengths
  • Problem: expensive, with repeated subproblems
  • Solution (Srinivasan & Varghese): dynamic programming
  • Observe that each subtree of a variable-stride trie contains the same set of prefixes as some subtree of the original unibit trie
  • For each node of the unibit trie, compute the optimal stride and the cost for that stride
  • Start at the bottom (height 1), work up
  • Determine the optimal grouping of leaves in each subtree
  • Given the subtrees' optimal costs, compute the parent's optimal cost
  • This yields optimal stride-length selections for the given set of prefixes

20
[Figure: example unibit trie, repeated for reference]
21
Alternative Method: Level Compression
  • The LC-trie (Nilsson & Karlsson '98) is a variable-stride trie with no empty entries in trie nodes
  • Procedure:
  • Select the largest root stride that allows no empty entries
  • Do this recursively down through the tree
  • Disadvantage: cannot control height precisely

22
[Figure: LC-trie for the example database, with node strides 1, 1, 1, and 2]
23
Performance Comparisons
  • MAE-East database (1997 snapshot): 40K prefixes
  • "Unoptimized" multibit trie: 2003 KB
  • Optimal fixed-stride: 737 KB, computed in 1 msec
  • Height limit 4 (⇒ 1 Gbps wire speed @ 80 nsec/access)
  • Optimized (S&V) variable-stride: 423 KB, computed in 1.6 sec, height limit 4
  • LC-compressed: height 7, 700 KB

24
Lulea Compressed Tries
  • Goals:
  • Minimize the number of memory accesses
  • Aggressively compress the trie so it can fit in SRAM (or even cache)
  • Three-level trie with strides of 16, 8, 8
  • 8 memory accesses typical
  • Main techniques:
  • Leaf-pushing
  • Eliminating duplicate pointers from trie-node arrays
  • Efficient bit-counting using precomputation for large bitmaps
  • Indices instead of full pointers for next-hop info

25
1. Leaf-Pushing
  • In general, a trie node entry has associated with it:
  • A pointer to the next trie node
  • A prefix (i.e., a pointer to next-hop info)
  • Or both, or neither
  • Observation: we don't need to know about a prefix pointer along the way until we reach a leaf
  • So "push" prefix pointers down to the leaves
  • Keep only one set of pointers per node (see the sketch below)
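
Here is a hedged C sketch of the push-down (the node layout and names are hypothetical, not the Lulea implementation): each entry's own next hop, or the best one inherited from above, is propagated into child nodes, so that interior entries keep only a child pointer and leaf entries keep only next-hop info.

typedef const char *NEXTHOPINFO;   /* stand-in for the slides' type */
typedef struct node NODE;

typedef struct {
    NODE *child;          /* pointer to next trie node, or NULL */
    NEXTHOPINFO info;     /* next-hop info for a prefix ending here, or NULL */
} ENTRY;

struct node {
    int stride;
    ENTRY *entry;         /* 2^stride entries */
};

/* Push the best next hop seen so far down to the leaves, so that every
 * entry ends up holding exactly one pointer. */
void leaf_push(NODE *n, NEXTHOPINFO inherited) {
    for (int i = 0; i < (1 << n->stride); i++) {
        ENTRY *e = &n->entry[i];
        NEXTHOPINFO nh = e->info ? e->info : inherited;
        if (e->child) {
            leaf_push(e->child, nh);  /* nh is the best match covering this subtree */
            e->info = NULL;           /* interior entry keeps only the child pointer */
        } else {
            e->info = nh;             /* leaf entry holds the pushed next hop */
        }
    }
}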

26
Leaf-Pushing: the Concept
[Figure: prefix pointers pushed from interior entries down to the leaves]
27
Expanded Database: a0: 00 → x, a1: 01 → x, b0: 010000 → y, b1: 010001 → y,
c0: 0110 → z, c1: 0111 → z, d0: 10 → w, d1: 11 → w, e0: 1000 → u,
e1: 1001 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x
Before leaf-pushing: cost 40 pointers (22 wasted)
28
2. Removing Duplicate Pointers
  • Leaf-pushing results in many consecutive duplicate pointers
  • We would like to remove the redundancy and store only one copy in each node
  • Problem: now we can't directly index into the array using address bits
  • Example: k = 2, bits 01 (index 1) need to be converted to index 0 somehow

29
2. Removing Duplicate Pointers
  • Solution: add a bitmap with one bit per original entry
  • 1 indicates a new value
  • 0 indicates a duplicate of the previous value
  • To convert index i, count the 1s up to position i in the bitmap, and subtract 1
  • Example: old index 1 → new index 0; old index 2 → new index 1

Example node: entries 00 → u, 01 → u, 10 → w, 11 → w compress to bitmap 1010 with array [u, w]
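
A tiny C sketch of this conversion (illustrative names; the bitmap is stored MSB-first in an unsigned int):

/* Convert original index i to the compressed-array index: count the 1
 * bits at positions 0..i of an nbits-wide bitmap and subtract 1. */
static int compressed_index(unsigned bitmap, int nbits, int i) {
    int ones = 0;
    for (int b = 0; b <= i; b++)
        ones += (bitmap >> (nbits - 1 - b)) & 1;
    return ones - 1;
}
/* Example node above: bitmap 1010 (0xA), array {u, w}.
 * compressed_index(0xA, 4, 1) == 0 and compressed_index(0xA, 4, 2) == 1. */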
30
Bitmap for Duplicate Elimination
[Figure: leaf-pushed trie node over the prefixes, with duplicate-elimination bitmap
1000000000001000100001000000000000000000100000000001000000000000100010000001000000110000000000000000 00]
31
3. Efficient Bit-Counting
  • The Lulea first-level 16-bit stride ⇒ 64K entries
  • Impractical to count bits up to, say, entry 34578 on the fly!
  • Solution: precompute (P2a)
  • Divide the bitmap into chunks (say, 64 bits each)
  • Store the number of 1 bits preceding each chunk in an array B
  • Compute the converted index for bit k by:
  • chunkNum = k >> 6
  • posInChunk = k & 0x3f  // k mod 64
  • index = B[chunkNum] + count1sInChunk(chunkNum, posInChunk) - 1

32
Bit-Counting Precomputation Example
Chunk size: 8 bits
Bitmap (by chunk): 10010100  00000000  01110000  00001000  00110000  00010101
Cumulative 1-bit counts: 3, 3, 6, 7, 9 (after chunks 0-4)
Converting index 35: chunk = 35 >> 3 = 4; 7 ones precede chunk 4; 2 ones within chunk 4 up to position 35 mod 8 = 3
Converted index: 7 + 2 − 1 = 8
Cost: 2 memory accesses (maybe less)
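
Here is a hedged C sketch of the precomputed counting, using this example's 8-bit chunks (real Lulea uses 64-bit chunks; names are illustrative). B[i] holds the number of 1 bits in all chunks before chunk i, and the program reproduces the worked example above (index 35 → 8).

#include <stdio.h>

#define CHUNK_BITS 8

/* Converted index for position k: 1 bits before k's chunk (precomputed
 * in B) plus 1 bits within the chunk up to k, minus 1. */
static int converted_index(const unsigned char *bitmap, const int *B, int k) {
    int chunk = k / CHUNK_BITS;
    int pos   = k % CHUNK_BITS;
    int ones  = B[chunk];
    for (int i = 0; i <= pos; i++)   /* hardware or a small table can do this */
        ones += (bitmap[chunk] >> (CHUNK_BITS - 1 - i)) & 1;
    return ones - 1;
}

int main(void) {
    /* chunks 10010100 00000000 01110000 00001000 00110000 00010101 */
    unsigned char bitmap[] = {0x94, 0x00, 0x70, 0x08, 0x30, 0x15};
    int B[] = {0, 3, 3, 6, 7, 9};    /* 1 bits before each chunk */
    printf("%d\n", converted_index(bitmap, B, 35));  /* prints 8 */
    return 0;
}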
33
4. Efficient Pointer Representation
  • Observation: the number of distinct next-hop pointers is limited
  • Each corresponds to an immediate neighbor of the router
  • Most routers have at most a few dozen neighbors
  • In some cases a router might have a few hundred distinct next hops, even a thousand
  • Apply P7: avoid unnecessary generality
  • Only a few bits (say 8-12) are needed to distinguish the actual next-hop possibilities
  • Store indices into a table of next-hop info
  • E.g., supporting up to 1024 next hops takes 10 bits
  • 40K prefixes ⇒ 40K pointers ⇒ 160 KB @ 32 bits vs. 50 KB @ 10 bits

34
Other Lulea Tricks
  • The first level of the trie uses two levels of bit-counting array:
  • The first counts bits before the 64-bit chunk
  • The second counts bits in the 16-bit word within the chunk
  • Second- and third-level trie nodes are laid out differently depending on the number of pointers in them
  • Each node has 256 entries, categorized by number of pointers:
  • 1-8, "sparse": store 8-bit indices + 8 16-bit pointers (24 B)
  • 9-64, "dense": like the first level, but only one bit-counting array (only six bits of count needed)
  • 65-256, "very dense": like the first level, with two bit-counting arrays: 4 64-bit chunks, 16 16-bit words

35
Lulea Performance Results
  • 1997 MAE-East database
  • 32K entries, 58K leaves, 56 distinct next hops
  • Resulting trie size: 160 KB
  • Build time: 99 msec
  • Almost all lookups took < 100 clock cycles (333 MHz Pentium)

36
Tree Bitmap (Eatherton, Dittia & Varghese)
  • Goal: storage and speed comparable to Lulea, plus fast insertion
  • The main culprit in slow insertion is leaf-pushing
  • So get rid of leaf-pushing
  • Go back to storing node and prefix pointers explicitly
  • Use the same compression-bitmap trick on both lists
  • Store next-hop information separately; only retrieve it at the end
  • Like leaf-pushing, only in the control plane!
  • Use smaller strides to limit memory accesses to one per trie node (Lulea requires at least two)

37
Storing Prefixes Explicitly
  • To avoid expansion/leaf-pushing, we have to store prefixes in the node explicitly
  • There are 2^(k+1) − 1 possible prefixes of length ≤ k
  • Store a list of (unique) next-hop pointers for each prefix covered by this node
  • Use the same bitmap/bit-counting technique as Lulea to find the pointer index (see the sketch below)
  • Keep trie nodes small (stride 4 or less); exploit hardware (P5) to do prefix matching and bit counting
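
A hedged C sketch of the per-node prefix match under the slide's indexing (all names are hypothetical): prefixes of length L ≤ k are laid out in the node's prefix bitmap at offset (2^L − 1) + value, and we probe from longest to shortest. The returned bitmap index would then be converted with the usual bit-counting trick to index the next-hop pointer list.

/* Prefixes of length L <= k are indexed (2^L - 1) + value within the
 * node's prefix bitmap (2^(k+1) - 1 bits, MSB-first within each byte).
 * Probe from longest to shortest; return the bitmap index of the best
 * match, or -1 if no stored prefix matches the k-bit chunk. */
static int best_prefix_index(const unsigned char *pbitmap, unsigned chunk, int k) {
    for (int L = k; L >= 0; L--) {
        unsigned v = chunk >> (k - L);   /* first L bits of the chunk */
        int idx = (1 << L) - 1 + v;
        if (pbitmap[idx / 8] & (0x80 >> (idx % 8)))
            return idx;                  /* longest stored match */
    }
    return -1;
}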

38
Example: Root Node, Stride 3
Example Database: a: 0 → x, b: 01000 → y, c: 011 → z, d: 1 → w,
e: 100 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x
[Figure: root-node layout. Prefix bitmap over *, 0, 1, 00, 01, 10, 11,
000, ..., 111 is 0 1 1 0 0 0 0 0 0 0 1 1 0 0 0 (1s for a: 0, d: 1,
c: 011, e: 100), with next-hop pointer list x, w, z, u. Child bitmap
over paths 000-111 is 0 0 1 0 0 0 1 1 (children for 010, 110, 111),
with pointers to child nodes.]
39
Tree Bitmap Results
  • Insertions are as in simple multibit tries
  • May cause complete revamp of trie node, but that
    requires only one memory allocation
  • Performance comparable to Lulea, but insertion
    much faster

40
A Different Lookup Paradigm
  • Can we use binary search to do longest-prefix lookups?
  • Observe that each prefix corresponds to a range of addresses
  • E.g. 204.198.76.0/24 covers the range 204.198.76.0 - 204.198.76.255
  • Each prefix has two range endpoints
  • N disjoint prefixes divide the entire space into 2N+1 disjoint segments
  • By sorting the range endpoints and comparing the address to them, we can determine the exact prefix match

41
Prefixes as Ranges
[Figure: the example prefixes drawn as nested ranges on the address line]
42
Binary Search on Ranges
  • Store the 2N endpoints in sorted order
  • Including the full address range, for the default (zero-length) prefix
  • Store two pointers for each entry (see the sketch below):
  • ">" entry: next-hop info for addresses strictly greater than that value
  • "=" entry: next-hop info for addresses equal to that value
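
A minimal C sketch of the search (the entry layout and names are illustrative): find the largest stored endpoint ≤ the address, then pick the "=" or ">" next hop.

typedef const char *NEXTHOPINFO;   /* stand-in for the slides' type */

typedef struct {
    unsigned value;        /* range endpoint */
    NEXTHOPINFO eq;        /* next hop for addresses == value */
    NEXTHOPINFO gt;        /* next hop for addresses strictly above value */
} RANGE_ENTRY;

/* Binary search for the largest endpoint <= addr; assumes tbl[0].value
 * is 0, so the full address range is always covered. */
NEXTHOPINFO range_lookup(const RANGE_ENTRY *tbl, int n, unsigned addr) {
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;   /* bias up so the loop terminates */
        if (tbl[mid].value <= addr) lo = mid;
        else hi = mid - 1;
    }
    return (tbl[lo].value == addr) ? tbl[lo].eq : tbl[lo].gt;
}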

43
Example: 6-bit Addresses
Example Database: a: 0 → x, b: 01000 → y, c: 011 → z, d: 1 → w,
e: 100 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x
As ranges: a: 000000-011111 → x, b: 010000-010001 → y, c: 011000-011111 → z,
d: 100000-111111 → w, e: 100000-100111 → u, f: 110000-110011 → z,
g: 110100-110111 → u, h: 111000-111011 → z, i: 111100-111111 → x
44
Range Binary Search Results
  • N prefixes can be searched in log2(N) + 1 steps
  • Slow compared to multibit tries
  • Insertion can also be expensive
  • Memory-expensive: requires 2 full-size entries per prefix
  • 40K prefixes, 32-bit addresses: 2 × 40K × 4 B = 320 KB, not counting next-hop info
  • Advantage: no patent restrictions!

45
Binary Search on Prefix Lengths (Waldvogel et al.)
  • For same-length prefixes, a hash table gives fast comparisons
  • But linear search over the prefix lengths is too expensive
  • Can we do a faster (binary) search on prefix lengths?
  • Challenge: how do we know whether to move "up" or "down" in length on a failed lookup?
  • Solution: include extra information to indicate the presence of a longer prefix that might match
  • These are called marker entries
  • Each marker entry also contains the best-matching prefix for that node
  • When searching, remember the best-matching prefix when moving "up" because of a marker, in case of later failure (see the sketch below)
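
A hedged C sketch of the search loop (the hash-table API and entry fields are hypothetical): on a hit we record that entry's best-matching prefix and try longer lengths, which the markers make safe; on a miss we try shorter lengths.

#define ADDR_BITS 32

/* Hypothetical types/helpers: H[L] is a hash table of the length-L
 * prefixes and markers; each entry carries the best-matching prefix
 * (bmp) known at that point. hash_lookup() is assumed. */
NEXTHOPINFO lookup_by_length(unsigned addr) {
    int lo = 1, hi = ADDR_BITS;
    NEXTHOPINFO best = NULL;   /* or the default route */
    while (lo <= hi) {
        int L = (lo + hi) / 2; /* real code searches only lengths present */
        ENTRY *e = hash_lookup(H[L], addr >> (ADDR_BITS - L));
        if (e != NULL) {
            best = e->bmp;     /* remember, in case longer probes fail */
            lo = L + 1;        /* a marker promises a possible longer match */
        } else {
            hi = L - 1;        /* miss: only shorter prefixes can match */
        }
    }
    return best;
}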

46
Example: Binary Search on Prefix Length
Prefix lengths present: 1, 3, 4, 5
Example Database: a: 0 → x, b: 01000 → y, c: 011 → z, d: 1 → w,
e: 100 → u, f: 1100 → z, g: 1101 → u, h: 1110 → z, i: 1111 → x

Length 1: 0 (BMP a, x); 1 (BMP d, w)
Length 3: 011 (BMP c, z); 100 (BMP e, u); 110 marker (BMP d, w); 111 marker (BMP d, w); 010 marker (BMP a, x)
Length 4: 1100 (BMP f, z); 1101 (BMP g, u); 1110 (BMP h, z); 1111 (BMP i, x); 0100 marker (BMP a, x)
Length 5: 01000 (BMP b, y)

Example: search for addresses 011000 and 101000
47
Binary Search on Prefix Length: Performance
  • Worst-case number of hash-table accesses: 5 (= log2 32)
  • However, most prefixes are 16 or 24 bits
  • Arrange the hash tables so these are handled in one or two accesses
  • This technique is very scalable to larger address lengths (e.g., 128 bits for IPv6)
  • A unibit trie for IPv6 could take 128 accesses!

48
Memory Allocation for Compressed Schemes
  • Problem: when using a compressed scheme (like Lulea), trie nodes are kept at minimal size
  • If a node grows (changes size), it must be reallocated and copied over
  • As we have discussed, memory allocators can perform very badly
  • Assume M is the size of the largest possible request
  • We cannot guarantee more than 1/log2(M) of memory will be used!
  • E.g., if M = 32, 20% is the maximum guaranteed utilization (1/log2 32 = 1/5)
  • So router vendors cannot claim to support large databases

49
Memory Allocation for Compressed Schemes
  • Solution: compaction
  • Copy memory from one location to another
  • General-purpose OSs avoid compaction!
  • Reason: it is very hard to find and update all pointers to objects in the moved region
  • The good news:
  • Pointer usage is very constrained in IP lookup algorithms
  • Most lookup structures are trees ⇒ at most one pointer to any node
  • By storing a "parent" pointer, we can easily update pointers as needed