Title: CAMP: Fast and Efficient IP Lookup Architecture
1. CAMP: Fast and Efficient IP Lookup Architecture
- Sailesh Kumar, Michela Becchi,
- Patrick Crowley, Jonathan Turner
- Washington University in St. Louis
2-6. Context
- Trie-based IP lookup
- Circular pipeline architectures

Running example: lookup of IP address 111010 against the prefix dataset below.

Prefix dataset:
  0     P1
  000   P2
  0010  P3
  0011  P4
  011   P5
  10    P6
  11    P7
  110   P8

[Figure: the prefix dataset drawn as a binary trie with nodes P1-P8; the trie levels are assigned to pipeline Stages 1-4, and the stages are then arranged as a circular pipeline.]
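Trie-based lookup on the slides' prefix dataset can be sketched as follows. This is a minimal illustration of longest-prefix match on a binary trie, not the paper's pipelined implementation; the class and function names are my own.

```python
# Binary trie for the slides' prefix dataset, with longest-prefix-match
# lookup of the example address 111010. A minimal sketch of trie-based
# IP lookup, not the paper's pipelined implementation.

class Node:
    def __init__(self):
        self.children = {}   # bit ('0' or '1') -> child Node
        self.prefix = None   # name of the prefix ending at this node, if any

def insert(root, bits, name):
    node = root
    for b in bits:
        node = node.children.setdefault(b, Node())
    node.prefix = name

def longest_prefix_match(root, addr_bits):
    node, best = root, None
    for b in addr_bits:
        if node.prefix is not None:
            best = node.prefix   # remember the longest match seen so far
        if b not in node.children:
            return best
        node = node.children[b]
    return node.prefix or best

dataset = {"0": "P1", "000": "P2", "0010": "P3", "0011": "P4",
           "011": "P5", "10": "P6", "11": "P7", "110": "P8"}
root = Node()
for bits, name in dataset.items():
    insert(root, bits, name)

print(longest_prefix_match(root, "111010"))  # -> P7 ("11" is the longest match)
```

For the example address 111010, "11" (P7) matches but "110" (P8) does not, so the lookup returns P7.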
7. CAMP: Circular Adaptive and Monotonic Pipeline
- Problems
  - Optimize the global memory requirement
  - Avoid bottleneck stages
  - Make the per-stage utilization uniform
- Idea
  - Exploit a circular pipeline
  - Each stage can be a potential entry/exit point
  - Possible wrap-around
  - Split the trie into sub-trees and map each of them independently to the pipeline
8. CAMP (cont'd)
- Implications
  - Pros
    - Flexibility: the maximum prefix length is decoupled from the pipeline depth
    - Upgradeability: memory bank updates involve only partial remapping
  - Cons
    - A stage can simultaneously be an entry point and a transition stage for two distinct requests, originating conflicts
    - A scheduling mechanism is required
    - Possible efficiency degradation
9. Trie splitting
- Define an initial stride x
- Use a direct index table with 2^x entries for the first x levels
- Expand short prefixes to length x
- Map the sub-trees

[Figure: direct index table pointing to Subtree 1, Subtree 2, and Subtree 3; e.g., initial stride x = 2.]
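The splitting step can be sketched with the slides' dataset and initial stride x = 2. This is a simplified illustration, not the paper's data structure: prefixes shorter than x are expanded to length x and stored in a 2^x-entry direct table, while longer prefixes are grouped into sub-trees (here plain dictionaries) keyed by their first x bits.

```python
# Direct index table for trie splitting, with initial stride x = 2 as in
# the slides' example. The first x address bits index a table with 2**x
# entries; prefixes shorter than x are expanded to length x; longer
# prefixes are kept in sub-trees grouped by their first x bits.
# A minimal sketch, not the paper's data structure.

X = 2
prefixes = {"0": "P1", "000": "P2", "0010": "P3", "0011": "P4",
            "011": "P5", "10": "P6", "11": "P7", "110": "P8"}

def expand(prefix, length):
    """Expand a short prefix into all prefixes of the given length."""
    pad = length - len(prefix)
    return [prefix + format(i, f"0{pad}b") for i in range(2 ** pad)] if pad else [prefix]

table = {}        # direct index table: first x bits -> best short prefix
subtrees = {}     # first x bits -> {remaining bits of longer prefix: name}
for p, name in sorted(prefixes.items(), key=lambda kv: len(kv[0])):
    if len(p) <= X:
        for e in expand(p, X):
            table[e] = name          # longer originals overwrite shorter ones
    else:
        subtrees.setdefault(p[:X], {})[p[X:]] = name

def lookup(addr):
    """Longest-prefix match: direct table hit, refined by the sub-tree."""
    best, best_len = table.get(addr[:X]), 0
    for suffix, name in subtrees.get(addr[:X], {}).items():
        if addr[X:].startswith(suffix) and len(suffix) > best_len:
            best, best_len = name, len(suffix)
    return best

print(lookup("111010"))  # -> P7 (table hit on "11"; "110" does not match)
```

In CAMP each sub-tree would be mapped independently onto the circular pipeline, with the table entry identifying the entry stage.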
10. Dealing with conflicts
- Idea: use a request queue in front of each stage
- Intuition: without request queues,
  - a request may wait up to n cycles before entering the pipeline
  - a waiting request stalls all subsequent requests, even those not competing for the same stages
- Issue: ordering
  - Reordering is limited to requests with different entry stages (addressed to different destinations)
  - An optional output reorder buffer can be used
11. Pipeline efficiency
- Metrics
  - Pipeline utilization: fraction of time the pipeline is busy, given a continuous backlog of requests
  - Lookups per cycle (LPC): average request dispatch rate
- Linear pipeline
  - LPC = 1
  - Pipeline utilization generally low
  - Non-uniform stage utilization
- CAMP pipeline
  - High pipeline utilization
  - Uniform stage utilization
  - LPC close to 1 when each request traverses the complete pipeline (pipeline stages = trie levels)
  - LPC > 1 when most requests do not make complete circles around the pipeline (pipeline stages > trie levels)
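The LPC > 1 regime can be illustrated with a toy cycle-level model. This is my own simplified sketch, not the paper's queue-based scheduler: requests enter at uniformly random stages, each occupies `levels` consecutive stages (modulo the stage count), and slots are reserved greedily on a (stage, cycle) grid. The hard upper bound is stages/levels lookups per cycle.

```python
import random

# Toy model of the circular pipeline: each request enters at a random stage
# and occupies `levels` consecutive stages (mod `stages`), one per cycle.
# Slots are reserved greedily on a (stage, cycle) grid. A sketch for
# intuition about LPC, not the paper's queue-based scheduler.

def simulate(stages, levels, n_requests, seed=1):
    rng = random.Random(seed)
    occupied = set()        # reserved (stage, cycle) slots
    makespan = 0
    for _ in range(n_requests):
        entry = rng.randrange(stages)
        t = 0
        # earliest start cycle at which all needed slots are free
        while any(((entry + k) % stages, t + k) in occupied for k in range(levels)):
            t += 1
        for k in range(levels):
            occupied.add(((entry + k) % stages, t + k))
        makespan = max(makespan, t + levels)
    lpc = n_requests / makespan             # average dispatch rate
    utilization = len(occupied) / (stages * makespan)
    return lpc, utilization

# 32 stages, 8 trie levels per lookup (24 bits with a stride-3 tree bitmap)
lpc, util = simulate(stages=32, levels=8, n_requests=2000)
print(f"LPC = {lpc:.2f}, utilization = {util:.2f}")
```

With 32 stages and 8 levels per lookup, the model's LPC approaches the capacity bound of 4, consistent in spirit with the 3-5 range reported on the next slides.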
12. Pipeline efficiency: all stages traversed
- Setup
  - 24 stages, all traversed by each packet
  - Packet bursts: sequences of packets to the same entry point
- Results
  - Long bursts result in high utilization and LPC
  - For all burst sizes, enough queuing (32 entries) guarantees an LPC of 0.8
13. Pipeline efficiency: LPC > 1
- Setup
  - 32 stages, rightmost 24 bits, tree bitmap with stride 3
  - Average prefix length: 24
- Results
  - LPC between 3 and 5
  - Long bursts result in lower utilization and LPC
14. Nodes-to-stages mapping
- Objectives
  - Uniform distribution of nodes across stages
    - Minimize the size of the biggest stage
  - Correct operation of the circular pipeline
    - Avoid multiple loops around the pipeline
  - Simplified update operation
    - Avoid skipping levels
15. Nodes-to-stages mapping (cont'd)
- Problem formulation (constrained graph coloring)
  - Given:
    - a list of sub-trees
    - a list of colors, represented by numbers
  - Color the nodes so that:
    - every color is used nearly equally
    - a monotonic ordering relationship without gaps among colors is respected when traversing sub-trees from root to leaves
- Algorithm (min-max coloring heuristic)
  - Color sub-trees in decreasing order of size
  - At each step:
    - try all possible colors on the root (the rest of the sub-tree is colored consequently)
    - pick the local optimum
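The heuristic can be sketched as follows, under an assumption of my own: each sub-tree is summarized by its per-level node counts, so giving the root color c assigns level j the color (c + j) mod num_colors, which satisfies the monotonic-without-gaps constraint with wrap-around. The sub-tree shapes below are illustrative, not taken from the paper.

```python
# Min-max coloring heuristic sketch. A sub-tree is summarized by its
# per-level node counts (root level first); coloring the root with c
# assigns level j the color (c + j) % num_colors, i.e., a monotonic
# sequence without gaps that wraps around the circular pipeline.
# Sub-tree shapes below are illustrative assumptions.

def min_max_color(subtrees, num_colors):
    load = [0] * num_colors                      # nodes mapped to each color
    # color sub-trees in decreasing order of size
    for levels in sorted(subtrees, key=sum, reverse=True):
        best_c, best_max = None, None
        for c in range(num_colors):              # try every color on the root
            trial = load[:]
            for j, count in enumerate(levels):
                trial[(c + j) % num_colors] += count
            if best_max is None or max(trial) < best_max:
                best_c, best_max = c, max(trial)
        for j, count in enumerate(levels):       # commit the local optimum
            load[(best_c + j) % num_colors] += count
    return load

# hypothetical sub-trees: per-level node counts, root level first
subtrees = [[1, 2, 4, 3], [1, 2, 2], [1, 3], [1]]
print(min_max_color(subtrees, num_colors=4))
```

Each committed choice minimizes the maximum per-color load at that step, keeping the final per-stage node counts close to the total divided by the number of stages.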
16-21. Min-max coloring heuristic: example

[Figure: four sub-trees T1-T4, colored one at a time in decreasing order of size.]

After the first sub-trees have been colored, the per-color node counts are:

          Present coloring
Color 1   1
Color 2   2
Color 3   4
Color 4   5

Trying each color on the root of the next sub-tree gives:

          Present   If 1 on   If 2 on   If 3 on   If 4 on
          coloring  new root  new root  new root  new root
Color 1   1         2         5         3         2
Color 2   2         3         3         6         4
Color 3   4         6         5         5         8
Color 4   5         9         7         6         6

Color 3 on the new root minimizes the maximum per-color count (6), so it is picked. Repeating for the next sub-tree:

          Present   If 1 on   If 2 on   If 3 on   If 4 on
          coloring  new root  new root  new root  new root
Color 1   3         4         5         4         5
Color 2   6         8         7         8         7
Color 3   5         6         7         6         7
Color 4   6         8         7         8         7

Here colors 2 and 4 tie at a maximum of 7; picking color 2 yields the final coloring:

          Present coloring
Color 1   5
Color 2   7
Color 3   7
Color 4   7
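The selection step on the first candidate table above can be written out directly. The per-color counts below are read off the slide; min-max picks the column with the smallest maximum.

```python
# Local-optimum step of the min-max coloring heuristic, using the
# candidate table from the example slides: per-color node counts if the
# new sub-tree root is given color 1..4.

candidates = {                      # root color -> resulting per-color counts
    1: [2, 3, 6, 9],
    2: [5, 3, 5, 7],
    3: [3, 6, 5, 6],
    4: [2, 4, 8, 6],
}
best = min(candidates, key=lambda c: max(candidates[c]))
print(best, max(candidates[best]))  # -> 3 6
```

Color 3 wins with a maximum per-color count of 6, matching the "Present coloring" column of the following step in the example.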
22. Evaluation settings
- Trends in BGP tables
  - Increasing number of prefixes
  - Most prefixes are < 26 bits (typically 24 bits) long
  - Route updates can concentrate in short periods of time; however, they rarely change the shape of the trie
- 50 BGP tables containing from 50K to 135K prefixes
23. Memory requirements

[Figure: per-stage memory requirements of CAMP vs. level-based and height-based mapping.]

- Balanced distribution across stages
- Reduced total memory requirements
- Memory overhead: 2.4% with initial stride 8, 0.02% with initial stride 12, 0.01% with initial stride 16
24. Updates
- Techniques for handling updates
  - Single updates inserted as bubbles in the pipeline
  - Rebalancing computed offline, involving only a subset of tries
- Scenario
  - Migration between different BGP tables
  - Imbalance leads to a 4% increase in the occupancy of the largest stage
25. Summary
- Analysis of a circular pipeline architecture for trie-based IP lookup
- Goals
  - Minimize the memory requirement
  - Maximize pipeline utilization
  - Handle updates efficiently
- Design
  - Decoupling of pipeline stages from the maximum prefix length
  - LPC analysis
  - Nodes-to-stages mapping heuristic
- Evaluation
  - On real BGP tables
  - Good memory utilization; able to sustain a 40 Gbps line rate with small memory banks
27. Addressing the worst case
- Observations
  - We addressed practical datasets
  - Worst-case tries may have long, skinny sections that are difficult to split
- Idea: adaptive CAMP
  - Split the trie into parent and child sub-tries
  - Map the parent sub-trie onto the pipeline
  - Use more pipeline stages to mitigate the effect of multiple loops around the pipeline