Title: CAMP: Fast and Efficient IP Lookup Architecture
1. CAMP: Fast and Efficient IP Lookup Architecture
- Sailesh Kumar, Michela Becchi,
- Patrick Crowley, Jonathan Turner
- Washington University in St. Louis
2-6. Context
- Trie-based IP lookup
- Circular pipeline architectures

Running example: lookup of IP address 111010 against the prefix dataset below.

Prefix dataset:
  0     P1
  000   P2
  0010  P3
  0011  P4
  011   P5
  10    P6
  11    P7
  110   P8

[Figure: the prefix dataset drawn as a binary trie with nodes P1-P8; the trie levels are assigned to pipeline Stages 1-4, and the stages are then arranged as a circular pipeline.]
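Trie-based lookup on the slides' prefix dataset can be sketched as follows. This is a minimal illustration of longest-prefix match on a binary trie, not the paper's pipelined implementation; the class and function names are my own.

```python
# Binary trie for the slides' prefix dataset, with longest-prefix-match
# lookup of the example address 111010. A minimal sketch of trie-based
# IP lookup, not the paper's pipelined implementation.

class Node:
    def __init__(self):
        self.children = {}   # bit ('0' or '1') -> child Node
        self.prefix = None   # name of the prefix ending at this node, if any

def insert(root, bits, name):
    node = root
    for b in bits:
        node = node.children.setdefault(b, Node())
    node.prefix = name

def longest_prefix_match(root, addr_bits):
    node, best = root, None
    for b in addr_bits:
        if node.prefix is not None:
            best = node.prefix   # remember the longest match seen so far
        if b not in node.children:
            return best
        node = node.children[b]
    return node.prefix or best

dataset = {"0": "P1", "000": "P2", "0010": "P3", "0011": "P4",
           "011": "P5", "10": "P6", "11": "P7", "110": "P8"}
root = Node()
for bits, name in dataset.items():
    insert(root, bits, name)

print(longest_prefix_match(root, "111010"))  # -> P7 ("11" is the longest match)
```

For the example address 111010, "11" (P7) matches but "110" (P8) does not, so the lookup returns P7.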
7. CAMP: Circular Adaptive and Monotonic Pipeline
- Problems
  - Optimize the global memory requirement
  - Avoid bottleneck stages
  - Make the per-stage utilization uniform
- Idea
  - Exploit a circular pipeline
  - Each stage can be a potential entry/exit point
  - Possible wrap-around
  - Split the trie into sub-trees and map each of them independently to the pipeline
8. CAMP (cont'd)
- Implications
  - Pros
    - Flexibility: the maximum prefix length is decoupled from the pipeline depth
    - Upgradeability: memory bank updates involve only partial remapping
  - Cons
    - A stage can simultaneously be an entry point and a transition stage for two distinct requests, originating conflicts
    - A scheduling mechanism is required
    - Possible efficiency degradation
9. Trie splitting
- Define an initial stride x
- Use a direct index table with 2^x entries for the first x levels
- Expand short prefixes to length x
- Map the sub-trees

[Figure: direct index table pointing to Subtree 1, Subtree 2, and Subtree 3; e.g., initial stride x = 2.]
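The splitting step can be sketched with the slides' dataset and initial stride x = 2. This is a simplified illustration, not the paper's data structure: prefixes shorter than x are expanded to length x and stored in a 2^x-entry direct table, while longer prefixes are grouped into sub-trees (here plain dictionaries) keyed by their first x bits.

```python
# Direct index table for trie splitting, with initial stride x = 2 as in
# the slides' example. The first x address bits index a table with 2**x
# entries; prefixes shorter than x are expanded to length x; longer
# prefixes are kept in sub-trees grouped by their first x bits.
# A minimal sketch, not the paper's data structure.

X = 2
prefixes = {"0": "P1", "000": "P2", "0010": "P3", "0011": "P4",
            "011": "P5", "10": "P6", "11": "P7", "110": "P8"}

def expand(prefix, length):
    """Expand a short prefix into all prefixes of the given length."""
    pad = length - len(prefix)
    return [prefix + format(i, f"0{pad}b") for i in range(2 ** pad)] if pad else [prefix]

table = {}        # direct index table: first x bits -> best short prefix
subtrees = {}     # first x bits -> {remaining bits of longer prefix: name}
for p, name in sorted(prefixes.items(), key=lambda kv: len(kv[0])):
    if len(p) <= X:
        for e in expand(p, X):
            table[e] = name          # longer originals overwrite shorter ones
    else:
        subtrees.setdefault(p[:X], {})[p[X:]] = name

def lookup(addr):
    """Longest-prefix match: direct table hit, refined by the sub-tree."""
    best, best_len = table.get(addr[:X]), 0
    for suffix, name in subtrees.get(addr[:X], {}).items():
        if addr[X:].startswith(suffix) and len(suffix) > best_len:
            best, best_len = name, len(suffix)
    return best

print(lookup("111010"))  # -> P7 (table hit on "11"; "110" does not match)
```

In CAMP each sub-tree would be mapped independently onto the circular pipeline, with the table entry identifying the entry stage.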
10. Dealing with conflicts
- Idea: use a request queue in front of each stage
- Intuition: without request queues,
  - a request may wait up to n cycles before entering the pipeline
  - a waiting request stalls all subsequent requests, even those not competing for the same stages
- Issue: ordering
  - Reordering is limited to requests with different entry stages (addressed to different destinations)
  - An optional output reorder buffer can be used
11. Pipeline efficiency
- Metrics
  - Pipeline utilization: fraction of time the pipeline is busy, given a continuous backlog of requests
  - Lookups per cycle (LPC): average request dispatch rate
- Linear pipeline
  - LPC = 1
  - Pipeline utilization generally low
  - Non-uniform stage utilization
- CAMP pipeline
  - High pipeline utilization
  - Uniform stage utilization
  - LPC close to 1 when each request traverses the complete pipeline (pipeline stages = trie levels)
  - LPC > 1 when most requests do not make complete circles around the pipeline (pipeline stages > trie levels)
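The LPC > 1 regime can be illustrated with a toy cycle-level model. This is my own simplified sketch, not the paper's queue-based scheduler: requests enter at uniformly random stages, each occupies `levels` consecutive stages (modulo the stage count), and slots are reserved greedily on a (stage, cycle) grid. The hard upper bound is stages/levels lookups per cycle.

```python
import random

# Toy model of the circular pipeline: each request enters at a random stage
# and occupies `levels` consecutive stages (mod `stages`), one per cycle.
# Slots are reserved greedily on a (stage, cycle) grid. A sketch for
# intuition about LPC, not the paper's queue-based scheduler.

def simulate(stages, levels, n_requests, seed=1):
    rng = random.Random(seed)
    occupied = set()        # reserved (stage, cycle) slots
    makespan = 0
    for _ in range(n_requests):
        entry = rng.randrange(stages)
        t = 0
        # earliest start cycle at which all needed slots are free
        while any(((entry + k) % stages, t + k) in occupied for k in range(levels)):
            t += 1
        for k in range(levels):
            occupied.add(((entry + k) % stages, t + k))
        makespan = max(makespan, t + levels)
    lpc = n_requests / makespan             # average dispatch rate
    utilization = len(occupied) / (stages * makespan)
    return lpc, utilization

# 32 stages, 8 trie levels per lookup (24 bits with a stride-3 tree bitmap)
lpc, util = simulate(stages=32, levels=8, n_requests=2000)
print(f"LPC = {lpc:.2f}, utilization = {util:.2f}")
```

With 32 stages and 8 levels per lookup, the model's LPC approaches the capacity bound of 4, consistent in spirit with the 3-5 range reported on the next slides.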
12. Pipeline efficiency: all stages traversed
- Setup
  - 24 stages, all traversed by each packet
  - Packet bursts: sequences of packets to the same entry point
- Results
  - Long bursts result in high utilization and LPC
  - For all burst sizes, enough queuing (32 entries) guarantees an LPC of 0.8
13. Pipeline efficiency: LPC > 1
- Setup
  - 32 stages, rightmost 24 bits, tree bitmap with stride 3
  - Average prefix length: 24
- Results
  - LPC between 3 and 5
  - Long bursts result in lower utilization and LPC
14. Nodes-to-stages mapping
- Objectives
  - Uniform distribution of nodes across stages
    - Minimize the size of the biggest stage
  - Correct operation of the circular pipeline
    - Avoid multiple loops around the pipeline
  - Simplified update operation
    - Avoid skipping levels
15. Nodes-to-stages mapping (cont'd)
- Problem formulation (constrained graph coloring)
  - Given:
    - a list of sub-trees
    - a list of colors, represented by numbers
  - Color the nodes so that:
    - every color is used nearly equally
    - a monotonic ordering relationship without gaps among colors is respected when traversing sub-trees from root to leaves
- Algorithm (min-max coloring heuristic)
  - Color sub-trees in decreasing order of size
  - At each step:
    - try all possible colors on the root (the rest of the sub-tree is colored consequently)
    - pick the local optimum
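The heuristic can be sketched as follows, under an assumption of my own: each sub-tree is summarized by its per-level node counts, so giving the root color c assigns level j the color (c + j) mod num_colors, which satisfies the monotonic-without-gaps constraint with wrap-around. The sub-tree shapes below are illustrative, not taken from the paper.

```python
# Min-max coloring heuristic sketch. A sub-tree is summarized by its
# per-level node counts (root level first); coloring the root with c
# assigns level j the color (c + j) % num_colors, i.e., a monotonic
# sequence without gaps that wraps around the circular pipeline.
# Sub-tree shapes below are illustrative assumptions.

def min_max_color(subtrees, num_colors):
    load = [0] * num_colors                      # nodes mapped to each color
    # color sub-trees in decreasing order of size
    for levels in sorted(subtrees, key=sum, reverse=True):
        best_c, best_max = None, None
        for c in range(num_colors):              # try every color on the root
            trial = load[:]
            for j, count in enumerate(levels):
                trial[(c + j) % num_colors] += count
            if best_max is None or max(trial) < best_max:
                best_c, best_max = c, max(trial)
        for j, count in enumerate(levels):       # commit the local optimum
            load[(best_c + j) % num_colors] += count
    return load

# hypothetical sub-trees: per-level node counts, root level first
subtrees = [[1, 2, 4, 3], [1, 2, 2], [1, 3], [1]]
print(min_max_color(subtrees, num_colors=4))
```

Each committed choice minimizes the maximum per-color load at that step, keeping the final per-stage node counts close to the total divided by the number of stages.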
16-21. Min-max coloring heuristic: example

[Figure: four sub-trees T1-T4, colored one at a time in decreasing order of size.]

After the first sub-trees have been colored, the per-color node counts are:

          Present coloring
Color 1   1
Color 2   2
Color 3   4
Color 4   5

Trying each color on the root of the next sub-tree gives:

          Present   If 1 on   If 2 on   If 3 on   If 4 on
          coloring  new root  new root  new root  new root
Color 1   1         2         5         3         2
Color 2   2         3         3         6         4
Color 3   4         6         5         5         8
Color 4   5         9         7         6         6

Color 3 on the new root minimizes the maximum per-color count (6), so it is picked. Repeating for the next sub-tree:

          Present   If 1 on   If 2 on   If 3 on   If 4 on
          coloring  new root  new root  new root  new root
Color 1   3         4         5         4         5
Color 2   6         8         7         8         7
Color 3   5         6         7         6         7
Color 4   6         8         7         8         7

Here colors 2 and 4 tie at a maximum of 7; picking color 2 yields the final coloring:

          Present coloring
Color 1   5
Color 2   7
Color 3   7
Color 4   7
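The selection step on the first candidate table above can be written out directly. The per-color counts below are read off the slide; min-max picks the column with the smallest maximum.

```python
# Local-optimum step of the min-max coloring heuristic, using the
# candidate table from the example slides: per-color node counts if the
# new sub-tree root is given color 1..4.

candidates = {                      # root color -> resulting per-color counts
    1: [2, 3, 6, 9],
    2: [5, 3, 5, 7],
    3: [3, 6, 5, 6],
    4: [2, 4, 8, 6],
}
best = min(candidates, key=lambda c: max(candidates[c]))
print(best, max(candidates[best]))  # -> 3 6
```

Color 3 wins with a maximum per-color count of 6, matching the "Present coloring" column of the following step in the example.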
22. Evaluation settings
- Trends in BGP tables
  - Increasing number of prefixes
  - Most prefixes are < 26 bits (typically 24 bits) long
  - Route updates can concentrate in short periods of time; however, they rarely change the shape of the trie
- 50 BGP tables containing from 50K to 135K prefixes
23. Memory requirements

[Figure: per-stage memory requirements of CAMP vs. level-based and height-based mapping.]

- Balanced distribution across stages
- Reduced total memory requirements
- Memory overhead: 2.4% with initial stride 8, 0.02% with initial stride 12, 0.01% with initial stride 16
24. Updates
- Techniques for handling updates
  - Single updates inserted as bubbles in the pipeline
  - Rebalancing computed offline, involving only a subset of tries
- Scenario
  - Migration between different BGP tables
  - Imbalance leads to a 4% increase in the occupancy of the largest stage
25. Summary
- Analysis of a circular pipeline architecture for trie-based IP lookup
- Goals
  - Minimize the memory requirement
  - Maximize pipeline utilization
  - Handle updates efficiently
- Design
  - Decoupling of pipeline stages from the maximum prefix length
  - LPC analysis
  - Nodes-to-stages mapping heuristic
- Evaluation
  - On real BGP tables
  - Good memory utilization; able to sustain a 40 Gbps line rate with small memory banks
27. Addressing the worst case
- Observations
  - We addressed practical datasets
  - Worst-case tries may have long, skinny sections that are difficult to split
- Idea: adaptive CAMP
  - Split the trie into parent and child sub-tries
  - Map the parent sub-trie onto the pipeline
  - Use more pipeline stages to mitigate the effect of multiple loops around the pipeline