Title: Algorithms for Advanced Packet Classification with TCAMs
- Karthik Lakshminarayanan
- UC Berkeley
- Joint work with
- Anand Rangarajan and Srinivasan Venkatachary
- (Cypress Semiconductors Inc.)
2 Packet Processing Environment
- Each packet is matched against a set of rules based on its header fields
- Examples: routers, intrusion detection systems
3 Ternary Content Addressable Memory
- Memory device with fixed-width arrays
- Each bit is 0, 1 or x (don't care)
- The search key is compared against all entries in parallel, and the first matching entry is returned
[Figure: a search key compared against TCAM rows row1 through rown; the output is row2, the first matching row]
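The following is a rough software model of the search semantics above. It models what the search returns, not the parallel hardware. Each ternary row is stored as a value/mask pair, where a mask bit of 1 means "compare this bit" and 0 means don't care. The value/mask representation and the function names are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (value, mask); mask bit 1 = care, 0 = don't care


def parse_row(pattern: str) -> Row:
    """Convert a ternary string such as '01xx' into a (value, mask) pair."""
    value = int(pattern.replace("x", "0"), 2)
    mask = int("".join("0" if c == "x" else "1" for c in pattern), 2)
    return value, mask


def tcam_search(rows: List[Row], key: int) -> Optional[int]:
    """Index of the first matching row, or None.

    A real TCAM compares every row in the same cycle and a priority encoder
    picks the lowest index; this loop only reproduces that result.
    """
    for index, (value, mask) in enumerate(rows):
        if (key ^ value) & mask == 0:  # every cared-for bit agrees
            return index
    return None


rows = [parse_row(p) for p in ("1xx0", "01xx", "0xxx")]
print(tcam_search(rows, 0b0110))  # 1: rows 1 and 2 both match, row 1 wins
```

Index 1 here corresponds to "row2" in the figure's 1-based numbering. The later row also matches, but the first match takes priority.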
4 TCAM Benefits and Disadvantages
- Benefits
- Deterministic search throughput: O(1) search
- Disadvantages
- Cost
- Power consumption
- Current TCAM usage
- 6 million TCAM devices deployed
- Used in multi-gigabit systems that have O(10,000) rules
- TCAMs can support a table of 128K ternary entries and 133 million searches per second for 144-bit keys
5 Problems
- Range Representation Problem
- Multimatch Classification Problem
Algorithms in software are easier to implement
7 Range Representation Problem
- Representing prefixes in ternary is trivial
- IP address prefixes appear in rules
- Representing arbitrary ranges is not easy, though
- port fields might contain ranges
- e.g., some security applications may allow only ports 1024-65535
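To show why prefixes are "trivial": an IP prefix maps to exactly one ternary entry. The network bits are fixed and the remaining bits are don't care. This is a minimal sketch with an assumed helper name. It uses only Python's standard `ipaddress` module, and the sample prefix 12.123.0.0/16 comes from the example rules later in the deck.

```python
import ipaddress


def prefix_to_ternary(cidr: str) -> str:
    """One ternary entry for an IP prefix: fixed network bits, then don't-cares."""
    net = ipaddress.ip_network(cidr)  # raises ValueError if host bits are set
    width = net.max_prefixlen  # 32 for IPv4, 128 for IPv6
    bits = format(int(net.network_address), f"0{width}b")
    return bits[: net.prefixlen] + "x" * (width - net.prefixlen)


print(prefix_to_ternary("12.123.0.0/16"))
# 0000110001111011xxxxxxxxxxxxxxxx
```

A port range such as 1024-65535 has no such one-entry form in plain binary, which is what the following slides address.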
8 Earlier Approaches I
- Prefix expansion of ranges
- express ranges as a union of prefixes
- have a separate TCAM entry for each prefix
- expansion: the number of entries a rule expands to
- Example: the range [3, 12] over a 4-bit field expands to 0011 (3), 01xx (4-7), 10xx (8-11) and 1100 (12)
- Worst-case expansion for a W-bit field is 2W - 2
- example: [1, 14] expands to 0001, 001x, 01xx, 10xx, 110x, 1110
- in the worst case, a 16-bit port field expands to 30 entries
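This expansion can be reproduced with the usual greedy split of the range into aligned power-of-two blocks. The code below is a minimal sketch with an assumed function name. The printed results match the two examples above and the 30-entry worst case for a 16-bit port.

```python
from typing import List


def range_to_prefixes(lo: int, hi: int, width: int) -> List[str]:
    """Minimal set of ternary prefixes whose union is exactly [lo, hi]."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... shrunk until it no longer overshoots hi.
        while lo + size - 1 > hi:
            size >>= 1
        free = size.bit_length() - 1  # number of trailing don't-care bits
        head = format(lo >> free, f"0{width - free}b") if free < width else ""
        prefixes.append(head + "x" * free)
        lo += size
    return prefixes


print(range_to_prefixes(3, 12, 4))   # ['0011', '01xx', '10xx', '1100']
print(range_to_prefixes(1, 14, 4))   # ['0001', '001x', '01xx', '10xx', '110x', '1110']
print(len(range_to_prefixes(1, 65534, 16)))  # 30
```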
9 Earlier Approaches II
- Database-dependent encoding
- observation: the TCAM array has some unused bits
- use these additional bits to encode commonly occurring ranges in the database
- TCAMs with IP ACLs have 36 extra bits
- 144-bit wide TCAMs
- 104 bits + 4 bits used for IP ACL rules
10 Earlier Approaches II
- Example (one extra bit encodes the port range 20-24)
  Address          Port     Extra bit
  12.123.0.0/16    20-24    1
  32.12.13.0/24    1024-    x
  128.0.0.0/8      20-24    1
- If the search key's port falls in 20-24, set its extra bit to 1, else set it to 0
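Below is a minimal sketch of this database-dependent scheme for the port field alone. It assumes one extra bit per encoded range and a 16-bit port. For the second rule it assumes the range 1024-65535, the example range from slide 7, because the table above truncates it. A rule whose range owns an extra bit needs a single entry with the port field set to don't care. Any other rule leaves the extra bits as x and falls back to prefix expansion. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]
PORT_BITS = 16  # assumption: 16-bit port field


def range_to_prefixes(lo: int, hi: int, width: int = PORT_BITS) -> List[str]:
    """Plain prefix expansion, used for ranges that have no extra bit."""
    out = []
    while lo <= hi:
        size = (lo & -lo) if lo else (1 << width)
        while lo + size - 1 > hi:
            size >>= 1
        free = size.bit_length() - 1
        head = format(lo >> free, f"0{width - free}b") if free < width else ""
        out.append(head + "x" * free)
        lo += size
    return out


def encode_rule(port_range: Range, encoded: Sequence[Range]) -> List[str]:
    """TCAM entries for one rule: one extra bit per encoded range, then the port."""
    extra = ["x"] * len(encoded)
    if port_range in encoded:
        # The extra bit carries the whole range, so the port field is don't care.
        extra[list(encoded).index(port_range)] = "1"
        return ["".join(extra) + "x" * PORT_BITS]
    # Not encoded: extra bits stay x and the port is prefix-expanded.
    return ["".join(extra) + p for p in range_to_prefixes(*port_range)]


def encode_key(port: int, encoded: Sequence[Range]) -> str:
    """Search key: extra bit = 1 if the port is inside that range, else 0."""
    extra = "".join("1" if lo <= port <= hi else "0" for lo, hi in encoded)
    return extra + format(port, f"0{PORT_BITS}b")


def matches(entry: str, key: str) -> bool:
    return all(e == "x" or e == k for e, k in zip(entry, key))


ENCODED = [(20, 24)]
print(len(encode_rule((20, 24), ENCODED)))       # 1 entry
print(len(encode_rule((1024, 65535), ENCODED)))  # 6 entries (prefix expansion)
```

The encoding depends on which ranges are chosen for extra bits. That choice is a property of the current database, which is why incremental updates are hard.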
11 Earlier Approaches II
- Improved version: Region-based Range Encoding
- Disadvantages
- database dependent, so incremental update is hard
12 Database-Independent Range Pre-Encoding
- Key insight: use the additional bits in a database-independent way
- wider representation of ranges
- reduce worst-case expansion
13 Database-Independent Range Pre-Encoding
- Fence encoding (W bits)
- total of 2^W - 1 bits
- the encoding of i has i ones preceded by 2^W - i - 1 zeros
- e.g. W = 3: f(0) = 0000000, f(2) = 0000011
- With 2^W - 1 bits, fence encoding achieves an expansion of 1
Theorem: To achieve a worst-case row expansion of 1 for a W-bit range, 2^W - 1 bits are necessary
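The claim that fence encoding achieves an expansion of 1 can be checked directly. Because f(i) is a run of i ones on the right, any range [a, b] becomes a single ternary word. The lowest a positions are 1, the highest 2^W - 1 - b positions are 0, and everything in between is x. This is a minimal sketch; the helper names `fence` and `fence_range` are hypothetical.

```python
def fence(i: int, w: int) -> str:
    """Fence code of a w-bit value i: 2**w - 1 bits, i ones preceded by zeros."""
    n = (1 << w) - 1
    if not 0 <= i <= n:
        raise ValueError(f"{i} does not fit in {w} bits")
    return "0" * (n - i) + "1" * i


def fence_range(a: int, b: int, w: int) -> str:
    """One ternary entry matching fence(i, w) exactly when a <= i <= b."""
    n = (1 << w) - 1
    # Lowest a positions must be 1 (i >= a); highest n - b must be 0 (i <= b).
    return "0" * (n - b) + "x" * (b - a) + "1" * a


def matches(entry: str, key: str) -> bool:
    return all(e == "x" or e == k for e, k in zip(entry, key))


print(fence(0, 3), fence(2, 3))  # 0000000 0000011
print(fence_range(2, 5, 3))      # 00xxx11
```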
14 Database-Independent Range Pre-Encoding
- Procedure
- split the W-bit field into multiple chunks
- encode each chunk using fence encoding
- combine the chunks to form ternary entries
[Figure: a W-bit field split into chunks of k0, k1 and k2 bits]
Combining chunks is analogous to multi-bit tries
15 Unibit view of DIRPE (Prefix expansion)
- W = 3, divided into 3 one-bit chunks
- R = [1, 6]: prefixes 001, 01x, 10x, 110
- Each level can contribute at most 2 prefixes (except the top level)
[Figure: unibit trie for W = 3, with root xxx, children 0xx and 1xx, nodes 00x, 01x, 10x, 11x, and leaves 000 through 111]
16 Multi-bit view of DIRPE
- Legend
- 8-bit field (W = 8)
- k0 = 2, k1 = 3, k2 = 3
- R = [11, 54]
- chunk values: 11 = (0, 1, 3), 54 = (0, 6, 6)
- split at chunk 1
For k-bit chunks, worst-case expansion: 2W/k - 1
Number of extra bits needed: (2^k - 1)W/k - W
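Slides 14-16 can be put together into a short program. The following is a minimal sketch of DIRPE-style expansion, written as a recursive multi-bit-trie walk over fence-encoded chunks, most significant chunk first. It is a reconstruction for illustration, not necessarily the exact procedure behind the slides, and the function names and chunk order are assumptions. With chunks (2, 3, 3) it reproduces the example above: the split happens at chunk 1, and [11, 54] needs 3 entries, where plain prefix expansion needs 7. With one-bit chunks it reduces to the unibit prefix expansion of slide 15.

```python
from typing import List, Sequence


def fence(value: int, k: int) -> str:
    """Fence code of a k-bit chunk value: 2**k - 1 bits."""
    n = (1 << k) - 1
    return "0" * (n - value) + "1" * value


def fence_range(a: int, b: int, k: int) -> str:
    """Single ternary entry matching fence(i, k) for a <= i <= b."""
    n = (1 << k) - 1
    return "0" * (n - b) + "x" * (b - a) + "1" * a


def dirpe_expand(lo: int, hi: int, chunks: Sequence[int]) -> List[str]:
    """Ternary entries covering [lo, hi]; chunks = widths, most significant first."""
    if not chunks:
        return [""]
    k, rest = chunks[0], chunks[1:]
    rest_bits = sum(rest)
    rest_max = (1 << rest_bits) - 1
    a, lo_rest = lo >> rest_bits, lo & rest_max
    b, hi_rest = hi >> rest_bits, hi & rest_max
    if a == b:  # same top chunk: fix it and descend (no split yet)
        return [fence(a, k) + e for e in dirpe_expand(lo_rest, hi_rest, rest)]
    any_rest = "x" * sum((1 << c) - 1 for c in rest)
    lo_full, hi_full = lo_rest == 0, hi_rest == rest_max
    entries = []
    # Middle: top-chunk values whose whole subtree lies in [lo, hi], as one fence range.
    mid_lo = a if lo_full else a + 1
    mid_hi = b if hi_full else b - 1
    if mid_lo <= mid_hi:
        entries.append(fence_range(mid_lo, mid_hi, k) + any_rest)
    # Partial boundary subtrees: fix the top chunk and recurse on the remainder.
    if not lo_full:
        entries += [fence(a, k) + e for e in dirpe_expand(lo_rest, rest_max, rest)]
    if not hi_full:
        entries += [fence(b, k) + e for e in dirpe_expand(0, hi_rest, rest)]
    return entries


def encode_key(value: int, chunks: Sequence[int]) -> str:
    """Search-key side: fence-encode each chunk of the W-bit value."""
    parts, shift = [], sum(chunks)
    for k in chunks:
        shift -= k
        parts.append(fence((value >> shift) & ((1 << k) - 1), k))
    return "".join(parts)


def matches(entry: str, key: str) -> bool:
    return all(e == "x" or e == c for e, c in zip(entry, key))


print(len(dirpe_expand(11, 54, (2, 3, 3))))  # 3
print(dirpe_expand(1, 6, (1, 1, 1)))  # ['01x', '001', '10x', '110']
```

With chunks (2, 3, 3), each entry and each encoded key is 3 + 7 + 7 = 17 bits wide, compared with 8 bits for the raw field. Each level below the split adds at most one entry per boundary, which is where the 2W/k - 1 bound comes from.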
17 Comparison of Expansion
[Chart: expansion under DIRPE and under database-dependent encoding]
- Net expansion was 1.12
18 Comparison: Prefix Expansion vs. Region-based Encoding vs. DIRPE
Metric | Prefix Expansion | Region-based Encoding (with r regions) | DIRPE (with k-bit chunks) | DIRPE + Region-based
Extra bits | 0 | F(log2 r + (2n-1)/r) | F(W(2^k - 1)/k - W) | F((2^k - 1) log2 r / k + (2n-1)/r)
Worst-case capacity degradation | (2W-2)^F | (2 log2 r)^F | (2W/k - 1)^F | (2 log2 r / k)^F
Cost of an incremental update | O(W^F) | O(N) | O((W/k)^F) | O(N)
Overhead on the packet processor | None | Pre-computed table of size O((log2 r + (2n-1)/r) F.2^W), or O(nF) comparators of width W bits | O(W.2^k / k) logic gates | Both pieces of logic from the previous two columns
19 DIRPE Summary
- Database independent
- Scales well for large databases
- Good incremental update properties
- Additional bits needed
- Only a small amount of logic needed to encode the search key
20 Problems
- Range Expansion Problem
- Multimatch Classification Problem
21 Multimatch Classification Problem
- TCAM search primitive: return the first matching entry for a key
- Multimatch requirement: return k matches (or all matches) for a key
- security applications, where all signatures that match a packet need to be found
- accounting applications, where counters have to be updated for all matching entries
22 Earlier Approaches
- Entry Invalidation scheme
- maintain the multimatch state using an additional bit in each TCAM entry, called the valid bit
[Figure: TCAM array with a valid-bit column next to each entry, and a search key that also carries a valid bit]
23 Earlier Approaches
- Entry Invalidation scheme
- Disadvantage
- ill-suited for multi-threaded environments
- need one valid bit for each thread
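Below is a single-threaded sketch of the entry-invalidation loop. It assumes every entry starts valid and the search key always carries valid bit 1. After each hit, the matched row's valid bit is written to 0, so the next search returns the next match. The valid bits are shared state in the TCAM, so a second thread searching at the same time would see these invalidations, which is why one bit per thread is needed. Names are illustrative.

```python
from typing import List, Optional


def matches(entry: str, key: str) -> bool:
    return all(e == "x" or e == k for e, k in zip(entry, key))


def first_match(rows: List[str], key: str) -> Optional[int]:
    """Model of the TCAM primitive: lowest-index matching row."""
    return next((i for i, row in enumerate(rows) if matches(row, key)), None)


def multimatch_by_invalidation(rules: List[str], key: str) -> List[int]:
    """All matching rule indices, found by repeatedly invalidating the first hit."""
    rows = ["1" + r for r in rules]  # valid bit prepended; all entries start valid
    found = []
    while (hit := first_match(rows, "1" + key)) is not None:
        found.append(hit)
        rows[hit] = "0" + rows[hit][1:]  # write cycle: clear the valid bit
    # A real TCAM would now need extra writes to set the valid bits back to 1.
    return found


print(multimatch_by_invalidation(["01x", "0xx", "1xx", "xx1"], "011"))  # [0, 1, 3]
```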
24 Earlier Approaches
- Geometric intersection scheme
- construct the geometric intersection (cross-products) of the fields and place it in the TCAM
- pre-processing step is expensive
- search is fast
- Disadvantage
- does not scale well in capacity
- for router dataset expansion of 25100
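The following toy version of the intersection idea is simplified to whole ternary rules rather than per-field cross-products. Every non-empty intersection of a subset of rules becomes its own entry, and larger subsets are listed first, so a single search returns the complete match set. The number of entries grows with the number of subsets, which is the capacity problem noted above. Function names are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple

Entry = Tuple[str, FrozenSet[int]]  # (ternary pattern, set of rules it stands for)


def intersect(a: str, b: str) -> Optional[str]:
    """Bitwise ternary intersection, or None if no key matches both."""
    out = []
    for x, y in zip(a, b):
        if x == "x":
            out.append(y)
        elif y == "x" or x == y:
            out.append(x)
        else:  # 0 vs 1: disjoint
            return None
    return "".join(out)


def build_intersection_table(rules: List[str]) -> List[Entry]:
    """One entry per non-empty intersection of a subset of rules, largest subsets first."""
    table: List[Entry] = []
    for size in range(len(rules), 0, -1):
        for subset in combinations(range(len(rules)), size):
            pattern = rules[subset[0]]
            for i in subset[1:]:
                merged = intersect(pattern, rules[i])
                if merged is None:
                    break
                pattern = merged
            else:  # no break: the whole subset overlaps
                table.append((pattern, frozenset(subset)))
    return table


def search_table(table: List[Entry], key: str) -> FrozenSet[int]:
    """First matching entry = the full set of rules matching the key."""
    for pattern, rule_set in table:
        if all(p == "x" or p == k for p, k in zip(pattern, key)):
            return rule_set
    return frozenset()


table = build_intersection_table(["0x", "x1", "xx"])
print(len(table))                     # 7 entries for 3 rules
print(sorted(search_table(table, "00")))  # [0, 2]
```

Ordering by subset size is what makes a single search sufficient. Any entry that matches a key stands for a subset of that key's match set M. The only matching entry of size |M| is M itself, and no larger entry can match.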
25 Multimatch Using Discriminators (MUD)
- Observation: after index j is matched, the ACL has to be searched for all indices > j
- Basic idea
- store a discriminator field with each row that encodes the index of the row
- to search rows with index > j, the search key is expanded into the prefixes that cover indices > j
- multiple searches are then issued
26 MUD Example
[Figure: TCAM array holding rule0, rule1 and rule2 with discriminator field values 0000, 0001 and 0010; the search key is issued with discriminator xxxx, then with the discriminator prefixes 0001, 001x, 01xx and 1xxx]
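The MUD loop can be sketched as follows. The sketch assumes a d-bit discriminator equal to each rule's index, with rows stored in index order. For simplicity, the search key itself carries x bits in its discriminator field, which in hardware would correspond to masking those bits. After a hit at index j, the range (j, 2^d - 1] is prefix-expanded and each prefix is searched in ascending order; the first hit is the next match. With d = 4 and j = 0, the prefixes are 0001, 001x, 01xx and 1xxx, as in the example. Names are illustrative.

```python
from typing import List, Optional, Tuple


def range_to_prefixes(lo: int, hi: int, width: int) -> List[str]:
    """Prefixes (in ascending order) whose union is [lo, hi]; [] if lo > hi."""
    out = []
    while lo <= hi:
        size = (lo & -lo) if lo else (1 << width)
        while lo + size - 1 > hi:
            size >>= 1
        free = size.bit_length() - 1
        head = format(lo >> free, f"0{width - free}b") if free < width else ""
        out.append(head + "x" * free)
        lo += size
    return out


def matches(entry: str, key: str) -> bool:
    # 'x' on either side matches; 'x' in the key models masked key bits.
    return all(e == "x" or k == "x" or e == k for e, k in zip(entry, key))


def first_match(rows: List[str], key: str) -> Optional[int]:
    return next((i for i, row in enumerate(rows) if matches(row, key)), None)


def mud_multimatch(rules: List[str], key: str, d: int) -> Tuple[List[int], int]:
    """All matching rule indices and the number of TCAM searches used."""
    rows = [rule + format(i, f"0{d}b") for i, rule in enumerate(rules)]
    found, searches = [], 0
    patterns = ["x" * d]  # first search ignores the discriminator
    while patterns:
        hit = None
        for p in patterns:  # ascending prefixes, so the first hit is the next index
            searches += 1
            hit = first_match(rows, key + p)
            if hit is not None:
                break
        if hit is None:
            break
        found.append(hit)
        patterns = range_to_prefixes(hit + 1, (1 << d) - 1, d)  # indices > hit
    return found, searches


print(range_to_prefixes(1, 15, 4))  # ['0001', '001x', '01xx', '1xxx']
print(mud_multimatch(["01x", "0xx", "1xx", "xx1"], "011", 2))  # ([0, 1, 3], 3)
```

The only state carried between searches is the last matched index, and it lives in the caller rather than in the TCAM. This is what lets several threads interleave their searches.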
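27 Comparison: Geometric Intersection-based vs. Entry Invalidation vs. MUD
[Table: comparison of the three schemes on multi-threading support, worst-case TCAM entries for N rules, update cost, cycles for k multi-matches, extra bits, and overhead on the packet processor]
- Geometric Intersection-based: multi-threading supported; O(N^F) worst-case TCAM entries for N rules; O(N^F) update cost; 0 extra bits; no overhead on the packet processor
- Entry Invalidation: no multi-threading support; N worst-case TCAM entries; O(N) update cost; overhead: small state machine logic, which can be implemented using a few hundred gates or a few microcode instructions
- MUD: multi-threading supported; N worst-case TCAM entries; O(N) update cost; cycles for k multi-matches: 1 + d + (d-1)(k-2) without DIRPE, 1 + d(k-1)/r with DIRPE; extra bits: d without DIRPE; overhead: small state machine logic, which can be implemented using a few hundred gates or a few microcode instructions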
28 MUD Summary
- No per-search state, so suitable for multi-threaded environments
- Incremental updates fast
- Scales well to large databases
- Additional bits needed
- Extra search cycles
- Can still support Gbps speeds
29 Conclusion
- Range expansion problem: proposed DIRPE, a database-independent encoding
- scales to a large number of ranges
- good incremental update properties
- Multimatch classification problem: proposed MUD
- suitable for multi-threaded environments
- scales to large databases
- No change to hardware, so easy to deploy