29 March 2014 - PowerPoint PPT Presentation

About This Presentation
Title:

29 March 2014

Description:

Supported by NSF ANI-9813723, DARPA N660001-01-1-8930. Applied Research Laboratory ... Suppose you are a firewall, or QoS router, or network monitor ... – PowerPoint PPT presentation

Slides: 29
Provided by: edwardwsp

Transcript and Presenter's Notes



1
Packet Classification using Extended TCAMs
  • Edward W. Spitznagel, Jonathan S. Turner, David
    E. Taylor
  • Supported by NSF ANI-9813723, DARPA
    N660001-01-1-8930

2
Packet Classification Problem
  • Suppose you are a firewall, or QoS router, or
    network monitor ...
  • You are given a list of rules (filters) to
    determine how to process incoming packets, based
    on the packet header fields
  • Some fields in the rules are specified with bit
    masks, others with ranges
  • Goal: when a packet arrives, find the first rule
    that matches the packet's header fields
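A minimal software sketch of the problem statement, assuming a hypothetical rule format: two 4-bit address fields written as strings over {0, 1, x}, two inclusive port ranges, and a protocol name (mirroring the example header on the next slide). Rules are scanned in list order and the first match wins; this linear scan is the baseline that faster classifiers try to beat.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Header = Tuple[str, str, int, int, str]  # (src bits, dst bits, sport, dport, proto)

@dataclass(frozen=True)
class Rule:
    # Hypothetical rule format, for illustration only.
    name: str
    src: str                 # ternary bit string, 'x' = don't care
    dst: str
    sport: Tuple[int, int]   # inclusive (lo, hi)
    dport: Tuple[int, int]
    proto: str               # "*" matches any protocol

def ternary_match(pattern: str, bits: str) -> bool:
    """True if every non-'x' position of pattern equals the header bit."""
    return len(pattern) == len(bits) and all(p in ("x", b) for p, b in zip(pattern, bits))

def matches(rule: Rule, hdr: Header) -> bool:
    src, dst, sport, dport, proto = hdr
    return (ternary_match(rule.src, src)
            and ternary_match(rule.dst, dst)
            and rule.sport[0] <= sport <= rule.sport[1]
            and rule.dport[0] <= dport <= rule.dport[1]
            and rule.proto in ("*", proto))

def classify(rules: Sequence[Rule], hdr: Header) -> Optional[str]:
    # List order is priority, so return the first rule that matches.
    for rule in rules:
        if matches(rule, hdr):
            return rule.name
    return None

# Hypothetical rules: both b and c match the header below, but b comes first.
rules = [
    Rule("a", "00xx", "xxxx", (0, 15), (0, 15), "TCP"),
    Rule("b", "01xx", "0x1x", (2, 4), (5, 5), "UDP"),
    Rule("c", "0xxx", "xxxx", (0, 15), (0, 15), "*"),
]
print(classify(rules, ("0101", "0010", 3, 5, "UDP")))
```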

3
Packet Classification Problem
  • Example: a packet arrives with header (0101, 0010,
    3, 5, UDP)
  • classification result: filter b is matched
  • filter c also matches, but b occurs before c in
    the list
  • Easy to do when we have only a few rules; very
    difficult when we have 100,000 rules and packets
    arrive at 40 Gb/s
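To put "40 Gb/s" in perspective, here is the per-packet time budget, assuming minimum-size 40-byte packets (an assumption; the slide gives only the line rate).

```python
LINE_RATE_BPS = 40e9     # 40 Gb/s, from the slide
MIN_PACKET_BYTES = 40    # assumption: minimum-size IPv4/TCP packets

packets_per_second = LINE_RATE_BPS / (MIN_PACKET_BYTES * 8)
ns_per_packet = 1e9 / packets_per_second
print(f"{packets_per_second / 1e6:.0f} Mpackets/s, {ns_per_packet:.0f} ns per lookup")
```

That is 125 million lookups per second, or 8 ns each, so a sequential scan of 100,000 rules would have to sustain 12,500 rule comparisons per nanosecond.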

4
Geometric Representation
  • Filters with K fields can be represented
    geometrically in K dimensions
  • Example

[Figure: example filters a, b and c drawn as regions in the space of header values]
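A sketch of the geometric view, assuming every field is either an inclusive range or a prefix (a bit string whose don't-care bits are all trailing, such as "01xx"). A prefix covers one contiguous interval, so each filter becomes an axis-aligned box in K dimensions and a header becomes a point; the filter matches when its box contains the point. General masks such as "xxx1" are not contiguous and would need a union of boxes, which this sketch rejects.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (lo, hi)

def prefix_to_interval(pattern: str) -> Interval:
    """Map a prefix such as '01xx' to the interval of values it covers."""
    stem = pattern.rstrip("x")
    if "x" in stem:
        raise ValueError(f"{pattern!r} is not a prefix, so it is not one interval")
    free = len(pattern) - len(stem)              # trailing don't-care bits
    base = int(stem, 2) << free if stem else 0
    return base, base + (1 << free) - 1

def contains(box: List[Interval], point: Tuple[int, ...]) -> bool:
    """A K-field filter matches a K-field header iff its box contains the point."""
    return all(lo <= v <= hi for (lo, hi), v in zip(box, point))

# A two-field filter (range 0-14, prefix 1010) and the header (2, 1010).
box = [(0, 14), prefix_to_interval("1010")]
print(contains(box, (2, 0b1010)))
```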
5
Related Work
  • TCAM-based parallel classification
  • CoolCAMs (Narlikar, Basu, Zane) for IP lookup
  • SRAM-based sequential classification
  • Recursive Flow Classification (Gupta, McKeown)
  • HiCuts (Gupta, McKeown)
  • Extended Grid of Tries (Baboescu, Singh,
    Varghese)
  • HyperCuts (Singh, Baboescu, Varghese, Wang)
  • SRAM: 6 transistors per bit (vs. 16 for TCAM),
    but the SRAM-based approaches use more bits per filter

6
Ternary CAMs
  • Most popular practical approach to
    high-performance packet classification
  • Hardware compares query word (packet header) to
    all stored words (filters) in parallel
  • each bit of a stored word can be 0, 1, or X
    (don't care)
  • Very fast, but not without drawbacks
  • High power consumption limits scalability
  • inefficient representation of ranges
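A software model of the lookup described above, assuming each stored word is a (value, care-mask) pair in which a 0 mask bit stands for X. In hardware every row is compared at once and a priority encoder picks the lowest-numbered matching row; here the "parallel" compare is a list comprehension, and the stored patterns are hypothetical.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (value, care_mask); care bit 0 means X (don't care)

def ternary_entry(pattern: str) -> Entry:
    """Build an entry from a string over {0, 1, x}, most significant bit first."""
    value = int(pattern.replace("x", "0"), 2)
    care = int("".join("0" if c == "x" else "1" for c in pattern), 2)
    return value, care

def tcam_lookup(entries: List[Entry], key: int) -> Optional[int]:
    # Match lines: one comparison per stored word (simultaneous in hardware).
    match_lines = [((key ^ value) & care) == 0 for value, care in entries]
    # Priority encoder: the lowest index wins, i.e. the first matching filter.
    for index, hit in enumerate(match_lines):
        if hit:
            return index
    return None

entries = [ternary_entry(p) for p in ["01xx", "0x0x", "xxxx"]]
print(tcam_lookup(entries, 0b0101), tcam_lookup(entries, 0b0001))
```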

7
Ternary CAM - Example
Entry 0 (filter a) is the first matching filter
8
Range Matching in TCAMs
  • Convert ranges into sets of prefixes
  • 1-4 becomes 001, 01, and 100
  • 3-5 becomes 011 and 10
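The usual conversion can be sketched as a greedy split: starting at lo, take the largest power-of-two block that is aligned at lo and does not pass hi; each such block is exactly one prefix. This sketch assumes unsigned W-bit fields and reproduces the two 3-bit examples above.

```python
from typing import List

def range_to_prefixes(lo: int, hi: int, width: int) -> List[str]:
    """Cover the inclusive range [lo, hi] of width-bit values with prefixes.
    The empty string stands for the all-don't-care prefix (whole field)."""
    prefixes = []
    while lo <= hi:
        size = 1
        # Grow the block while it stays aligned at lo, within hi and the field.
        while lo % (size * 2) == 0 and lo + size * 2 - 1 <= hi and size * 2 <= (1 << width):
            size *= 2
        free = size.bit_length() - 1              # trailing don't-care bits
        stem = lo >> free
        prefixes.append(format(stem, f"0{width - free}b") if width > free else "")
        lo += size
    return prefixes

print(range_to_prefixes(1, 4, 3), range_to_prefixes(3, 5, 3))
```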

9
Range Matching in TCAMs
[Figure: example filters a to f]
  • With two 16-bit range fields, a single rule could
    require up to 900 TCAM entries!
  • Typical case: the entire filter set expands by a
    factor of 2 to 6
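The 900 figure follows from the worst-case range for a W-bit field, [1, 2^W - 2], which needs 2W - 2 prefixes (30 for W = 16); a rule with two such fields needs one TCAM entry per pair of prefixes. A count-only sketch, assuming unsigned fields:

```python
def prefix_count(lo: int, hi: int) -> int:
    """Number of prefixes needed to cover the inclusive range [lo, hi]."""
    count = 0
    while lo <= hi:
        # Largest power-of-two block aligned at lo (any size is aligned at 0).
        size = lo & -lo if lo else 1 << hi.bit_length()
        while lo + size - 1 > hi:   # shrink until the block fits in the range
            size //= 2
        count += 1
        lo += size
    return count

W = 16
per_field = prefix_count(1, (1 << W) - 2)   # worst-case 16-bit range
print(per_field, per_field * per_field)     # entries for one two-range rule
```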

10
Extended TCAMs
  • Extend standard TCAM architecture to enable
    classification with larger rulesets
  • Partitioned TCAM, for reduced power
  • inspired by CoolCAMs
  • differences in indexing, search and partitioning
    algorithms
  • Support range matching directly in hardware

11
Use of Partitioned TCAM
  • Main component of power use in TCAM search is
    proportional to the number of entries searched
  • Partitioning the TCAM
  • divide TCAM into blocks of entries
  • each block is enabled for search via an
    associated index filter

12
Use of Partitioned TCAM
  • Example: suppose we are given the following
    filters

a. 1-13, 001x
b. 2-3, 00xx
c. 9-10, xxx1
d. 11-14, 011x
e. 12-13, 0xxx
f. 0-14, 1010
g. 7-7, 110x
h. 0-5, 1110
i. 1-2, 1x1x
j. 13-14, 11xx
k. 11-15, 111x
A real Extended TCAM would have more blocks, and
more filters per block.
13
Use of Partitioned TCAM
  • Example: classify a packet with header values (2,
    1010)
  • index block: the second and fourth index filters match
  • search the second and fourth filter blocks
  • find matching filters (1-2, 1x1x) and (0-14,
    1010)

[Figure: index filters and filter blocks]
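The actual block layout is only in the figure, so the grouping below is a hypothetical one, chosen to reproduce the outcome described: four blocks of at most three filters from the eleven-filter example (a. 1-13, 001x through k. 11-15, 111x), where the header (2, 1010) matches only the second and fourth index filters. The sketch counts entries compared as a stand-in for power, and resolves priority by list order; how the hardware combines hits from several blocks is not covered on the slide.

```python
from typing import List, Tuple

Filter = Tuple[str, Tuple[int, int], str]  # (name, inclusive range, 4-bit ternary pattern)

def match(rng: Tuple[int, int], pattern: str, value: int, bits: str) -> bool:
    return rng[0] <= value <= rng[1] and all(p in ("x", b) for p, b in zip(pattern, bits))

PRIORITY = {name: i for i, name in enumerate("abcdefghijk")}  # list order

# Hypothetical grouping: (index filter range, index filter pattern, block contents).
BLOCKS: List[Tuple[Tuple[int, int], str, List[Filter]]] = [
    ((0, 15), "0xxx", [("a", (1, 13), "001x"), ("b", (2, 3), "00xx"), ("e", (12, 13), "0xxx")]),
    ((0, 6), "1xxx", [("h", (0, 5), "1110"), ("i", (1, 2), "1x1x")]),
    ((7, 15), "xxxx", [("c", (9, 10), "xxx1"), ("d", (11, 14), "011x"), ("j", (13, 14), "11xx")]),
    ((0, 15), "1xxx", [("f", (0, 14), "1010"), ("g", (7, 7), "110x"), ("k", (11, 15), "111x")]),
]

def partitioned_search(value: int, bits: str):
    # Stage 1: compare the header against every index filter.
    enabled = [n for n, (rng, pat, _) in enumerate(BLOCKS) if match(rng, pat, value, bits)]
    # Stage 2: search only the enabled blocks.
    hits = [name for n in enabled for name, rng, pat in BLOCKS[n][2]
            if match(rng, pat, value, bits)]
    hits.sort(key=lambda name: PRIORITY[name])
    compared = len(BLOCKS) + sum(len(BLOCKS[n][2]) for n in enabled)
    return enabled, hits, (hits[0] if hits else None), compared

print(partitioned_search(2, "1010"))
```

With this grouping, 9 entries are compared instead of all 11; the saving grows with more and larger blocks.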
14
Use of Partitioned TCAM
  • The key to minimizing power consumption: organize
    filters so that only a few TCAM blocks must be
    searched to find the filters matching a packet.
  • Use a filter grouping algorithm

[Figure: index filters and filter blocks]
15
a. 1-13, 001x
b. 2-3, 00xx
c. 9-10, xxxx
d. 11-14, 011x
e. 12-13, 0xxx
f. 0-14, 1010
g. 7-7, 110x
h. 0-5, 1110
i. 1-2, 11xx
j. 13-14, 11xx
k. 11-15, 111x
18
a. 1-13, 001x
b. 2-3, 00xx
c. 9-10, xxxx
d. 11-14, 011x
e. 12-13, 0xxx
f. 0-14, 1010
g. 7-7, 110x
h. 0-5, 1110
i. 1-2, 11xx
j. 13-14, 11xx
k. 11-15, 111x
Region (0-6, 1xxx): filters h, i
Region (7-15, 1xxx): filters g, j, k
Next phase
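One possible reading of the two regions above, offered only as a hypothetical sketch (the slides show the result, not the rule that produced it): cut the range dimension of the 1xxx filters between 6 and 7; filters lying entirely on one side go to that sub-region, and a filter straddling the cut (f, 0-14) is deferred to the "next phase". Filter values are those listed above, with k read as 11-15, 111x.

```python
from typing import Dict, List, Tuple

Filter = Tuple[str, int, int]  # (name, range lo, range hi); all share the 1xxx field

def split_on_range(filters: List[Filter], cut: int) -> Dict[str, List[str]]:
    """Hypothetical split step: [.., cut] and [cut + 1, ..]; straddlers deferred."""
    out: Dict[str, List[str]] = {"left": [], "right": [], "next_phase": []}
    for name, lo, hi in filters:
        if hi <= cut:
            out["left"].append(name)
        elif lo > cut:
            out["right"].append(name)
        else:
            out["next_phase"].append(name)
    return out

# The filters whose second field lies inside 1xxx.
ones = [("f", 0, 14), ("g", 7, 7), ("h", 0, 5), ("i", 1, 2), ("j", 13, 14), ("k", 11, 15)]
print(split_on_range(ones, cut=6))
```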
20
Creating a set of partitions
  • At most k filters per region (k = block size)
  • Regions within the same partition do not overlap
  • Total number of regions equals the index size
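A sketch of a checker for these three rules, assuming each region is stored as a box (one inclusive interval per field) together with the names of the filters assigned to it. The example uses the regions (0-6, 1xxx) and (7-15, 1xxx) from the grouping example, with 1xxx written as the interval 8-15 and k = 3 as a hypothetical block size.

```python
from itertools import combinations
from typing import List, Tuple

Box = List[Tuple[int, int]]     # one inclusive (lo, hi) interval per field
Region = Tuple[Box, List[str]]  # (region box, filters assigned to it)

def overlaps(a: Box, b: Box) -> bool:
    """Two boxes overlap iff their intervals overlap in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))

def check_partitions(partitions: List[List[Region]], k: int) -> int:
    """Raise if a rule is violated; otherwise return the index size."""
    for regions in partitions:
        for box, members in regions:
            if len(members) > k:
                raise ValueError(f"region {box} holds {len(members)} > k={k} filters")
        for (a, _), (b, _) in combinations(regions, 2):
            if overlaps(a, b):
                raise ValueError(f"regions {a} and {b} overlap in one partition")
    # Total number of regions across all partitions = number of index entries.
    return sum(len(regions) for regions in partitions)

partition = [([(0, 6), (8, 15)], ["h", "i"]), ([(7, 15), (8, 15)], ["g", "j", "k"])]
print(check_partitions([partition], k=3))
```

Because regions in one partition do not overlap, a packet falls inside at most one region per partition, so at most one block per partition needs to be searched.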

21
Range Matching
  • Store a pair of values (lo, hi) for each range
    match field
  • Range check circuitry compares query values
    against lo and hi to determine if the query is in
    range
  • Transistors per bit of a range field: twice that
    of an ordinary TCAM
  • But, for typical IPv4 applications, this results
    in just a 22% increase in overall transistor count
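A software model of the range check, not the authors' circuit: the entry stores lo and hi, and the check is two magnitude comparisons. The comparator mimics a generic hardware magnitude comparator, scanning from the most significant bit until the first differing bit decides the result. Stored this way, the range 1-4 occupies one entry instead of three prefixes.

```python
def compare_msb_first(a: int, b: int, width: int) -> int:
    """Return -1, 0 or 1: the first differing bit, scanning from the most
    significant end, decides which operand is larger."""
    for bit in range(width - 1, -1, -1):
        x, y = (a >> bit) & 1, (b >> bit) & 1
        if x != y:
            return 1 if x > y else -1
    return 0

def in_range(query: int, lo: int, hi: int, width: int) -> bool:
    # Two comparisons per range field: query >= lo and query <= hi.
    return (compare_msb_first(query, lo, width) >= 0
            and compare_msb_first(query, hi, width) <= 0)

print([q for q in range(8) if in_range(q, 1, 4, 3)])
```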

22
Performance Metrics
  • Power Fraction
  • a measure of power usage, relative to a standard
    TCAM
  • smaller is better
  • Storage Efficiency
  • higher is better; 1 is optimal

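$$\text{Power Fraction} = \frac{\text{index size} + (\#\text{ of partitions})(\text{block size})}{\text{number of filters}}$$
$$\text{Storage Efficiency} = \frac{\text{number of filters}}{\text{index size} + (\#\text{ of blocks})(\text{block size})}$$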
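The two metrics are simple ratios: power fraction = (index size + number of partitions × block size) / number of filters, i.e. entries searched (the whole index plus one block per partition) relative to a standard TCAM that searches every filter; storage efficiency = number of filters / (index size + number of blocks × block size). The configuration below is hypothetical, for illustration only, and is not a result from the talk.

```python
def power_fraction(index_size: int, partitions: int, block_size: int, filters: int) -> float:
    # Entries searched per lookup relative to searching all filters.
    return (index_size + partitions * block_size) / filters

def storage_efficiency(filters: int, index_size: int, blocks: int, block_size: int) -> float:
    # Filters stored divided by all TCAM entries provisioned (index + blocks).
    return filters / (index_size + blocks * block_size)

# Hypothetical configuration: 100,000 filters, 1,000 blocks of 128, 8 partitions.
print(power_fraction(index_size=1000, partitions=8, block_size=128, filters=100_000))
print(storage_efficiency(filters=100_000, index_size=1000, blocks=1000, block_size=128))
```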
23
Different Block Sizes
[Figure: results for block sizes 16, 32, 64, 128 and 256]
24
Results: Power Fraction
[Figure: power fraction of the basic and refined algorithms for block sizes 32, 64, 128 and 256]
25
Results: Storage Efficiency
[Figure: storage efficiency of the basic and refined algorithms for block sizes 32, 64, 128 and 256]
26
Current/Future Work
  • Computational complexity of filter grouping
    problem
  • Filter updates (add/delete operations)
  • Multi-level indices
  • Different partitioning algorithms
  • Application to SRAM/DRAM-based classification
    techniques

27
Summary
  • Packet Classification is important for many
    advanced network services
  • TCAMs scale poorly due to power consumption and
    inefficient range match representations
  • Extended TCAMs solve these issues by using
    partitioned TCAM and hardware support for range
    matching
  • power consumption greatly reduced (typically to
    5% or less of the power used by a standard TCAM)
  • range match hardware avoids the inefficiency of
    representing ranges as prefixes

28
Questions?