Title: Packet Classification On Multiple Fields
1. Packet Classification On Multiple Fields
- Pankaj Gupta and Nick McKeown
- Computer Systems Laboratory,
- Stanford University
- pankaj,nickm_at_stanford.edu
2. Why classify packets?
- To determine which flow they belong to
- and hence to decide what service they should receive
The router needs to identify the flow of every incoming packet and then perform the appropriate special processing.
3. Special Processing Requires Identification of Flows
- All packets of a flow obey a pre-defined rule and are processed similarly by the router
- Classification is based on an arbitrary number of fields in the packet header
- E.g., a flow (src-IP-address, dst-IP-address), or a flow (dst-IP-prefix, protocol), etc.
4. Network services
- Routing
- Access-control in firewalls
- Policy-based routing
- Provision of differentiated qualities of service
- Traffic billing
5. What to determine?
- Forward or filter a packet?
- Where to forward it to?
- What class of service should it receive?
- How much to charge for transporting it?
6. Packet Classifier
[Figure: the forwarding engine passes the HEADER of each incoming packet to packet classification, which searches the classifier (policy database) of rules, each with an associated action, and returns the action to apply]
7. Need for Differentiated Services
[Figure: example network with ISP1, ISP2, ISP3, a NAP, enterprise networks E1 and E2, and router interfaces X, Y and Z]
8. Table 2
Class | Relevant Packet Fields
Email and from ISP2 | Source Link-layer Address, Source Transport port number
From ISP2 | Source Link-layer Address
From ISP3 and going to E2 | Source Link-layer Address, Destination Network-Layer Address
All other packets | ---
9. Packet Classification Problem Definition
- Given a classifier $C$ with $N$ rules, $R_j$, $1 \le j \le N$, where $R_j$ consists of three entities:
  - A regular expression $R_{ji}$, $1 \le i \le d$, on each of the $d$ header fields,
  - A number, $\mathrm{pri}(R_j)$, indicating the priority of the rule in the classifier, and
  - An action, referred to as $\mathrm{action}(R_j)$.
For an incoming packet $P$ with the header considered as a $d$-tuple of points $(P_1, P_2, \ldots, P_d)$, the $d$-dimensional packet classification problem is to find the rule $R_m$ with the highest priority among all the rules $R_j$ matching the $d$-tuple, i.e., $\mathrm{pri}(R_m) > \mathrm{pri}(R_j)\ \forall j \neq m,\ 1 \le j \le N$, such that $P_i$ matches $R_{ji}$, $1 \le i \le d$. We call rule $R_m$ the best matching rule for packet $P$.
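The definition can be read almost directly as code. The Python below is a minimal sketch, assuming each field specification $R_{ji}$ is modelled as a predicate on the field value and that a larger `pri` means a higher priority; `Rule`, `best_matching_rule` and the two-field (port, protocol) rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Rule:
    name: str
    fields: Sequence[Callable[[int], bool]]  # one predicate R_ji per header field
    pri: int                                 # assumption: larger value = higher priority
    action: str


def best_matching_rule(rules: Sequence[Rule], packet: Sequence[int]) -> Optional[Rule]:
    """Return the highest-priority rule whose every field matches the packet."""
    matching = [r for r in rules
                if all(match(p) for match, p in zip(r.fields, packet))]
    return max(matching, key=lambda r: r.pri, default=None)


def any_value(v: int) -> bool:
    return True  # wildcard field


# Hypothetical 2-D rules over (destination port, protocol number).
rules = [
    Rule("R1", [lambda v: v == 80, lambda v: v == 6], pri=2, action="deny"),
    Rule("R2", [any_value, any_value], pri=1, action="permit"),
]
print(best_matching_rule(rules, (80, 6)).name)  # R1
print(best_matching_rule(rules, (22, 6)).name)  # R2
```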
10. Classification is a Generalization of Lookup
- Classifier = routing table
  - One-dimension (destination address)
- Rule = routing table entry
  - Regular expression = prefix
  - Action = (next-hop-address, port)
  - Priority = prefix-length
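To make the correspondence concrete, here is a hedged Python sketch that treats a one-dimensional routing table as a classifier: each prefix is a rule's field specification, the next hop is its action, and the prefix length serves as its priority, so the best-matching rule is the longest matching prefix. The table contents and the `lookup` helper are made up for illustration.

```python
import ipaddress

# Hypothetical routing table: each (prefix, next hop) entry is a 1-D rule.
routes = [
    (ipaddress.ip_network("10.0.0.0/8"), "port1"),
    (ipaddress.ip_network("10.1.0.0/16"), "port2"),
    (ipaddress.ip_network("0.0.0.0/0"), "default"),
]


def lookup(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    # Matching rules: every prefix that contains the destination address.
    matching = [(net, hop) for net, hop in routes if addr in net]
    # Priority = prefix length, so the best match is the longest prefix.
    _, hop = max(matching, key=lambda entry: entry[0].prefixlen)
    return hop


print(lookup("10.1.2.3"))   # port2
print(lookup("10.9.9.9"))   # port1
print(lookup("192.0.2.1"))  # default
```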
11. Example 4D classifier
[Table: example 4D classifier; recoverable entry R6: 152.163.198.4/255.255.255.255, 152.163.36.0/255.255.255.255, tcp, gt 1023, Permit]
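A sketch of how one rule of such a classifier could be tested against a packet, in Python. It encodes rule R6 (two address/mask pairs, protocol tcp, port operator gt 1023) under assumptions: the parameter names `addr1`/`addr2` are hypothetical and simply follow the order in which the addresses appear, and `gt 1023` is read as a strict "greater than". The masked comparison also handles non-contiguous masks.

```python
import ipaddress


def ip(s: str) -> int:
    return int(ipaddress.IPv4Address(s))


def mask_match(value: int, addr: str, mask: str) -> bool:
    # Compare only the bits selected by the mask (works for non-contiguous masks too).
    m = ip(mask)
    return (value & m) == (ip(addr) & m)


def matches_r6(addr1: str, addr2: str, port: int, proto: str) -> bool:
    # addr1/addr2 are hypothetical names following the column order of R6.
    return (mask_match(ip(addr1), "152.163.198.4", "255.255.255.255")
            and mask_match(ip(addr2), "152.163.36.0", "255.255.255.255")
            and proto == "tcp"
            and port > 1023)  # "gt 1023"


print(matches_r6("152.163.198.4", "152.163.36.0", 8080, "tcp"))  # True
print(matches_r6("152.163.198.4", "152.163.36.0", 80, "tcp"))    # False
```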
12. Example Classification Results
13. General characteristics of Classifiers
- Number of rules
  - not a large number
  - 0.7% of classifiers have more than 1000 rules; the mean is 50 rules
- Number of fields
  - max of 8 fields: src/dst network-layer address, src/dst transport-layer port numbers, type-of-service field (TOS), protocol field, transport-layer protocol flags
  - 17% of rules have 1 field, 23% have 3 fields, 60% have 4 fields
14. General characteristics of Classifiers (contd.)
- Transport-layer protocol field
  - TCP, UDP, ICMP, IGMP, (E)IGRP, GRE, IPINIP or the wildcard
- Transport-layer field specification
  - 10.2% have a range specification
- Rules with non-contiguous masks
  - 14% of classifiers have them; they make up 10.2% of all rules
- Many different rules in the same classifier share a number of field specifications
- Redundant rules
  - 8% of rules in classifiers
  - 4.4% of rules are backward redundant, 3.6% of rules are forward redundant
15. Goals
- The algorithm should
  - Be fast enough to operate at OC48c line rates, and preferably at OC192c line rates
  - Allow matching on arbitrary fields
  - Support general classification rules (prefixes, operators, wildcards)
  - Be suitable for implementation in both software and hardware
  - Not have expensive memory requirements
  - Scale in terms of both memory and speed with the size of the classifier
16. Previous work
- Simplest classification algorithm: evaluating rules sequentially
  - simple and efficient in its use of memory
  - poor scaling properties: time grows linearly with the number of rules
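A minimal Python sketch of sequential evaluation, assuming the rules are stored in decreasing priority order so that the first match is the best match. It also counts how many rules were examined, which makes the linear growth visible. The rule representation and `linear_classify` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Rule = Tuple[str, Callable[[Tuple[int, ...]], bool]]  # (action, predicate on header)


def linear_classify(rules: List[Rule],
                    header: Tuple[int, ...]) -> Tuple[Optional[str], int]:
    """Return (action of the first matching rule, number of rules examined)."""
    for examined, (action, matches) in enumerate(rules, start=1):
        if matches(header):
            return action, examined
    return None, len(rules)


# N specific rules followed by a catch-all: a packet that matches none of the
# specific rules is compared against every one of them first.
N = 1000
rules: List[Rule] = [("deny", lambda h, k=k: h[0] == k) for k in range(N)]
rules.append(("permit", lambda h: True))

print(linear_classify(rules, (5,)))     # ('deny', 6)
print(linear_classify(rules, (5000,)))  # ('permit', 1001)
```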
17. Classification with Ternary-CAMs
[Figure: the packet header is compared against entries 0, 1, 2, 3, ... of the TCAM memory array, and a priority encoder selects the first matching rule]
- Too expensive, too small, and consume too much power for large classifiers
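The TCAM's behaviour can be emulated in software to show what the hardware computes. This is a hedged Python sketch, not a hardware model: each entry stores a value and a care-mask (mask bit 0 means "don't care"), every entry is compared against the key (in a real TCAM, all in parallel), and a priority encoder returns the lowest-indexed matching entry. The 8-bit entries are made up.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (value, care_mask); a mask bit of 0 means "don't care"


def tcam_search(entries: List[Entry], key: int) -> Optional[int]:
    # Match lines: one bit per entry (computed in parallel in real hardware).
    match_lines = [(key & mask) == (value & mask) for value, mask in entries]
    # Priority encoder: the first (lowest-index) matching entry wins.
    for index, hit in enumerate(match_lines):
        if hit:
            return index
    return None


entries = [
    (0b1010_0000, 0b1111_1111),  # exactly 10100000
    (0b1010_0000, 0b1111_0000),  # 1010xxxx
    (0b0000_0000, 0b0000_0000),  # xxxxxxxx (catch-all)
]
print(tcam_search(entries, 0b1010_0000))  # 0
print(tcam_search(entries, 0b1010_0111))  # 1
print(tcam_search(entries, 0b0101_0101))  # 2
```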
18. Structure of the Classifiers
[Figure: rules R1, R2 and R3 overlapping to form 4 regions]
A classification algorithm must keep a record of
each region and be able to determine the region
to which each newly arriving packet belongs
19. Structure of the Classifiers
[Figure: rules R1, R2 and R3 overlapping to form 7 regions]
The more regions the classifier contains, the more storage is required and the longer it takes to classify a packet.
20. Algorithm
- Packet classification problem: map S bits in the packet header to T bits of classID
  - $T = \log N$, where N is the number of classifier rules
- A simple and fast way of doing this mapping: pre-compute the value of classID for each of the $2^S$ different packet headers
  - Yields the answer in one step (one memory access)
  - Requires too much memory
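The one-step approach can be tried on a toy header. The Python below assumes S = 8 header bits split into two hypothetical 4-bit fields and a made-up rule list; it pre-computes a classID for all $2^S$ headers so each lookup is a single table access, and prints the table size that becomes infeasible when S = 128.

```python
from typing import Callable, List

S = 8  # toy header width in bits (real headers: S = 128 or more)

# Hypothetical rules in priority order; the index of the first match is the classID.
rules: List[Callable[[int, int], bool]] = [
    lambda f1, f2: f1 == 3 and f2 == 7,
    lambda f1, f2: f1 == 3,
    lambda f1, f2: True,  # default rule
]


def classify_slow(header: int) -> int:
    f1, f2 = header >> 4, header & 0xF
    return next(i for i, rule in enumerate(rules) if rule(f1, f2))


# Pre-compute the classID for every one of the 2**S possible headers.
table = [classify_slow(h) for h in range(2 ** S)]


def classify_fast(header: int) -> int:
    return table[header]  # a single memory access


T = max(table).bit_length()  # bits needed to store a classID (here log2 of 3 rules, rounded up)
print(len(table), T)  # 256 2
print(classify_fast(0x37), classify_fast(0x35), classify_fast(0x12))  # 0 1 2
print(f"S = 128 would need 2**128 = {2 ** 128:.3e} entries")
```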
21. Recursive Flow Classification: perform the same mapping, but over several stages
[Figure: one-step mapping of $2^S = 2^{128}$ header values to $2^T = 2^{12}$ classIDs]
22. Recursive Flow Classification
- Consists of P phases, each with a set of parallel memory lookups
- Each lookup is a reduction: the value returned by the memory lookup is shorter than the index of the memory access
23. Chunking of a Packet
[Figure: the packet header (Source L3 Address, Destination L3 Address, L4 protocol and flags, Source L4 port, Destination L4 port, Type of Service) divided into Chunk 0 to Chunk 7]
- The chunks are used to index into multiple memories in parallel
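Splitting the header into chunks is plain bit slicing. The Python sketch below assumes a layout consistent with the figure and with the later example (two 16-bit chunks per L3 address, then an 8-bit protocol chunk, the two 16-bit ports and an 8-bit TOS, giving chunks 0 to 7); these widths and the header dictionary keys are assumptions, not taken from the slides.

```python
import ipaddress
from typing import Any, Dict, List


def chunk_header(hdr: Dict[str, Any]) -> List[int]:
    """Split a (hypothetical) header dict into 8 chunks, Chunk 0 .. Chunk 7."""
    src = int(ipaddress.IPv4Address(hdr["src_ip"]))
    dst = int(ipaddress.IPv4Address(hdr["dst_ip"]))
    return [
        src >> 16, src & 0xFFFF,   # chunks 0-1: source L3 address (high, low 16 bits)
        dst >> 16, dst & 0xFFFF,   # chunks 2-3: destination L3 address
        hdr["proto"] & 0xFF,       # chunk 4: L4 protocol (8 bits assumed)
        hdr["src_port"] & 0xFFFF,  # chunk 5: source L4 port
        hdr["dst_port"] & 0xFFFF,  # chunk 6: destination L4 port
        hdr["tos"] & 0xFF,         # chunk 7: type of service
    ]


pkt = {"src_ip": "152.163.198.4", "dst_ip": "152.163.36.1",
       "proto": 6, "src_port": 40000, "dst_port": 80, "tos": 0}
chunks = chunk_header(pkt)
print([hex(c) for c in chunks])
```

Each chunk value would then index its own phase-0 memory (of $2^{16}$ or $2^8$ entries under this layout), and all eight lookups are independent, so they can proceed in parallel.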
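24. Packet Flow
[Figure: packet flow through RFC: the header is split into chunks that index the Phase 0 memories; in each of Phase 1, Phase 2 and Phase 3 the eqIDs from the previous phase form the index for the next lookup, and the final lookup returns the action]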
25. Example 4D classifier
[Table: example 4D classifier; recoverable entry R6: 152.163.198.4/255.255.255.255, 152.163.36.0/255.255.255.255, tcp, gt 1023, Permit]
26.
- In phase 0
  - chunk 6
    - 1. www (80), 2. 20-21, 3. gt 1023, 4. remaining numbers
    - can be encoded by eqIDs 00b to 11b: reduction 16 to 2 bits
  - chunk 4
    - 1. tcp, 2. udp, 3. remaining numbers
    - can be encoded by 2 bits: reduction 8 to 2 bits
- In phase 1
  - CESs (chunk equivalence sets): 1. (80, udp), 2. (20-21, udp), 3. (80, tcp), 4. (gt 1023, tcp), 5. all remaining crossproducts
  - concatenating the two 2-bit eqIDs: reduction 4 to 3 bits
  - can be encoded by 3 bits: total reduction 24 to 3 bits
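The class counts in this example can be checked mechanically. The Python below is a sketch under stated assumptions: it models only the (port, protocol) specifications of the four rules (80, udp), (20-21, udp), (80, tcp), (gt 1023, tcp), uses the IANA numbers 6 and 17 for tcp and udp, and computes each value's bitmap of matching rules by brute force. Values with identical bitmaps form one equivalence class.

```python
from itertools import product

TCP, UDP = 6, 17  # IANA protocol numbers

# (port predicate, protocol) of each rule; bit l of a bitmap stands for rule l.
rules = [
    (lambda p: p == 80,       UDP),
    (lambda p: 20 <= p <= 21, UDP),
    (lambda p: p == 80,       TCP),
    (lambda p: p > 1023,      TCP),
]


def bitmap(bits):
    """Pack booleans into an int: bit l is set if rule l matches."""
    return sum(1 << l for l, b in enumerate(bits) if b)


# Phase 0: the bitmap of every possible chunk value (16-bit port, 8-bit protocol).
port_bmps = [bitmap(pred(v) for pred, _ in rules) for v in range(2 ** 16)]
proto_bmps = [bitmap(proto == v for _, proto in rules) for v in range(2 ** 8)]
port_classes = sorted(set(port_bmps))
proto_classes = sorted(set(proto_bmps))
print(len(port_classes), len(proto_classes))  # 4 3 -> 2-bit eqIDs each

# Phase 1: intersect the bitmaps of every crossproduct of phase-0 classes.
phase1_classes = {a & b for a, b in product(port_classes, proto_classes)}
print(len(phase1_classes))  # 5 -> fits in a 3-bit eqID
```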
27. RFC preprocessing for chunk j of phase 0
- For each rule r_l in the classifier
  - project the jth component of r_l onto the number line (from 0 to 2^b - 1), marking the start and end points of each of its constituent intervals
- End for
- bmp ← 0
- For n in 0 .. 2^b - 1
  - If (any rule starts or ends at n)
    - update bmp
  - End if
  - If (bmp not seen earlier)
    - eq ← new_Equivalence_Class()
    - eq->cbm ← bmp
  - Else eq ← the equivalence class whose cbm is bmp
  - End if
  - table_0_j[n] ← eq->ID
- End for
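The phase-0 procedure can be written out as runnable Python. This is a sketch under assumptions: each rule's component for chunk j is given as a list of disjoint, inclusive intervals [lo, hi] on the b-bit number line, the bitmap is an integer with bit l set while rule l covers the current point, and eqIDs are numbered in order of first appearance. A rule's bit is toggled on where an interval starts and off just after the interval ends.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def preprocess_phase0(rule_intervals: List[List[Tuple[int, int]]],
                      b: int) -> Tuple[List[int], List[int]]:
    """Build the phase-0 table for one b-bit chunk.

    rule_intervals[r] lists rule r's disjoint, inclusive intervals for this chunk.
    Returns (table, cbms): table[n] is the eqID of value n and cbms[eqID] is the
    class bitmap (bit r set if rule r covers the value).
    """
    # Project every rule onto the number line 0 .. 2**b - 1.
    events: Dict[int, int] = defaultdict(int)  # point -> rule bits that change there
    for r, intervals in enumerate(rule_intervals):
        for lo, hi in intervals:
            events[lo] ^= 1 << r      # rule r starts covering at lo
            events[hi + 1] ^= 1 << r  # ...and stops covering after hi
    bmp = 0
    eq_of_cbm: Dict[int, int] = {}    # cbm -> eqID of every class seen so far
    cbms: List[int] = []
    table = [0] * (2 ** b)
    eq = -1
    for n in range(2 ** b):
        if n in events or eq < 0:
            bmp ^= events.get(n, 0)   # update bmp
            if bmp not in eq_of_cbm:  # bmp not seen earlier: new class
                eq_of_cbm[bmp] = len(cbms)
                cbms.append(bmp)
            eq = eq_of_cbm[bmp]       # otherwise: the class whose cbm is bmp
        table[n] = eq
    return table, cbms


# Toy 4-bit chunk: rule 0 covers 2-5, rule 1 covers 4-9, rule 2 covers everything.
table, cbms = preprocess_phase0([[(2, 5)], [(4, 9)], [(0, 15)]], b=4)
print(table)  # [0, 0, 1, 1, 2, 2, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0]
print(cbms)   # [4, 5, 7, 6]
```

Note that values 0-1 and 10-15 share eqID 0 even though they are not contiguous: an equivalence class is defined by its bitmap, not by position on the number line.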
28. RFC preprocessing for chunk i of phase j (j > 0)
- index ← 0
- listEqs ← nil
- For each CES, c1eq, of chunk c1
  - For each CES, c2eq, of chunk c2
    - ...
      - For each CES, cmeq, of chunk cm
        - intersectedBmp ← c1eq->cbm & c2eq->cbm & ... & cmeq->cbm
        - neweq ← searchList(listEqs, intersectedBmp)
        - if (not found in listEqs)
          - neweq ← new_Equivalence_Class()
          - neweq->cbm ← intersectedBmp
          - add neweq to listEqs
        - end if
        - table_j_i[index] ← neweq->ID
        - index++
- End for (all loops)
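A runnable counterpart, as a hedged Python sketch: the parent chunks' class bitmaps are passed in as lists, `itertools.product` stands in for the m nested loops, and a dictionary plays the role of `listEqs`/`searchList`. Because `product` varies the last parent fastest, each position in the output table is the mixed-radix combination of the parent eqIDs. The sample bitmaps are those of the port/protocol example of slide 26, with bit l for the l-th rule in the order (80, udp), (20-21, udp), (80, tcp), (gt 1023, tcp); their ordering within each list is an assumption.

```python
from functools import reduce
from itertools import product
from operator import and_
from typing import Dict, List, Tuple


def preprocess_phase_j(parent_cbms: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Build the table for one chunk of a later phase.

    parent_cbms[k][e] is the class bitmap (cbm) of eqID e of the k-th parent chunk.
    Returns (table, cbms): table[index] is the new eqID for each combination of
    parent eqIDs, and cbms[eqID] is its intersected bitmap.
    """
    list_eqs: Dict[int, int] = {}  # plays the role of listEqs / searchList
    cbms: List[int] = []
    table: List[int] = []
    for combo in product(*parent_cbms):     # one pass per (c1eq, ..., cmeq)
        intersected_bmp = reduce(and_, combo)
        if intersected_bmp not in list_eqs:  # not found in listEqs
            list_eqs[intersected_bmp] = len(cbms)
            cbms.append(intersected_bmp)
        table.append(list_eqs[intersected_bmp])  # table_j_i[index]; index++
    return table, cbms


port_cbms = [0b0000, 0b0101, 0b0010, 0b1000]  # other, 80, 20-21, gt 1023
proto_cbms = [0b0000, 0b1100, 0b0011]         # other, tcp, udp
table, cbms = preprocess_phase_j([port_cbms, proto_cbms])
print(len(table), len(cbms))  # 12 5
```

The 12 crossproducts collapse to the 5 CESs listed on slide 26, so the phase-1 table needs only 3-bit entries.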
29. Performance of RFC
- 1. Number of phases P
  - we combine those chunks together which have the most correlation
- 2. The reduction tree used
  - we combine as many chunks as we can without causing unreasonable memory consumption
30. Choice of Reduction Tree
[Figure: reduction trees Tree_A and Tree_B combining chunks 0 to 5 into a ClassID]
- Number of phases P = 3: 10 memory accesses
31. Choice of reduction tree
[Figure: reduction trees Tree_A and Tree_B combining chunks 0 to 5 into a ClassID]
- Number of phases P = 4: 11 memory accesses
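The access counts follow from the tree shape: every node of the reduction tree is one table, read once per packet, so a lookup costs as many memory accesses as there are chunks summed over all phases. The Python below checks this for two hypothetical six-chunk trees; the groupings are assumptions chosen to give P = 3 and P = 4, not read from the figures.

```python
from typing import List, Tuple

Parent = Tuple[int, int]         # (phase, chunk) of a parent node
Tree = List[List[List[Parent]]]  # tree[phase][chunk] -> list of parents


def lookups(tree: Tree) -> Tuple[int, int]:
    """Return (number of phases P, memory accesses per packet)."""
    # One table per node, and each table is read exactly once per packet.
    return len(tree), sum(len(phase) for phase in tree)


phase0 = [[] for _ in range(6)]  # chunks 0..5 index their memories directly
pairs = [[(0, 0), (0, 1)], [(0, 2), (0, 3)], [(0, 4), (0, 5)]]

# Hypothetical 3-phase tree: pairs of chunks, then all three results at once.
tree_3 = [phase0, pairs, [[(1, 0), (1, 1), (1, 2)]]]

# Hypothetical 4-phase tree: the last phase combines a phase-2 and a phase-1 result.
tree_4 = [phase0, pairs, [[(1, 0), (1, 1)]], [[(2, 0), (1, 2)]]]

print(lookups(tree_3))  # (3, 10)
print(lookups(tree_4))  # (4, 11)
```

The trade-off is memory: the final table of the 3-phase tree is indexed by the product of three parents' eqID counts rather than two, which is why slide 29 combines as many chunks as possible only up to a reasonable memory budget.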
32. RFC lookup in Hardware
[Figure: hardware RFC lookup across Phase 0, Phase 1 and Phase 2, with tables for chunks 0-2 and 3-5 held in SDRAM1, SDRAM2, SRAM1 and SRAM2, and chunks 0 and 1 replicated]
- Clock of 125 MHz → 31.25 million packets per second (one lookup every 4 clock cycles)
33. RFC lookup in software
- 30 lines of code in C
- compiled on a 333 MHz Pentium II PC running Windows NT
- the worst-case path for the code took $(140\,\text{clks} + 9t_m)$ for three phases and $(146\,\text{clks} + 11t_m)$ for four phases
  - $t_m$ = memory access time = 60 ns
  - → 0.98 µs for 3 phases, 1.1 µs for 4 phases
- close to one million packets per second
- the average lookup time is 50% faster than the worst case
34. RFC lookup operation
- For (each chunk, chkNum, of phase 0)
  - eqNums[0][chkNum] ← contents of appropriate rfctable at memory address pktFields[chkNum]
- For (phaseNum = 1 .. numPhases - 1)
  - For (each chunk, chkNum, in phase phaseNum)
    - chd ← parent descriptor of (phaseNum, chkNum)
    - index ← eqNums[phaseNum of chd->chkParents[0]][chkNum of chd->chkParents[0]]
    - For (i = 1 .. chd->numChkParents - 1)
      - index ← index × (total equivIDs of chd->chkParents[i]) + eqNums[phaseNum of chd->chkParents[i]][chkNum of chd->chkParents[i]]
    - End for
    - eqNums[phaseNum][chkNum] ← contents of appropriate rfctable at address index
  - End for
- End for
- Return eqNums[numPhases - 1][0]
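A runnable Python sketch of the lookup loop, under assumptions: the tables and each chunk's parent list are passed in explicitly as dictionaries keyed by (phase, chunk), `num_eq` holds the number of eqIDs each chunk can produce, and the toy tables (two 4-bit chunks feeding one phase-1 table) are hypothetical. The inner loop is the step that multiplies `index` by the next parent's eqID count and adds that parent's eqID.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (phase, chunk)


def rfc_lookup(pkt_fields: List[int],
               tables: Dict[Node, List[int]],
               parents: Dict[Node, List[Node]],
               num_eq: Dict[Node, int],
               num_phases: int) -> int:
    """Return the classID for a packet already split into phase-0 chunk values."""
    eq_nums: Dict[Node, int] = {}
    # Phase 0: each chunk value indexes its own table directly.
    for chk, value in enumerate(pkt_fields):
        eq_nums[(0, chk)] = tables[(0, chk)][value]
    # Phases 1 .. P-1: combine the parents' eqIDs into one mixed-radix index.
    for phase in range(1, num_phases):
        for chk in sorted(c for (p, c) in tables if p == phase):
            chd = parents[(phase, chk)]  # parent descriptor
            index = eq_nums[chd[0]]
            for parent in chd[1:]:
                index = index * num_eq[parent] + eq_nums[parent]
            eq_nums[(phase, chk)] = tables[(phase, chk)][index]
    return eq_nums[(num_phases - 1, 0)]  # the classID


# Hypothetical two-phase RFC for two 4-bit chunks.
tables = {
    (0, 0): [1 if v == 3 else 0 for v in range(16)],                   # 2 eqIDs
    (0, 1): [2 if v == 7 else 1 if v >= 8 else 0 for v in range(16)],  # 3 eqIDs
    (1, 0): [2, 2, 2, 2, 1, 0],  # index = eq0 * 3 + eq1 -> classID
}
parents = {(1, 0): [(0, 0), (0, 1)]}
num_eq = {(0, 0): 2, (0, 1): 3}
print(rfc_lookup([3, 7], tables, parents, num_eq, num_phases=2))  # 0
print(rfc_lookup([3, 9], tables, parents, num_eq, num_phases=2))  # 1
print(rfc_lookup([5, 7], tables, parents, num_eq, num_phases=2))  # 2
```

Every step is a table read plus a few multiply-adds, with no comparisons against rules, which is what makes the lookup time independent of how rules overlap.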
35. Table 6
[Table: example classifier with rules 0 to 4 (four permit, one deny), whose fields are split into the chunks Src L3 (bits 31..16), Src L3 (bits 15..0), Dst L3 (bits 31..16), Dst L3 (bits 15..0), Dstn L4 (16 bits) and L4 protocol (8 bits), plus an Action column]
36. Variations and improvements of RFC
- 1. RFC can be extended to process a larger number of fields in each packet header
- 2. Speed up RFC by taking advantage of available fast lookup algorithms
- 3. Employ the adjacency groups technique to reduce the memory requirements when processing large classifiers
37. Adjacency Groups
- The size of the RFC tables depends on the number of CESs
- Rules R and S are adjacent in dimension i if
  - 1. they have the same action
  - 2. all but the ith field have the exact same specification in the two rules
  - 3. all rules appearing between them have either the same action or are disjoint from R
- Two rules are simply adjacent if they are adjacent in some dimension
- So we merge adjacent rules
38. Example of adjacency groups
- Rules: R(a1,b1,c1,d1), S(a1,b1,c2,d1), T(a2,b1,c2,d1), U(a2,b1,c1,d1), V(a1,b1,c4,d2), W(a1,b1,c3,d2), X(a2,b1,c3,d2), Y(a2,b1,c4,d2)
- Merge along dimension 3: RS(a1,b1,c1+c2,d1), TU(a2,b1,c1+c2,d1), VW(a1,b1,c3+c4,d2), XY(a2,b1,c3+c4,d2)
- Merge along dimension 1: RSTU(a1+a2,b1,c1+c2,d1), VWXY(a1+a2,b1,c3+c4,d2)
- Carry out an RFC phase, assuming chunks 1 and 2 are combined and chunks 3 and 4 are combined: RSTU(m1,n1), VWXY(m1,n2)
- Merge: RSTUVWXY(m1,n1+n2)
- Continue with RFC
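The two merge steps of this example can be reproduced with a small Python sketch. Assumptions not stated on the slides: field specifications are sets of symbolic values, all eight rules share one action, a merged rule takes the position of the first rule of its pair, and merging along one dimension repeats until no adjacent pair remains. `Rule`, `rule`, `adjacent` and `merge_along` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    fields: Tuple[FrozenSet[str], ...]  # one set of symbolic values per dimension
    action: str = "permit"


def rule(name: str, *values: str) -> Rule:
    return Rule(name, tuple(frozenset([v]) for v in values))


def disjoint(r: Rule, s: Rule) -> bool:
    # Two rules are disjoint if they share no value in at least one dimension.
    return any(not (a & b) for a, b in zip(r.fields, s.fields))


def adjacent(rules: List[Rule], i: int, j: int, dim: int) -> bool:
    r, s = rules[i], rules[j]
    same_action = r.action == s.action                          # condition 1
    others_equal = all(r.fields[k] == s.fields[k]               # condition 2
                       for k in range(len(r.fields)) if k != dim)
    between_ok = all(t.action == r.action or disjoint(t, r)     # condition 3
                     for t in rules[i + 1:j])
    return same_action and others_equal and between_ok


def merge_along(rules: List[Rule], dim: int) -> List[Rule]:
    rules = list(rules)
    merged = True
    while merged:
        merged = False
        for i in range(len(rules)):
            for j in range(i + 1, len(rules)):
                if adjacent(rules, i, j, dim):
                    r, s = rules[i], rules[j]
                    fields = list(r.fields)
                    fields[dim] = r.fields[dim] | s.fields[dim]
                    rules[i] = Rule(r.name + s.name, tuple(fields), r.action)
                    del rules[j]
                    merged = True
                    break
            if merged:
                break
    return rules


rules = [rule("R", "a1", "b1", "c1", "d1"), rule("S", "a1", "b1", "c2", "d1"),
         rule("T", "a2", "b1", "c2", "d1"), rule("U", "a2", "b1", "c1", "d1"),
         rule("V", "a1", "b1", "c4", "d2"), rule("W", "a1", "b1", "c3", "d2"),
         rule("X", "a2", "b1", "c3", "d2"), rule("Y", "a2", "b1", "c4", "d2")]

step1 = merge_along(rules, dim=2)  # dimension 3 (0-based index 2)
print([r.name for r in step1])     # ['RS', 'TU', 'VW', 'XY']
step2 = merge_along(step1, dim=0)  # dimension 1
print([r.name for r in step2])     # ['RSTU', 'VWXY']
```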
39. RFC Pros and Cons
- Advantages
- Suitable for multiple fields
- Supports non-contiguous masks
- Fast accesses
- Disadvantages
- Large pre-processing time
- Incremental updates slow
- Large worst-case storage requirements