Title: Hazem Hamed, Adel ElAtawy, Ehab AlShaer
1Adaptive Statistical Optimization Techniques for
Firewall Packet Filtering
- Hazem Hamed, Adel El-Atawy, Ehab Al-Shaer
- School of Computer Science, Telecommunications and Information Systems,
- DePaul University, Chicago, USA
- FIRST MIDWEST SECURITY WORKSHOP
- May 6th, 2006
2Agenda
- Problems and Motivation
- Technical Approach (our contributions)
- Early rejection optimization
- Statistical filtering tree optimization
- Evaluation
- Related Work
- Conclusion and Future Work
- Q&A
3Problems and Motivations
4Problems and Motivation: Statistical Network
Security Policy Optimization
- Packet filtering optimization is very important for network security devices
- Enterprise firewalls have 5K-15K rules (as reported by Cisco)
- IDSs are expected to have on the order of 10K rules (100K would not be a surprise)
- Most security devices still match rules sequentially, so the filtering cost (matches/sec) grows with the policy size
- It is not hard to guess/craft traffic that hits the default-deny rule → maximal matching overhead
- Exploiting locality of matching in Internet traffic
5Internet Traffic Analysis
- Several packet traces from the University of Auckland (NLANR) and DePaul University (sizes of 3M to 10M packets, captured at different times)
6Locality of Matching Property
- Skewness of a field value indicates that a few values of a particular field occur far more frequently than the other values in the traffic. It can be expressed as follows
7Persistency of Locality of Matching
8Problems and Motivation: Statistical Network
Security Policy Optimization
- The majority of Internet traffic matches a small subset of the field values in firewall rules
- This skewness is likely to persist for a sufficient time
- Deterministic packet classification techniques optimize for the worst case (upper bound) but not necessarily the average case, and mostly exhibit high space complexity
- Security policies are usually static and NOT traffic-driven
- Our goal is to use a statistically adaptive filtering technique to optimize the average case (with much lower space complexity)
9Challenges of Statistical Policies Optimization
- Can policies adapt to reject discarded traffic with minimum matching?
- Can the policy configuration adapt dynamically to match traffic properties and minimize the average matching of accepted traffic?
- Do the traffic dynamics support adaptive policies?
- How can security devices (re)learn the traffic trend in real time?
- How can traffic be rejected as early as possible (Early Rejection)?
- How practical is this to implement/deploy?
10Optimizing the Rejection Path: Early Rejection
11Early Rejection
- Firewall rules are often written as exceptions to the default-deny rule
- Traffic rejected by default-deny causes the maximum harm
- Cost = Rate × PolicySize
- Objective: (1) to dynamically create the minimum number of Early Rejection Rules (RR) that have the maximum discarding effect (covering the discard space), (2) to make RR adaptive to the recently discarded traffic (dynamic rule selection)
- The basic idea is to add the most effective rejection rules at the front end so that the overall average matching is decreased (with minimal effect on accepted packets)
12Early Rejection
13Early Rejection
- Thus, the early rejection rules (RR) can be formed as a combination of the common field values that cover all rules in the policy. Formally, we can define RR as follows
- For example, a typical RR rule will be as follows
14Early Rejection Parameters
15Early Rejection Approach
- Constructing sets of RR to cover the discard space
- Using set-cover approximation algorithms
- With an approximation ratio of 1 + ln(S)
- With an f-approximation ratio (where f in our case is 5)
- This is an off-line operation based on the policy configuration
- Dynamic rule selection
- Adaptively (1) selecting the most efficient RR set and (2) removing/adding rules from RR, based on the characteristics of the recently discarded traffic
- Given the portion of the traffic rejected by the r RR rules and the maximum percentage of traffic that can be rejected early, a condition must hold for the average matching to decrease
- Traffic classes: rejected by RR, rejected by the default rule, accepted (after RR)
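The set-cover step listed above can be sketched with the standard greedy heuristic, the textbook algorithm behind a logarithmic approximation ratio. This is only an illustration, not the Discard Cover algorithm from the paper: the rule names (`rr_a`, `rr_b`, ...) and the integer "discard regions" are hypothetical placeholders.

```python
def greedy_set_cover(universe, candidates):
    """Greedily choose candidate sets until `universe` is fully covered.

    `candidates` maps a (hypothetical) rejection-rule name to the set of
    discard-space regions that the rule would reject.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the rule that rejects the most still-uncovered regions.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        newly_covered = candidates[best] & uncovered
        if not newly_covered:
            raise ValueError("candidates cannot cover the whole universe")
        chosen.append(best)
        uncovered -= newly_covered
    return chosen


# Illustrative input: six discard regions and four candidate RR rules.
regions = {1, 2, 3, 4, 5, 6}
rules = {
    "rr_a": {1, 2, 3, 4},
    "rr_b": {3, 4, 5},
    "rr_c": {5, 6},
    "rr_d": {1, 6},
}
print(greedy_set_cover(regions, rules))
```

In this toy input two rules suffice: `rr_a` covers four regions and `rr_c` covers the remaining two.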
16Early Rejection Criteria
- This leads to a criterion limiting the number of RR rules that can be used
- After adding the r-th RR rule, the average matching can be restated
- Criterion for rule efficiency, stated in terms of the portion of the total traffic rejected by the r-th rule
- The Discard Cover and Adaptive RR Rule Selection algorithms are in the paper
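To make the trade-off concrete, here is a rough cost model for sequential matching with front-end RR rules. It is a sketch under our own assumptions, not the formula from the slides: accepted packets need `accept_avg` comparisons in the original policy, default-deny packets need all `n_rules`, and a packet rejected early is charged the worst case of `rr_count` comparisons. All parameter names are hypothetical.

```python
def avg_matches(n_rules, accept_frac, accept_avg, rr_count=0, rr_reject_frac=0.0):
    """Average rule comparisons per packet (assumed model, see lead-in)."""
    deny_frac = 1.0 - accept_frac
    if not 0.0 <= rr_reject_frac <= deny_frac + 1e-12:
        raise ValueError("RR rules can only reject traffic that would be denied")
    # Rejected early: at most rr_count comparisons.
    early = rr_reject_frac * rr_count
    # Still reaches default-deny: all RR rules plus the whole policy.
    late = (deny_frac - rr_reject_frac) * (rr_count + n_rules)
    # Accepted traffic pays the RR rules as pure overhead.
    accepted = accept_frac * (rr_count + accept_avg)
    return early + late + accepted


before = avg_matches(1000, 0.8, 200)
helpful = avg_matches(1000, 0.8, 200, rr_count=10, rr_reject_frac=0.15)
harmful = avg_matches(1000, 0.8, 200, rr_count=10, rr_reject_frac=0.005)
print(before, helpful, harmful)
```

Under this model the RR rules pay off only when `rr_count < rr_reject_frac * n_rules`, which is one way to read the limit on the number of rules; the exact inequality used on the slide may differ.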
17Optimizing the Accept Path: Statistical Tree
Matching
18Statistical Filtering Optimization
- Motivation
- Field value matching shows a non-uniform (skewed) distribution that is reasonably persistent over time
- Basic idea
- Building a statistical search tree using the values of each field (according to frequency)
- We use alphabetic trees (vs. Huffman trees) because they retain binary-tree simplicity in building and searching, and the inherent order is preserved as in binary search → easy to search based on value
- The greater the skewness, the greater the gain
- The average filtering time over all flows is reduced
- In the worst case (uniform distribution of all the values, which is very unlikely), the alphabetic tree cannot be worse than a binary tree
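The gain from skew can be illustrated by computing the expected leaf depth of an optimal alphabetic tree and comparing it with a balanced binary search. The sketch below uses a simple O(n^3) dynamic program for clarity, which is our substitution and not the faster construction the authors use; the function names are hypothetical, and `weights` are assumed to be match frequencies listed in field-value order.

```python
import math


def optimal_alphabetic_cost(weights):
    """Minimum expected leaf depth over binary trees that keep the leaf order."""
    n = len(weights)
    if n == 0:
        return 0.0
    prefix = [0.0]
    for w in weights:
        prefix.append(prefix[-1] + w)
    # cost[i][j] = best weighted depth for leaves i..j (inclusive).
    cost = [[0.0] * n for _ in range(n)]
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            best_split = min(cost[i][k] + cost[k + 1][j] for k in range(i, j))
            # Joining two subtrees pushes every leaf in i..j one level down.
            cost[i][j] = best_split + (prefix[j + 1] - prefix[i])
    return cost[0][n - 1]


def balanced_depth(n):
    """Comparisons needed by a balanced binary search over n values."""
    return math.ceil(math.log2(n)) if n > 1 else 0


skewed = optimal_alphabetic_cost([0.7, 0.1, 0.1, 0.1])
uniform = optimal_alphabetic_cost([0.25, 0.25, 0.25, 0.25])
print(skewed, uniform, balanced_depth(4))
```

With four values, the skewed distribution needs 1.5 comparisons on average against 2 for the balanced search, while the uniform distribution falls back to exactly 2.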
19Statistical Filtering Optimization Using
Alphabetic Trees
20Search Aggregate in Alphabetic Tree Filtering
Aggregate matching tree structure for (a)
Cascaded matching, (b) Parallel matching.
- Complexity: O(n lg n) for construction, O(n) for space in cascade
- In parallel, the intersected rules are selected
- The parallel tree might be faster but does not give fewer matches than cascade
21Maintenance of Statistical Filtering
Tree: Performance-triggered tree update
- Comparing the matching gain with the binary search gain
- where qi is the probability of field value vi in the current time interval and pi in the preceding one
- Alternatively, a lightweight moving average of the matching gain can be used
- where hi is the height of the destination leaf and gi is the gain over binary search for packet i
- If the update condition is met, the alphabetic search tree for this field is discarded and a new tree is built
- Computationally expensive
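A lightweight moving-average trigger of the kind described above might look like the following sketch. The exact gain formula and update condition are not recoverable from the slide, so everything here is an assumption: the per-packet gain is taken as the relative saving of the leaf height over the binary-search depth, the average is an exponentially weighted one, and a rebuild is signalled when it drops below a hypothetical `threshold`.

```python
import math


class GainMonitor:
    """Tracks a moving average of matching gain for one field (assumed model)."""

    def __init__(self, num_values, threshold, alpha=0.05):
        # Comparisons a balanced binary search would need for this field.
        self.binary_depth = math.ceil(math.log2(num_values))
        self.threshold = threshold
        self.alpha = alpha
        self.avg_gain = None

    def observe(self, leaf_height):
        """Record one packet; return True if the tree should be rebuilt."""
        gain = (self.binary_depth - leaf_height) / self.binary_depth
        if self.avg_gain is None:
            self.avg_gain = gain
        else:
            self.avg_gain = self.alpha * gain + (1 - self.alpha) * self.avg_gain
        return self.avg_gain < self.threshold


monitor = GainMonitor(num_values=16, threshold=0.1, alpha=0.5)
print(monitor.observe(2))  # shallow leaf: good gain, no rebuild
print(monitor.observe(6))  # deep leaf drags the average below the threshold
```

The update costs a few arithmetic operations per packet, which is the appeal of this approach compared with recomputing the gain from the full value distribution.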
22Maintenance of Statistical Filtering
Tree: Periodic tree update
- The tree is flushed and reconstructed after a certain timeout
- Needed to refresh the statistical tree when the performance is at the low end, close to but still greater than the threshold
- Our experimental study shows that the optimal interval can be easily learned and is relatively large (100s)
23Evaluation
- Internet traffic traces from NLANR and the DePaul University backbone
- Anonymized policies
- Random policy generation from traces
24Evaluation of Early Rejection Technique
41%, close to the optimal 50% attainable if RR has no overhead
Early rejection: (a) performance gain, (b) the number of RR rules for three policies with varying percentage of default-rule traffic
25Optimization Effectiveness for Individual Fields
- The reduction of packet matching relative to binary search for each filtering field on the firewall (a) inbound interface, (b) outbound interface. (NOTE: reduction is relative to binary search)
26Optimization Effectiveness for Individual Fields
over 24 hours
- Relative matching reduction for each field for different times of day
27Alphabetic Filtering Tree Performance vs. Update
Interval over 24 hours
- Average optimal and measured relative matching reduction with varying update interval for (a) Cascaded search, (b) Parallel search
28Frequency of Gain of Alphabetic Filtering Tree
Performance (for Cascade) over 24h
90% gives 40% or more reduction
80% gives 30% or more reduction
29Performance-trigger Update Intervals (during rush
hour)
Relative matching reduction for the source port during a one-hour interval
30Related Work
- Early rejection: no related work
- Substantial work in the area of algorithmic optimization of filters, such as CAM-based, Aggregated Bit Vector (ABV), Tuple Space, Fat Inverted Segment trees, Recursive Flow Classification (RFC), geometric representation and Hierarchical Cuttings
- High space complexity
- Optimizes the worst case and not necessarily the average case
- Optimization through rule aggregation and elimination
- Manual rule ordering based on NetFlow by Cisco
- Gupta et al. in SIGMETRICS'05 show that adaptive trees significantly improve routing lookups; limited to routing, and does not address measurement and statistical filtering over multiple fields
- Hamed & Al-Shaer in ASIACCS'06 propose a technique for Dynamic Rule Ordering based on real-time traffic characteristics
31Conclusion
- Although optimizing packet filtering is an old problem, we explore novel techniques and new research directions in this area
- Optimizing the rejection path (traffic rejected by the default rule)
- Using statistical trees to improve the average case rather than the worst case, as in most packet classification algorithms
- Early rejection
- Policy pre-processing to construct rules that cover the discard space
- Selecting the most efficient set of RR rules dynamically to maximize benefit
- Matching gain 19/25 50/75, and the number of added RR rules is 4-10
- Statistical filtering tree
- Using alphabetic trees (easy, fast implementation; low space complexity)
- Cascade and parallel implementations
- Matching gain up to 45% in busy hours, with a 200ms update period
- The implementation of both techniques is simple and lightweight
- Future work: attacks, more comparison with DRO, other optimizations
32Questions & Answers
33Observations
- The majority of the Internet traffic (>70%) belongs to flows of size 10 or more (repeated at least 10 times) - curve 2
- The majority of the Internet flows (>85%) are mice - curve 3
- From 1 and 2 (15% of flows represent 70% of size) → the elephants are long-lasting flows in the Internet
- The majority of the traffic (>70%) lasts for 1 second or more
34Dynamic Optimal Rule-ordering (DRO)
- Problem definition: (1) for a policy of n filtering rules with dependency relations, each rule Ri with a given weight wi and order di, find a valid rule ordering that minimizes the average packet matching
- (2) How to find wi in real time
- where wi is the rule weight based on hit rate and di is the depth of the rule
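The ordering problem stated above can be illustrated on a toy policy. The sketch assumes the objective is the hit-weighted average depth (the sum of wi times di, normalized by total weight) and finds the best valid order by brute force, which is only feasible for a handful of rules and is not the DRO algorithm itself. Rule names, hit counts, and the `must_precede` pairs are hypothetical.

```python
from itertools import permutations


def average_matching(order, weights):
    """Hit-weighted average depth of a rule ordering (depth starts at 1)."""
    total = sum(weights.values())
    return sum(weights[rule] * depth for depth, rule in enumerate(order, start=1)) / total


def is_valid(order, must_precede):
    """Check that every (a, b) dependency keeps rule a ahead of rule b."""
    position = {rule: i for i, rule in enumerate(order)}
    return all(position[a] < position[b] for a, b in must_precede)


def best_order(weights, must_precede):
    """Exhaustively search for the valid ordering with minimum average matching."""
    valid = (p for p in permutations(weights) if is_valid(p, must_precede))
    return min(valid, key=lambda p: average_matching(p, weights))


hits = {"r1": 10, "r2": 60, "r3": 30}   # hypothetical hit counts per rule
deps = [("r1", "r3")]                    # r1 must stay ahead of r3
print(best_order(hits, deps), average_matching(("r1", "r2", "r3"), hits))
```

Here the frequently hit `r2` moves to the front, while `r1` stays ahead of `r3` despite its low weight because of the dependency.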
35Performance-triggered tree updates
36Frequency of Gain of Alphabetic Filtering Tree
Performance
- Cumulative ratio of measurements greater than different matching reductions for (a) Cascaded search, (b) Parallel search
37Alphabetic Tree Performance
- Measured matching reduction for a full-day interval with different update intervals for (a) Cascaded search, (b) Parallel search