Title: Balancing Risk and Utility in Flow Trace Anonymization
1 Balancing Risk and Utility in Flow Trace Anonymization
- Martin Burkhart, ETH Zurich, burkhart_at_tik.ee.ethz.ch
- Joint work with Daniela Brauckhoff, Elisa Boschi, Martin May
2 Motivation
- Sharing of traffic measurements is crucial
- Only a limited set of sources available
- Reproducibility of results
- Dynamics / variability of traffic
- Get the big picture (e.g. Internet Storm Center)
- Keep up with globalized attacks (e.g. botnets)
- More and more traces are collected but not shared
- Data protection legislation
- Security concerns
- Competitive advantage
3 State-of-the-Art Anonymization
- Black Marking
- Truncation
- E.g. last bits of IP addresses
- Permutation
- Random
- (Partial) prefix-preserving IP address permutation
- Enumeration
- E.g. timestamps: keep the logical order of events
- Categorization
- Randomization (data mining community)
- K-Anonymity (data mining community)
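A few of the techniques listed above can be sketched in a handful of lines. This is only an illustration under stated assumptions: the function names, the `0.0.0.0` marker, and the choice to permute within the observed value set are hypothetical and not taken from any particular anonymization tool.

```python
import random

def black_mark(value, marker="0.0.0.0"):
    """Black marking: replace the field with a constant (marker is an assumed choice)."""
    return marker

def random_permutation(values, seed=None):
    """Random permutation: build a 1:1 mapping of the distinct values onto themselves."""
    rng = random.Random(seed)
    distinct = sorted(set(values))
    shuffled = distinct[:]
    rng.shuffle(shuffled)
    return dict(zip(distinct, shuffled))

def enumerate_timestamps(timestamps):
    """Enumeration: replace each timestamp by its rank, keeping only the logical order."""
    rank = {t: i for i, t in enumerate(sorted(set(timestamps)))}
    return [rank[t] for t in timestamps]

# Example: the order of events survives, the absolute times do not.
ranks = enumerate_timestamps([1000.5, 1000.1, 1003.0, 1000.1])
```

Black marking removes all information in a field, whereas the permutation and the enumeration keep equality (and, for enumeration, order) between records.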
4 The Tradeoff in Anonymization
- It's a trade-off
- RU-Maps
- t: anonymization strength
- X-axis: Utility(t)
- Y-axis: Risk(t)
- Not quantitatively studied, lack of metrics
- Strongly dependent on the application / attacker model
[Figure: RU-map for Algorithm X, plotting Risk(t) against Utility(t) for t = 0.1, 0.2, 0.4, and 0.7, with the sweet spot marked]
5 A Case Study: IP Address Truncation
- Techniques that permute IP addresses 1:1 are reversible
- Characteristic object sizes/frequencies, behavioral profiling, fingerprinting of active ports, exploiting the prefix structure
- Apply IP address truncation and evaluate the risk and utility dimensions
- Lower risk: hosts are aggregated to subnets
- Lower utility: resolution of entities is reduced
- Quantifying the tradeoff: how bad is it in numbers?
IP address       8 bits trunc.    16 bits trunc.
123.45.67.89     123.45.67.0      123.45.0.0
123.45.67.123    123.45.67.0      123.45.0.0
123.45.12.34     123.45.12.0      123.45.0.0
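The truncation shown in the table can be reproduced with a short sketch based on Python's standard `ipaddress` module. The function name `truncate_ip` is a hypothetical helper chosen for illustration; it zeroes the given number of least significant bits of an IPv4 address.

```python
import ipaddress

def truncate_ip(address: str, bits: int) -> str:
    """Zero the `bits` least significant bits of an IPv4 address."""
    if not 0 <= bits <= 32:
        raise ValueError("bits must be between 0 and 32")
    value = int(ipaddress.IPv4Address(address))
    # Keep the upper (32 - bits) bits, clear the rest.
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value & mask))

for ip in ["123.45.67.89", "123.45.67.123", "123.45.12.34"]:
    print(ip, truncate_ip(ip, 8), truncate_ip(ip, 16))
```

With 8 truncated bits the first two hosts become indistinguishable; with 16 bits all three collapse into the same /16 prefix.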
6 Internal vs. External Prefixes
- Asymmetry in prefixes
- External
- Internal (AS 559)
- Is this reflected in
- Risk reduction?
- Utility reduction?
[Figure: unique count (log scale) versus prefix length (32-x)]
7 Measuring Utility of Truncated Data
- Specific application: anomaly detection
- Compare detection quality of scans and (D)DoS attacks in original and truncated data
- Two IP-based metrics
- Unique address count
- Address entropy
- 3 weeks of NetFlow data
- 43 billion flows
- SWITCH network
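The two IP-based metrics can be sketched as follows. The slides do not spell out the entropy definition, so this sketch assumes the Shannon entropy (in bits) of the empirical address distribution within one time interval; the function names are hypothetical.

```python
import math
from collections import Counter

def unique_count(addresses):
    """Number of distinct addresses observed in an interval."""
    return len(set(addresses))

def address_entropy(addresses):
    """Shannon entropy (bits) of the empirical address distribution (assumed definition)."""
    counts = Counter(addresses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Addresses from the truncation example, before and after truncating 8 bits.
original = ["123.45.67.89", "123.45.67.123", "123.45.12.34"]
truncated_8 = ["123.45.67.0", "123.45.67.0", "123.45.12.0"]

print(unique_count(original), unique_count(truncated_8))        # 3 2
print(address_entropy(original), address_entropy(truncated_8))  # about 1.58 and 0.92
```

Computing both metrics per time interval on original and truncated flows yields the time series that the detection step operates on.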
8 Measuring Detection Quality
- Ground truth: manual identification of scans/(D)DoS attacks
- Run a Kalman filter on metric time series
- Utility measured by AUC (area under the ROC curve)
- Vary the detection threshold to trace the ROC curve
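The detection pipeline can be illustrated with a minimal sketch. The slides do not give the filter model or its parameters, so this assumes a scalar Kalman filter with a random-walk state model and made-up variances (`process_var`, `measurement_var`); the absolute prediction error serves as the anomaly score, and the AUC is obtained by sweeping a threshold over the scores. It is not the authors' configuration.

```python
def kalman_residuals(series, process_var=1.0, measurement_var=4.0):
    """Absolute prediction errors of a scalar random-walk Kalman filter (assumed model)."""
    estimate, error_var = series[0], 1.0
    residuals = [0.0]
    for measurement in series[1:]:
        # Predict: a random walk keeps the estimate, the uncertainty grows.
        error_var += process_var
        innovation = measurement - estimate
        residuals.append(abs(innovation))
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * innovation
        error_var *= 1.0 - gain
    return residuals

def roc_auc(scores, labels):
    """Sweep a threshold over the scores and integrate the ROC curve (trapezoidal rule)."""
    positives = sum(1 for l in labels if l)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both anomalous and normal intervals")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and not l)
        points.append((fp / negatives, tp / positives))
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy metric time series with one spike; ground truth marks the spike as an attack.
series = [10, 10, 10, 50, 10, 10]
truth = [0, 0, 0, 1, 0, 0]
print(roc_auc(kalman_residuals(series), truth))
```

Running the same procedure on the original and on the truncated metric series, and comparing the two AUC values, gives a utility measure in the sense described above.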
9 Utility of Truncated Data
- Internal metrics degrade faster than external metrics
- Counts degrade faster than entropy
10 Approximating Risk of Host Identification
- In general: truncation of x bits leads to
- 2^(32-x) prefixes with 2^x addresses per prefix
- But only a fraction (A) of potential addresses is usually active
- Hence, on average A · 2^x addresses per prefix
[Figure: host addresses 1 to 255 within the prefix 129.130.80., e.g. A = 10%]
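The approximation above is easy to evaluate numerically. The sketch below computes the number of prefixes and the average number of active addresses per prefix; the last function additionally assumes that an attacker guesses uniformly among the active hosts of a prefix, so that the risk is the reciprocal of that number. This risk definition is an assumption for illustration, and all function names are hypothetical.

```python
def prefixes_after_truncation(x: int) -> int:
    """Number of distinct prefixes left after truncating x bits."""
    return 2 ** (32 - x)

def active_addresses_per_prefix(x: int, active_fraction: float) -> float:
    """Average number of active addresses per prefix: A * 2^x."""
    return active_fraction * 2 ** x

def identification_risk(x: int, active_fraction: float) -> float:
    """Assumed risk model: uniform guess among active hosts of a prefix, capped at 1."""
    hosts = active_addresses_per_prefix(x, active_fraction)
    return 1.0 if hosts <= 1.0 else 1.0 / hosts

# With A = 10%, truncating 8 bits leaves about 25.6 active hosts per prefix.
for x in (0, 4, 8, 12, 16):
    print(x, active_addresses_per_prefix(x, 0.10), identification_risk(x, 0.10))
```

Under this model, every additional truncated bit halves the risk once a prefix contains more than one active host.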
11 Risk of Truncated Data
[Figure: curves labeled (total 2.2 million) and (total 4.3 billion)]
- Risk for external addresses is higher due to sparsity!
- Constant offset
12 The Risk-Utility Tradeoff
[Figure: RU-map with points for no truncation and for 4, 8, 12, and 16 truncated bits; the best tradeoff is marked]
Metric              x     Utility   Risk
internal entropy    8     0.94      0.035
internal entropy    12    0.87      0.002
external entropy    16    0.97      0.02
13 Conclusion
- We made a quantitative evaluation of the risk-utility tradeoff in anonymization
- Entropy is much more resistant to truncation than unique counts
- Risk and utility degrade faster for internal addresses
- For detection of scans and (D)DoS attacks, it is possible to get a good tradeoff with high utility and low risk
14 Thank You for the Attention