Title: Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications
1Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications
- Robert Schweller (1), Zhichun Li (1), Yan Chen (1), Yan Gao (1), Ashish Gupta (1), Yin Zhang (2), Peter Dinda (1), Ming-Yang Kao (1), Gokhan Memik (1)
- (1) Lab for Internet and Security Technology (LIST), Northwestern University
- (2) University of Texas at Austin
2The Spread of Sapphire/Slammer Worms
3Motivation (online change detection)
- Online network anomaly/intrusion detection over high-speed links
- Small memory usage
- Small number of memory accesses per packet
- Scalable to large key space size
- Primitives for online anomaly detection
- Heavy hitters (lots of prior work)
- Heavy changes: an enabler for aggregate queries over multiple data streams
- Asymmetric routing demands spatial aggregation
- Time Series Analysis (TSA) needs temporal aggregation
4Outline
- Background on k-ary sketch
- Reversible sketch problem
- Modular hashing
- IP mangling
- Reverse hashing
- Evaluation
- Conclusion
5K-ary sketch
- [Krishnamurthy, Sen, Zhang, Chen, 2003]
- First to detect flow-level heavy changes in massive data streams at network traffic speeds
6k-ary sketch
- [Krishnamurthy, Sen, Zhang, Chen, 2003]
- APIs
- UPDATE(k, u): T_j[h_j(k)] += u (for all j)
- S = COMBINE(a, S1, b, S2)
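The API above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `KarySketch`, the BLAKE2-based bucket function, and the median-of-unbiased-estimates form of `estimate` are choices made here for the example, since the slides do not specify them.

```python
import hashlib
import statistics


class KarySketch:
    """Toy k-ary sketch: H hash tables with K counters each."""

    def __init__(self, H=5, K=4096, table=None):
        self.H, self.K = H, K
        # T[j][b] is the counter in bucket b of hash table j.
        self.T = table if table is not None else [[0.0] * K for _ in range(H)]

    def _bucket(self, j, key):
        # Stand-in for h_j (an assumption, not the deck's hash family).
        digest = hashlib.blake2b(f"{j}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.K

    def update(self, key, u):
        # UPDATE: T_j[h_j(k)] += u for all j.
        for j in range(self.H):
            self.T[j][self._bucket(j, key)] += u

    def estimate(self, key):
        # ESTIMATE: per-table unbiased estimate, then the median over tables.
        ests = []
        for j in range(self.H):
            total = sum(self.T[j])
            v = self.T[j][self._bucket(j, key)]
            ests.append((v - total / self.K) / (1 - 1 / self.K))
        return statistics.median(ests)


def combine(a, s1, b, s2):
    # COMBINE: the sketch is linear, so a*S1 + b*S2 is computed per counter.
    assert (s1.H, s1.K) == (s2.H, s2.K)
    table = [[a * x + b * y for x, y in zip(r1, r2)]
             for r1, r2 in zip(s1.T, s2.T)]
    return KarySketch(s1.H, s1.K, table)


# Two measurement intervals; the change sketch is S2 - S1.
s1, s2 = KarySketch(), KarySketch()
s1.update("10.0.0.1", 100)
s1.update("10.0.0.2", 50)
s2.update("10.0.0.1", 500)
s2.update("10.0.0.2", 50)
change = combine(1, s2, -1, s1)
print(change.estimate("10.0.0.1"), change.estimate("10.0.0.2"))
```

Because the structure is linear, the difference of two sketches is itself a sketch of the per-key changes, which is what makes heavy change detection over combined streams possible.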
7Reverse Sketch Problem
- Main problem
- Cannot efficiently report keys with heavy change: INFERENCE(S, t)
- Important function for anomaly detection!
- Our contribution
- Determine the set of keys that have large estimates in a sketch
8Reversible sketch framework
- Streaming data recording: (key, value) → IP mangling → modular hashing → reversible k-ary sketch (value stored)
- Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys
10Taking Intersections
- Intersect A1, A2, A3, A4, A5
11The problem with simple intersection
- Each set Ai can be very large!
- H = 5, K = 2^12, number of keys = 2^32 (IP addresses)
- |A1| = 2^32 / 2^12 = 2^20
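To make the size argument concrete, the sketch below performs the simple intersection on a deliberately small key space (16-bit keys, K = 2^6), so each reverse-mapped set has about 2^16 / 2^6 = 2^10 keys. The hash function `h`, the toy sizes, and the planted key are assumptions for illustration only.

```python
import hashlib

H, K, KEY_BITS = 5, 2**6, 16   # toy sizes; the slide uses K = 2^12, 32-bit keys


def h(j, key):
    # Hypothetical stand-in for the j-th hash function of the sketch.
    digest = hashlib.blake2b(f"{j}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % K


def reverse_map(j, bucket):
    # A_j: every key in the key space that table j sends to `bucket`.
    return {key for key in range(2**KEY_BITS) if h(j, key) == bucket}


secret = 12345                                  # key with the heavy change
heavy = [h(j, secret) for j in range(H)]        # its heavy bucket in each table
sets = [reverse_map(j, heavy[j]) for j in range(H)]
candidates = set.intersection(*sets)

print([len(a) for a in sets])   # each size is close to 2^10
print(candidates)
```

Even in this toy setting every set A_j must be enumerated in full before intersecting; at 32-bit keys each set would hold about 2^20 keys per heavy bucket.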
12The problem with simple intersection
- Each set Ai can be very large!
- Solution: modular hashing
13Modular hashing reduces the set size
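- 32-bit key split into four 8-bit words: 10010100 10101011 10010101 10100011
- Each 8-bit word is hashed separately by h() to 3 bits
- Concatenated 12-bit bucket index: 010 110 001 101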
14Modular hashing reduces the set size
- 32-bit key as four 8-bit words: 10010100 10101011 10010101 10100011
- Greatly reduces the size of reverse-mapped sets
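A minimal sketch of modular hashing, assuming each 8-bit word is hashed through its own lookup table to a 3-bit chunk. The balanced random tables are a hypothetical choice made here so that every chunk has exactly 2^8 / 2^3 = 32 preimages per word; the slides do not say how h() is built.

```python
import random

WORDS, WORD_BITS, CHUNK_BITS = 4, 8, 3   # 32-bit key -> 12-bit bucket index
rng = random.Random(1)


def balanced_table():
    # Hypothetical per-word hash h(): each 3-bit chunk gets exactly
    # 2^8 / 2^3 = 32 of the 256 possible word values.
    values = [c for c in range(2**CHUNK_BITS)
              for _ in range(2**(WORD_BITS - CHUNK_BITS))]
    rng.shuffle(values)
    return values


tables = [balanced_table() for _ in range(WORDS)]


def split_words(key):
    # Most significant 8-bit word first.
    mask = 2**WORD_BITS - 1
    return [(key >> (WORD_BITS * (WORDS - 1 - i))) & mask for i in range(WORDS)]


def modular_hash(key):
    # Hash each word separately and concatenate the 3-bit results.
    index = 0
    for i, word in enumerate(split_words(key)):
        index = (index << CHUNK_BITS) | tables[i][word]
    return index


def reverse_word(i, chunk):
    # All values of word i that hash to the given 3-bit chunk.
    return [w for w in range(2**WORD_BITS) if tables[i][w] == chunk]


key = 0b10010100_10101011_10010101_10100011   # the key shown on the slide
bucket = modular_hash(key)
sizes = [len(reverse_word(i, (bucket >> (CHUNK_BITS * (WORDS - 1 - i))) & 7))
         for i in range(WORDS)]
print(f"{bucket:012b}", sizes)
```

Reverse mapping a bucket now means looking up four small per-word sets instead of enumerating 2^20 full keys; the full preimage is their Cartesian product.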
15Modular hashing reduces the set size
- |A1| = 2^5 × 2^5 × 2^5 × 2^5
- Only 32 elements per word set
- [Figure: intersection over heavy buckets b1 to b5 in hash tables 1 to 5]
16Modular hashing reduces the set size
- |A1| = 2^5 × 2^5 × 2^5 × 2^5, |A2| = 2^5 × 2^5 × 2^5 × 2^5
- [Figure: intersection of A1 and A2, with heavy buckets b1 to b5 in hash tables 1 to 5]
17Problem: Too many collisions
18Problem: Too many collisions
- Solution: IP mangling with GF (Galois extension field)
- IP mangling: a bijective mapping function for breaking the key space continuity
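The idea of a bijective Galois-field mapping can be illustrated on a toy scale. The sketch below multiplies an 8-bit value by a fixed nonzero constant in GF(2^8); the field size, the reduction polynomial (the one used by AES), and the constant `A` are assumptions chosen for a short, checkable example, whereas the deck applies mangling to whole keys and does not give its exact function.

```python
POLY = 0x11B   # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    # Carry-less multiplication of a and b, reduced modulo POLY.
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def gf_inv(a):
    # In GF(2^8), a^254 is the multiplicative inverse of any nonzero a.
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


A = 0x53             # hypothetical nonzero mangling constant
A_INV = gf_inv(A)


def mangle(x):
    # Bijective: multiplication by a nonzero field element is invertible.
    return gf_mul(A, x)


def unmangle(y):
    # Reverse IP mangling: multiply by the inverse constant.
    return gf_mul(A_INV, y)


print([mangle(x) for x in range(4)])   # neighbouring inputs are spread apart
```

Because the mapping is a bijection, no information is lost at recording time, and reverse mangling recovers the original key after reverse hashing.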
20Handling Multiple Intersections
- 2^H different intersections
- [Figure: hash tables 1 to 5, each showing two heavy buckets]
- Much more difficult
- Solution: reverse hashing algorithms
- Step 1: Reverse hashing for each module
- Step 2: Infer the whole key through bucket index matching among candidates from each module
21Reverse Hashing for Each Module
- Take the first word as an example
- H = 5, r = 1, K = 2^12 (r: tolerance level)
- [Figure: candidate sets of the first word in hash table i, with example sets {2,3,5}, {2,6,9,10}, {0,2,3}, {2,3,8,10}, {3,6,7,9}, {2}, {2,3}, combined into all possible values of the first word in the sketch]
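One way to read the tolerance level r is that a word value is kept when it appears in the candidate sets of at least H - r hash tables. The sketch below applies that rule to the numbers on the slide; treating the first five sets as the per-table candidate sets is an assumption made here, since the figure layout did not survive.

```python
from collections import Counter

H, r = 5, 1
# Candidate sets for the first word, one per hash table (numbers from the
# slide; reading them as the five per-table sets is an assumption).
per_table = [{2, 3, 5}, {2, 6, 9, 10}, {0, 2, 3}, {2, 3, 8, 10}, {3, 6, 7, 9}]

# Count in how many tables each value is a candidate.
votes = Counter(x for s in per_table for x in s)

# Keep values that are candidates in at least H - r tables.
I1 = {x for x, n in votes.items() if n >= H - r}
print(sorted(I1))   # prints [2, 3]
```

With r = 0 this would be a plain intersection, which is empty for these sets; allowing one miss keeps both 2 and 3.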
22Bucket Index Matrix of Candidates
- H = 5, r = 1, K = 2^12
- For each x in I1, we can get B1(x), a vector of the heavy bucket sets to which x hashes.
- Example keys 192.168.0.1 and 192.123.47.62: 192... hash to the red heavy buckets
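The bucket index vector B1(x) can be sketched as follows, assuming 12-bit bucket indices whose top 3 bits come from the first word. The per-table lookup arrays `h`, the helper names, and the planted heavy buckets are hypothetical and serve only to make the example runnable.

```python
import random

H, r, WORD_BITS, CHUNK_BITS = 5, 1, 8, 3
rng = random.Random(7)
# h[i][x]: hypothetical 3-bit hash of first-word value x in hash table i.
h = [[rng.randrange(2**CHUNK_BITS) for _ in range(2**WORD_BITS)]
     for _ in range(H)]


def first_chunk(bucket):
    # The top 3 bits of a 12-bit bucket index come from the first word.
    return bucket >> 9


def bucket_vector(x, heavy):
    # B1(x): for each table i, the heavy buckets whose first chunk equals h[i][x].
    return [{b for b in heavy[i] if first_chunk(b) == h[i][x]} for i in range(H)]


def candidates(heavy):
    # I1: first-word values that miss the heavy buckets in at most r tables,
    # each stored together with its bucket vector.
    result = {}
    for x in range(2**WORD_BITS):
        vec = bucket_vector(x, heavy)
        if sum(1 for s in vec if not s) <= r:
            result[x] = vec
    return result


# One heavy bucket per table, built so that first word 192 maps into it.
heavy = [{(h[i][192] << 9) | rng.randrange(2**9)} for i in range(H)]
I1 = candidates(heavy)
print(sorted(I1))
print(I1[192])
```

Keeping the matching heavy buckets per table, rather than a yes/no flag, is what lets later words be matched against the same buckets.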
23Prefix Extension Algorithm
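- Path discovery algorithm
- [Figure: words 150, 47, 236 from I1 (with B1) paired with words 72, 104 from I2 (with B2); prefixes shown are <150.72>, <47.72>, <236.104>; pairs with more than r = 1 mismatches are marked "Ignore!"]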
24Prefix Extension Algorithm
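The two steps (per-module reverse hashing, then prefix extension by bucket index matching) can be put together in a small sketch. It assumes four 8-bit words, 3-bit chunks, H = 5 and r = 1; the random per-word tables `h`, the function names, and the two planted keys are hypothetical and do not come from the slides.

```python
import random

H, r, WORDS, WORD_BITS, CHUNK_BITS = 5, 1, 4, 8, 3
rng = random.Random(3)
# h[i][m][x]: hypothetical 3-bit hash of value x of word m in hash table i.
h = [[[rng.randrange(2**CHUNK_BITS) for _ in range(2**WORD_BITS)]
      for _ in range(WORDS)] for _ in range(H)]


def bucket(i, words):
    # Modular hash of a whole key in table i: concatenated 3-bit chunks.
    index = 0
    for m, w in enumerate(words):
        index = (index << CHUNK_BITS) | h[i][m][w]
    return index


def chunk(b, m):
    # The 3-bit chunk of bucket index b that belongs to word m.
    return (b >> (CHUNK_BITS * (WORDS - 1 - m))) & (2**CHUNK_BITS - 1)


def module_candidates(m, heavy):
    # Step 1: values x of word m that match a heavy bucket in at least
    # H - r tables, each with its bucket vector B_m(x).
    out = {}
    for x in range(2**WORD_BITS):
        vec = [{b for b in heavy[i] if chunk(b, m) == h[i][m][x]}
               for i in range(H)]
        if sum(not s for s in vec) <= r:
            out[x] = vec
    return out


def reverse_hash(heavy):
    # Step 2: extend prefixes one word at a time, intersecting bucket
    # vectors and ignoring prefixes that are empty in more than r tables.
    prefixes = [((), [set(heavy[i]) for i in range(H)])]
    for m in range(WORDS):
        cands = module_candidates(m, heavy)
        extended = []
        for words, vec in prefixes:
            for x, bx in cands.items():
                new = [vec[i] & bx[i] for i in range(H)]
                if sum(not s for s in new) <= r:
                    extended.append((words + (x,), new))
        prefixes = extended
    return [words for words, _ in prefixes]


planted = [(150, 72, 182, 75), (236, 104, 9, 31)]
heavy = [{bucket(i, k) for k in planted} for i in range(H)]
found = reverse_hash(heavy)
print(found)
```

The planted keys are always recovered. With r = 1 a few extra keys that match in only H - 1 tables may also be reported, which is why candidates are checked afterwards.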
25Recap
- Streaming data recording: (key, value) → IP mangling → modular hashing → reversible k-ary sketch (value stored)
- Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys
- n is the size of key space
27Evaluation
- Dataset
- A large US ISP (330M NetFlow records)
- NU (19M NetFlow records)
- Efficient data recording (for the worst-case traffic, all 40-byte packets)
- Software: 526 Mbps on a P4 3.2 GHz PC
- Hardware: 16 Gbps on a single FPGA board
- Only a few hundred KB to a couple of MB of memory used
- Only 15 memory accesses per packet for 48-bit reversible sketches and 16 per packet for 64-bit reversible sketches
- Efficient heavy change detection and key inference
- 0.34 seconds for 100 changes, 13.33 seconds for 1000 changes
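For a sense of scale, the worst-case line rates above can be converted into packet rates. The arithmetic below assumes 1 Mbps = 10^6 bit/s and counts only the 40 bytes of each packet, ignoring any link-layer overhead.

```python
PACKET_BITS = 40 * 8   # worst case on the slide: all 40-byte packets


def packets_per_second(bits_per_second):
    return bits_per_second / PACKET_BITS


software_pps = packets_per_second(526e6)   # 526 Mbps in software
hardware_pps = packets_per_second(16e9)    # 16 Gbps on the FPGA

print(software_pps, 1e9 / software_pps)    # packets/s and ns per packet
print(hardware_pps, 1e9 / hardware_pps)
```

This works out to roughly 1.64 million packets per second in software and 50 million packets per second in hardware, that is, about 20 ns per packet in the hardware case.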
28Key Inference Accuracy
- True positives and false positives of 16-bit reversible sketches for 32-bit IP addresses
- Deltoids: S. Muthukrishnan and Graham Cormode, "What's New: Finding Significant Differences in Network Data Streams," INFOCOM 2004
29More Results
- Stress test with larger dataset: still accurate
- Scalable to larger key space size: similar results for 64-bit IP pairs
- Built anomaly/intrusion detection system to detect, e.g., SYN flooding and port scans [ICDCS 2006]
30Conclusions
- Proposed the first reversible sketches, which
- Record high-speed network streams online
- Detect the heavy changes and infer the keys online
- Small memory usage, small number of memory accesses per packet
- Scalable to large key space size
31Backup Slides
32Related work
- Compare with deltoids
- Accuracy: better
- Scalability to large key space: better
- Number of memory accesses: fewer
- PCF, IMC 2004: not reversible
- Q. Zhao et al., IMC 2005; S. Venkataraman, NDSS 2005: unique fan-out (fan-in) estimation
33Modular Hashing
Optimal Hashing
34Reversible sketch problem
- However: not reversible
- Lack of an inference API: INFERENCE(S, t)
- Important function for anomaly detection!
- Decouple the recording stage of sketches from the detection stage to enable efficient combine and inference.
- Given a threshold t, report keys whose corresponding sum of updates is larger than the threshold.
- Our contribution: an efficient algorithm for inference
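Without reversibility, the only generic way to answer INFERENCE(S, t) is to call ESTIMATE for every key in the key space. The sketch below shows that brute-force baseline on a toy 16-bit key space; the hash stand-in `h`, the table sizes, and the sample updates are assumptions for illustration.

```python
import hashlib
import statistics

H, K, KEY_BITS = 5, 2**10, 16   # toy sizes, far smaller than a real deployment


def h(j, key):
    # Hypothetical stand-in for the j-th hash function.
    digest = hashlib.blake2b(f"{j}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % K


def update(T, key, u):
    for j in range(H):
        T[j][h(j, key)] += u


def estimate(T, totals, key):
    # Median over tables of the per-table unbiased estimate.
    return statistics.median(
        (T[j][h(j, key)] - totals[j] / K) / (1 - 1 / K) for j in range(H))


def inference(T, t):
    # Brute-force INFERENCE(S, t): one ESTIMATE for every key in the key space.
    totals = [sum(row) for row in T]
    return [k for k in range(2**KEY_BITS) if estimate(T, totals, k) > t]


T = [[0] * K for _ in range(H)]
for key, change in [(1000, 500), (2000, 20), (40000, 900)]:
    update(T, key, change)
heavy_keys = inference(T, 100)
print(heavy_keys)
```

The cost grows linearly with the size of the key space, which is manageable for 2^16 keys but not for 2^32 IP addresses or 2^64 IP pairs.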
36Problem: Too many collisions
37IP-mangling
- Use GF (Galois Extension Field) function for
attack resilience
38Modular Hashing
Optimal Hashing
Modular Hashing with IP Mangling
39Reverse Hashing for Each Module
- Take the first word as an example
- H = 5, r = 1, K = 2^12
- [Figure labels: all possible values of the first word for the No. j heavy bucket in hash table i; all possible values of the first word in hash table i; all possible values of the first word in the sketch]
40False positive reduction by verifying against the original sketch
- Candidate from reverse hashing: <150.72.182.75>
- Estimate from the original k-ary sketch: (<150.72.182.75>, 180)
- Threshold = 150
- Verified final result: (<150.72.182.75>, 180)
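The verification step amounts to a filter: each candidate key is looked up in the original (non-modular) k-ary sketch and kept only if its estimate exceeds the threshold. In the sketch below a plain dictionary stands in for ESTIMATE on the original sketch; the second candidate and its estimate of 12 are hypothetical, while the key 150.72.182.75, the estimate 180, and the threshold 150 are the values on the slide.

```python
def verify(candidates, estimate, threshold):
    # Keep only candidates whose estimate in the original sketch
    # exceeds the change threshold.
    verified = []
    for key in candidates:
        value = estimate(key)
        if value > threshold:
            verified.append((key, value))
    return verified


# Stand-in for ESTIMATE on the original k-ary sketch (the 12 is hypothetical).
estimates = {"150.72.182.75": 180, "150.72.182.9": 12}
candidates = ["150.72.182.75", "150.72.182.9"]

result = verify(candidates, lambda k: estimates.get(k, 0), 150)
print(result)   # prints [('150.72.182.75', 180)]
```

Since the original sketch uses hashing that is independent of the modular structure, a key that only passed reverse hashing because of modular collisions is unlikely to pass this check as well.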
41K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
- First to detect flow-level heavy changes in massive data streams at network traffic speeds
- APIs
- UPDATE(S, k, u): T_j[h_j(k)] += u (for all j)
- ESTIMATE(S, k): sum of updates for key k
- Linear combination: S = COMBINE(a, S1, b, S2)