Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications - PowerPoint PPT Presentation

1
Reverse Hashing for High-speed Network
Monitoring: Algorithms, Evaluation, and
Applications
  • Robert Schweller(1), Zhichun Li(1), Yan Chen(1), Yan
    Gao(1), Ashish Gupta(1), Yin Zhang(2), Peter Dinda(1),
    Ming-Yang Kao(1), Gokhan Memik(1)

(1) Lab for Internet and Security Technology (LIST), Northwestern Univ.
(2) University of Texas at Austin
2
The Spread of Sapphire/Slammer Worms
3
Motivation (online change detection)
  • Online network anomaly/intrusion detection over
    high speed links
  • Small memory usage
  • Small number of memory accesses per packet
  • Scalable to large key space size
  • Primitives for online anomaly detection
  • Heavy hitters (lots of prior work)
  • Heavy changes: enabler for aggregate queries over
    multiple data streams
  • Asymmetric routing demands spatial aggregation
  • Time Series Analysis (TSA) needs temporal
    aggregation

4
Outline
  • Background on k-ary sketch
  • Reversible sketch problem
  • Modular hashing
  • IP mangling
  • Reverse hashing
  • Evaluation
  • Conclusion

5
K-ary sketch
[Krishnamurthy, Sen, Zhang, Chen, 2003]: first to
detect flow-level heavy changes in massive data
streams at network traffic speeds
6
k-ary sketch
[Krishnamurthy, Sen, Zhang, Chen, 2003]
APIs
UPDATE(k, u): T_j[h_j(k)] += u (for all j)
S = COMBINE(a, S1, b, S2)
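The two APIs on this slide can be illustrated with a minimal Python sketch. It assumes H hash tables of K buckets each. The salted `blake2b` hash family, the class and function names, and the `estimate` method (median of per-table unbiased estimates, which this slide does not spell out) are assumptions made for illustration, not the authors' implementation.

```python
import hashlib
from statistics import median


class KarySketch:
    """Toy k-ary sketch: H hash tables with K buckets each."""

    def __init__(self, H=5, K=2**12):
        self.H, self.K = H, K
        self.T = [[0.0] * K for _ in range(H)]

    def _h(self, j, key):
        # Hypothetical hash family: the table index j is mixed in as a salt.
        digest = hashlib.blake2b(f"{j}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.K

    def update(self, key, u):
        # UPDATE(k, u): T_j[h_j(k)] += u for all j
        for j in range(self.H):
            self.T[j][self._h(j, key)] += u

    def estimate(self, key):
        # Assumed estimator: median over tables of a per-table estimate
        # that subtracts the average bucket load.
        per_table = []
        for j in range(self.H):
            total = sum(self.T[j])
            bucket = self.T[j][self._h(j, key)]
            per_table.append((bucket - total / self.K) / (1 - 1 / self.K))
        return median(per_table)


def combine(a, s1, b, s2):
    # S = COMBINE(a, S1, b, S2): bucket-wise linear combination a*S1 + b*S2
    out = KarySketch(s1.H, s1.K)
    for j in range(s1.H):
        out.T[j] = [a * x + b * y for x, y in zip(s1.T[j], s2.T[j])]
    return out
```

Subtracting the sketch of one interval from the sketch of the next, `combine(1, current, -1, previous)`, gives a difference sketch in which a key with a heavy change has a large estimate.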
7
Reverse Sketch Problem
  • Main problem
  • Cannot efficiently report keys with heavy
    change: INFERENCE(S, t)
  • Important function for anomaly detection!
  • Our Contribution
  • Determine set of keys that have large estimates
    in a sketch

8
Reversible sketch framework
(Figure: reversible sketch framework)
Streaming data recording: (key, value) -> IP mangling -> modular hashing -> value stored in reversible k-ary sketch
Heavy change detection: reversible k-ary sketch + change threshold -> reverse hashing -> reverse IP mangling -> heavy change keys
9
Outline
  • Background on k-ary sketch
  • Reversible sketch problem
  • Modular hashing
  • IP mangling
  • Reverse hashing
  • Evaluation
  • Conclusion

10
Taking Intersections
  • Intersect A1, A2, A3, A4, A5

11
The problem with simple intersection
  • Each set Ai can be very large !

H = 5, K = 2^12, keys: 2^32 (IP addresses)
|A1| = 2^32 / 2^12 = 2^20
12
The problem with simple intersection
  • Each set Ai can be very large !
  • Solution

Modular hashing
13
Modular hashing reduces the set size
32-bit key, split into four 8-bit words: 10010100 10101011 10010101 10100011
h() maps each 8-bit word to 3 bits
Concatenated 12-bit result: 010 110 001 101
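A small Python sketch of the idea on this slide: the 32-bit key is cut into four 8-bit words, each word is hashed to 3 bits, and the four results are concatenated into a 12-bit bucket index. The per-word hash functions here are random balanced lookup tables, which is a hypothetical choice made for illustration; all function names are likewise assumptions.

```python
import random

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3


def make_word_hashes(seed):
    """Build one 8-bit -> 3-bit lookup table per word (hypothetical choice of h)."""
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        # Balanced table: every 3-bit output has exactly 2^8 / 2^3 = 32 preimages.
        outputs = [v % (1 << OUT_BITS) for v in range(1 << WORD_BITS)]
        rng.shuffle(outputs)
        tables.append(outputs)
    return tables


def modular_hash(key, tables):
    """Hash a 32-bit key word by word and concatenate the 3-bit results."""
    bucket = 0
    for w in range(WORDS):
        shift = WORD_BITS * (WORDS - 1 - w)
        word = (key >> shift) & 0xFF
        bucket = (bucket << OUT_BITS) | tables[w][word]
    return bucket  # 12-bit bucket index


def reverse_word_sets(bucket, tables):
    """For a bucket index, return the four sets of words hashing to its 3-bit chunks."""
    sets = []
    for w in range(WORDS):
        shift = OUT_BITS * (WORDS - 1 - w)
        chunk = (bucket >> shift) & ((1 << OUT_BITS) - 1)
        sets.append({v for v in range(1 << WORD_BITS) if tables[w][v] == chunk})
    return sets
```

Because each word is hashed independently, the keys that map to a bucket are described by four small word sets instead of one large list of full keys.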
14
Modular hashing reduces the set size
(Figure: the 32-bit key 10010100 10101011 10010101 10100011 split into four 8-bit words)
Greatly reduces size of reverse mapped sets
15
Modular hashing reduces the set size
A1: 2^5 x 2^5 x 2^5 x 2^5
Only 32 elements per word set
(Figure: intersection of the sets reverse-mapped from buckets b1, ..., b5 of hash tables 1-5)
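The set sizes quoted on these slides follow from simple arithmetic, worked through below in Python. The variable names are illustrative. The count of 128 stored values is derived here from the slide's figures and does not appear on the slide itself.

```python
KEY_BITS, WORD_BITS, BUCKET_BITS = 32, 8, 12

words = KEY_BITS // WORD_BITS              # 4 words per key
bits_per_word_hash = BUCKET_BITS // words  # 3 hash bits per word

# Plain hashing: the keys mapping to one bucket form one large set.
naive_set_size = 2**KEY_BITS // 2**BUCKET_BITS        # 2^20 keys

# Modular hashing: each word has its own small reverse-mapped set.
per_word_set = 2**WORD_BITS // 2**bits_per_word_hash  # 2^5 = 32 words

# The same 2^20 keys are represented implicitly as a product of word sets,
# so only 4 * 32 word values need to be stored and intersected.
implicit_keys = per_word_set**words
stored_values = per_word_set * words

print(naive_set_size, per_word_set, implicit_keys, stored_values)
# 1048576 32 1048576 128
```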
16
Modular hashing reduces the set size
A1: 2^5 x 2^5 x 2^5 x 2^5
A2: 2^5 x 2^5 x 2^5 x 2^5
(Figure: intersection of the sets reverse-mapped from buckets b1, ..., b5 of hash tables 1-5)
17
Problem: Too many collisions
18
Problem: Too many collisions
Solution:
IP Mangling with GF (Galois Extension Field)
IP Mangling: a bijective mapping function for
breaking the key space continuity
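To illustrate what a bijective mangling function does, here is a minimal Python sketch. It deliberately uses a simpler stand-in, multiplication by an odd constant modulo 2^32, and not the Galois-field function named on the slide. The constant `A` and the function names are hypothetical. Requires Python 3.8+ for the modular inverse via `pow`.

```python
MASK = 0xFFFFFFFF
A = 0x9E3779B1                # hypothetical odd multiplier; odd => invertible mod 2^32
A_INV = pow(A, -1, 1 << 32)   # modular inverse, used to reverse the mangling


def mangle(ip):
    """Bijectively scramble a 32-bit key so that neighbouring keys spread out."""
    return (A * ip) & MASK


def unmangle(mangled):
    """Inverse mapping: recover the original 32-bit key."""
    return (A_INV * mangled) & MASK


# Two adjacent addresses share their first three words before mangling,
# but their mangled forms already differ in the first word.
a, b = 0xC0A80001, 0xC0A80002  # 192.168.0.1 and 192.168.0.2
print(hex(mangle(a)), hex(mangle(b)))
```

The point of the bijection is that keys can be recorded in mangled form and mapped back exactly after reverse hashing.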
19
Outline
  • Background on k-ary sketch
  • Reversible sketch problem
  • Modular hashing
  • IP mangling
  • Reverse hashing
  • Evaluation
  • Conclusion

20
Handling Multiple Intersections
2^H different intersections
(Figure: two heavy buckets shown for each of hash tables 1-5)
  • Much more difficult. Solution: reverse hashing
    algorithms
  • Step 1: Reverse hashing for each module
  • Step 2: Infer the whole key through bucket
    index matching among candidates from each module

21
Reverse Hashing for Each Module
Take the first word as an example
H = 5, r = 1, K = 2^12 (r: tolerance level)
Candidate set of the first word in hash table i
(Figure: example value sets 2,3,5 / 2,6,9,10 / 0,2,3 / 2,3,8,10 / 3,6,7,9, followed by 2 and 2,3)
All possible values of the first word in the
sketch
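The per-module step can be sketched in Python as follows. The sketch assumes that the five value lists in the figure are the candidate sets of the first word for hash tables 1 to 5, and that a value is kept when it appears in at least H - r of them. That reading of the figure is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def candidates_with_tolerance(per_table_sets, r):
    """Return the values present in at least H - r of the H per-table sets."""
    H = len(per_table_sets)
    counts = Counter(v for s in per_table_sets for v in set(s))
    return {v for v, c in counts.items() if c >= H - r}


# Value sets as they appear in the figure (assumed to be one set per hash table).
sets = [{2, 3, 5}, {2, 6, 9, 10}, {0, 2, 3}, {2, 3, 8, 10}, {3, 6, 7, 9}]

print(sorted(candidates_with_tolerance(sets, r=1)))  # [2, 3]
print(sorted(candidates_with_tolerance(sets, r=0)))  # []
```

With r = 1 the result {2, 3} matches the last value list in the figure, while a strict intersection (r = 0) would be empty.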
22
Bucket Index Matrix of Candidates
H = 5, r = 1, K = 2^12
For each x in I1, we can get B1(x), a vector of
the heavy bucket sets which x hashes to.
192.168.0.1
192.123.47.62
192... hash to the red heavy buckets
23
Prefix Extension Algorithm
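Path discovery algorithm
(Figure: candidates 150, 47 and 236 from I1, with bucket vectors B1, are matched against candidates 72 and 104 from I2, with bucket vectors B2. Prefixes shown: <150.72>, <47.72> (Ignore!), <236.104>. Label on a discarded path: "more than r = 1: Ignore!")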
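One extension step of this matching can be sketched in Python. Each candidate carries a vector of H sets holding the heavy buckets it is still consistent with, one set per hash table. A prefix is extended by intersecting the vectors table by table and is dropped when more than r tables end up empty. The bucket ids 0 and 1 and the vectors below are made-up toy data, chosen only so that the outcome resembles the figure.

```python
def extend_prefixes(prefixes, next_candidates, r):
    """Extend each prefix by one word, keeping paths with at most r empty tables.

    prefixes:        dict mapping a tuple of words -> list of H bucket-id sets
    next_candidates: dict mapping a word -> list of H bucket-id sets
    """
    extended = {}
    for prefix, prefix_vec in prefixes.items():
        for word, word_vec in next_candidates.items():
            merged = [a & b for a, b in zip(prefix_vec, word_vec)]
            misses = sum(1 for buckets in merged if not buckets)
            if misses <= r:  # more than r mismatching tables: ignore this path
                extended[prefix + (word,)] = merged
    return extended


# Toy data (hypothetical): H = 5 tables, heavy buckets labelled 0 and 1.
I1 = {
    (150,): [{0}, {0}, {0}, {0}, {0}],
    (47,): [{0}, {1}, {0}, {1}, {0}],
    (236,): [{1}, {1}, {1}, {1}, {1}],
}
I2 = {
    72: [{0}, {0}, {0}, {0}, {0, 1}],
    104: [{1}, {1}, {1}, {1}, {0}],
}

print(sorted(extend_prefixes(I1, I2, r=1)))  # [(150, 72), (236, 104)]
```

Empty sets stay empty in later steps, so mismatches accumulate as the prefix grows toward the full key.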
24
Prefix Extension Algorithm




25
Recap
(Figure: reversible sketch framework)
Streaming data recording: (key, value) -> IP mangling -> modular hashing -> value stored in reversible k-ary sketch
Heavy change detection: reversible k-ary sketch + change threshold -> reverse hashing -> reverse IP mangling -> heavy change keys
n is the size of key space
26
Outline
  • Background on k-ary sketch
  • Reversible sketch problem
  • Modular hashing
  • IP mangling
  • Reverse hashing
  • Evaluation
  • Conclusion

27
Evaluation
  • Dataset
  • A large US ISP (330M Netflow records)
  • NU (19M Netflow records)
  • Efficient data recording: for the worst-case
    traffic, all 40-byte packets
  • Software: 526 Mbps on a P4 3.2 GHz PC
  • Hardware: 16 Gbps on a single FPGA board
  • Only a few hundred KB to a couple of MB of memory
    used
  • Only 15 memory accesses per packet for 48-bit
    reversible sketches and 16 per packet for 64-bit
    reversible sketches
  • Efficient heavy change detection and key
    inference
  • 0.34 seconds for 100 changes, 13.33 seconds for
    1000 changes
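For a sense of scale, the worst-case throughput figures can be converted into packet rates with a little arithmetic. The conversion below is derived here, not taken from the slides, and it assumes that the 40 bytes per packet are the only bytes counted on the link.

```python
def packets_per_second(link_bps, packet_bytes=40):
    """Packet rate when the link carries only packets of the given size."""
    return link_bps / (packet_bytes * 8)


software_pps = packets_per_second(526e6)  # 526 Mbps in software
hardware_pps = packets_per_second(16e9)   # 16 Gbps on the FPGA

print(f"{software_pps:,.0f}")  # 1,643,750
print(f"{hardware_pps:,.0f}")  # 50,000,000
```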

28
Key Inference Accuracy
  • True positives and false positives of 16-bit
    reversible sketches for 32-bit IP addresses

Deltoids: S. Muthukrishnan and Graham Cormode,
"What's New: Finding Significant Differences in
Network Data Streams", Infocom 2004
29
More Results
  • Stress test with larger dataset: still accurate
  • Scalable to larger key space size: similar
    results for 64-bit IP pairs
  • Built anomaly/intrusion detection system to
    detect, e.g., SYN flooding and port scans [ICDCS
    2006]

30
Conclusions
  • Proposed the first reversible sketches which
  • Record high speed network streams online
  • Detect the heavy changes and infer the keys
    online
  • Small memory usage, small number of memory accesses
    per packet
  • Scalable to large key space size

31
Backup Slides
32
Related work
  • Compare with deltoids
  • Accuracy: better
  • Scalability to large key space: better
  • Number of memory accesses: fewer
  • PCF [IMC 2004]: not reversible
  • Q. Zhao et al. [IMC 2005], S. Venkataraman
    [NDSS 2005]: unique fan-out (fan-in) estimation

33
Modular Hashing
Optimal Hashing
34
Reversible sketch problem
  • However: not reversible
  • Lack of an inference API: INFERENCE(S, t)
  • Important function for anomaly detection!
  • Decouple the recording stage of sketches from the
    detection stage to enable efficient combination and
    inference.
  • Given a threshold t, report keys whose
    corresponding sums of updates are larger than the
    threshold.
  • Our contribution: an efficient algorithm for
    inference

36
Problem: Too many collisions
37
IP-mangling
  • Use GF (Galois Extension Field) function for
    attack resilience

38
Modular Hashing
Optimal Hashing
Modular Hashing with IP Mangling
39
Reverse Hashing for Each Module
Take the first word as an example
H = 5, r = 1, K = 2^12
All possible values of the first word for the
No. j heavy bucket in hash table i
All possible values of the first word in hash
table i
All possible values of the first word in the sketch
40
False positive reduction by verifying against the
original sketch
(Figure: candidate <150.72.182.75> is checked against the original k-ary sketch; estimate (<150.72.182.75>, 180), Threshold = 150; final result: <150.72.182.75>)
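The verification step can be sketched in Python as a simple filter. Each candidate key from reverse hashing is looked up in the original k-ary sketch and kept only if its estimate reaches the threshold. The `estimate` callable stands in for the sketch's ESTIMATE API, comparing on the absolute value is an assumption, and the second candidate key is made up for illustration.

```python
def verify(candidates, estimate, threshold):
    """Keep (key, estimate) pairs whose estimated change reaches the threshold."""
    verified = []
    for key in candidates:
        value = estimate(key)
        if abs(value) >= threshold:  # assumption: compare on magnitude
            verified.append((key, value))
    return verified


# Toy stand-in for ESTIMATE on the original sketch (values are illustrative).
toy_estimates = {"150.72.182.75": 180, "150.72.182.76": 12}

print(verify(toy_estimates, toy_estimates.get, threshold=150))
# [('150.72.182.75', 180)]
```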
41
K-ary sketch [Krishnamurthy, Sen, Zhang, Chen,
2003]
  • First to detect flow-level heavy changes in
    massive data streams at network traffic speeds
  • APIs
  • UPDATE(S, k, u): T_j[h_j(k)] += u (for all j)
  • ESTIMATE(S, k): sum of updates for key k
  • Linear combination: S = COMBINE(a, S1, b, S2)