Learning Communication Rules - PowerPoint PPT Presentation

About This Presentation
Title:

Learning Communication Rules

Description:

Recap: eXpose Mines for Rules. Learn all significant rules without prior knowledge ... Algorithms to mine and prune. Empirical validation on enterprise traces ...

Slides: 32
Provided by: Srikanth5

Transcript and Presenter's Notes

1
Learning Communication Rules
Srikanth Kandula
Ranveer Chandra and Dina Katabi
2
Network Admins. are Groping in the Dark
  • Focus on Traffic Volume
  • TCP 80%, HTTP 30%
  • Adapt report categories (e.g., AutoFocus)
  • Much traffic from ports 500-600
  • But, What's Going On?
  • Traffic follows plan?
  • Misconfigurations
  • Suspicious Traffic

Besides focusing on volume, learn rules
underlying the traffic
  • (Active) user browsing web, reading/sending mail
  • (Automatic) SMS scan on a network, Outlook refresh

3
[Figure: timeline t with occurrences of flows X and Y]
Whenever flowY happens, flowX is likely to occur
Rule: flowY ⇒ flowX
(http ⇒ DNS)
If you could learn such rules directly from a trace,
  • Infer the actual behavior of applications
  • AFS root servers direct traffic to volume servers
    evenly
  • mail to the incoming MX is forwarded on to group
    MXes
  • Notice misconfigurations and badness
  • these clients should not be talking on known
    command-control ports
  • this server should not be responding to DHCP
    requests
  • this mail server should not attempt connections to
    non-existent MXes

4
Report all significant rules with no specific
knowledge about a trace
5
Mining for Rules is Hard
  • How to define significance?
  • When is a group of flows interesting enough to
    report?
  • Avoid observer bias but cannot evaluate
    everything
  • Focus on one server, miss what you are not
    looking for
  • Practical, deal with noise, search quickly
  • eXpose
  • A scoring function for significance
  • Heuristics that bias search toward high hit-rate
  • Empirical validation on enterprise traces

6
Overview
[Figure: Packet Trace → Activity Matrix (rows time1 ... timeR, columns flow1 ... flowK) → Rules]
  • Packet trace to Activity Matrix
  • Rows are 1s windows; columns are flows
  • Is the flow active in [time(i-1), time(i))? (at least
    one packet)
  • Association rule mining (X,Y are r.v. for
    columns)
  • Need not worry about interleaving
  • Dependencies are at these time-scales (an rtt, a
    server response)

All windows in the [.25s, 2s] range yield similar rules
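A minimal sketch of the packet-trace-to-activity-matrix step, assuming packets are available as (timestamp in seconds, flow id) pairs and that windows are anchored at the first packet. The name `activity_matrix`, the input layout, and the anchoring choice are illustrative assumptions, not details from the slides.

```python
def activity_matrix(packets, window=1.0):
    """Build a 0/1 activity matrix from (timestamp, flow_id) pairs.

    Returns (flows, rows): rows[i][j] is 1 if flows[j] sent at least one
    packet in the half-open window [start + i*window, start + (i+1)*window).
    """
    packets = list(packets)
    if not packets:
        return [], []
    start = min(t for t, _ in packets)   # assumption: first packet opens window 0
    end = max(t for t, _ in packets)
    flows = sorted({f for _, f in packets})
    col = {f: j for j, f in enumerate(flows)}
    n_rows = int((end - start) // window) + 1
    rows = [[0] * len(flows) for _ in range(n_rows)]
    for t, f in packets:
        rows[int((t - start) // window)][col[f]] = 1
    return flows, rows


# Hypothetical trace: flow ids are plain strings here for brevity.
trace = [(0.1, "A"), (0.5, "B"), (1.2, "A"), (3.0, "B")]
flows, rows = activity_matrix(trace, window=1.0)
```

Only the `window` argument changes when trying other window sizes, which makes it easy to compare the rules mined at different time-scales.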
7
Which Rules are Significant?
X ⇒ Y
  • High Joint Probability?
  • X, Y may occur very often individually (e.g.,
    breeze, sun shining)
  • High Conditional Probability?
  • Say Y occurs only when X does, but both are rare
    (lottery, buy a jet)

8
Which Rules are Significant?
X ⇒ Y
  • High Joint Probability?
  • High Conditional Probability?
  • We use mutual information (combines the two)

Measures fraction of change in Y due to X
Score = 0, if Y is independent of X
Score = Max, if Y is fully dependent on X
Trades off dependency and frequency
Encodes directionality
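The properties listed above can be illustrated with a small sketch. The slides do not give the formula; the backup slides name a "Modified JMeasure", so the code below uses the standard, unmodified JMeasure as a stand-in. The function name `jmeasure` and the column representation (equal-length 0/1 lists) are assumptions for illustration.

```python
import math


def _term(p_joint, p_cond, p_marg):
    # One summand of the score; 0 * log(0) is treated as 0.
    if p_joint == 0 or p_cond == 0 or p_marg == 0:
        return 0.0
    return p_joint * math.log2(p_cond / p_marg)


def jmeasure(x, y):
    """Standard JMeasure score for the rule X => Y over 0/1 columns x, y."""
    n, n_x = len(x), sum(x)
    if n == 0 or n_x == 0:
        return 0.0
    n_xy = sum(1 for a, b in zip(x, y) if a and b)
    p_y = sum(y) / n            # P(Y)
    p_xy = n_xy / n             # P(X and Y)
    p_x_not_y = (n_x - n_xy) / n  # P(X and not Y)
    p_y_given_x = n_xy / n_x    # P(Y | X)
    return (_term(p_xy, p_y_given_x, p_y)
            + _term(p_x_not_y, 1 - p_y_given_x, 1 - p_y))
```

With this definition, independent columns score 0, identical columns score highest for a given frequency, and `jmeasure(x, y)` generally differs from `jmeasure(y, x)`, which is the directionality the slide refers to.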
9
Modifying Scores for Networking
  • Negative Correlation
  • Flows with little overlap

P(¬Y|X) ≈ 1 leads to high score
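The negative-correlation effect can be checked numerically. The sketch below splits a JMeasure-style score into its co-occurrence term and its "X without Y" term; two flows that never overlap still get a high total from the second term. Keeping only the first term is one possible remedy and is an assumption here, since the slides do not state the exact modification. The name `score_terms` is hypothetical.

```python
import math


def score_terms(x, y):
    """Return (positive_term, negative_term) of a JMeasure-style score for X => Y.

    positive_term comes from windows where X and Y are both active,
    negative_term from windows where X is active and Y is not.
    """
    n, n_x = len(x), sum(x)
    if n == 0:
        return 0.0, 0.0
    n_xy = sum(1 for a, b in zip(x, y) if a and b)
    n_x_only = n_x - n_xy
    p_y = sum(y) / n
    pos = neg = 0.0
    if n_xy:
        pos = (n_xy / n) * math.log2((n_xy / n_x) / p_y)
    if n_x_only:
        neg = (n_x_only / n) * math.log2((n_x_only / n_x) / (1 - p_y))
    return pos, neg


# Two flows that never overlap: the whole score comes from the negative term.
pos, neg = score_terms([1, 1, 0, 0], [0, 0, 1, 1])
```

A scorer that reports only `pos` would assign these two flows a score of zero, while flows that co-occur keep their score.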
10
Modifying Scores for Networking
  • Negative Correlation
  • Flows with little overlap
  • Long Running Flows
  • Large downloads, ssh/remote desktop
  • Trivial overlaps with long flow
  • Distinguish new vs. present
  • Present rules reported only if small mismatch in
    freq.
  • Too Many Possibilities
  • Bias, focus on pairs with at least one common IP
  • Miss rules, but hit-rate up 1000x and costs down
    10x

P(Y|X) ≈ 1
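The search-space bias (only consider pairs of flows with at least one common IP) might be sketched as follows. The flow layout `(ip_a, port_a, ip_b, port_b)` and the name `candidate_pairs` are assumptions for illustration; the slides do not show how flows are represented.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(flows):
    """Return the set of flow pairs that share at least one IP address.

    Flows are (ip_a, port_a, ip_b, port_b) tuples. Indexing by IP avoids
    enumerating every pair of flows in the trace.
    """
    by_ip = defaultdict(set)
    for f in flows:
        by_ip[f[0]].add(f)
        by_ip[f[2]].add(f)
    pairs = set()
    for group in by_ip.values():
        # Sorting gives each pair a canonical order, so a pair that shares
        # two IPs is still stored once.
        for f, g in combinations(sorted(group), 2):
            pairs.add((f, g))
    return pairs
```

Pairs with no endpoint in common are never scored, which is where the missed rules come from, and also where the savings in time and memory come from.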
11
Generics
  • - Miss, if no client accesses server often
  • Rules that abstract away parts of a flow

[Figure: Client, Server, Database]
Client Server ⇒ Server Database
Client Server ⇒ Server Database (any client)
[Figure: Client, Kerberos, Reservation]
Client Rsrv. ⇒ Client Kerberos
Client Rsrv. ⇒ Client Kerberos (any client, but same on both sides)
  • To do this automatically,
  • what to abstract? (IP addresses at non-server
    port)
  • which pairs to consider for rule?
  • flows match IP, generics match abstracted IP

12
Mining for Rules
  • Techniques extend to arbitrary sized rules
  • Instead,
  • Focus on pair-wise rules (simpler is likelier)
  • Group similar rules
  • Eliminate weak rules between strongly connected
    groups
  • Transitive closure to read off clusters

O(f^2)
O(fn1)
Recursive Spectral Partitioning (VKV00)
[Figure labels: Rule Score, Rule Mining]
Digests 10^5-10^6 flows into 10^2-10^3 rule clusters
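The last step, reading clusters off by transitive closure, can be sketched with a union-find over the flows that appear in surviving rules. A plain score threshold stands in for the pruning step here; that is a simplification and not the recursive spectral partitioning the slide cites. The name `rule_clusters` and the `(flow_x, flow_y, score)` rule layout are assumptions.

```python
def rule_clusters(rules, min_score=0.0):
    """Group flows into clusters connected by rules scoring >= min_score.

    rules: iterable of (flow_x, flow_y, score). Returns a list of sets.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for x, y, score in rules:
        if score >= min_score:          # weak rules are dropped
            parent[find(x)] = find(y)   # merge the two clusters

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())
```

Each returned set is one rule cluster: every flow in it is reachable from every other through a chain of retained pair-wise rules.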
13
Recap: eXpose Mines for Rules
[Figure: Packet Trace → Activity Matrix (rows time1 ... timeR; columns flow1 ... flowK, with present and new entries) → Rules (flowi.new ⇒ flowj.present ...) → Rule Clusters]
Contributions
  • Learn all significant rules without prior
    knowledge
  • Scoring function for rule significance
  • Avoids observer bias, yet stays feasible by
    focusing on high hit-rate
  • Algorithms to mine and prune

14
Related Work
  • Semi-Automated Discovery of App. Session
    Structure (KJPK06)
  • Sherlock (Diagnosing Performance Problems,
    BCGKMZ07)
  • AutoFocus (ESV03)
  • BLINC (KPF05)
  • Stepping Stones (ZP00)
  • Learn all significant rules without prior
    knowledge
  • Avoids observer bias, yet stays feasible by
    focusing on high hit-rate
  • Scoring function for rule significance
  • Algorithms to mine and prune

15
Results
16
Evaluation Setup
[Figure: trace collection points: Inside Microsoft; Before CSAIL's Servers; Access Link of Conf. LANs; CSAIL's Access]
  • Traces at access and internal server-facing links
  • Packet Headers, Connection Records (Bro), some
    anon.
  • Operational n/w with ~10^3 clients, diverse
    traffic mix
  • Corroborated on test-bed traffic vetted by
    admins.
  • Ran eXpose on a 2.4GHz x86 with 8GB RAM

17
Rules Discovered by eXpose
  • Dependencies for Major Applications

email @ microsoft
Client. PFS1.X
Client. PFS2.X
Client. Proxy.80
Client. DC.88
Client. Mail.X
Client. Mail.135
18
Rules Discovered by eXpose
  • Dependencies for Major Applications

afs @ csail
AFS1.7000 Root.7002
C.7001 .
C.7001 AFS2.7000
C.7001 Root.7003
C.7001 AFS1.7000
19
Rules Discovered by eXpose
  • Dependencies for Major Applications
  • web, e-mail, file-servers, IM, print, video
    broadcast

web @ microsoft
Proxy3.80 .
Proxy2.80 .
Proxy1.80 .
Proxy4.80 .
20
Rules Discovered by eXpose
  • Dependencies for Major Applications
  • web, e-mail, file-servers, IM, print, video
    broadcast
  • Configuration Errors & Other Badness

smtp IDENT @ csail
Client.113 MailServer.
Client. MailServer.25
21
Rules Discovered by eXpose
  • Dependencies for Major Applications
  • web, e-mail, file-servers, IM, print, video
    broadcast
  • Configuration Errors & Other Badness
  • IDENT, Legacy emails, ssh scans, wingate

Legacy email ids @ csail
UnivMail. Old1.25
UnivMail. Old3.25
UnivMail. Old2.25
22
Rules Discovered by eXpose
  • Dependencies for Major Applications
  • web, e-mail, file-servers, IM, print, video
    broadcast
  • Configuration Errors & Other Badness
  • IDENT, Legacy emails, ssh scans, wingate
  • Rules for stuff we didn't know before

Nagios monitors @ csail
23
Rules Discovered by eXpose
  • Dependencies for Major Applications
  • web, e-mail, file-servers, IM, print, video
    broadcast
  • Configuration Errors & Other Badness
  • IDENT, Legacy emails, ssh scans, wingate
  • Rules for stuff we didn't know before
  • Nagios, LLMNR, iTunes

Link-local multicast name resolution @ hotspots
H.137 Wins.137
H. Multicast.5355
H. DNS.53
Black box: Little prior knowledge about servers,
applications, or users ⇒ Can evolve
24
Correctness & Completeness
  • False Positives
  • 13 of rule-clusters in CSAIL trace, we couldn't
    explain
  • False Negatives
  • Main CSAIL Web Server (too many different
    activities)
  • Dependencies on Personal Web Pages (too little
    traffic)
  • PlanetLab Traffic (punted)
  • Other Limitations
  • IPSec, Anonymized, Cover Traffic
  • Extensions
  • Rules repeat over time, and across traces
  • Application whitelisting, Customize Generics

25
Time to Mine for Rules
[Figure: time to mine for rules per trace; Flows (x 10^6): .6, .2, .6, .9, 2.8]
At CSAIL's access link, high fan-out with many
distinct flows
Stream Mining Appears Feasible!
26
eXpose
Packet Trace
Rules for frequently reoccurring flow sets
  • Learn all significant rules with no specific
    knowledge
  • Avoids observer bias, but feasible by focusing on
    high hit-rate
  • Scoring function for rule significance
  • Algorithms to mine and prune
  • Empirical validation on enterprise traces
  • found configurations & protocols that we didn't
    know existed
  • learnt rules for actual behavior of applications
  • found config. errors, bot scans, infected
    machines

http://research.microsoft.com/srikanth
27
Backup
28
Expanding Search Space (# of flows)
[Figure: # of Discovered Rules vs. Rule Score (Modified JMeasure)]
exposes few significant rules!
29
Expanding Search Space (# of flows)
[Figure: Time to Mine Rules (s) and Memory Footprint (million rules) vs. Top Active Flows]
exposes few rules; costs a lot in time, memory
30
Varying Size of Time Windows
[Figure: # of Discovered Rules vs. Rule Score (Modified JMeasure)]
All window sizes in [.25s, 2s] produce similar rules!
31
For all rules X ⇒ Y
[Figure: Joint Probability, Prob. (X), Prob. (Y)]