Network Anomalies: Origins, Structure and Diagnosis

1
Network Anomalies: Origins, Structure and Diagnosis

Anukool Lakhina
Oral Qualifier Exam

Committee: Azer Bestavros, John Byers, Mark Crovella (advisor)
2
Defining Anomalies

anomaly [Merriam-Webster, 2004]:
deviation from the common rule; an irregularity
something different, abnormal, peculiar, or not easily classified
Anomalies are unusual events, relative to some baseline normal state
Relative events → difficult to pin down precisely

3
Network Anomalies

We want to capture unusual events that network
operators care about
Network anomalies are unexpected events that can
adversely impact the availability and
performance of networks

4
Examples
Survey of problem reports to NANOG:
1994-99: 241 serious problems reported
2001-04: 375 of the same problems reported
Routing loops, instability, hijacks, filtering, blackholes, outages, etc. have not reduced in 10 years! [Feamster04]
5
More Examples
6
Talk Organization

Origin of Anomalies
How and why do anomalies arise?
Structure of Anomalies
How do anomalies differ?
How can we organize them?
Diagnosing Anomalies
Strategies to detect, identify, and mitigate
anomalies
This talk: focus on detection and identification

7
Origin of Anomalies

Operational events
Misconfigurations, accidents, ...
Design/implementation consequences
Software bugs, unexpected protocol interactions, ...
Network abuse (malicious intent)
DOS attacks, worms, intrusions, ...
Unusual end-user behavior (not malicious)
Flash crowds, peer-to-peer traffic, ...

8
Operational Anomalies

Broadly arise due to operator error and accidents
Famous example: AS 7007 incident
In 1997, a small ISP announced routes for the entire Internet
Most ISPs believed the announcements
Result: all Internet traffic sent to one router for several hours, disrupting Internet connectivity to hundreds of networks [Farrow02]
BGP configuration errors are pervasive [Mahajan02]
Cause erroneous updates to 0.1-1% of the BGP table each day; 4% of these can cause an outage
Many errors are due to primitive router configuration
A distributed program without the ability to compile and test [Feamster03, Caldwell03]

9
Operator error is not exclusive to the Internet ...
Operator error is the single largest contributor to failures in Tandem transaction processing systems, accounting for 42% of all failures [Gray85]
10
... and is prevalent in yet more domains

Dominant in large Internet services [Oppenheimer03]
Studied failure data from 3 geographically distributed Internet services
51% of failures due to operator error
Dominant in Public Switched Telephone Networks
[Kuhn97] studied FCC disruption reports from 1992-1994
50% of outages due to operator error
54% in 2000 [Enriquez02]

[Kuhn97]
[Enriquez02]
11
Reasons for Human Error

Humans make errors, even when they know what they are doing
Because understanding the state of large, tightly-coupled systems is difficult
Humans are not good at diagnosing problems from first principles
Especially in an emergency
Automation solves common tasks, leaving the rare, complex ones for operators
Automation irony: poor automation reduces system visibility → harder for operators to diagnose anomalies
Lessons: errors will always occur; automation should work in synergy with the operator

[Reason90]
12
Design and Implementation Flaws

Anomalies that arise due to implementation bugs or flawed design. Examples:
1988: Internet congestion collapse [Jacobson88]
2004: Unexpected protocol interactions, e.g., interaction between inter-domain and intra-domain routing
Design goal: isolate the Internet from routing changes within an AS
Reality: small changes in internal routing weights can cause large traffic shifts in neighbor networks
Result: operators set IGP metrics that make BGP sense, rather than viewing them separately → more complexity
Lesson: routing protocols were not designed with interactions, network design configurations, and dynamics in mind
[Teixeira04, Teixeira05]

13
Reasons for Design Flaws

Latent design errors are exposed in stress conditions
Congestion collapse, routers reboot when overloaded
Correlated failures frequently occur
Evidence in IP networks [CBI04], in Internet-scale distributed services [Yalagalunda04]
Traditional fault-tolerant designs assume independent failures
Margin of safety (in civil engineering)
25% of all railroad bridges failed between the 1850s and 1890s! [Petroski92]
What is the equivalent of a margin of safety for computer systems and networks? [Patterson02]

[Perrow90]
[Petroski92]
14
Abuse Anomalies

Defining characteristic: arise from malicious intent and violate the target's confidentiality, integrity, and availability [RS91]
Examples
Intrusions target confidentiality
Route hijacks target integrity
DOS attacks target availability

15
Reasons for Abuse Anomalies

Technical enablers
Unrestricted connectivity, platform homogeneity, anonymity, few defenses
Attacker uses automation to target all systems at once; defender must defend all systems
Traditional threat: attacker targets a high-value target; defender allocates more resources to defend it
Economic motivations
Profit: spam forwarding, extortion, ...
Emerging marketplace: can buy and sell zombie machines
Bad guys now have a financial incentive to get better [Savage05]
Political reasons
Cyber-terror, cyber-warfare, political protest [Weaver04]

16
Unusual End-User behavior

Not malicious in intent but can have harmful
impact on availability and performance
Important to manage these anomalies to provision
network resources
e.g., high-rate flows, peer-to-peer traffic, measurement experiments, flash crowds
Flash crowds: unusual demand for a resource
e.g., the Starr report, the Slashdot effect, ...

17
Flash Crowds hit MSNBC.com
MSNBC.com homepage during a flash crowd:
"MSNBC is experiencing high site traffic. We have temporarily moved your personalized news to a separate page: click here"
18
Lessons from Anomaly Origins

Network anomalies span a broad range, are
prevalent, and can have catastrophic impact
Anomalies are here to stay; new anomalies will arise
Cannot prevent anomalies, but can try to accommodate them
Despite anomalies, the PSTN still had 99.999% availability. Two reasons:
1) Error detection built into the design: designers devote half of the software in telephone switches to error detection and correction [Kuhn97]
2) Quick and correct human intervention (and capabilities to intervene)
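
To put the five-nines (99.999%) availability figure in perspective, it helps to convert it into allowed downtime. The sketch below is only a back-of-the-envelope calculation: the function name and the 365-day year are assumptions made for illustration, not something taken from [Kuhn97].

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def downtime_minutes_per_year(availability):
    """Return the yearly downtime (in minutes) implied by an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


# Five nines leaves a budget of roughly five minutes per year.
print(round(downtime_minutes_per_year(0.99999), 2))
```

By the same arithmetic, three nines (99.9%) would allow almost nine hours of downtime per year, which shows how demanding the PSTN target is.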

19
Talk Organization

Origin of Anomalies
How and why do anomalies arise?
Structure of Anomalies
How do anomalies differ?
How can we organize them?
Diagnosing Anomalies
Strategies to detect, identify, and mitigate
anomalies
This talk: focus on detection and identification

20
A Common Feature: Anomalies Create Unusual Network Traffic

Despite their diversity, many serious anomalies create unusual traffic
e.g., DDOS attacks, flash crowds, outages, scans, worms
Some anomalies may not affect traffic until they become serious, e.g., dormant configuration or design errors that result in an outage event
Challenges
How do we mine anomalies in traffic?
What type of traffic should we analyze?

[Figure labels: All Anomalies; Anomalies visible in traffic]
21
Anomalies by Layer

Application Layer: generates and interprets data
Transport Layer: reliable data transfer (TCP)
Network Layer: address assignment and routing (BGP, IGP, ...)
Physical Layer: MAC addressing and bit transmission

Layered TCP/IP Model
22
Abuse Examples, by Layer

DDOS floods
TCP attacks
Route hijacks
MAC flooding

MAC flooding
Overwhelm the address-to-physical-port mappings at a switch with spoofed packets
Switch enters fail-open mode → all incoming traffic is now broadcast
Result: eavesdropping on legitimate network traffic
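
The MAC flooding mechanism above can be illustrated with a toy model of a learning switch. This is a simplified sketch, not a description of any real device: the `ToySwitch` class, its table size, and the address strings are all hypothetical, and the model only broadcasts frames whose destination it could not learn, whereas real switches differ in how they behave once the table is full.

```python
class ToySwitch:
    """Hypothetical learning switch with a bounded MAC-to-port table."""

    def __init__(self, table_size):
        self.table_size = table_size
        self.table = {}  # maps source MAC address -> port number

    def receive(self, src_mac, dst_mac, in_port):
        """Learn the source address if there is room, then pick a forwarding action."""
        if src_mac in self.table or len(self.table) < self.table_size:
            self.table[src_mac] = in_port
        if dst_mac in self.table:
            return f"forward to port {self.table[dst_mac]}"
        return "broadcast"  # unknown destination: send out of every port


switch = ToySwitch(table_size=4)
# An attacker on port 9 fills the table with spoofed source addresses.
for i in range(4):
    switch.receive(f"spoofed-{i}", "nobody", in_port=9)
# The legitimate host A on port 1 can no longer be learned ...
switch.receive("host-A", "host-B", in_port=1)
# ... so a frame addressed to A is broadcast, where an eavesdropper can see it.
print(switch.receive("host-B", "host-A", in_port=2))
```

In this model the final frame is broadcast rather than forwarded to port 1, which is the eavesdropping opportunity the slide describes.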

Deny service to the victim by 1) overwhelming a resource (DOS, MAC), 2) masquerading as a resource (hijacks), 3) timing attacks (TCP attacks, RoQ)
RoQ attacks are timing-based attacks that target adaptation mechanisms instead of the victim directly, and are thus capable of attacking at multiple layers [Guirguis04, Guirguis05]
23
Anomaly Origin and Structure
24
Examples by Origin and Structure
25
Multi-Layer Propagation

Anomalies travel across layers
Each layer has containment capability (e.g., checksums)
Going up: if error checks fail to contain anomalies, e.g., link failures
Going down: e.g., router reboots on overload; the Nimda worm caused routing instability [Andersen04, Wang02]
Multi-layer anomalies
Anomalies at multiple layers
e.g., spammers hijack route prefixes (network layer) to send spam at the application layer [Bellovin01]
Makes diagnosis challenging, especially root-cause analysis

26
Talk Organization

Origin of Anomalies
How and why do anomalies arise?
Structure of Anomalies
How do anomalies differ?
How can we organize them?
Diagnosing Anomalies
Strategies to detect, identify, and mitigate
anomalies
This talk: focus on detection and identification

27
Anomaly Diagnosis

Key ingredients of anomaly diagnosis
Detection: stating when an anomaly has occurred or is occurring
Identification: isolating the anomaly from normal, stating its type, and where possible, exposing its structure and origin
Diagnosis also includes
Mitigation: avoiding, managing, and controlling the adverse impact of anomalies
Lots of research here, most centered on re-design of protocols and architectures
Not in this talk

28
Identification Approaches

Anomaly identification: given a detected anomaly, what is its structure and cause?
Problems
Origin and structure are typically not observable
Evidence is partial, ambiguous, inconsistent
Strategies
Model-driven: e.g., traversing system dependency models
Rule-based: e.g., expert systems, case-based reasoning
Learning-based: e.g., clustering and classification

29
Detection Approaches

1) Anomalies as known signatures to match
Key challenge: define a broad set of signatures (without causing false alarms)
Advantage: identification for free
Problem: cannot detect new anomalies
2) Anomalies as deviations from normality
Key challenge: define a notion of normality
Advantage: can detect new anomalies
Problem: identification is difficult
3) Hybrid schemes
Match the anomaly against known signatures; if there is no match, then check for deviations from normality

this talk
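
The hybrid scheme in item 3 can be sketched in a few lines. This is an illustration under stated assumptions, not a method from the talk: the signature names, the observation fields (`syn_ratio`, `distinct_ports`, `packets`), and the three-standard-deviation threshold are all hypothetical choices.

```python
from statistics import mean, stdev

# Hypothetical signatures: a name plus a predicate over one observation.
SIGNATURES = {
    "syn-flood": lambda obs: obs["syn_ratio"] > 0.9,
    "port-scan": lambda obs: obs["distinct_ports"] > 1000,
}


def diagnose(obs, history, threshold=3.0):
    """Return (detected, label) for one observation.

    obs     -- dict with 'syn_ratio', 'distinct_ports', and 'packets'
    history -- past 'packets' counts assumed to represent normal traffic
    """
    # Step 1: signature matching, which identifies the anomaly for free.
    for name, matches in SIGNATURES.items():
        if matches(obs):
            return True, name
    # Step 2: no signature matched, so test for deviation from normality.
    mu, sigma = mean(history), stdev(history)
    if sigma > 0 and abs(obs["packets"] - mu) / sigma > threshold:
        return True, "unknown"  # detected, but not identified
    return False, None


history = [100, 104, 98, 101, 97, 102]
print(diagnose({"syn_ratio": 0.95, "distinct_ports": 3, "packets": 100}, history))
print(diagnose({"syn_ratio": 0.10, "distinct_ports": 3, "packets": 500}, history))
```

The first call matches a signature and is labeled immediately; the second matches nothing but is still detected as a deviation, with the label "unknown" reflecting the harder identification problem noted under approach 2.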
30
Deviation from normal
"The model is based on the hypothesis that exploitation of a system's vulnerability involves abnormal use of the system; therefore, security violations could be detected from abnormal patterns of system usage."
[Denning86]
31
Capturing normal behavior

Broadly, three strategies to model normal
behavior
System-knowledge models
Use only a priori knowledge of system
Useful in static settings when normal behavior
is known (e.g., configuration correctness
checking)
Data-driven models
Use measurements to model normality via
correlation in data, e.g., using temporal or
spatial correlation
Useful in dynamic settings or when normal is
unknown (e.g., for traffic anomalies)
Hybrid schemes
Use measurements as input to a system model, and
predict expected behavior, e.g., via dependency
graphs
Useful when normal behavior for system is known,
but workload is unknown (e.g., vulnerability
filters for end-hosts)
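
As one concrete instance of a data-driven model that relies on temporal correlation, the sketch below forecasts each measurement with an exponentially weighted moving average (EWMA) and flags values that fall far from the forecast. The slides do not name a specific technique, so EWMA is a stand-in chosen here; the function name, smoothing factor, threshold, and warm-up length are assumptions.

```python
def ewma_anomalies(series, alpha=0.2, threshold=3.0, warmup=5):
    """Return the indices whose values deviate strongly from an EWMA forecast.

    series    -- measurements in time order (e.g., byte counts per time bin)
    alpha     -- smoothing factor; larger values adapt to change faster
    threshold -- number of estimated standard deviations that counts as unusual
    warmup    -- number of initial points used only to train the baseline
    """
    forecast = series[0]
    variance = 0.0
    flagged = []
    for i, value in enumerate(series[1:], start=1):
        error = value - forecast
        std = variance ** 0.5
        if i >= warmup and std > 0 and abs(error) > threshold * std:
            flagged.append(i)
            continue  # do not let the anomaly contaminate the baseline
        # Update the running forecast and the running variance of its error.
        forecast += alpha * error
        variance = (1 - alpha) * (variance + alpha * error ** 2)
    return flagged


traffic = [100, 102, 99, 101, 100, 99, 101, 300, 100, 99]
print(ewma_anomalies(traffic))  # flags index 7, the spike to 300
```

No a priori knowledge of the system is needed here: "normal" is whatever the recent measurements look like, which is why this style of model suits dynamic settings such as traffic.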

32
Measurements by Layer

Measurements placed in a layer if they are
available at that layer (and useful to diagnose
anomalies)

33
How methods use data to capture normal
A broad categorization of methods
34
Measurements by Location

Measure at an individual end-host (edge)
Detailed payloads possible
Limited global visibility, limited mitigation
e.g., network intrusion detection systems [Paxson99]
Measure at multiple end-hosts (overlay)
Detailed payloads that are exchanged
Better global visibility, but still limited mitigation
e.g., collaborative intrusion detection [Yegneswaran04]
Measure at the core (ISP)
Sampled packet headers
Network-wide visibility, effective mitigation possible
But mining network-wide traffic is difficult [LCD05]

35
Recent Trends in Diagnosis

Network-wide diagnosis
Exploit correlation across links in order to build models of normal traffic and trace how anomalies move in the network [LCD04, LCD05]
Multi-layer diagnosis
Correlate traffic data from multiple layers simultaneously [Roughan04]
Enables sophisticated identification and root-cause analysis
Also a recent thrust in the fault management literature [Steinder04]
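
The network-wide idea above, using correlation across links to define normal traffic, can be illustrated with a deliberately simple sketch. It is not the method of [LCD04, LCD05]: it is a stand-in that assumes most links rise and fall together, estimates that shared trend with a median across links, and flags links that depart from it. The function name, the threshold, and the sample data are hypothetical, and traffic counts are assumed to be strictly positive.

```python
from statistics import median


def network_wide_anomalies(traffic, threshold=0.5):
    """Return (time_bin, link) pairs that depart from the network-wide trend.

    traffic   -- list of rows, one per time bin; each row holds one count per link
    threshold -- relative departure from the shared trend that counts as unusual
    """
    n_links = len(traffic[0])
    # Typical level of each link over the whole measurement window.
    typical = [median(row[link] for row in traffic) for link in range(n_links)]
    flagged = []
    for t, row in enumerate(traffic):
        # Level of each link in this bin, relative to its own typical level.
        relative = [row[link] / typical[link] for link in range(n_links)]
        # Trend shared by most links; the median ignores a single outlying link.
        trend = median(relative)
        for link in range(n_links):
            if abs(relative[link] / trend - 1.0) > threshold:
                flagged.append((t, link))
    return flagged


traffic = [
    [100, 200, 50],
    [150, 300, 300],  # link 2 spikes while the others follow the shared trend
    [200, 400, 100],  # every link doubles together: a network-wide rise
    [120, 240, 60],
    [110, 220, 55],
]
print(network_wide_anomalies(traffic))  # flags only time bin 1 on link 2
```

Note that time bin 2, where every link doubles together, is not flagged. A per-link detector might raise an alarm there, whereas a model built across links treats the common rise as normal.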

36
Final thoughts

Network anomalies span a wide range and can have
severe impact
Network anomalies are increasing in prevalence
and unlikely to go away
Effective diagnosis and mitigation methods are
needed to manage anomalies
Despite their diversity, many anomalies disturb
network traffic
General anomaly diagnosis may be possible by
mining anomalies in network traffic at different
layers and topological locations