Title: Research Challenges in Passive Network Measurement
1Research Challenges in Passive Network Measurement
- Jim Kurose
- Dept. Computer Science
- University of Massachusetts
- Amherst MA
- http://www.cs.umass.edu/kurose
2Passive Network Measurement
Goal of this talk: passive network measurement, research themes, and implications for measurement platforms
3Overview
- Single node
- general architecture
- protocol/traffic studies
- computing challenges
- anonymization
- compression
- counting
- scanning for strings
- themes
- Multiple nodes
- distributed monitoring
- distributed detection
- on-going efforts
- themes
- Discussion, summary
4Why monitor/measure?
- understanding protocols, applications in the wild
- in recent IMC/IMWs: BGP, OSPF, TCP, DNS, P2P, NAT, DHCP, RealVideo, games, chat, alternate routing
- traffic characterization
- provisioning (TM estimation)
- bottleneck analysis: where do bottlenecks occur?
- meeting service requirements: VoIP [Iannaccone 2004]
[Figure: CDF of outage length (min); recovery times]
5Why monitor/measure?
- network management/control, security
- heavy-hitter identification: usage-based charging
- better management: on beyond SNMP
- anomaly detection: e.g., large number of sessions established to/from IP address (port scans, attacks), large changes in traffic
- worm detection: finding patterns in packets
Arguably the largest-impact application, but still a research (not operational) scenario
6A generic monitoring node configuration
[Figure: monitoring node with labels: monitored links (two), NPs, Gigabit Ethernet switch, data indexing / query processing, SCSI RAID, external communication]
7What to measure/monitor/compute in NP?
- anonymization of IP header (next talk, by Tilman Wolf)
- flow-based compression of stored packet headers
- one record/flow
- only store changing fields in packet header
- 3-4 times compression [Iannaccone 2001, Peuhkuri 2001]
- need to solve flow identification problem
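A minimal sketch of the flow-based compression idea above: one record per flow, with only the changing header fields stored per packet. The field names (`src`, `dst`, `sport`, `dport`, `proto`) and the dictionary packet representation are assumptions made for illustration; they are not taken from the cited work, and real schemes also delta-encode the changing fields.

```python
# Fields assumed to identify a flow; they are constant for all of its packets.
FLOW_KEY = ("src", "dst", "sport", "dport", "proto")

def compress(packets):
    """Group packet headers by 5-tuple; constant fields are stored once per flow."""
    flows = {}
    for pkt in packets:
        key = tuple(pkt[f] for f in FLOW_KEY)
        # The per-packet record keeps only fields outside the flow key.
        varying = {f: v for f, v in pkt.items() if f not in FLOW_KEY}
        flows.setdefault(key, []).append(varying)
    return flows

def decompress(flows):
    """Rebuild full headers, grouped by flow (order within each flow is kept)."""
    out = []
    for key, records in flows.items():
        static = dict(zip(FLOW_KEY, key))
        out.extend({**static, **rec} for rec in records)
    return out
```

The lookup of `key` in `flows` is the flow identification problem in miniature: at line rate it has to be done per packet, which is why it is listed as a challenge.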
8Computing Traffic Statistics at High Speeds
- Question: how many unique flows observed?
- Challenge: backbone routers have million(s) of flows
- Naïve solution: counters
- Better solution: approximate counting, hashing to bitmaps [Estan 2003]
- An illustrative example of data streaming algorithms, cleverly trading accuracy for speed.
Related work
- identifying, counting elephant flows [Estan 2002, Golab 2003]
- counting packets in each flow [Chandra 2003]
- statistical sampling
9Bitmap counting: direct bitmap
Set bits in the bitmap using a hash of the flow ID of incoming packets
HASH(green) = 10001001
Slides courtesy of C. Estan, G. Varghese
10Bitmap counting: direct bitmap
Different flows have different hash values
HASH(blue) = 00100100
11Bitmap counting: direct bitmap
Packets from the same flow always hash to the same bit
HASH(green) = 10001001
12Bitmap counting: direct bitmap
Collisions OK, estimates compensate for them
HASH(violet) = 10010101
13Bitmap counting: direct bitmap
HASH(orange) = 11110011
14Bitmap counting: direct bitmap
HASH(pink) = 11100000
15Bitmap counting: direct bitmap
As the bitmap fills up, estimates get inaccurate
HASH(yellow) = 01100011
16Bitmap counting: direct bitmap
Solution: use more bits
HASH(green) = 10001001
17Bitmap counting: direct bitmap
Solution: use more bits
Problem: memory scales with the number of flows
HASH(blue) = 00100100
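The direct bitmap walked through on the preceding slides can be sketched as follows. This is an illustration under assumptions, not the code from [Estan 2003]: the class name, the use of SHA-256 as the hash, and one byte per bit are choices made here for clarity. The estimate b * ln(b / z), with b bits of which z are still zero, is the standard way to compensate for collisions.

```python
import hashlib
import math

def flow_hash(flow_id: str) -> int:
    """Deterministic 64-bit hash of a flow ID (SHA-256 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(flow_id.encode()).digest()[:8], "big")

class DirectBitmap:
    def __init__(self, nbits: int):
        self.nbits = nbits
        self.bits = bytearray(nbits)  # one byte per bit, for readability

    def add(self, flow_id: str) -> None:
        # Packets of the same flow always set the same bit.
        self.bits[flow_hash(flow_id) % self.nbits] = 1

    def estimate(self) -> float:
        zeros = self.nbits - sum(self.bits)
        if zeros == 0:
            return float("inf")  # bitmap is full: no usable estimate
        # Collision-compensated estimate of the number of distinct flows.
        return self.nbits * math.log(self.nbits / zeros)
```

Per packet the work is one hash and one memory write, with no per-flow state. The cost is that the bitmap must be sized in proportion to the largest flow count to be measured.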
18Bitmap counting: virtual bitmap
Solution: a) store only a portion of the bitmap; b) multiply estimate by scaling factor
19Bitmap counting: virtual bitmap
HASH(pink) = 11100000
20Bitmap counting: virtual bitmap
Problem: estimate inaccurate when few flows active
HASH(yellow) = 01100011
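A minimal sketch of the virtual bitmap: flows hash into a large virtual bitmap, only the first `nbits` positions are physically stored, and the estimate for the stored portion is multiplied by the scaling factor. The class name, the `scale` parameter, and the SHA-256 hash are assumptions for illustration.

```python
import hashlib
import math

def flow_hash(flow_id: str) -> int:
    """Deterministic 64-bit hash of a flow ID (SHA-256 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(flow_id.encode()).digest()[:8], "big")

class VirtualBitmap:
    def __init__(self, nbits: int, scale: int):
        self.nbits = nbits    # physically stored bits
        self.scale = scale    # virtual bitmap is scale * nbits bits long
        self.bits = bytearray(nbits)

    def add(self, flow_id: str) -> None:
        pos = flow_hash(flow_id) % (self.nbits * self.scale)
        if pos < self.nbits:  # only this portion of the bitmap is stored
            self.bits[pos] = 1

    def estimate(self) -> float:
        zeros = self.nbits - sum(self.bits)
        if zeros == 0:
            return float("inf")
        # Estimate for the stored portion, multiplied by the scaling factor.
        return self.scale * self.nbits * math.log(self.nbits / zeros)
```

With `scale = 16`, only about one flow in sixteen touches the stored portion. Memory no longer grows with the flow count, but when few flows are active the sample is too small for an accurate estimate.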
21Bitmap counting: multiple bitmaps
Solution: use many bitmaps, each accurate for a different range
22Bitmap counting: multiple bitmaps
HASH(pink) = 11100000
23Bitmap counting: multiple bitmaps
HASH(yellow) = 01100011
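A simplified sketch of the "many bitmaps, each accurate for a different range" idea. It is not the construction from [Estan 2003]: here each bitmap is an independent virtual bitmap with its own scaling factor, and the estimator picks the finest bitmap that is not too full. The scales `(1, 16, 256)`, the `max_fill` threshold, and the per-bitmap hash salting are all assumptions chosen for illustration.

```python
import hashlib
import math

def flow_hash(key: str) -> int:
    """Deterministic 64-bit hash (SHA-256 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class MultiBitmap:
    def __init__(self, nbits=4096, scales=(1, 16, 256), max_fill=0.8):
        self.nbits = nbits
        self.scales = scales
        self.max_fill = max_fill
        self.bitmaps = [bytearray(nbits) for _ in scales]

    def add(self, flow_id: str) -> None:
        for i, scale in enumerate(self.scales):
            # Salt the hash with the bitmap index so bitmaps are independent.
            pos = flow_hash(f"{i}:{flow_id}") % (self.nbits * scale)
            if pos < self.nbits:
                self.bitmaps[i][pos] = 1

    def estimate(self) -> float:
        last = len(self.scales) - 1
        for i, (scale, bits) in enumerate(zip(self.scales, self.bitmaps)):
            ones = sum(bits)
            # Use the finest bitmap that is not too full; fall back to the coarsest.
            if ones <= self.max_fill * self.nbits or i == last:
                zeros = self.nbits - ones
                if zeros == 0:
                    return float("inf")
                return scale * self.nbits * math.log(self.nbits / zeros)
```

Small flow counts are read from the scale-1 bitmap, large ones from a coarser bitmap, so the estimate stays usable across several orders of magnitude. The price in this sketch is several hashes and memory writes per packet.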
24Searching for Patterns at High Speeds
- Question: do any of a set of strings appear in packet?
- Application: intrusion detection, e.g., perl.exe in packet sent to port 80 (http)
- Fast (non-streaming) algorithms for multiple-string matching exist [Fisk 2001], but [Paxson 2004]:
- The resourceful attacker
- splits perl.exe between two TCP packets
- countermeasure: remember last packet's content
- transmits two TCP packets out of order (will be re-ordered at destination)
- countermeasure: receiver-side reassembly/reordering at measurement point
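The two countermeasures above can be sketched as follows. This is an illustration under simplifying assumptions: a single flow, no overlapping segments, no sequence-number wraparound, and a plain substring search rather than a fast multiple-string algorithm. The class names `StreamMatcher` and `Reassembler` are hypothetical.

```python
class StreamMatcher:
    """Find patterns in an in-order byte stream, even across packet boundaries."""
    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.keep = max(len(p) for p in self.patterns) - 1
        self.tail = b""  # end of the previous payload(s)

    def feed(self, payload: bytes):
        data = self.tail + payload
        hits = []
        for p in self.patterns:
            start = data.find(p)
            while start != -1:
                # Report only matches that end inside the new payload.
                if start + len(p) > len(self.tail):
                    hits.append(p)
                    break
                start = data.find(p, start + 1)
        self.tail = data[-self.keep:] if self.keep > 0 else b""
        return hits

class Reassembler:
    """Put TCP segments back in sequence order before matching."""
    def __init__(self, next_seq=0):
        self.next_seq = next_seq
        self.pending = {}  # seq -> payload, for segments that arrived early

    def push(self, seq: int, payload: bytes) -> bytes:
        self.pending[seq] = payload
        out = b""
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            out += chunk
            self.next_seq += len(chunk)
        return out
```

Both classes keep state per flow, so the memory needed at the measurement point grows with the number of concurrent flows. This is the cost the resourceful attacker imposes.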
25Search for Patterns at High Speeds
- The resourceful attacker
- inserts chaff into TCP data stream, to fool IDS
[Figure: attacker, IDS, target. Packets sent: (TTL 15, P), (TTL 15, E), (TTL 10, A), (TTL 15, R), (TTL 15, L); the TTL 10 packet (A) is marked X]
Solvable ... but requires even more smarts at IDS
Moral: an escalating arms race; computing capabilities at the measurement point are always an advantage
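One possible form of the extra smarts is sketched below: the IDS discards segments whose TTL is too low to reach the target before it scans the stream. This assumes the IDS knows how many routers lie between itself and the target (`routers_to_target`, a hypothetical parameter), which is a strong assumption in practice. The segment values mirror the slide's example.

```python
def reaches_target(ttl_at_ids: int, routers_to_target: int) -> bool:
    """Each router decrements TTL and drops the packet when it reaches zero."""
    return ttl_at_ids > routers_to_target

def target_view(segments, routers_to_target: int) -> str:
    """Reconstruct the data the target will actually receive."""
    return "".join(data for ttl, data in segments
                   if reaches_target(ttl, routers_to_target))

# (TTL as seen at the IDS, payload), in the order sent by the attacker.
segments = [(15, "P"), (15, "E"), (10, "A"), (15, "R"), (15, "L")]
naive_view = "".join(data for _, data in segments)
```

With `routers_to_target = 12`, the naive IDS sees "PEARL" while the target receives "PERL". Scanning `target_view(segments, 12)` instead removes the chaff.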
26Single node emerging themes
- many important applications for single-node measurement
- trading accuracy for speed: approximate answers enable high-speed implementations
- adaptivity: string search patterns, traffic thresholds (e.g., heavy hitter definition) may change
- interface to NP
- ability to configure measurement service on the fly
28Distributed Monitoring/Measurement
- coordinating multiple measurement points
- avoiding redundant operations
- packets within flows: trajectory sampling
- among flows: monitor placement
- distributed detection
29Avoiding Redundancy: trajectory sampling
- Goal: want record of packet along path
- sample packets such that a packet stored at one monitor is also stored everywhere else [Duffield 2002]
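A minimal sketch of how every monitor can make the same sampling decision without coordination: each one hashes the packet content that does not change from hop to hop and samples when the hash falls in a fixed range. The choice of invariant fields, the SHA-256 hash, and the `rate_per_1024` parameter are assumptions for illustration, not the exact scheme of [Duffield 2002].

```python
import hashlib

def invariant_content(pkt: dict) -> bytes:
    """Fields that stay the same along the path (TTL is deliberately excluded)."""
    return repr((pkt["src"], pkt["dst"], pkt["ip_id"], pkt["payload"])).encode()

def sampled(invariant: bytes, rate_per_1024: int = 16) -> bool:
    """Same input gives the same decision at every monitor."""
    h = int.from_bytes(hashlib.sha256(invariant).digest()[:4], "big")
    return h % 1024 < rate_per_1024
```

Because the decision depends only on the packet, a packet stored at one monitor is stored at every monitor it crosses, so its path can be reconstructed from the samples.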
30Maximizing flow coverage: monitor placement
- Goal: place monitors to maximize flow coverage [Bhattacharyya 2004, Suh 2004]
- NP-hardness: complexity results with and without sampling
- greedy heuristics
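A sketch of one greedy heuristic for monitor placement, treating the problem as maximum coverage: repeatedly pick the link that covers the most flows not yet covered. The input format (`link_flows`, a mapping from link to the set of flows crossing it) is an assumption, and this is the generic greedy heuristic rather than the specific algorithms of the cited papers.

```python
def greedy_placement(link_flows: dict, k: int):
    """Choose up to k links, each time maximizing newly covered flows."""
    covered = set()
    chosen = []
    for _ in range(k):
        best = max(link_flows,
                   key=lambda link: len(link_flows[link] - covered),
                   default=None)
        if best is None or not (link_flows[best] - covered):
            break  # nothing left to gain
        chosen.append(best)
        covered |= link_flows[best]
    return chosen, covered
```

For example, with links A = {1, 2, 3, 4}, B = {3, 4, 5}, C = {5, 6} and a budget of two monitors, the heuristic picks A first and then C, because C adds two uncovered flows while B adds only one.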
31Distributed detection
- Goal: notification when aggregate traffic rate to a destination exceeds threshold
- Needed: sum of distributed ingress rates
- centralized approach
- DHT: hash to node that gathers sum data
- in-network summation
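A minimal sketch of the DHT option: every ingress point hashes the destination to find the node that gathers reports for it, and that node sums the latest rate from each ingress and compares the sum with the threshold. Hashing modulo the node count stands in for a real DHT lookup; the class and function names are hypothetical.

```python
import hashlib
from collections import defaultdict

def responsible_node(dest: str, nodes: list) -> str:
    """Map a destination to the node that gathers its rate reports."""
    h = int.from_bytes(hashlib.sha256(dest.encode()).digest()[:8], "big")
    return nodes[h % len(nodes)]

class Aggregator:
    """Runs on each node; sums the latest rate reported by every ingress."""
    def __init__(self, threshold: float):
        self.threshold = threshold
        self.rates = defaultdict(dict)  # dest -> {ingress: latest rate}

    def report(self, ingress: str, dest: str, rate: float) -> bool:
        self.rates[dest][ingress] = rate
        return sum(self.rates[dest].values()) > self.threshold

def send_report(aggregators, nodes, ingress, dest, rate) -> bool:
    """Route a report to the responsible node; True means raise a notification."""
    return aggregators[responsible_node(dest, nodes)].report(ingress, dest, rate)
```

Compared with the centralized approach, the summing work is spread over all nodes by destination. In-network summation would instead add rates along an aggregation tree.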
32Distributed Monitoring/Measurement
- Work in this area just beginning
- AT&T Gigascope
- Intel
- UMass Hyperion
- Overlay networks
- PIER (Berkeley)
- Sophia (Princeton)
34Useful analogy: measurement as sensing
Collaborative adaptive sensing of the atmosphere
35Useful analogy: measurement as sensing
A distributed measurement infrastructure
[Figure label: NOC]
36Summary
- lots of exciting measurement research!
- algorithms/architecture for individual node, traffic characterization
- new frontier: wireless
- distributed measurement for network management/control, security
- just beginning
- requires collaboration among hardware designers, systems software and algorithms (networking, OS, database), providers