Kunchan Lan - PowerPoint PPT Presentation

About This Presentation
Title:

Kunchan Lan

Slides: 161
Provided by: kunch

Transcript and Presenter's Notes



1
Network Measurements, Modeling and Simulations
  • Kun-chan Lan
  • Department of Computer Science and Information
    Engineering
  • klan_at_csie.ncku.edu.tw

2
Some admin issues before we start
  • No class next week
  • Paper review list due next week (3/16)
  • Project proposal due 3 weeks later (3/30)
  • TA
  • Yi-Wei Ting
  • Office: 4243
  • Office hours: Thursday 10:00 AM to 5:00 PM
  • Email: p7893113_at_mail.ncku.edu.tw
  • TEL: 0912-302375
  • MSN: iwting_at_hotmail.com

3
Outline
  • Model and simulate Internet traffic
  • It's hard to model and simulate the Internet
  • We advocate trace-driven simulation
  • Internet and wireless measurements
  • Case study modeling heavy-hitter traffic

4
The challenges in modeling and simulating
Internet traffic
5
What is a model?
  • Abstraction of real world
  • Base of a network simulation
  • Topology model
  • e.g. a dumbbell topology
  • Traffic model
  • e.g. 80% TCP, 20% UDP
  • Queuing model
  • e.g. FIFO, Fair queuing, etc.
  • ..

6
Role of simulation
  • Based on some particular models
  • Topology: e.g. dumbbell vs. tree
  • Traffic: e.g. TCP vs. UDP
  • Widely used by researchers to study the Internet
  • Millions of hosts in different administrative
    domains
  • Simulation vs. experiment
  • Repeatability
  • Configurability
  • Scalability
  • Explore complicated scenarios
  • Study future applications/protocols/networks

7
What simulation doesn't do
  • Realism
  • Details of simulation matter!
  • It's your responsibility to know what level of
    detail you need to capture in the simulation
  • Prove correctness of the model
  • Only for validation!
  • The value of simulation relies on a good model

8
It's hard to simulate the Internet
  • Network heterogeneity
  • Rapid and unpredictable change

which is GOOD, though: more PhD
students can be produced in this area :)
9
Network heterogeneity
  • Topology
  • Link properties
  • Protocol
  • Traffic
  • All the above matter when you do the simulation

10
Difficulty in modeling topology
  • Constantly changing
  • Routing change
  • Link/node up and down
  • ISPs typically do not make topological
    information available
  • There is no typical topology
  • Depends on what you are simulating

11
Difficulty in modeling links
  • Large diversity
  • Speed: e.g. modem vs. fiber-optic link
  • Loss: e.g. copper wire vs. 802.11
  • Transmission: point-to-point vs. broadcast
  • Latency: DSL vs. satellite links
  • Routing-dependent
  • Asymmetry

12
Difficulty in modeling protocol
  • Differences in implementations
  • 400 different TCP implementations
  • Different applications and different traffic mix

13
Difficulty in modeling traffic
  • Traffic is different everywhere
  • Effect of background traffic
  • Queuing, congestion
  • Some applications are adaptive to network
    conditions

14
Rapid and unpredictable changes
  • Change in TCP: Reno -> NewReno/SACK
  • Change in devices: PC -> handheld
  • Change in web caching -> CDN
  • Change in killer application
  • web -> P2P -> VoIP?
  • Change in physical layer: wired -> wireless

15
Coping strategy
  • OK, so it's hard to simulate the Internet, but can we
    do something about it?
  • Yes
  • Systematically explore important parameters
  • Searching for invariants

16
Network behavior as a function
  • Explore network behavior as a function of
    changing parameters
  • <observed traffic> = f(x1, x2, x3, ...)
  • Impossible to explore the whole set of parameters
  • Challenge: identify important parameters
  • Example parameters to which a simulation might be
    sensitive
  • Congestion
  • Topology
  • Router mechanism (routing, scheduling, etc.)

17
Search for Invariants
  • Invariant: behavior that holds in a very wide
    range of environments
  • Examples
  • Diurnal patterns
  • Self-similarity
  • Poisson session arrival
  • Heavy-tailed distribution
  • Geographical topology
  • Extract invariants from real world data
  • Extensive measurements!
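One way invariants like these are used is to drive a synthetic workload. The Python sketch below assumes Poisson session arrivals and Pareto-distributed (heavy-tailed) session sizes; the function `synthetic_sessions` and its parameters (arrival rate, shape `alpha=1.2`, minimum size of 1000 bytes) are illustrative choices, not values taken from any measurement.

```python
import random


def synthetic_sessions(rate_per_s, duration_s, alpha=1.2, min_bytes=1000, seed=1):
    """Hypothetical generator built from two invariants on the slide:
    Poisson session arrivals and heavy-tailed session sizes."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    t = 0.0
    sessions = []
    while True:
        # Poisson arrivals: exponential inter-arrival times with mean 1/rate
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            break
        # Heavy tail: Pareto with shape alpha; paretovariate() returns values >= 1
        size = min_bytes * rng.paretovariate(alpha)
        sessions.append((t, size))
    return sessions


sessions = synthetic_sessions(rate_per_s=2.0, duration_s=60.0)
print(len(sessions), "sessions; largest:", round(max(s for _, s in sessions)), "bytes")
```

Smaller values of `alpha` make the tail heavier, so a few very large sessions dominate the total volume, which is the behavior the heavy-tail invariant describes.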

18
Question?
19
Outline
  • Model and simulate Internet traffic
  • It's hard to model and simulate the Internet
  • Internet and wireless measurements
  • Case study modeling heavy-hitter traffic

20
Why measure?
  • To tell us what the invariants are, and what are
    just artifacts of the system
  • A base for realistic modeling and simulation
  • A common practice in other science disciplines
    (physics, biology, etc.)

21
Things I am going to tell you in the next hour
  • What can you measure?
  • Things that you need to know when you measure
  • Where can you get Internet traffic measurements
    for free?

22
Measure the Internet
  • What can you measure
  • Traffic
  • Routing
  • Topology
  • Performance
  • Multicast
  • Wireless/Mobility

23
Tool for measuring traffic
  • tcpdump/Ethereal (libpcap)
  • Netflow
  • NeTraMet/RTG (SNMP)

24
tcpdump/Ethereal
  • tcpdump
  • Most commonly used packet collector
  • Based on the libpcap API
  • Output can be easily analyzed using awk/Perl
    scripts
  • Ethereal
  • GUI-based
  • Supports various trace formats, including tcpdump,
    snoop, etc.
  • Supports various link-layer headers, including
    802.11, ATM, etc.
  • tcpdpriv
  • A commonly used packet anonymizer (for sharing
    traces with others)
  • Libpcap-based
  • Link-level headers are passed through unchanged.

25
Usage of tcpdump
  • tcpdump [ -adeflnNOpqStvx ] [ -c count ] [ -F file ]
    [ -i interface ] [ -r file ] [ -s snaplen ]
    [ -T type ] [ -w file ] [ expression ]
  • Must run as root (or via sudo) to capture live traffic

26
<option>
  • -i Listen on interface. If unspecified, tcpdump
    searches the system interface list for the lowest
    numbered, configured up interface (excluding
    loopback)
  • -n Don't convert addresses (i.e., host
    addresses, port numbers, etc.) to names

27
<option>
  • -p Don't put the interface into promiscuous
    mode.
  • -q Quick (quiet?) output. Print less protocol
    information so output lines are shorter.
  • -r Read packets from file (which was created with
    the -w option). Standard input is used if file is
    '-'.

28
<option>
  • -w Write the raw packets to file rather than
    parsing and printing them out. They can later be
    printed with the -r option. Standard output is
    used if file is '-'.
  • -S Print absolute, rather than relative, TCP
    sequence numbers

29
<option>
  • -s Snarf snaplen bytes of data from each packet
    rather than the default of 68 (the default in older
    tcpdump versions; newer versions capture whole
    packets by default). 68 bytes is adequate for IP,
    ICMP, TCP and UDP but may truncate protocol
    information from name server and NFS packets.
    Packets truncated because of a limited snapshot
    are indicated in the output with '[|proto]', where
    proto is the name of the protocol level at which
    the truncation has occurred.
  • Taking larger snapshots both increases the
    amount of time it takes to process packets and,
    effectively, decreases the amount of packet
    buffering. This may cause packets to be lost.
  • Limit snaplen to the smallest number that
    will capture the protocol information you're
    interested in.

30
<option>
  • -t Don't print a timestamp on each dump line.
  • -tt Print an unformatted timestamp on each dump
    line.
  • -v (Slightly more) verbose output. For example,
    the time to live and type of service information
    in an IP packet is printed.
  • -vv Even more verbose output. For example,
    additional fields are printed from NFS reply
    packets.
  • -x Print each packet in hex.
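tcpdump's text output is line-oriented, which is why it is easy to post-process with scripts. A comparable Python sketch is below; it assumes output produced with -n (numeric addresses) and -tt (unformatted timestamps) for IPv4 TCP/UDP packets. The line format varies between tcpdump versions and protocols, so the regular expressions and the helper names `parse_line` and `bytes_per_pair` are assumptions to adapt, not a fixed format.

```python
import re
from collections import Counter

# Assumed shape of `tcpdump -n -tt` lines for IPv4 TCP/UDP, e.g.
# 1700000000.000200 IP 10.0.0.1.51234 > 10.0.0.2.80: Flags [P.], ... length 100
HEADER_RE = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+(?:IP\s+)?"
    r"(?P<src>\d{1,3}(?:\.\d{1,3}){3})\.(?P<sport>\d+)\s+>\s+"
    r"(?P<dst>\d{1,3}(?:\.\d{1,3}){3})\.(?P<dport>\d+):"
)
LENGTH_RE = re.compile(r"\blength (\d+)")


def parse_line(line):
    """Return a dict for an IPv4 TCP/UDP line, or None for anything else."""
    m = HEADER_RE.match(line)
    if m is None:
        return None  # ARP, ICMP, IPv6, ... are skipped in this sketch
    n = LENGTH_RE.search(line, m.end())
    return {
        "ts": float(m["ts"]),
        "src": m["src"], "sport": int(m["sport"]),
        "dst": m["dst"], "dport": int(m["dport"]),
        "length": int(n.group(1)) if n else 0,  # length as reported by tcpdump
    }


def bytes_per_pair(lines):
    """Sum reported lengths per (src, dst) pair, like a one-line awk script."""
    totals = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            totals[(rec["src"], rec["dst"])] += rec["length"]
    return totals
```

For instance, the lines printed by `tcpdump -n -tt -r trace.pcap` could be fed to `bytes_per_pair()`. For TCP lines the reported length is the payload length, so the totals approximate application bytes rather than bytes on the wire.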

31
<expression>
  • Selects which packets will be dumped. If no
    expression is given, all packets will be dumped.
    Otherwise, only packets for which expression is
    'true' will be dumped.
  • The expression consists of one or more
    primitives. Primitives usually consist of an id
    (name or number) preceded by one or more
    qualifiers.
  • There are three different kinds of qualifier:
  • <type>, <dir>, <proto>

32
<qualifier>
  • <type>
  • What kind of thing the id name or number refers
    to
  • Possible types are host, net and port
  • E.g., 'host csie.ncku.edu.tw', 'net 146.132',
    'port 20'
  • If there is no type qualifier, host is assumed.

33
<qualifier>
  • <dir>
  • Specifies a particular transfer direction to and/or
    from id.
  • Possible directions are src, dst, src or dst, and
    src and dst.
  • E.g., 'src csie.ncku.edu.tw', 'dst net 146.132',
    'src or dst port ftp-data'.
  • If there is no dir qualifier, src or dst is
    assumed

34
ltqualifiergt
<proto>
  • Restricts the match to a particular protocol.
  • Possible protos are ether, fddi, ip, arp, rarp,
    decnet, lat, sca, moprc, mopdl, tcp and udp.
  • E.g., 'ether src server1.ncku.edu.tw', 'arp net
    128.3', 'tcp port 21'.
  • If there is no proto qualifier, all protocols
    consistent with the type are assumed. E.g., 'src
    mail.ncku.edu.tw' means '(ip or arp or rarp) src
    mail.ncku.edu.tw'.

35
Complex expression
  • complex filter expressions are built up by using
    the words and, or and not to combine primitives.
  • E.g., 'host csie.ncku.edu.tw and not port ftp
    and not port ftp-data'.
  • Identical qualifier lists can be omitted.
  • E.g., 'tcp dst port ftp or ftp-data or domain' is
    exactly the same as 'tcp dst port ftp or tcp dst
    port ftp-data or tcp dst port domain'.

36
Allowable primitives
  • dst host host
  • src host host
  • host host
  • ether dst ehost
  • ether src ehost
  • ether host ehost
  • gateway host

37
Allowable primitives
  • dst net net
  • src net net
  • net net
  • net net mask mask
  • net net/len
  • True if the IP address matches net with a netmask
    len bits wide. May be qualified with src or dst.
  • dst port port
  • src port port
  • port port
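The `net net/len` and `net net mask mask` primitives are prefix comparisons: an address matches when the bits selected by the mask equal those of net. A Python sketch of that test for IPv4, using the standard `ipaddress` module only to parse dotted-quad strings (the helper names are hypothetical):

```python
import ipaddress


def matches_net(addr, net, length):
    """Emulate 'net net/len': compare the top `length` bits of the addresses."""
    mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF  # e.g. /16 -> 255.255.0.0
    a = int(ipaddress.IPv4Address(addr))
    n = int(ipaddress.IPv4Address(net))
    return (a & mask) == (n & mask)


def matches_net_mask(addr, net, mask):
    """Emulate 'net net mask mask' with a dotted-quad mask."""
    m = int(ipaddress.IPv4Address(mask))
    a = int(ipaddress.IPv4Address(addr))
    n = int(ipaddress.IPv4Address(net))
    return (a & m) == (n & m)


print(matches_net("146.132.5.7", "146.132.0.0", 16))  # inside 'net 146.132'
```

The same check can be written with `ipaddress.ip_address(addr) in ipaddress.ip_network("146.132.0.0/16")`; the explicit mask version is shown because it mirrors how the filter compares bits.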

38
Allowable primitives
  • less length
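  • True if the packet has a length less than or
    equal to length. This is equivalent to len <=
    length.
  • greater length
  • True if the packet has a length greater than or
    equal to length. This is equivalent to len >=
    length.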
  • ip proto protocol
  • True if the packet is an ip packet of protocol
    type protocol. Protocol can be a number or one of
    the names icmp, igrp, udp, nd, or tcp. Note that
    the identifiers tcp, udp, and icmp are also
    keywords and must be escaped via backslash (\)
  • ether broadcast
  • ip broadcast

39
Allowable primitives
  • ether multicast
  • ip multicast
  • ip, arp, rarp, decnet
  • short for ether proto p where p is one of the
    above protocols.
  • tcp, udp, icmp
  • short for ip proto p

40
Relation operator
  • expr relop expr
  • relop is one of >, <, >=, <=, =, !=
  • expr is an arithmetic expression composed of
    integer constants, the normal binary operators
    [+, -, *, /, &, |], a length operator, and
    special packet data accessors.
  • To access data inside the packet, use the
    following syntax: proto [ expr : size ]. Proto is
    one of ether, fddi, ip, arp, rarp, tcp, udp, or
    icmp. E.g., tcp[0] means the first byte of the
    TCP header
  • For example, 'ether[0] & 1 != 0' catches all
    multicast traffic. The expression 'ip[0] & 0xf !=
    5' catches all IP packets with options.
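The two example expressions can be mirrored in Python to show what the accessor reads: `proto[offset:size]` is a big-endian read of 1, 2 or 4 bytes at an offset into that protocol's header. The function names below are hypothetical, and the sketch raises an error on out-of-range access, whereas tcpdump simply treats such a packet as not matching.

```python
def accessor(buf, offset, size=1):
    """Emulate proto[offset:size]: a 1-, 2- or 4-byte big-endian read."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4")
    if offset + size > len(buf):
        raise IndexError("access beyond the end of the header")
    return int.from_bytes(buf[offset:offset + size], "big")


def is_ether_multicast(frame):
    # 'ether[0] & 1 != 0': low bit of the first destination-MAC byte (group bit).
    # In Python, '&' binds tighter than '!=', matching the filter's reading.
    return accessor(frame, 0) & 1 != 0


def ip_has_options(ip_header):
    # 'ip[0] & 0xf != 5': the low nibble is the header length in 32-bit words;
    # 5 words = 20 bytes, i.e. a header with no options
    return accessor(ip_header, 0) & 0x0F != 5
```

For example, a frame whose destination MAC starts with 01:00:5e (an IPv4 multicast MAC) has the group bit set, and an IPv4 header starting with byte 0x45 has no options.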

41
Combining primitives
  • Primitives may be combined using
  • Negation ('!' or 'not').
  • Concatenation ('&&' or 'and').
  • Alternation ('||' or 'or').
  • Negation has highest precedence. Alternation and
    concatenation have equal precedence and associate
    left to right.
  • If an identifier is given without a keyword, the
    most recent keyword is assumed.
  • E.g., 'not host vs and ace' is short for 'not host
    vs and host ace', which should not be confused
    with 'not ( host vs or ace )'

42
Netflow
  • Built-in feature of most Cisco routers/switches
    that run Cisco IOS
  • Provides flow-level information
  • The first packet in a flow is used to build an entry
    in the cache
  • Enabled on a per-interface basis
  • Useful for accounting/billing, traffic
    monitoring, user profiling, data mining, etc.

43
More on Netflow
  • Typical cache size: 4K-128K entries (typical DRAM
    size: 2-8 MB)
  • Need to use the cache efficiently
  • When to expire Netflow cache entries
  • Idle time > t
  • Long-lived flows (duration > 30 min)
  • TCP connections with FIN or RST
  • When the cache becomes full (applying some heuristics
    to age flows)
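The expiry rules above can be illustrated with a toy flow cache keyed by the usual 5-tuple (source/destination address, source/destination port, protocol). This is a sketch, not Cisco's implementation: the class names and the 15-second idle timeout are assumptions, while the 30-minute active timeout, FIN/RST expiry and cache-full aging come from the slide.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    first: float      # time of the packet that created the entry
    last: float       # time of the most recent packet
    packets: int = 0
    octets: int = 0


class FlowCache:
    """Hypothetical NetFlow-like cache keyed by the 5-tuple."""

    def __init__(self, idle_timeout=15.0, active_timeout=1800.0, max_entries=4096):
        self.idle_timeout = idle_timeout        # "idle time > t"
        self.active_timeout = active_timeout    # long-lived flows: 30 min
        self.max_entries = max_entries
        self.cache = {}
        self.exported = []                      # (key, FlowRecord) in export order

    def _export(self, key):
        self.exported.append((key, self.cache.pop(key)))

    def expire(self, now):
        for key, rec in list(self.cache.items()):
            if now - rec.last > self.idle_timeout or now - rec.first > self.active_timeout:
                self._export(key)

    def add_packet(self, now, key, length, fin_or_rst=False):
        self.expire(now)
        rec = self.cache.get(key)
        if rec is None:
            if len(self.cache) >= self.max_entries:
                # Cache full: age out the least recently updated flow
                oldest = min(self.cache, key=lambda k: self.cache[k].last)
                self._export(oldest)
            rec = self.cache[key] = FlowRecord(first=now, last=now)
        rec.last = now
        rec.packets += 1
        rec.octets += length
        if fin_or_rst:
            self._export(key)  # a TCP FIN or RST ends the flow
```

Scanning the whole cache on every packet keeps the sketch short; a router would age entries incrementally instead.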

44
Management of Netflow
  • Netflow FlowCollector
  • can collect flow info from multiple
    NetFlow-enabled devices
  • data volume reduction through selective filtering
    and aggregation
  • store flow information for off-line analysis
  • Netflow FlowAnalyzer
  • data visualization: graphical data display
  • data export to external applications (such as
    Excel)
  • Netflow Server
  • collect flow statistics from multiple
    FlowCollectors
  • further summarize NetFlow statistics by enabling
    bi-directional consolidation
  • store NetFlow statistics in a common commercial
    RDBMS (can be queried via SQL later)
  • encrypt and compress NetFlow statistics

45
NeTraMet
  • Collects flow data via SNMP
  • Builds up packet and byte counts for traffic
    flows
  • Flows are defined by their end-point addresses
  • Addresses can be Ethernet addresses, IP addresses,
    or a combination of both
  • A set of rules can be specified to filter the flows
    of interest
  • Runs under DOS or Unix

46
RTG
  • An SNMP statistics monitoring system
  • Commonly used by ISPs
  • Collects time-series SNMP data from a large number
    of interfaces
  • Runs as a daemon
  • All collected data is inserted into a relational
    database where complex queries and reports may be
    generated via SQL
  • Can poll at sub-one-minute intervals
  • Utilities are included to generate traffic
    reports, 95th percentile reports and graphical
    data plots
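A 95th percentile report is commonly computed from periodic polls of an interface's octet counter: convert consecutive counter readings into rates, sort them, discard the top 5%, and report the largest remaining value. A Python sketch under those assumptions follows; the function names are hypothetical, the 32-bit width matches SNMP's ifInOctets/ifOutOctets (64-bit ifHCInOctets counters also exist), and RTG's own report code may differ in detail.

```python
def counter_rates(samples, counter_bits=32):
    """Turn (time_s, octet_counter) polls into bits/s, allowing for counter wrap."""
    modulus = 2 ** counter_bits
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = (c1 - c0) % modulus  # assumes at most one wrap between polls
        rates.append(delta * 8 / (t1 - t0))
    return rates


def percentile_95(rates):
    """Sort ascending, discard the top 5% of samples, report the largest remaining."""
    ordered = sorted(rates)
    index = (95 * len(ordered) + 99) // 100 - 1  # ceil(0.95 * n) - 1, in integers
    return ordered[index]


# Example: a month of 5-minute polls would give roughly 8640 rate samples
print(percentile_95(list(range(1, 101))))
```

Integer arithmetic for the index avoids floating-point surprises such as `0.95 * n` landing just below a whole number.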

47
Tool for measuring routing
  • Traceroute
  • tracert command for Windows
  • RouteView

48
traceroute
  • Traces the path from a source to a destination
  • Shows how many hops a packet requires to reach the
    destination and how long each hop takes.
  • Utilizes the IP Time-to-Live (TTL) field
  • The TTL value specifies how many hops a packet is
    allowed to travel (decremented by 1 at each hop).
    An ICMP TIME_EXCEEDED response is returned to the
    source once the TTL reaches 0.
  • Sends a series of packets, incrementing the TTL
    value with each successive packet.
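A toy simulation of the TTL mechanism over a hypothetical list of hops. It assumes the classic Unix behavior of probing with UDP to an unused port, so the destination answers with ICMP port unreachable rather than TIME_EXCEEDED (Windows tracert uses ICMP echo requests instead). No packets are sent; `simulate_traceroute` only models which node replies to each TTL.

```python
def simulate_traceroute(path, max_ttl=30):
    """path: hypothetical hop list; path[-1] is the destination, the rest are routers.

    Returns (ttl, responder, reply type) for each probe until the destination answers.
    """
    results = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        for router in path[:-1]:
            remaining -= 1                 # each router decrements the TTL
            if remaining == 0:             # expired here: router sends TIME_EXCEEDED
                results.append((ttl, router, "TIME_EXCEEDED"))
                break
        else:
            # The probe survived every router and reached the destination
            results.append((ttl, path[-1], "PORT_UNREACHABLE"))
            return results
    return results


for hop in simulate_traceroute(["gw", "isp-core", "www.example.org"]):
    print(*hop)
```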

49
(No Transcript)
50
RouteView
  • A large collection of BGP routing tables from
    several backbones (from 60 vantage points and
    400 ASes)
  • Aims to provide network operators with information
    about the global routing system from various
    locations around the Internet

51
BGP basics
  • BGP: an exterior gateway protocol used to route
    packets between Autonomous Systems (ASes)
  • AS: a group of networks that is controlled by a
    common network administrator on behalf of a
    single administrative entity. Each AS is assigned
    a globally unique number
  • Conveys information about AS path topology
  • Runs on top of TCP (port 179)
  • A path vector protocol

52
Path vector protocol
Propagation of 180.10.0.0/16 over time:
  • AS100: 180.10.0.0/16, AS path 100
  • AS200: 180.10.0.0/16, AS path 200 100
  • AS300: 180.10.0.0/16, AS path 300 200 100
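The propagation example can be reproduced in a few lines of Python. Each AS prepends its own number before passing the announcement on, and an AS that already appears in the received path rejects it, which is how a path vector protocol avoids loops. The chain of ASes is a hypothetical input, and real BGP policy and best-path selection are omitted.

```python
def propagate(origin_as, prefix, chain):
    """Pass an announcement of `prefix` from origin_as along `chain` of ASes.

    Returns {asn: (prefix, as_path)} for every AS that accepted the route.
    """
    path = [origin_as]
    tables = {origin_as: (prefix, list(path))}
    for asn in chain:
        if asn in path:
            break  # own AS number already in the path: loop, route rejected
        path = [asn] + path  # prepend own AS number before re-announcing
        tables[asn] = (prefix, list(path))
    return tables


tables = propagate(100, "180.10.0.0/16", [200, 300])
for asn, (pfx, as_path) in tables.items():
    print(f"AS{asn}: {pfx} {' '.join(map(str, as_path))}")
```

Run as is, this prints the three AS100/AS200/AS300 entries in the same form as the example.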
53
Tool for measuring topology
  • traceroute-based
  • Skitter
  • Rocketfuel

54
Skitter
  • An effort of CAIDA
  • ICMP-based, similar to traceroute
  • Probes the paths from a source to many
    destination IP addresses spread throughout the
    IPv4 address space
  • RTTs and forward paths are collected

55
Rocketfuel
  • Input
  • traceroute (utilizing publicly available traceroute
    servers)
  • BGP
  • DNS
  • Output (per ISP)
  • Backbone
  • POP
  • Peer links

56
Path discovery
  • Use 750 publicly available traceroute sources
  • Merge traceroute paths from multiple sources to
    multiple destinations to obtain a network map
  • The brute-force (all sources × all destinations)
    approach does not work
  • Too many addresses to probe (150M!)
  • Too much load for the traceroute server
  • Too much traffic for the network
  • Approach
  • Only probe the paths which are most relevant
  • Paths that transit the targeted ISP
  • Omit redundant paths
  • Other challenges
  • Alias: one router might have multiple IP
    addresses, one for each of its interfaces
  • Geographical location of the router

57
Selected measurements
  • per-ISP map
  • Only choose traceroutes that are expected to
    transit the ISP (direct probing)
  • Use BGP routing tables
  • Data from RouteView
  • Path reduction
  • Some probes might have identical paths inside the
    ISP

58
Use BGP to choose traceroute destinations
  • BGP table entries for 1.2.3.0/24 (AS path listed
    toward the destination):
    1.2.3.0/24: 8 11 4 2 5
    1.2.3.0/24: 6 2 5
  • Traceroutes that are likely to traverse AS 2
  • from servers in AS 8, 11, 4, 6 to prefix
    1.2.3.0/24
  • If ALL paths to 1.2.3.0/24 include AS 2
  • from anywhere to 1.2.3.0/24
  • from 1.2.3.0/24 to anywhere
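A sketch of the two selection rules in Python, assuming the BGP table is available as a dictionary of AS paths per prefix. This is not Rocketfuel's code: `select_probes` and its inputs are hypothetical, and only the "to the prefix" direction is modeled.

```python
def select_probes(bgp_paths, target_as, server_ases):
    """Pick (server AS, prefix) pairs whose traceroutes should cross target_as.

    bgp_paths: {prefix: [as_path, ...]}, each AS path ordered toward the
    destination (origin AS last), as in the table above.
    server_ases: ASes that host a public traceroute server.
    """
    probes = set()
    for prefix, paths in bgp_paths.items():
        crossing = [p for p in paths if target_as in p]
        if crossing and len(crossing) == len(paths):
            # Every known path to the prefix includes target_as:
            # a trace from anywhere to the prefix should cross it
            probes.update((s, prefix) for s in server_ases)
            continue
        for path in crossing:
            upstream = path[:path.index(target_as)]  # ASes before target_as
            probes.update((s, prefix) for s in server_ases if s in upstream)
    return probes
```

With the example table, both paths to 1.2.3.0/24 contain AS 2, so the first rule already selects every server; the second rule matters for prefixes that only some paths reach through AS 2.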

59
Path reduction
  • Skip repeated traces of the same path
  • Same destination, same ingress point
  • Same ingress point, same egress point
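Path reduction amounts to skipping a candidate traceroute when an earlier one already covers the same (destination, ingress) pair or the same (ingress, egress) pair. The Python sketch below assumes the ingress and egress points have already been predicted for each candidate; those inputs and the name `reduce_paths` are hypothetical, since predicting them is a separate problem.

```python
def reduce_paths(candidates):
    """candidates: iterable of (server, dest, ingress, egress) tuples, where
    ingress/egress are the predicted entry/exit points of the ISP."""
    seen_dest_ingress = set()
    seen_ingress_egress = set()
    kept = []
    for server, dest, ingress, egress in candidates:
        if (dest, ingress) in seen_dest_ingress:
            continue  # same destination, same ingress: path already traced
        if (ingress, egress) in seen_ingress_egress:
            continue  # same ingress and egress: same path inside the ISP
        seen_dest_ingress.add((dest, ingress))
        seen_ingress_egress.add((ingress, egress))
        kept.append((server, dest))
    return kept
```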

60
Effectiveness of selected measurements
  • Brute-force (all servers to all BGP prefixes)
  • 150 million traceroutes required
  • Direct probing
  • 15 million traceroutes required
  • Direct probing + path reduction
  • 300 thousand traceroutes required

61
Alias resolution
  • Alias: traceroute reports the IP address of the
    interface on the router (not the router itself!)
  • The router might have multiple interfaces
  • A router's interfaces may be numbered from entirely
    different IP prefixes
  • Need to know that interfaces 1 and 2 are on the
    same router

62
Alias probe
  • If you send a UDP packet to interface A of a
    router, addressed to a non-existent port
  • By default, the router will return an ICMP port
    unreachable response back to you
  • The source address of the ICMP packet will be the
    outgoing interface for the unicast route to you
    (interface B)
  • If we probe interfaces X and Y, and the resulting
    ICMP packets have the same source address Z,
    then we know X and Y are on the same router
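Once the replies are collected, grouping is a simple bookkeeping step: probed addresses whose ICMP port-unreachable replies share a source address belong to one router. The sketch below assumes a hypothetical dictionary of probe results; sending the probes and capturing the ICMP replies (which needs raw sockets) is not shown.

```python
def group_aliases(replies):
    """replies: {probed_address: source address of the ICMP port-unreachable reply}.

    Probed addresses whose replies share a source address are grouped as
    interfaces of one router; the reply source is itself one of its interfaces.
    """
    routers = {}
    for probed, source in replies.items():
        routers.setdefault(source, {source}).add(probed)
    return list(routers.values())
```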

63
Other tricks for resolving alias
  • Compare TTL
  • Compare IP identifier (ID)
  • Packets sent consecutively by the same router will
    have consecutive IP identifiers
  • Send probe packets to two potential aliases
  • Send another packet to the address that responded
    first
  • Aliases if x < y < z, and z - x is small (x, y, z:
    IP IDs of the three replies, in the order received)
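The IP ID test can be written as a small predicate. The sketch assumes x is the ID from the first responder, y from the second address, and z from the first responder again, and it compares the values modulo 2^16 because the IP ID field is 16 bits and wraps around. The function name and the `max_gap=200` threshold for "small" are hypothetical choices.

```python
def likely_aliases(x, y, z, max_gap=200):
    """IP-ID test on three replies received in the order x, y, z."""
    def ahead(a, b):
        # Forward distance from a to b on the 16-bit IP ID counter
        return (b - a) % 65536
    # x < y < z (modulo wrap-around) and z - x small
    return 0 < ahead(x, y) < ahead(x, z) <= max_gap
```

For example, IDs 65530, 65534 and 3 still count as increasing, since the counter wrapped between the second and third replies.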

64
Identify router location
  • Utilize DNS names
  • ISPs typically use certain naming conventions to
    name their routers
  • s1-bb11-nyc-3-0.sprintlink.net
  • A Sprint backbone router (bb11) in New York city
    (nyc)
  • p4-0-0-0.r01.miamfl01.us.bb.verio.net
  • A Verio backbone router (bb) in Miami, Florida
  • s1-neighborname.sprintlink.net
  • A neighboring router of Sprint
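Location hints like nyc or miamfl can be pulled out of router names by splitting on dots and dashes and looking each label up in a table of known codes. In the Python sketch below, the two-entry `LOCATIONS` table and the function name are hypothetical; real tools maintain much larger, per-ISP tables.

```python
import re

# Hypothetical excerpt of location codes, built by hand per ISP in practice
LOCATIONS = {"nyc": "New York, NY", "miamfl": "Miami, FL"}


def router_location(hostname):
    """Return the location of the first known code in a router's DNS name."""
    for label in re.split(r"[.-]", hostname.lower()):
        code = label.rstrip("0123456789")  # 'miamfl01' -> 'miamfl'
        if code in LOCATIONS:
            return LOCATIONS[code]
    return None
```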

65
A typical POP structure
  • POP (Point Of Presence)
  • Consists of a set of backbone and access routers
  • Backbone routers
  • Connect to other ISPs
  • Typically fully connected within the POP
  • Access routers
  • Connect to customers
  • Connect to routers from the neighboring domains
  • Connect to two backbone routers for redundancy

66
ISP peering structure
  • Using BGP table
  • AS level: whether two ASes peer with each other
  • Using Rocketfuel
  • Router level: where, and in how many places, these
    two ASes exchange traffic
  • Skewed distribution
  • ISPs typically peer in many places with a small
    number of other ISPs, and in only a few places with
    most other ISPs

67
Tool for measuring performance
  • Throughput
  • iperf
  • Bottleneck link bandwidth
  • Pathchar
  • Packet Pair (Bprobe/Nettimer)
  • Latency
  • Ping
  • One second resolution
  • Hping3 can provide a higher resolution
  • traceroute
  • Loss
  • tcpdump

68
iperf
  • Need to set up a client and a server
  • Server: iperf -s
  • Client: iperf -c <hostname>

69
Bottleneck Link Bandwidth Estimation
  • RTT variation
  • Dispersion of packet pairs/trains

70
RTT variation
(Figure: RTT model. s: data packet size; s_te: ICMP packet size; b_i: available bandwidth of link i; c: speed of light; f_i: packet processing at hop i)
71
Pathchar
  • (1) Send a set of packets to the router
  • (2) Increase the packet size and repeat (1)
  • (3) Estimate the link bandwidth by solving the
    linear equations obtained from (1) and (2)
  • (4) Repeat (1)-(3) for each link on the path
  • (5) Find the minimum of (4)
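The linear equations come from the observation that, once queuing is filtered out by keeping the minimum RTT for each probe size, the RTT to hop n grows linearly with the probe size s. Its slope is the sum of 8/b_i (seconds per byte) over links 1..n, while propagation, processing and the fixed-size ICMP reply (s_te) only shift the intercept. Subtracting consecutive slopes isolates link n. A Python sketch of that idea; the function names and input layout are hypothetical, and the real pathchar adds considerably more statistical care.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def pathchar_estimate(min_rtts):
    """min_rtts[h]: {probe size in bytes: minimum RTT in seconds} for hop h + 1.

    Returns (per-link bandwidths in bit/s, bottleneck bandwidth).
    """
    bandwidths = []
    prev_slope = 0.0
    for hop in min_rtts:
        sizes = sorted(hop)
        slope = fit_slope(sizes, [hop[s] for s in sizes])  # s/byte up to this hop
        diff = slope - prev_slope                          # s/byte for this link alone
        bandwidths.append(8 / diff if diff > 0 else float("inf"))
        prev_slope = slope
    return bandwidths, min(bandwidths)                     # step (5): the minimum
```

With noisy measurements the slope difference can come out non-positive; the sketch reports such a link as infinite rather than failing.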

72
Packet Pair (ideal)
  • Send a sequence of TCP probe packets
  • Packets are queued before entering the bottleneck
  • A gap (Pb) is created by the bottleneck link and is
    preserved at the receiver (Pr = Pb)
  • Bottleneck link bandwidth = packet size / gap

73
Life is not perfect
  • Lots of noise will affect the estimated
    bandwidth!
  • Effect of cross traffic
  • Packets are not queued before the bottleneck
    (case B)
  • Packets are queued again after the bottleneck
    (case C)
  • Packets arrive out-of-order
  • Packets traverse different path
  • Bottleneck changes over the course of the connection
  • Router does not use a FIFO queue
  • Clock resolution

74
Filter the noise
  • Assumption: correct estimates appear more
    frequently than incorrect ones
  • Choose the estimate with the highest density
  • histogram (bprobe)
  • kernel density estimator (nettimer)
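Combining the ideal packet-pair formula with a histogram filter gives a compact sketch: each arrival gap yields one estimate (packet size / gap), and the center of the most populated bin is reported, in the spirit of bprobe. The function names and the bin width are hypothetical, and nettimer's kernel density estimator is not shown.

```python
from collections import Counter


def packet_pair_estimates(packet_size_bytes, gaps_s):
    """Ideal packet pair: bandwidth = packet size / arrival gap, in bit/s."""
    return [packet_size_bytes * 8 / gap for gap in gaps_s if gap > 0]


def histogram_mode(estimates, bin_width):
    """bprobe-style filter: center of the most populated histogram bin."""
    bins = Counter(int(e // bin_width) for e in estimates)
    best_bin, _ = bins.most_common(1)[0]
    return (best_bin + 0.5) * bin_width


# 1500-byte pairs behind a 10 Mbit/s bottleneck arrive 1.2 ms apart; the
# 0.6 ms and 3 ms gaps stand in for cross-traffic noise
est = packet_pair_estimates(1500, [0.0012, 0.0012, 0.00121, 0.0006, 0.003])
print(histogram_mode(est, 1e6))
```

The result depends on the bin width: too narrow and correct estimates scatter over several bins, too wide and the estimate loses resolution.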

75
bprobe
76
Tool for measuring multicast
  • Mtrace (IGMP)
  • mHealth (RTCP + Mtrace)
  • Mlisten (RTP/RTCP)
  • RTPmon/RTPtools
  • Mantra

77
Mtrace
  • Multicast version of traceroute
  • Shows the route from a receiver to the source
  • Traceroute
  • Based on increasing the IP TTL
  • Does not work for multicast
  • Routers typically do not send ICMP TIME_EXCEEDED
    for multicast packets
  • Uses IGMP (Internet Group Management Protocol)
  • Multicast routers keep the state of
    incoming/outgoing interfaces of (S,G)
  • Reverse path lookup
  • Starts at the receiver and traces back toward the
    source
  • Allows third-party mtrace

78
IGMP
79
Reverse Path Lookup
  • An IGMP Query packet is multicast on the ALL-ROUTERS
    multicast address (224.0.0.2)
  • The last-hop router of the receiver begins an
    mtrace after receiving the Query packet
  • The last-hop router appends its info and changes
    the packet type from Query to Request
  • The last-hop router forwards the packet via
    unicast to the previous router, i.e. the one on the
    incoming interface of (S,G)
  • The same process is repeated until the source is
    reached
  • The router that connects to the source appends
    its info and changes the packet type from Request
    to Response
  • The Response packet is then sent to the mtrace
    initiator
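The hop-by-hop walk can be modeled in Python. `rpf_neighbor` below is a hypothetical stand-in for each router's multicast state, mapping a router to the upstream router on its incoming interface for (S,G). The sketch only shows how the packet accumulates hops and changes type; it does not build real IGMP messages.

```python
def mtrace(last_hop_router, rpf_neighbor, source_router, max_hops=32):
    """Trace from the receiver's last-hop router back toward the source."""
    # The last-hop router has already turned the Query into a Request
    packet = {"type": "Request", "hops": []}
    router = last_hop_router
    for _ in range(max_hops):
        packet["hops"].append(router)          # each router appends its own info
        if router == source_router:
            packet["type"] = "Response"        # first-hop router: Request -> Response
            return packet                      # ... sent back to the initiator
        router = rpf_neighbor[router]          # unicast to the previous-hop router
    raise RuntimeError("source not reached within max_hops")


print(mtrace("r3", {"r3": "r2", "r2": "r1"}, "r1"))
```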

80
RTP/RTCP
  • RTP (Real-time Transport Protocol)
  • TCP does not work for real-time multicast
  • ACK implosion and timing requirements
  • Application Layer Framing (ALF) between
    Transport and Application
  • Commonly used in Mbone and streaming tools
  • Payload type ID, sequence numbering, timestamping
  • Consists of a data channel and a control channel
    (RTCP)
  • RTCP
  • A control protocol of RTP
  • Functions
  • Delivery quality feedback
  • Canonical name: synchronizes data from multiple
    tools (audio/video)
  • Estimate group size
  • Distribution of group membership info
  • Packet format
  • Sender report
  • Receiver report
  • Source description
  • Canonical name

81
mHealth
  • A graphical multicast monitoring tool
  • Collects data about an MBone session
  • Listens to RTCP traffic to obtain group information
    and delivery quality
  • Uses Mtrace to trace the hops from each receiver
    to the source

82
Mlisten
  • A tool for collecting info when members join and
    leave a multicast group
  • Continuously monitors the well-known multicast
    address used to advertise Mbone sessions
  • For each session, Mlisten joins the audio and
    video groups and collects control and data packets
  • For each packet received, Mlisten records
  • Sender
  • Session name
  • Time received
  • At periodic intervals, Mlisten identifies any
    session or group members that have had no activity
    for a threshold period (session: 2 hr, member:
    2 min) and records them
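The periodic inactivity check amounts to keeping a "last seen" time per session and per member and sweeping out entries older than the thresholds. A Python sketch of that bookkeeping, using the 2-hour and 2-minute thresholds from the slide; the class, function and event names are hypothetical, not Mlisten's actual code.

```python
SESSION_TIMEOUT = 2 * 60 * 60  # seconds: 2 hours
MEMBER_TIMEOUT = 2 * 60        # seconds: 2 minutes


def sweep(last_seen, now, timeout):
    """Remove and return the keys whose most recent packet is older than timeout."""
    expired = [key for key, t in last_seen.items() if now - t > timeout]
    for key in expired:
        del last_seen[key]
    return expired


class MlistenState:
    """Hypothetical bookkeeping for the per-packet records and periodic check."""

    def __init__(self):
        self.sessions = {}  # session name -> time of last packet
        self.members = {}   # (session name, sender) -> time of last packet
        self.log = []       # (time, event, key)

    def on_packet(self, sender, session, now):
        self.sessions[session] = now
        self.members[(session, sender)] = now

    def periodic_check(self, now):
        for key in sweep(self.members, now, MEMBER_TIMEOUT):
            self.log.append((now, "member-inactive", key))
        for key in sweep(self.sessions, now, SESSION_TIMEOUT):
            self.log.append((now, "session-inactive", key))
```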

83
RTPMon
  • A tool that displays the statistics of an RTP
    session by passively monitoring the RTCP traffic
  • Startup time
  • Sender
  • Receivers
  • Traffic statistics for each (sender,receiver)
    pair
  • Data sent
  • Loss
  • jitter
  • Route from the sender to a receiver (via Mtrace)
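Jitter figures of this kind come from RTCP receiver reports, which carry the interarrival jitter estimate defined in RFC 3550. For consecutive packets, D = (R_i - R_{i-1}) - (S_i - S_{i-1}), where S is the RTP timestamp and R the arrival time in the same clock units, and the running estimate is updated as J += (|D| - J) / 16. A Python sketch of that calculation (the function name is hypothetical):

```python
def interarrival_jitter(packets):
    """packets: (rtp_timestamp, arrival_time) pairs in the same clock units,
    in arrival order. Returns the running jitter estimate after the last packet."""
    jitter = 0.0
    for (s_prev, r_prev), (s_cur, r_cur) in zip(packets, packets[1:]):
        d = (r_cur - r_prev) - (s_cur - s_prev)  # change in relative transit time
        jitter += (abs(d) - jitter) / 16         # exponential smoothing, gain 1/16
    return jitter
```

A constant network delay gives zero jitter no matter how large the delay is; only variation in transit time contributes.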

84
RTPtools
  • A number of applications that can be used for
    processing RTP data
  • rtpsend
  • Generates RTP packets from a text file, written
    by hand or generated by rtpdump
  • rtpdump
  • Captures and prints RTP packets, generating output
    files suitable for rtpplay and rtpsend
  • rtpplay
  • Plays back RTP sessions recorded by rtpdump

85
Mantra..
  • A tool that collects multicast routing data from
    multiple multicast-enabled routers
  • FIXW: the largest multicast exchange point on the
    west coast of the US
  • STARTAP: a core router between Internet2 and the
    commodity Internet
  • DANTE: an exchange point between the US and
    European research backbones
  • ORIX
  • Router View

86
..Mantra
  • Data collection
  • MBGP (Multicast Border Gateway Protocol)
  • A routing protocol that propagates topology
    information between domains
  • DVMRP (Distance Vector Multicast Routing
    Protocol)
  • Used within a domain
  • MSDP (Multicast Source Discovery Protocol)
  • A protocol that propagates info about active
    sources
  • Router forwarding tables

87
Tools for measuring wireless
  • Prismdump (or newer version of tcpdump)
  • 802.11
  • Ethereal (tcpdump with a GUI)
  • tethereal
  • netstumbler
  • wireless extension
  • Snort-wireless
  • A wireless intrusion detection system

88
netstumbler
  • A tool for detecting 802.11 WLAN
  • Usage
  • Verify that the WLAN is set up correctly
  • Detect other interfering WLANs in your area
  • Help aim directional antennas for long-haul WLAN
    links
  • WarDriving

89
Wireless extension
  • An API that allows a driver to expose the
    configuration and statistics of a WLAN to user space
  • Components
  • User interface and tools
  • Driver interface

90
User interface and tool
  • cat /proc/net/wireless
  • iwconfig
  • iwspy
  • For mobile IP tests
  • Allows the driver to add
    new addresses
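The first of these interfaces is plain text, so it is easy to read from a script. The Python sketch below assumes the common layout of /proc/net/wireless: two header lines, then one line per interface with status, link quality, signal level and noise level, where a trailing '.' marks a value updated since the last read. Column meanings and units are driver-dependent, and the function names are hypothetical.

```python
def parse_proc_net_wireless(text):
    """Parse the text of /proc/net/wireless into {interface: stats}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        iface, rest = line.split(":", 1)
        fields = rest.split()
        # Quality columns: link, level, noise; a trailing '.' marks an updated value
        link, level, noise = (float(f.rstrip(".")) for f in fields[1:4])
        stats[iface.strip()] = {"status": fields[0], "link": link,
                                "level": level, "noise": noise}
    return stats


def read_wireless_stats(path="/proc/net/wireless"):
    """Linux only: read and parse the live file."""
    with open(path) as f:
        return parse_proc_net_wireless(f.read())
```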

91
Driver interface
  • Defined in /usr/include/linux/wireless.h
  • Example
  • get_wireless_stat
  • ioctl calls, e.g. SIOCSIWFREQ (set frequency/channel)

92
Measuring mobility
  • GPS
  • Association/disassociation patterns from base
    stations/access points
  • Tools: SNMP, syslog
  • Wireless signal strength
  • Infer user location based on analysis of signal
    strength

93
What is War Driving?
  • Record the activities of wireless LANs from place
    to place
  • What do you need for war driving?
  • A device capable of receiving an 802.11b signal
    (notebook w/ wireless card)
  • A device capable of moving around (some
    transportation)
  • Software that can log data (netstumbler/Ethereal/GPS)
  • Then you just sit back and relax
  • You move these devices from place to place
  • Over time, you build up a database comprised of
    the network name, signal strength, location, and
    ip/namespace in use.

94
What is Wireless LAN?
  • It is a LAN
  • Extension of Wired LAN
  • Uses high-frequency radio waves (RF)
  • Speed: 2 Mbps to 54 Mbps
  • Distance: 100 feet to 15 miles

95
Different versions of 802.11
  • 802.11
  • IEEE family of specifications for WLANs
  • 2.4GHz 2Mbps
  • 802.11a
  • 5GHz, 54Mbps
  • 802.11b
  • Often called Wi-Fi, 2.4GHz, 11Mbps
  • 802.11e
  • QoS and multimedia support for 802.11a/802.11b
  • 802.11g
  • 2.4GHz, 54Mbps
  • 802.11i
  • An alternative to WEP

96
Access points
  • Access Point (AP)
  • A device that serves as a communications "hub"
    for wireless clients and provides a connection to
    a wired LAN
  • Beacon
  • Message transmitted at regular intervals by the
    APs (100 ms by default for many vendors)
  • Used to maintain and optimize communications, e.g.
    so that clients can automatically connect to the AP

97
Ad-hoc mode
  • Ad Hoc Mode
  • Wireless client-to-client communication; the
    opposite of Infrastructure Mode

98
Infrastructure mode
  • Infrastructure Mode
  • A client setting providing connectivity to APs
  • As opposed to Ad Hoc Mode

99
Basic service set
  • SSID or BSSID
  • Basic Service Set Identifier

BSS: an AP that forms an association with one or
more wireless clients, together with those clients,
is referred to as a Basic Service Set
100
Extended service set
ESS: to increase the range and coverage of the
wireless network, one adds more strategically
placed APs to the environment to increase density.
This is referred to as an Extended Service Set
  • ESSID
  • Extended Service Set Identifier

101
Non-overlapping channels
102
DSSS Channel
103
The RFMON mode
  • Like promiscuous mode in wired
  • Listen(Receive) only
  • Also known as Monitor Mode
  • You can capture raw 802.11 (MAC-layer) frames in
    this mode
  • Many drivers now support RFMON mode
  • Prism2
  • madwifi

104
Snort-wireless
  • Extended from Snort (an IDS for Internet) for
    wireless
  • Allows one to specify custom rules for detecting
    specific 802.11 frames, rogue APs, AdHoc
    networks, and Netstumbler-like behaviour in the
    vicinity of the Snort-Wireless sensor

105
Snort format
  • <action> wifi <mac> <direction> <mac> (<rule
    options>)
  • Uses source and destination MAC addresses instead
    of IP addresses

106
<action>
  • Tells Snort what to do when it finds a packet
    that matches the rule criteria
  • alert: generate an alert and then log the packet
  • log: log the packet
  • pass: ignore the packet
  • activate: alert and then turn on another dynamic
    rule
  • dynamic: remain idle until activated by an
    activate rule, then act as a log rule

107
<mac>
  • Format
  • Single MAC address: 00DEADBEEF00
  • MAC address list: 00DEADBEEF00,
    00DEADC0DE00, ....

108
<direction>
  • ->
  • From source to destination
  • <>
  • Both directions

109
What info you can get from wireless packets
  • timestamp
  • Signal strength
  • SSID
  • Sender/receiver
  • Retransmission
  • Mobility (association/disassociation)

110
Received Signal Strength Indication
  • In arbitrary units (different vendors define it
    in different ways)
  • RSSI is typically used to determine when the
    amount of radio energy in the channel is below a
    certain threshold at which point the network card
    is clear to send (CTS).

111
Noise floor
  • Typically assumed to be a constant
  • The noise power N = kTB
  • where k is Boltzmann's constant, T is the
    temperature in Kelvin, and B is the system bandwidth
  • For a 20 MHz OFDM channel we have -174 +
    10log10(20 × 10^6) ≈ -101 dBm of thermal noise at
    the antenna (-174 dBm/Hz is kT at about 290 K).
    After including an additional 5 dB of noise from
    the amplifier chain, we have -96 dBm
  • RSSI: 10 = weak, 20 = OK, 40 = good
  • RSSI changes with time due to interference,
    channel fading, etc.
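The noise-floor arithmetic can be checked numerically. The sketch assumes the standard reference temperature of 290 K (which is where -174 dBm/Hz comes from) and the 5 dB amplifier-chain figure from the slide; the function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def thermal_noise_dbm(bandwidth_hz, temperature_k=290.0):
    """N = kTB, converted from watts to dBm (dB relative to 1 mW)."""
    noise_w = BOLTZMANN * temperature_k * bandwidth_hz
    return 10 * math.log10(noise_w / 1e-3)


def noise_floor_dbm(bandwidth_hz, noise_figure_db=5.0):
    """Thermal noise plus the extra noise added by the receiver's amplifier chain."""
    return thermal_noise_dbm(bandwidth_hz) + noise_figure_db


print(round(thermal_noise_dbm(20e6), 1), round(noise_floor_dbm(20e6), 1))  # -101.0 -96.0
```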

112
What is signal strength?
  • Four common units for measuring RF signal
    strength
  • mW
  • dBm
  • RSSI
  • percentage

113
mW <-> dBm
  • dBm = 10 × log10(mW)
  • Example
  • 100 mW: 10 × log10(100) = 20 dBm
  • 50 mW: 10 × log10(50) ≈ 16.99 dBm
  • 1 mW: 10 × log10(1) = 0 dBm
  • 0.5 mW: 10 × log10(0.5) ≈ -3.01 dBm
  • It's cumbersome to talk about -96 dBm as
    0.0000000002511 mW
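The two conversions as Python helpers (hypothetical names), reproducing the examples above:

```python
import math


def mw_to_dbm(mw):
    """Power in milliwatts -> dBm."""
    return 10 * math.log10(mw)


def dbm_to_mw(dbm):
    """Power in dBm -> milliwatts."""
    return 10 ** (dbm / 10)


for mw in (100, 50, 1, 0.5):
    print(f"{mw} mW = {mw_to_dbm(mw):.2f} dBm")
print(f"-96 dBm = {dbm_to_mw(-96):.4g} mW")
```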

114
RSSI
  • 802.11 standard
  • A mechanism by which RF energy is measured on the
    circuitry of a wireless NIC
  • An allowable range of 0 to 255
  • In reality
  • No vendor actually measures 256 different signal
    strength levels
  • Each vendor uses its own RSSI_Max
  • Cisco: 100
  • Symbol: 30
  • Atheros: 60

115
Use RSSI
  • Chipset uses RSSI to decide if the channel is clear
  • Clear channel threshold
  • Roaming threshold
  • RSSI_MAX is different from vendor to vendor
  • Clear channel/roaming threshold is different from
    vendor to vendor

116
Granularity of RSSI
  • RSSI values are discrete integers
  • Cannot represent all possible energy levels (mW
    or dBm)
  • Many vendors map RSSI to dBm because of the
    logarithmic nature of dBm

117
RSSI <-> dBm
  • Most vendors use a table to map RSSI to dBm
  • Atheros
  • dBm = RSSI - 95
  • Cisco
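For Atheros the mapping is a simple offset, reading the slide's entry as dBm = RSSI - 95. The sketch below assumes valid readings lie between 0 and the Atheros RSSI_Max of 60 listed earlier; table-based vendors such as Cisco would need their own lookup table, which is not reproduced here. The constant and function names are hypothetical.

```python
ATHEROS_RSSI_MAX = 60     # RSSI_Max for Atheros, from the RSSI slide
ATHEROS_DBM_OFFSET = -95  # Atheros rule: dBm = RSSI - 95


def atheros_rssi_to_dbm(rssi):
    """Map an Atheros RSSI reading to dBm with the linear rule dBm = RSSI - 95."""
    if not 0 <= rssi <= ATHEROS_RSSI_MAX:
        raise ValueError(f"Atheros RSSI should lie in 0..{ATHEROS_RSSI_MAX}")
    return rssi + ATHEROS_DBM_OFFSET


print(atheros_rssi_to_dbm(30))  # -65
```

Under this rule an Atheros card reports signal levels from about -95 dBm (RSSI 0) to -35 dBm (RSSI 60).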

118
Receive sensitivity
  • The minimum level of RF energy the receiver needs
    to extract the bit stream
  • A NIC specification, measured in dBm
  • Below the receive sensitivity, signal and noise
    are not distinguishable
  • Very close to RSSI = 0
  • Impossible to measure RSSI = 0
  • Can't decode a packet
  • The higher the data rate, the higher the received
    signal level required

119
Percentage metrics
  • $\text{Percentage} = \text{RSSI} / \text{RSSI\_MAX} \times 100\%$
  • E.g. for an Atheros card (RSSI_MAX = 60), 50%
    corresponds to RSSI = 60 × 50% = 30
  • Good for site survey
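The percentage metric is a ratio to the vendor's RSSI_MAX, so the same percentage means different raw readings on different cards. A small sketch using the RSSI_Max values from the RSSI slide; the dictionary and function names are hypothetical.

```python
# RSSI_Max per vendor, as listed on the RSSI slide
RSSI_MAX = {"Cisco": 100, "Symbol": 30, "Atheros": 60}


def rssi_to_percent(rssi, vendor):
    """Signal strength as a percentage of the vendor's RSSI_MAX."""
    return 100.0 * rssi / RSSI_MAX[vendor]


def percent_to_rssi(percent, vendor):
    """Inverse mapping; RSSI values are integers, so round to the nearest level."""
    return round(percent / 100.0 * RSSI_MAX[vendor])


print(rssi_to_percent(30, "Atheros"))  # 50.0
print(percent_to_rssi(50, "Atheros"))  # 30
```

The same 50% corresponds to RSSI 50 on a Cisco card but RSSI 15 on a Symbol card, which is why percentages are handy for a single-vendor site survey but not for comparing vendors.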

120
What is signal quality?
  • In the 802.11b standard
  • PN code correlation strength
  • In the context of DSSS modulation
  • Symbol = data bits XOR PN code (called spreading)
  • E.g. at 1 Mbps/2 Mbps
  • A single bit of data is XORed with the 11-bit-long
    PN code (the Barker sequence, 10110111000)

121
Symbol correlation
  • Symbol for 1: 10110111000
  • Received symbol: 10110111001
  • Symbol for 0: 01001000111
  • Received symbol: 10110111001
  • The received symbol is closer to 1 than to 0
    (10 of its 11 bits match the symbol for 1)
  • Signal quality = percentage of correct bits
  • Reflects the corruption between the AP and the
    client
  • But is not necessarily equal to SNR
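The correlation step can be sketched as a chip-by-chip comparison. This uses the standard 11-chip Barker sequence (+1 -1 +1 +1 -1 +1 +1 +1 -1 -1 -1, written as bits 10110111000) and follows the slide's convention that the symbol for 1 is the Barker code and the symbol for 0 its complement. The quality value is the simple fraction of matching chips described on the slide, not any particular vendor's signal-quality metric; the function names are hypothetical.

```python
# Barker-11 chips written as bits (+1 -> 1, -1 -> 0)
BARKER_11 = (1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0)


def symbol_for(bit):
    """Chip sequence for one data bit: the Barker code for 1, its complement for 0."""
    return BARKER_11 if bit == 1 else tuple(1 - c for c in BARKER_11)


def despread(received):
    """Decide which data bit a received 11-chip symbol is closest to.

    Returns (bit, quality), where quality is the fraction of chips that
    match the chosen symbol."""
    if len(received) != len(BARKER_11):
        raise ValueError("expected 11 chips")
    agree_with_one = sum(r == c for r, c in zip(received, symbol_for(1)))
    # The two symbols are complements, so every chip matches exactly one of them
    agree_with_zero = len(received) - agree_with_one
    if agree_with_one >= agree_with_zero:
        return 1, agree_with_one / len(received)
    return 0, agree_with_zero / len(received)


received = (1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1)  # last chip corrupted
bit, quality = despread(received)
print(bit, f"{quality:.0%}")  # 1 91%
```

One corrupted chip still decodes correctly but lowers the quality to 10/11, which is how the metric reflects corruption without being the same thing as SNR.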

122
Question?
123
Things to know when making measurements
  • It's not just plugging in a box and starting to
    sniff traffic
  • Administrative issues
  • Privacy and security
  • Technical issues
  • Errors and imperfections
  • Large volume of data
  • Reproducible results
  • Making data publicly available

124
Errors and imperfections
  • Precision
  • Limited by the measurement devices
  • Clock precision
  • How much detail to record
  • Accuracy
  • Packet drops during recording or filtering
  • Duplication or reordering due to the packet filter
  • Clocks
  • Unsynchronized clocks
  • Buffered packets at the NIC
  • Effect of middleboxes
  • Trace edge effects
  • Representative data

125
Precision
  • Consider a tcpdump record
  • 1092727442.276251 IP 192.168.0.12022 >
    192.168.0.1379320
  • How precise is it?
  • Answer: at most 1 µs (the timestamp resolution),
    but perhaps much less

126
How precise is the packet captured by tcpdump?
  • The snapshot length (snaplen) limits how many bytes
    of each packet are recorded
  • Filtering limits which packets are recorded

127
Maintain metadata
  • E.g. when, where, and how the traces were recorded
  • Gives the measurements a context
  • Metadata is important when the measurements are
    later used by other people for different purposes
  • Existing tools are weak here
  • Could be a potential project topic
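One lightweight way to keep metadata with a trace is a small sidecar file written at capture time. The sketch below is a hypothetical format: the field names, the JSON sidecar convention, and the example values are illustrative assumptions, not an existing tool's schema.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class TraceMetadata:
    """Hypothetical minimal record of when, where and how a trace was captured."""
    trace_file: str
    start_time_utc: str   # when
    location: str         # where (which link or monitor)
    capture_tool: str     # how
    snaplen_bytes: int    # how much of each packet was kept
    capture_filter: str   # which packets were kept
    notes: str = ""


def write_metadata(meta, path):
    """Store the metadata as a JSON file kept next to the trace."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(meta), f, indent=2)


def read_metadata(path):
    with open(path, encoding="utf-8") as f:
        return TraceMetadata(**json.load(f))


# Example values only
meta = TraceMetadata(
    trace_file="trace-example.pcap",
    start_time_utc="2004-01-01T00:00:00Z",
    location="hypothetical campus uplink",
    capture_tool="tcpdump",
    snaplen_bytes=96,
    capture_filter="tcp",
    notes="example record",
)
meta_path = os.path.join(tempfile.mkdtemp(), "trace-example.meta.json")
write_metadata(meta, meta_path)
restored = read_metadata(meta_path)
```

Recording the snaplen and filter alongside the trace answers the precision questions from the previous slides for anyone who reuses the data later.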

128
Accuracy
  • An even harder problem than precision
  • Examples
  • Clocks
  • Arbitrarily off from true time
  • Jump forward or backward
  • Fail to move
  • Run arbitrarily fast or slow
  • Packet filters
  • Drop packets
  • Fail to report drops
  • Report drops that did not occur
  • Reorder packets
  • Duplicate packets
  • Record the wrong packets
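Some of the clock failures listed above leave visible traces in the timestamps themselves. The sketch flags backward jumps, stalls, and large forward jumps between consecutive packets; the `max_gap` threshold is an arbitrary, hypothetical choice. Identical consecutive timestamps can be legitimate on fast links with 1 µs resolution, so treat "stalled" as a hint, and note that a clock running fast or slow cannot be detected from a single trace without comparing against another clock.

```python
def clock_anomalies(timestamps, max_gap=60.0):
    """Flag suspicious clock behaviour in a sequence of packet timestamps.

    Returns (index, kind) pairs, where kind is:
      "backward" - the clock jumped backward,
      "stalled"  - the timestamp did not advance,
      "jump"     - the clock moved forward by more than max_gap seconds.
    max_gap is an arbitrary threshold; tune it per trace.
    """
    anomalies = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta < 0:
            anomalies.append((i, "backward"))
        elif delta == 0:
            anomalies.append((i, "stalled"))
        elif delta > max_gap:
            anomalies.append((i, "jump"))
    return anomalies


print(clock_anomalies([1.0, 2.0, 1.5, 1.5, 100.0]))
# [(2, 'backward'), (3, 'stalled'), (4, 'jump')]
```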

129
Not measuring what you think you're measuring
  • Examples
  • Measuring TCP packet loss by counting
    retransmissions
  • Packets can be replicated by the network
  • Computing TCP connection size as the difference
    between the SYN and FIN sequence numbers
  • What if the remote host was down?
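The SYN/FIN method can be written down in a few lines, which also makes its assumptions explicit. In this sketch SYN and FIN each consume one sequence number, so one direction's payload is `fin_seq - syn_seq - 1`, taken modulo 2^32 because sequence numbers wrap. When no FIN was seen (the remote host went down, the connection ended with a reset, or the trace ended first), the method has no answer at all. The function name is hypothetical.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around


def bytes_between_syn_and_fin(syn_seq, fin_seq):
    """Estimate one direction's payload bytes from the SYN and FIN sequence numbers.

    The payload occupies [syn_seq + 1, fin_seq). Returns None if either
    packet was not observed, e.g. the remote host went down before a FIN."""
    if syn_seq is None or fin_seq is None:
        return None
    return (fin_seq - (syn_seq + 1)) % SEQ_SPACE


print(bytes_between_syn_and_fin(1000, 1501))  # 500
print(bytes_between_syn_and_fin(1000, None))  # None
```

Even when both packets are present, the result counts sequence space, not delivered bytes, so it can differ from what the receiver actually got.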

130
Calibration
  • Detect problems of precision, accuracy, and
    misconception
  • Goal: fix these problems post facto
  • Identify and remove faulty measurements
  • Find the outliers
  • E.g. what are the largest and smallest RTTs in
    the measurements?

131
Self-consistency check
  • Check against the expected protocol behavior
  • E.g. if a TCP receiver acknowledged data that was
    never sent, something must be wrong
  • The filter dropped the data
  • The packet took another route
  • The data was sent before you started measuring
  • The TCP receiver is broken
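The "acknowledged data never sent" check can be sketched as a single pass over one direction of a connection. The packet representation is a hypothetical list of dicts; sequence-number wraparound is ignored, and SYN/FIN flags are not modeled (a real checker must count each as one sequence number, otherwise the ACK of a FIN would be flagged). ACKs seen before any data are skipped, since that data may predate the trace.

```python
def acks_beyond_sent_data(packets):
    """Self-consistency check for one direction of a TCP connection.

    `packets` is a hypothetical list of dicts in capture order:
      {"dir": "data", "seq": s, "len": n}  -- segment from sender to receiver
      {"dir": "ack", "ack": a}             -- ACK from receiver to sender
    Returns the indices of ACKs that cover bytes never seen being sent.
    """
    highest_sent = None  # one past the last data byte observed so far
    suspicious = []
    for i, pkt in enumerate(packets):
        if pkt["dir"] == "data":
            end = pkt["seq"] + pkt["len"]
            highest_sent = end if highest_sent is None else max(highest_sent, end)
        elif pkt["dir"] == "ack" and highest_sent is not None and pkt["ack"] > highest_sent:
            suspicious.append(i)
    return suspicious


trace = [
    {"dir": "data", "seq": 1, "len": 100},
    {"dir": "ack", "ack": 101},
    {"dir": "ack", "ack": 201},   # acknowledges bytes not yet seen
    {"dir": "data", "seq": 101, "len": 100},
    {"dir": "ack", "ack": 201},
]
print(acks_beyond_sent_data(trace))  # [2]
```

A flagged ACK does not say which explanation on the slide applies; it only tells you where to look.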

132
Compare multiple measurements
  • Compare packets at both ends
  • Compare packet headers with payload
  • Compare measurements collected at different times

133
Techniques to detect inaccuracy
  • Examine outliers and spikes
  • Outliers: unusually low or high values
  • Spikes: values that appear unusually often
  • E.g. extremely small RTTs or extremely large
    connections
  • Consistency checks
  • Compare against normal protocol/traffic behavior
  • Compare multiple measurements
  • From different times
  • From different places
  • Use synthetic data to verify the correctness of
    the analysis software
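A first pass at outliers and spikes needs very little code: look at the extreme values, and look for values that occur far more often than the rest. The 5% default share for a spike is an arbitrary, hypothetical threshold, and the function names are illustrative.

```python
from collections import Counter


def extremes(values, k=3):
    """Return the k smallest and k largest values, the first thing to inspect
    when looking for outliers such as impossibly small RTTs."""
    ordered = sorted(values)
    return ordered[:k], ordered[-k:]


def spikes(values, min_share=0.05):
    """Return values that account for at least `min_share` of all samples."""
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, c in counts.items() if c / total >= min_share)


print(extremes([5, 1, 9, 3, 7], k=2))                            # ([1, 3], [7, 9])
print(spikes([1, 2, 2, 2, 3, 4, 5, 6, 7, 8], min_share=0.25))    # [2]
```

For continuous quantities such as RTTs, a spike usually means many samples share one exact value (for example a default timeout or a clamped reading), which is itself worth explaining before analysis continues.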

134
Large volume of data
  • Disk space
  • Number of files
  • Process time
  • Memory usage
  • Maximum file size
  • 2 GB for older versions of Linux
  • Software limitations
  • The number of data points a tool can take as input
  • Statistical limitations
  • Large datasets do not have a statistically exact
    description
  • Tip: do early analysis with a smaller dataset
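One way (not prescribed by the slide) to get a smaller dataset for early analysis is reservoir sampling, which draws a uniform sample of fixed size from a stream of unknown length in a single pass and constant memory. Note that sampling individual packet records breaks flow structure; for flow-level questions, sampling whole flows or time windows may be more appropriate.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Uniformly sample k records from an iterable of unknown length in one
    pass and O(k) memory (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec    # keep record i with probability k / (i + 1)
    return sample


subset = reservoir_sample(range(1_000_000), 1000, seed=42)
```

Fixing the seed keeps the subset reproducible, which matters for the reproducibility concerns on the following slides.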

135
Reproducible results
  • One often can't reproduce the results of a
    complex measurement study due to
  • Tip
  • Version control
  • Detailed notebook

136
Reproducible analysis
  • A typical scenario
  • You collected the measurements, did the analysis,
    and submitted the results to a conference
  • Months later, you get feedback from a reviewer
    asking you to redo the measurements with a tweak
  • What would you do?
  • Introduce the tweak, re-crunch the numbers,
    update the table, and call it done?
  • Or first re-run your scripts to understand how
    you got those numbers in the first place?

137
But
  • For a good-sized measurement study, you often
    cannot reproduce the exact earlier numbers
  • You've lost the earlier mental context of fudge
    factors, glitch removals, and script inconsistencies
  • Ad hoc notes
  • Removal of outliers
  • Random fixes
  • Different versions of analysis scripts
  • Rounding of numbers

138
Strategies
  • A single master script that builds all results
    from the raw data
  • Keep intermediate forms of the data
  • Maintain a notebook
  • What has been done and what happened
  • Use version control
  • Need a way to visualize the changes after a
    re-run
  • Another potential project topic
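The "single master script" strategy can be sketched as one function that rebuilds everything from the raw file, saves each intermediate result, and logs a hash of the raw input so a later re-run can be checked against it. The stage names, file layout, and RTT example are hypothetical; the point is that glitch removal and similar fudge factors live in code rather than in memory.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def run_all(raw_path, work_dir, stages):
    """Rebuild every result from the raw data in one pass.

    `stages` is a list of (name, function) pairs; each function takes the
    previous stage's output and returns a JSON-serialisable result. Every
    intermediate is saved, and a run log records the raw input's SHA-256."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    raw_bytes = Path(raw_path).read_bytes()
    log = {"raw_input": str(raw_path),
           "raw_sha256": hashlib.sha256(raw_bytes).hexdigest(),
           "stages": []}
    data = raw_bytes.decode("utf-8")
    for name, fn in stages:
        data = fn(data)
        out = work / f"{name}.json"
        out.write_text(json.dumps(data, indent=2), encoding="utf-8")
        log["stages"].append({"stage": name, "output": str(out)})
    (work / "run-log.json").write_text(json.dumps(log, indent=2), encoding="utf-8")
    return data


# Example: raw RTT samples in milliseconds, one per line
tmp = Path(tempfile.mkdtemp())
raw = tmp / "rtts.txt"
raw.write_text("12.0\n15.5\n0.001\n14.2\n", encoding="utf-8")
stages = [
    ("parsed", lambda text: [float(x) for x in text.split()]),
    # The glitch-removal rule is recorded here, not in someone's memory
    ("cleaned", lambda rtts: [r for r in rtts if r >= 1.0]),
    ("summary", lambda rtts: {"n": len(rtts), "mean_ms": sum(rtts) / len(rtts)}),
]
result = run_all(raw, tmp / "results", stages)
```

Keeping this script under version control, together with the saved intermediates, covers the notebook and version-control strategies as well.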

139
Make data publicly available
  • Document details about how the measurements were
    taken
  • Where and when
  • Link properties (speed, utilization, loss, etc.)
  • Include the analysis scripts that were used
  • Anonymization
  • Security, privacy, business sensitivity
  • Data-reduction requests

140
Measurement infrastructure
  • Administrative issues
  • It's not easy to get fresh data by yourself
  • Places where you can get existing data
  • NLANR
  • ITA
  • MAWI
  • NIMI
  • CAIDA
  • Internet 2

141
NLANR
  • Passive Measurement Analysis (PMA)
  • Active Measurement Project (AMP)

142
PMA
  • Collects passive IP header traces from links
    ranging from OC3 to OC192
  • Each monitor captures a unique portion of the
    overall network data
  • Captures 8 samples per day
  • 2 minutes per sample
  • 3.2 GB of data per day
  • A number of long, continuous OC48 traces
  • From 1 hour to 45 days

143
AMP
  • 150 sites in the US and some in other countries
  • Site-to-site measurements
  • Two meshes
  • HPC mesh (all in the US, 140 sites)
  • International mesh
  • Data measured
  • round trip time (RTT)
  • packet loss
  • topology
  • throughput

144
Internet Traffic Archive (ITA)
  • Founded by Vern Paxson in 1996
  • Mainly Web traces (plus some wide-area TCP and
    traceroute traces)
  • Most traces are in tcpdump or HTTP log format
  • Trace durations range from 2 hours to 6 months
  • Related software
  • tcpdpriv
  • Removes private information from tcpdump traces
  • tcp-reduce
  • a collection of shell scripts for reducing a
    tcpdump trace file to a summary of the
    corresponding TCP connections.
  • tracelook
  • a program for graphically viewing tcpdump traces.

145
MAWI (WIDE project)
  • A Japanese research effort
  • Traffic from several trans-Pacific T1 lines, a
    US-Japan OC-3 line and the 6Bone
  • Daily traces
  • 2 million packets per hour on the trans-Pacific
    lines
  • 6Bone traffic is still light (mainly BGP and
    ICMPv6)
  • Traces are in tcpdump format and anonymized with
    tcpdpriv

146
NIMI (National Internet Measurement Infrastructure)
  • A set of measurement servers (probes) running on
    a set of hosts
  • Functions
  • Receive and authenticate requests
  • Execute each request at the appropriate time
  • Send the results back to the requester
  • Daemons
  • nimid: communicates with the outside world
  • scheduled: schedules and executes measurements and
    packages the results
  • CPOC (Configuration Point Of Contact)
  • Configures and administers a set of NIMI probes in
    the same administrative domain
  • Measurement client (MC)
  • A tool that allows end users to send measurement
    requests to NIMI probes
  • Data Analysis Client (DAC)
  • Where the measurement results are returned
  • The address of the DAC is included in the request
    sent by the MC
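To make the division of labor concrete, here is a toy sketch of a probe's request flow: authenticate, queue by execution time, run when due, and address the result to the DAC named in the request. NIMI's actual wire protocol and credential scheme are not described on the slide; the shared-key HMAC, the `Request` fields, and the `Probe` class are hypothetical stand-ins, and in-process calls replace the network.

```python
import hashlib
import heapq
import hmac
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    run_at: float                                # when to execute (orders the queue)
    measurement: str = field(compare=False)      # e.g. "ping"
    target: str = field(compare=False)
    dac_address: str = field(compare=False)      # where results are returned
    signature: bytes = field(compare=False, default=b"")


def sign(req, key):
    msg = f"{req.run_at}|{req.measurement}|{req.target}|{req.dac_address}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


class Probe:
    """Toy NIMI-style probe (hypothetical): authenticate, schedule, run, report."""

    def __init__(self, key, tools):
        self.key = key        # credential assumed to be provisioned by the CPOC
        self.tools = tools    # measurement name -> function(target) -> result
        self.queue = []

    def receive(self, req):
        # Role of nimid: accept and authenticate requests from the outside world
        if not hmac.compare_digest(req.signature, sign(req, self.key)):
            return False
        heapq.heappush(self.queue, req)
        return True

    def run_due(self, now):
        # Role of scheduled: execute due requests and package results for the DAC
        results = []
        while self.queue and self.queue[0].run_at <= now:
            req = heapq.heappop(self.queue)
            output = self.tools[req.measurement](req.target)
            results.append((req.dac_address, req.measurement, output))
        return results


key = b"hypothetical-shared-key"
probe = Probe(key, {"ping": lambda target: f"rtt to {target}: (simulated)"})
req = Request(10.0, "ping", "example.org", "dac.example.net")
req.signature = sign(req, key)
accepted = probe.receive(req)
rejected = probe.receive(Request(5.0, "ping", "example.org", "attacker.example", b"bad"))
sent = probe.run_due(now=10.0)
```

The forged request is rejected before it reaches the schedule, and the result of the accepted one is addressed to the DAC given by the measurement client, mirroring the roles listed on the slide.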

147
CAIDA (Cooperative Association for Internet Data Analysis)
  • Affiliated with UCSD
  • Provides tools, data and analysis for the research
    community
  • Data sources
  • Exchange points, e.g. the San Diego Network Access
    Point (SD-NAP)
  • Data from FIX-West
  • Routing data from the University of Oregon's Route
    Views project (www.antc.uoregon.edu/route-views)
    and Merit's IPMA (www.merit.edu/ipma/)
  • Active measurements from skitter

148
Internet 2
  • Goals
  • A large-scale edge network for the research
    community
  • Enable revolutionary applications
  • Transfer new applications/services to the
    commercial Internet
  • Consists of 207 universities connected by 3
    networks: Abilene, Quilt, ARENA
  • The participants collaborate on studying,
    identifying, developing, and testing advanced
    network services, applications and technologies
  • Focus on end-to-end performance measurements
  • Active: routing, delay, loss
  • Passive: SNMP, NetFlow

149
Question?
150
Case study
  • OK, now I've told you all about how to measure,
    but can we do something with the measurements?
  • Case study
  • Modeling heavy-hitter traffic

151
Heavy hitters
  • Definition
  • A small percentage of flows that carry the
    majority of the bytes
  • Why is it important?
  • Anomaly and attack detection
  • Scalable differentiated service
  • Usage-based pricing and accounting
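The definition above can be made operational by sorting flows by size and taking the largest ones until they cover a chosen share of the bytes; taking the largest first yields the smallest such set. The 50% default mirrors "the majority of the bytes"; the flow keys (in practice, for example, 5-tuples) and the function name are hypothetical.

```python
def heavy_hitters(flow_bytes, byte_share=0.5):
    """Return the smallest set of flows that together carry at least
    `byte_share` of all bytes, largest flows first."""
    total = sum(flow_bytes.values())
    chosen, covered = [], 0
    for flow, size in sorted(flow_bytes.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= byte_share * total:
            break
        chosen.append(flow)
        covered += size
    return chosen


flows = {"a": 900, "b": 50, "c": 30, "d": 20}
print(heavy_hitters(flows))  # ['a']
```

Here one flow out of four (25% of flows) carries 90% of the bytes, which is the pattern the heavy-hitter definition describes.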

152
Modeling of heavy-hitter flows
  • People have characterized heavy hitters in
    different ways
  • Size
  • Elephants and mice (Floyd et al.)
  • Size > 1% of link bandwidth
  • Duration
  • Tortoises and dragonflies (Brownlee and Claffy)
  • Duration > 15 minutes
  • Burstiness
  • Alpha and beta traffic (Sarvotham and Riedi)
  • $\text{Burst}_{\text{peak}} > \text{Agg}_{\mu} + 3\,\text{Agg}_{\text{dev}}$
  • How do they relate?
  • Important for traffic engineering and modeling
    purposes
  • Traces used: Los Nettos, NLANR
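The three definitions can be applied side by side to the same flow. This is a sketch under stated assumptions: "size > 1% of link bandwidth" is read as the flow's average rate exceeding 1% of link capacity, burstiness compares the flow's peak rate with the mean plus three standard deviations of the aggregate rate (sampled at the same time scale), and the flow record format and function name are hypothetical.

```python
from statistics import mean, pstdev


def classify_flow(flow, link_capacity_bps, aggregate_rates_bps):
    """Label one flow along the size, duration and burstiness dimensions.

    `flow` is a hypothetical dict with "bytes", "duration_s" and "peak_rate_bps".
    """
    if flow["duration_s"] <= 0:
        raise ValueError("flow duration must be positive")
    avg_rate_bps = 8 * flow["bytes"] / flow["duration_s"]  # bytes -> bits per second
    alpha_threshold = mean(aggregate_rates_bps) + 3 * pstdev(aggregate_rates_bps)
    return {
        "elephant": avg_rate_bps > 0.01 * link_capacity_bps,  # size
        "tortoise": flow["duration_s"] > 15 * 60,              # duration > 15 minutes
        "alpha": flow["peak_rate_bps"] > alpha_threshold,      # burstiness
    }


aggregate = [10e6, 20e6, 30e6]  # example aggregate rate samples, bits/s
long_bursty = {"bytes": 2_000_000, "duration_s": 1000, "peak_rate_bps": 50e6}
fast_short = {"bytes": 125_000_000, "duration_s": 500, "peak_rate_bps": 1e6}
print(classify_flow(long_bursty, 100e6, aggregate))
print(classify_flow(fast_short, 100e6, aggregate))
```

The two example flows land in different classes on each dimension, which is exactly why it is worth asking how the definitions relate.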

153
Our methodology
  • Study flows in four dimensions size,