Internet Monitoring - Results - PowerPoint PPT Presentation

Title:

Internet Monitoring - Results

Description:

www.slac.stanford.edu – PowerPoint PPT presentation


1
Internet Monitoring - Results
  • Les Cottrell SLAC
  • <cottrell@slac.stanford.edu>
  • Presented at the ICFA Meeting, CERN, Mar 1998
  • Partially funded by MICS joint SLAC/LBL proposal
    on Internet End-to-end Performance Monitoring
    (IEPM)

2
Outline of Talk
  • What, why & how are we (ESnet/HENP community)
    measuring?
  • What PingER measurement reports are available and
    what do they show
  • (short), intermediate & long term
  • grouping and multi-site visualization
  • Traffic volume & Traceroute measurements
  • Summary
  • Deployment/development, Internet Performance,
    Next Steps
  • Collaborations
  • NIMI/IPWT

3
Why go to the effort?
  • Apparent quality of Internet getting worse as
    size and demands increase
  • Internet woefully under-measured &
    under-instrumented
  • Internet very diverse - no single path typical
  • Users need
  • realistic expectations, planning information
  • guidelines for setting and validating SLAs
  • information to help in identifying problems
  • help to decide where to apply resources

4
Importance of Response Time
  • Time is scarcest and most valuable commodity
  • Studies in late 70s and early 80s showed the
    economic value of Rapid Response Time
  • 0-0.4s High productivity interactive response
  • 0.4-2s Fully interactive regime
  • 2-12s Sporadically interactive regime
  • 12s-600s Break in contact regime
  • >600s Batch regime
  • Threshold around 4-5s, where complaints increase
    rapidly.
  • Voice has threshold around 100ms

5
Perception of Poor Packet Loss
  • Above 4-6% packet loss video conferencing becomes
    irritating, and non-native language speakers
    become unable to communicate.
  • The occurrence of long delays of 4 seconds or
    more at a frequency of 4-5% or more is also
    irritating for interactive activities such as
    telnet and X windows.
  • Above 10-12% packet loss there is an unacceptable
    level of back-to-back loss of packets and
    extremely long timeouts, connections start to get
    broken, and video conferencing is unusable.

6
Our Main Metric is Ping
  • Universally available, easy to understand
  • no software for clients to install
  • Low network impact
  • Provides useful real world measures of loss,
    response time, reachability, unpredictability
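As a rough illustration of how these measures fall out of raw ping results, the sketch below summarizes one batch of pings. The function name, the use of `None` for a lost packet, and the choice of the median are assumptions made for this example, not PingER's actual code. Unpredictability is left out because the slide does not define it.

```python
from statistics import median
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one batch of pings.

    Each entry is a round-trip time in ms, or None for a lost packet
    (a convention assumed for this sketch).
    """
    if not rtts_ms:
        raise ValueError("need at least one ping sample")
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    sent = len(rtts_ms)
    return {
        "sent": sent,
        "received": len(replies),
        "loss_percent": 100.0 * (sent - len(replies)) / sent,
        # Reachable if at least one echo reply came back.
        "reachable": bool(replies),
        "median_rtt_ms": median(replies) if replies else None,
    }
```

For example, a batch of ten pings with two losses gives a loss of 20% and a median taken over the eight replies only.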

7
Ping Response vs Web Response 1/2
8
Ping Response vs Web Response 2/2
9
Ranked packet loss for 3 months
Stanford
Rome
UK
Cincinnati
10
Sawtooth Effect
2 capacity ( 2Mbps)
Added 45 Mbps (quadrupled capacity)
3 capacity 9 Mbps
Holidays
11
RAL Last 180 Days plot
Lines are simply cubic spline fits to aid the eye
Upper green and black points are response time in ms
Red & blue are weekday loss
Cyan are weekend loss
Note weekend/weekday differences (cyan vs blue)
Note Xmas/New Year lull
Also note quick onset of saturation at end of August & September
12
Italian sites look similar to each other
13
Representative International HENP Site Loss
Jan-95 thru Nov-97
  • Note RL (UK) sawtooths as UK-US bandwidth is added
    (Apr-96, Feb-97, Aug-97)

14
Aggregation
  • Group measurements, for example
  • by area (e.g. N. America E, N. America W, W.
    Europe/Japan, others, by country)
  • trans-oceanic links, intercontinental links
  • separation e.g. number of hops, time zones
    crossed, IXPs crossed
  • ISP (ESnet, vBNS/I2, ...)
  • by monitoring site
  • one site seen from multiple sites
  • common interest/affiliation (XIWT, HENP )
  • user selectable
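One way to picture this kind of grouping is a small function that takes per-link measurements and a user-selectable grouping key. This is a minimal sketch: the record layout, host names, loss values, and the choice of the median as the group statistic are all made up for illustration.

```python
from collections import defaultdict
from statistics import median


def median_loss_by_group(measurements, group_key):
    """Return {group: median loss %}; group_key(record) picks the group."""
    groups = defaultdict(list)
    for record in measurements:
        groups[group_key(record)].append(record["loss_percent"])
    return {group: median(values) for group, values in groups.items()}


def remote_tld(record):
    """Top level domain of the remote host, e.g. 'ch'."""
    return record["remote"].rsplit(".", 1)[-1]


# Hypothetical records: one per (monitoring site, remote site) link.
SAMPLE = [
    {"monitor": "mon1.example.edu", "remote": "a.example.ch", "loss_percent": 3.0},
    {"monitor": "mon1.example.edu", "remote": "b.example.de", "loss_percent": 6.0},
    {"monitor": "mon2.example.gov", "remote": "a.example.ch", "loss_percent": 5.0},
    {"monitor": "mon2.example.gov", "remote": "c.example.jp", "loss_percent": 0.5},
]

by_tld = median_loss_by_group(SAMPLE, remote_tld)
by_monitor = median_loss_by_group(SAMPLE, lambda r: r["monitor"])
```

Passing a different key function (area, ISP, affiliation) regroups the same data without changing the aggregation code.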

15
Group Selection (all sites monitoring CERN)
Select one of these groups
CMU CMU CNAF RL FNAL SLAC DESY DESY Carleton RMKI
RMKI CERN KEK
16
Group Response Time Jan-95 - Nov-97
  • Improved between 1% and 2.5% / month
  • Response & Loss similar improvements
  • care with new sites

17
Network Quiescence
  • Frequency of zero packet loss (for all time - not
    cut on prime time)
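Read literally, quiescence is the share of measurement samples that saw no loss at all. A minimal sketch under that reading follows; the function name and input format are assumptions for this example.

```python
def quiescence_percent(loss_samples):
    """Percentage of samples with zero packet loss.

    loss_samples holds one loss percentage per measurement interval,
    taken over all hours (no prime-time cut).
    """
    if not loss_samples:
        raise ValueError("need at least one sample")
    zero_loss = sum(1 for loss in loss_samples if loss == 0)
    return 100.0 * zero_loss / len(loss_samples)
```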

18
Ping Loss Quality
  • Want quick to grasp indicator of link quality
  • Loss is the most sensitive indicator
  • loss of packet requires 4 sec TCP retry timeout
  • Studies on economic value of response time by IBM
    showed there is a threshold around 4-5 secs where
    complaints increase.
  • 0-1% Good, 1-2.5% Acceptable
  • 2.5-5% Poor, 5-12% Very Poor
  • >12% Bad
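The quality bands above can be turned into a simple lookup, treating the thresholds as packet loss percentages. This is a sketch, not the PingER implementation. The slide does not say which band a boundary value belongs to, so the code assumes it falls in the better band (1% counts as Good, 12% as Very Poor).

```python
from bisect import bisect_left

# Upper bounds (loss %) of each band except the last, from the slide.
_UPPER_BOUNDS = [1.0, 2.5, 5.0, 12.0]
_LABELS = ["Good", "Acceptable", "Poor", "Very Poor", "Bad"]


def loss_quality(loss_percent: float) -> str:
    """Map a packet loss percentage to the slide's quality label."""
    if not 0.0 <= loss_percent <= 100.0:
        raise ValueError("loss must be between 0 and 100 percent")
    # bisect_left puts a boundary value in the lower (better) band.
    return _LABELS[bisect_left(_UPPER_BOUNDS, loss_percent)]
```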

19
Quality Distributions
  • ESnet median good quality
  • All other groups poor or very poor
  • Critical to have good peering

20
Multi Collection Site Visualization
Collection Sites
Remote Sites
21
Intercontinental Grouping (Loss)
  • Move mouse over ? to see links

Looks pretty bad for intercontinental use
22
Top Level Domain Grouping (Loss)
Mouseover red dots gives more information on TLD (e.g. ch=Switzerland)
Diagonals are within TLD
23
TLD (Response Time)
24
Grouping Details
Select metric
Select group
Sort
Color for quality
Also provides Excel for DIY at bottom
25
Recent Transoceanic trends
26
By Monitoring Site
27
CERN Monitoring TLDs
28
ESnet bytes accepted by site for Jan 98
Exchanges
LBL/ESnet
29
US HENP Traffic Growth
Exponential growth of 3-6% per month
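Assuming the growth figure is 3-6% per month, compounded monthly, the corresponding doubling time can be worked out as ln 2 / ln(1 + r). The function below is an illustrative calculation, not something taken from the talk.

```python
import math


def doubling_time_months(monthly_growth_percent: float) -> float:
    """Months for traffic to double at a constant compound monthly growth."""
    rate = monthly_growth_percent / 100.0
    if rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log1p(rate)
```

At 3% per month traffic doubles in roughly 23 months, and at 6% per month in roughly 12 months.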
30
Multi Router Traffic Grapher (MRTG)
CERN-US E1(2Mbps) link
Added 2nd 2Mbps link
31
Traffic Volume for Germany (DFN)
DFN T1 Utilization for 15 Jan 98 (5 min averages)
Green to US, Blue from US
% of 2 min periods in Dec-96 with peak utilization > y
Panels: To US, From US (axis: Samples)
32
Capacity/Load Ratios
  • Looking at the link capacity/average load
  • Most ESnet links show ratios of a few to several
    tens
  • The international links (CERN-Perryman (4), DFN
    (5), Italy (4), KEK (10), Canada (15)) show
    ratios of 4-15
  • The worst link appears to be the MAE-W-ESnet link
    at about 1.5 ratio
  • However this may not be the bottleneck link
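A capacity/load ratio is just the inverse of average utilization, which may be easier to read. The sketch below converts the ratios quoted above and picks out the lowest one. The dictionary and function names are made up for this example, and the values are the approximate ratios from the slide.

```python
def average_utilization_percent(capacity_to_load_ratio: float) -> float:
    """Average utilization implied by a capacity / average load ratio."""
    if capacity_to_load_ratio <= 0:
        raise ValueError("ratio must be positive")
    return 100.0 / capacity_to_load_ratio


# Approximate ratios as quoted on the slide.
RATIOS = {
    "CERN-Perryman": 4,
    "DFN": 5,
    "Italy": 4,
    "KEK": 10,
    "Canada": 15,
    "MAE-W-ESnet": 1.5,
}

worst_link = min(RATIOS, key=RATIOS.get)
utilization = {link: average_utilization_percent(r) for link, r in RATIOS.items()}
```

A ratio of 4 corresponds to 25% average utilization, while a ratio of 1.5 corresponds to about 67%. These are averages, so peak utilization can be much higher.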

33
Bottlenecks
  • Identification
  • Traceroute
  • from/to multiple sites can identify common path
    segments in the maps
  • Can see onset of losses with traceping
  • Pathchar can identify bottlenecks
  • Then need to work on
  • avoiding bottlenecks (new peering)
  • getting bottleneck owners to improve
  • this is difficult, lots of potential bottlenecks,
    bottlenecks move, not under our control
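The idea of using traceroutes from multiple sites to find common path segments can be sketched as counting how many routes traverse each hop-to-hop link. The route format and router names below are hypothetical, and real traceroute output would first need parsing into hop lists.

```python
from collections import Counter


def shared_links(routes):
    """Return {(hop_a, hop_b): count} for links used by more than one route.

    routes maps a source site to its ordered list of hops to the target.
    """
    counts = Counter()
    for hops in routes.values():
        # A set so a link repeated within one route is counted once.
        counts.update(set(zip(hops, hops[1:])))
    return {link: n for link, n in counts.items() if n > 1}


# Hypothetical routes from three monitoring sites to one destination.
ROUTES = {
    "siteA": ["a-gw", "x", "y", "dest"],
    "siteB": ["b-gw", "x", "y", "dest"],
    "siteC": ["c-gw", "z", "dest"],
}

common = shared_links(ROUTES)
```

Links shared by several lossy routes are candidates for the bottleneck, though as noted above the busiest shared segment is not necessarily the one causing the loss.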

34
TracePing (Oxford)
Multiple routes seen
35
Traceroute
  • Reverse traceroute servers
  • Traceping
  • TopologyMap
  • Ellipses show node on route
  • Open ellipse is measurement node
  • Blue ellipse not reachable
  • Keeping history

From TRIUMF
36
GUI Traceroute (e.g. VisualRoute)
37
Summary
  • Deployment & Development
  • ESnet/HENP has 14 Collection sites in 8 countries
    collecting data on > 500 links involving 22
    countries
  • XIWT/IPWT deployed 10 collection sites using
    PingER tools
  • 600MB/month/link, 6 bps/link, .25 FTE @ analysis
    site, 1.5-2.5 FTE on analysis
  • HEPNRC gathering, archiving
  • Long term reports being ported to HEPNRC from
    SLAC
  • Long term analysis today usually requires tool
    like SAS

38
Summary
  • Deployment & Development
  • Internet Performance
  • Performance within ESnet is good
  • Performance between ESnet & other sites is poor
    to very poor on average
  • one of the main causes is congestion points, so
    peering is critical
  • Intercontinental performance is very poor to bad
  • ESnet traffic accepted from major HENP labs
    growing by 3-6% per month
  • Response time improving by 1-2% / month
  • Packet loss improving between SLAC & other sites
    by 3% / month

39
Summary
  • Deployment & Development
  • Internet Performance (continued)
  • Links to sites outside N. America vary from good
    (KEK) to bad
  • Some of the bad sites are to be expected, e.g.
    FSU, China, Czech Republic; some are surprises,
    such as UK
  • CERN, France, Germany acceptable to poor

40
Summary
  • Deployment & Development
  • Internet Performance
  • Next Steps
  • Improve tools
  • Make long term reports at Analysis site available
    & understandable
  • Look into prediction (extrapolations, develop
    models, configure and validate with data)
  • Pursue IETF Surveyor & NIMI deployment

41
National Internet Measurement Infrastructure
(NIMI)
  • Secure, scalable infrastructure for scheduling
    monitoring, gathering data
  • Minimal amount of human intervention
  • Inexpensive probe built on PC FreeBSD platform
  • Dynamic - can add/modify measurement suites,
    initially includes
  • Traceroute
  • TReno - measures bulk transfer thruput
  • Poip - one way ping

42
Asymmetric One-way Delays
Plots of loss and delay (0ms to 300ms) in each direction:
U Chicago to Advanced
Advanced to U Chicago
43
NIMI
  • Deployed at PSC, LBL, FNAL, platforms being
    configured at SLAC & CERN
  • As NIMI becomes more real will start to use as
    infrastructure for IPPM Surveyors
  • Security
  • allows full policy control over any box you own
    or delegation of all or subsets
  • uses ACLs with authentication for requests, and
    encryption to prevent sniffing

44
Summary
  • Deployment & Development
  • Internet Performance
  • Next Steps
  • Lots of collaboration
  • SLAC & HEPNRC
  • 14 collection sites, 400 remote sites
  • Collection site tools CERN CNAF/ICFA
  • Oxford/TracePing
  • MapPing/MAPNet/NLANR
  • TRIUMF Traceroute topology Map
  • NIMI/LBNL Surveyor/IETF
  • XIWT/IPWT
  • Talks at IETF, XIWT, ICFA, ESCC ...

45
More Information
  • ICFA Monitoring WG home page (links to status
    report, meeting notes, how to access data, and
    code)
  • http://www.slac.stanford.edu/xorg/icfa/ntf/home.html
  • WAN Monitoring at SLAC has lots of links
  • http://www.slac.stanford.edu/comp/net/wan-mon.html
  • Tutorial on WAN Monitoring
  • http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
  • MapPing Tool
  • http://www.slac.stanford.edu/warrenm/work/java/newjava/mapping.html
  • NIMI http://www.psc.edu/mahdavi/nimi_paper/NIMI.html