Title: Finding Network Problems that Influence Applications: Measurements and the E2EPI
1Finding Network Problems that Influence Applications: Measurements and the E2EPI
- Matt Zekauskas, matt_at_internet2.edu
- Windows on the Future
- Columbus, OH
2Overview
- Network and host contributing problems
- General and specific solutions
- Host Issues
- Measurement infrastructure
- Current Infrastructures
- Abilene views from the center
- Tools
3We Need Your Help
- What problems are you experiencing?
- Have you used a good tool?
- Give us the benefit of your experience: successful problem resolution?
4What Are The Problems?
- Packet loss
- Jitter
- Out-of-order packets (extreme jitter)
- Duplicated packets
- Excessive latency
- Interactive applications
- TCP's control system
5The Usual Suspects
- Host configuration errors
- Duplex mismatch (ethernet)
- Wiring/Fiber problem
- Bad equipment
- Bad routing
- Congestion
- Real traffic
- Unnecessary traffic (broadcasts, multicast, denial of service attacks)
6Duke - Frankfurt
Goal: 7 Mbps bi-directional UDP
(Network diagram. Labels: Duke, Frankfurt, Teleglobe ATM, DFN, NCREN, Abilene, 25 Broadway, Dante, WASH, NYCM, 60 Hudson, Surveyor node)
7The Duke-Frankfurt Problem
Abilene to Dante Link Utilization: http://monon.uits.iupui.edu/abilene/nycm/dante-bits.html
8JPL/Caltech - GSFC
- The situation
- Using Abilene
- Tuned hosts
- Things work locally
- Therefore it MUST be Abilene
- Tests show good flows router-router
- Intermediate tests point towards CA
- Bad fiber connection!
9Strategy
- (See dast.nlanr.net, ncne.nlanr.net)
- Most problems are local
- Test ahead of time!
- Is there connectivity and reasonable latency? (ping)
- Is routing reasonable? (traceroute)
- Is the host reasonable? (Web100, Java, others)
- Is the path reasonable? (iperf)
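The checklist above can be scripted so it runs the same way before every event. This is a minimal sketch, not a definitive tool: it assumes a Unix-like host with `ping`, `traceroute` and `iperf` installed and an iperf server already listening at the far end, and the function names are made up for illustration. The host check is left out because the slide points at web-based testers for that step.

```python
import subprocess


def build_checks(host):
    """Return (question, command) pairs in the order the slide lists them."""
    return [
        ("connectivity and latency", ["ping", "-c", "4", host]),
        ("routing", ["traceroute", host]),
        ("path throughput", ["iperf", "-c", host]),
    ]


def run_checks(host, timeout=60):
    """Run each check in order and stop at the first one that fails.

    Returns (failed_question, output), or (None, "") if every check passed.
    """
    for question, cmd in build_checks(host):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Tool missing or the test hung: report it as a failure.
            return question, str(exc)
        if result.returncode != 0:
            return question, result.stderr or result.stdout
    return None, ""
```

A non-zero exit only says that a step failed. Judging whether latency, route or throughput is "reasonable" still means reading the output.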
10More Strategy Pointers
- http://dast.nlanr.net/Guides/GettingStarted/
- http://ncne.nlanr.net/research/tcp/
- http://www.ncne.nlanr.net/research/tcp/debugging/
- http://ncne.nlanr.net/research/taad/
11Application-Oriented Tools
- Pioneer
- http://pelle.internet2.edu/pioneer/
- Synthesis of existing infrastructure
- Focus: video conferencing tests
- Goal: use this to tell if video is likely to work
- See also
- ViDeNet Scout
- Beacon, being developed at the Ohio ITEC
12Internet2 Detective
- A prototype
- Windows tray application
- Red/green lights: am I on Internet2?
- Multicast available?
- IPv6 available?
- Look for something on/after the spring members meeting
13Reference Servers
- H.323 conferencing
- Goal: portable machines that tell you if a system is likely to work (and if not, why?)
- Moderate-rate UDP of interest
- E.g., ViDeNet Scout, http://scout.video.unc.edu/
- http://pelle.internet2.edu/pioneer/
- (TCP) Performance debugging
- Conjecture: 80% of problems related to
- Host tuning (mostly buffers)
- Duplex mismatch (path)
- Other physical connection problem (path)
- Hard data from Claudia DeLuna, JPL; also hearsay
14TCP Performance Tester
- Connect from suspect host via Web
- Ex: http://www.dslreports.com/tweaks
- Ex: http://noc.greatplains.net/measurement/ (Web100 bw test)
- Host tuning
- See Web100, TCP tuning pages
- Duplex Mismatch / Bad Physical Conn
- Low to moderate UDP tests, view loss
- Step functions: duplex
- Constant rate loss: bad physical
- Collocated traceroute server to check routing
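The loss-pattern rule of thumb above (a step in loss as the UDP rate rises suggests duplex mismatch; the same loss at every rate suggests a bad physical connection) can be sketched as a small classifier. The function name and both thresholds are assumptions chosen for illustration, not values from the talk.

```python
def classify_loss(samples, quiet=0.001, spread=0.005):
    """Classify UDP loss measured at several send rates.

    samples: non-empty list of (rate_mbps, loss_fraction) pairs.
    quiet:   loss at or below this is treated as no loss (assumed threshold).
    spread:  maximum variation still counted as "constant" (assumed threshold).
    """
    losses = [loss for _, loss in sorted(samples)]
    if max(losses) <= quiet:
        return "clean"
    # Same loss at every rate: suspect a bad physical connection.
    if min(losses) > quiet and max(losses) - min(losses) <= spread:
        return "constant"
    # No loss at low rates, persistent loss above some rate: suspect duplex.
    first_lossy = next(i for i, loss in enumerate(losses) if loss > quiet)
    if first_lossy > 0 and all(loss > quiet for loss in losses[first_lossy:]):
        return "step"
    return "unclear"
```

For example, `classify_loss([(1, 0.0), (2, 0.0), (5, 0.08), (10, 0.1)])` returns `"step"`. Treat the result as a hint for where to look, not a diagnosis.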
15Packet Reflector
- Idea by Matt Mathis (good stuff his, bad mine)
- Test application with both ends in your lab, but packets get full Internet experience (delay, loss, jitter)
- Example: app machine is pointed at a router that tunnels packets to a remote location; they are then routed normally back to the other app machine
- Benefit: not application or transport dependent
16Packet Reflector
17For TCP Tuning
- Consider Web100 for Linux 2.4
- http://www.web100.org/
- NCNE Tuning Page
- http://www.psc.edu/networking/perf_tune.html
- http://www.ncne.nlanr.net/research/tcp/
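Most host tuning comes down to socket buffers, and the usual rule of thumb is that a buffer must hold at least one bandwidth-delay product to keep the path full. A minimal sketch of that arithmetic follows; the 70 ms round-trip time in the usage note is an assumed coast-to-coast figure, not a number from the talk.

```python
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product in bytes: the data in flight on a full path."""
    return round(bandwidth_bps * rtt_seconds / 8)
```

For a 100 Mbps path with an assumed 70 ms RTT, `bdp_bytes(100e6, 0.070)` gives 875000 bytes, far above the small default buffers that the tuning pages warn about.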
18Host/OS Tuning: Web100
- Goal: TCP stack, tuning not the bottleneck
- Large measurement component
- TCP performance not what you expect? Ask TCP why!
- Receiver bottleneck (out of receiver window)
- Sender bottleneck (no data to send)
- Path bottleneck (out of congestion window)
- Path anomalies (duplicate, out of order, loss)
- www.web100.org
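The receiver / sender / path triage above amounts to asking where a connection spent its time being limited. The sketch below assumes you already have three per-connection timers from an instrumented stack; the function and parameter names are hypothetical and are not the Web100 API.

```python
def dominant_limit(rwin_time, sender_time, cwnd_time):
    """Return (bottleneck, share) for the state that limited the sender longest.

    rwin_time:   time limited by the receiver window    -> "receiver"
    sender_time: time with no data to send              -> "sender"
    cwnd_time:   time limited by the congestion window  -> "path"
    All three must use the same time unit.
    """
    totals = {"receiver": rwin_time, "sender": sender_time, "path": cwnd_time}
    total = sum(totals.values())
    if total <= 0:
        return None, 0.0
    name = max(totals, key=totals.get)
    return name, totals[name] / total
```

A result such as `("receiver", 0.8)` points at receive buffer tuning, while `"path"` says to look for loss or congestion along the route.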
19For TCP (and Streaming)
- Eliminating loss is the goal
- Focus on noncongestive losses
- TCP: 100 Mbit Ethernet coast-to-coast
- Full size packets need 10^-6 Ploss (Mathis)
- Less than 1 loss every 83 seconds
- http://www.psc.edu/mathis/papers/JTechs200105/
- GigE/655: 10^-8, 1 loss every 497 seconds
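The loss budgets above follow from the Mathis model, BW <= (MSS / RTT) * C / sqrt(p), solved for p. This is a sketch of the arithmetic only. The 1460-byte MSS, the 70 ms RTT and C = 0.7 are assumptions picked to land near the slide's figures; the talk does not state which values were used.

```python
def required_loss(bandwidth_bps, rtt_s, mss_bytes=1460, c=0.7):
    """Largest loss probability that still allows the target TCP rate."""
    mss_bits = mss_bytes * 8
    return (c * mss_bits / (rtt_s * bandwidth_bps)) ** 2


def seconds_between_losses(bandwidth_bps, rtt_s, mss_bytes=1460, c=0.7):
    """Average time between losses when running at exactly that budget."""
    p = required_loss(bandwidth_bps, rtt_s, mss_bytes, c)
    packets_per_second = bandwidth_bps / (mss_bytes * 8)
    return 1.0 / (p * packets_per_second)
```

With these assumed inputs, 100 Mbps needs a loss probability on the order of 10^-6 and tolerates one loss somewhat less often than every 80 seconds, the same ballpark as the slide. Because the budget scales with 1/BW^2, faster paths are far less forgiving.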
20One Technique: Problem Isolation via Divide and Conquer
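Divide and conquer over a path is a binary search: test from one end to a midpoint, then keep the half that still fails. The sketch below assumes test points exist along the path, that a test can be run between any two of them, and that there is a single fault. `path_ok` and the point names in the usage note are hypothetical.

```python
def first_bad_segment(points, path_ok):
    """Locate the failing segment between adjacent test points.

    points:  test points ordered from source to destination.
    path_ok: callable (a, b) -> bool, True if a test between a and b is clean.
    Returns the (a, b) pair of adjacent points around the fault, or None
    if the end-to-end test is clean.
    """
    lo, hi = 0, len(points) - 1
    if path_ok(points[lo], points[hi]):
        return None
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if path_ok(points[lo], points[mid]):
            lo = mid  # near half is clean, so the fault is further along
        else:
            hi = mid  # fault is in the near half
    return points[lo], points[hi]
```

With points such as `["host A", "campus edge", "gigaPoP", "backbone", "remote edge", "host B"]`, this needs about log2(n) tests rather than one per segment, which matters when each test requires someone's cooperation.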
21If you like living on the bleeding edge
- Matt Mathis just released pathprobe, a Web100 tool, as a way to test campus networks' suitability for long-haul TCPs.
- http://www.psc.edu/web100/pathprobe/
22End-to-End Measurement Infrastructure Vision
- Ongoing monitoring to test major elements, and (some, important) end-to-end paths.
- Elements: gigaPoP links, peering, ...
- Utilization
- Delay
- Loss
- Occasional throughput
- Multicast connectivity
23End-to-End Measurement Infrastructure Vision II
- There are many more paths end to end than can be monitored.
- Diagnostic tools available on-demand (with authorization)
- Show routes
- Perform flow tests (perhaps app tests)
- Parse/debug flows (a la tcpdump or OCxMON with heuristic tools)
24Implementation Considerations
- Define standard tests
- Not tied to one platform
- Uniform access to schedule tests, retrieve results
- Results in uniform format
- Capitalize on existing metrics, tools, infrastructure
- Consider commercial products, how to influence vendors (longer-term)
- Rigorously document
25Enabling Divide and Conquer and Ongoing Monitoring
(Diagram. Labels: Wall Jack, P, P, Wall Jack)
26Measurements from the Center
- Active
- Measurement within Abilene
- Measurement end to end
- Passive
- SNMP stats (esp. core Abilene links)
- Variables via router proxy
- Characterization of traffic
- Netflow, OCxMON
27Measurement Projects
- AMP (round-trip delay, loss, routing)
- moat.nlanr.net/AMP
- 120 Internet2 campuses
- Surveyor (one-way delay, loss, routing)
- www.advanced.org/surveyor
- On many Internet2 campuses (70 sites)
- Abilene presence
- PMA (passive, packet traces)
- moat.nlanr.net/PMA
- 1 min, 8 times a day, 13 sites
28Measurement Projects
- PingER (round-trip delay, routing)
- http://www-iepm.slac.stanford.edu/pinger/
- Long term data from a few locations to many
- High-energy physics focus
- New: iperf data, application (bbftp, bbcp) data
- NIMI
- http://www.ncne.nlanr.net/nimi/
- Designed to be platform for experiments
- Undergoing some redesign/revitalization
29Usefulness
- AMP, Surveyor, PingER
- If at your campus, a view from your campus
- If at destination, a view of destination
- Look for campus connected to same gigaPoP if not at local or destination
- Phase 0 measurement points for e2eperf
- Routing, congestion problems
30Abilene
- Abilene goal: to be an exemplar
- Measurements open
- Tests possible to router nodes
- Web-mediated on-demand measurements
- Throughput tests routinely through backbone
- as well as existing utilization, etc.
31Abilene Upgrade
- GigE connected high-performance tester
- Latency tester
- Tests that will be available on-demand (web mediated)
- Measurements also collected for research
32Ad-hoc Active on Abilene Today
- Have some OC-3 ATM connected PCs
- With OC-3, can do moderate throughput testing (e.g., iperf UDP, TCP). 90 Mbps
- Contact me (matt_at_internet2.edu) if you want to perform an ad-hoc test
33Passive - Utilization
- The Abilene NOC takes
- Packets in, out
- Bytes in, out
- Drops/Errors
- ...for all interfaces, publishes internal links and peering points (at 5 min intervals)
- ...via SNMP polling every 3 sec
- http://hydra.uits.iu.edu/abilene/traffic/
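Turning polled byte counters into link utilization takes one subtraction and one division, with care for counter wrap. This is a minimal sketch that assumes a 32-bit octet counter that wraps at most once between polls; it is not a description of the NOC's own tooling.

```python
def utilization(octets_then, octets_now, interval_s, link_bps, counter_bits=32):
    """Fraction of link capacity used between two SNMP octet-counter polls."""
    # Modular subtraction absorbs a single counter wrap between polls.
    delta = (octets_now - octets_then) % (2 ** counter_bits)
    return delta * 8 / (interval_s * link_bps)
```

On fast links a 32-bit counter can wrap more than once in a long interval, which is one reason to poll often even if results are only published every five minutes.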
35Abilene Pointers
- http://www.abilene.iu.edu/
- Monitoring
- Tools
- http://www.itec.oar.net/abilene-netflow
- http://netflow.internet2.edu/weekly/ (summaries)
36Some Commercial Tools
- Caveat: only a partial list, give me more!
- Spirent (nee Netcom/Adtech)
- working on a box for end-to-end measurements
- SmartBits: test at low and high rates, QoS; test components or end-to-end path
- NetIQ: Chariot/Pegasus
- Agilent (like SmartBits, and FireHunter)
- Ixia (like SmartBits/Spirent)
- Brix Networks (like Surveyor, for QoS)
- jaalaM Technologies: path debugger
37Some Noncommercial Tools
- Iperf: dast.nlanr.net/Projects/iperf
- See also http://www-itg.lbl.gov/nettest/
- http://www-didc.lbl.gov/NCS/
- Flowscan
- http://www.caida.org/tools/utilities/flowscan/
- http://net.doit.wisc.edu/plonka/FlowScan/
- SLAC's traceroute Perl script
- http://www.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.html
- One large list
- http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html
38Quilt Measurement Project
- A measurement portal
- Example, GPN: http://noc.greatplains.net/measurement/
- SNMP utilization
- Ping data, hops, traceroute
- Iperf server, pchar server
- Router proxy
- TCP tester (options, buffer size, ...)
- http://www.thequilt.net/measurement/
- (Contact Rick Summerhill, rrsum_at_greatplains.net)
39What Campuses Can Do
- Export SNMP data
- I have an Internet2 list, can add you
- Monitor loss as well as throughput
- Performance test point at campus edge
- Netperf or iperf, so can be from anywhere
- Traceroute looking glass
- Commercial (e.g., NetIQ) complements
- I'm willing to keep a master list
- Create a measurement portal a la Quilt's
40Initial Measurement Support
- Matt Zekauskas, Internet2
- Ronn Ritke, NLANR/MOAT
- Tony McGregor, U Waikato and MOAT
we were both on the design team
41Phase 0 Measurement
- Placing equipment is hard and time consuming
- Active measurement targets most useful to start
- Two infrastructures deployed in community
- AMP and Surveyor
- Can we leverage them?
- NOTE: This is NOT meant to be exclusive!
42Phase 0 Measurement Goals
- Leverage existing measurement infrastructures
- Get something up quickly over perfection
- Learn what's missing
- Feedback to improve infrastructure
43AMP and Surveyor
- We have access to these infrastructures
- Both are currently designed to continuously take low-bandwidth measurements
- Coverage overlaps somewhat; AMP is more widely deployed (no GPS requirement)
- Placement goals different and complementary
- AMP on typical university LAN
- Surveyor at campus edge and Abilene backbone
44Hurdles (Challenges?)
- Both placed with low-bandwidth in mind
- But we desire throughput tests, too
- Additional tests may interfere with existing tests
- Scheduling important
- Secure access desirable
- Can't be denial-of-service platforms
- Some machines may not be available
45Initial Steps
- Abilene Surveyors will be throughput targets
- Work on PKI authentication mechanism
- Work on Scheduling
- Both platforms deploy traceroute observatories
- Common listing of both sites
- AMP allows occasional throughput by local campus personnel only; looking to develop common mechanism with Surveyor
- Looking to future tests being common
46Contact Information
- Matt Zekauskas, matt_at_internet2.edu
- Measurements Working Group
- http://www.internet2.edu/measurement/
- End-to-end Performance Initiative
- http://www.internet2.edu/e2epi/
47(Some) URLs
- http://www.internet2.edu/measurement/
- http://www.advanced.org/surveyor/
- http://moat.nlanr.net/ http://dast.nlanr.net/
- http://www.ncne.nlanr.net/ http://www.ncne.org/
- http://www.caida.org/ http://www.web100.org/
- http://www.auckland.ac.nz/net/Internet/rtfm/
- http://www.slac.stanford.edu/xorg/icfa/ntf/home.html
- http://www.merit.edu/ipma/
48www.internet2.edu