Title: Internet2 E2E piPEs Project
1 Internet2 E2E piPEs Project
2 Internet2 E2E piPEs Project
- End-to-End Performance Initiative Performance Environment System (E2E piPEs)
- Approach: a collaborative project combining the best work of many organizations, including DANTE/GEANT, Daresbury, EGEE, GGF NMWG, NLANR/DAST, UCL, Georgia Tech, etc.
3 Internet2 E2E piPEs Goals
- Enable end users and network operators to:
  - determine E2E performance capabilities
  - locate E2E problems
  - contact the right person to get an E2E problem resolved
- Enable remote initiation of partial path performance tests
- Make partial path performance data publicly available
- Interoperate with other performance measurement frameworks
4 Measurement Infrastructure Components
5 Sample piPEs Deployment
6 Project Phases
- Phase 1: Tool Beacons
  - BWCTL (complete): http://e2epi.internet2.edu/bwctl
  - OWAMP (complete): http://e2epi.internet2.edu/owamp
  - NDT (complete): http://e2epi.internet2.edu/ndt
- Phase 2: Measurement Domain Support
  - General measurement infrastructure (prototype)
  - Abilene measurement infrastructure deployment (complete): http://abilene.internet2.edu/observatory
- Phase 3: Federation Support
  - AA (prototype: optional AES key, policy file, limits file)
  - Discovery of measurement nodes and databases (prototype: nearest NDT server, web page)
  - Test request/response schema support (prototype: GGF NMWG schema)
7 piPEs Deployment
8 NDT (Rich Carlson)
- Network Diagnostic Tester
- Developed at Argonne National Lab
- Ongoing integration into piPEs framework
- Redirects from well-known host to nearest measurement node
- Detects common performance problems in the first mile (edge to campus DMZ)
- In deployment on Abilene
  - http://ndt-seattle.abilene.ucaid.edu:7123
9 NDT Milestones
- New features added
  - Configuration file support
  - Scheduling/queuing support
  - Simple server discovery protocol
  - Federation mode support
  - Command line client support
- Open source shared development
  - http://sourceforge.net/projects/ndt/
10 NDT Future Directions
- Focus on improving problem detection algorithms
  - Duplex mismatch
  - Link detection
- Complete deployment in Abilene POPs
- Expand deployment into university campus/GigaPoP networks
11 How Can You Participate?
- Set up BWCTL, OWAMP, NDT Beacons
- Set up a measurement domain
- Place tool beacons intelligently
- Determine locations
- Determine policy
- Determine limits
- Register beacons
- Install piPEs software
- Run regularly scheduled tests
- Store performance data
- Make performance data available via web service
- Make visualization CGIs available
- Solve Problems / Alert us to Case Studies
12 Example piPEs Use Cases
- Edge-to-Middle (On-Demand)
  - Automatic 2-ended test set-up
- Middle-to-Middle (Regularly Scheduled)
  - Raw data feeds for 3rd-party analysis tools
    - http://vinci.cacr.caltech.edu:8080/
  - Quality control of network infrastructure
- Edge-to-Edge (Regularly Scheduled)
  - Quality control of application communities
- Edge-to-Campus DMZ (On-Demand)
  - Coupled with regularly scheduled Middle-to-Middle
  - End user determines whom to contact about a performance problem, armed with proof
13 Test from the Edge to the Middle
- Divide and conquer: partial path analysis
- Install OWAMP and/or BWCTL
- Where are the nodes?
  - http://e2epi.internet2.edu/pipes/pmp/pmp-dir.html
- Begin testing!
  - http://e2epi.internet2.edu/pipes/ami/bwctl/ (key required)
  - http://e2epi.internet2.edu/pipes/ami/owamp/ (no key required)
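The divide-and-conquer idea above can be sketched in a few lines: test adjacent segments of the path until one falls below a threshold. This is an illustrative sketch, not piPEs code; the node names and the `measure` stub (which would wrap a real BWCTL/Iperf run) are made up.

```python
# Hypothetical sketch of partial path analysis: given measurement
# nodes along a path and a test function (e.g. a BWCTL throughput
# test between two nodes), find the segment whose performance falls
# below a threshold. Node names and results are illustrative.

def find_bad_segment(nodes, measure, threshold_mbps):
    """Return the first adjacent (src, dst) pair whose measured
    throughput drops below threshold_mbps, or None if all pass."""
    for src, dst in zip(nodes, nodes[1:]):
        if measure(src, dst) < threshold_mbps:
            return (src, dst)
    return None

# Synthetic measurements standing in for real BWCTL runs.
fake_results = {
    ("edge", "gigapop"): 940.0,
    ("gigapop", "abilene-chi"): 950.0,
    ("abilene-chi", "abilene-sea"): 310.0,   # the lossy segment
    ("abilene-sea", "far-edge"): 920.0,
}
bad = find_bad_segment(
    ["edge", "gigapop", "abilene-chi", "abilene-sea", "far-edge"],
    lambda a, b: fake_results[(a, b)],
    threshold_mbps=900.0,
)
print(bad)  # ('abilene-chi', 'abilene-sea')
```

In practice each `measure` call is a scheduled two-ended test, which is why the tool beacons and AA policy described earlier matter: the tester needs permission to run tests from intermediate nodes.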
14 Example piPEs Use Cases
- Edge-to-Middle (On-Demand)
  - Automatic 2-ended test set-up
- Middle-to-Middle (Regularly Scheduled)
  - Raw data feeds for 3rd-party analysis tools
    - http://vinci.cacr.caltech.edu:8080/
  - Quality control of network infrastructure
- Edge-to-Edge (Regularly Scheduled)
  - Quality control of application communities
- Edge-to-Campus DMZ (On-Demand)
  - Coupled with regularly scheduled Middle-to-Middle
  - End user determines whom to contact about a performance problem, armed with proof
15 Abilene Measurement Domain
- Part of the Abilene Observatory
  - http://abilene.internet2.edu/observatory
- Regularly scheduled OWAMP (1-way latency) and BWCTL/Iperf (throughput, loss, jitter) tests
- Web pages displaying:
  - Latest results: http://abilene.internet2.edu/ami/bwctl_status.cgi/TCP/now
  - Weathermap: http://abilene.internet2.edu/ami/bwctl_status_map.cgi/TCP/now
  - Worst 10 performing links: http://abilene.internet2.edu/ami/bwctl_worst_case.cgi/TCP/now
- Data available via web service
  - http://abilene.internet2.edu/ami/webservices.html
16 Quality Control of Abilene Measurement Infrastructure (1)
- Problem-solving approach
  - Ongoing measurements start detecting a problem
  - Ad hoc measurements used for problem diagnosis
- Ongoing measurements
  - Expect Gbps flows on Abilene
  - Stock TCP stack (albeit tuned)
    - Very sensitive to loss
    - "Canary in a coal mine"
  - Web100 just deployed for additional reporting
- Skeptical eye
  - Apparent problem could reflect interface contention
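The loss sensitivity of a stock TCP stack can be made concrete with the well-known Mathis et al. approximation, throughput ~ (MSS / RTT) * (1.22 / sqrt(p)). The MSS and RTT below are illustrative assumptions, not Abilene specifics; the point is how tiny the tolerable loss rate is at Gbps speeds.

```python
# Back-of-the-envelope check of why a stock TCP stack is a "canary
# in a coal mine": the Mathis approximation for steady-state TCP
# throughput, and the loss rate it implies for a 1 Gbps target.
# MSS = 1460 bytes and RTT = 70 ms are assumed, illustrative values.

from math import sqrt

def mathis_throughput_bps(mss_bytes, rtt_s, loss_prob):
    """Approximate steady-state TCP throughput in bits/s."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / sqrt(loss_prob))

def loss_for_rate(mss_bytes, rtt_s, rate_bps):
    """Loss probability at which the Mathis rate equals rate_bps."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2

# A 1460-byte MSS over a 70 ms cross-country RTT:
p = loss_for_rate(1460, 0.070, 1e9)
print(f"loss must stay below {p:.1e} to sustain 1 Gbps")
```

The answer comes out in the 1e-8 range: a handful of lost packets per hundred million is enough to pull a long-RTT flow below line rate, which is exactly why the regularly scheduled flows detect problems so early.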
17 Quality Control of Abilene Measurement Infrastructure (2)
- Regularly scheduled tests
  - Track TCP and UDP flows (BWCTL/Iperf)
  - Track one-way delays (OWAMP)
  - IPv4 and IPv6
- Observe
  - Worst 10 TCP flows
  - First-percentile TCP flow
  - Fiftieth-percentile TCP flow
  - What percentile breaks the 900 Mbps threshold
- General conclusions
  - On Abilene, IPv4 and IPv6 are statistically indistinguishable
  - Consistently low values to one host or across one path indicate a problem
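The bookkeeping above is simple to sketch. This is not piPEs code; the nearest-rank percentile rule and the synthetic throughput samples are assumptions made for illustration.

```python
# Minimal sketch of the slide's observables: given a batch of
# BWCTL/Iperf throughput samples (Mbps), report the worst 10 flows,
# the 1st and 50th percentiles, and the percentile at which flows
# first break the 900 Mbps threshold. Sample data is synthetic.

def percentile(sorted_vals, pct):
    """Nearest-rank percentile of an ascending list."""
    idx = max(0, int(round(pct / 100 * len(sorted_vals))) - 1)
    return sorted_vals[idx]

def summarize(samples, threshold=900.0):
    s = sorted(samples)
    first_above = next(
        (100 * (i + 1) / len(s) for i, v in enumerate(s) if v > threshold),
        None,
    )
    return {
        "worst10": s[:10],
        "p1": percentile(s, 1),
        "p50": percentile(s, 50),
        "pct_breaking_900": first_above,
    }

# 90 healthy flows near line rate, a few degraded, two bad ones.
samples = [980.0] * 90 + [850.0] * 8 + [520.0, 310.0]
stats = summarize(samples)
print(stats["p50"], stats["p1"], stats["pct_breaking_900"])
```

Note how the 50th percentile stays pinned at line rate while the 1st percentile and worst-10 list surface the outliers, which is the pattern the following slides exploit.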
18 A (Good) Day in the Life of Abilene
19 First two weeks in March: 50th percentile right at 980 Mb/s; 1st percentile about 900 Mb/s. Take it as a baseline.
20 Beware the Ides of March: 1st percentile down to 522 Mb/s. Circuit problems along the west coast. N.b. 50th percentile very robust.
21 Recovery, sort of, through 29 April: 1st percentile back up to the mid-800s, but lower and shakier. N.b. 50th percentile still very robust.
22 Ah, sudden improvement through 5 May: 1st percentile back up above 900 Mb/s and more stable. But why??
23 Then, while Matt Z is tearing up the tracks: 1st percentile back down to the 500s. Diagnosis: something is killing Seattle. Oh, and Sunnyvale is off the air.
24 Matt fixes Sunnyvale, and things get (slightly) worse: both Seattle and Sunnyvale are bad. 1st percentile right at 500 Mb/s. Diagnosis: Web100 interaction.
25 Matt fixes the Web100 interaction. 1st percentile cruising through 700 Mb/s. Life is good.
26 Friday the (almost) 13th: JUNOS upgrade induces packet loss for about four hours along many links. 1st percentile falls to 63 Mb/s. Long-distance paths chiefly impacted.
27 A Known Problem
- Mid-May: routers all got a new software load to enable a new feature
- Everything seemed to come up, but on some links utilization did not rebound
- Worst-10 reflected very low performance across those links
- QoS parameter configuration format change
28 (No Transcript)
29 Nice weekend. 1st percentile rises to 968 Mb/s. But why??
30 (No Transcript)
31 We Found It First
- Streams over the SNVA-LOSA link all showed problems
- NOC responded: found errors on the SNVA-LOSA link
- (NOC is now tracking errors more closely)
- Live (URL subject to change): http://abilene.internet2.edu/ami/bwctl_percentile.cgi/TCPV4/1/50/14118254811367342080_14169839516075950080
32 Example piPEs Use Cases
- Edge-to-Middle (On-Demand)
  - Automatic 2-ended test set-up
- Middle-to-Middle (Regularly Scheduled)
  - Raw data feeds for 3rd-party analysis tools
    - http://vinci.cacr.caltech.edu:8080/
  - Quality control of network infrastructure
- Edge-to-Edge (Regularly Scheduled)
  - Quality control of application communities
    - ESnet / ITECs (3+3): see Joe Metzger's talk to follow
    - eVLBI
- Edge-to-Campus DMZ (On-Demand)
  - Coupled with regularly scheduled Middle-to-Middle
  - End user determines whom to contact about a performance problem, armed with proof
33 Example Application Community: VLBI (1)
- Very-Long-Baseline Interferometry (VLBI) is a high-resolution imaging technique used in radio astronomy.
- VLBI techniques involve using multiple radio telescopes simultaneously in an array to record data, which is then stored on magnetic tape and shipped to a central processing site for analysis.
- Goal: using high-bandwidth networks, electronic transmission of VLBI data (known as e-VLBI).
34 Example Application Community: VLBI (2)
- Haystack <-> Onsala
  - Abilene, Eurolink, GEANT, NorduNet, SUNET
  - Users: David Lapsley, Alan Whitney
- Constraints
  - Lack of administrative access (needed for Iperf)
  - Heavily scheduled, limited windows for testing
- Problem
  - Insufficient performance
- Partial path analysis with BWCTL/Iperf
  - Isolated packet loss to local congestion in the Haystack area
  - Upgraded bottleneck link
35 Example Application Community: VLBI (3)
- Result
  - First demonstration of real-time, simultaneous correlation of data from two antennas (32 Mbps; work continues)
- Future
  - Optimize time-of-day for non-real-time data transfers
  - Deploy BWCTL at 3 more sites beyond Haystack, Onsala, and Kashima
36 TSEV8 Experiment
- Intensive experiment
- Data
  - 18 scans, 13.9 GB of data
- Antennas
  - Westford, MA and Kashima, Japan
- Network
  - Haystack, MA to Kashima, Japan
  - Initially, 100 Mbps commodity Internet at each end; Kashima link upgraded to 1 Gbps just prior to the experiment
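A quick bit of arithmetic shows why the link speeds matter for a 13.9 GB dataset. This assumes GB means 1e9 bytes and a fully utilized link; real transfers run below line rate.

```python
# Rough transfer-time arithmetic for the TSEV8 data volume at the
# two link speeds mentioned. Assumes GB = 1e9 bytes and an ideal,
# fully utilized link (an upper bound on achievable throughput).

def transfer_seconds(gigabytes, rate_mbps):
    """Ideal transfer time in seconds for `gigabytes` at `rate_mbps`."""
    return gigabytes * 1e9 * 8 / (rate_mbps * 1e6)

print(f"100 Mbps: {transfer_seconds(13.9, 100) / 60:.1f} min")
print(f"  1 Gbps: {transfer_seconds(13.9, 1000) / 60:.1f} min")
```

Even in the ideal case the 100 Mbps path needs roughly 18-19 minutes per transfer, so the throughput collapse described on the next slides made the experiment infeasible until it was diagnosed.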
37 TSEV8 e-VLBI Network
38 Network Issues
- In the week leading up to the experiment, the network showed extremely poor throughput: 1 Mbps!
- Network analysis/troubleshooting required
  - Traditional approach: pair-wise Iperf testing between hosts along the transfer path, with step-by-step tracing of link utilization via the Internet2/TransPAC-APAN network monitoring websites
    - Time consuming, error prone, not conclusive
  - New approach: automated Iperf testing using Internet2's BWCTL tool (allows partial path analysis), with link utilization statistics integrated into one single website
    - No maintenance required once set up; for the first time, an overall view of the network and bandwidth on a segment-by-segment basis
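The "single integrated view" amounts to tabulating the latest automated result per segment and flagging the worst one. A minimal sketch, with made-up host names and numbers standing in for the real monitoring data:

```python
# Sketch of an integrated segment-by-segment view: one table of the
# latest per-segment throughput results, with the bottleneck flagged.
# Hosts and values are invented for illustration.

def render_view(results):
    """results: dict mapping (src, dst) -> latest throughput in Mbps."""
    bottleneck = min(results, key=results.get)
    lines = []
    for (src, dst), mbps in results.items():
        mark = "  <-- bottleneck" if (src, dst) == bottleneck else ""
        lines.append(f"{src:>10} -> {dst:<10} {mbps:7.1f} Mbps{mark}")
    return "\n".join(lines)

latest = {
    ("haystack", "abilene"): 94.0,
    ("abilene", "transpac"): 92.0,
    ("transpac", "kashima"): 1.1,   # a fault shows up on this segment
}
print(render_view(latest))
```

The value of the integrated view is exactly this at-a-glance comparison: no stepping through separate per-network monitoring pages to find where throughput collapses.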
39 E-VLBI Network Monitoring
http://web.haystack.mit.edu/staff/dlapsley/tsev7.html
40 E-VLBI Network Monitoring
http://web.haystack.mit.edu/staff/dlapsley/tsev7.html
41 E-VLBI Network Monitoring
- Use of centralized/integrated network monitoring helped enable identification of the bottleneck (a hardware fault)
- Automated monitoring allows a view of network throughput variation over time
  - Highlights route changes and network outages
- Automated monitoring also helps to highlight throughput issues at the end points
  - E.g., network interface card failures, untuned TCP stacks
- Integrated monitoring provides an overall view of network behavior at a glance
42 Result
- Successful UT1 experiment completed June 30, 2004
- New record time for transfer and calculation of UT1 offset: 4.5 hours (down from 21 hours)
43 Acknowledgements
- Yasuhiro Koyama, Masaki Hirabaru and colleagues at the National Institute of Information and Communications Technology
- Brian Corey, Mike Poirier and colleagues from MIT Haystack Observatory
- Internet2, TransPAC/APAN, JGN2 networks
- Staff at the APAN Tokyo XP
- Tom Lehman, University of Southern California Information Sciences Institute East
44 Example piPEs Use Cases
- Edge-to-Middle (On-Demand)
  - Automatic 2-ended test set-up
- Middle-to-Middle (Regularly Scheduled)
  - Raw data feeds for 3rd-party analysis tools
    - http://vinci.cacr.caltech.edu:8080/
  - Quality control of network infrastructure
- Edge-to-Edge (Regularly Scheduled)
  - Quality control of application communities
- Edge-to-Campus DMZ (On-Demand)
  - Coupled with regularly scheduled Middle-to-Middle
  - End user determines whom to contact about a performance problem, armed with proof
45 How Can You Participate?
- Set up BWCTL, OWAMP, NDT Beacons
- Set up a measurement domain
- Place tool beacons intelligently
- Determine locations
- Determine policy
- Determine limits
- Register beacons
- Install piPEs software
- Run regularly scheduled tests
- Store performance data
- Make performance data available via web service
- Make visualization CGIs available
- Solve Problems / Alert us to Case Studies
46 (No Transcript)
47 Extra Slides
48 American / European Collaboration Goals
- Awareness of ongoing measurement framework efforts / sharing of ideas (good, but not sufficient)
- Interoperable measurement frameworks (minimum)
  - Common means of data extraction
  - Partial path analysis possible along transatlantic paths
- Open source shared development (a possibility, in whole or in part)
- End-to-end partial path analysis for transatlantic research communities
  - VLBI: Haystack, Mass. <-> Onsala, Sweden
  - HENP: Caltech, Calif. <-> CERN, Switzerland
49 American / European Collaboration Achievements
- UCL E2E Monitoring Workshop 2003
  - http://people.internet2.edu/eboyd/ucl_workshop.html
- Transatlantic Performance Monitoring Workshop 2004
  - http://people.internet2.edu/eboyd/transatlantic_workshop.html
- Caltech <-> CERN demo
- Haystack, USA <-> Onsala, Sweden
- piPEs software evaluation (in progress)
- Architecture reconciliation (in progress)
50 Example Application Community: ESnet / Abilene (1)
- 3+3 Group
  - US Govt. labs: LBL, FNAL, BNL
  - Universities: NC State, OSU, SDSC
  - http://measurement.es.net/
- Observed
  - 400 usec jump in 1-way latency
  - Noticed by Joe Metzger
- Detected
  - Circuit connecting the router in the CentaurLab to the NCNI edge router moved to a different path on the metro DWDM system
  - 60 km optical distance increase
  - Confirmed by John Moore
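The 60 km figure can be sanity-checked against the observed latency jump. Assuming a propagation speed of roughly 2e8 m/s (light in fiber, refractive index ~1.5), the extra fiber alone accounts for about 300 of the 400 usec; this cross-check is my arithmetic, not from the slides.

```python
# Sanity check: extra one-way propagation delay from 60 km more
# fiber, assuming light travels at ~2e8 m/s in glass (n ~ 1.5).

def fiber_delay_us(extra_km, v_mps=2.0e8):
    """One-way propagation delay in microseconds for extra_km of fiber."""
    return extra_km * 1e3 / v_mps * 1e6

print(f"{fiber_delay_us(60):.0f} usec of the observed jump")
```

That the bulk of a 400 usec jump is explained by pure path length is what makes a one-way-latency monitor like OWAMP so effective at catching silent rerouting events.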
51 Example Application Community: ESnet / Abilene (2)
52 American/European Demonstration Goals
- Demonstrate ability to do partial path analysis between Caltech (Los Angeles Abilene router) and CERN
- Demonstrate ability to do partial path analysis involving nodes in the GEANT network
- Compare and contrast measurement of a lightpath versus a normal IP path
- Demonstrate interoperability of piPEs and analysis tools such as Advisor and MonALISA
53 Demonstration Details
- Path 1: default route between LA and CERN is across Abilene to Chicago, then across the DataTAG circuit to CERN
- Path 2: announced addresses so that the route between LA and CERN traverses GEANT via the London node
- Path 3: lightpath (discussed earlier by Rick Summerhill)
- Each measurement node consists of a BWCTL box and an OWAMP box next to the router
54 All Roads Lead to Geneva
55 Results
- BWCTL: http://abilene.internet2.edu/ami/bwctl_status_eu.cgi/BW/14123130651515289600_14124243902743445504
- OWAMP: http://abilene.internet2.edu/ami/owamp_status_eu.cgi/14123130651515289600_14124243902743445504
- MonALISA
- NLANR Advisor
56 Insights (1)
- Even with shared source and a single team of developer-installers, inter-administrative-domain coordination is difficult.
- Struggled with basics of multiple paths
  - IP addresses, host configuration, software (support for source addresses, etc.)
- Struggled with cross-domain administrative coordination issues
  - AA (accounts), routes, port filters, MTUs, etc.
- Struggled with performance tuning of measurement nodes
  - Host tuning, asymmetric routing, MTUs
57 Insights (2)
- Connectivity takes a large amount of coordination and effort; performance takes even more of the same.
- Current measurement approaches have limited visibility into lightpaths.
  - Having hosts participate in the measurement is one possible solution.
58 Insights (3)
- Consider interaction with security: lack of end-to-end transparency is problematic.
  - Security filters are set up based on expected traffic patterns
  - Measurement nodes create new traffic
  - Lightpaths bypass expected ingress points