Title: PIPE Dreams
1PIPE Dreams
- Trouble Shooting Network Performance for Production Science Data Grids
- Presented by Warren Matthews at CHEP03, San Diego, March 24-28, 2003
2Abstract
The vision of science grids allocating resources
to analyze huge quantities of HENP data clearly
depends on reliable network performance. Tools
developed at SLAC in conjunction with the
Internet2 PIPES project will help to ensure this.
In this talk, these tools will be discussed, and
the procedure for publishing performance data, in
particular using the Globus Toolkit's MDS and web
services, will be reviewed. The subsequent
analysis and troubleshooting methodology will be
discussed with real-world examples from the
Particle Physics Data Grid (PPDG) and the
European Data Grid (EDG).
3Overview
- What is the problem?
- What is PIPES?
- Network performance monitoring
- Problem identification
4Network Monitoring for the Grid
- The Data Grid consists of many components that
must interoperate
[Diagram labels: Farm, Data, and requestor (several of each), The Network, Resource Broker]
5Allocate Resources
- The resource broker must be fully informed
- Measurement is required!
[Diagram labels: Farm, Data, and requestor (several of each), The Network, Resource Broker; annotated with 12% pkt loss, OC48, and 80% Utilization]
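A minimal sketch of how a broker might use such measurements when allocating resources. The `PathMeasurement` type, the `max_loss` cutoff, and the "most spare capacity wins" rule are illustrative assumptions, not the talk's actual broker logic.

```python
from dataclasses import dataclass


@dataclass
class PathMeasurement:
    source: str            # hypothetical name of a site holding the data
    loss_fraction: float   # measured packet loss, 0.0 to 1.0
    utilization: float     # measured link utilization, 0.0 to 1.0
    capacity_mbps: float   # nominal capacity of the bottleneck link


def spare_mbps(m):
    """Capacity not currently in use on the measured path."""
    return m.capacity_mbps * (1.0 - m.utilization)


def choose_source(measurements, max_loss=0.01):
    """Pick the source with the most spare capacity among low-loss paths."""
    # Assumption: paths above max_loss are treated as unusable for bulk transfer.
    usable = [m for m in measurements if m.loss_fraction <= max_loss]
    if not usable:
        raise ValueError("no usable path; the broker cannot make an informed choice")
    return max(usable, key=spare_mbps).source
```

With no measurements the function refuses to choose, which mirrors the point of the slide: an uninformed broker cannot allocate sensibly.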
6What is PIPES?
- Internet2
- End-to-end performance initiative
- PI Performance Evaluation System (PIPES)
- PIPES Monitoring Platform (PMP)
- Overlap with goals of HENP
- Tremendous resources
7IEPM-BW
- Package developed at SLAC
- Measurement Engine
- Iperf, bbftp, bbcp, ping, traceroute
- Abwe, owamp, udpmon, gridftp
- Job Manager
- Data Storage and data server
- Analysis Engine
8[Network map; legend: Monitoring Site. Labels: LANL, EDG, KEK, CERN, TRIUMF, NIKHEF, NERSC, FNAL, IN2P3, ANL, CHI, PPDG/GriPhyN, SNV, ESnet, ORNL, RAL, JLAB, NY, UCL, SLAC, UManc, Imperial, JAnet, DL, NNW, BNL, APAN, Stanford, RIKEN, INFN-Roma, INFN-Padua, Geant, CalREN, INFN-Milan, Abilene, SEA, CESnet, NASA, WASH, SOX, HSTN, ATL, DNVR, CLV, IPLS, UTAH, SDSC, UFL, CALTECH, I2, UTDallas, UMich, Rice, NCSA]
9BaBar Grid
[Network diagram. Labels: NNW, Manchester, TVN, RAL, Janet, ESnet, SWERN, SLAC, Bristol, Geant, Stanford, DFN, Dresden, Calren, Abilene, Renater, IN2P3. Link speeds shown: 10 Gbps, 622 Mbps, 1 Gbps, 2.5 Gbps]
11Problem Identification
- Typical Scenario
- User complains file transfer is slow
- Net admin runs ping, traceroute, iperf test
- Complain to upstream provider
- Proactive
- What do we mean by throughput?
- How do we know there was a performance hit?
- Our approach is diurnal changes
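One way to read "diurnal changes" is to compare each new measurement with what is usual for that hour of the day, so that an ordinary afternoon dip is not mistaken for a fault. The sketch below assumes that reading; the function names, the use of a median, and the 50% `drop_fraction` are hypothetical choices, not the talk's stated method.

```python
from collections import defaultdict
from statistics import median


def hourly_baseline(samples):
    """Build {hour_of_day: median throughput} from (hour, mbps) samples."""
    by_hour = defaultdict(list)
    for hour, mbps in samples:
        by_hour[hour].append(mbps)
    return {hour: median(values) for hour, values in by_hour.items()}


def is_performance_hit(baseline, hour, mbps, drop_fraction=0.5):
    """True if mbps is below drop_fraction of the usual value for that hour."""
    usual = baseline.get(hour)
    if usual is None:
        return False  # no history for this hour, so no basis for comparison
    return mbps < drop_fraction * usual
```

The same reading of 190 Mbps can be normal at a busy hour and a clear hit at a quiet one, which is why a single global threshold is hard to pick.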
13Alarms
- Too much to keep track of
- Rather not wait for complaints
- Automated Alarms
- Rolling average à la RIPE-TT
- May not be the best approach
- AMP Automated Detection System
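A minimal sketch of a rolling-average alarm in the spirit of the bullet above. It is not the RIPE-TT or AMP algorithm; the class name, window size, and 70% threshold are assumptions chosen for illustration.

```python
from collections import deque


class RollingAverageAlarm:
    """Flag a measurement that falls well below the recent average."""

    def __init__(self, window=5, threshold_fraction=0.7):
        self.history = deque(maxlen=window)
        self.threshold_fraction = threshold_fraction

    def update(self, value):
        """Record a measurement; return True if it should raise an alarm."""
        alarm = False
        # Only judge once a full window of history is available.
        if len(self.history) == self.history.maxlen:
            average = sum(self.history) / len(self.history)
            alarm = value < self.threshold_fraction * average
        self.history.append(value)
        return alarm
```

Note that a low value still enters the window, so a sustained drop pulls the average down and the alarm stops firing. That is one reason a plain rolling average may not be the best approach.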
16Limitations
- Could be over an hour before alarm is generated
- More frequent measurements impact the network and measurements overlap
- Low-impact tools allow finer-grained measurement
- Use NWS multi-variate method
- Use SCIDAC ABwE tool
- Use PingER, OWAMP
18Publishing
- Many monitoring projects; publish data to allow them to interoperate
- MDS
- EDG NM Schema
- Web Services
- GLUE NE Schema
- GGF NMWG
- Hierarchy Doc
- Tools Doc
./get_data 2003 3 18 6 1 41 1.61 1.601 1.62 0
19Net Rat
- Alarm System
- Multiple tools
- Multiple measurement points
- Trigger further measurements
- Cross reference off site stats
- Informant database
- No measurement is authoritative
- Cannot even believe a measurement
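Because no single measurement is authoritative, one plausible policy is to act only when several tools agree. The sketch below illustrates that idea and is not Net Rat's actual logic; the function name, the 70% `drop_fraction`, and the `min_agree` rule are hypothetical.

```python
def corroborated(readings, baselines, drop_fraction=0.7, min_agree=2):
    """Check whether enough tools independently report a drop.

    readings and baselines map a tool name to a throughput value.
    Returns (confirmed, tools_reporting_a_drop).
    """
    degraded = [
        tool
        for tool, value in readings.items()
        if tool in baselines and value < drop_fraction * baselines[tool]
    ]
    return len(degraded) >= min_agree, sorted(degraded)
```

A single tool reporting a drop could then trigger further measurements rather than an alarm, in line with the bullets above.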
20Log
03/20/2003 201346 ALARM pcgiga throughput305.224 ctresh512.95 athresh312.91
03/20/2003 201348 TRACE no change in route detected
03/20/2003 201607 CALM Throughput within acceptable limits. ALARM CANCELLED
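The log shows two thresholds alongside the throughput. The slide does not define them, so the sketch below assumes that `athresh` is the level below which an alarm is raised and `ctresh` is the higher level the throughput must regain before the alarm is cancelled. The class and its method names are hypothetical.

```python
class ThroughputAlarm:
    """Two-threshold (hysteresis) alarm under the assumptions stated above."""

    def __init__(self, alarm_threshold, cancel_threshold):
        if cancel_threshold < alarm_threshold:
            raise ValueError("cancel threshold must not be below alarm threshold")
        self.alarm_threshold = alarm_threshold
        self.cancel_threshold = cancel_threshold
        self.active = False

    def update(self, throughput):
        """Return "ALARM" or "CALM" on a state change, otherwise None."""
        if not self.active and throughput < self.alarm_threshold:
            self.active = True
            return "ALARM"
        if self.active and throughput >= self.cancel_threshold:
            self.active = False
            return "CALM"
        return None
```

Keeping the cancel level above the alarm level stops the alarm from flapping when throughput hovers near a single threshold.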
21Toward a Monitoring Infrastructure
- MAGGIE
- Measurement and Analysis package built on NIMI/Akenti
- EDEE
- production-quality Data Grid for Europe
22More Information
- IEPM Home Page
- IEPM-BW
- I2 E2E and PIPES
- RIPE-TT
- AMP Automated Event Detection
- NWS
- ABWE
23End
This talk made possible by the IEPM team at SLAC
(Les Cottrell, Connie Logg, Jiri Navratil, Jerrod
Williams, Fabrizio Coccetti), and the many
developers and maintainers around the world.