EGEE/EPCC Work on Network Performance Monitoring


Title: EGEE/EPCC Work on Network Performance Monitoring


1
EGEE/EPCC Work on Network Performance Monitoring
  • GÉANT2 JRA1 Meeting, Berlin
  • June 26 2008

2
Outline
  • Introduction
  • EGEE Challenges and Strategy for NPM
  • Architecture and Tools Developed
  • Diagnostic Tool
  • PCP: Probes Coordination Protocol
  • Deployment Challenges
  • Future Work

3
EPCC involvement with NPM
  • EGEE-NPM
  • Aim: make network performance measurements for
    the EGEE infrastructure available to the rest of
    the project
  • EGEE: 1 April 2004 to 31 March 2006
  • JRA4
  • BAR
  • NPM
  • Initial version of NPM services
  • EGEE-II: 1 April 2006 to 31 March 2008
  • SA1 (Operations)
  • NPM only
  • NM-WG version 2 services
  • PCP
  • JISC-NPM: 1 April 2008 to 31 March 2009
  • Disseminate work to other JISC projects,
    collaborate with others, e.g. DEISA2

4
EGEE Challenges
  • Scale and heterogeneity of EGEE fabric poses a
    requirement to support diversity of all kinds
  • Multitude of ways of collecting monitoring data
  • Different measurement types
  • End-to-end
  • Appropriate to experience of user and
    application, e.g. TCP achievable bandwidth
  • Backbone
  • Lower level measurements, used to pin-point
    source of problems
  • Different measurement tools
  • Different data formats
  • Many administrative domains
  • Different user groups

5
The Importance of end-to-end monitoring
  • Most network problems can be attributed to the
    last mile
  • Campus issues, not backbone
  • Grid users want to know the expected performance
    of their application
  • Don't always realise that they won't get the full
    backbone bandwidth
  • Network infrastructure, configuration, firewalls
    etc
  • Influence of other network users
  • Machine TCP configuration, disk system, memory,
    processor speed etc
  • Application itself

Grids require reliable monitoring of the network
from source machine on one campus right through
to destination machine elsewhere
6
Strategy
  • Facilitate access to data collected by existing
    measurement tools
  • Lots already exist so no need to develop our own
  • Data federation through use of GGF/OGF NM-WG
    schema
  • Has prompted fruitful collaboration with GÉANT2
    JRA1
  • Availability of perfSONAR utilisation data
    through our tools

7
NPM Architecture
Clients
  • User Interface
  • Path Selection
  • Metric Selection
  • Plotting of results

Middleware
  • Mediator
  • Single point of contact for clients
  • Metadata discovery
  • Brokers data requests

Frameworks
  • e2emonit
  • Active end-to-end data
  • perfSONAR
  • Passive utilisation data from networks such as
    GÉANT2

8
Tools and Supported Frameworks
  • Clients
  • Diagnostic Tool
  • For use by people
  • Web based application for ease of access
  • Middleware
  • Mediator
  • Single point of contact for clients
  • Clients do not need to maintain list of
    frameworks
  • Discovery of metadata
  • Insulate clients from interface changes
  • Exposes NM-WG web-service interface
  • Added value services
  • caching of data
  • Measurement Frameworks
  • e2emonit
  • End-to-end metrics (TCP/UDP achievable bandwidth,
    RTT, packet loss, OWDV)
  • Active measurement tools (iperf, ping, udpmon)
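
The Mediator role described above (single point of contact, metadata discovery, brokering, caching) can be illustrated with a minimal sketch. This is not the NPM Mediator's actual code or its NM-WG web-service interface: the class, the metric names and the two adapter functions are hypothetical stand-ins that return canned values.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a query is (source, destination, metric); an adapter
# stands in for one measurement framework and returns a list of samples.
Query = Tuple[str, str, str]
Adapter = Callable[[str, str], List[float]]


class Mediator:
    """Single point of contact that brokers requests and caches results."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}      # metric -> framework adapter
        self._cache: Dict[Query, List[float]] = {}   # added-value service: caching
        self.backend_calls = 0                       # counts real framework requests

    def register(self, metric: str, adapter: Adapter) -> None:
        self._adapters[metric] = adapter

    def metrics(self) -> List[str]:
        """Metadata discovery: which metrics can clients ask for?"""
        return sorted(self._adapters)

    def query(self, src: str, dst: str, metric: str) -> List[float]:
        key = (src, dst, metric)
        if key not in self._cache:
            adapter = self._adapters[metric]         # KeyError if metric unknown
            self._cache[key] = adapter(src, dst)
            self.backend_calls += 1
        return self._cache[key]


# Hypothetical adapters with made-up values; real ones would contact
# e2emonit or a perfSONAR Measurement Archive.
def fake_e2emonit_rtt(src: str, dst: str) -> List[float]:
    return [12.1, 11.8, 12.4]


def fake_perfsonar_utilisation(src: str, dst: str) -> List[float]:
    return [0.31, 0.42]


mediator = Mediator()
mediator.register("rtt_ms", fake_e2emonit_rtt)
mediator.register("utilisation", fake_perfsonar_utilisation)
first = mediator.query("siteA", "siteB", "rtt_ms")
second = mediator.query("siteA", "siteB", "rtt_ms")  # served from the cache
```

The client only ever talks to `mediator`, so adding or replacing a framework means registering a different adapter rather than changing the client.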

9
NPM Diagnostic Tool
  • The Diagnostic Tool is accessed through a
    standard web browser; users can be individually
    authorised to use it.
  • The intended user is a NOC/GOC/ROC operator, but
    anyone can use it to investigate problems.
  • The sites and metrics displayed depend on which
    measurement tools have been deployed and where;
    this is discovered through NM-WG metadata queries
    to the Mediator.
  • Currently deployed with access to some perfSONAR
    MAs and test e2emonit data.

10
NPM Diagnostic Tool (2)
  • The parameters used to gather measurements are
    shown.
  • Here the iperf tool was used to measure the TCP
    achievable bandwidth.
  • These parameters can be useful in interpreting
    the results.

11
NPM Diagnostic Tool (3)
  • Information from multiple paths may be plotted at
    the same time.
  • Here utilisation data for the GÉANT2/JANET router
    is plotted for both inbound and outbound traffic
    over the course of one week, obtained from the
    GÉANT2 perfSONAR Measurement Archive.

12
Deployment Challenges
  • The usefulness of NPM depends on the data that is
    available
  • Providing data federation tools not enough by
    itself
  • We would like to use data that is already
    collected
  • But monitoring tools currently not sufficiently
    deployed across sites
  • Ideally individual regional federations or VOs
    make decisions on which tools to deploy for their
    infrastructure
  • E.g. GridPP deployment of gridmon within UK
  • We then help to make this data available through
    an NM-WG interface

13
Gridmon (1)
  • Network monitoring for the UK GridPP
    infrastructure (UK contribution to EGEE)
  • Mark Leese at STFC Daresbury Lab
  • Active end-to-end measurements
  • Similar tools/metrics to e2emonit
  • TCP/UDP achievable bandwidth, RTT, packet loss
  • Well defined set of sites and paths of interest
  • Tier 1 centre to all, Tier 2 centres to others in
    same region
  • Hope to soon deploy NM-WG web service
  • Useful comparison of schema implementations
  • Integrate into Mediator and DT
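
The path rule on this slide (Tier 1 centre to all, Tier 2 centres to others in the same region) can be written out as a small sketch. This is not gridmon's configuration format: the function name, the site names and the choice to treat paths as directed (source, destination) pairs are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def paths_of_interest(tier1: str, regions: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Return directed (source, destination) pairs to be measured."""
    paths: Set[Tuple[str, str]] = set()
    for sites in regions.values():
        for site in sites:
            paths.add((tier1, site))          # Tier 1 centre to every Tier 2 site
        for a in sites:
            for b in sites:
                if a != b:
                    paths.add((a, b))         # Tier 2 to others in the same region
    return paths


# Hypothetical layout: one Tier 1 centre and two regions of Tier 2 sites.
example = paths_of_interest("t1", {"north": ["n1", "n2"], "south": ["s1"]})
```

Restricting Tier 2 tests to a site's own region keeps the number of paths far below a full mesh, which matters once active tests have to be scheduled.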

14
Gridmon (2)
15
Gridmon (3)
16
More Deployment Challenges
  • Deployment of monitoring tools is not so easy
  • There has to be a clear benefit to the site
    before they install tools
  • This benefit is not obvious until after an
    incident has occurred, by which time it is too
    late
  • Firewall changes may be difficult (e.g. ICMP
    blocked by default)
  • Technically or politically
  • Tools need to be trivial to install and robust
    when running
  • Sys-admins very busy
  • Need to carefully consider scheduling for
    end-to-end tests
  • Overlapping measurements
  • Network overload

Solution: Develop PCP
17
PCP: Probes Coordination Protocol
  • Developed to solve management overhead of running
    active measurement probes
  • e.g. manual cron jobs
  • Token-based mechanism to co-ordinate periodic
    execution of monitoring tasks
  • But applicable to any kind of task requiring
    regular scheduling across administrative domains
  • Prevents overlapping measurements
  • Probe will not run until token received
  • Groups of sites form cliques
  • Robust
  • Can cope with sites in the clique being
    unreachable
  • Secure
  • Only pre-defined activities may be run
  • VOMS/X.509 based authentication of users
  • But designed as pluggable security
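
The token idea above can be illustrated with a toy simulation of one round in a clique. It is a sketch of the principle only, not the PCP wire protocol or its security layer: the function, the whitelist contents and the site names are hypothetical, and reachability is simply passed in as a set.

```python
from typing import List, Set, Tuple

# Hypothetical whitelist: only pre-defined activities may be run.
ALLOWED_ACTIVITIES: Set[str] = {"ping", "iperf"}


def run_round(clique: List[str], reachable: Set[str],
              activity: str) -> List[Tuple[str, str, str]]:
    """Pass one token around the clique once and log the probes that ran."""
    if activity not in ALLOWED_ACTIVITIES:
        raise ValueError(f"activity {activity!r} is not pre-defined")
    log: List[Tuple[str, str, str]] = []
    for holder in clique:                 # the token visits sites in a fixed order
        if holder not in reachable:
            continue                      # skip the site rather than stall the clique
        # Only the current token holder probes, so measurements never overlap.
        for peer in clique:
            if peer != holder and peer in reachable:
                log.append((holder, peer, activity))
    return log


# Hypothetical clique of three sites, one of which is unreachable.
clique = ["edinburgh", "daresbury", "berlin"]
log = run_round(clique, reachable={"edinburgh", "berlin"}, activity="ping")
```

Because a probe runs only while its site holds the token, no per-site cron schedule has to be kept in step by hand.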

18
PCP Operation
19
Even More Deployment Challenges
  • Different user groups may have widely different
    requirements for displaying data
  • e.g. site or service admins may just want an
    alarm that tells them "your network is broken",
    and never look at the DT
  • But network people would not contemplate
    investigating problems without clear historical
    data to look at
  • The network is still assumed by many to "just
    work"

20
The Future (1)
  • We are no longer involved in EGEE
  • But funded by JISC for a further year to do
    similar work
  • EGEE plans
  • EGEE SA2 (ENOC etc) have a small amount of effort
    from DFN
  • On-demand measurements, requested by ENOC
  • Central web server for authorisation, archiving,
    control
  • New BWCTL-like plugins for traceroute, ping, DNS
    lookup, nmap
  • LHC-OPN: Deploying perfSONAR services

21
(TSA2.2.4) Network monitoring tools DFN
  • Network monitoring tools for efficient
    troubleshooting
  • Launch tests on demand from the Grid site under
    central server control: ping, traceroute, DNS
    lookup, nmap and bandwidth measurements

[Diagram, steps numbered 1 to 5. Labels: ENOC supervisor; ENOC; administrator; Grid site A; Grid site B; Local site light PerfSONARs sensor; Central ENOC monitoring server]
Source: SA2 Networking support, Transition meeting, May 08
22
The Future (2)
  • Gridmon
  • Collaboration around NM-WG v2 interfaces
  • DEISA
  • Fewer sites involved, currently 11
  • DEISA plan to evaluate perfSONAR in the coming
    months
  • But we need to do something useful soon
  • Is there an opportunity to work together to
    deliver something useful to DEISA that would also
    enhance perfSONAR?
  • Alarms?
  • Presentation?

23
The Future (3)
  • For large projects in general
  • If multiple frameworks are deployed, then have to
    pursue interoperability through NM-WG and use
    Mediator-like components
  • But are multiple frameworks really deployed?
  • Where is NM-WG going?
  • Why not try to install perfSONAR services
    everywhere?

24
Summary
  • Provision of federated access to network
    measurement data has been demonstrated
  • Based on OGF NM-WG schema
  • Getting access to data itself is much harder
  • Deployment challenges
  • Need to sell to sites the value of having data
    available
  • Differences between metrics provided by network
    providers and those that can be provided by
    individual sites
  • end-to-end active vs. passive monitoring
  • Should projects be attempting to do their own
    monitoring?
  • If they don't then it is left up to providers
  • But only projects can provide meaningful
    end-to-end measurements
  • What happens when a site is active in multiple
    projects?