PCP: The Probes Coordination Protocol

Provided by: paulg85

Transcript and Presenter's Notes

1
PCP: The Probes Coordination Protocol
  • A secure, robust framework for scheduling and
    coordinating regular tasks across multiple sites

2
Overview
  • Background
  • Motivation
  • The Probes Coordination Protocol
  • New implementation
  • PCP implementation features
  • Summary

3
Background
  • Work has spanned three projects
  • European Data Grid (EDG) 2001-2004
  • Enabling Grids for eScience (EGEE/EGEE-II)
    2004-2008
  • Joint Information Systems Committee (JISC) NPM
    2008-2009
  • Network performance measurements
  • The collection of monitoring data in a Grid
    environment
  • Grid users want to know the expected performance
    of their network-based application
  • e2emonit, gridmon

4
Motivation
  • Issues for collecting monitoring data
  • Different measurement types
  • End to end
  • Backbone
  • Different tools
  • Different formats
  • Heterogeneous environments
  • Grid!
  • Many administrative domains
  • Different user groups

5
The problem - sites
  • Deployment of monitoring tools is not so easy
  • There has to be a clear benefit to the site
    before they install tools
  • This benefit is not obvious until after an
    incident has occurred, by which time it is too
    late
  • Firewall changes may be difficult
  • Technically or politically
  • Tools need to be trivial to install and robust
    when running
  • Sys-admins very busy
  • Need to carefully consider scheduling for
    end-to-end tests
  • Overlapping measurements
  • Network overload

6
The problem - users
  • Users need to be able to start, stop and adjust
    the measurements
  • Potentially on remote administrative domains
  • Traditionally system administrators manually set
    up, start and stop cron jobs for the tools
  • This caused various problems for scalability,
    coordination and basic practicalities

7
Solution: The Probes Coordination Protocol
  • Developed to solve the management overhead of
    running active measurement probes
  • Token-based mechanism to co-ordinate periodic
    execution of monitoring tasks
  • But has other applications
  • Initially developed as part of EDG (Robert
    Harakaly et al.)
  • Prototype implementation in C usable but lacking
    some features
  • Re-engineered and extended by EPCC to address
    these issues

8
PCP Operation
  • Client/Server model
  • Based on a system of tokens passed between sites
  • Client submits tokens to a site
  • Server acts upon the arrival of a token
  • Registers and monitors job tokens
  • Performs function defined by an admin token
  • Sites are grouped into cliques

9
PCP Token
  • Trigger for activity at a site
  • Job token
  • Name: an identifier
  • Delay: time to wait before executing the job for
    the first time
  • Period: frequency of command
  • Command: indicator of which command to run at
    the sites
  • Member(s): sites in the clique to run the
    command
  • Admin token
  • List: for retrieving data about the activities
    currently registered at a site
  • Kill: destroys the named clique activity
  • Clear: removes (i.e. deregisters) all the
    activities from a site
  • Update: modifies the named clique activity with
    the new token message (enables changes to values
    such as the period)
  • Exit: stops the PCP server at the given site
  • Also can include security information
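
The token types above can be sketched as a small in-memory model. This is only an illustration in Python, not the PCP implementation: the class names `JobToken`, `AdminToken` and `SiteServer`, the `receive` method, and the use of seconds for delay and period are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JobToken:
    """Hypothetical model of a job token (field names follow the slide)."""
    name: str
    delay: int      # wait before the first run (unit assumed: seconds)
    period: int     # interval between runs (unit assumed: seconds)
    command: str
    members: List[str] = field(default_factory=list)


@dataclass
class AdminToken:
    """Hypothetical model of an admin token."""
    action: str                            # "list", "kill", "clear", "update" or "exit"
    name: Optional[str] = None             # target activity for kill/update
    replacement: Optional[JobToken] = None  # new token message for update


class SiteServer:
    """Toy stand-in for a PCP server at one site; no networking or security."""

    def __init__(self) -> None:
        self.activities: Dict[str, JobToken] = {}
        self.running = True

    def receive(self, token):
        # A job token is registered under its name.
        if isinstance(token, JobToken):
            self.activities[token.name] = token
            return None
        # An admin token triggers the function it names.
        if token.action == "list":
            return sorted(self.activities)
        if token.action == "kill":
            self.activities.pop(token.name, None)
        elif token.action == "clear":
            self.activities.clear()
        elif token.action == "update":
            if token.name in self.activities and token.replacement is not None:
                self.activities[token.name] = token.replacement
        elif token.action == "exit":
            self.running = False
        else:
            raise ValueError(f"unknown admin action: {token.action!r}")
        return None
```

For instance, submitting a job token followed by an `update` admin token with a shorter period leaves the activity registered under the same name with the new period.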

10
PCP Clique
  • The clique represents a group of sites, all of
    which are required to run a particular activity
    at particular intervals
  • Example will look at clique with three sites, A,
    B and C ...

11
Example PCP Token
  • Lines beginning with the comment character are
    ignored as comments
  • name: PJG-EPCC-PCP_TEST
  • member: sitea.epcc.ed.ac.uk
  • member: siteb.epcc.ed.ac.uk
  • member: sitec.epcc.ed.ac.uk
  • period: 1800
  • timeout: 0
  • delay: 300
  • command: pcp_test
  • owner: somebody_at_epcc.ed.ac.uk
  • lockDependent: true
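
A token file like this one could be read with a few lines of Python. The sketch below is not the PCP parser: it assumes a `key: value` layout and `#` as the comment character, neither of which is confirmed by the slide, and `parse_token` and `EXAMPLE` are hypothetical names.

```python
def parse_token(text):
    """Parse token text into a dict; repeated 'member' keys become a list.

    Assumptions (not from the slides): fields are written as 'key: value'
    and comment lines start with '#'.
    """
    token = {"member": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        key, value = key.strip(), value.strip()
        if key == "member":
            token["member"].append(value)
        else:
            token[key] = value
    # Convert the numeric and boolean fields used in the example token.
    for numeric in ("period", "timeout", "delay"):
        if numeric in token:
            token[numeric] = int(token[numeric])
    if "lockDependent" in token:
        token["lockDependent"] = token["lockDependent"].lower() == "true"
    return token


EXAMPLE = """\
# test clique with three sites
name: PJG-EPCC-PCP_TEST
member: sitea.epcc.ed.ac.uk
member: siteb.epcc.ed.ac.uk
member: sitec.epcc.ed.ac.uk
period: 1800
timeout: 0
delay: 300
command: pcp_test
owner: somebody_at_epcc.ed.ac.uk
lockDependent: true
"""
```

Calling `parse_token(EXAMPLE)` yields a dict with three entries under `member`, integer values for `period`, `timeout` and `delay`, and a boolean `lockDependent`.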

12
PCP normal operation

13
PCP Site failure operation

14
PCP Lock operation
  • Individual sites may temporarily wish to drop out
    of a clique
  • Previously required inter-site coordination to
    stop/restart commands
  • Enabled via a locking mechanism
  • Administrator sets the lock
  • Lock dependent tokens are not allowed to execute
  • Lock either expires or is removed by
    administrator
  • The site then operates normally as part of the
    clique again
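
The lock behaviour described on this slide can be sketched as follows. This is an illustrative model only: `SiteLock`, `may_execute`, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are assumptions, not part of PCP.

```python
class SiteLock:
    """Hypothetical per-site lock that either expires or is removed."""

    def __init__(self):
        self.expires_at = None  # None means no lock is set

    def set(self, now, duration=None):
        # With no duration the lock is held until an administrator removes it.
        self.expires_at = float("inf") if duration is None else now + duration

    def remove(self):
        self.expires_at = None

    def is_held(self, now):
        return self.expires_at is not None and now < self.expires_at


def may_execute(lock_dependent, lock, now):
    """Lock-dependent tokens may not execute while the site lock is held."""
    return not (lock_dependent and lock.is_held(now))
```

A token whose `lockDependent` flag is false is unaffected by the lock, while a lock-dependent token is refused until the lock expires or is removed.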

15
PCP Features
  • For NPM, prevents overlapping measurements
  • Probe will not run until token received
  • Extensible plug-in design
  • Communication
  • TCP/IP
  • Security
  • VOMS/X.509 based authentication
  • Limited set of commands can be run
  • Logging
  • Configurable to various levels
  • Security-related messages straightforwardly
    distinguishable
  • Portable
  • Pure Java
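
The first feature above (a probe does not run until its site receives the token) is what prevents overlapping measurements. A minimal sketch of the idea, assuming the token visits clique members one at a time in list order and simply skips a site that is down; the visiting order, the failure handling, and the names `circulate`, `is_alive` and `run_probe` are assumptions for illustration, not the protocol's actual behaviour:

```python
def circulate(members, is_alive, run_probe, rounds=1):
    """Pass a single token around the clique; only the holder runs its probe.

    Because there is one token and sites are visited sequentially,
    two probes never run at the same time in this model.
    """
    log = []
    for _ in range(rounds):
        for site in members:
            if not is_alive(site):
                log.append((site, "skipped"))  # token moves on past a failed site
                continue
            run_probe(site)                    # site holds the token, so it may run
            log.append((site, "ran"))
    return log
```

With members A, B and C and site B unavailable, one round runs the probe at A and then at C, and records B as skipped.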

16
Summary
  • Protocol provides a means for scheduling regular
    tasks at multiple sites with minimal overheads
    for both users and administrators
  • Software is
  • Portable
  • Secure
  • Robust
  • Extensible
  • Available for download: http://www.egee-npm.org/pcp/
  • Any questions?
  • Thank you
  • p.graham_at_epcc.ed.ac.uk