Grid Monitoring Futures with Globus - PowerPoint PPT Presentation


About This Presentation

Title:

Grid Monitoring Futures with Globus

Description:

Grid Monitoring Futures with Globus. Jennifer M. Schopf. Argonne National Lab. April 2003.



1
Grid Monitoring Futures with Globus

Jennifer M. Schopf
Argonne National Lab
April 2003

2
My Definitions

Grid
Shared resources
Coordinated problem solving
Multiple sites (multiple institutions)
Monitoring
Discovery
Registry service
Contains descriptions of data that is available
Expression of data
Access to sensors, archives, etc.

3
What do I mean by Grid monitoring?

Different levels of monitoring needed
Application specific
Node level
Cluster/site level
Grid level
Grid-level monitoring concerns data
Shared between administrative domains
For use by multiple people
(think scalability)

4
Grid Monitoring Does Not Include

All the data about every node of every site
Years of utilization logs to use for planning
next hardware purchase
Low-level application progress details for a
single user
Application debugging data (except perhaps
notification of a failure of a heartbeat)
Point-to-point sharing of all data over all sites

5
Overview of This Talk

Evaluation of information infrastructures
Globus Toolkit MDS2, R-GMA, Hawkeye
Insights into performance issues
(publication at HPDC 2003)
What monitoring and discovery could be
Next-generation information architecture
Open Grid Services Architecture mechanisms
Integrated monitoring and discovery architecture for GT3

6
Performance and the Grid

It's not enough to use the Grid; it has to
perform. Otherwise, why bother?
First prototypes rarely consider performance
(trade-off with development time)
MDS1: centralized LDAP
MDS2: decentralized LDAP
MDS3: decentralized Grid service
Often performance is simply not known

7
Globus Monitoring and Discovery Service (MDS2)

Part of Globus Toolkit, compatible with other
elements
Used most often for resource selection
Aids a user or agent in identifying the host(s) on
which to run an application
Standard mechanism for publishing and discovery
Decentralized, hierarchical structure
Soft-state protocols
Caching
Grid Security Infrastructure credentials
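Soft-state protocols, as listed above, mean that a registration lapses unless the registrant refreshes it, so a crashed information provider eventually drops out of the directory without sending any deregistration message. The sketch below illustrates only that idea, assuming a fixed time-to-live; `SoftStateRegistry`, `register`, and `live_entries` are hypothetical names, not part of the MDS2 or GRRP interfaces, and the clock is injectable so expiry can be shown without waiting.

```python
import time


class SoftStateRegistry:
    """Hypothetical soft-state registry: entries survive only while refreshed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable clock so expiry can be simulated
        self._expiry = {}         # entry name -> absolute expiry time

    def register(self, name):
        """Register a new entry, or refresh an existing one for another TTL."""
        self._expiry[name] = self.clock() + self.ttl

    def live_entries(self):
        """Purge entries whose lifetime has lapsed and return the survivors."""
        now = self.clock()
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)


# Demo with a fake clock: gris-b never refreshes, so it silently expires.
fake_now = [0.0]
registry = SoftStateRegistry(ttl_seconds=30, clock=lambda: fake_now[0])
registry.register("gris-a")
registry.register("gris-b")
fake_now[0] = 20.0
registry.register("gris-a")       # periodic refresh keeps gris-a alive
fake_now[0] = 40.0
print(registry.live_entries())    # ['gris-a']
```

The registry never needs an explicit "unregister" call; stale state simply ages out, which is what makes the scheme tolerant of failed providers.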

8
MDS2 Architecture
9
Relational Grid Monitoring Architecture (R-GMA)

Implementation of the Grid Monitoring
Architecture (GMA) defined within the Global Grid
Forum (GGF)
Three components
Consumers
Producers
Registry
GMA as defined currently does not specify the
protocols or the underlying data model to be
used.
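Because GMA leaves the protocols and data model open, the three roles can be illustrated with plain in-process objects. In this minimal sketch, assuming dictionaries as the data model and direct method calls in place of network protocols, producers register with the registry, and a consumer discovers producers there and then queries each one directly. All class and method names are hypothetical.

```python
class Registry:
    """Directory of producers, keyed by the kind of data they publish."""

    def __init__(self):
        self._producers = {}      # data type -> list of producers

    def register(self, data_type, producer):
        self._producers.setdefault(data_type, []).append(producer)

    def lookup(self, data_type):
        return list(self._producers.get(data_type, []))


class Producer:
    """Publishes readings; here just a dict of metric name -> value."""

    def __init__(self, name, readings):
        self.name = name
        self.readings = readings

    def query(self, metric):
        return self.readings.get(metric)


class Consumer:
    """Finds producers through the registry, then talks to them directly."""

    def __init__(self, registry):
        self.registry = registry

    def collect(self, data_type, metric):
        return {p.name: p.query(metric) for p in self.registry.lookup(data_type)}


registry = Registry()
registry.register("host", Producer("node1", {"cpu_load": 0.5}))
registry.register("host", Producer("node2", {"cpu_load": 1.5}))
print(Consumer(registry).collect("host", "cpu_load"))  # {'node1': 0.5, 'node2': 1.5}
```

Note that the registry only brokers discovery; the data itself flows from producer to consumer without passing through it.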

10
GGF Grid Monitoring Architecture
11
R-GMA

Monitoring used in the EU Datagrid Project
Steve Fisher, RAL, and James Magowan, IBM-UK
Based on the relational data model
Used Java Servlet technologies
Focus on notification of events
User can subscribe to a flow of data with
specific properties directly from a data source
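R-GMA's relational model lets a consumer describe the data it wants as a SQL predicate over a table. The sketch below approximates a subscription with Python's built-in `sqlite3` module and a poll that returns only rows newer than those already delivered; the `cpu_load` table, its columns, and the `ContinuousQuery` class are assumptions for illustration, not R-GMA's servlet API or schema.

```python
import sqlite3


def make_store():
    """In-memory table standing in for a relation that producers publish into."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cpu_load (host TEXT, load_avg REAL)")
    return db


class ContinuousQuery:
    """Consumer-side subscription: each poll returns only new matching rows."""

    def __init__(self, db, where_clause):
        self.db = db
        self.where = where_clause  # trusted SQL predicate, e.g. "load_avg > 1.0"
        self.last_rowid = 0        # highest rowid already delivered

    def poll(self):
        rows = self.db.execute(
            "SELECT rowid, host, load_avg FROM cpu_load "
            f"WHERE rowid > ? AND ({self.where}) ORDER BY rowid",
            (self.last_rowid,),
        ).fetchall()
        if rows:
            self.last_rowid = rows[-1][0]
        return [(host, load) for _, host, load in rows]


db = make_store()
busy = ContinuousQuery(db, "load_avg > 1.0")
db.executemany("INSERT INTO cpu_load VALUES (?, ?)", [("n1", 0.4), ("n2", 2.5)])
print(busy.poll())  # [('n2', 2.5)]
db.execute("INSERT INTO cpu_load VALUES (?, ?)", ("n3", 3.0))
print(busy.poll())  # [('n3', 3.0)]
```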

12
R-GMA Architecture
13
Hawkeye

Developed by Condor Group
Focus: automatic problem detection
Underlying infrastructure builds on the Condor
and ClassAd technologies
Condor ClassAd Language to identify resources in
a pool
ClassAd Matchmaking, normally used to execute jobs
based on attribute values of resources, applied
here to identify problems in a pool
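The matchmaking idea can be sketched by treating each machine ad as a set of attributes and each problem-detection rule as a predicate over those attributes. This is a simplification under stated assumptions: real Hawkeye rules are written in the ClassAd language (with its own semantics for undefined attributes), whereas here the rules are Python lambdas and a missing attribute simply fails to match; the attribute names and thresholds are made up.

```python
def match_problems(ads, triggers):
    """Return {trigger name: [machine names]} for every ad a trigger matches."""
    hits = {}
    for trigger_name, predicate in triggers.items():
        matched = [ad["Name"] for ad in ads if predicate(ad)]
        if matched:
            hits[trigger_name] = matched
    return hits


# Hypothetical machine ads (stand-ins for ClassAds) and problem-detection rules.
ADS = [
    {"Name": "node1", "DiskFreeMB": 50, "LoadAvg": 0.3},
    {"Name": "node2", "DiskFreeMB": 9000, "LoadAvg": 7.5},
    {"Name": "node3", "DiskFreeMB": 4000, "LoadAvg": 0.9},
]
TRIGGERS = {
    # A missing attribute gets a default that makes the rule not match.
    "low_disk": lambda ad: ad.get("DiskFreeMB", float("inf")) < 100,
    "overloaded": lambda ad: ad.get("LoadAvg", 0.0) > 4.0,
}

print(match_problems(ADS, TRIGGERS))  # {'low_disk': ['node1'], 'overloaded': ['node2']}
```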

14
Hawkeye Architecture
15
Comparing Information Systems

16
Some Architecture Considerations

Similar functional components
Grid-wide for MDS2 and R-GMA, pool-wide for Hawkeye
Global schema
Different use cases will lead to different
strengths
GIIS provides a decentralized registry; there is no standard
protocol to distribute multiple R-GMA registries
R-GMA is meant for streaming data (currently used
for network data); Hawkeye and MDS2 for single queries
Push vs. pull
MDS2 is pull only
R-GMA allows push and pull
Hawkeye allows triggers (push model)
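The push/pull distinction comes down to who initiates each transfer. In this hypothetical `Sensor` sketch, `get` is a pull (the client asks and receives the current value), while `subscribe` plus `update` is a push (the source calls every registered client on each change).

```python
class Sensor:
    """Hypothetical information source supporting both access modes."""

    def __init__(self):
        self.value = None
        self._subscribers = []

    def get(self):
        """Pull: the client initiates and receives whatever is current now."""
        return self.value

    def subscribe(self, callback):
        """Push: the client registers once to be called on every update."""
        self._subscribers.append(callback)

    def update(self, value):
        self.value = value
        for callback in self._subscribers:
            callback(value)


sensor = Sensor()
received = []
sensor.subscribe(received.append)
sensor.update(0.7)
sensor.update(1.9)
print(sensor.get())  # 1.9
print(received)      # [0.7, 1.9]
```

A pull client sees only the value current at query time, so updates between its queries are missed; a push subscriber receives every update but must stay reachable by the source.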

17
Experiments

How many users can query an information server at
a time?
How many users can query a directory server?
How does an information server scale with the
amount of data in it?
How does an aggregator scale with the number of
information servers registered to it?

18
Testbed

Lucky cluster at Argonne
7 nodes, each has two 1133 MHz Intel PIII CPUs
(with a 512 KB cache) and 512 MB main memory
Users simulated at the UC nodes
20 P3 Linux nodes, mostly 1.1 GHz
R-GMA has an issue with the shared file system,
so we also simulated users on Lucky nodes
All figures are 10-minute averages
Queries issued with a one-second wait between
each query (think synchronous send with a 1-second
wait)
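The load-generation pattern described above, a synchronous query followed by a one-second wait, might be sketched as the loop below. `query` is a placeholder for whatever request the simulated user sends (an LDAP search against MDS2, a servlet request to R-GMA, and so on); the function name and its parameters are hypothetical, and the clock and sleep functions are injectable so the loop can be exercised without real delays. A 10-minute run would correspond to `duration_s=600`.

```python
import time


def simulated_user(query, duration_s, wait_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Issue synchronous queries with a fixed wait; return per-query response times."""
    response_times = []
    end = clock() + duration_s
    while clock() < end:
        start = clock()
        query()                               # blocks until the reply arrives
        response_times.append(clock() - start)
        sleep(wait_s)                         # fixed wait before the next query
    return response_times


# Demo with a fake clock: each query "takes" 0.5 s, followed by a 1 s wait.
fake_t = [0.0]


def fake_query():
    fake_t[0] += 0.5


def fake_sleep(seconds):
    fake_t[0] += seconds


times = simulated_user(fake_query, duration_s=10, wait_s=1.0,
                       clock=lambda: fake_t[0], sleep=fake_sleep)
print(len(times), times[0])  # 7 0.5
```

Because each user waits for a reply before sleeping, a slower server automatically lowers the offered load per user, which is why response time and throughput have to be read together.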

19
Metrics

Throughput
Number of requests processed per second
Response time
Average amount of time (in sec) to handle a
request
Load
percentage of CPU cycles spent in user mode and
system mode, recorded by Ganglia
High when running a small number of compute-intensive
apps
Load1
average number of processes in the ready queue
waiting to run, 1-minute average, from Ganglia
High when a large number of apps are blocking on I/O
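Throughput and response time, as defined above, can be computed from per-request start and end timestamps collected during a measurement window. The helper below is a hypothetical sketch of those two definitions only; Load and Load1 came from Ganglia in these experiments and are not computed here.

```python
def summarize(requests, window_s):
    """requests: (start, end) times in seconds for requests in the window."""
    if not requests:
        return {"throughput": 0.0, "response_time": 0.0}
    durations = [end - start for start, end in requests]
    return {
        # requests processed per second of measurement time
        "throughput": len(requests) / window_s,
        # average seconds taken to handle one request
        "response_time": sum(durations) / len(durations),
    }


print(summarize([(0, 1), (1, 3), (2, 5)], window_s=6))
# {'throughput': 0.5, 'response_time': 2.0}
```

On Unix systems, Python's `os.getloadavg()` returns the 1-, 5-, and 15-minute load averages, the first of which is comparable to the Load1 figure.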

20
Performance of Information Servers vs. Number of
Users
21
Experiment 1 Summary

Caching can significantly improve performance of
the information server
Particularly desirable if one wishes the server
to scale well with an increasing number of users
When setting up an information server, care
should be taken to make sure the server is on a
well-connected machine
Network behavior plays a larger role than
expected
If this is not an option, thought should be given
to duplicating the server if more than 200 users
are expected to query it
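One simple form of the caching credited above is a time-to-live cache in front of an expensive information provider: repeated queries within the TTL are answered from memory, so the provider's cost no longer grows with the number of users. `CachingInfoServer` is a hypothetical name, not an MDS2 component, and the 60-second TTL in the demo is arbitrary.

```python
import time


class CachingInfoServer:
    """Answer repeated queries from a cache until entries are older than ttl_s."""

    def __init__(self, provider, ttl_s, clock=time.monotonic):
        self.provider = provider   # expensive call, e.g. running an information provider
        self.ttl = ttl_s
        self.clock = clock
        self._cache = {}           # key -> (value, time fetched)
        self.provider_calls = 0

    def query(self, key):
        now = self.clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]          # fresh enough: no provider invocation
        value = self.provider(key)
        self.provider_calls += 1
        self._cache[key] = (value, now)
        return value


fake_now = [0.0]
server = CachingInfoServer(lambda key: key.upper(), ttl_s=60, clock=lambda: fake_now[0])
for _ in range(100):               # 100 users asking the same question
    server.query("cpu")
print(server.provider_calls)       # 1
```

The trade-off is freshness: clients may see data up to one TTL old.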

22
Directory Server Scalability
23
Experiment 2 Summary

Because of the network contention issues, the
placement of a directory server on a highly
connected machine will play a large role in the
scalability as the number of users grows
Because significant loads are seen even with only a few
users, it will be important that this service be
run on a dedicated machine, or that it be
duplicated as the number of users grows.

24
Information Service Throughput vs. Num. of
Information Collectors
25
Experiment 3 Summary

Too many information collectors is a performance
bottleneck
Caching data helps
Alternatively, register to more instances of
information servers with each handling a subset
of the collectors
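Splitting collectors across several information-server instances can be as simple as assigning each collector to one server by a stable hash of its name. This hypothetical `partition` helper sketches that assignment with made-up collector names; `zlib.crc32` is used instead of Python's built-in `hash()` because string hashes are randomized per process, which would reshuffle assignments on every restart.

```python
import zlib


def partition(collectors, n_servers):
    """Assign each collector to one of n_servers by a stable hash of its name."""
    shards = [[] for _ in range(n_servers)]
    for name in collectors:
        shards[zlib.crc32(name.encode()) % n_servers].append(name)
    return shards


collectors = [f"collector{i}" for i in range(10)]
shards = partition(collectors, 3)
print([len(shard) for shard in shards])
```

Round-robin assignment would balance counts more evenly, but hashing keeps each collector on the same server as others come and go.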

26
Overall Results

Performance can be a matter of deployment
Effect of background load
Effect of network bandwidth
Performance can be affected by underlying
infrastructure
LDAP/Java strengths and weaknesses
Performance can be improved using standard
techniques
Caching, multi-threading, etc.

27
So what could monitoring be?

Basic functionality
Push and pull (subscription and notification)
Aggregation and Caching
More information available
More higher-level services
Triggers like Hawkeye
Visualization of archive data, like Ganglia
Plug and Play
Well-defined protocols, interfaces and schemas
Performance considerations
Easy searching
Keep load off of clients

28
Topics

Evaluation of information infrastructures
Globus Toolkit MDS2, R-GMA, Hawkeye
Throughput, response time, load
Insights into performance issues
What monitoring and discovery could be
Next-generation information architecture
Open Grid Services Architecture mechanisms
Integrated monitoring and discovery architecture for GT3

29
Open Grid Services Architecture (OGSA)

Defines standard interfaces and behaviors for
distributed system integration, especially:
Standard XML-based service information model
Standard interfaces for push and pull mode access
to service data
Notification and subscription

30
Key OGSI concept - serviceData

Every service has its own service data
OGSA has a common mechanism to expose a service
instance's state data to service requestors for
query, update, and change notification
Monitoring data is baked right in
Service-level concept, not host-level concept

31
serviceData

Every Grid Service can expose internal state as
serviceData elements
An XML element of arbitrary complexity
Each service has a serviceData set
The collection of serviceData Elements (SDEs)
Example: the state of a host is exposed as an SDE by
GRAM.
Similar to MDS2 GRIS functionality, but in each
service (rather than once per host)
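A service's serviceData set can be pictured as a mapping from SDE names to XML elements, with a query that returns one element by name. The sketch below uses Python's `xml.etree.ElementTree`; the `GridServiceSketch` class, its methods, and the `hostState`/`cpuLoad` element names are hypothetical and do not reproduce the OGSI or GT3 interfaces.

```python
import xml.etree.ElementTree as ET


class GridServiceSketch:
    """Hypothetical service exposing internal state as named serviceData elements."""

    def __init__(self):
        self._sde_set = {}         # SDE name -> XML element (the serviceData set)

    def set_sde(self, name, element):
        self._sde_set[name] = element

    def query_sde(self, name):
        """Return the named SDE serialized as XML, or None if it is absent."""
        element = self._sde_set.get(name)
        return None if element is None else ET.tostring(element, encoding="unicode")


host = ET.Element("hostState", name="node1")
ET.SubElement(host, "cpuLoad").text = "0.42"

service = GridServiceSketch()
service.set_sde("hostState", host)
print(service.query_sde("hostState"))
# <hostState name="node1"><cpuLoad>0.42</cpuLoad></hostState>
```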

32
Example: Reliable File Transfer Service
(Diagram labels: File Transfer; Internal State; Data transfer operations)
33
MDS3 Monitoring and Discovery System

Consists of various components
Core functionality
Information providers
Higher level services
Clients

34
Core Functionality

XPath support
XPath is a language that describes a way to
locate and process items in XML docs by using an
addressing syntax based on a path through the
document's logical structure or hierarchy
Xindice support (native XML database)
Registry support
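As an illustration of path-based addressing, the sketch below queries a small, made-up service data document with the limited XPath subset supported by Python's `xml.etree.ElementTree`. It is not how MDS3 or Xindice evaluate XPath; in particular, ElementTree cannot express numeric comparisons such as `cpuLoad > 1.0` inside a path, so that filter is done in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical service data document; element names are illustrative only.
SERVICE_DATA = """
<clusterState>
  <host name="node1"><cpuLoad>0.42</cpuLoad><freeMemMB>310</freeMemMB></host>
  <host name="node2"><cpuLoad>3.10</cpuLoad><freeMemMB>95</freeMemMB></host>
</clusterState>
""".strip()

root = ET.fromstring(SERVICE_DATA)

# Path expression: the cpuLoad child of the host whose name attribute is node2.
load_node2 = root.findtext(".//host[@name='node2']/cpuLoad")

# Numeric filtering is outside ElementTree's XPath subset, so do it in Python.
busy_hosts = [h.get("name") for h in root.findall("./host")
              if float(h.findtext("cpuLoad")) > 1.0]

print(load_node2, busy_hosts)  # 3.10 ['node2']
```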

35
Schema Issues

Need to keep track of service data schema
Avoid conflicts
Find the data more easily
Should really have a unified naming approach
All of the tools are schema-agnostic, but
interoperability needs a well-understood common
language

36
MDS3 Information Providers in June Release

All the data currently in core MDS2
Full data in the GLUE schema for compute elements
(CE)
Ganglia information provider for cluster data
will also be available from Ganglia folks (with
luck)
Service data from RFT, RLS, GRAM
GT2 to GT3 work
GridFTP server data
Software version and path data
Documentation for translating your GT2
information provider to a GT3 information provider

37
MDS3 Higher Level Products

Higher-level services can perform actions on
service data collected from other services
Part of this functionality can be provided by a
set of supplied building blocks:
Provider interface: GRIS-style API for writing
information providers
Service Data Aggregator: sets up subscriptions to
data from other services and publishes it as a
single data stream
Hierarchy Builder: allows for a hierarchy of
aggregators
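The aggregator and hierarchy builder can be sketched as objects that republish notifications: each aggregator records the latest record per source and forwards it to its own subscribers, and attaching one aggregator to another forms a hierarchy. The `Aggregator` class, `on_notify`, and `attach_to` are hypothetical names used only for this illustration, and notifications are plain in-process callbacks rather than OGSA notification messages.

```python
class Aggregator:
    """Hypothetical Service Data Aggregator: many inputs, one output stream."""

    def __init__(self, name):
        self.name = name
        self.latest = {}           # source name -> most recent record
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def on_notify(self, source_name, record):
        """Called for each incoming notification; republish it downstream."""
        self.latest[source_name] = record
        for callback in self._subscribers:
            callback(self.name, {source_name: record})

    def attach_to(self, parent):
        """Hierarchy-builder step: forward everything seen here to a parent."""
        def forward(_aggregator_name, item):
            for source_name, record in item.items():
                parent.on_notify(source_name, record)
        self.subscribe(forward)


site_a, site_b, vo = Aggregator("siteA"), Aggregator("siteB"), Aggregator("vo")
site_a.attach_to(vo)
site_b.attach_to(vo)

stream = []
vo.subscribe(lambda _name, item: stream.append(item))
site_a.on_notify("nodeA1", {"cpuLoad": 0.4})
site_b.on_notify("nodeB1", {"cpuLoad": 2.0})
print(stream)  # [{'nodeA1': {'cpuLoad': 0.4}}, {'nodeB1': {'cpuLoad': 2.0}}]
```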

38
MDS3 Index Server

Simplest higher-level service is the caching
index service
Much like the GIIS in MDS2
Will be configurable, like a GIIS hierarchy
Will also have PHP-style scripts, much as are
available today

39
(No Transcript)
40
Clients currently in GT3

findServiceData command line client
Same functionality as grid-info-search
C bindings
Core C bindings provide findServiceData C
function
findServiceData command line client gives an
example of using it to parse out information (in
this case, registry contents)

41
Service Data Browser

GUI client to display service data from any
service
Extensible for data-specific visualization
A version was released with GT3 alpha:
http://www.globus.org/ogsa/releases/alpha/docs/infosvcs/sdbquickstart.html

42
Comparing Information Systems

43
Is this enough?

No!
Many places where additional help developing MDS3
is needed

44
We Need More Basic Information

Interfaces to other sources of data
GPT data
Other monitoring systems
Others?
Service data from other components
Every service has service data
OGSA-DAI
Will need to interface on schema

45
We Will Need More GUIs and Clients

Additional GUI visualizers may be implemented to
display service data specific to a particular
port type (as part of service data browser)
Possibly additional client interfaces
Integration into current portals, brokers

46
We Need More Higher-Level Services

We have a couple planned
Archiving service
Trigger template

47
Post-3.0 Release: Archiving Service

Will allow subscription to service data
Logging in a flexible way
Well defined interfaces for mining
Open questions
Best way to store time-series of arbitrary XML?
Best way to query this archive?
Link to OGSA-DAI?
Link to other archivers?

48
Post-3.0 Release: Trigger Template

Will provide a template to allow subscription to
data, reasoning about that data, and a course of
action to take place
Essentially, a gateway service between OGSA
Notifications and some other notification
framework, with filtering of notifications
Example: subscribe to disk space information and
send mail to the sysadmin when the disk reaches 90% full
Needed: a trigger template, several small
examples of common triggers, and documentation
for how users could extend them or write new
ones.
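A trigger template of the kind proposed above combines three parts: a subscription that delivers data, a condition that reasons about it, and an action. The hypothetical `TriggerTemplate` below wires these together for the disk-space example; the action appends a message to a list instead of sending mail, and the re-arm behavior (fire once per excursion above the threshold) is a design choice of this sketch, not something the slide specifies.

```python
class TriggerTemplate:
    """Hypothetical trigger: receive notifications, test a condition, act."""

    def __init__(self, condition, action):
        self.condition = condition  # reasoning about the incoming data
        self.action = action        # course of action, e.g. notify an admin
        self.fired = False          # suppress repeats while the condition holds

    def notify(self, data):
        """Entry point that a subscription would call for each notification."""
        if self.condition(data):
            if not self.fired:
                self.action(data)
                self.fired = True
        else:
            self.fired = False      # re-arm once the condition clears


alerts = []
disk_full = TriggerTemplate(
    condition=lambda d: d["used_pct"] >= 90,
    # Stand-in for sending mail to the sysadmin.
    action=lambda d: alerts.append(f"mail sysadmin: {d['host']} disk at {d['used_pct']}%"),
)
for pct in (70, 91, 95, 80, 92):
    disk_full.notify({"host": "node1", "used_pct": pct})
print(alerts)
# ['mail sysadmin: node1 disk at 91%', 'mail sysadmin: node1 disk at 92%']
```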

49
Other Possible Higher-Level Services

Site Validation Service
Job Tracking Service
Interfacing to NetLogger?

50
We Need Security

Need I say more?

51
Summary

Current monitoring systems
Insights into performance issues
What we really want for monitoring and discovery
is a combination of all the current systems
Next-generation information architecture
Open Grid Services Architecture mechanisms
MDS3 plans
Additional work needed!

52
Thanks

Testbed/Experiment support and comments
John McGee, ISI; James Magowan, IBM-UK; Alain Roy
and Nick LeRoy, University of Wisconsin, Madison;
Scott Gose and Charles Bacon, ANL; Steve
Fisher, RAL; Brian Tierney and Dan Gunter, LBNL.
This work was supported in part by the
Mathematical, Information, and Computational
Sciences Division subprogram of the Office of
Advanced Scientific Computing Research, U.S.
Department of Energy, under contract
W-31-109-Eng-38. This work was also supported by
the DOESG SciDAC Grant, iVDGL from NSF, and others.

53
Additional Information

MDS3 technology coordinators
Ben Clifford (benc_at_isi.edu)
Jennifer Schopf (jms_at_mcs.anl.gov)
Zhang, Freschl and Schopf, "A Performance Study
of Monitoring and Information Services for
Distributed Systems," to appear in HPDC 2003
http://people.cs.uchicago.edu/hai/hpdcv25.doc
MDS-3 information
Soon at www.globus.org/mds

54
Extra Slides
55
Why Information Infrastructure?

Distributed, often complex, performance-critical
nature of Grid apps demands tools for
Discovering available resources
Discovering available sensors
Integrating information from multiple sources
Archiving and replaying historical information
These and other functions are provided by an
information infrastructure
Many projects are concerned with design,
deployment, evaluation, and application

56
Performance of GIS Information Servers vs. Number
of Users
57
Performance of GIS Information Servers vs. Number
of Users
58
Performance of GIS Information Servers vs. Number
of Users
59
Performance of GIS Information Servers vs. Number
of Users
60
Directory Server Scalability
61
Directory Server Scalability
62
Directory Server Scalability
63
Directory Server Scalability
