Title: Information Systems describing resources
1 Information Systems: describing resources
- Grid Middleware 4
- David Groep, lecture series 2005-2006
2 Outline
- Taxonomy of information systems
- hierarchies and republishers
- Grid Monitoring Architecture
- push and pull, subscriptions
- Performance of an IS
- collecting information
- sensors
- IS content schemas and approaches
3 Grid Information Systems
- Concerns data
- shared between administrative domains
- for use by multiple people or VOs
- So it does not include things like
- cluster temperature monitoring
- debugging streams
- accounting history
4 Classification of information systems
- Which types of monitoring system are suitable for the grid?
- Paper: http://www.cs.man.ac.uk/zanikols/fgcs05.pdf
- The different types are
- Level 0: self-contained, not accessible by programs (but only via e.g. the web)
- Level 1: events are accessible remotely at the single-producer level
- Level 2: includes republishers with fixed functionality
- Level 3: supports hierarchies of republishers
5 System taxonomy: levels of systems
- Components used in information systems
- and taxonomy levels
graphics and concept from S. Zanikolas et al.,
FGCS 21 (2005) 163-188
6 Information system classes
- Level 2 or 3 systems are suitable
- Reference architecture: GMA
- Grid Monitoring Architecture requirements
- (performance) information with a relatively short lifetime
- frequent updates
- (should) carry quality-of-information status as well
- but when you get down to it, almost anything fits in this architecture
- including directories with relatively static information
- suitable mainly for resource state
7 Grid Monitoring Architecture
- Definition of terms and roles (GWD-GP-16-2)
- Functions
- Registry (directory)
- Add, Update, Remove, Search
- Producer
- Maintain Registration, Accept Query, Accept (Un)subscribe, Locate Consumer, Notify, Initiate (Un)subscribe
- Consumer
- Locate Producer, Initiate Query, (Un)subscribe, Maintain Registration, Accept Notification, (Un)subscribe, Locate Event Schema
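The three GMA roles can be sketched in a few lines of Python. This is only an illustration of the registry/producer/consumer interplay (search, query, subscribe, notify); the class names, method names and the "FreeCPUs" event type are assumptions made for the example and are not taken from the GMA document.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Registry:
    """Directory of producers, keyed by the event type they publish."""

    def __init__(self) -> None:
        self._producers: Dict[str, List["Producer"]] = defaultdict(list)

    def add(self, event_type: str, producer: "Producer") -> None:
        self._producers[event_type].append(producer)

    def remove(self, event_type: str, producer: "Producer") -> None:
        self._producers[event_type].remove(producer)

    def search(self, event_type: str) -> List["Producer"]:
        return list(self._producers.get(event_type, []))


class Producer:
    """Keeps the latest value per event type, answers queries, notifies subscribers."""

    def __init__(self, name: str, registry: Registry) -> None:
        self.name = name
        self._registry = registry
        self._values: Dict[str, object] = {}
        self._subscribers: Dict[str, List[Callable[[str, object], None]]] = defaultdict(list)

    def publish(self, event_type: str, value: object) -> None:
        if event_type not in self._values:
            self._registry.add(event_type, self)  # maintain registration
        self._values[event_type] = value
        for callback in self._subscribers[event_type]:  # notify
            callback(self.name, value)

    def query(self, event_type: str) -> object:  # accept query
        return self._values[event_type]

    def subscribe(self, event_type: str, callback: Callable[[str, object], None]) -> None:
        self._subscribers[event_type].append(callback)  # accept subscribe


# A consumer locates producers through the registry, queries once, then subscribes.
registry = Registry()
ce = Producer("ce01", registry)  # hypothetical producer name
ce.publish("FreeCPUs", 12)

first_answers = []
notifications = []
for producer in registry.search("FreeCPUs"):
    first_answers.append(producer.query("FreeCPUs"))
    producer.subscribe("FreeCPUs", lambda source, value: notifications.append((source, value)))

ce.publish("FreeCPUs", 8)  # subscribers are notified of the new value
```

The one-off query corresponds to the pull style, the subscription to the push style; both go through the same producer after a registry lookup.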
8 GMA Intermediaries
- Also referred to as republishers
- make it a level-3 system
- Examples
- Latest Producer
- return the last value of an event
- Archiver (history producer)
- storage of historical monitoring data
- e.g. accounting records
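A minimal sketch of the two republisher examples, assuming events are simple (name, timestamp, value) tuples. The class and method names are hypothetical; the point is only the difference between keeping the last value and keeping the history.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, float, object]  # (event name, timestamp, value): an assumed layout


class LatestProducer:
    """Republisher that returns only the last value seen for each event."""

    def __init__(self) -> None:
        self._latest: Dict[str, Event] = {}

    def accept(self, event: Event) -> None:
        self._latest[event[0]] = event

    def query(self, name: str) -> Event:
        return self._latest[name]


class Archiver:
    """Republisher that stores every event so that history can be queried."""

    def __init__(self) -> None:
        self._history: List[Event] = []

    def accept(self, event: Event) -> None:
        self._history.append(event)

    def query(self, name: str, since: float = 0.0) -> List[Event]:
        return [e for e in self._history if e[0] == name and e[1] >= since]


# Illustrative event stream fed to both republishers.
events = [("load", 1.0, 0.2), ("load", 2.0, 0.7), ("jobs", 2.5, 14), ("load", 3.0, 0.4)]
latest, archive = LatestProducer(), Archiver()
for event in events:
    latest.accept(event)
    archive.accept(event)
```

Both classes consume the same stream and act as producers towards their own consumers, which is what makes them intermediaries.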
9 Directories
- Information providers publish information to a directory
- Directories may be linked in networked hierarchies
- Information is usually also in a DIT-like structure (Directory Information Tree)
- Typical implementation: LDAP
10 Approaches to sending information
- Orthogonal to the topology is the information flow model
- Push model
- information gets published regardless of its use
- the bet is that it's there (in higher-level aggregators) when it's needed
- e.g. Condor Hawkeye, LCG BDII
- Hybrid
- information location gets published
- consumers can subscribe to information and from then on continuously get it
- e.g. R-GMA, (MDS4?)
- Pull model
- information is retrieved on demand, and you cannot subscribe
- e.g. MDS-2
11 Information Systems
- Examples shown in this lecture
- Monitoring and Discovery Service (MDS)
- Relational Grid Monitoring Architecture (R-GMA)
- Hawkeye
- Berkeley-DataBase Information Index (BDII)
12 1. MDS2
- Part of GT2.x
- Typical use: resource selection by brokers
- Architecture
- decentralized
- hierarchical
- soft-state protocols with timeouts
- supports caching in index servers
- Security: GSI (optional)
13 MDS2 Architecture
graphic: J. Schopf, GFNL masterclass 2005, "Distributed Monitoring and Information Services for the Grid"
14 MDS2 information flow
- Soft-state registration of GRISes with GIISes
- time out on the registration (TTL and nextUpdate)
- Data retrieved on-demand from underlying GRIS
- timeout on the answer
- resources silently drop out if they fail
- GRISes collect information using scripts
- GIISes can be collated in arbitrary hierarchies
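Soft-state registration can be sketched as an index whose entries expire unless they are refreshed. This is a simplified model with an explicit clock argument, not MDS2 code; the class name, the 60-second TTL and the "gris-a"/"gris-b" names are assumptions for illustration.

```python
from typing import Dict, List


class SoftStateIndex:
    """GIIS-like index: a registration is dropped unless refreshed within ttl seconds."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._expires: Dict[str, float] = {}

    def register(self, resource: str, now: float) -> None:
        # Registering again simply pushes the expiry time forward.
        self._expires[resource] = now + self.ttl

    def live(self, now: float) -> List[str]:
        # Expired registrations are purged lazily, without any explicit "remove" message.
        self._expires = {r: t for r, t in self._expires.items() if t > now}
        return sorted(self._expires)


index = SoftStateIndex(ttl=60.0)
index.register("gris-a", now=0.0)
index.register("gris-b", now=0.0)
index.register("gris-a", now=50.0)  # gris-a refreshes; gris-b has failed and stays silent
```

Querying `index.live(now=70.0)` then lists only `gris-a`: the failed resource drops out silently, exactly because nothing has to be sent to remove it.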
15 2. R-GMA
- straight implementation of the GMA
- uses a relational representation of the data
- notification/subscription directly from the source
- implementation in Java
- developed in EU DataGrid and EGEE JRA1
- UK cluster, Steve Fisher (RAL), et al.
16 R-GMA Architecture
17 MON Box
- Every site has a MON box to proxy information
- local cache of info in memory
- through-channel to systems behind a firewall
- producers/consumers connect actively to the MON box
- Multiple producers can publish in the same table
- joins can be done, but only via a secondary producer
- Usually deployed with a single registry
18 R-GMA plain SQL interface
bosuidavidg1001 rgma
Welcome to the R-GMA virtual database for Virtual Organisations.
Your local R-GMA server is:
  https://eg.nikhef.nl:8443/R-GMA
You are connected to the following R-GMA Registry services:
  https://lcgic01.gridpp.rl.ac.uk:8443/R-GMA/RegistryServlet
You are connected to the following R-GMA Schema service:
  https://lcgic01.gridpp.rl.ac.uk:8443/R-GMA/SchemaServlet
Type "help" for a list of commands.
rgma> show tables
------------------------------------------
Table Name
------------------------------------------
ArchiverTestTable
...
GlueCE
...
19 Queries
rgma> select UniqueID,Name,TotalCPUs from GlueCE WHERE UniqueID LIKE '%ulakbim%'
----------------------------------------------------------------------
UniqueID                                          Name     TotalCPUs
----------------------------------------------------------------------
ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-seegrid  seegrid  126
ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-trgrida  trgrida  126
ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-lhcb     lhcb     126
...
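The relational view can be tried out locally with an in-memory SQLite table standing in for the R-GMA virtual database. This is only an analogy: SQLite is not R-GMA, the three-column table is a cut-down stand-in for GlueCE, and the fourth row is made up so that the filter has something to exclude. The `%` wildcards make LIKE behave as a substring match.

```python
import sqlite3

# In-memory stand-in for the virtual database (an assumption; R-GMA itself is not used here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GlueCE (UniqueID TEXT, Name TEXT, TotalCPUs INTEGER)")
conn.executemany(
    "INSERT INTO GlueCE VALUES (?, ?, ?)",
    [
        ("ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-seegrid", "seegrid", 126),
        ("ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-trgrida", "trgrida", 126),
        ("ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-lhcb", "lhcb", 126),
        ("ce.example.org:2119/jobmanager-pbs-demo", "demo", 8),  # hypothetical extra row
    ],
)

# Same shape of query as on the slide, with the pattern passed as a bound parameter.
rows = conn.execute(
    "SELECT UniqueID, Name, TotalCPUs FROM GlueCE WHERE UniqueID LIKE ?",
    ("%ulakbim%",),
).fetchall()

for unique_id, name, total_cpus in rows:
    print(unique_id, name, total_cpus)
```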
20 3. Hawkeye
- Condor information system
- publishes class-ads for
- matchmaking
- fault detection
- periodic updates to the agents by the modules
- information kept in the agents
21 Hawkeye architecture
graphic: J. Schopf, GFNL masterclass 2005, "Distributed Monitoring and Information Services for the Grid"
22 4. BDII and GIP
- BDII is conceptually similar to Hawkeye
- but data is pulled rather than pushed
- mentioned here because of its widespread deployment in EGEE/LCG, OSG, etc.
- Generic Information Providers (GIP)
- scripting framework to produce LDIF
- static values overridden by output from scripts
- periodically, LDAP queries are sent to subordinate directories
- with a time-out on the answer
- the previous answer is persistent for a defined amount of time
- contrary to MDS2, BDII will never forget
- Paper: http://indico.cern.ch/materialDisplay.py?contribId=126&sessionId=23&materialId=paper&confId=0
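The "static values overridden by output from scripts" idea can be sketched as a dictionary merge followed by LDIF serialisation. This is not the GIP code itself: the function names, the DN and the attribute values are illustrative assumptions, and the LDIF writer ignores details such as base64 encoding and line folding.

```python
from typing import Dict


def merge_attributes(static: Dict[str, str], dynamic: Dict[str, str]) -> Dict[str, str]:
    """Start from the static values and let script output override them."""
    merged = dict(static)
    merged.update(dynamic)
    return merged


def to_ldif(dn: str, attributes: Dict[str, str]) -> str:
    """Serialise one entry as simplified LDIF (single-valued attributes only)."""
    lines = [f"dn: {dn}"]
    lines += [f"{key}: {value}" for key, value in attributes.items()]
    return "\n".join(lines) + "\n"


# Static configuration, as it might be written once by a site administrator (values assumed).
static = {
    "objectClass": "GlueCE",
    "GlueCEInfoTotalCPUs": "126",
    "GlueCEStateFreeCPUs": "0",
}
# Output of a hypothetical dynamic script that asked the batch system.
dynamic = {"GlueCEStateFreeCPUs": "17"}

ldif = to_ldif(
    "GlueCEUniqueID=ce.example.org:2119/jobmanager-pbs-demo,mds-vo-name=local,o=grid",
    merge_attributes(static, dynamic),
)
print(ldif)
```

If the dynamic script fails and returns nothing, the entry is still published with its static values, which is one reason for this layering.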
23 BDII organisation
[Figure: BDII organisation; labels: BDII, Site BDII]
24 BDII scaling
- OpenLDAP update (write) is not optimized
- with SleepyCat Berkeley DB, simultaneous reads and writes lead to timeouts
- So, put in a forwarder service that redirects to a pool of OpenLDAP/DB backends that swap roles
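One way to picture the forwarder is as a double-buffering scheme: reads always go to the active backend, updates are written to a standby backend, and the two then swap roles. The sketch below models the backends as plain dictionaries; the class names and the two-backend pool are assumptions, not the actual BDII implementation.

```python
from typing import Dict, List


class Backend:
    """Stand-in for one OpenLDAP/DB instance."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: Dict[str, str] = {}


class Forwarder:
    """Sends reads to the active backend while a standby backend is being rewritten."""

    def __init__(self, backends: List[Backend]) -> None:
        self._backends = backends
        self._active = 0

    @property
    def active_name(self) -> str:
        return self._backends[self._active].name

    def search(self, key: str) -> str:
        return self._backends[self._active].entries[key]

    def refresh(self, fresh: Dict[str, str]) -> None:
        standby = (self._active + 1) % len(self._backends)
        self._backends[standby].entries = dict(fresh)  # the write never touches the read side
        self._active = standby  # swap roles once the write is complete


forwarder = Forwarder([Backend("a"), Backend("b")])
forwarder.refresh({"ce01": "free=4"})
first = forwarder.search("ce01")
forwarder.refresh({"ce01": "free=2"})
second = forwarder.search("ce01")
```

Readers therefore never hit a backend that is in the middle of an update, which is the situation that led to the timeouts.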
25 WS-style information systems
- MDS4
- based on WS-RF and WS-Notification mechanisms
- provides a common aggregator framework for
- index service (republisher)
- trigger service (send events, mails, execute programs)
- archive service
- NAREGI Distributed Information Service
- Aggregators collect information from various sources
- put these as CIM objects in a database
- OGSA-DAI front-end to the database with CIM objects
- PS: OGSA-DAI (Data Access and Integration) is a system for providing uniform grid access to database resources
26 MDS4 Aggregator Framework
27 NAREGI Distributed Information Service
graphic: Satoshi Matuoka, Tokyo Institute of Technology NII, NAREGI
28 Status
- Both are developed and available
- neither has yet been tested at very large scale
- i.e. O(1000) resources, thousands of simultaneous queries
29 Hierarchies and Views
30 Views on the information system
- For resource information
- information view on those resources to which the viewer potentially has access
- a single global root is neither feasible nor needed
- a per-VO or per-infrastructure view is sufficient
- For application-level monitoring
- fine-grained access control needed
- at the VO or user level
- attributes in the schema may have different privacy levels
- requires view management like in regular databases
31 Typical hierarchical top levels today
- per-infrastructure
- e.g. EGEE/LCG, OSG, NAREGI
- used by many VOs
- needs support at the infrastructure level
- per-VO view
- prevalent in grass-roots deployment
- all systems can support both
- although not all in the same way: R-GMA works with per-site MON boxes that (today) use a central registry -> one per infrastructure
32 Performance
- an example of a grid performance study
33 Performance analysis
- Best paper so far: X. Zhang, J. Freschl, J. Schopf, "A performance study of monitoring and information services for distributed systems", in Proceedings of the 12th IEEE High Performance Distributed Computing (HPDC-12 2003), IEEE Computer Society Press, Seattle, WA, USA, 2003, pp. 270-282.
- Perf results on R-GMA are outdated, but the basics still hold
- MDS2 has since been replaced with MDS4 (in GT4)
- The three systems selected are indicative of the different classes, and thus it's a very valuable comparison!
- Data in the next slides by Jennifer Schopf
- from the GridForum NL/ISOC NL Masterclass 2005
34 Roles of components in the comparison
ideas, graphics, results: J. Schopf, GFNL masterclass 2005, "Distributed Monitoring and Information Services for the Grid"
35 Performance analysis
- Three characteristic systems
- MDS2 (pull system, with and without caching)
- R-GMA (hybrid, straight GMA implementation with relational interface)
- Hawkeye (push system, from Condor)
- Tests done on a small test bed (7 systems)
- scaling has not been tested
- but results are at least comparable
36 Performance analysis: other facts
- Keep in mind that MDS2 and Hawkeye are programmed in C
- R-GMA is in Java
- This R-GMA version relied heavily on threads
- i.e. the implementation was a straight translation of the architecture
- JVM and Linux kernel 2.4 don't like too many, O(500), threads
37 Model for evaluation
- paper attempts to compare similar properties in the three systems
- deploy in a standard mode (as depicted)
38 Experiments in Zhang et al.
- How many users can query an information server at a time?
- How many users can query a directory server?
- How does an information server scale with the amount of data in it?
- How does an aggregator scale with the number of information servers registered to it?
39 Experiments
40 Comparing Information Systems
- We also looked at the queries in depth
- NetLogger
- 3 phases
- Connect, Process, Response
[Figure legend: Response, Process, Connect]
41 Testbed
- Lucky cluster at Argonne
- 7 nodes, each with two 1133 MHz Intel PIII CPUs (with a 512 KB cache) and 512 MB main memory
- Users simulated at the UC nodes
- 20 P3 Linux nodes, mostly 1.1 GHz
- R-GMA has an issue with the shared file system, so we also simulated users on Lucky nodes
- All figures are 10-minute averages
- Queries happen with a one-second wait between each query (think synchronous send with a 1 second wait)
42 Metrics
- Throughput
- Number of requests processed per second
- Response time
- Average amount of time (in sec) to handle a request
- Load
- percentage of CPU cycles spent in user mode and system mode, recorded by Ganglia
- High when running a small number of compute-intensive apps
- Load1
- average number of processes in the ready queue waiting to run, 1-minute average, from Ganglia
- High when a large number of apps are blocking on I/O
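As a worked example of the first two metrics, throughput and response time can be computed from a list of (start, end) timestamps per request. The sample values below are made up; they mimic one simulated user who waits one second after each answer before sending the next query.

```python
from typing import List, Tuple

Request = Tuple[float, float]  # (start time, end time) in seconds


def throughput(requests: List[Request]) -> float:
    """Requests processed per second over the observed interval."""
    start = min(s for s, _ in requests)
    end = max(e for _, e in requests)
    return len(requests) / (end - start)


def mean_response_time(requests: List[Request]) -> float:
    """Average time in seconds needed to handle one request."""
    return sum(e - s for s, e in requests) / len(requests)


# Hypothetical trace: each query starts one second after the previous answer arrived.
samples = [(0.0, 0.5), (1.5, 2.0), (3.0, 4.0), (5.0, 6.0)]

print(f"throughput: {throughput(samples):.3f} requests/s")
print(f"mean response time: {mean_response_time(samples):.2f} s")
```

With a fixed think time per user, slower responses directly reduce the number of queries each user can issue, so throughput and response time are not independent in this kind of test.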
43 Information Server Throughput vs. Number of Users
(Larger number is better)
44 Query Times
[Figure panels: 400 users; 50 users]
(Smaller number is better)
45 Experiment 1 Summary
- Caching can significantly improve performance of the information server
- Particularly desirable if one wishes the server to scale well with an increasing number of users
- When setting up an information server, care should be taken to make sure the server is on a well-connected machine
- Network behavior plays a larger role than expected
- If this is not an option, thought should be given to duplicating the server if more than 200 users are expected to query it
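Why caching helps can be illustrated with a small time-to-live cache in front of a (slow) information provider. The class name, the 30-second TTL and the provider function are assumptions for the sketch; the clock is passed in explicitly so the behaviour is easy to follow.

```python
from typing import Callable, Dict, Tuple


class CachingInfoServer:
    """Answers from cache while an entry is younger than ttl, else asks the provider."""

    def __init__(self, provider: Callable[[str], str], ttl: float) -> None:
        self._provider = provider
        self._ttl = ttl
        self._cache: Dict[str, Tuple[float, str]] = {}  # key -> (fetch time, value)
        self.provider_calls = 0

    def query(self, key: str, now: float) -> str:
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # cache hit: no provider invocation
        self.provider_calls += 1
        value = self._provider(key)  # cache miss: the expensive path
        self._cache[key] = (now, value)
        return value


# Hypothetical provider; in practice this would run a script or query a batch system.
server = CachingInfoServer(provider=lambda key: f"{key}=42", ttl=30.0)

# Six queries, ten seconds apart.
answers = [server.query("FreeCPUs", now=float(t)) for t in range(0, 60, 10)]
print(server.provider_calls, "provider calls for", len(answers), "queries")
```

The price is staleness: an answer may be up to one TTL old, so the TTL has to be chosen against how quickly the underlying value changes.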
46 Directory Server Throughput
(Larger number is better)
47 Directory Server CPU Load
(Smaller number is better)
48 Query Times
[Figure panels: 400 users; 50 users]
(Smaller number is better)
49 Experiment 2 Summary
- Because of the network contention issues, the placement of a directory server on a highly connected machine will play a large role in the scalability as the number of users grows
- Significant loads are seen even with only a few users; it will be important that this service be run on a dedicated machine, or that it be duplicated as the number of users grows.
50 Information Server Scalability with Information Collectors
(Larger number is better)
51 Experiment 3: Load Measurements
(Smaller number is better)
52 Experiment 3: Query Times
[Figure panels: 80 Info Collectors; 30 Info Collectors]
(Smaller number is better)
53 Sample Query
(Note: log scale)
54 Experiment 3 Summary
- The more the data is cached, the less often it has to be fetched, thereby increasing throughput
- Search time isn't significant at these sizes
55 Aggregate Information Server Scalability
(Larger number is better)
56 Load
57 Query Response Times
[Figure panels: 400 Info Servers; 50 Info Servers]
(Smaller number is better)
58 Experiment 4 Summary
- None of the Aggregate Information Servers scaled well with the number of Information Servers registered to them
- When building hierarchies of aggregation, they will need to be rather narrow and deep, having very few Information Servers registered to any one Aggregate Information Server.
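The "narrow and deep" consequence can be made concrete with a little arithmetic: if no aggregator may have more than a given number of children, the number of aggregation levels grows with the logarithm of the number of information servers. The helper below is an illustrative calculation only; the fan-out values are assumptions, not recommendations from the study.

```python
def aggregation_levels(info_servers: int, fan_out: int) -> int:
    """Smallest number of aggregator levels such that no aggregator has more than fan_out children."""
    if info_servers < 1 or fan_out < 2:
        raise ValueError("need at least one information server and a fan-out of 2 or more")
    levels = 1
    capacity = fan_out  # information servers reachable with the current number of levels
    while capacity < info_servers:
        capacity *= fan_out
        levels += 1
    return levels


# 400 information servers under different (hypothetical) fan-out limits.
for fan_out in (400, 50, 10, 5):
    print(f"fan-out {fan_out:3d}: {aggregation_levels(400, fan_out)} level(s)")
```

Each extra level adds another hop, with its own cache and timeout, between the resource and the consumer, so a deeper tree trades per-server load for latency and freshness.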
59 Overall Results
- Performance can be a matter of deployment
- Effect of background load
- Effect of network bandwidth
- Performance can be affected by underlying infrastructure
- LDAP/Java strengths and weaknesses
- Performance can be improved using standard techniques
- Caching, multi-threading, etc.
60 Observations on the performance study
- Measures performance, not stability
- test bed size is only 7 machines and 10 clients
- local cluster, i.e. latency is well controlled
- In a real-life deployment, complexity is the determining factor in success
- simple systems are more likely to survive
- systems with soft-state registration timeouts (like MDS) are more prone to instabilities than systems based on a persistent, elephant-style memory (like BDII) (cf. typical signal processing issues)
61 Assorted Issues
62 Access Control
- AuthN is simple
- keeps out 99% of the rogue information
- doesn't do a bit for privacy preservation
- not every cert owner is a grid user for a specific infrastructure
- coarse-grained ACLs are better
- grid-mapfile, ACLs on access to service
- keeps known bad guys out
- still no privacy
- fine-grained ACLs
- support within the DB engine is actually required, as it's too hard to retro-fit otherwise
63 Timeout issues
- differences in timeouts in information providers lead to phase difference effects in the system
- temporary amnesia of aggregate information indices
- cumulative delays
- Timeouts
- registration period for GRIS with a GIIS
- time to bind to GRIS (defines if a resource is up or down)
- time to produce information entries
- cache TTL in the GIIS
- timeout before removing stale information from GIIS
- essentially it's feed-back signal theory :-)
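The "temporary amnesia" effect can be illustrated with a deliberately simplified model: a resource re-registers at a fixed period, and the index forgets a registration after a fixed TTL. Whenever the TTL is shorter than the refresh period, the resource is periodically invisible. The function name and the numbers are assumptions; real deployments have several interacting timeouts rather than just these two.

```python
from typing import List, Tuple


def amnesia_windows(refresh_period: float, ttl: float, horizon: float) -> List[Tuple[float, float]]:
    """Intervals within [0, horizon] during which the index has forgotten the resource."""
    if refresh_period <= 0 or ttl < 0:
        raise ValueError("refresh_period must be positive and ttl non-negative")
    windows = []
    t = 0.0  # time of the current registration
    while t < horizon:
        expires = t + ttl
        next_refresh = t + refresh_period
        end = min(next_refresh, horizon)
        if expires < end:
            windows.append((expires, end))  # forgotten until the next registration arrives
        t = next_refresh
    return windows


# TTL shorter than the refresh period: the resource vanishes for 15 s out of every 60 s.
print(amnesia_windows(refresh_period=60.0, ttl=45.0, horizon=180.0))
# TTL longer than the refresh period: no gaps.
print(amnesia_windows(refresh_period=30.0, ttl=45.0, horizon=180.0))
```

In a hierarchy, every level adds such a pair of timers, which is how small mismatches turn into the phase effects and cumulative delays listed above.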
64 Content of the Information System
65 Approaches to resource information
- Resource description
- GLUE
- CIM
- and similar but slightly different schemas for ARC and GT2
- Job description
- Unicore's AJO
66 Information Schemas: GLUE
- Describes resource availability information
- Common for various middleware suites
- Known limitations
- not even all specified info is actually used
- contains lots of info that is unused
- cannot express information needed for brokering at the appropriate granularity level (this is fundamental for all such information schemas)
- More specifics discussed with each component
- See http://infnforge.cnaf.infn.it/glueinfomodel/
67 GLUE Abstractions
- Core Entities
- Site: name, contact info, latitude/longitude, sponsor
- Service: type, version, endpoint, status, WSDL URL, Semantics URN, StartTime
- Cluster
- ComputingElement: Info, State, Policy, ACBRule, ...
- VOView: ACBRule, Running, Waiting, Free, ERT, WRT, ...
- SubCluster: HostOperatingSystem, HostAppSWRTEnv, ...
- StorageElement
68 GLUE Core Schema
69 GLUE Cluster
70 GLUE Storage
71 GLUE: Linking compute and storage
- Useful if storage is accessible via POSIX, or via faster networks
- position of such a binding is difficult
- abused for pure-SE info, as this is the only place where the file path to the storage was specified
72 Alternative schemas with the same viewpoint
- Original GT2 schema (obsolete)
- NorduGrid ARC
73 CIM: Common Information Model
- object-oriented abstraction of information (DMTF)
- uses abstractions, dependencies, inheritance
- goes beyond a mere information model by
- defining methods for standard object behaviour
- trying to solve every possible problem (and solve the perpetuum mobile issue in the process)
- information components of CIM can be used to represent resources
74 Common Information Model (CIM)
- Object-oriented schema developed by the DMTF
- representation in different formats (such as XML)
- See http://www.dmtf.org/standards/cim/
- Extended for grid elements by the GCS-WG
- BatchService, etc.
- The NAREGI grid is the main user of this system
75 Example: CIM Job Submission Interface
76 The Unicore information model
- Describes the resource requests (so the opposite viewpoint compared to GLUE)
- the resources themselves need not be described, since they will bid on the job requests
- we will deal with this one in the Brokering and CE lecture
77 Summary
- Information systems are used across multiple organisations and by multiple people or VOs
- taxonomy classification (republishing, data flow)
- Any grid information system needs
- programmatic access via producer/consumer APIs
- compositional IS freedom (VO or infrastructure hierarchies)
- focus has been on resource selection
- used for brokering decisions, either by people or programs
- needs a common information schema or translators
- for application-level information systems
- user-defined schema and a schema registry (like R-GMA)