Title: Performance Comparison of Grid Information Services
1. Performance Comparison of Grid Information Services
- Beth Plale
- Computer Science Dept.
- Indiana University
- Unified Relational GIS Project
- Collaborative project with Peter Dinda, Northwestern University
2. Schemas in performance evaluation influenced by
- "Key Concepts and Services of a Grid Information Service," Beth Plale, Peter Dinda, Gregor von Laszewski, IASTED Parallel and Distributed Computing Systems (PDCS), September 2002
3. Types of Resource Information
Grid Entity            Description
Organizations          Accountable bodies and owners of resources
People                 Resource admins, resource providers, GIS admins
Physical resources     Compute resources, network interfaces, benchmark results, number of users, load
Services               Job manager, load leveler, other GIS
Comm resources         Link capacity, switch capacity, error rate, drop rate
Software packages      BLAS, LAPACK, etc.
Event producers        Generators of event streams
Event channels         Event stream propagation vehicle
Event dictionaries     List of commonly used event types
Instruments            Radar systems, telescopes, etc.
Network paths          Available bandwidth and expected latency
Network topologies     Hosts, switches, routers
Wireless devices       Wireless hosts, wavepoints, cells, etc.
Virtual organizations  Groups of collaborators
4. Criteria for Inclusion in GIS
- Defn: an object in the repository represents an entity in the real-world grid
- A grid entity has a representation in the GIS repository if the grid entity
  - can be described
  - has value to more than one application
  - has persistency needs beyond a single application run
5. Services Provided by GIS
- Query interface: request for information through a query language
  - e.g., SELECT FROM WHERE in SQL
- Update interface: request to add/update information in the repository
  - e.g., UPDATE in SQL
- Management interface: activation, deactivation of service
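The three interfaces above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as the repository; the class name `GridInfoService`, its methods, and the `Host` table are hypothetical names chosen for illustration and are not taken from the project.

```python
import sqlite3


class GridInfoService:
    """Toy GIS with query, update, and management interfaces (hypothetical API)."""

    def __init__(self):
        # Assumption: a single in-memory relational repository.
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE Host (name TEXT PRIMARY KEY, load REAL)")
        self._active = False

    # Management interface: activation and deactivation of the service.
    def activate(self):
        self._active = True

    def deactivate(self):
        self._active = False

    def _require_active(self):
        if not self._active:
            raise RuntimeError("service is deactivated")

    # Query interface: request for information through a query language.
    def query(self, sql, params=()):
        self._require_active()
        return self._db.execute(sql, params).fetchall()

    # Update interface: request to add or update information in the repository.
    def update(self, sql, params=()):
        self._require_active()
        with self._db:  # commits on success, rolls back on error
            return self._db.execute(sql, params).rowcount


gis = GridInfoService()
gis.activate()
gis.update("INSERT INTO Host (name, load) VALUES (?, ?)", ("node01", 0.75))
gis.update("UPDATE Host SET load = ? WHERE name = ?", (0.25, "node01"))
rows = gis.query("SELECT name, load FROM Host WHERE load < ?", (0.5,))
print(rows)
```

Keeping the management switch separate from the query and update paths mirrors the split on the slide; a real service would add authentication and remote access on top.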
6. Additional GIS Functionality
- Replication
  - Provision of replica transparency
- Distribution (a grid-driven necessity)
  - Partitioning of information across sites.
- Security interface
  - Object level or column level?
  - Access control
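7. View of GIS service Interoperability
[Diagram: the GCE testbed portal issues Xpath queries against the GCE testbed XML schema and receives XML docs. Three numbered paths are shown:
1. Xpath query, XML doc
2. Xpath query, converter, SQL query, XML doc
3. Xpath query, LDAP query
Back ends shown: XML db (Xindice), mySQL, LDAP.]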
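As a rough illustration of the first two paths in the interoperability view, the sketch below evaluates one XPath query directly against an XML document and then passes the same query through a toy converter that emits SQL. Everything here is an assumption made for illustration: the XML element and attribute names, the `xpath_to_sql` helper, and the use of SQLite in place of mySQL. The converter handles only the single pattern `./Table[@column='value']`.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML document; names are illustrative, not the GCE schema.
XML_DOC = """<Grid>
  <Cluster name="c1" organization="NPACI"/>
  <Cluster name="c2" organization="NPACI"/>
  <Cluster name="c3" organization="Other"/>
</Grid>"""

XPATH = "./Cluster[@organization='NPACI']"


def xpath_to_sql(xpath):
    """Translate "./Table[@column='value']" into a parameterized SQL query."""
    m = re.fullmatch(r"\./(\w+)\[@(\w+)='([^']*)'\]", xpath)
    if m is None:
        raise ValueError(f"unsupported XPath: {xpath}")
    table, column, value = m.groups()
    # table and column matched \w+ only, so interpolating them is safe here.
    return f"SELECT name FROM {table} WHERE {column} = ? ORDER BY name", (value,)


root = ET.fromstring(XML_DOC)

# Path 1: evaluate the XPath query directly against the XML document.
xml_names = [c.get("name") for c in root.findall(XPATH)]

# Path 2: load the same data into a relational table, convert the query, run it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cluster (name TEXT, organization TEXT)")
conn.executemany(
    "INSERT INTO Cluster VALUES (?, ?)",
    [(c.get("name"), c.get("organization")) for c in root],
)
sql, params = xpath_to_sql(XPATH)
sql_names = [row[0] for row in conn.execute(sql, params)]

print(xml_names, sql_names)
```

Both paths return the same cluster names, which is the property an interoperability layer of this kind would need to preserve.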
8. Benchmark Evaluation of Alternate GIS Representations
- Evaluation of three databases: relational (mySQL), LDAP (openLDAP), and XML (Xindice)
- Database schemas derived from a single ER diagram and based partly on GLUE v8
- Benchmark set of query and update use cases derived from Grid job submission.
- Cost metric: minimized query response times, minimized update times, and minimized size of resulting query set.
9. Benchmark Evaluation Assumptions
- Grid entities have complex relationships.
- The questions asked of GIS data are becoming more complex.
- Some entities require extremely rapid update rates.
- Thus a cost metric that considers multiple aspects:
  - Minimized query response times,
  - Minimized update times, and
  - Minimized size of resulting query set.
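The three components of the cost metric can be collected for a single query and update pair roughly as follows. This is a minimal sketch under stated assumptions: SQLite stands in for the benchmarked databases, and the `measure` function, the `Host` table, and the synthetic data are hypothetical. It does not reproduce the benchmark harness used in the study.

```python
import sqlite3
import time


def measure(conn, query_sql, update_sql, update_params):
    """Return query time, update time, and result-set size for one use case."""
    start = time.perf_counter()
    rows = conn.execute(query_sql).fetchall()
    query_seconds = time.perf_counter() - start

    start = time.perf_counter()
    with conn:  # include the commit in the measured update time
        conn.execute(update_sql, update_params)
    update_seconds = time.perf_counter() - start

    return {
        "query_seconds": query_seconds,
        "update_seconds": update_seconds,
        "result_rows": len(rows),
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Host (name TEXT PRIMARY KEY, load REAL)")
# Synthetic data: 1000 hosts with loads cycling through 0.00 .. 0.99.
conn.executemany(
    "INSERT INTO Host VALUES (?, ?)",
    [(f"node{i}", (i % 100) / 100) for i in range(1000)],
)
conn.commit()

cost = measure(
    conn,
    "SELECT name FROM Host WHERE load < 0.5",
    "UPDATE Host SET load = ? WHERE name = ?",
    (0.99, "node0"),
)
print(cost)
```

A real comparison would repeat each measurement many times and run the same use cases against every back end; the point here is only that the three quantities are measured separately.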
10. Benchmark Evaluation
[Diagram]
- Input schemas (GCE XML, GLUE v8): represent as E-R diagram
- E-R diagram: transform into schema for relational (mySQL), LDAP (openLDAP), XML (Xindice)
- Databases: populate by scripts and existing data
- Databases: evaluate against Grid GIS Benchmark Use Cases (GCE job submission use cases)
11.
Set  Date   Site                      Object classes  Classes w/ instances  Object instances
I    05-02  large multi-site project  30              10                    242
II   01-02  large academic HPC site   19              5                     106
III  11-00  DOE site                  31              19                    17531

Top 5 classes
Set I:   MDSDevice, HostInfo, MDSDeviceGroup, top, MDSSoftware
         36.5, 24.5, 13.5, 8.5, 7.0 (total 90.0)
Set II:  GlobusQueue, GlobusServicesJobMgr, GlobusNetworkInterface, GlobusPhysicalResource, GlobusDaemon
         42.0, 26.0, 17.5, 8.0, 6.0 (total 100.0)
Set III: GlobusFileInstance, GlobusQueueEntry, GlobusQueue, GlobusOrganization, GlobusServiceJobManager
         80.0, 6.5, 3.2, 1.8, 1.8 (total 94.5)
12. E-R Diagram
[Figure: E-R diagram, GLUE v8.
Entities: computing elements, clusters, subclusters, nodes, hosts (compute nodes), network nodes, network cards, network paths, network benchmarks, end-to-end connections, end points, users, user accounts, applications, application sources.
Relationships: has, is-a, use, run on, instan from.
Attributes shown: host, port, protocol; traceroute packet loss, latency.roundtripDelay.ping, bandwidth.avail.TCP.singleStream.]
13. Relational (table) representation
[Figure: tables for computing elements, users, application sources, network cards, clusters, user accounts, applications, end points, subclusters, and end-to-end connections.
Attributes shown: host, port, protocol; traceroute packet loss, latency.roundtripDelay.ping, bandwidth.avail.TCP.singleStream.]
14. Hierarchical representation
[Figure: tree rooted at EDTtop with nodes: network nodes, compute elements, user, network path, clusters, application sources, connections, user accounts, application, subclusters, hosts (compute nodes), endpoints.]
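In a hierarchical store such as LDAP, an entry is addressed by its path from the root rather than by a join across tables. The sketch below shows that idea on a small nested dictionary. The parent-child arrangement is an assumption made for illustration (the transcript lists the node names but not the edges), and `leaf_paths` is a hypothetical helper.

```python
# Assumed fragment of the tree; only the node names come from the slide.
TREE = {
    "EDTtop": {
        "compute elements": {
            "clusters": {"subclusters": {"hosts (compute nodes)": {}}},
        },
        "user": {"user accounts": {}},
    }
}


def leaf_paths(tree, prefix=()):
    """Yield the root-to-leaf path of every leaf in a nested-dict tree."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


paths = ["/".join(p) for p in leaf_paths(TREE)]
for p in paths:
    print(p)
```

Reaching a host means walking every level above it, which is cheap for lookups along the hierarchy but awkward for questions that cut across branches, such as matching user accounts to clusters.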
15. Benchmark set of Use Cases of GIS query and update
- Use cases based on job submission.
  - examples drawn from HotPage (M. Thomas)
- Query 1: Suppose user is part of NPACI organization and knows his/her binary runs better on T3E.
  - Of machines in NPACI organization, give me list of T3Es and their location for which availability is good, a binary is resident, and I have an account.
16. Return machines and locations
SELECT C.CPUmodel, C.name, C.location
FROM Cluster as C, SubCluster as SC, Host as H, Application as A, UserAccount as UA, User as U
WHERE
  -- Cluster is NPACI and user has binary on machine
  C.Organization = 'NPACI' and SC.OwningCluster = C.ClusterName
  and SC.CPUModel = 'T3E' and A.OSName = SC.OSName
  and A.Owner = 'Jane Lee' and A.Location = C.Location
  -- Availability is good
  and For All H where H.OwningCluster = C.ClusterName: avg(H.SMPLoad1minX100) < 0.50
  -- User has valid account on cluster
  and C.ClusterUniqueID = UA.ID and UA.ID = U.ID and U.Name = 'Jane Lee'
  and UA.ExpireDate > '21-July-2002' and UA.ActivateDate < '21-July-2002'
-> GLUEv8
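The query on this slide is pseudo-SQL (the "For All H" clause is not valid SQL). One way to make it executable is to express the availability condition as a correlated subquery, as sketched below with SQLite. The schema is a simplified assumption: column names such as `Load1min`, `UserID`, and the table name `GridUser` are hypothetical stand-ins, the account join is simplified relative to the slide, dates are stored as ISO strings so that they compare correctly, and all sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cluster (ClusterName TEXT, Organization TEXT, Location TEXT);
CREATE TABLE SubCluster (OwningCluster TEXT, CPUModel TEXT, OSName TEXT);
CREATE TABLE Host (OwningCluster TEXT, Load1min REAL);
CREATE TABLE Application (Owner TEXT, OSName TEXT, Location TEXT);
CREATE TABLE GridUser (ID INTEGER, Name TEXT);
CREATE TABLE UserAccount (UserID INTEGER, ClusterName TEXT,
                          ActivateDate TEXT, ExpireDate TEXT);

-- Illustrative data only.
INSERT INTO Cluster VALUES ('t3e-a', 'NPACI', 'Site A'),
                           ('t3e-b', 'NPACI', 'Site B'),
                           ('sp-c',  'NPACI', 'Site C');
INSERT INTO SubCluster VALUES ('t3e-a', 'T3E', 'UNICOS'),
                              ('t3e-b', 'T3E', 'UNICOS'),
                              ('sp-c',  'SP2', 'AIX');
INSERT INTO Host VALUES ('t3e-a', 0.20), ('t3e-a', 0.40),
                        ('t3e-b', 0.90), ('t3e-b', 0.70),
                        ('sp-c',  0.10);
INSERT INTO Application VALUES ('Jane Lee', 'UNICOS', 'Site A'),
                               ('Jane Lee', 'UNICOS', 'Site B');
INSERT INTO GridUser VALUES (1, 'Jane Lee');
INSERT INTO UserAccount VALUES (1, 't3e-a', '2002-01-01', '2002-12-31'),
                               (1, 't3e-b', '2002-01-01', '2002-12-31');
""")

QUERY = """
SELECT DISTINCT SC.CPUModel, C.ClusterName, C.Location
FROM Cluster AS C
JOIN SubCluster AS SC ON SC.OwningCluster = C.ClusterName
JOIN Application AS A ON A.OSName = SC.OSName AND A.Location = C.Location
JOIN UserAccount AS UA ON UA.ClusterName = C.ClusterName
JOIN GridUser AS U ON U.ID = UA.UserID
WHERE C.Organization = 'NPACI'            -- cluster is NPACI
  AND SC.CPUModel = 'T3E'
  AND A.Owner = :user                     -- user has binary on machine
  AND U.Name = :user                      -- user has valid account on cluster
  AND UA.ActivateDate < :today AND UA.ExpireDate > :today
  AND (SELECT AVG(H.Load1min) FROM Host AS H
       WHERE H.OwningCluster = C.ClusterName) < 0.50  -- availability is good
"""

rows = conn.execute(QUERY, {"user": "Jane Lee", "today": "2002-07-21"}).fetchall()
print(rows)
```

With this sample data only the lightly loaded T3E cluster is returned; the second T3E is filtered out by the average-load condition and the third cluster by the CPU model.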
17.
- Of machines in NPACI organization, give me list of T3Es and their location for which availability is good, a binary is resident, and I have an account.
- "Availability is good" could be defined differently
  - Defined here as average load over all nodes in an SMP is less than .50.
  - More difficult is existence of 20 contiguous nodes.
- "Binary is resident" is fairly easy; "binary is nearby" is a harder question to answer.
- Show histographic usage of my job, or show historical usage of machine X for task Y, where Y is job submission or transfer rate to HPSS
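The two definitions of availability mentioned above can be contrasted in a short sketch. The function names are hypothetical, and "contiguous" is assumed here to mean consecutive positions in an ordered list of nodes; a real machine may define adjacency by its interconnect topology instead.

```python
def average_load_ok(loads, threshold=0.50):
    """Availability as defined on the slide: mean load over all nodes < threshold."""
    if not loads:
        return False  # no nodes reported, so treat the machine as unavailable
    return sum(loads) / len(loads) < threshold


def has_contiguous_free(free, needed=20):
    """True if at least `needed` consecutive nodes in the list are free."""
    run = 0
    for is_free in free:
        run = run + 1 if is_free else 0  # a busy node resets the current run
        if run >= needed:
            return True
    return False


light = average_load_ok([0.2, 0.4, 0.6])
heavy = average_load_ok([0.9, 0.7])
enough = has_contiguous_free([True] * 19 + [False] + [True] * 20)
fragmented = has_contiguous_free([True] * 19 + [False] + [True] * 19)
print(light, heavy, enough, fragmented)
```

The second check needs per-node state in a known order, which is part of why it is the harder question: an aggregate such as average load can be stored as one value, while contiguity cannot.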
19. Benchmark Evaluation
[Diagram, as on slide 10: input schemas (GCE XML, GLUE v8); E-R diagram; relational (mySQL), LDAP (openLDAP), XML (Xindice); Grid GIS Benchmark Use Cases; GCE job submission use cases; scripts and existing data.]
http://www.cs.indiana.edu/plale