Globus Toolkit 4: Futures and Open Issues

1
Globus Toolkit 4: Futures and Open Issues

Jennifer M. Schopf
UK National eScience Centre
Argonne National Lab

2
What is a Grid?

Resource sharing
Computers, storage, sensors, networks, ...
Sharing is always conditional: issues of trust, policy, negotiation, payment, ...
Coordinated problem solving
Beyond client-server: distributed data analysis, computation, collaboration, ...
Dynamic, multi-institutional virtual orgs
Community overlays on classic org structures
Large or small, static or dynamic

3
Not A New Idea

Late 70s: Networked operating systems
Late 80s: Distributed operating systems
Early 90s: Heterogeneous computing
Mid 90s: Metacomputing
Then the Grid: Foster and Kesselman, 1999
Also called parallel distributed computing

4
Why is this hard/different?

Lack of central control
Where things run
When they run
Shared resources
Contention, variability
Communication
Different sites imply different sys admins, users, institutional goals, and often strong personalities

5
So why do it?

Computations that need to be done within a time limit
Data that can't fit on one site
Data owned by multiple sites
Applications that need to be run bigger, faster, more

6
What Is the Globus Toolkit?

The Globus Toolkit is a collection of solutions to problems that frequently come up when trying to build collaborative distributed applications
Heterogeneity
Focus on simplifying heterogeneity for application developers
Working towards more vertical solutions in future versions
Standards
Capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF)
Reference implementations of new/proposed standards in these organizations

7
With Grid Computing, Forget Homogeneity!

Trying to force homogeneity on users is futile.
Everyone has their own preferences, sometimes
even dogma.
The Internet provides the model

8
Evolution of the Grid
Figure: evolution of the Grid over time, with increasing functionality and standardization: custom solutions; then the Globus Toolkit and de facto standards (X.509, LDAP, FTP, ...; GGF GridFTP and GSI, leveraging IETF); then Web services and the Open Grid Services Architecture (GGF OGSI and WSRF, leveraging OASIS, W3C, IETF; multiple implementations, including the Globus Toolkit); then app-specific services.
9
Globus is Service-Oriented Infrastructure Technology

Software for service-oriented infrastructure
Service-enable new and existing resources
E.g., GRAM on computer, GridFTP on storage system, custom application service
Uniform abstractions and mechanisms
Tools to build applications that exploit service-oriented infrastructure
Registries, security, data management, ...
Open source and open standards
Each empowers the other
E.g., monitoring across different protocols is hard
Enabler of a rich tool and service ecosystem

10
Globus Toolkit V4.0

Major release planned for April 29, 2005
Second internal release candidate cut yesterday; still on track for this date
Fifteen months of design, development, and testing
1.8M lines of code
Major contributions from five institutions
Hundreds of millions of service calls executed over weeks of continuous operation
Significant improvements over the GT3 code base in all dimensions

11
Our Goals for GT4

Usability, reliability, scalability, ...
Web service components have quality equal or superior to pre-WS components
Documentation at an acceptable quality level
Consistency with latest standards (WS-*, WSRF, WS-N, etc.) and Apache platform
WS-I Basic (Security) Profile compliant
New components, platforms, languages
And links to the larger Globus ecosystem

13
GT4 Components and Performance

Globus Toolkit Components
Core
Security
Data Management
Resource Management
Monitoring
Performance in the broadest sense of the word:
How fast
How many
How stable
(How easy)
www-unix.globus.org/toolkit/docs/development/4.0-drafts/perf_overview.html

14
GT4 Web Services Core

Supports both Globus services (GRAM, RFT, Delegation, etc.) and user-developed services
Redesign to enhance scalability, modularity, performance, usability
Leverages existing WS standards
WS-I Basic Profile: WSDL, SOAP, etc.
WS-Security, WS-Addressing
Adds support for emerging WS standards
WS-Resource Framework, WS-Notification
Java, Python, C hosting environments

15
GT4 Web Services Core
16
Open Source/Open Standards

WSRF developed in collaboration with IBM
Currently in OASIS process
Contributions to Apache for
WS-Security
WS-Addressing
Axis
Apollo (WSRF)
Hermes (WS-Notification)

17
Java Core Performance

We've been working hard to increase basic messaging performance
Factor of four improvement over GT3 so far
Reliability
Core can scale to a very large number of resources (>10,000)

18
Java Core Messaging Performance
19
GT4 Security Highlights

Standards-based support for message-level and transport-level security
Transport level is the default due to performance
Standards-based authorization (SAML) via Community Authorization Service (CAS) or callouts
Stand-alone delegation service
More authentication options
MyProxy, simpleCA, ...

20
GT4's Use of Security Standards
21
GT4 Security
22
Security Performance

We've measured performance for both WS and transport security mechanisms
See next slide for graph
Transport security is significantly faster than WS security
We made transport security (i.e., HTTPS) our default
We're working on making it even faster by using connection caching
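
A minimal Python sketch of the connection-caching idea, assuming nothing about GT4's actual implementation: `ConnectionCache`, the `connect` factory, the host name, and port 8443 are all illustrative. The point is that connection setup and the security handshake are paid once per endpoint rather than once per call.

```python
from typing import Callable, Dict, Tuple


class ConnectionCache:
    """Keep one open connection per (host, port) and reuse it across calls.

    `connect` is a hypothetical factory standing in for whatever opens an
    authenticated transport-level (e.g. TLS) connection; it is only called
    on a cache miss, so the expensive setup is paid once per endpoint.
    """

    def __init__(self, connect: Callable[[str, int], object]) -> None:
        self._connect = connect
        self._conns: Dict[Tuple[str, int], object] = {}
        self.handshakes = 0  # how many times a new connection was opened

    def get(self, host: str, port: int) -> object:
        key = (host, port)
        if key not in self._conns:
            self._conns[key] = self._connect(host, port)
            self.handshakes += 1
        return self._conns[key]

    def drop(self, host: str, port: int) -> None:
        # Forget a connection, e.g. after the server closes it or it errors.
        self._conns.pop((host, port), None)


# Usage with a dummy factory: 100 calls to one endpoint, one handshake.
cache = ConnectionCache(lambda host, port: object())
for _ in range(100):
    cache.get("container.example.org", 8443)
print(cache.handshakes)  # 1
```

A real cache would also need to detect and evict connections the server has closed; `drop` is the hook for that.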

23
(Graph of security performance; not transcribed)
24
GT4 Data Management

Stage large data to/from nodes
Replicate data for performance and reliability
Locate data of interest
Provide access to diverse data sources
File systems, parallel file systems, hierarchical storage (GridFTP)
Databases (OGSA-DAI)

25
GT4 Data Functions

Find your data: Replica Location Service (RLS)
Managing 40M files in production settings
Move/access your data: GridFTP, RFT
High-performance striped data movement
Couple data and execution management
GRAM uses GridFTP and RFT for staging

26
GridFTP in GT4

100% Globus code
No licensing issues
Stable, extensible
IPv6 support
XIO for different transports
Striping: multi-Gb/sec wide-area transport
Pluggable
Front end: e.g., future WS control channel
Back end: e.g., HPSS, cluster file systems
Transfer: e.g., UDP, NetBLT transport

27
GridFTP Performance

TeraGrid striping results
30 Gb/s network, 32 IBM ia64 nodes
Ran varying number of stripes
Ran both memory-to-memory and disk-to-disk

28
Memory-to-Memory Striping Performance

High linear scalability (slope near 1)
27 Gb/s on a 30 Gb/s link (90% utilization) with 32 nodes
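
The headline numbers can be checked with simple arithmetic. The per-node figure is an average that assumes the load is spread evenly across the 32 stripe nodes, which the slide does not state.

```python
link_gbps = 30.0      # TeraGrid link capacity from the slide
achieved_gbps = 27.0  # memory-to-memory throughput with 32 nodes
nodes = 32

utilization = achieved_gbps / link_gbps  # fraction of the link in use
per_node_gbps = achieved_gbps / nodes    # average per node, assuming an even split

print(f"utilization: {utilization:.0%}")         # utilization: 90%
print(f"per node:    {per_node_gbps:.3f} Gb/s")  # per node:    0.844 Gb/s
```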

29
Disk to Disk Striping Performance

Limited by the storage system
Achieved 17.5 Gb/s

30
And in conversation...

"We think we have hit the limit of Python code. The GridFTP C libraries are delivering data so fast to the buffers that the Python client code cannot keep up in doing the fseek, fwrite, and then re-register the data callback. We are going to have to code our 'transfer agents' entirely in C for S5. Not a bad problem to have."
Scott Koranda, Dept. of Physics, University of Minnesota

31
Reliable File Transfer: Third-Party Transfer

Fire-and-forget transfer
Web services interface
Many files and directories
Integrated failure recovery

Figure: an RFT client sends SOAP messages to the RFT service (optionally receiving notifications); the RFT service directs a third-party transfer between two GridFTP servers.
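
To illustrate what "fire-and-forget" with integrated failure recovery means for a client, here is a minimal Python sketch. It is not RFT's API: `TransferRequest`, `run_request`, and the `transfer` callback are hypothetical. The `done` flags stand in for the per-file state RFT keeps in its database (PostgreSQL, per the performance slide), which is what lets a killed transfer resume without redoing completed files.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TransferRequest:
    # Hypothetical request: (source URL, destination URL) pairs plus retry policy.
    pairs: List[Tuple[str, str]]
    max_attempts: int = 3
    done: List[bool] = field(init=False)

    def __post_init__(self) -> None:
        # Per-file completion flags; a real service would persist these.
        self.done = [False] * len(self.pairs)


def run_request(req: TransferRequest,
                transfer: Callable[[str, str], None],
                backoff_s: float = 0.0) -> bool:
    """Transfer every pair, retrying transient failures.

    Files already marked done are skipped, so calling this again after a
    crash resumes where the previous run stopped.
    """
    for i, (src, dst) in enumerate(req.pairs):
        if req.done[i]:
            continue
        for attempt in range(1, req.max_attempts + 1):
            try:
                transfer(src, dst)
                req.done[i] = True
                break
            except OSError:
                if attempt == req.max_attempts:
                    return False  # give up for now; state allows a later resume
                time.sleep(backoff_s * attempt)  # simple linear backoff
    return True


# Usage: a transfer function that fails once per file, then succeeds.
attempts = {}

def flaky_transfer(src: str, dst: str) -> None:
    attempts[src] = attempts.get(src, 0) + 1
    if attempts[src] == 1:
        raise OSError("simulated transient failure")

req = TransferRequest([("gsiftp://a.example.org/f1", "gsiftp://b.example.org/f1"),
                       ("gsiftp://a.example.org/f2", "gsiftp://b.example.org/f2")])
ok = run_request(req, flaky_transfer)
print(ok, req.done)  # True [True, True]
```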
32
RFT Performance Stats

Current maximum request size is approximately 20,000 entries with the default 64 MB heap size
Infinite transfer: LAN
120,000 transfers (servers were killed by mistake)
This was a good test: it found a corner case where PostgreSQL was not able to perform 3 update queries/sec and was using up CPU
Infinite transfer: WAN
67,000 transfers (killed for the same reason as above)
Sloan Digital Sky Survey DR3 archive move
900K files, 6 TB
Killed the transfer several times for recoverability testing
No human intervention has been required to date

33
Replica Location Service

Identify location of files via a logical-to-physical name map
Distributed indexing of names, fault-tolerant update protocols
GT4 version is scalable and stable
Managing 40 million files across 10 sites
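
The two-level structure (per-site Local Replica Catalogs that map logical names to physical names, plus a Replica Location Index that only records which catalog knows about a logical name) can be sketched in a few lines of Python. This is an in-memory illustration, not the RLS client API; it assumes each catalog periodically pushes its list of logical names to the index, and the class, site, and file names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


class LocalReplicaCatalog:
    """Per-site map: logical file name (LFN) -> physical file names (PFNs)."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.mappings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, lfn: str, pfn: str) -> None:
        self.mappings[lfn].add(pfn)


class ReplicaLocationIndex:
    """Map: LFN -> sites whose catalog knows it. Holds no physical names."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = defaultdict(set)

    def update_from(self, lrc: LocalReplicaCatalog) -> None:
        # Periodic update: the catalog reports which LFNs it currently holds.
        for lfn in lrc.mappings:
            self.index[lfn].add(lrc.site)


def locate(lfn: str, rli: ReplicaLocationIndex,
           catalogs: Dict[str, LocalReplicaCatalog]) -> Set[str]:
    """Two-step lookup: the index names the sites, each site's catalog gives PFNs."""
    pfns: Set[str] = set()
    for site in rli.index.get(lfn, set()):
        pfns |= catalogs[site].mappings.get(lfn, set())
    return pfns


# Usage: one logical file replicated at two sites.
site_a, site_b = LocalReplicaCatalog("siteA"), LocalReplicaCatalog("siteB")
site_a.add("run42/frame.gwf", "gsiftp://a.example.org/data/frame.gwf")
site_b.add("run42/frame.gwf", "gsiftp://b.example.org/store/frame.gwf")
rli = ReplicaLocationIndex()
rli.update_from(site_a)
rli.update_from(site_b)
found = sorted(locate("run42/frame.gwf", rli, {"siteA": site_a, "siteB": site_b}))
print(found)  # ['gsiftp://a.example.org/data/frame.gwf', 'gsiftp://b.example.org/store/frame.gwf']
```

Keeping physical names out of the index keeps it small and lets each site stay authoritative for its own files.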

34
LIGO Use of RLS

Some hands-on numbers
Produce 1 TB per day
8 sites
> 3 million entries in the RLS
> 30 million files
This replication of data using RLS and GridFTP is enabling more gravitational wave data analysts across the world to do more science more efficiently than ever before. Globus RLS and GridFTP are in the critical path for LIGO data analysis.

35
Data Replication Service (tech preview)

Pull missing files to local site

Figure: at each of Site A and Site B, a Data Replication Service takes a list of required files, consults the Local Replica Catalog and Replica Location Index, and uses the Reliable File Transfer Service and GridFTP to pull missing files to the local site.
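
The "pull missing files" step amounts to a set difference followed by a replica lookup. A minimal Python sketch under stated assumptions: plain sets and dicts stand in for the Local Replica Catalog and the index lookups, `plan_replication` and all URLs are hypothetical, and picking the first known replica is a deliberately naive policy.

```python
from typing import Dict, Iterable, List, Set, Tuple


def plan_replication(required: Iterable[str],
                     local_catalog: Set[str],
                     replica_index: Dict[str, List[str]],
                     local_prefix: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Decide what to pull to the local site.

    required: logical names the site needs
    local_catalog: logical names already registered locally (stand-in for the LRC)
    replica_index: logical name -> known remote physical URLs (stand-in for index lookups)
    Returns (source, destination) pairs to hand to a transfer service, and the
    names for which no replica is known anywhere.
    """
    transfers: List[Tuple[str, str]] = []
    unavailable: List[str] = []
    for lfn in required:
        if lfn in local_catalog:
            continue  # already here, nothing to do
        sources = replica_index.get(lfn, [])
        if not sources:
            unavailable.append(lfn)
            continue
        # Naive source choice: first known replica. A real service might rank by cost.
        transfers.append((sources[0], f"{local_prefix}/{lfn}"))
    return transfers, unavailable


pairs, missing = plan_replication(
    ["f1", "f2", "f3"],
    local_catalog={"f1"},
    replica_index={"f2": ["gsiftp://siteA.example.org/data/f2"]},
    local_prefix="gsiftp://siteB.example.org/data",
)
print(pairs)    # [('gsiftp://siteA.example.org/data/f2', 'gsiftp://siteB.example.org/data/f2')]
print(missing)  # ['f3']
```

After the transfers complete, each new copy would be registered in the local catalog so that the index learns about it.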
36
OGSA-DAI

Flexible, composable middleware
Data access
Relational and XML databases, semi-structured files
Data integration
Multiple data delivery mechanisms, data translation
Extensible, efficient framework
Request documents contain multiple tasks
A task: execution of an activity
Group work to enable efficient operation
Extensible set of activities
> 30 predefined, framework for writing your own
Moves computation to data
Pipelined and streaming evaluation
Concurrent task evaluation
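
Pipelined, streaming evaluation of chained activities can be pictured with Python generators: each activity consumes rows as the previous one produces them, so no stage waits for the whole data set. The activity names and row format are hypothetical; this is not OGSA-DAI's request-document format or its activity API.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, str]


def select_activity(rows: Iterable[Row], min_mass: float) -> Iterator[Row]:
    # Data-access step: filter rows close to the data source.
    for row in rows:
        if float(row["mass"]) >= min_mass:
            yield row


def project_activity(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    # Transformation step: keep only the requested columns.
    for row in rows:
        yield {c: row[c] for c in columns}


def csv_delivery_activity(rows: Iterable[Row], columns: List[str]) -> str:
    # Delivery step: serialize the stream as CSV text.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return out.getvalue()


# One "request" = three chained activities; rows stream through one at a time.
source = [{"id": "1", "mass": "0.5", "name": "a"},
          {"id": "2", "mass": "2.0", "name": "b"}]
cols = ["id", "mass"]
result = csv_delivery_activity(project_activity(select_activity(source, 1.0), cols), cols)
print(result)
```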

37
OGSA-DAI

Current release: Release 5 in GT4
Added: installation wizards, indexed files
> 1100 registered users we know about
Running on 3 message passing infrastructures
Release 6: May 2005
Improved client-side API
Explicit control of sequential and parallel tasks
Dynamic reconfigurability
WS-DAI reference implementation

38
Execution Management (GRAM)

Common WS interface to schedulers
Unix, Condor, LSF, PBS, SGE, ...
More generally: interface for process execution management
Lay down execution environment
Stage data
Monitor and manage lifecycle
Kill it, clean up
A basis for application-driven provisioning

39
GT4 GRAM

2nd-generation WS implementation optimized for performance, stability, scalability
Streamlined critical path
Use only what you need
Flexible credential management
Credential cache and delegation service
GridFTP and RFT used for data operations
Data staging and streaming output
Eliminates redundant GASS code
Single and multi-job support

40
GT4 GRAM Structure: WSRF/WSN Poster Child
Figure: a client delegates credentials to the Delegation service and submits jobs to the GRAM services in the GT4 Java Container on the service host(s); GRAM uses sudo and a GRAM adapter to drive the local scheduler, which runs the user job on the compute element(s), and sends transfer requests to RFT, which drives GridFTP (FTP control and data channels) to remote storage element(s).
41
Some of our Goals

GRAM should add little to no overhead compared to an underlying batch system
Submit as many jobs to GRAM as is possible to the underlying scheduler
Goal: 10,000 jobs to a batch scheduler
Goal: efficiently fill the process table for the fork scheduler
Submit/process jobs to GRAM as fast as is possible to the underlying scheduler
Goal: 1 per second
We are not there yet
A range of limiting factors at play

42
Design Decisions

Efforts and features towards the goal
Allow job brokers the freedom to optimize
E.g., Condor-G is smarter than globusrun
Protocol steps made optional and shareable
Reduced cost for GRAM service on host
Single WSRF host environment
Better job status monitoring mechanisms
More scalable/reliable file handling
GridFTP and RFT instead of globus-url-copy
Removal of non-scalable GASS caching
GT4 tests performing better than GT3 did
But more work to do

43
GRAM 3.9.4 Performance

Throughput
Test: simple job to fork scheduler (/bin/date), no staging, streaming, or cleanup
77 jobs/min sustained
60 jobs/min with delegation
Long-running test
Ran 500,000 sequential jobs over 23 days
These included staging, delegation, and the fork job manager
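
Converting the slide's figures to jobs per second puts them in the same unit as the submission-rate goal on the earlier slide. The long-run average is much lower, but that test ran sequential jobs with staging and delegation, so it is better read as a stability result than as a throughput figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60

fork_jobs_per_min = 77        # /bin/date to fork, no staging/streaming/cleanup
delegated_jobs_per_min = 60   # same test with delegation
long_run_jobs = 500_000       # sequential jobs with staging and delegation
long_run_days = 23

fork_rate = fork_jobs_per_min / 60
delegated_rate = delegated_jobs_per_min / 60
long_run_rate = long_run_jobs / (long_run_days * SECONDS_PER_DAY)

print(f"fork:      {fork_rate:.2f} jobs/s")       # fork:      1.28 jobs/s
print(f"delegated: {delegated_rate:.2f} jobs/s")  # delegated: 1.00 jobs/s
print(f"long run:  {long_run_rate:.2f} jobs/s "
      f"({long_run_jobs / long_run_days:,.0f} jobs/day)")  # long run:  0.25 jobs/s (21,739 jobs/day)
```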

44
GRAM Performance (2)

Concurrency
Job submits to Condor scheduler (long-running sleep job); no staging, streaming, cleanup, or delegation
Current limit is 32,000 jobs due to a Linux directory limit
Using multiple sub-directories will resolve this; look for this in 4.2
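
The fix the slide mentions, spreading job state over multiple sub-directories, is a standard hashing trick. On ext3, for example, a directory can hold at most about 32,000 sub-directories. The sketch below assumes one directory per job; the root path, `FANOUT`, and the use of SHA-1 are illustrative choices, not GRAM's actual layout. With 256 buckets, a per-directory cap of about 32,000 entries would not be reached until roughly 256 x 32,000 (about 8 million) jobs.

```python
import hashlib
import os
from collections import Counter

FANOUT = 256  # hypothetical number of first-level sub-directories


def job_dir(root: str, job_id: str) -> str:
    """Place each job under root/<bucket>/<job_id>.

    The bucket is derived from a hash of the job ID, so the same ID always
    maps to the same directory and jobs spread roughly evenly over buckets.
    """
    digest = hashlib.sha1(job_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % FANOUT
    return os.path.join(root, f"{bucket:02x}", job_id)


# Usage: 100,000 job IDs spread over at most 256 directories.
paths = [job_dir("/var/gram/jobs", f"job-{i}") for i in range(100_000)]
per_bucket = Counter(os.path.dirname(p) for p in paths)
print(len(per_bucket), max(per_bucket.values()))
```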

45
Monitoring and Discovery

Every service should be monitorable and discoverable using common mechanisms
WSRF/WSN provides those mechanisms
A common aggregator framework for collecting information from services, thus:
Index Service: registry supporting XPath queries, with caching
Trigger Service: perform action on condition
Deep integration with Globus containers and services: every GT4 service is discoverable
GRAM, RFT, GridFTP, CAS, ...
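
To show what "registry supporting XPath queries" and "perform action on condition" look like from a client's point of view, here is a small Python example using the standard library's ElementTree, which implements only a subset of XPath. The document layout and the `trigger` helper are hypothetical; the real Index and Trigger services have their own schemas, configuration, and query interfaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical aggregated index document; element and attribute names are
# illustrative only, not the MDS4 schema.
INDEX_XML = """
<Index>
  <Entry type="GRAM"><Host>hostA.example.org</Host><FreeCPUs>12</FreeCPUs></Entry>
  <Entry type="RFT"><Host>hostB.example.org</Host></Entry>
  <Entry type="GRAM"><Host>hostC.example.org</Host><FreeCPUs>0</FreeCPUs></Entry>
</Index>
"""

root = ET.fromstring(INDEX_XML.strip())

# Registry-style query: which hosts run a GRAM service?
gram_hosts = [e.findtext("Host") for e in root.findall(".//Entry[@type='GRAM']")]
print(gram_hosts)  # ['hostA.example.org', 'hostC.example.org']


def trigger(xpath: str, action) -> None:
    """Trigger-style check: run `action` on every entry matching the condition."""
    for entry in root.findall(xpath):
        action(entry)


alerts = []
trigger(".//Entry[FreeCPUs='0']", lambda e: alerts.append(e.findtext("Host")))
print(alerts)  # ['hostC.example.org']
```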

46
GT4 Monitoring and Discovery
Figure: clients (e.g., WebMDS) query an MDS-Index in a GT4 container through WSRF/WSN access; GRAM, RFT, and user services register automatically (WS-ServiceGroup) with the index in their container, MDS-Index instances in other GT4 containers feed into it, and adapters with custom protocols cover non-WSRF entities such as GridFTP.
47
MDS4 Extensibility

Aggregator framework provides
Registration management
Collection of information from Grid Resources
Plug-in interface for data access, collection, query, ...
WebMDS framework provides for customized display
XSLT transformations

48
MDS4 in 3.9.5: Index Query Performance

Small queries (10-minute averages)
Message size: 7.5 KB
Requests processed: 11,262
Average round-trip time: 16 ms
Medium queries (10-minute averages)
Message size: 32 KB
Queries processed: 6,232
Average round-trip time: 29 ms
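
The 10-minute counts are easier to read as per-second rates: roughly 19 requests/s for small queries and roughly 10 requests/s for medium ones. The data-rate column assumes the quoted message size is the size of each response, which the slide does not say.

```python
WINDOW_S = 10 * 60  # each figure on the slide is a 10-minute average

# Figures from the slide; treating "message size" as the response size is an assumption.
tests = {
    "small":  {"msg_kb": 7.5,  "processed": 11_262, "rtt_ms": 16},
    "medium": {"msg_kb": 32.0, "processed": 6_232,  "rtt_ms": 29},
}

rates = {}
for name, t in tests.items():
    req_per_s = t["processed"] / WINDOW_S
    kb_per_s = req_per_s * t["msg_kb"]
    rates[name] = req_per_s
    print(f"{name}: {req_per_s:.1f} req/s, about {kb_per_s:.0f} KB/s, "
          f"{t['rtt_ms']} ms average round trip")
```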

49
Long Running Test

Ran 14 days (killed by accident during other testing)
Over 94 million requests processed
76 requests/sec average
13 ms average query RTT
Has also had DiPerf tests run against it (next slide)

50
(Graph of DiPerf test results; not transcribed)
51
GT4 Documentation Is Much Improved!
52
The Globus Ecosystem

Globus components address core issues relating to resource access, monitoring, discovery, security, data movement, etc.
GT4 is the latest version
A larger Globus ecosystem of open source and proprietary components provides complementary capabilities
A growing list of components
These components can be combined to produce solutions to Grid problems
We're building a list of such solutions

53
Many Tools Build on, or Can Contribute to, GT4-Based Grids

Condor-G, DAGman
MPICH-G2
GRMS
Nimrod-G
Ninf-G
Open Grid Computing Env.
Commodity Grid Toolkit
GriPhyN Virtual Data System
Virtual Data Toolkit
GridXpert Synergy

Platform Globus Toolkit
VOMS
PERMIS
GT4IDE
Sun Grid Engine
PBS scheduler
LSF scheduler
GridBus
TeraGrid CTSS
NEES
IBM Grid Toolbox

54
2005 and Beyond

We have a solid Web services base
We now want to build, on that base, an open source service-oriented infrastructure
Virtualization
New services for provisioning, data management,
security, VO management
End-user tools for application development
Etc., etc.

55
Globus and its User Community

How can we best support you?
We try to provide the best software we can
We use Bugzilla and other community tools
We work to grow the set of contributors
How can you best support us?
Become a contributor of software, bug fixes,
answers to questions, documentation
Provide us with success stories that can justify
continued Globus development
Promote Globus within your communities

56
Working with GT4

Download and use the software, and provide feedback
Join the gt4friends@globus.org mailing list
Review, critique, add to documentation
Globus Doc Project: http://gdp.globus.org
Tell us about your GT4-related tool, service, or application

57
So...

GT4 is a significant step forward in the quality, functionality, and standards compliance of GT
Beta release available for immediate use, final release April 29th
Downloads and docs at www.globustoolkit.org

2nd Edition www.mkp.com/grid2
58
But...
59
Things heard about Grids

"Isn't the Grid just a funding construct?" (SC'01)
"Grid computing has been more hype than reality." - Hewlett-Packard CEO Carly Fiorina, 10/03
"Customers don't need the Globus Toolkit to do high-performance compute clusters." - Charles Fitzgerald, a Microsoft general manager, Information Week, 1/05
"We tried to install Globus and found out that it was too hard to do. So we decided to just write our own."

60
Where are all the (happy) users?

In July '04 I spoke with 25 UK user groups, and on occasion it got ugly
www.nesc.ac.uk/technical_papers/UKeS-2004-08.pdf
Many users have been told to use the Grid to get funding, not because they actually want to
There are a few well-known successes (LHC, CACTUS, and a couple of others), but this isn't widespread enough to be considered more than a one-off

61
We expected using Grids to be a lot of work...

Parallel computing showed us that the "If you build it, they will come" scenario just won't work
Until there were debuggers, fast compilers, languages, libraries, etc., users didn't want to use parallel machines
Many hundreds, even thousands, of hours went into re-writing codes for parallel machines

62
...but how much is acceptable?

There is the impression (right or wrong) that only heroic efforts will allow you to use a Grid
Some re-writing of code required
Access to resources isn't easy even once code is changed

63
Where are we today?

What a user would like
Run my job, finish by lunch
Get a data set that has these attributes
Tell me when that simulation will finish
Where are we today?
Specify exact machines, data files, explicit data transfers, etc.
Little (or no) dynamic information or prediction

64
Where are we today (cont)

General agreement: we have basic functionality
Tell me what this set of resources looks like
Run this job on that resource
Transfer this file
Globus (among others) does give these basic building blocks (mostly)
General agreement: general functionality isn't enough by far

65
How do we move forward?

Users will only come when they have decent tools
Simple enough for easy use
Robust enough for stupid use
Still allow workarounds for hard-core use
Users are hampered by software that doesn't do what they need it to
Globus is NOT an end-to-end solution

66
2. Why doesn't Grid software do what we need it to yet?

Globus doesn't provide end-to-end solutions
Globus Ecosystem Tutorial
Globus is building blocks; still missing vertical solutions
Mismatch between developers' vision of use and users' vision of use
Many tools are used "off label"

67
Off Label Use

Tool built to do A is used for B
This is good since a user has something to use
This is bad since the tool is being used in a way that wasn't envisaged
Arch concerns
Scaling concerns
Etc.
But without use of the tool, there's no way to know how it will be used!

68
What is a usage scenario?

Information from the user about a specific use case
What's the right level of detail?
What's a general use case?
Note: much application-built software is one-off, but we need general tools that can adapt
Who does this?
Application scientists and computer scientists speak different languages (e.g., C. Pancake)

69
Will Grid software ever meet users' needs?

Without better communication between developers
and users, the Grid cannot succeed
Grids are about people, not just technology

70
3. Need for Standards: Information as a Case Study

Open question: how should I store the information about a Grid?
Globus Monitoring and Discovery Service (MDS)?
A tool that does streaming data like R-GMA?
A cluster tool over many sites like Ganglia?
A certification tool like Inca from the TG project?
A Grid-wide database?
All of these are right for some of the data; none is right for all uses

71
Why are so many tools bad?

A large number of tools isn't bad
A large number of tools that have no way to interoperate is!
Grid3 has 8 different tools in use
LHC has an equal number (at least!)

72
Need for Standard Interfaces

Need for standard APIs and protocols to allow easier
Access to data sources
Registration of data
Archiving tools
Standards for what information is available
Standards for what that information means
Standards for communication of errors
This is in part what inspired Globus's move to web services!

73
Standards are a necessity, not a luxury

Without standards of all kinds (protocols, APIs, languages), all the information in the world won't do us any good
Open question about the right process for standardization
GGF, OASIS, IETF
Need for standards vs. standardizing too soon
Need for standards vs. time lag for agreement on standards

74
4. How do we make Grids secure?

Without security we can't have a Grid
EVERYTHING needs to be secure:
Who can run on a machine
File transfers
What data does someone have access to (program data, system data)?
Who can access which services?

75
Security vs. Usability

Users want security but don't want to deal with it
If security is hard, it won't be used
Most security (including Grid Security Infrastructure (GSI)) is based on public key infrastructure (PKI)
Users have files (public and private keys) that must be kept secure, must use reasonable passwords, etc.

76
What about...

Multiple certificates?
Group access?
Dynamic policy changes?
Scalability?
Overheads?
Etc., etc., etc.

77
Security is an open question

Until security is made easier to use, it won't be used
Until security is made easier to manage at the group level, it won't be used
Without security, no one will really use the Grid

78
5. Socio-political Issues

Hardest problems are often not technical ones
Multiple administration domains means multiple
policies
Multiple countries means multiple communication
styles
Decisions are often made on a non-technical basis

79
Communication is hard

Too many people in the mix
Not everyone is informed of status updates
Often hallway conversation becomes what people
believe
Too often assumptions are not verified
Many communication styles can lead to
misunderstandings

80
What to do?

Ongoing efforts to continue better communication are needed to build a global community
When in doubt, ask someone directly!
And please: constructive criticism, reporting of errors, etc. Just saying "Globus sucks" simply isn't helpful

81
6. Other open problems

What performance is acceptable?
What do we do about variance?
What about easier testbed setup?
Where are the benefits to encourage sharing on the Grid?
Where are the benefits for the sys admins? Users get a plus, PIs get a plus, but what about them?
How do we educate the funding agencies about the need for hardened software, documentation, and support?
What cost models are needed by the Grid?
Economic Grids are only the first step
And many more

82
Progress

Significant improvements in security infrastructure
Basic functionality is much closer
More funding aid for support
The need for better-defined use cases and simpler deployment has been strengthened, as has the need for basic information and basic information services

83
Where are the performance metrics for success?

No more Grid papers, just a footnote that states "This work was achieved using the Grid"
Supercomputer centers don't give a user the choice of using their machines or the Grid; that line doesn't exist (TG does this now!)
SuperComputing demos can be run at any time of the year

84
Conclusion

Many interesting problems are left, in terms of both research and deployment issues
Much work is being done to help address these open issues
Next year's open issues will be different yet again

85
References

This talk
www.mcs.anl.gov/jms/Talks (not there yet)
Globus Alliance
www.globus.org
Globus Performance
http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/perf_overview.html
Journal paper version of open questions (dated)
www.mcs.anl.gov/Pubs/jmspubs.html
Conversations with 25 UK user groups
http://www.nesc.ac.uk/technical_papers/UKeS-2004-08.pdf

86
Contact Information

Jennifer M. Schopf
jms@mcs.anl.gov
www.mcs.anl.gov/jms
Support from DOE, NSF, Microsoft, NeSC, JISC

Slides on how Globus works

88
How Globus Works

Globus is a distributed open source community with many contributors and users
CVS, documentation, Bugzilla, email lists
Modular structure allows many to contribute
Globus Alliance Board provides governance when needed
Meritocracy: individuals who demonstrate ongoing contributions and commitment
Primarily what to include, when to release
Globus Alliance is an informal partnership of organizations led by Board members

89
Evolution of the Globus Alliance

Argonne/U.Chicago (Childers, Foster) 1995
USC/ISI (Kesselman) 1995
Edinburgh (Atkinson, Parsons) 2003
Swedish PDC (Johnsson, Mulmo) 2003
NCSA (Welch) 2004
Univa (Czajkowski, Tuecke) 2004
Other contributors will surely be added

90
From eScience to eBusiness

Since 2001, growing interest in Globus for
commercial use
Enterprises, IT vendors, ISVs asking Globus
leaders to address commercial needs
But hard to do in a research laboratory
In response, we have created two new
organizations
Globus Consortium
Univa

91
Globus Consortium (www.globusconsortium.com)

Nonprofit organization funded by companies to advance the Globus Toolkit for enterprise use
Initial sponsor members: HP, IBM, Intel, Sun
Initial contributors: Nortel, Univa
First two projects already identified
Member-driven software quality improvements
Contributions to job submission standards
Other projects to be defined, e.g.,
Develop new features key to enterprise use
Education and outreach

92
Univa

Provider of commercial support, services, and products around open source Globus
Commercial distribution of GT4 and beyond
Integration with enterprise systems
Committed to open source and open standards
Founded by Tuecke, Foster, Kesselman
Tuecke left Argonne to be CEO
Foster, Kesselman remain at Argonne, ISI
Experienced management team
Rich Miller, Vas Vasiliadis, Paul Davé, Bob Mandel

96

Slides on open issues

97
6. What about performance?

It's not enough to use the Grid; it has to perform. Otherwise, why bother?
First prototypes rarely consider performance
MDS1: centralized LDAP
MDS2: decentralized LDAP
MDS3: decentralized Grid service
MDS4: decentralized Web service
Often performance is simply not known

98
Performance of GIS Information Servers vs. Number
of Users
Zhang, Freschl, and Schopf, "A Performance Study of Monitoring and Information Services for Distributed Systems," submitted to HPDC 2003.
99
What we found

Performance can be a matter of deployment
Effect of background load
Effect of network bandwidth
Performance can be affected by underlying
infrastructure
LDAP/Java strengths and weaknesses
Performance can be improved using standard CS techniques
Caching, multi-threading, etc.

100
Moral

Performance should be analyzed early and often
Prototypes should be recognized as such and
thrown out
Without performance, no reason to use a Grid

101
7. What do we do about variance?

Resources on the Grid change with time
Bandwidth
CPU load
Disk space
Memory usage
Queue sizes

102
Variance: technical problem

How do you tell if something is slow versus
broken?
How do you make a prediction?
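
One common heuristic for the "slow versus broken" question is to compare a task's elapsed time against its own history, as in this Python sketch. Neither the thresholds (`k`, `hard_limit_factor`) nor the runtimes come from the talk; they are illustrative, and the historical mean doubles as the simplest possible prediction.

```python
import statistics
from typing import Sequence


def classify(elapsed_s: float, history: Sequence[float],
             k: float = 3.0, hard_limit_factor: float = 10.0) -> str:
    """Label a running task from its elapsed time and past runtimes.

    normal:         within mean + k standard deviations of past runs
    slow:           beyond that, but under hard_limit_factor x the mean
    suspect broken: anything longer
    """
    mean = statistics.mean(history)  # naive prediction of the runtime
    sd = statistics.stdev(history) if len(history) > 1 else 0.0
    if elapsed_s <= mean + k * sd:
        return "normal"
    if elapsed_s <= hard_limit_factor * mean:
        return "slow"
    return "suspect broken"


history = [118, 125, 130, 122, 140, 119, 127]  # hypothetical past runtimes, seconds
labels = [classify(t, history) for t in (135, 600, 2000)]
print(labels)  # ['normal', 'slow', 'suspect broken']
```

High variance in the history widens the "normal" band, which is exactly why unpredictable resources make failures hard to spot.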

103
Variance: socio-political

Users want the same application to take roughly the same amount of time every time they run it
Our experience: a longer running time that's more predictable is preferred to a high-variance, high-risk situation

104
Moral

Variance is here to stay; we need techniques to take advantage of it

105
8. How do we set up a Grid testbed?

Bill Johnston, LBNL, talks about this often, based on IPG experience
Get the sys admins involved
Have a standard set-up
Make this a priority at the start of a project
Accounting: open issue
Cross-site scheduling: open issue
Policies across sites: open issue

106
Example: Installing Globus

GT2: how do you know it's installed OK?
Now have test scripts
http://www-unix.globus.org/toolkit/testing/
But it would be better to have something automatic
GT3: how many configuration files do I need to work with?

107
Moral

Users are building testbeds, but this is still hard
Need to have rules of thumb published to assist with this

108
4. How do we understand information once we get it?

Assume we have access to information about the Grid: can we use it?
Grid3 has 8 different tools in use that give conflicting answers
A monitoring system says the load on machine X is Y
A scheduler wants to evaluate this data
No common language for this to be communicated
Some effort now to come up with a common schema (GLUE schema, work with CIM in GGF), but this has only scratched the surface; there is no agreement on moving forward
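
The "load on machine X is Y" problem is easier to see with a concrete record type. This Python sketch assumes a hypothetical shared schema in which every value names its metric and unit; it is not the GLUE schema or CIM, and both tools and their field names are invented. It shows how two tools that both report a "load" can disagree simply because they measure different things.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Observation:
    """One monitoring fact in a hypothetical shared schema.

    Every value carries what was measured, in which unit, when, and by
    which tool, so a consumer such as a scheduler does not have to guess.
    """
    host: str
    metric: str       # e.g. "cpu.load1" = 1-minute load average
    value: float
    unit: str
    timestamp: float  # seconds since the epoch
    source: str       # tool that produced the value


def from_tool_a(rec: Dict[str, Any]) -> Observation:
    # Hypothetical tool A reports the 1-minute load average as "load".
    return Observation(rec["node"], "cpu.load1", float(rec["load"]),
                       "runnable processes", float(rec["t"]), "toolA")


def from_tool_b(rec: Dict[str, Any]) -> Observation:
    # Hypothetical tool B also calls its number "load", but it is CPU
    # utilisation in percent, so it maps to a different metric.
    return Observation(rec["hostname"], "cpu.util", float(rec["load"]),
                       "percent", float(rec["time"]), "toolB")


obs: List[Observation] = [
    from_tool_a({"node": "x.example.org", "load": "3.2", "t": 1.1e9}),
    from_tool_b({"hostname": "x.example.org", "load": 80, "time": 1.1e9}),
]
# A scheduler asking for "load on machine X" now selects one explicit metric.
load1 = [o.value for o in obs if o.host == "x.example.org" and o.metric == "cpu.load1"]
print(load1)  # [3.2]
```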