HEP Applications Evaluation of the EDG Testbed and Middleware

1
HEP Applications Evaluation of the EDG Testbed
and Middleware
  • Stephen Burke (EDG HEP Applications WP8)
  • s.burke@rl.ac.uk

2
Introduction
  • Updated from the CHEP talk 1 year ago
  • Some things have changed, some not!
  • Based on D8.4 report (EDG only here, 2.0/2.1
    releases)
  • Achievements of WP8
  • Updated use case analysis mapping HEPCAL to EDG
  • Lessons learnt

5
Use Case Analysis
  • EDG release 2.0 has been evaluated against the
    HEPCAL Use Cases
  • Of the 43 Use Cases:
  • 13 (was 10) are fully implemented
  • 4 (was 8) are largely satisfied, but with some
    restrictions or complications
  • 11 (was 8) are partially implemented, but have
    significant missing features
  • 15 (was 17) are not implemented
  • Missing functionality is mainly in:
  • Virtual data (not considered by EDG)
  • Metadata catalogues and file collections (still
    needs more work)
  • Authorisation, job control and optimisation
    (partly delivered but not integrated)
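The breakdown above can be tallied against the stated total of 43 Use Cases; a minimal check (old counts from the previous evaluation in comments):

```python
# HEPCAL Use Case evaluation against EDG release 2.0, as quoted above.
fully = 13        # was 10
largely = 4       # was 8
partially = 11    # was 8
missing = 15      # was 17
total = fully + largely + partially + missing
print(total)  # 43
```

Both the old and the new breakdowns sum to 43, so the categories partition the full Use Case set.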

6
Lessons Learnt - General
  • Having real users on an operating testbed on a
    fairly large scale is vital: many problems
    emerged which had not been seen in local testing.
  • Problems with configuration are at least as
    important as bugs - integrating the middleware
    into a working system takes as long as writing
    it!
  • Grids need different ways of thinking by users
    and system managers. A job must run anywhere it
    lands. Sites are not uniform so jobs should make
    as few demands as possible.

7
Job Submission
  • Limitations seen in 1.4 are largely gone
  • Efficiency over 90% in stress tests (1600 jobs)
  • Failures are ~1% in normal use (after
    resubmission)
  • Most failures now at globus/site level, not
    broker
  • Can still be sensitive to poor or incorrect
    information from Information Providers
  • Info providers have improved, configuration
    generally better
  • No black hole sites lately (but still possible)
  • Still hard to diagnose errors ("invalid script
    response"???)
  • Advanced features (checkpointing, DAGMan,
    interactivity, accounting, ...) largely untested,
    some not integrated
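The two failure figures above are mutually consistent: with per-attempt efficiency around 90% (the stress-test figure) and one automatic resubmission, and assuming attempts fail independently, the residual failure rate comes out at about the quoted 1%:

```python
# Rough consistency check of the job-submission figures quoted above.
# Assumes independent attempts; 0.10 is the per-attempt failure rate
# implied by "efficiency over 90%" in the stress tests.
p_fail = 0.10
residual = p_fail ** 2   # initial submission plus one resubmission
print(f"{residual:.0%}")  # 1%
```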

8
Information Systems
  • R-GMA is a big improvement on MDS
  • Tables, SQL queries, much easier to publish, ...
  • Largely a personal view, experiments have mostly
    not used it yet
  • Took a very long time to become stable: during
    the D8.4 evaluation R-GMA availability was O(75%)
  • Latest version installed for the EU review looks
    much better: total end-to-end efficiency is now
    > 95%, and R-GMA is 100% (but the testbed is now
    lightly loaded)
  • NO SECURITY!
  • And no Registry/schema replication
  • Need to check published information for accuracy
    (or at least sanity!)
  • GLUE schema is not in EDG/LCG control, and has
    proved very hard to change
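The "tables, SQL queries" point above is R-GMA's main advance over MDS: published monitoring tuples look like relational tables and are queried with plain SQL. A minimal sketch of that model, using Python's built-in sqlite3 as a stand-in for R-GMA's consumer interface (the table and column names here are illustrative, not the actual GLUE schema):

```python
import sqlite3

# Stand-in for R-GMA's relational view of published information.
# GlueCE-style table with hypothetical columns, not the real schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE GlueCE (UniqueID TEXT, FreeCPUs INTEGER)")
con.executemany("INSERT INTO GlueCE VALUES (?, ?)", [
    ("ce01.example.org", 12),
    ("ce02.example.org", 0),
])

# The kind of query R-GMA makes easy: ordinary SQL over published tuples.
rows = con.execute(
    "SELECT UniqueID FROM GlueCE WHERE FreeCPUs > 0"
).fetchall()
print(rows)  # [('ce01.example.org',)]
```

Compare this with MDS, where the same question requires an LDAP search filter against a hierarchical tree rather than a one-line SQL query.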

9
Replica Management
  • Now mostly just works
  • Command line tools are fairly intuitive
  • Sometimes processes can hang
  • Orphan processes sometimes left behind when job
    ends
  • Some inconsistencies found when used with POOL
  • Interaction with SE schema is still unclear
  • Works, but gives artificial restrictions on NFS
    access
  • Bulk operations, mirroring and client-server
    architecture lost with GDMP
  • Java command-line tools are very slow (tens of
    seconds)
  • Fault tolerance is important: error conditions
    should leave things in a consistent state, and
    failures should be retried where possible

10
Replica Catalogues
  • Oracle/MySQL catalogues are much better than LDAP
    in 1.4
  • Tested up to O(100k) entries, no degradation seen
  • But need to cope with millions
  • At 10 seconds per file it would take 4 months
    to register a million files!
  • Queries can be very slow due to inefficient
    transport of data
  • 30 minutes to return 45k entries
  • Java runs out of memory on bigger queries
  • Distributed LRC/RLI not deployed
  • NO SECURITY! (Integrated but not deployed)
  • Still no consistency checking against SE content
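The "4 months" figure above follows directly from the per-file cost; the arithmetic is worth spelling out:

```python
# Time to register a million files at 10 seconds per file,
# as claimed on the slide above.
seconds_per_file = 10
files = 1_000_000
days = seconds_per_file * files / 86_400   # 86 400 seconds per day
print(f"{days:.0f} days, roughly {days / 30:.0f} months")  # 116 days, roughly 4 months
```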

11
Mass Storage
  • Always the most problematic area, and still not
    solved
  • LCG2 is still using the "classic" SE, but it is
    only a stop-gap
  • SRM should be the solution (?), WP5 SE is the EDG
    version
  • Works, but many rough edges, really still a
    prototype
  • No disk space management
  • Error reporting is poor, not fault-tolerant
  • Too much logging, not helpful for a system
    manager
  • Configuration is complex and fragile
  • Also dCache, CASTOR SRM, Enstore SRM
  • But still not production-quality?
  • What is the way forward?

12
VO Management
  • Current LDAP-based system works fairly well, but
    has many limitations
  • VO servers are a single point of failure
  • VOMS looks good, but not yet deployed or fully
    integrated
  • Or documented!
  • Middleware groups seem to have a different
    security model to VOMS designers
  • E.g. they usually assume one and only one VO
  • VO defines service (Replica Catalogue, SE
    namespace) and not authorisation
  • Experiments will need to gain experience about
    how a VO should be run

13
User View of the Testbed
  • Site configuration is very complex; there is
    usually one way to get it right and many ways to
    get it wrong
  • LCFG is a big help in ensuring uniform
    configuration
  • Middleware should be self-configuring (and
    self-checking) as far as possible
  • Need well-defined certification procedures,
    checked on an ongoing basis (sites decay with a
    half-life of a few weeks)
  • Services should fail gracefully when they hit
    resource limits
  • The grid must be robust against failures and
    misconfiguration. Large grids will always be
    broken, so errors are not exceptional!
  • Many HEP experiments require outbound IP
    connectivity from worker nodes
  • Still no solution, discussion is needed
  • Scalability? Still only 20 sites and 1 job/minute!
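The "half-life of a few weeks" remark above can be made concrete. Assuming exponential decay with a hypothetical 3-week half-life (the slide only says "a few weeks") and the ~20 sites on the testbed:

```python
# Expected number of sites still correctly configured after t weeks,
# assuming exponential decay. The 3-week half-life is an assumed
# illustrative value, not a measured one.
half_life = 3.0   # weeks
sites = 20
for weeks in (0, 3, 6):
    print(weeks, sites * 0.5 ** (weeks / half_life))
```

After six weeks only a quarter of the sites would still be in a good state, which is why ongoing certification checks, not one-off ones, are needed.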

14
Gaps
  • Disk space management on worker nodes
  • Some discussion, nothing appeared
  • Analysis of scheduling algorithms
  • EstimatedResponseTime is not optimal
  • Pre-replication by the broker
  • Information about networking at the LAN level
  • Where are the network bottlenecks?
  • Distribution of experiment software (now being
    tackled in LCG)
  • Enforcement of quotas (whose job is this?)
  • Documentation
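The EstimatedResponseTime point above concerns how the broker ranks candidate Computing Elements. A minimal sketch of rank-by-EstimatedResponseTime matchmaking shows why it can be sub-optimal: the broker simply picks the CE advertising the smallest estimate, ignoring data location, network cost, and the staleness of the published value (CE names and numbers here are illustrative, not real published data):

```python
# Naive rank-by-EstimatedResponseTime matchmaking sketch.
# Values are hypothetical estimates in seconds, as a broker might
# read them from the information system.
estimated_response_time = {
    "ce01.example.org": 120,
    "ce02.example.org": 30,
    "ce03.example.org": 600,
}
best = min(estimated_response_time, key=estimated_response_time.get)
print(best)  # ce02.example.org
```

If the input data lives next to ce03, sending the job to ce02 may still be the slower choice overall, which is the kind of gap the slide is pointing at.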