1
EGEE Middleware
Robin Middleton, with much (most) material from Bob Jones, Frederic Hemmer, Erwin Laure
GridPP EB-TB Meeting, 13th May 2004
www.eu-egee.org
EGEE is a project funded by the European Union
under contract IST-2003-508833
2
Contents
  • Introduction
  • EGEE structure and activities
  • Middleware (JRA1)
  • Organisation
  • Design Team
  • Service Oriented Architecture
  • Initial Components
  • EGEE Middleware Prototype
  • Integration, Testing & SCM
  • JRA1 External Links

3
Introduction
4
Overview
  • 70 partners (funded), plus many unfunded
    contributions
  • 11 Federations
  • €32M EU funds → €60M in total
  • 2 years (initially)
  • started 1st April 2004
  • The EGEE Vision
  • To deliver production level Grid services, the
    essential elements of which are manageability,
    robustness, resilience to failure, and a
    consistent security model, as well as the
    scalability needed to rapidly absorb new
    resources as these become available, while
    ensuring the long-term viability of the
    infrastructure.
  • To carry out a professional Grid middleware
    re-engineering activity in support of the
    production services. This will support and
    continuously upgrade a suite of software tools
    capable of providing production level Grid
    services to a base of users which is anticipated
    to rapidly grow and diversify.
  • To ensure an outreach and training effort which
    can proactively market Grid services to new
    research communities in academia and industry,
    capture new e-Science requirements for the
    middleware and service activities, and provide
    the necessary education to enable new users to
    benefit from the Grid infrastructure.

5
EGEE Implementation
  • From day 1 (1st April 2004)
  • Production grid service based on the LCG
    infrastructure running LCG-2 grid middleware (SA)
  • LCG-2 will be maintained until the new generation
    has proven itself (fallback solution)
  • In parallel develop a next generation grid
    facility (JRA)
  • Produce a new set of grid services according to
    evolving standards (Web Services)
  • Run a development service providing early access
    for evaluation purposes
  • Will replace LCG-2 on production facility in 2005

6
EGEE Activities
7
Orientation
(Slide shows the equivalent EDG Work Packages/Groups alongside the activities: WP6, WP7, WP1-5, 6, QAG, Security Group, WP12, WP11, WP8-10)
  • EGEE includes 11 activities
  • Services
  • SA1 Grid Operations, Support and Management
  • SA2 Network Resource Provision
  • Joint Research
  • JRA1 Middleware Engineering and Integration
  • JRA2 Quality Assurance
  • JRA3 Security
  • JRA4 Network Services Development
  • Networking
  • NA1 Management
  • NA2 Dissemination and Outreach
  • NA3 User Training and Education
  • NA4 Application Identification and Support
  • NA5 Policy and International Cooperation

8
Services Activities
  • SA1 Grid Operations & Support
  • Objectives: Create & operate a production-quality
    infrastructure
  • 48 partners, approx 45% of total project budget
  • regional structure
  • Builds on the existing LCG infrastructure to
    provide expanded grid facility for many
    application domains
  • SA2 Network Resource Provision
  • Objectives: Ensure EGEE access to network
    services provided by GEANT and the NRENs to link
    users, resources and operational management
  • 3 partners, approx 1.5% of total project budget
  • Most work will be associated with defining
    SLRs/SLAs

9
Joint Research Activities
  • JRA1 Middleware Engineering and Integration
  • Objectives
  • Provide robust, supportable middleware components
  • Integrate grid services to provide a consistent
    functional basis for the EGEE grid infrastructure
  • Verify the middleware forms a dependable and
    scalable infrastructure that meets the needs of a
    large, diverse eScience user community
  • 5 partners, approx 16% of total project budget
  • Middleware design team active
  • Core software team has been working quickly to
    produce the design of an initial prototype
  • Taking input from HEP ARDA project as well as
    final requirements/assessments from EDG project
  • Initial prototype foreseen at end of April
  • Not all services implemented, not for general
    distribution
  • EDG testbed infrastructure being reused for JRA1
    clusters

10
Joint Research Activities (II)
  • JRA4 Network Services Development
  • Objectives: Network-oriented joint research to
    provide end-to-end services
  • Network reservation, performance monitoring and
    diagnostics tools
  • Explore links to how Grid resources are
    organised/allocated
  • Investigation of the potential impact of IPv6 on
    grids
  • 5 partners, approx 2.5% of total project budget
  • Tight collaboration with DANTE and the NRENs,
    especially through future GN2 project and
    potential network oriented FP6 projects
  • JRA3 Security
  • Objectives: Enable secure European Grid
    infrastructure operation
  • Overall security architecture and framework
  • Policies to be adopted by other EGEE activities
    (middleware, operations etc.)
  • 5 partners, approx 3% of total project budget
  • JRA2 Quality Assurance
  • Objectives: Foster production delivery of
    quality Grid software & operations
  • 2 partners, approx 2% of total project budget
  • Many procedures and guidelines already defined

11
Networking Activities
  • NA4 Application identification and support
  • Objectives: Identify and support a broad range of
    applications from diverse domains, starting with
    the pilot domains HEP and Biomedical
  • 20 partners, approx 12.5% of total project
    budget
  • ARDA project is the interface with HEP applications
  • Initial BMI applications identified
  • Industrial forum set-up in a self-financing mode
  • NA3 User Training and Induction
  • Objectives: Develop a training programme addressing
    beginners and advanced users. Internal EGEE
    induction courses.
  • 22 partners, approx 4% of total project budget
  • Plans for initial training courses well advanced
  • Will be able to offer training in the summer on
    dedicated infrastructure
  • NA2 Dissemination and Outreach
  • Objectives: Disseminate the benefits of the EGEE
    infrastructure to new user communities
  • 20 partners, approx 5% of total project budget

12
Who's who
13
JRA1
  • Middleware (Re-)Engineering & Integration

14
JRA1 Organisation
15
Software Clusters
  • Tools, Testing & Integration (CERN) clusters
  • Development clusters
  • UK
  • CERN
  • IT/CZ
  • Nordic
  • Clusters have a reasonable sized (distributed)
    development testbed
  • Taken over from EDG
  • Nordic cluster to be finalized
  • Link with Integration & Tools clusters
    established
  • Clusters up and running!
  • Nordic (security) cluster → JRA3

16
Design Team
  • Formed in December 2003
  • Current members
  • UK: Steve Fisher
  • IT/CZ: Francesco Prelz
  • Nordic: David Groep
  • VDT: Miron Livny
  • CERN: Predrag Buncic, Peter Kunszt,
    Frederic Hemmer, Erwin Laure
  • Started service design based on component
    breakdown defined by the LCG ARDA RTAG
  • Leverage experiences and existing components from
    AliEn, VDT, and EDG.
  • A working document
  • Overall design & APIs
  • https://edms.cern.ch/document/458972
  • Basis for architecture (DJRA1.1) and design
    (DJRA1.2) document

17
Guiding Principles
  • Lightweight (existing) services
  • Easily and quickly deployable
  • Interoperability
  • Allow for multiple implementations
  • Resilience and Fault Tolerance
  • Co-existence with deployed infrastructure
  • Run as an application
  • Service oriented approach
  • Follow WSRF standardization
  • No mature WSRF implementations exist to date,
    hence start with plain WS; WSRF compliance is
    not an immediate goal
  • Review situation end 2004

18
High Level Service Decomposition
  • Taken from the ARDA blueprint

  • Some services have no clear attribution to a
    cluster (according to the TA)
  • Some services involve collaboration of multiple
    clusters

19
Initial Focus
  • Data management
  • Storage Element
  • SRM-based; allow POSIX-like access
  • Workload management
  • Computing Element
  • Allow pull and push mode
  • Information and monitoring
  • Security
  • Need to integrate components with quite different
    security models
  • Start with a minimalist approach based on VOMS
    and myProxy

20
Storage Element
  • Strategic SE
  • High QoS: reliable, safe, ...
  • Usually has an MSS
  • Place to keep important data
  • Needs people to keep running
  • Heavyweight
  • Tactical SE
  • Volatile, lightweight space
  • Enables sites to participate in an opportunistic
    manner
  • Lower QoS

21
Storage Element Interfaces
  • SRM interface
  • Management and control
  • SRM (with possible evolution)
  • POSIX-like File I/O
  • File Access
  • Open, read, write
  • Not real POSIX (like rfio)

(Diagram: the user sees an SRM management interface and a POSIX-like File I/O API, layered over access protocols such as rfio, dcap, chirp and aio, on top of storage systems including dCache, NeST, Castor and plain disk.)
22
Catalogs
  • File Catalog
  • Filesystem-like view on logical file names
  • Replica Catalog
  • Keep track of replicas of the same file
  • (Meta Data Catalog)
  • Attributes of files on the logical level
  • Boundary between generic middleware and
    application layer
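The catalog split above can be sketched as two small in-memory services: a File Catalog giving a filesystem-like view on logical names, and a Replica Catalog tracking physical copies per file identifier. Names and schema are illustrative, not the EGEE design.

```python
# Toy model of the File Catalog / Replica Catalog split. Illustrative only.

class FileCatalog:
    """Filesystem-like view: logical file names -> file identifiers."""

    def __init__(self):
        self._by_lfn = {}  # logical file name -> guid

    def register(self, lfn, guid):
        self._by_lfn[lfn] = guid

    def lookup(self, lfn):
        return self._by_lfn[lfn]

    def list_dir(self, prefix):
        # Names under a logical "directory" prefix.
        return sorted(n for n in self._by_lfn if n.startswith(prefix))


class ReplicaCatalog:
    """Tracks the physical replicas of each logical file."""

    def __init__(self):
        self._replicas = {}  # guid -> set of storage URLs

    def add_replica(self, guid, surl):
        self._replicas.setdefault(guid, set()).add(surl)

    def replicas(self, guid):
        return sorted(self._replicas.get(guid, set()))
```

A metadata catalog would attach application-level attributes to the same logical names, which is why it sits at the boundary to the application layer.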

23
Files and Catalogs Scenario
24
Computing Element
  • Layered service interfacing
  • various batch systems (LSF, PBS, Condor)
  • Grid systems like GT2, GT3, and Unicore
  • CondorG as queuing system on the CE
  • Allows CE to be used in push and pull mode
  • Call-out module to change job ownership
    (security)
  • Lightweight service
  • should be possible to dynamically install, e.g.
    within an existing Globus gatekeeper
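The push/pull duality that a CondorG-style queue gives the CE can be illustrated with a toy queue: in push mode the workload manager enqueues a job at the CE, while in pull mode an idle resource fetches the next job it can satisfy. Class and method names here are hypothetical, not the CondorG interface.

```python
# Toy sketch of a CE job queue supporting both push and pull modes.
from collections import deque


class ComputingElementQueue:
    def __init__(self):
        self._jobs = deque()

    def push(self, job):
        # Push mode: the workload manager decides where the job runs
        # and places it directly on this CE's queue.
        self._jobs.append(job)

    def pull(self, requirements):
        # Pull mode: a resource asks for the next job it can satisfy.
        for _ in range(len(self._jobs)):
            job = self._jobs.popleft()
            if requirements(job):
                return job
            self._jobs.append(job)  # not runnable here, keep queued
        return None
```

The same queue thus serves both modes; only who initiates the transfer of the job differs.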

25
Information Service
  • Adopt a common approach to information and
    monitoring infrastructure.
  • There may be a need for specialised information
    services
  • e.g. accounting, package management, grid
    information, monitoring, provenance, logging
  • these may be built on an underlying information
    service
  • A range of visualisation tools may be used
  • Using R-GMA
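A relational publish/query model in the R-GMA spirit can be sketched as below: producers insert tuples into a named virtual table, and consumers run selections over it. This in-memory class is purely illustrative and is not the R-GMA API.

```python
# Toy relational information service: producers publish rows into
# named virtual tables; consumers query them with a predicate.

class InformationService:
    def __init__(self):
        self._tables = {}  # table name -> list of row dicts

    def publish(self, table, row):
        # Producer side: append a tuple to the virtual table.
        self._tables.setdefault(table, []).append(dict(row))

    def query(self, table, predicate=lambda r: True):
        # Consumer side: a simple selection over the table.
        return [r for r in self._tables.get(table, []) if predicate(r)]
```

Specialised services (accounting, logging, provenance) could then be built as particular tables and queries over the same underlying model.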


26
Authentication/Authorization
  • Different models and mechanisms
  • Authentication based on Globus/GSI, AFS, SSH,
    X509, tokens
  • Authorization
  • AliEn: exploits mechanisms of the RDBMS backend
  • EDG: gridmap file, VOMS credentials and
    LCAS/LCMAPS
  • VDT: gridmap file, CAS, VOMS (client)
  • Security and protection at a level acceptable by
    fabric managers and end users needs to be
    discussed and blessed in advance.

27
A minimalist approach to security
  • Need to integrate components with quite different
    security models
  • Start with a minimalist approach
  • Based on VOMS (proxy issuing) and myProxy (proxy
    store)
  • User stores proxy in myProxy from where it can be
    retrieved by access services and sent to other
    services
  • Credential chain needs to be preserved
  • Allow service to authenticate client
  • Local authorization could be done via LCAS if
    required
  • User is mapped to group accounts or components
    like LCMAPS are used to assign local user identity
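The proxy flow above (the user deposits a proxy in a store, from where an access service retrieves it and verifies the credential chain before acting on the user's behalf) can be modelled with a small sketch. The store class and the chain representation are assumptions for illustration, not VOMS or myProxy interfaces.

```python
# Toy model of the proxy-store flow: deposit, retrieve, verify chain.

class ProxyStore:
    """Minimal stand-in for a myProxy-like credential store."""

    def __init__(self):
        self._proxies = {}

    def deposit(self, user, proxy):
        self._proxies[user] = proxy

    def retrieve(self, user):
        return self._proxies[user]


def chain_is_valid(proxy):
    # A proxy carries its delegation chain; in this toy check, each
    # link must be issued by the subject of the previous link.
    chain = proxy["chain"]
    return all(chain[i]["issuer"] == chain[i - 1]["subject"]
               for i in range(1, len(chain)))
```

Preserving the chain is what lets a downstream service authenticate the original client rather than just the intermediate service.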

28
Towards a prototype
  • Focus on key services discussed; exploit existing
    components
  • Initially an ad-hoc installation at CERN and
    Wisconsin
  • Aim to have first instance ready by end of April
  • Open only to a small user community
  • Expect frequent changes (also API changes) based
    on user feedback and integration of further
    services
  • Enter a rapid feedback cycle
  • Continue with the design of remaining services
  • Enrich/harden existing services based on early
    user-feedback

This is not a release! It's purely an ad-hoc
installation
29
Planning
  • Evolution of the prototype
  • Envisaged status at end of 2004
  • Key services need to fulfill all requirements
    (application, operation, quality, security, ...)
    and form a deployable release
  • Remaining services available as prototype
  • Need to develop a roadmap
  • Incremental changes to prototype (where possible)
  • Early user feedback through ARDA and early
    deployment on SA1 pre-production service
  • Detailed release plan being prepared
  • Converge prototype work with integration &
    testing activities
  • Need to get rolling now!
  • First components will start using SCM in May

30
Integration
  • A master Software Configuration Plan is being
    finalized now
  • It contains basic principles and rules about the
    various areas of SCM and Integration (version
    control, release management, build systems, bug
    tracking, etc)
  • Compliant with internationally agreed standards
    (ISO 10007:2003(E), IEEE SCM Guidelines series)
  • Most EGEE stakeholders have already been involved
    in the process to make sure everybody is aware
    of, contributes to and uses the plan
  • An EGEE JRA1 Developer's Guide will follow
    shortly in collaboration with JRA2 (Quality
    Assurance) based on the SCM Plan
  • It is of paramount importance to deliver the plan
    and guide as early as possible in the project
    lifetime

31
Testing
  • The 3 initial testing sites are CERN, NIKHEF and
    RAL
  • More sites can join the testing activity at a
    later stage!
  • Must fulfil site requirements
  • Testing activities will be driven by the test
    plan document
  • Test plan being developed based on user
    requirements documents
  • Application requirements from NA4: HEPCAL III,
    AWG documents, bio-informatics requirements
    documents from EDG
  • Deployment requirements being discussed with SA1
  • ARDA working document for core Grid services
  • Security: work with JRA3 to design and plan
    security testing
  • The test plan is a living document; it will
    evolve to remain consistent with the evolution of
    the software
  • Coordination with NA4 testing and external groups
    (e.g. Globus) established
  • Solid steps towards MJRA1.3 (PM5)

32
Convergence with Integration & Testing
  • Development clusters need to get used to SCM
  • During May, initial components of the prototype
    need to follow SCM
  • Proposed components
  • R-GMA
  • VOMS
  • RLS
  • GFAL (is this 3rd party?)
  • SRM (will there be an EGEE implementation or just
    3rd party?)
  • New developments need to follow SCM from the
    beginning
  • ISSUE: Perl modules seem not to fit well

33
Convergence with Integration & Testing II
  • IT/CZ
  • Put EDG code under SCM for training purposes and
    be prepared to move components to EGEE when
    needed
  • VOMS in May
  • UK
  • Full R-GMA under SCM in May
  • CERN/DM
  • RLS in May
  • GFAL ?

34
Development Roadmap
  • Prototype work as starting point
  • Priorities need to be adjusted based on user
    feedback
  • Incremental, frequent releases
  • All discussions and decisions take place in the
    design team
  • Project-wide body being formed to oversee this
    activity
  • PTF → Project Technical Forum
  • Boundary conditions
  • Architecture document due end of Month 3 (June)
  • Design document due end of Month 5 (August)

35
JRA1/SA1 - Process description
  • No official delivery of requirements from SA1 to
    JRA1 stated in the TA
  • The definition, discussion and agreement of the
    requirements has already started, done through
    dedicated meetings
  • This is an ongoing process
  • Not all the requirements defined yet
  • Set of requirements agreed; need a basic
    agreement to start working! It can be reviewed at
    any time there is a valid reason for it

36
JRA1/SA1 - Requirements
  • Middleware delivery to SA1
  • Release management
  • Deployment scenarios
  • Middleware configuration
  • JRA1 will provide a standard set of configuration
    files and documentation with examples that SA1
    can use to design tools. Format to be agreed
    between SA1-JRA1
  • It is the responsibility of SA1 to provide
    configuration tools to the sites
  • Enforcement of the procedures
  • Platforms to support
  • Primary platform: Red Hat Enterprise 3.0, gcc
    3.2.3 and icc 8 compilers (both 32- and 64-bit)
  • Secondary platform: Windows (XP/2003), VC 7.1
    compiler (both 32- and 64-bit)
  • Versions for compilers, libraries, third party
    software
  • Programming languages
  • Packaging and software distribution
  • Others
  • Sites must be allowed to organize the network as
    they wish: internal or external connectivity,
    NAT, firewall, etc. must all be possible, with no
    special constraints. WNs must not require
    outgoing IP connectivity, nor inbound
    connectivity

37
JRA1/JRA3
  • A lot of progress has been achieved here
  • Security Group formed, JRA1 members identified
  • First meeting scheduled on May 5-6, 2004
  • GAP analysis planned by then
  • VOMS Administration support clarified
  • Handled by JRA3
  • Issue: VOMS effort reporting

38
JRA1/JRA4
  • SCM plan presented and discussed
  • More discussions on which components of JRA4 will
    be required in the overall architecture/design
    need to take place

39
American Involvement in JRA1
  • UWisc
  • Miron Livny part of the Design Team
  • Condor Team actively involved in reengineering
    resource access
  • In collaboration with Italian Cluster
  • ISI
  • Identification of potential contributions started
    (e.g. RLS)
  • Focused discussions being planned
  • Argonne
  • Collaboration on Testing started
  • Support for key Globus component enhancements
    being discussed

40
JRA1 and other activities
  • NA4
  • HEP: ARDA project started; ensures close
    relations between HEP and middleware
  • Bio: activities in a similar spirit needed;
    focused meeting tentatively planned for May
  • SA1
  • Revision of requirements (platforms)
  • JRA2
  • QAG started
  • Monthly meeting established
  • JRA3
  • Necessary structures established
  • Focused Meeting in May
  • JRA4
  • Architectural components required need to be
    clarified
  • Other projects
  • Potential drain of resources for dissemination
    activities

41
The End
43
UK Cluster
  • R-GMA
  • Interface to various graphical tools
  • Monitoring largely driven by application and
    infrastructure needs
  • Information system: clarify role in
    job-submission/data-mgmt cycles (e.g. role of
    GLUE)
  • Interface to other monitoring systems (e.g.
    Grid3)
  • Understand R-GMA role in
  • Accounting
  • Job provenance
  • Logging & bookkeeping

44
IT/CZ Cluster
  • Resource Access (aka CE)
  • Interface to various batch systems
  • Starting with the integration of CondorG
  • JobFetcher (implementing pull model)
  • Site policy mgmt, enforcement, and advertisement
  • WMS
  • High level optimizer components at TaskQueue
  • Matchmaking
  • Job adjustment
  • VO policy management and enforcement
  • TaskQueue interactions

45
IT/CZ Cluster II
  • Accounting
  • LCG accounting system (usage records) has to be
    considered
  • Role of DGAS needs to be understood
  • LB
  • Assessment of its role in
  • Accounting
  • Job provenance
  • Relationship to R-GMA
  • VOMS
  • Relationship to JRA3
  • Integration into Access Service

46
CERN/DM Cluster
  • SE
  • Posix-like file I/O
  • GFAL/aio relationship
  • SRM interface
  • Will EGEE provide an implementation, will we ship
    an implementation, or will we just make it a
    requirement?
  • Space reservation not in v1.1; migration path to
    v2.1?
  • File Catalog
  • Schema evolution/customization to different
    user-groups
  • Server implementation
  • Metadata catalog interaction

47
CERN/DM Cluster II
  • Replica Catalog
  • Deployment model (wrt File Catalog)
  • Schema evolution
  • Distributed Catalog
  • Metadata Catalog
  • Mostly in application domain
  • File Transfer Service
  • Overlaps with WMS/CE
  • Local and global (VO) policy enforcement
  • Error handling and recovery; transaction handling
    and boundaries; load-balancing and fail-over
    modes
  • Upgrade resilience
  • Data subscription service
  • GDMP functionality
  • How does it relate to the FTS?
  • Integration into Access Service