Title: Grid Analysis Environment (GAE)
1 Grid Analysis Environment (GAE)
2 Outline
System View
Frameworks
Early Results
GAE
User View
Associated Projects
3 Goal
- Provide a transparent environment for a physicist to perform his/her analysis (batch/interactive) in a distributed dynamic environment: identify your data (catalogs), submit your (complex) job (scheduling, workflow, JDL), get fair access to resources (priority, accounting), monitor job progress (monitor, steering), get the results (storage, retrieval), repeat the process and refine results
- Support data transfers ranging from the (predictable) movement of large-scale (simulated) data to the highly dynamic analysis tasks initiated by rapidly changing teams of scientists
4 System View
[Diagram: layered architecture with (domain) applications and a (domain) portal on top, monitoring alongside, global services (high level services) above a service oriented architecture (frameworks) and local services, all running on network, compute, and storage resources; interface specifications connect the layers. System stages cycle through development, testing, deployment, and support/feedback.]
5 System View (Details)
- Domains
- Virtual Organization and Role management
- Service Oriented Architecture
- Authorized Access
- Access Control Management (groups/individuals)
- Discoverable
- Protocols (XML-RPC, SOAP, ...)
- Service Version Management
- Frameworks: Clarens, MonALISA, ...
- Monitoring
- End-to-end monitoring, collecting and disseminating information
- Provide Visualization of Monitor Data to Users
6 System View (Details)
- Local Services (Local View)
  - Local Catalogs, Storage Systems, Task Tracking (Single User Tasks), Policies, Job Submission
- Global Services (Global View)
  - Discovery Service, Global Catalogs, Policies
- High Level Services (Autonomous)
  - Act on monitor data and have a global view
  - Scheduling, Data Transfer, Network Optimization, Task Tracking (many users)
7 System View (Details)
- (Domain) Portal
  - One Stop Shop for Applications/Users to access and use Grid Resources
  - Task Tracking (Single User Tasks)
  - Graphical User Interface
  - User session logging (provide feedback when failures occur)
- (Domain) Applications
  - ORCA/COBRA, IGUANA, PHYSH, ...
8 Framework (MonALISA)
[Diagram: monitor sensors in web services (WS) and applications (App) (1) publish to MonALISA station servers (SS), which (2) disseminate the data over the MonALISA JINI network; MonALISA-based application servers (AppS) (3) subscribe to the data and (4) steer/retrieve the web services and applications.]
- Service/Software Discovery
- Policy Dissemination
- Supporting Global and High Level Services
- ..
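The publish/disseminate/subscribe flow above can be sketched as a few lines of Python. This is an illustrative in-memory model only (the class, metric, and host names are invented); the real MonALISA framework uses JINI discovery and remote listeners rather than local callbacks.

```python
# In-memory sketch of the MonALISA publish/subscribe flow: a "station server"
# collects metrics from monitor sensors (publish) and forwards them to
# subscribed application servers (disseminate/subscribe).

class StationServer:
    def __init__(self):
        self.subscribers = []   # list of (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        """An application server registers interest in matching metrics."""
        self.subscribers.append((predicate, callback))

    def publish(self, metric):
        """Called by monitor sensors; disseminates to matching subscribers."""
        for predicate, callback in self.subscribers:
            if predicate(metric):
                callback(metric)

station = StationServer()
received = []
# An application server subscribes to CPU-load metrics only.
station.subscribe(lambda m: m["name"] == "cpu_load", received.append)
# Sensors publish metrics; only matching ones reach the subscriber.
station.publish({"host": "tier2.caltech.edu", "name": "cpu_load", "value": 0.71})
station.publish({"host": "tier2.caltech.edu", "name": "disk_free", "value": 120.0})
```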
9 Framework (Clarens)
- Authentication (X509)
- Access control on Web Services
- Remote file access (and access control on files)
- Discovery of Web Services and Software
- Shell service: shell-like access to remote machines (managed by access control lists)
- Proxy certificate functionality
- Group management: VO and role management
- Good performance of the Web Service Framework
- Integration with MonALISA
[Diagram: a Clarens client connects over http/https to the Clarens web server, which exposes services and 3rd-party applications via XML-RPC, SOAP, Java RMI, JSON-RPC, ...]
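Since Clarens speaks standard XML-RPC over http/https, the wire format of a client call can be shown with Python's standard library alone. The method name `file.ls` and its argument below are illustrative placeholders, not the actual Clarens API.

```python
import xmlrpc.client

# Serialize an XML-RPC request of the kind a Clarens client would POST to
# the Clarens web server. No network access is needed to inspect the format.
request = xmlrpc.client.dumps(("/store/data",), methodname="file.ls")

# A real client would instead do something like (endpoint URL is invented):
#   proxy = xmlrpc.client.ServerProxy("https://tier2.caltech.edu/clarens")
#   listing = proxy.file.ls("/store/data")

# Round-trip the request to show what the server would decode.
params, method = xmlrpc.client.loads(request)
```

The same request/response shape applies to the other listed protocols (SOAP, JSON-RPC); only the envelope encoding differs.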
10 (Single) User View (Analysis)
[Diagram: numbered analysis flow connecting the client application, steering, a dataset service, discovery, catalogs, and the planner/scheduler and job submission services.]
- Catalogs to select datasets,
- Resource Application Discovery
- Schedulers guide jobs to resources
- Policies enable fair access to resources
- Robust (large size) data (set) transfer
[Diagram, continued: execution, storage management, monitor information, data transfer, and policy services handle thousands of user jobs (multi-user environment).]
- Feedback to users (e.g. status of their jobs)
- Crash recovery of components (identify and restart)
- Provide secure authorized access to resources and services
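The catalog-select-schedule-submit chain above can be sketched end to end with every grid service replaced by an in-memory stub. All dataset names, site names, and load numbers here are invented for illustration; the scheduling policy (least-loaded site, as a monitor service might report it) stands in for the real policy-based scheduler.

```python
# Stub catalog: dataset name -> sites holding it (values are invented).
CATALOG = {"Zmumu_2004": ["tier2.caltech.edu", "tier2.ufl.edu"]}

def select_dataset(name):
    """Catalogs to select datasets: resolve a dataset to candidate sites."""
    return name, CATALOG[name]

def schedule(sites, load):
    """Scheduler guides the job to a resource: pick the least-loaded site."""
    return min(sites, key=lambda s: load.get(s, 0.0))

def submit(dataset, site):
    """Job submission; the returned status supports user feedback later."""
    return {"dataset": dataset, "site": site, "status": "running"}

# Load figures as a monitoring service might report them (invented values).
load = {"tier2.caltech.edu": 0.8, "tier2.ufl.edu": 0.2}
name, sites = select_dataset("Zmumu_2004")
job = submit(name, schedule(sites, load))
```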
11 Projects associated with Grid Enabled Analysis
- DISUN (deployment)
  - Deployment and Support for Distributed Scientific Analysis
- Ultralight (development)
  - Treating the network as a resource
  - Vertically integrated monitor information
  - Multi-user, resource-constrained view
- MCPS (development)
  - Provide Clarens-based Web Services for batch analysis (workflow)
- SPHINX (development)
  - Policy-based scheduling (global service) exposed as a Clarens Web Service using MonALISA monitor information
- SRM/dCache (development)
  - Service-based data transfer (local service)
- Lambda Station (development)
  - Authorized programmability of routers using MonALISA and Clarens
- PHYSH
  - Clarens-based services for command line user analysis
- CRAB
  - Client to support user analysis using the Clarens Web Service Framework
Identify complementary features and integrate
12 Projects associated with Grid Enabled Analysis
- Clarens_Application (development/testing)
  - Logging functionality
  - Providing Web Services for catalogs
  - Steering
  - Portal development (GUI)
  - Remote file access
  - Distributed testing environment
- MonALISA_Application (development)
  - Monitor applications for network, compute and storage
  - Providing an interface to accounting systems
  - ...
- OSG (deployment/testing)
  - Privilege Project
  - Policy Project
- PHEDEX
  - Data transfer
- Condor
  - High throughput computing
- ...
Identify complementary features and integrate
13 Combining Grid Projects into the Grid Analysis Environment
[Diagram: development projects (Clarens_Applications, MonALISA_Applications, PHEDEX, SPHINX, CRAB, Ultralight, SRM/dCache, Lambda Station, PHYSH, MCPS) and deployment projects (OSG, DISUN, Policy, Privilege Project, Condor, ...) feed the Grid Analysis Environment through a development, testing, deployment, and support/feedback cycle, built on the MonALISA, Clarens, ... frameworks.]
GAE focuses on integration
14 Early Results
15 GAE Deployment
- Clarens has been deployed on 30 machines at sites including Caltech, Florida, Fermilab, CERN, Pakistan, INFN
- Multiple service instances have been deployed on several Clarens servers. Different sets of service instances are deployed on each server to mimic a realistic distributed service environment.
- Installation of CMS (ORCA, COBRA, IGUANA, ...) and LCG (POOL, SEAL, ...) software on the Caltech GAE testbed. Serves as an environment to integrate applications as web services into the Clarens framework.
- Work with CERN to have the GAE components included in the CMS software distribution.
- GAE components being integrated in the DPE and VDT distributions used in US-CMS.
- Demonstrated a distributed multi-user GAE prototype at SC03.
- Ultimate goal: the GAE backbone (Clarens) deployed on all tier-N sites; associated with the different Clarens web servers will be (GAE) services that interface with CMS and LCG software, to enable physicists to perform analysis in a distributed environment.
- PHEDEX deployed at Caltech, UFL, UCSD and transferring data
- UFL submitting analysis jobs with CRAB
16 GAE Deployment
- Prototype completed Jan 14 @ Caltech-FL workshop
- Now extending prototype functionality
- Now involving physicists/early adopters
- First round of optimized data transfer (UAE milestone) coincides with the CMS 10 data challenge in 2005
- 4 types of testbeds
  - Developers testbed
  - Network testbed (see network talk): ideally suited to test large-scale data movement and scalability of job submissions
  - OSG Integration Beta testbed
  - OSG Operations Alpha testbed
17 Services
[Diagram: map of services and owning sites; the legend distinguishes Clarens core services, services accessible through a web service, services with a JavaScript front end, services being developed, and services on the wish list to become a service or to interoperate with this service.]
- POOL catalog (provenance catalog) - CERN
- PubDB (data catalog) - Caltech/CERN
- BOSS (job submission) - CERN/Caltech/INFN
- TMDB (transfer catalog) - CERN
- RefDB (data, provenance catalog) - CERN/Caltech
- Monte Carlo processing service - FNAL/Caltech
- MOPDB (Monte Carlo catalog) - FNAL
- MCRunjob (Monte Carlo production) - FNAL
- CODESH - UFL
- SPHINX (scheduling) - UFL
- MonALISA (monitoring) - Caltech
- SRM (storage resource management)
- GROSS (physics analysis job submission)
- Service discovery, ACL management, VO management, file access
18 GAE Distributed Testing
19 Clarens Grid Portals
[Screenshots: Clarens portals on a PDA, job execution, the catalog service, and the Collaborative Analysis Desktop.]
20 Software and Web Service Discovery Available
21 MonALISA Integration
- Query repositories for monitor information
- Gather and publish access patterns on collections of data
- Publish web services information for discovery in other distribution systems
22 August 2004
[Diagram: demo spanning seven hosts. Clients (1) discover the POOL catalog, RefDB, and grid schedulers, (2) query the dataset service and grid scheduler for datasets, and (3) submit ORCA/ROOT job(s) with dataset(s) for reconstruction/analysis through runjob.]
Client code has no knowledge about the location of services, except for several URLs of discovery services.
Multiple clients will query and submit jobs.
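The key point of this demo, that clients hard-code only discovery URLs and resolve everything else at run time, can be modeled as a registry mapping service types to endpoints. All endpoint URLs below are invented for illustration; the real discovery service is itself a Clarens web service.

```python
# Stub discovery registry: service type -> list of endpoint URLs (invented).
REGISTRY = {
    "pool_catalog":   ["http://host3/clarens/pool"],
    "refdb":          ["http://host4/clarens/refdb"],
    "grid_scheduler": ["http://host6/clarens/sched", "http://host7/clarens/sched"],
}

def discover(service_type):
    """(1) Discover endpoints for a service type; fail if none registered."""
    endpoints = REGISTRY.get(service_type, [])
    if not endpoints:
        raise LookupError(f"no {service_type} registered")
    return endpoints

# The client resolves locations at run time instead of hard-coding them,
# so services can move or be replicated without any client changes.
schedulers = discover("grid_scheduler")
catalog_url = discover("pool_catalog")[0]
```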
23 SC04, November 2004
Scheduling: Push Model
[Diagram: a client (1) submits job(s) with dataset(s) for reconstruction/analysis to the scheduler, which (2) queries the resource status of each farm and (3) submits the job(s) through a uniform job submission layer (BOSS, PBS) to one of four farms.]
The push model has limitations once the system becomes resource limited.
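The push model reduces to a small loop: query every farm's status, then push the job to the best one. This sketch uses invented farm names and load figures; the status query stands in for a MonALISA lookup, and the "least-loaded farm" rule stands in for the real scheduling policy. The model's stated limitation is visible here too: when every farm is saturated, the scheduler must still push the job somewhere.

```python
# Invented farm load figures, as a monitoring query might return them.
farms = {"Farm 1": 0.9, "Farm 2": 0.4, "Farm 3": 0.7, "Farm 4": 0.95}

def query_resource_status(farm):
    """(2) Query resource status: stand-in for a monitoring lookup."""
    return farms[farm]

def push_schedule(job):
    """(3) Push the job to the least-loaded farm via the uniform layer."""
    target = min(farms, key=query_resource_status)
    return {"job": job, "farm": target}

placement = push_schedule("orca_analysis_001")
```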
24 November 2004
Similarity with other approaches (PEAC)
[Diagram: a client (1) discovers a global manager and (2) requests a session for a dataset; the manager (3) discovers a catalog service, (4) gets the list of farms that have this dataset, (5) reserves process time on those farms, and (6) allocates time. The client then (7) submits job(s); the farms (7) report access statistics to MonALISA and move data to nodes, (8) create the job, (9) report data moved/ready, and (10) send an alive signal during processing.]
Client code and the global manager have no knowledge about the location of services, except for several URLs of discovery services.
Multiple clients query and submit jobs.
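The session-establishment half of this PEAC-style protocol (steps 4 to 6) can be sketched with an in-memory stand-in for the catalog: the global manager finds the farms holding a dataset, reserves process time on each, and hands the session back to the client. Farm and dataset names are invented.

```python
# Invented catalog: which farms hold which datasets.
FARM_DATASETS = {
    "farm_a": {"Zmumu_2004", "ttbar_2004"},
    "farm_b": {"Zmumu_2004"},
    "farm_c": {"ttbar_2004"},
}

def request_session(dataset):
    """Global manager: build a session for one dataset request (step 2)."""
    # (4) get the list of farms that have this dataset
    farms = [f for f, held in FARM_DATASETS.items() if dataset in held]
    # (5)/(6) reserve process time on each farm and allocate it to the session
    return {"dataset": dataset,
            "reservations": {f: "allocated" for f in farms}}

session = request_session("Zmumu_2004")
```

With the session in hand, the client would proceed to submit jobs against the reserved farms (step 7 onward).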
25 PEAC test run with MonALISA
26 Lessons learned
- Quality of (the) service(s)
  - A lot of exception handling is needed for robust services (graceful failure of services)
  - Time outs are important
  - Need very good performance for composite services
  - Discovery service enables location-independent service composition
  - Semantics of services are important (different name, namespace, and/or WSDL)
- Web service design: not every application is developed with a web service interface in mind
- Interfaces of 3rd party applications change: Rapid Application Development
- Social engineering
  - Finding out what people want/need
  - Overlapping functionality of applications (but not the same interfaces!)
- Not one single solution for CMS
- Not every problem has a technical solution; conventions are also important
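Two of the lessons above, exception handling for graceful failure and the importance of time outs, come down to wrapping every remote call defensively. This is a minimal retry-with-soft-timeout sketch; `flaky_service` is an invented stand-in for a remote Clarens call that fails transiently.

```python
import time

def call_with_retry(fn, retries=3, timeout=1.0):
    """Call fn(); on exception, or if a call overruns `timeout` seconds
    (soft timeout: the result is discarded), retry up to `retries` times."""
    last_exc = None
    for _ in range(retries):
        start = time.monotonic()
        try:
            result = fn()
            if time.monotonic() - start <= timeout:
                return result        # fast enough: accept the result
        except Exception as exc:     # fail gracefully, then retry
            last_exc = exc
    raise RuntimeError("service unavailable after retries") from last_exc

# Invented stand-in for a remote service that fails twice, then succeeds.
calls = {"n": 0}
def flaky_service():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = call_with_retry(flaky_service)
```

A composite service would apply such a wrapper at every hop, since one slow or failed leg otherwise stalls the whole composition.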
27 (Future) Work
- Integration of runjob into the current deployment of services
- Full chain of end-to-end analysis
- Develop/deploy accounting service (PPDG activity?)
- Steering service (NUST collaboration)
  - Autonomous replication
  - Trend analysis using monitor data
  - Improve exception handling
- Integrate/interoperate mass storage applications (e.g. SRM) with the Clarens environment
- E2E error trapping and diagnosis: cause and effect
- Strategic workflow re-planning
- Adaptive steering and optimization algorithms
- Multi user
- Data movement using PHEDEX
- Improved GUI interface (NUST Collaboration)
- Core Clarens (NUST Collaboration)
28 Information
http://ultralight.caltech.edu
http://ultralight.caltech.edu/gaeweb/portal
WIKI: http://ultralight.caltech.edu/gaeweb/wiki/