Title: GRID Analysis Environment: Where the physics gets done
More information
- GAE web page: http://ultralight.caltech.edu/gaeweb/
- Clarens web page: http://clarens.sourceforge.net
- MonALISA: http://monalisa.cacr.caltech.edu/
- SPHINX: http://www.griphyn.org/sphinx/Research/research.php
Scientific Exploration at the High Energy Physics Frontier
- Physics experiments consist of large collaborations: CMS and ATLAS each encompass 2000 physicists from approximately 150 institutes (300-400 physicists in 30 institutes in the US)
- Experiments produce petabytes to exabytes of data
Grid Analysis Environment (GAE)
- The Acid Test for Grids; crucial for LHC experiments
- Large, diverse, distributed community of users
- Support for 100s to 1000s of analysis tasks, shared among dozens of sites
- Widely varying task requirements and priorities
- Need for priority schemes, robust authentication and security
- Operates in a severely resource-limited and policy-constrained global system
- Dominated by collaboration policy and strategy
- Requires real-time monitoring and task and workflow tracking; decisions are often based on a global system view
- Where physicists learn to collaborate on analysis across the country, and across world regions
- Focus is on the LHC CMS experiment, but the architecture and services can potentially be used in other (physics) analysis environments
HEP Challenges: Frontiers of Information Technology
- Rapid access to petabyte-scale data stores
- Secure, efficient, transparent access to heterogeneous, worldwide distributed computing and data handling resources
- A collaborative, scalable, distributed environment for thousands of physicists to enable physics analysis
- Tracking the state and usage patterns of computing and data resources, to make possible rapid turnaround and efficient utilization of resources
These challenges need to be met to provide an integrated, managed, distributed infrastructure that can serve virtual organizations on a global scale.
Clients: Web browser, ROOT (analysis tool), Python, Cojac (detector viz.) / IGUANA (CMS viz. tool)
Structured Peer-to-Peer GAE Architecture
- The GAE, based on the Clarens web services framework, easily allows a Peer-to-Peer configuration to be built, with the associated robustness and scalability features
- Flexible: allows easy creation, use and management of complex VO structures
- A typical Peer-to-Peer scheme would involve the Clarens servers acting as Global Peers that broker GAE client requests among all the Clarens servers available worldwide
The GAE Architecture
- Analysis clients talk standard protocols to the Grid Services Web Server, a.k.a. the Clarens Grid Portal
- A simple web service API allows analysis clients (simple or complex) to operate in this architecture
- The Clarens portal hides the complexity of the Grid services from the client, but can expose it in as much detail as required, e.g. for monitoring
- Key features: global scheduler, catalogs, monitoring, Grid-wide execution service
Finding data for CMS analysis (GAE use case)
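A rough sketch of how an analysis client might use a simple web service API of this kind: discover the services, query a catalog for a dataset, then submit a job. Everything in it is hypothetical. `FakePortal`, the method names `discover`, `query_dataset` and `submit_job`, the URL, and the dataset name are illustrative stand-ins and not the Clarens API. The portal is simulated in memory so the example runs without a Grid.

```python
"""Toy walk-through of discover -> query -> submit against an in-memory portal.

All class, method, and dataset names are hypothetical; this is not the Clarens API.
"""
from dataclasses import dataclass, field


@dataclass
class FakePortal:
    """Stands in for a Grid services web server reachable at `url`."""
    url: str
    # service kind -> endpoint, as a discovery service might report it
    services: dict = field(default_factory=dict)
    # dataset name -> list of file names, as a catalog might report it
    catalog: dict = field(default_factory=dict)
    jobs: list = field(default_factory=list)

    def discover(self, kind):
        """Return the endpoint registered for a service kind, or None."""
        return self.services.get(kind)

    def query_dataset(self, name):
        """Return the files that make up a dataset (empty list if unknown)."""
        return list(self.catalog.get(name, []))

    def submit_job(self, executable, files):
        """Queue a job and return its (hypothetical) job id."""
        job_id = len(self.jobs) + 1
        self.jobs.append({"id": job_id, "exe": executable, "files": files})
        return job_id


def run_analysis(portal, dataset, executable):
    """Client side: the only thing known up front is the portal itself."""
    catalog_ep = portal.discover("catalog")        # (1) discover services
    scheduler_ep = portal.discover("scheduler")
    if catalog_ep is None or scheduler_ep is None:
        raise RuntimeError("required services not advertised")
    files = portal.query_dataset(dataset)          # (2) query for dataset
    if not files:
        raise LookupError(f"dataset {dataset!r} not found")
    return portal.submit_job(executable, files)    # (3) submit job with dataset


portal = FakePortal(
    url="https://portal.example.org/clarens",
    services={"catalog": "/catalog", "scheduler": "/scheduler"},
    catalog={"toy-dataset": ["file_001.root", "file_002.root"]},
)
job_id = run_analysis(portal, "toy-dataset", "analyze.py")
print(job_id, portal.jobs[0]["files"])
```

The client code never hard-codes where the catalog or scheduler lives; it asks the portal first, which is the property the sketch is meant to show.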
Figure: Finding data for CMS analysis (GAE use case). Clients and services run on separate hosts. Numbered steps: (1) discover catalogs and grid schedulers; (2) query for dataset; (3) submit analysis job(s) with dataset(s). Other labeled interactions: discover services (e.g. catalog), query for data, download data.
- Multiple clients will query and submit jobs
- Client code has no knowledge about the location of services, except for several URLs for discovery services
Figure: The GAE architecture. Analysis clients reach the Clarens Grid Services Web Server (discovery, ACL management, certificate-based access) over HTTP, SOAP, or XML-RPC. Labeled components: scheduler; fully-abstract, partially-abstract, and fully-concrete planners; catalogs (metadata, virtual data, replica, provenance catalog); data management; monitoring; applications; autonomous replication; Grid scheduler/queue; execution priority manager; Grid-wide execution service.
Implementations associated with GAE components, developed within the physics and CS communities: Sphinx, RefDB, MCRunjob, MonALISA, ORCA, Chimera, MOPDB, FAMOS, BOSS, ROOT, POOL, VDT-Server.
Scheduling Push/Pull Model (GAE use case)
- Push model has limitations once the system becomes resource limited
GAE development (services)
- POOL file catalog. Developed at CERN
- RefDB/PubDB. Production database developed within the CMS experiment
- BOSS. Uniform job submission layer developed in collaboration with INFN
- SPHINX. Grid scheduler developed at UFL
- CAVES. Analysis code sharing environment developed at UFL
- MCRunjob/MOP. Monte Carlo production submission and tracking tool developed at FNAL
- PhEDEx. Production transfer management application for CMS
- Information service. Stores key/value pairs to describe the environment. Developed in collaboration with the LHCb experiment
- Core services (Clarens): Discovery, Authentication, Proxy, Remote file access, Access control management, Virtual Organization management
- Under development: dCache, catalog, local manager (job submission), global manager (scheduler) in collaboration with the CDF experiment
GAE backbone: Clarens web service framework
- Clarens: a portal system providing a common infrastructure for deploying Grid-enabled web services
- Features
  - Access control to services
  - Session management
  - Service discovery and invocation
  - Virtual Organization management
  - PKI-based security
  - Good performance (up to 1400 calls per second)
- Role in GAE
  - Connects clients to Grid or analysis applications
  - Acts in concert with other Clarens servers to form a P2P network of service providers
- Two implementations
  - Python/C using Apache web server
  - Java using Tomcat servlets
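To make the access control and Virtual Organization management features more concrete, here is a minimal sketch of a check keyed on certificate distinguished names (DNs). The data model is an assumption made for illustration: VO groups are sets of DNs, and each method-name prefix has a list of allowed groups. It is not taken from the Clarens implementation, and all DNs, group names, and method names are made up.

```python
"""Hypothetical ACL check keyed on certificate DNs and VO group membership."""

ALICE = "/O=ExampleGrid/OU=People/CN=Alice"
BOB = "/O=ExampleGrid/OU=People/CN=Bob"

# VO group name -> member distinguished names (made-up values)
VO_GROUPS = {
    "cms.analysis": {ALICE, BOB},
    "cms.admins": {ALICE},
}

# method-name prefix -> groups allowed to call methods under that prefix
METHOD_ACL = {
    "file": {"cms.analysis"},
    "job.submit": {"cms.analysis"},
    "vo": {"cms.admins"},
}


def groups_of(dn):
    """Return the set of VO groups that list this DN as a member."""
    return {group for group, members in VO_GROUPS.items() if dn in members}


def is_allowed(dn, method):
    """Allow the call if the most specific matching ACL prefix names one of the caller's groups.

    Prefixes match on dot boundaries: "file" covers "file.read" but not "filesystem.x".
    Methods with no matching prefix are denied.
    """
    parts = method.split(".")
    for n in range(len(parts), 0, -1):  # try the most specific prefix first
        prefix = ".".join(parts[:n])
        if prefix in METHOD_ACL:
            return bool(METHOD_ACL[prefix] & groups_of(dn))
    return False


print(is_allowed(BOB, "file.read"), is_allowed(BOB, "vo.add_member"))
```

Denying by default when no prefix matches is a design choice of this sketch; a real server may order and combine allow and deny rules differently.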
Figure: Clarens scalable web server. Clients connect over http/https: Java client, ROOT (analysis tool), IGUANA (CMS viz. tool), ROOT-CAVES client (analysis sharing tool), or any application that can make XML-RPC/SOAP calls. The server also connects to other servers.
Figure: Scheduling push/pull model (GAE use case). Numbered steps: (1) submit job(s) with dataset(s) for reconstruction/analysis; (2) query resource status; (3) submit/pull job(s). Labeled components: web server, Grid scheduler/queue, monitors, uniform job submission layer.
Combining push and pull to get better scalability.
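A minimal sketch of one way push and pull could be combined: the scheduler checks resource status and pushes a job when some site has a free slot, otherwise it queues the job so that a site can pull it once a slot frees up. The `Site` and `Scheduler` classes and the "most free slots" policy are assumptions for illustration, not the behaviour of any GAE scheduler.

```python
"""Toy scheduler mixing push and pull; names and policy are illustrative only."""
from collections import deque


class Site:
    def __init__(self, name, slots):
        self.name = name
        self.free_slots = slots
        self.running = []

    def start(self, job):
        """Occupy one slot with the given job."""
        self.free_slots -= 1
        self.running.append(job)

    def finish(self, job):
        """Release the slot held by the given job."""
        self.running.remove(job)
        self.free_slots += 1


class Scheduler:
    def __init__(self, sites):
        self.sites = sites
        self.queue = deque()  # jobs waiting to be pulled

    def submit(self, job):
        """Push the job to the site with the most free slots, else queue it."""
        # "Query resource status": look at the free slots each site reports.
        best = max(self.sites, key=lambda s: s.free_slots)
        if best.free_slots > 0:
            best.start(job)
            return best.name
        self.queue.append(job)
        return None

    def pull(self, site):
        """Called when a site has capacity: hand it the oldest queued job, if any."""
        if site.free_slots > 0 and self.queue:
            job = self.queue.popleft()
            site.start(job)
            return job
        return None


sites = [Site("site-a", 1), Site("site-b", 1)]
sched = Scheduler(sites)
placements = [sched.submit(f"job-{i}") for i in range(3)]
print(placements)            # the third job is queued because no slots are free
sites[0].finish("job-0")     # a job finishes at site-a
pulled = sched.pull(sites[0])
print(pulled)
```

Once every slot is taken, pushing has nowhere to send work; the queue plus `pull` lets sites drain the backlog at their own pace.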
GRID Enabled Analysis: User view of a collaborative desktop
- Physics analysis requires varying levels of interactivity, from instantaneous response to background to batch mode
- Requires adapting the classical Grid batch-oriented view to a services-oriented view, with tasks monitored and tracked
- Use Web Services, leveraging wide applicability of commodity tools
- Implement the Clarens Web Services layer as mediator between authenticated clients and services as part of the GAE architecture
- Clarens presents a consistent analysis environment to users, based on WSDL/SOAP or XML-RPC with PKI-based authentication for security
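For readers unfamiliar with XML-RPC, the sketch below shows what such a call looks like on the wire, using only Python's standard `xmlrpc.client` module. The method name `catalog.query`, its arguments, and the returned file names are hypothetical; a real server defines its own methods. PKI-based authentication is not shown here, since it would sit in the HTTPS layer that carries these messages.

```python
"""Encode and decode an XML-RPC call with the standard library (no network needed)."""
import xmlrpc.client

# Hypothetical method name and arguments; a real server defines its own.
method = "catalog.query"
params = ("toy-dataset", {"max_files": 10})

# What a client would POST over HTTP(S) to the server.
request_xml = xmlrpc.client.dumps(params, methodname=method)

# What a server does first: recover the method name and arguments.
decoded_params, decoded_method = xmlrpc.client.loads(request_xml)

# A server's reply is encoded the same way, flagged as a method response.
# A response must carry exactly one value, hence the one-element tuple.
response_xml = xmlrpc.client.dumps(
    (["file_001.root", "file_002.root"],), methodresponse=True
)
(result,), _ = xmlrpc.client.loads(response_xml)

print(request_xml)
print(decoded_method, decoded_params)
print(result)
```

Because the payload is plain XML over HTTP, any commodity tool or language with an XML-RPC library can act as a client.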
Figure: Clarens Grid Portal. Secure cert-based access to services through a browser. Services shown: service discovery, remote file access, job submission, catalog access, and external services.