Title: Scientific Exploration at the High Energy Physics Frontier
The Grid Analysis Environment for LHC Particle Physics
LHC Data Grid Hierarchy
Grid Analysis Environment (GAE)
Scientific Exploration at the High Energy Physics
Frontier
PByte/sec
- The "Acid Test" for Grids; crucial for LHC experiments
  - Large, diverse, distributed community of users
  - Support for 100s to 1000s of analysis tasks, shared among dozens of sites
  - Widely varying task requirements and priorities
  - Need for priority schemes, robust authentication and security
- Operates in a severely resource-limited and policy-constrained global system
  - Dominated by collaboration policy and strategy
  - Requires real-time monitoring, task and workflow tracking; decisions often based on a global system view
- Where physicists learn to collaborate on analysis across the country, and across world regions
- Focus is on the LHC CMS experiment, but the architecture and services can potentially be used in other (physics) analysis environments
100-1500 MBytes/sec
Physics experiments consist of large collaborations: CMS and ATLAS each
encompass 2000 physicists from approximately 150 institutes
(300-400 physicists in 30 institutes in the US)
CERN Center: PBs of Disk, Tape Robot
2.5-10 Gbps
FNAL Center
IN2P3 Center
INFN Center
RAL Center
10 Gbps
<10 Gbps
HEP Challenges: Frontiers of Information Technology
- Rapid access to PetaByte/ExaByte data stores
- Secure, efficient, transparent access to heterogeneous worldwide distributed computing and data
- A collaborative scalable distributed environment for thousands of physicists to enable physics analysis
- Tracking the state and usage patterns of computing and data resources, to make possible rapid turnaround and efficient utilization of resources
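To give a feel for what "rapid access" to a petabyte-scale store means at the link speeds quoted in the hierarchy (0.1 to 10 Gbps), the following is a rough back-of-the-envelope sketch. It assumes a decimal petabyte and an ideal link with no protocol overhead or contention; the function name is hypothetical and not part of any GAE component.

```python
def transfer_time_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal time to move size_bytes over a link of link_gbps (decimal Gbps, no overhead)."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


# Assumption: decimal petabyte (10**15 bytes); the poster does not state a convention.
PETABYTE = 1e15

for gbps in (0.1, 2.5, 10.0):
    days = transfer_time_seconds(PETABYTE, gbps) / 86400
    print(f"1 PB over {gbps:>4} Gbps: {days:8.1f} days")
```

Even at 10 Gbps the ideal figure is about 9.3 days per petabyte, which suggests why replication, caching, and scheduling close to the data matter as much as raw bandwidth.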
Institute
Institute
Institute
Institute
Physics data cache
0.1 to 10 Gbps
Tier 4
Workstations
The GAE Architecture
Structured Peer-to-Peer GAE Architecture
Web browser, ROOT (analysis tool), Python, Cojac (detector viz.) / IGUANA (CMS viz. tool)
Web Service Based Grid Enabled Analysis
Analysis Client
- The GAE, based on the Clarens web services framework, easily allows a Peer-to-Peer configuration to be built, with the associated robustness and scalability features
- Flexible: allows easy creation, use and management of complex VO structures
- A typical Peer-to-Peer scheme would involve the Clarens servers acting as Global Peers that broker GAE client requests among all the Clarens servers available worldwide
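The brokering idea in the Peer-to-Peer scheme can be illustrated with a small in-memory sketch. This is not the Clarens implementation: the `Peer` class, its methods, the peer names, and the `catalog.query` service name are all hypothetical, and real peers would communicate over the network rather than through direct object references.

```python
from typing import Callable, Dict, List, Optional, Set


class Peer:
    """Hypothetical stand-in for a server acting as a peer in the network."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.services: Dict[str, Callable[..., object]] = {}
        self.neighbours: List["Peer"] = []

    def register(self, service: str, handler: Callable[..., object]) -> None:
        """Offer a service locally on this peer."""
        self.services[service] = handler

    def connect(self, other: "Peer") -> None:
        """Create a symmetric link between two peers."""
        if other not in self.neighbours:
            self.neighbours.append(other)
            other.neighbours.append(self)

    def find(self, service: str, visited: Optional[Set[str]] = None) -> Optional["Peer"]:
        """Depth-first search for a peer offering `service`, skipping visited peers."""
        visited = visited if visited is not None else set()
        visited.add(self.name)
        if service in self.services:
            return self
        for peer in self.neighbours:
            if peer.name not in visited:
                found = peer.find(service, visited)
                if found is not None:
                    return found
        return None

    def broker(self, service: str, *args: object) -> object:
        """Forward a client request to whichever reachable peer offers the service."""
        provider = self.find(service)
        if provider is None:
            raise LookupError(f"no peer offers {service!r}")
        return provider.services[service](*args)


# Three peers in a chain; only the last one offers the (made-up) catalog service.
peer_a, peer_b, peer_c = Peer("peer-a"), Peer("peer-b"), Peer("peer-c")
peer_a.connect(peer_b)
peer_b.connect(peer_c)
peer_c.register("catalog.query", lambda dataset: [f"{dataset}/file0.root"])

print(peer_a.broker("catalog.query", "demo"))
```

A client only needs to know one peer; the request is resolved wherever the service happens to live, which is the robustness property the list above refers to.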
- Analysis clients talk standard protocols to the Clarens Grid Portal
- Simple web service API accommodates simple or complex analysis clients
- Clarens hides the complexity of the Grid services from the client, but can expose it in as much detail as required, e.g. for monitoring
- Features global scheduler, catalogs, monitoring, Grid wide execution service
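Because clients speak standard protocols such as XML-RPC, a client can be written with nothing more than a language's standard library. The sketch below uses Python's `xmlrpc` modules against a local stand-in server started in the same process. The stand-in is not Clarens, the `catalog.query` method name and the returned file names are made up for illustration, and certificate-based authentication is omitted entirely.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def start_demo_server() -> SimpleXMLRPCServer:
    """Start a local stand-in server (NOT Clarens; it only speaks the same wire protocol)."""
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_introspection_functions()  # provides system.listMethods

    def catalog_query(dataset: str) -> list:
        # Hypothetical catalog lookup returning made-up file names.
        return [f"{dataset}/events_{i}.root" for i in range(2)]

    server.register_function(catalog_query, "catalog.query")  # hypothetical method name
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_demo_server()
host, port = server.server_address
proxy = xmlrpc.client.ServerProxy(f"http://{host}:{port}/")

methods = proxy.system.listMethods()          # discover services
files = proxy.catalog.query("demo_dataset")   # query for data
print(methods)
print(files)

server.shutdown()
server.server_close()
```

The same two-step pattern (discover, then call) would apply to a real portal; only the URL, the transport security, and the method names would differ.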
Analysis Client
Discover services
HTTP, SOAP, XML-RPC
- Discovery,
- ACL management,
- Certificate based access
Clarens
Grid Services Web Server
Query for data
Scheduler
Catalogs
Fully-Abstract Planner
Metadata
Autonomous replication
Sphinx
RefDB
MCRunjob
Partially-Abstract Planner
Virtual Data
MonALISA
ORCA
Applications
Data Management
Chimera
Monitoring
MOPDB
Replica
FAMOS
Fully-Concrete Planner
BOSS
ROOT
Catalog
POOL
Grid
Discover services
Query for data
Download data
Execution Priority Manager
VDT-Server
Download data
Grid Wide Execution Service
Client
Discover service (e.g. Catalog)
The Clarens Web Service Framework
HotGrid!
The Importance of Monitoring: the MonALISA system in GAE
- Clarens: a portal system providing a common infrastructure for deploying Grid enabled web services
- Features
  - Access control to services
  - Session management
  - Service discovery and invocation
  - Virtual Organization management
  - PKI based security
  - Good performance (over 1400 calls per second)
- Role in GAE
  - Connects clients to Grid or analysis applications
  - Acts in concert with other Clarens servers to form a P2P network of service providers
- Two implementations
  - Python/C using Apache web server
  - Java using Tomcat servlets
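How access control, Virtual Organization management, and certificate identities might fit together can be sketched as follows. This is an illustrative model only, not the Clarens ACL format: the class names, the group name, the method names, and the example distinguished names (DNs) are all hypothetical, and the rule that a more specific method prefix takes precedence is an assumption made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class VirtualOrganization:
    """Hypothetical VO model: named groups of certificate distinguished names."""
    groups: Dict[str, Set[str]] = field(default_factory=dict)

    def groups_of(self, dn: str) -> Set[str]:
        return {name for name, members in self.groups.items() if dn in members}


@dataclass
class MethodACL:
    """Allow and deny sets keyed by method prefix; anything unmatched is refused."""
    allow: Dict[str, Set[str]] = field(default_factory=dict)
    deny: Dict[str, Set[str]] = field(default_factory=dict)

    def is_allowed(self, vo: VirtualOrganization, dn: str, method: str) -> bool:
        # A caller matches a rule either by DN or by any VO group it belongs to.
        principals = vo.groups_of(dn) | {dn}
        parts = method.split(".")
        # Walk from the most specific prefix ("file.read") to the least ("file").
        for depth in range(len(parts), 0, -1):
            prefix = ".".join(parts[:depth])
            if principals & self.deny.get(prefix, set()):
                return False  # at a given level, deny is checked before allow
            if principals & self.allow.get(prefix, set()):
                return True
        return False


alice = "/O=Example/CN=Alice"  # made-up certificate DN
bob = "/O=Example/CN=Bob"      # made-up certificate DN
vo = VirtualOrganization(groups={"cms.analysis": {alice}})
acl = MethodACL(
    allow={"file": {"cms.analysis"}},
    deny={"file.delete": {"cms.analysis"}},
)

print(acl.is_allowed(vo, alice, "file.read"))
print(acl.is_allowed(vo, alice, "file.delete"))
print(acl.is_allowed(vo, bob, "file.read"))
```

Keying rules on groups rather than individual DNs keeps the lists short as a collaboration grows, which is one plausible reason to pair ACLs with VO management.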
World Network Speed Record
101 Gbps
Clarens Services
GRID Enabled Analysis The Collaborative Desktop
GAE on Handhelds
MonALISA (monitor)
ROOT (analyse)
Clarens (access)
IGUANA (visualise)
Service discovery
This port of SLAC's Java Analysis Studio to
PocketPC incorporates Grid certificate-based
access to data and compute resources, and uses
Clarens services.
Clarens Grid Portal: secure certificate-based
access to services through a browser
Job submission
Clarens provides a ROOT Plug-In that allows the
ROOT user to gain access to Grid services via the
portal, for example to access ROOT files at
remote locations
Remote file access
More information:
GAE web page: http://ultralight.caltech.edu/gaeweb/
Clarens web page: http://clarens.sourceforge.net
MonALISA: http://monalisa.cacr.caltech.edu/
SPHINX: http://www.griphyn.org/sphinx/Research/research.php
This work is partly supported by the Department
of Energy as part of the Particle Physics
DataGrid project (DOE/DHEP and MICS) and by the
National Science Foundation (NSF/MPS and CISE).
Any opinions, findings, conclusions or
recommendations expressed in this material are
those of the authors and do not necessarily
reflect the views of the Department of Energy or
the National Science Foundation.