Title: Application Services Work Group Report
1 - Application Services Work Group Report
Frank van Lingen, California Institute of Technology
UltraLight Meeting, NSF, January 4, 2006
2 - Application Work Group Core Team
- Frank van Lingen (Coordinator)
- Rick Cavanaugh (Physics Analysis)
- Dimitri Bourilkov (Physics Analysis)
- Jang Uk (Scheduling)
- Mandar Kulkarni (Scheduling)
- Laukik Chitnis (Scheduling)
- Iosif Legrand (Monitoring)
- Julian Bunn (GAE/TeraGrid)
- Conrad Steenberg (GAE)
- Michael Thomas (GAE)
- Philippe Galvez (VRVS)
Replaced by ..
3 - Network Usage in Scenarios
- Monte Carlo Production
- CPU intensive and generates large amounts of data
- Network: distributed generated datasets need to be merged
- User Analysis (Single User View)
- Analyzing large amounts of data (data sets)
- Network: when a site queue is too long, move the data to another site so the user can process it there
- User Analysis (Multi User View)
- Many users analyzing large amounts of data
- Network: if datasets become popular and jobs have to wait to process them, replicate them (sketched below)
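The two network actions above are essentially simple placement rules. As a minimal sketch of that decision logic (all thresholds, site names and data structures are hypothetical, not part of any UltraLight component):

```python
# Hypothetical sketch of the placement decisions described above; thresholds,
# site names and data structures are illustrative only.

def place_job(dataset, sites, queue_limit=50, popularity_limit=20):
    """Pick a site for a job on `dataset`, moving or replicating data if needed."""
    holders = [s for s in sites if dataset in s["datasets"]]

    # Single-user view: if every site holding the data has a long queue,
    # move the data to the least loaded site and run there.
    best_holder = min(holders, key=lambda s: s["queue_length"])
    if best_holder["queue_length"] > queue_limit:
        target = min(sites, key=lambda s: s["queue_length"])
        target["datasets"].add(dataset)          # move (or copy) the data
        return target["name"]

    # Multi-user view: if the dataset is popular and jobs are waiting,
    # replicate it to another site to spread the load.
    if best_holder["waiting_jobs_on"].get(dataset, 0) > popularity_limit:
        target = min((s for s in sites if s not in holders),
                     key=lambda s: s["queue_length"])
        target["datasets"].add(dataset)          # replicate
    return best_holder["name"]

sites = [
    {"name": "caltech", "queue_length": 80, "datasets": {"ttbar"}, "waiting_jobs_on": {"ttbar": 5}},
    {"name": "ufl",     "queue_length": 10, "datasets": set(),      "waiting_jobs_on": {}},
]
print(place_job("ttbar", sites))   # -> "ufl": the data is moved to the idle site
```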
4 - GAE Architecture
Emerging Vision: a distributed set of rich and complex services to support analysis in a distributed environment.
- Clients talk standard protocols to the Grid Services Web Server
- A simple Web service API allows simple or complex analysis clients
- The Clarens portal hides complexity
- Key features: Global Scheduler, Catalogs, Monitoring, Grid-wide Execution service
Build services on web service frameworks such as Clarens and provide end-to-end monitoring using systems such as MonALISA.
[Architecture diagram: analysis flight-deck clients (JobMon, JobStatus, MCPS and MonALISA/monitoring clients) and other clients (web browser, ROOT with CAVES, Python with CODESH, COJAC detector visualization, IGUANA CMS visualization) talk to a Clarens Grid Services Web Server at a Tier2 site. Behind it sit MCPS, workflow definitions and execution, discovery, Runjob, JobStatus, catalogs (metadata, virtual data), the Sphinx scheduler with fully-abstract, partially-abstract and fully-concrete planners, replica estimators, data management, applications (ORCA, FAMOS), BOSS, storage (DCache) at the compute site, MonALISA monitoring and steering of the network, and a grid-wide execution service with an execution priority manager and global command and control (reservation, planning, monitoring).]
5 - GAE and UltraLight: Make the Network an Integrated, Managed Resource
- Unpredictable multi-user analysis
- Overall demand typically fills the capacity of the resources
- Real-time monitor systems for networks, storage and computing resources; end-to-end monitoring
[Diagram: application interfaces feed request planning; monitoring and network planning manage the network resources.]
Support data transfers ranging from the (predictable) movement of large-scale (simulated) data to the highly dynamic analysis tasks initiated by rapidly changing teams of scientists.
6 - (Physics) Analysis on the Grid: Move from Existing Components to a Coherent System
- Catalogs to select datasets
- Resource and application discovery
- Schedulers guide jobs to resources
- Policies enable fair access to resources
- Robust (large size) data (set) transfer
- Feedback to users (e.g. status of their jobs)
- Crash recovery of components (identify and restart)
- Provide secure, authorized access to resources and services
[Diagram: a numbered flow in which the client application uses steering, the dataset service, discovery and catalogs to select data; the planner/scheduler consults monitor information and policy, drives job submission and execution, and storage management and data transfer stage the data. A sketch of this flow follows below.]
UltraLight core: data transfer, planning and scheduling, (sophisticated) policy management at the VO level, integration.
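Read as a sequence, the numbered steps in the diagram describe one job flow. Below is a minimal, runnable Python sketch of that flow, with trivial in-memory stubs standing in for the real GAE services; all class and method names are illustrative assumptions, not actual GAE interfaces.

```python
# Schematic of the analysis flow on this slide, with stubs in place of the
# real services (Clarens catalogs, Sphinx, BOSS, ...). Names are illustrative.

class Catalogs:
    def select_dataset(self, query):          # client uses catalogs to pick a dataset
        return {"name": query, "size_gb": 120}

class Scheduler:
    def plan(self, dataset, sites, monitor):  # planner/scheduler uses policy + monitor info
        return min(sites, key=lambda s: monitor[s]["queue"])

class Storage:
    def transfer(self, dataset, dest):        # robust data(set) transfer to the chosen site
        print(f"transferring {dataset['name']} ({dataset['size_gb']} GB) to {dest}")

class Execution:
    def submit(self, dataset, site):          # job submission and execution
        print(f"submitted job on {dataset['name']} at {site}")
        return "job-0001"

def run_analysis(query):
    monitor = {"tier2_a": {"queue": 40}, "tier2_b": {"queue": 5}}   # monitor information
    dataset = Catalogs().select_dataset(query)
    site = Scheduler().plan(dataset, list(monitor), monitor)
    Storage().transfer(dataset, site)
    job_id = Execution().submit(dataset, site)
    print(f"user feedback: {job_id} running at {site}")            # status back to the user
    return job_id

run_analysis("zmumu_skim")
```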
8 - Clarens Grid Toolkit
- Provide developers with a framework to develop grid-enabled web services (a client-side sketch follows below)
- Grid portal for users
- Hide complexity of grid environment from users
- Standard Services
- Authentication, Authorization, Access Control
- File Access
- VO and User Management
- Proxy Management
- Python and Java framework
- Easier integration with emerging Java technologies
- More choice for service developers
- Also used by
- LambdaStation
- OSG Accounting
- HOTGrid (Astronomy portal)
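For illustration, a thin client talking to a Clarens-style web service could look like the following; the host, port, endpoint path, method name and file path are hypothetical placeholders rather than the actual Clarens API.

```python
# Minimal sketch of a client call against a Clarens-style web service exposed
# over XML-RPC. All names below are hypothetical placeholders.

import xmlrpc.client

server = xmlrpc.client.ServerProxy("https://clarens.example.org:8443/clarens")

# A simple API call such as listing files visible to the authenticated user;
# in a real deployment the HTTPS connection would carry the user's grid
# certificate/proxy for authentication and access control.
try:
    files = server.file.ls("/store/user/someuser")
    print(files)
except xmlrpc.client.Fault as err:
    print("service reported an error:", err.faultString)
```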
9 - Clarens Java Client
- Richer GUI environment than HTML/JavaScript
- Enable multiple service/server connections
- Concept of favorites
- Store state information
- Pluggable architecture
- Enables third party plugin development.
- Achievements this year
- Core GUI framework (including connection
management)
- File Service plugin
- Discovery Service plugin (under construction)
- Rudimentary plugins for estimators and scheduling
(under construction)
10 - Discovery
- Web Service Catalog, suited for dynamic grid
environment.
- Integrates with MonALISA
- Based on JClarens
- Achievements this year
- Software discovery service
- Associate key/value pairs with service and software descriptions (queried as sketched below)
- Better scalability (minimized resource usage)
- UDDI backend (UDDI: service discovery standard)
- Work with EGEE project on standard discovery
interface
- Work with Globus on interop. with MDS (Discovery
Catalog)
- Part of OSG distribution
[Diagram: Clarens Discovery Servers (JINI clients) and Clarens servers (DS, SS) connected to clients through the MonALISA JINI network.]
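As an illustration of the key/value lookup described above, a client query against such a discovery service might look like this; the endpoint and method name are assumptions, not the actual JClarens discovery interface.

```python
# Illustrative sketch of querying a discovery service for services matching
# key/value pairs. Endpoint and method names are hypothetical placeholders.

import xmlrpc.client

discovery = xmlrpc.client.ServerProxy("https://discovery.example.org:8443/clarens")

# Ask for all registered services whose description matches these attributes,
# e.g. "every runjob service advertised for the CMS VO".
wanted = {"type": "runjob", "vo": "cms"}
services = discovery.discovery.find(wanted)

for svc in services:
    print(svc["url"], svc.get("version", "unknown"))
```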
11 - Monte Carlo Processing
- Started as an idea for how to allow users to make
small custom simulation samples.
- Enable remote grid submission and return of
results
- Separation of the user from the resources by
connecting through grid (user) interfaces
- Provide Access Control and Quotas
Based on the concept of Me, My Friends, and the Anonymous Grid
12 - Monte Carlo Processing
- Designed for Monte Carlo processing, but more widely applicable
- Expose different workflows
- Users receive a tracking number they use for job status access (like a FedEx or UPS tracking number; sketched below)
- Working on improvements for Monte Carlo processing
- Sites pull (parts of) requests for processing using an agent-like approach
- Submitting monitor information to MonALISA and other repositories (e.g. BOSS) to support job tracking
- Achievements this year
- Service backend with authorization and access
control.
- Simple workflow specification format.
- Automatic generation of a web form from the workflow specification
- HTML/JavaScript front end for user interaction.
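A minimal sketch of the tracking-number idea is shown below; the class and method names are hypothetical and only illustrate the submit-then-poll pattern, not the real MCPS backend.

```python
# Sketch of the tracking-number idea: the front end hands the user an opaque
# ticket at submission time, and the user later polls job status with it,
# much like a parcel tracking number. All names are hypothetical placeholders.

import uuid

class MCPSBackend:
    def __init__(self):
        self._requests = {}

    def submit(self, user, workflow):
        tracking = uuid.uuid4().hex[:12]          # the "FedEx/UPS style" ticket
        self._requests[tracking] = {"user": user, "workflow": workflow,
                                    "state": "queued"}
        return tracking

    def status(self, tracking):
        return self._requests.get(tracking, {"state": "unknown tracking number"})

mcps = MCPSBackend()
ticket = mcps.submit("someuser", "single_muon_gun_1k_events")
print("your tracking number:", ticket)
print("current status:", mcps.status(ticket)["state"])
```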
13 - SPHINX
14 - Estimators
- Scheduler's selection of an execution site is based on:
- User deadlines and required quality of service.
- Quota requirements specified by user.
- ..
- Each execution site will have its own set of estimators
- Support making intelligent decisions on resource selection by estimating (combined as sketched below):
- Site access (latency for submitting a job)
- Job runtime (if possible)
- Queue wait time
- File transfer time
- Achievements this year
- Runtime estimator (history-based approach)
- SDSC Data
- CMS Data
- Prime number computation jobs
- File transfer time estimator
- IPERF (intrusive and should be replaced)
- Queue time estimator (works on the Condor queue)
- Site access time estimator
- Integration with steering, job monitoring and
prototype scheduler services
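The four estimates above can be combined into a single per-site figure of merit for the scheduler. A minimal sketch follows; the numbers and the simple additive model are illustrative assumptions, not the actual GAE estimator interfaces.

```python
# Sketch of how a scheduler could combine the four estimates listed above
# (site access latency, queue wait, file transfer time, job runtime) into a
# single per-site figure of merit. Values and weighting are illustrative only.

def estimated_completion(site):
    """Total expected seconds until a job finishes at `site`."""
    return (site["access_latency"]      # latency for submitting the job
            + site["queue_wait"]        # time spent waiting in the batch queue
            + site["transfer_time"]     # time to stage the input data in
            + site["runtime"])          # history-based runtime estimate

sites = [
    {"name": "tier2_a", "access_latency": 5,  "queue_wait": 1800, "transfer_time": 0,   "runtime": 3600},
    {"name": "tier2_b", "access_latency": 20, "queue_wait": 60,   "transfer_time": 900, "runtime": 3600},
]

best = min(sites, key=estimated_completion)
print(best["name"], estimated_completion(best))   # tier2_b wins despite the extra data transfer
```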
15 - Data Transfer
- Redesign of CMS transfer tools (PhEDEx)
- Redesign of CMS event data model (EDM)
- Impacts the transfer of data
- Achievements this year
- UltraLight kernel based on FAST (see Network talk)
- Monitoring network transfers (See Network
Services talk)
- Benchmarks of different transfer protocols (BBCP,
XRootD) (See network talk)
- Improvements on SRM/dCache (learning to tune and install it)
- Getting expertise with the new PhEDEx (benchmarks, tuning)
- Feedback to experts on missing functionality.
- Integrate storage/transfer (SRM/dCache/PhEDEx) with the network
16 - BOSS Integration (Execution Service)
- BOSS is the execution and monitoring application used in CMS
- Achievements this year
- Provided a service wrapper and GUI for BOSS
- Set of service APIs providing access to BOSS (client-side sketch below)
- Ability to schedule tasks over the web (through a GUI)
- Execution takes place in a secure (sandbox) environment
- Provides task control features, through BOSS
(kill, delete)
- Used in a demonstration at the DOSAR workshop in São Paulo
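From the client side, such a service wrapper could be used roughly as sketched below; the endpoint and method names are hypothetical placeholders and do not reflect the actual BOSS service API.

```python
# Hypothetical client-side view of a BOSS-style execution service exposed as a
# web service: schedule a task, poll it, and use the task-control features
# (kill/delete) mentioned above. All names below are placeholders.

import xmlrpc.client

boss = xmlrpc.client.ServerProxy("https://tier2.example.org:8443/clarens")

task_id = boss.boss.submit({"executable": "cmsRun", "config": "analysis.cfg",
                            "output": ["histos.root"]})
print("task", task_id, "state:", boss.boss.status(task_id))

# Task control through the same wrapper
boss.boss.kill(task_id)
boss.boss.delete(task_id)
```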
17 - JobMon
- With GRID job submission a large number of things
can go wrong
- resources (databases, storage) broken or
inaccessible
- User errors
- ..
- Quick and efficient access is needed to detect and diagnose problems
- (Secure) access to (your) running jobs before completion (sketched below)
- Read log files
- Kill jobs
- Developed for CDF, but also applicable in CMS
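A hypothetical client-side sketch of the kind of access JobMon aims to provide; the endpoint and method names are placeholders, not the real JobMon protocol.

```python
# Sketch of interactive access to a running job: fetch the tail of its log to
# diagnose it, and kill it if it is stuck. Names are illustrative placeholders.

import xmlrpc.client

jobmon = xmlrpc.client.ServerProxy("https://jobmon.example.org:8443/clarens")

job_id = "someuser_20060104_0042"
log_tail = jobmon.jobmon.tail(job_id, "stderr", 50)   # last 50 lines of stderr
print(log_tail)

if "database unreachable" in log_tail:                # e.g. a broken resource
    jobmon.jobmon.kill(job_id)
```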
Achievements this year
18 - Many GAE Services Integrated with Network and Monitor Services
- Sphinx scheduler (UFL): service-based scheduler
- Job submission: BOSS (collaboration with INFN)
- CAVES (UFL): analysis code and command sharing environment
- Steering service: first prototype of the steering service
- Discovery service
- JobMon: real-time troubleshooting of a user's jobs (FNAL)
- Estimators: providing estimates for schedulers and other services on job execution, data transfer, ...
UltraLight will focus on integration and on sophisticated automated decisions based on monitor information; end-to-end monitoring.
Other Synergistic Activities
- Monte Carlo Processing Service (Fermilab); SC05 0.13 Tbps challenge
- Other science disciplines (astronomy, Earth science): HotGRID
- L-Store collaboration: utilize storage expertise to complement network expertise
- Lambda Station: authorized programmability of routers using MonALISA and Clarens
19 - Outlook: Towards System Level Services
- Development of Java client plugins
- Include IM functionality for job interactivity (3
months)
- Integration with MonALISA clients (2 months)
- New plugins for managing access control to data
and services,
- Improving current framework and plugins (2-3
months)
- E2E error trapping and diagnosis: cause and effect
- Feedback of job information through MonALISA, via
JobMon and other sources. (aggregation of
information)
- Need a uniform mechanism to propagate and report errors in Web Services (3-4 months; sketched below)
- Strategic Workflow re-planning
- Collaboration with CMS Monte Carlo Team
- Work on new production environment (3-4 months)
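As an illustration of what such a uniform error mechanism could look like, a common error envelope that services nest to preserve cause and effect is sketched below; the field names are assumptions, since no such format has been agreed yet.

```python
# Sketch of a uniform error envelope that web services could agree on, so that
# failures can be propagated and aggregated (e.g. into MonALISA) consistently.
# The field names are illustrative only; no such format is defined yet.

def error_envelope(service, code, message, caused_by=None):
    """Build a structured error that can wrap the lower-level error that caused it."""
    return {"service": service, "code": code, "message": message,
            "caused_by": caused_by}

# A scheduler reporting that it failed because a catalog lookup failed:
catalog_err = error_envelope("dataset_catalog", "NOT_FOUND",
                             "dataset zmumu_skim not registered")
sched_err = error_envelope("sphinx", "PLANNING_FAILED",
                           "could not resolve input dataset",
                           caused_by=catalog_err)

# Walking the chain gives the cause-and-effect view mentioned above.
err = sched_err
while err is not None:
    print(f"{err['service']}: {err['code']} - {err['message']}")
    err = err["caused_by"]
```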
20 - Outlook: Towards System Level Services
- Adaptive steering and optimization algorithms
- First step with steering service prototype and
estimators
- Work on refinement of estimator functions (2-3
months)
- Work with (CMS) analysis group
- Integrating current BOSS work (2-3 months)
- Work on (analysis) submission plugin for client
(3-4 months)
- Further propagate work into OSG (ongoing)
- Integration of network and storage
- benchmarks, tuning (3-4 months)
- test and integrate new CMS data management tools
with networks (3-4 months)
21 - Summary
- The UltraLight application workgroup made a lot of progress
- Integration of many GAE and UltraLight components is ongoing
- End-to-end monitoring through MonALISA integration
- Move towards collaboration between storage and network resources
22 - Related Publications
23 - www.ultralight.org
- Monitor UltraLight
- WIKI
- News
- Related Publications