Title: SciDAC Center for Enabling Distributed Petascale Science
1 SciDAC Center for Enabling Distributed Petascale Science
- Argonne National Laboratory
- Fermi National Accelerator Laboratory
- Lawrence Berkeley National Laboratory
- University of Southern California
- University of Wisconsin
- www.cedps.net
- Jennifer Schopf, ANL
- jms@mcs.anl.gov
2 The Petascale Data Challenge
(Diagram: DOE facilities, massive data, remote distributed users)
- DOE facilities generate many petabytes of data (2 petabytes = all U.S. academic research libraries!)
- Remote users (at labs, universities, industry) need data!
- Rapid, reliable access is key to maximizing the value of facilities
3 Bridging the Divide (1): Move Data to Users When and Where Needed
(Diagram: "Deliver this 100 Terabytes to locations A, B, C by 9am tomorrow")
- Fast: >10,000x faster than usual Internet
- Reliable: recover from many failures
- Predictable: data arrives when scheduled
- Secure: protect expensive resources and data
- Scalable: deal with many users, much data
4 Bridging the Divide (2): Allow Users to Move Computation Near Data
(Diagram: "Perform my computation F on datasets X, Y, Z")
- Science services provide analysis functions near the data source
- Flexible: easy integration of functions
- Secure: protect expensive resources and data
- Scalable: deal with many users, much data
5 Bridging the Divide (3): Troubleshoot End-to-End Problems
(Diagram: "Why did my data transfer (or remote operation) fail?")
- Identify and diagnose failures and performance problems
- Instrument: include monitoring points in all system components
- Monitor: collect data in response to problems
- Diagnose: identify the source of problems
6 Overview
- For Each Area
- Current Work
- An example in current use
- How to combine the tools for CEDPS
- Work with Applications
- Contributing to Globus
7 Data Services in CEDPS
- Ann Chervenak, ISI, is the CEDPS Data lead
- annc@isi.edu
- These slides are adapted from hers
- Develop tools and techniques for reliable, high-performance, secure, and policy-driven placement of data within a distributed science environment
- Data placement and distribution services that implement different data distribution and placement behaviors
- Managed Object Placement Service: enhancement to today's GridFTP that allows for management of
- Space
- Bandwidth
- Connections
- Other resources needed at the endpoints of data transfers
8 Existing Globus Data Services
- Tools for efficient, reliable data management
- GridFTP
- Fast, secure data transport
- The Reliable File Transfer Service (RFT)
- Data movement services for GT4
- The Replica Location Service (RLS)
- Distributed registry that records locations of data copies
- The Data Replication Service (DRS)
- Integrates RFT and RLS to replicate and register files
- The Data Access and Integration Service (DAIS)
- Service to access relational and XML databases
9 GridFTP
- A high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks
- FTP with well-defined extensions
- Uses basic Grid security (control and data channels)
- Multiple data channels for parallel transfers
- Partial file transfers
- Third-party (direct server-to-server) transfers (example below)
- Reusable data channels
- Command pipelining
- GGF recommendation GFD.20
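As a minimal sketch of the parallel-stream and third-party features above, the standard globus-url-copy client can be driven from Python; this assumes the client is installed, a valid proxy credential exists, and the host names and paths are placeholders.

    # Sketch: third-party GridFTP transfer with parallel data channels,
    # driven via the globus-url-copy command-line client.
    # Host names and paths are placeholders.
    import subprocess

    src = "gsiftp://source.example.org/data/run042.dat"
    dst = "gsiftp://dest.example.org/archive/run042.dat"

    cmd = [
        "globus-url-copy",
        "-p", "8",    # 8 parallel TCP streams on the data channel
        "-vb",        # report transfer rate while running
        src, dst,     # both endpoints are GridFTP servers: third-party transfer
    ]
    subprocess.run(cmd, check=True)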
10 GridFTP in GT4
(Chart: disk-to-disk throughput on TeraGrid)
- 100% Globus code
- No licensing issues
- Stable, extensible
- IPv6 support
- XIO for different transports
- Striping for multi-Gb/sec wide-area transport
- Pluggable
- Front-end: e.g., future WS control channel
- Back-end: e.g., HPSS, cluster file systems
- Transfer: e.g., UDP, NetBLT transport
11 GridFTP Does NOT Require GSI
- All the GridFTP speed and features with the following security options:
- Anonymous mode
- Clear text passwords
- GridFTP-SSH
- GridFTP-SSH: only needs SSH keys on the server
- No certificates or CAs
- Keys already exist on most systems
- SSH is used to form the control channel
- Only need ssh running on the server side
- Standard login audit trails
12 Reliable File Transfer
- Service that accepts requests for third-party file transfers
- Maintains state in a DB about ongoing transfers
- Recovers from RFT service failures
- Increased reliability because state is stored in a database
- Service interface
- The client can submit the transfer request and then disconnect and go away
- Similar to a job scheduler for transfer jobs
- Two ways to check status
- Subscribe for notifications
- Poll for status (can check for missed notifications; sketch below)
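The "submit, disconnect, then poll" pattern above might look like the following sketch; the client object and its submit/status methods are hypothetical stand-ins for whatever RFT client bindings a deployment provides, not the GT4 API itself.

    # Sketch of the RFT usage pattern: submit a third-party transfer
    # request, disconnect, and later poll for status.
    # `client` is a hypothetical stand-in for real RFT client bindings.
    import time

    TRANSFERS = [
        ("gsiftp://a.example.org/data/f1", "gsiftp://b.example.org/data/f1"),
        ("gsiftp://a.example.org/data/f2", "gsiftp://b.example.org/data/f2"),
    ]

    def submit_and_wait(client, transfers, poll_seconds=30):
        job_id = client.submit(transfers)   # after this, the caller may disconnect
        while True:                         # later: poll, catching missed notifications
            status = client.status(job_id)
            if status in ("Done", "Failed"):
                return status
            time.sleep(poll_seconds)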
13 Reliable File Transfer: Third-Party Transfer
(Diagram: an RFT client sends SOAP messages to the RFT service and optionally receives notifications; the RFT service drives GridFTP servers at both endpoints)
- Fire-and-forget transfer
- Web services interface
- Many files and directories
- Integrated failure recovery
- Has transferred 900K files
14 The Globus Replica Location Service
- A Replica Location Service (RLS) is a distributed registry that records the locations of data copies and allows replica discovery
- RLS maintains mappings between logical identifiers and target names
- Must perform and scale well: support hundreds of millions of objects, hundreds of clients
- E.g., the LIGO (Laser Interferometer Gravitational Wave Observatory) project
- RLS servers at 10 sites
- Maintain associations between 11 million logical file names and 120 million physical file locations
15 Replica Location Service
- Distributed registry
- Records the locations of data copies for replica discovery
- Maintains mappings between logical identifiers and target names
- Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
- Replica Location Index (RLI) nodes aggregate information about one or more LRCs (illustration below)
- LRCs use soft-state update mechanisms to inform RLIs about their state; relaxed consistency of the index
- Optional compression of state updates reduces communication, CPU, and storage overheads
- Membership service registers participating LRCs and RLIs and deals with changes in membership
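The LRC/RLI split above can be illustrated with plain Python data structures (this is an illustration of the structure, not the RLS API; file and site names are placeholders).

    # Illustration of the RLS structure described above (not the RLS API):
    # Local Replica Catalogs (LRCs) map logical names to target URLs;
    # a Replica Location Index (RLI) records only which LRC holds each name.
    lrc_site_a = {"frame-0042.dat": ["gsiftp://site-a.example.org/frames/frame-0042.dat"]}
    lrc_site_b = {"frame-0042.dat": ["gsiftp://site-b.example.org/frames/frame-0042.dat"]}

    # Soft-state updates: each LRC periodically tells the RLI which logical
    # names it currently maps (the index is allowed to be slightly stale).
    rli = {}
    for site, lrc in (("site-a", lrc_site_a), ("site-b", lrc_site_b)):
        for lfn in lrc:
            rli.setdefault(lfn, set()).add(site)

    # Replica discovery: ask the RLI which LRCs to query, then resolve targets there.
    lfn = "frame-0042.dat"
    print(lfn, "is registered at LRCs:", sorted(rli.get(lfn, set())))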
16 Motivation for Higher-Level Data Management Services
- Data-intensive applications need higher-level data management services that integrate lower-level Grid functionality
- Efficient data transfer (GridFTP, RFT)
- Replica registration and discovery (RLS)
- Eventually validation of replicas, consistency management, etc.
- Goal is to generalize the custom data management systems developed by several application communities
- Eventually plan to provide a suite of general, configurable, higher-level data management services
- Globus Data Replication Service (DRS) is the first of these services
17 Data Replication Service
- Included in the GT 4.0.2 release
- Design based on the publication component of the LIGO Lightweight Data Replicator system
- Developed by Scott Koranda
- Client specifies (via the DRS interface) which files are required at the local site
- DRS uses (sketch below):
- Globus Delegation Service to delegate proxy credentials
- RLS to discover where replicas exist in the Grid
- Selection algorithm to choose among available source replicas (provides a callout; default is random selection)
- Reliable File Transfer (RFT) service to copy data to the site
- Via the GridFTP data transport protocol
- RLS to register new replicas
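The DRS steps listed above can be summarized as the following illustrative sketch; every function here is a placeholder for the corresponding GT4 service interaction, not a real binding.

    # Illustrative sketch of the DRS flow: discover, select, transfer, register.
    # The rls/rft objects and their methods are placeholders, not GT4 bindings.
    import random

    def replicate_to_local_site(lfn, local_url, rls, rft):
        """Make a local copy of the file named by lfn and register it."""
        sources = rls.query(lfn)           # 1. discover existing replicas via RLS
        if not sources:
            raise RuntimeError("no replica registered for %s" % lfn)
        source = random.choice(sources)    # 2. default selection is random
                                           #    (DRS allows a selection callout here)
        rft.transfer(source, local_url)    # 3. copy via RFT / GridFTP
        rls.register(lfn, local_url)       # 4. register the new replica in RLS
        return local_url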
18 NeST
- Software network storage appliance
- Provides guaranteed storage allocation
- Allocation units, called lots, provide guaranteed space for a period of time
- http://www.cs.wisc.edu/condor/nest/
19 Stork
- Scheduling and management of data placement jobs
- Provides for multiple transfer mechanisms and retries in the event of transient failures
- Integrated into the Condor system
- Stork jobs can be managed with Condor's workflow management software (DAGMan)
- http://www.cs.wisc.edu/condor/stork/
20 Storage Resource Manager (SRM)
- Provides protocol negotiation
- Dynamic transfer URL allocation
- Advanced space and file reservation
- Reliable replication mechanisms
- http://computing.fnal.gov/ccf/projects/SRM/
21 dCache
- Manages individual disk storage nodes
- Makes them appear as a single storage space with a single file system root
- SRM v1 and v2 interfaces
- Supports GridFTP and other transports for whole-file data movement
- Includes a proprietary POSIX-like interface (dcap) for random access to file contents
- http://www.dcache.org/
22 Globus Data Tools in Production: The LIGO Project
- Laser Interferometer Gravitational Wave Observatory
- Data sets first published at Caltech
- Publication includes specification of metadata attributes
- Data sets may be replicated at up to 10 LIGO sites
- Sites perform metadata queries to identify desired data
- Pull copies of data from Caltech or other LIGO sites
- Customized data management system: the Lightweight Data Replicator system (LDR)
- Built on top of Globus data tools: GridFTP, RLS
23 Globus Data Tools in Production: The Earth System Grid
- Climate modeling data (CCSM, PCM)
- Data management coordinated by the ESG portal
- RLS, GridFTP
- Datasets stored at NCAR
- 64.41 TB in 397,253 total files
- IPCC data at LLNL
- 26.50 TB in 59,300 files
- Data downloaded: 56.80 TB in 263,800 files
- Avg. 300 GB downloaded/day
- All files registered and located using RLS, moved among sites using GridFTP
24 Data Services in CEDPS
- Develop tools and techniques for reliable, high-performance, secure, and policy-driven placement of data within a distributed science environment
- Data placement and distribution services that implement different data distribution and placement behaviors
- Managed Object Placement Service: enhancement to today's GridFTP that allows for management of
- Space
- Bandwidth
- Connections
- Other resources needed at the endpoints of data transfers
- Services to move computation to data
25 Layered Architecture
26 Higher-Level Data Placement Services
- Decide where to place objects and replicas in the distributed Grid environment
- Policy-driven, based on the needs of the application
- Effectively creates a placement workflow that is then passed to the Reliable Distribution Service layer for execution
- Simplest: push- or pull-based service that places an explicit list of data items
- Similar to existing DRS
- Metadata-based placement
- Decide where data objects are placed based on results of metadata queries for data with certain attributes
- Example: LIGO replication
27 Higher-Level Data Placement Services
- N-copies: maintain N copies of data items (sketch below)
- Placement service checks existing replicas
- Creates/deletes replicas to maintain N
- Keeps track of lifetime of allocated storage space
- Example: UK QCDGrid
- Publication/subscription
- Allows sites or clients to subscribe to topics of interest
- Data objects are placed or replicated as indicated by these subscriptions
- Question: What higher-level placement policies would be desirable for Fermi applications?
- High energy physics
- Others
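A minimal sketch of the N-copies policy described above, assuming placeholder catalog and distribution-layer helpers standing in for RLS/RFT-style services.

    # Sketch of an N-copies placement policy: keep exactly n replicas of each
    # item, creating or deleting copies as needed. The catalog and
    # distribution objects are placeholders for RLS/RFT-style services.
    def enforce_n_copies(lfn, n, catalog, sites, distribution):
        replicas = catalog.locations(lfn)            # current replica locations
        if len(replicas) < n:
            # create replicas at sites that do not yet hold the item
            candidates = [s for s in sites if s not in replicas]
            for site in candidates[: n - len(replicas)]:
                distribution.copy(lfn, source=replicas[0], dest=site)
                catalog.register(lfn, site)
        elif len(replicas) > n:
            # remove surplus replicas (and release their storage allocation)
            for site in replicas[n:]:
                distribution.delete(lfn, site)
                catalog.unregister(lfn, site)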
28 Reliable Distribution Layer
- Responsible for carrying out the distribution or placement plan generated by the higher-level service
- Examples:
- Reliable File Transfer Service
- U Wisconsin Stork
- LBNL DataMover-Lite
- Provide feedback to higher-level placement services on the outcome of the placement workflow
- Call on lower-level services to coordinate
29 Managed Object Placement Service
- Building blocks:
- Data Transfer Service
- GridFTP server; needs resource management
- Disk Space Manager
- Provides local storage allocation
- NeST storage appliance: provides storage and connection management and bookkeeping
- Stork: data placement manager and matchmaker for co-scheduling of connections between the endpoints
- dCache storage management (Fermi): improve scalability and fault tolerance; jointly develop interfaces and interaction with GridFTP
- Storage Resource Manager
- Connection management: incoming and outgoing
- Scheduler (dCache, Stork, RFT) includes queue
- Eventually interact with both endpoints of the transfer
30 Science Services in CEDPS
- Kate Keahey, ANL, is the CEDPS Scalable Services lead
- keahey@mcs.anl.gov
- Some slides compliments of Kate
- Develop tools and techniques for construction, operation, and provisioning of scalable science services
- Service construction tools that make it easy to take application code (whether simulation, data analysis, command-line program, or library function) and wrap it as a remotely accessible service
- Service provisioning tools that allow dynamic management of the computing, storage, and networking resources required to execute a service, and the configuration of those resources to meet application requirements
- Services to move computation to data
31 PyGlobus
- Python implementation of the WS-Resource Framework
- Includes support for WS-Addressing, WS-Notification, WS-Lifetime management, and WS-Security
- Compatible with the GT4 Java WS Core
- A lightweight standalone container
- Automatic service startup on container start
- Basic API for resource persistence and recovery
- Support for wrapping legacy codes and command-line applications as Grid services (sketch below)
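The "wrap a legacy command-line program" idea might look like the following sketch: a Python function that shells out to an existing analysis executable, which a hosting container could then expose as a service operation. The executable name and its argument conventions are placeholders; this is not the PyGlobus API itself.

    # Sketch of wrapping a legacy command-line analysis code as a callable
    # operation that a service container could expose. The "analyze"
    # executable and its flags are placeholders.
    import subprocess
    import tempfile

    def run_analysis(input_path, threshold=0.5):
        """Run the legacy 'analyze' program on one input file and
        return the path of the result file."""
        out = tempfile.NamedTemporaryFile(suffix=".out", delete=False)
        cmd = ["analyze", "--threshold", str(threshold),
               "--input", input_path, "--output", out.name]
        subprocess.run(cmd, check=True)   # raise if the legacy code fails
        return out.name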
32 Virtual Workspace Project
- Virtual workspace
- Abstraction of an execution environment
- Dynamically available to authorized clients
- Abstraction captures:
- Resource quota for the execution environment on deployment
- Software configuration aspects of the environment
- Workspace Service allows a Grid client to dynamically deploy and manage workspaces
- Built on:
- Xen hypervisor, an open-source, efficient implementation
- GT4 authentication and authorization mechanisms
33 Recent Demonstration with STAR
- Problem: STAR is a relatively complex code and extremely hard to install; even if resources are available, users can't use them because there is no easy way to automatically install the application
- Solution: Put STAR in a VM, and use the workspace service to dynamically deploy those STAR VMs based on need
- Users submit requests for STAR execution to those nodes
- Demoed at SC06; the biggest obstacle is not technology but deployment: Xen and the workspace service are not available on many platforms
34 Workspace Service Backstage
(Diagram: a Trusted Computing Base (TCB), typically a cluster, containing the VWS node, an image node, and a pool of worker nodes)
- The VWS manages a set of nodes inside the TCB (typically a cluster); this is called the node pool
- The workspace service has a WSRF frontend that allows users to deploy and manage virtual workspaces
- Each pool node must have a VMM (Xen) installed, along with the workspace backend (software that manages individual nodes)
- VM images are staged to a designated image node inside the TCB
35 Scalable Science Services
- Service-enabling applications is too hard for application developers
- Community may already have a required data analysis function, typically implemented as a standalone (parallel or sequential) program or library
- Turning this existing implementation into a service is arduous and time-consuming
- Process involves knowledge about the mechanics of service container implementation and Grid mechanisms
- Solution: Formalize and automate this process in service wrapping tools
- Automate the process of embedding application code into an Application Hosting Service (AHS)
36 Service Construction: Application Hosting Environment
- An AHS involves:
- An application-specific service interface to analysis code
- A management interface that allows the service provider to monitor and control the AHS's execution
- The AHS interacts with external policy decision points (PDPs) for policy enforcement and with provisioners for service scalability
37 Service Provisioning with Variance
- If the load on a service varies significantly over time, then the number of resources allocated to the service must also vary
- Solution: Introduce provisioning tools that
- Allow a service provider to specify desired performance levels, while other system components
- Monitor service behavior
- Dynamically add and remove resources applied to service activity, so as to adapt to varying service load (sketch below)
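The monitor-and-adapt loop described above could be sketched as follows; the monitor and provisioner objects, and the specific thresholds, are placeholders rather than CEDPS components.

    # Sketch of the provisioning loop: compare observed load against the
    # provider's target, and grow or shrink the resource pool accordingly.
    # The monitor/provisioner objects and thresholds are placeholders.
    import time

    def provision_loop(monitor, provisioner, target_latency, interval=60):
        while True:
            latency = monitor.mean_response_time()    # observed service behavior
            nodes = provisioner.allocated_nodes()
            if latency > 1.2 * target_latency:
                provisioner.add_nodes(1)              # under-provisioned: grow
            elif latency < 0.5 * target_latency and nodes > 1:
                provisioner.remove_nodes(1)           # over-provisioned: shrink
            time.sleep(interval)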
38 Configuring and Discovering
- Execution environments are difficult to configure and discover
- Scientific applications often require specific, customized environments
- Variations in OS, middleware version, libraries, file system, etc., pose barriers to application portability
- Solution: Use resource catalogs and virtual machine technology to streamline service deployment
- Our resource catalogs will exploit schemas and information providers describing the relevant characteristics of an environment
- Advertise these descriptions through MDS4 to allow the application provisioner to discover and select a set of platforms suitable for application execution
39 Time-Varying Requirements
- Dynamic community demands mean that the number and type of science services required can vary over time
- Solution: Allow for dynamic service deployment
- Mechanisms that can allow new instances of services to be created on demand
- Used to instantiate both application services and data placement services
- Based on current work with dynamic deployment of Web services, executable programs, and virtual machines
- Develop these mechanisms further to provide a powerful and flexible service deployment infrastructure that allows for the creation, monitoring, and control of arbitrary services
40 Troubleshooting in CEDPS
- Brian Tierney, LBNL, is the CEDPS Troubleshooting lead
- BLTierney@lbl.gov
- Develop tools and techniques for failure detection and diagnosis in distributed systems
- Better logs and logging services to understand application and service behavior
- Better diagnostic tools to discover failures and performance faults, and for notification of these errors
41 MDS4
- Basic Grid monitoring service
- Information providers translate data from a variety of sources to a standard interface
- Index Service is a caching registry
- Trigger Service provides errors and warnings
42 NetLogger
- Extremely lightweight approach to reliably collecting monitoring events from multiple distributed locations
- Log file management tools
- Store-and-forward with rollover and stale file removal
- Targeted for high-volume logs typical of application instrumentation (sketch below)
- Efficient in-memory summarization to reduce data volume
- Prototype anomaly detection tool
- Locate missing workflow events based on a predefined list of expected events
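Application instrumentation in the NetLogger spirit emits one timestamped name=value record per event; the sketch below shows the pattern, with all field names other than ts and event chosen for illustration.

    # Sketch of NetLogger-style instrumentation: one timestamped, name=value
    # record per event, cheap enough to leave enabled in production code.
    # Field names other than ts/event are illustrative.
    import sys
    import time

    def log_event(event, **fields):
        ts = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime()) + "Z"
        pairs = " ".join("%s=%s" % (k, v) for k, v in sorted(fields.items()))
        sys.stdout.write("ts=%s event=%s %s\n" % (ts, event, pairs))

    log_event("transfer.start", file="run042.dat", dest="dest.example.org")
    # ... perform the transfer ...
    log_event("transfer.end", file="run042.dat", status="ok", bytes=104857600)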
43 ESG and the Trigger Service
- Trigger Service monitors seven services across five sites
- 3 years of experience
- Minimal load on the services
- Policy to prevent false positives
- Increased ability to detect and assess cross-Grid failures
44 Troubleshooting and CEDPS
45 Need for Unique IDs
- Tracking distributed activities that involve many service and software layers
- A single high-level request (e.g., "distribute these files") may involve many distinct activities (e.g., reserve space, move file, authenticate) at different sites
- To diagnose failures or performance problems we need to be able to identify and access just the corresponding log records
- Solution: Associate a globally unique identifier with every activity and service invocation (sketch below)
- Extends previous demonstration work with biology workflows, where we transferred and logged the workflow's Activity ID at every step
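Carrying one globally unique ID through every log record a request produces could look like this minimal sketch (the event names and the activities listed are illustrative).

    # Sketch: generate one globally unique ID per high-level request and
    # attach it to every log record emitted by the activities that serve it,
    # so records from different sites and layers can be correlated later.
    import logging
    import uuid

    logging.basicConfig(format="%(asctime)s %(message)s", level=logging.INFO)
    log = logging.getLogger("cedps.sketch")

    def handle_request(files):
        activity_id = uuid.uuid4()                   # one GUID per request
        log.info("event=request.start id=%s nfiles=%d", activity_id, len(files))
        for f in files:
            log.info("event=reserve_space id=%s file=%s", activity_id, f)
            log.info("event=move_file id=%s file=%s", activity_id, f)
        log.info("event=request.end id=%s", activity_id)

    handle_request(["f1.dat", "f2.dat"])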
46 Logs and Log Management
- Logging and monitoring data is hard to find and manage
- Data scattered across sites and within a site
- No agreed-on standards for what logs should look like
- Large volumes of data are possible
- A heavily loaded GridFTP server with all network and disk I/O instrumented generates 1.1 GB of log data per hour
- Introduce best practices for logging
- Then implement for GT, Condor, and others
- Log collection service
- Work to deploy on OSG as first case
- Log management functions
- Provide for turning logging on and off, moving log data back to the user for analysis, deleting old log data, etc.
47 Automatic Failure Detection
- Automated failure detection for infrastructure services
- Distributed systems often run many services with little or no 24/7 support
- Failures are discovered only when user tasks fail
- Solution: Deploy monitoring information providers to gather behavior data on resources and services (sketch below)
- Use this data to warn system administrators of faults and to study how fault behaviors change over time
- Extend the MDS4 Trigger Service
- Create a NetLogger-based event-driven monitoring system to gather runtime data from services and applications
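A minimal sketch of the probe-and-warn idea: periodically exercise each service with a lightweight check and alert an administrator when it fails. The endpoint list and the notify() hook are placeholders; only the default GridFTP and RLS port numbers are taken from the real services.

    # Sketch of automated failure detection: periodically probe each service
    # with a lightweight connectivity check and notify an admin on failure.
    # The endpoint list and notify() hook are placeholders.
    import socket
    import time

    ENDPOINTS = [("gridftp.example.org", 2811), ("rls.example.org", 39281)]

    def probe(host, port, timeout=5):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def notify(message):
        print("ALERT:", message)      # placeholder for mail/trigger action

    while True:
        for host, port in ENDPOINTS:
            if not probe(host, port):
                notify("%s:%d is not accepting connections" % (host, port))
        time.sleep(300)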
48 Performance Degradation Detection
- Performance degradation is often overlooked
- Many systems have long-running services used by different users at different times, and no single group tracking behavior
- Solution: Develop and apply analysis functions to archived background monitoring and event-driven log data (sketch below)
- Track executions dynamically and compare them to past behaviors and service guarantees
- When service behavior degrades or deadlines are threatened, the proper services or users can be notified to take action
- Develop analysis components for end-to-end bottleneck analysis and detection, trend analysis, and alarm generation
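Comparing current behavior to an archived baseline can be as simple as a mean/standard-deviation check over past transfer rates, as in this sketch (the history values are made-up illustrative numbers).

    # Sketch of degradation detection: flag a transfer whose rate falls well
    # below the historical baseline for the same endpoint pair.
    import statistics

    def is_degraded(history_mbps, current_mbps, n_sigma=2.0):
        """history_mbps: archived transfer rates for this path (MB/s)."""
        if len(history_mbps) < 5:
            return False                   # not enough history to judge
        mean = statistics.mean(history_mbps)
        stdev = statistics.pstdev(history_mbps)
        return current_mbps < mean - n_sigma * stdev

    history = [92.0, 88.5, 95.2, 90.1, 87.4, 93.3]   # illustrative values
    print(is_degraded(history, 41.0))                # True: well below past behavior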
49 Work with Applications
- Strong collaborations with DOE applications,
SciDAC software centers, and DOE facilities
50 Work with Applications
- CEDPS, ESG, and OSG starting to plan closer cooperation as part of SciDAC-2
- Earth System Grid
- Data work
- Error and warning alpha tester
- Open Science Grid
- Data services for ATLAS and CMS
- Services work with STAR
- Logging service alpha tester
- Second wave:
- LIGO (OSG): data focus
- GADU: scalable services and data focus
- Fusion FACETS (Keith Jackson): scalable services focus
- DANSE (Keith Jackson): scalable services focus
51 CEDPS Senior Personnel
- PI: Ian Foster, foster@mcs.anl.gov
- Project Manager: Jennifer Schopf, jms@mcs.anl.gov
- Area Leads:
- Data: Ann Chervenak, annc@isi.edu
- Services: Kate Keahey, keahey@mcs.anl.gov
- Troubleshooting: Brian Tierney, BLTierney@lbl.gov
- Site representatives:
- ANL: Jennifer Schopf, jms@mcs.anl.gov
- FNAL: Gene Oleynik, oleynik@fnal.gov
- ISI: Carl Kesselman, carl@isi.edu
- LBNL: Keith Jackson, KRJackson@lbl.gov
- U Wisc: Miron Livny, livny@cs.wisc.edu
52 Expanding the Community of Globus Contributors
- Creation of the dev.globus community process
- Provides an open forum for discussion and enhancement of current Globus software
- Enabling the integration of 20 new components from the US and Europe as incubators
53 Globus Development Environment
- Based on Apache Jakarta
- Individual development efforts organized as projects
- Consensus-based decision making
- Control over each project in the hands of its most active and respected contributors (committers)
- Globus Management Committee (GMC) providing overall guidance and conflict resolution
54 Common Infrastructure
- Code repositories (CVS, SVN)
- Mailing lists
- -dev, -user, -announce, -commit
- Issue tracking (bugzilla)
- Including roadmap info for future development
- Wikis
- License (Apache 2)
- Known interactions for people accessing your
project
55 Current Technology Projects
- Common runtime projects
- C Core Utilities, C WS Core, CoG jglobus, Core WS Schema, Java WS Core, Python Core, XIO
- Data projects
- Data Replication, GridFTP, OGSA-DAI, Reliable File Transfer, Replica Location
- Execution projects
- GRAM, GridWay, MPICH-G2
- Information services projects
- MDS4
- Security projects
- C Security, CAS/SAML Utilities, Delegation Service, GSI-OpenSSH, MyProxy
56 Non-Technology Projects
- Distribution projects
- Globus Toolkit Distribution
- Process was used for the April 4.0.2 and 4.0.3 releases
- Documentation projects
- GT Release Manuals
- Incubation projects
- Incubation management project
- And any new projects wanting to join
57 Incubator Process
- Entry point for new Globus projects
- Incubator Management Project (IMP)
- Oversees the incubator process from first contact to becoming a Globus project
- Quarterly reviews of current projects
- Process being debugged by Incubator Pioneers
- http://dev.globus.org/wiki/Incubator/Incubator_Process
58 Escalation
(Diagram: any Grid project submits a proposal to become an incubator project, which can later be escalated to a full Globus project)
59 Current Incubator Projects (dev.globus.org/wiki/Welcome, Incubator_Projects)
- Distributed Data Management (DDM)
- Dynamic Accounts
- Grid Authentication and Authorization with Reliably Distributed Services (GAARDS)
- Grid Development Tools for Eclipse (GDTE)
- GridShib
- Grid Toolkit Handle System (gt-hs)
- Higher Order Component Service Architecture (HOC-SA)
- Introduce
- Local Resource Manager Adaptors (LRMA)
- Metrics
- MEDICUS
- OGCE
- Portal-based User Registration Service (PURSe)
- ServMark
- UCLA Grid Portal Software (UGP)
- Workflow Enactment Engine Project (WEEP)
- Cog Workflow
- Virtual Workspaces
60 We've Just Had Our First Escalation!
- GridWay Meta-Scheduling Project
- Ignacio Llorente, Universidad Complutense de Madrid
- Provides scheduling functionality similar to that found in local DRM (Distributed Resource Management) systems
- Advanced scheduling capabilities on a Grid consisting of Globus services
- Dynamic discovery and selection
- Opportunistic migration
- Support for the definition of new scheduling policies
- Detection and recovery from remote and local failures
- Straightforward deployment that does not require new services apart from those provided by the Globus Toolkit: MDS, GRAM, GridFTP, and RFT
61 For More Information
- Jennifer Schopf
- jms@mcs.anl.gov
- http://www.mcs.anl.gov/jms
- CEDPS
- http://www.cedps.net
- Globus main website
- http://www.globus.org
- dev.globus
- http://dev.globus.org