Title: Service-Oriented Science: Scaling eScience Application Impact
1 Service-Oriented Science: Scaling eScience Application Impact
Ian Foster
Argonne National Laboratory, University of Chicago, Univa Corporation
2 Acknowledgements
- Carl Kesselman, with whom I developed many of these slides
- Bill Allcock, Charlie Catlett, Kate Keahey, Jennifer Schopf, Frank Siebenlist, Mike Wilde @ ANL/UC
- Ann Chervenak, Ewa Deelman, Laura Pearlman @ USC/ISI
- Karl Czajkowski, Steve Tuecke @ Univa
- Numerous other fine colleagues
- NSF, DOE, IBM for research support
3 Context: System-Level Science
Problems too large and/or complex to tackle alone
4 Seismic Hazard Analysis (T. Jordan, SCEC)
Seismicity
Paleoseismology
Geologic structure
Local site effects
Faults
Seismic Hazard Model
Stress transfer
Rupture dynamics
Crustal motion
Seismic velocity structure
Crustal deformation
5 SCEC Community Model
[Figure: SCEC community model, with five numbered pathways]
1. Standardized Seismic Hazard Analysis
2. Ground motion simulation
3. Physics-based earthquake forecasting
4. Ground-motion inverse problem
5. Structural Simulation
Figure labels: Other Data (Geology, Geodesy); Unified Structural Representation; Invert; Faults, Motions, Stresses; Anelastic model; Ground Motions; AWM; SRM; RDM; FSM; Intensity Measures; Earthquake Forecast Model; Attenuation Relationship
Key: AWP = Anelastic Wave Propagation; SRM = Site Response Model; FSM = Fault System Model; RDM = Rupture Dynamics Model
6 System-Level Problem
Grid technology
7 Science Takes a Village
- Teams organized around common goals
- People, resources, software, data, instruments
- With diverse membership and capabilities
- Expertise in multiple areas required
- And geographic and political distribution
- No location/organization possesses all required skills and resources
- Must adapt as a function of the situation
- Adjust membership, reallocate responsibilities, renegotiate resources
8 Virtual Organizations
- From organizational behavior/management
- "a group of people who interact through interdependent tasks guided by common purpose that works across space, time, and organizational boundaries with links strengthened by webs of communication technologies" (Lipnack & Stamps, 1997)
- The impact of cyberinfrastructure
- People → computational agents and services
- Communication technologies → IT infrastructure, i.e., Grid
"The Anatomy of the Grid," Foster, Kesselman, Tuecke, 2001
9 Beyond Science Silos: Service-Oriented Architecture
Function
Resource
- Decompose across network
- Clients integrate dynamically
- Select and compose services
- Select best of breed providers
- Publish result as a new service
- Decouple resource and service providers
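The select, compose, and publish steps listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Registry` class, the `compose` helper, the provider names, and the use of lowest cost as the "best of breed" criterion are all invented for the example and do not correspond to any Grid toolkit API.

```python
from typing import Callable, Dict, List, Tuple

Service = Callable[[float], float]


class Registry:
    """Hypothetical in-memory registry mapping a function name to provider offers."""

    def __init__(self) -> None:
        self._offers: Dict[str, List[Tuple[float, str, Service]]] = {}

    def publish(self, function: str, provider: str, service: Service, cost: float) -> None:
        # Any provider may offer an implementation of a named function.
        self._offers.setdefault(function, []).append((cost, provider, service))

    def select(self, function: str) -> Tuple[str, Service]:
        # Assumption: "best of breed" is taken to mean the cheapest offer.
        _, provider, service = min(self._offers[function], key=lambda offer: offer[0])
        return provider, service


def compose(*services: Service) -> Service:
    """Chain services so that the output of one feeds the next."""

    def composed(value: float) -> float:
        for service in services:
            value = service(value)
        return value

    return composed


registry = Registry()
registry.publish("calibrate", "site-a", lambda x: x + 1.0, cost=5.0)
registry.publish("calibrate", "site-b", lambda x: x + 1.0, cost=3.0)
registry.publish("scale", "site-c", lambda x: x * 2.0, cost=1.0)

calibrate_provider, calibrate = registry.select("calibrate")
_, scale = registry.select("scale")

# Publish the composition as a new service in its own right.
registry.publish("calibrate-then-scale", "my-vo", compose(calibrate, scale), cost=4.0)
_, pipeline = registry.select("calibrate-then-scale")
```

Because the client only names a function, the provider behind it can change without the client changing, which is one way to read "decouple resource and service providers."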
10 Service-Oriented Systems: The Role of Grid Infrastructure
Users
- Service-oriented applications
- Wrap applications as services
- Compose applications into workflows
Composition
Workflows
Invocation
Appln Service
Appln Service
- Service-oriented Grid infrastructure
- Provision physical resources to support application workloads
"The Many Faces of IT as Service," Foster, Tuecke, 2005
11 Forming and Operating (Scientific) Communities
- Define VO membership and roles, enforce laws and community standards
- I.e., policy
- Build, buy, operate, and share community infrastructure
- Data, programs, services, computing, storage, instruments
- Service-oriented architecture
- Define and perform collaborative work
- Use shared infrastructure, roles, policy
- Manage community workflow
13 Defining Community Membership and Laws
- Identify VO participants and roles
- For people and services
- Specify and control actions of members
- Empower members → delegation
- Enforce restrictions → federate policy
14 Security Services: Objectives
- It's all about policy
- Define a VO's operating rules
- Security services facilitate the enforcement
- Policy facilitates business objectives
- Related to goals/purpose of the VO
- Security policy is often a delicate balance
- Legislation may mandate minimum security
- More security → higher costs
- Less security → higher exposure to loss
- Risk versus rewards
15 Policy Challenges in VOs
- Restrict VO operations based on characteristics of requestor
- VO dynamics create challenges
- Intra-VO
- VO-specific roles
- Mechanisms to specify/enforce policy at VO level
- Inter-VO
- Entities/roles in one VO not necessarily defined in another VO
Effective Access
Access granted by community to user
Policy of site to community
Site admission-control policies
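The "Effective Access" labels above suggest that what a user can actually do is the overlap between what the community grants the user and what the site grants the community. A minimal sketch of that reading follows; the function name and the right names are hypothetical, and treating rights as plain string sets is a simplifying assumption.

```python
from typing import FrozenSet, Set


def effective_access(community_grants_user: Set[str],
                     site_grants_community: Set[str]) -> FrozenSet[str]:
    """Rights a user holds at a site: granted by the VO AND allowed by the site."""
    return frozenset(community_grants_user & site_grants_community)


# Hypothetical rights, for illustration only.
vo_to_user = {"submit_job", "read_data", "write_data"}
site_to_vo = {"submit_job", "read_data", "reserve_nodes"}

user_rights = effective_access(vo_to_user, site_to_vo)
```

Here `user_rights` contains only `submit_job` and `read_data`: the VO cannot hand out `reserve_nodes` it never granted the user, and the site never admitted `write_data` for this community.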
16 Core Security Mechanisms
- Attribute Assertions
- C asserts that S has attribute A with value V
- Authentication and digital signature
- Allows signer to assert attributes
- Delegation
- C asserts that S can perform O on behalf of C
- Attribute mapping
- {A1, A2, …, An}vo1 → {A1, A2, …, Am}vo2
- Policy
- Entity with attributes A asserted by C may perform operation O on resource R
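The mechanisms above can be modeled as small data records plus two checks. The sketch below is an assumption-laden illustration: all class and function names are hypothetical, and digital signatures are deliberately omitted (each assertion is taken as already authenticated), so it shows only the logical structure, not a security implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class AttributeAssertion:
    issuer: str     # C, the asserting party
    subject: str    # S
    attribute: str  # A
    value: str      # V


@dataclass(frozen=True)
class Delegation:
    delegator: str  # C
    delegate: str   # S
    operation: str  # O


@dataclass(frozen=True)
class PolicyRule:
    trusted_issuer: str                    # C whose assertions the rule accepts
    required: FrozenSet[Tuple[str, str]]   # (attribute, value) pairs
    operation: str                         # O
    resource: str                          # R


def map_attributes(attrs: Set[str], mapping: Dict[str, str]) -> Set[str]:
    """Translate vo1 attributes into vo2 attributes; unmapped ones are dropped."""
    return {mapping[a] for a in attrs if a in mapping}


def may_act_for(delegations: Iterable[Delegation], delegate: str,
                delegator: str, operation: str) -> bool:
    return Delegation(delegator, delegate, operation) in set(delegations)


def is_permitted(rule: PolicyRule, assertions: Iterable[AttributeAssertion],
                 subject: str, operation: str, resource: str) -> bool:
    if (operation, resource) != (rule.operation, rule.resource):
        return False
    held = {(a.attribute, a.value) for a in assertions
            if a.subject == subject and a.issuer == rule.trusted_issuer}
    return rule.required <= held


# Hypothetical names throughout.
assertions = [AttributeAssertion("vo1-ata", "alice", "role", "analyst")]
rule = PolicyRule("vo1-ata", frozenset({("role", "analyst")}), "read", "dataset-7")
delegations = [Delegation("alice", "workflow-service", "read")]
```

In this sketch a policy decision needs three inputs: who asserted the attribute, which attributes the subject holds, and which operation on which resource is requested. Those are the same C, A, O, and R named in the bullets.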
17 Trust in VOs
- Do I believe an attribute assertion?
- Used to evaluate cost vs. benefit of performing an operation
- E.g., perform untrusted operation with extra auditing
- Look at attributes of assertion signer
- Rooting trust
- Externally recognized source, e.g., CA
- Dynamically via VO structure → delegation
- Dynamically via alternative sources, e.g., reputation
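One way to picture "rooting trust" is to follow the chain of signers until an externally recognized root is reached. The sketch below assumes each signer has at most one issuer and models the chain as a plain dictionary; the function name, the identities, and the depth limit are hypothetical, and reputation-based trust is not covered.

```python
from typing import Dict, Set


def is_trusted(signer: str, issued_by: Dict[str, str],
               trusted_roots: Set[str], max_depth: int = 10) -> bool:
    """Walk from a signer up its issuer chain until a trusted root is found."""
    current = signer
    for _ in range(max_depth):
        if current in trusted_roots:
            return True
        parent = issued_by.get(current)
        if parent is None or parent == current:
            return False  # chain ends without reaching a trusted root
        current = parent
    return False  # chain too long or cyclic


# Hypothetical chain: the VO admin vouches for user B, a CA vouches for the admin.
issued_by = {"user-b": "vo-admin", "vo-admin": "example-ca"}
roots = {"example-ca"}
```

A caller that gets `False` need not refuse outright; as the slide notes, it might still perform the operation with extra auditing.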
18 Security Services for VO Policy
- Attribute Authority (ATA)
- Issue signed attribute assertions (incl. identity, delegation, and mapping)
- Authorization Authority (AZA)
- Decisions based on assertions and policy
- Use with message/transport level security
VO User A
Delegation Assertion: User B can use Service A
Resource Admin Attribute
VO AZA
VO ATA
VO-A Attr → VO-B Attr
Mapping ATA
VO Member Attribute
VO User B
VO Member Attribute
VO A Service
VO B Service
19 Closing the Loop
Users
20 Forming and Operating Scientific Communities
- Define VO membership and roles, enforce laws and community standards
- I.e., policy
- Build, buy, operate, and share community infrastructure
- Data, programs, services, computing, storage, instruments
- Service-oriented architecture
- Define and perform collaborative work
- Use shared infrastructure, roles, policy
- Manage community workflow
21 Bootstrapping a VO by Assembling Services
- 1) Integrate services from other sources
- Virtualize external services as VO services
- 2) Coordinate and compose
- Create new services from existing ones
Community
Content
Services Provider
Services
Capacity Provider
Capacity
"Service-Oriented Science," Foster, 2005
22 Providing VO Services (1): Integration from Other Sources
- Negotiate service level agreements
- Delegate and deploy capabilities/services
- Provision to deliver defined capability
- Configure environment
- Host layered functions
23 Virtualizing Existing Services into a VO
- Establish service agreement with service
- E.g., WS-Agreement
- Delegate use to VO user
User B
User A
VO User
VO Admin
Existing Services
24 Deploying New Services
Policy
- Allocate/provision
- Configure
- Initiate activity
- Monitor activity
- Control activity
Activity
Client
Environment
Resource provider
Interface
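The five interface operations on this slide (allocate/provision, configure, initiate, monitor, control) read naturally as a lifecycle. The sketch below models them as a small state machine; the stage names, the strict ordering, and the single `stop` control command are assumptions made for illustration, not a description of any deployed interface.

```python
from enum import Enum
from typing import Dict, List, Set


class Stage(Enum):
    REQUESTED = "requested"
    PROVISIONED = "provisioned"
    CONFIGURED = "configured"
    RUNNING = "running"
    STOPPED = "stopped"


# Assumed ordering: each stage may only advance to the next one.
ALLOWED: Dict[Stage, Set[Stage]] = {
    Stage.REQUESTED: {Stage.PROVISIONED},
    Stage.PROVISIONED: {Stage.CONFIGURED},
    Stage.CONFIGURED: {Stage.RUNNING},
    Stage.RUNNING: {Stage.STOPPED},
    Stage.STOPPED: set(),
}


class Activity:
    """Hypothetical activity hosted in an environment at a resource provider."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.REQUESTED
        self.history: List[Stage] = [Stage.REQUESTED]

    def _move(self, target: Stage) -> None:
        if target not in ALLOWED[self.stage]:
            raise ValueError(f"cannot go from {self.stage.value} to {target.value}")
        self.stage = target
        self.history.append(target)

    def provision(self) -> None:
        self._move(Stage.PROVISIONED)

    def configure(self) -> None:
        self._move(Stage.CONFIGURED)

    def initiate(self) -> None:
        self._move(Stage.RUNNING)

    def monitor(self) -> str:
        return self.stage.value

    def control(self, command: str) -> None:
        if command != "stop":
            raise ValueError(f"unsupported command: {command}")
        self._move(Stage.STOPPED)


activity = Activity("replication-service")
activity.provision()
activity.configure()
activity.initiate()
status_while_running = activity.monitor()
activity.control("stop")
```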
25 Activities Can Be Nested
Client
Policy
Client
Client
Environment
Resource provider
Interface
26 Open Science Grid
- 50 sites (15,000 CPUs) and growing
- 400 to >1000 concurrent jobs
- Many applications + CS experiments; includes long-running production operations
- Up since October 2003; few FTEs for central ops
Jobs (2004)
www.opensciencegrid.org
27 Embedded Resource Management
Client-side
VO Admin
Deleg
Deleg
GRAM
GRAM
Cluster Resource Manager
Headnode Resource Manager
VO User
VO User
Monitoring and control
VO Job
Deleg
GRAM
Cluster Resource Manager
Other Services
VO Scheduler
. . .
- VO admin delegates credentials to be used by downstream VO services.
- VO admin starts the required services.
- VO jobs come in directly from the upstream VO users.
- VO job gets forwarded to the appropriate resource using the VO credentials.
- Computational job started for VO.
VO Job
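The forwarding step described above can be sketched as a VO-level scheduler that holds a delegated credential and submits each incoming job to a resource that accepts it. Everything here is hypothetical: the `Resource` and `VOScheduler` classes, the credential strings, and the "most free slots" selection rule are illustrative assumptions and do not represent the GRAM interface.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Resource:
    name: str
    free_slots: int
    accepted_credentials: Set[str]
    running: List[str] = field(default_factory=list)

    def submit(self, job: str, credential: str) -> None:
        if credential not in self.accepted_credentials:
            raise PermissionError(f"{self.name} does not accept {credential}")
        if self.free_slots <= 0:
            raise RuntimeError(f"{self.name} has no free slots")
        self.free_slots -= 1
        self.running.append(job)


class VOScheduler:
    """Holds the credential delegated by the VO admin and forwards VO jobs."""

    def __init__(self, vo_credential: str, resources: List[Resource]) -> None:
        self.vo_credential = vo_credential
        self.resources = resources

    def forward(self, job: str) -> str:
        candidates = [r for r in self.resources
                      if r.free_slots > 0 and self.vo_credential in r.accepted_credentials]
        if not candidates:
            raise RuntimeError("no resource available for this VO")
        # Assumed policy: pick the resource with the most free slots.
        target = max(candidates, key=lambda r: r.free_slots)
        target.submit(job, self.vo_credential)
        return target.name


cluster_a = Resource("cluster-a", free_slots=2, accepted_credentials={"vo-cred"})
cluster_b = Resource("cluster-b", free_slots=4, accepted_credentials={"vo-cred"})
cluster_c = Resource("cluster-c", free_slots=8, accepted_credentials={"other-vo-cred"})

scheduler = VOScheduler("vo-cred", [cluster_a, cluster_b, cluster_c])
chosen = scheduler.forward("job-1")
```

Note that `cluster-c` is never chosen despite having the most capacity, because it was not provisioned for this VO's credential.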
28 Providing VO Services (2): Coordination and Composition
- Take a set of provisioned services
- Compose to synthesize new behaviors
- This is traditional service composition
- But must also be concerned with emergent behaviors, autonomous interactions
- See the work of the agent and PlanetLab communities
"Brain vs. Brawn: Why Grids and Agents Need Each Other," Foster, Kesselman, Jennings, 2004.
29 The Globus-Based LIGO Data Grid
LIGO Gravitational Wave Observatory
Birmingham
Replicating >1 Terabyte/day to 8 sites; >40 million replicas so far; MTBF: 1 month
www.globus.org/solutions
30 Data Replication Service
- Pull missing files to a storage system
Data Location
Data Movement
GridFTP
Local Replica Catalog
Replica Location Index
Reliable File Transfer Service
Local Replica Catalog
GridFTP
Replica Location Index
Data Replication
List of required files
Data Replication Service
"Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System," Chervenak et al., 2005
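The "pull missing files" behavior can be sketched as three steps: compare the list of required files against the local catalog, look up a source for each missing file in the index, then transfer and register it. The sketch below is an illustration under assumptions: the catalog is a plain set, the index a plain dictionary, and the `transfer` callable stands in for a reliable transfer service; none of these names come from the actual Data Replication Service.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def plan_replication(required: Iterable[str], local_catalog: Set[str],
                     replica_index: Dict[str, List[str]]
                     ) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (transfers, unavailable) for files missing from local storage."""
    transfers: List[Tuple[str, str]] = []
    unavailable: List[str] = []
    for name in required:
        if name in local_catalog:
            continue  # already present locally
        sources = replica_index.get(name, [])
        if sources:
            transfers.append((name, sources[0]))  # assumption: take first known replica
        else:
            unavailable.append(name)
    return transfers, unavailable


def replicate(required: Iterable[str], local_catalog: Set[str],
              replica_index: Dict[str, List[str]],
              transfer: Callable[[str, str], None]) -> List[str]:
    """Pull each missing file, then register it in the local catalog."""
    transfers, unavailable = plan_replication(required, local_catalog, replica_index)
    for name, source in transfers:
        transfer(source, name)
        local_catalog.add(name)
    return unavailable


# Hypothetical file and site names.
local = {"f1"}
index = {"f2": ["site-x"], "f1": ["site-y"]}
moved: List[Tuple[str, str]] = []
not_found = replicate(["f1", "f2", "f3"], local, index,
                      lambda source, name: moved.append((source, name)))
```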
31 Composing Resources, Composing Services
GridFTP
LRC
GridFTP
Deploy service
DRS
Deploy container
VO Services
JVM
Deploy virtual machine
VM
VM
Procure hardware
Physical machine
Provisioning, management, and monitoring at all levels
32 Decomposition Enables Separation of Concerns
Roles
S1
User
S2
D
S3
33 Community Commons
- What capabilities are available to VO?
- Membership changes, state changes
- Require mechanisms to aggregate and update VO information
[Figure: VO-specific indexes over nodes labeled A and S; labels "Information" and "The age of information", with ends marked FRESH and MORE]
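A VO-specific index that tracks the age of its information can be sketched as a registry of timestamped entries, where queries filter out entries older than a caller-chosen limit. The `VOIndex` class, its method names, and the service names below are hypothetical, and passing `now` explicitly (rather than reading a clock) is a choice made to keep the example deterministic.

```python
from typing import Dict, List, Tuple


class VOIndex:
    """Hypothetical aggregate index of service state, with per-entry timestamps."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[float, Dict[str, object]]] = {}

    def register(self, service: str, info: Dict[str, object], now: float) -> None:
        # Re-registering a service refreshes both its state and its timestamp.
        self._entries[service] = (now, dict(info))

    def query(self, now: float, max_age: float) -> Dict[str, Dict[str, object]]:
        """Return only entries no older than max_age seconds."""
        return {name: info for name, (stamp, info) in self._entries.items()
                if now - stamp <= max_age}

    def expire(self, now: float, max_age: float) -> List[str]:
        """Drop stale entries and report which services were removed."""
        stale = sorted(name for name, (stamp, _) in self._entries.items()
                       if now - stamp > max_age)
        for name in stale:
            del self._entries[name]
        return stale


index_service = VOIndex()
index_service.register("compute@site-a", {"free_cpus": 12}, now=100.0)
index_service.register("transfer@site-b", {"active_transfers": 3}, now=160.0)

fresh = index_service.query(now=200.0, max_age=60.0)
removed = index_service.expire(now=200.0, max_age=60.0)
```

The trade-off is the usual one for aggregated information: a larger `max_age` gives a more complete view of the VO, a smaller one a fresher view.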
34 Monitoring and Discovery Services
Clients (e.g., WebMDS)
GT4 Container
WS-ServiceGroup
MDS-Index
Registration WSRF/WSN Access
GT4 Cont.
GT4 Container
MDS-Index
MDS-Index
RFT
35 Service-Oriented Systems: The Role of Grid Infrastructure
Users
- Service-oriented applications
- Wrap applications as services
- Compose applications into workflows
Composition
Workflows
Invocation
Appln Service
Appln Service
- Service-oriented Grid infrastructure
- Provision physical resources to support application workloads
"The Many Faces of IT as Service," Foster, Tuecke, 2005
36 Forming and Operating Scientific Communities
- Define VO membership and roles, enforce laws and community standards
- I.e., policy
- Build, buy, operate, and share community infrastructure
- Data, programs, services, computing, storage, instruments
- Service-oriented architecture
- Define and perform collaborative work
- Use shared infrastructure, roles, policy
- Manage community workflow
37 Collaborative Work
Executed
Executing
Query
Executable
Not yet executable
What I Did
What I Am Doing
Edit
What I Want to Do
Execution environment
Schedule
Time
38 Managing Collaborative Work
- Process as workflow, at different scales, e.g.:
- Run 3-stage pipeline
- Process data flowing from experiment over a year
- Engage in interactive analysis
- Need to keep track of:
- What I want to do (will evolve with new knowledge)
- What I am doing now (will evolve with system config.)
- What I did (persistent; a source of information)
39 Trident: The GriPhyN Virtual Data System
Workflow spec
Create Execution Plan
Grid Workflow Execution
Statically Partitioned DAG
VDL Program
DAGman DAG
Virtual Data catalog
DAGman Condor-G
Dynamically Planned DAG
Job Planner
Job Cleanup
Virtual Data Workflow Generator
Local planner
Abstract workflow
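Turning an abstract workflow into an execution plan involves, at minimum, ordering jobs so that each runs only after its inputs exist. The sketch below groups the jobs of a DAG into levels that could run in parallel. It is a generic layered topological sort written for illustration, with hypothetical job names; it is not the planner used by the Virtual Data System or DAGMan.

```python
from typing import Dict, List, Set


def dag_levels(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into levels; every job's parents sit in earlier levels."""
    remaining: Dict[str, Set[str]] = {job: set(parents)
                                      for job, parents in depends_on.items()}
    # Jobs that appear only as parents have no dependencies of their own.
    for parents in depends_on.values():
        for parent in parents:
            remaining.setdefault(parent, set())

    levels: List[List[str]] = []
    done: Set[str] = set()
    while remaining:
        ready = sorted(job for job, parents in remaining.items() if parents <= done)
        if not ready:
            raise ValueError("workflow contains a cycle")
        levels.append(ready)
        done.update(ready)
        for job in ready:
            del remaining[job]
    return levels


# Hypothetical four-job diamond: two independent steps between a source and a sink.
workflow = {
    "align_left": {"extract"},
    "align_right": {"extract"},
    "combine": {"align_left", "align_right"},
}
plan = dag_levels(workflow)
```

The number of levels bounds the critical path, and the width of each level shows how much parallelism a scheduler could exploit at that step.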
40 Functional MRI Analysis
Workflow courtesy James Dobson, Dartmouth Brain Imaging Center
41 Functional MRI: Mapping Brain Function using Grid Workflows
42 Functional MRI: Virtual Data Queries
- Which transformations can process a subject image?
- Q: xsearchvdc -q tr_meta dataType subject_image input
- A: fMRIDC.AIRalign_warp
- List anonymized subject-images for young subjects
- Q: xsearchvdc -q lfn_meta dataType subject_image privacy anonymized subjectType young
- A: 3472-4_anonymized.img
- Show files that were derived from patient image 3472-3
- Q: xsearchvdc -q lfn_tree 3472-3_anonymized.img
- A: 3472-3_anonymized.img
- 3472-3_anonymized.sliced.hdr
- atlas.hdr
- atlas.img
-
- atlas_z.jpg
- 3472-3_anonymized.sliced.img
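The last query above asks for everything derived from a given file. Conceptually this is a traversal of recorded derivations, which can be sketched as follows. The record layout (a list of input-set and output-list pairs), the function name, and the file names are assumptions for illustration and do not reflect how the virtual data catalog stores or answers `lfn_tree` queries.

```python
from collections import deque
from typing import List, Sequence, Set, Tuple

Derivation = Tuple[Set[str], List[str]]  # (input files, output files)


def derived_from(root: str, derivations: Sequence[Derivation]) -> List[str]:
    """List the root file followed by every file derived from it, breadth-first."""
    result: List[str] = [root]
    seen: Set[str] = {root}
    frontier = deque([root])
    while frontier:
        current = frontier.popleft()
        for inputs, outputs in derivations:
            if current not in inputs:
                continue
            for produced in outputs:
                if produced not in seen:
                    seen.add(produced)
                    result.append(produced)
                    frontier.append(produced)
    return result


# Hypothetical derivation records.
records: List[Derivation] = [
    ({"raw.img"}, ["sliced.img", "sliced.hdr"]),
    ({"sliced.img", "reference.img"}, ["aligned.img"]),
    ({"aligned.img"}, ["summary.jpg"]),
    ({"other.img"}, ["unrelated.img"]),
]
lineage = derived_from("raw.img", records)
```

Keeping such records is what makes "what I did" a persistent source of information: any product can be traced forward from its inputs, or re-derived when an input changes.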
43 QuarkNet: Leveraging Trident for Science Education
44 PUMA: Analysis of Metabolism
PUMA Knowledge Base: information about proteins analyzed against 2 million gene sequences
Analysis on Grid: involves millions of BLAST, BLOCKS, and other processes
Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2
45 Astronomy: A Small Montage Workflow
1200-node workflow, 7 levels
Mosaic of M42 created on TeraGrid
46 Summary (1): Community Services
- Community roll, city hall, permits, licensing, and police force
- Assertions, policy, attribute and authorization services
- Directories, maps
- Information services
- City services: power, water, sewer
- Deployed services
- Shops, businesses
- Composed services
- Day-to-day activities
- Workflows, visualization
- Tax board, fees, economic considerations
- Barter, planned economy, eventually markets
47 Summary (2)
- Community-based science will be the norm
- Requires collaborations across sciences, including computer science
- Many different types of communities
- Differ in coupling, membership, lifetime, size
- Must think beyond science stovepipes
- Increasingly the community infrastructure will become the scientific observatory
- Scaling requires a separation of concerns
- Providers of resources, services, content
- Small set of fundamental mechanisms required to build communities
48 For More Information
- Globus Alliance
- www.globus.org
- NMI and GRIDS Center
- www.nsf-middleware.org
- www.grids-center.org
- Infrastructure
- www.opensciencegrid.org
- www.teragrid.org
- Background
- www.mcs.anl.gov/foster
2nd Edition www.mkp.com/grid2