Title: Brain Meets Brawn: Why Grid and Agents Need Each Other
1 Brain Meets Brawn: Why Grid and Agents Need Each Other
- Ian Foster
- Argonne National Laboratory
- University of Chicago
- Globus Alliance
In collaboration with Nick Jennings and Carl Kesselman
http://www.aamas2004.org/proceedings/003-foster_i_grid.pdf
2 What is the Grid?
- "The Grid is an international project that looks in detail at a terrorist cell operating on a global level and a team of American and British counter-terrorists who are tasked to stop it"
Gareth Neame, BBC's head of drama
3 Well, Not Exactly!
- The Grid is an international project that looks in detail at scientific collaborations operating on a global level and a team of computer scientists who are tasked to enable it
At least, that's where it started
4 Open Distributed Systems
The two need each other!
5 Overview
- Where's the muscle?
- What Grid is about
- The need for brains
- The importance of automation
- Bringing the two together
- Working with the Open Science Grid
- Research problems
7 Why the Grid? Origins: Revolution in Science
- Pre-Internet
- Theorize and/or experiment, alone or in small teams; publish paper
- Post-Internet
- Construct and mine large databases of observational or simulation data
- Develop simulations and analyses
- Access specialized devices remotely
- Exchange information within distributed multidisciplinary teams
8 Origins in Science
10 Why the Grid? New Driver: Revolution in Business
- Pre-Internet
- Central data processing facility
- Post-Internet
- Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B)
- Business processes increasingly computing- and data-rich
- Outsourcing becomes feasible → service providers of various sorts
- Growing complexity and need for more efficient management
11 Grid Hype and Products
12 Common Requirements
- Dynamically link resources/services
- From collaborators, customers, eUtilities, ... (members of evolving virtual organization)
- Into a virtual computing system
- Dynamic, multi-faceted system spanning institutions and industries
- Configured to meet instantaneous needs, for:
- Multi-faceted QoX for demanding workloads
- Security, performance, reliability, ...
13 The Grid
- Resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
- Enable integration of distributed resources
- Using general-purpose protocols and infrastructure
- To achieve better-than-best-effort service
14 Protocols and Infrastructure: Open Standards and Software
- Standardized interoperable mechanisms for secure and reliable:
- Authentication, authorization, policy, ...
- Representation and management of state
- Initiation and management of computation
- Data access and movement
- Communication and notification
- Good quality open source implementations to accelerate adoption and development
- E.g., Globus Toolkit
15 Grid Infrastructure: Open Software and Standards
[Figure: timeline 1990, 1995, 2000, 2005, 2010; labels: Custom solutions; Increased functionality, standardization]
16 WS Core Enables Frameworks: E.g., Resource Management
Applications of the framework (compute, network, storage provisioning, job reservation and submission, data management, application service QoS, ...)
WS-Agreement (agreement negotiation)
WS Distributed Management (lifecycle, monitoring, ...)
WS-Resource Framework and WS-Notification (resource identity, lifetime, inspection, subscription, ...)
Web services (WSDL, SOAP, WS-Security, WS-ReliableMessaging, ...)
17 Web Services and Stateful Resources
- State appears in almost all applications
- Data in a purchase order
- Current usage agreement for resources
- Metrics associated with workload on a server
- Web services can model, access and manage state in many different ways
- Ad-hoc, per-application approaches
- OGSI/WSRF proposes a standard approach
18 Why Standardize an Approach?
- Building systems by composition of heterogeneous components demands that we standardize common patterns
- Approach to resource identification
- Lifetime management interfaces
- Inspection and monitoring interfaces
- Base fault representation
- Service and resource groups
- Notification
- And many more
- Standards encourage tooling and code re-use
- Build services more quickly and reliably
19 WSRF and WS-Notification
- Naming and bindings (basis for virtualization)
- Every resource can be uniquely referenced, and has one or more associated services for interacting with it
- Lifecycle (basis for fault resilient state management)
- Resources created by services following factory pattern
- Resources destroyed immediately or scheduled
- Information model (basis for monitoring and discovery)
- Resource properties associated with resources
- Operations for querying and setting this info
- Asynchronous notification of changes to properties
- Service Groups (basis for registries and collective svcs)
- Group membership rules and membership management
- Base Fault type
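The patterns listed on this slide (unique naming, factory creation, immediate or scheduled destruction, resource properties, change notification) can be illustrated with a minimal in-memory sketch. This is not the WSRF wire protocol or any Globus Toolkit API: the class names `Resource` and `ResourceFactory`, the `resource-N` naming scheme, and the callback signature are all hypothetical and chosen only for illustration.

```python
import itertools
import time
from typing import Any, Callable, Dict, List, Optional

# Hypothetical callback type: (resource_id, property_name, new_value)
Listener = Callable[[str, str, Any], None]


class Resource:
    """A stateful resource with properties, a lifetime, and subscribers."""

    def __init__(self, resource_id: str, properties: Dict[str, Any],
                 termination_time: Optional[float]) -> None:
        self.resource_id = resource_id
        self.properties = dict(properties)
        # None means no scheduled destruction.
        self.termination_time = termination_time
        self._subscribers: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        """Register interest in property changes (notification pattern)."""
        self._subscribers.append(listener)

    def get_property(self, name: str) -> Any:
        return self.properties[name]

    def set_property(self, name: str, value: Any) -> None:
        """Update a property and notify every subscriber of the change."""
        self.properties[name] = value
        for listener in self._subscribers:
            listener(self.resource_id, name, value)


class ResourceFactory:
    """Creates uniquely named resources and manages their lifetime."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._resources: Dict[str, Resource] = {}

    def create(self, properties: Dict[str, Any],
               lifetime_seconds: Optional[float] = None,
               now: Optional[float] = None) -> Resource:
        now = time.time() if now is None else now
        resource_id = f"resource-{next(self._counter)}"
        termination = None if lifetime_seconds is None else now + lifetime_seconds
        resource = Resource(resource_id, properties, termination)
        self._resources[resource_id] = resource
        return resource

    def lookup(self, resource_id: str) -> Resource:
        return self._resources[resource_id]

    def destroy(self, resource_id: str) -> None:
        """Immediate destruction."""
        del self._resources[resource_id]

    def sweep(self, now: Optional[float] = None) -> List[str]:
        """Scheduled destruction: remove resources whose lifetime has passed."""
        now = time.time() if now is None else now
        expired = [rid for rid, res in self._resources.items()
                   if res.termination_time is not None
                   and res.termination_time <= now]
        for rid in expired:
            del self._resources[rid]
        return expired
```

A client would create a resource through the factory, subscribe to it, and rely on `sweep` to reclaim state that nobody renewed, which is the sense in which lifetime management supports fault resilience.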
20 Grid in Practice: Earthquake Engineering Example
Secure, reliable, on-demand access to data, software, people, and other resources (ideally all via a Web Browser!)
21 How it Really Happens (with the Globus Toolkit)
[Figure: architecture diagram. Components shown: Web Browser, CHEF, CHEF Chat Teamlet, Simulation Tool, Data Viewer Tool, Telepresence Monitor, Cameras, Compute Servers (Globus GRAM), Database services (Globus DAI), Globus Index Service, Globus MCS/RLS, MyProxy, Certificate Authority]
Legend: Application Developer 2, Off the Shelf 9, Globus Toolkit 4, Grid Community 4
- Resources implement standard access and management interfaces
- Collective services aggregate and/or virtualize resources
- Users work with client applications
- Application services organize VOs and enable access to other services
22 NEESgrid Multisite Online Simulation Test (July 2003)
[Figure: sites shown: Illinois (simulation), Colorado, Illinois]
23 Overview
- Where's the muscle?
- What Grid is about
- The need for brains
- The importance of automation
- Bringing the two together
- Working with the Open Science Grid
- Research problems
24 Grid2003: An Operational Grid
- 28 sites (2100-2800 CPUs) and growing
- 400-1300 concurrent jobs
- 7 substantial applications plus CS experiments
- Running since October 2003
[Figure: site map; label: Korea]
http://www.ivdgl.org/grid2003
25 Open Science Grid Components
- Computers and storage at 28 sites (to date)
- 2800 CPUs
- Uniform service environment at each site
- Globus Toolkit provides basic authentication, execution management, data movement
- Pacman installation system enables installation of numerous other VDT and application services
- Global virtual organization services
- Certification and registration authorities, VO membership services, monitoring services
- Client-side tools for data access and analysis
- Virtual data, execution planning, DAG management, execution management, monitoring
- IGOC: iVDGL Grid Operations Center
26 (Very Small) Example Workflows
Genome sequence analysis
Sloan digital sky survey
Physics data analysis
27 Secure, Robust Grid Infrastructure (Mostly)
Intelligent Agents Also Play a Vital Role! E.g.:
28 The Need for Automation: Critical if Grid is to Scale
- Who contributes and gets to consume what?
- Policy negotiation, enforcement, auditing
- How do I schedule jobs and data movement?
- Adaptive scheduling
- Who can be trusted to do what?
- Community membership, reputation, trust negotiation, intrusion detection
- Why do things fail?
- Failure detection, problem determination, fault isolation, system adaptation
29 Adaptive Scheduling
- Adaptive placement of data and computation
- Experiments on Grid3 (K. Ranganathan)
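To make "adaptive placement of data and computation" concrete, here is a small sketch of one possible heuristic. It is an assumption for illustration, not necessarily the policy evaluated in the Grid3 experiments: jobs go to the least-loaded site that already holds their input dataset, and a dataset that has been requested often enough is copied to the least-loaded site that lacks it. The names `Site`, `place_job`, `replicate_if_popular`, and the `threshold` parameter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Site:
    name: str
    queued_jobs: int = 0
    datasets: Set[str] = field(default_factory=set)


def place_job(sites: List[Site], dataset: str) -> Site:
    """Send the job to the least-loaded site holding the dataset.

    If no site holds the dataset, fall back to the least-loaded site overall.
    """
    holders = [s for s in sites if dataset in s.datasets]
    candidates = holders if holders else list(sites)
    chosen = min(candidates, key=lambda s: s.queued_jobs)
    chosen.queued_jobs += 1
    return chosen


def replicate_if_popular(sites: List[Site], dataset: str,
                         request_counts: Dict[str, int],
                         threshold: int) -> Optional[Site]:
    """Copy a frequently requested dataset to the least-loaded non-holder."""
    if request_counts.get(dataset, 0) < threshold:
        return None
    non_holders = [s for s in sites if dataset not in s.datasets]
    if not non_holders:
        return None
    target = min(non_holders, key=lambda s: s.queued_jobs)
    target.datasets.add(dataset)
    return target
```

The placement is adaptive in the limited sense that each decision reads the current queue lengths and replica locations, so the outcome shifts as load and data distribution change.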
30 Adaptive Unstructured Multicast
[Figure: three layers, each over nodes A-E: application overlay, base overlay, physical topology]
UMM: A dynamically adaptive, unstructured multicast overlay (M. Ripeanu et al.)
31 WS Core Enables Frameworks: E.g., Resource Management
Applications of the framework (compute, network, storage provisioning, job reservation and submission, data management, application service QoS, ...)
WS-Agreement (agreement negotiation)
WS Distributed Management (lifecycle, monitoring, ...)
WS-Resource Framework and WS-Notification (resource identity, lifetime, inspection, subscription, ...)
Web services (WSDL, SOAP, WS-Security, WS-ReliableMessaging, ...)
32 Agreement Negotiation
- All interesting interactions will be based on agreements between requestors and services
- Encode a negotiated quality of experience
- Challenge is to establish the framework within which agreements can be:
- Established in an oversubscribed environment
- Transformed, composed, decomposed
- Managed like any other resource
- Evolved as a result of faults
- WS-Agreement is the next step on this path
- FIPA Request Interaction Protocol?
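As a rough illustration of agreements that are "established in an oversubscribed environment" and "managed like any other resource", the sketch below models a provider that accepts an offer only if enough capacity remains uncommitted, and frees that capacity when the agreement is terminated. It does not follow the WS-Agreement schema or the FIPA protocol messages; `Offer`, `Agreement`, `Provider`, and the CPU-count capacity model are hypothetical simplifications.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Offer:
    requestor: str
    cpus: int


@dataclass
class Agreement:
    agreement_id: str
    offer: Offer


class Provider:
    """Accepts or rejects offers against a fixed pool of CPUs."""

    def __init__(self, total_cpus: int) -> None:
        self.total_cpus = total_cpus
        self._agreements: Dict[str, Agreement] = {}
        self._ids = itertools.count(1)

    @property
    def committed_cpus(self) -> int:
        return sum(a.offer.cpus for a in self._agreements.values())

    def propose(self, offer: Offer) -> Optional[Agreement]:
        """Return an Agreement if the offer fits, or None to reject it."""
        if offer.cpus <= 0:
            return None
        if self.committed_cpus + offer.cpus > self.total_cpus:
            return None  # oversubscribed: cannot honor this offer
        agreement = Agreement(f"agreement-{next(self._ids)}", offer)
        self._agreements[agreement.agreement_id] = agreement
        return agreement

    def terminate(self, agreement_id: str) -> None:
        """End an agreement and release the capacity it held."""
        self._agreements.pop(agreement_id, None)
```

A fuller treatment would add counter-offers, composition of agreements, and renegotiation after faults, which are the open questions the slide points to.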
33 WS-Agreement Model
34 WS-Agreement Applicability
- WS-Agreement to be used/specialized by specs that define domain-specific services
- Data management services
- Storage provisioning, transfer mgt, replication
- Job/execution management services
- Cluster provisioning, image/service deployment, job reservation and submission, specialized services, etc.
- Network provisioning services
- Optical path, firewall traversal, etc.
- Application- and domain-specific services
35 Bringing it All Together
- Scenario: Resource management and scheduling
Grid Scheduler is a Web Service
Grid Jobs and tasks are also modeled using WS-Resources and Resource Properties
WS-Resource used to model physical processor resources
Service Level Agreement is modeled as a WS-Resource
WS-Notification can be used to inform the scheduler when processor utilization changes
Lifetime of SLA Resource tied to the duration of the agreement
WS-Resource Properties project processor status (like utilization)
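The scenario can be sketched in a few lines, under heavy simplification. The scheduler below receives utilization-change notifications through a plain method call (standing in for WS-Notification), picks the least-utilized processor, and returns an SLA object whose lifetime equals the agreed duration. `Scheduler`, `SLA`, and the method names are hypothetical and do not correspond to any Globus Toolkit interface.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SLA:
    """Service level agreement whose lifetime is the agreement's duration."""
    job_id: str
    processor: str
    expires_at: float

    def is_active(self, now: float) -> bool:
        return now < self.expires_at


class Scheduler:
    """Tracks processor utilization from notifications and places jobs."""

    def __init__(self) -> None:
        self.utilization: Dict[str, float] = {}

    def on_utilization_changed(self, processor: str, value: float) -> None:
        """Notification callback: record the latest reported utilization."""
        self.utilization[processor] = value

    def schedule(self, job_id: str, duration: float,
                 now: float) -> Optional[SLA]:
        """Place the job on the least-utilized known processor."""
        if not self.utilization:
            return None  # no processor has reported its status yet
        processor = min(self.utilization, key=self.utilization.get)
        return SLA(job_id, processor, now + duration)
```

Because the scheduler only reacts to pushed updates, it never polls the processors; its placement decision changes as soon as a new utilization value arrives.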
36 Overview
- Where's the muscle?
- What Grid is about
- The need for brains
- The importance of automation
- Bringing the two together
- Working with the Open Science Grid
- Research problems
37 Grid: Two Reasons to Care
- Grid technology can ease creation of secure, robust, scalable agent systems
- E.g., Globus Toolkit
- Grid deployments can serve as testbeds for agent concepts and algorithms
- E.g., Grid3 is evolving to Open Science Grid with 1000s of CPUs and dozens of sites
- → Run ultra-large agent systems
- → Help tackle fundamental problems: trust, fault detection, adaptation, negotiation, etc.
38 Working on the
- What it gives you
- Large collection of distributed resources
- Standard services
- Mechanisms for deploying further software
- What you can do
- Use for large-scale agent experiments
- Define and deploy new services, e.g., intrusion detection, fault determination
- Define challenge problems to motivate development of adaptive techniques
39 10 Research Challenges (1)
- Service architecture
- Robust foundation for autonomous behaviors
- Trust negotiation and management
- Expressing and reasoning about trust
- System management and troubleshooting
- Autonomic management of large systems
- Negotiation
- Establishing, monitoring, evolving agreements
- Service composition
- Describing, discovering, composing services
40 10 Research Challenges (2)
- VO formation and management
- Lifecycle issues
- System predictability
- Guarantees of emergent behavior
- Human-computer collaboration
- Interactions in hybrid teams
- Evaluation
- Benchmarks, challenge problems, testbeds
- Semantic integration
- Ontology definition, schema mediation
41 Summary
- Agents and Grid share a common interest in robust, scalable open distributed systems
- Complementary foci in work to date
- Grid: secure, robust infrastructure (brawn)
- Agents: autonomous problem solvers (brain)
- Both brain and brawn required for progress
- Specific proposals for convergence
- Use Grids as testbeds for agent technology
- Coordinated attack on open problems
42 For More Information
- Globus Alliance
- www.globus.org
- Global Grid Forum
- www.ggf.org
- Open Science Grid
- www.opensciencegrid.org
- Background information
- www.mcs.anl.gov/foster
- GlobusWORLD 2005
- Feb 7-11, Boston
2nd Edition www.mkp.com/grid2