Title: (Or: What do Wombats and Grid have in common?)
1 Grid Middleware: Principles, Practice and Potential
- (Or: What do Wombats and Grid have in common?)
- UK OGSA Evaluation Project
- (UCL, Imperial, Newcastle, Edinburgh)
- UCL project members: Paul Brebner, Wolfgang Emmerich, University College London
- P.Brebner_at_cs.ucl.ac.uk
2 What do Wombats and Grid have in common?
- A: They are secretive and misunderstood creatures?
- B: They live in complex underground burrows?
- C: You wouldn't want to meet one in a confined space in the dark?
- D: All of the above?
3 Grid Abstract
- Principles
- What are the principles of Grid middleware?
- Practice (and pitfalls)
- How easy is it to use in practice? What are the pitfalls?
- Potential
- What potential does Grid middleware have to:
- (1) provide insight into different ways of using Service Oriented Architectures, and
- (2) support automatic deployment and debugging?
4 Grid Principles
5 Grid Principles - cluster, enterprise, internet
6 Grid Principles - Grid vs Enterprise
- What's the difference between Grid and Enterprise? (Typical generalisations)
- Grid
- Crosses firewalls and organisational boundaries
- Resource and code focussed
- Scientist has some code, and wants to execute it on as many resources as possible, to solve ever bigger problems
- Developer, deployer and user may be the same person
7 Grid Principles - Grid
Figure: code and data in, new data out
- User wants: infinite resources, scalability, monitoring
- Organisations want: fair sharing, ease of maintenance?
8 Grid Principles - Grid vs Enterprise
- Enterprise
- Code developed, deployed and maintained by enterprises behind the firewall
- Exposed as web services for intra- and inter-organisational interoperability
- Users don't develop or deploy code
9 Grid Principles - Enterprise
Figure: a user sends a query or transaction to services written by a service developer, and receives a response
- User wants: response time, availability
- Enterprise wants: interoperability, scalability, security
10 Grid Principles - Grid vs Enterprise
- Grid (user view)
- "I have some code, make it run fast for me."
- Concerns: finding resources, platform portability, deploying, running and monitoring jobs, security, data management.
- Enterprise (enterprise owner view)
- "I have some business logic exposed as a Web service; ensure internal and external users get the required QoS."
- Concerns: QoS, interoperability, transactions, performance/scalability, security, multiple applications sharing services.
11 Grid Principles - Just another component model?
- In spite of these differences, they have something in common: OGSI has J2EE origins
- "What does it mean to ship a J2EE-based Grid environment, something that can deliver OGSI-compliant services? It means that you provide a server programming environment that makes it very easy for service writers to implement services that conform to the set of standards that are OGSI."
- Containers, lifecycle management
- Goal: easy-to-write services, and interoperability at the interface level
12 Grid Principles - OGSA vs OGSI
13 Grid Principles - OGSA without OGSI
14 Grid Principles - OGSA and ?
15 Grid Principles - Architecture
- J2EE n-tiered architecture
16 Grid Principles - Architecture
OGSA semi-layered, or sum of services
17 Grid Principles - Architecture
GT3 (core) server-side components
18 Grid Principles - OGSA Services
- Infrastructure services
- Execution Management services
- Data Services
- Resource Management Services
- Security Services
- Self-Management Services
- Information Services
19 Grid Principles - J2EE cf. OGSI
Feature | J2EE | OGSI
Containers | Multiple (4) | One
Components | Multiple | One (inheritance)
Roles | Explicit | Implicit
Implementations | Many | 1-2 (sort of)
Component purpose | Presentation/business logic/persistence | High-level grid services
20 Grid Principles - State
- Treatment of stateful instances?
- J2EE has stateful session and entity beans
- CMP entity beans: lifecycle management (passivation/activation/pooling), caching, and automatic persistence support
- Typically accessed via Stateless Session Beans or MDBs
- GT3 has stateful instances (created by Factories)
- Accessed via SOAP and handles
- No automatic passivation/activation or persistence
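The GT3 pattern above, where a factory creates stateful instances that clients then reach again through a handle, can be sketched in plain Python. This is a minimal in-process illustration only, not GT3 code: the `Factory` and `CounterInstance` classes, the handle format and the host/port are hypothetical, and no SOAP is involved. As the slide notes, nothing here passivates or persists instances; they live until explicitly destroyed.

```python
import itertools


class CounterInstance:
    """Hypothetical stateful service instance: keeps a running total."""

    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self.total


class Factory:
    """Hypothetical factory that creates instances and hands out handles.

    Instances stay in memory until destroy() is called; there is no
    automatic passivation, activation or persistence.
    """

    def __init__(self, base_url):
        self.base_url = base_url
        self._ids = itertools.count(1)
        self._instances = {}

    def create(self):
        handle = f"{self.base_url}/instance-{next(self._ids)}"
        self._instances[handle] = CounterInstance()
        return handle

    def resolve(self, handle):
        # The handle is how a client gets back to the same stateful instance.
        return self._instances[handle]

    def destroy(self, handle):
        del self._instances[handle]


factory = Factory("http://gridhost:8080/ogsa/services/CounterFactory")
h = factory.create()
factory.resolve(h).add(5)
print(h, factory.resolve(h).add(3))
```

The client holds only the handle string; state lives in the container, which is why clients that never call `destroy()` leave instances consuming container resources.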
21 Grid Principles - Roles
- J2EE
- Component developer
- Application assembler
- Deployer
- System Administrator
- Not to mention product and tool providers, system architect, and database designer and administrator, etc.
- Many products provide distributed/remote tool support
22 Grid Principles - Roles
- Grid?
- Increasing number of roles in practice
- But, no explicit definition of Grid roles, and
- Poor tool support for roles that span organisations
23 Grid Principles - Deployment
- Treatment of deployment?
- J2EE has an explicit deployment role, and typically good tool support for remote deployment
- Support for product-independent deployment (JSR-88, since J2EE 1.4)
- GT3 has built-in support for remote code/executable deployment (staging), but none for remote service deployment
24 Grid Principles - Confusion/alternatives
- How is Globus intended to be used?
- 1: Science as first-order services
- Middleware for building and hosting Grid applications, by exposing science code as Grid services.
- 2: High-level grid services
- Middleware for building a set of high-level Grid services, composed to provide new Grid functionality. Science isn't a first-order service, but is executed and managed by Grid services.
27 Grid Principles - Science services or Grid services?
Figure: a client using science code (e.g. E = mc^2) in two ways
- (1) Science services: directly callable, described, discoverable
- (2) Science code run via Execution and Data grid services: indirectly callable, not directly described or discoverable
28 Grid Practice
29 Grid Practice - What to evaluate?
- OGSA > OGSI > GT3.2: Grid SOA exemplar
- Initially evaluate installation, configuration, and security
- Then performance and scalability, deployment, architectural choices, etc.
- What's the point? What are we trying to learn?
- What are some of the s/w engineering and architectural issues surrounding Grid infrastructure? Across organisational boundaries?
- What improvements are required before it is suitable for production environments?
30 Grid Practice - Realistic test-bed
- Heterogeneous platforms
- Linux, Solaris, Windows
- Cross-organisational
- Four nodes
- Independently administered
- Firewalls and access restrictions
- Security
- UK e-Science CA
31 Grid Practice - Incremental
- Start with Core Package (just container and basic services, e.g. container registry service)
- Add Security
- Then try All Services
- Simple enough in theory
- Relationship between packages not well understood
- Java and non-Java components
- Poor integration between some parts
35 Grid Practice - Single node
Figure: layers on a single node, bottom to top: OS/HW, Install, Install, GT3, Configure, Deploy, Run
40 Grid Practice - Multiple sites
Figure: several GT3 sites, with Interoperate, Secure and Manage layered on top
41 Grid Practice - What we found
- Port number management (conflicts, discovery)
- Host access (requirements and site policies)
- Remote visibility of installation, container, services (what, configuration, version)
- Installation by System Administrators (role division, extra effort)
- Tomcat or Test container (different configuration)
- Linux is the only well-supported platform
- Exponential increase in testing complexity as the number of nodes increases.
42 Grid Practice - Security
- Grid Security Infrastructure (GSI)
- X.509 certificates
- Mutual authentication (client/host)
- Proxy certificates (delegation and single sign-on)
- Authentication (Who are you?)
- Secure Message (Basic)
- Secure Conversation
- Signing or Encryption (prevent unauthorised altering/reading)
- Authorisation (Who is authorised to use container, factory, service, method)
- Gridmap file (Access Control List maps Grid to local identities)
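The gridmap file is essentially a lookup table from a certificate subject (distinguished name) to local accounts. A minimal parser sketch, assuming the usual one-entry-per-line layout of a quoted DN followed by comma-separated account names with no spaces between them; the DNs and accounts below are hypothetical:

```python
import shlex


def parse_gridmap(text):
    """Parse gridmap text into {distinguished name: [local accounts]}.

    Assumes each entry is a quoted certificate subject DN followed by
    one or more comma-separated local account names (no spaces).
    Blank lines and lines starting with '#' are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dn, accounts = shlex.split(line)  # shlex removes the DN's quotes
        mapping[dn] = accounts.split(",")
    return mapping


def local_accounts(mapping, client_dn):
    """Authorisation lookup: local accounts for a client DN, or [] if unmapped."""
    return mapping.get(client_dn, [])


GRIDMAP = '''
# hypothetical entries
"/C=UK/O=eScience/OU=UCL/CN=alice example" alice
"/C=UK/O=eScience/OU=Edinburgh/CN=bob example" bob,gridusers
'''

m = parse_gridmap(GRIDMAP)
print(local_accounts(m, "/C=UK/O=eScience/OU=UCL/CN=alice example"))
```

An unmapped DN gets an empty list, i.e. no local identity and so no access.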
43 Grid Practice - Security
- In theory you just have to:
- obtain (and update) host, client, and CA certificates
- convert
- install
- configure (server side, client side, container, services, etc.)
- generate (and update) proxies.
- However, parts of the All Services package are also needed.
44 Grid Practice - Security
- Interactions between security for multiple installations
- Essential to test non-secure interoperability first
- Windows client-side security
- Testing and viewing security configuration
- Debugging secure calls
- Client-side security is programmatic
- Security management scalability
- Construction and maintenance of user accounts and grid-map file entries.
45 Grid Practice - Security
- Interactions between security for multiple installations
- For testing you may want
- multiple versions, or duplicates (with different configurations) of the same version.
- One container with no security, and another container with security
- May want test/production environments
46 Grid Practice - Security
- Essential to test non-secure interoperability first
- Trying to test interoperability and security simultaneously wasn't fun
47 Grid Practice - Security
- Windows client-side security
- Not obvious exactly which parts of Globus are needed for client-side code with security (no client-side security package).
48 Grid Practice - Security
- Testing and viewing security configuration
- View/edit and check security configuration for containers and services
- Confusion about hierarchical security settings
- Virtual Organisations, clusters, servers, containers, factories, services, methods, and instances.
- Remotely
- Validate security deployment before run-time
49 Grid Practice - Security
- Debugging secure calls (or any stateful service)
- Proxy interceptor approach (e.g. TCPMON) won't work with stateful services
- As the grid handle returned to the client contains the port number of the instance, not the proxy
- But proxies are an important design pattern for SOAs
- GT4/WS-RF may be different
- Handle resolvers, WS-Addressing and WS-RenewableReferences
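Why the interceptor loses sight of the conversation: the client's first call goes to the proxy port, but the handle that comes back names the container's own port, so later calls go straight to the instance. The sketch below shows this, plus one hypothetical workaround a debugging proxy might apply (rewriting the returned handle to point at the proxy). The host, ports and handle path are invented, and handle rewriting is not a GT3 feature.

```python
from urllib.parse import urlsplit, urlunsplit

PROXY_PORT = 8081      # hypothetical TCPMON-style listener
CONTAINER_PORT = 8080  # hypothetical GT3 container port


def goes_through_proxy(url):
    """True if a call to this URL would pass through the proxy."""
    return urlsplit(url).port == PROXY_PORT


def rewrite_handle(handle):
    """Hypothetical workaround: point a returned handle at the proxy port."""
    parts = urlsplit(handle)
    if parts.port != CONTAINER_PORT:
        return handle
    netloc = f"{parts.hostname}:{PROXY_PORT}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# The factory call goes via the proxy...
factory_url = "http://gridhost:8081/ogsa/services/SciMarkFactory"
# ...but the handle it returns names the container port directly.
handle = "http://gridhost:8080/ogsa/services/SciMarkFactory/instance-1"

print(goes_through_proxy(factory_url), goes_through_proxy(handle))
print(goes_through_proxy(rewrite_handle(handle)))
```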
50 Grid Practice - Security
- Client-side security is programmatic
- Client-side code modifications required to call services/methods with the required protocols
- Should be declarative
- Sensitive to server-side security credentials
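What "declarative" could look like: instead of hard-coding the protection mechanism in each client, the client looks up what a service requires in configuration and applies it. A minimal sketch, assuming a hypothetical policy table keyed by URL pattern; the mechanism names echo the GSI options listed earlier, but the table layout and `call_options` function are invented for illustration.

```python
from fnmatch import fnmatch

# Hypothetical declarative client-side policy: first matching pattern wins.
# In practice this would be read from a configuration file, not code.
CLIENT_SECURITY_POLICY = [
    ("*/ogsa/services/secure/*", {"mechanism": "Secure Conversation", "protection": "encryption"}),
    ("*/ogsa/services/signed/*", {"mechanism": "Secure Message", "protection": "signature"}),
    ("*", {"mechanism": None, "protection": None}),  # default: no security
]


def call_options(service_url, policy=CLIENT_SECURITY_POLICY):
    """Return the security settings the client should use for service_url."""
    for pattern, settings in policy:
        if fnmatch(service_url, pattern):
            return dict(settings)
    raise LookupError(f"no policy entry for {service_url}")


print(call_options("http://gridhost:8080/ogsa/services/secure/Counter"))
```

With this split, changing which services need which protection becomes a configuration edit rather than a client code change.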
51 Grid Practice - Security
- Security management scalability
- Construction and maintenance of user accounts and grid-map file entries.
- For each server, each user needs an account, and an entry in the container gridmap file (mapping client certificate to account)
- May also need service-specific gridmap files
- Not scalable for large numbers of users, servers, services.
- Revocation of certificates, host certificate expiry problem
- Alternatives?
- Tool support
- Role-based authentication
- Shared accounts or certificates (probably evil)
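Done by hand, the work is one entry per user per server, so entries grow as users × servers. One form the suggested tool support could take is generating every server's gridmap file from a single central user list. A sketch under that assumption; the DNs, server names and account-naming rule are hypothetical, and creating the accounts themselves is out of scope.

```python
def gridmap_text(entries):
    """Render {DN: local account} as gridmap lines (quoted DN, then account)."""
    return "".join(f'"{dn}" {account}\n' for dn, account in sorted(entries.items()))


def generate_gridmaps(users, servers, account_for):
    """Build one gridmap file per server from a single central user list.

    account_for(server, dn) returns the local account name for dn on that
    server (a hypothetical, site-specific naming rule).
    """
    return {
        server: gridmap_text({dn: account_for(server, dn) for dn in users})
        for server in servers
    }


USERS = [
    "/C=UK/O=eScience/OU=UCL/CN=alice example",
    "/C=UK/O=eScience/OU=Imperial/CN=bob example",
    "/C=UK/O=eScience/OU=Newcastle/CN=carol example",
]
SERVERS = ["ucl-node", "imperial-node", "newcastle-node", "edinburgh-node"]


def first_name_account(server, dn):
    # Hypothetical rule: the same account name on every server.
    return dn.rsplit("CN=", 1)[1].split()[0]


maps = generate_gridmaps(USERS, SERVERS, first_name_account)
print(sum(len(text.splitlines()) for text in maps.values()))  # 12
```

Adding a user or a server then means editing one list and regenerating, rather than touching every file.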
52 Grid Practice - Performance
- First approach (initial results)
- Scientific benchmark (SciMark2.0) modified to measure throughput, and invoked as a Stateful Grid Service
- Metric is Calls Per Minute (CPM), where one call is one unit of work.
- No large-scale data movement, just SOAP parameters and result, and computation/memory load.
- Good performance and scalability
- Minimal overhead cf. standalone benchmark
- Security has minimal overhead
- Sustained 4200 jobs an hour throughput
- Problem with client-side timeouts as response times increase
53 Grid Practice - Performance
Figure: Tomcat: fastest 3.6 s (Edinburgh), slowest 25 s (UCL)
54 Grid Practice - Performance
Figure: 95% of predicted maximum throughput
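The throughput figures can be related with simple arithmetic: 4200 jobs an hour is 70 CPM if one job is one call. The slides do not say how the "predicted maximum" was computed; the sketch below assumes, hypothetically, that each node completes one call at a time, so a node taking t seconds per call contributes 60/t CPM. The example uses only the two per-call times quoted above (3.6 s and 25 s), so it is not the whole test-bed's prediction.

```python
def jobs_per_hour_to_cpm(jobs_per_hour):
    """Convert an hourly job rate to Calls Per Minute (one job = one call)."""
    return jobs_per_hour / 60


def predicted_max_cpm(seconds_per_call):
    """Predicted maximum CPM if each node runs one call at a time (assumption)."""
    return sum(60 / t for t in seconds_per_call)


def efficiency(measured_cpm, predicted_cpm):
    """Fraction of the predicted maximum actually achieved."""
    return measured_cpm / predicted_cpm


print(jobs_per_hour_to_cpm(4200))              # 70.0
print(round(predicted_max_cpm([3.6, 25]), 2))  # 19.07
```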
55 Grid Practice - Performance
- Tomcat vs Test container
- No difference on 3 out of 4 nodes
- But 67% faster on one node (Newcastle, slowest Intel box)
- Attachments will work with GT3 and Tomcat
- But not with security
- Limit of 1GB (DIME)
- Bug in Axis: doesn't clean up temporary files.
56 Grid Practice - Performance
- Stateful instances visible externally can be problematic
- Intermittent unreliability
- On some runs, 1 exception in 300 calls (reliability of 0.9967)
- But non-repeatable; SOAP/network related?
- What is the safe response to exceptions? Can't just retry.
- Possible to kill the container (relies on clients being well behaved)
- By invoking the same instance/method more than once.
- By consuming container resources
- But instances can be passivated/activated in theory
- Could be used to enable fine-grain (per-instance) control over resource usage.
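The reliability figure is 1 - 1/300 ≈ 0.9967 per call; if failures were independent, a 300-call run would see at least one exception about 63% of the time. The sketch below does that arithmetic and shows one conservative answer to "what is the safe response?": retry only calls known to be idempotent, because a call on a stateful instance may already have taken effect. The idempotency flag, the exception type and the function names are assumptions for illustration, not GT3 behaviour.

```python
def reliability(failures, calls):
    """Observed per-call reliability, e.g. 1 failure in 300 calls."""
    return 1 - failures / calls


def p_any_failure(per_call_reliability, n_calls):
    """Chance of at least one failure in n independent calls (assumption)."""
    return 1 - per_call_reliability ** n_calls


def call_with_retry(call, idempotent, attempts=3, retry_on=(ConnectionError,)):
    """Retry a failed call only if repeating it is known to be safe.

    A non-idempotent call (e.g. one that updates instance state) may have
    taken effect before the exception reached the client, so it is
    re-raised immediately instead of being retried.
    """
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if not idempotent or attempt == attempts:
                raise


r = reliability(1, 300)
print(round(r, 4), round(p_any_failure(r, 300), 3))
```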
57 Grid Practice - Pitfalls
- Production-quality Grid middleware needs (What this bike needs is ...)
- Support for
- Remote
- location independent
- cross-organisational
- multiple role scenarios
- Such as
58 Grid Practice - Pitfalls (continued)
- Platform-independent, automatic installation.
- Tool support for configuration and deployment: creation, validation, viewing and editing.
- Management console for grid, nodes, Globus packages, containers and services.
- Remote deployment and management of services.
- Remote distributed debugging of grid installations, services, and applications.
- Tool support, and more scalable processes, for security.
59 Grid Potential
60 Grid Potential - Architectural alternatives
- Evaluate the two approaches in more detail
- Science exposed as services, vs science code managed by higher-level grid services.
- Explore alternative mechanisms for
- Executing science code
- Load balancing and scheduling/resource management
- Directory services (service and resource discovery)
- Data movement (e.g. SOAP Attachments vs GridFTP)
61 Grid Potential - Architectural evaluation
- Evaluation approach
- Loosely based on ATAM mechanisms
- Clarify the role of different GT3 mechanisms, and quantify pros/cons
- Two versions of application
- Evaluate with
- Architecture
- Roles
- Scenarios (to quantify quality attributes)
62 Grid Potential - Architectural evaluation
- Pick a number of roles of interest
- Define attributes of interest, and scenarios to exercise and measure them
- Deployment
- Consistency of deployment, and time to deploy
- Debugging
- Ability to locate the root cause of a problem and rectify it
- Security admin
- Cost/time to secure an increasing number of clients/nodes
- Grid owner
- Scalability and ease of management
63 Grid Potential - Architectural evaluation
- Hypothesis
- Both approaches to using Grid are identical
- But won't be surprised by some differences, e.g. scalability, discovery, deployment
- Problems with
- MDS3 (Directory and resource discovery service) working with aggregated service data across sites
- GridFTP
- Wrapping Science code with MMJFS
64 Grid Potential - Deployment
- How to install and configure Grid infrastructure and services scalably and securely?
- Install GT3 infrastructure and security manually
- MMJFS allows executable code to be staged automatically (but not services; could provide a deployment service).
- Install bootstrapping code, and then install and deploy all other code and security automatically.
- Using SmartFrog (HP) in the lab, and then on the test-bed.
- Firewalls, platform-specific configurations, user sand-boxing, configuring GT3 security remotely, and trust with System Administrators are open issues.
65 Grid Potential - Deployment Speculation
- Explicit deployment-flows?
- In Enterprise, applications are increasingly represented as work-flows.
- Good for distributed execution, and comprehensibility.
- What if deployment plans are also represented explicitly as flows (deployment-flows)?
- Some work on work-flow aware resource management (for Grid).
- Deployment-flows could even be auto-magically generated from work-flows, and executed to ensure resources are deployed correctly, JIT, for work-flow execution.
66 Grid Potential - Deployment Speculation
- For example
- Work-flow with two tasks
- 1st task requires 10 nodes, 2nd task 100 nodes.
- Produce a deployment-flow which is interleaved with the work-flow to
- Deploy the 1st service for the first task to 10 nodes, and start execution
- Deploy the 2nd service to 100 nodes concurrently with execution of the 1st task, ready for execution of the 2nd.
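The interleaving in this example can be written down as a small plan generator. A minimal sketch, assuming a strictly sequential work-flow in which each task needs exactly one service; the `Task` type and step strings are hypothetical, and a real deployment-flow would also handle failures and un-deployment.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str     # work-flow task, e.g. "T1"
    service: str  # service the task needs, e.g. "S1"
    nodes: int    # number of nodes the task runs on


def deployment_flow(workflow):
    """Interleave deploy steps with a sequential work-flow.

    Each step is a list of actions that may run concurrently. The first
    service is deployed before anything runs; each later service is
    deployed while the previous task executes, so it is ready just in time.
    """
    if not workflow:
        return []
    first = workflow[0]
    steps = [[f"Deploy {first.service} x {first.nodes}"]]
    for i, task in enumerate(workflow):
        step = [f"Execute {task.name} x {task.nodes}"]
        if i + 1 < len(workflow):
            nxt = workflow[i + 1]
            step.append(f"Deploy {nxt.service} x {nxt.nodes}")
        steps.append(step)
    return steps


plan = deployment_flow([Task("T1", "S1", 10), Task("T2", "S2", 100)])
for step in plan:
    print(" | ".join(step))
```

For the two-task example this yields three steps: deploy S1 to 10 nodes; execute T1 while deploying S2 to 100 nodes; execute T2.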
67 Grid Potential - Deployment Speculation
Figure: deployment-flow interleaved with the work-flow
- Deployment-flow: Deploy S1 x 10, then Deploy S2 x 100
- Work-flow: Execute T1 (T1 x 10), then Execute T2 (T2 x 100)
- Could also include un-deployment
68 Grid Potential - Deployment Debugging
- Debugging distributed systems is tricky
- Need better support for cross-cutting non-functional concerns such as deployment and debugging.
- (One) problem with debugging services is not knowing the context of errors (to aid diagnosis or cure): a service is just a black box with an interface.
- Deployment-aware debugging
- Starting from functional work-flows, generate deployment-flows, which are executed prior to, or concurrently with, functional work-flows.
- This ensures that deployment is done consistently and automatically with respect to application execution.
- If there is a failure in the functional work-flow, then the corresponding deployment-flow is examined to determine likely causes, and parts are re-executed.
- Failure in the deployment-flow can also possibly be managed.
69 Grid Potential - Deployment Debugging
- Three phases of Debugging
- Debug deployment
- Relies on deployment infrastructure and deployment-flows
- What works locally or on one node may not work remotely, or identically on all nodes without modification, and the deployment framework itself may be an extra cause of failure
- Debug/trace application infrastructure to get it working initially
- Relies on visibility/transparency of deployed and running infrastructure and application
- Ideally want integrated (active), or at least proxy/sniffer (passive), debugging (profiling, tracing, stepping) support.
- Debug working application upon failure
- But multiple failure modes
- Has the application infrastructure been analysed and/or tested for them all?
- Can diagnosis and rectification be done anyway?
70 Grid Potential - Deployment Debugging
- Backtrack through deployment steps (like peeling an onion)
- Some steps will need to be reversed, and then redone correctly
- Manage dependent, redundant, and inconsistent operations
- This approach may fix an (interesting) sub-class of problems
- Those which can be fixed by simply redoing (or replicating) (part of) the installation, e.g.
- Intermittent failure of container or services
- Resource starvation or overload: deploy services to more resources
- Security problems that can be fixed with reconfiguration or refresh of certificates/proxies
- But not
- network, or all configuration and security/access problems.
- Or Enterprise Web services (from a user perspective, as users can't deploy)
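The backtracking idea, for a case where service S2 fails on one node and is redeployed there, can be sketched as a recovery planner over a log of deployment steps. This assumes, hypothetically, that every step is recorded per node and per service and can be reversed; managing dependent, redundant or inconsistent operations is left out, and the step names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployStep:
    node: str
    service: str
    action: str  # e.g. "install", "configure", "start"


def recovery_plan(deployment_log, failed_node, failed_service):
    """Backtrack, then redo, the deployment steps behind one failure.

    Steps for the failed node and service are undone in reverse order
    (peeling the onion), then redone in their original order. Steps on
    other nodes are left alone.
    """
    relevant = [s for s in deployment_log
                if s.node == failed_node and s.service == failed_service]
    undo = [("undo", s) for s in reversed(relevant)]
    redo = [("redo", s) for s in relevant]
    return undo + redo


log = [DeployStep(node, "S2", action)
       for node in ("node1", "node2", "node3")
       for action in ("install", "configure", "start")]
for kind, step in recovery_plan(log, "node2", "S2"):
    print(kind, step.node, step.action)
```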
71 Grid Potential - Deployment Debugging
Figure: a failure during work-flow execution traced back to the deployment-flow
- Deployment-flow: Deploy S1 x 10, Deploy S2 x 100
- Work-flow: Execute T1 (T1 x 10), Execute T2
- Failure!: redeploy S2 on the failed node
72 Grid Potential - Deployment Debugging
- What's still needed?
- Connection between executing client code and deployment infrastructure
- Ability to reason about the relationship between work-flow/client failures, deployment-flows and grid infrastructure, diagnose failure causes, and plan solutions
- Ideally want applications and deployment represented explicitly as flows: work-flows and deployment-flows.
- Could we infer the work-flow, and therefore the deployment-flow, from the running system in the absence of explicit information?
- Justification: is the problem significant, and how far does this solution go?
78 UK OGSA Evaluation Project
- Thank you
- Email: P.Brebner_at_cs.ucl.ac.uk
- After November: Paul.Brebner_at_csiro.au
- Not (quite) the End
81 Postscript - The Secret Life of Grid?
- Our experience evaluating Grid technology reminds me of an Australian book (The Secret Life of Wombats) about a schoolboy who used to sneak out of his dormitory after everyone was asleep to go wombatting. He spent his nights secretly crawling down wombat burrows with a flashlight, a potentially lethal activity (not just from cave-ins, as wombats are ferocious when cornered!), and wrote copious notes resulting in a substantial increase in knowledge of these mysterious and often misunderstood creatures.
The End
UK OGSA Evaluation Project Report 1.0: Evaluation of Globus Toolkit 3.2 (GT3.2) Installation, http://sse.cs.ucl.ac.uk/UK-OGSA/Report1.doc