Title: History
1 History
- For years, a few wacky computer scientists have been trying to help other scientists use distributed computing.
  - Interactive simulation (climate modeling)
  - Very large-scale simulation and analysis (galaxy formation, gravity waves, battlefield simulation)
  - Engineering (parameter studies, linked component models)
  - Experimental data analysis (high-energy physics)
  - Image and sensor analysis (astronomy, climate study, ecology)
  - Online instrumentation (microscopes, x-ray devices, etc.)
  - Remote visualization (climate studies, biology)
  - Engineering (large-scale structural testing, chemical engineering)
- In these cases, the scientific problems are big enough that they require people in several organizations to collaborate and share computing resources, data, and instruments.
2 What Types of Problems?
- Your system administrators can't agree on a uniform authentication system, but you have to allow your users to authenticate once (using a single password) and then use services on all systems, with per-user accounting.
- You need to be able to offload work during peak times to systems at other companies, but the volume of work they'll accept changes from day to day.
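The offloading problem above can be made concrete with a small sketch. It is only an illustration, not part of any Grid toolkit: the function name `plan_offload`, the partner names, and the idea of a per-day job limit published by each partner are all assumptions made for the example.

```python
def plan_offload(jobs: int, local_capacity: int, partner_limits: dict) -> dict:
    """Split `jobs` between the local site and partner sites (hypothetical helper).

    partner_limits maps a partner name to the number of jobs that partner
    will accept today; the values are expected to change from day to day.
    Returns the allocation per site plus an "unplaced" count.
    """
    # Run as much as possible locally first.
    plan = {"local": min(jobs, local_capacity)}
    remaining = jobs - plan["local"]

    # Offload the overflow, visiting partners with the largest limit first.
    for name, limit in sorted(partner_limits.items(), key=lambda kv: -kv[1]):
        take = min(remaining, limit)
        if take > 0:
            plan[name] = take
        remaining -= take

    # Anything left over could not be placed anywhere today.
    plan["unplaced"] = remaining
    return plan


# Today's limits (made-up numbers); tomorrow's may differ.
print(plan_offload(100, 60, {"partner-a": 30, "partner-b": 5}))
```

The point of the sketch is that the plan has to be recomputed whenever the partners' limits change, which is why the limits are an input rather than a constant.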
3 What Types of Problems?
- You and your colleagues have 6000 datasets from the past 50 years of studies that you want to start sharing, but no one is willing to submit the data to a centrally managed storage system or database.
- You need to run 24 experiments that each use six large-scale physical experimental facilities operating together in real time.
4 Two Cardinal Rules of the Grid
- You can't rely on homogeneity.
  - Impossible to achieve in the real world.
  - STRATEGY: Plan on dealing with diverse systems and use mechanisms to manage heterogeneity.
- You can't rely on trust.
  - Severely limits participation.
  - STRATEGY: Provide a security model that can express complicated social networks.
  - STRATEGY: Use full disclosure when making requests (who is requesting, authorizing, and authenticating the request) and give service owners and service hosts tools to enforce local policies.
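The "full disclosure" strategy can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Globus security API: the `Request` fields, the policy helpers, and the names `ca.example.org` and `climate-vo` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Request:
    """A request that discloses every party involved (hypothetical fields)."""
    requester: str      # who is asking for the action
    authorizer: str     # who granted the requester permission
    authenticator: str  # who vouched for the requester's identity
    action: str         # what is being asked for


# A local policy is any function that inspects a request and says yes or no.
Policy = Callable[[Request], bool]


def trusts_authenticator(trusted: set) -> Policy:
    """Accept only identities vouched for by an authenticator this site trusts."""
    return lambda req: req.authenticator in trusted


def allows_action(allowed: dict) -> Policy:
    """Accept only actions that the named authorizer may grant at this site."""
    return lambda req: req.action in allowed.get(req.authorizer, set())


def enforce(request: Request, policies: List[Policy]) -> bool:
    """The service owner admits a request only if every local policy accepts it."""
    return all(policy(request) for policy in policies)


site_policies = [
    trusts_authenticator({"ca.example.org"}),
    allows_action({"climate-vo": {"read-data", "submit-job"}}),
]

ok = Request("alice", "climate-vo", "ca.example.org", "submit-job")
bad = Request("mallory", "climate-vo", "unknown-ca.example.net", "submit-job")
print(enforce(ok, site_policies))   # prints True
print(enforce(bad, site_policies))  # prints False
```

Because each request names its requester, authorizer, and authenticator, the decision stays with the service owner: each site can plug in its own policy functions without trusting any other site's rules.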
5 Challenging Applications
- The applications that Grid technology is aimed at are not easy applications!
  - The reason these things haven't been done before is that people believed it was too hard to bother trying.
  - If you're trying to do these things, you'd better be prepared for it to be challenging.
- Grid technologies are aimed at helping to overcome the challenges.
  - They solve some of the most common problems.
  - They encourage standard solutions that make future interoperability easier.
  - They were developed as parts of real projects.
  - In many cases, they benefit from years of lessons from multiple applications.
  - Ever-improving documentation, installation, configuration, training
6 Earth System Grid
- Goal: Give climate scientists easier access to the distributed data and resources that they require to perform their research.
- Developed new technologies for (1) creating and operating "filtering servers" capable of performing sophisticated analyses, and (2) delivering results to users.
7 Collaborative Engineering: NEES
U. Nevada, Reno
www.neesgrid.org
8 Laser Interferometer Gravitational Wave Observatory
- Goal: Observe gravitational waves predicted by theory.
- Three physical detectors in two locations. (Plus GEO detector in Germany.)
- Ten data centers for data analysis.
- Collaborators in 40 institutions on at least three continents.
9 Cancer Biology
10 NSF's TeraGrid
- TeraGrid DEEP: Integrating NSF's most powerful computers (60 TF)
  - 2 PB Online Data Storage
  - National data visualization facilities
  - World's most powerful network (national footprint)
- TeraGrid WIDE Science Gateways: Engaging Scientific Communities
  - 90 Community Data Collections
  - Growing set of community partnerships spanning the science community.
  - Leveraging NSF ITR, NIH, DOE and other science community projects.
  - Engaging peer Grid projects such as Open Science Grid in the U.S. as well as peer Grids in Europe and Asia-Pacific.
- Base TeraGrid Cyberinfrastructure: Persistent, Reliable, National
  - Coordinated distributed computing and information environment
  - Coherent User Outreach, Training, and Support
  - Common, open infrastructure services
- A National Science Foundation Investment in Cyberinfrastructure
  - $100M 3-year construction (2001-2004)
  - $150M 5-year operation and enhancement (2005-2009)
[Map labels: UC/ANL, PSC, PU, NCSA, IU, ORNL, UCSD, UT]
Slide courtesy of Ray Bair, Argonne National Laboratory
11 What End Users Need
Secure, reliable, on-demand access to
data, software, people, and other
resources (ideally all via a Web Browser!)
12 How it Really Happens
[Diagram components: Compute Server (x2), Simulation Tool, Web Browser, Web Portal, Registration Service, Camera (x2), Telepresence Monitor, Data Viewer Tool, Database service (x3), Chat Tool, Data Catalog, Credential Repository, Certificate authority]
- Resources implement standard access and management interfaces.
- Collective services aggregate and/or virtualize resources.
- Users work with client applications.
- Application services organize VOs and enable access to other services.
13 How it Really Happens
- Implementations are provided by a mix of:
  - Application-specific code
  - Off-the-shelf tools and services
  - Tools and services from the Globus Toolkit
  - Tools and services from the Grid community (compatible with GT)
- Glued together by:
  - Application development
  - System integration
14 Globus Philosophy
- Globus was first established as an open source project in 1996.
- The Globus Toolkit is open source to:
  - Allow for inspection
    - for consideration in standardization processes
  - Encourage adoption
    - in pursuit of ubiquity and interoperability
  - Encourage contributions
    - harness the expertise of the community
- The Globus Toolkit is distributed under the (BSD-style) Apache License version 2.
15 dev.globus
- Governance model based on Apache Jakarta
  - Consensus-based decision making
- Globus software is organized as several dozen Globus Projects
  - Each project has its own Committers responsible for their products
  - Cross-project coordination through shared interactions and committers meetings
- A Globus Management Committee
  - Overall guidance and conflict resolution
16 http://dev.globus.org
- Guidelines (Apache Jakarta)
- Infrastructure (CVS, email, Bugzilla, Wiki)
- Projects Include
17 Open Source ≠ Free Time
- Globus development is well-funded.
- The open source model facilitates contributions.
- NSF and DOE sponsor Globus development at several institutions via multiple grants, totaling >$5M/yr.
- Non-U.S. science agencies also contribute to Globus development.
- Corporations also sponsor developers.
- NSF explicitly funds Globus improvements.
  - CDIGS: Community-Driven Improvements to Globus Software
18 Globus Technology Areas
- Core runtime
  - Infrastructure for building new services
- Security
  - Apply uniform policy across distinct systems
- Execution management
  - Provision, deploy, and manage services
- Data management
  - Discover, transfer, and access large data
- Monitoring
  - Discover and monitor dynamic services
19 Globus Software: dev.globus.org
- Globus Projects: OGSA-DAI, GT4, MPICH-G2, Data Rep, Replica Location, Java Runtime, MyProxy, Delegation, GridWay, GridFTP, MDS4, CAS, C Runtime, GSI-OpenSSH, Incubator Mgmt, Reliable File Transfer, GRAM, Python Runtime, C Sec, GT4 Docs
- Incubator Projects: Cog WF, GAARDS, Virt WkSp, MEDICUS, OGRO, GDTE, UGP, GridShib, Dyn Acct, Gavia JSC, DDM, Metrics, LRMA, HOC-SA, PURSE, Introduce, WEEP, Gavia MS, SGGC, ServMark
- Diagram legend: Security, Execution Mgmt, Info Services, Common Runtime, Other, Data Mgmt
20 What Is the Globus Toolkit?
- The Globus Toolkit is a collection of solutions to problems that frequently come up when trying to build collaborative distributed applications.
- Heterogeneity
  - To date (v1.0 - v4.0), the Toolkit has focused on simplifying heterogeneity for application developers.
  - We are increasingly including more vertical solutions that implement typical application patterns.
- Security
  - The Grid Security Infrastructure (GSI) allows collaborators to share resources without blind trust.
- Standards
  - Our goal has been to capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF).
  - The Toolkit also includes reference implementations of new/proposed standards in these organizations.
21 What's In the Globus Toolkit?
- A Grid development environment
  - Develop new OGSA-compliant Web Services
  - Develop applications using Java or C/C++ Grid APIs
  - Secure applications using basic security mechanisms
- A set of basic Grid services
  - Job submission/management
  - File transfer (individual, queued)
  - Database access
  - Data management (replication, metadata)
  - Monitoring/Indexing system information
- Tools and Examples
- The prerequisites for many Grid community tools
22 Leveraging Existing and Proposed Standards
- SSL/TLS v1 (from OpenSSL) (IETF)
- LDAP v3 (from OpenLDAP) (IETF)
- X.509 Proxy Certificates (IETF)
- GridFTP v1.0 (GGF)
- OGSI v1.0 (GGF)
- WSRF (OASIS)
- And others on the road to standardization
  - DAI, WS-Agreement, WSDL 2.0, WSDM, SAML, XACML
23 Areas of Competence
- Connectivity Layer Solutions
  - Service Management (WS Core)
  - Monitoring/Discovery (WS Core)
  - Security (GSI and WS-Security)
  - Communication (XIO)
- Resource Layer Solutions
  - Computing / Processing Power (GRAM)
  - Data Access/Movement (GridFTP, OGSA-DAI)
  - In development: Telecontrol (GTCP)
- Collective Layer Solutions
  - Data Management (RLS, DRS, RFT, OGSA-DAI)
  - Monitoring/Discovery (Index, Trigger, Archiver services)
  - Security (CAS, MyProxy)
24 How To Use the Globus Toolkit
- By itself, the Toolkit has surprisingly limited end user value.
  - There's very little user interface material there.
  - You can't just give it to end users (scientists, engineers, marketing specialists) and tell them to do something useful!
- The Globus Toolkit is useful to application developers and system integrators.
  - You'll need to have a specific application or system in mind.
  - You'll need to have the right expertise.
  - You'll need to set up prerequisite hardware/software.
  - You'll need to have a plan.
25 An Ecosystem of Grid Software
- There isn't a Grid software kit for everybody (yet).
  - Varying requirements
  - Experimentation and learning
  - Reluctance to invest in a static solution
- There are many tools that work well together.
  - Results of successful projects
  - Reusable solutions
- Implication: Integrate it yourself (for now).
  - Provides considerable flexibility
  - Requires expertise and effort
- Reminder: These are ambitious applications!
26 Methodology
- Building a Grid system or application is currently an exercise in software integration.
  - Define user requirements
  - Derive system requirements or features
  - Survey existing components
  - Identify useful components
  - Develop components to fit into the gaps
  - Integrate the system
  - Deploy and test the system
  - Maintain the system during its operation
- This should be done iteratively, with many loops and eddies in the flow.
27 Globus User Community
- Large and diverse
  - 10s of national Grids, 100s of applications, 1000s of users; probably much more
  - Every continent except Antarctica
  - Applications ranging across many sciences
  - Dozens (at least) of commercial deployments
- Successful
  - Many production systems doing real work
  - Many applications producing real results
- Smart, energetic, demanding
  - Constant stream of new use cases and tools
28 Global Community
29 Examples of Production Scientific Grids
- APAC (Australia)
- China Grid
- China National Grid
- D-Grid (Germany)
- EGEE
- NAREGI (Japan)
- Open Science Grid
- Taiwan Grid
- TeraGrid
- ThaiGrid
- UK National Grid Service
30 The Importance of Community
- All Grid technology is evolving rapidly.
  - Web services standards
  - Grid interfaces
  - Grid implementations
  - Grid hosting services (ASP, SSP, etc.)
- Community is important!
  - Best practices (GGF, OASIS, etc.)
  - Open source (Linux, Axis, Globus, etc.)
- Application of community standards is vital.
  - Increases leverage
  - Mitigates (a bit) effects of rapid evolution
  - Paves the way for future integration/partnership