Title: The Current Status and Future Direction of Geospatial Grid
- Liping Di
- Laboratory for Advanced Information Technology and Standards (LAITS)
- George Mason University (GMU)
- lpd_at_rattler.gsfc.nasa.gov
- ldi_at_mason.gmu.edu
- http://laits.gmu.edu
The Grid Technology
- Grid technology was developed for secure computational resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations. The shared resources include:
  - Computer CPU cycles
  - Storage
  - Networks
  - Data, information, algorithms, software, and services
  - Human expertise
- It was originally motivated and supported by science and engineering disciplines that require high-end computing, as a way to share geographically distributed high-end computing resources.
- The core of the technology is the open source middleware called the Globus Toolkit.
- The latest version of Globus is version 3.0, which implements the Open Grid Services Architecture (OGSA).
What Grid Provides
- Enabling new large-scale scientific research and applications through the coordinated use of geographically distributed resources
  - E.g., distributed collaboration, data access and analysis, distributed computing
- Persistent infrastructure for Grid computing
  - E.g., certificate authorities and policies, protocols for resource discovery/access
Why Now?
- The key infrastructure for Grid is high-end computing resources, and the key enabling technology is the high-speed network.
- Moore's law and highly functional end-systems
- Ubiquitous Internet ⇒ universal connectivity
- Network exponentials produce dramatic changes in geometry and geography
  - 9-month doubling: double Moore's law!
  - 1986-2001: x340,000; 2001-2010: x4000?
- New modes of working and problem solving emphasize teamwork and computation.
- New business models and technologies facilitate outsourcing.
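The growth figures above follow from simple doubling arithmetic. A quantity that doubles every d months grows by a factor of 2^(12t/d) over t years. The short Python sketch below assumes the commonly quoted 18-month doubling period for Moore's law. That figure is an assumption and does not come from the slide. For the 2001-2010 window, 9-month doubling gives 2^12 = 4096, which is roughly the x4000 shown. The assumed Moore's-law rate gives only 2^6 = 64.

```python
# Doubling arithmetic behind the "Why Now?" growth figures.
# Assumption: Moore's law is modelled as an 18-month doubling period;
# the 9-month network doubling period comes from the slide.


def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` for a quantity that doubles
    every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


if __name__ == "__main__":
    span = 2010 - 2001                    # the 2001-2010 window: 9 years
    network = growth_factor(span, 9)      # 12 doublings
    moore = growth_factor(span, 18)       # 6 doublings (assumed rate)
    print(f"network x{network:,.0f}, Moore's law x{moore:,.0f}")
    # prints: network x4,096, Moore's law x64
```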
The Layered Grid Architecture
Grid Funding
- The US government puts about $100M/year into Grid research (40%) and infrastructure building (60%).
- NASA is putting in approximately $10 million per year.
- DOE's Office of Science is putting at least $7M/yr into Grid software development, deployment of the DOE Science Grid, and several major Grid application integration projects (high energy physics, earth sciences, fusion energy).
- NSF is putting $10-20M/yr into Grid software development and several major Grid application integration projects, e.g.:
  - National Earthquake Engineering Systems Grid (bringing all major US earthquake engineering instruments onto a Grid)
  - National Virtual Observatory (a Grid application to provide uniform access to all major astronomy datasets)
- NSF is putting $50M/yr into its new Grid-based supercomputer centers (Distributed Terascale Facility).
- UK eScience is building a UK-wide science Grid (50M/yr).
- European Union Data Grid (high energy physics) 7M/yr, EU GridLab (numerical relativity) 3M/yr, others.
- China, Japan.
- Currently most Grid projects are in areas such as physics, astronomy, and bioinformatics.
- Only a small part of this funding is used for geospatial Grid research and development.
Why Is Grid Useful to the Geospatial Community?
- The geospatial community is responsible for the collection, management, processing, archiving, distribution, and application of geospatial data and information.
- Because geospatial data are so important, many public and private organizations are engaged in geospatial data and information activities. As a result, geospatial data and the associated computational resources are naturally distributed.
- The multidisciplinary nature of geospatial research and applications requires the integrated analysis of huge volumes of multi-source data from multiple data centers. This requires sharing both data and computing power among data centers.
- Most geospatial modeling and applications are both data-intensive and computation-intensive.
- Therefore, Grid is an ideal technology for the geospatial community.
Geospatial Grids
- Geospatial Grids in this presentation include:
  - Geospatial data and information Grids, which emphasize data access and information services on large, distributed geospatial data archives.
  - Geospatial computational Grids, which focus mainly on coordinating computational resources for large-scale Earth science modeling and applications, e.g., climate modeling.
  - Combinations of both.
- Geospatial Grids are considered extensions and domain-specific applications of the fundamental Grid technology in the geospatial disciplines.
- They are the power sources for geospatial data, information, and knowledge.
Why Grid Needs Geospatial Extensions
- Geospatial data and information are significantly different from those in other disciplines.
  - Very complex and diverse:
    - Formats, projections, resolutions
    - Hyper-dimensional: spatial, temporal, spectral, thematic
    - Raster vs. vector
  - Large data volume:
    - More than 80% of the data humans have collected is spatial data.
- The geospatial community has developed a set of standards specifically for geospatial data and information that users are familiar with (e.g., OGC, ISO, FGDC).
- The core Grid technology was developed for general sharing of computational resources and is not aware of the special characteristics of geospatial data.
- To make Grid technology applicable to geospatial data, we have to develop geospatial domain-specific extensions.
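Here is a minimal Python sketch of the metadata a spatially aware Grid service would have to reason about before two datasets can be analyzed together. A generic Grid file transfer sees only bytes and cannot perform these checks. The `GeoDatasetDescriptor` record, the `integration_steps` function, and the example datasets are all hypothetical illustrations. They are not part of Globus or of any OGC, ISO, or FGDC standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoDatasetDescriptor:
    """Hypothetical per-dataset metadata a spatially aware Grid service might keep."""
    name: str
    data_format: str      # e.g. "HDF-EOS", "GeoTIFF"
    crs: str              # coordinate reference system (projection), e.g. "EPSG:4326"
    resolution_m: float   # ground resolution in meters (raster data)
    model: str            # "raster" or "vector"
    dimensions: tuple     # e.g. ("x", "y", "time") or ("x", "y", "band")


def integration_steps(target: GeoDatasetDescriptor, other: GeoDatasetDescriptor) -> list:
    """List the harmonization steps needed to bring `other` in line with `target`."""
    steps = []
    if other.data_format != target.data_format:
        steps.append(f"convert format {other.data_format} -> {target.data_format}")
    if other.crs != target.crs:
        steps.append(f"reproject {other.crs} -> {target.crs}")
    if other.model != target.model:
        steps.append(f"convert {other.model} -> {target.model}")
    elif target.model == "raster" and other.resolution_m != target.resolution_m:
        steps.append(f"resample {other.resolution_m} m -> {target.resolution_m} m")
    if other.dimensions != target.dimensions:
        steps.append("align dimensions " + ",".join(other.dimensions)
                     + " -> " + ",".join(target.dimensions))
    return steps


if __name__ == "__main__":
    # Hypothetical datasets held at two different data centers.
    ndvi = GeoDatasetDescriptor("ndvi_global", "HDF-EOS", "EPSG:4326", 250.0,
                                "raster", ("x", "y", "time"))
    scene = GeoDatasetDescriptor("land_cover_scene", "GeoTIFF", "EPSG:32618", 30.0,
                                 "raster", ("x", "y", "band"))
    for step in integration_steps(ndvi, scene):
        print(step)
```

Each check corresponds to one of the differences listed above: formats, projections, resolutions, raster vs. vector, and the extra dimensions.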
The Current Status of Geospatial Grids
- The geospatial community started to look into the potential of Grid technology about three years ago.
- Dozens of geospatial Grid projects are under way worldwide:
  - US, EU, Japan, China, CEOS
- Most of the projects are at the stage of learning Grid technology and evaluating its applicability to the geospatial domain.
- A few prototype applications are being developed.
- Some projects are working on extending Grid to meet the specific requirements of geospatial applications.
- No large-scale, multi-organization, operational geospatial Grids exist today.
Selected Geospatial Grid Projects
The CEOS Grid Testbed
- The Committee on Earth Observation Satellites (CEOS) is an intergovernmental organization responsible for coordinating international civilian Earth observation (EO) activities.
- All major space agencies in the world are members of CEOS.
- The CEOS Working Group on Information Systems and Services (WGISS) is responsible for exploring and coordinating data and information technologies for EO.
- WGISS started to explore the potential of Grid technology for EO in September 2001.
- The CEOS Grid Testbed was established in September 2002.
- Currently the testbed consists of six Grid prototypes supported by individual space agencies.
Objectives of the CEOS Grid Testbed
- Objective 1 (Phase I): Establish a CEOS Grid Technology Core Testbed with at least three major participating agency nodes. The purpose of the Grid Technology Core is to establish an immediate Grid capability base (including technologies, pilot applications, and knowledgeable people) within the participating CEOS agencies. Status: finished.
- Objective 2 (Phase II): Demonstrate at least three CEOS Grid-enabled applications, each involving at least one CEOS agency and a partner site, which could include other CEOS agencies. The purpose of the Grid-enabled application demonstrations is to show proof of concept, evaluate benefits, and obtain lessons learned from infusing Grid technologies from the Technology Core into real CEOS agency information systems and applications. Status: ongoing, to be finished by the end of 2003.
- Objective 3 (Phase III): Infuse applicable Grid technologies into member agency information systems and into at least one WGISS Test Facility (WTF). The outcome of this Grid technology infusion would be a persistent CEOS Grid available to support future CEOS agency initiatives.
- The ultimate goal of the CEOS Grid testbed is to establish a CEOS Data Grid that covers most CEOS member agencies, to support major international science and EO initiatives.
Applications in the CEOS Grid Testbed
Future Directions: Research
- Both the Grid and Semantic Web communities have adopted Service-Oriented Architecture (SOA).
- OGSA represents the next step in the evolution of Grid technology.
- Research directions in geospatial Grids:
  - Extend the Grid functions to:
    - Make them spatially aware.
    - Provide community-specified interfaces, data models, and metadata (i.e., implement geospatial standards).
    - Handle the complexity and diversity of geospatial data.
    - Scale up to handle petabytes of geospatial data.
  - Enable geospatial services under OGSA:
    - The virtual geospatial datasets: geo-objects, geo-trees, geospatial service modules, geospatial models.
    - Development of geospatial service modules and models.
    - Service-based dynamic geospatial model construction and execution under OGSA.
  - Semantic/intelligent geospatial Grids:
    - Intelligent geospatial Grids that can automatically generate answers to users' questions.
    - Ontology-driven automatic query decomposition and service model/workflow construction is the most promising approach.
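The Python sketch below gives a rough illustration of automatic service workflow construction. It backward-chains from a requested product to the data already available, picking services whose declared output matches each needed input. It is a simplified stand-in: plain type-name matching replaces real ontology reasoning. The `Service` record, the `build_workflow` function, and all service and data names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical geospatial service description: typed inputs, one typed output."""
    name: str
    inputs: tuple
    output: str


def build_workflow(goal, available, services):
    """Backward-chain from the requested product `goal` to the data types in
    `available`. Returns service names in execution order (dependencies
    first), or None if the goal cannot be derived."""
    plan = []

    def solve(concept, visiting):
        if concept in available:
            return True                     # data already exists on the Grid
        if concept in visiting:
            return False                    # cycle in the service definitions
        for svc in services:
            if svc.output != concept:
                continue
            mark = len(plan)
            if all(solve(i, visiting | {concept}) for i in svc.inputs):
                if svc.name not in plan:
                    plan.append(svc.name)
                return True
            del plan[mark:]                 # backtrack: discard partial sub-plans
        return False

    return plan if solve(goal, frozenset()) else None


if __name__ == "__main__":
    services = [
        Service("reproject", ("landsat_scene",), "landsat_reprojected"),
        Service("ndvi", ("landsat_reprojected",), "ndvi_map"),
        Service("soil_model", ("precipitation", "soil_type"), "soil_moisture"),
        Service("crop_condition", ("ndvi_map", "soil_moisture"), "crop_condition_report"),
    ]
    data = {"landsat_scene", "precipitation", "soil_type"}
    print(build_workflow("crop_condition_report", data, services))
    # prints: ['reproject', 'ndvi', 'soil_model', 'crop_condition']
```

In an ontology-driven version, the equality test `svc.output != concept` would become a semantic match, for example accepting a subclass of the requested concept.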
Future Directions: Implementation
- Establishment of large-scale operational geospatial Grids to support operational sharing of geospatial data, information, and resources at the agency, national, or international level:
  - NASA EOS data pool Grid
  - National Geospatial Data Grid (to be proposed very soon)
  - EU Data Grid
  - The CEOS Grid
- Given the current ongoing research, establishing such geospatial Grids will be technically feasible within two years.
- However, implementation is not just a technical issue; it is also:
  - A policy issue, such as data policy
  - An organizational issue
  - A resource issue
  - An operational issue
Lessons Learned for Building Large-Scale Grids
- The following slides are mainly from a presentation given by William Johnston of NASA IPG.
- These lessons learned come mainly from building large-scale Grids in non-geospatial disciplines, but they should be useful for building large-scale geospatial Grids.
- Establishing good working relationships among all of the people involved is essential.
  - Successful Grids involve almost as much sociology as technology.
- Deploy operational human infrastructure as early as possible:
  - Establish an Engineering Working Group that involves the Grid deployment teams at each site.
  - Set up liaisons with the systems administrators for all systems that will be involved.
  - Identify the computing and storage resources to be incorporated into the Grid.
Grid Information System
- Plan for a GIS/GIIS (Grid Index Information Service) server at each distinct site with significant resources.
  - This is important in order to avoid single points of failure.
- The structure of the GIIS is one of the basic scaling issues for Grids.
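The value of per-site index servers can be sketched in a few lines. The Python below models a hypothetical two-level index hierarchy, with site indexes registered under a top-level index. It does not reproduce the actual Globus MDS/GIIS protocols, and `IndexServer` and its methods are illustrative names only. The sketch shows two things. A failed site index only drops that site's records from a Grid-wide search. Site-level queries still succeed when the top-level index is down.

```python
from dataclasses import dataclass, field


@dataclass
class IndexServer:
    """Hypothetical stand-in for a site-level or Grid-level (GIIS-like) index server."""
    name: str
    resources: dict = field(default_factory=dict)   # resource name -> attributes
    children: list = field(default_factory=list)    # lower-level index servers
    up: bool = True

    def register(self, resource, **attrs):
        """Record a local resource and its attributes (CPUs, storage, ...)."""
        self.resources[resource] = attrs

    def search(self, predicate):
        """Return names of matching resources here and in all reachable children."""
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        hits = [r for r, attrs in self.resources.items() if predicate(attrs)]
        for child in self.children:
            try:
                hits.extend(child.search(predicate))
            except ConnectionError:
                pass  # an unreachable site index drops only that site's records
        return hits


if __name__ == "__main__":
    site_a = IndexServer("site-a")
    site_a.register("cluster-a", cpus=128, storage_tb=10)
    site_b = IndexServer("site-b")
    site_b.register("archive-b", cpus=16, storage_tb=200)
    top = IndexServer("grid-top", children=[site_a, site_b])

    print(top.search(lambda a: a["storage_tb"] >= 100))   # ['archive-b']
    site_b.up = False
    print(top.search(lambda a: True))                     # ['cluster-a']
    top.up = False
    print(site_a.search(lambda a: True))                  # ['cluster-a'], answered locally
```

How deep the `children` tree goes, and which index a client queries first, is exactly the structural choice the slide calls a basic scaling issue.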
Cross-Site Trust
- Set up or identify a CA (certificate authority) to issue Grid user identity certificates, which are the basis of the GSI (Grid Security Infrastructure).
  - This is the basic trust management mechanism.
- The Certificate Policy Statement (CPS) codifies how you will run your CA and to whom you will issue certificates.
  - Cross-site trust is based on this.
  - Don't try to invent your own CPS!
  - Look at the ESnet CP (envisage.es.net) and the Grid Forum CP.
Defining / Understanding the Extent of Your Grid
- The boundaries of a Grid are primarily determined by two factors:
  - What CAs you trust
    - This is explicitly configured in each Globus environment.
    - However, there is no guarantee that every resource in what you think is your Grid trusts the same set of CAs, i.e., each resource potentially has a different space of users.
    - In fact, this will be the norm if the resources are involved in multiple virtual organizations, as they frequently are in the high energy physics experiment communities.
  - How you scope the searching of the GIS/GIISs
    - This depends on the model that you choose for structuring your directory services.
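The first factor, which CAs each resource trusts, can be sketched as set arithmetic. In the Python below, each user's certificate is reduced to the name of its issuing CA, and each resource to its configured set of trusted CAs. The user, resource, and CA names are hypothetical. Real GSI validation checks full certificate chains and signatures, not just an issuer name. The users accepted by every resource form the effective common user space of "your" Grid.

```python
def accepted_users(user_issuers, trusted_cas):
    """Users whose certificate issuer is one of the CAs this resource trusts."""
    return {user for user, issuer in user_issuers.items() if issuer in trusted_cas}


def grid_extent(user_issuers, resources):
    """Per-resource user spaces, plus the users that every resource accepts."""
    per_resource = {res: accepted_users(user_issuers, cas)
                    for res, cas in resources.items()}
    everywhere = set.intersection(*per_resource.values()) if per_resource else set()
    return per_resource, everywhere


if __name__ == "__main__":
    # Hypothetical users mapped to the CA that issued their identity certificate.
    users = {"alice": "CA-Agency-A", "bob": "CA-Agency-B", "carol": "CA-HEP-VO"}
    # Hypothetical resources, each configured with its own trusted CA set.
    resources = {
        "compute-1": {"CA-Agency-A", "CA-Agency-B"},
        "archive-2": {"CA-Agency-A", "CA-HEP-VO"},
    }
    per_resource, everywhere = grid_extent(users, resources)
    for res, accepted in per_resource.items():
        print(res, sorted(accepted))
    print("accepted everywhere:", sorted(everywhere))
```

Here `archive-2` also serves a second virtual organization through `CA-HEP-VO`. As a result, its user space differs from that of `compute-1`, which is the situation the slide describes as the norm.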
Maintaining Local Control
- Establish the conventions for the Globus mapfile.
  - It maps user Grid identities to system UIDs.
  - This is the basic local control/authorization mechanism for each individual compute and storage platform.
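Below is a minimal sketch of how a site might read the Globus mapfile. It assumes the common grid-mapfile layout: one quoted certificate subject (DN) per line, followed by one or more comma-separated local account names, with `#` comment lines. The function names and sample DNs are hypothetical, and this is not the Globus Toolkit's own parser. Treating the first listed account as the default is likewise an assumption of this sketch.

```python
def parse_grid_mapfile(text):
    """Parse grid-mapfile-style text into {certificate subject DN: [local accounts]}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                                  # blank line or comment
        if line.startswith('"'):
            end = line.index('"', 1)                  # ValueError if the quote is unclosed
            subject, rest = line[1:end], line[end + 1:]
        else:
            subject, _, rest = line.partition(" ")    # unquoted DN without spaces
        accounts = [a.strip() for a in rest.split(",") if a.strip()]
        if not accounts:
            raise ValueError(f"no local account for {subject!r}")
        mapping[subject] = accounts
    return mapping


def local_account(mapping, subject):
    """Default (first-listed) local account for a Grid identity, or None if unmapped."""
    accounts = mapping.get(subject)
    return accounts[0] if accounts else None


if __name__ == "__main__":
    sample = '''
# Hypothetical entries: Grid identity -> local account(s) on this host
"/O=Grid/O=Globus/OU=example.edu/CN=Jane Doe" jdoe
"/O=Grid/O=Globus/OU=example.edu/CN=Sam Lee" slee,eosops
'''
    mapping = parse_grid_mapfile(sample)
    print(local_account(mapping, "/O=Grid/O=Globus/OU=example.edu/CN=Jane Doe"))  # jdoe
    print(local_account(mapping, "/O=Grid/CN=Unknown User"))                      # None
```

Because each host keeps its own mapfile, an identity the Grid as a whole trusts still gets no local account unless the site's conventions map it. This is what keeps control local.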
Take Good Care of the Users as Early as Possible
- Establish a Grid/Globus application specialist group.
  - They should be running sample jobs as soon as the prototype-production system is operational.
  - They should serve as the interface between users and the Globus system administrators to solve Globus-related application problems.
- Identify early users and have the Grid/Globus application specialists assist them in getting jobs running on the Grid.