Title: CEOS IDN Task Team
1CEOS IDN Task Team
2IDN Agenda 9 May 2002
- IDN Minutes from Darmstadt and IDN Profile are at
http://idn.ceos.org
- Data Policy
- IDN Metrics (from GCMD node)
- IDN Content History
- Content Strategies
- IDN Keywords
- Authoring Tools
- MD8 Status
- Break
3IDN Agenda 9 May 2002
- MD8 Status continued
- MD8 Software Waiting List
- IDN Collaborations
- Collaborations: Operational Portals
- IDN's Use of ZOPE for Communications
- MD9 and ISO 19115
- Lorant Czaran on ISO 19115
- Issues/Concerns
4Data Policy
5Data Policy Issues
- Global Change Research Policy Statements from the
Executive Office of the President - OSTP in 1991
- U.S. Global Change Research Program requires an
early and continuing commitment to the
establishment, maintenance, validation,
description, accessibility and distribution of
high-quality, long-term data sets.
- Full and open sharing of the full suite of global
data sets for all global change researchers is a
fundamental objective.
- Preservation of data needed for long-term global
change research is required.
- Data archives must include easily accessible
information about the data holdings, including
quality assessments, supporting ancillary
information, and guidance and aids for locating
and obtaining data.
- National and international standards should be
used to the greatest extent possible for media
and for processing and communication of global
data sets.
- Data should be provided at the lowest possible
cost to global change researchers in the interest
of full and open access to data.
- For those programs in which selected principal
investigators have initial periods of exclusive
data use, data should be made openly available as
soon as they become widely useful.
6Data Policy Issues
- National Academy of Sciences (U.S.)
- National Research Council
- U.S. National Committee for CODATA
- CODATA 2002 - Frontiers of Scientific and
Technical Data (29 September - 3 October)
- CGED Dr. Anne Linn
- Dr. Bernard Minster, Chairman
- Upcoming workshop on Carbon Cycle data.
- International Policies
7Combined NSC, DPC, NEC Climate Change Policy
Panel (Program Review)
Committee on Climate Change Science and
Technology Integration: Chair, Secretary of
Commerce; Vice-Chair, Secretary of
Energy; Executive Director, OSTP Director
Interagency Working Group on Climate Change
Science and Technology: Chair, Deputy/Under Secretary
of DOE; Vice-Chair, Deputy/Under Secretary of
DOC; Secretary, OSTP AD for Climate Science and
Technology
Climate Change Science Program Office: Director,
Commerce Detailee
Climate Change Technology Program: Department of
Energy
8IDN Metrics (from GCMD node)
9Ten Years of DIFs May 1992-March 2002
10New DIFs
11DIFs by TOPIC
12GCMD Population by Node, March 2002
13Unique Hosts
142000-2001 Web Usage by Domain
15Controlled Keyword Search
162000-2001 Parameter Searches
17Total DIF Retrievals
18Total DIF Retrievals
19Redirects to Other Data
- Top redirects from DIFs

Destination                       2001   2000
NASA data/web pages                631    244
NOAA data/web pages                355    389
EOSDIS DAAC data/web pages         349    290
USGS data/web pages                329    972
CDIAC data/web pages                40     90
CCRS/CEONET/GeoConnections          34    n/a
International data/web pages       118    193
Other data/web pages (various)     466   2185
20 Unique Hosts Since Jan 01
21 Web Page Hits Since Jan 01
22Decline in Usage?
- GCMD web usage has tended to be flat over the
past year.
- Prior to 9-11, usage was showing a 1.6% increase
for the year.
- Since 9-11, usage has declined. Overall GCMD
usage has declined by 3% from the past year.
- Numeric domains have increased by 18% over 2000,
but .gov domains have declined by almost 47%
since 2000.
- FGDC Clearinghouse changed filtering of Isite
queries.
- Is decline due to increased information available
(information saturation), 9-11, decline in
interest in climate issues, other factors?
23Decline in Usage?
- Web page hits have increased since Jan 2001,
while unique hosts have decreased. Possible
reasons:
- Domain contraction: more users on fewer hosting
domains. AOL has 13.58% of global ISP market;
more gov agencies using single domain (e.g.,
usgs.gov).
- More users behind firewalls.
- More hits are by robots? (We block them from DIFs
but not web pages.)
- Fewer users are making more hits.
24Who Links to the GCMD?
- GCMD is #1 on Google search for "global change"
- Week of April 1, 2002
- Top 10 sites that link to GCMD (from Google)
- PODAAC
- NSIDC
- WWW Virtual Library Meteorology
- AADC Metadata page
- LBNL Energy Crossroads Climate Change Page
- WHOI COFDL Laboratory
- The Weather Pointers Page
- NOAA/PFEL
- Yahoo! Environment and Pollution (French)
- NASADAACS page
- Quaternary Web Resources (Colby College)
25Who Links to the GCMD?
- Google's top 10 sites that link to GCMD (pt 1)
- (Week of April 15, 2002)
- Google-ranked sites that are most often linked
with links to GCMD
- GES DAAC Direct Links to MODIS Data
- http://acdisx.gsfc.nasa.gov/data/dataset/MODIS/nofrills.html
- GES DAAC MODIS Overview
- http://daac.gsfc.nasa.gov/MODIS/overview.shtml
- BakerHughes Industry Links-Labs, Research, Gov
- http://www.bakerhughes.com/bakerhughes/resources/labs.htm
- PCI Geomatics Industry Links
- http://www.pcigeomatics.com/corpinfo/ind_links.html
- VGL Data Links
- http://www.umich.edu/vgl/booksdata/data.html
- SeaWiFS Evaluation Products
- http://daac.gsfc.nasa.gov/data/dataset/SEAWIFS/06_New_Products/
26Who Links to the GCMD?
- Google's top 10 sites that link to GCMD (pt 2)
- DAAC Alliance Products & Services Page
- http://nasadaacs.eos.nasa.gov/data/path12.html
- Harvard University Environment and Sustainable
Development
- http://www.cid.harvard.edu/esd/esdlinks/esdlinks.html
- Blackwell Publishers - Geospatial Datasets
- http://www.blackwellpublishers.co.uk/geog/data.asp
- RPI/Rensselaer Research Libraries
- http://www.lib.rpi.edu/dept/library/html/resources/subjects/science/earth.html
- RSMAS Library Internet Resources
- http://www.rsmas.miami.edu/support/lib/library_links.html
27IDN Content History
28Content Strategies
29Past Content Struggle
- Non-existent to poor authoring tools.
- Inadequate operations facility for interacting
with the database.
- Science Coordinators had to gather all data set
info through intensive, laborious process.
- Little interest or cooperation by data centers or
data set producers.
- Result: Prior to 1994, 3 DIFs/month/coordinator
were written.
30Present Content Strategy
- Make improved authoring tools available.
- Provide effective operations facility for
validation/loading entries into database.
- Provide capability to update on the spot.
- Unsolicited entries now arriving from data set
producers, data center personnel, portal
representatives, other international and
interagency groups.
- (Although still time-consuming to gather all
information.)
- Result: 35.3 DIFs/coordinator/month (April 2001
- March 2002)
31Future Content Strategy
- Make further improvements of authoring tools.
- Further enhance Operations Client and QA facility
for quality control and loading of entries.
- Provide ownership of entries through portals and
distributed nodes, and thus expect more
contributions from partners.
- Distribute final validation QA function beyond
GCMD node - providing even more sense of
ownership and responsibility.
- Increase interest - sometimes initially by
software developers and later by content
providers.
32Reasons for MD8 Operations Client: A Client
User's Perspective
- One person performed all database administration
tasks.
- Increased interest by partners to write and share
metadata.
- Clumsy text-based interface introduced errors and
increased maintenance.
33How MD8 Has Changed Our Mode of Operation for
the Better
- Database administration tasks shared by science
coordinators.
- Decreased time between submission of metadata and
its entry into the database.
- Allows users to perform tasks that previously
required knowledge of command line Oracle SQL.
- Eases the process of managing personnel and
valids.
- Graphical User Interface.
34Operations (OPS): Loading Metadata Records
35Operations (OPS): Extracting Content
from the Database
36Content Strategy - Using the QA Ops
- You gotta know when to hold 'em,
- Know when to load 'em,
- Know when to walk away,
- Know when to run.
- You never count your DIFs when they're only in
the table...
- There'll be time enough for countin' when the
loadin's done.
- You gotta know when to bold 'em,
- Know when to fold 'em, ...
37IDN Keywords
38Data Center Bucket Revision
- Original list created for HCIL Interface.
- Buckets not adequate for Science Keyword
Interface.
- Overlapping Buckets
- Minimal Quality Control of Original Buckets
- Science coordinators created new bucket list.
- Staff is in the process of matching each Data
Center valid to a new bucket.
39Data Center Bucket Revision
- Old Buckets
- Commercial
- DOC
- DOD
- DOE
- DOI
- EPA
- Federal Agencies
- Institutions
- International
- International Agencies
- NASA
- NOAA
- Non-Profit Organizations
- NSF
- Regional Agencies
- Universities
- USDA
- USGS
- World Data Centers
- NEW Buckets
- Academic
- Commercial
- Consortia/Institutions
- Multinational
- Non-Government Agencies
- Non-US Government
- US Federal Agencies
- US State and Regional Agencies
40Data Center Bucket Revision
- US Federal Agencies
- DOC
- DOD
- DOE
- DOI
- EPA
- DOT
- NASA
- NSF
- USDA
- USGS
41Keyword Changes
- Guiding Principles: Follow the Rules!
- Earth science parameters are a 4-tier controlled
vocabulary for indexing and retrieving metadata.
- Parameter hierarchy includes a 5th level
uncontrolled detailed variable.
- CATEGORY > TOPIC > TERM > VARIABLE > detailed
variable
- Example:
- EARTH SCIENCE > Solid Earth > Geochemistry >
Chemical Weathering
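The parameter hierarchy on this slide can be sketched in code. This is only an illustration, assuming a keyword is written as one string with ">" between levels; the class name ScienceKeyword and the function parse_keyword are hypothetical and are not taken from any GCMD tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScienceKeyword:
    """One parameter path: four controlled tiers plus an optional free-text fifth."""
    category: str
    topic: str
    term: str
    variable: str
    detailed_variable: Optional[str] = None  # uncontrolled 5th level


def parse_keyword(path: str) -> ScienceKeyword:
    """Split 'CATEGORY > TOPIC > TERM > VARIABLE [> detailed variable]'."""
    parts = [p.strip() for p in path.split(">")]
    if len(parts) not in (4, 5) or not all(parts):
        raise ValueError(f"expected 4 or 5 non-empty levels, got: {path!r}")
    return ScienceKeyword(*parts)


kw = parse_keyword("EARTH SCIENCE > Solid Earth > Geochemistry > Chemical Weathering")
print(kw.term)
```

Keeping the fifth level optional mirrors the slide: only the first four tiers would be checked against a controlled list, while the detailed variable stays free text.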
42Keyword Process
- Keywords requiring modification can usually be
modified through database operations so that
all DIFs affected are modified at the same time.
- New keywords are simply added to the database and
to the list of controlled keywords available in
tools and interfaces.
- Usually manual process to ensure existing DIFs
are indexed with the new keyword.
43Summary of Science Keyword Changes
- Added 54 new Variables and 4 new Terms
- Modified 39 Variables and 2 Terms
- Currently 1199 Variables in GCMD
- Modified Marine Geophysics and Bathymetry Terms
and Variables
- Many keywords were not being used or could be
re-classified under better Terms
- Modified Terrestrial Ecosystem Variables from
singular to plural (e.g., forest to forests)
- Modified Marine Sediments Variables
- Suggested by C. Moore at NOAA/NGDC/MGG
- Changed Term Solar-Terrestrial Interactions to
Sun-Earth Interactions
- Sun-Earth was a more recognizable Term
- More compatible with home page redesign - took up
less real estate in keyword hierarchy.
44Keywords Added
- Added Marine Biology, Marine Geochemistry, Marine
Tectonics, Marine Volcanism, and Sea Surface
Topography Terms and Variables to Oceans
- Added Land Use/Land Cover Term and Variables to
Human Dimensions
- Added Geomorphology Term and Variables to Solid
Earth
- Added Natural Hazards Term and Variables to Human
Dimensions
- Added Aquatic Habitat and Demersal Habitat
Variables to Biosphere
- Added Forest Science/Conservation Variables to
Biosphere (Canada)
- Added Snow Chemistry (NSIDC)
45Who Suggested Keyword Changes in 2001?
- GCMD Staff
- EOSDIS DAAC/DAAC Alliance data providers
- MSFC/GHRC
- NSIDC DAAC
- GSFC DAAC
- SEDAC
- ORNL DAAC
- ECS Science Office
- NOAA/NGDC (marine geophysics)
- Canada/CCRS (forest science)
- IODE (marine biology, oceans)
46Community Usage of GCMD Keywords
- CEOS Interoperability Protocol (CIP)
- uses Category > Topic > Term
- EOSDIS Data Gateway (EDG)
- uses Topic > Term > Variable
- EOSDIS Core System (ECS)
- uses all 5 levels, including detailed variable
- Other Communities using GCMD Keywords
- FGDC (although not required): many agencies using
FGDC metadata use GCMD keywords as theme
thesaurus
- Canada and GeoConnections
- Mercury
- U. Cal. Natural Reserve System
- NOAA
- Semantic web
- NASA's Visible Earth (part of Earth Observatory)
- DODS
47Keyword Process
- ECS and EDG Notification Policy
- ECS and EDG are notified of GCMD-approved science
keyword changes prior to implementation.
- Process gives ECS and EDG time to notify science
and data teams as to potential software changes.
New keywords added to the GCMD are usually not a
problem. Modification of existing keywords is
more problematic.
48Authoring Tools
49Authoring Tools
- Current Authoring Tools include
- DIFbuilder
- DIFbuildlet
- ModDIFbuilder
- SERFbuilder
- ModSERFbuilder
- ESIP DIFbuilder
- JCADM DIFbuilder
- Usage of the Authoring Tools has increased from
outside partners (DAACs, GLOBEC and AMD)
50DIF Authoring Tools CY01 - Present
51SERF Authoring Tools CY01 - Present
52Monthly DIFBuilder Usage
53ModDIF Builder Monthly Usage
54DIFBuildlet Monthly Usage
55SERFBuilder Monthly Usage
56ModSERFBuilder Monthly Usage
57MD8 Status
- 3 tiers Client, Server, and Database
- Local Database Agents
58Status of MD8 Server
- Functional Since Late 2001
59MDServer
- Provides a mechanism for remote clients to
interoperate with GCMD using RMI protocol.
Application Programmer's Interface (API)
Document API: Create, modify, and remove documents.
Query API: Retrieve documents based on entry identifier, object identifier, or query expression.
Valids API: Insert, modify, remove, retrieve valids.
Personnel API: Insert, modify, remove, retrieve and merge personnel.
Incoming Queue API: Insert, modify, retrieve Incoming Items.
60HTTP Protocol
- Provides a mechanism for remote clients to
interoperate with GCMD using HTTP protocol.
Servlets
get_entry_ids.py: Retrieve a set of entry identifiers given a specified query.
getdif.py: Retrieve a DIF given its entry identifier or object identifier.
getdifs.py: Retrieve a set of DIFs given a specified query.
get_valids.py: Retrieve a set of valids given its type.
getdifs_by_personnel.py: Retrieve a set of personnel given the first, middle, and last name.
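A remote client would reach these servlets with ordinary HTTP GET requests. The sketch below only builds a request URL and makes no network call. The script names come from the slide; the base URL and the entry_id parameter name are placeholders assumed for illustration, since the slides do not document the servlet location or its parameters.

```python
from urllib.parse import urlencode

# ASSUMPTION: placeholder base URL; the real servlet location is not given.
BASE_URL = "http://example.org/servlets"

# Script names as listed on the slide.
SERVLETS = {
    "get_entry_ids.py",
    "getdif.py",
    "getdifs.py",
    "get_valids.py",
    "getdifs_by_personnel.py",
}


def build_request_url(servlet: str, **params: str) -> str:
    """Build a GET URL for one of the listed servlets.

    ASSUMPTION: arguments travel as a plain query string; the parameter
    names chosen by the caller are hypothetical.
    """
    if servlet not in SERVLETS:
        raise ValueError(f"unknown servlet: {servlet}")
    query = urlencode(params)
    return f"{BASE_URL}/{servlet}?{query}" if query else f"{BASE_URL}/{servlet}"


# 'entry_id' is a made-up parameter name used only for this example.
url = build_request_url("getdif.py", entry_id="EXAMPLE_ENTRY_ID")
print(url)
```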
61Improvements After Beta
- Further restructuring of Operations (OPS)
- Quality Control (QA GUI)
- Operations Help written in HTML.
- Better support for distributed loading
- Metrics
62Improvements After MD8 Active
- Load testing revealed
- Refinement slower than expected.
- Title display slower than expected.
- Robots accessing DIFs and each display in the
DIF.
- Servlets never die.
- Tomcat 4.0 performed poorly under heavy load.
- Improvements
- Added another level of caching to the servlets.
- Improved/optimized some algorithms.
- Added META tag to DIFs.
- Files, streams, sockets, etc. MUST be closed.
- Reverted back to Tomcat 3.3.
63MD8 Database Extension: Local Database Agent
64What is the Local Database Agent (LDA)?
- A major component of MD8 that links Earth science
databases around the world
- Captures content updates to the local database
using triggers and shares content
- Peer-to-peer connectivity to other databases
- Minimal impact to the local DB activities
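Trigger-based capture of content updates can be illustrated with a small, self-contained sketch. It uses SQLite from the Python standard library purely as a stand-in; the table, column, and trigger names are hypothetical and do not reflect the actual MD8 or LDA schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dif (entry_id TEXT PRIMARY KEY, title TEXT);
-- Hypothetical change log that the triggers append to.
CREATE TABLE trigger_table (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id TEXT NOT NULL,
    action TEXT NOT NULL
);
CREATE TRIGGER dif_insert AFTER INSERT ON dif
BEGIN
    INSERT INTO trigger_table (entry_id, action) VALUES (NEW.entry_id, 'INSERT');
END;
CREATE TRIGGER dif_update AFTER UPDATE ON dif
BEGIN
    INSERT INTO trigger_table (entry_id, action) VALUES (NEW.entry_id, 'UPDATE');
END;
CREATE TRIGGER dif_delete AFTER DELETE ON dif
BEGIN
    INSERT INTO trigger_table (entry_id, action) VALUES (OLD.entry_id, 'DELETE');
END;
""")

# Ordinary local activity; the triggers record each change as a side effect.
conn.execute("INSERT INTO dif VALUES ('DIF_A', 'First title')")
conn.execute("UPDATE dif SET title = 'Revised title' WHERE entry_id = 'DIF_A'")
conn.execute("DELETE FROM dif WHERE entry_id = 'DIF_A'")


def pending_changes(connection):
    """Return captured changes in order, as a scheduler pass might read them."""
    rows = connection.execute(
        "SELECT entry_id, action FROM trigger_table ORDER BY seq"
    )
    return rows.fetchall()


changes = pending_changes(conn)
print(changes)
```

Because the local application only writes to its own table, the capture step adds little to normal database activity; a separate process can read the change log on its own schedule.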
65Reasons for a Local Database Agent
- Enable distributed input from CEOS partners
- Easily share metadata information among nodes
- Facilitate and manage the metadata population
- Reduce maintenance related to data exchange among
nodes
- Build a sense of community among the CEOS
partners by linking them together
66Data Sharing Network
[Diagram labels: GCMD, JCADM, UNEP, Announcer, Scheduler, MD8]
67Local Database Agent
[Diagram labels: Network, Local Database Agent, New Content, Schedule Table, Announcer, LDA Server, Scheduler, Trigger Table, Local DB, GCMD Node]
68Data Ingest
69LDA Announcer/Scheduler
[Flowchart labels: DIF sent from UNEP is loaded; Announcer; Incoming Queue Manager; Is item from a remote node? YES]
70Evolution of LDA Architecture Options
- Distributed peer-to-peer with auto-commit: All
nodes talk to all other nodes.
- Downside: Could get into a sticky
situation when multiple nodes are down at
different times.
- Initialization and
synchronization are very complex.
- QA must be performed after
propagation to all nodes.
- GCMD centralized with auto-commit: All updates
from other LDAs go to GCMD where they are
automatically committed.
- Downside: GCMD is single point of
failure.
- If GCMD trumps, GCMD
will need to update all the nodes.
- QA must be performed after
propagation to all nodes.
- GCMD centralized as QA maintainer: All updates
from other LDAs go to GCMD where they are
validated, QA'd and then broadcast to the
other nodes.
- Downside: GCMD is a single point of
failure for propagating content between nodes.
- Latency in DIF
propagation due to QA process.
71Advantages of the Final LDA Architecture
- Nodes can still update their own system while
validation is pending, thus maintaining autonomy
of their local system.
- Most closely models the current mode of GCMD
operation.
- Cleanest method of quality control from a
code/design point of view.
- It makes concurrency of updates easier to deal
with by limiting possible race conditions and
deadlock issues because GCMD can now manage these
issues.
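The flow described here (a node commits locally, a central node validates, then broadcasts) can be sketched as a toy in-memory model. Everything in it is an assumption made for illustration: the Node and Hub classes, the single QA rule, and the record layout are hypothetical and are not the LDA implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A node's local store: entry_id -> title."""
    name: str
    records: Dict[str, str] = field(default_factory=dict)

    def local_update(self, entry_id: str, title: str) -> None:
        # The node commits locally right away, keeping its autonomy.
        self.records[entry_id] = title


@dataclass
class Hub:
    """Stand-in for the central node that validates, then broadcasts."""
    nodes: List[Node]
    queue: List[Tuple[str, str, str]] = field(default_factory=list)

    def submit(self, origin: Node, entry_id: str, title: str) -> None:
        self.queue.append((origin.name, entry_id, title))

    def run_qa(self) -> List[str]:
        """Process the queue in arrival order; return IDs that were broadcast."""
        accepted = []
        while self.queue:
            origin_name, entry_id, title = self.queue.pop(0)
            if not title.strip():  # placeholder QA rule (assumption)
                continue
            for node in self.nodes:
                if node.name != origin_name:
                    node.records[entry_id] = title
            accepted.append(entry_id)
        return accepted


a, b, c = Node("A"), Node("B"), Node("C")
hub = Hub(nodes=[a, b, c])
a.local_update("DIF_1", "Sea ice extent")
hub.submit(a, "DIF_1", "Sea ice extent")
accepted = hub.run_qa()
print(accepted, b.records)
```

Since a single queue is drained by one party, updates to the same entry are applied in one well-defined order, which is the sense in which a central QA step limits race conditions.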
72Evolution of the LDA Design
- Replaced Java serialized objects with XML
messages
- Removed the requirement for an auto-commit
- Improved ease of synchronization
- Removed InstantDB and replaced with Oracle
- Improved componentization and modularity
- Improved network fault tolerance by threading the
Announcer
- All remote database updates now propagate to GCMD
- Improved node registration process
73LDA Test Plan
- Create/Read/Update/Delete DIFs, SERFs, and Valids
- Boundary conditions
- Initialization
- Synchronization
- Load testing
- Load and delete the same DIF before scheduler
runs
- Merge Personnel
- Simultaneous loading (2 cases: remote nodes and
local node)
- Network/Server Failure Scenarios
74OPS-LDA Demo
- Update a DIF
- Load the DIF
- Watch it propagate to another node
- Load the same DIF on the remote node
75MD8 Software Waiting List
76MD8 Software Waiting List
- In order to prioritize the installation of MD8,
an MD8 Waiting List was created.
- Sites were divided by MD8-Oracle/MD8-Isite and by
resources available at the site.
- Current sites installing MD8-Oracle: AADC (JCADM)
and UNEP Nairobi.
- Current sites installing MD8-Isite: CONAE and
CNES.
77Wait List
- Priority IDN Nodes
- Antarctic Coordinating Node
- UNEP/GRID Budapest
- Asian Coordinating Node - NASDA
- ESA Coordinating Node
- Australian Cooperating Node - CSIRO
- NOAA Cooperating Node
- Dutch Cooperating Node - NEOnet
- Argentina's Cooperating Node - CONAE
78Wait List
- Priority IDN Nodes (continued)
- French Cooperating Node (CNES)
- Brazilian Cooperating Node (INPE)
- German Cooperating Node (DLR)
- Canadian Cooperating Node - no request
- San Diego Supercomputing Center?
- Israel's Cooperating Node
- Russian Cooperating Node (Space Research
Institute)
- UNEP/GRID, Nairobi
79Wait List
- Others Interested
- AWI (Alfred Wegener Institute for Polar and
Marine Research, Manfred Reinke)
- Australian Institute of Marine Science (AIMS)
- Goddard's Data Assimilation Office (DAO)
- The EPA (Ross Lunetta at Research Triangle).
- Korean Oceanographic Data Center
- World Data Center, Sydney (Michael Wang)
80IDN Collaborations
- Please give your report on your individual Node
status.
81Collaborations: Operational Portals
82Federation of Earth Science Information Partners
(ESIP)
- GCMD population of ESIP products and services

                     DIFs   SERFs
ESIP Type 1          1935     41
ESIP Type 2           700      8
ESIP Type 3            17      6
Total                2652     55
% of GCMD holdings    24%    18%
83GCMD Federation Interoperability
[Diagram labels: MD Server, LDA, Loader, Mercury, Local Database, XML DIFs, Mercury Extractor, DIF Peer]
84DAAC Alliance Metrics
- At request of V. Griffin, GCMD became responsible
for tracking publicly-available DAAC products.
- GCMD works closely with SPSO to track products.
- At end of FY01, there were 1,553 DAAC DIFs
representing 1,656 products.
- As of 3/31/02, there were 1,573 DAAC DIFs
representing 1,676 products (accounting for
deletions and replacement DIFs).
85World Data Center Portal
- Request received from Dave Clark (NOAA/NGDC) to
create a World Data Center portal.
- Portal prototype was quickly prepared and was
presented by Dave Clark at WDC Task Team meeting,
August 2001.
- WDC-related DIFs need to be modified and new WDC
metadata created with assistance from WDC.
86DODS portal: http://gcmd.gsfc.nasa.gov/Data/portals/dods/
- The DODS portal is 1 of 12 portals created as a
virtual subset of GCMD's content.
- There are a total of 4 ocean-related portals
within GCMD: DODS, GLOBEC, GOSIC (GOOS), and
RSMAS.
- Each ocean portal contains a subset of the 3648
ocean records held in the GCMD database based on
their project.
87DODS Portal Usage Statistics
- GCMD recently provided a means of tracking statistics for the
DODS portal by enhancing the link used to
retrieve datasets.
- Currently statistics show increasing usage of
the portal.
- DODS provides a link to the DODS Portal from
their website (http://www.unidata.ucar.edu/packages/dods/index.html).
88 DODS Portal
- DODS portal users can search for data via
the Keyword Search or the Free text search
http://gcmd.gsfc.nasa.gov/Data/portals/dods/freetext/ft_search.html
http://gcmd.gsfc.nasa.gov/Data/portals/dods/
89DODS Portal Search results
- Within the keyword search results page, an
abbreviated form of the URL_Content_Type follows
the title of each dataset in the DODS portal.
- http://gcmd.nasa.gov/Data/portals/dods/
- The DODS portal uses a similar method to identify
the URLs within the DODS data set menu list to
ensure that users will be able to locate data set
content using either website.
http://www.unidata.ucar.edu/cgi-bin/dods/datasets/datasets.cgi?xmlfilename=datasets.xml
90DODS/GCMD Future
- Collaborate with DODS in their effort to create a
new client application that provides an interface
using the GCMD servlets.
- Continue to populate the DODS portal with new
datasets.
- Encourage the DODS community to write new
dataset descriptions through the use of a
specialized DIFBuilder tool.
- Goddard DAAC recently expressed interest in
populating the DODS portal.
91RSMAS: http://gcmd.nasa.gov/Data/portals/rsmas/
- University of Miami's Rosenstiel School of Marine
and Atmospheric Science is currently working with
GCMD to create new records.
- Currently there are 25 records within this portal.
- Future plans have been discussed with RSMAS to
incorporate all ocean-related data sets within
the GCMD.
92GLOBEC Portal
- http://gcmd.gsfc.nasa.gov/Data/portals/globec/
GLOBEC Portal home page
GLOBEC Portal Enhanced Search page
93GLOBEC Portal
- GLOBEC data managers use the GCMD as a secure
way of preserving a record of the results and
achievements of the GLOBEC program.
- GLOBEC's data policy includes the adoption of
the DIF format as the recommended format for all
data set descriptions.
- Over 112 records have been loaded into the GLOBEC
portal (via DIFBuilder, metadata creation tool).
94GLOBEC Portal Usage Statistics
95 2002 MEDI Subcommittee Meeting: Brief Summary
- Presentation to MEDI subcommittee by Monica
Holland included:
- An overview of Metadata Tools reviewed/frequently
used by GCMD
- (GCMD Builder tools, MATT, SMMS, and MEDI)
- GCMD Ocean keywords and Body of Water Location
keywords
- Review of GCMD Contributions to MEDI since 1st
Meeting (Oostende, Belgium)
- MEDI Software Tool Evaluation
- GCMD MD8 Portals (Ocean related portals)
- Presentation by Lola Olsen included:
- Current Status of MD8
- ISO field requirements
- Future GCMD plans: LDA, Zope, MD9
96 2002 MEDI Subcommittee Meeting: Feedback
- Potential collaboration from MEDI subcommittee
member from KODC
- Requested GCMD presentation(s) and introduction
slides about GCMD for a development meeting for
Korea Ocean Science Information Inventory System
(KOSI).
- Received a request for ocean-related valids from
Greg Reed (MEDI Chairman)
- Source, Sensor, Project, Locations, Keywords, and
Data Centers
- Received 3 updated NOAA Supplementals
- (D. Collins, National Oceanographic Data Center,
MEDI subcommittee member)
- Also noted during the meeting:
- Decided to add new keyword to Oceans
- EARTH SCIENCE > Oceans > Agricultural
Aquatic Sciences > Aquaculture
- Noticed GCMD sensor valids should be reviewed
- Additional information about MEDI available
online: http://ioc.unesco.org/iode/contents.php?id=24
97Global Observing System (G3OS) and CLIVAR
- Created portal for each component of G3OS
- Global Ocean Observing System (GOOS)
- Global Terrestrial Observing System (GTOS)
- Global Climate Observing System (GCOS)
- Free-text G3OS search has option to search across
all G3OS components through GOSIC portal.
- CLIVAR portal created at request of K. Bouton in
anticipation of additional data sets.
98Portal Experiences
- Partners want customized keyword interfaces that
only show keywords relevant to their discipline.
- Some partners have need for specialized map
projections such as polar or orthographic
projections.
- Some partners have expressed need for an option
to search the entire GCMD database (e.g., a
toggle option was added for GLOBEC portal).
99BRD Collaboration
- Funded since 1996 to create metadata
- 1996-1999 - National Biological Information
Infrastructure (NBII)
- 1999-present - Biological Data Profile
- Assist in sessions to train future metadata
creators such as at Smithsonian and NASA
- Help scientists create their own metadata or do
it for them (Biological Resources Division,
National Park Service, The Nature Conservancy)
- Put metadata into DIF format
100BRD Collaboration
- 16 US EPA Columbia River Basin Biota Database
Abstracts
- 16 US EPA Columbia River Basin Sediment Database
Abstracts
- 1 Prediction of Thistle Infected Areas in
Badlands National Park using a GIS model
- 30 Oregon District datasets from
- http://oregon.usgs.gov/pubs_dir/online_list.htm
101BRD Collaboration
- 8 Databases including Global Invasive Species
Database
- 18 Species 2000 databases
- 16 Expert Taxonomic Identification CDs
- 16 Food and Agriculture Organization databases
- 7 BRD datasets from archive of funded projects
- 1 Design and Implementation of Metadata for
Indian Fungi
102BRD Collaboration
- 1 Detroit River Candidate Sites for Habitat
Protection and Remediation
- 11 TOXNET Cluster of Databases
- 1 Amphibian Research and Monitoring Initiative
Lower Mississippi River Basin
- 1 Multiscale Habitat Evaluation of Amphibians in
the Lower Mississippi River Alluvial Valley
103BRD Collaboration
- 16 Northern Prairie Wildlife Research Center
bibliographies and datasets from
- http://www.npwrc.usgs.gov/resource/research.htm
- 15 Bor Forest Island fire ecology datasets
- 1 E.V. Komarek Fire Ecology Database
- 16 Patuxent Wildlife Research Center software
products
104IDN's Use of Zope for Communications
105What Is ZOPE?
- ZOPE is an open-source web application server
developed by Digital Creations.
- Uses DTML: Document Template Markup Language
(server-side scripting language). example.
- Instead of DTML, ZOPE will use ZPT: Zope Page
Template. example.
- ZOPE objects include DTML documents, DTML
Methods, images, files, folders, page templates.
- ZOPE Products can be imported to enhance a
website.
106Zope Management Interface
- Interface used to customize Zope (browser).
- Interface can be used to add users, set site
permissions, import/export Zope objects from
other machines, add/modify/delete objects.
- Objects being used for header and footer
consistency: standard_html_header and
standard_html_footer (DTML), or (ZPT).
107Example of Content Management Interface
108Content Management Framework
- CMF is a Zope Product used to quickly create
website portals.
- CMF allows site to have:
- User log-in, but anonymous user has site access.
- Site management, so that submitted content
from user can be reviewed by manager.
- Site customizable with skins. (demo in ZPT).
109Example of Website Using CMF
110Example of IDN Demo Site Using CMF
111Customization For Website
- Use log-in option? Allow anyone to register or
hide the Join link?
- Layout and colors of IDN homepage and site? Leave
default layout?
- Use on site?
- From CMF: News, site search.
- Other Zope products: portal forum (ZDiscussion
and ZDBase), polls and surveys (PMPSurvey),
sitemaps.
112IDN CMF Website Users
- Anonymous user view of site.
- Logged-in user view of site.
- Log-in and go to MyStuff and add a document.
- Add a News item.
- Change preferences to change view of site
(skins).
- Logged-in as content manager.
- Go to Folder contents and click on an object's
title to view the display status. Click on
Publish link if the object is in Private
status.
113Logged-in User
114Content Manager
115DEMO
116Example of DTML
Results
117Result of DTML
118Example of ZPT
119Result of ZPT
120MD9
121Balance New Development With Initiating New Nodes
Into Distributed Network And Assuring Their
Proper Functioning
- Installations of MD8 and LDAs at Nodes.
- Test Functionality of LDAs.
- Test Scalability of LDAs.
122Reasons for an MD9/10
- Too Many Data Set Descriptions? No way.
- Build in additional refinement criteria as
population increases to improve/limit result set.
- GCMD database now holds over 11,000 entries.
Reaching critical mass for effective searching.
- Current refinement implementation is OK and is
widely used, but
- Need better refinement criteria to:
- Refine by Temporal Resolution
- Refine by Geospatial Resolution
- Refine by multiple keywords.
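The three refinement criteria named on this slide can be sketched as a simple filter over a result set. This is an illustration under stated assumptions, not the GCMD refinement code: the Entry class, its field names, and the sample values are all made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    entry_id: str
    temporal_resolution: str    # e.g. "daily", "monthly" (illustrative values)
    geospatial_resolution: str  # e.g. "1 km", "1 degree" (illustrative values)
    keywords: FrozenSet[str]


def refine(entries: Iterable[Entry],
           temporal: Optional[str] = None,
           geospatial: Optional[str] = None,
           keywords: Iterable[str] = ()) -> List[Entry]:
    """Keep entries matching every criterion that was supplied."""
    wanted = frozenset(keywords)
    result = []
    for e in entries:
        if temporal is not None and e.temporal_resolution != temporal:
            continue
        if geospatial is not None and e.geospatial_resolution != geospatial:
            continue
        if not wanted <= e.keywords:  # all requested keywords must be present
            continue
        result.append(e)
    return result


# Made-up sample entries, for illustration only.
catalog = [
    Entry("E1", "daily", "1 km", frozenset({"Oceans", "Sea Surface Topography"})),
    Entry("E2", "monthly", "1 km", frozenset({"Oceans"})),
    Entry("E3", "daily", "1 degree", frozenset({"Oceans", "Marine Biology"})),
]
hits = refine(catalog, temporal="daily", keywords=["Oceans", "Marine Biology"])
print([e.entry_id for e in hits])
```

Each criterion narrows the set independently, so a user can apply them in any order, and requiring all requested keywords is what makes multi-keyword refinement shrink the result set rather than grow it.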
123Reasons for an MD9/10
- ISO 19115 (geospatial) and ISO 19119 (services)
Metadata
- Core mandatory ISO 19115 metadata fields
mapped to existing DIF fields

ISO                      MD9 DIF
Citation Title           Dataset_Citation: Dataset Title
Citation Date            Dataset_Citation: Dataset_Release_Date
Dataset language         Data_Set_Language
Dataset topic category   N/A
Abstract                 Summary (R)
Metadata Contact         Personnel: DIF Author
Metadata date stamp      DIF_Creation_Date
124Mandatory ISO fields that are optional in DIF
- Dataset Release Date
- Metadata Author
- Dataset Language
- Metadata Creation Date
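Because several ISO-mandatory fields are optional in DIF, an existing record may not carry everything ISO requires. The sketch below checks a record against the field mapping from the preceding slide. It assumes a DIF is available as a flat dictionary; the flat key spellings and the function name missing_iso_fields are illustrative choices, not an actual GCMD interface.

```python
from typing import Dict, List, Optional

# ISO 19115 core field -> DIF field, following the mapping slide.
# ASSUMPTION: the flat DIF key names are simplified for this sketch.
ISO_TO_DIF: Dict[str, Optional[str]] = {
    "Citation Title": "Dataset_Title",
    "Citation Date": "Dataset_Release_Date",
    "Dataset language": "Data_Set_Language",
    "Dataset topic category": None,  # listed as N/A: no DIF equivalent
    "Abstract": "Summary",
    "Metadata Contact": "DIF_Author",
    "Metadata date stamp": "DIF_Creation_Date",
}


def missing_iso_fields(dif: Dict[str, str]) -> List[str]:
    """List ISO core fields that this DIF record cannot populate."""
    missing = []
    for iso_field, dif_field in ISO_TO_DIF.items():
        if dif_field is None or not dif.get(dif_field, "").strip():
            missing.append(iso_field)
    return missing


# A record that fills only fields that are already required in DIF.
record = {"Dataset_Title": "Example data set", "Summary": "Example summary."}
gaps = missing_iso_fields(record)
print(gaps)
```

The topic category is always reported because the mapping gives it no DIF counterpart; the other gaps correspond to the four DIF-optional fields listed on this slide.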
125 DIF Fields not in ISO
- Publication
- Publication Place
- Publisher
- Sensor (Instrument)
- Source (Platform, like satellite)
- Minimum/Maximum Altitude and Depth
126 DIF Fields not in ISO
- Temporal Resolution
- Project (Campaign)
- Originating Center
- Data_Center_URL
127DIF fields not in ISO
- Multimedia Sample URL
- Multimedia Caption
- Related_URL
- IDN_Node
- DIF Revision History
- Future_DIF_Review_Date
128ISO 19119 - Services Metadata
- Not yet an ISO International Standard - document still
in review (as DIS, Draft International Standard).
- Many of the same required fields as ISO 19115.

ISO                               MD9 SERF
Service Type                      Service_Citation: Title
Service reference date            Service_citation: Release_date
Service language                  Service_language
Provider Name                     Service Provider
Service Contact personnel         SERF Author, Technical Contact
Distributed Computing Platforms   Use Constraints
129Reasons for an MD9/10
- Population, Accuracy, and Currency
- DIFs/SERFs: Improved authoring tools will lower
the barrier for creation by external users.
DIFs/SERFs/Supplementals.
- Supplemental descriptions: Need update
capabilities within display. Widely used, but
need attention in population, accuracy and
currency.
- Nodes: Need local/stand-alone customized tools.
- Earth science links: Access to links needs
improvement; add Thunderstone search; improve
categories.
130Increase Population: DOCbuilder
- Feature: Use object-oriented architecture.
Reasoning: Code reuse.
- Feature: Rewrite current Perl code in Java/Jython.
Reasoning: Platform independence and maintenance reduction.
- Feature: Support XML, but make transparent to user.
Reasoning: Extensibility and easier information exchange
(transportability).
- Feature: Create three versions: Stand-alone application,
Web application, Java applet.
Reasoning: Support multiple environments where
such a tool could be used.
131Increase Population: DOCbuilder
- Feature: Integrate with MD8 components (e.g., validator).
Reasoning: Code reuse, added functionality.
- Feature: Support multiple document types (DIF, SERF, Supp), as well as
different look and feels (DIF, ISO, FGDC, etc.).
Reasoning: Code reuse, flexibility.
- Feature: Allow for easy customization in terms of look and feel.
Reasoning: Tight integration with Portals.
132Reasons for an MD9/10
- Moving from DTD to XML Schema
- Defines the legal building blocks of an XML
document.
- Reasons for replacing DTD with XML Schema
- Written in XML, allowing the use of tools like
DOM and XSL
- Extensible to future additions
- Supports more data types (comparable to those in
databases and programming languages)
- Specifies occurrences and requirements more
precisely
- Supports namespaces (can include more than one
schema in an XML doc)
- Specifies the model of the document more closely
to the actual representation
- DIF Schema already written; however, not yet
implemented.
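Because an XML Schema is itself an XML document, ordinary DOM tools can read it. The sketch below is a minimal illustration of that point, not the actual DIF Schema: the schema fragment, its element choices, and the helper name `list_declarations` are assumptions made for the example. It lists each declared element with its data type and occurrence constraints.

```python
from xml.dom import minidom

# Hypothetical schema fragment for illustration only; it is NOT the
# real DIF Schema, which the slides say exists but is not yet implemented.
DIF_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="DIF">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Entry_ID" type="xs:string"/>
        <xs:element name="Entry_Title" type="xs:string"/>
        <xs:element name="Parameters" type="xs:string"
                    minOccurs="1" maxOccurs="unbounded"/>
        <xs:element name="Future_DIF_Review_Date" type="xs:date"
                    minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

XS_NS = "http://www.w3.org/2001/XMLSchema"


def list_declarations(xsd_text):
    """Return (name, type, minOccurs, maxOccurs) for each typed element."""
    doc = minidom.parseString(xsd_text)
    rows = []
    for el in doc.getElementsByTagNameNS(XS_NS, "element"):
        if not el.hasAttribute("type"):
            continue  # skip the container, which uses an inline complex type
        rows.append((
            el.getAttribute("name"),
            el.getAttribute("type"),
            el.getAttribute("minOccurs") or "1",  # XML Schema default is 1
            el.getAttribute("maxOccurs") or "1",  # XML Schema default is 1
        ))
    return rows


for row in list_declarations(DIF_XSD):
    print(row)
```

A DTD could express "optional" or "repeatable", but not a typed date field or a numeric occurrence range, which is the precision the bullets above refer to.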
133Reasons for an MD9/10
- Improved Geographic Search
- Use SOAP offerings?
- Clients can make requests for services.
- Use MEDI's SVG tool?
- The SVG tool is customized as a part of the MEDI
package, which is compatible with the DIF metadata
format.
- Use Polar Projection Search Applets?
- Modify existing code from the Global Land
Information System (USGS/GLIS) to meet our
requirements.
134MEDI Tool SVG Graph
Scalable Vector Graphics (SVG) is an XML-based
language for Web graphics from the World Wide Web
Consortium (W3C). Currently, the Adobe SVG
plug-in 3.0 is supported only by Internet
Explorer (it does not function correctly with
Netscape).
- Spatial Polygon types
- Box
- Polygon
- Line
- Circle
- Point
135Polar Projection Search Applets
- USGS gave permission to use the code
- Downloaded the tool
- Unable to install it for effective use
- In contact with the tool's developer, but not
promising
- Start from scratch?
136Reasons for an MD9/10
- Better Search Engines
- Are there better text search engines than Isite?
- Isite allows only simplified searching compared
to most Internet search engines.
- Isite allows only AND and OR Boolean operations,
which must be explicitly typed in the search box.
No advanced features are implemented. Refinements
are not possible.
- Pros: Isite is freely available. No license
problems with distribution as part of MD. Useful
for FGDC participation. Implements the Z39.50
protocol.
137Reasons for an MD9/10: Better Search Engines
- Google Search Appliance
- Hardware/software costly
- Compusult
- Commercial software. Z39.50 compliant; used by
GeoConnections
- Blue Angel
- Commercial software. Z39.50 compliant; used by
Mercury
- Many search tools are available
- http://www.searchtools.com/tools/tools.html
- XML text search engines:
http://www.searchtools.com/info/xml-resources.html
- Z39.50 and metasearch engines:
http://www.searchtools.com/info/metasearch.html
- Issues: How would a COTS free-text search engine
affect our IDN partners? Are the above search
engines better than Isite?
138Google Search Appliance
- Same Effective Algorithm Used for Text Searches
- But, How Can It Be Distributed to the Nodes?
(Isite is Open Source)
- Package Includes Software and Hardware and 2
Years Total Support
- Hardware Installation May Present Security Issues
- Cost: 20K (up to 40K)
139Reasons for an MD9/10
- XPath is now a standard.
- XQuery embeds XPath.
- Manages 2 levels down.
- Replace GCMD's query language with XPath
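As a rough illustration of what an XPath-style query against a DIF could look like, the sketch below uses the limited XPath subset in Python's standard `xml.etree.ElementTree`. The sample record and its element layout are assumptions made for the example, not the GCMD's actual document structure or query interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical DIF-like record; the layout is assumed for illustration.
DIF_XML = """
<DIF>
  <Entry_ID>EXAMPLE_01</Entry_ID>
  <Parameters>
    <Topic>ATMOSPHERE</Topic>
    <Term>Aerosols</Term>
  </Parameters>
  <Parameters>
    <Topic>OCEANS</Topic>
    <Term>Sea Surface Temperature</Term>
  </Parameters>
</DIF>
"""

root = ET.fromstring(DIF_XML)

# Path expression: every Term element two levels below the DIF root.
terms = [t.text for t in root.findall("./Parameters/Term")]

# Predicate: only Parameters blocks whose Topic child equals ATMOSPHERE.
atmos_terms = [
    p.findtext("Term")
    for p in root.findall("./Parameters[Topic='ATMOSPHERE']")
]

print(terms)
print(atmos_terms)
```

The predicate form is what a hand-written query language would otherwise have to reimplement: selecting a parent element by the value of one of its children.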
140Reasons for an MD9/10
- Re-evaluate Parent-Child Implementation
- Users would like to get back to parents from
children.
- Free-text implementation needs to be improved.
141Take Country Out of Address
142Reasons for an MD9/10
- Explore better ways to combine free-text and
controlled keyword searches.
- Currently, users can search only by free-text
or by controlled keywords from the home page, not
both.
- Users can combine free-text with a TOPIC search
(e.g., free-text and ATMOSPHERE), but users
cannot combine or refine VARIABLE queries by
free-text.
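The combined search described above amounts to applying every supplied constraint to the same record set. The following is a minimal sketch under stated assumptions: the in-memory records, the field names `topic`, `variable`, and `summary`, and the function `search` are all hypothetical and do not reflect how MD8 stores or queries metadata.

```python
# Hypothetical records for illustration; not real catalog entries.
RECORDS = [
    {"id": "A", "topic": "ATMOSPHERE",
     "variable": "AEROSOL OPTICAL DEPTH",
     "summary": "Satellite aerosol retrievals over land."},
    {"id": "B", "topic": "ATMOSPHERE",
     "variable": "AEROSOL OPTICAL DEPTH",
     "summary": "Ground-based sun photometer network."},
    {"id": "C", "topic": "OCEANS",
     "variable": "SEA SURFACE TEMPERATURE",
     "summary": "Satellite sea surface temperature fields."},
]


def search(records, free_text=None, topic=None, variable=None):
    """Return ids of records that satisfy every constraint that was given."""
    hits = []
    for rec in records:
        if topic and rec["topic"] != topic:
            continue
        if variable and rec["variable"] != variable:
            continue
        if free_text and free_text.lower() not in rec["summary"].lower():
            continue
        hits.append(rec["id"])
    return hits


# Free-text refined by a controlled VARIABLE keyword.
print(search(RECORDS, free_text="photometer",
             variable="AEROSOL OPTICAL DEPTH"))
```

Treating the controlled keywords and the free-text string as filters on one result set is what allows a VARIABLE query to be refined by free-text, the case the slide says is currently not possible.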
143Reasons for an MD9
- Free-text enhancements
- Free-text searches cannot retrieve the parent
DIF/SERF when a child DIF/SERF is found (the
Parent_DIF or Parent_SERF field is not linked).
- Cannot navigate through the DIF/SERF display
returned through free-text (as can be done in
keyword search).
- Fields within a DIF/SERF (e.g., Parameters) are
not linked when retrieved through free-text, as
they are in keyword search.
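One way to picture the missing parent link is a free-text search that returns each hit together with the entry named in its Parent_DIF field. This is only a sketch: the catalog contents, the dictionary layout, and the function `free_text_search` are assumptions for illustration, not the MD8 data model.

```python
# Hypothetical catalog keyed by Entry_ID; titles are made up for the example.
CATALOG = {
    "PARENT_01": {"Entry_Title": "Regional ice core collection",
                  "Parent_DIF": None},
    "CHILD_01": {"Entry_Title": "Ice core site A chemistry",
                 "Parent_DIF": "PARENT_01"},
    "CHILD_02": {"Entry_Title": "Ice core site B dust record",
                 "Parent_DIF": "PARENT_01"},
}


def free_text_search(catalog, text):
    """Return (entry_id, parent_id) pairs for titles containing the text."""
    needle = text.lower()
    results = []
    for entry_id, record in catalog.items():
        if needle in record["Entry_Title"].lower():
            # Carry the Parent_DIF value along so a display could link to it.
            results.append((entry_id, record["Parent_DIF"]))
    return results


print(free_text_search(CATALOG, "chemistry"))
```

Returning the parent identifier with every hit gives the display layer what it needs to render a link back to the parent, which is the navigation users are asking for.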
144Reasons for an MD9/10: Direct Access to Data and
Resources
Part VIII, F, 1
- Web services: the programmatic interfaces made
available for application-to-application
communication.
- Use SOAP (Simple Object Access Protocol) to
access Web services.
- XML/HTTP-based protocol for accessing services,
objects, and servers in a platform-independent
manner.
- Allows clients to make requests to services.
- Libraries available for many programming
languages.
- GCMD applications can work in conjunction with
Web services to gain additional functionality
(e.g., get a lat/long bounding box from a country
name).
- GCMD can be a Web service in its own right.
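The bounding-box example can be sketched as a SOAP 1.1 message exchange. Everything service-specific here is hypothetical: the namespace `urn:example:gazetteer`, the operation `GetBoundingBox`, its element names, and the coordinate values in the sample response are placeholders, and no network call is made. Only the SOAP envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SVC_NS = "urn:example:gazetteer"  # hypothetical service namespace
NS = {"soap": SOAP_NS, "svc": SVC_NS}

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("svc", SVC_NS)


def build_request(country):
    """Build the XML text of a request for a country's bounding box."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SVC_NS}}}GetBoundingBox")
    ET.SubElement(operation, f"{{{SVC_NS}}}CountryName").text = country
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text):
    """Extract the four bounding-box edges from a response envelope."""
    root = ET.fromstring(xml_text)
    box = root.find("soap:Body/svc:GetBoundingBoxResponse", NS)
    return {
        edge: float(box.findtext(f"svc:{edge}", namespaces=NS))
        for edge in ("South", "North", "West", "East")
    }


# Placeholder response; the numbers are not real coordinates.
SAMPLE_RESPONSE = f"""
<soap:Envelope xmlns:soap="{SOAP_NS}" xmlns:svc="{SVC_NS}">
  <soap:Body>
    <svc:GetBoundingBoxResponse>
      <svc:South>-10.0</svc:South>
      <svc:North>10.0</svc:North>
      <svc:West>20.0</svc:West>
      <svc:East>40.0</svc:East>
    </svc:GetBoundingBoxResponse>
  </soap:Body>
</soap:Envelope>
"""

print(build_request("Exampleland"))
print(parse_response(SAMPLE_RESPONSE))
```

In practice the request text would be sent in an HTTP POST to the service endpoint, and a SOAP library would usually generate these messages from the service description rather than building them by hand.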
145Overview
146Capitalize on Projects Winning External Funding
with Proposals Based on the GCMD
- Thesaurus Integration
- Semantic Web
148GCMD Keywords and the Semantic Web
- GCMD keywords to be used as a basis for
developing an ontology for the Earth science
disciplines.
149SWEET Architecture
150Issues/Concerns
151Issues/Concerns
- Requesting Newsletter articles for next meeting
- Latest newsletter was sent to IDN April 19, 2002.
- Articles included
- UWG meeting
- Next CEOS meeting
- AADC Node status
- synchronization with Catalogue Interoperability
Protocol (CIP)
- MD8 Operations Client
- proposed MD9 Write-A-DIF
- Add New Fields
- Metadata Limits