CEOS IDN Task Team - PowerPoint PPT Presentation

About This Presentation
Title:

CEOS IDN Task Team

Description:

Title: Collaborations: Operational Portals Author: R. Cordova Last modified by: R. Cordova Created Date: 5/14/2002 7:18:10 PM Document presentation format – PowerPoint PPT presentation

Slides: 147
Provided by: R321

Transcript and Presenter's Notes

Title: CEOS IDN Task Team


1
CEOS IDN Task Team
  • May 9, 2002

2
IDN Agenda 9 May 2002
  • IDN Minutes from Darmstadt and IDN Profile are at
    http://idn.ceos.org
  • Data Policy
  • IDN Metrics (from GCMD node)
  • IDN Content History
  • Content Strategies
  • IDN Keywords
  • Authoring Tools
  • MD8 Status
  • Break

3
IDN Agenda 9 May 2002
  • MD8 Status continued
  • MD8 Software Waiting List
  • IDN Collaborations
  • Collaborations: Operational Portals
  • IDN's Use of ZOPE for Communications
  • MD9 and ISO 19115
  • Lorant Czaran on ISO 19115
  • Issues/Concerns

4
Data Policy
5
Data Policy Issues
  • Global Change Research Policy Statements from the
    Executive Office of the President - OSTP in 1991
  • U.S. Global Change Research Program requires an
    early and continuing commitment to the
    establishment, maintenance, validation,
    description, accessibility and distribution of
    high-quality, long-term data sets.
  • Full and open sharing of the full suite of global
    data sets for all global change researchers is a
    fundamental objective.
  • Preservation of data needed for long-term global
    change research is required.
  • Data archives must include easily accessible
    information about the data holdings, including
    quality assessments, supporting ancillary
    information, and guidance and aids for locating
    and obtaining data.
  • National and international standards should be
    used to the greatest extent possible for media
    and for processing and communication of global
    data sets.
  • Data should be provided at the lowest possible
    cost to global change researchers in the interest
    of full and open access to data.
  • For those programs in which selected principal
    investigators have initial periods of exclusive
    data use, data should be made openly available as
    soon as they become widely useful.

6
Data Policy Issues
  • National Academy of Sciences (U.S.)
  • National Research Council
  • U.S. National Committee for CODATA
  • CODATA 2002 - Frontiers of Scientific and
    Technical Data (29 September - 3 October)
  • CGED Dr. Anne Linn
  • Dr. Bernard Minster, Chairman
  • Upcoming workshop on Carbon Cycle data.
  • International Policies

7
Combined NSC, DPC, NEC Climate Change Policy Panel (Program Review)
Committee on Climate Change Science and Technology Integration
  • Chair: Secretary of Commerce
  • Vice-Chair: Secretary of Energy
  • Executive Director: OSTP Director
Interagency Working Group on Climate Change Science and Technology
  • Chair: Deputy/Under Secretary of DOE
  • Vice Chair: Deputy/Under Secretary of DOC
  • Secretary: OSTP AD for Climate Science Technology
Climate Change Science Program Office
  • Director: Commerce Detailee
Climate Change Technology Program
  • Department of Energy
8
IDN Metrics (from GCMD node)
9
Ten Years of DIFs May 1992-March 2002
10
New DIFs
11
DIFs by TOPIC
12
GCMD Population by Node, March 2002
13
Unique Hosts
14
2000-2001 Web Usage by Domain
15
Controlled Keyword Search
16
2000-2001 Parameter Searches
17
Total DIF Retrievals
18
Total DIF Retrievals
19
Redirects to Other Data
  • Top redirects from DIFs (2001 / 2000)
  • NASA data/web pages: 631 / 244
  • NOAA data/web pages: 355 / 389
  • EOSDIS DAAC data/web pages: 349 / 290
  • USGS data/web pages: 329 / 972
  • CDIAC data/web pages: 40 / 90
  • CCRS/CEONET/GeoConnections: 34 / n/a
  • International data/web pages: 118 / 193
  • Other data/web pages (various): 466 / 2185

20
Unique Hosts Since Jan 01
21
Web Page Hits Since Jan 01
22
Decline in Usage?
  • GCMD web usage has tended to be flat over the
    past year.
  • Prior to 9-11, usage was showing a 1.6% increase
    for the year.
  • Since 9-11, usage has declined. Overall GCMD
    usage has declined by 3% from the past year.
  • Numeric domains have increased by 18% over 2000,
    but .gov domains have declined by almost 47%
    since 2000.
  • FGDC Clearinghouse changed filtering of Isite
    queries.
  • Is decline due to increased information available
    (information saturation), 9-11, decline in
    interest on climate issues, other factors?

23
Decline in Usage?
  • Web page hits have increased since Jan 2001,
    while unique hosts have decreased. Possible
    reasons:
  • Domain contraction: more users on fewer hosting
    domains. AOL has 13.58% of the global ISP market;
    more gov agencies are using a single domain
    (e.g., usgs.gov).
  • More users behind firewalls.
  • More hits are by robots? (we block them from DIFs
    but not web pages).
  • Fewer users are making more hits.

24
Who Links to the GCMD?
  • GCMD is #1 on Google search for "global change"
  • Week of April 1, 2002
  • Top 10 sites that link to GCMD (from Google)
  • PODAAC
  • NSIDC
  • WWW Virtual Library Meteorology
  • AADC Metadata page
  • LBNL Energy Crossroads Climate Change Page
  • WHOI COFDL Laboratory
  • The Weather Pointers Page
  • NOAA/PFEL
  • Yahoo! Environment and Pollution (French)
  • NASADAACS page
  • Quaternary Web Resources (Colby College)

25
Who Links to the GCMD?
  • Google's top 10 sites that link to GCMD (pt 1)
  • (Week of April 15, 2002)
  • Google-ranked sites that are most often linked
    with links to GCMD
  • GES DAAC Direct Links to MODIS Data
  • http://acdisx.gsfc.nasa.gov/data/dataset/MODIS/nofrills.html
  • GES DAAC MODIS Overview
  • http://daac.gsfc.nasa.gov/MODIS/overview.shtml
  • BakerHughes Industry Links-Labs, Research, Gov
  • http://www.bakerhughes.com/bakerhughes/resources/labs.htm
  • PCI Geomatics Industry Links
  • http://www.pcigeomatics.com/corpinfo/ind_links.html
  • VGL Data Links
  • http://www.umich.edu/vgl/booksdata/data.html
  • SeaWiFS Evaluation Products
  • http://daac.gsfc.nasa.gov/data/dataset/SEAWIFS/06_New_Products/

26
Who Links to the GCMD?
  • Google's top 10 sites that link to GCMD (pt 2)
  • DAAC Alliance Products and Services Page
  • http://nasadaacs.eos.nasa.gov/data/path12.html
  • Harvard University Environment and Sustainable
    Development
  • http://www.cid.harvard.edu/esd/esdlinks/esdlinks.html
  • Blackwell Publishers - Geospatial Datasets
  • http://www.blackwellpublishers.co.uk/geog/data.asp
  • RPI/Rensselaer Research Libraries
  • http://www.lib.rpi.edu/dept/library/html/resources/subjects/science/earth.html
  • RSMAS Library Internet Resources
  • http://www.rsmas.miami.edu/support/lib/library_links.html

27
IDN Content History
28
Content Strategies
29
Past Content Struggle
  • Non-existent to poor authoring tools.
  • Inadequate operations facility for interacting
    with the database.
  • Science Coordinators had to gather all data set
    info through intensive, laborious process.
  • Little interest or cooperation by data centers or
    data set producers.
  • Result: Prior to 1994, 3 DIFs/month/coordinator
    were written.

30
Present Content Strategy
  • Make improved authoring tools available.
  • Provide effective operations facility for
    validation/loading entries into database.
  • Provide capability to update on the spot.
  • Unsolicited entries now arriving - from data set
    producers, data center personnel, portal
    representatives, other international and
    interagency groups.
  • (Although still time-consuming to gather all
    information.)
  • Result: 35.3 DIFs/coordinator/month (April 2001
    - March 2002)

31
Future Content Strategy
  • Make further improvements of authoring tools.
  • Further enhance Operations Client and QA facility
    for quality control and loading of entries.
  • Provide ownership of entries through portals and
    distributed nodes, and thus expect more
    contributions from partners.
  • Distribute final validation QA function beyond
    GCMD node - providing even more sense of
    ownership and responsibility.
  • Increase interest - sometimes initially by
    software developers and later by content
    providers.

32
Reasons for MD8 Operations Client: A Client
User's Perspective
  • One person performed all database administration
    tasks
  • Increased interest by partners to write and share
    metadata
  • Clumsy text-based interface introduced errors and
    increased maintenance

33
How MD8 Has Changed Our Mode of Operation for
the Better
  • Database administration tasks shared by science
    coordinators.
  • Decreased time between submission of metadata and
    its entry into the database.
  • Allows users to perform tasks that previously
    required knowledge of command line Oracle SQL.
  • Eases the process of managing personnel and
    valids.
  • Graphical User Interface.

34
Operations (OPS) Loading Metadata Records
35
Operations (OPS) Extracting content
from the database
36
Content Strategy - Using the QA Ops
  • You gotta know when to hold 'em,
  • Know when to load 'em,
  • Know when to walk away,
  • Know when to run.
  • You never count your DIFs when they're only in
    the table...
  • There'll be time enough for countin' when the
    loadin's done.
  • You gotta know when to bold 'em,
  • Know when to fold 'em, ...

37
IDN Keywords
38
Data Center Bucket Revision
  • Original list created for HCIL Interface.
  • Buckets not adequate for Science Keyword
    Interface.
  • Overlapping Buckets
  • Minimal Quality Control of Original Buckets
  • Science coordinators created new bucket list.
  • Staff is in the process of matching each Data
    Center valid to a new bucket.

39
Data Center Bucket Revision
  • Old Buckets
  • Commercial
  • DOC
  • DOD
  • DOE
  • DOI
  • EPA
  • Federal Agencies
  • Institutions
  • International
  • International Agencies
  • NASA
  • NOAA
  • Non-Profit Organizations
  • NSF
  • Regional Agencies
  • Universities
  • USDA
  • USGS
  • World Data Centers
  • NEW Buckets
  • Academic
  • Commercial
  • Consortia/Institutions
  • Multinational
  • Non-Government Agencies
  • Non-US Government
  • US Federal Agencies
  • US State and Regional Agencies

40
Data Center Bucket Revision
  • US Federal Agencies
  • DOC, DOD, DOE, DOI, EPA, DOT
  • NASA, NSF, USDA, USGS

41
Keyword Changes
  • Guiding Principles: Follow the Rules!
  • Earth science parameters are a 4-tier controlled
    vocabulary for indexing and retrieving metadata.
  • Parameter hierarchy includes a 5th level
    uncontrolled detailed variable.
  • CATEGORY > TOPIC > TERM > VARIABLE > detailed
    variable
  • Example:
  • EARTH SCIENCE > Solid Earth > Geochemistry >
    Chemical Weathering
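
As an illustration of the hierarchy on this slide, here is a minimal Python sketch that splits a keyword path into its named levels. The helper name `parse_keyword` and the dictionary layout are assumptions made for this example; they are not part of any GCMD software.

```python
# Level names follow the slide: four controlled tiers plus an
# uncontrolled fifth "detailed variable".
LEVELS = ("category", "topic", "term", "variable", "detailed_variable")

def parse_keyword(path):
    """Split 'A > B > C' into a dict keyed by hierarchy level (hypothetical helper)."""
    parts = [p.strip() for p in path.split(">")]
    if not 1 <= len(parts) <= len(LEVELS) or not all(parts):
        raise ValueError(f"expected 1 to {len(LEVELS)} non-empty levels: {path!r}")
    return dict(zip(LEVELS, parts))

example = parse_keyword(
    "EARTH SCIENCE > Solid Earth > Geochemistry > Chemical Weathering"
)
```

The example keyword stops at the VARIABLE level, so the resulting dictionary has no `detailed_variable` entry.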

42
Keyword Process
  • Keywords requiring modification can usually be
    modified through database operations so that
    all DIFs affected are modified at the same time.
  • New keywords are simply added to the database and
    to the list of controlled keywords available in
    tools and interfaces.
  • Usually a manual process is needed to ensure
    existing DIFs are indexed with the new keyword.
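
The first point, modifying a keyword once so that every affected DIF changes together, might look like the following sketch. It uses an in-memory list of records rather than the real database, and the record layout and entry identifiers are invented for illustration.

```python
def rename_keyword(difs, old, new):
    """Replace keyword `old` with `new` in every DIF; return the entry IDs changed."""
    changed = []
    for dif in difs:
        if old in dif["keywords"]:
            dif["keywords"] = [new if k == old else k for k in dif["keywords"]]
            changed.append(dif["entry_id"])
    return changed

# Hypothetical records; only the keyword strings come from the slides.
difs = [
    {"entry_id": "DIF_A", "keywords": ["Solar-Terrestrial Interactions"]},
    {"entry_id": "DIF_B", "keywords": ["Oceans"]},
]
changed = rename_keyword(
    difs, "Solar-Terrestrial Interactions", "Sun-Earth Interactions"
)
```

Adding a brand-new keyword has no such shortcut: nothing in the existing records points at it yet, which is why indexing existing DIFs remains a manual step.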

43
Summary of Science Keyword Changes
  • Added 54 new Variables and 4 new Terms
  • Modified 39 Variables and 2 Terms
  • Currently 1199 Variables in GCMD
  • Modified Marine Geophysics and Bathymetry Terms
    and Variables
  • Many keywords were not being used or could be
    re-classified under better Terms
  • Modified Terrestrial Ecosystem Variables from
    singular to plural (e.g., forest to forests)
  • Modified Marine Sediments Variables
  • Suggested by C. Moore at NOAA/NGDC/MGG
  • Changed Term "Solar-Terrestrial Interactions" to
    "Sun-Earth Interactions"
  • Sun-Earth was a more recognizable Term
  • More compatible with home page redesign - took up
    less real estate in keyword hierarchy.

44
Keywords Added
  • Added Marine Biology, Marine Geochemistry, Marine
    Tectonics, Marine Volcanism, and Sea Surface
    Topography Terms and Variables to Oceans
  • Added Land Use/Land Cover Term and Variables to
    Human Dimensions
  • Added Geomorphology Term and Variables to Solid
    Earth
  • Added Natural Hazards Term and Variables to Human
    Dimensions
  • Added Aquatic Habitat and Demersal Habitat
    Variables to Biosphere
  • Added Forest Science/Conservation Variables to
    Biosphere (Canada)
  • Added Snow Chemistry (NSIDC)

45
Who Suggested Keyword Changes in 2001?
  • GCMD Staff
  • EOSDIS DAAC/DAAC Alliance data providers
  • MSFC/GHRC
  • NSIDC DAAC
  • GSFC DAAC
  • SEDAC
  • ORNL DAAC
  • ECS Science Office
  • NOAA/NGDC (marine geophysics)
  • Canada/CCRS (forest science)
  • IODE (marine biology, oceans)

46
Community Usage of GCMD Keywords
  • CEOS Interoperability Protocol (CIP)
  • uses Category > Topic > Term
  • EOSDIS Data Gateway (EDG)
  • uses Topic > Term > Variable
  • EOSDIS Core System (ECS)
  • uses all 5 levels, including detailed variable
  • Other Communities using GCMD Keywords
  • FGDC (although not required): many agencies using
    FGDC metadata use GCMD keywords as theme
    thesaurus
  • Canada and GeoConnections
  • Mercury
  • U. Cal. Natural Reserve System
  • NOAA
  • Semantic web
  • NASA's Visible Earth (part of Earth Observatory)
  • DODS

47
Keyword Process
  • ECS and EDG Notification Policy
  • ECS and EDG are notified of GCMD-approved science
    keyword changes prior to implementation
  • Process gives ECS and EDG time to notify science
    and data teams as to potential software changes.
    New keywords added to the GCMD are usually not a
    problem. Modification of existing keywords is
    more problematic.

48
Authoring Tools
49
Authoring Tools
  • Current Authoring Tools include
  • DIFbuilder
  • DIFbuildlet
  • ModDIFbuilder
  • SERFbuilder
  • ModSERFbuilder
  • ESIP DIFbuilder
  • JCADM DIFbuilder
  • Usage of the Authoring Tools has increased from
    outside partners (DAACs, GLOBEC and AMD)

50
DIF Authoring Tools CY01 - Present
51
SERF Authoring Tools CY01 - Present
52
Monthly DIFBuilder Usage
53
ModDIF Builder Monthly Usage
54
DIFBuildlet Monthly Usage
55
SERFBuilder Monthly Usage
56
ModSERFBuilder Monthly Usage
57
MD8 Status
  • 3 tiers: Client, Server, and Database
  • Local Database Agents

58
Status of MD8 Server
  • Functional Since Late 2001

59
MDServer
  • Provides a mechanism for remote clients to
    interoperate with GCMD using RMI protocol.

Application Programmer's Interface (API)
  • Document API: Create, modify, and remove Documents.
  • Query API: Retrieve documents based on entry identifier, object identifier, or query expression.
  • Valids API: Insert, modify, remove, retrieve valids.
  • Personnel API: Insert, modify, remove, retrieve, and merge personnel.
  • Incoming Queue API: Insert, modify, retrieve Incoming Items.
60
HTTP Protocol
  • Provides a mechanism for remote clients to
    interoperate with GCMD using HTTP protocol

Servlets
  • get_entry_ids.py: Retrieve a set of entry identifiers given a specified query.
  • getdif.py: Retrieve a DIF given its entry identifier or object identifier.
  • getdifs.py: Retrieve a set of DIFs given a specified query.
  • get_valids.py: Retrieve a set of valids given its type.
  • getdifs_by_personnel.py: Retrieve a set of personnel given the first, middle, and last name.
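
A remote client would reach these servlets with ordinary HTTP GET requests. The sketch below only builds the request URL; the host, the path prefix, and the `entry_id` parameter name are placeholders, since the slides name the servlets but not their parameters.

```python
from urllib.parse import urlencode

# Placeholder host and path, not the real endpoint.
BASE_URL = "http://gcmd.example.org/servlets"

def servlet_url(servlet, **params):
    """Build a GET URL for one of the servlets named on the slide."""
    query = urlencode(params)
    return f"{BASE_URL}/{servlet}" + (f"?{query}" if query else "")

# 'entry_id' is an assumed parameter name used only for illustration.
url = servlet_url("getdif.py", entry_id="EXAMPLE_DIF_01")
```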
61
Improvements After Beta
  • Further restructuring of Operations (OPS)
  • Quality Control (QA GUI)
  • Operations Help written in HTML.
  • Better support for distributed loading
  • Metrics

62
Improvements After MD8 Active
  • Load testing revealed:
  • Refinement slower than expected.
  • Title display slower than expected.
  • Robots accessing DIFs and each display in the
    DIF.
  • Servlets never die.
  • Tomcat 4.0 performed poorly under heavy load.
  • Improvements:
  • Added another level of caching to the servlets.
  • Improved/optimized some algorithms.
  • Added META tag to DIFs.
  • Files, streams, sockets, etc. MUST be closed.
  • Reverted to Tomcat 3.3.
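
Two of the improvements, an extra caching layer and always closing files, streams, and sockets, are general techniques that can be shown in a few lines. The MD8 servlets ran under Tomcat; this Python sketch is an analogy under that caveat, and every name in it is invented.

```python
from functools import lru_cache
import io

calls = {"count": 0}

@lru_cache(maxsize=256)
def render_title(entry_id):
    """Stand-in for an expensive lookup; cached so repeat requests skip the work."""
    calls["count"] += 1
    return f"Title for {entry_id}"

def read_all(open_stream):
    """Read a stream and guarantee it is closed, even if reading raises."""
    with open_stream() as stream:
        return stream.read()

render_title("DIF_A")
render_title("DIF_A")          # served from the cache; the counter stays at 1

stream = io.StringIO("payload")
text = read_all(lambda: stream)
```

In a long-running server process ("servlets never die"), a resource that is not closed is never reclaimed by process exit, so leaks accumulate until the server degrades.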

63
MD8 Database Extension Local Database Agent
64
What is the Local Database Agent (LDA)?
  • A major component of MD8 that links Earth science
    databases around the world
  • Captures content updates to the local database
    using triggers and shares content
  • Peer-to-peer connectivity to other databases
  • Minimal impact to the local DB activities
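
Capturing updates with triggers can be sketched with SQLite from the Python standard library. This is an analogy only: MD8 used Oracle, and the table and column names here are assumptions. Each insert or update on the content table appends a row to a separate change table that a scheduler could later read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dif (entry_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE trigger_table (entry_id TEXT, action TEXT);

-- Record every change to the content table without touching client code.
CREATE TRIGGER dif_insert AFTER INSERT ON dif
BEGIN
    INSERT INTO trigger_table VALUES (NEW.entry_id, 'INSERT');
END;
CREATE TRIGGER dif_update AFTER UPDATE ON dif
BEGIN
    INSERT INTO trigger_table VALUES (NEW.entry_id, 'UPDATE');
END;
""")

conn.execute("INSERT INTO dif VALUES ('DIF_A', 'Old title')")
conn.execute("UPDATE dif SET title = 'New title' WHERE entry_id = 'DIF_A'")

# Changes waiting to be shared with other nodes, oldest first.
pending = conn.execute(
    "SELECT entry_id, action FROM trigger_table ORDER BY rowid"
).fetchall()
```

Because the trigger runs inside the database, applications that load content need no changes, which fits the "minimal impact" point above.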

65
Reasons for a Local Database Agent
  • Enable distributed input from CEOS partners
  • Easily share metadata information among nodes
  • Facilitate and manage the metadata population
  • Reduce maintenance related to data exchange among
    nodes
  • Build a sense of community among the CEOS
    partners by linking them together

66
Data Sharing Network
[Diagram labels: GCMD; JCADM; UNEP; Announcer; Scheduler; MD8]
67
Local Database Agent
[Diagram labels: Network; Local Database Agent; New Content; Schedule Table; Announcer; LDA Server; Scheduler; Trigger Table; Local DB; GCMD Node]
68
Data Ingest
69
LDA Announcer/Scheduler
[Diagram: DIF sent from UNEP is loaded; Announcer; Incoming Queue Manager; decision "Is item from a remote node?" with branch YES]
70
Evolution of LDA Architecture Options
  • Distributed peer-to-peer with auto-commit: All
    nodes talk to all other nodes.
  • Downside - Could get into a sticky situation
    when multiple nodes are down at different times.
  • Downside - Initialisation and synchronization
    are very complex.
  • Downside - QA must be performed after
    propagation to all nodes.
  • GCMD centralized with auto-commit: All updates
    from other LDAs go to GCMD, where they are
    automatically committed.
  • Downside - GCMD is a single point of failure.
  • Downside - If GCMD trumps, GCMD will need to
    update all the nodes.
  • Downside - QA must be performed after
    propagation to all nodes.
  • GCMD centralized as QA maintainer: All updates
    from other LDAs go to GCMD, where they are
    validated, QA'd, and then broadcast to the
    other nodes.
  • Downside - GCMD is a single point of failure
    for propagating content between nodes.
  • Downside - Latency in DIF propagation due to
    the QA process.
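
The third option, a central node that validates updates and then broadcasts them, can be sketched as a toy hub. This is not the LDA implementation: the class, the validation rule, and the in-memory stores are all assumptions chosen to show the flow.

```python
class QAHub:
    """Toy central node that validates updates before broadcasting them."""

    def __init__(self, nodes, validate):
        self.nodes = nodes          # node name -> {entry_id: record}
        self.validate = validate    # callable: True if the record passes QA

    def submit(self, origin, entry_id, record):
        """Broadcast an update from `origin` to the other nodes if it passes QA."""
        if not self.validate(record):
            return False
        for name, store in self.nodes.items():
            if name != origin:
                store[entry_id] = record
        return True

nodes = {"JCADM": {}, "UNEP": {}}
hub = QAHub(nodes, validate=lambda rec: bool(rec.get("title")))

# The originating node keeps its own copy while QA is pending.
nodes["UNEP"]["DIF_A"] = {"title": "Example"}
ok = hub.submit("UNEP", "DIF_A", nodes["UNEP"]["DIF_A"])
rejected = hub.submit("UNEP", "DIF_B", {"title": ""})
```

Only one component ever writes to the other nodes, which is also why that component becomes the single point of failure for propagation.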

71
Advantages of the Final LDA Architecture
  • Nodes can still update their own system while
    validation is pending thus maintaining autonomy
    of their local system.
  • Most closely models the current mode of GCMD
    operation.
  • Cleanest method of quality control from a
    code/design point of view.
  • It makes concurrency of updates easier to deal
    with by limiting possible race conditions and
    deadlock issues because GCMD can now manage these
    issues.

72
Evolution of the LDA Design
  • Replaced Java serialized objects with XML
    messages
  • Removed the requirement for an auto-commit
  • Improved ease of synchronization
  • Removed InstantDB and replaced with Oracle
  • Improved componentization and modularity
  • Improved network fault tolerance by threading the
    Announcer
  • All remote database updates now propagate to GCMD
  • Improved node registration process

73
LDA Test Plan
  • Create/Read/Update/Delete DIFs, SERFs, and Valids
  • Boundary conditions
  • Initialization
  • Synchronization
  • Load testing
  • Load and delete the same DIF before scheduler
    runs
  • Merge Personnel
  • Simultaneous loading (2 cases: remote nodes and
    local node)
  • Network/Server Failure Scenarios
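
One case in this plan, loading and deleting the same DIF before the scheduler runs, suggests that queued changes should be reduced to their net effect before anything is announced. The slides do not state the expected result, so the rule below is one plausible assumption.

```python
def coalesce(events):
    """Collapse queued (entry_id, action) events to the net change per entry.

    Assumed rule: an entry inserted and then deleted before the scheduler
    runs never needs to be announced to other nodes.
    """
    first, last = {}, {}
    for entry_id, action in events:
        first.setdefault(entry_id, action)
        last[entry_id] = action

    net = {}
    for entry_id, action in last.items():
        if first[entry_id] == "INSERT" and action == "DELETE":
            continue  # created and removed locally; nothing to propagate
        # A new entry that was later edited is still new to remote nodes.
        net[entry_id] = "INSERT" if first[entry_id] == "INSERT" else action
    return net
```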

74
OPS-LDA Demo
  • Update a DIF
  • Load the DIF
  • Watch it propagate to another node
  • Load the same DIF on the remote node

75
MD8 Software Waiting List
76
MD8 Software Waiting List
  • In order to prioritize the installation of MD8,
    an MD8 Waiting List was created.
  • Sites were divided by MD8-Oracle/MD8-Isite and by
    resources available at the site.
  • Current sites installing MD8-Oracle: AADC (JCADM)
    and UNEP Nairobi.
  • Current sites installing MD8-Isite: CONAE and
    CNES.

77
Wait List
  • Priority IDN Nodes
  • Antarctic Coordinating Node
  • UNEP/GRID Budapest
  • Asian Coordinating Node - NASDA
  • ESA Coordinating Node
  • Australian Cooperating Node - CSIRO
  • NOAA Cooperating Node
  • Dutch Cooperating Node - NEOnet
  • Argentina's Cooperating Node - CONAE

78
Wait List
  • Priority IDN Nodes (continued)
  • French Cooperating Node (CNES)
  • Brazilian Cooperating Node (INPE)
  • German Cooperating Node (DLR)
  • Canadian Cooperating Node - no request
  • San Diego Supercomputing Center ?
  • Israel's Cooperating Node
  • Russian Cooperating Node (Space Research
    Institute)
  • UNEP/GRID, Nairobi

79
Wait List
  • Others Interested
  • AWI (Alfred Wegener Institute for Polar and
    Marine Research; Manfred Reinke)
  • Australian Institute of Marine Science (AIMS)
  • Goddard's Data Assimilation Office (DAO)
  • The EPA (Ross Lunetta at Research Triangle)
  • Korean Oceanographic Data Center
  • World Data Center, Sydney (Michael Wang)

80
IDN Collaborations
  • Please give your report on your individual Node
    status.

81
Collaborations: Operational Portals
82
Federation of Earth Science Information Partners
(ESIP)
  • GCMD population of ESIP products and services
  • ESIP Type 1: 1935 DIFs, 41 SERFs
  • ESIP Type 2: 700 DIFs, 8 SERFs
  • ESIP Type 3: 17 DIFs, 6 SERFs
  • Total: 2652 DIFs, 55 SERFs
  • % of GCMD holdings: 24% (DIFs), 18% (SERFs)
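
The totals on this slide can be checked with a few lines of arithmetic. The sketch assumes that the final figures, 24 and 18, are percentages of GCMD holdings for DIFs and SERFs respectively; the helper name is invented.

```python
esip = {
    "Type 1": {"DIFs": 1935, "SERFs": 41},
    "Type 2": {"DIFs": 700, "SERFs": 8},
    "Type 3": {"DIFs": 17, "SERFs": 6},
}
total_difs = sum(row["DIFs"] for row in esip.values())
total_serfs = sum(row["SERFs"] for row in esip.values())

def implied_holdings(part, percent):
    """Total holdings implied by `part` being `percent` percent of the whole."""
    return part / (percent / 100)
```

Under that assumption, 2652 DIFs at 24% implies roughly 11,050 DIFs overall, which agrees with the "over 11,000 entries" figure quoted elsewhere in the presentation.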

83
GCMD Federation Interoperability
[Diagram labels: MD Server; LDA; Loader; Mercury; Local Database; XML DIFs; Mercury Extractor; DIF Peer]
84
DAAC Alliance Metrics
  • At request of V. Griffin, GCMD became responsible
    for tracking publicly-available DAAC products.
  • GCMD works closely with SPSO to track products.
  • At end of FY01, there were 1,553 DAAC DIFs
    representing 1,656 products.
  • As of 3/31/02, there were 1,573 DAAC DIFs
    representing 1,676 products (accounting for
    deletions and replacement DIFs).

85
World Data Center Portal
  • Request received from Dave Clark (NOAA/NGDC) to
    create a World Data Center portal.
  • Portal prototype was quickly prepared and was
    presented by Dave Clark at WDC Task team meeting
    August 2001.
  • WDC-related DIFs need to be modified and new WDC
    metadata created with assistance from WDC.

86
DODS portal: http://gcmd.gsfc.nasa.gov/Data/portals/dods/
  • The DODS portal is 1 of 12 portals created as a
    virtual subset of GCMD's content.
  • There are a total of 4 ocean related portals
    within GCMD: DODS, GLOBEC, GOSIC (GOOS), and
    RSMAS
  • Each ocean portal contains a subset of the 3648
    ocean records held in the GCMD database based on
    their project.

87
DODS Portal Usage Statistics
  • GCMD recently provided a means of tracking
    statistics for the DODS portal by enhancing the
    link used to retrieve datasets.
  • Currently statistics show increasing usage of
    the portal.
  • DODS provides a link to the DODS Portal from
    their website
    (http://www.unidata.ucar.edu/packages/dods/index.html)
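
The slides do not say how the retrieval link was enhanced. One common way to track such clicks is to route each dataset link through a counting redirect, sketched below. The tracker URL, the parameter names, and the example dataset URL are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlencode, urlparse, parse_qs

TRACKER = "http://gcmd.example.org/redirect"  # hypothetical tracking endpoint

def tracked_link(portal, target):
    """Wrap a dataset URL so that following it passes through the tracker first."""
    return f"{TRACKER}?{urlencode({'portal': portal, 'url': target})}"

def record(link, counts):
    """Tally one retrieval for the link's portal and return the real target URL."""
    params = parse_qs(urlparse(link).query)
    counts[params["portal"][0]] += 1
    return params["url"][0]

counts = Counter()
link = tracked_link("dods", "http://data.example.org/sst.nc?var=sst")
target = record(link, counts)
```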

88
DODS Portal
  • DODS portal users can search for data via
    the Keyword Search or the Free text search

http://gcmd.gsfc.nasa.gov/Data/portals/dods/freetext/ft_search.html
http://gcmd.gsfc.nasa.gov/Data/portals/dods/
89
DODS Portal Search results
  • Within the keyword search results page, an
    abbreviated form of the URL_Content_Type follows
    the title of each dataset in the DODS portal.

  • http://gcmd.nasa.gov/Data/portals/dods/
  • The DODS portal uses a similar method to identify
    the URLs within the DODS data set menu list to
    ensure that users will be able to locate data set
    content using either website.

http://www.unidata.ucar.edu/cgi-bin/dods/datasets/datasets.cgi?xmlfilename=datasets.xml
90
DODS/GCMD Future
  • Collaborate with DODS in their effort to create a
    new client application that provides an interface
    using the GCMD servlets.
  • Continue to populate the DODS portal with new
    datasets.
  • Encourage the DODS community to write new
    dataset descriptions through the use of a
    specialized DIFBuilder tool.
  • Goddard DAAC recently expressed interest in
    populating the DODS portal

91
RSMAS: http://gcmd.nasa.gov/Data/portals/rsmas/
  • University of Miami's Rosenstiel School of Marine
    and Atmospheric Science is currently working with
    GCMD to create new records.
  • Currently there are 25 records within this portal
  • Future plans have been discussed with RSMAS to
    incorporate all ocean related data sets within
    the GCMD.

92
GLOBEC Portal
  • http://gcmd.gsfc.nasa.gov/Data/portals/globec/

GLOBEC Portal home page
GLOBEC Portal Enhanced Search page
93
GLOBEC Portal
  • GLOBEC data managers use the GCMD as a secure
    way of preserving a record of the results and
    achievements of the GLOBEC program
  • GLOBEC's data policy includes the adoption of
    the DIF format as the recommended format for all
    data set descriptions.
  • Over 112 records have been loaded into the GLOBEC
    portal (via DIFBuilder, metadata creation tool)

94
GLOBEC Portal Usage Statistics
95
2002 MEDI Subcommittee Meeting Brief Summary
  • Presentation to MEDI subcommittee by Monica
    Holland included:
  • An overview of Metadata Tools reviewed/frequently
    used by GCMD
  • (GCMD Builder tools, MATT, SMMS, and MEDI)
  • GCMD Ocean keywords and Body of Water Location
    keywords
  • Review of GCMD Contributions to MEDI since 1st
    Meeting (Oostende, Belgium)
  • MEDI Software Tool Evaluation
  • GCMD MD8 Portals (Ocean related portals)
  • Presentation by Lola Olsen included:
  • Current Status of MD8
  • ISO field requirements
  • Future GCMD plans: LDA, Zope, MD9

96
2002 MEDI Subcommittee Meeting Feedback
  • Potential collaboration from MEDI subcommittee
    member from KODC
  • Requested GCMD presentation(s) and introduction
    slides about GCMD for a development meeting for
    Korea Ocean Science Information Inventory System
    (KOSI).
  • Received a request for ocean related valids from
    Greg Reed (MEDI Chairman)
  • Source, Sensor, Project, Locations, Keywords, and
    Data Centers
  • Received 3 updated NOAA Supplementals
  • (D. Collins, National Oceanographic Data Center,
    MEDI subcommittee member)
  • Also noted during the meeting:
  • Decided to add new keyword to Oceans
  • EARTH SCIENCE > Oceans > Agricultural
    Aquatic Sciences > Aquaculture
  • Noticed GCMD sensor valids should be reviewed
  • Additional information about MEDI available
    online: http://ioc.unesco.org/iode/contents.php?id=24

97
Global Observing System (G3OS) and CLIVAR
  • Created portal for each component of G3OS
  • Global Ocean Observing System (GOOS)
  • Global Terrestrial Observing System (GTOS)
  • Global Climate Observing System (GCOS)
  • Free-text G3OS search has option to search across
    all G3OS components through GOSIC portal.
  • CLIVAR portal created at request of K. Bouton in
    anticipation of additional data sets.

98
Portal Experiences
  • Partners want customized keyword interfaces that
    only show keywords relevant to their discipline.
  • Some partners have need for specialized map
    projections such as polar or orthographic
    projections.
  • Some partners have expressed need for an option
    to search the entire GCMD database (e.g. a
    toggle option was added for GLOBEC portal).

99
BRD Collaboration
  • Funded since 1996 to create metadata
  • 1996-1999 - National Biological Information
    Infrastructure (NBII)
  • 1999-present - Biological Data Profile
  • Assist in sessions to train future metadata
    creators such as at Smithsonian and NASA
  • Help scientists create their own metadata or do
    it for them (Biological Resources Division,
    National Park Service, The Nature Conservancy)
  • Put metadata into DIF format

100
BRD Collaboration
  • 16 US EPA Columbia River Basin Biota Database
    Abstracts
  • 16 US EPA Columbia River Basin Sediment Database
    Abstracts
  • 1 Prediction of Thistle Infected Areas in
    Badlands National Park using a GIS model
  • 30 Oregon District datasets from
  • http://oregon.usgs.gov/pubs_dir/online_list.htm



101
BRD Collaboration
  • 8 Databases including Global Invasive Species
    Database
  • 18 Species 2000 databases
  • 16 Expert Taxonomic Identification CDs
  • 16 Food and Agriculture Organization databases
  • 7 BRD datasets from archive of funded projects
  • 1 Design and Implementation of Metadata for
    Indian Fungi

102
BRD Collaboration
  • 1 Detroit River Candidate Sites for Habitat
    Protection and Remediation
  • 11 TOXNET Cluster of Databases
  • 1 Amphibian Research and Monitoring Initiative
    Lower Mississippi River Basin
  • 1 Multiscale Habitat Evaluation of Amphibians in
    the Lower Mississippi River Alluvial Valley

103
BRD Collaboration
  • 16 Northern Prairie Wildlife Research Center
    bibliographies and datasets from
  • http://www.npwrc.usgs.gov/resource/research.htm
  • 15 Bor Forest Island fire ecology datasets
  • 1 E.V. Komarek Fire Ecology Database
  • 16 Patuxent Wildlife Research Center software
    products

104
IDN's Use of Zope for Communications
105
What Is ZOPE?
  • ZOPE is an open-source web application server
    developed by Digital Creations.
  • Uses DTML: Document Template Markup Language
    (a server-side scripting language). Example.
  • Instead of DTML, ZOPE will use ZPT: Zope Page
    Templates. Example.
  • ZOPE objects include DTML documents, DTML
    Methods, images, files, folders, page templates
  • ZOPE Products can be imported to enhance a
    website.

106
Zope Management Interface
  • Interface used to customize Zope (browser).
  • Interface can be used to add users, set site
    permissions, import/export Zope objects from
    other machine, add/modify/delete objects.
  • Objects being used for header and footer
    consistency: standard_html_header and
    standard_html_footer (DTML), or (ZPT).

107
Example of Content Management Interface
108
Content Management Framework
  • CMF is a Zope Product used to quickly create
    websites portals.
  • CMF allows a site to have:
  • User log-in, but anonymous user has site access.
  • Allows site management so that submitted content
    from user can be reviewed by manager.
  • Site customizable with skins. (demo in ZPT).

109
Example of Website Using CMF
110
Example of IDN Demo Site Using CMF
111
Customization For Website
  • Use log-in option? Allow anyone to register or
    hide the Join link?
  • Layout and colors of IDN homepage and site? Leave
    default layout?
  • Use on site?
  • From CMF: News, site search.
  • Other Zope products: portal forum (ZDiscussion
    and ZDBase), polls and surveys (PMPSurvey),
    sitemaps.

112
IDN CMF Website Users
  • Anonymous user view of site.
  • Logged-in user view of site.
  • Log-in and go to MyStuff and add a document.
  • Add a News item.
  • Change preferences to change view of site.
    (skins).
  • Logged-in as content manager.
  • Go to Folder contents and click on an object's
    title to view the display status. Click on
    Publish link if the object is in Private
    status.

113
Logged-in User
114
Content Manager
115
DEMO
116
Example of DTML
Results
117
Result of DTML
118
Example of ZPT
119
Result of ZPT
120
MD9
  • Future Challenges

121
Balance New Development With Initiating New Nodes
Into Distributed Network And Assuring Their
Proper Functioning
  • -Installations of MD8 and LDAs at Nodes.
  • -Test Functionality of LDAs.
  • -Test Scalability of LDAs.

122
Reasons for an MD9/10
  • Too Many Data Set Descriptions? No way.
  • Build in additional refinement criteria as
    population increases to better limit the result
    set.
  • GCMD database now holds over 11,000 entries.
    Reaching critical mass for effective searching.
  • Current refinement implementation is OK and is
    widely used, but
  • Need better refinement criteria to
  • Refine by Temporal Resolution
  • Refine by Geospatial Resolution
  • Refine by multiple keyword.
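
The three proposed refinements can be pictured as successive filters over a result set. This sketch is only an illustration: the record fields `temporal_days` and `spatial_km`, their units, and the sample records are invented, and "finer" resolution is taken to mean a smaller number.

```python
def refine(records, keywords=(), max_temporal_days=None, max_spatial_km=None):
    """Keep records matching every keyword and at least as fine as the limits given."""
    kept = []
    for rec in records:
        if not set(keywords) <= set(rec["keywords"]):
            continue  # multiple-keyword refinement: all must be present
        if max_temporal_days is not None and rec["temporal_days"] > max_temporal_days:
            continue
        if max_spatial_km is not None and rec["spatial_km"] > max_spatial_km:
            continue
        kept.append(rec)
    return kept

# Hypothetical records for demonstration.
records = [
    {"id": "A", "keywords": ["Oceans", "Sea Surface Temperature"],
     "temporal_days": 1, "spatial_km": 4},
    {"id": "B", "keywords": ["Oceans"],
     "temporal_days": 30, "spatial_km": 100},
]
```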

123
Reasons for an MD9/10
  • ISO 19115 (geospatial) and ISO 19119 (services)
    Metadata
  • Core mandatory ISO 19115 metadata fields
    mapped to existing DIF fields:
  • ISO field -> MD9 DIF field
  • Citation Title -> Dataset_Citation: Dataset Title
  • Citation Date -> Dataset_Citation: Dataset_Release_Date
  • Dataset language -> Data_Set_Language
  • Dataset topic category -> N/A
  • Abstract -> Summary (R)
  • Metadata Contact -> Personnel: DIF Author
  • Metadata date stamp -> DIF_Creation_Date

124
Mandatory ISO fields that are optional in DIF
  • Dataset Release Date
  • Metadata Author
  • Dataset Language
  • Metadata Creation Date
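
The field mapping on the slides above can be expressed as a small lookup table and used to check whether a DIF record would satisfy the core mandatory ISO 19115 fields. The following is a minimal sketch, not MD9 code: the flat dictionary representation of a DIF record, the flattened key names (for example `Dataset_Title` and `DIF_Author`), the function name, and the sample record are all assumptions made for illustration.

```python
# ISO 19115 core mandatory field -> DIF field, following the slide's table.
# The DIF keys are flattened for this sketch (an assumption); None marks the
# ISO field that the slide lists as having no DIF equivalent (N/A).
ISO_TO_DIF = {
    "Citation Title": "Dataset_Title",
    "Citation Date": "Dataset_Release_Date",
    "Dataset language": "Data_Set_Language",
    "Dataset topic category": None,
    "Abstract": "Summary",
    "Metadata Contact": "DIF_Author",
    "Metadata date stamp": "DIF_Creation_Date",
}


def missing_iso_fields(dif_record):
    """Return the ISO fields that this DIF record cannot supply."""
    missing = []
    for iso_field, dif_field in ISO_TO_DIF.items():
        # An ISO field is missing if DIF has no counterpart at all,
        # or if the counterpart is absent or empty in this record.
        if dif_field is None or not dif_record.get(dif_field):
            missing.append(iso_field)
    return missing


# Made-up record that fills only some of the fields.
example_dif = {
    "Dataset_Title": "Example Sea Surface Temperature Data Set",
    "Summary": "A made-up record used only for illustration.",
    "DIF_Creation_Date": "2002-05-09",
}

for field in missing_iso_fields(example_dif):
    print("Missing for ISO 19115:", field)
```

A check of this kind would flag exactly the fields that are mandatory in ISO but optional in DIF whenever an author leaves them blank.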

125
DIF Fields not in ISO
  • Publication
  • Publication Place
  • Publisher
  • Sensor (Instrument)
  • Source (Platform, like satellite)
  • Minimum/Maximum Altitude and Depth

126
DIF Fields not in ISO
  • Temporal Resolution
  • Project (Campaign)
  • Originating Center
  • Data_Center_URL

127
DIF fields not in ISO
  • Multimedia Sample URL
  • Multimedia Caption
  • Related_URL
  • IDN_Node
  • DIF Revision History
  • Future_DIF_Review_Date

128
ISO 19119 - Services Metadata
  • Not an ISO International Standard - document still
    in review (as DIS, Draft International Standard).
  • Many of the same required fields as ISO 19115.
  • ISO field -> MD9 SERF field
  • Service Type -> Service_Citation: Title
  • Service reference date -> Service_citation: Release_date
  • Service language -> Service_language
  • Provider Name -> Service Provider
  • Service Contact -> Personnel: SERF Author, Technical Contact
  • Distributed Computing Platforms -> Use Constraints

129
Reasons for an MD9/10
  • Population, Accuracy, and Currency
  • DIFs/SERFs: Improved authoring tools will lower
    the barrier for creation by external users.
    DIFs/SERFs/Supplementals.
  • Supplemental descriptions: Need update
    capabilities within display. Widely used, but
    need attention in population, accuracy and
    currency.
  • Nodes: Need local/stand-alone customized tools.
  • Earth science links: Access to links needs
    improvement; add Thunderstone search; improve
    categories.

130
Increase Population: DOCbuilder
  • Feature -> Reasoning
  • Use object-oriented architecture. -> Code reuse.
  • Rewrite current Perl code in Java/Jython. ->
    Platform independence and maintenance reduction.
  • Support XML, but make transparent to user. ->
    Extensibility and easier information exchange
    (transportability).
  • Create three versions: stand-alone application,
    Web application, Java applet. -> Support multiple
    environments where such a tool could be used.

131
Increase Population: DOCbuilder
  • Feature -> Reasoning
  • Integrate with MD8 components (e.g., validator). ->
    Code reuse, added functionality.
  • Support multiple document types (DIF, SERF, Supp),
    as well as different look and feels (DIF, ISO,
    FGDC, etc.). -> Code reuse, flexibility.
  • Allow for easy customization in terms of look and
    feel. -> Tight integration with Portals.

132
Reasons for an MD9/10
  • Moving from DTD to XML Schema
  • Defines the legal building blocks of an XML
    document.
  • Reasons for replacing DTD with XML Schema
  • Written in XML, allowing the use of tools like
    DOM and XSL
  • Extensible to future additions
  • Supports more data types (comparable to those in
    databases, programming languages)
  • Specifies occurrences and requirements more
    precisely
  • Supports namespaces (can include >1 schema in XML
    doc)
  • Specifies the model of the document more closely
    to the actual representation
  • DIF Schema already written; however, not yet
    implemented.
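
One of the points above is that an XML Schema is itself written in XML, so ordinary XML tooling can read it. The sketch below illustrates that with Python's standard library: it parses a tiny schema and lists each child element with its type and occurrence limits. The schema shown is not the DIF schema mentioned on the slide; it is a made-up fragment, and the element names other than `DIF_Creation_Date` are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the W3C XML Schema namespace

# Made-up schema fragment, for illustration only.
SCHEMA_TEXT = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="DIF">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Entry_ID" type="xs:string"/>
        <xs:element name="Entry_Title" type="xs:string"/>
        <xs:element name="Parameters" type="xs:string"
                    minOccurs="1" maxOccurs="unbounded"/>
        <xs:element name="DIF_Creation_Date" type="xs:date"
                    minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def describe_children(schema_text, parent_name):
    """List (name, type, minOccurs, maxOccurs) for a top-level element."""
    ns = {"xs": XS}
    root = ET.fromstring(schema_text)
    parent = root.find(f"xs:element[@name='{parent_name}']", ns)
    rows = []
    for child in parent.findall("xs:complexType/xs:sequence/xs:element", ns):
        # XML Schema defaults both occurrence limits to 1 when omitted.
        rows.append((
            child.get("name"),
            child.get("type"),
            child.get("minOccurs", "1"),
            child.get("maxOccurs", "1"),
        ))
    return rows


for row in describe_children(SCHEMA_TEXT, "DIF"):
    print(row)
```

The typed `xs:date` element and the explicit `minOccurs`/`maxOccurs` attributes correspond to the "more data types" and "occurrences and requirements" bullets; a DTD cannot express a date type at all.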

133
Reasons for an MD9/10
  • Improved Geographic Search
  • Use SOAP offerings?
  • Clients can make request for service.
  • Use MEDI's SVG tool?
  • The SVG tool is customized as a part of the MEDI
    package that is compatible with the DIF metadata
    format.
  • Use Polar Projection Search Applets?
  • Modify existing code from Global Land Information
    System (USGS/GLIS) to meet our requirements.

134
MEDI Tool SVG Graph
Scalable Vector Graphics (SVG) is an XML-based
language for Web graphics from the World Wide Web
Consortium (W3C). Currently the Adobe SVG
plug-in 3.0 is supported only by Internet
Explorer (it does not function correctly with
Netscape).
  • Spatial Polygon types
  • Box
  • Polygon
  • Line
  • Circle
  • Point

135
Polar Projection Search Applets
  • USGS gave permission to use code
  • Downloaded tool
  • Unable to install for effective use
  • In contact with the tool's developer, but the
    outlook is not promising
  • Start from scratch?

136
Reasons for an MD9/10
  • Better Search Engines
  • Are there better text search engines than Isite?
  • Isite allows only simplified searching compared
    to most Internet search engines.
  • Isite allows only AND, OR Boolean operations that
    must be explicitly typed in the search box. No
    advanced features are implemented. Refinements
    are not possible.
  • Pros: Isite is freely available. No license
    problems with distribution as part of MD. Useful
    for FGDC participation. Implements Z39.50
    protocol.

137
Reasons for an MD9/10: Better Search Engines
  • Google Search Appliance
  • Hardware/software costly
  • Compusult
  • Commercial software. Z39.50 compliant; used by
    GeoConnections
  • Blue Angel
  • Commercial software. Z39.50 compliant; used by
    Mercury
  • Many search tools are available
  • http://www.searchtools.com/tools/tools.html
  • XML text search engines:
    http://www.searchtools.com/info/xml-resources.html
  • Z39.50 and metasearch engines:
    http://www.searchtools.com/info/metasearch.html
  • Issues: How would a COTS free-text search engine
    affect our IDN partners? Are the above search
    engines better than Isite?

138
Google Search Appliance
  • Same Effective Algorithm Used for Text Searches
  • But, How Can It Be Distributed to the Nodes?
    (Isite is Open Source)
  • Package Includes Software and Hardware and 2
    Years Total Support
  • Hardware Installation May Present Security Issues
  • Cost: 20K (up to 40K)

139
Reasons for an MD9/10
  • XPath is now a standard.
  • XQuery embeds XPath.
  • Manages 2 levels down.
  • Replace GCMD's Query Language With XPath
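
To give a feel for what querying metadata with XPath looks like, here is a minimal sketch using the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module. It is not the GCMD query language or an MD9 design: the record layout, the element names, and the sample records are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up records in a simplified, DIF-like layout (an assumption).
RECORDS = """<DIFs>
  <DIF>
    <Entry_Title>Example ozone data set</Entry_Title>
    <Parameters>
      <Topic>ATMOSPHERE</Topic>
      <Variable>OZONE</Variable>
    </Parameters>
  </DIF>
  <DIF>
    <Entry_Title>Example sea ice data set</Entry_Title>
    <Parameters>
      <Topic>CRYOSPHERE</Topic>
      <Variable>SEA ICE</Variable>
    </Parameters>
  </DIF>
</DIFs>"""


def titles_for_topic(xml_text, topic):
    """Return titles of records that have a Parameters/Topic equal to topic."""
    root = ET.fromstring(xml_text)
    titles = []
    for dif in root.findall("./DIF"):
        # XPath predicate: a Parameters child whose Topic child has this text.
        if dif.find(f"./Parameters[Topic='{topic}']") is not None:
            titles.append(dif.findtext("Entry_Title"))
    return titles


print(titles_for_topic(RECORDS, "ATMOSPHERE"))
```

The path expression names the structure of the record directly, so a query of this kind needs no custom parser; a full XPath or XQuery engine would allow the whole selection to be written as a single expression.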

140
Reasons for an MD9/10
  • Re-evaluate Parent-Child Implementation
  • Users would like to get back to parents from
    children.
  • Free-text implementation needs to be improved.

141
Take Country Out of Address
142
Reasons for an MD9/10
  • Explore better ways to combine free-text and
    controlled keyword searches.
  • Currently, users can only search using free-text
    or controlled keywords from the home page - not
    both.
  • Users can combine free-text with a TOPIC search
    (e.g. free-text and ATMOSPHERE) - but users
    cannot combine or refine VARIABLE queries by
    free-text.
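
The goal described above, letting one query carry both free-text and controlled keywords at any level, can be sketched as a filter in which every supplied criterion must match. This is only an illustration of the idea, not the MD8 or MD9 search code: the record fields, the function name, and the sample records are hypothetical.

```python
def search(records, free_text=None, topic=None, variable=None):
    """Return records that satisfy every criterion that was supplied."""
    results = []
    for rec in records:
        # Controlled keywords must match exactly when given.
        if topic and rec["topic"] != topic:
            continue
        if variable and rec["variable"] != variable:
            continue
        # Free text is a case-insensitive substring match on the summary.
        if free_text and free_text.lower() not in rec["summary"].lower():
            continue
        results.append(rec)
    return results


# Made-up records for illustration.
RECORDS = [
    {"title": "A", "topic": "ATMOSPHERE", "variable": "OZONE",
     "summary": "Total column ozone from satellite."},
    {"title": "B", "topic": "ATMOSPHERE", "variable": "OZONE",
     "summary": "Ozone profiles from balloon sondes."},
    {"title": "C", "topic": "OCEANS", "variable": "SEA SURFACE TEMPERATURE",
     "summary": "Satellite sea surface temperature."},
]

# A VARIABLE query refined by free text, the combination the slide
# says is not currently possible.
hits = search(RECORDS, free_text="satellite", variable="OZONE")
print([rec["title"] for rec in hits])
```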

143
Reasons for an MD9
  • Free-text enhancements
  • Free-text searches cannot retrieve parent
    DIF/SERF when a child DIF/SERF is found (the
    Parent_DIF or Parent_SERF field is not linked).
  • Cannot navigate through DIF/SERF display returned
    through free-text (as can be done in keyword
    search.)
  • Fields within DIF/SERF (e.g., Parameters) are not
    linked when retrieved through free-text like they
    are in keyword search.

144
Reasons for an MD9/10: Direct Access to Data and
Resources
Part VIII, F, 1
  • Web services - the programmatic interfaces made
    available for application to application
    communication.
  • Use SOAP (Simple Object Access Protocol) to
    access Web services.
  • XML/HTTP-based protocol for accessing services,
    objects and servers in a platform-independent
    manner.
  • Allows clients to make requests to services.
  • Libraries available for many programming
    languages.
  • GCMD applications can work in conjunction with
    Web services to gain additional functionality
    (e.g., get a lat/long bounding box from a country
    name).
  • GCMD can be a Web service in its own right.
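
The bounding-box example above can be sketched as a SOAP exchange. The code below only builds a SOAP 1.1 request envelope and parses a canned response with Python's standard library; it does not contact any server. The service namespace, the `GetBoundingBox` operation, its element names, and the coordinates in the sample response are all hypothetical, since the slides do not name a specific service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "http://example.org/gazetteer"  # hypothetical service namespace


def build_request(country):
    """Build a SOAP request asking for a country's bounding box."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}GetBoundingBox")
    ET.SubElement(call, f"{{{SERVICE_NS}}}CountryName").text = country
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text):
    """Extract the four bounding-box edges from a SOAP response."""
    root = ET.fromstring(xml_text)
    box = root.find(
        f"{{{SOAP_ENV}}}Body/{{{SERVICE_NS}}}GetBoundingBoxResponse"
    )
    return {
        side: float(box.findtext(f"{{{SERVICE_NS}}}{side}"))
        for side in ("South", "North", "West", "East")
    }


# Canned response with made-up coordinates, standing in for a server reply.
SAMPLE_RESPONSE = f"""<soap:Envelope xmlns:soap="{SOAP_ENV}"
                                     xmlns:g="{SERVICE_NS}">
  <soap:Body>
    <g:GetBoundingBoxResponse>
      <g:South>10.0</g:South>
      <g:North>20.0</g:North>
      <g:West>-30.0</g:West>
      <g:East>-15.5</g:East>
    </g:GetBoundingBoxResponse>
  </soap:Body>
</soap:Envelope>"""

print(build_request("Iceland"))
print(parse_response(SAMPLE_RESPONSE))
```

In a real client the request string would be sent with an HTTP POST to the service endpoint, and the returned box could then feed a geographic search.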

145
Overview
146
Capitalize on Projects Winning External Funding
with Proposals Based on the GCMD
  • Thesaurus Integration
  • Semantic Web

147
(No Transcript)
148
GCMD Keywords and the Semantic Web
  • GCMD keywords to be used as a basis for
    developing an ontology for the Earth science
    disciplines.

149
SWEET Architecture
150
Issues/Concerns
151
Issues/Concerns
  • Requesting Newsletter articles for next meeting
  • Latest newsletter was sent to IDN April 19, 2002.
  • Articles included
  • UWG meeting
  • Next CEOS meeting
  • AADC Node status
  • synchronization with Catalogue Interoperability
    Protocol (CIP)
  • MD8 Operations Client
  • proposed MD9 Write-A-DIF
  • Add New Fields
  • Metadata Limits