Ecological Informatics: Challenges and Benefits. Presentation to ESA Visions Committee, March 31, 2003

1
Ecological Informatics: Challenges and Benefits
Presentation to ESA Visions Committee
March 31, 2003
Mark Schildhauer, Ph.D., Director of Computing, NCEAS
http://knb.ecoinformatics.org
http://seek.ecoinformatics.org
2
Research Team and Collaborators
  • PISCO
  • LTER Network
  • San Diego Supercomputer Center
  • Arizona State University
  • University of Kansas
  • University of North Carolina
  • OBFS Network
  • UC NRS
  • Sandy Andelman
  • Chad Berkley
  • Matthew Brooke
  • John Harris
  • Dan Higgins
  • Matt Jones
  • Jim Reichman
  • Mark Schildhauer
  • Jing Tao

3
What is Ecoinformatics?
Data Acquisition
Integration
Storage, archiving
Distributed Access
Results
4
Ecoinformatics
  • The Goal: to develop technology tools and
    services to enable more efficient acquisition,
    integration, and analysis of ecological data
  • Specific Challenges
  • An Approach to Technology Solutions (KNB)
  • Future Directions: a Science Environment for
    Ecological Knowledge (SEEK)

5
Status of Ecological Data
  • Highly dispersed: different individuals,
    organizations, and locations
  • Extreme heterogeneity in form, content, and
    meaning
  • Lack of Documentation (metadata)
  • Lack of metadata overall
  • Many standards in use, many custom types
  • Implementations are not modular

6
Data are Highly Dispersed
  • Data are distributed among:
  • Independent researcher holdings
  • Research station collections
  • LTER Network (24 sites)
  • Org. of Biological Field Stations (160 sites)
  • Univ. Cal Natural Reserve System (36 sites)
  • Agency databases
  • Museum databases

7
Data are physically dispersed
Visitors to NCEAS
Field Stations in North America
8
Data are very heterogeneous
  • Population survey
  • Experimental
  • Taxonomic survey
  • Behavioral
  • Meteorological
  • Oceanographic
  • Hydrology
  • Syntax (format)
  • Schema (organization)
  • Semantics (meaning/methods)

9
Thematic heterogeneity due to Vast Scope of
Ecology
Biosphere
Abiotic
Biomes
Communities
Organisms
Genes
10
Classifying Data Heterogeneity
  • Syntax (format)
  • Schema (organization)
  • Semantics (knowledge/meaning/methods)

11
Data Lacking in Documentation
  • Majority of ecological data undocumented
  • Lack information on syntax, structure and
    semantics of data
  • Impossible to understand data without contacting
    the original researchers; even then, memories can
    fail, and individuals retire or expire
  • Documentation conventions vary widely
  • Requires large time investment to understand each
    data set

12
Summary of Technical Challenges
  • Because of:
  • Data dispersion
  • Data heterogeneity
  • Lack of documentation
  • Integration and synthesis are limited to a manual
    process
  • Difficult to scale integration efforts up to
    large numbers of data sets

13
Solutions
  • Standardized measurements
  • Changes needed in culture, training
  • Technology development: metadata, data servers,
    desktop tools

14
Ecoinformatics Research Objectives
  • Enhance access to ecological and environmental
    data
  • Promote data sharing and re-use
  • Enable national data discovery
  • Provide access to research station data
    resources
  • Maintain local autonomy for data management
  • Synthesis and Analysis
  • Promote cross-cutting analysis
  • Taxonomic, Spatial, Temporal, Conceptual
    integration of data
  • Data preservation
  • Long term data description
  • Provide archiving capabilities

15
Functional breakdown for Analysis
  • Data discovery
  • Data access
  • Data storage/archive
  • Data interpretation
  • Quality assessment
  • Data Conversion and Integration
  • Analysis and Modeling
  • Visualization

16
KNB Development Projects (Knowledge Network for
Biocomplexity)
  • Ecological Metadata Language (EML)
  • Prospective standard for ecological metadata
  • Metacat
  • A freely available database for storing metadata
  • Morpho
  • A freely available tool for creating metadata

17
KNB Overview
[Diagram: clients (Morpho, Web Browser) and servers (Metacat) exchanging Metadata (EML) and Data]
18
KNB Development Projects
  • Ecological Metadata Language (EML)
  • Metacat
  • Morpho

19
Why the big buzz about Metadata?
  • Metadata are the basis for the next generation
    of the Web
  • The Semantic Web is a web of data, in some
    ways like a global database. The driver for the
    Semantic Web is metadata. --Tim Berners-Lee,
    father of the Web
  • Digital Library Community: Era of Metadata
    1998-200? (Carol Mandel, Digital Librarian)

20
Central Role of Metadata
  • What are metadata?
  • Data documentation
  • Ownership, attribution, structure, contents,
    methods, quality, etc.
  • Critical for addressing data heterogeneity issues
  • Critical for developing extensible systems
  • Critical for long-term data preservation
  • Allows advanced services to be built

21
Data = just numbers
  • 072998 29.5 17.0
  • 073098 29.7 6.1
  • 073198 29.1 0

22
Data + Metadata = numbers + context
            Date     Temp (C)   Precip. (mm)
  Obs. 1    072998   29.5       17.0
  Obs. 2    073098   29.7       6.1
  Obs. 3    073198   29.1       0
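
The difference between bare numbers and documented data can be sketched in a few lines of Python. This is only an illustration: the MMDDYY reading of the date column is an assumption (the slides do not state the encoding), and `COLUMNS` and `parse_row` are hypothetical names, not part of EML or any KNB tool.

```python
from datetime import datetime

# Hypothetical column metadata: name, unit, and a parser for each field.
# The MMDDYY date encoding is an assumption; the slides do not state it.
COLUMNS = [
    {"name": "date", "unit": None,
     "parse": lambda s: datetime.strptime(s, "%m%d%y").date()},
    {"name": "temp", "unit": "C", "parse": float},
    {"name": "precip", "unit": "mm", "parse": float},
]

# The three raw observations shown on the slide.
RAW_ROWS = ["072998 29.5 17.0", "073098 29.7 6.1", "073198 29.1 0"]


def parse_row(line, columns=COLUMNS):
    """Turn one whitespace-separated row into a dict keyed by column name."""
    fields = line.split()
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return {col["name"]: col["parse"](value)
            for col, value in zip(columns, fields)}


observations = [parse_row(line) for line in RAW_ROWS]
```

Without the column metadata, nothing in the raw rows says which number is a temperature or which unit applies; with it, each value can be typed and labeled mechanically.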

23
Data Integration ? synthesis
[Diagram: data sets A, B, C]
24
Rules of Thumb (Michener 2000)
  • the more comprehensive the metadata, the greater
    the longevity (and value) of the data
  • structured metadata can greatly facilitate data
    discovery, encourage best metadata practices
    and support data and metadata use by others
  • metadata implementation takes time!!!
  • start implementing metadata for new data
    collection efforts and then prioritize legacy
    and ongoing data sets that are of greatest
    benefit to the broadest user community

25
EML 2.0: a formal ecological metadata specification
  • eml-resource -- Basic resource info
  • eml-dataset -- Data set info
  • eml-literature -- Citation info
  • eml-software -- Software info
  • eml-party -- People and Organizations
  • eml-entity -- Data entity (table) info
  • eml-attribute -- Attribute (variable) info
  • eml-constraint -- Integrity constraints
  • eml-physical -- Physical format info
  • eml-access -- Access control
  • eml-distribution -- Distribution info
  • eml-project -- Research project info
  • eml-coverage -- Geographic, temporal and
    taxonomic coverage
  • eml-protocol -- Methods and QA/QC
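
To give a feel for what structured metadata looks like, the following minimal sketch builds a small XML record with Python's standard library. The element names are simplified stand-ins loosely inspired by the module names above; the output is not schema-valid EML 2.0, and the title and creator are made up.

```python
import xml.etree.ElementTree as ET


def build_metadata(title, creator, attributes):
    """Build a small EML-inspired XML record (illustrative, not valid EML)."""
    root = ET.Element("eml")
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    ET.SubElement(dataset, "creator").text = creator
    entity = ET.SubElement(dataset, "entity")
    for name, unit in attributes:
        attr = ET.SubElement(entity, "attribute")
        ET.SubElement(attr, "name").text = name
        if unit is not None:
            ET.SubElement(attr, "unit").text = unit
    return root


record = build_metadata(
    title="Example weather observations",  # made-up title
    creator="A. Researcher",               # made-up creator
    attributes=[("date", None), ("temp", "C"), ("precip", "mm")],
)
xml_text = ET.tostring(record, encoding="unicode")
```

Because the record is structured, a tool can ask for `dataset/entity/attribute` entries directly instead of parsing free-text documentation.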

26
KNB Development Projects
  • Ecological Metadata Language (EML)
  • Metacat
  • Morpho

27
Metacat: metadata storage
  • Metadata storage, search, presentation
  • Schema independent: supports arbitrary XML types
  • Multiple metadata standards
  • Ecological Metadata Language
  • NBII Biological Data Profile
  • Data storage and preservation
  • Replication
  • Flexible access control system
  • National distributed directory service
  • Strong version control
  • Configurable web interface (XSLT)
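
One way to picture "schema independent" storage is to flatten any well-formed XML document into generic (path, text) rows that can be searched without knowing the schema in advance. The sketch below is an assumption about the general idea, not a description of how Metacat is actually implemented; `XmlStore` and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


class XmlStore:
    """Toy schema-independent store: XML is flattened to (path, text) rows."""

    def __init__(self):
        self.rows = []  # (doc_id, path, text)

    def insert(self, doc_id, xml_text):
        root = ET.fromstring(xml_text)
        self._walk(doc_id, root, "/" + root.tag)

    def _walk(self, doc_id, node, path):
        text = (node.text or "").strip()
        if text:
            self.rows.append((doc_id, path, text))
        for child in node:
            self._walk(doc_id, child, path + "/" + child.tag)

    def search(self, term, path_suffix=""):
        """Return sorted ids of documents with `term` under a matching path."""
        term = term.lower()
        return sorted({
            doc_id for doc_id, path, text in self.rows
            if path.endswith(path_suffix) and term in text.lower()
        })


store = XmlStore()
# Two made-up documents that follow different (simplified) schemas.
store.insert("doc1", "<eml><dataset><title>Barnacle density</title></dataset></eml>")
store.insert("doc2", "<metadata><idinfo><descript>Kelp survey</descript></idinfo></metadata>")
```

Both documents land in the same store and answer the same `search` call, even though their element names have nothing in common.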

28
Metacat network
[Diagram: network of Metacat servers (NRS, SEV, NCEAS, LTER, SDSC) linked to sites (SEV, OBFS, AND, CAP). Key: Metacat Catalog, Morpho clients, Web clients, Site metadata system, XML output filter]
29
Web interface
30
KNB Development Projects
  • Ecological Metadata Language (EML)
  • Metacat
  • Morpho

31
Morpho: Window to the KNB
32
Morpho Features
  • Guided Metadata creation
  • Wizards and editor
  • Automatically extract metadata during data import
  • Search all metadata: structured and free text
  • Contribute to KNB
  • Windows, Mac, Linux
  • Multiple metadata standards
  • EML
  • NBII Biological Data Profile
  • Extensible
  • Standalone (non-networked) mode

33
Objectives of the KNB and SEEK
  • National network for ecological data
  • Data discovery
  • Data access
  • Data interpretation
  • Enable advanced services
  • Quality management
  • Data integration thru advanced queries
  • Visualization and analysis

34
Solutions
  • KNB
  • Ecological Metadata Language (EML)
  • Metacat -- flexible metadata database
  • Morpho -- data management for ecologists
  • SEEK (partners include NCEAS, KU, SDSC,
    LTER Network Office, CAP, Napier Univ., UVM, UNC)
  • Unified Portal to Ecological Data (ECOGRID)
  • Quality Assurance engine
  • Semantic Query Processor
  • Data integration and Analytical Pipelines

35
SEEK (Science Environment for Ecological Knowledge): addressing semantic integration
  • Ontologies
  • EcoGrid: one-stop access to ecological and
    environmental data
  • Semantic Mediation: data integration using
    logic-based reasoning
  • Analysis and Modeling Pipelines: analysis
    workflows using semantic mediation
36
Quality Assessment
  • Integrity constraint checking
  • Data type checking
  • Metadata completeness
  • Data entry errors
  • Outlier detection
  • Check assertions about data
  • e.g., trees don't shrink
  • e.g., sea urchins do
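
Checking assertions about data can be sketched as a rule attached to a taxon and applied to each individual's size series. The function names, taxa keys, and measurements below are made up for illustration; a real quality-assurance engine would likely draw such rules from metadata rather than a hard-coded dictionary.

```python
def non_decreasing(values):
    """Assertion: each measurement is at least as large as the previous one."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


def check_assertions(series_by_taxon, assertions):
    """Return (taxon, individual id) pairs that violate the taxon's assertion."""
    violations = []
    for taxon, individuals in series_by_taxon.items():
        rule = assertions.get(taxon)
        if rule is None:
            continue  # no assertion registered for this taxon
        for ind_id, sizes in individuals.items():
            if not rule(sizes):
                violations.append((taxon, ind_id))
    return violations


# Made-up size series (e.g., diameter in cm) for illustration only.
data = {
    "tree": {"t1": [10.0, 10.4, 11.1], "t2": [12.0, 11.2, 12.5]},
    "sea urchin": {"u1": [5.0, 4.6, 4.9]},
}
# Only trees carry the "don't shrink" assertion; sea urchins may shrink.
assertions = {"tree": non_decreasing}
flagged = check_assertions(data, assertions)
```

Here the shrinking tree `t2` is flagged as a probable data-entry error, while the shrinking sea urchin `u1` passes because no such assertion applies to it.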

37
Semantic metadata
  • Describes the relationship between measurements
    and ecologically relevant concepts
  • Drawn from a controlled vocabulary
  • Ontology for ecological measurements

38
Representing ontologies
  • OWL: Web Ontology Language
  • CKML: Conceptual Knowledge Markup Language
  • RDF: Resource Description Framework

39
Ecological Ontologies
40
Semantic Data Discovery
  • Knowledge of SQL or database languages is a
    barrier to data access and re-use
  • SELECT dsname FROM dslist WHERE meas_type LIKE
    'pop_den' AND location = 'GBNPP' AND common_name =
    'barnacles'
  • Semantic Queries allow scientists to express
    data queries in familiar scientific terms
  • What data sets contain population density
    estimates for barnacles in Glacier Bay National
    Park and Preserve?
  • Functionality enabled through semantic metadata
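
The idea of translating familiar scientific terms into a database query can be sketched with Python's built-in `sqlite3` module. The `dslist` table and its columns come from the SQL on the slide; the `VOCABULARY` mapping, the `semantic_query` helper, and the sample rows are hypothetical, and a real semantic query processor would rely on an ontology rather than a lookup table.

```python
import sqlite3

# Hypothetical controlled vocabulary mapping familiar terms to stored codes.
VOCABULARY = {
    "population density": ("meas_type", "pop_den"),
    "glacier bay national park and preserve": ("location", "GBNPP"),
    "barnacles": ("common_name", "barnacles"),
}


def semantic_query(conn, terms):
    """Translate scientific terms into a parameterized SQL query over dslist."""
    if not terms:
        raise ValueError("at least one term is required")
    clauses, params = [], []
    for term in terms:
        column, code = VOCABULARY[term.lower()]  # column names come only from VOCABULARY
        clauses.append(f"{column} = ?")
        params.append(code)
    sql = ("SELECT dsname FROM dslist WHERE "
           + " AND ".join(clauses) + " ORDER BY dsname")
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dslist (dsname TEXT, meas_type TEXT, location TEXT, common_name TEXT)"
)
conn.executemany(
    "INSERT INTO dslist VALUES (?, ?, ?, ?)",
    [
        ("intertidal_2001", "pop_den", "GBNPP", "barnacles"),  # made-up rows
        ("kelp_cover_1999", "pct_cover", "GBNPP", "kelp"),
        ("barnacle_sb_2000", "pop_den", "SBC", "barnacles"),
    ],
)
results = semantic_query(
    conn,
    ["population density", "barnacles", "Glacier Bay National Park and Preserve"],
)
```

The scientist supplies only the three phrases; the codes `pop_den` and `GBNPP` stay hidden behind the vocabulary.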

41
Data Integration
[Diagram: Semantic Metadata, Data, and Researcher Decisions feed into an Integrated Data Set]
42
Re-using data from the KNB
  • Goal: support visualization and analysis
  • Scalability
  • Efficiently process more data from investigators
  • Broader Spatial extent, longer temporal extent,
    robust taxonomic extent
  • Analytical Pipelines (Monarch prototype)
  • Flexible tool for exploratory analysis of data
  • Directly process data in the network
  • Utilize powerful analytical environments (SAS,
    Matlab, R, ...)
  • Analysis audit trail
  • Reproduce analyses
  • Communicate about analyses
  • Automate new analyses based on earlier ones
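
An analysis audit trail can be sketched as a pipeline object that records each step it runs, so that an analysis can be reproduced and described to others. This is a toy illustration under stated assumptions, not the design of the Monarch prototype; the `Pipeline` class and both step functions are hypothetical.

```python
class Pipeline:
    """Toy analysis pipeline that records an audit trail of every step."""

    def __init__(self):
        self.steps = []  # (name, function, keyword arguments)
        self.audit = []  # one entry per executed step of the latest run

    def add(self, name, func, **params):
        self.steps.append((name, func, params))
        return self  # allow chaining

    def run(self, data):
        self.audit = []
        for name, func, params in self.steps:
            data = func(data, **params)
            self.audit.append(
                {"step": name, "params": params, "n_out": len(data)}
            )
        return data


def drop_missing(rows):
    """Remove missing (None) values."""
    return [r for r in rows if r is not None]


def scale(rows, factor):
    """Multiply every value by a constant factor."""
    return [r * factor for r in rows]


pipeline = Pipeline().add("drop_missing", drop_missing).add("scale", scale, factor=10)
result = pipeline.run([1.5, None, 2.0])
rerun = pipeline.run([1.5, None, 2.0])  # same steps and input give the same output
```

Because the steps are stored rather than typed interactively, a new analysis can start from `pipeline.steps` and append or swap a step instead of starting over.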

43
Analysis Pipelines
Runtime Data Binding
44
Scaling Analysis and Modeling
45
Data Acquisition (Jalama prototype)
  • Application to assist in data collection
  • Capture relevant metadata (e.g., EML) during
    initial data collection
  • Encourage good informatics practice via
    automating design of field data forms
  • Integration with Metadata and Data storage
    frameworks (e.g., Metacat)
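
Automating the design of field data forms from metadata can be sketched as follows: the same attribute descriptions drive both the blank form and the checks applied at entry time. This is a minimal sketch, not the Jalama prototype; the attribute names, units, and ranges are made up, and `form_header` and `validate_entry` are hypothetical helpers.

```python
import csv
import io

# Hypothetical attribute metadata for a field form; names and ranges are made up.
ATTRIBUTES = [
    {"name": "plot_id", "type": str, "unit": None},
    {"name": "count", "type": int, "unit": "individuals", "min": 0},
    {"name": "water_temp", "type": float, "unit": "C", "min": -2.0, "max": 40.0},
]


def form_header(attributes):
    """Build column headings such as 'water_temp (C)' from attribute metadata."""
    return [a["name"] + (f" ({a['unit']})" if a["unit"] else "")
            for a in attributes]


def validate_entry(values, attributes):
    """Convert raw strings to typed values; return (record, error messages)."""
    record, errors = {}, []
    for attr, raw in zip(attributes, values):
        try:
            value = attr["type"](raw)
        except ValueError:
            errors.append(f"{attr['name']}: cannot read {raw!r}")
            continue
        problems = []
        if "min" in attr and value < attr["min"]:
            problems.append(f"{attr['name']}: {value} is below {attr['min']}")
        if "max" in attr and value > attr["max"]:
            problems.append(f"{attr['name']}: {value} is above {attr['max']}")
        if problems:
            errors.extend(problems)
        else:
            record[attr["name"]] = value  # keep only values that passed all checks
    return record, errors


buffer = io.StringIO()
csv.writer(buffer).writerow(form_header(ATTRIBUTES))
blank_form = buffer.getvalue().strip()
good, good_errors = validate_entry(["P1", "12", "14.5"], ATTRIBUTES)
bad, bad_errors = validate_entry(["P2", "-3", "warm"], ATTRIBUTES)
```

Catching the negative count and the non-numeric temperature at entry time is the kind of practice the slide describes as capturing metadata during initial data collection.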

46
Ecoinformatics Solutions!
Integration: MORPHO
Data Acquisition: JALAMA
Storage, archiving: ECOGRID
Distributed Access: METACAT
Analysis and Viz: MONARCH
47
Fin
http://knb.ecoinformatics.org