GCE Data Toolbox -- metadata-based tools for automated data processing and analysis

1
GCE Data Toolbox -- metadata-based tools for
automated data processing and analysis
  • Wade Sheldon
  • University of Georgia
  • GCE-LTER

2
Rationale
  • Data processing, quality control, data analysis
    and metadata generation traditionally carried out
    as separate activities, often in different time
    frames using different technologies
  • Problems
    • Metadata may not reflect all processing steps
    • Much routine data analysis done without Q/C or
      metadata
    • No economy of scale, leading to one-off
      solutions
  • Metadata generation should ideally occur
    throughout the data cycle and inform data
    analysis

3
Design Goals
  • Develop Integrated Storage Standard
    • Tabular Data
    • QA/QC Information
    • Metadata (overall data set, columns/attributes)
  • Develop Software to Support Standard
    • Code Library/API
    • User Interfaces
  • Apply Technology to Acquire, Manage, Distribute
    GCE-LTER Data
  • Explore Use as Prototype Technology for
    Metadata-based Data Processing, Synthesis

4
Storage Standard
  • Developed Using MATLAB
    • Local expertise, large scientific user base
    • Cross-platform (Win32, Solaris, *nix, Mac OS X)
    • Rapid development environment
    • Supports multiple interfaces (interactive
      command line, batch-mode scripts, GUI, WWW)
    • Good interoperability with other technologies
      (Java, Perl, SQL)
  • Defined GCE Data Structure Spec. (based on
    MATLAB/C structures)
    • Structure with 17 named fields
    • Specific content rules for each field
      (software validation)
    • Combines data, metadata, QA/QC, processing
      history
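
The idea of one validated structure that carries data, metadata, QA/QC flags and processing history together can be illustrated with a minimal Python sketch. This is not the GCE Data Structure specification or the toolbox's MATLAB code: the class name, its seven fields and the validation rules below are hypothetical and chosen only for illustration (the real specification defines 17 named fields that are not listed in this transcript).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataSet:
    """Hypothetical container; field names are illustrative, not the GCE spec."""
    title: str
    columns: list   # column names
    units: list     # one unit string per column
    values: list    # data rows, one list per row
    flags: list     # QA/QC flag rows, parallel to `values`
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def log(self, message):
        """Append a timestamped entry to the processing history."""
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.history.append(f"{stamp}: {message}")


def validate(ds):
    """Return a list of rule violations; an empty list means the structure passed."""
    problems = []
    if len(ds.units) != len(ds.columns):
        problems.append("units must have one entry per column")
    for i, row in enumerate(ds.values):
        if len(row) != len(ds.columns):
            problems.append(
                f"row {i} has {len(row)} values, expected {len(ds.columns)}"
            )
    if len(ds.flags) != len(ds.values):
        problems.append("flags must have one row per data row")
    return problems


ds = DataSet(
    title="Example salinity record",
    columns=["Depth", "Salinity"],
    units=["m", "PSU"],
    values=[[1.0, 28.4], [2.0, 29.1]],
    flags=[["", ""], ["", ""]],
)
ds.log("structure created")
```

Because the history and flags travel inside the same object as the values, any function that modifies the data can record what it did in the same place, which is the property the slide attributes to the storage standard.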

5
Storage Standard
GCE Data Structure Specification (v1.1)
6
Software: GCE Data Toolbox
  • Core Function Library
    • Create, Validate Structures
    • Import Data, Metadata (ASCII, MATLAB, SQL)
    • Manipulate Data, Metadata (unit conversions,
      add/delete/update)
    • Export Data, Metadata (various formats)
    • Dynamic, Rule-based QA/QC Flagging
  • Self-documenting Processing
    • Operation Logging (Processing History)
    • Transparent Metadata Creation/Updating
    • Dynamic (JIT) Metadata Generation for Columns
  • Support for Metadata Templating
    • Application of Boilerplate Metadata based on
      Parameter Matching
    • Supports Rapid Documentation of Routine Data
      Sources
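
Metadata templating by parameter matching might look roughly like the following Python sketch. The template contents, the column names and the function `apply_template` are assumptions made for this example; the toolbox's actual template format and matching rules are not described in this transcript.

```python
# Hypothetical template: boilerplate metadata keyed by column (parameter) name.
TEMPLATE = {
    "Salinity": {"units": "PSU", "description": "Practical salinity", "type": "float"},
    "Temp_Water": {"units": "degC", "description": "Water temperature", "type": "float"},
}


def apply_template(column_names, template=TEMPLATE):
    """Return (metadata, unmatched): boilerplate for matched names, and the rest."""
    metadata = {}
    unmatched = []
    for name in column_names:
        if name in template:
            # Copy the entry so later edits do not alter the shared template.
            metadata[name] = dict(template[name])
        else:
            unmatched.append(name)
    return metadata, unmatched


metadata, unmatched = apply_template(["Salinity", "Temp_Water", "Battery_V"])
```

Columns from a routine data source that match the template are documented without manual entry, and the unmatched list shows which columns still need metadata.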

7
Software: GCE Data Toolbox
  • Support for Analysis
    • Descriptive Statistics, Reports
    • Visualization, Mapping
  • Support for Synthesis
    • Composite Data Set Creation
      • Multiple Data Set Merge/Concatenation
      • Relational Join
      • Metadata Content Meshing
    • Data Set Summarization
      • Statistical Data Reduction/Re-sampling
    • Data Set Standardization
      • Unit Conversions (automatic, interactive)
      • Template-based Semantic Mapping
      • Automatic Semantic Mediation (prototype stage)
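
A unit conversion that also updates the column metadata and records itself in the processing history could be sketched in Python as follows. The conversion table, the dictionary layout of a column and the function `convert_column` are hypothetical; they illustrate the general pattern and are not the toolbox's implementation.

```python
# Hypothetical conversion table: (from_unit, to_unit) -> conversion function.
CONVERSIONS = {
    ("ft", "m"): lambda v: v * 0.3048,
    ("degF", "degC"): lambda v: (v - 32.0) * 5.0 / 9.0,
}


def convert_column(column, to_unit):
    """Return a new column with converted values, updated units and a history note."""
    key = (column["units"], to_unit)
    if key not in CONVERSIONS:
        raise ValueError(
            f"no conversion defined from {column['units']} to {to_unit}"
        )
    func = CONVERSIONS[key]
    return {
        "name": column["name"],
        "units": to_unit,
        "values": [func(v) for v in column["values"]],
        "history": column["history"]
        + [f"converted {column['name']} from {column['units']} to {to_unit}"],
    }


stage = {"name": "Gage_Height", "units": "ft", "values": [10.0, 12.5], "history": []}
stage_m = convert_column(stage, "m")
```

Returning a new column rather than editing in place keeps the values, the unit label and the history entry consistent with one another, so the metadata cannot drift away from the data it describes.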

8
Software: User Interfaces
  • Unattended Batch Mode Processing
  • Interactive Command Line Processing (conventional
    MATLAB UI)
    • Full help text for each function
    • Well-defined input/output arguments
  • GUI Applications
    • Standard Forms, Dialogs, Controls
    • No MATLAB Experience Required
  • WWW: MATLAB Web Server
    • HTML Forms, Querystring Input
    • HTML Pages and/or Static File Output

9
Command-Line Interface
10
GUI Applications
11
WWW Interface
12
Current Applications
  • Automated Data Processing
    • Direct data import from data logger files, WWW
      data sources (USGS), SQL queries
    • Automatic metadata creation (templates, data
      mining)
    • Rule-based QA/QC flagging
  • Data Set Packaging
    • Batch processing to create/update data,
      metadata products
    • On-demand generation of data, metadata, stat
      reports in custom formats (end-user scripts,
      GUI applications, WWW forms)
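
The rule-based QA/QC flagging step listed above can be illustrated with a short Python sketch. The rule table, the flag codes "I" and "Q", the thresholds and the function `apply_flags` are all assumptions for this example and do not reproduce the toolbox's rule syntax.

```python
# Hypothetical rules per column: (flag code, description, test on a single value).
RULES = {
    "Salinity": [
        ("I", "invalid: value below 0", lambda v: v < 0),
        ("Q", "questionable: value above 40", lambda v: v > 40),
    ],
}


def apply_flags(column, values, rules=RULES):
    """Return (flags, log_lines); flags are re-derived from the rules on every call."""
    column_rules = rules.get(column, [])
    flags = []
    counts = {}
    for v in values:
        codes = "".join(code for code, _, test in column_rules if test(v))
        flags.append(codes)
        for code in codes:
            counts[code] = counts.get(code, 0) + 1
    log_lines = [
        f"{column}: flagged {counts.get(code, 0)} value(s) as '{code}' ({desc})"
        for code, desc, _ in column_rules
    ]
    return flags, log_lines


flags, log_lines = apply_flags("Salinity", [28.4, -1.0, 45.2, 30.0])
```

Keeping the rules as data rather than hard-coded checks means the flags can be recomputed whenever values or rules change, and the returned log lines can be appended to the processing history.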

13
Current Applications
  • Data Exploration/Analysis by PIs
    • Descriptive Statistics based on attribute
      metadata
    • Visualization with Interactive Filtering
      (Frequency Histograms, 2D Plots, Map Plots)
  • Data Reduction/Re-sampling to Provide Customized
    Data at Various Scales
    • Aggregated Statistics
    • Binned Statistics
    • Query/Filtering (sub-selection)
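
Binned statistics for data reduction can be sketched in Python as below. The function `binned_stats`, the fixed-width binning on an hours axis and the sample values are assumptions for illustration only; the toolbox's own re-sampling options are not detailed in this transcript.

```python
from statistics import mean


def binned_stats(times, values, bin_width):
    """Group values into fixed-width bins of `times` and summarize each bin."""
    bins = {}
    for t, v in zip(times, values):
        if v is None:  # skip missing values
            continue
        start = (t // bin_width) * bin_width
        bins.setdefault(start, []).append(v)
    return [
        {
            "bin_start": start,
            "n": len(vals),
            "mean": mean(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for start, vals in sorted(bins.items())
    ]


# Hours since the start of the record; 24-hour bins give daily statistics.
hours = [0, 6, 12, 18, 24, 30, 36]
temps = [10.0, 12.0, None, 14.0, 11.0, 13.0, 15.0]
daily = binned_stats(hours, temps, 24)
```

Changing `bin_width` changes the scale of the output, which is one way to provide the same record at several temporal resolutions.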

14
Current Applications
  • Data Harvesting (GCE)
    • USGS Data (WWW real-time, daily, finalized data)
    • Campbell Scientific Data Arrays (post-processing
      triggered after LoggerNet Retrieval)
    • Sea-Bird Hydrographic Data
  • USGS Data Harvesting Service for HydroDB
    • Weekly harvest for 31 stations/7 LTER Sites
    • Automatic Resampling, Unit Conversions, Q/C

15
Availability
  • Description, Screen-shots, Fully-functional
    Toolbox Available on WWW
  • http://gce-lter.marsci.uga.edu/lter/research/tools/data_toolbox.htm
  • Requires MATLAB 5.3, 6.0, 6.5 (any platform)
  • Public Version Compiled
  • Source Code Requests Considered on Case-by-Case
    Basis

16
Future Development Plans
  • EML 2.0 Support
  • Metadata-mediated Data Set Integration
    • Unit conversions
    • Re-sampling
  • More WWW Interface Development