Efforts to Link Ecological Metadata with Bacterial Gene Sequences at the Sapelo Island Microbial Observatory

Author: Wade & Joan Sheldon
Last modified by: Wade Sheldon
Created Date: 4/4/2001 2:52:56 AM

Slides: 17

1
Efforts to Link Ecological Metadata with
Bacterial Gene Sequences at the Sapelo Island
Microbial Observatory
  • Wade M. Sheldon
  • Mary Ann Moran
  • James T. Hollibaugh

2
Genetic Sequence Databases
  • Major informatics success story
  • Large repositories for nucleotide sequences (e.g.
    GenBank/EMBL/DDBJ, 16M)
  • Automated and web-based data submission -
    required as part of publication process
  • Standardized alignment/search tools support use
    for classification
  • Numerous environmental sequences; ecologists
    now use them to study biogeography, community
    structure, eco-physiology

3
Problems with GenBank
  • Metadata voluntary, limited in scope: title
    (definition), authors, key words, comments,
    literature citation
  • Many sequences unpublished, undescribed
  • Quality control standards poorly enforced
  • No direct way to provide links to ancillary data
    (URLs not officially supported, often removed)
  • Very inefficient and often impossible for
    investigators to obtain ecological context
    information, even from journals
  • Comparisons of matched taxa by traits not possible

4
Consequence
  • Tremendous amount of bacterial sequence data
    relevant to microbial ecologists
  • No established interface

5
Example: Insufficient Metadata
6
Sapelo Island Microbial Observatory
(http://simo.marsci.uga.edu)
  • MObs: NSF-funded network of sites or "microbial
    observatories" established to discover novel
    microorganisms, microbial consortia, communities,
    activities and other novel properties, and to
    study their roles in diverse environments
  • Projects supported are expected to establish or
    participate in an established, Internet-accessible
    knowledge network to disseminate the information
    resulting from these activities
  • SIMO - Investigating the diversity of
    prokaryotes, their physiological and genetic
    characteristics, and their biogeochemical
    activities in a salt marsh/estuarine ecosystem in
    the southeastern U.S.
  • Knowledge networks: GenBank, GCE-LTER IS,
    SIMO 16S rRNA Database

7
SIMO 16S rRNA Database
  • Purpose: LIMS, research tool, data dissemination
  • Designed to store sequence data and all
    supporting SIMO research information
  • Hierarchical structure modeled after research
    workflow
  • Metadata on site geography, sample collection,
    all methodology, personnel, ancillary
    measurements
  • Extensive content control, error checking
  • Links to information in external databases (RDP
    II, GenBank, GCE-LTER)
  • Queries by phylogenetic and/or ecological
    characteristics
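
A query that combines phylogenetic and ecological criteria might look roughly like the sketch below. It uses Python's built-in sqlite3 module purely for illustration; the table, the column names, and the sample rows are all hypothetical and do not reflect the actual SIMO schema or data.

```python
import sqlite3

# Hypothetical, simplified table: NOT the actual SIMO schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sequence (
        seq_id   TEXT PRIMARY KEY,
        taxon    TEXT,   -- phylogenetic characteristic
        habitat  TEXT,   -- ecological characteristic
        salinity REAL    -- ancillary measurement
    )
    """
)
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?, ?)",
    [
        ("demo-001", "Alphaproteobacteria", "marsh sediment", 25.0),
        ("demo-002", "Alphaproteobacteria", "tidal creek water", 18.0),
        ("demo-003", "Bacteroidetes", "marsh sediment", 27.0),
    ],
)


def find_sequences(conn, taxon=None, habitat=None):
    """Return sequence IDs matching a taxon and/or a habitat."""
    clauses, params = [], []
    if taxon is not None:
        clauses.append("taxon = ?")
        params.append(taxon)
    if habitat is not None:
        clauses.append("habitat = ?")
        params.append(habitat)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    rows = conn.execute(
        "SELECT seq_id FROM sequence" + where + " ORDER BY seq_id", params
    )
    return [row[0] for row in rows]
```

Either criterion can be omitted, so the same function serves purely phylogenetic, purely ecological, or combined queries.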

8
Conceptual Diagram of the SIMO Database
9
List-based data entry linked to metadata tables
10
Controlled vocabulary supports finely-targeted
queries
Automatic hyperlinks provide links to tasks
11
List-based queries also simplify public interface
12
Phylogenetic and ecological characteristics
combined dynamically to create overview and query
interface
13
SIMO Metadata
  • Metadata primarily stored in managed lists,
    linked to records by foreign key fields
  • Scalable design: details can be added
    independently without altering data records
  • Complete metadata for sequences generated by
    relational joins
  • Links to external metadata in GCE-LTER database
    add site geography, research history, long-term
    environmental characteristics
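
The managed-list design described above can be sketched as follows, assuming a relational store. This is a minimal illustration with Python's sqlite3 module; the table names (method_list, sequence), their columns, and the sample values are hypothetical stand-ins, not the SIMO schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Managed list (hypothetical): one row per collection method.
    CREATE TABLE method_list (
        method_id   INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        description TEXT
    );
    -- Data records refer to the list by foreign key only.
    CREATE TABLE sequence (
        seq_id    TEXT PRIMARY KEY,
        method_id INTEGER NOT NULL REFERENCES method_list(method_id)
    );
    INSERT INTO method_list VALUES (1, 'sediment core', NULL);
    INSERT INTO sequence VALUES ('demo-001', 1);
    """
)


def full_metadata(conn, seq_id):
    """Assemble the metadata for one sequence with a relational join."""
    return conn.execute(
        """
        SELECT s.seq_id, m.name, m.description
        FROM sequence AS s
        JOIN method_list AS m ON m.method_id = s.method_id
        WHERE s.seq_id = ?
        """,
        (seq_id,),
    ).fetchone()


before = full_metadata(conn, "demo-001")

# Adding detail to the managed list leaves the sequence record untouched.
conn.execute(
    "UPDATE method_list SET description = ? WHERE method_id = 1",
    ("example description added later",),
)
after = full_metadata(conn, "demo-001")
```

Because the sequence row stores only the key, the description added later appears in the joined result without any change to the data record itself.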

14
Metadata Standards
  • No existing standard for environmental sequence
    metadata
  • Sequence formats (FASTA, BIOML, BSML) designed
    for data parsing, sequence annotation
  • SIMO metadata currently displayed in summary form
    on sequence detail pages
  • Exploring adoption of emerging standards such as EML
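
To illustrate why a sequence format such as FASTA carries little ecological context, the sketch below parses FASTA text in plain Python. Each record yields only an identifier, a free-text description line, and the residues; there is no structured slot for site or sample metadata. The function names and the example record are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into (identifier, description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def _make_record(header, chunks):
    # Split the header line into identifier and free-text description.
    identifier, _, description = header.partition(" ")
    return identifier, description.strip(), "".join(chunks)


# Made-up record, for illustration only.
example = """>demo-001 made-up clone, for illustration only
ACGTACGT
TTGACA
"""
records = parse_fasta(example)
```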

15
Sequence Details
16
Future Directions
  • Incorporating batch upload features for library
    submissions
  • Integrating database with RDP SeqMatch Agent
    programs for automatic phylogenetic analysis,
    sequence annotation
  • Providing full metadata in formatted/printable and
    parsable ASCII formats (XML)
  • Participating in Entrez Link-Out to provide links
    to SIMO sequence entries from GenBank
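
As a rough sketch of what a parsable XML rendering of sequence metadata could look like, the example below uses Python's standard xml.etree.ElementTree module. The element and attribute names (sequence, field, name) and the sample values are hypothetical; they follow neither EML nor any format actually adopted by SIMO.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(seq_id, metadata):
    """Serialize a flat metadata mapping to an XML string."""
    root = ET.Element("sequence", attrib={"id": seq_id})
    for key, value in metadata.items():
        field = ET.SubElement(root, "field", attrib={"name": key})
        field.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up values, for illustration only.
xml_text = metadata_to_xml(
    "demo-001",
    {"site": "example site", "method": "example method"},
)

# Round-trip check: the output can be parsed back by any XML tool.
parsed = ET.fromstring(xml_text)
```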