Databases and Sample Curation

SCOR WG115

1
Databases and Sample Curation
  • Darren Stevens, Philip C. Reid,
  • Roy Lowry, Mark Costello

2
People of the Earth...
  • A powerful global conversation has begun.
  • Through the Internet, people are discovering
    and inventing new ways to share relevant
    knowledge with blinding speed. As a direct
    result, markets are getting smarter faster than
    companies, non-profits, public institutions,
    libraries,

3
Comment
  • By 2007, software systems will be developed and
    maintained through collaborative development
    environments, consisting of thousands of moving
    parts that are never turned off. This will
    transform the programming environment as we know
    it today... -- Grady Booch, chief scientist,
    IBM's Rational software group, Boulder, Colo.

4
Comment
  • A world in which we construct software in the
    same manner as we construct automobiles is not
    that far off. Assembly will be the operative
    word. Over the next three to five years,
    best-of-breed software components, developed
    internally or discovered externally, will form
    the building blocks, and Web services will be the
    glue that binds the components together to form
    new applications. -- Fergal Mullen, venture
    capitalist, Highland Capital Partners, Lexington,
    Mass.

5
Two sections
  • Electronic storage
  • Physical plankton sample storage

6
Electronic Storage
  • Access
  • Metadata Standards
  • Interoperability
  • Comparability
  • Usability
  • Archiving

7
A Sea of Acronyms
  • AJAX - Asynchronous JavaScript and XML
  • SOAP - Simple Object Access Protocol
  • API - Application Programming Interface
  • XML - Extensible Markup Language
  • HTML - HyperText Markup Language
  • SSI - Server Side Includes
  • SOA - Service-Oriented Architecture
  • SQL - Structured Query Language
  • XMLHTTP - XMLHttpRequest object, part of AJAX
  • DOM - Document Object Model
  • XHTML - Extensible HyperText Markup Language
  • DiGIR - Distributed Generic Information Retrieval
  • ODBC - Open Database Connectivity
  • ASP - Active Server Pages
  • .NET - Microsoft .NET Framework
  • PHP - PHP: Hypertext Preprocessor (originally
    Personal Home Page)

8
Access
  • Immediate access is expected
  • Provided by the World Wide Web
  • Individual organisation
  • Centralised Database
  • Distributed Database

9
Centralised Database
  • Examples
  • FishBase, AlgaeBase, COPEPOD, ITIS
  • Advantages
  • single data structure
  • Disadvantages
  • heavy technical, scientific and financial burden
    on an organisation

10
Centralised Database
[Diagram: a Client Application sends a Query to a
single Relational Database Management System (SQL
Server, supporting OLTP and OLAP); Results are
returned to the Client.]

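
The main advantage of a centralised database is a single data structure: one query answers a question across every record. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for a server RDBMS such as SQL Server. The table name `occurrence`, its columns and the demo rows are hypothetical placeholders, not data from any of the databases named above.

```python
import sqlite3

# Hypothetical single-schema table: all records share one structure,
# which is what makes a centralised database easy to query.
SCHEMA = """
CREATE TABLE occurrence (
    id INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    latitude REAL NOT NULL,
    longitude REAL NOT NULL,
    year INTEGER NOT NULL
)
"""


def build_database(rows):
    """Create an in-memory database and load (name, lat, lon, year) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO occurrence (scientific_name, latitude, longitude, year) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def count_by_year(conn, name):
    """One query over one structure: number of records per year for a taxon."""
    cur = conn.execute(
        "SELECT year, COUNT(*) FROM occurrence "
        "WHERE scientific_name = ? GROUP BY year ORDER BY year",
        (name,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # Placeholder rows for illustration only, not real observations.
    demo = [
        ("Calanus finmarchicus", 56.0, -3.0, 1960),
        ("Calanus finmarchicus", 57.5, -2.0, 1960),
        ("Calanus finmarchicus", 55.0, -4.5, 1961),
        ("Calanus helgolandicus", 50.0, -5.0, 1960),
    ]
    conn = build_database(demo)
    print(count_by_year(conn, "Calanus finmarchicus"))  # [(1960, 2), (1961, 1)]
```

The cost side of the slide is not visible in code: the same single structure means one organisation must load, clean and maintain every record.
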
11
Distributed Database
  • Examples
  • OBIS, GBIF, Species 2000
  • Advantages
  • funding costs spread across all data providers
  • data dynamic and maintained at source by those
    best qualified
  • data ownership issues minimised: the custodian
    keeps control over access

12
Distributed Database
  • Disadvantages
  • speed of response can decrease with network
    growth
  • data may be off-line
  • data quality varies between sources
  • metadata needs to be developed in parallel
  • lack of feedback
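
Two of these disadvantages, off-line sources and variable response, shape any client of a distributed database: a query fans out to every provider and must still return a useful answer when one of them does not reply. The sketch below shows that pattern with in-process stand-ins. `make_provider` and `federated_search` are hypothetical names; a real client (for example one talking to DiGIR providers over HTTP) would also set network timeouts, which are not modelled here.

```python
from concurrent.futures import ThreadPoolExecutor


def make_provider(name, data, online=True):
    """Hypothetical stand-in for a remote data provider.

    `data` is a list of dicts with a "taxon" key; an off-line provider
    raises ConnectionError instead of answering.
    """
    def fetch(taxon):
        if not online:
            raise ConnectionError(f"{name} is off-line")
        return [rec for rec in data if rec["taxon"] == taxon]
    return {"name": name, "fetch": fetch}


def federated_search(providers, taxon):
    """Send one query to every provider in parallel and merge the answers.

    Returns (records, offline): the merged records plus the names of the
    providers that could not be reached, so the client can report gaps
    instead of failing outright.
    """
    records, offline = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {pool.submit(p["fetch"], taxon): p["name"] for p in providers}
        for future, name in futures.items():
            try:
                records.extend(future.result())
            except OSError:  # ConnectionError and TimeoutError are OSErrors
                offline.append(name)
    return records, offline


if __name__ == "__main__":
    providers = [
        make_provider("A", [{"taxon": "X", "count": 1}]),
        make_provider("B", [{"taxon": "X", "count": 3}], online=False),
    ]
    print(federated_search(providers, "X"))
```

Returning the list of unreachable providers alongside the results is one way to make partial answers visible to the user rather than silently incomplete.
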

13
Distributed Database
[Diagram: a Client Application sends a Query to
several databases (SQL Server, Oracle, MySQL);
Results are returned to the Client as XML.]

14
EurOBIS Model
[Diagram components: Internet; Flanders Marine
Institute; ERMS; EurOBIS cache; PHP Crawler;
Oracle, Access and MySQL databases, each with
DiGIR Software, exchanging XML over HTTP; steps
"Match taxon" and "Parse XML records and insert
into cache".]
Reproduced from Vanhoorne B et al. 2004.

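
The harvesting steps named in the EurOBIS diagram (parse XML records, match taxon, insert into cache) can be sketched as follows. This is not the EurOBIS crawler, which was written in PHP: the XML layout below is a simplified, hypothetical format (real DiGIR responses use their own namespaced schema), and the `register` dictionary is a hypothetical stand-in for a taxonomic register such as ERMS, with a placeholder ID.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified, hypothetical provider response; the second name is
# deliberately misspelled. Real DiGIR responses use a different schema.
SAMPLE_RESPONSE = """<records>
  <record><name>Calanus finmarchicus</name><lat>56.1</lat><lon>-2.9</lon></record>
  <record><name>Calanus finmarchicvs</name><lat>57.0</lat><lon>-1.5</lon></record>
</records>"""


def open_cache():
    """In-memory stand-in for the central cache database."""
    cache = sqlite3.connect(":memory:")
    cache.execute(
        "CREATE TABLE cache (taxon_id INTEGER, name TEXT, lat REAL, lon REAL)"
    )
    return cache


def parse_records(xml_text):
    """Yield (name, lat, lon) tuples from the simplified response format."""
    root = ET.fromstring(xml_text)
    for rec in root.iter("record"):
        name = (rec.findtext("name") or "").strip()
        yield name, float(rec.findtext("lat")), float(rec.findtext("lon"))


def harvest(xml_text, register, cache):
    """Match each parsed name against the register and cache the matches.

    `register` is a hypothetical dict of accepted name -> taxon ID.
    Unmatched names are returned so they can be fed back to the provider.
    """
    unmatched = []
    for name, lat, lon in parse_records(xml_text):
        taxon_id = register.get(name)
        if taxon_id is None:
            unmatched.append(name)
            continue
        cache.execute(
            "INSERT INTO cache (taxon_id, name, lat, lon) VALUES (?, ?, ?, ?)",
            (taxon_id, name, lat, lon),
        )
    cache.commit()
    return unmatched


if __name__ == "__main__":
    cache = open_cache()
    # Placeholder taxon ID, not a real ERMS identifier.
    print(harvest(SAMPLE_RESPONSE, {"Calanus finmarchicus": 1}, cache))
    # ['Calanus finmarchicvs']
```

Returning the unmatched names is one possible answer to the "lack of feedback" problem listed for distributed databases: the harvester can tell each provider which records failed taxon matching.
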
15
Metadata Standards
  • Standards to describe datasets need to be
    established
  • Global Change Metadata Standard (GCMS)
  • Federal Geographic Data Committee (FGDC)
  • Expansion for marine biology and ecology data
  • OBIS produced OBIS Discovery Metadata
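
A metadata standard is only useful if records are checked against it. The slide does not list the OBIS Discovery Metadata fields, so the field set in this sketch is hypothetical, loosely following the kinds of information discovery metadata usually carries (title, abstract, custodian, contact, geographic bounding box, time span, keywords). `DatasetMetadata` and `validate` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical discovery-metadata record (not the actual OBIS field list)."""
    title: str
    abstract: str
    custodian: str
    contact_email: str
    west: float   # bounding box in decimal degrees
    east: float
    south: float
    north: float
    start_year: int
    end_year: int
    keywords: list = field(default_factory=list)


def validate(meta):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in ("title", "abstract", "custodian", "contact_email"):
        if not getattr(meta, name).strip():
            problems.append(f"{name} is empty")
    if not -90 <= meta.south <= meta.north <= 90:
        problems.append("latitude bounds invalid")
    # West may exceed east for a box crossing the 180th meridian, so only
    # the longitude ranges are checked here.
    if not (-180 <= meta.west <= 180 and -180 <= meta.east <= 180):
        problems.append("longitude bounds invalid")
    if meta.start_year > meta.end_year:
        problems.append("start_year after end_year")
    return problems


if __name__ == "__main__":
    record = DatasetMetadata(
        title="Example plankton survey",
        abstract="Placeholder description.",
        custodian="Example Institute",
        contact_email="data@example.org",
        west=-80.0, east=20.0, south=35.0, north=70.0,
        start_year=1958, end_year=2004,
    )
    print(validate(record))  # []
```
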

16
OBIS Discovery Metadata Fields

17
Interoperability
  • XML
  • DiGIR (OBIS)
  • Web portal
  • Schema (Darwin Core, extended by OBIS)
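
Interoperability here rests on XML plus a shared schema, so that every provider labels the same quantity with the same element name. The sketch below serialises records as XML using current Darwin Core term names in the namespace `http://rs.tdwg.org/dwc/terms/`. The schema OBIS used at the time (Darwin Core extended by OBIS) had different element names, so this illustrates the idea rather than that schema; the `records`/`record` wrapper elements and the demo values are hypothetical.

```python
import xml.etree.ElementTree as ET

DWC_NS = "http://rs.tdwg.org/dwc/terms/"
ET.register_namespace("dwc", DWC_NS)  # emit a "dwc:" prefix in the output


def to_darwin_core_xml(records):
    """Serialise dicts keyed by Darwin Core term names as an XML string.

    The <records>/<record> wrapper is a hypothetical container, not part
    of the Darwin Core standard.
    """
    root = ET.Element("records")
    for rec in records:
        el = ET.SubElement(root, "record")
        for term, value in rec.items():
            ET.SubElement(el, f"{{{DWC_NS}}}{term}").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(to_darwin_core_xml([{
        "scientificName": "Calanus finmarchicus",
        "decimalLatitude": 56.1,
        "decimalLongitude": -2.9,
        "eventDate": "1960-05-01",
        "individualCount": 12,
    }]))
```

Because the element names come from a published vocabulary rather than from each provider's local column names, a portal can read records from any source that follows the schema.
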

18
Comparability
  • Standards
  • World Ocean Database Plankton
  • Common Base-unit Value
  • /m³ for zooplankton
  • /ml for phytoplankton
  • /µl for bacterioplankton
  • O'Brien 2004
  • Standard grid sizes
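
The common base-unit values above amount to a unit conversion: a raw count in a sampled volume becomes organisms per m³, per ml or per µl depending on the group. The sketch below performs that conversion and also bins positions into a regular latitude/longitude grid. The function names are hypothetical, and the slide does not say which grid size is standard, so `size_deg` is an assumed parameter defaulting to 1°.

```python
import math

# Litres contained in one unit of each volume unit.
LITRES_PER = {"m3": 1000.0, "l": 1.0, "ml": 1e-3, "ul": 1e-6}

# Base unit per plankton group, as listed on the slide.
BASE_UNIT = {"zooplankton": "m3", "phytoplankton": "ml", "bacterioplankton": "ul"}


def to_base_unit(count, volume, volume_unit, group):
    """Convert a raw count in a sampled volume to organisms per base unit."""
    per_litre = count / (volume * LITRES_PER[volume_unit])
    return per_litre * LITRES_PER[BASE_UNIT[group]]


def grid_cell(lat, lon, size_deg=1.0):
    """Return the south-west corner of the grid cell containing a point.

    size_deg is an assumed parameter; the standard size is not given.
    """
    return (math.floor(lat / size_deg) * size_deg,
            math.floor(lon / size_deg) * size_deg)


if __name__ == "__main__":
    # 150 copepods in 3 m3 of filtered water is about 50 per m3.
    print(to_base_unit(150, 3, "m3", "zooplankton"))
    print(grid_cell(56.3, -2.7))  # (56.0, -3.0)
```
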

19
Usability
  • Mapping Tools
  • WinCPR
  • Easily interpreted graphics
  • Web Services
  • Google Maps
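
Web mapping services such as Google Maps display point data most easily when it is delivered in a standard format. One such format is GeoJSON (RFC 7946), which the Google Maps JavaScript API can load into its Data layer. The sketch converts simple records to a GeoJSON FeatureCollection; the `lat`/`lon` keys and `to_geojson` are hypothetical names.

```python
import json


def to_geojson(records):
    """Convert dicts with 'lat'/'lon' keys into a GeoJSON FeatureCollection.

    All other keys are copied into each feature's properties.
    """
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {k: v for k, v in rec.items() if k not in ("lat", "lon")},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})


if __name__ == "__main__":
    # Placeholder record for illustration only.
    print(to_geojson([{"lat": 56.1, "lon": -2.9, "taxon": "Calanus finmarchicus"}]))
```

The longitude-first coordinate order is a common source of misplaced points when converting data for web maps.
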

20
SAHFOS WinCPR v1.0
  • Annual abundance of C. finmarchicus, 1958-1997
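
An annual abundance series like the one shown is, at its simplest, an aggregation of sample values by year. The sketch below computes a plain arithmetic mean per year from (year, abundance) pairs. This is an assumption made for illustration and not necessarily how WinCPR derives its values; published plankton indices often involve transformations or spatial weighting that are not shown here.

```python
from collections import defaultdict


def annual_mean(samples):
    """Average abundance per year from (year, abundance) pairs.

    Returns a dict ordered by year. A plain arithmetic mean is an
    illustrative choice, not WinCPR's documented method.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, abundance in samples:
        totals[year] += abundance
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(annual_mean([(1958, 2.0), (1958, 4.0), (1959, 1.0)]))  # {1958: 3.0, 1959: 1.0}
```
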

21
SAHFOS WinCPR v1.0

22
Archiving
  • Biggest problems
  • Stored media can deteriorate
  • Hardware or software needed to access the data
    no longer available
  • Financial burden
  • Solution urgently required

[Figure: images labelled 1989 and 1982]

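
One inexpensive safeguard against the first archiving problem, silent deterioration of stored media, is fixity checking: record a checksum for every archived file and re-verify it periodically. The sketch below does this with SHA-256 from Python's standard library. It detects damaged or missing files but does nothing about obsolete hardware or software, which needs migration to current media and open formats. The function names and the manifest layout are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(directory):
    """Map each file's path (relative to directory) to its SHA-256 digest."""
    root = Path(directory)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(directory, manifest):
    """Compare the archive against a stored manifest and list any problems."""
    current = make_manifest(directory)
    problems = []
    for name, digest in manifest.items():
        if name not in current:
            problems.append(f"missing: {name}")
        elif current[name] != digest:
            problems.append(f"changed: {name}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as archive:
        data = Path(archive) / "counts.csv"
        data.write_text("year,count\n")
        manifest = make_manifest(archive)
        data.write_text("year,count\n1958,1\n")  # simulate a silent change
        print(verify(archive, manifest))  # ['changed: counts.csv']
```

In practice the manifest would be stored separately from the archive it describes, so that damage to one copy does not also corrupt the reference checksums.
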
23
Ocean Biodiversity Informatics conference
statement, Hamburg, 1 December 2004
  • We note that increased availability and sharing
    of data
  • is good scientific practice and necessary for
    advancement of science,
  • enables greater understanding through more data
    being available from different places and times,
  • improves quality control due to better data
    organisation, and discovery of errors during
    analysis,
  • secures data from loss.

24
Data requests funding

25
Publications using CPR data

26
Physical plankton sample storage
  • Cost
  • Management
  • Why bother?

27
Cost
  • Expensive
  • Ongoing
  • VITAL

28
Management
  • Look to other sectors for models (geological
    surveys, museums, art galleries)
  • Maintain access
  • Bar-coding (as used in supermarkets)
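
Supermarket bar codes (EAN-13) end in a check digit, so a mis-keyed or misread number is usually caught at scan time. The same idea could be applied to sample identifiers. The sketch below implements the standard EAN-13 check digit: weights 1 and 3 alternate over the first 12 digits, and the check digit brings the weighted sum up to a multiple of 10. How a sample archive would allocate the 12-digit body is left open; `make_sample_barcode` is a hypothetical helper.

```python
DIGITS = set("0123456789")


def _all_digits(s):
    """True if s is non-empty and contains only ASCII digits."""
    return bool(s) and set(s) <= DIGITS


def ean13_check_digit(first12):
    """EAN-13 check digit: weights 1,3,1,3,... over the first 12 digits."""
    if len(first12) != 12 or not _all_digits(first12):
        raise ValueError("expected exactly 12 digits")
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def make_sample_barcode(first12):
    """Append the check digit to a hypothetical 12-digit sample number."""
    return first12 + str(ean13_check_digit(first12))


def is_valid(code):
    """Check a 13-digit code by recomputing its final digit."""
    return (len(code) == 13 and _all_digits(code)
            and ean13_check_digit(code[:12]) == int(code[12]))


if __name__ == "__main__":
    print(make_sample_barcode("400638133393"))  # 4006381333931
    print(is_valid("4006381333932"))  # False
```

A single-digit error always changes the weighted sum by a non-multiple of 10, which is why the scheme catches every single mistyped digit.
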

29
Why Bother?
  • Reanalysis
  • Samples never analysed
  • New technologies and techniques

30
Summary
  • Accessibility
  • Metadata
  • Interoperability
  • Need to archive physical samples