BODHI, A Biodiversity Database Plantform

1
BODHI, A Bio-diversity Database Pla(n)tform
  • Jayant Haritsa
  • Database Systems Lab
  • Supercomputer Education and Research Centre
  • Indian Institute of Science

2
Team
  • B. J. Srikanta (next talk)
  • Prof. Madhav Gadgil, Prof. V. Nanjundiah
    (Centre for Ecological Sciences, IISc)
  • Several Masters Students
  • Funded by DBT

3
Motivation
  • GATT Patent Laws
      • To be in place by 2005
  • Loss
      • Neem
      • Basmati (estimated export value Rs. 1,198 crore)
      • Turmeric
  • Global and local efforts
      • GBIF (Global Biodiversity Information Facility)
      • Karnataka Bio-diversity Board [Deccan Herald,
        Aug 26 2000]

4
Bio-diversity Data
  • Taxonomy of species
      • Phenetic (physical) characteristics
      • Phylogenetic (evolutionary) characteristics
  • Habitat / Spatial distribution
      • Political Layout
      • Geographic Layout
      • Biospheres
  • Genetic information
      • Bio-molecular sequences
      • Structural information

5
MULTI-DOMAIN QUERY
  • Retrieve all plant species that share a common
    habitat, have identical Inflorescence
    characteristics, and have a DNA sequence within
    BLAST score of 80, with respect to
    Michelia-champa.
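
As a rough illustration of what this query combines, the sketch below filters a toy in-memory collection on all three domains (spatial, taxonomic, genomic). The `Species` record, its field names, the sample values and the precomputed `BLAST_SCORE` table are hypothetical stand-ins, not BODHI's schema or query language. It also assumes that "within BLAST score of 80" means a score of at least 80 against the reference sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    name: str
    habitats: frozenset   # hypothetical habitat identifiers
    inflorescence: str    # hypothetical phenetic attribute value


# Made-up data: a real system would run BLAST instead of using a lookup table.
REFERENCE = Species("Michelia-champa", frozenset({"region-1"}), "type-1")
CANDIDATES = [
    Species("species-a", frozenset({"region-1", "region-2"}), "type-1"),
    Species("species-b", frozenset({"region-3"}), "type-1"),
    Species("species-c", frozenset({"region-1"}), "type-1"),
    Species("species-d", frozenset({"region-1"}), "type-2"),
]
BLAST_SCORE = {"species-a": 92.0, "species-b": 85.0,
               "species-c": 40.0, "species-d": 88.0}


def multi_domain_query(reference, candidates, scores, min_score=80.0):
    """Return names of candidates that satisfy all three predicates."""
    result = []
    for s in candidates:
        shares_habitat = bool(s.habitats & reference.habitats)            # spatial
        same_inflorescence = s.inflorescence == reference.inflorescence   # taxonomy
        similar_dna = scores.get(s.name, 0.0) >= min_score                # genome
        if shares_habitat and same_inflorescence and similar_dna:
            result.append(s.name)
    return sorted(result)


print(multi_domain_query(REFERENCE, CANDIDATES, BLAST_SCORE))
```

Only `species-a` passes all three tests here; each of the other candidates fails exactly one domain, which is what makes the query multi-domain.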

6
Difficulties
  • Complex range of data types
      • sets, hierarchies, aggregations, sequences,
        geometries, maps, audio, images
  • Multidimensional data
      • spatial (latitude, longitude, elevation)
        to proteins (hundreds of coordinates)
  • Computationally-intensive operators
      • species relationships, spatial distributions,
        sequence alignments, ...
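
To give a feel for why sequence alignment counts as a computationally intensive operator, here is a minimal sketch of Smith-Waterman local alignment scoring. It is a textbook dynamic program, not BODHI's code, and the scoring parameters (`match`, `mismatch`, `gap`) are arbitrary assumptions. Its cost grows with the product of the two sequence lengths, which is why heuristic tools such as BLAST and FASTA are used in practice.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)   # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

Only two rows of the table are kept in memory, but every cell is still visited, so the running time stays quadratic.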

7
Current Solutions
  • Small-Scale
      • MS-Access / FoxPro / Excel / ...
      • Pentium PCs
  • Large-Scale
      • RDBMS: Oracle / DB2 / Informix / Sybase / ...
      • Unix servers: Sun / SGI / IBM / HP / ...

8
Limitations
  • RDBMS view of the world is a flat
    collection of tables with simple attributes
      • suits financial applications,
      • NOT scientific (biological) applications
  • In particular, taxonomic / spatial / sequence /
    multimedia data modeling and processing are very
    cumbersome and coarse

9
Limitations (contd)
  • Spatial and other applications are not within the
    database kernel but are connected externally.
    E.g. Many GIS systems have ArcInfo and MS-Access
    hooked up in a black-box manner. Or,
    Blast/FASTA utilizing sequence files generated
    from Oracle.
  • Problem: Slow and ugly!

10
Is there Hope?
  • Object-Oriented DBMS
      • Natural for biological applications
  • High-performance data access methods
      • Path Dictionary Index, Multi-key Type
        Index, Pyramid Tree, ...
  • High-performance specialized operators
      • spatial join, data mining, sequence processing, ...
  • XML HTML Semantics
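
To illustrate why an object model is described as natural for this domain, the sketch below represents a taxonomic hierarchy using aggregation (a taxon holds its child taxa) and inheritance (a species is a specialised taxon with extra attributes). The class names, attribute names and the `inflorescence` value are hypothetical and are not BODHI's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Taxon:
    name: str
    rank: str
    parent: Optional["Taxon"] = field(default=None, repr=False)
    children: List["Taxon"] = field(default_factory=list, repr=False)

    def add(self, child):
        """Aggregation: attach a child taxon and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def lineage(self):
        """Names from the root of the hierarchy down to this taxon."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))

    def descendants(self, rank):
        """Names of all taxa of the given rank below this one."""
        found = []
        for child in self.children:
            if child.rank == rank:
                found.append(child.name)
            found.extend(child.descendants(rank))
        return found


@dataclass(eq=False)
class Species(Taxon):
    """Inheritance: a species is a taxon with phenetic attributes."""
    rank: str = "species"
    inflorescence: str = ""


order = Taxon("Magnoliales", "order")
family = order.add(Taxon("Magnoliaceae", "family"))
genus = family.add(Taxon("Michelia", "genus"))
champa = genus.add(Species("Michelia-champa", inflorescence="example-type"))

print(champa.lineage())
print(order.descendants("species"))
```

In a flat relational design the same traversal would typically need self-joins or recursive queries over a parent-child table.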

11
Goals of BODHI
  • Seamless integration of taxonomic, spatial and
    genomic data using OO technology
  • Latest access methods and operators for all three
    types of data
  • Utilize XML for data exchange
  • Low-cost (ideally, free!)
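
As a small illustration of the XML data-exchange goal, the following sketch serialises a species record to XML and parses it back using Python's standard library. The element and attribute names (`species`, `genus`, `habitats`, `habitat`) are invented for this example; the slides do not specify BODHI's exchange format.

```python
import xml.etree.ElementTree as ET


def species_to_xml(name, genus, habitats):
    """Serialise one species record to an XML string (hypothetical layout)."""
    root = ET.Element("species", attrib={"name": name})
    ET.SubElement(root, "genus").text = genus
    habitats_el = ET.SubElement(root, "habitats")
    for habitat in habitats:
        ET.SubElement(habitats_el, "habitat").text = habitat
    return ET.tostring(root, encoding="unicode")


def species_from_xml(text):
    """Parse the XML produced by species_to_xml back into a dict."""
    root = ET.fromstring(text)
    return {
        "name": root.get("name"),
        "genus": root.findtext("genus"),
        "habitats": [h.text for h in root.iter("habitat")],
    }


document = species_to_xml("Michelia-champa", "Michelia", ["region-1", "region-2"])
print(document)
print(species_from_xml(document))
```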

12
Architecture of BODHI
  • Client Interface Framework
  • Query Processor
  • Operations: Object, Genome, Spatial
  • Indexes: Spatial, Object, Genome
  • Models: Genome, Spatial, Taxonomy
  • Services: Spatial, Object, Sequence
  • OBJECT STORAGE MANAGER

13
Implementation of BODHI
  • Client Interface Framework
  • Query Processor: λDB
  • Object Operations: Inheritance, Aggregation
  • Genome Operations: Alignment (BLAST, FASTA)
  • Spatial Operations: Overlaps, Contains, Closest, Within
  • Spatial Indexes: R-tree, Hilbert R-tree
  • Object Indexes: Multi-Key Type, Path-Dictionary
  • Genome Indexes: ??? (next talk)
  • Spatial Model: Country, State, City, River, Road
  • Taxonomy Model: Species, Genera, Family, Order
  • Genome Model: DNA, Protein
  • Services: Spatial, Object, Sequence
  • Basic Types (Point, Line, Polygon, Sets,
    Sequences, ...)
  • SHORE MICRO-KERNEL
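
The spatial operators named on this slide can be sketched on axis-aligned bounding rectangles, which is also the kind of test an R-tree uses to prune its search. This is a minimal illustration only: the `Rect` class and its method names are hypothetical, real geometries (polygons, lines) need exact tests after the bounding-box filter, and "Within" is read here as containment rather than as a distance threshold.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other):
        """True if the two rectangles share at least one point."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains(self, other):
        """True if other lies entirely inside this rectangle."""
        return (self.xmin <= other.xmin and other.xmax <= self.xmax and
                self.ymin <= other.ymin and other.ymax <= self.ymax)

    def within(self, other):
        """True if this rectangle lies entirely inside other."""
        return other.contains(self)

    def distance(self, other):
        """Smallest distance between the two rectangles (0 if they overlap)."""
        dx = max(other.xmin - self.xmax, self.xmin - other.xmax, 0.0)
        dy = max(other.ymin - self.ymax, self.ymin - other.ymax, 0.0)
        return math.hypot(dx, dy)


def closest(target, candidates):
    """Candidate rectangle nearest to target."""
    return min(candidates, key=target.distance)


state = Rect(0, 0, 10, 10)
city = Rect(2, 2, 3, 3)
river = Rect(8, -5, 9, 15)
far_city = Rect(20, 20, 21, 21)

print(state.contains(city), state.overlaps(river), state.contains(river))
print(closest(city, [river, far_city]))
```
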
14
Implementation Layout

15
Query Flow

16
Project Status
  • Prototype (minus Client Interface Framework) has
    been operational since last month!
  • Platform: PIII-700MHz running Red Hat Linux.
  • For code, contact bodhi_at_dsl.serc.iisc.ernet.in

17
Performance Evaluation
  • SEQUOIA 2000 spatial benchmark: Competitive with
    Paradise GIS from Wisconsin
  • Taxonomy and spatial queries: Reasonably fast
  • But genomics slows things down a lot due to
    absence of indexes (next talk)

18
More details
  • Design and Implementation of a Biodiversity
    Information System, Proc. of Intl. Conf. on
    Management of Data (COMAD), Pune, December 2000
  • The Building of BODHI, A Bio-diversity Database
    System, TechRep-2001-02, DSL/SERC, IISc
  • Available at http://dsl.serc.iisc.ernet.in

19
End of Talk