Title: BODHI, A Biodiversity Database Plantform
1BODHI, A Bio-diversity Database Pla(n)tform
- Jayant Haritsa
- Database Systems Lab
- Supercomputer Education and Research Centre
- Indian Institute of Science
2Team
- B. J. Srikanta (next talk)
- Prof. Madhav Gadgil, Prof. V. Nanjundiah (Centre for Ecological Sciences, IISc)
- Several Masters Students
- Funded by DBT
3Motivation
- GATT Patent Laws
- To be in place by 2005
- Loss
- Neem
- Basmati (estimated export value Rs. 1,198 crore)
- Turmeric
- Global and local efforts
- GBIF (Global Biodiversity Information Facility)
- Karnataka Bio-diversity Board (Deccan Herald, Aug 26 2000)
4Bio-diversity Data
- Taxonomy of species
- Phenetic (physical) characteristics
- Phylogenetic (evolutionary) characteristics
- Habitat / Spatial distribution
- Political Layout
- Geographic Layout
- Biospheres
- Genetic information
- Bio-molecular sequences
- Structural information
5MULTI-DOMAIN QUERY
- Retrieve all plant species that share a common
habitat, have identical Inflorescence
characteristics, and have a DNA sequence within
BLAST score of 80, with respect to
Michelia-champa.
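A minimal sketch of how the query above combines three data domains, assuming small in-memory records. The `Species` class, its fields, the sample data, and `toy_score` are all hypothetical. `toy_score` is a crude percent-identity measure standing in for a BLAST score, and reading "within BLAST score of 80" as "score of at least 80" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    name: str
    habitats: frozenset   # spatial domain: regions where the species occurs
    inflorescence: str    # taxonomic domain: one phenetic characteristic
    dna: str              # genomic domain: a DNA sequence


def toy_score(a: str, b: str) -> float:
    """Percent identity over position-wise comparison; NOT a real BLAST score."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


def multi_domain_query(catalog, reference, min_score=80.0):
    """Return species that match the reference in all three domains."""
    return [
        s for s in catalog
        if s.name != reference.name
        and s.habitats & reference.habitats               # share a common habitat
        and s.inflorescence == reference.inflorescence    # identical inflorescence
        and toy_score(s.dna, reference.dna) >= min_score  # similar sequence
    ]


# Hypothetical sample data, for illustration only.
champa = Species("Michelia-champa", frozenset({"R1", "R2"}), "solitary", "ACGTACGTAC")
catalog = [
    champa,
    Species("species-A", frozenset({"R2"}), "solitary", "ACGTACGTAA"),  # 9 of 10 match
    Species("species-B", frozenset({"R3"}), "solitary", "ACGTACGTAC"),  # no shared habitat
    Species("species-C", frozenset({"R1"}), "raceme", "ACGTACGTAC"),    # other inflorescence
    Species("species-D", frozenset({"R1"}), "solitary", "TTTTACGTAC"),  # 7 of 10 match
]

result = [s.name for s in multi_domain_query(catalog, champa)]
print(result)
```

Each predicate touches a different kind of data (spatial, taxonomic, genomic), which is what makes the query multi-domain.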
6Difficulties
- Complex range of data types
- sets, hierarchies, aggregations, sequences, geometries, maps, audio, images
- Multidimensional data
- spatial (latitude, longitude, elevation) to proteins (hundreds of coordinates)
- Computationally-intensive operators
- species relationships, spatial distributions, sequence alignments, ...
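To give a feel for why sequence alignment counts as a computationally intensive operator, here is a textbook local-alignment scorer (Smith-Waterman with a linear gap penalty). It is an illustrative sketch, not BLAST or FASTA and not BODHI code. The scoring parameters are arbitrary assumptions.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("GGACGTGG", "TTACGTTT"))
```

The two nested loops visit every pair of positions, so the work grows with the product of the two sequence lengths. Comparing one query against a large sequence collection this way is expensive.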
7Current Solutions
- Small-Scale
- MS-Access / FoxPro / Excel / ...
- Pentium PCs
- Large-Scale
- RDBMS: Oracle / DB2 / Informix / Sybase / ...
- Unix servers: Sun / SGI / IBM / HP / ...
8Limitations
- The RDBMS view of the world is a flat collection of tables with simple attributes
- suits financial applications,
- NOT scientific (biological) applications
- In particular, taxonomic / spatial / sequence / multimedia data modeling and processing are very cumbersome and coarse
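One way to picture the mismatch is a taxonomic hierarchy stored as flat rows versus the same hierarchy held as linked objects. This is a sketch under assumptions: the row layout, the `Taxon` class, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Flat, relational-style rows: (id, parent_id, rank, name).
ROWS = [
    (1, None, "Order", "Magnoliales"),
    (2, 1, "Family", "Magnoliaceae"),
    (3, 2, "Genus", "Michelia"),
    (4, 3, "Species", "Michelia-champa"),
]


def lineage_flat(rows, taxon_id):
    """Follow parent_id links; each level costs one more keyed lookup
    (in SQL, one more self-join or a recursive query)."""
    by_id = {row[0]: row for row in rows}
    path, lookups = [], 0
    current = taxon_id
    while current is not None:
        _id, parent_id, rank, name = by_id[current]
        lookups += 1
        path.append((rank, name))
        current = parent_id
    return path, lookups


@dataclass
class Taxon:
    """Object-style node: the parent is a direct reference, not a key."""
    rank: str
    name: str
    parent: Optional["Taxon"] = None

    def lineage(self):
        node, path = self, []
        while node is not None:
            path.append((node.rank, node.name))
            node = node.parent
        return path


order = Taxon("Order", "Magnoliales")
family = Taxon("Family", "Magnoliaceae", order)
genus = Taxon("Genus", "Michelia", family)
species = Taxon("Species", "Michelia-champa", genus)

flat_path, lookups = lineage_flat(ROWS, 4)
print(flat_path == species.lineage(), lookups)
```

Both versions return the same lineage. The flat version has to rebuild the hierarchy from keys on every query, while the object version carries it in its structure.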
9Limitations (contd)
- Spatial and other applications are not within the database kernel but are connected externally. E.g., many GIS systems have ArcInfo and MS-Access hooked up in a black-box manner. Or, BLAST/FASTA utilizing sequence files generated from Oracle.
- Problem: Slow and ugly!
10Is there Hope?
- Object-Oriented DBMS
- Natural for biological applications
- High-performance data access methods
- Path Dictionary Index, Multi-key Type Index, Pyramid Tree, ...
- High-performance specialized operators
- spatial join, data mining, sequence processing, ...
- XML = HTML + Semantics
11Goals of BODHI
- Seamless integration of taxonomic, spatial and genomic data using OO technology
- Latest access methods and operators for all three types of data
- Utilize XML for data exchange
- Low-cost (ideally, free!)
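As an illustration of the XML data-exchange goal, the sketch below serializes one species record to XML and parses it back using Python's standard `xml.etree.ElementTree`. The element and attribute names, the record layout, and the sample values are assumptions made for this example, not a BODHI schema.

```python
import xml.etree.ElementTree as ET


def species_to_xml(record: dict) -> str:
    """Serialize a species record (taxonomy, habitat points, DNA) to an XML string."""
    root = ET.Element("species", name=record["name"])
    taxonomy = ET.SubElement(root, "taxonomy")
    for rank in ("order", "family", "genus"):
        ET.SubElement(taxonomy, rank).text = record[rank]
    habitat = ET.SubElement(root, "habitat")
    for lat, lon in record["locations"]:
        ET.SubElement(habitat, "point", lat=str(lat), lon=str(lon))
    ET.SubElement(root, "dna").text = record["dna"]
    return ET.tostring(root, encoding="unicode")


def species_from_xml(text: str) -> dict:
    """Parse the XML produced by species_to_xml back into a record."""
    root = ET.fromstring(text)
    taxonomy = root.find("taxonomy")
    return {
        "name": root.get("name"),
        "order": taxonomy.findtext("order"),
        "family": taxonomy.findtext("family"),
        "genus": taxonomy.findtext("genus"),
        "locations": [(float(p.get("lat")), float(p.get("lon")))
                      for p in root.find("habitat")],
        "dna": root.findtext("dna"),
    }


# Hypothetical record, for illustration only.
record = {
    "name": "Michelia-champa",
    "order": "Magnoliales",
    "family": "Magnoliaceae",
    "genus": "Michelia",
    "locations": [(12.97, 77.59)],
    "dna": "ACGTACGTAC",
}

xml_text = species_to_xml(record)
round_trip = species_from_xml(xml_text)
print(xml_text)
```

A single self-describing document carries all three kinds of data, so a receiving system does not need to know the sender's storage layout.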
12Architecture of BODHI
Client Interface Framework
Query Processor
Object Operations
Genome Operations
Spatial Operations
Spatial Indexes
Object Indexes
Genome Indexes
Genome Model
Spatial Model
Taxonomy Model
Spatial Services
Object Services
Sequence Services
OBJECT STORAGE MANAGER
13Implementation of BODHI
Client Interface Framework
λDB
Inheritance, Aggregation
Alignment: BLAST, FASTA
Overlaps, Contains, Closest, Within
R-tree, Hilbert-Rtree
Multi-Key Type, Path-Dictionary
??? Indexes (next talk)
Country, State, City, River, Road
Species, Genera, Family, Order
DNA, Protein
Spatial Services
Object Services
Sequence Services
Basic Types (Point, Line, Polygon, Sets,
Sequences, ...)
SHORE MICRO-KERNEL
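The spatial operators named on this slide (Overlaps, Contains, Closest, Within) can be sketched on axis-aligned bounding rectangles, the kind of approximation an R-tree stores for each object. This is a minimal illustration, not BODHI's implementation. The `Rect` class and the sample shapes are hypothetical, and reading "Within" as "within a given distance" is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def contains(a: Rect, b: Rect) -> bool:
    """True if b lies entirely inside a."""
    return (a.xmin <= b.xmin and a.ymin <= b.ymin and
            b.xmax <= a.xmax and b.ymax <= a.ymax)


def distance(a: Rect, b: Rect) -> float:
    """Minimum Euclidean distance between two rectangles (0 if they overlap)."""
    dx = max(b.xmin - a.xmax, a.xmin - b.xmax, 0.0)
    dy = max(b.ymin - a.ymax, a.ymin - b.ymax, 0.0)
    return math.hypot(dx, dy)


def within(a: Rect, b: Rect, d: float) -> bool:
    """True if b is no farther than d from a."""
    return distance(a, b) <= d


def closest(query: Rect, candidates):
    """Return the candidate nearest to the query rectangle."""
    return min(candidates, key=lambda r: distance(query, r))


# Hypothetical shapes, for illustration only.
state = Rect(0, 0, 10, 10)
city = Rect(2, 2, 3, 3)
river = Rect(8, 5, 15, 6)
park = Rect(13, 14, 14, 15)

print(contains(state, city), overlaps(state, river), distance(state, park))
```

An index such as an R-tree applies the same rectangle tests to whole subtrees, pruning those whose bounding rectangle fails the predicate before any exact geometry is examined.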
14Implementation Layout
15Query Flow
16Project Status
- Prototype (minus Client Interface Framework) has been operational since last month!
- Platform: PIII-700MHz running Redhat Linux.
- For code, contact bodhi_at_dsl.serc.iisc.ernet.in
17Performance Evaluation
- SEQUOIA 2000 spatial benchmark: Competitive with Paradise GIS from Wisconsin
- Taxonomy and spatial queries: Reasonably fast
- But Genomics slows things down a lot due to absence of indexes (next talk)
18More details
- Design and Implementation of a Biodiversity Information System, Proc. of Intl. Conf. on Management of Data (COMAD), Pune, December 2000
- The Building of BODHI, A Bio-diversity Database System, TechRep-2001-02, DSL/SERC, IISc
- Available at http://dsl.serc.iisc.ernet.in
19End of Talk