ENABLE project notes

1
ENABLE project notes
  • Don Gilbert, gilbertd_at_indiana.edu
  • Sept 2003

2
Biology information access projects
  • Bio-info archiving and distribution
    • IUBio Archive, http://iubio.bio.indiana.edu/ --
      public molecular biology data / software archive
    • Bio-Mirrors, http://www.bio-mirror.net/ --
      sequence and related biology databanks
  • Genome information systems
    • FlyBase, http://flybase.bio.indiana.edu/ --
      genome infosystem of Drosophila fruitfly
    • euGenes, http://eugenes.org/ -- infosystem for 8
      important eukaryotes with 180,000 genes
  • Bio-Data Grids
    • http://iubio.bio.indiana.edu/grid/ --
      experimental distributed computing

3
(No Transcript)
4
(No Transcript)
5
FlyBase and euGenes
6
ENABLE data sets access
  • Data sets and access to data
  • Back-end architecture and protocols
  • Bio-Grids and Bio-Directories

7
Major Bio Databanks
from EBI (www.ebi.ac.uk), Sept. 2002
8
Constellation of Bio-Data (SRS - Lion Bioscience)
9
ENABLE access architecture
10
Data Access Needs
  • Computable genome data access
    • Page scraping and bulk files are not enough
    • Internet search and retrieval of all genome
      objects distributed among many sources
    • Simple, flexible client program model
    • Efficient for high volumes (10^5 objects, >1 GB
      sizes)

11
Directories of Genome Data
  • Directories are a necessary step for bio grids
    • "Broad and shallow" directories federate the
      "narrow and deep" databases
  • Bio-Data Access Tools
    • SRS (Sequence Retrieval System); Entrez; AceDB;
      genome relational databases (Ensembl, FlyBase,
      WormBase); IBM DiscoveryLink; BioDAS; BioMoby
  • Directory services for data access
    • Layer onto access tools for common
      query/retrieval
    • LDAP: mature, efficient for high volumes, queries
      distributed directories, works well with
      bio-access tools
    • Web Services: XML messages over the Web, wide
      industry support, standards are in progress
12
Directory components
13
Bio Directory Needs
  • Build on existing technology for finding
    distributed objects
  • Efficient for millions of objects, by the
    gigabyte and terabyte
  • Queries distributed across directories of
    collaborating services
  • Support existing and new bioinformatics data
    access (relational dbs, object and XML dbs, SRS,
    Entrez, AceDB)
  • Simple client program methods for computable use
    of directories
  • Flexible, common schema for describing objects
  • Replicate directories and objects among
    bioinformatics centers
  • Peer-to-peer directories for collaborative
    projects
  • Strong authentication and security for data
    access

14
Directory Standards
  • Open Grid Services Architecture (OGSA)
    • SOAP-based query support for XML-SQL, XPath,
      XQuery
    • Data Access project, http://www.ogsa-dai.org.uk/
  • Lightweight Directory Access Protocol (LDAP)
    • Robust system for distributed search and
      retrieval
    • Object-centric, optimized for efficient read
      operations
    • Hierarchical, distributed and replicated in
      nature
  • Life Sciences ID (LSID)
    • New standard for bio-object naming, with LDAP and
      Web Services implementations
  • Moby project web services repository system

15
Directory Tests
  • Design and test distributed access with LDAP and
    Web Services
  • SRS backend for efficient search/retrieval from
    GenBank, SwissProt/TrEMBL, LocusLink, Medline,
    many others
  • Find and fetch 20,000 to 1.2 million objects
  • LDAP is 10x faster than Web Services
  • Tests in progress for IUBio, FlyBase

16
Directory Tests
17
ENABLE biodata access issues
  • Basic Web Services and LDAP access working in
    testing form; not yet stable or finalized
  • Bio-Data categorization, schema, and metadata
    for directories need work
  • Grid (OGSA), OAI, other interfaces to be developed

Directory tests at http://iubio.bio.indiana.edu/biogrid/directories/