Title: XML-Based Grid Data System for Bioinformatics Development
1XML-Based Grid Data System for Bioinformatics
Development
- Noppadon Khiripet, Ph.D
- Wasinee Rungsarityotin, MS
- Chularat Tanprasert, Ph.D
- Royol Chitradon, Ph.D
National Electronics and Computer Technology
Center, Thailand
2APBioNet Resources
- BioDatabases
- BioMirrors, PDB Mirror, SRS
- BioComputing
- Beowulf Linux cluster, BioNavigator, Vector NTI
- BioTraining
- S* Life Science Informatics Alliance, Hypercourse in Online Bioinformatics
- Etc.
3APBioNet BioDatabases
4What's missing?
I have a bioinformatics question. How do I search
for the answer?
5Sample Question
What proteins in rice have families with a size
greater than twenty members with at least one
known structure, whose corresponding gene
expression is activated under dry conditions, and
that are involved in interactions with at least
two other proteins?
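The sample question is a conjunction of five conditions, which can be sketched as a single filter. This is a minimal illustration only: the `ProteinRecord` class, its field names, and the placeholder records are assumptions made for this sketch and do not come from any of the databases named in the slides.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ProteinRecord:
    """Hypothetical merged view of one protein; every field name is an assumption."""
    name: str
    organism: str
    family_size: int        # number of members in the protein family
    known_structures: int   # number of solved structures in the family
    induced_by: Set[str] = field(default_factory=set)  # conditions that activate expression
    interaction_partners: int = 0


def matches_sample_question(p: ProteinRecord) -> bool:
    """Return True when a record satisfies every clause of the sample question."""
    return (
        p.organism == "rice"
        and p.family_size > 20            # "greater than twenty members"
        and p.known_structures >= 1       # "at least one known structure"
        and "dry" in p.induced_by         # "activated under dry conditions"
        and p.interaction_partners >= 2   # "at least two other proteins"
    )


# Placeholder records, invented purely to exercise the filter.
records: List[ProteinRecord] = [
    ProteinRecord("protein_A", "rice", 24, 1, {"dry"}, 3),
    ProteinRecord("protein_B", "rice", 12, 2, {"dry"}, 5),   # family too small
    ProteinRecord("protein_C", "rice", 30, 0, {"cold"}, 2),  # no structure, wrong condition
]
hits = [p.name for p in records if matches_sample_question(p)]
print(hits)
```

In practice each clause would be answered by a different data source, which is why a single query mechanism across sources is needed.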
6Query Mechanism
What's the communication protocol?
7eXtensible Markup Language (XML)
- Standard language of data exchange
- Very flexible for defining complex data structures
- Supported by many tools such as XSL, DOM, Perl, and Java
- Less overhead when transforming data from one format to another
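As an illustration of moving data from one format to another, the sketch below turns a tab-delimited record into XML using Python's standard `xml.etree.ElementTree` module. The `record` element name, the column names, and the sample values are assumptions chosen for the example, not a schema from the slides.

```python
import xml.etree.ElementTree as ET


def record_to_xml(header: str, row: str) -> str:
    """Convert one tab-delimited record into an XML string.

    Column names become element names, so they must be valid XML names.
    """
    fields = header.split("\t")
    values = row.split("\t")
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in zip(fields, values):
        child = ET.SubElement(record, name)
        child.text = value
    return ET.tostring(record, encoding="unicode")


# Placeholder header and row, invented for the example.
xml_text = record_to_xml("id\torganism\tfamily_size", "P0001\tOryza sativa\t24")
print(xml_text)
```

Once the record is in XML, the same DOM or XSL tooling can be applied regardless of which database it came from.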
8XML-Based Grid Data System for Bioinformatics
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE METASERVER_QUERY>
<SEARCH>
  <key>protein</key>
  <criteria="family size">20</criteria>
  <criteria="known structure">1</criteria>
  <criteria="gene expression condition">dry</criteria>
  <!-- This quality will be matched to a quantity equivalent to being dry -->
  <criteria="protein interactions">2</criteria>
</SEARCH>
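The query on this slide writes each criterion label directly after the element name, which a strict XML parser would reject. The sketch below shows one way a receiving server might read such a query, assuming the label is carried in a hypothetical `name` attribute. That attribute, the `parse_query` helper, and the omission of the XML declaration and DOCTYPE are all assumptions of this example rather than the authors' schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Well-formed variant of the slide's query; the "name" attribute is an assumption.
QUERY = """<SEARCH>
  <key>protein</key>
  <criteria name="family size">20</criteria>
  <criteria name="known structure">1</criteria>
  <criteria name="gene expression condition">dry</criteria>
  <!-- This quality will be matched to a quantity equivalent to being dry -->
  <criteria name="protein interactions">2</criteria>
</SEARCH>"""


def parse_query(text: str) -> Tuple[str, Dict[str, str]]:
    """Return the search key and a mapping of criterion label to its value."""
    root = ET.fromstring(text)
    key = root.findtext("key")
    # XML comments are skipped by the default parser, so only elements remain.
    criteria = {c.get("name"): c.text for c in root.findall("criteria")}
    return key, criteria


key, criteria = parse_query(QUERY)
print(key, criteria)
```

All values arrive as strings, so interpreting "20" as a threshold or "dry" as a measurable condition is left to the server that answers the query.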
9Grid Data System
- Grid technologies such as Globus enable sharing of geographically distributed content
- Create a virtual resource of biological data
- Our interest overlaps with the Commodity Grid Project
- Exporting Grid technologies to our applications
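The idea of a virtual resource can be sketched as one search interface that fans a query out to several sites and merges the answers. This example uses no Globus or CoG Kit API: the `VirtualResource` class, the `make_local_source` helper, the site names, and the in-memory entries are hypothetical stand-ins for remote databases.

```python
from typing import Callable, Dict, List

# A source is any callable that takes a keyword and returns matching records.
Source = Callable[[str], List[dict]]


class VirtualResource:
    """Presents several registered sources as a single searchable resource."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def search(self, keyword: str) -> List[dict]:
        """Query every source in registration order and tag hits with their origin."""
        merged: List[dict] = []
        for name, source in self._sources.items():
            for hit in source(keyword):
                merged.append({**hit, "source": name})
        return merged


def make_local_source(entries: List[dict]) -> Source:
    """Build an in-memory stand-in for a remote database (illustration only)."""
    def search(keyword: str) -> List[dict]:
        return [e for e in entries if keyword.lower() in e["description"].lower()]
    return search


grid = VirtualResource()
grid.register("site_a", make_local_source([{"id": "a1", "description": "rice kinase"}]))
grid.register("site_b", make_local_source([
    {"id": "b1", "description": "Rice transporter"},
    {"id": "b2", "description": "maize kinase"},
]))
results = grid.search("rice")
print(results)
```

In a real deployment each source would wrap a remote service, with authentication and discovery handled by the Grid middleware.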
10Motivating Example: Alliance Science Portal, CoG Kit
- What we have learned about Commodity Grid
- Access and communicate with a variety of information sources
- Ability to include remote computational resources
- Performance guarantees
- Portable user interfaces
11Basic Integrated Grid Architecture
- Applications: Gene finding, Protein Prediction
- Application Toolkits: Data Grid, Remote Computation, Portals
- Grid Services (Middleware): Protocols, Authentication, Policy, Resource Management, Discovery, Events, etc.
- Grid Fabric (Resources): Storage, Networks, Computers, Display Devices, etc., and associated local services
12How To Use Grid?
Web Interface
Remote Data analysis service
Application Toolkits
GenBank, EMBL, BLASTDB, etc.
Commodity Grid Toolkits
Grid Services Fabric
Low-level Grid Technologies
13CoG Mapping to Grid fabric and services
14Our Application: Rice
- Building a genomic framework for research in rice
- Providing information and computational resources
In collaboration with DNA Technology
Laboratory, Kasetsart University and
Computational Biology Research Group, University
of Washington
15Our focus on Grid technology
- Low-level utility components
- Build data cache/replication as an application toolkit in the integrated Grid architecture
- Application-specific GUI components
- A query interface for the genomic framework for rice research on the top layer (application) of the Grid architecture
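The data cache component can be sketched as a thin layer that keeps a local copy of each remote record after the first request. This is a minimal sketch under stated assumptions, not the authors' design: `ReplicaCache`, `fake_remote_fetch`, and the accession string are hypothetical, and a real toolkit would also need expiry and consistency handling.

```python
from typing import Callable, Dict


class ReplicaCache:
    """Keeps a local copy of each record after its first remote fetch."""

    def __init__(self, fetch_remote: Callable[[str], str]) -> None:
        self._fetch_remote = fetch_remote
        self._local: Dict[str, str] = {}
        self.remote_calls = 0  # counts how often the remote site was contacted

    def get(self, accession: str) -> str:
        if accession not in self._local:
            self.remote_calls += 1
            self._local[accession] = self._fetch_remote(accession)
        return self._local[accession]


def fake_remote_fetch(accession: str) -> str:
    """Stand-in for a remote database lookup (illustration only)."""
    return f"record for {accession}"


cache = ReplicaCache(fake_remote_fetch)
first = cache.get("ENTRY_1")   # goes to the remote source
second = cache.get("ENTRY_1")  # served from the local copy
print(first, cache.remote_calls)
```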
16Questions?