Title: The Encyclopedia of Life Project: Large Scale Grid-enabled Proteome Annotation
1The Encyclopedia of Life Project: Large Scale Grid-enabled Proteome Annotation
- Wilfred Li
- Integrative Biosciences
- San Diego Supercomputer Center
- UCSD
2Encyclopedia Of Life Project
- High quality functional and 3-D structure assignment using iGAP
- Grid-enabled bioinformatics applications
- Optimization
- Dedicated and external grid resources
- Integrative biological data warehouse
- Web services consumer
- Distributed database and data mining
- Advanced query environment
- Open Notebook
- Web services provider
3Tools
- Protein Annotation Toolbox
- A C library for structural annotation of proteins
- Integrative Genome Annotation Pipeline
- A wrapper for bioinformatics applications
- AppLeS Parameter Sweep Template
- A grid application environment
- Bioinformatics Workflow Management System
- A generic WMS for bioinformatics applications
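The parameter-sweep idea behind a tool such as the AppLeS Parameter Sweep Template can be illustrated with a minimal sketch: one command template is expanded over every combination of parameter values, and each expansion becomes an independent task. This is not APST's actual task format or API; the function name `sweep`, the command `run_tool`, and the parameter names are hypothetical and chosen only for illustration.

```python
from itertools import product


def sweep(template, params):
    """Expand a command template over all combinations of parameter values.

    template: a str.format-style string, e.g. "run_tool --db {db}"
    params:   dict mapping parameter name -> list of values
    Returns one command string per combination.
    """
    names = sorted(params)  # fixed order so the task list is reproducible
    tasks = []
    for values in product(*(params[name] for name in names)):
        binding = dict(zip(names, values))
        tasks.append(template.format(**binding))
    return tasks


if __name__ == "__main__":
    # "run_tool", the database names, and the file names are placeholders.
    for task in sweep(
        "run_tool --db {db} --query {query}",
        {"db": ["foldlib", "pfam"], "query": ["q1.fasta", "q2.fasta"]},
    ):
        print(task)
```

Because the tasks are independent of one another, a scheduler is free to place each of them on any available resource.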
4Protein sequences
- Sequence info: NR, PFAM
- Structure info: SCOP, PDB
- Building FOLDLIB: PDB chains, SCOP domains, PDP domains, CE matches PDB vs. SCOP; 90% sequence non-identical; minimum size 25 aa; coverage (90%, gaps <30, ends <30)
- Step 1: Prediction of signal peptides (SignalP, PSORT), transmembrane regions (TMHMM, PSORT), coiled coils (COILS), low complexity regions (SEG)
- Step 2: Structural assignment of domains by WU-BLAST
- Step 3: Structural assignment of domains by PSI-BLAST profiles on FOLDLIB
- Step 4: Structural assignment of domains by 123D on FOLDLIB
- Step 5: Functional assignment by PFAM, NR assignments
- Step 6: Domain location prediction by sequence
- Data Warehouse
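The six-step flow on this slide can be sketched as an ordered list of stages applied to each protein record, with the accumulated results being what would eventually be loaded into the data warehouse. This is only an illustration of the ordering, under the assumption that each "Step" label belongs to the description listed with it. The stages are stubs that record a placeholder; they do not call SignalP, WU-BLAST, PSI-BLAST, 123D, or any other real tool, and the names `ProteinRecord`, `make_stub`, and `run_pipeline` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Stage names paraphrase the slide; they are labels, not tool invocations.
STEP_NAMES: List[str] = [
    "step1_sequence_feature_prediction",
    "step2_wu_blast_structural_assignment",
    "step3_psi_blast_structural_assignment",
    "step4_123d_structural_assignment",
    "step5_functional_assignment",
    "step6_domain_location_prediction",
]


@dataclass
class ProteinRecord:
    seq_id: str
    sequence: str
    results: Dict[str, str] = field(default_factory=dict)


Step = Callable[[ProteinRecord], None]


def make_stub(name: str) -> Step:
    """Return a placeholder stage that only marks itself as pending."""
    def step(record: ProteinRecord) -> None:
        record.results[name] = "pending"
    return step


def run_pipeline(record: ProteinRecord,
                 steps: Optional[List[Step]] = None) -> ProteinRecord:
    """Apply the stages in order; later stages can read earlier results."""
    if steps is None:
        steps = [make_stub(name) for name in STEP_NAMES]
    for step in steps:
        step(record)
    return record
```

Keeping the stages as plain callables means a real wrapper for a single tool could replace one stub without touching the rest of the list.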
5Growth of Non-Redundant (NR) Database
6SCOP Superfamily Distribution in Arabidopsis
7Coverage of Structural Information
8Grid Middleware
MDS/NWS/Ganglia
SCP/GASS/SRB/FTP
SSH/GRAM/GASS
PBS/LoadLeveler/Condor
9APST Software Architecture
10APST Support of Batch Resources
13BWMS
15Encyclopedia of Life: A Global Collaborative Project
BeSC
TiTech
CNIC
JLU
SDSC/US
BII
YMU/NTU
UFCG
MU
16Reassemble proteome, Data replication
iGAP Workflow
PAT-NR 1000 Genomes
Proteome Specific Benchmarking
iGAP Prestaging Execution Monitoring
Only unique sequences are processed
DBMS
iGAP WMS
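The note that only unique sequences are processed, followed by reassembly of each proteome, can be sketched as a deduplication step keyed on a hash of the sequence. This is a minimal illustration under stated assumptions, not the project's implementation: the function name `unique_sequences`, the input layout, and the choice of SHA-256 are all hypothetical.

```python
import hashlib


def unique_sequences(proteomes):
    """Collapse identical protein sequences across proteomes.

    proteomes: dict mapping proteome name -> list of (protein_id, sequence)
    Returns (unique, members):
      unique:  digest -> normalized sequence (annotate each of these once)
      members: digest -> list of (proteome name, protein_id) sharing it
    """
    unique = {}
    members = {}
    for proteome, proteins in proteomes.items():
        for protein_id, sequence in proteins:
            normalized = sequence.strip().upper()
            digest = hashlib.sha256(normalized.encode("ascii")).hexdigest()
            unique.setdefault(digest, normalized)
            members.setdefault(digest, []).append((proteome, protein_id))
    return unique, members
```

After the unique sequences are annotated, `members` gives the mapping needed to copy each result back to every proteome that contains the sequence.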
17GridSpeed Architecture
19GridSpeed and EOL
20Data Storage
21Compute Hosts
22Application Information
23Defining Parameters
26GridMonitor
27EOL Workflow Web Interface
29EOL Book Interface
30EOL Workflow
BWMS
Users
36,164 proteins selected from 73 Proteomes
annotated during SC03
Status
Output
Tasks
jAPST (GRAIL Lab)
Local Cluster
GridSpeed (Titech)
Prediction
Loading
Genome Database
GridMonitor (BII)
External Data Source
Web Services
Status update
Job Status Database
iGAP tasks
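The status-update path in this workflow, where task state is written to a job status database that a monitoring interface can read, can be sketched with a single table. This is an assumption-laden illustration using SQLite from the Python standard library; the actual database engine, schema, and status values used by the project are not given on the slide, and the table name, column names, and function names here are hypothetical.

```python
import sqlite3


def open_status_db(path=":memory:"):
    """Open (or create) a job status database with one row per task."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_status ("
        " task_id TEXT PRIMARY KEY,"
        " resource TEXT,"
        " status TEXT NOT NULL)"
    )
    return conn


def update_status(conn, task_id, resource, status):
    """Record the latest status of a task, replacing any earlier row."""
    conn.execute(
        "INSERT OR REPLACE INTO job_status (task_id, resource, status)"
        " VALUES (?, ?, ?)",
        (task_id, resource, status),
    )
    conn.commit()


def count_by_status(conn):
    """Summarize tasks per status, as a monitoring view might display."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM job_status GROUP BY status"
    )
    return dict(rows)
```

Keeping one row per task means a monitor only ever sees the latest state; a design that needs history would append rows with a timestamp instead.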
31EOL User Interfaces
- Workflow user interface
- Task distribution using APST (GRAIL lab, UCSD)
- Task submission from the web using GridSpeed (TiTech)
- Graphical interface to Job status database (BII)
- Job status database engine (SDSC)
- Annotation user interface
- Relational database backend
- Integration of additional resources
- Query session tracking
- Annotation quality validated in several studies
- iGAP user interface
- Java applications (jAPST)
- Command line
- For more info, visit http://eol.sdsc.edu
32PRAGMA Partners
- BII (Singapore)
- Grid middleware deployment
- Workflow web interface
- http://blast.bii-sg.org:8090/eol/
- Resource sharing
- Viper cluster-BII
- Scientific Exchange
- Titech (Japan)
- Condor pool
- GridSpeed
- A grid portal environment for speedy deployment of applications
- http://www.gridspeed.org
- New partners
- Australia, Brazil, Ireland, China.
33SDSC/UCSD Partners
- Integrative Biosciences Department
- ROCKS
- Rocks 3.1 Grid Roll
- The EOL cluster
- http://saxicolous.sdsc.edu/ganglia/
- DAKS
- DataStar
- Advanced Database Laboratory
- SRB/Data Matrix
- UCSD
- Life Sciences Initiatives
- Campus collaborations
34Acknowledgement
- SDSC
- Fran Berman
- Director
- Philip E. Bourne
- Mark Miller
- Project Coordinator
- Ilya N. Shindyalov
- CE
- Greg Quinn
- Web service
- Coleman Mosley
- Vicente Reyes
- Robert Byrnes
- Kim Baldridge
- iCC Director
- Jerry Greenberg
- CE portal
- SDSC
- Philip Papadopoulos
- Rocks
- Mason Katz
- Greg Bruno
- Chaitan Baru
- David Archbell
- Adam Birnbaum
- UCSD
- Peter Arzberger
- PRAGMA
- Henri Casanova
- Jim Hayes
- Ceres Inc.
- Nickolai Alexandrov
- 123D
- Richard Flavell
35Acknowledgement
- BII, Singapore
- Larry Ang
- Kishore Sakharkar
- Arun Krishnan
- Atif Shahab
- Other BII members
- Titech, Japan
- Satoshi Matsuoka
- Toyotaro Suzumura
- Kouji Tanaka
- Monash University, Australia
- David Abramson
- Colin Enticott
- Univ. Federal de Campina Grande, Brazil
- Zane Cirne
- Eliane Cristina de Araujo
- Queen's University, UK
- Terence J Harmer
- David R Simpson